The new block needs to be just above the "ThreadsListHandle tlh;"
line in order to preserve all of the existing checks...

More below...


On 7/16/18 2:22 PM, Hohensee, Paul wrote:

I believe you could move the code ahead of the call to validate_thread_id_array() because that method just checks for thread ids <= 0.

*diff -r 3ddf41505d54 src/hotspot/share/services/management.cpp*

*--- a/src/hotspot/share/services/management.cpp Sun Jun 03 23:33:00 2018 -0700*

*+++ b/src/hotspot/share/services/management.cpp Mon Jul 16 10:41:28 2018 -0700*

@@ -2084,11 +2083,19 @@

typeArrayOop sa = typeArrayOop(JNIHandles::resolve_non_null(sizeArray));

typeArrayHandle sizeArray_h(THREAD, sa);

+// Special-case current thread


The next line uses ids_ah, but validate_threads_id_array() has
not been called yet so you don't know whether ids_ah is valid
yet.

+int num_threads = ids_ah->length();


The original code that I posted used the existing THREADS
variable rather than a call to JavaThread::current() which
can be expensive.

+JavaThread* java_thread = JavaThread::current();

+if (num_threads == 1 && sizeArray_h->length() == 1 &&


The next line uses ids_ah, but validate_threads_id_array() has
not been called yetso you don't know whether ids_ah is valid
yet.


+ids_ah->long_at(0) == java_lang_Thread::thread_id(java_thread->threadObj())) {

+sizeArray_h->long_at_put(0, java_thread->cooked_allocated_bytes());

+return;

+}

+

// validate the thread id array

validate_thread_id_array(ids_ah, CHECK);

// sizeArray must be of the same length as the given array of thread IDs


It's not safe to move the next line before validate_thread_id_array().


-int num_threads = ids_ah->length();

if (num_threads != sizeArray_h->length()) {

THROW_MSG(vmSymbols::java_lang_IllegalArgumentException(),

"The length of the given long array does not match the length of "

If performance is good enough, and if you still want to add getCurrentThreadAllocatedBytes() (imo a good idea, since getCurrentThreadCpuTime() and getCurrentThreadUserTime() already exist), you could implement it by “getThreadAllocatedBytes(Thread::currentThread().getId())”. You might want also want to add getCurrentThread* methods to com.sun.management where they don’t currently exist: then we’d have a complete parallel method set.

Another approach to improving things is to fix the underlying problem with find_JavaThread_from_java_tid(). https://bugs.openjdk.java.net/browse/JDK-8185005 proposes doing that in a different context. We came up with a patch for JDK8 that uses an open addressed hashtable (one where the “bucket chain” is in the index array, see https://en.wikipedia.org/wiki/Hash_table#Open_addressing <https://en.wikipedia.org/wiki/Hash_table#Open_addressing>) to map Java tids to JavaThread*s. I’ve forward ported it to JDK12, see http://cr.openjdk.java.net/~phh/8185005/webrev.00/ <http://cr.openjdk.java.net/%7Ephh/8185005/webrev.00/>. The main disadvantage, of course, is that it’s yet another data structure that takes up memory. It’s really fast though and speeds up our profilers quite a bit. Perhaps we could replace the existing thread list with a variation on this map, since it’s quick to just run through the underlying array when you want to run through the threads.


Hmmm... That bug got closed as will-not-fix. I'm not sure why
the triage team decided that.

Dan


Thanks,

Paul

*From: *serviceability-dev <[email protected]> on behalf of "Daniel D. Daugherty" <[email protected]>
*Reply-To: *"[email protected]" <[email protected]>
*Date: *Friday, July 13, 2018 at 1:53 PM
*To: *Markus Gaisbauer <[email protected]>, "[email protected]" <[email protected]>, Erik Österlund <[email protected]>, Robbin Ehn <[email protected]>
*Subject: *Re: ThreadMXBean::getCurrentThreadAllocatedBytes

Markus,

I filed the following bug for you:

    JDK-8207266 ThreadMXBean::getThreadAllocatedBytes() can be quicker for self thread
https://bugs.openjdk.java.net/browse/JDK-8207266

Dan

On 7/13/18 4:46 PM, Daniel D. Daugherty wrote:

    On 7/13/18 2:44 PM, Daniel D. Daugherty wrote:

        On 7/13/18 12:35 PM, Markus Gaisbauer wrote:

            Hello,

            I am trying to use ThreadMXBean::getThreadAllocatedBytes
            (com.sun.management) to get the amount of allocated memory
            of the current thread in some performance critical code.

            Unfortunately, the current implementation can be rather
            slow and the duration of each call unpredictable. I ran a
            test in a JVM with 500 threads. Depending on which thread
            was queried, getThreadAllocatedBytes took between 100 ns
            and 2500 ns.

            The root cause of the problem is
            ThreadsList::find_JavaThread_from_java_tid which performs
            a linear scan through all Java threads in the current
            process. The more threads a JVM has, the slower it gets.
            In the worst case, the thread with the given TID is found
            as the last entry in the list.

            Before Java 10, the oldest thread is the slowest one to query.

            Since Java 10, the youngest thread is the slowest one to
            query. I think this was a side effect of introducing
            "Thread Safe Memory Reclamation (Thread-SMR) support".

                   Oldest Thread   Youngest Thread

            Java 8             8740 ns             76 ns

            Java 10             109 ns           2485 ns


        It is good to see that longest search is much faster. Erik and
        Robbin
        will be pleased since speeding up traversal of the ThreadsList
        was one
        of the things that we tried to do during the Thread-SMR project.

        A first step is get a new bug filed that documents the issue with
        ThreadMXBean::getThreadAllocatedBytes(). Perhaps Gary or Serguei
        will take care of that.

        Dan



            A common use case is to query the metric for the current
            thread (e.g. before and after performing some operation).
            This case can be optimized by introducing a new method:
            getCurrentThreadAllocatedBytes.

            I created a patch for http://hg.openjdk.java.net/jdk/jdk/
            <http://hg.openjdk.java.net/jdk/jdk/> and by using the new
            method I saw the following improvements in my test:

                   Oldest Thread   Youngest Thread

            Proposal            37 ns             37 ns

            This is a 60x improvement over the worst case of the
            current API. In the best case of the current API, the new
            method is still 3 times faster.

            // based on JVM_SetNativeThreadName in jvm.cpp.

            JVM_ENTRY(jlong,
            jmm_GetCurrentThreadAllocatedMemory(JNIEnv *env, jobject
            currentThread))

              // We don't use a ThreadsListHandle here because the
            current thread

              // must be alive.

              oop java_thread =
            JNIHandles::resolve_non_null(currentThread);

              JavaThread* thr = java_lang_Thread::thread(java_thread);

              if (thread == thr) {

                // only supported for the current thread

                return thr->cooked_allocated_bytes();

              }

              return -1;

            JVM_END

            The proposed method also fixes the problem, that
            getThreadAllocatedBytes itself allocates some memory on
            the current thread (two long arrays, 24 bytes) and
            therefore can slightly skew measurements. The new
            method, getCurrentThreadAllocatedBytes, returns exactly
            the same value if it is called twice without allocating
            any memory between those calls.

            I also built a variation of this method that could be used
            to query allocated memory more efficiently for anyone who
            already has a java.lang.Thread object:

            JVM_ENTRY(jlong, jmm_GetThreadAllocatedMemory(JNIEnv *env,
            jobject threadObj))

              // based on code proposed in threadSMR.hpp

              ThreadsListHandle tlh;

              JavaThread* thr = NULL;

              bool is_alive =
            tlh.cv_internal_thread_to_JavaThread(threadObj, &thr, NULL);

              if (is_alive) {

                return thr->cooked_allocated_bytes();

              }

              return -1;

            JVM_END

            This method took 70 ns in my test, which is 85% slower
            than GetCurrentThreadAllocatedMemory but still 30% faster
            than the best case of the current API. I currently have no
            immediate need for this second method, but I think it
            would also be a valueable addition to the API.

            I attached a patch for getCurrentThreadAllocatedBytes. I
            can create a second patch for also adding
            getThreadAllocatedMemory(java.lang.Thread) to the API.

            I am a first time contributor and I am not 100% sure what
            process I must follow to get a change like this into
            OpenJDK. Can someone have a look at my proposal and help
            me through the process?

            Best regards,

            Markus


    I believe this is the code that's causing you grief:

    open/src/hotspot/share/services/management.cpp:

    // Gets an array containing the amount of memory allocated on the Java
    // heap for a set of threads (in bytes).  Each element of the array is
    // the amount of memory allocated for the thread ID specified in the
    // corresponding entry in the given array of thread IDs; or -1 if the
    // thread does not exist or has terminated.
    JVM_ENTRY(void, jmm_GetThreadAllocatedMemory(JNIEnv *env,
    jlongArray ids,
    jlongArray sizeArray))
      // Check if threads is null
      if (ids == NULL || sizeArray == NULL) {
    THROW(vmSymbols::java_lang_NullPointerException());
      }

      ResourceMark rm(THREAD);
      typeArrayOop ta = typeArrayOop(JNIHandles::resolve_non_null(ids));
      typeArrayHandle ids_ah(THREAD, ta);

      typeArrayOop sa =
    typeArrayOop(JNIHandles::resolve_non_null(sizeArray));
      typeArrayHandle sizeArray_h(THREAD, sa);

      // validate the thread id array
      validate_thread_id_array(ids_ah, CHECK);

      // sizeArray must be of the same length as the given array of
    thread IDs
      int num_threads = ids_ah->length();
      if (num_threads != sizeArray_h->length()) {
    THROW_MSG(vmSymbols::java_lang_IllegalArgumentException(),
                  "The length of the given long array does not match
    the length of "
                  "the given array of thread IDs");
      }

      ThreadsListHandle tlh;
      for (int i = 0; i < num_threads; i++) {
        JavaThread* java_thread =
    tlh.list()->find_JavaThread_from_java_tid(ids_ah->long_at(i));
        if (java_thread != NULL) {
          sizeArray_h->long_at_put(i,
    java_thread->cooked_allocated_bytes());
        }
      }
    JVM_END


    Perhaps something like this above the "ThreadsListHandle tlh;" line:

      if (num_threads == 1 && THREAD->is_Java_thread()) {
        // Only asking for 1 thread so if we're a JavaThread, then
        // see if this request is for ourself.
        JavaThread* jt = THREAD;
        oop tobj = jt->threadObj();

        if (ids_ah->long_at(0) == java_lang_Thread::thread_id(tobj)) {
          // Return the info for ourself.
          sizeArray_h->long_at_put(0, jt->cooked_allocated_bytes());
          return;
        }
      }

    I haven't checked to see if this will even compile, but I
    think you'll get the idea.

    Dan




Reply via email to