Re: RFR: 8007806: Need a Throwables performance counter
On 02/26/2013 12:18 AM, Mandy Chung wrote: On 2/25/2013 8:30 AM, Nils Loodin wrote: I changed it to sun.throwables.numThrowables in http://cr.openjdk.java.net/~nloodin/8007806/webrev.01/ Is this better? However, bear in mind that it's not exactly specified where this is going to be incremented from yet. Nothing in this change states that it's going to be incremented from the Throwable constructor. I'm surprised to see adding a performance counter for an external instrumentation agent to use. I understand that we agree that dynamic bytecode instrumentation is a more appropriate approach to instrument and count the number of thrown throwables. Just to be clear, what I was trying to say that this review is just to add the counter, and the discussion on how and when to access it is something that I'd much rather have in context of a review of that code... But yes, we do agree that dynamic bytecode instrumentation is a more appropriate aproach to count the number of throwables. But that code and the following review would depend on this one. Regards, Nils Loodin
RE: RFR: 8007806: Need a Throwables performance counter
Just to be clear, what I was trying to say that this review is just to add the counter, and the discussion on how and when to access it is something that I'd much rather have in context of a review of that code... Then the counter name should not bind you to how it is counting. +1 for 'sun.throwables' as suggested by Peter. Jason
Re: RFR: 8007806: Need a Throwables performance counter
On 02/25/2013 12:03 AM, David Holmes wrote: I'm getting very confused regarding the different threads on this. But this webrev: http://cr.openjdk.java.net/~nloodin/exception-tracing/webrev.01/ shows the intrusive event tracing that I'm sure we said we would grudgingly accept for 7u but not for JDK 8. David et al: No! That showed the intrusive event tracing that we will not implement ever. It also has nothing to do with this code review, which is only to implement a performance counter. Regards, Nils Loodin
Re: RFR: 8007806: Need a Throwables performance counter
On 02/24/2013 11:18 PM, David Holmes wrote: We've not-so-slightly hijacked Nils' thread here - apologies for that. David, Peter! Yes you did :) However, feel free to make it up to me by: 1. Suggest a good name for the counter 2. Plz to say yes to adding that counter 3. Profit! Regards, Nils Loodin
Re: RFR: 8007806: Need a Throwables performance counter
On 24/02/2013 20:57, David Holmes wrote: Does jstat access these values directly or only via the synchronized methods? If the latter then the value can't be torn that way. The sync method will store the value in 2 32-bit registers, and the variable load in jstat will take two instructions, but nothing can touch those registers. The instrumentation/jvmstat buffer is a shared memory file so jstat (running in a different VM) just polls the counters so there isn't any Java-level synchronization. -Alan.
Re: RFR: 8007806: Need a Throwables performance counter
On 25/02/2013 09:23, Nils Loodin wrote: David et al: No! That showed the intrusive event tracing that we will not implement ever. It also has nothing to do with this code review, which is only to implement a performance counter. Just to confirm, this is the webrev that we should be reviewing: http://cr.openjdk.java.net/~nloodin/8007806/webrev.00/ and the only issue (I think) is the name of the counter where I think sun.* or jdk.* should be fine. -Alan.
Re: RFR: 8007806: Need a Throwables performance counter
On 02/25/2013 11:15 AM, Alan Bateman wrote: On 25/02/2013 09:23, Nils Loodin wrote: David et al: No! That showed the intrusive event tracing that we will not implement ever. It also has nothing to do with this code review, which is only to implement a performance counter. Just to confirm, this is the webrev that we should be reviewing: http://cr.openjdk.java.net/~nloodin/8007806/webrev.00/ and the only issue (I think) is the name of the counter where I think sun.* or jdk.* should be fine. -Alan. Does this mean that you approve of http://cr.openjdk.java.net/~nloodin/8007806/webrev.01/ ? Regards, Nisse
Re: RFR: 8007806: Need a Throwables performance counter
On 02/25/2013 10:29 AM, Nils Loodin wrote: On 02/24/2013 11:18 PM, David Holmes wrote: We've not-so-slightly hijacked Nils' thread here - apologies for that. David, Peter! Yes you did :) However, feel free to make it up to me by: 1. Suggest a good name for the counter As Jason Mehrens pointed out in this thread, the thrownThrowables might be misleading if the counter is incremented in the Throwable constructors (or are you going to instrument the throw sites?). So it might better be called what it is: constructedThrowables. Yes, this is a common idiom: throw new ThrowableSubclass(...); ...but the number of thrown and constructed throwables may diverge considerably in situations like for example: - preallocated throwable instances used (and reused) as a form of long return (from lambdas). - deserialized throwables on RMI clients that are thrown on the server side, transfered over the wire and re-thrown on client side (de-seriailization does not call the constructor) 2. Plz to say yes to adding that counter It's not my call but I would be concerned about performance. Current set of counters that are updated from Java is not that critical, but having a counter of constructed objects updated via synchronized method (or a CAS) can have a negative impact on application performance. See what happened with HashMap constructor. The same could happen in situations where Throwable subclass is (miss)used for example to long return a result from... say a lambda (again). This is perfectly performant and scalable now (just override the Throwable.fillInStackTrace with empty implementation and you get a long return(value)). I'm not saying that this is a good practice, but such usages will emerge with lambdas. So although this thread is only about counter name, I think care should be put also on it's implementation. Perhaps with an option to disable instrumentation if requested from user? On a per-Throwable-subclass basis (like Logger levels)? Using external configuration (system properties) or special @DontTrace annotation? 3. Profit! :-) Regards, Peter Regards, Nils Loodin
Re: RFR: 8007806: Need a Throwables performance counter
On 02/25/2013 02:30 PM, Peter Levart wrote: On 02/25/2013 10:29 AM, Nils Loodin wrote: On 02/24/2013 11:18 PM, David Holmes wrote: We've not-so-slightly hijacked Nils' thread here - apologies for that. David, Peter! Yes you did :) However, feel free to make it up to me by: 1. Suggest a good name for the counter As Jason Mehrens pointed out in this thread, the thrownThrowables might be misleading if the counter is incremented in the Throwable constructors (or are you going to instrument the throw sites?). So it might better be called what it is: constructedThrowables. Yes, this is a common idiom: throw new ThrowableSubclass(...); ...but the number of thrown and constructed throwables may diverge considerably in situations like for example: - preallocated throwable instances used (and reused) as a form of long return (from lambdas). - deserialized throwables on RMI clients that are thrown on the server side, transfered over the wire and re-thrown on client side (de-seriailization does not call the constructor) I changed it to sun.throwables.numThrowables in http://cr.openjdk.java.net/~nloodin/8007806/webrev.01/ Is this better? However, bear in mind that it's not exactly specified where this is going to be incremented from yet. Nothing in this change states that it's going to be incremented from the Throwable constructor. Regards, Nils Loodin
Re: RFR: 8007806: Need a Throwables performance counter
On 02/25/2013 05:30 PM, Nils Loodin wrote: On 02/25/2013 02:30 PM, Peter Levart wrote: On 02/25/2013 10:29 AM, Nils Loodin wrote: On 02/24/2013 11:18 PM, David Holmes wrote: We've not-so-slightly hijacked Nils' thread here - apologies for that. David, Peter! Yes you did :) However, feel free to make it up to me by: 1. Suggest a good name for the counter As Jason Mehrens pointed out in this thread, the thrownThrowables might be misleading if the counter is incremented in the Throwable constructors (or are you going to instrument the throw sites?). So it might better be called what it is: constructedThrowables. Yes, this is a common idiom: throw new ThrowableSubclass(...); ...but the number of thrown and constructed throwables may diverge considerably in situations like for example: - preallocated throwable instances used (and reused) as a form of long return (from lambdas). - deserialized throwables on RMI clients that are thrown on the server side, transfered over the wire and re-thrown on client side (de-seriailization does not call the constructor) I changed it to sun.throwables.numThrowables in http://cr.openjdk.java.net/~nloodin/8007806/webrev.01/ Is this better? However, bear in mind that it's not exactly specified where this is going to be incremented from yet. Nothing in this change states that it's going to be incremented from the Throwable constructor. These are just suggestions... thinking loud ... Other hierarchical names for counters use two common forms (note plural/singular): sun.classloader.findClassTime sun.classloader.findClasses sun.classloader.parentDelegationTime or: java.threads.daemon java.threads.live java.threads.livePeak java.threads.started So by these forms, the name could be: sun.throwable.numThrowables or: sun.throwables.num[ber] might be later changed to (if instrumented to increment at construction time and/or throw time): sun.throwables.constructed sun.throwables.thrown until then, it could be simply: sun.throwables Regards, Peter Regards, Nils Loodin
Re: RFR: 8007806: Need a Throwables performance counter
On 2/25/2013 8:30 AM, Nils Loodin wrote: I changed it to sun.throwables.numThrowables in http://cr.openjdk.java.net/~nloodin/8007806/webrev.01/ Is this better? However, bear in mind that it's not exactly specified where this is going to be incremented from yet. Nothing in this change states that it's going to be incremented from the Throwable constructor. I'm surprised to see adding a performance counter for an external instrumentation agent to use. I understand that we agree that dynamic bytecode instrumentation is a more appropriate approach to instrument and count the number of thrown throwables.I guess I am missing the big picture how you want the serviceability tools (jstat etc) to work with the perf counters and tracing events. Mandy
Re: RFR: 8007806: Need a Throwables performance counter
Hi David, I thought it was ok to pass null, but I don't know the portability issues in-depth. The javadoc for Unsafe says: /This method refers to a variable by means of two parameters, and so it provides (in effect) a double-register addressing mode for Java variables. When the object reference is null, this method uses its offset as an absolute address. This is similar in operation to methods such as getInt(long), which provide (in effect) a single-register addressing mode for non-Java variables. However, because Java variables may have a different layout in memory from non-Java variables, programmers should not assume that these two addressing modes are ever equivalent. Also, programmers should remember that offsets from the double-register addressing mode cannot be portably confused with longs used in the single-register addressing mode./ Does anybody know the in-depth interpretation of the above? Is it only the particular Java/native type differences (for example, endianess of variables) that these two addressing modes might interpret differently or something else too? Regards, Peter On 02/24/2013 12:39 AM, David Holmes wrote: Peter, In your use of Unsafe you pass null as the object. I'm pretty certain you can't pass null here. Unsafe operates on fields or array elements. David On 24/02/2013 5:39 AM, Peter Levart wrote: Hi Nils, If the counters are updated frequently from multiple threads, there might be contention/scalability issues. Instead of synchronization on updates, you might consider using atomic updates provided by sun.misc.Unsafe, like for example: Index: jdk/src/share/classes/sun/misc/PerfCounter.java === --- jdk/src/share/classes/sun/misc/PerfCounter.java +++ jdk/src/share/classes/sun/misc/PerfCounter.java @@ -25,6 +25,8 @@ package sun.misc; +import sun.nio.ch.DirectBuffer; + import java.nio.ByteBuffer; import java.nio.ByteOrder; import java.nio.LongBuffer; @@ -50,6 +52,8 @@ public class PerfCounter { private static final Perf perf = AccessController.doPrivileged(new Perf.GetPerfAction()); +private static final Unsafe unsafe = +Unsafe.getUnsafe(); // Must match values defined in hotspot/src/share/vm/runtime/perfdata.hpp private final static int V_Constant = 1; @@ -59,12 +63,14 @@ private final String name; private final LongBuffer lb; +private final DirectBuffer db; private PerfCounter(String name, int type) { this.name = name; ByteBuffer bb = perf.createLong(name, U_None, type, 0L); bb.order(ByteOrder.nativeOrder()); this.lb = bb.asLongBuffer(); +this.db = bb instanceof DirectBuffer ? (DirectBuffer) bb : null; } static PerfCounter newPerfCounter(String name) { @@ -79,23 +85,44 @@ /** * Returns the current value of the perf counter. */ -public synchronized long get() { +public long get() { +if (db != null) { +return unsafe.getLongVolatile(null, db.address()); +} +else { +synchronized (this) { -return lb.get(0); -} +return lb.get(0); +} +} +} /** * Sets the value of the perf counter to the given newValue. */ -public synchronized void set(long newValue) { +public void set(long newValue) { +if (db != null) { +unsafe.putOrderedLong(null, db.address(), newValue); +} +else { +synchronized (this) { -lb.put(0, newValue); -} +lb.put(0, newValue); +} +} +} /** * Adds the given value to the perf counter. */ -public synchronized void add(long value) { -long res = get() + value; +public void add(long value) { +if (db != null) { +unsafe.getAndAddLong(null, db.address(), value); +} +else { +synchronized (this) { +long res = lb.get(0) + value; -lb.put(0, res); +lb.put(0, res); +} +} } /** Testing the PerfCounter.increment() method in a loop on multiple threads sharing the same PerfCounter instance, for example, on a 4-core Intel i7 machine produces the following results: # # PerfCounter_increment: run duration: 5,000 ms, #of logical CPUS: 8 # 1 threads, Tavg = 19.02 ns/op (? = 0.00 ns/op) 2 threads, Tavg =109.93 ns/op (? = 6.17 ns/op) 3 threads, Tavg =136.64 ns/op (? = 2.99 ns/op) 4 threads, Tavg =293.26 ns/op (? = 5.30 ns/op) 5 threads, Tavg =316.94 ns/op (? = 6.28 ns/op) 6 threads, Tavg =686.96 ns/op (? = 7.09 ns/op) 7 threads, Tavg =793.28 ns/op (? = 10.57 ns/op) 8 threads, Tavg =898.15 ns/op (? = 14.63 ns/op)
Re: RFR: 8007806: Need a Throwables performance counter
On 23/02/2013 23:35, David Holmes wrote: Before we rush down this path. Atomic operations on 64-bit types are not supported natively on all 32-bit platforms. Atomic updates on those platforms will degenerate into locking - which is the barrier to implementing frequently used counters in the first place. I understand, and clearly more work/investigation is required, but I think it's worth looking into, particularly if Nils and others are going are adding further counters and instrumenting code dynamic to use them. -Alan
Re: RFR: 8007806: Need a Throwables performance counter
On 24/02/2013 6:50 PM, Peter Levart wrote: Hi David, I thought it was ok to pass null, but I don't know the portability issues in-depth. The javadoc for Unsafe says: /This method refers to a variable by means of two parameters, and so it provides (in effect) a double-register addressing mode for Java variables. When the object reference is null, this method uses its offset as an absolute address. This is similar in operation to methods such as getInt(long), which provide (in effect) a single-register addressing mode for non-Java variables. However, because Java variables may have a different layout in memory from non-Java variables, programmers should not assume that these two addressing modes are ever equivalent. Also, programmers should remember that offsets from the double-register addressing mode cannot be portably confused with longs used in the single-register addressing mode./ That is the doc for getXXX but not for getAndAddXXX or compareAndSwapXXX. You can't have null here: UNSAFE_ENTRY(jboolean, Unsafe_CompareAndSwapLong(JNIEnv *env, jobject unsafe, jobject obj, jlong offset, jlong e, jlong x)) UnsafeWrapper(Unsafe_CompareAndSwapLong); Handle p (THREAD, JNIHandles::resolve(obj)); jlong* addr = (jlong*)(index_oop_from_field_offset_long(p(), offset)); if (VM_Version::supports_cx8()) return (jlong)(Atomic::cmpxchg(x, addr, e)) == e; else { jboolean success = false; ObjectLocker ol(p, THREAD); if (*addr == e) { *addr = x; success = true; } return success; } UNSAFE_END David - Does anybody know the in-depth interpretation of the above? Is it only the particular Java/native type differences (for example, endianess of variables) that these two addressing modes might interpret differently or something else too? Regards, Peter On 02/24/2013 12:39 AM, David Holmes wrote: Peter, In your use of Unsafe you pass null as the object. I'm pretty certain you can't pass null here. Unsafe operates on fields or array elements. David On 24/02/2013 5:39 AM, Peter Levart wrote: Hi Nils, If the counters are updated frequently from multiple threads, there might be contention/scalability issues. Instead of synchronization on updates, you might consider using atomic updates provided by sun.misc.Unsafe, like for example: Index: jdk/src/share/classes/sun/misc/PerfCounter.java === --- jdk/src/share/classes/sun/misc/PerfCounter.java +++ jdk/src/share/classes/sun/misc/PerfCounter.java @@ -25,6 +25,8 @@ package sun.misc; +import sun.nio.ch.DirectBuffer; + import java.nio.ByteBuffer; import java.nio.ByteOrder; import java.nio.LongBuffer; @@ -50,6 +52,8 @@ public class PerfCounter { private static final Perf perf = AccessController.doPrivileged(new Perf.GetPerfAction()); +private static final Unsafe unsafe = +Unsafe.getUnsafe(); // Must match values defined in hotspot/src/share/vm/runtime/perfdata.hpp private final static int V_Constant = 1; @@ -59,12 +63,14 @@ private final String name; private final LongBuffer lb; +private final DirectBuffer db; private PerfCounter(String name, int type) { this.name = name; ByteBuffer bb = perf.createLong(name, U_None, type, 0L); bb.order(ByteOrder.nativeOrder()); this.lb = bb.asLongBuffer(); +this.db = bb instanceof DirectBuffer ? (DirectBuffer) bb : null; } static PerfCounter newPerfCounter(String name) { @@ -79,23 +85,44 @@ /** * Returns the current value of the perf counter. */ -public synchronized long get() { +public long get() { +if (db != null) { +return unsafe.getLongVolatile(null, db.address()); +} +else { +synchronized (this) { -return lb.get(0); -} +return lb.get(0); +} +} +} /** * Sets the value of the perf counter to the given newValue. */ -public synchronized void set(long newValue) { +public void set(long newValue) { +if (db != null) { +unsafe.putOrderedLong(null, db.address(), newValue); +} +else { +synchronized (this) { -lb.put(0, newValue); -} +lb.put(0, newValue); +} +} +} /** * Adds the given value to the perf counter. */ -public synchronized void add(long value) { -long res = get() + value; +public void add(long value) { +if (db != null) { +unsafe.getAndAddLong(null, db.address(), value); +} +else { +synchronized (this) { +long res = lb.get(0) + value; -lb.put(0, res); +lb.put(0, res); +} +} } /** Testing the PerfCounter.increment() method in a loop on multiple threads sharing the same PerfCounter
Re: RFR: 8007806: Need a Throwables performance counter
On 02/24/2013 12:35 AM, David Holmes wrote: On 24/02/2013 6:08 AM, Alan Bateman wrote: On 23/02/2013 19:39, Peter Levart wrote: Hi Nils, If the counters are updated frequently from multiple threads, there might be contention/scalability issues. Instead of synchronization on updates, you might consider using atomic updates provided by sun.misc.Unsafe, like for example: Most of the jvmstat counters are in VM and we only update a small number of counters in the libraries. Even so, I think your idea is good as further usage could potentially to be an issue, particularly add/increment (which involves a get at this). So irrespective of Nil's patch (which I think is now just a proposal to add a counter, not original patch that Jason was commenting on) then we should take your patch. Before we rush down this path. Atomic operations on 64-bit types are not supported natively on all 32-bit platforms. Atomic updates on those platforms will degenerate into locking - which is the barrier to implementing frequently used counters in the first place. David It's true, yes. I tried replacing atomic add with a read/CAS loop (like it is coded in Unsafe as a fall-back for platforms that don't support atomic add) and even on Intel 64bit such add performs on-par with synchronized method as scalability is concerned. It's raw speed when not contended is 2x the synchronized variant. I also found out the compatibility gotcha when using null as the 1st argument to Unsafe double-register addressing mode methods. On platforms, that don't support 64bit CAS natively, the argument is used as an Object reference to synchronize on to emulate the CAS instruction. On Raspery-PI (armv6) for example, the patch causes a crash for that reason: # A fatal error has been detected by the Java Runtime Environment: # # SIGSEGV (0xb) at pc=0x40659f80, pid=2450, tid=1086497904 # # JRE version: Java(TM) SE Runtime Environment (8.0) # Java VM: Java HotSpot(TM) Client VM (25.0-b04 mixed mode linux-arm ) # Problematic frame: # V [libjvm.so+0x3d3f80] ObjectSynchronizer::slow_enter(Handle, BasicLock*, Thread*)+0x4 Regards, Peter -Alan
Re: RFR: 8007806: Need a Throwables performance counter
On 02/24/2013 11:31 AM, David Holmes wrote: On 24/02/2013 6:50 PM, Peter Levart wrote: Hi David, I thought it was ok to pass null, but I don't know the portability issues in-depth. The javadoc for Unsafe says: /This method refers to a variable by means of two parameters, and so it provides (in effect) a double-register addressing mode for Java variables. When the object reference is null, this method uses its offset as an absolute address. This is similar in operation to methods such as getInt(long), which provide (in effect) a single-register addressing mode for non-Java variables. However, because Java variables may have a different layout in memory from non-Java variables, programmers should not assume that these two addressing modes are ever equivalent. Also, programmers should remember that offsets from the double-register addressing mode cannot be portably confused with longs used in the single-register addressing mode./ That is the doc for getXXX but not for getAndAddXXX or compareAndSwapXXX. You can't have null here: UNSAFE_ENTRY(jboolean, Unsafe_CompareAndSwapLong(JNIEnv *env, jobject unsafe, jobject obj, jlong offset, jlong e, jlong x)) UnsafeWrapper(Unsafe_CompareAndSwapLong); Handle p (THREAD, JNIHandles::resolve(obj)); jlong* addr = (jlong*)(index_oop_from_field_offset_long(p(), offset)); if (VM_Version::supports_cx8()) return (jlong)(Atomic::cmpxchg(x, addr, e)) == e; else { jboolean success = false; ObjectLocker ol(p, THREAD); if (*addr == e) { *addr = x; success = true; } return success; } UNSAFE_END David - Oh yes, I found that out too... :-( Thanks, Peter Does anybody know the in-depth interpretation of the above? Is it only the particular Java/native type differences (for example, endianess of variables) that these two addressing modes might interpret differently or something else too? Regards, Peter On 02/24/2013 12:39 AM, David Holmes wrote: Peter, In your use of Unsafe you pass null as the object. I'm pretty certain you can't pass null here. Unsafe operates on fields or array elements. David On 24/02/2013 5:39 AM, Peter Levart wrote: Hi Nils, If the counters are updated frequently from multiple threads, there might be contention/scalability issues. Instead of synchronization on updates, you might consider using atomic updates provided by sun.misc.Unsafe, like for example: Index: jdk/src/share/classes/sun/misc/PerfCounter.java === --- jdk/src/share/classes/sun/misc/PerfCounter.java +++ jdk/src/share/classes/sun/misc/PerfCounter.java @@ -25,6 +25,8 @@ package sun.misc; +import sun.nio.ch.DirectBuffer; + import java.nio.ByteBuffer; import java.nio.ByteOrder; import java.nio.LongBuffer; @@ -50,6 +52,8 @@ public class PerfCounter { private static final Perf perf = AccessController.doPrivileged(new Perf.GetPerfAction()); +private static final Unsafe unsafe = +Unsafe.getUnsafe(); // Must match values defined in hotspot/src/share/vm/runtime/perfdata.hpp private final static int V_Constant = 1; @@ -59,12 +63,14 @@ private final String name; private final LongBuffer lb; +private final DirectBuffer db; private PerfCounter(String name, int type) { this.name = name; ByteBuffer bb = perf.createLong(name, U_None, type, 0L); bb.order(ByteOrder.nativeOrder()); this.lb = bb.asLongBuffer(); +this.db = bb instanceof DirectBuffer ? (DirectBuffer) bb : null; } static PerfCounter newPerfCounter(String name) { @@ -79,23 +85,44 @@ /** * Returns the current value of the perf counter. */ -public synchronized long get() { +public long get() { +if (db != null) { +return unsafe.getLongVolatile(null, db.address()); +} +else { +synchronized (this) { -return lb.get(0); -} +return lb.get(0); +} +} +} /** * Sets the value of the perf counter to the given newValue. */ -public synchronized void set(long newValue) { +public void set(long newValue) { +if (db != null) { +unsafe.putOrderedLong(null, db.address(), newValue); +} +else { +synchronized (this) { -lb.put(0, newValue); -} +lb.put(0, newValue); +} +} +} /** * Adds the given value to the perf counter. */ -public synchronized void add(long value) { -long res = get() + value; +public void add(long value) { +if (db != null) { +unsafe.getAndAddLong(null, db.address(), value); +} +else { +synchronized (this) { +long res = lb.get(0) + value; -lb.put(0, res); +lb.put(0, res); +} +} } /**
Re: RFR: 8007806: Need a Throwables performance counter
Hi Alan, David, Nils, I just want to clear something regarding PerfCounter implementation. Access to 64bit value in native memory is through a direct buffer which uses normal read/write (non-volatile, Unsafe.[get|set]Long). So on processors that don't support atomic 64bit stores/loads, each access results in two separate 32bit load/store accesses right? The PerfCounter methods that access the 64bit value are synchronized, using PerfCounter instance as a lock. But how is this 64bit value accessed for example in the jstat utility? Is it possible that jstat can see one half of the 64bit value before the update and the other half after the update? If this is true and it is not that important, then instead of a synchronized update of 64bit counter, a 32bit CAS could be used, optionally (rarely) followed by a second 32bit CAS, like for example: http://dl.dropbox.com/u/101777488/jdk8-tl/PerfCounter/webrev.01/index.html I tried this on ARM v6 and it works much better than synchronized access, but I don't know if it's acceptable. It guarantees eventual correctness of summed value if the only operation performed is add() (no set() intermingled) and has the same possibility of incorrect half-half reads by observers as current PerfCounter has for unsynchronized observers. Here's the comparison of unpatched/patched PerfCounter.increment() micro-benchmark on single-core ARM v6 (Raspbery-PI): *** Original PerfCounter, ARM v6 # # PerfCounter_increment: run duration: 5,000 ms, #of logical CPUS: 1 # 1 threads, Tavg =269.34 ns/op (? = 0.00 ns/op) [ 269.34] 2 threads, Tavg = 7,170.48 ns/op (? = 410.77 ns/op) [ 6,783.73, 7,603.95] 3 threads, Tavg = 12,034.82 ns/op (? = 418.99 ns/op) [11,792.33, 11,714.67, 12,639.26] 4 threads, Tavg = 16,029.76 ns/op (? = 1,411.44 ns/op) [15,592.04, 18,511.52, 15,642.52, 14,818.16] *** Patched PerfCounter, ARM v6 # # PerfCounter_increment: run duration: 5,000 ms, #of logical CPUS: 1 # 1 threads, Tavg =166.21 ns/op (? = 0.00 ns/op) [ 166.21] 2 threads, Tavg =332.58 ns/op (? = 0.12 ns/op) [ 332.45,332.70] 3 threads, Tavg =500.30 ns/op (? = 0.22 ns/op) [ 500.04,500.29,500.58] 4 threads, Tavg =667.95 ns/op (? = 2.11 ns/op) [ 665.22,667.18,668.40,671.04] Regards, Peter On 02/24/2013 11:31 AM, David Holmes wrote: On 24/02/2013 6:50 PM, Peter Levart wrote: Hi David, I thought it was ok to pass null, but I don't know the portability issues in-depth. The javadoc for Unsafe says: /This method refers to a variable by means of two parameters, and so it provides (in effect) a double-register addressing mode for Java variables. When the object reference is null, this method uses its offset as an absolute address. This is similar in operation to methods such as getInt(long), which provide (in effect) a single-register addressing mode for non-Java variables. However, because Java variables may have a different layout in memory from non-Java variables, programmers should not assume that these two addressing modes are ever equivalent. Also, programmers should remember that offsets from the double-register addressing mode cannot be portably confused with longs used in the single-register addressing mode./ That is the doc for getXXX but not for getAndAddXXX or compareAndSwapXXX. You can't have null here: UNSAFE_ENTRY(jboolean, Unsafe_CompareAndSwapLong(JNIEnv *env, jobject unsafe, jobject obj, jlong offset, jlong e, jlong x)) UnsafeWrapper(Unsafe_CompareAndSwapLong); Handle p (THREAD, JNIHandles::resolve(obj)); jlong* addr = (jlong*)(index_oop_from_field_offset_long(p(), offset)); if (VM_Version::supports_cx8()) return (jlong)(Atomic::cmpxchg(x, addr, e)) == e; else { jboolean success = false; ObjectLocker ol(p, THREAD); if (*addr == e) { *addr = x; success = true; } return success; } UNSAFE_END David - Does anybody know the in-depth interpretation of the above? Is it only the particular Java/native type differences (for example, endianess of variables) that these two addressing modes might interpret differently or something else too? Regards, Peter On 02/24/2013 12:39 AM, David Holmes wrote: Peter, In your use of Unsafe you pass null as the object. I'm pretty certain you can't pass null here. Unsafe operates on fields or array elements. David On 24/02/2013 5:39 AM, Peter Levart wrote: Hi Nils, If the counters are updated frequently from multiple threads, there might be contention/scalability issues. Instead of synchronization on updates, you might consider using atomic updates provided by sun.misc.Unsafe, like for example: Index: jdk/src/share/classes/sun/misc/PerfCounter.java === --- jdk/src/share/classes/sun/misc/PerfCounter.java +++ jdk/src/share/classes/sun/misc/PerfCounter.java @@ -25,6
Re: RFR: 8007806: Need a Throwables performance counter
On 25/02/2013 6:18 AM, Peter Levart wrote: Hi Alan, David, Nils, I just want to clear something regarding PerfCounter implementation. Access to 64bit value in native memory is through a direct buffer which uses normal read/write (non-volatile, Unsafe.[get|set]Long). So on processors that don't support atomic 64bit stores/loads, each access results in two separate 32bit load/store accesses right? Unsafe.get|setLong uses locking on those platforms. The PerfCounter methods that access the 64bit value are synchronized, using PerfCounter instance as a lock. But how is this 64bit value accessed for example in the jstat utility? Is it possible that jstat can see one half of the 64bit value before the update and the other half after the update? Does jstat access these values directly or only via the synchronized methods? If the latter then the value can't be torn that way. The sync method will store the value in 2 32-bit registers, and the variable load in jstat will take two instructions, but nothing can touch those registers. David - If this is true and it is not that important, then instead of a synchronized update of 64bit counter, a 32bit CAS could be used, optionally (rarely) followed by a second 32bit CAS, like for example: http://dl.dropbox.com/u/101777488/jdk8-tl/PerfCounter/webrev.01/index.html I tried this on ARM v6 and it works much better than synchronized access, but I don't know if it's acceptable. It guarantees eventual correctness of summed value if the only operation performed is add() (no set() intermingled) and has the same possibility of incorrect half-half reads by observers as current PerfCounter has for unsynchronized observers. Here's the comparison of unpatched/patched PerfCounter.increment() micro-benchmark on single-core ARM v6 (Raspbery-PI): *** Original PerfCounter, ARM v6 # # PerfCounter_increment: run duration: 5,000 ms, #of logical CPUS: 1 # 1 threads, Tavg =269.34 ns/op (σ = 0.00 ns/op) [ 269.34] 2 threads, Tavg = 7,170.48 ns/op (σ = 410.77 ns/op) [ 6,783.73, 7,603.95] 3 threads, Tavg = 12,034.82 ns/op (σ = 418.99 ns/op) [11,792.33, 11,714.67, 12,639.26] 4 threads, Tavg = 16,029.76 ns/op (σ = 1,411.44 ns/op) [15,592.04, 18,511.52, 15,642.52, 14,818.16] *** Patched PerfCounter, ARM v6 # # PerfCounter_increment: run duration: 5,000 ms, #of logical CPUS: 1 # 1 threads, Tavg =166.21 ns/op (σ = 0.00 ns/op) [ 166.21] 2 threads, Tavg =332.58 ns/op (σ = 0.12 ns/op) [ 332.45,332.70] 3 threads, Tavg =500.30 ns/op (σ = 0.22 ns/op) [ 500.04,500.29,500.58] 4 threads, Tavg =667.95 ns/op (σ = 2.11 ns/op) [ 665.22,667.18,668.40,671.04] Regards, Peter On 02/24/2013 11:31 AM, David Holmes wrote: On 24/02/2013 6:50 PM, Peter Levart wrote: Hi David, I thought it was ok to pass null, but I don't know the portability issues in-depth. The javadoc for Unsafe says: /This method refers to a variable by means of two parameters, and so it provides (in effect) a double-register addressing mode for Java variables. When the object reference is null, this method uses its offset as an absolute address. This is similar in operation to methods such as getInt(long), which provide (in effect) a single-register addressing mode for non-Java variables. However, because Java variables may have a different layout in memory from non-Java variables, programmers should not assume that these two addressing modes are ever equivalent. Also, programmers should remember that offsets from the double-register addressing mode cannot be portably confused with longs used in the single-register addressing mode./ That is the doc for getXXX but not for getAndAddXXX or compareAndSwapXXX. You can't have null here: UNSAFE_ENTRY(jboolean, Unsafe_CompareAndSwapLong(JNIEnv *env, jobject unsafe, jobject obj, jlong offset, jlong e, jlong x)) UnsafeWrapper(Unsafe_CompareAndSwapLong); Handle p (THREAD, JNIHandles::resolve(obj)); jlong* addr = (jlong*)(index_oop_from_field_offset_long(p(), offset)); if (VM_Version::supports_cx8()) return (jlong)(Atomic::cmpxchg(x, addr, e)) == e; else { jboolean success = false; ObjectLocker ol(p, THREAD); if (*addr == e) { *addr = x; success = true; } return success; } UNSAFE_END David - Does anybody know the in-depth interpretation of the above? Is it only the particular Java/native type differences (for example, endianess of variables) that these two addressing modes might interpret differently or something else too? Regards, Peter On 02/24/2013 12:39 AM, David Holmes wrote: Peter, In your use of Unsafe you pass null as the object. I'm pretty certain you can't pass null here. Unsafe operates on fields or array elements. David On 24/02/2013 5:39 AM, Peter Levart wrote: Hi Nils, If the counters are updated frequently from multiple threads, there might be
Re: RFR: 8007806: Need a Throwables performance counter
On 02/24/2013 09:57 PM, David Holmes wrote: On 25/02/2013 6:18 AM, Peter Levart wrote: Hi Alan, David, Nils, I just want to clear something regarding PerfCounter implementation. Access to 64bit value in native memory is through a direct buffer which uses normal read/write (non-volatile, Unsafe.[get|set]Long). So on processors that don't support atomic 64bit stores/loads, each access results in two separate 32bit load/store accesses right? Unsafe.get|setLong uses locking on those platforms. Even if it does, it is important whether all accesses to this 64bit value are using locking and whether they are using the same lock. Aren't performance counters JVM native variables where just some of them happen to be updated from Java? The PerfCounter methods that access the 64bit value are synchronized, using PerfCounter instance as a lock. But how is this 64bit value accessed for example in the jstat utility? Is it possible that jstat can see one half of the 64bit value before the update and the other half after the update? Does jstat access these values directly or only via the synchronized methods? If the latter then the value can't be torn that way. The sync method will store the value in 2 32-bit registers, and the variable load in jstat will take two instructions, but nothing can touch those registers. I'm not saying that the value could be corrupted in any way, just that the unsynchronized observer (like jstat) could see it torn sometimes. Regards, Peter David - If this is true and it is not that important, then instead of a synchronized update of 64bit counter, a 32bit CAS could be used, optionally (rarely) followed by a second 32bit CAS, like for example: http://dl.dropbox.com/u/101777488/jdk8-tl/PerfCounter/webrev.01/index.html I tried this on ARM v6 and it works much better than synchronized access, but I don't know if it's acceptable. It guarantees eventual correctness of summed value if the only operation performed is add() (no set() intermingled) and has the same possibility of incorrect half-half reads by observers as current PerfCounter has for unsynchronized observers. Here's the comparison of unpatched/patched PerfCounter.increment() micro-benchmark on single-core ARM v6 (Raspbery-PI): *** Original PerfCounter, ARM v6 # # PerfCounter_increment: run duration: 5,000 ms, #of logical CPUS: 1 # 1 threads, Tavg =269.34 ns/op (σ = 0.00 ns/op) [ 269.34] 2 threads, Tavg = 7,170.48 ns/op (σ = 410.77 ns/op) [ 6,783.73, 7,603.95] 3 threads, Tavg = 12,034.82 ns/op (σ = 418.99 ns/op) [11,792.33, 11,714.67, 12,639.26] 4 threads, Tavg = 16,029.76 ns/op (σ = 1,411.44 ns/op) [15,592.04, 18,511.52, 15,642.52, 14,818.16] *** Patched PerfCounter, ARM v6 # # PerfCounter_increment: run duration: 5,000 ms, #of logical CPUS: 1 # 1 threads, Tavg =166.21 ns/op (σ = 0.00 ns/op) [ 166.21] 2 threads, Tavg =332.58 ns/op (σ = 0.12 ns/op) [ 332.45,332.70] 3 threads, Tavg =500.30 ns/op (σ = 0.22 ns/op) [ 500.04,500.29,500.58] 4 threads, Tavg =667.95 ns/op (σ = 2.11 ns/op) [ 665.22,667.18,668.40,671.04] Regards, Peter On 02/24/2013 11:31 AM, David Holmes wrote: On 24/02/2013 6:50 PM, Peter Levart wrote: Hi David, I thought it was ok to pass null, but I don't know the portability issues in-depth. The javadoc for Unsafe says: /This method refers to a variable by means of two parameters, and so it provides (in effect) a double-register addressing mode for Java variables. When the object reference is null, this method uses its offset as an absolute address. This is similar in operation to methods such as getInt(long), which provide (in effect) a single-register addressing mode for non-Java variables. However, because Java variables may have a different layout in memory from non-Java variables, programmers should not assume that these two addressing modes are ever equivalent. Also, programmers should remember that offsets from the double-register addressing mode cannot be portably confused with longs used in the single-register addressing mode./ That is the doc for getXXX but not for getAndAddXXX or compareAndSwapXXX. You can't have null here: UNSAFE_ENTRY(jboolean, Unsafe_CompareAndSwapLong(JNIEnv *env, jobject unsafe, jobject obj, jlong offset, jlong e, jlong x)) UnsafeWrapper(Unsafe_CompareAndSwapLong); Handle p (THREAD, JNIHandles::resolve(obj)); jlong* addr = (jlong*)(index_oop_from_field_offset_long(p(), offset)); if (VM_Version::supports_cx8()) return (jlong)(Atomic::cmpxchg(x, addr, e)) == e; else { jboolean success = false; ObjectLocker ol(p, THREAD); if (*addr == e) { *addr = x; success = true; } return success; } UNSAFE_END David - Does anybody know the in-depth interpretation of the above? Is it only the particular Java/native type differences (for example,
Re: RFR: 8007806: Need a Throwables performance counter
Peter, On 25/02/2013 7:25 AM, Peter Levart wrote: On 02/24/2013 09:57 PM, David Holmes wrote: On 25/02/2013 6:18 AM, Peter Levart wrote: Hi Alan, David, Nils, I just want to clear something regarding PerfCounter implementation. Access to 64bit value in native memory is through a direct buffer which uses normal read/write (non-volatile, Unsafe.[get|set]Long). So on processors that don't support atomic 64bit stores/loads, each access results in two separate 32bit load/store accesses right? Unsafe.get|setLong uses locking on those platforms. Even if it does, it is important whether all accesses to this 64bit value are using locking and whether they are using the same lock. Aren't performance counters JVM native variables where just some of them happen to be updated from Java? AFAICS PerfCounters have no thread-safety properties or guarantees. So it is up to the user of each counter to use it in an appropriate way. I think the serviceability folk would have to chime in here as to how PerfCounters are supposed to be used. The PerfCounter methods that access the 64bit value are synchronized, using PerfCounter instance as a lock. But how is this 64bit value accessed for example in the jstat utility? Is it possible that jstat can see one half of the 64bit value before the update and the other half after the update? Does jstat access these values directly or only via the synchronized methods? If the latter then the value can't be torn that way. The sync method will store the value in 2 32-bit registers, and the variable load in jstat will take two instructions, but nothing can touch those registers. I'm not saying that the value could be corrupted in any way, just that the unsynchronized observer (like jstat) could see it torn sometimes. If the value is initially read in a sync block and all updates are also synchronized, then I don't think it can. But you need to look at actual code to determine this. David Regards, Peter David - If this is true and it is not that important, then instead of a synchronized update of 64bit counter, a 32bit CAS could be used, optionally (rarely) followed by a second 32bit CAS, like for example: http://dl.dropbox.com/u/101777488/jdk8-tl/PerfCounter/webrev.01/index.html I tried this on ARM v6 and it works much better than synchronized access, but I don't know if it's acceptable. It guarantees eventual correctness of summed value if the only operation performed is add() (no set() intermingled) and has the same possibility of incorrect half-half reads by observers as current PerfCounter has for unsynchronized observers. Here's the comparison of unpatched/patched PerfCounter.increment() micro-benchmark on single-core ARM v6 (Raspbery-PI): *** Original PerfCounter, ARM v6 # # PerfCounter_increment: run duration: 5,000 ms, #of logical CPUS: 1 # 1 threads, Tavg =269.34 ns/op (σ = 0.00 ns/op) [ 269.34] 2 threads, Tavg = 7,170.48 ns/op (σ = 410.77 ns/op) [ 6,783.73, 7,603.95] 3 threads, Tavg = 12,034.82 ns/op (σ = 418.99 ns/op) [11,792.33, 11,714.67, 12,639.26] 4 threads, Tavg = 16,029.76 ns/op (σ = 1,411.44 ns/op) [15,592.04, 18,511.52, 15,642.52, 14,818.16] *** Patched PerfCounter, ARM v6 # # PerfCounter_increment: run duration: 5,000 ms, #of logical CPUS: 1 # 1 threads, Tavg =166.21 ns/op (σ = 0.00 ns/op) [ 166.21] 2 threads, Tavg =332.58 ns/op (σ = 0.12 ns/op) [ 332.45,332.70] 3 threads, Tavg =500.30 ns/op (σ = 0.22 ns/op) [ 500.04,500.29,500.58] 4 threads, Tavg =667.95 ns/op (σ = 2.11 ns/op) [ 665.22,667.18,668.40,671.04] Regards, Peter On 02/24/2013 11:31 AM, David Holmes wrote: On 24/02/2013 6:50 PM, Peter Levart wrote: Hi David, I thought it was ok to pass null, but I don't know the portability issues in-depth. The javadoc for Unsafe says: /This method refers to a variable by means of two parameters, and so it provides (in effect) a double-register addressing mode for Java variables. When the object reference is null, this method uses its offset as an absolute address. This is similar in operation to methods such as getInt(long), which provide (in effect) a single-register addressing mode for non-Java variables. However, because Java variables may have a different layout in memory from non-Java variables, programmers should not assume that these two addressing modes are ever equivalent. Also, programmers should remember that offsets from the double-register addressing mode cannot be portably confused with longs used in the single-register addressing mode./ That is the doc for getXXX but not for getAndAddXXX or compareAndSwapXXX. You can't have null here: UNSAFE_ENTRY(jboolean, Unsafe_CompareAndSwapLong(JNIEnv *env, jobject unsafe, jobject obj, jlong offset, jlong e, jlong x)) UnsafeWrapper(Unsafe_CompareAndSwapLong); Handle p (THREAD, JNIHandles::resolve(obj)); jlong*
Re: RFR: 8007806: Need a Throwables performance counter
We've not-so-slightly hijacked Nils' thread here - apologies for that. On 25/02/2013 8:05 AM, Peter Levart wrote: Just looked at one way jstat accesses the counters. It runs in a separate VM and maps-in a file that is already mapped in the observing VM in the direct buffer. It then accesses it via a LongBuffer view (for long counters). So there's no synchronization between counter updater and counter reader. On ARM v6 jstat could see a torn long counter then, right? Right. With current implementation of PerfLongCounter it uses simple stores (not atomic ops). The double-32bit-CAS updater that I presented would not make it worse then on such platforms, I suppose. No change in tearing abaility. On the platforms that support 64bit atomic stores, there are not such problems. And I assume those same platforms also support 64bit CAS, or are there platforms with 64bit atomic stores and no 64bit CAS? Most of them actually :) All Java platforms must support atomic load/store of 64-bit values to support volatile long and double variables. On 32-bit platforms this is done via a range of techniques - for example on x86 it is done via the FPU. But these atomic accesses are currently restricted to Java volatile field accesses via bytecode - there are not exposed via the Unsafe methods, nor are they made available via the Atomic:: class in the VM. Some of these 32-bit platforms also support the 64-bit CAS, which is what supports_cx8() is intended to indicate. If the PerfCounters were supposed to be thread-safe then they might use these alternate atomic access operations. David Regards, Peter David Regards, Peter David - If this is true and it is not that important, then instead of a synchronized update of 64bit counter, a 32bit CAS could be used, optionally (rarely) followed by a second 32bit CAS, like for example: http://dl.dropbox.com/u/101777488/jdk8-tl/PerfCounter/webrev.01/index.html I tried this on ARM v6 and it works much better than synchronized access, but I don't know if it's acceptable. It guarantees eventual correctness of summed value if the only operation performed is add() (no set() intermingled) and has the same possibility of incorrect half-half reads by observers as current PerfCounter has for unsynchronized observers. Here's the comparison of unpatched/patched PerfCounter.increment() micro-benchmark on single-core ARM v6 (Raspbery-PI): *** Original PerfCounter, ARM v6 # # PerfCounter_increment: run duration: 5,000 ms, #of logical CPUS: 1 # 1 threads, Tavg =269.34 ns/op (σ = 0.00 ns/op) [ 269.34] 2 threads, Tavg = 7,170.48 ns/op (σ = 410.77 ns/op) [ 6,783.73, 7,603.95] 3 threads, Tavg = 12,034.82 ns/op (σ = 418.99 ns/op) [11,792.33, 11,714.67, 12,639.26] 4 threads, Tavg = 16,029.76 ns/op (σ = 1,411.44 ns/op) [15,592.04, 18,511.52, 15,642.52, 14,818.16] *** Patched PerfCounter, ARM v6 # # PerfCounter_increment: run duration: 5,000 ms, #of logical CPUS: 1 # 1 threads, Tavg =166.21 ns/op (σ = 0.00 ns/op) [ 166.21] 2 threads, Tavg =332.58 ns/op (σ = 0.12 ns/op) [ 332.45,332.70] 3 threads, Tavg =500.30 ns/op (σ = 0.22 ns/op) [ 500.04,500.29,500.58] 4 threads, Tavg =667.95 ns/op (σ = 2.11 ns/op) [ 665.22,667.18,668.40,671.04] Regards, Peter On 02/24/2013 11:31 AM, David Holmes wrote: On 24/02/2013 6:50 PM, Peter Levart wrote: Hi David, I thought it was ok to pass null, but I don't know the portability issues in-depth. The javadoc for Unsafe says: /This method refers to a variable by means of two parameters, and so it provides (in effect) a double-register addressing mode for Java variables. When the object reference is null, this method uses its offset as an absolute address. This is similar in operation to methods such as getInt(long), which provide (in effect) a single-register addressing mode for non-Java variables. However, because Java variables may have a different layout in memory from non-Java variables, programmers should not assume that these two addressing modes are ever equivalent. Also, programmers should remember that offsets from the double-register addressing mode cannot be portably confused with longs used in the single-register addressing mode./ That is the doc for getXXX but not for getAndAddXXX or compareAndSwapXXX. You can't have null here: UNSAFE_ENTRY(jboolean, Unsafe_CompareAndSwapLong(JNIEnv *env, jobject unsafe, jobject obj, jlong offset, jlong e, jlong x)) UnsafeWrapper(Unsafe_CompareAndSwapLong); Handle p (THREAD, JNIHandles::resolve(obj)); jlong* addr = (jlong*)(index_oop_from_field_offset_long(p(), offset)); if (VM_Version::supports_cx8()) return (jlong)(Atomic::cmpxchg(x, addr, e)) == e; else { jboolean success = false; ObjectLocker ol(p, THREAD); if (*addr == e) { *addr = x; success = true; } return success; } UNSAFE_END David
Re: RFR: 8007806: Need a Throwables performance counter
I'm getting very confused regarding the different threads on this. But this webrev: http://cr.openjdk.java.net/~nloodin/exception-tracing/webrev.01/ shows the intrusive event tracing that I'm sure we said we would grudgingly accept for 7u but not for JDK 8. David On 23/02/2013 2:16 AM, Jason Mehrens wrote: From this webrev http://cr.openjdk.java.net/~nloodin/exception-tracing/webrev.01/ you are counting the number of throwables constructed. You might want to change the name to reflect that. I don't think anyone would want to write a spec for how many throwables are thrown given that a throwable can be wrapped, suppressed, rethrown, etc. Jason Date: Fri, 22 Feb 2013 14:10:02 +0100 From: nils.loo...@oracle.com To: core-libs-dev@openjdk.java.net; serviceability-...@openjdk.java.net Subject: Fwd: RFR: 8007806: Need a Throwables performance counter Does anyone have anything strongly against this? This is a small change just to add a performance counter, the code to increment it and read it will live in other parts of the code and be a part of a larger separate commit. Alan Bateman gave the response that the name was inappropriate, but I don't really have a better alternative, but neither do I have strong opinions, so feedback would be appreciated! How about sun.throwables.thrownThrowables ? Regards, Nils Loodin Original Message Subject: RFR: 8007806: Need a Throwables performance counter Date: Fri, 08 Feb 2013 18:10:02 +0100 From: Nils Loodin nils.loo...@oracle.com To: core-libs-dev@openjdk.java.net,serviceability_dev_ww_grp serviceability_dev_ww_...@oracle.com It would be interesting to know the number of thrown throwables in the JVM, to be able to do some high level application diagnostics / statistics. A good way to put this number would be a performance counter, since it is accessible both from Java and from the VM. http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8007806 http://cr.openjdk.java.net/~nloodin/8007806/webrev.00/ Regards, Nils Loodin
Re: RFR: 8007806: Need a Throwables performance counter
Hi Nils, If the counters are updated frequently from multiple threads, there might be contention/scalability issues. Instead of synchronization on updates, you might consider using atomic updates provided by sun.misc.Unsafe, like for example: Index: jdk/src/share/classes/sun/misc/PerfCounter.java === --- jdk/src/share/classes/sun/misc/PerfCounter.java +++ jdk/src/share/classes/sun/misc/PerfCounter.java @@ -25,6 +25,8 @@ package sun.misc; +import sun.nio.ch.DirectBuffer; + import java.nio.ByteBuffer; import java.nio.ByteOrder; import java.nio.LongBuffer; @@ -50,6 +52,8 @@ public class PerfCounter { private static final Perf perf = AccessController.doPrivileged(new Perf.GetPerfAction()); +private static final Unsafe unsafe = +Unsafe.getUnsafe(); // Must match values defined in hotspot/src/share/vm/runtime/perfdata.hpp private final static int V_Constant = 1; @@ -59,12 +63,14 @@ private final String name; private final LongBuffer lb; +private final DirectBuffer db; private PerfCounter(String name, int type) { this.name = name; ByteBuffer bb = perf.createLong(name, U_None, type, 0L); bb.order(ByteOrder.nativeOrder()); this.lb = bb.asLongBuffer(); +this.db = bb instanceof DirectBuffer ? (DirectBuffer) bb : null; } static PerfCounter newPerfCounter(String name) { @@ -79,23 +85,44 @@ /** * Returns the current value of the perf counter. */ -public synchronized long get() { +public long get() { +if (db != null) { +return unsafe.getLongVolatile(null, db.address()); +} +else { +synchronized (this) { -return lb.get(0); -} +return lb.get(0); +} +} +} /** * Sets the value of the perf counter to the given newValue. */ -public synchronized void set(long newValue) { +public void set(long newValue) { +if (db != null) { +unsafe.putOrderedLong(null, db.address(), newValue); +} +else { +synchronized (this) { -lb.put(0, newValue); -} +lb.put(0, newValue); +} +} +} /** * Adds the given value to the perf counter. */ -public synchronized void add(long value) { -long res = get() + value; +public void add(long value) { +if (db != null) { +unsafe.getAndAddLong(null, db.address(), value); +} +else { +synchronized (this) { +long res = lb.get(0) + value; -lb.put(0, res); +lb.put(0, res); +} +} } /** Testing the PerfCounter.increment() method in a loop on multiple threads sharing the same PerfCounter instance, for example, on a 4-core Intel i7 machine produces the following results: # # PerfCounter_increment: run duration: 5,000 ms, #of logical CPUS: 8 # 1 threads, Tavg = 19.02 ns/op (? = 0.00 ns/op) 2 threads, Tavg =109.93 ns/op (? = 6.17 ns/op) 3 threads, Tavg =136.64 ns/op (? = 2.99 ns/op) 4 threads, Tavg =293.26 ns/op (? = 5.30 ns/op) 5 threads, Tavg =316.94 ns/op (? = 6.28 ns/op) 6 threads, Tavg =686.96 ns/op (? = 7.09 ns/op) 7 threads, Tavg =793.28 ns/op (? = 10.57 ns/op) 8 threads, Tavg =898.15 ns/op (? = 14.63 ns/op) With the presented patch, the results are a little better: # # PerfCounter_increment: run duration: 5,000 ms, #of logical CPUS: 8 # # Measure: 1 threads, Tavg = 5.22 ns/op (? = 0.00 ns/op) 2 threads, Tavg = 34.51 ns/op (? = 0.60 ns/op) 3 threads, Tavg = 54.85 ns/op (? = 1.42 ns/op) 4 threads, Tavg = 74.67 ns/op (? = 1.71 ns/op) 5 threads, Tavg = 94.71 ns/op (? = 41.68 ns/op) 6 threads, Tavg =114.80 ns/op (? = 32.10 ns/op) 7 threads, Tavg =136.70 ns/op (? = 26.80 ns/op) 8 threads, Tavg =158.48 ns/op (? = 9.93 ns/op) The scalability is not much better, but the raw speed is, so it might present less contention when used in real-world code. If you wanted even better scalability, there is a new class in JDK8, the java.util.concurrent.LongAdder. But that doesn't buy atomic set() - only add(). And it can't update native-memory variables, so it could only be used for add-only counters and in conjunction with a background thread that would periodically flush the sum to the native memory Regards, Peter On 02/08/2013 06:10 PM, Nils Loodin wrote: It would be interesting to know the number of thrown throwables in the JVM, to be able to do some high level application diagnostics / statistics. A good way to put this number would be a performance
Re: RFR: 8007806: Need a Throwables performance counter
On 23/02/2013 19:39, Peter Levart wrote: Hi Nils, If the counters are updated frequently from multiple threads, there might be contention/scalability issues. Instead of synchronization on updates, you might consider using atomic updates provided by sun.misc.Unsafe, like for example: Most of the jvmstat counters are in VM and we only update a small number of counters in the libraries. Even so, I think your idea is good as further usage could potentially to be an issue, particularly add/increment (which involves a get at this). So irrespective of Nil's patch (which I think is now just a proposal to add a counter, not original patch that Jason was commenting on) then we should take your patch. -Alan
Re: RFR: 8007806: Need a Throwables performance counter
On 24/02/2013 6:08 AM, Alan Bateman wrote: On 23/02/2013 19:39, Peter Levart wrote: Hi Nils, If the counters are updated frequently from multiple threads, there might be contention/scalability issues. Instead of synchronization on updates, you might consider using atomic updates provided by sun.misc.Unsafe, like for example: Most of the jvmstat counters are in VM and we only update a small number of counters in the libraries. Even so, I think your idea is good as further usage could potentially to be an issue, particularly add/increment (which involves a get at this). So irrespective of Nil's patch (which I think is now just a proposal to add a counter, not original patch that Jason was commenting on) then we should take your patch. Before we rush down this path. Atomic operations on 64-bit types are not supported natively on all 32-bit platforms. Atomic updates on those platforms will degenerate into locking - which is the barrier to implementing frequently used counters in the first place. David -Alan
Re: RFR: 8007806: Need a Throwables performance counter
Peter, In your use of Unsafe you pass null as the object. I'm pretty certain you can't pass null here. Unsafe operates on fields or array elements. David On 24/02/2013 5:39 AM, Peter Levart wrote: Hi Nils, If the counters are updated frequently from multiple threads, there might be contention/scalability issues. Instead of synchronization on updates, you might consider using atomic updates provided by sun.misc.Unsafe, like for example: Index: jdk/src/share/classes/sun/misc/PerfCounter.java === --- jdk/src/share/classes/sun/misc/PerfCounter.java +++ jdk/src/share/classes/sun/misc/PerfCounter.java @@ -25,6 +25,8 @@ package sun.misc; +import sun.nio.ch.DirectBuffer; + import java.nio.ByteBuffer; import java.nio.ByteOrder; import java.nio.LongBuffer; @@ -50,6 +52,8 @@ public class PerfCounter { private static final Perf perf = AccessController.doPrivileged(new Perf.GetPerfAction()); +private static final Unsafe unsafe = +Unsafe.getUnsafe(); // Must match values defined in hotspot/src/share/vm/runtime/perfdata.hpp private final static int V_Constant = 1; @@ -59,12 +63,14 @@ private final String name; private final LongBuffer lb; +private final DirectBuffer db; private PerfCounter(String name, int type) { this.name = name; ByteBuffer bb = perf.createLong(name, U_None, type, 0L); bb.order(ByteOrder.nativeOrder()); this.lb = bb.asLongBuffer(); +this.db = bb instanceof DirectBuffer ? (DirectBuffer) bb : null; } static PerfCounter newPerfCounter(String name) { @@ -79,23 +85,44 @@ /** * Returns the current value of the perf counter. */ -public synchronized long get() { +public long get() { +if (db != null) { +return unsafe.getLongVolatile(null, db.address()); +} +else { +synchronized (this) { -return lb.get(0); -} +return lb.get(0); +} +} +} /** * Sets the value of the perf counter to the given newValue. */ -public synchronized void set(long newValue) { +public void set(long newValue) { +if (db != null) { +unsafe.putOrderedLong(null, db.address(), newValue); +} +else { +synchronized (this) { -lb.put(0, newValue); -} +lb.put(0, newValue); +} +} +} /** * Adds the given value to the perf counter. */ -public synchronized void add(long value) { -long res = get() + value; +public void add(long value) { +if (db != null) { +unsafe.getAndAddLong(null, db.address(), value); +} +else { +synchronized (this) { +long res = lb.get(0) + value; -lb.put(0, res); +lb.put(0, res); +} +} } /** Testing the PerfCounter.increment() method in a loop on multiple threads sharing the same PerfCounter instance, for example, on a 4-core Intel i7 machine produces the following results: # # PerfCounter_increment: run duration: 5,000 ms, #of logical CPUS: 8 # 1 threads, Tavg = 19.02 ns/op (? = 0.00 ns/op) 2 threads, Tavg =109.93 ns/op (? = 6.17 ns/op) 3 threads, Tavg =136.64 ns/op (? = 2.99 ns/op) 4 threads, Tavg =293.26 ns/op (? = 5.30 ns/op) 5 threads, Tavg =316.94 ns/op (? = 6.28 ns/op) 6 threads, Tavg =686.96 ns/op (? = 7.09 ns/op) 7 threads, Tavg =793.28 ns/op (? = 10.57 ns/op) 8 threads, Tavg =898.15 ns/op (? = 14.63 ns/op) With the presented patch, the results are a little better: # # PerfCounter_increment: run duration: 5,000 ms, #of logical CPUS: 8 # # Measure: 1 threads, Tavg = 5.22 ns/op (? = 0.00 ns/op) 2 threads, Tavg = 34.51 ns/op (? = 0.60 ns/op) 3 threads, Tavg = 54.85 ns/op (? = 1.42 ns/op) 4 threads, Tavg = 74.67 ns/op (? = 1.71 ns/op) 5 threads, Tavg = 94.71 ns/op (? = 41.68 ns/op) 6 threads, Tavg =114.80 ns/op (? = 32.10 ns/op) 7 threads, Tavg =136.70 ns/op (? = 26.80 ns/op) 8 threads, Tavg =158.48 ns/op (? = 9.93 ns/op) The scalability is not much better, but the raw speed is, so it might present less contention when used in real-world code. If you wanted even better scalability, there is a new class in JDK8, the java.util.concurrent.LongAdder. But that doesn't buy atomic set() - only add(). And it can't update native-memory variables, so it could only be used for add-only counters and in conjunction with a background thread that would periodically flush the sum to the native memory Regards, Peter On 02/08/2013
RE: RFR: 8007806: Need a Throwables performance counter
From this webrev http://cr.openjdk.java.net/~nloodin/exception-tracing/webrev.01/ you are counting the number of throwables constructed. You might want to change the name to reflect that. I don't think anyone would want to write a spec for how many throwables are thrown given that a throwable can be wrapped, suppressed, rethrown, etc. Jason Date: Fri, 22 Feb 2013 14:10:02 +0100 From: nils.loo...@oracle.com To: core-libs-dev@openjdk.java.net; serviceability-...@openjdk.java.net Subject: Fwd: RFR: 8007806: Need a Throwables performance counter Does anyone have anything strongly against this? This is a small change just to add a performance counter, the code to increment it and read it will live in other parts of the code and be a part of a larger separate commit. Alan Bateman gave the response that the name was inappropriate, but I don't really have a better alternative, but neither do I have strong opinions, so feedback would be appreciated! How about sun.throwables.thrownThrowables ? Regards, Nils Loodin Original Message Subject: RFR: 8007806: Need a Throwables performance counter Date: Fri, 08 Feb 2013 18:10:02 +0100 From: Nils Loodin nils.loo...@oracle.com To: core-libs-dev@openjdk.java.net,serviceability_dev_ww_grp serviceability_dev_ww_...@oracle.com It would be interesting to know the number of thrown throwables in the JVM, to be able to do some high level application diagnostics / statistics. A good way to put this number would be a performance counter, since it is accessible both from Java and from the VM. http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8007806 http://cr.openjdk.java.net/~nloodin/8007806/webrev.00/ Regards, Nils Loodin
Re: RFR: 8007806: Need a Throwables performance counter
Hey Kasper! You're right that there are methods that throw exceptions as a part of the normal program flow. However, this number can be (and has been) used as a very high level telemetry for an application. Depending on exactly how large this number is, and how fast it is growing, something can be seen to be amis in the application. Paired together with data of what type of exceptions are being thrown, and from where, it can be useful for support to have this information. Regards, Nils Loodin On 02/12/2013 06:28 PM, Kasper Nielsen wrote: Jrockit mission control supports an exception count based on exception type. I have found the approach of just counting the total number of exception completely useless. Mainly because there are methods that throw exceptions as part of the normal flow. For example, Class.forName() is commonly used to test whether or not a certain class is on the classpath. And most developers will cringe whenever they see an exception count0 for an application they think is bug free. On Fri, Feb 8, 2013 at 6:10 PM, Nils Loodin nils.loo...@oracle.com mailto:nils.loo...@oracle.com wrote: It would be interesting to know the number of thrown throwables in the JVM, to be able to do some high level application diagnostics / statistics. A good way to put this number would be a performance counter, since it is accessible both from Java and from the VM. http://bugs.sun.com/__bugdatabase/view_bug.do?bug___id=8007806 http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8007806 http://cr.openjdk.java.net/~__nloodin/8007806/webrev.00/ http://cr.openjdk.java.net/~nloodin/8007806/webrev.00/ Regards, Nils Loodin
Re: RFR: 8007806: Need a Throwables performance counter
Jrockit mission control supports an exception count based on exception type. I have found the approach of just counting the total number of exception completely useless. Mainly because there are methods that throw exceptions as part of the normal flow. For example, Class.forName() is commonly used to test whether or not a certain class is on the classpath. And most developers will cringe whenever they see an exception count0 for an application they think is bug free. On Fri, Feb 8, 2013 at 6:10 PM, Nils Loodin nils.loo...@oracle.com wrote: It would be interesting to know the number of thrown throwables in the JVM, to be able to do some high level application diagnostics / statistics. A good way to put this number would be a performance counter, since it is accessible both from Java and from the VM. http://bugs.sun.com/**bugdatabase/view_bug.do?bug_**id=8007806http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8007806 http://cr.openjdk.java.net/~**nloodin/8007806/webrev.00/http://cr.openjdk.java.net/~nloodin/8007806/webrev.00/ Regards, Nils Loodin
Re: RFR: 8007806: Need a Throwables performance counter
On 08/02/2013 17:10, Nils Loodin wrote: It would be interesting to know the number of thrown throwables in the JVM, to be able to do some high level application diagnostics / statistics. A good way to put this number would be a performance counter, since it is accessible both from Java and from the VM. http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8007806 http://cr.openjdk.java.net/~nloodin/8007806/webrev.00/ Regards, Nils Loodin Nils - this seems incomplete, where is the counter updated? -Alan