Re: RFR: 8007806: Need a Throwables performance counter

2013-02-26 Thread Nils Loodin

On 02/26/2013 12:18 AM, Mandy Chung wrote:

On 2/25/2013 8:30 AM, Nils Loodin wrote:

I changed it to

sun.throwables.numThrowables

in http://cr.openjdk.java.net/~nloodin/8007806/webrev.01/

Is this better? However, bear in mind that it's not exactly specified
where this is going to be incremented from yet. Nothing in this change
states that it's going to be incremented from the Throwable constructor.


I'm surprised to see adding a performance counter for an external
instrumentation agent to use.   I understand that we agree that dynamic
bytecode instrumentation is a more appropriate approach to instrument
and count the number of thrown throwables.


Just to be clear, what I was trying to say that this review is just to 
add the counter, and the discussion on how and when to access it is 
something that I'd much rather have in context of a review of that code...


But yes, we do agree that dynamic bytecode instrumentation is a more 
appropriate aproach to count the number of throwables. But that code and 
the following review would depend on this one.


Regards,
Nils Loodin



RE: RFR: 8007806: Need a Throwables performance counter

2013-02-26 Thread Jason Mehrens

 Just to be clear, what I was trying to say that this review is just to
 add the counter, and the discussion on how and when to access it is
 something that I'd much rather have in context of a review of that code...

Then the counter name should not bind you to how it is counting. +1 for 
'sun.throwables' as suggested by Peter.

Jason 

Re: RFR: 8007806: Need a Throwables performance counter

2013-02-25 Thread Nils Loodin

On 02/25/2013 12:03 AM, David Holmes wrote:

I'm getting very confused regarding the different threads on this. But
this webrev:

http://cr.openjdk.java.net/~nloodin/exception-tracing/webrev.01/

shows the intrusive event tracing that I'm sure we said we would
grudgingly accept for 7u but not for JDK 8.



David et al: No!

That showed the intrusive event tracing that we will not implement ever.
It also has nothing to do with this code review, which is only to 
implement a performance counter.


Regards,
Nils Loodin



Re: RFR: 8007806: Need a Throwables performance counter

2013-02-25 Thread Nils Loodin

On 02/24/2013 11:18 PM, David Holmes wrote:

We've not-so-slightly hijacked Nils' thread here - apologies for that.


David, Peter!

Yes you did :)
However, feel free to make it up to me by:
1. Suggest a good name for the counter
2. Plz to say yes to adding that counter
3. Profit!

Regards,
Nils Loodin


Re: RFR: 8007806: Need a Throwables performance counter

2013-02-25 Thread Alan Bateman

On 24/02/2013 20:57, David Holmes wrote:


Does jstat access these values directly or only via the synchronized 
methods? If the latter then the value can't be torn that way. The 
sync method will store the value in 2 32-bit registers, and the 
variable load in jstat will take two instructions, but nothing can 
touch those registers.
The instrumentation/jvmstat buffer is a shared memory file so jstat 
(running in a different VM) just polls the counters so there isn't any 
Java-level synchronization.


-Alan.


Re: RFR: 8007806: Need a Throwables performance counter

2013-02-25 Thread Alan Bateman

On 25/02/2013 09:23, Nils Loodin wrote:


David et al: No!

That showed the intrusive event tracing that we will not implement ever.
It also has nothing to do with this code review, which is only to 
implement a performance counter.

Just to confirm, this is the webrev that we should be reviewing:

http://cr.openjdk.java.net/~nloodin/8007806/webrev.00/

and the only issue (I think) is the name of the counter where I think 
sun.* or jdk.* should be fine.


-Alan.


Re: RFR: 8007806: Need a Throwables performance counter

2013-02-25 Thread Nils Loodin

On 02/25/2013 11:15 AM, Alan Bateman wrote:

On 25/02/2013 09:23, Nils Loodin wrote:


David et al: No!

That showed the intrusive event tracing that we will not implement ever.
It also has nothing to do with this code review, which is only to
implement a performance counter.

Just to confirm, this is the webrev that we should be reviewing:

http://cr.openjdk.java.net/~nloodin/8007806/webrev.00/

and the only issue (I think) is the name of the counter where I think
sun.* or jdk.* should be fine.

-Alan.


Does this mean that you approve of
http://cr.openjdk.java.net/~nloodin/8007806/webrev.01/ ?

Regards,
Nisse


Re: RFR: 8007806: Need a Throwables performance counter

2013-02-25 Thread Peter Levart

On 02/25/2013 10:29 AM, Nils Loodin wrote:

On 02/24/2013 11:18 PM, David Holmes wrote:

We've not-so-slightly hijacked Nils' thread here - apologies for that.


David, Peter!

Yes you did :)
However, feel free to make it up to me by:
1. Suggest a good name for the counter


As Jason Mehrens pointed out in this thread, the thrownThrowables 
might be misleading if the counter is incremented in the Throwable 
constructors (or are you going to instrument the throw sites?). So it 
might better be called what it is: constructedThrowables. Yes, this is 
a common idiom:


throw new ThrowableSubclass(...);

...but the number of thrown and constructed throwables may diverge 
considerably in situations like for example:


- preallocated throwable instances used (and reused) as a form of long 
return (from lambdas).
- deserialized throwables on RMI clients that are thrown on the server 
side, transfered over the wire and re-thrown on client side 
(de-seriailization does not call the constructor)



2. Plz to say yes to adding that counter


It's not my call but I would be concerned about performance. Current set 
of counters that are updated from Java is not that critical, but having 
a counter of constructed objects updated via synchronized method (or a 
CAS) can have a negative impact on application performance. See what 
happened with HashMap constructor. The same could happen in situations 
where Throwable subclass is (miss)used for example to long return a 
result from...  say a lambda (again). This is perfectly performant and 
scalable now (just override the Throwable.fillInStackTrace with empty 
implementation and you get a long return(value)). I'm not saying that 
this is a good practice, but such usages will emerge with lambdas.


So although this thread is only about counter name, I think care should 
be put also on it's implementation. Perhaps with an option to disable 
instrumentation if requested from user? On a per-Throwable-subclass 
basis (like Logger levels)? Using external configuration (system 
properties) or special @DontTrace annotation?



3. Profit!


:-)

Regards, Peter



Regards,
Nils Loodin




Re: RFR: 8007806: Need a Throwables performance counter

2013-02-25 Thread Nils Loodin

On 02/25/2013 02:30 PM, Peter Levart wrote:

On 02/25/2013 10:29 AM, Nils Loodin wrote:

On 02/24/2013 11:18 PM, David Holmes wrote:

We've not-so-slightly hijacked Nils' thread here - apologies for that.


David, Peter!

Yes you did :)
However, feel free to make it up to me by:
1. Suggest a good name for the counter


As Jason Mehrens pointed out in this thread, the thrownThrowables 
might be misleading if the counter is incremented in the Throwable 
constructors (or are you going to instrument the throw sites?). So it 
might better be called what it is: constructedThrowables. Yes, this 
is a common idiom:


throw new ThrowableSubclass(...);

...but the number of thrown and constructed throwables may diverge 
considerably in situations like for example:


- preallocated throwable instances used (and reused) as a form of 
long return (from lambdas).
- deserialized throwables on RMI clients that are thrown on the server 
side, transfered over the wire and re-thrown on client side 
(de-seriailization does not call the constructor)




I changed it to

sun.throwables.numThrowables

in http://cr.openjdk.java.net/~nloodin/8007806/webrev.01/

Is this better? However, bear in mind that it's not exactly specified 
where this is going to be incremented from yet. Nothing in this change 
states that it's going to be incremented from the Throwable constructor.


Regards,
Nils Loodin




Re: RFR: 8007806: Need a Throwables performance counter

2013-02-25 Thread Peter Levart

On 02/25/2013 05:30 PM, Nils Loodin wrote:

On 02/25/2013 02:30 PM, Peter Levart wrote:

On 02/25/2013 10:29 AM, Nils Loodin wrote:

On 02/24/2013 11:18 PM, David Holmes wrote:

We've not-so-slightly hijacked Nils' thread here - apologies for that.


David, Peter!

Yes you did :)
However, feel free to make it up to me by:
1. Suggest a good name for the counter


As Jason Mehrens pointed out in this thread, the thrownThrowables 
might be misleading if the counter is incremented in the Throwable 
constructors (or are you going to instrument the throw sites?). So it 
might better be called what it is: constructedThrowables. Yes, this 
is a common idiom:


throw new ThrowableSubclass(...);

...but the number of thrown and constructed throwables may diverge 
considerably in situations like for example:


- preallocated throwable instances used (and reused) as a form of 
long return (from lambdas).
- deserialized throwables on RMI clients that are thrown on the 
server side, transfered over the wire and re-thrown on client side 
(de-seriailization does not call the constructor)




I changed it to

sun.throwables.numThrowables

in http://cr.openjdk.java.net/~nloodin/8007806/webrev.01/

Is this better? However, bear in mind that it's not exactly specified 
where this is going to be incremented from yet. Nothing in this change 
states that it's going to be incremented from the Throwable constructor.


These are just suggestions... thinking loud ...

Other hierarchical names for counters use two common forms (note 
plural/singular):


sun.classloader.findClassTime
sun.classloader.findClasses
sun.classloader.parentDelegationTime

or:

java.threads.daemon
java.threads.live
java.threads.livePeak
java.threads.started


So by these forms, the name could be:

sun.throwable.numThrowables

or:

sun.throwables.num[ber]


might be later changed to (if instrumented to increment at construction 
time and/or throw time):


sun.throwables.constructed
sun.throwables.thrown


until then, it could be simply:

sun.throwables


Regards, Peter



Regards,
Nils Loodin






Re: RFR: 8007806: Need a Throwables performance counter

2013-02-25 Thread Mandy Chung

On 2/25/2013 8:30 AM, Nils Loodin wrote:

I changed it to

sun.throwables.numThrowables

in http://cr.openjdk.java.net/~nloodin/8007806/webrev.01/

Is this better? However, bear in mind that it's not exactly specified 
where this is going to be incremented from yet. Nothing in this change 
states that it's going to be incremented from the Throwable constructor.


I'm surprised to see adding a performance counter for an external 
instrumentation agent to use.   I understand that we agree that dynamic 
bytecode instrumentation is a more appropriate approach to instrument 
and count the number of thrown throwables.I guess I am missing the 
big picture how you want the serviceability tools (jstat etc) to work 
with the perf counters and tracing events.


Mandy


Re: RFR: 8007806: Need a Throwables performance counter

2013-02-24 Thread Peter Levart

Hi David,

I thought it was ok to pass null, but I don't know the portability 
issues in-depth. The javadoc for Unsafe says:


/This method refers to a variable by means of two parameters, and so it 
provides (in effect) a double-register addressing mode for Java 
variables. When the object reference is null, this method uses its 
offset as an absolute address. This is similar in operation to methods 
such as getInt(long), which provide (in effect) a single-register 
addressing mode for non-Java variables. However, because Java variables 
may have a different layout in memory from non-Java variables, 
programmers should not assume that these two addressing modes are ever 
equivalent. Also, programmers should remember that offsets from the 
double-register addressing mode cannot be portably confused with longs 
used in the single-register addressing mode./


Does anybody know the in-depth interpretation of the above? Is it only 
the particular Java/native type differences (for example, endianess of 
variables) that these two addressing modes might interpret differently 
or something else too?


Regards, Peter


On 02/24/2013 12:39 AM, David Holmes wrote:

Peter,

In your use of Unsafe you pass null as the object. I'm pretty 
certain you can't pass null here. Unsafe operates on fields or array 
elements.


David

On 24/02/2013 5:39 AM, Peter Levart wrote:

Hi Nils,

If the counters are updated frequently from multiple threads, there
might be contention/scalability issues. Instead of synchronization on
updates, you might consider using atomic updates provided by
sun.misc.Unsafe, like for example:


Index: jdk/src/share/classes/sun/misc/PerfCounter.java
===
--- jdk/src/share/classes/sun/misc/PerfCounter.java
+++ jdk/src/share/classes/sun/misc/PerfCounter.java
@@ -25,6 +25,8 @@

  package sun.misc;

+import sun.nio.ch.DirectBuffer;
+
  import java.nio.ByteBuffer;
  import java.nio.ByteOrder;
  import java.nio.LongBuffer;
@@ -50,6 +52,8 @@
  public class PerfCounter {
  private static final Perf perf =
  AccessController.doPrivileged(new Perf.GetPerfAction());
+private static final Unsafe unsafe =
+Unsafe.getUnsafe();

  // Must match values defined in
hotspot/src/share/vm/runtime/perfdata.hpp
  private final static int V_Constant  = 1;
@@ -59,12 +63,14 @@

  private final String name;
  private final LongBuffer lb;
+private final DirectBuffer db;

  private PerfCounter(String name, int type) {
  this.name = name;
  ByteBuffer bb = perf.createLong(name, U_None, type, 0L);
  bb.order(ByteOrder.nativeOrder());
  this.lb = bb.asLongBuffer();
+this.db = bb instanceof DirectBuffer ? (DirectBuffer) bb : 
null;

  }

  static PerfCounter newPerfCounter(String name) {
@@ -79,23 +85,44 @@
  /**
   * Returns the current value of the perf counter.
   */
-public synchronized long get() {
+public long get() {
+if (db != null) {
+return unsafe.getLongVolatile(null, db.address());
+}
+else {
+synchronized (this) {
-return lb.get(0);
-}
+return lb.get(0);
+}
+}
+}

  /**
   * Sets the value of the perf counter to the given newValue.
   */
-public synchronized void set(long newValue) {
+public void set(long newValue) {
+if (db != null) {
+unsafe.putOrderedLong(null, db.address(), newValue);
+}
+else {
+synchronized (this) {
-lb.put(0, newValue);
-}
+lb.put(0, newValue);
+}
+}
+}

  /**
   * Adds the given value to the perf counter.
   */
-public synchronized void add(long value) {
-long res = get() + value;
+public void add(long value) {
+if (db != null) {
+unsafe.getAndAddLong(null, db.address(), value);
+}
+else {
+synchronized (this) {
+long res = lb.get(0) + value;
-lb.put(0, res);
+lb.put(0, res);
+}
+}
  }

  /**



Testing the PerfCounter.increment() method in a loop on multiple threads
sharing the same PerfCounter instance, for example, on a 4-core Intel i7
machine produces the following results:

#
# PerfCounter_increment: run duration:  5,000 ms, #of logical CPUS: 8
#
1 threads, Tavg = 19.02 ns/op (? =   0.00 ns/op)
2 threads, Tavg =109.93 ns/op (? =   6.17 ns/op)
3 threads, Tavg =136.64 ns/op (? =   2.99 ns/op)
4 threads, Tavg =293.26 ns/op (? =   5.30 ns/op)
5 threads, Tavg =316.94 ns/op (? =   6.28 ns/op)
6 threads, Tavg =686.96 ns/op (? =   7.09 ns/op)
7 threads, Tavg =793.28 ns/op (? =  10.57 ns/op)
8 threads, Tavg =898.15 ns/op (? =  14.63 ns/op)



Re: RFR: 8007806: Need a Throwables performance counter

2013-02-24 Thread Alan Bateman

On 23/02/2013 23:35, David Holmes wrote:


Before we rush down this path. Atomic operations on 64-bit types are 
not supported natively on all 32-bit platforms. Atomic updates on 
those platforms will degenerate into locking - which is the barrier to 
implementing frequently used counters in the first place.
I understand, and clearly more work/investigation is required, but I 
think it's worth looking into, particularly if Nils and others are going 
are adding further counters and instrumenting code dynamic to use them.


-Alan


Re: RFR: 8007806: Need a Throwables performance counter

2013-02-24 Thread David Holmes

On 24/02/2013 6:50 PM, Peter Levart wrote:

Hi David,

I thought it was ok to pass null, but I don't know the portability
issues in-depth. The javadoc for Unsafe says:

/This method refers to a variable by means of two parameters, and so it
provides (in effect) a double-register addressing mode for Java
variables. When the object reference is null, this method uses its
offset as an absolute address. This is similar in operation to methods
such as getInt(long), which provide (in effect) a single-register
addressing mode for non-Java variables. However, because Java variables
may have a different layout in memory from non-Java variables,
programmers should not assume that these two addressing modes are ever
equivalent. Also, programmers should remember that offsets from the
double-register addressing mode cannot be portably confused with longs
used in the single-register addressing mode./


That is the doc for getXXX but not for getAndAddXXX or 
compareAndSwapXXX. You can't have null here:


UNSAFE_ENTRY(jboolean, Unsafe_CompareAndSwapLong(JNIEnv *env, jobject 
unsafe, jobject obj, jlong offset, jlong e, jlong x))

  UnsafeWrapper(Unsafe_CompareAndSwapLong);
  Handle p (THREAD, JNIHandles::resolve(obj));
  jlong* addr = (jlong*)(index_oop_from_field_offset_long(p(), offset));
  if (VM_Version::supports_cx8())
return (jlong)(Atomic::cmpxchg(x, addr, e)) == e;
  else {
jboolean success = false;
ObjectLocker ol(p, THREAD);
if (*addr == e) { *addr = x; success = true; }
return success;
  }
UNSAFE_END

David
-



Does anybody know the in-depth interpretation of the above? Is it only
the particular Java/native type differences (for example, endianess of
variables) that these two addressing modes might interpret differently
or something else too?

Regards, Peter


On 02/24/2013 12:39 AM, David Holmes wrote:

Peter,

In your use of Unsafe you pass null as the object. I'm pretty
certain you can't pass null here. Unsafe operates on fields or array
elements.

David

On 24/02/2013 5:39 AM, Peter Levart wrote:

Hi Nils,

If the counters are updated frequently from multiple threads, there
might be contention/scalability issues. Instead of synchronization on
updates, you might consider using atomic updates provided by
sun.misc.Unsafe, like for example:


Index: jdk/src/share/classes/sun/misc/PerfCounter.java
===
--- jdk/src/share/classes/sun/misc/PerfCounter.java
+++ jdk/src/share/classes/sun/misc/PerfCounter.java
@@ -25,6 +25,8 @@

  package sun.misc;

+import sun.nio.ch.DirectBuffer;
+
  import java.nio.ByteBuffer;
  import java.nio.ByteOrder;
  import java.nio.LongBuffer;
@@ -50,6 +52,8 @@
  public class PerfCounter {
  private static final Perf perf =
  AccessController.doPrivileged(new Perf.GetPerfAction());
+private static final Unsafe unsafe =
+Unsafe.getUnsafe();

  // Must match values defined in
hotspot/src/share/vm/runtime/perfdata.hpp
  private final static int V_Constant  = 1;
@@ -59,12 +63,14 @@

  private final String name;
  private final LongBuffer lb;
+private final DirectBuffer db;

  private PerfCounter(String name, int type) {
  this.name = name;
  ByteBuffer bb = perf.createLong(name, U_None, type, 0L);
  bb.order(ByteOrder.nativeOrder());
  this.lb = bb.asLongBuffer();
+this.db = bb instanceof DirectBuffer ? (DirectBuffer) bb :
null;
  }

  static PerfCounter newPerfCounter(String name) {
@@ -79,23 +85,44 @@
  /**
   * Returns the current value of the perf counter.
   */
-public synchronized long get() {
+public long get() {
+if (db != null) {
+return unsafe.getLongVolatile(null, db.address());
+}
+else {
+synchronized (this) {
-return lb.get(0);
-}
+return lb.get(0);
+}
+}
+}

  /**
   * Sets the value of the perf counter to the given newValue.
   */
-public synchronized void set(long newValue) {
+public void set(long newValue) {
+if (db != null) {
+unsafe.putOrderedLong(null, db.address(), newValue);
+}
+else {
+synchronized (this) {
-lb.put(0, newValue);
-}
+lb.put(0, newValue);
+}
+}
+}

  /**
   * Adds the given value to the perf counter.
   */
-public synchronized void add(long value) {
-long res = get() + value;
+public void add(long value) {
+if (db != null) {
+unsafe.getAndAddLong(null, db.address(), value);
+}
+else {
+synchronized (this) {
+long res = lb.get(0) + value;
-lb.put(0, res);
+lb.put(0, res);
+}
+}
  }

  /**



Testing the PerfCounter.increment() method in a loop on multiple threads
sharing the same PerfCounter 

Re: RFR: 8007806: Need a Throwables performance counter

2013-02-24 Thread Peter Levart


On 02/24/2013 12:35 AM, David Holmes wrote:

On 24/02/2013 6:08 AM, Alan Bateman wrote:

On 23/02/2013 19:39, Peter Levart wrote:

Hi Nils,

If the counters are updated frequently from multiple threads, there
might be contention/scalability issues. Instead of synchronization on
updates, you might consider using atomic updates provided by
sun.misc.Unsafe, like for example:


Most of the jvmstat counters are in VM and we only update a small number
of counters in the libraries. Even so, I think your idea is good as
further usage could potentially to be an issue, particularly
add/increment (which involves a get at this). So irrespective of Nil's
patch (which I think is now just a proposal to add a counter, not
original patch that Jason was commenting on) then we should take your
patch.


Before we rush down this path. Atomic operations on 64-bit types are 
not supported natively on all 32-bit platforms. Atomic updates on 
those platforms will degenerate into locking - which is the barrier to 
implementing frequently used counters in the first place.


David


It's true, yes. I tried replacing atomic add with a read/CAS loop (like 
it is coded in Unsafe as a fall-back for platforms that don't support 
atomic add) and even on Intel 64bit such add performs on-par with 
synchronized method as scalability is concerned. It's raw speed when not 
contended is 2x the synchronized variant.


I also found out the compatibility gotcha when using null as the 1st 
argument to Unsafe double-register addressing mode methods. On 
platforms, that don't support 64bit CAS natively, the argument is used 
as an Object reference to synchronize on to emulate the CAS instruction. 
On Raspery-PI (armv6) for example, the patch causes a crash for that reason:


# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x40659f80, pid=2450, tid=1086497904
#
# JRE version: Java(TM) SE Runtime Environment (8.0)
# Java VM: Java HotSpot(TM) Client VM (25.0-b04 mixed mode linux-arm )
# Problematic frame:
# V  [libjvm.so+0x3d3f80] ObjectSynchronizer::slow_enter(Handle, 
BasicLock*, Thread*)+0x4



Regards, Peter




-Alan





Re: RFR: 8007806: Need a Throwables performance counter

2013-02-24 Thread Peter Levart


On 02/24/2013 11:31 AM, David Holmes wrote:

On 24/02/2013 6:50 PM, Peter Levart wrote:

Hi David,

I thought it was ok to pass null, but I don't know the portability
issues in-depth. The javadoc for Unsafe says:

/This method refers to a variable by means of two parameters, and so it
provides (in effect) a double-register addressing mode for Java
variables. When the object reference is null, this method uses its
offset as an absolute address. This is similar in operation to methods
such as getInt(long), which provide (in effect) a single-register
addressing mode for non-Java variables. However, because Java variables
may have a different layout in memory from non-Java variables,
programmers should not assume that these two addressing modes are ever
equivalent. Also, programmers should remember that offsets from the
double-register addressing mode cannot be portably confused with longs
used in the single-register addressing mode./


That is the doc for getXXX but not for getAndAddXXX or 
compareAndSwapXXX. You can't have null here:


UNSAFE_ENTRY(jboolean, Unsafe_CompareAndSwapLong(JNIEnv *env, jobject 
unsafe, jobject obj, jlong offset, jlong e, jlong x))

  UnsafeWrapper(Unsafe_CompareAndSwapLong);
  Handle p (THREAD, JNIHandles::resolve(obj));
  jlong* addr = (jlong*)(index_oop_from_field_offset_long(p(), offset));
  if (VM_Version::supports_cx8())
return (jlong)(Atomic::cmpxchg(x, addr, e)) == e;
  else {
jboolean success = false;
ObjectLocker ol(p, THREAD);
if (*addr == e) { *addr = x; success = true; }
return success;
  }
UNSAFE_END

David
-



Oh yes, I found that out too... :-(

Thanks,

Peter





Does anybody know the in-depth interpretation of the above? Is it only
the particular Java/native type differences (for example, endianess of
variables) that these two addressing modes might interpret differently
or something else too?

Regards, Peter


On 02/24/2013 12:39 AM, David Holmes wrote:

Peter,

In your use of Unsafe you pass null as the object. I'm pretty
certain you can't pass null here. Unsafe operates on fields or array
elements.

David

On 24/02/2013 5:39 AM, Peter Levart wrote:

Hi Nils,

If the counters are updated frequently from multiple threads, there
might be contention/scalability issues. Instead of synchronization on
updates, you might consider using atomic updates provided by
sun.misc.Unsafe, like for example:


Index: jdk/src/share/classes/sun/misc/PerfCounter.java
===
--- jdk/src/share/classes/sun/misc/PerfCounter.java
+++ jdk/src/share/classes/sun/misc/PerfCounter.java
@@ -25,6 +25,8 @@

  package sun.misc;

+import sun.nio.ch.DirectBuffer;
+
  import java.nio.ByteBuffer;
  import java.nio.ByteOrder;
  import java.nio.LongBuffer;
@@ -50,6 +52,8 @@
  public class PerfCounter {
  private static final Perf perf =
  AccessController.doPrivileged(new Perf.GetPerfAction());
+private static final Unsafe unsafe =
+Unsafe.getUnsafe();

  // Must match values defined in
hotspot/src/share/vm/runtime/perfdata.hpp
  private final static int V_Constant  = 1;
@@ -59,12 +63,14 @@

  private final String name;
  private final LongBuffer lb;
+private final DirectBuffer db;

  private PerfCounter(String name, int type) {
  this.name = name;
  ByteBuffer bb = perf.createLong(name, U_None, type, 0L);
  bb.order(ByteOrder.nativeOrder());
  this.lb = bb.asLongBuffer();
+this.db = bb instanceof DirectBuffer ? (DirectBuffer) bb :
null;
  }

  static PerfCounter newPerfCounter(String name) {
@@ -79,23 +85,44 @@
  /**
   * Returns the current value of the perf counter.
   */
-public synchronized long get() {
+public long get() {
+if (db != null) {
+return unsafe.getLongVolatile(null, db.address());
+}
+else {
+synchronized (this) {
-return lb.get(0);
-}
+return lb.get(0);
+}
+}
+}

  /**
   * Sets the value of the perf counter to the given newValue.
   */
-public synchronized void set(long newValue) {
+public void set(long newValue) {
+if (db != null) {
+unsafe.putOrderedLong(null, db.address(), newValue);
+}
+else {
+synchronized (this) {
-lb.put(0, newValue);
-}
+lb.put(0, newValue);
+}
+}
+}

  /**
   * Adds the given value to the perf counter.
   */
-public synchronized void add(long value) {
-long res = get() + value;
+public void add(long value) {
+if (db != null) {
+unsafe.getAndAddLong(null, db.address(), value);
+}
+else {
+synchronized (this) {
+long res = lb.get(0) + value;
-lb.put(0, res);
+lb.put(0, res);
+}
+}
  }

  /**




Re: RFR: 8007806: Need a Throwables performance counter

2013-02-24 Thread Peter Levart

Hi Alan, David, Nils,

I just want to clear something regarding PerfCounter implementation.

Access to 64bit value in native memory is through a direct buffer which 
uses normal read/write (non-volatile, Unsafe.[get|set]Long). So on 
processors that don't support atomic 64bit stores/loads, each access 
results in two separate 32bit load/store accesses right?


The PerfCounter methods that access the 64bit value are synchronized, 
using PerfCounter instance as a lock. But how is this 64bit value 
accessed for example in the jstat utility? Is it possible that jstat can 
see one half of the 64bit value before the update and the other half 
after the update?


If this is true and it is not that important, then instead of a 
synchronized update of 64bit counter, a 32bit CAS could be used, 
optionally (rarely) followed by a second 32bit CAS, like for example:


http://dl.dropbox.com/u/101777488/jdk8-tl/PerfCounter/webrev.01/index.html

I tried this on ARM v6 and it works much better than synchronized 
access, but I don't know if it's acceptable. It guarantees eventual 
correctness of summed value if the only operation performed is add() (no 
set() intermingled) and has the same possibility of incorrect half-half 
reads by observers as current PerfCounter has for unsynchronized observers.


Here's the comparison of unpatched/patched PerfCounter.increment() 
micro-benchmark on single-core ARM v6 (Raspbery-PI):


*** Original PerfCounter, ARM v6

#
# PerfCounter_increment: run duration:  5,000 ms, #of logical CPUS: 1
#
   1 threads, Tavg =269.34 ns/op (? =   0.00 ns/op) [   269.34]
   2 threads, Tavg =  7,170.48 ns/op (? = 410.77 ns/op) [ 
6,783.73,  7,603.95]
   3 threads, Tavg = 12,034.82 ns/op (? = 418.99 ns/op) 
[11,792.33, 11,714.67, 12,639.26]
   4 threads, Tavg = 16,029.76 ns/op (? = 1,411.44 ns/op) 
[15,592.04, 18,511.52, 15,642.52, 14,818.16]



*** Patched PerfCounter, ARM v6

#
# PerfCounter_increment: run duration:  5,000 ms, #of logical CPUS: 1
#
   1 threads, Tavg =166.21 ns/op (? =   0.00 ns/op) [   166.21]
   2 threads, Tavg =332.58 ns/op (? =   0.12 ns/op) [   
332.45,332.70]
   3 threads, Tavg =500.30 ns/op (? =   0.22 ns/op) [   
500.04,500.29,500.58]
   4 threads, Tavg =667.95 ns/op (? =   2.11 ns/op) [   
665.22,667.18,668.40,671.04]



Regards, Peter


On 02/24/2013 11:31 AM, David Holmes wrote:

On 24/02/2013 6:50 PM, Peter Levart wrote:

Hi David,

I thought it was ok to pass null, but I don't know the portability
issues in-depth. The javadoc for Unsafe says:

/This method refers to a variable by means of two parameters, and so it
provides (in effect) a double-register addressing mode for Java
variables. When the object reference is null, this method uses its
offset as an absolute address. This is similar in operation to methods
such as getInt(long), which provide (in effect) a single-register
addressing mode for non-Java variables. However, because Java variables
may have a different layout in memory from non-Java variables,
programmers should not assume that these two addressing modes are ever
equivalent. Also, programmers should remember that offsets from the
double-register addressing mode cannot be portably confused with longs
used in the single-register addressing mode./


That is the doc for getXXX but not for getAndAddXXX or 
compareAndSwapXXX. You can't have null here:


UNSAFE_ENTRY(jboolean, Unsafe_CompareAndSwapLong(JNIEnv *env, jobject 
unsafe, jobject obj, jlong offset, jlong e, jlong x))

  UnsafeWrapper(Unsafe_CompareAndSwapLong);
  Handle p (THREAD, JNIHandles::resolve(obj));
  jlong* addr = (jlong*)(index_oop_from_field_offset_long(p(), offset));
  if (VM_Version::supports_cx8())
return (jlong)(Atomic::cmpxchg(x, addr, e)) == e;
  else {
jboolean success = false;
ObjectLocker ol(p, THREAD);
if (*addr == e) { *addr = x; success = true; }
return success;
  }
UNSAFE_END

David
-



Does anybody know the in-depth interpretation of the above? Is it only
the particular Java/native type differences (for example, endianess of
variables) that these two addressing modes might interpret differently
or something else too?

Regards, Peter


On 02/24/2013 12:39 AM, David Holmes wrote:

Peter,

In your use of Unsafe you pass null as the object. I'm pretty
certain you can't pass null here. Unsafe operates on fields or array
elements.

David

On 24/02/2013 5:39 AM, Peter Levart wrote:

Hi Nils,

If the counters are updated frequently from multiple threads, there
might be contention/scalability issues. Instead of synchronization on
updates, you might consider using atomic updates provided by
sun.misc.Unsafe, like for example:


Index: jdk/src/share/classes/sun/misc/PerfCounter.java
===
--- jdk/src/share/classes/sun/misc/PerfCounter.java
+++ jdk/src/share/classes/sun/misc/PerfCounter.java
@@ -25,6 

Re: RFR: 8007806: Need a Throwables performance counter

2013-02-24 Thread David Holmes

On 25/02/2013 6:18 AM, Peter Levart wrote:

Hi Alan, David, Nils,

I just want to clear something regarding PerfCounter implementation.

Access to 64bit value in native memory is through a direct buffer which
uses normal read/write (non-volatile, Unsafe.[get|set]Long). So on
processors that don't support atomic 64bit stores/loads, each access
results in two separate 32bit load/store accesses right?


Unsafe.get|setLong uses locking on those platforms.


The PerfCounter methods that access the 64bit value are synchronized,
using PerfCounter instance as a lock. But how is this 64bit value
accessed for example in the jstat utility? Is it possible that jstat can
see one half of the 64bit value before the update and the other half
after the update?


Does jstat access these values directly or only via the synchronized 
methods? If the latter then the value can't be torn that way. The sync 
method will store the value in 2 32-bit registers, and the variable load 
in jstat will take two instructions, but nothing can touch those registers.


David
-


If this is true and it is not that important, then instead of a
synchronized update of 64bit counter, a 32bit CAS could be used,
optionally (rarely) followed by a second 32bit CAS, like for example:

http://dl.dropbox.com/u/101777488/jdk8-tl/PerfCounter/webrev.01/index.html

I tried this on ARM v6 and it works much better than synchronized
access, but I don't know if it's acceptable. It guarantees eventual
correctness of summed value if the only operation performed is add() (no
set() intermingled) and has the same possibility of incorrect half-half
reads by observers as current PerfCounter has for unsynchronized observers.

Here's the comparison of unpatched/patched PerfCounter.increment()
micro-benchmark on single-core ARM v6 (Raspbery-PI):

*** Original PerfCounter, ARM v6

#
# PerfCounter_increment: run duration:  5,000 ms, #of logical CPUS: 1
#
1 threads, Tavg =269.34 ns/op (σ =   0.00 ns/op) [   269.34]
2 threads, Tavg =  7,170.48 ns/op (σ = 410.77 ns/op) [
6,783.73,  7,603.95]
3 threads, Tavg = 12,034.82 ns/op (σ = 418.99 ns/op)
[11,792.33, 11,714.67, 12,639.26]
4 threads, Tavg = 16,029.76 ns/op (σ = 1,411.44 ns/op)
[15,592.04, 18,511.52, 15,642.52, 14,818.16]


*** Patched PerfCounter, ARM v6

#
# PerfCounter_increment: run duration:  5,000 ms, #of logical CPUS: 1
#
1 threads, Tavg =166.21 ns/op (σ =   0.00 ns/op) [   166.21]
2 threads, Tavg =332.58 ns/op (σ =   0.12 ns/op) [
332.45,332.70]
3 threads, Tavg =500.30 ns/op (σ =   0.22 ns/op) [
500.04,500.29,500.58]
4 threads, Tavg =667.95 ns/op (σ =   2.11 ns/op) [
665.22,667.18,668.40,671.04]


Regards, Peter


On 02/24/2013 11:31 AM, David Holmes wrote:

On 24/02/2013 6:50 PM, Peter Levart wrote:

Hi David,

I thought it was ok to pass null, but I don't know the portability
issues in-depth. The javadoc for Unsafe says:

/This method refers to a variable by means of two parameters, and so it
provides (in effect) a double-register addressing mode for Java
variables. When the object reference is null, this method uses its
offset as an absolute address. This is similar in operation to methods
such as getInt(long), which provide (in effect) a single-register
addressing mode for non-Java variables. However, because Java variables
may have a different layout in memory from non-Java variables,
programmers should not assume that these two addressing modes are ever
equivalent. Also, programmers should remember that offsets from the
double-register addressing mode cannot be portably confused with longs
used in the single-register addressing mode./


That is the doc for getXXX but not for getAndAddXXX or
compareAndSwapXXX. You can't have null here:

UNSAFE_ENTRY(jboolean, Unsafe_CompareAndSwapLong(JNIEnv *env, jobject
unsafe, jobject obj, jlong offset, jlong e, jlong x))
  UnsafeWrapper(Unsafe_CompareAndSwapLong);
  Handle p (THREAD, JNIHandles::resolve(obj));
  jlong* addr = (jlong*)(index_oop_from_field_offset_long(p(), offset));
  if (VM_Version::supports_cx8())
return (jlong)(Atomic::cmpxchg(x, addr, e)) == e;
  else {
jboolean success = false;
ObjectLocker ol(p, THREAD);
if (*addr == e) { *addr = x; success = true; }
return success;
  }
UNSAFE_END

David
-



Does anybody know the in-depth interpretation of the above? Is it only
the particular Java/native type differences (for example, endianess of
variables) that these two addressing modes might interpret differently
or something else too?

Regards, Peter


On 02/24/2013 12:39 AM, David Holmes wrote:

Peter,

In your use of Unsafe you pass null as the object. I'm pretty
certain you can't pass null here. Unsafe operates on fields or array
elements.

David

On 24/02/2013 5:39 AM, Peter Levart wrote:

Hi Nils,

If the counters are updated frequently from multiple threads, there
might be 

Re: RFR: 8007806: Need a Throwables performance counter

2013-02-24 Thread Peter Levart


On 02/24/2013 09:57 PM, David Holmes wrote:

On 25/02/2013 6:18 AM, Peter Levart wrote:

Hi Alan, David, Nils,

I just want to clear something regarding PerfCounter implementation.

Access to 64bit value in native memory is through a direct buffer which
uses normal read/write (non-volatile, Unsafe.[get|set]Long). So on
processors that don't support atomic 64bit stores/loads, each access
results in two separate 32bit load/store accesses right?


Unsafe.get|setLong uses locking on those platforms.


Even if it does, it is important whether all accesses to this 64bit 
value are using locking and whether they are using the same lock. Aren't 
performance counters JVM native variables where just some of them happen 
to be updated from Java?





The PerfCounter methods that access the 64bit value are synchronized,
using PerfCounter instance as a lock. But how is this 64bit value
accessed for example in the jstat utility? Is it possible that jstat can
see one half of the 64bit value before the update and the other half
after the update?


Does jstat access these values directly or only via the synchronized 
methods? If the latter then the value can't be torn that way. The 
sync method will store the value in 2 32-bit registers, and the 
variable load in jstat will take two instructions, but nothing can 
touch those registers.


I'm not saying that the value could be corrupted in any way, just that 
the unsynchronized observer (like jstat) could see it torn sometimes.


Regards, Peter



David
-


If this is true and it is not that important, then instead of a
synchronized update of 64bit counter, a 32bit CAS could be used,
optionally (rarely) followed by a second 32bit CAS, like for example:

http://dl.dropbox.com/u/101777488/jdk8-tl/PerfCounter/webrev.01/index.html 



I tried this on ARM v6 and it works much better than synchronized
access, but I don't know if it's acceptable. It guarantees eventual
correctness of summed value if the only operation performed is add() (no
set() intermingled) and has the same possibility of incorrect half-half
reads by observers as current PerfCounter has for unsynchronized 
observers.


Here's the comparison of unpatched/patched PerfCounter.increment()
micro-benchmark on single-core ARM v6 (Raspbery-PI):

*** Original PerfCounter, ARM v6

#
# PerfCounter_increment: run duration:  5,000 ms, #of logical CPUS: 1
#
1 threads, Tavg =269.34 ns/op (σ =   0.00 ns/op) [   
269.34]

2 threads, Tavg =  7,170.48 ns/op (σ = 410.77 ns/op) [
6,783.73,  7,603.95]
3 threads, Tavg = 12,034.82 ns/op (σ = 418.99 ns/op)
[11,792.33, 11,714.67, 12,639.26]
4 threads, Tavg = 16,029.76 ns/op (σ = 1,411.44 ns/op)
[15,592.04, 18,511.52, 15,642.52, 14,818.16]


*** Patched PerfCounter, ARM v6

#
# PerfCounter_increment: run duration:  5,000 ms, #of logical CPUS: 1
#
1 threads, Tavg =166.21 ns/op (σ =   0.00 ns/op) [   
166.21]

2 threads, Tavg =332.58 ns/op (σ =   0.12 ns/op) [
332.45,332.70]
3 threads, Tavg =500.30 ns/op (σ =   0.22 ns/op) [
500.04,500.29,500.58]
4 threads, Tavg =667.95 ns/op (σ =   2.11 ns/op) [
665.22,667.18,668.40,671.04]


Regards, Peter


On 02/24/2013 11:31 AM, David Holmes wrote:

On 24/02/2013 6:50 PM, Peter Levart wrote:

Hi David,

I thought it was ok to pass null, but I don't know the portability
issues in-depth. The javadoc for Unsafe says:

/This method refers to a variable by means of two parameters, and 
so it

provides (in effect) a double-register addressing mode for Java
variables. When the object reference is null, this method uses its
offset as an absolute address. This is similar in operation to methods
such as getInt(long), which provide (in effect) a single-register
addressing mode for non-Java variables. However, because Java 
variables

may have a different layout in memory from non-Java variables,
programmers should not assume that these two addressing modes are ever
equivalent. Also, programmers should remember that offsets from the
double-register addressing mode cannot be portably confused with longs
used in the single-register addressing mode./


That is the doc for getXXX but not for getAndAddXXX or
compareAndSwapXXX. You can't have null here:

UNSAFE_ENTRY(jboolean, Unsafe_CompareAndSwapLong(JNIEnv *env, jobject
unsafe, jobject obj, jlong offset, jlong e, jlong x))
  UnsafeWrapper(Unsafe_CompareAndSwapLong);
  Handle p (THREAD, JNIHandles::resolve(obj));
  jlong* addr = (jlong*)(index_oop_from_field_offset_long(p(), 
offset));

  if (VM_Version::supports_cx8())
return (jlong)(Atomic::cmpxchg(x, addr, e)) == e;
  else {
jboolean success = false;
ObjectLocker ol(p, THREAD);
if (*addr == e) { *addr = x; success = true; }
return success;
  }
UNSAFE_END

David
-



Does anybody know the in-depth interpretation of the above? Is it only
the particular Java/native type differences (for example, 

Re: RFR: 8007806: Need a Throwables performance counter

2013-02-24 Thread David Holmes

Peter,

On 25/02/2013 7:25 AM, Peter Levart wrote:


On 02/24/2013 09:57 PM, David Holmes wrote:

On 25/02/2013 6:18 AM, Peter Levart wrote:

Hi Alan, David, Nils,

I just want to clear something regarding PerfCounter implementation.

Access to 64bit value in native memory is through a direct buffer which
uses normal read/write (non-volatile, Unsafe.[get|set]Long). So on
processors that don't support atomic 64bit stores/loads, each access
results in two separate 32bit load/store accesses right?


Unsafe.get|setLong uses locking on those platforms.


Even if it does, it is important whether all accesses to this 64bit
value are using locking and whether they are using the same lock. Aren't
performance counters JVM native variables where just some of them happen
to be updated from Java?


AFAICS PerfCounters have no thread-safety properties or guarantees. So 
it is up to the user of each counter to use it in an appropriate way. I 
think the serviceability folk would have to chime in here as to how 
PerfCounters are supposed to be used.



The PerfCounter methods that access the 64bit value are synchronized,
using PerfCounter instance as a lock. But how is this 64bit value
accessed for example in the jstat utility? Is it possible that jstat can
see one half of the 64bit value before the update and the other half
after the update?


Does jstat access these values directly or only via the synchronized
methods? If the latter then the value can't be torn that way. The
sync method will store the value in 2 32-bit registers, and the
variable load in jstat will take two instructions, but nothing can
touch those registers.


I'm not saying that the value could be corrupted in any way, just that
the unsynchronized observer (like jstat) could see it torn sometimes.


If the value is initially read in a sync block and all updates are also 
synchronized, then I don't think it can. But you need to look at actual 
code to determine this.


David


Regards, Peter



David
-


If this is true and it is not that important, then instead of a
synchronized update of 64bit counter, a 32bit CAS could be used,
optionally (rarely) followed by a second 32bit CAS, like for example:

http://dl.dropbox.com/u/101777488/jdk8-tl/PerfCounter/webrev.01/index.html


I tried this on ARM v6 and it works much better than synchronized
access, but I don't know if it's acceptable. It guarantees eventual
correctness of summed value if the only operation performed is add() (no
set() intermingled) and has the same possibility of incorrect half-half
reads by observers as current PerfCounter has for unsynchronized
observers.

Here's the comparison of unpatched/patched PerfCounter.increment()
micro-benchmark on single-core ARM v6 (Raspbery-PI):

*** Original PerfCounter, ARM v6

#
# PerfCounter_increment: run duration:  5,000 ms, #of logical CPUS: 1
#
1 threads, Tavg =269.34 ns/op (σ =   0.00 ns/op) [
269.34]
2 threads, Tavg =  7,170.48 ns/op (σ = 410.77 ns/op) [
6,783.73,  7,603.95]
3 threads, Tavg = 12,034.82 ns/op (σ = 418.99 ns/op)
[11,792.33, 11,714.67, 12,639.26]
4 threads, Tavg = 16,029.76 ns/op (σ = 1,411.44 ns/op)
[15,592.04, 18,511.52, 15,642.52, 14,818.16]


*** Patched PerfCounter, ARM v6

#
# PerfCounter_increment: run duration:  5,000 ms, #of logical CPUS: 1
#
1 threads, Tavg =166.21 ns/op (σ =   0.00 ns/op) [
166.21]
2 threads, Tavg =332.58 ns/op (σ =   0.12 ns/op) [
332.45,332.70]
3 threads, Tavg =500.30 ns/op (σ =   0.22 ns/op) [
500.04,500.29,500.58]
4 threads, Tavg =667.95 ns/op (σ =   2.11 ns/op) [
665.22,667.18,668.40,671.04]


Regards, Peter


On 02/24/2013 11:31 AM, David Holmes wrote:

On 24/02/2013 6:50 PM, Peter Levart wrote:

Hi David,

I thought it was ok to pass null, but I don't know the portability
issues in-depth. The javadoc for Unsafe says:

/This method refers to a variable by means of two parameters, and
so it
provides (in effect) a double-register addressing mode for Java
variables. When the object reference is null, this method uses its
offset as an absolute address. This is similar in operation to methods
such as getInt(long), which provide (in effect) a single-register
addressing mode for non-Java variables. However, because Java
variables
may have a different layout in memory from non-Java variables,
programmers should not assume that these two addressing modes are ever
equivalent. Also, programmers should remember that offsets from the
double-register addressing mode cannot be portably confused with longs
used in the single-register addressing mode./


That is the doc for getXXX but not for getAndAddXXX or
compareAndSwapXXX. You can't have null here:

UNSAFE_ENTRY(jboolean, Unsafe_CompareAndSwapLong(JNIEnv *env, jobject
unsafe, jobject obj, jlong offset, jlong e, jlong x))
  UnsafeWrapper(Unsafe_CompareAndSwapLong);
  Handle p (THREAD, JNIHandles::resolve(obj));
  jlong* 

Re: RFR: 8007806: Need a Throwables performance counter

2013-02-24 Thread David Holmes

We've not-so-slightly hijacked Nils' thread here - apologies for that.

On 25/02/2013 8:05 AM, Peter Levart wrote:


Just looked at one way jstat accesses the counters. It runs in a
separate VM and maps-in a file that is already mapped in the observing
VM in the direct buffer. It then accesses it via a LongBuffer view (for
long counters). So there's no synchronization between counter updater
and counter reader. On ARM v6 jstat could see a torn long counter
then, right?


Right. With current implementation of PerfLongCounter it uses simple 
stores (not atomic ops).



The double-32bit-CAS updater that I presented would not make it worse
then on such platforms, I suppose.


No change in tearing abaility.


On the platforms that support 64bit atomic stores, there are not such
problems. And I assume those same platforms also support 64bit CAS, or
are there platforms with 64bit atomic stores and no 64bit CAS?


Most of them actually :) All Java platforms must support atomic 
load/store of 64-bit values to support volatile long and double 
variables. On 32-bit platforms this is done via a range of techniques - 
for example on x86 it is done via the FPU. But these atomic accesses are 
currently restricted to Java volatile field accesses via bytecode - 
there are not exposed via the Unsafe methods, nor are they made 
available via the Atomic:: class in the VM.


Some of these 32-bit platforms also support the 64-bit CAS, which is 
what supports_cx8() is intended to indicate.


If the PerfCounters were supposed to be thread-safe then they might use 
these alternate atomic access operations.


David


Regards, Peter



David


Regards, Peter



David
-


If this is true and it is not that important, then instead of a
synchronized update of 64bit counter, a 32bit CAS could be used,
optionally (rarely) followed by a second 32bit CAS, like for example:

http://dl.dropbox.com/u/101777488/jdk8-tl/PerfCounter/webrev.01/index.html



I tried this on ARM v6 and it works much better than synchronized
access, but I don't know if it's acceptable. It guarantees eventual
correctness of summed value if the only operation performed is
add() (no
set() intermingled) and has the same possibility of incorrect
half-half
reads by observers as current PerfCounter has for unsynchronized
observers.

Here's the comparison of unpatched/patched PerfCounter.increment()
micro-benchmark on single-core ARM v6 (Raspbery-PI):

*** Original PerfCounter, ARM v6

#
# PerfCounter_increment: run duration:  5,000 ms, #of logical CPUS: 1
#
1 threads, Tavg =269.34 ns/op (σ =   0.00 ns/op) [
269.34]
2 threads, Tavg =  7,170.48 ns/op (σ = 410.77 ns/op) [
6,783.73,  7,603.95]
3 threads, Tavg = 12,034.82 ns/op (σ = 418.99 ns/op)
[11,792.33, 11,714.67, 12,639.26]
4 threads, Tavg = 16,029.76 ns/op (σ = 1,411.44 ns/op)
[15,592.04, 18,511.52, 15,642.52, 14,818.16]


*** Patched PerfCounter, ARM v6

#
# PerfCounter_increment: run duration:  5,000 ms, #of logical CPUS: 1
#
1 threads, Tavg =166.21 ns/op (σ =   0.00 ns/op) [
166.21]
2 threads, Tavg =332.58 ns/op (σ =   0.12 ns/op) [
332.45,332.70]
3 threads, Tavg =500.30 ns/op (σ =   0.22 ns/op) [
500.04,500.29,500.58]
4 threads, Tavg =667.95 ns/op (σ =   2.11 ns/op) [
665.22,667.18,668.40,671.04]


Regards, Peter


On 02/24/2013 11:31 AM, David Holmes wrote:

On 24/02/2013 6:50 PM, Peter Levart wrote:

Hi David,

I thought it was ok to pass null, but I don't know the portability
issues in-depth. The javadoc for Unsafe says:

/This method refers to a variable by means of two parameters, and
so it
provides (in effect) a double-register addressing mode for Java
variables. When the object reference is null, this method uses its
offset as an absolute address. This is similar in operation to
methods
such as getInt(long), which provide (in effect) a single-register
addressing mode for non-Java variables. However, because Java
variables
may have a different layout in memory from non-Java variables,
programmers should not assume that these two addressing modes are
ever
equivalent. Also, programmers should remember that offsets from the
double-register addressing mode cannot be portably confused with
longs
used in the single-register addressing mode./


That is the doc for getXXX but not for getAndAddXXX or
compareAndSwapXXX. You can't have null here:

UNSAFE_ENTRY(jboolean, Unsafe_CompareAndSwapLong(JNIEnv *env, jobject
unsafe, jobject obj, jlong offset, jlong e, jlong x))
  UnsafeWrapper(Unsafe_CompareAndSwapLong);
  Handle p (THREAD, JNIHandles::resolve(obj));
  jlong* addr = (jlong*)(index_oop_from_field_offset_long(p(),
offset));
  if (VM_Version::supports_cx8())
return (jlong)(Atomic::cmpxchg(x, addr, e)) == e;
  else {
jboolean success = false;
ObjectLocker ol(p, THREAD);
if (*addr == e) { *addr = x; success = true; }
return success;
  }
UNSAFE_END

David

Re: RFR: 8007806: Need a Throwables performance counter

2013-02-24 Thread David Holmes
I'm getting very confused regarding the different threads on this. But 
this webrev:


http://cr.openjdk.java.net/~nloodin/exception-tracing/webrev.01/

shows the intrusive event tracing that I'm sure we said we would 
grudgingly accept for 7u but not for JDK 8.


David

On 23/02/2013 2:16 AM, Jason Mehrens wrote:

 From this webrev
http://cr.openjdk.java.net/~nloodin/exception-tracing/webrev.01/  you
are counting the number of throwables constructed.  You might want to
change the name to reflect that.  I don't think anyone would want to
write a spec for how many throwables are thrown given that a throwable
can be wrapped, suppressed, rethrown, etc.
Jason

  Date: Fri, 22 Feb 2013 14:10:02 +0100
  From: nils.loo...@oracle.com
  To: core-libs-dev@openjdk.java.net; serviceability-...@openjdk.java.net
  Subject: Fwd: RFR: 8007806: Need a Throwables performance counter
 
  Does anyone have anything strongly against this? This is a small change
  just to add a performance counter, the code to increment it and read it
  will live in other parts of the code and be a part of a larger separate
  commit.
 
  Alan Bateman gave the response that the name was inappropriate, but I
  don't really have a better alternative, but neither do I have strong
  opinions, so feedback would be appreciated!
 
  How about
  sun.throwables.thrownThrowables ?
 
  Regards,
  Nils Loodin
 
 
   Original Message 
  Subject: RFR: 8007806: Need a Throwables performance counter
  Date: Fri, 08 Feb 2013 18:10:02 +0100
  From: Nils Loodin nils.loo...@oracle.com
  To: core-libs-dev@openjdk.java.net,serviceability_dev_ww_grp
  serviceability_dev_ww_...@oracle.com
 
  It would be interesting to know the number of thrown throwables in the
  JVM, to be able to do some high level application diagnostics /
  statistics. A good way to put this number would be a performance
  counter, since it is accessible both from Java and from the VM.
 
  http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8007806
  http://cr.openjdk.java.net/~nloodin/8007806/webrev.00/
 
  Regards,
  Nils Loodin
 
 



Re: RFR: 8007806: Need a Throwables performance counter

2013-02-23 Thread Peter Levart

Hi Nils,

If the counters are updated frequently from multiple threads, there 
might be contention/scalability issues. Instead of synchronization on 
updates, you might consider using atomic updates provided by 
sun.misc.Unsafe, like for example:



Index: jdk/src/share/classes/sun/misc/PerfCounter.java
===
--- jdk/src/share/classes/sun/misc/PerfCounter.java
+++ jdk/src/share/classes/sun/misc/PerfCounter.java
@@ -25,6 +25,8 @@

 package sun.misc;

+import sun.nio.ch.DirectBuffer;
+
 import java.nio.ByteBuffer;
 import java.nio.ByteOrder;
 import java.nio.LongBuffer;
@@ -50,6 +52,8 @@
 public class PerfCounter {
 private static final Perf perf =
 AccessController.doPrivileged(new Perf.GetPerfAction());
+private static final Unsafe unsafe =
+Unsafe.getUnsafe();

 // Must match values defined in 
hotspot/src/share/vm/runtime/perfdata.hpp

 private final static int V_Constant  = 1;
@@ -59,12 +63,14 @@

 private final String name;
 private final LongBuffer lb;
+private final DirectBuffer db;

 private PerfCounter(String name, int type) {
 this.name = name;
 ByteBuffer bb = perf.createLong(name, U_None, type, 0L);
 bb.order(ByteOrder.nativeOrder());
 this.lb = bb.asLongBuffer();
+this.db = bb instanceof DirectBuffer ? (DirectBuffer) bb : null;
 }

 static PerfCounter newPerfCounter(String name) {
@@ -79,23 +85,44 @@
 /**
  * Returns the current value of the perf counter.
  */
-public synchronized long get() {
+public long get() {
+if (db != null) {
+return unsafe.getLongVolatile(null, db.address());
+}
+else {
+synchronized (this) {
-return lb.get(0);
-}
+return lb.get(0);
+}
+}
+}

 /**
  * Sets the value of the perf counter to the given newValue.
  */
-public synchronized void set(long newValue) {
+public void set(long newValue) {
+if (db != null) {
+unsafe.putOrderedLong(null, db.address(), newValue);
+}
+else {
+synchronized (this) {
-lb.put(0, newValue);
-}
+lb.put(0, newValue);
+}
+}
+}

 /**
  * Adds the given value to the perf counter.
  */
-public synchronized void add(long value) {
-long res = get() + value;
+public void add(long value) {
+if (db != null) {
+unsafe.getAndAddLong(null, db.address(), value);
+}
+else {
+synchronized (this) {
+long res = lb.get(0) + value;
-lb.put(0, res);
+lb.put(0, res);
+}
+}
 }

 /**



Testing the PerfCounter.increment() method in a loop on multiple threads 
sharing the same PerfCounter instance, for example, on a 4-core Intel i7 
machine produces the following results:


#
# PerfCounter_increment: run duration:  5,000 ms, #of logical CPUS: 8
#
   1 threads, Tavg = 19.02 ns/op (? =   0.00 ns/op)
   2 threads, Tavg =109.93 ns/op (? =   6.17 ns/op)
   3 threads, Tavg =136.64 ns/op (? =   2.99 ns/op)
   4 threads, Tavg =293.26 ns/op (? =   5.30 ns/op)
   5 threads, Tavg =316.94 ns/op (? =   6.28 ns/op)
   6 threads, Tavg =686.96 ns/op (? =   7.09 ns/op)
   7 threads, Tavg =793.28 ns/op (? =  10.57 ns/op)
   8 threads, Tavg =898.15 ns/op (? =  14.63 ns/op)


With the presented patch, the results are a little better:

#
# PerfCounter_increment: run duration:  5,000 ms, #of logical CPUS: 8
#
# Measure:
   1 threads, Tavg =  5.22 ns/op (? =   0.00 ns/op)
   2 threads, Tavg = 34.51 ns/op (? =   0.60 ns/op)
   3 threads, Tavg = 54.85 ns/op (? =   1.42 ns/op)
   4 threads, Tavg = 74.67 ns/op (? =   1.71 ns/op)
   5 threads, Tavg = 94.71 ns/op (? =  41.68 ns/op)
   6 threads, Tavg =114.80 ns/op (? =  32.10 ns/op)
   7 threads, Tavg =136.70 ns/op (? =  26.80 ns/op)
   8 threads, Tavg =158.48 ns/op (? =   9.93 ns/op)


The scalability is not much better, but the raw speed is, so it might 
present less contention when used in real-world code. If you wanted even 
better scalability, there is a new class in JDK8, the 
java.util.concurrent.LongAdder. But that doesn't buy atomic set() - 
only add(). And it can't update native-memory variables, so it could 
only be used for add-only counters and in conjunction with a background 
thread that would periodically flush the sum to the native memory


Regards, Peter


On 02/08/2013 06:10 PM, Nils Loodin wrote:
It would be interesting to know the number of thrown throwables in the 
JVM, to be able to do some high level application diagnostics / 
statistics. A good way to put this number would be a performance 

Re: RFR: 8007806: Need a Throwables performance counter

2013-02-23 Thread Alan Bateman

On 23/02/2013 19:39, Peter Levart wrote:

Hi Nils,

If the counters are updated frequently from multiple threads, there 
might be contention/scalability issues. Instead of synchronization on 
updates, you might consider using atomic updates provided by 
sun.misc.Unsafe, like for example:


Most of the jvmstat counters are in VM and we only update a small number 
of counters in the libraries. Even so, I think your idea is good as 
further usage could potentially to be an issue, particularly 
add/increment (which involves a get at this). So irrespective of Nil's 
patch (which I think is now just a proposal to add a counter, not 
original patch that Jason was commenting on) then we should take your patch.


-Alan



Re: RFR: 8007806: Need a Throwables performance counter

2013-02-23 Thread David Holmes

On 24/02/2013 6:08 AM, Alan Bateman wrote:

On 23/02/2013 19:39, Peter Levart wrote:

Hi Nils,

If the counters are updated frequently from multiple threads, there
might be contention/scalability issues. Instead of synchronization on
updates, you might consider using atomic updates provided by
sun.misc.Unsafe, like for example:


Most of the jvmstat counters are in VM and we only update a small number
of counters in the libraries. Even so, I think your idea is good as
further usage could potentially to be an issue, particularly
add/increment (which involves a get at this). So irrespective of Nil's
patch (which I think is now just a proposal to add a counter, not
original patch that Jason was commenting on) then we should take your
patch.


Before we rush down this path. Atomic operations on 64-bit types are not 
supported natively on all 32-bit platforms. Atomic updates on those 
platforms will degenerate into locking - which is the barrier to 
implementing frequently used counters in the first place.


David


-Alan



Re: RFR: 8007806: Need a Throwables performance counter

2013-02-23 Thread David Holmes

Peter,

In your use of Unsafe you pass null as the object. I'm pretty certain 
you can't pass null here. Unsafe operates on fields or array elements.


David

On 24/02/2013 5:39 AM, Peter Levart wrote:

Hi Nils,

If the counters are updated frequently from multiple threads, there
might be contention/scalability issues. Instead of synchronization on
updates, you might consider using atomic updates provided by
sun.misc.Unsafe, like for example:


Index: jdk/src/share/classes/sun/misc/PerfCounter.java
===
--- jdk/src/share/classes/sun/misc/PerfCounter.java
+++ jdk/src/share/classes/sun/misc/PerfCounter.java
@@ -25,6 +25,8 @@

  package sun.misc;

+import sun.nio.ch.DirectBuffer;
+
  import java.nio.ByteBuffer;
  import java.nio.ByteOrder;
  import java.nio.LongBuffer;
@@ -50,6 +52,8 @@
  public class PerfCounter {
  private static final Perf perf =
  AccessController.doPrivileged(new Perf.GetPerfAction());
+private static final Unsafe unsafe =
+Unsafe.getUnsafe();

  // Must match values defined in
hotspot/src/share/vm/runtime/perfdata.hpp
  private final static int V_Constant  = 1;
@@ -59,12 +63,14 @@

  private final String name;
  private final LongBuffer lb;
+private final DirectBuffer db;

  private PerfCounter(String name, int type) {
  this.name = name;
  ByteBuffer bb = perf.createLong(name, U_None, type, 0L);
  bb.order(ByteOrder.nativeOrder());
  this.lb = bb.asLongBuffer();
+this.db = bb instanceof DirectBuffer ? (DirectBuffer) bb : null;
  }

  static PerfCounter newPerfCounter(String name) {
@@ -79,23 +85,44 @@
  /**
   * Returns the current value of the perf counter.
   */
-public synchronized long get() {
+public long get() {
+if (db != null) {
+return unsafe.getLongVolatile(null, db.address());
+}
+else {
+synchronized (this) {
-return lb.get(0);
-}
+return lb.get(0);
+}
+}
+}

  /**
   * Sets the value of the perf counter to the given newValue.
   */
-public synchronized void set(long newValue) {
+public void set(long newValue) {
+if (db != null) {
+unsafe.putOrderedLong(null, db.address(), newValue);
+}
+else {
+synchronized (this) {
-lb.put(0, newValue);
-}
+lb.put(0, newValue);
+}
+}
+}

  /**
   * Adds the given value to the perf counter.
   */
-public synchronized void add(long value) {
-long res = get() + value;
+public void add(long value) {
+if (db != null) {
+unsafe.getAndAddLong(null, db.address(), value);
+}
+else {
+synchronized (this) {
+long res = lb.get(0) + value;
-lb.put(0, res);
+lb.put(0, res);
+}
+}
  }

  /**



Testing the PerfCounter.increment() method in a loop on multiple threads
sharing the same PerfCounter instance, for example, on a 4-core Intel i7
machine produces the following results:

#
# PerfCounter_increment: run duration:  5,000 ms, #of logical CPUS: 8
#
1 threads, Tavg = 19.02 ns/op (? =   0.00 ns/op)
2 threads, Tavg =109.93 ns/op (? =   6.17 ns/op)
3 threads, Tavg =136.64 ns/op (? =   2.99 ns/op)
4 threads, Tavg =293.26 ns/op (? =   5.30 ns/op)
5 threads, Tavg =316.94 ns/op (? =   6.28 ns/op)
6 threads, Tavg =686.96 ns/op (? =   7.09 ns/op)
7 threads, Tavg =793.28 ns/op (? =  10.57 ns/op)
8 threads, Tavg =898.15 ns/op (? =  14.63 ns/op)


With the presented patch, the results are a little better:

#
# PerfCounter_increment: run duration:  5,000 ms, #of logical CPUS: 8
#
# Measure:
1 threads, Tavg =  5.22 ns/op (? =   0.00 ns/op)
2 threads, Tavg = 34.51 ns/op (? =   0.60 ns/op)
3 threads, Tavg = 54.85 ns/op (? =   1.42 ns/op)
4 threads, Tavg = 74.67 ns/op (? =   1.71 ns/op)
5 threads, Tavg = 94.71 ns/op (? =  41.68 ns/op)
6 threads, Tavg =114.80 ns/op (? =  32.10 ns/op)
7 threads, Tavg =136.70 ns/op (? =  26.80 ns/op)
8 threads, Tavg =158.48 ns/op (? =   9.93 ns/op)


The scalability is not much better, but the raw speed is, so it might
present less contention when used in real-world code. If you wanted even
better scalability, there is a new class in JDK8, the
java.util.concurrent.LongAdder. But that doesn't buy atomic set() -
only add(). And it can't update native-memory variables, so it could
only be used for add-only counters and in conjunction with a background
thread that would periodically flush the sum to the native memory

Regards, Peter


On 02/08/2013 

RE: RFR: 8007806: Need a Throwables performance counter

2013-02-22 Thread Jason Mehrens







From this webrev 
http://cr.openjdk.java.net/~nloodin/exception-tracing/webrev.01/  you are 
counting the number of throwables constructed.  You might want to change the 
name to reflect that.  I don't think anyone would want to write a spec for how 
many throwables are thrown given that a throwable can be wrapped, suppressed, 
rethrown, etc. Jason
 Date: Fri, 22 Feb 2013 14:10:02 +0100
 From: nils.loo...@oracle.com
 To: core-libs-dev@openjdk.java.net; serviceability-...@openjdk.java.net
 Subject: Fwd: RFR: 8007806: Need a Throwables performance counter
 
 Does anyone have anything strongly against this? This is a small change 
 just to add a performance counter, the code to increment it and read it 
 will live in other parts of the code and be a part of a larger separate 
 commit.
 
 Alan Bateman gave the response that the name was inappropriate, but I 
 don't really have a better alternative, but neither do I have strong 
 opinions, so feedback would be appreciated!
 
 How about
 sun.throwables.thrownThrowables ?
 
 Regards,
 Nils Loodin
 
 
  Original Message 
 Subject: RFR: 8007806: Need a Throwables performance counter
 Date: Fri, 08 Feb 2013 18:10:02 +0100
 From: Nils Loodin nils.loo...@oracle.com
 To: core-libs-dev@openjdk.java.net,serviceability_dev_ww_grp 
 serviceability_dev_ww_...@oracle.com
 
 It would be interesting to know the number of thrown throwables in the
 JVM, to be able to do some high level application diagnostics /
 statistics. A good way to put this number would be a performance
 counter, since it is accessible both from Java and from the VM.
 
 http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8007806
 http://cr.openjdk.java.net/~nloodin/8007806/webrev.00/
 
 Regards,
 Nils Loodin
 
 



  

Re: RFR: 8007806: Need a Throwables performance counter

2013-02-19 Thread Nils Loodin

Hey Kasper!

You're right that there are methods that throw exceptions as a part of 
the normal program flow.


However, this number can be (and has been) used as a very high level 
telemetry for an application. Depending on exactly how large this number 
is, and how fast it is growing, something can be seen to be amis in the 
application. Paired together with data of what type of exceptions are 
being thrown, and from where, it can be useful for support to have this 
information.


Regards,
Nils Loodin

On 02/12/2013 06:28 PM, Kasper Nielsen wrote:

Jrockit mission control supports an exception count based on exception
type.

I have found the approach of just counting the total number of exception
completely useless.
Mainly because there are methods that throw exceptions as part of the
normal flow. For example, Class.forName() is commonly used to test
whether or not a certain
class is on the classpath.

And most developers will cringe whenever they see an exception count0
for an application they think is bug free.





On Fri, Feb 8, 2013 at 6:10 PM, Nils Loodin nils.loo...@oracle.com
mailto:nils.loo...@oracle.com wrote:

It would be interesting to know the number of thrown throwables in
the JVM, to be able to do some high level application diagnostics /
statistics. A good way to put this number would be a performance
counter, since it is accessible both from Java and from the VM.

http://bugs.sun.com/__bugdatabase/view_bug.do?bug___id=8007806
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8007806
http://cr.openjdk.java.net/~__nloodin/8007806/webrev.00/
http://cr.openjdk.java.net/~nloodin/8007806/webrev.00/

Regards,
Nils Loodin






Re: RFR: 8007806: Need a Throwables performance counter

2013-02-12 Thread Kasper Nielsen
Jrockit mission control supports an exception count based on exception
type.

I have found the approach of just counting the total number of exception
completely useless.
Mainly because there are methods that throw exceptions as part of the
normal flow. For example, Class.forName() is commonly used to test whether
or not a certain
class is on the classpath.

And most developers will cringe whenever they see an exception count0 for
an application they think is bug free.





On Fri, Feb 8, 2013 at 6:10 PM, Nils Loodin nils.loo...@oracle.com wrote:

 It would be interesting to know the number of thrown throwables in the
 JVM, to be able to do some high level application diagnostics / statistics.
 A good way to put this number would be a performance counter, since it is
 accessible both from Java and from the VM.

 http://bugs.sun.com/**bugdatabase/view_bug.do?bug_**id=8007806http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8007806
 http://cr.openjdk.java.net/~**nloodin/8007806/webrev.00/http://cr.openjdk.java.net/~nloodin/8007806/webrev.00/

 Regards,
 Nils Loodin



Re: RFR: 8007806: Need a Throwables performance counter

2013-02-09 Thread Alan Bateman

On 08/02/2013 17:10, Nils Loodin wrote:
It would be interesting to know the number of thrown throwables in the 
JVM, to be able to do some high level application diagnostics / 
statistics. A good way to put this number would be a performance 
counter, since it is accessible both from Java and from the VM.


http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8007806
http://cr.openjdk.java.net/~nloodin/8007806/webrev.00/

Regards,
Nils Loodin

Nils - this seems incomplete, where is the counter updated?

-Alan