Re: RFR JDK-8187653: Lock in CoderResult.Cache becomes performance bottleneck

2018-03-02 Thread Claes Redestad

One less (benign) race - and possibly more efficient, too :-)

If we really worry about the startup costs here, we could make it so 
that the three Cache classes

themselves aren't loaded until someone actually has a need for them:

http://cr.openjdk.java.net/~redestad/scratch/coderresult_cache.00/

/Claes

On 2018-03-02 21:02, Xueming Shen wrote:


To follow Claes's suggestion to make the CoderResult.Cache.cache field 
final and allocate early.


issue: https://bugs.openjdk.java.net/browse/JDK-8198966
webrev: http://cr.openjdk.java.net/~sherman/8198966/webrev/

Thanks,
-Sherman




Re: RFR JDK-8187653: Lock in CoderResult.Cache becomes performance bottleneck

2018-03-02 Thread Xueming Shen


To follow Claes's suggestion to make the CoderResult.Cache.cache field 
final and allocate early.


issue: https://bugs.openjdk.java.net/browse/JDK-8198966
webrev: http://cr.openjdk.java.net/~sherman/8198966/webrev/

Thanks,
-Sherman


Re: RFR JDK-8187653: Lock in CoderResult.Cache becomes performance bottleneck

2018-03-02 Thread Peter Levart

Hi Sherman,

On 03/02/18 03:20, David Holmes wrote:

Also this:

195 private Map cache = null;

should now be volatile.


Either that, or you should load the 'cache' field only once per method 
call into a local variable unless you want reorderings of reads and 
writes observed from concurrent threads to result in NPE-s.


If you do replace it with a volatile, you should also load the field 
into local variable just once (although not strictly necessary for 
correctness).


Also:

 193 private abstract static class Cache {
 197 *protected* abstract CoderResult create(int len);

the abstract method is protected, but the implementations:

 217 = new Cache() {
 218 *public* CoderResult create(int len) {
 219 return new CoderResult(CR_MALFORMED, len);
 220 }};

...are public.


Since you are dealing with private nested class, all create() methods 
could be package-private. Less words to write and read...


Regards, Peter


Re: RFR JDK-8187653: Lock in CoderResult.Cache becomes performance bottleneck

2018-03-01 Thread Xueming Shen

On 3/1/18, 4:35 PM, David Holmes wrote:

When you replace synchronized code with concurrent data structures you 
introduce race conditions that are precluded in the synchronized code. 
These need to be examined carefully to ensure they are safe. For 
example, whenever you replace a HashMap with a ConcurrentHashMap you 
need to see if put() needs to be replaced by putIfAbsent().




Hi David,

The assumption here is that putIfAbsent() does not help/save anything as 
the value object
would have been created already when it reaches here. And it appears 
there is no need here
to have the check to be atomic, replacing any existing key/value 
pair is fine in this use
scenario.  Was thinking about computIfAbsent(), but concluded it's just 
little overdone (wonder
why we decided to use the WeakReference in this case, it probably is not 
worth it, but keep it as

is for "compatibility" concern).

-Sherman


Re: RFR JDK-8187653: Lock in CoderResult.Cache becomes performance bottleneck

2018-03-01 Thread David Holmes

Hi Sherman,

On 24/02/2018 11:26 AM, Xueming Shen wrote:

Hi,

Please help review the proposed change to remove the potential performance
bottleneck in CoderResult caused by the "over" synchronization the cache
mechanism.

issue: https://bugs.openjdk.java.net/browse/JDK-8187653
webrev: http://cr.openjdk.java.net/~sherman/8187653/webrev

Notes:
While the original test case (new String/String.getBytes() with UTF8 as 
showed in the
stacktrace) described in the bug report might no longer be reproducible 
in jdk9, as
we have been optimizing lots of String related  char[]/byte[] coding 
path away from
the paths that use CoderResult. But it is still a concern/issue for the 
"general"
CharsetDe/Encoder.de/encode() operation, in which the malformed or 
unmappable

CoderResult object is returned.

As showed in the "[synchronized]" section in bm.scores [1], which is 
from the simple
jmh benchmark test CoderResultBM.java [2], the sores are getting 
significant worse

when the number of concurrent de/encoding threads gets bigger.

It appears the easy fix is to replace the sync mechanism from "method 
synchronized
+ HashMap" to  "ConcurrentHashMap" solves the problem, as showed in the 
same bm
result [1] in [ConcurrentHashMap] section, because most of the accesses 
to the caching
HashMap is read operation. The ConcurrentHahsMap's "almost non-block for 
retrieval

operation" really helps here.


When you replace synchronized code with concurrent data structures you 
introduce race conditions that are precluded in the synchronized code. 
These need to be examined carefully to ensure they are safe. For 
example, whenever you replace a HashMap with a ConcurrentHashMap you 
need to see if put() needs to be replaced by putIfAbsent().


David
-

There is another fact that might help us optimize further here. For most 
of our charsets
in JDK repository (with couple exceptions), the length of malformed and 
unmappable
CoderResult that these charsets might trigger actually is never longer 
than 4. So we might
not need to access the ConcurrentHashMap cache at all in normal use 
scenario. I'm putting
a CoderResult[4] on top the hashmap cache in proposed webrev. It does 
not improve the
performance significantly, but when the thread number gets bigger, a 
10%+ improvement
is observed. So I would assume it might be something worth doing, given 
its cost is really

low.

Thanks,
Sherman


[1] http://cr.openjdk.java.net/~sherman/8187653/bm.scores
[2] http://cr.openjdk.java.net/~sherman/8187653/CoderResultBM.java
[3] http://cr.openjdk.java.net/~sherman/8187653/bm.scores.ms932



Re: RFR JDK-8187653: Lock in CoderResult.Cache becomes performance bottleneck

2018-03-01 Thread Alan Bateman



On 01/03/2018 20:10, Xueming Shen wrote:


Right, they all should be final.
webrev has been updated accordingly.

Thanks, it looks okay now.


Re: RFR JDK-8187653: Lock in CoderResult.Cache becomes performance bottleneck

2018-03-01 Thread Xueming Shen

On 03/01/2018 12:00 PM, Alan Bateman wrote:



On 01/03/2018 19:42, Roger Riggs wrote:

Hi Sherman,

Looks to be a good fix with predictable behavior and performance.

+1, Roger

I assume malformed4 and unmappable4 should be final, the XXXCache fields too.

-Alan


Right, they all should be final.
webrev has been updated accordingly.

-sherman


Re: RFR JDK-8187653: Lock in CoderResult.Cache becomes performance bottleneck

2018-03-01 Thread Alan Bateman



On 01/03/2018 19:42, Roger Riggs wrote:

Hi Sherman,

Looks to be a good fix with predictable behavior and performance.

+1, Roger
I assume malformed4 and unmappable4 should be final, the XXXCache fields 
too.


-Alan


Re: RFR JDK-8187653: Lock in CoderResult.Cache becomes performance bottleneck

2018-03-01 Thread Roger Riggs

Hi Sherman,

Looks to be a good fix with predictable behavior and performance.

+1, Roger

On 2/23/2018 8:26 PM, Xueming Shen wrote:

Hi,

Please help review the proposed change to remove the potential 
performance

bottleneck in CoderResult caused by the "over" synchronization the cache
mechanism.

issue: https://bugs.openjdk.java.net/browse/JDK-8187653
webrev: http://cr.openjdk.java.net/~sherman/8187653/webrev

Notes:
While the original test case (new String/String.getBytes() with UTF8 
as showed in the
stacktrace) described in the bug report might no longer be 
reproducible in jdk9, as
we have been optimizing lots of String related  char[]/byte[] coding 
path away from
the paths that use CoderResult. But it is still a concern/issue for 
the "general"
CharsetDe/Encoder.de/encode() operation, in which the malformed or 
unmappable

CoderResult object is returned.

As showed in the "[synchronized]" section in bm.scores [1], which is 
from the simple
jmh benchmark test CoderResultBM.java [2], the sores are getting 
significant worse

when the number of concurrent de/encoding threads gets bigger.

It appears the easy fix is to replace the sync mechanism from "method 
synchronized
+ HashMap" to  "ConcurrentHashMap" solves the problem, as showed in 
the same bm
result [1] in [ConcurrentHashMap] section, because most of the 
accesses to the caching
HashMap is read operation. The ConcurrentHahsMap's "almost non-block 
for retrieval

operation" really helps here.

There is another fact that might help us optimize further here. For 
most of our charsets
in JDK repository (with couple exceptions), the length of malformed 
and unmappable
CoderResult that these charsets might trigger actually is never longer 
than 4. So we might
not need to access the ConcurrentHashMap cache at all in normal use 
scenario. I'm putting
a CoderResult[4] on top the hashmap cache in proposed webrev. It does 
not improve the
performance significantly, but when the thread number gets bigger, a 
10%+ improvement
is observed. So I would assume it might be something worth doing, 
given its cost is really

low.

Thanks,
Sherman


[1] http://cr.openjdk.java.net/~sherman/8187653/bm.scores
[2] http://cr.openjdk.java.net/~sherman/8187653/CoderResultBM.java
[3] http://cr.openjdk.java.net/~sherman/8187653/bm.scores.ms932