Re: RFR: 8284638: store skip buffers in InputStream Object [v15]

2022-05-29 Thread Sergey Kuksenko
On Tue, 24 May 2022 20:40:51 GMT, XenoAmess  wrote:

>> @jmehrens what about this then?
>> I think it is safe now (this mechanism is actually borrowed from Reader)
>
> XenoAmess has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   invert if; refine javadoc.

Another direction for cleaning up InputStreams.

Many InputStream subclasses have read(byte[],...) as their primary 
implementation. These classes must somehow implement 1-byte reading (read()) 
via array reading. All imaginable strategies are in use:
- allocate byte[1] per read() invocation
- cache byte[1] preallocated
- cache byte[1], allocation on demand
- share the 1-byte read buffer and the skip buffer, either preallocated or 
allocated on demand.
- etc

So cleaning up all the 1-byte read implementations and making them consistent 
may also be valuable.
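One of the listed strategies ("cache byte[1], allocation on demand") can be sketched as follows. This is an illustrative sketch, not actual JDK code; the class and field names are mine, and checked exceptions are narrowed away to keep it minimal:

```java
import java.io.InputStream;

// Lazily allocated, cached byte[1] used to implement single-byte read()
// on top of the bulk read(byte[],int,int). Illustrative names only.
abstract class BulkFirstInputStream extends InputStream {
    private byte[] oneByteBuf; // allocated on first read(), then reused

    // primary implementation supplied by subclasses (throws clause narrowed)
    @Override
    public abstract int read(byte[] b, int off, int len);

    // 1-byte read implemented via array reading
    @Override
    public int read() {
        byte[] buf = oneByteBuf;
        if (buf == null) {
            buf = oneByteBuf = new byte[1];
        }
        int n = read(buf, 0, 1);
        return (n == -1) ? -1 : (buf[0] & 0xFF);
    }
}
```

The allocation happens at most once per stream instead of once per read() call, at the price of one extra field in every instance - which is exactly the size-vs-allocation trade-off discussed in this thread.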

-

PR: https://git.openjdk.java.net/jdk/pull/5872


Re: RFR: 8284638: store skip buffers in InputStream Object [v15]

2022-05-29 Thread Sergey Kuksenko
On Tue, 24 May 2022 20:40:51 GMT, XenoAmess  wrote:

>> @jmehrens what about this then?
>> I think it is safe now (this mechanism is actually borrowed from Reader)
>
> XenoAmess has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   invert if; refine javadoc.

More practically: this PR has a noticeable negative effect in that it increases 
the size of InputStream objects. Moreover, it increases the size of InputStream 
subclasses which have their own skip() implementation and don't need this 
superclass modification.

Let's look into the InputStream subclasses.
I've checked 51 InputStream subclasses from the class library. 30 of them 
either have their own skip() implementation or inherit a skip() other than 
InputStream's. This PR is definitely harmful to these 30 classes, which can be 
divided into the following categories:

1) Clean non-allocating skip() implementation (22 classes):
 - BufferedInputStream (java.io) 
 - ByteArrayInputStream (java.io)
 - FileInputStream (java.io) 
 - FilterInputStream (java.io)
 - PeekInputStream in ObjectInputStream (java.io)
 - BlockDataInputStream in ObjectInputStream (java.io) 
 - PushbackInputStream (java.io) 
 - FastInputStream in Manifest (java.util.jar)
 - ZipFileInputStream in ZipFile (java.util.zip)
 - CipherInputStream (javax.crypto)
 - MeteredStream (sun.net.www)
 - Anonymous in nullInputStream() in InputStream (java.io)
 - DataInputStream (java.io)
 - Anonymous in readTrailer() in GZIPInputStream (java.util.zip)
 - FtpInputStream in FtpURLConnection (sun.net.www.protocol.ftp)
 - JarURLInputStream in JarURLConnection (sun.net.www.protocol.jar)
 - PlainTextInputStream (sun.net.www.content.text)
 - PushbackInputStream (java.io)
 - TelnetInputStream (sun.net)
 - UnclosableInputStream in FileInputStreamPool (sun.security.provider)
 - MeteredStream (sun.net.www)
 - KeepAliveStream (sun.net.www.http)
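For reference, the "clean non-allocating skip()" pattern in this category reduces to pure position arithmetic, roughly in the style of ByteArrayInputStream. A minimal sketch with illustrative names:

```java
// Non-allocating skip(): adjust a position, never touch a temporary buffer.
class ArrayBackedInput {
    private final byte[] buf;
    private int pos;

    ArrayBackedInput(byte[] buf) { this.buf = buf; }

    long skip(long n) {
        long k = buf.length - pos;       // bytes remaining
        if (n < k) {
            k = (n < 0) ? 0 : n;         // never skip backwards
        }
        pos += (int) k;
        return k;                        // actual number of bytes skipped
    }

    int position() { return pos; }
}
```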

2) Partially clean skip() implementation (1 class):
 - ChannelInputStream (sun.nio.ch)
   Note: clean skip implementation when using SeekableByteChannel (most 
usages); otherwise super.skip() is used, and it could easily be rewritten using 
the existing internal buffer.

3) Unavoidable skip buffers (7 classes):
 - PipeInputStream in Process (java.lang) // per skip() invocation buffer 
byte[2048]
 - CheckedInputStream (java.util.zip) // per skip() invocation buffer 
byte[512]
 - InflaterInputStream (java.util.zip)// per skip() invocation buffer 
byte[512]
 - AppInputStream in SSLSocketImpl (sun.security.ssl) // per skip() invocation 
buffer byte[256]
 - DeflaterInputStream (java.util.zip)   //  cached  skip buffer, byte[512], 
allocated on demand
 - ZipInputStream (java.util.zip)   //   preallocated skip buffer byte[512]
 - HttpInputStream in HttpURLConnection (sun.net.www.protocol.http) //  cached  
skip buffer, byte[8096], allocated on demand

It would be better to clean up all skip implementations in the latter category 
and make them consistent. Right now it's a total mess: every possible strategy 
is implemented somewhere.
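One consistent strategy for this category would be a single skip buffer cached in the stream object and allocated on demand (the strategy DeflaterInputStream is described as using above). A sketch with illustrative names; checked exceptions are narrowed away for brevity:

```java
import java.io.InputStream;

// Skip buffer allocated on first skip() and cached in the stream object;
// later skips reuse it instead of allocating per invocation.
abstract class SkipBufferingInputStream extends InputStream {
    private static final int SKIP_BUFFER_SIZE = 512;
    private byte[] skipBuffer; // allocated on demand

    @Override
    public abstract int read(byte[] b, int off, int len); // throws narrowed

    @Override
    public long skip(long n) {
        if (n <= 0) {
            return 0;
        }
        byte[] buf = skipBuffer;
        if (buf == null) {
            buf = skipBuffer = new byte[SKIP_BUFFER_SIZE];
        }
        long remaining = n;
        while (remaining > 0) {
            int r = read(buf, 0, (int) Math.min(buf.length, remaining));
            if (r < 0) {
                break; // end of stream
            }
            remaining -= r;
        }
        return n - remaining;
    }
}
```

Note this has exactly the footprint cost debated in this thread: one extra reference field per instance, paid even by streams that never call skip().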

Now let's consider the classes which use the InputStream.skip() implementation 
(21 classes):

4) skip() is not implemented, where a trivial implementation is possible (4 
classes):
 - EmptyInputStream (sun.net.www.protocol.http)
 - NullInputStream in ProcessBuilder (java.lang)
 - ObjectInputStream (java.io)
 - extObjectInputStream (javax.crypto)

5) skip() is not implemented, where a not-so-trivial implementation is possible 
(9 classes):
 - Anonymous in getInputStream() in NioSocketImpl (sun.nio.ch)  
   Notes: a temporary direct buffer is used for reading; skip could be 
implemented over the direct buffer, saving the allocation and the copy from the 
direct buffer to byte[]
 - Anonymous in newInputStream() in Channels (java.nio.channels)
   Notes: a temporary direct buffer is used for reading; skip could be 
implemented over the direct buffer, saving the allocation and the copy from the 
direct buffer to byte[]
 - ChunkedInputStream (sun.net.www.http)
   Notes: skip() may be implemented with the existing internal buffer
 - DecInputStream in Base64 (java.util)   
 - ErrorStream in HttpURLConnection (sun.net.www.protocol.http)
   Notes: skip() can easily be implemented with the internal buffer plus 
delegation to the tail stream
 - PipedInputStream (java.io)
   Notes: skip() may be implemented with the existing internal buffer
 - SequenceInputStream (java.io)   
   Notes: skip() may be implemented with delegation
 - SocketInputStream (sun.nio.ch)   
   Notes: a temporary direct buffer is used for reading; skip could be 
implemented over the direct buffer, saving the allocation and the copy from the 
direct buffer to byte[]
 - SocketInputStream in Socket (java.net)
   Notes: skip() may be implemented with delegation
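The "skip() via delegation" notes above amount to a FilterInputStream-style wrapper that forwards skip() to the wrapped stream instead of inheriting InputStream.skip() and its temporary buffer. A minimal sketch with illustrative names:

```java
import java.io.IOException;
import java.io.InputStream;

// Forward skip() to the underlying stream: no skip buffer is allocated here,
// and the wrapped stream's (possibly non-allocating) skip() is reused.
class ForwardingInputStream extends InputStream {
    private final InputStream in;

    ForwardingInputStream(InputStream in) { this.in = in; }

    @Override public int read() throws IOException { return in.read(); }

    @Override public int read(byte[] b, int off, int len) throws IOException {
        return in.read(b, off, len);
    }

    @Override public long skip(long n) throws IOException {
        return in.skip(n); // delegate instead of inheriting InputStream.skip()
    }
}
```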

And the last category:

6) skip() is not implemented, and a skip buffer is unavoidable (8 classes):
  - VerifierStream in JarVerifier (java.util.jar)
  - DigestInputStream (java.security)
  - HttpCaptureInputStream (sun.net.www.http)  
  - InflaterInputStream (java.util.zip)
  - GZIPIn

Re: RFR: 8284638: store skip buffers in InputStream Object [v15]

2022-05-29 Thread Sergey Kuksenko
On Thu, 26 May 2022 15:43:57 GMT, XenoAmess  wrote:

> > Is there any practical scenario where the current code (skip buffer 
> > allocation on each invocation) creates issues?
> 
> @kuksenko Not found any yet :)

In that case, what is the value of this PR? Do we need a code change for the 
sake of code change?

-

PR: https://git.openjdk.java.net/jdk/pull/5872


Re: RFR: 8284638: store skip buffers in InputStream Object [v15]

2022-05-25 Thread Sergey Kuksenko
On Tue, 24 May 2022 20:40:51 GMT, XenoAmess  wrote:

>> @jmehrens what about this then?
>> I think it is safe now (this mechanism is actually borrowed from Reader)
>
> XenoAmess has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   invert if; refine javadoc.

Is there any practical scenario where the current code (skip buffer allocation 
on each invocation) creates issues?
Most leaf InputStreams have their own skip implementation. The same question 
applies to the Reader class.

-

PR: https://git.openjdk.java.net/jdk/pull/5872


Re: RFR(s): 8176894 Provide specialized implementation for default methods putIfAbsent, computeIfAbsent, computeIfPresent, compute, merge in TreeMap

2019-11-22 Thread Sergey Kuksenko

Thanks! Looks good.

My only comments concern the added microbenchmark:

* Keys and values should be preallocated in setup. We want to measure TreeMap, 
not boxing. I'd prefer to see a preallocated array of keys.

* What is the reason for the "shuffled" parameter? A random key distribution is 
preferable.

* Pairs of similar operations look weird. I mean code like this:
   bh.consume(map.put(key, key));
   bh.consume(map.put(key, key));
  The second put has an advantage because all required data is already in the 
CPU cache. If we want to take operations over existing keys into account, it 
would be better to have two copies of each key in the preallocated array. If 
the array of keys is shuffled, put operations for equal keys won't follow one 
another sequentially. I think that will be closer to real behavior.
 
* A general notice about random keys. Typically I use the following scheme:


@Param("0")
long seed;

@Setup()
public void setup() {
   Random rnd = seed==0 ? new Random() : new Random(seed);
   // use rnd for generating data
}

In the default case we always generate random data and cover a nice 
distribution of truly random cases. But if we find some "bad" behavior, or we 
want to fix the sequence of our tree modifications, we can always set the seed 
parameter and make it fixed.
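Putting the preallocation and seed advice together, the setup might generate keys roughly like this (plain Java with the JMH annotations elided; class and method names are illustrative):

```java
import java.util.Random;

// Keys are generated and boxed once, up front, so the measured loop exercises
// TreeMap rather than autoboxing. seed == 0 means fresh randomness; any other
// value pins the sequence so a "bad" case can be reproduced.
class KeySetup {
    static Integer[] makeKeys(int size, long seed) {
        Random rnd = (seed == 0) ? new Random() : new Random(seed);
        Integer[] keys = new Integer[size];
        for (int i = 0; i < size; i++) {
            // draw from [0, size): duplicates naturally exercise existing-key paths
            keys[i] = rnd.nextInt(size);
        }
        return keys;
    }
}
```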

On 10/13/19 2:51 AM, Tagir Valeev wrote:

Hello!

Please review the updated patch (no sponsorship is necessary; review only):
https://cr.openjdk.java.net/~tvaleev/webrev/8176894/r3/
https://bugs.openjdk.java.net/browse/JDK-8176894

The difference from the previous version [1] is minimal: I just
noticed that the specification for computeIfAbsent should say
"mapping" instead of "remapping", so I fixed the spec. No changes in
code/tests.

Also please review the associated CSR:
https://bugs.openjdk.java.net/browse/JDK-8227666
I updated it, according to Joe Darcy's comments.

An additional explanation and background is copied below from my
original e-mail [2] for your convenience:

The patch was originally submitted by Sergey Kuksenko in March 2017 and
reviewed by some people:
http://mail.openjdk.java.net/pipermail/core-libs-dev/2017-March/046825.html
The latest patch submitted by Sergey is here:
http://cr.openjdk.java.net/~skuksenko/corelibs/utils/8176894/webrev.01/

I asked Sergey why it was abandoned. It seems there was no particular reason, 
and Sergey asked if I could pick up this work. I will be happy to finish it.

Here's the list of differences between the latest Sergey patch and my patch:
- A bug is fixed in putIfAbsent implementation. TreeMap.java, lines 576 and
596:  `|| oldValue == null` condition added (the null value should be
overwritten no matter what)
- A testcase is added to cover this problem (InPlaceOpsCollisions.java,
also checks HashMap and LinkedHashMap). Not sure if it's the best place for
such test though. Probably a separate file would be better?
- TreeMap.merge() implementation is added.
- TreeMap is added to the FunctionalCMEs.java test (btw, this file's copyright 
says it's subject to the Classpath exception, which is probably wrong?)
- Simple microbenchmark is added: TreeMapUpdate.java

My quick benchmarking shows that the optimized version is indeed faster for 
most of the tests and no regression is observed for the put() method. Here are 
the raw results for jdk13-ea+26 and jdk13-ea+26+patch if anybody is interested.
http://cr.openjdk.java.net/~tvaleev/jmh/JDK-8176894/

With best regards,
Tagir Valeev.

[1] https://cr.openjdk.java.net/~tvaleev/webrev/8176894/r2/
[2] https://mail.openjdk.java.net/pipermail/core-libs-dev/2019-July/061309.html


Re: JDK 10 RFR of 8176894: Provide specialized implementation for default methods putIfAbsent, computeIfAbsent, computeIfPresent, compute in TreeMap

2017-04-10 Thread Sergey Kuksenko

Hi Claes,

There is no need to backport it to 8 & 9. 8 & 9 use the default 
implementation, which is free from this tree-corruption issue.



On 04/06/2017 03:45 AM, Claes Redestad wrote:

Hi Sergey,

this looks good to me*, but I can't help wonder if the modCount 
checking is something that should be done separately as a bug fix 
(with a higher priority) and be backported to 8 and 9? Alternatively 
re-categorize this fix as such.


Thanks!

/Claes

* I wouldn't mind seeing the cleanup Martin suggested, i.e., write the 
remapValue as:


private V remapValue(Entry<K,V> t, K key, BiFunction<? super K, ? super V, ? extends V> remappingFunction) {

V newValue = remappingFunction.apply(key, t.value);
if (newValue == null) {
deleteEntry(t);
} else {
// replace old mapping
t.value = newValue;
}
return newValue;
 }

On 2017-03-28 21:22, Sergey Kuksenko wrote:

Friendly reminder.

Have you had a chance to review the latest version?


On 03/17/2017 12:45 PM, Sergey Kuksenko wrote:

Hi, All.

I realized (thanks Tagir V.) that if we don't check modCount after 
calling mapping function we may get corrupted tree structure.

new webrev for review:
http://cr.openjdk.java.net/~skuksenko/corelibs/utils/8176894/webrev.01/

On 03/17/2017 11:29 AM, Martin Buchholz wrote:
Thanks!  This looks pretty good.  I have a similar effort in 
progress to improve bulk collection operations, most of which made 
it into jdk9.


---

Please use standard java.util whitespace, as Aleksey suggested.

---

Below (and in compute) I would simply
return newValue;
saving a line of code and making it clearer that we are returning 
the result of the remappingFunction


 676 private V remapValue(Entry<K,V> t, K key, BiFunction<? super K, ? super V, ? extends V> remappingFunction) {

 677 V newValue = remappingFunction.apply(key, t.value);
 678 if (newValue == null) {
 679 deleteEntry(t);
 680 return null;
 681 } else {
 682 // replace old mapping
 683 t.value = newValue;
 684 return newValue;
 685 }
 686 }
---

This code is surely tested but testing could also surely be 
improved.  That's probably not your job though (it may be mine!)


I would probably try hand-injecting some bugs into a copy of the 
code and seeing if our jtreg tests catch it, to increase coverage 
confidence.



On Thu, Mar 16, 2017 at 12:04 PM, Sergey Kuksenko wrote:


Hi All,

Please, review:
https://bugs.openjdk.java.net/browse/JDK-8176894
http://cr.openjdk.java.net/~skuksenko/corelibs/utils/8176894/webrev.00/ 




The issue was created for JDK10 in order not to disturb JDK9
before launch.

-- Best regards,
Sergey Kuksenko










--
Best regards,
Sergey Kuksenko



Re: JDK 10 RFR of 8176894: Provide specialized implementation for default methods putIfAbsent, computeIfAbsent, computeIfPresent, compute in TreeMap

2017-03-28 Thread Sergey Kuksenko

Friendly reminder.

Have you had a chance to review the latest version?


On 03/17/2017 12:45 PM, Sergey Kuksenko wrote:

Hi, All.

I realized (thanks Tagir V.) that if we don't check modCount after 
calling mapping function we may get corrupted tree structure.

new webrev for review:
http://cr.openjdk.java.net/~skuksenko/corelibs/utils/8176894/webrev.01/

On 03/17/2017 11:29 AM, Martin Buchholz wrote:
Thanks!  This looks pretty good.  I have a similar effort in progress 
to improve bulk collection operations, most of which made it into jdk9.


---

Please use standard java.util whitespace, as Aleksey suggested.

---

Below (and in compute) I would simply
return newValue;
saving a line of code and making it clearer that we are returning the 
result of the remappingFunction


 676 private V remapValue(Entry<K,V> t, K key, BiFunction<? super K, ? super V, ? extends V> remappingFunction) {

 677 V newValue = remappingFunction.apply(key, t.value);
 678 if (newValue == null) {
 679 deleteEntry(t);
 680 return null;
 681 } else {
 682 // replace old mapping
 683 t.value = newValue;
 684 return newValue;
 685 }
 686 }
---

This code is surely tested but testing could also surely be 
improved.  That's probably not your job though (it may be mine!)


I would probably try hand-injecting some bugs into a copy of the code 
and seeing if our jtreg tests catch it, to increase coverage confidence.



On Thu, Mar 16, 2017 at 12:04 PM, Sergey Kuksenko wrote:


Hi All,

Please, review:
https://bugs.openjdk.java.net/browse/JDK-8176894
http://cr.openjdk.java.net/~skuksenko/corelibs/utils/8176894/webrev.00/

The issue was created for JDK10 in order not to disturb JDK9
before launch.

-- Best regards,
Sergey Kuksenko






--
Best regards,
Sergey Kuksenko



Re: JDK 10 RFR of 8176894: Provide specialized implementation for default methods putIfAbsent, computeIfAbsent, computeIfPresent, compute in TreeMap

2017-03-17 Thread Sergey Kuksenko

Hi, All.

I realized (thanks Tagir V.) that if we don't check modCount after 
calling mapping function we may get corrupted tree structure.

new webrev for review:
http://cr.openjdk.java.net/~skuksenko/corelibs/utils/8176894/webrev.01/

On 03/17/2017 11:29 AM, Martin Buchholz wrote:
Thanks!  This looks pretty good.  I have a similar effort in progress 
to improve bulk collection operations, most of which made it into jdk9.


---

Please use standard java.util whitespace, as Aleksey suggested.

---

Below (and in compute) I would simply
return newValue;
saving a line of code and making it clearer that we are returning the 
result of the remappingFunction


 676 private V remapValue(Entry<K,V> t, K key, BiFunction<? super K, ? super V, ? extends V> remappingFunction) {

 677 V newValue = remappingFunction.apply(key, t.value);
 678 if (newValue == null) {
 679 deleteEntry(t);
 680 return null;
 681 } else {
 682 // replace old mapping
 683 t.value = newValue;
 684 return newValue;
 685 }
 686 }
---

This code is surely tested but testing could also surely be improved.  
That's probably not your job though (it may be mine!)


I would probably try hand-injecting some bugs into a copy of the code 
and seeing if our jtreg tests catch it, to increase coverage confidence.



On Thu, Mar 16, 2017 at 12:04 PM, Sergey Kuksenko wrote:


Hi All,

Please, review:
https://bugs.openjdk.java.net/browse/JDK-8176894
http://cr.openjdk.java.net/~skuksenko/corelibs/utils/8176894/webrev.00/

The issue was created for JDK10 in order not to disturb JDK9
before launch.

-- 
Best regards,
Sergey Kuksenko




--
Best regards,
Sergey Kuksenko



Re: JDK 10 RFR of 8176894: Provide specialized implementation for default methods putIfAbsent, computeIfAbsent, computeIfPresent, compute in TreeMap

2017-03-17 Thread Sergey Kuksenko

Got it. It definitely should be done.


On 03/17/2017 07:33 AM, Tagir F. Valeev wrote:

Hello!

Great and long-awaited feature, thanks!

I don't see that modCount is checked after calling mappingFunction in
compute/computeIfAbsent/computeIfPresent. Is it possible to break the
Map if mappingFunction changes the Map content? See similar problems
in HashMap:
https://bugs.openjdk.java.net/browse/JDK-8071667
https://bugs.openjdk.java.net/browse/JDK-8172951

With best regards,
Tagir Valeev.


SK> Hi All,

SK> Please, review:
SK> https://bugs.openjdk.java.net/browse/JDK-8176894
SK> http://cr.openjdk.java.net/~skuksenko/corelibs/utils/8176894/webrev.00/

SK> The issue was created for JDK10 in order not to disturb JDK9 before
SK> launch.



--
Best regards,
Sergey Kuksenko



Re: JDK 10 RFR of 8176894: Provide specialized implementation for default methods putIfAbsent, computeIfAbsent, computeIfPresent, compute in TreeMap

2017-03-17 Thread Sergey Kuksenko


But the existing (default) implementation doesn't check modCount either. I 
just preserved the existing functionality.


On 03/17/2017 07:33 AM, Tagir F. Valeev wrote:

Hello!

Great and long-awaited feature, thanks!

I don't see that modCount is checked after calling mappingFunction in
compute/computeIfAbsent/computeIfPresent. Is it possible to break the
Map if mappingFunction changes the Map content? See similar problems
in HashMap:
https://bugs.openjdk.java.net/browse/JDK-8071667
https://bugs.openjdk.java.net/browse/JDK-8172951

With best regards,
Tagir Valeev.


SK> Hi All,

SK> Please, review:
SK> https://bugs.openjdk.java.net/browse/JDK-8176894
SK> http://cr.openjdk.java.net/~skuksenko/corelibs/utils/8176894/webrev.00/

SK> The issue was created for JDK10 in order not to disturb JDK9 before
SK> launch.



--
Best regards,
Sergey Kuksenko



JDK 10 RFR of 8176894: Provide specialized implementation for default methods putIfAbsent, computeIfAbsent, computeIfPresent, compute in TreeMap

2017-03-16 Thread Sergey Kuksenko

Hi All,

Please, review:
https://bugs.openjdk.java.net/browse/JDK-8176894
http://cr.openjdk.java.net/~skuksenko/corelibs/utils/8176894/webrev.00/

The issue was created for JDK10 in order not to disturb JDK9 before 
launch.


--
Best regards,
Sergey Kuksenko



Re: JDK 9 RFR of 6375303: Review use of caching in BigDecimal

2014-03-11 Thread Sergey Kuksenko

Brian,
Could you share your benchmarks?

On 03/04/2014 04:14 AM, Brian Burkhalter wrote:

Hello all,

This review request concerns this issue and proposed patch:

issue   https://bugs.openjdk.java.net/browse/JDK-6375303
patch   http://cr.openjdk.java.net/~bpb/6375303/webrev.00/

A review of the current code did not reveal anything clearly amiss in the 
nature of the problems which instigated the original filing of this issue. 
Examination of the code did, however, suggest some code style and aesthetic 
improvements, as well as a change to how one lazy initialization was 
implemented (here we go again ...).

Summary:

- clean up imports section
- add AtomicReferenceFieldUpdater-type constant for stringCache initialization
- change some existing static final constants to conform to Java coding style
- change name of some local variables so as not to hide instance variables of 
the same name
- add missing @Override annotations
- modify lazy initialization of stringCache in toString() to use 
AtomicReferenceFieldUpdater constant
- various other minor cleanup

Note that the performance of initialization of stringCache was measured for the 
extant double-checked locking approach, an AtomicReference, and the 
AtomicReferenceFieldUpdater and no significant degradation was observed for either of 
the latter versus the baseline for 1 and 4 threads. The updater has the advantage of 
adding only one object per class as opposed to one per instance.

Thanks,

Brian



--
Best regards,
Sergey Kuksenko


Re: JDK 9 RFR of 6375303: Review use of caching in BigDecimal

2014-03-11 Thread Sergey Kuksenko

Brian, Mike.

Could you explain what the problem is with the old caching?
String is immutable, BigDecimal is immutable. So we have a data race at 
that place, but it is a safe data race. What is the problem if we create 
several identical strings?

You are making stringCache volatile -> performance penalty on ARM & PPC.
Setting cache via CAS -> performance penalty everywhere.

If you insist on using AtomicFieldUpdater, the better way is to use 
lazySet, not CAS.
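A sketch of that suggestion on a toy stand-in for BigDecimal (illustrative names; note that AtomicReferenceFieldUpdater.newUpdater still requires the field to be declared volatile):

```java
import java.util.concurrent.atomic.AtomicReferenceFieldUpdater;

// Publish the computed string with lazySet instead of CAS: the race stays
// benign (several identical strings may be computed and any one may win),
// but the publishing store avoids the full CAS cost.
class CachedToString {
    private volatile String stringCache; // updater requires a volatile field

    private static final AtomicReferenceFieldUpdater<CachedToString, String> CACHE =
            AtomicReferenceFieldUpdater.newUpdater(
                    CachedToString.class, String.class, "stringCache");

    private final int value;

    CachedToString(int value) { this.value = value; }

    @Override
    public String toString() {
        String s = stringCache;
        if (s == null) {
            s = Integer.toString(value); // stands in for layoutChars(true)
            CACHE.lazySet(this, s);
        }
        return s;
    }
}
```

Unlike the CAS version, this does not guarantee a single String instance is ever returned; it only bounds the waste to a few identical, short-lived strings, which is the trade-off argued for above.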



On 03/04/2014 09:21 PM, Mike Duigou wrote:


On Mar 4 2014, at 07:13 , Peter Levart  wrote:


On 03/04/2014 01:14 AM, Brian Burkhalter wrote:

- add AtomicReferenceFieldUpdater-type constant for stringCache initialization


Hi Brian,

By using volatile read and CAS, there's still a chance that multiple concurrent 
threads will be invoking the layoutChars(true) concurrently, but you guarantee 
that only a single instance of String will ever be returned as a result of the 
toString() method. Is that what you were pursuing?


Yes. (I provided the AtomicReferenceFieldUpdater code). The previous 
implementation had a benign data race that could result in a later layoutChars 
result replacing an earlier result and multiple string instances being 
returned. The new implementation, at small cost, prevents multiple different 
instances from being returned.

Mike



--
Best regards,
Sergey Kuksenko


Re: RFR (M): 8024635: Caching MethodType's descriptor string improves lambda linkage performance

2013-09-27 Thread Sergey Kuksenko

Updated version at
http://cr.openjdk.java.net/~skuksenko/jsr335/8024635/webrev.02/
according to the comments:
- removed the "null check" comment near the Objects.requireNonNull calls
- distorted (changed the parameter order of) the key-purposed MethodType constructor
- moved the count-slot logic from checkPType to checkPTypes.



On 09/26/2013 11:09 PM, John Rose wrote:
> (Quick note on format:  I have changed the subject line to include the 
> conventional bug number and size estimate.  The bug number makes it easier to 
> track in mailers.)
> 
> On Sep 26, 2013, at 9:22 AM, Sergey Kuksenko  
> wrote:
> 
>> Hi All,
>>
>> I updated fix.
>> You may find it here
>> http://cr.openjdk.java.net/~skuksenko/jsr335/8024635/webrev.01/
>>
See my comments and the new webrev description below
>>
>> On 09/18/2013 12:59 AM, John Rose wrote:
>>>>> We can tweak that later if necessary.  There are probably only a small 
>>>>> number of such strings, so maybe a small leaky map would do the trick as 
>>>>> well. 
>>>> In case of small amount of such strings we will get a huge overhead from
>>>> any kind of map.
>>> I would expect O(10^1.5) memory references and branches.  Depends on what 
>>> you mean by "huge".
>> Sure. I think the question of using an external map or a field may be decided
>> later when/if it is needed.
>> Just some statistics I've collected on nashorn/octane benchmarks
>> (collected only for the single (first) benchmark iteration):
>> mandreel: 514 unique strings, toMethodDescriptorString called 69047 times
>> box2d: 560 unique strings, 34776 toMethodDescriptorString invocations.
> 
> Good data; thanks.
> 
>>
>>> Would there be any benefit from recording the original string from the 
>>> constant pool?  The JVM path for this is 
>>> SystemDictionary::find_method_handle_type, which makes an upcall to 
>>> MethodHandleNatives.findMethodHandleType.  Currently only the ptypes and 
>>> rtype are passed, but the JVM could also convert the internal Symbol to a 
>>> java.lang.String and pass that also.  In that case, MT's created by JVM 
>>> upcalls would be pre-equipped with descriptor strings.
>>>
>>> This is probably worth an experiment, although it requires some JVM work.
>>
>> I am definitely sure that we shouldn't do that.
>> Right now caching descriptor strings is an internal decision of MethodType.
>> Otherwise it would make the interfaces more complex.
> 
> Yes, I agree.  Also, it would only benefit the 514 calls which introduce 
> unique strings, whereas your change helps the 69047 calls for descriptor 
> strings.
> 
>>> I hope you get my overall point:  Hand optimizations have a continuing 
>>> cost, related to obscuring the logical structure of the code.  The 
>>> transforming engineer might make a mistake, and later maintaining engineers 
>>> might also make mistakes.
>>>
>>> https://blogs.oracle.com/jrose/entry/three_software_proverbs
>>
>> And it doesn't mean that we should be afraid of any kind of optimization.
>> In lucky cases someone (or something) will optimize it for us. But
>> sometimes the JIT (even a smart JIT) is not enough.
> 
> Sometimes.  When that happens we reluctantly hand-optimize our code.
> 
>> Let's get back to the patch.
>> I decided not to change the original checkPType/checkRType code, except for
>> the explicit Objects.requireNonNull. The problem here is that we are creating
>> new MethodType objects which already exist: we have to create an MT as
>> a key for searching in the internTable. The ratio between already-existing
>> and new MethodTypes is quite huge.
>> Here are some statistics I've collected on nashorn/octane benchmarks
>> (collected only for the single (first) benchmark iteration):
>>
>> mandreel:
>>  - 1739 unique MethodTypes,
>>  - 878414 MethodType.makeImpl requests;
>> box2d:
>>  - 1861 unique MethodTypes,
>>  - 527486 MethodType.makeImpl requests;
>> gameboy:
>>  - 2062 unique MethodTypes,
>>  - 311378 MethodType.makeImpl requests;
>>
>> So:
>> 1. The MethodType constructor does checkPTypes/checkRType, which is frequently
>> (99%) useless: we already have the same (checked) MethodType in the
>> internTable.
>> 2. Less than 1% of created MethodTypes will be stored into the internTable,
>> but reusing those MethodType objects prevents them from scalar replacement.
>> (I've heard that Graal may do flow-sensitive scalar replacement, but C2
>> can't.)
>>
>> What I did. I separa

Re: RFR: Caching MethodType's descriptor string improves lambda linkage performance

2013-09-26 Thread Sergey Kuksenko
Hi All,

I updated fix.
You may find it here
http://cr.openjdk.java.net/~skuksenko/jsr335/8024635/webrev.01/

See my comments and the new webrev description below

On 09/18/2013 12:59 AM, John Rose wrote:
>>> We can tweak that later if necessary.  There are probably only a small 
>>> number of such strings, so maybe a small leaky map would do the trick as 
>>> well. 
>> In case of small amount of such strings we will get a huge overhead from
>> any kind of map.
> I would expect O(10^1.5) memory references and branches.  Depends on what you 
> mean by "huge".
Sure. I think the question of using an external map or a field may be decided
later when/if it is needed.
Just some statistics I've collected on nashorn/octane benchmarks
(collected only for the single (first) benchmark iteration):
mandreel: 514 unique strings, toMethodDescriptorString called 69047 times
box2d: 560 unique strings, 34776 toMethodDescriptorString invocations.

> Would there be any benefit from recording the original string from the 
> constant pool?  The JVM path for this is 
> SystemDictionary::find_method_handle_type, which makes an upcall to 
> MethodHandleNatives.findMethodHandleType.  Currently only the ptypes and 
> rtype are passed, but the JVM could also convert the internal Symbol to a 
> java.lang.String and pass that also.  In that case, MT's created by JVM 
> upcalls would be pre-equipped with descriptor strings.
> 
> This is probably worth an experiment, although it requires some JVM work.

I am definitely sure that we shouldn't do that.
Right now caching descriptor strings is an internal decision of MethodType.
Otherwise it would make the interfaces more complex.

> I hope you get my overall point:  Hand optimizations have a continuing cost, 
> related to obscuring the logical structure of the code.  The transforming 
> engineer might make a mistake, and later maintaining engineers might also 
> make mistakes.
> 
> https://blogs.oracle.com/jrose/entry/three_software_proverbs

And it doesn't mean that we should be afraid of any kind of optimization.
In lucky cases someone (or something) will optimize it for us. But
sometimes the JIT (even a smart JIT) is not enough.

Let's get back to the patch.
I decided not to change the original checkPType/checkRType code, except for the
explicit Objects.requireNonNull. The problem here is that we are creating
new MethodType objects which already exist: we have to create an MT as
a key for searching in the internTable. The ratio between already-existing
and new MethodTypes is quite huge.
Here are some statistics I've collected on nashorn/octane benchmarks
(collected only for the single (first) benchmark iteration):

mandreel:
  - 1739 unique MethodTypes,
  - 878414 MethodType.makeImpl requests;
box2d:
  - 1861 unique MethodTypes,
  - 527486 MethodType.makeImpl requests;
gameboy:
  - 2062 unique MethodTypes,
  - 311378 MethodType.makeImpl requests;

So:
1. The MethodType constructor does checkPTypes/checkRType, which is frequently
(99%) useless: we already have the same (checked) MethodType in the
internTable.
2. Less than 1% of created MethodTypes will be stored into the internTable,
but reusing those MethodType objects prevents them from scalar replacement.
(I've heard that Graal may do flow-sensitive scalar replacement, but C2
can't.)

What I did: I separated the constructors:
- for long-lived MethodTypes, with all checks;
- for short-lived MethodTypes used only as keys for searching in the
internTable, with no checks performed.
If we don't find the MethodType in the table we create a new one (with
checks), which is required in less than 1% of cases. And we remove at
least one obstacle to scalar replacement. Unfortunately, scalar
replacement for the expression "internTable.get(new MethodType(rtype,
ptypes))" was not performed; the reason can be evaluated, and I hope it
will be easier to achieve scalarization in this case.
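To illustrate, the two-constructor scheme can be sketched roughly like
this. The class name SimpleType, makeChecked, and the table layout are
invented for the illustration and deliberately simplified - this is not
the actual java.lang.invoke code:

```java
import java.util.Arrays;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

final class SimpleType {
    private static final ConcurrentHashMap<SimpleType, SimpleType> INTERN_TABLE =
            new ConcurrentHashMap<>();

    final Class<?> rtype;
    final Class<?>[] ptypes;

    // Cheap constructor: builds a short-lived lookup key, no validation.
    private SimpleType(Class<?> rtype, Class<?>[] ptypes) {
        this.rtype = rtype;
        this.ptypes = ptypes;
    }

    // Full construction path: validates before the instance is interned.
    private static SimpleType makeChecked(Class<?> rtype, Class<?>[] ptypes) {
        Objects.requireNonNull(rtype);
        for (Class<?> p : ptypes) {
            Objects.requireNonNull(p);
            if (p == void.class)
                throw new IllegalArgumentException("parameter type cannot be void");
        }
        return new SimpleType(rtype, ptypes.clone());
    }

    static SimpleType make(Class<?> rtype, Class<?>... ptypes) {
        // ~99% case: the unchecked key finds an existing, already-checked instance.
        SimpleType key = new SimpleType(rtype, ptypes);
        SimpleType found = INTERN_TABLE.get(key);
        if (found != null)
            return found;
        // <1% case: validate and intern a fresh instance.
        SimpleType checked = makeChecked(rtype, ptypes);
        SimpleType prev = INTERN_TABLE.putIfAbsent(checked, checked);
        return prev != null ? prev : checked;
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof SimpleType)) return false;
        SimpleType t = (SimpleType) o;
        return rtype == t.rtype && Arrays.equals(ptypes, t.ptypes);
    }

    @Override public int hashCode() {
        return 31 * rtype.hashCode() + Arrays.hashCode(ptypes);
    }
}
```

The point of the split is that the unchecked key never escapes make(), so
it is at least a candidate for scalar replacement, while every interned
instance is guaranteed to have passed the checks exactly once.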




-- 
Best regards,
Sergey Kuksenko


Re: RFR: Caching MethodType's descriptor string improves lambda linkage performance

2013-09-17 Thread Sergey Kuksenko
On 09/17/2013 04:41 AM, John Rose wrote:
> The algorithmic change (string cache) is acceptable, although it will tend to 
> increase footprint somewhat.  
I don't think the footprint increase would be visible. MethodTypes
are interned by the implementation, so I don't expect this "overhead"
to be significant.

> We can tweak that later if necessary.  There are probably only a small number 
> of such strings, so maybe a small leaky map would do the trick as well. 
In case of a small amount of such strings we will get a huge overhead from
any kind of map. From a performance point of view a map may be acceptable
when we have a lot of such strings. The key fact here is that we have a
small number of MethodTypes, but toMethodDescriptorString() is called very
frequently.

> 
> But, please don't hand-inline methods (checkPtype, checkSlotCount).  That is 
> usually the wrong answer.  
> If there is a JIT inlining decision that isn't going right, we need to fix 
> that, not distort the Java code.
Unfortunately this is not a case where we may rely on the JIT. Certainly,
there are no perf problems here (except caching) after JIT compilation (at
least C2; I didn't check C1). But we have huge performance problems here
while interpreting. I know (I think) a really general solution for such
perf problems, and it would be useful in a large number of other places:
bytecode-level inlining before interpretation.

Let's get back to the patch.

checkPtype is a method which performs two _different_ actions. The first
is checking that the type is not null and not void. The second is
calculating the extra parameter slot size (long/double types). So we came
to the idea that we need two different methods, from both performance and
OOP purity points of view. checkPtype has exactly two usages, and the
second usage doesn't need to calculate the parameter slot size. So one
action is used in only a single place, and we get more readable code if we
inline it. The second action is used twice, but it is a very simple action
(NPE and void check), and I think here we may do a manual inline because
it gives us visible benefits and doesn't create any problems for later
JITing or code reading.
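The separation described above can be sketched like this (a hypothetical
illustration of the two actions; the class name ParamChecks and the exact
method shapes are invented, not the JDK source):

```java
final class ParamChecks {
    // Action 1: validation only - cheap, and inlinable even under the interpreter.
    static void checkPtype(Class<?> ptype) {
        if (ptype == null)
            throw new NullPointerException();
        if (ptype == void.class)
            throw new IllegalArgumentException("parameter type cannot be void");
    }

    // Action 2: slot accounting - only the call site that needs it pays for it.
    static int parameterSlotCount(Class<?>[] ptypes) {
        int slots = ptypes.length;
        for (Class<?> p : ptypes) {
            checkPtype(p);
            if (p == long.class || p == double.class)
                slots++;   // long/double occupy two JVM stack slots
        }
        return slots;
    }
}
```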

> Also, I think you deleted a call to checkSlotCount that should still be there 
> to detect illegal inputs.  One of the costs of hand-inlining is bitrot like 
> that.
Oops.
You are right. I made a mistake. I eliminated one checkSlotCount usage by
error, but I didn't inline anything in that place. So it is not a cost of
hand-inlining, it is a cost of my inadvertence: I eliminated a whole code
line by misclick and didn't notice.


Tomorrow I'll do another webrev - I'll try to address all the comments,
and we may continue the discussion.


> 
> — John
> 
> On Sep 11, 2013, at 12:03 PM, Sergey Kuksenko  
> wrote:
> 
>> On 09/11/2013 09:01 PM, Aleksey Shipilev wrote:
>>> On 09/11/2013 08:23 PM, Sergey Kuksenko wrote:
>>>> http://cr.openjdk.java.net/~skuksenko/jsr335/8024635/webrev.00/
>>>
>>> As much as I hate to see the hand code tweaking instead of relying on
>>> compiler to do it's job, I understand this is about interpreter. Seems
>>> good then.
>>>
>>> * Formatting: "if(...)" should be "if (...")
>> Will do
>>>
>>> * Formatting: "//NPE" should be "// null check"
>> I just preserve exiting comments.
>> Decided don't modify original code which is not important to required
>> functionality.
>>
>>>
>>> * Formatting: "desc =  " should be "desc = "
>>>
>>> * Formatting: this one should not use braces (for consistency with other
>>> usages)?
>>> 364 if(nptype == null) { //NPE
>>> 365 throw new NullPointerException();
>>> 366 }
>> Let ask code owner.
>>>
>>> * Explicit null-checks: implicits via .getClass and .equals always
>>> bothered me in j.l.i.*; the idea was seemingly to piggyback on the
>>> compiler intrinsics. 
>> anyway it is doesn't matter after C1/C2
>>
>>> Any idea what's the cost of using
>>> Objects.requireNonNull there?
>> If we are running under the interpreter Objects.requireNonNull costs
>> enough to eliminate benefits from this fine tuning.
>>
>> -- 
>> Best regards,
>> Sergey Kuksenko
> 


-- 
Best regards,
Sergey Kuksenko


Re: RFR: Direct LambdaMetaFactory invocation from CallSite significantly improves lambda linkage performance

2013-09-16 Thread Sergey Kuksenko
Hi John, Brian,
I evaluated JDK-8024630 (my patch) vs JDK-8024761 (your patch).

I did it for lambda linkage vs anonymous class loading. Because the
behaviour of non-capturing lambdas differs significantly from capturing
lambdas, I had to measure linkage twice (non-capturing vs capturing). The
other dimension is tiered vs non-tiered compilation. I checked loading
different amounts of lambdas (classes) - 1K, 4K, 8K, 16K, 24K, 32K, 64K
(K=1024).


Non-capturing lambdas:
- JDK-8024630 is slightly faster (2-3%) than JDK-8024761 on smaller
amounts of lambdas (1K, 4K, 8K). But starting from 16K, JDK-8024761 is
faster (also 2-3%). The same holds for tiered/non-tiered.

Capturing lambdas:
- Non-tiered compilation shows the same behaviour and the same
percentages as non-capturing lambdas. In the case of tiered compilation
the initial difference starts at 8% (JDK-8024630 is faster), becomes
lower, and both RFEs have the same performance from 24K lambdas on.

So, looking at my results, I'll vote for JDK-8024761 instead of
JDK-8024630, because your solution is general and the performance
difference is not significant.

On 09/11/2013 10:18 PM, John Rose wrote:
> Yes, this is a good tactic, but as written it is too much of a point hack for 
> lambdas (important though they are).
> 
> I have an alternate solution I would like you to measure.  It provides a fast 
> path for other BSM lookups like Nashorn's, so (if it works well for lambda) 
> it is preferable.
> 
> I will attach a patch to the bug report.
> 
> — John
> 
> On Sep 11, 2013, at 9:23 AM, Sergey Kuksenko  
> wrote:
> 
>> Please review the webrev at:
>>
>> http://cr.openjdk.java.net/~skuksenko/jsr335/8024630/webrev.00/
>>
>> LambdaMetafactory is is a quite frequent bootstrap method for
>> invokedynamic in JDK8.
>> We can do direct method (LambdaMetafactory) invocation as fastpath when
>> proved that bootstrap MethodHandle points to LambdaMetafactory.
>> The modification gives +10% - +35% to lambda linkage performance
>> (depends on amount of lambdas).
>>
>> -- 
>> Best regards,
>> Sergey Kuksenko
> 


-- 
Best regards,
Sergey Kuksenko


Re: RFR: Caching MethodType's descriptor string improves lambda linkage performance

2013-09-11 Thread Sergey Kuksenko
On 09/11/2013 09:01 PM, Aleksey Shipilev wrote:
> On 09/11/2013 08:23 PM, Sergey Kuksenko wrote:
>> http://cr.openjdk.java.net/~skuksenko/jsr335/8024635/webrev.00/
> 
> As much as I hate to see the hand code tweaking instead of relying on
> compiler to do it's job, I understand this is about interpreter. Seems
> good then.
> 
> * Formatting: "if(...)" should be "if (...")
Will do
> 
> * Formatting: "//NPE" should be "// null check"
I just preserved the existing comments.
I decided not to modify original code that is not relevant to the
required functionality.

> 
> * Formatting: "desc =  " should be "desc = "
> 
> * Formatting: this one should not use braces (for consistency with other
> usages)?
>  364 if(nptype == null) { //NPE
>  365 throw new NullPointerException();
>  366 }
Let's ask the code owner.
> 
> * Explicit null-checks: implicits via .getClass and .equals always
> bothered me in j.l.i.*; the idea was seemingly to piggyback on the
> compiler intrinsics. 
Anyway, it doesn't matter after C1/C2.

> Any idea what's the cost of using
> Objects.requireNonNull there?
If we are running under the interpreter, Objects.requireNonNull costs
enough to eliminate the benefits of this fine-tuning.

-- 
Best regards,
Sergey Kuksenko


Re: RFR: Direct LambdaMetaFactory invocation from CallSite significantly improves lambda linkage performance

2013-09-11 Thread Sergey Kuksenko
> On 09/11/2013 08:23 PM, Sergey Kuksenko wrote:
>> http://cr.openjdk.java.net/~skuksenko/jsr335/8024630/webrev.00/
> 
> * webrev metadata: "invokation" -> "invocation"

A misprint. Should I fix it right now, or may it be done later?

> * Why $caller is MethodHandles.Lookup now? Is it known to have that type?
Yes.
MethodHandles.Lookup is return type of IMPL_LOOKUP.in()


> * I would put the entire LMF.metafactory call inside the new method.
> That way, instanceof checks are right before the (otherwise potentially
> unsafe) casts.

In that case the single method would have to return two values: a boolean
success/failure flag and a CallSite.

>> The modification gives +10% - +35% to lambda linkage performance
>> (depends on amount of lambdas).
> 
> Any JSR292/Nashorn benchmarks to prove it does not degrade the "usual"
> bootstrap scenarios? I understand most of the performance is dominated
> by already-linked sites, but still.

I checked it on Octane - no perf difference. But I need somebody to
recheck my measurements.

-- 
Best regards,
Sergey Kuksenko


RFR: Caching MethodType's descriptor string improves lambda linkage performance

2013-09-11 Thread Sergey Kuksenko
Please review the webrev at:

http://cr.openjdk.java.net/~skuksenko/jsr335/8024635/webrev.00/

MethodType.toMethodDescriptorString() is frequently invoked when
generating lambda classes from InnerClassLambdaMetafactory.
Caching the resulting string in a field of MethodType gives +5% - +10%
to lambda linkage performance.

Minor performance improvement: the private method "checkPtype" was inlined
and eliminated. "checkRtype" and "checkPtypes" were refactored for
better performance in the HotSpot interpreter (important for lambda
linkage). Overall result: +1%.
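The caching idea can be sketched as a lazily filled field. This is a
hedged model, not the actual MethodType code; DescriptorCache and
descriptorOf are invented names, and a benign race is acceptable because
the computed string is immutable and deterministic:

```java
final class DescriptorCache {
    private final Class<?> rtype;
    private final Class<?>[] ptypes;
    private String descriptor;   // lazily filled; safe to recompute on a race

    DescriptorCache(Class<?> rtype, Class<?>... ptypes) {
        this.rtype = rtype;
        this.ptypes = ptypes.clone();
    }

    String toMethodDescriptorString() {
        String desc = descriptor;
        if (desc == null) {
            // Slow path runs at most once per thread that races here.
            StringBuilder sb = new StringBuilder("(");
            for (Class<?> p : ptypes)
                sb.append(descriptorOf(p));
            sb.append(')').append(descriptorOf(rtype));
            descriptor = desc = sb.toString();
        }
        return desc;
    }

    // Only the descriptor cases needed for this illustration.
    private static String descriptorOf(Class<?> c) {
        if (c == int.class) return "I";
        if (c == void.class) return "V";
        if (c.isArray()) return c.getName().replace('.', '/');
        return "L" + c.getName().replace('.', '/') + ";";
    }
}
```

Repeated calls return the exact same String instance, which is what turns
the hot InnerClassLambdaMetafactory path into a field load.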


-- 
Best regards,
Sergey Kuksenko


RFR: Direct LambdaMetaFactory invocation from CallSite significantly improves lambda linkage performance

2013-09-11 Thread Sergey Kuksenko
Please review the webrev at:

http://cr.openjdk.java.net/~skuksenko/jsr335/8024630/webrev.00/

LambdaMetafactory is a quite frequent bootstrap method for
invokedynamic in JDK 8.
We can do a direct method (LambdaMetafactory) invocation as a fastpath
when it is proved that the bootstrap MethodHandle points to
LambdaMetafactory.
The modification gives +10% - +35% to lambda linkage performance
(depends on the amount of lambdas).
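The fastpath idea can be sketched like this. It is a hedged illustration,
not the webrev's code: BootstrapFastpath and link are invented names, the
real linkage plumbing lives inside java.lang.invoke, and the detection
here uses the public MethodHandles.Lookup.revealDirect API rather than
whatever internal check the patch uses:

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandleInfo;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class BootstrapFastpath {

    // Crack the handle; any failure simply means "use the generic slowpath".
    static boolean isLambdaMetafactory(MethodHandle bsm) {
        try {
            MethodHandleInfo info = MethodHandles.lookup().revealDirect(bsm);
            return info.getDeclaringClass() == LambdaMetafactory.class
                    && "metafactory".equals(info.getName());
        } catch (RuntimeException e) {
            return false;   // not a direct handle, or not inspectable
        }
    }

    static CallSite link(MethodHandle bsm, MethodHandles.Lookup caller,
                         String name, MethodType invokedType, Object[] args)
            throws Throwable {
        if (isLambdaMetafactory(bsm)
                && args.length == 3
                && args[0] instanceof MethodType
                && args[1] instanceof MethodHandle
                && args[2] instanceof MethodType) {
            // Fastpath: direct, statically typed call, no generic invocation.
            return LambdaMetafactory.metafactory(caller, name, invokedType,
                    (MethodType) args[0], (MethodHandle) args[1],
                    (MethodType) args[2]);
        }
        // Slowpath: generic bootstrap method invocation.
        Object[] allArgs = new Object[3 + args.length];
        allArgs[0] = caller; allArgs[1] = name; allArgs[2] = invokedType;
        System.arraycopy(args, 0, allArgs, 3, args.length);
        return (CallSite) bsm.invokeWithArguments(allArgs);
    }
}
```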

-- 
Best regards,
Sergey Kuksenko


Re: Measuring performance changes from applying 4837946 patch

2013-06-04 Thread Sergey Kuksenko

Brian,
could you show your benchmark?

My quick experiments show that the current Karatsuba threshold is quite
reasonable.


On 06/04/2013 12:36 PM, Aleksey Shipilev wrote:

On 06/04/2013 04:27 AM, Brian Burkhalter wrote:

A) Is there some particular approach that should be used in testing
these algorithms for relative performance?


The number one approach is peer review. Is there the JMH project with
microbenchmarks somewhere? Sergey (cc'ed) knows a lot about
BigInteger/BigDecimal performance, and particular quirks you might be
seeing.


B) Could there be something platform-dependent happening?


It does not seem likely for pure Java BI/BD benchmarks. Thread
scheduling might affect the results; but this seems to be non-essential
for single-threaded benchmarks.


C) As currently implemented, the relative performance between
algorithms should be unaffected by thread count, correct?


Nope, that is generally incorrect:
   a) allocation patterns are different; you can hit the allocation wall
with multiple threads (which is realistic for highly-loaded systems);
the algorithm allocating more temporal objects may be faster in single
thread, but slower in multiple threads
   b) the cache contention effects might be more pronunciated with more
threads hitting the same caches
   c) CMT-enabled machines will have significantly different performance
with thread count exceeding the number of cores;
   d) thread schedulers are known to cause weird decisions for the thread
layout when there is the liberty of distributing N threads over M
hardware threads (N < M)


-Aleksey.




--
Best regards,
Sergey Kuksenko


Re: RFR: 8015522 - CharSequence.codePoints can be faster

2013-06-04 Thread Sergey Kuksenko

Hi
The JavaDoc for the method says: "If the sequence is mutated while the
stream is being read, the result is undefined."


What about exceptions? If the sequence is mutated and its length becomes
smaller, you'll get an IndexOutOfBoundsException in the forEachRemaining
method.


On 06/04/2013 12:33 AM, Henry Jen wrote:

Hi,

Please review
http://cr.openjdk.java.net/~henryjen/tl/8015522.0/webrev/

This webrev is based on previous suggestion from Martin Buchholz while
keep the lazy-binding nature without seize the length at construction time.

The benchmark shows ~50% performance increase, mainly from reduce of
operation, for example, inline block.accept instead of store the
character to a local var also helps.

Cheers,
Henry




--
Best regards,
Sergey Kuksenko


Re: RFR: 4837946 - Faster multiplication and exponentiation of large integers

2013-05-17 Thread Sergey Kuksenko

Hi Brian,

In the parallel thread I see a new implementation of CRC32 & Adler32. I
don't mean new HW intrinsics - I mean a parallel operation using
ForkJoinPool. I think BigInteger operations may be a better candidate
for this kind of parallelization. For example, Karatsuba multiplication
can be fork-joined naturally.
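A rough sketch of how Karatsuba forks naturally, expressed as a
RecursiveTask. This is illustrative only: the real BigInteger internals
work on int[] magnitudes, and the threshold below is invented, not the
library's actual one:

```java
import java.math.BigInteger;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

final class KaratsubaTask extends RecursiveTask<BigInteger> {
    private static final int SEQUENTIAL_THRESHOLD_BITS = 4096;  // invented cutoff
    private final BigInteger x, y;

    KaratsubaTask(BigInteger x, BigInteger y) { this.x = x; this.y = y; }

    @Override protected BigInteger compute() {
        int n = Math.max(x.bitLength(), y.bitLength());
        if (n <= SEQUENTIAL_THRESHOLD_BITS)
            return x.multiply(y);              // small operands: plain multiply

        // Split each operand: x = xh * 2^half + xl, with 0 <= xl < 2^half.
        int half = n / 2;
        BigInteger xh = x.shiftRight(half), xl = x.subtract(xh.shiftLeft(half));
        BigInteger yh = y.shiftRight(half), yl = y.subtract(yh.shiftLeft(half));

        // The three Karatsuba sub-products fork naturally.
        KaratsubaTask p1 = new KaratsubaTask(xh, yh);
        KaratsubaTask p2 = new KaratsubaTask(xl, yl);
        p1.fork(); p2.fork();
        BigInteger p3 = new KaratsubaTask(xh.add(xl), yh.add(yl)).compute();
        BigInteger hi = p1.join(), lo = p2.join();

        // x*y = hi*2^(2*half) + (p3 - hi - lo)*2^half + lo
        return hi.shiftLeft(2 * half)
                .add(p3.subtract(hi).subtract(lo).shiftLeft(half))
                .add(lo);
    }

    static BigInteger multiply(BigInteger x, BigInteger y) {
        return ForkJoinPool.commonPool().invoke(new KaratsubaTask(x, y));
    }
}
```

The recombination step is cheap relative to the three sub-products, which
is exactly the shape fork/join likes: fork two subtasks, compute the third
in the current thread, then join.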



--
Best regards,
Sergey Kuksenko