Re: RFR: 8284638: store skip buffers in InputStream Object [v15]
On Tue, 24 May 2022 20:40:51 GMT, XenoAmess wrote:

>> @jmehrens what about this then?
>> I think it is safe now (this mechanism is actually borrowed from Reader).
>
> XenoAmess has updated the pull request incrementally with one additional commit since the last revision:
>
>   invert if; refine javadoc.

Another direction for cleaning up InputStreams. Many InputStream subclasses have read(byte[],...) as their primary implementation, so they must somehow implement single-byte reading (read()) via array reading. Every imaginable strategy is in use:

- allocate a byte[1] per read() invocation
- cache a preallocated byte[1]
- cache a byte[1], allocating it on demand
- share one buffer between 1-byte reads and skip, both preallocated and allocated on demand
- etc.

So cleaning up and making all the 1-byte read implementations consistent may also be valuable.

PR: https://git.openjdk.java.net/jdk/pull/5872
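The strategies above differ only in how the one-byte buffer is managed. As a rough sketch of the "cache byte[1], allocated on demand" variant (class and field names here are hypothetical, not actual JDK code):

```java
import java.io.InputStream;

// Hypothetical subclass whose primary implementation is read(byte[],int,int);
// read() is layered on top of it with a lazily allocated single-byte buffer.
class ArrayBackedStream extends InputStream {
    private final byte[] data;
    private int pos;
    private byte[] oneByte; // lazily allocated buffer shared by all read() calls

    ArrayBackedStream(byte[] data) { this.data = data; }

    @Override
    public int read(byte[] b, int off, int len) {
        if (pos >= data.length) return -1;
        int n = Math.min(len, data.length - pos);
        System.arraycopy(data, pos, b, off, n);
        pos += n;
        return n;
    }

    @Override
    public int read() {
        byte[] buf = oneByte;
        if (buf == null) {
            oneByte = buf = new byte[1];  // allocated on first use, then cached
        }
        int n = read(buf, 0, 1);
        return (n == -1) ? -1 : buf[0] & 0xff;  // mask to return 0..255
    }
}
```

The `& 0xff` mask matters: without it, bytes above 0x7f would be sign-extended and collide with the -1 end-of-stream sentinel.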
Re: RFR: 8284638: store skip buffers in InputStream Object [v15]
On Tue, 24 May 2022 20:40:51 GMT, XenoAmess wrote:

>> @jmehrens what about this then?
>> I think it is safe now (this mechanism is actually borrowed from Reader).
>
> XenoAmess has updated the pull request incrementally with one additional commit since the last revision:
>
>   invert if; refine javadoc.

More practically: this PR has a noticeable negative effect, namely that it increases the size of InputStream objects. Moreover, it increases the size of InputStream subclasses which have their own skip() implementation and don't need this superclass modification.

Let's look into the InputStream subclasses. I've checked 51 InputStreams from the class library. 30 of them either have their own skip() implementation or use a super.skip() other than InputStream's. This PR is definitely harmful to these 30 classes. They can be divided into the following categories:

1) Clean non-allocating skip() implementation (22 classes):
- BufferedInputStream (java.io)
- ByteArrayInputStream (java.io)
- FileInputStream (java.io)
- FilterInputStream (java.io)
- PeekInputStream in ObjectInputStream (java.io)
- BlockDataInputStream in ObjectInputStream (java.io)
- PushbackInputStream (java.io)
- FastInputStream in Manifest (java.util.jar)
- ZipFileInputStream in ZipFile (java.util.zip)
- CipherInputStream (javax.crypto)
- MeteredStream (sun.net.www)
- Anonymous in nullInputStream() in InputStream (java.io)
- DataInputStream (java.io)
- Anonymous in readTrailer() in GZIPInputStream (java.util.zip)
- FtpInputStream in FtpURLConnection (sun.net.www.protocol.ftp)
- JarURLInputStream in JarURLConnection (sun.net.www.protocol.jar)
- PlainTextInputStream (sun.net.www.content.text)
- PushbackInputStream (java.io)
- TelnetInputStream (sun.net)
- UnclosableInputStream in FileInputStreamPool (sun.security.provider)
- MeteredStream (sun.net.www)
- KeepAliveStream (sun.net.www.http)

2) Partially clean skip() implementation (1 class):
- ChannelInputStream (sun.nio.ch)
  Note: the skip implementation is clean when a SeekableByteChannel is used (most usages); otherwise super.skip() is used, and it could easily be rewritten using the existing internal buffer.

3) Unavoidable skip buffers (7 classes):
- PipeInputStream in Process (java.lang) // per-skip()-invocation buffer, byte[2048]
- CheckedInputStream (java.util.zip) // per-skip()-invocation buffer, byte[512]
- InflaterInputStream (java.util.zip) // per-skip()-invocation buffer, byte[512]
- AppInputStream in SSLSocketImpl (sun.security.ssl) // per-skip()-invocation buffer, byte[256]
- DeflaterInputStream (java.util.zip) // cached skip buffer, byte[512], allocated on demand
- ZipInputStream (java.util.zip) // preallocated skip buffer, byte[512]
- HttpInputStream in HttpURLConnection (sun.net.www.protocol.http) // cached skip buffer, byte[8096], allocated on demand

It would be better to clean up all skip implementations in the latter category and make them consistent. Right now it's a total mess: every possible strategy has been implemented.

Now let's consider the classes which use the InputStream.skip() implementation (21 classes):

4) skip() is not implemented, though a trivial implementation is possible (4 classes):
- EmptyInputStream (sun.net.www.protocol.http)
- NullInputStream in ProcessBuilder (java.lang)
- ObjectInputStream (java.io)
- extObjectInputStream (javax.crypto)

5) skip() is not implemented, though a not-so-trivial implementation is possible (9 classes):
- Anonymous in getInputStream() in NioSocketImpl (sun.nio.ch)
  Notes: a temp direct buffer is used for reading; it's possible to implement skip over the direct buffer, saving the allocation plus the copy from the direct buffer to a byte[]
- Anonymous in newInputStream() in Channels (java.nio.channels)
  Notes: a temp direct buffer is used for reading; it's possible to implement skip over the direct buffer, saving the allocation plus the copy from the direct buffer to a byte[]
- ChunkedInputStream (sun.net.www.http)
  Notes: skip() may be implemented with the existing internal buffer
- DecInputStream in Base64 (java.util)
- ErrorStream in HttpURLConnection (sun.net.www.protocol.http)
  Notes: skip() can easily be implemented with the internal buffer plus delegation to the tail stream
- PipedInputStream (java.io)
  Notes: skip() may be implemented with the existing internal buffer
- SequenceInputStream (java.io)
  Notes: skip() may be implemented with delegation
- SocketInputStream (sun.nio.ch)
  Notes: a temp direct buffer is used for reading; it's possible to implement skip over the direct buffer, saving the allocation plus the copy from the direct buffer to a byte[]
- SocketInputStream in Socket (java.net)
  Notes: skip() may be implemented with delegation

And the last category:

6) skip() is not implemented, and a skip buffer is unavoidable (8 classes):
- VerifierStream in JarVerifier (java.util.jar)
- DigestInputStream (java.security)
- HttpCaptureInputStream (sun.net.www.http)
- InflaterInputStream (java.util.zip)
- GZIPIn
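A "clean non-allocating skip()" in the spirit of category 1 needs no temporary buffer at all: a stream positioned over in-memory data can skip by advancing its cursor. A minimal sketch (class and field names are illustrative, not JDK code):

```java
import java.io.InputStream;

// Sketch of a position-based, allocation-free skip(), as done by
// ByteArrayInputStream and the other category-1 classes.
class PositionedStream extends InputStream {
    private final byte[] buf;
    private int pos;

    PositionedStream(byte[] buf) { this.buf = buf; }

    @Override
    public int read() {
        return (pos < buf.length) ? (buf[pos++] & 0xff) : -1;
    }

    @Override
    public long skip(long n) {
        long k = Math.min(n, (long) (buf.length - pos));
        k = Math.max(k, 0);   // a negative n skips nothing
        pos += (int) k;       // just move the cursor; no buffer, no copying
        return k;
    }
}
```

The categories above with "unavoidable" buffers are exactly the streams that cannot reposition and must consume bytes by reading them somewhere, which is what the skip-buffer strategies are for.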
Re: RFR: 8284638: store skip buffers in InputStream Object [v15]
On Thu, 26 May 2022 15:43:57 GMT, XenoAmess wrote:

>> Is there any practical scenario where the current code (skip buffer allocation on each invocation) creates issues?
>
> @kuksenko Not found any yet :)

In that case, what is the value of this PR? Do we need a code change for the sake of a code change?

PR: https://git.openjdk.java.net/jdk/pull/5872
Re: RFR: 8284638: store skip buffers in InputStream Object [v15]
On Tue, 24 May 2022 20:40:51 GMT, XenoAmess wrote:

>> @jmehrens what about this then?
>> I think it is safe now (this mechanism is actually borrowed from Reader).
>
> XenoAmess has updated the pull request incrementally with one additional commit since the last revision:
>
>   invert if; refine javadoc.

Is there any practical scenario where the current code (skip buffer allocation on each invocation) creates issues? Most leaf InputStreams have their own skip implementation. And the same question applies to the Reader class.

PR: https://git.openjdk.java.net/jdk/pull/5872
Re: RFR(s): 8176894 Provide specialized implementation for default methods putIfAbsent, computeIfAbsent, computeIfPresent, compute, merge in TreeMap
Thanks! Looks good. My only comments are on the added microbenchmark:

* Keys and values should be preallocated in setup. We want to measure TreeMap, not boxing. I'd prefer to see a preallocated array of keys.

* What is the reason for the "shuffled" parameter? A random key distribution is preferable.

* Pairs of similar operations look weird. I mean code like this:

      bh.consume(map.put(key, key));
      bh.consume(map.put(key, key));

  The second put has an advantage because all the required data is already in the CPU cache. If we want to take operations over existing keys into account, it would be better to have two copies of each key in the preallocated array. If the array of keys is shuffled, the put operations for equal keys won't follow one another sequentially. I think that is closer to real behavior.

* A general note about random keys. Typically I use the following scheme:

      @Param("0")
      long seed;

      @Setup()
      public void setup() {
          Random rnd = seed == 0 ? new Random() : new Random(seed);
          // use rnd for generating data
      }

  In the default case we always generate random data and cover a nice distribution of genuinely random cases. But if we find "bad" behavior in some case, or we want to fix the sequence of our tree modifications, we can always set the seed parameter as we wish and make it fixed.

On 10/13/19 2:51 AM, Tagir Valeev wrote:

Hello!

Please review the updated patch (no sponsorship is necessary; review only):
https://cr.openjdk.java.net/~tvaleev/webrev/8176894/r3/
https://bugs.openjdk.java.net/browse/JDK-8176894

The difference from the previous version [1] is minimal: I just noticed that the specification for computeIfAbsent should say "mapping" instead of "remapping", so I fixed the spec. No changes in code/tests.

Also please review the associated CSR: https://bugs.openjdk.java.net/browse/JDK-8227666
I updated it according to Joe Darcy's comments.
An additional explanation and background is copied below from my original e-mail [2] for your convenience.

The patch was originally submitted by Sergey Kuksenko in March 2017 and reviewed by several people:
http://mail.openjdk.java.net/pipermail/core-libs-dev/2017-March/046825.html
The latest patch submitted by Sergey is here:
http://cr.openjdk.java.net/~skuksenko/corelibs/utils/8176894/webrev.01/

I asked Sergey why it was abandoned. It seems there was no particular reason, and Sergey asked if I could pick up this work. I will be happy to finish it.

Here's the list of differences between Sergey's latest patch and mine:

- A bug is fixed in the putIfAbsent implementation. TreeMap.java, lines 576 and 596: the `|| oldValue == null` condition is added (a null value should be overwritten no matter what).
- A test case is added to cover this problem (InPlaceOpsCollisions.java, which also checks HashMap and LinkedHashMap). Not sure if it's the best place for such a test, though. Perhaps a separate file would be better?
- A TreeMap.merge() implementation is added.
- TreeMap is added to the FunctionalCMEs.java test (by the way, this file's copyright says that it's subject to the Classpath exception, which is probably wrong?)
- A simple microbenchmark is added: TreeMapUpdate.java

My quick benchmarking shows that the optimized version is indeed faster for most of the tests, and no regression is observed for the put() method. Here are the raw results for jdk13-ea+26 and jdk13-ea+26+patch, if anybody is interested:
http://cr.openjdk.java.net/~tvaleev/jmh/JDK-8176894/

With best regards,
Tagir Valeev.

[1] https://cr.openjdk.java.net/~tvaleev/webrev/8176894/r2/
[2] https://mail.openjdk.java.net/pipermail/core-libs-dev/2019-July/061309.html
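The `|| oldValue == null` fix above matters because Map.putIfAbsent is specified to treat a key mapped to null as absent, so the null must be overwritten. A minimal demonstration (the class name is hypothetical, just for illustration):

```java
import java.util.Map;
import java.util.TreeMap;

// Demonstrates the putIfAbsent semantics behind the `|| oldValue == null`
// condition: a null mapping counts as "absent" and gets replaced.
class PutIfAbsentNullDemo {
    static Integer demo() {
        Map<String, Integer> map = new TreeMap<>();
        map.put("k", null);          // key is present, but mapped to null
        map.putIfAbsent("k", 1);     // the null value must be overwritten
        return map.get("k");         // now 1, not null
    }
}
```

This is exactly the case a specialized override can get wrong if it only checks key presence rather than the mapped value.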
Re: JDK 10 RFR of 8176894: Provide specialized implementation for default methods putIfAbsent, computeIfAbsent, computeIfPresent, compute in TreeMap
Hi Claes,

There is no need to backport it to 8 & 9. They use the default implementation, which is free from this tree corruption issue.

On 04/06/2017 03:45 AM, Claes Redestad wrote:

> Hi Sergey,
>
> this looks good to me*, but I can't help wondering if the modCount checking is something that should be done separately as a bug fix (with a higher priority) and be backported to 8 and 9? Alternatively, re-categorize this fix as such.
>
> Thanks!
>
> /Claes
>
> * I wouldn't mind seeing the cleanup Martin suggested, i.e., writing remapValue as:
>
>     private V remapValue(Entry<K,V> t, K key, BiFunction<? super K, ? super V, ? extends V> remappingFunction) {
>         V newValue = remappingFunction.apply(key, t.value);
>         if (newValue == null) {
>             deleteEntry(t);
>         } else {
>             // replace old mapping
>             t.value = newValue;
>         }
>         return newValue;
>     }

On 2017-03-28 21:22, Sergey Kuksenko wrote:

> Friendly reminder. Have you had a chance to review the latest version?

On 03/17/2017 12:45 PM, Sergey Kuksenko wrote:

> Hi, All.
>
> I realized (thanks to Tagir V.) that if we don't check modCount after calling the mapping function, we may get a corrupted tree structure.
>
> New webrev for review:
> http://cr.openjdk.java.net/~skuksenko/corelibs/utils/8176894/webrev.01/

On 03/17/2017 11:29 AM, Martin Buchholz wrote:

> Thanks! This looks pretty good. I have a similar effort in progress to improve bulk collection operations, most of which made it into jdk9.
>
> Please use standard java.util whitespace, as Aleksey suggested.
>
> Below (and in compute) I would simply return newValue, saving a line of code and making it clearer that we are returning the result of the remappingFunction:
>
>     private V remapValue(Entry<K,V> t, K key, BiFunction<? super K, ? super V, ? extends V> remappingFunction) {
>         V newValue = remappingFunction.apply(key, t.value);
>         if (newValue == null) {
>             deleteEntry(t);
>             return null;
>         } else {
>             // replace old mapping
>             t.value = newValue;
>             return newValue;
>         }
>     }
>
> This code is surely tested, but testing could also surely be improved. That's probably not your job though (it may be mine!) I would probably try hand-injecting some bugs into a copy of the code and seeing if our jtreg tests catch them, to increase coverage confidence.

On Thu, Mar 16, 2017 at 12:04 PM, Sergey Kuksenko <sergey.kukse...@oracle.com> wrote:

> Hi All,
>
> Please, review:
> https://bugs.openjdk.java.net/browse/JDK-8176894
> http://cr.openjdk.java.net/~skuksenko/corelibs/utils/8176894/webrev.00/
>
> The issue was created for JDK 10 in order not to disturb JDK 9 before launch.

--
Best regards,
Sergey Kuksenko
Re: JDK 10 RFR of 8176894: Provide specialized implementation for default methods putIfAbsent, computeIfAbsent, computeIfPresent, compute in TreeMap
Friendly reminder. Have you had a chance to review the latest version?

On 03/17/2017 12:45 PM, Sergey Kuksenko wrote:

> Hi, All.
>
> I realized (thanks to Tagir V.) that if we don't check modCount after calling the mapping function, we may get a corrupted tree structure.
>
> New webrev for review:
> http://cr.openjdk.java.net/~skuksenko/corelibs/utils/8176894/webrev.01/

On 03/17/2017 11:29 AM, Martin Buchholz wrote:

> Thanks! This looks pretty good. I have a similar effort in progress to improve bulk collection operations, most of which made it into jdk9.
>
> Please use standard java.util whitespace, as Aleksey suggested.
>
> Below (and in compute) I would simply return newValue, saving a line of code and making it clearer that we are returning the result of the remappingFunction:
>
>     private V remapValue(Entry<K,V> t, K key, BiFunction<? super K, ? super V, ? extends V> remappingFunction) {
>         V newValue = remappingFunction.apply(key, t.value);
>         if (newValue == null) {
>             deleteEntry(t);
>             return null;
>         } else {
>             // replace old mapping
>             t.value = newValue;
>             return newValue;
>         }
>     }
>
> This code is surely tested, but testing could also surely be improved. That's probably not your job though (it may be mine!) I would probably try hand-injecting some bugs into a copy of the code and seeing if our jtreg tests catch them, to increase coverage confidence.

On Thu, Mar 16, 2017 at 12:04 PM, Sergey Kuksenko <sergey.kukse...@oracle.com> wrote:

> Hi All,
>
> Please, review:
> https://bugs.openjdk.java.net/browse/JDK-8176894
> http://cr.openjdk.java.net/~skuksenko/corelibs/utils/8176894/webrev.00/
>
> The issue was created for JDK 10 in order not to disturb JDK 9 before launch.

--
Best regards,
Sergey Kuksenko
Re: JDK 10 RFR of 8176894: Provide specialized implementation for default methods putIfAbsent, computeIfAbsent, computeIfPresent, compute in TreeMap
Hi, All.

I realized (thanks to Tagir V.) that if we don't check modCount after calling the mapping function, we may get a corrupted tree structure.

New webrev for review:
http://cr.openjdk.java.net/~skuksenko/corelibs/utils/8176894/webrev.01/

On 03/17/2017 11:29 AM, Martin Buchholz wrote:

> Thanks! This looks pretty good. I have a similar effort in progress to improve bulk collection operations, most of which made it into jdk9.
>
> Please use standard java.util whitespace, as Aleksey suggested.
>
> Below (and in compute) I would simply return newValue, saving a line of code and making it clearer that we are returning the result of the remappingFunction:
>
>     private V remapValue(Entry<K,V> t, K key, BiFunction<? super K, ? super V, ? extends V> remappingFunction) {
>         V newValue = remappingFunction.apply(key, t.value);
>         if (newValue == null) {
>             deleteEntry(t);
>             return null;
>         } else {
>             // replace old mapping
>             t.value = newValue;
>             return newValue;
>         }
>     }
>
> This code is surely tested, but testing could also surely be improved. That's probably not your job though (it may be mine!) I would probably try hand-injecting some bugs into a copy of the code and seeing if our jtreg tests catch them, to increase coverage confidence.

On Thu, Mar 16, 2017 at 12:04 PM, Sergey Kuksenko <sergey.kukse...@oracle.com> wrote:

> Hi All,
>
> Please, review:
> https://bugs.openjdk.java.net/browse/JDK-8176894
> http://cr.openjdk.java.net/~skuksenko/corelibs/utils/8176894/webrev.00/
>
> The issue was created for JDK 10 in order not to disturb JDK 9 before launch.

--
Best regards,
Sergey Kuksenko
Re: JDK 10 RFR of 8176894: Provide specialized implementation for default methods putIfAbsent, computeIfAbsent, computeIfPresent, compute in TreeMap
Got it. It definitely should be done.

On 03/17/2017 07:33 AM, Tagir F. Valeev wrote:

> Hello!
>
> Great and long-awaited feature, thanks!
>
> I don't see that modCount is checked after calling the mapping function in compute/computeIfAbsent/computeIfPresent. Is it possible to break the Map if the mapping function changes the Map's content? See similar problems in HashMap:
> https://bugs.openjdk.java.net/browse/JDK-8071667
> https://bugs.openjdk.java.net/browse/JDK-8172951
>
> With best regards,
> Tagir Valeev.
>
> SK> Hi All,
> SK> Please, review:
> SK> https://bugs.openjdk.java.net/browse/JDK-8176894
> SK> http://cr.openjdk.java.net/~skuksenko/corelibs/utils/8176894/webrev.00/
> SK> The issue was created for JDK 10 in order not to disturb JDK 9 before launch.

--
Best regards,
Sergey Kuksenko
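The failure mode Tagir describes can be observed directly once a modCount re-check is in place (it landed with this fix, so the behavior below assumes JDK 10 or later): a mapping function that structurally modifies the map during computeIfAbsent is detected and rejected instead of silently corrupting the tree. A small sketch (class name is illustrative):

```java
import java.util.ConcurrentModificationException;
import java.util.TreeMap;

// The mapping function mutates the map mid-computeIfAbsent; with the
// modCount check after the function returns, TreeMap throws CME rather
// than building a corrupted tree. (Assumed behavior on JDK 10+.)
class TreeMapCmeDemo {
    static boolean detectsModification() {
        TreeMap<Integer, Integer> map = new TreeMap<>();
        map.put(0, 0);
        try {
            map.computeIfAbsent(1, k -> {
                map.put(2, 2);   // structural modification inside the mapping function
                return 1;
            });
            return false;        // without the check, the modification goes undetected
        } catch (ConcurrentModificationException expected) {
            return true;
        }
    }
}
```

This mirrors the HashMap behavior introduced by JDK-8071667, which the linked bugs discuss.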
Re: JDK 10 RFR of 8176894: Provide specialized implementation for default methods putIfAbsent, computeIfAbsent, computeIfPresent, compute in TreeMap
But the existing (default) implementation doesn't check modCount either. I just preserve the existing functionality.

On 03/17/2017 07:33 AM, Tagir F. Valeev wrote:

> Hello!
>
> Great and long-awaited feature, thanks!
>
> I don't see that modCount is checked after calling the mapping function in compute/computeIfAbsent/computeIfPresent. Is it possible to break the Map if the mapping function changes the Map's content? See similar problems in HashMap:
> https://bugs.openjdk.java.net/browse/JDK-8071667
> https://bugs.openjdk.java.net/browse/JDK-8172951
>
> With best regards,
> Tagir Valeev.
>
> SK> Hi All,
> SK> Please, review:
> SK> https://bugs.openjdk.java.net/browse/JDK-8176894
> SK> http://cr.openjdk.java.net/~skuksenko/corelibs/utils/8176894/webrev.00/
> SK> The issue was created for JDK 10 in order not to disturb JDK 9 before launch.

--
Best regards,
Sergey Kuksenko
JDK 10 RFR of 8176894: Provide specialized implementation for default methods putIfAbsent, computeIfAbsent, computeIfPresent, compute in TreeMap
Hi All,

Please, review:

https://bugs.openjdk.java.net/browse/JDK-8176894
http://cr.openjdk.java.net/~skuksenko/corelibs/utils/8176894/webrev.00/

The issue was created for JDK 10 in order not to disturb JDK 9 before launch.

--
Best regards,
Sergey Kuksenko
Re: JDK 9 RFR of 6375303: Review use of caching in BigDecimal
Brian,

Could you share your benchmarks?

On 03/04/2014 04:14 AM, Brian Burkhalter wrote:

> Hello all,
>
> This review request concerns this issue and proposed patch:
>
> issue: https://bugs.openjdk.java.net/browse/JDK-6375303
> patch: http://cr.openjdk.java.net/~bpb/6375303/webrev.00/
>
> A review of the current code did not reveal anything clearly amiss of the nature of the problems which instigated the original filing of this issue. Examination of the code did, however, suggest some code style and aesthetic improvements, as well as a change to how one lazy initialization was implemented (here we go again ...).
>
> Summary:
>
> - clean up the imports section
> - add an AtomicReferenceFieldUpdater-type constant for stringCache initialization
> - change some existing static final constants to conform to Java coding style
> - change the names of some local variables so as not to hide instance variables of the same name
> - add missing @Override annotations
> - modify the lazy initialization of stringCache in toString() to use the AtomicReferenceFieldUpdater constant
> - various other minor cleanups
>
> Note that the performance of the initialization of stringCache was measured for the extant double-checked locking approach, an AtomicReference, and the AtomicReferenceFieldUpdater, and no significant degradation was observed for either of the latter versus the baseline for 1 and 4 threads. The updater has the advantage of adding only one object per class as opposed to one per instance.
>
> Thanks,
>
> Brian

--
Best regards,
Sergey Kuksenko
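For reference, the AtomicReferenceFieldUpdater pattern described in the summary looks roughly like this. This is a hypothetical stand-in class: BigDecimal's real layoutChars(true) is replaced by a trivial computation, and only the caching mechanics are the point.

```java
import java.util.concurrent.atomic.AtomicReferenceFieldUpdater;

// One static updater per class (instead of an AtomicReference per
// instance) CAS-publishes the computed string exactly once, so every
// caller observes the same String instance.
class CachedToString {
    private volatile String stringCache;

    private static final AtomicReferenceFieldUpdater<CachedToString, String> UPDATER =
        AtomicReferenceFieldUpdater.newUpdater(CachedToString.class, String.class, "stringCache");

    private final int value;
    CachedToString(int value) { this.value = value; }

    @Override
    public String toString() {
        String s = stringCache;
        if (s == null) {
            s = Integer.toString(value);              // stand-in for layoutChars(true)
            if (!UPDATER.compareAndSet(this, null, s)) {
                s = stringCache;                      // another thread won the race
            }
        }
        return s;
    }
}
```

The per-instance footprint is just the volatile field; the updater itself is a single static per class, which is the advantage Brian mentions.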
Re: JDK 9 RFR of 6375303: Review use of caching in BigDecimal
Brian, Mike.

Could you explain what the problem with the old caching is? String is immutable, and BigDecimal is immutable. So we have a data race at that place, but it is a safe data race. What is the problem if we create several identical strings?

Making stringCache volatile means a performance penalty on ARM & PPC. Setting the cache via CAS means a performance penalty everywhere. If you insist on using an AtomicFieldUpdater, the better way is to use lazySet, not CAS.

On 03/04/2014 09:21 PM, Mike Duigou wrote:

> On Mar 4 2014, at 07:13 , Peter Levart wrote:
>
>> On 03/04/2014 01:14 AM, Brian Burkhalter wrote:
>>> - add AtomicReferenceFieldUpdater-type constant for stringCache initialization
>>
>> Hi Brian,
>>
>> By using a volatile read and CAS, there's still a chance that multiple concurrent threads will invoke layoutChars(true) concurrently, but you guarantee that only a single instance of String will ever be returned as a result of the toString() method. Is that what you were pursuing?
>
> Yes. (I provided the AtomicReferenceFieldUpdater code). The previous implementation had a benign data race that could result in a later layoutChars result replacing an earlier result and multiple string instances being returned. The new implementation, at small cost, prevents multiple different instances from being returned.
>
> Mike

--
Best regards,
Sergey Kuksenko
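The "old caching" Sergey defends is the same idiom as String.hashCode: a plain, non-volatile field caching a value derived from immutable state. Racing threads may each compute the value, and one write may overwrite another, but every reader still sees a correct result, and the hot path is a single plain field read. A sketch (hypothetical class; the safe publication of the cached String relies on String's final fields):

```java
// Benign data race: `cache` is deliberately not volatile. The worst
// case is redundant computation of identical immutable Strings; no
// reader can ever observe an incorrect result.
class RacyCache {
    private String cache;                 // plain field, no volatile, no CAS

    private final int value;
    RacyCache(int value) { this.value = value; }

    @Override
    public String toString() {
        String s = cache;
        if (s == null) {
            cache = s = Integer.toString(value);  // race is safe: result is immutable
        }
        return s;
    }
}
```

Compared with the CAS version, this trades "exactly one String instance ever escapes" for zero synchronization cost on every subsequent call, which is the tradeoff being debated here.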
Re: RFR (M): 8024635: Caching MethodType's descriptor string improves lambda linkage performance
Updated version at http://cr.openjdk.java.net/~skuksenko/jsr335/8024635/webrev.02/

Updated according to the comments:
- removed the "null check" comment near the Objects.requireNonNull calls
- distorted (changed the parameter order of) the key-purposed MethodType constructor
- moved the slot-counting logic from checkPType to checkPTypes

On 09/26/2013 11:09 PM, John Rose wrote:

> (A quick note on format: I have changed the subject line to include the conventional bug number and size estimate. The bug number makes it easier to track in mailers.)
>
> On Sep 26, 2013, at 9:22 AM, Sergey Kuksenko wrote:
>
>> Hi All,
>>
>> I updated the fix. You may find it here:
>> http://cr.openjdk.java.net/~skuksenko/jsr335/8024635/webrev.01/
>>
>> See my comments and the new webrev description below.
>>
>> On 09/18/2013 12:59 AM, John Rose wrote:
>>>>> We can tweak that later if necessary. There are probably only a small number of such strings, so maybe a small leaky map would do the trick as well.
>>>> In case of a small number of such strings we will get a huge overhead from any kind of map.
>>> I would expect O(10^1.5) memory references and branches. Depends on what you mean by "huge".
>>
>> Sure. I think the question of whether to use an external map or a field can be decided later, when/if it is needed. Some statistics I've collected on the nashorn/octane benchmarks (collected only for the single (first) benchmark iteration):
>> mandreel: 514 unique strings, toMethodDescriptorString called 69047 times
>> box2d: 560 unique strings, 34776 toMethodDescriptorString invocations.
>
> Good data; thanks.
>
>>> Would there be any benefit from recording the original string from the constant pool? The JVM path for this is SystemDictionary::find_method_handle_type, which makes an upcall to MethodHandleNatives.findMethodHandleType. Currently only the ptypes and rtype are passed, but the JVM could also convert the internal Symbol to a java.lang.String and pass that as well. In that case, MTs created by JVM upcalls would be pre-equipped with descriptor strings.
>>>
>>> This is probably worth an experiment, although it requires some JVM work.
>>
>> I am definitely sure that we shouldn't do that. Right now caching descriptor strings is an internal decision of MethodType. Otherwise it will make the interfaces more complex.
>
> Yes, I agree. Also, it would only benefit the 514 calls which introduce unique strings, whereas your change helps the 69047 calls for descriptor strings.
>
>>> I hope you get my overall point: hand optimizations have a continuing cost, related to obscuring the logical structure of the code. The transforming engineer might make a mistake, and later maintaining engineers might also make mistakes.
>>>
>>> https://blogs.oracle.com/jrose/entry/three_software_proverbs
>>
>> And it doesn't mean that we should be afraid of any kind of optimization. In the lucky case, someone (or something) will optimize it for us. But sometimes a JIT (even a smart JIT) is not enough.
>
> Sometimes. When that happens we reluctantly hand-optimize our code.
>
>> Let's get back to the patch. I decided not to change the original checkPType/checkRType code, except for the explicit Objects.requireNonNull. The problem here is that we are creating new MethodType objects which already exist: we have to create an MT as a key for searching in the internedTable. The ratio between already existing and new MethodTypes is quite huge. Here are some statistics I've collected on the nashorn/octane benchmarks (collected only for the single (first) benchmark iteration):
>>
>> mandreel:
>> - 1739 unique MethodTypes,
>> - 878414 MethodType.makeImpl requests;
>> box2d:
>> - 1861 unique MethodTypes,
>> - 527486 MethodType.makeImpl requests;
>> gameboy:
>> - 2062 unique MethodTypes,
>> - 311378 MethodType.makeImpl requests;
>>
>> So:
>> 1. The MethodType constructor does checkPTypes/checkRType, which is frequently (99% of the time) useless: we already have the same (checked) MethodType in the internTable.
>> 2. Less than 1% of created MethodTypes will be stored into the internTable, but reusing those MethodType objects prevents them from scalar replacement. (I've heard that Graal may do flow-sensitive scalar replacement, but C2 can't.)
>>
>> What I did: I separated the constructors:
>> - for long-lived MethodTypes, with all checks;
>> - for short-lived MethodTypes used only as keys for searching in the intern table, with no checks performed.
Re: RFR: Caching MethodType's descriptor string improves lambda linkage performance
Hi All,

I updated the fix. You may find it here:
http://cr.openjdk.java.net/~skuksenko/jsr335/8024635/webrev.01/

See my comments and the new webrev description below.

On 09/18/2013 12:59 AM, John Rose wrote:

>>> We can tweak that later if necessary. There are probably only a small number of such strings, so maybe a small leaky map would do the trick as well.
>> In case of a small number of such strings we will get a huge overhead from any kind of map.
> I would expect O(10^1.5) memory references and branches. Depends on what you mean by "huge".

Sure. I think the question of whether to use an external map or a field can be decided later, when/if it is needed. Some statistics I've collected on the nashorn/octane benchmarks (collected only for the single (first) benchmark iteration):

mandreel: 514 unique strings, toMethodDescriptorString called 69047 times
box2d: 560 unique strings, 34776 toMethodDescriptorString invocations.

> Would there be any benefit from recording the original string from the constant pool? The JVM path for this is SystemDictionary::find_method_handle_type, which makes an upcall to MethodHandleNatives.findMethodHandleType. Currently only the ptypes and rtype are passed, but the JVM could also convert the internal Symbol to a java.lang.String and pass that as well. In that case, MTs created by JVM upcalls would be pre-equipped with descriptor strings.
>
> This is probably worth an experiment, although it requires some JVM work.

I am definitely sure that we shouldn't do that. Right now caching descriptor strings is an internal decision of MethodType. Otherwise it will make the interfaces more complex.

> I hope you get my overall point: hand optimizations have a continuing cost, related to obscuring the logical structure of the code. The transforming engineer might make a mistake, and later maintaining engineers might also make mistakes.
>
> https://blogs.oracle.com/jrose/entry/three_software_proverbs

And it doesn't mean that we should be afraid of any kind of optimization. In the lucky case, someone (or something) will optimize it for us. But sometimes a JIT (even a smart JIT) is not enough.

Let's get back to the patch. I decided not to change the original checkPType/checkRType code, except for the explicit Objects.requireNonNull. The problem here is that we are creating new MethodType objects which already exist: we have to create an MT as a key for searching in the internedTable. The ratio between already existing and new MethodTypes is quite huge. Here are some statistics I've collected on the nashorn/octane benchmarks (collected only for the single (first) benchmark iteration):

mandreel:
- 1739 unique MethodTypes,
- 878414 MethodType.makeImpl requests;
box2d:
- 1861 unique MethodTypes,
- 527486 MethodType.makeImpl requests;
gameboy:
- 2062 unique MethodTypes,
- 311378 MethodType.makeImpl requests;

So:
1. The MethodType constructor does checkPTypes/checkRType, which is frequently (99% of the time) useless: we already have the same (checked) MethodType in the internTable.
2. Less than 1% of created MethodTypes will be stored into the internTable, but reusing those MethodType objects prevents them from scalar replacement. (I've heard that Graal may do flow-sensitive scalar replacement, but C2 can't.)

What I did: I separated the constructors:
- for long-lived MethodTypes, with all checks;
- for short-lived MethodTypes used only as keys for searching in the intern table, with no checks performed.

If we don't find the MethodType in the table, we always create a new one (with checks); that is required in less than 1% of cases. And we remove at least one obstacle to scalar replacement. Unfortunately, scalar replacement for the expression "internTable.get(new MethodType(rtype, ptypes))" was still not performed; the reason can be investigated, and I hope it will be easier to achieve scalarization in this case.

--
Best regards,
Sergey Kuksenko
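The two-constructor scheme can be sketched generically. All names here are illustrative, not the actual MethodType code: an unchecked "probe" instance serves only as a lookup key, and validation plus interning happen only on the rare miss.

```java
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the split described above: a cheap, check-free probe object
// for the ~99% of lookups that hit the intern table; full validation
// runs only when the canonical instance must actually be created.
class InternedType {
    private static final ConcurrentHashMap<InternedType, InternedType> TABLE =
        new ConcurrentHashMap<>();

    final Class<?> rtype;
    final Class<?>[] ptypes;

    // trusted, check-free constructor: used only to build probe keys
    private InternedType(Class<?> rtype, Class<?>[] ptypes) {
        this.rtype = rtype;
        this.ptypes = ptypes;
    }

    static InternedType make(Class<?> rtype, Class<?>... ptypes) {
        InternedType probe = new InternedType(rtype, ptypes); // no checks here
        InternedType found = TABLE.get(probe);
        if (found != null) return found;                      // fast, common path
        // miss: validate once, then intern the canonical instance
        for (Class<?> p : ptypes) {
            if (p == null || p == void.class)
                throw new IllegalArgumentException("bad parameter type: " + p);
        }
        InternedType prev = TABLE.putIfAbsent(probe, probe);
        return (prev != null) ? prev : probe;
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof InternedType)) return false;
        InternedType t = (InternedType) o;
        return rtype == t.rtype && Arrays.equals(ptypes, t.ptypes);
    }
    @Override public int hashCode() {
        return 31 * rtype.hashCode() + Arrays.hashCode(ptypes);
    }
}
```

The short-lived probe is also the object one would hope the JIT can scalar-replace, which is the second motivation Sergey gives.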
Re: RFR: Caching MethodType's descriptor string improves lambda linkage performance
On 09/17/2013 04:41 AM, John Rose wrote: > The algorithmic change (string cache) is acceptable, although it will tend to > increase footprint somewhat. I don't think that it would be visible footprint increasing. MethodTypes are interned by implementation, so I don't expect that this "overhead" will be significant. > We can tweak that later if necessary. There are probably only a small number > of such strings, so maybe a small leaky map would do the trick as well. In case of small amount of such strings we will get a huge overhead from any kind of map. From performance point of view map may acceptable when we have a lot of such strings. The key fact here that we have a few amount of MethodTypes, but calls "toMethodDescriptorString@ very frequent. > > But, please don't hand-inline methods (checkPtype, checkSlotCount). That is > usually the wrong answer. > If there is a JIT inlining decision that isn't going right, we need to fix > that, not distort the Java code. Unfortunately it is not a case when we may rely on JIT. Certainly, there are no perf problems here (except caching) after JIT (at least C2, I did n't check C1). But we have huge performance problems/impact here while interpreting. I know (I think) a really general solution for such perf problems. Besides it will be useful in a big amount of other places. It is a bytecode level inline before interpretation. Let's back to the patch. checkPType is method which performs two _different_ actions. The first is check that type in not null and type is not void. The second action is calculate size of extra parameter slot (lond/double types). So we came to idea that we need two different methods as from performance and OOP purity points of view. checkPType has exactly two usages. And the second usage don't need calculate size of parameters slots. So one action used only in the single place and we get more readable code if we inline it. 
The second action is used twice, but it is a very simple action (NPE and void check), and I think we may do a manual inline here because it gives us visible benefits and doesn't create any problems for later JITting or code reading.

> Also, I think you deleted a call to checkSlotCount that should still be there
> to detect illegal inputs. One of the costs of hand-inlining is bitrot like
> that.

Oops. You are right, I made a mistake. I eliminated one checkSlotCount usage by error, but I didn't inline anything in that place. So it is not a cost of hand-inlining; it is the cost of my inadvertence: I eliminated a whole code line by misclick and didn't notice. Tomorrow I'll do another webrev, try to address all the comments, and we may continue the discussion.

> — John
>
> On Sep 11, 2013, at 12:03 PM, Sergey Kuksenko wrote:
>
>> On 09/11/2013 09:01 PM, Aleksey Shipilev wrote:
>>> On 09/11/2013 08:23 PM, Sergey Kuksenko wrote:
>>>> http://cr.openjdk.java.net/~skuksenko/jsr335/8024635/webrev.00/
>>>
>>> As much as I hate to see the hand code tweaking instead of relying on
>>> the compiler to do its job, I understand this is about the interpreter.
>>> Seems good then.
>>>
>>> * Formatting: "if(...)" should be "if (...)"
>>
>> Will do
>>
>>> * Formatting: "//NPE" should be "// null check"
>>
>> I just preserved the existing comments. I decided not to modify original
>> code which is not important to the required functionality.
>>
>>> * Formatting: "desc = " should be "desc = "
>>>
>>> * Formatting: this one should not use braces (for consistency with other
>>> usages)?
>>> 364 if(nptype == null) { //NPE
>>> 365 throw new NullPointerException();
>>> 366 }
>>
>> Let's ask the code owner.
>>
>>> * Explicit null-checks: implicits via .getClass and .equals always
>>> bothered me in j.l.i.*; the idea was seemingly to piggyback on the
>>> compiler intrinsics.
>>
>> Anyway, it doesn't matter after C1/C2.
>>
>>> Any idea what's the cost of using
>>> Objects.requireNonNull there?
>> If we are running under the interpreter, Objects.requireNonNull costs
>> enough to eliminate the benefits from this fine tuning.
>>
>> --
>> Best regards,
>> Sergey Kuksenko

--
Best regards,
Sergey Kuksenko
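To make the split discussed above concrete, here is a minimal sketch of the two separated actions. The method names and shapes are invented for illustration; they follow the description in the thread (an NPE/void check and a long/double slot count), not the actual MethodType source.

```java
// Hypothetical sketch of splitting the original checkPtype into its two
// distinct actions, so the call site that does not need slot accounting
// pays only for the validity check.
public class PtypeCheckSketch {
    // Action 1: validate a single parameter type (NPE and void check).
    static void checkPtype(Class<?> ptype) {
        if (ptype == null)
            throw new NullPointerException();
        if (ptype == void.class)
            throw new IllegalArgumentException("parameter type cannot be void");
    }

    // Action 2: count extra slots; long and double occupy two JVM slots each.
    static int extraSlotCount(Class<?>[] ptypes) {
        int slots = 0;
        for (Class<?> pt : ptypes) {
            checkPtype(pt);
            if (pt == long.class || pt == double.class)
                slots++;
        }
        return slots;
    }

    public static void main(String[] args) {
        // (int, long, double) needs two extra slots: one for long, one for double.
        System.out.println(extraSlotCount(
                new Class<?>[] { int.class, long.class, double.class }));
    }
}
```

Keeping the trivial check as its own tiny method (rather than hand-inlining it everywhere) is what the review converged on for the action used twice.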
Re: RFR: Direct LambdaMetaFactory invocation from CallSite significantly improves lambda linkage performance
Hi John, Brian,

I evaluated JDK-8024630 (my patch) vs JDK-8024761 (your patch). I did it for lambda linkage vs anonymous class loading. Because non-capturing lambdas behave significantly differently from capturing ones, I had to measure linkage twice (non-capturing vs capturing). The other dimension is tiered vs non-tiered compilation. I checked loading different amounts of lambdas (classes): 1K, 4K, 8K, 16K, 24K, 32K, 64K (K=1024).

Non-capturing lambdas:
- JDK-8024630 is slightly faster (2-3%) than JDK-8024761 at smaller amounts of lambdas (1K, 4K, 8K). But starting from 16K, JDK-8024761 is faster (also 2-3%). The same for tiered/non-tiered.

Capturing lambdas:
- Non-tiered compilation shows the same behaviour and the same percentages as non-capturing lambdas. With TieredCompilation the initial difference starts at 8% (JDK-8024630 is faster) and becomes smaller; both RFEs have the same performance from 24K lambdas on.

So looking at my results, I'll vote for JDK-8024761 instead of JDK-8024630, because your solution is general and the performance difference is not significant.

On 09/11/2013 10:18 PM, John Rose wrote:
> Yes, this is a good tactic, but as written it is too much of a point hack for
> lambdas (important though they are).
>
> I have an alternate solution I would like you to measure. It provides a fast
> path for other BSM lookups like Nashorn's, so (if it works well for lambda)
> it is preferable.
>
> I will attach a patch to the bug report.
>
> — John
>
> On Sep 11, 2013, at 9:23 AM, Sergey Kuksenko wrote:
>
>> Please review the webrev at:
>>
>> http://cr.openjdk.java.net/~skuksenko/jsr335/8024630/webrev.00/
>>
>> LambdaMetafactory is quite a frequent bootstrap method for
>> invokedynamic in JDK8.
>> We can do a direct method (LambdaMetafactory) invocation as a fastpath
>> when it is proved that the bootstrap MethodHandle points to
>> LambdaMetafactory.
>> The modification gives +10% - +35% to lambda linkage performance
>> (depends on the amount of lambdas).
>>
>> --
>> Best regards,
>> Sergey Kuksenko

--
Best regards,
Sergey Kuksenko
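For readers unfamiliar with the distinction driving the two benchmark columns above, here is a minimal illustration of the two lambda shapes. This is a generic sketch, not code from either patch.

```java
import java.util.function.Supplier;

// The two lambda shapes measured separately in the thread above.
public class LambdaShapes {
    // Non-capturing: references no enclosing state; the runtime can link
    // the call site to a single reusable instance.
    static Supplier<String> nonCapturing() {
        return () -> "constant";
    }

    // Capturing: closes over 'local'; each invocation binds a fresh value,
    // which is why its linkage behaves differently.
    static Supplier<String> capturing(String local) {
        return () -> local;
    }

    public static void main(String[] args) {
        System.out.println(nonCapturing().get()); // prints "constant"
        System.out.println(capturing("captured").get()); // prints "captured"
    }
}
```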
Re: RFR: Caching MethodType's descriptor string improves lambda linkage performance
On 09/11/2013 09:01 PM, Aleksey Shipilev wrote:
> On 09/11/2013 08:23 PM, Sergey Kuksenko wrote:
>> http://cr.openjdk.java.net/~skuksenko/jsr335/8024635/webrev.00/
>
> As much as I hate to see the hand code tweaking instead of relying on
> the compiler to do its job, I understand this is about the interpreter.
> Seems good then.
>
> * Formatting: "if(...)" should be "if (...)"

Will do

> * Formatting: "//NPE" should be "// null check"

I just preserved the existing comments. I decided not to modify original code which is not important to the required functionality.

> * Formatting: "desc = " should be "desc = "
>
> * Formatting: this one should not use braces (for consistency with other
> usages)?
> 364 if(nptype == null) { //NPE
> 365 throw new NullPointerException();
> 366 }

Let's ask the code owner.

> * Explicit null-checks: implicits via .getClass and .equals always
> bothered me in j.l.i.*; the idea was seemingly to piggyback on the
> compiler intrinsics.

Anyway, it doesn't matter after C1/C2.

> Any idea what's the cost of using
> Objects.requireNonNull there?

If we are running under the interpreter, Objects.requireNonNull costs enough to eliminate the benefits from this fine tuning.

--
Best regards,
Sergey Kuksenko
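The three null-check styles mentioned in this exchange can be put side by side. A sketch for comparison; the method names are invented, and the relative interpreter costs are the claim made in the thread, not something this snippet measures.

```java
import java.util.Objects;

// Side-by-side view of the null-check styles discussed in the review.
public class NullCheckStyles {
    // Explicit check, as in the patch under review.
    static String explicit(String s) {
        if (s == null) // NPE
            throw new NullPointerException();
        return s;
    }

    // Implicit check: invoking any instance method (here getClass) throws
    // NullPointerException for null; j.l.i.* code used this to piggyback
    // on compiler intrinsics.
    static String implicit(String s) {
        s.getClass(); // throws NullPointerException if s == null
        return s;
    }

    // Library helper; per the thread, the extra call is costly under the
    // interpreter even though it JITs away.
    static String viaObjects(String s) {
        return Objects.requireNonNull(s);
    }

    public static void main(String[] args) {
        System.out.println(explicit("a") + implicit("b") + viaObjects("c"));
    }
}
```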
Re: RFR: Direct LambdaMetaFactory invocation from CallSite significantly improves lambda linkage performance
> On 09/11/2013 08:23 PM, Sergey Kuksenko wrote:
>> http://cr.openjdk.java.net/~skuksenko/jsr335/8024630/webrev.00/
>
> * webrev metadata: "invokation" -> "invocation"

A misprint. Should I fix it right now, or may it be done later?

> * Why $caller is MethodHandles.Lookup now? Is it known to have that type?

Yes. MethodHandles.Lookup is the return type of IMPL_LOOKUP.in().

> * I would put the entire LMF.metafactory call inside the new method.
> That way, instanceof checks are right before the (otherwise potentially
> unsafe) casts.

In that case the single method would have to return two values: a boolean success/failure flag and a CallSite.

>> The modification gives +10% - +35% to lambda linkage performance
>> (depends on amount of lambdas).
>
> Any JSR292/Nashorn benchmarks to prove it does not degrade the "usual"
> bootstrap scenarios? I understand most of the performance is dominated
> by already-linked sites, but still.

I checked it on Octane: no perf difference. But I need somebody to recheck my measurements.

--
Best regards,
Sergey Kuksenko
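The "instanceof right before the cast" structure the reviewer asks for looks roughly like this. All names here are invented for the sketch; this is the general guard-then-cast fastpath pattern, not the actual webrev code.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Illustrative shape of the fastpath pattern under discussion: the
// instanceof test sits immediately before the cast, so the cast is
// provably safe, and anything else falls through to the generic path.
public class FastPathSketch {
    interface Linker { Object link(Object bsm); } // stand-in for the slow path

    static Object linkCallSite(Object bsm, Linker generic) {
        if (bsm instanceof MethodHandle) {        // check right before the cast
            MethodHandle mh = (MethodHandle) bsm; // safe: guarded by instanceof
            // ... the real code would compare mh against the known
            // LambdaMetafactory handle here and invoke it directly ...
            return mh;
        }
        return generic.link(bsm);                 // generic (slow) path
    }

    public static void main(String[] args) throws Exception {
        MethodHandle abs = MethodHandles.lookup().findStatic(
                Math.class, "abs", MethodType.methodType(int.class, int.class));
        System.out.println(linkCallSite(abs, b -> b) == abs); // prints true
    }
}
```

Keeping the whole call inside one method, as suggested, avoids having to return both a success flag and a CallSite, which is the tension Sergey points out.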
RFR: Caching MethodType's descriptor string improves lambda linkage performance
Please review the webrev at:

http://cr.openjdk.java.net/~skuksenko/jsr335/8024635/webrev.00/

MethodType.toMethodDescriptorString() is frequently invoked when generating lambda classes from InnerClassLambdaMetafactory. Caching the resulting string in a field of MethodType gives +5% - +10% to lambda linkage performance.

Minor performance improvement: the private method checkPtype was inlined and eliminated. checkRtype and checkPtypes were refactored for better performance in the HotSpot interpreter (important for lambda linkage). Overall result: +1%.

--
Best regards,
Sergey Kuksenko
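The caching idea can be sketched in miniature. This is not the MethodType code and does not produce real JVM descriptor syntax; it only shows the lazy field-cache pattern the RFE describes, which is safe here because the owning object is effectively immutable.

```java
// Minimal sketch of caching a derived string in a field of an immutable
// object, as the RFE does for MethodType's descriptor string.
public class DescriptorCache {
    private final Class<?> rtype;
    private final Class<?>[] ptypes;
    private String desc; // cached descriptor, computed lazily

    DescriptorCache(Class<?> rtype, Class<?>... ptypes) {
        this.rtype = rtype;
        this.ptypes = ptypes;
    }

    String toDescriptorString() {
        String d = desc;
        if (d == null) {
            // Toy format, not real JVM descriptor syntax.
            StringBuilder sb = new StringBuilder("(");
            for (Class<?> pt : ptypes) sb.append(pt.getName()).append(';');
            d = sb.append(')').append(rtype.getName()).toString();
            desc = d; // benign race: any recomputation yields an equal string
        }
        return d;
    }

    public static void main(String[] args) {
        DescriptorCache dc = new DescriptorCache(void.class, int.class);
        // Second call returns the cached instance rather than rebuilding.
        System.out.println(dc.toDescriptorString() == dc.toDescriptorString());
    }
}
```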
RFR: Direct LambdaMetaFactory invocation from CallSite significantly improves lambda linkage performance
Please review the webrev at:

http://cr.openjdk.java.net/~skuksenko/jsr335/8024630/webrev.00/

LambdaMetafactory is quite a frequent bootstrap method for invokedynamic in JDK8. We can do a direct method (LambdaMetafactory) invocation as a fastpath when it is proved that the bootstrap MethodHandle points to LambdaMetafactory. The modification gives +10% - +35% to lambda linkage performance (depending on the amount of lambdas).

--
Best regards,
Sergey Kuksenko
Re: Measuring performance changes from applying 4837946 patch
Brian, could you show your benchmark? My quick experiments show that the current Karatsuba threshold is quite reasonable.

On 06/04/2013 12:36 PM, Aleksey Shipilev wrote:
> On 06/04/2013 04:27 AM, Brian Burkhalter wrote:
>> A) Is there some particular approach that should be used in testing
>> these algorithms for relative performance?
>
> The number one approach is peer review. Is there the JMH project with
> microbenchmarks somewhere? Sergey (cc'ed) knows a lot about
> BigInteger/BigDecimal performance, and the particular quirks you might
> be seeing.
>
>> B) Could there be something platform-dependent happening?
>
> It does not seem likely for pure Java BI/BD benchmarks. Thread
> scheduling might affect the results, but this seems non-essential for
> single-threaded benchmarks.
>
>> C) As currently implemented, the relative performance between
>> algorithms should be unaffected by thread count, correct?
>
> Nope, that is generally incorrect:
> a) allocation patterns are different; you can hit the allocation wall
> with multiple threads (which is realistic for highly loaded systems);
> the algorithm allocating more temporary objects may be faster in a
> single thread, but slower in multiple threads;
> b) cache contention effects might be more pronounced with more threads
> hitting the same caches;
> c) CMT-enabled machines will have significantly different performance
> with the thread count exceeding the number of cores;
> d) thread schedulers are known to make weird decisions about thread
> layout when there is the liberty of distributing N threads over M
> hardware threads (N < M).
>
> -Aleksey.

--
Best regards,
Sergey Kuksenko
Re: RFR: 8015522 - CharSequence.codePoints can be faster
Hi,

The JavaDoc for the method says: "If the sequence is mutated while the stream is being read, the result is undefined." What about exceptions? If the sequence is mutated and its length becomes smaller, you'll get an IndexOutOfBoundsException in the forEachRemaining method.

On 06/04/2013 12:33 AM, Henry Jen wrote:
> Hi,
>
> Please review
>
> http://cr.openjdk.java.net/~henryjen/tl/8015522.0/webrev/
>
> This webrev is based on a previous suggestion from Martin Buchholz,
> while keeping the lazy-binding nature without seizing the length at
> construction time. The benchmark shows a ~50% performance increase,
> mainly from the reduction of operations; for example, inlining
> block.accept instead of storing the character to a local var also helps.
>
> Cheers,
> Henry

--
Best regards,
Sergey Kuksenko
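The lazy binding being discussed is easy to observe from the public API: codePoints() does not read the sequence until the stream is consumed, so a mutation in between changes what the stream sees. A small sketch (behaviour beyond "the result is undefined" is implementation-dependent, which is exactly the review question above):

```java
import java.util.stream.IntStream;

// Shows codePoints() binding lazily to a mutable CharSequence: the
// StringBuilder is shortened after the stream is created but before it
// is consumed, and the stream sees the shortened sequence.
public class CodePointsMutation {
    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("hello");
        IntStream cps = sb.codePoints(); // nothing is read yet
        sb.setLength(2);                 // mutate before consumption
        long n = cps.count();            // traversal sees the current length
        System.out.println(n);
    }
}
```

Shrinking the sequence mid-traversal (rather than before it) is the scenario where an implementation walking stale indices could throw IndexOutOfBoundsException, as the email warns.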
Re: RFR: 4837946 - Faster multiplication and exponentiation of large integers
Hi Brian,

In the parallel thread I see a new implementation of CRC32 & Adler32. I don't mean the new HW intrinsics; I mean parallel operation using ForkJoinPool. I think BigInteger operations may be a better candidate for this kind of parallelization. For example, Karatsuba may be fork-joined naturally.

--
Best regards,
Sergey Kuksenko
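To illustrate why Karatsuba fork-joins naturally: each step splits one multiplication into three independent sub-multiplications, which map directly onto RecursiveTask. A sketch built on BigInteger's public API, not a patch to BigInteger itself; the threshold is an arbitrary placeholder and the sketch assumes non-negative operands.

```java
import java.math.BigInteger;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Karatsuba decomposition expressed as a fork/join task: the three
// independent products (high, low, middle) can run in parallel.
public class ParallelKaratsuba extends RecursiveTask<BigInteger> {
    private static final int THRESHOLD_BITS = 2048; // arbitrary cutoff for the sketch
    private final BigInteger x, y;

    ParallelKaratsuba(BigInteger x, BigInteger y) { this.x = x; this.y = y; }

    @Override
    protected BigInteger compute() {
        if (x.bitLength() < THRESHOLD_BITS || y.bitLength() < THRESHOLD_BITS)
            return x.multiply(y); // small enough: sequential multiply

        int h = Math.max(x.bitLength(), y.bitLength()) / 2;
        BigInteger xh = x.shiftRight(h), xl = x.subtract(xh.shiftLeft(h));
        BigInteger yh = y.shiftRight(h), yl = y.subtract(yh.shiftLeft(h));

        // Three independent products: fork two, compute one in this thread.
        ParallelKaratsuba highTask = new ParallelKaratsuba(xh, yh);
        ParallelKaratsuba lowTask  = new ParallelKaratsuba(xl, yl);
        highTask.fork();
        lowTask.fork();
        BigInteger mid  = new ParallelKaratsuba(xh.add(xl), yh.add(yl)).compute();
        BigInteger high = highTask.join();
        BigInteger low  = lowTask.join();

        // x*y = high*2^(2h) + (mid - high - low)*2^h + low
        return high.shiftLeft(2 * h)
                .add(mid.subtract(high).subtract(low).shiftLeft(h))
                .add(low);
    }

    public static void main(String[] args) {
        BigInteger a = BigInteger.ONE.shiftLeft(5000).subtract(BigInteger.ONE);
        BigInteger b = BigInteger.ONE.shiftLeft(4999).add(BigInteger.valueOf(12345));
        BigInteger r = ForkJoinPool.commonPool().invoke(new ParallelKaratsuba(a, b));
        System.out.println(r.equals(a.multiply(b))); // prints true
    }
}
```

Whether this wins in practice depends on the allocation and scheduling effects Aleksey lists in the benchmarking thread above.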