Re: PING! Re: JDK 9 RFR of 6375303: Review use of caching in BigDecimal
Hi Brian, My inclination is if it ain't broke... and AFAICT nothing indicates toString is particularly broken [*], so perhaps just focus on the cleanup aspects and revisit the other when the JMM is updated (and maybe enhanced volatiles is ready)? Paul.

[*] although i still question the use of thread locals, but that is a separate issue that requires more investigation and i don't know if it can be improved.

On Apr 10, 2014, at 10:45 PM, Brian Burkhalter brian.burkhal...@oracle.com wrote: Second day back from vacation so I guess it’s time to beat this horse again … As there was no response to the message included below I am simply re-posting it. Thanks, Brian

On Mar 25, 2014, at 7:19 AM, Brian Burkhalter brian.burkhal...@oracle.com wrote: On Mar 25, 2014, at 1:58 AM, Paul Sandoz paul.san...@oracle.com wrote: This is another example of a stable variable. I would like to re-iterate my scepticism that such changes are necessary in this case (i am not sure if it is possible to create a benchmark that could better exacerbate the concurrent overlap of calls to layoutChars). But, i do agree the discussion has been useful and interesting.

I am happy either to leave the toString() code as it is or to change it to the variant with toStringSlow(). There is however other cleanup in the patch to consider. So it would be good to get consensus on the two points:

1) Change toString() to the variant using toStringSlow() or leave it as-is.
2) Change non-toString() code as indicated in the patch or leave it as-is.

If “as-is” is the answer in both cases, then it’s simply a matter of resolving the enhancement as “not an issue.” Thanks, Brian
Re: PING! Re: JDK 9 RFR of 6375303: Review use of caching in BigDecimal
Hi Paul,

On Apr 15, 2014, at 12:23 AM, Paul Sandoz paul.san...@oracle.com wrote: Hi Brian, My inclination is if it ain't broke... and AFAICT nothing indicates toString is particularly broken [*], so perhaps just focus on the cleanup aspects and revisit the other when the JMM is updated (and maybe enhanced volatiles is ready)? Paul.

Fine by me. I’ll update the webrev. The other aspects could come under this issue already on file: https://bugs.openjdk.java.net/browse/JDK-8033491. Thanks, Brian

[*] although i still question the use of thread locals, but that is a separate issue that requires more investigation and i don't know if it can be improved.
Re: PING! Re: JDK 9 RFR of 6375303: Review use of caching in BigDecimal
On Apr 15, 2014, at 9:24 AM, Brian Burkhalter brian.burkhal...@oracle.com wrote: On Apr 15, 2014, at 12:23 AM, Paul Sandoz paul.san...@oracle.com wrote: My inclination is if it ain't broke... and AFAICT nothing indicates toString is particularly broken [*], so perhaps just focus on the cleanup aspects and revisit the other when the JMM is updated (and maybe enhanced volatiles is ready)?

I updated the patch to remove the toString() changes: http://cr.openjdk.java.net/~bpb/6375303/webrev.02/ Unless there are objections within the next 24 hours or so I will push this as-is.

Fine by me. I’ll update the webrev. The other aspects could come under this issue already on file: https://bugs.openjdk.java.net/browse/JDK-8033491.

I linked this issue to 6375303 and included links to the archive discussion thread. Thanks, Brian
PING! Re: JDK 9 RFR of 6375303: Review use of caching in BigDecimal
Second day back from vacation so I guess it’s time to beat this horse again … As there was no response to the message included below I am simply re-posting it. Thanks, Brian On Mar 25, 2014, at 7:19 AM, Brian Burkhalter brian.burkhal...@oracle.com wrote: On Mar 25, 2014, at 1:58 AM, Paul Sandoz paul.san...@oracle.com wrote: This is another example of a stable variable. I would like to re-iterate my scepticism that such changes are necessary in this case (i am not sure if it is possible to create a benchmark that could better exacerbate the concurrent overlap of calls to layoutChars). But, i do agree the discussion has been useful and interesting. I am happy either to leave the toString() code as it is or to change it to the variant with toStringSlow(). There is however other cleanup in the patch to consider. So it would be good to get consensus on the two points: 1) Change toString() to variant using toStringSlow() or leave it as-is. 2) Change non-toString() code as indicated in the patch or leave it as-is. If “as-is” is the answer in both cases, then it’s simply a matter of resolving the enhancement as “not an issue.” Thanks, Brian
Re: JDK 9 RFR of 6375303: Review use of caching in BigDecimal
On 03/24/2014 06:48 PM, Brian Burkhalter wrote: Hi Peter, On Mar 24, 2014, at 2:21 AM, Peter Levart peter.lev...@gmail.com wrote: Thanks to Aleksey for re-establishing the access, I bring you results of the microbenchmark from his quad-core Cortex-A9: Thanks to you and Aleksey for taking the initiative to test on that platform. ...as can be seen, the double-checked read-then-volatile-read+CAS trick is about 15% faster than classic volatile-read+CAS in this case. Just to be sure, by “double-checked read-then-volatile-read+CAS trick” you intend the variant with two methods toString/toStringSlow? That's right. I split the code into two methods in both cases to make sure it gets inlined for the fast path. The variant in your webrev with a single method has been demonstrated to exceed the maximum inline bytecode size. Regards, Peter Thanks, Brian
Re: JDK 9 RFR of 6375303: Review use of caching in BigDecimal
On Mar 24, 2014, at 8:30 PM, Mike Duigou mike.dui...@oracle.com wrote: The other issue here, and why this thread has gone on so long, is that we've been trying to establish what the best idiom is (for 2014-) for this lazy initialization situation so that we can reuse it with less thought and review. In this case the chances of a race are fairly low and the cost of an extra layoutChars is bearable. In other cases where we are likely to use the same idiom this is less likely true. Peter's current implementation has the advantage that it is the best-right answer for lazy initialization even in the contended/expensive case. I look forward to revisiting this again when the new JMM primitives are available. :-)

This is another example of a stable variable. I would like to re-iterate my scepticism that such changes are necessary in this case (i am not sure if it is possible to create a benchmark that could better exacerbate the concurrent overlap of calls to layoutChars). But, i do agree the discussion has been useful and interesting. Paul.
Re: JDK 9 RFR of 6375303: Review use of caching in BigDecimal
On Mar 25, 2014, at 1:58 AM, Paul Sandoz paul.san...@oracle.com wrote: This is another example of a stable variable. I would like to re-iterate my scepticism that such changes are necessary in this case (i am not sure if it is possible to create a benchmark that could better exacerbate the concurrent overlap of calls to layoutChars). But, i do agree the discussion has been useful and interesting. I am happy either to leave the toString() code as it is or to change it to the variant with toStringSlow(). There is however other cleanup in the patch to consider. So it would be good to get consensus on the two points: 1) Change toString() to variant using toStringSlow() or leave it as-is. 2) Change non-toString() code as indicated in the patch or leave it as-is. If “as-is” is the answer in both cases, then it’s simply a matter of resolving the enhancement as “not an issue.” Thanks, Brian
Re: JDK 9 RFR of 6375303: Review use of caching in BigDecimal
On 03/20/2014 08:49 AM, Aleksey Shipilev wrote: On 03/20/2014 11:06 AM, Peter Levart wrote: I was thinking about last night, for question: Why is this double-checked non-volatile-then-volatile trick not any faster than pure volatile variant even on ARM platform where volatile read should have some penalty compared to normal read?, might be in the fact that Raspberry Pi is a single-core/single-thread machine. Would anyone with JVM JIT compiler expertise care to share some insight? I suspect that on such platform, the compiler optimizes volatile accesses so that they are performed without otherwise necessary memory fences... Yes, at least C2 is known to not emit memory fences on uniprocessor machines. You need to have a multicore ARM. If you are still interested, contact me privately and I can arrange the access to my personal quad-core Cortex-A9. -Aleksey.

Hi, Thanks to Aleksey for re-establishing the access, I bring you results of the microbenchmark from his quad-core Cortex-A9:

JDK 8 options: -client, org.openjdk.jmh.Main parameters: .* -i 10 -r 5 -wi 5 -w 1 -f 1 [-t 1|max]

--- Baseline, 1-thread ---
Benchmark                            Mode  Samples        Mean  Mean error  Units
o.t.Bench6375303.testFirstToString   avgt       10   69292.305     299.516  ns/op
o.t.Bench6375303.testToString        avgt       10      20.003       0.433  ns/op

--- Baseline, 4-threads ---
Benchmark                            Mode  Samples        Mean  Mean error  Units
o.t.Bench6375303.testFirstToString   avgt       10  100390.024    2158.132  ns/op
o.t.Bench6375303.testToString        avgt       10      20.151       0.677  ns/op

--- double-checked nonvolatile-then-volatile-read+CAS, 1-thread ---
Benchmark                            Mode  Samples        Mean  Mean error  Units
o.t.Bench6375303.testFirstToString   avgt       10   69951.406     221.516  ns/op
o.t.Bench6375303.testToString        avgt       10      19.681       0.025  ns/op

--- double-checked nonvolatile-then-volatile-read+CAS, 4-threads ---
Benchmark                            Mode  Samples        Mean  Mean error  Units
o.t.Bench6375303.testFirstToString   avgt       10  104231.335    3842.095  ns/op
o.t.Bench6375303.testToString        avgt       10      20.030       0.595  ns/op

--- classic volatile read+CAS, 1-thread ---
Benchmark                            Mode  Samples        Mean  Mean error  Units
o.t.Bench6375303.testFirstToString   avgt       10   69753.542     180.110  ns/op
o.t.Bench6375303.testToString        avgt       10      23.285       0.267  ns/op

--- classic volatile read+CAS, 4-threads ---
Benchmark                            Mode  Samples        Mean  Mean error  Units
o.t.Bench6375303.testFirstToString   avgt       10   99664.256    1814.090  ns/op
o.t.Bench6375303.testToString        avgt       10      23.491       0.606  ns/op

...as can be seen, the double-checked read-then-volatile-read+CAS trick is about 15% faster than classic volatile-read+CAS in this case. Regards, Peter
Re: JDK 9 RFR of 6375303: Review use of caching in BigDecimal
Hi Peter,

On Mar 22, 2014, at 2:01 AM, Peter Levart peter.lev...@gmail.com wrote: Looks good. Just a nit. In the following method: 3726 private static void matchScale(BigDecimal[] val) { I concur. One of 3 ifs is superfluous. Either the 1st one:

private static void matchScale(BigDecimal[] val) {
    /* if (val[0].scale == val[1].scale) {
        // return
    } else */ if (val[0].scale < val[1].scale) {
        val[0] = val[0].setScale(val[1].scale, ROUND_UNNECESSARY);
    } else if (val[1].scale < val[0].scale) {
        val[1] = val[1].setScale(val[0].scale, ROUND_UNNECESSARY);
    }
}

I think this one. I’ll update the patch accordingly. Thanks, Brian
Re: JDK 9 RFR of 6375303: Review use of caching in BigDecimal
Hi Peter,

On Mar 24, 2014, at 2:21 AM, Peter Levart peter.lev...@gmail.com wrote: Thanks to Aleksey for re-establishing the access, I bring you results of the microbenchmark from his quad-core Cortex-A9:

Thanks to you and Aleksey for taking the initiative to test on that platform.

...as can be seen, the double-checked read-then-volatile-read+CAS trick is about 15% faster than classic volatile-read+CAS in this case.

Just to be sure, by “double-checked read-then-volatile-read+CAS trick” you intend the variant with two methods toString/toStringSlow? Thanks, Brian
Re: JDK 9 RFR of 6375303: Review use of caching in BigDecimal
On 03/24/2014 06:52 PM, Martin Buchholz wrote: On Wed, Mar 12, 2014 at 1:45 AM, Peter Levart peter.lev...@gmail.com wrote: What would the following cost?

private transient String stringCache;

public String toString() {
    String sc = stringCache;
    if (sc == null) {
        sc = (String) U.getObjectVolatile(this, STRING_CACHE_OFFSET);
        if (sc == null) {
            sc = layoutChars(true);
            if (!U.compareAndSwapObject(this, STRING_CACHE_OFFSET, null, sc)) {
                sc = (String) U.getObjectVolatile(this, STRING_CACHE_OFFSET);
            }
        }
    }
    return sc;
}

I feel I'm missing something. If read -> volatile read -> CAS works, then why wouldn't read -> CAS work and be slightly preferable, because races are unlikely?

public String toString() {
    String sc = stringCache;
    if (sc == null) {
        sc = layoutChars(true);
        if (!U.compareAndSwapObject(this, STRING_CACHE_OFFSET, null, sc)) {
            sc = (String) U.getObjectVolatile(this, STRING_CACHE_OFFSET);
        }
    }
    return sc;
}

...yeah, I thought about that too. In any case, the overhead of the volatile re-read is negligible in this case, since it happens on the slow path and it might reduce the chance of superfluous calls to layoutChars. Regards, Peter
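For readers without access to sun.misc.Unsafe, Martin's simpler read-then-CAS variant can be sketched with the public AtomicReferenceFieldUpdater API. Note one difference: the updater requires the field to be declared volatile, so the fast-path read here is a volatile read, unlike the plain read in the Unsafe version. The class name and compute() method below are hypothetical stand-ins, not BigDecimal's actual code:

```java
import java.util.concurrent.atomic.AtomicReferenceFieldUpdater;

public class LazyName {
    // The updater API requires the field to be volatile.
    private volatile String cache;

    private static final AtomicReferenceFieldUpdater<LazyName, String> CACHE =
        AtomicReferenceFieldUpdater.newUpdater(LazyName.class, String.class, "cache");

    public String name() {
        String s = cache;
        if (s == null) {
            s = compute();                  // may run in several threads at once; that race is benign
            if (!CACHE.compareAndSet(this, null, s))
                s = cache;                  // lost the race: take the winner's value
        }
        return s;
    }

    // stand-in for an expensive one-time computation like layoutChars(true)
    private String compute() {
        return "expensive-result";
    }
}
```

Once the CAS (or the re-read after a lost CAS) has published a value, every later call returns the same String instance, which is the cache-identity property the thread cares about.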
Re: JDK 9 RFR of 6375303: Review use of caching in BigDecimal
On Mar 24 2014, at 12:25 , Martin Buchholz marti...@google.com wrote: On Mon, Mar 24, 2014 at 11:25 AM, Peter Levart peter.lev...@gmail.com wrote: On 03/24/2014 06:52 PM, Martin Buchholz wrote: On Wed, Mar 12, 2014 at 1:45 AM, Peter Levart peter.lev...@gmail.com wrote: What would the following cost?

private transient String stringCache;

public String toString() {
    String sc = stringCache;
    if (sc == null) {
        sc = (String) U.getObjectVolatile(this, STRING_CACHE_OFFSET);
        if (sc == null) {
            sc = layoutChars(true);
            if (!U.compareAndSwapObject(this, STRING_CACHE_OFFSET, null, sc)) {
                sc = (String) U.getObjectVolatile(this, STRING_CACHE_OFFSET);
            }
        }
    }
    return sc;
}

I feel I'm missing something. If read -> volatile read -> CAS works, then why wouldn't read -> CAS work and be slightly preferable, because races are unlikely?

public String toString() {
    String sc = stringCache;
    if (sc == null) {
        sc = layoutChars(true);
        if (!U.compareAndSwapObject(this, STRING_CACHE_OFFSET, null, sc)) {
            sc = (String) U.getObjectVolatile(this, STRING_CACHE_OFFSET);
        }
    }
    return sc;
}

...yeah, I thought about that too. In any case, the overhead of the volatile re-read is negligible in this case, since it happens on the slow path and it might reduce the chance of superfluous calls to layoutChars.

Hmmm OK. I still slightly prefer my version, but I can see there is a tradeoff, and the difference is very small. The other issue here, and why this thread has gone on so long, is that we've been trying to establish what the best idiom is (for 2014-) for this lazy initialization situation so that we can reuse it with less thought and review. In this case the chances of a race are fairly low and the cost of an extra layoutChars is bearable. In other cases where we are likely to use the same idiom this is less likely true. Peter's current implementation has the advantage that it is the best-right answer for lazy initialization even in the contended/expensive case.

I look forward to revisiting this again when the new JMM primitives are available. :-) Mike
Re: JDK 9 RFR of 6375303: Review use of caching in BigDecimal
On 03/22/2014 02:01 AM, Brian Burkhalter wrote: On Mar 20, 2014, at 12:49 AM, Aleksey Shipilev aleksey.shipi...@oracle.com wrote: On 03/20/2014 11:06 AM, Peter Levart wrote: I was thinking about last night, for question: Why is this double-checked non-volatile-then-volatile trick not any faster than pure volatile variant even on ARM platform where volatile read should have some penalty compared to normal read?, might be in the fact that Raspberry Pi is a single-core/single-thread machine. Would anyone with JVM JIT compiler expertise care to share some insight? I suspect that on such platform, the compiler optimizes volatile accesses so that they are performed without otherwise necessary memory fences... Yes, at least C2 is known to not emit memory fences on uniprocessor machines. You need to have a multicore ARM. If you are still interested, contact me privately and I can arrange the access to my personal quad-core Cortex-A9. This is all very interesting but I would like to have closure so to speak on the patches that I posted.

Hi Brian, I think the questions still unresolved are:

1) Is a change of toString() necessary?

I can't speak of that since I don't have empirical data in what proportion of code it would be beneficial. I can imagine situations where returned strings are used as keys in a HashMap and String's equals is optimized for same-instance comparison, but I don't know if such situations happen in practice...

2) If #1 is “yes” then which version of the patch? (I think the most recent)

Aleksey Shipilev kindly let me access his quad-core ARM big iron and before the connection to it broke yesterday, I managed to get the results of a single-threaded run of the microbenchmark. While differences were not spectacular, they were measurable (sorry, I haven't saved the results). The variant with the volatile stringCache field and CAS was a little slower than the double-checked nonvolatile-then-volatile-read-and-CAS trick.

It's also important that the fast-path method does not exceed the maximum bytecode size for inlining (35 bytes by default, I think) so the variant with two methods toString/toStringSlow is necessary to avoid a performance regression.

3) Are the other changes OK which are unrelated to toString()?

Looks good. Just a nit. In the following method:

3726     private static void matchScale(BigDecimal[] val) {
3727         if (val[0].scale == val[1].scale) {
3728             // return
3729         } else if (val[0].scale < val[1].scale) {
3730             val[0] = val[0].setScale(val[1].scale, ROUND_UNNECESSARY);
3731         } else if (val[1].scale < val[0].scale) {
3732             val[1] = val[1].setScale(val[0].scale, ROUND_UNNECESSARY);
3733         }
3734     }

One of 3 ifs is superfluous. Either the 1st one:

private static void matchScale(BigDecimal[] val) {
    /* if (val[0].scale == val[1].scale) {
        // return
    } else */ if (val[0].scale < val[1].scale) {
        val[0] = val[0].setScale(val[1].scale, ROUND_UNNECESSARY);
    } else if (val[1].scale < val[0].scale) {
        val[1] = val[1].setScale(val[0].scale, ROUND_UNNECESSARY);
    }
}

...or the last one:

private static void matchScale(BigDecimal[] val) {
    if (val[0].scale == val[1].scale) {
        // return
    } else if (val[0].scale < val[1].scale) {
        val[0] = val[0].setScale(val[1].scale, ROUND_UNNECESSARY);
    } else /* if (val[1].scale < val[0].scale) */ {
        val[1] = val[1].setScale(val[0].scale, ROUND_UNNECESSARY);
    }
}

Regards, Peter Thanks, Brian
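As a runnable illustration of the matchScale nit above, here is a sketch with the redundant equality branch removed: when the two scales are equal, neither comparison fires and the method simply returns. The Num type below is a made-up stand-in for BigDecimal (a long unscaled value plus an int scale), not real JDK code:

```java
public class MatchScaleDemo {
    // Hypothetical stand-in mimicking the shape of BigDecimal's (unscaledValue, scale) pair.
    static final class Num {
        final long unscaled;
        final int scale;
        Num(long unscaled, int scale) { this.unscaled = unscaled; this.scale = scale; }
        // Raise the scale by multiplying the unscaled value by 10^(newScale - scale);
        // this is lossless, so it corresponds to ROUND_UNNECESSARY.
        Num setScale(int newScale) {
            long u = unscaled;
            for (int i = scale; i < newScale; i++) u *= 10;
            return new Num(u, newScale);
        }
    }

    // Two branches suffice: the element with the smaller scale is rescaled up
    // to match the other; equal scales fall through untouched.
    static void matchScale(Num[] val) {
        if (val[0].scale < val[1].scale) {
            val[0] = val[0].setScale(val[1].scale);
        } else if (val[1].scale < val[0].scale) {
            val[1] = val[1].setScale(val[0].scale);
        }
    }
}
```

For example, matching 1.5 (unscaled 15, scale 1) against 0.123 (unscaled 123, scale 3) rescales the first to unscaled 1500, scale 3.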
Re: JDK 9 RFR of 6375303: Review use of caching in BigDecimal
On Mar 20, 2014, at 12:49 AM, Aleksey Shipilev aleksey.shipi...@oracle.com wrote: On 03/20/2014 11:06 AM, Peter Levart wrote: I was thinking about last night, for question: Why is this double-checked non-volatile-then-volatile trick not any faster than pure volatile variant even on ARM platform where volatile read should have some penalty compared to normal read?, might be in the fact that Raspberry Pi is a single-core/single-thread machine. Would anyone with JVM JIT compiler expertise care to share some insight? I suspect that on such platform, the compiler optimizes volatile accesses so that they are performed without otherwise necessary memory fences... Yes, at least C2 is known to not emit memory fences on uniprocessor machines. You need to have a multicore ARM. If you are still interested, contact me privately and I can arrange the access to my personal quad-core Cortex-A9. This is all very interesting but I would like to have closure so to speak on the patches that I posted. I think the questions still unresolved are: 1) Is a change of toString() necessary? 2) If #1 is “yes” then which version of the patch? (I think the most recent) 3) Are the other changes OK which are unrelated to toString()? Thanks, Brian
Re: JDK 9 RFR of 6375303: Review use of caching in BigDecimal
On 03/20/2014 11:06 AM, Peter Levart wrote: I was thinking about last night, for question: Why is this double-checked non-volatile-then-volatile trick not any faster than pure volatile variant even on ARM platform where volatile read should have some penalty compared to normal read?, might be in the fact that Raspberry Pi is a single-core/single-thread machine. Would anyone with JVM JIT compiler expertise care to share some insight? I suspect that on such platform, the compiler optimizes volatile accesses so that they are performed without otherwise necessary memory fences... Yes, at least C2 is known to not emit memory fences on uniprocessor machines. You need to have a multicore ARM. If you are still interested, contact me privately and I can arrange the access to my personal quad-core Cortex-A9. -Aleksey.
Re: JDK 9 RFR of 6375303: Review use of caching in BigDecimal
On Mar 14, 2014, at 7:17 AM, Brian Burkhalter brian.burkhal...@oracle.com wrote: On Mar 14, 2014, at 3:39 AM, Peter Levart wrote: But in general it would be better to just use ThreadLocalRandom.current() everywhere you use the rnd variable. This is precisely its purpose - a random number generator that is never contended. The overhead of a ThreadLocalRandom.current() call is hardly measurable by itself. I'll update that and re-run some of the benchmarks later.

Following up on the content above and this earlier message in the thread: http://mail.openjdk.java.net/pipermail/core-libs-dev/2014-March/025676.html I have posted a revised patch (NB: I know lines 2897-2906 should be elsewhere) http://cr.openjdk.java.net/~bpb/6375303/webrev.01/ and updated benchmark source (using only ThreadLocalRandom.current()) http://cr.openjdk.java.net/~bpb/6375303/Bench6375303.java and updated benchmark results for three different variations http://cr.openjdk.java.net/~bpb/6375303/6375303-bench-2.html

This version of toString() is from Peter and dispenses with the volatile qualifier on stringCache. At least on my system, there is no statistically significant micro-performance difference among the three versions tested, viz., baseline, toString() change only, toString() change plus other cleanup. Any comments appreciated. Thanks, Brian
Re: JDK 9 RFR of 6375303: Review use of caching in BigDecimal
On 03/19/2014 11:01 PM, Brian Burkhalter wrote: On Mar 14, 2014, at 7:17 AM, Brian Burkhalter brian.burkhal...@oracle.com wrote: On Mar 14, 2014, at 3:39 AM, Peter Levart wrote: But in general it would be better to just use ThreadLocalRandom.current() everywhere you use the rnd variable. This is precisely its purpose - a random number generator that is never contended. The overhead of a ThreadLocalRandom.current() call is hardly measurable by itself. I'll update that and re-run some of the benchmarks later.

Following up on the content above and this earlier message in the thread: http://mail.openjdk.java.net/pipermail/core-libs-dev/2014-March/025676.html I have posted a revised patch (NB: I know lines 2897-2906 should be elsewhere) http://cr.openjdk.java.net/~bpb/6375303/webrev.01/ and updated benchmark source (using only ThreadLocalRandom.current()) http://cr.openjdk.java.net/~bpb/6375303/Bench6375303.java and updated benchmark results for three different variations http://cr.openjdk.java.net/~bpb/6375303/6375303-bench-2.html

This version of toString() is from Peter and dispenses with the volatile qualifier on stringCache. At least on my system, there is no statistically significant micro-performance difference among the three versions tested, viz., baseline, toString() change only, toString() change plus other cleanup. Any comments appreciated.
Thanks, Brian

Hi Brian, Here's my promised run of your latest webrev and microbenchmark on the ARM platform (Raspberry Pi) with the just-released JDK 8 for ARM (-client compiler, since -server does not work on Raspberry Pi):

org.openjdk.jmh.Main parameters: .* -i 10 -r 5 -wi 5 -w 1 -f 1 -t 1

--- Baseline, 1-thread ---
Benchmark                            Mode  Samples        Mean  Mean error  Units
o.s.Bench6375303.testFirstToString   avgt       10  330618.266    2211.637  ns/op
o.s.Bench6375303.testToString        avgt       10      80.546       0.134  ns/op

--- Proposed webrev, 1-thread ---
Benchmark                            Mode  Samples        Mean  Mean error  Units
o.s.Bench6375303.testFirstToString   avgt       10  326588.284    1714.892  ns/op
o.s.Bench6375303.testToString        avgt       10     102.582       0.295  ns/op

--- Previous variant with volatile stringCache field, 1-thread ---
Benchmark                            Mode  Samples        Mean  Mean error  Units
o.s.Bench6375303.testFirstToString   avgt       10  328795.783    2508.173  ns/op
o.s.Bench6375303.testToString        avgt       10     105.741       0.316  ns/op

So both variants seem to be more or less the same but slower than baseline. Why would they be slower? Answer: they have more bytecodes. If I run with the following JVM options: -XX:+UnlockDiagnosticVMOptions -XX:MaxInlineSize=100 (and only the testToString benchmark), I get:

--- Baseline, 1-thread ---
Benchmark                       Mode  Samples    Mean  Mean error  Units
o.s.Bench6375303.testToString   avgt       10  80.839       0.742  ns/op

--- Proposed webrev, 1-thread ---
Benchmark                       Mode  Samples    Mean  Mean error  Units
o.s.Bench6375303.testToString   avgt       10  80.851       0.771  ns/op

--- Previous variant with volatile stringCache field, 1-thread ---
Benchmark                       Mode  Samples    Mean  Mean error  Units
o.s.Bench6375303.testToString   avgt       10  80.834       0.749  ns/op

So the solution is to reduce the number of bytecodes in toString().

For example, the following:

public String toString() {
    String sc = stringCache;
    if (sc == null) {
        sc = toStringSlow();
    }
    return sc;
}

private String toStringSlow() {
    String sc = (String) U.getObjectVolatile(this, STRING_CACHE_OFFSET);
    if (sc == null) {
        sc = layoutChars(true);
        if (!U.compareAndSwapObject(this, STRING_CACHE_OFFSET, null, sc)) {
            sc = (String) U.getObjectVolatile(this, STRING_CACHE_OFFSET);
        }
    }
    return sc;
}

...gives the good results even without special JVM options:

Benchmark                       Mode  Samples    Mean  Mean error  Units
o.s.Bench6375303.testToString   avgt       10  80.925       0.313  ns/op

Regards, Peter
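The "enhanced volatiles" mentioned in this thread eventually shipped as VarHandles in Java 9 (JEP 193). As a rough sketch, Peter's two-method idiom could be expressed today without sun.misc.Unsafe as below; the class name and the layoutChars() stand-in are hypothetical, not BigDecimal's actual implementation:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class CachedToString {
    private transient String stringCache;   // plain field; volatile access goes via the VarHandle

    private static final VarHandle STRING_CACHE;
    static {
        try {
            STRING_CACHE = MethodHandles.lookup()
                .findVarHandle(CachedToString.class, "stringCache", String.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    @Override
    public String toString() {
        String sc = stringCache;            // racy plain read: fast path, small enough to inline
        if (sc == null)
            sc = toStringSlow();
        return sc;
    }

    private String toStringSlow() {
        String sc = (String) STRING_CACHE.getVolatile(this);
        if (sc == null) {
            sc = layoutChars();
            if (!STRING_CACHE.compareAndSet(this, null, sc))
                sc = (String) STRING_CACHE.getVolatile(this);   // another thread won the race
        }
        return sc;
    }

    // stand-in for BigDecimal.layoutChars(true)
    private String layoutChars() {
        return "42";
    }
}
```

The fast path compiles to a plain load plus a null check, keeping toString() under the inlining threshold, exactly the property Peter's measurements above show matters.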
Re: JDK 9 RFR of 6375303: Review use of caching in BigDecimal
On Mar 19 2014, at 15:01 , Brian Burkhalter brian.burkhal...@oracle.com wrote: On Mar 14, 2014, at 7:17 AM, Brian Burkhalter brian.burkhal...@oracle.com wrote: On Mar 14, 2014, at 3:39 AM, Peter Levart wrote: But in general it would be better to just use ThreadLocalRandom.current() everywhere you use the rnd variable. This is precisely its purpose - a random number generator that is never contended. The overhead of a ThreadLocalRandom.current() call is hardly measurable by itself. I'll update that and re-run some of the benchmarks later. Following up on the content above and this earlier message in the thread: http://mail.openjdk.java.net/pipermail/core-libs-dev/2014-March/025676.html I have posted a revised patch (NB: I know lines 2897-2906 should be elsewhere) http://cr.openjdk.java.net/~bpb/6375303/webrev.01/ and updated benchmark source (using only ThreadLocalRandom.current()) http://cr.openjdk.java.net/~bpb/6375303/Bench6375303.java and updated benchmark results for three different variations http://cr.openjdk.java.net/~bpb/6375303/6375303-bench-2.html This version of toString() is from Peter and dispenses with the volatile qualifier on stringCache. At least on my system, there is no statistically significant micro-performance difference among the three versions tested, viz., baseline, toString() change only, toString() change plus other cleanup.

Since the Unsafe.getObjectVolatile() allows use of volatile semantics without having to declare the field volatile I think this is a better idiom than what I had previously suggested. Hopefully this is a pattern we can use elsewhere. Are the java.util.concurrent.atomic imports still needed? I have not reviewed the other changes. Mike
Re: JDK 9 RFR of 6375303: Review use of caching in BigDecimal
On Mar 19, 2014, at 4:32 PM, Mike Duigou mike.dui...@oracle.com wrote: Since the Unsafe.getObjectVolatile() allows use of volatile semantics without having to declare the field volatile I think this is a better idiom than what I had previously suggested. Hopefully this is a pattern we can use elsewhere. Are the java.util.concurrent.atomic imports still needed?

No they are not. I can remove them (and move the code at lines 2897-2906 to follow coding conventions) if this is eventually approved.

I have not reviewed the other changes.

Aside from toString() they are mostly straightforward cleanup. Thanks, Brian
Re: JDK 9 RFR of 6375303: Review use of caching in BigDecimal
Hi Peter,

On Mar 19, 2014, at 4:32 PM, Peter Levart peter.lev...@gmail.com wrote: So the solution is to reduce the number of bytecodes in toString(). For example, the following:

public String toString() {
    String sc = stringCache;
    if (sc == null) {
        sc = toStringSlow();
    }
    return sc;
}

private String toStringSlow() {
    String sc = (String) U.getObjectVolatile(this, STRING_CACHE_OFFSET);
    if (sc == null) {
        sc = layoutChars(true);
        if (!U.compareAndSwapObject(this, STRING_CACHE_OFFSET, null, sc)) {
            sc = (String) U.getObjectVolatile(this, STRING_CACHE_OFFSET);
        }
    }
    return sc;
}

...gives the good results even without special JVM options:

Benchmark                       Mode  Samples    Mean  Mean error  Units
o.s.Bench6375303.testToString   avgt       10  80.925       0.313  ns/op

That’s great! I can re-try that version on my system for comparison. Thanks, Brian
Re: JDK 9 RFR of 6375303: Review use of caching in BigDecimal
On Mar 19, 2014, at 4:41 PM, Brian Burkhalter brian.burkhal...@oracle.com wrote:

Benchmark                       Mode  Samples    Mean  Mean error  Units
o.s.Bench6375303.testToString   avgt       10  80.925       0.313  ns/op

That’s great! I can re-try that version on my system for comparison.

Here are the results on my system: http://cr.openjdk.java.net/~bpb/6375303/6375303-bench-3.html Thanks, Brian
Re: JDK 9 RFR of 6375303: Review use of caching in BigDecimal
On 03/14/2014 01:29 AM, Brian Burkhalter wrote: On Mar 12, 2014, at 2:08 AM, Peter Levart wrote: Huh! This is a ThreadLocalRandom anti-pattern. Thou should not use a ThreadLocalRandom instance in a thread that did not obtain it via a call to ThreadLocalRandom.current()…

Good point.

You could create a new BigInteger(512, rnd) in the static initializer and use it to create a new BigDecimal(bigInteger) in testFirstToString.

I simply replaced ThreadLocalRandom with Random which probably creates a bottleneck but there is no significant difference in the results.

That's probably because the time spent computing the toString representation initially (the concurrent part of the testFirstToString benchmark) takes much more time than computing the random number (the synchronized/serialized part). The difference would probably show with a bigger number of threads on a suitable machine of course. But in general it would be better to just use ThreadLocalRandom.current() everywhere you use the rnd variable. This is precisely its purpose - a random number generator that is never contended. The overhead of a ThreadLocalRandom.current() call is hardly measurable by itself. Regards, Peter

http://cr.openjdk.java.net/~bpb/6375303/Bench6375303.java http://cr.openjdk.java.net/~bpb/6375303/6375303-bench.html Thanks, Brian
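To illustrate Peter's point, a minimal sketch of the correct usage: each thread obtains its own generator by calling ThreadLocalRandom.current() at the point of use, rather than caching an instance obtained in another thread. The class name here is made up for the example:

```java
import java.util.concurrent.ThreadLocalRandom;

public class TlrUsage {
    // Correct: call current() in the thread that uses the generator.
    // The call is cheap and the returned instance is never contended.
    static int rollDie() {
        return ThreadLocalRandom.current().nextInt(1, 7);   // uniform in [1, 6]
    }

    public static void main(String[] args) throws InterruptedException {
        // Anti-pattern (do NOT do this): capturing the instance in one thread
        // and handing it to another defeats the thread-local design:
        //   ThreadLocalRandom captured = ThreadLocalRandom.current();
        //   new Thread(() -> captured.nextInt()).start();   // wrong

        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                int roll = rollDie();   // per-thread generator, no contention
                if (roll < 1 || roll > 6) throw new AssertionError("out of range: " + roll);
            });
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        System.out.println("all rolls in range");
    }
}
```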
Re: JDK 9 RFR of 6375303: Review use of caching in BigDecimal
On 03/14/2014 01:32 AM, Brian Burkhalter wrote: On Mar 12, 2014, at 1:45 AM, Peter Levart wrote: What would the following cost? private transient String stringCache; public String toString() { […] This version did not show any significant difference in the benchmark results (updated) http://cr.openjdk.java.net/~bpb/6375303/6375303-bench.html versus the other approaches. This is not to say that the benchmark is valid for any of them. This approach does have the advantage of not using volatile.

I've forgotten to run this on my Raspberry Pi. Stay tuned... Peter Thanks, Brian
Re: JDK 9 RFR of 6375303: Review use of caching in BigDecimal
On Mar 14, 2014, at 3:39 AM, Peter Levart wrote: But in general it would be better to just use ThreadLocalRandom.current() everywhere you use the rnd variable. This is precisely its purpose - a random number generator that is never contended. The overhead of a ThreadLocalRandom.current() call is hardly measurable by itself.

I'll update that and re-run some of the benchmarks later. Thanks, Brian
Re: JDK 9 RFR of 6375303: Review use of caching in BigDecimal
On Mar 14, 2014, at 3:42 AM, Peter Levart wrote: This approach does have the advantage of not using volatile. I forgot to run this on my Raspberry Pi. Stay tuned... I used to have a BeagleBoard on a previous project but do not have any of these sorts of devices at my disposal at this time. Thanks, Brian
Re: JDK 9 RFR of 6375303: Review use of caching in BigDecimal
On Mar 12, 2014, at 2:08 AM, Peter Levart wrote: Huh! This is a ThreadLocalRandom anti-pattern. Thou should not use a ThreadLocalRandom instance in a thread that did not obtain it via a call to ThreadLocalRandom.current()… Good point. You could create a new BigInteger(512, rnd) in the static initializer and use it to create a new BigDecimal(bigInteger) in testFirstToString. I simply replaced ThreadLocalRandom with Random, which probably creates a bottleneck, but there is no significant difference in the results. http://cr.openjdk.java.net/~bpb/6375303/Bench6375303.java http://cr.openjdk.java.net/~bpb/6375303/6375303-bench.html Thanks, Brian
Re: JDK 9 RFR of 6375303: Review use of caching in BigDecimal
On Mar 12, 2014, at 1:45 AM, Peter Levart wrote: What would the following cost? private transient String stringCache; public String toString() { […] This version did not show any significant difference in the benchmark results (updated) http://cr.openjdk.java.net/~bpb/6375303/6375303-bench.html versus the other approaches. This is not to say that the benchmark is valid for any of them. This approach does have the advantage of not using volatile. Thanks, Brian
Re: JDK 9 RFR of 6375303: Review use of caching in BigDecimal
On 03/11/2014 06:07 PM, Mike Duigou wrote: You are making stringCache volatile - performance penalty on ARM PPC. It is understood that the volatile is not free. For this purpose it was believed that the performance cost was a fair trade-off to avoid the heap pollution. Regardless of the decision in this case, the method used would seem to be the best caching pattern available. We do need to establish when it is appropriate to use this solution and what the tradeoffs are that would make other solutions more appropriate. Hi, What would the following cost?

    private transient String stringCache;

    public String toString() {
        String sc = stringCache;
        if (sc == null) {
            sc = (String) U.getObjectVolatile(this, STRING_CACHE_OFFSET);
            if (sc == null) {
                sc = layoutChars(true);
                if (!U.compareAndSwapObject(this, STRING_CACHE_OFFSET, null, sc)) {
                    sc = (String) U.getObjectVolatile(this, STRING_CACHE_OFFSET);
                }
            }
        }
        return sc;
    }

    private static final Unsafe U = Unsafe.getUnsafe();
    private static final long STRING_CACHE_OFFSET;
    static {
        try {
            STRING_CACHE_OFFSET = U.objectFieldOffset(BigDecimal.class.getDeclaredField("stringCache"));
        } catch (NoSuchFieldException e) {
            throw new Error(e);
        }
    }

Regards, Peter
Re: JDK 9 RFR of 6375303: Review use of caching in BigDecimal
On 03/11/2014 06:10 PM, Brian Burkhalter wrote: Sergey, On Mar 11, 2014, at 1:18 AM, Sergey Kuksenko wrote: Could you share your benchmarks? Of course. Please see: benchmark source: http://cr.openjdk.java.net/~bpb/6375303/Bench6375303.java benchmark results: http://cr.openjdk.java.net/~bpb/6375303/6375303-bench.html Let us know should you find anything amiss. Do you have access to an ARM system (like Raspberry Pi ?) On Intel, volatile read is practically free - especially in micro benchmarks, when compared to normal read. The normal read can't be hoisted out of loops (the microbenchmark harness tries hard to prevent that)... I can try this on my Raspberry Pi when I get home. ;-) It would also be helpful if the JIT-ed assembly for the toString method could be dumped in various scenarios. Regards, Peter Thanks, Brian
Re: JDK 9 RFR of 6375303: Review use of caching in BigDecimal
On 03/11/2014 06:10 PM, Brian Burkhalter wrote: Sergey, On Mar 11, 2014, at 1:18 AM, Sergey Kuksenko wrote: Could you share your benchmarks? Of course. Please see: benchmark source: http://cr.openjdk.java.net/~bpb/6375303/Bench6375303.java

    public class Bench6375303 {
        static final Random rnd = ThreadLocalRandom.current();
        static final BigDecimal BIG_D = new BigDecimal(new BigInteger(512, rnd));
        static final String BIG_D_STRING = BIG_D.toString();

        @GenerateMicroBenchmark
        @BenchmarkMode(Mode.AverageTime)
        @OutputTimeUnit(TimeUnit.NANOSECONDS)
        public void testFirstToString(BlackHole bh) {
            BigDecimal bigD = new BigDecimal(new BigInteger(512, rnd));
            bh.consume(bigD.toString());
        }

        @GenerateMicroBenchmark
        @BenchmarkMode(Mode.AverageTime)
        @OutputTimeUnit(TimeUnit.NANOSECONDS)
        public void testToString(BlackHole bh) {
            bh.consume(BIG_D.toString());
        }
    }

Huh! This is a ThreadLocalRandom anti-pattern. Thou should not use a ThreadLocalRandom instance in a thread that did not obtain it via a call to ThreadLocalRandom.current()... You could create a new BigInteger(512, rnd) in the static initializer and use it to create a new BigDecimal(bigInteger) in testFirstToString. Regards, Peter benchmark results: http://cr.openjdk.java.net/~bpb/6375303/6375303-bench.html Let us know should you find anything amiss. Thanks, Brian
Re: JDK 9 RFR of 6375303: Review use of caching in BigDecimal
On Mar 11, 2014, at 9:05 AM, Sergey Kuksenko sergey.kukse...@oracle.com wrote: Brian, Mike. Could you explain what is the problem with the old caching? String is immutable, BigDecimal is immutable. So we have data race at that place, but it is safe data race. What is the problem if we create several identical strings? I was wondering the same. Not convinced it is so very important for toString() to guarantee one and only one reference to the same value. I would be inclined to leave this alone unless some data indicates otherwise. I do question the use of ThreadLocal to, in effect, cache a StringBuilder and a char array of 19 characters for memory reuse (that is heap pollution per thread! :-) ). This code was introduced ~5 years ago; perhaps it is time to revisit and measure using SPEC* benchmarks? Paul. You are making stringCache volatile - performance penalty on ARM PPC. Setting cache via CAS - performance penalty everywhere. If you insist to use AtomicFieldUpdater - the better way is to use lazySet, not CAS.
Re: JDK 9 RFR of 6375303: Review use of caching in BigDecimal
Brian, Mike. Could you explain what is the problem with the old caching? String is immutable, BigDecimal is immutable. So we have data race at that place, but it is safe data race. What is the problem if we create several identical strings? You are making stringCache volatile - performance penalty on ARM PPC. Setting cache via CAS - performance penalty everywhere. If you insist to use AtomicFieldUpdater - the better way is to use lazySet, not CAS. On 03/04/2014 09:21 PM, Mike Duigou wrote: On Mar 4 2014, at 07:13 , Peter Levart peter.lev...@gmail.com wrote: On 03/04/2014 01:14 AM, Brian Burkhalter wrote: - add AtomicReferenceFieldUpdater-type constant for stringCache initialization Hi Brian, By using volatile read and CAS, there's still a chance that multiple concurrent threads will be invoking the layoutChars(true) concurrently, but you guarantee that only a single instance of String will ever be returned as a result of the toString() method. Is that what you were pursuing? Yes. (I provided the AtomicReferenceFieldUpdater code). The previous implementation had a benign data race that could result in a later layoutChars result replacing an earlier result and multiple string instances being returned. The new implementation, at small cost, prevents multiple different instances from being returned. Mike -- Best regards, Sergey Kuksenko
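The "old caching" Sergey defends is the classic racy single-check idiom: a plain (non-volatile) field holding an immutable String. The sketch below is illustrative, not the JDK source; the class name is invented and `toPlainString()` stands in for the private `layoutChars(true)`:

```java
import java.math.BigDecimal;

// Racy single-check cache. Concurrent callers may each compute and publish
// their own copy of the string; the copies are identical and String is
// safely publishable (its fields are final), so the data race is benign.
class CachedToString {
    private final BigDecimal value;
    private String stringCache; // plain field: benign data race

    CachedToString(BigDecimal value) { this.value = value; }

    @Override
    public String toString() {
        String sc = stringCache;
        if (sc == null) {
            sc = value.toPlainString(); // stand-in for layoutChars(true)
            stringCache = sc;           // racy write; a later writer may overwrite
        }
        return sc;
    }
}
```

The cost is that two threads racing on the first call can each return a different (but equal) String instance, which is exactly the "heap pollution" the CAS version was meant to avoid.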
Re: JDK 9 RFR of 6375303: Review use of caching in BigDecimal
Brian, Could you share your benchmarks? On 03/04/2014 04:14 AM, Brian Burkhalter wrote: Hello all, This review request concerns this issue and proposed patch: issue https://bugs.openjdk.java.net/browse/JDK-6375303 patch http://cr.openjdk.java.net/~bpb/6375303/webrev.00/ A review of the current code did not reveal anything clearly amiss in the nature of the problems which instigated the original filing of this issue. Examination of the code did however suggest some code style and aesthetic improvements as well as a change to how one lazy initialization was implemented (here we go again ...). Summary: - clean up imports section - add AtomicReferenceFieldUpdater-type constant for stringCache initialization - change some existing static final constants to conform to Java coding style - change the names of some local variables so as not to hide instance variables of the same name - add missing @Override annotations - modify lazy initialization of stringCache in toString() to use the AtomicReferenceFieldUpdater constant - various other minor cleanup Note that the performance of initialization of stringCache was measured for the extant double-checked locking approach, an AtomicReference<String>, and the AtomicReferenceFieldUpdater, and no significant degradation was observed for either of the latter versus the baseline for 1 and 4 threads. The updater has the advantage of adding only one object per class as opposed to one per instance. Thanks, Brian -- Best regards, Sergey Kuksenko
Re: JDK 9 RFR of 6375303: Review use of caching in BigDecimal
On Mar 11 2014, at 01:05 , Sergey Kuksenko sergey.kukse...@oracle.com wrote: Brian, Mike. Could you explain what is the problem with the old caching? Concern over heap pollution with extra string copies was the motivation to ensure that only a single cached instance was ever returned. String is immutable, BigDecimal is immutable. So we have data race at that place, but it is safe data race. Agreed that it is safe. What is the problem if we create several identical strings? We can still create several identical strings. We will now only return the first assigned to stringCache. Other string instances will not escape. You are making stringCache volatile - performance penalty on ARM PPC. It is understood that the volatile is not free. For this purpose it was believed that the performance cost was a fair trade off to avoid the heap pollution. Regardless of the decision in this case the method used would seem to be the best caching pattern available. We do need to establish when it is appropriate to use this solution and what the tradeoffs are that would make other solutions more appropriate. Setting cache via CAS - performance penalty everywhere. This happens only once per BigDecimal (or not at all if the string format is never generated). If you insist to use AtomicFieldUpdater - the better way is to use lazySet, not CAS. That would seem to be inappropriate for this usage as lazySet would effectively remove that volatile. Brian did some benchmarking of various alternatives which he can probably provide if we think it is relevant to the performance issue you're concerned with. Mike
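The pattern Mike describes — a volatile field plus a single per-class AtomicReferenceFieldUpdater, with CAS publishing only the first computed string — can be sketched roughly as below. Names are illustrative and `toPlainString()` stands in for the private `layoutChars(true)`; this is not the actual JDK patch:

```java
import java.math.BigDecimal;
import java.util.concurrent.atomic.AtomicReferenceFieldUpdater;

// Single-instance cache: threads may still race to compute the string, but
// the CAS guarantees exactly one instance ever escapes from toString().
class DecimalBox {
    private static final AtomicReferenceFieldUpdater<DecimalBox, String> CACHE =
        AtomicReferenceFieldUpdater.newUpdater(DecimalBox.class, String.class, "stringCache");

    private final BigDecimal value;
    private volatile String stringCache;

    DecimalBox(BigDecimal value) { this.value = value; }

    @Override
    public String toString() {
        String sc = stringCache; // volatile read
        if (sc == null) {
            sc = value.toPlainString(); // stand-in for layoutChars(true)
            if (!CACHE.compareAndSet(this, null, sc)) {
                sc = stringCache; // another thread won the race; return its instance
            }
        }
        return sc;
    }
}
```

The updater is a single static constant, so the per-instance footprint is just the volatile field itself, which is the trade-off Brian's summary refers to.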
Re: JDK 9 RFR of 6375303: Review use of caching in BigDecimal
Sergey, On Mar 11, 2014, at 1:18 AM, Sergey Kuksenko wrote: Could you share your benchmarks? Of course. Please see: benchmark source: http://cr.openjdk.java.net/~bpb/6375303/Bench6375303.java benchmark results: http://cr.openjdk.java.net/~bpb/6375303/6375303-bench.html Let us know should you find anything amiss. Thanks, Brian
Re: JDK 9 RFR of 6375303: Review use of caching in BigDecimal
On Mar 3, 2014, at 4:14 PM, Brian Burkhalter wrote: PING-INg-Ing-ing! This review request concerns this issue and proposed patch: issue https://bugs.openjdk.java.net/browse/JDK-6375303 patch http://cr.openjdk.java.net/~bpb/6375303/webrev.00/ On Mar 4, 2014, at 9:21 AM, Mike Duigou wrote: On Mar 4 2014, at 07:13 , Peter Levart peter.lev...@gmail.com wrote: On 03/04/2014 01:14 AM, Brian Burkhalter wrote: - add AtomicReferenceFieldUpdater-type constant for stringCache initialization Hi Brian, By using volatile read and CAS, there's still a chance that multiple concurrent threads will be invoking the layoutChars(true) concurrently, but you guarantee that only a single instance of String will ever be returned as a result of the toString() method. Is that what you were pursuing? Yes. (I provided the AtomicReferenceFieldUpdater code). The previous implementation had a benign data race that could result in a later layoutChars result replacing an earlier result and multiple string instances being returned. The new implementation, at small cost, prevents multiple different instances from being returned. Mike If this looks copacetic, would a JDK 9 Reviewer kindly proffer their Imprimatur? Thanks, Brian
Re: JDK 9 RFR of 6375303: Review use of caching in BigDecimal
On 03/04/2014 01:14 AM, Brian Burkhalter wrote: - add AtomicReferenceFieldUpdater-type constant for stringCache initialization Hi Brian, By using volatile read and CAS, there's still a chance that multiple concurrent threads will be invoking the layoutChars(true) concurrently, but you guarantee that only a single instance of String will ever be returned as a result of the toString() method. Is that what you were pursuing? Regards, Peter
Re: JDK 9 RFR of 6375303: Review use of caching in BigDecimal
On Mar 4 2014, at 07:13 , Peter Levart peter.lev...@gmail.com wrote: On 03/04/2014 01:14 AM, Brian Burkhalter wrote: - add AtomicReferenceFieldUpdater-type constant for stringCache initialization Hi Brian, By using volatile read and CAS, there's still a chance that multiple concurrent threads will be invoking the layoutChars(true) concurrently, but you guarantee that only a single instance of String will ever be returned as a result of the toString() method. Is that what you were pursuing? Yes. (I provided the AtomicReferenceFieldUpdater code). The previous implementation had a benign data race that could result in a later layoutChars result replacing an earlier result and multiple string instances being returned. The new implementation, at small cost, prevents multiple different instances from being returned. Mike
JDK 9 RFR of 6375303: Review use of caching in BigDecimal
Hello all, This review request concerns this issue and proposed patch: issue https://bugs.openjdk.java.net/browse/JDK-6375303 patch http://cr.openjdk.java.net/~bpb/6375303/webrev.00/ A review of the current code did not reveal anything clearly amiss in the nature of the problems which instigated the original filing of this issue. Examination of the code did however suggest some code style and aesthetic improvements as well as a change to how one lazy initialization was implemented (here we go again ...). Summary: - clean up imports section - add AtomicReferenceFieldUpdater-type constant for stringCache initialization - change some existing static final constants to conform to Java coding style - change the names of some local variables so as not to hide instance variables of the same name - add missing @Override annotations - modify lazy initialization of stringCache in toString() to use the AtomicReferenceFieldUpdater constant - various other minor cleanup Note that the performance of initialization of stringCache was measured for the extant double-checked locking approach, an AtomicReference<String>, and the AtomicReferenceFieldUpdater, and no significant degradation was observed for either of the latter versus the baseline for 1 and 4 threads. The updater has the advantage of adding only one object per class as opposed to one per instance. Thanks, Brian