Re: RFR: JDK-8032012, , String.toLowerCase/toUpperCase performance improvement

2014-02-07 Thread Ulf Zibis
On 07.02.2014 15:27, Remi Forax wrote: if the JIT is not able to fold for(;i [...] Then the JIT must examine the complete for statement and block to prove that i will never become > len. I suspect such an examination will be manageable. On 07.02.2014 15:58, Vitaly Davidovich wrote: One issue here though

Re: RFR: JDK-8032012, , String.toLowerCase/toUpperCase performance improvement

2014-02-07 Thread Vitaly Davidovich
I think JIT handles trivial loops with break fine. One issue here though is the overall method size and complexity (esp if some other methods are inlined into it). If the first loop represents a common/fast path (no surrogates and string already lower case) then I'd move the rest of the code out
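
The suggestion above (keep the common path in a small method and move the rest out) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the webrev: the class FastPathLowerCase and the helper toLowerCaseSlow are hypothetical names, and the slow path maps code points one by one without any locale or special-case handling.

```java
final class FastPathLowerCase {

    // Hot entry point: a small loop that only looks for the first char
    // that is a surrogate or is not already lower case.
    static String toLowerCase(String s) {
        final int len = s.length();
        int first = 0;
        for (; first < len; first++) {
            char ch = s.charAt(first);
            if (Character.isSurrogate(ch) || ch != Character.toLowerCase(ch)) {
                break;
            }
        }
        if (first == len) {
            return s; // common case: nothing to convert, no allocation
        }
        return toLowerCaseSlow(s, first);
    }

    // Cold path kept in its own method so the hot method stays small.
    // Simplified assumption: per-code-point mapping only, no locale rules.
    private static String toLowerCaseSlow(String s, int first) {
        StringBuilder sb = new StringBuilder(s.length());
        sb.append(s, 0, first); // prefix is already lower case
        for (int i = first; i < s.length(); ) {
            int cp = s.codePointAt(i);
            sb.appendCodePoint(Character.toLowerCase(cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }
}
```

Whether the split actually helps inlining depends on the JIT and its size thresholds, so it would need measuring.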

Re: RFR: JDK-8032012, , String.toLowerCase/toUpperCase performance improvement

2014-02-07 Thread Remi Forax
On 02/07/2014 01:30 PM, Ulf Zibis wrote: On 06.02.2014 18:40, Xueming Shen wrote: Are we good enough to go? :-) It took much longer than I would have expected, but I'm happy with the latest result. http://cr.openjdk.java.net/~sherman/8032012/webrev/ Except "if (first >= len)" instead

Re: RFR: JDK-8032012, , String.toLowerCase/toUpperCase performance improvement

2014-02-07 Thread Ulf Zibis
On 06.02.2014 18:40, Xueming Shen wrote: Are we good enough to go? :-) It took much longer than I would have expected, but I'm happy with the latest result. http://cr.openjdk.java.net/~sherman/8032012/webrev/ Except "if (first >= len)" instead of "if (first = len)" I'm with you. -Ulf

Re: RFR: JDK-8032012, , String.toLowerCase/toUpperCase performance improvement

2014-02-07 Thread Remi Forax
On 02/07/2014 11:32 AM, Paul Sandoz wrote: On Feb 6, 2014, at 6:40 PM, Xueming Shen wrote: Paul, toUpperCaseEx is overridden in CharacterData00/Latin1. Those two are under gensrc/java/lang. It might be possible to combine them some day (need to dig out some decade long history and probably th

Re: RFR: JDK-8032012, , String.toLowerCase/toUpperCase performance improvement

2014-02-07 Thread Paul Sandoz
On Feb 6, 2014, at 6:40 PM, Xueming Shen wrote:
> Paul,
>
> toUpperCaseEx is overridden in CharacterData00/Latin1. Those two are
> under gensrc/java/lang. It might be possible to combine them some day
> (need to dig out some decade-long history and probably there is compatibility
> concern...), b

Re: RFR: JDK-8032012, , String.toLowerCase/toUpperCase performance improvement

2014-02-06 Thread Ulf Zibis
To cover this problem, I filed a new language suggestion: https://bugs.openjdk.java.net/browse/JDK-8033813 :-) -Ulf On 06.02.2014 18:59, Ulf Zibis wrote: I still prefer the break-to-label approach. It looks more logical and saves one comparison. This might matter for very short strings

Re: RFR: JDK-8032012, , String.toLowerCase/toUpperCase performance improvement

2014-02-06 Thread Xueming Shen
On 02/06/2014 11:41 AM, Ulf Zibis wrote: Me again ;-) On 02/06/2014 10:30 AM, Ulf Zibis wrote: But why not just code it as:
    char ch = value[first];
    if (Character.isSurrogate(ch)) {
        hasSurr = true;
        break;
    }

Re: RFR: JDK-8032012, , String.toLowerCase/toUpperCase performance improvement

2014-02-06 Thread Ulf Zibis
Me again ;-) On 02/06/2014 10:30 AM, Ulf Zibis wrote: But why not just code it as:
    char ch = value[first];
    if (Character.isSurrogate(ch)) {
        hasSurr = true;
        break;
    }
    if (ch != Character.toLow

Re: RFR: JDK-8032012, , String.toLowerCase/toUpperCase performance improvement

2014-02-06 Thread Xueming Shen
On 02/06/2014 10:30 AM, Ulf Zibis wrote: On 06.02.2014 18:40, Xueming Shen wrote: Ulf, webrev has been updated to use isBmpCodePoint() as suggested. Another benefit of using isBmpCodePoint() is that some Character.ERROR checks are no longer necessary. Great, I didn't see that. But why not

Re: RFR: JDK-8032012, , String.toLowerCase/toUpperCase performance improvement

2014-02-06 Thread Ulf Zibis
On 06.02.2014 18:40, Xueming Shen wrote: Ulf, webrev has been updated to use isBmpCodePoint() as suggested. Another benefit of using isBmpCodePoint() is that some Character.ERROR checks are no longer necessary. Great, I didn't see that. But why not just code it as:
    char ch = va

Re: RFR: JDK-8032012, , String.toLowerCase/toUpperCase performance improvement

2014-02-06 Thread Xueming Shen
Paul, toUpperCaseEx is overridden in CharacterData00/Latin1. Those two are under gensrc/java/lang. It might be possible to combine them some day (need to dig out some decade-long history and probably there is compatibility concern...), but definitely is beyond the scope of this "improvement" :-) U

Re: RFR: JDK-8032012, , String.toLowerCase/toUpperCase performance improvement

2014-02-06 Thread Ulf Zibis
On 05.02.2014 21:06, Xueming Shen wrote: http://cr.openjdk.java.net/~sherman/8032012/webrev webrev has been updated accordingly. I still prefer the break-to-label approach. It looks more logical and saves one comparison. This might matter for very short strings, but I would rename it
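
To make the two shapes under discussion concrete, here is a minimal sketch of a labelled-block scan next to a plain break followed by an index check. The class and method names are hypothetical and the bodies are reduced to the scan itself; neither variant is the code from the webrev.

```java
final class ScanVariants {

    // Variant 1: labelled block. Falling out of the loop normally means
    // nothing needs converting, so no second comparison is required.
    static int firstToConvertLabelled(char[] value) {
        int first = 0;
        scan: {
            for (; first < value.length; first++) {
                char ch = value[first];
                if (ch != Character.toLowerCase(ch)) {
                    break scan; // jump past the "nothing to do" return
                }
            }
            return -1; // loop ran to completion
        }
        return first;
    }

    // Variant 2: plain break, then one extra comparison of the index
    // against the length to find out why the loop ended.
    static int firstToConvertChecked(char[] value) {
        int first = 0;
        for (; first < value.length; first++) {
            char ch = value[first];
            if (ch != Character.toLowerCase(ch)) {
                break;
            }
        }
        return (first >= value.length) ? -1 : first;
    }
}
```

Both return the index of the first char that would change, or -1 if there is none. Whether the saved comparison is measurable is exactly what the thread leaves open.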

Re: RFR: JDK-8032012, , String.toLowerCase/toUpperCase performance improvement

2014-02-06 Thread Ulf Zibis
Hi, On 06.02.2014 00:57, Xueming Shen wrote: On 02/05/2014 03:28 PM, Ulf Zibis wrote: Additionally you could use Character.isSurrogate() and Character.isSupplementaryCodePoint() at appropriate places. Both are better optimized for JIT. j.l.C.isSupplementaryCodePoint() checks the upper boundary of the supplementary range, we pro

Re: RFR: JDK-8032012, , String.toLowerCase/toUpperCase performance improvement

2014-02-06 Thread Paul Sandoz
On Feb 6, 2014, at 5:37 AM, Xueming Shen wrote:
> Fair enough. I don't think it's going to be a measurable difference. I have
> updated the webrev
> to use the Character.isSurrogate() for better readability.
>
> http://cr.openjdk.java.net/~sherman/8032012/webrev
>
One last point, sorry :-)

Re: RFR: JDK-8032012, , String.toLowerCase/toUpperCase performance improvement

2014-02-05 Thread Xueming Shen
Fair enough. I don't think it's going to be a measurable difference. I have updated the webrev to use the Character.isSurrogate() for better readability. http://cr.openjdk.java.net/~sherman/8032012/webrev -Sherman On 2/5/14 6:21 PM, Vitaly Davidovich wrote: i2c conversion should not cost any

Re: RFR: JDK-8032012, , String.toLowerCase/toUpperCase performance improvement

2014-02-05 Thread Vitaly Davidovich
i2c conversion should not cost anything; it'll just make jit use low 16 bits of the registers for (unsigned) comparisons. I haven't checked this though, but that's what I'd expect. Sent from my phone On Feb 5, 2014 7:27 PM, "Xueming Shen" wrote: On 02/05/2014 03:28 PM, Ulf Zibis wrote: > Addit

Re: RFR: JDK-8032012, , String.toLowerCase/toUpperCase performance improvement

2014-02-05 Thread Xueming Shen
On 02/05/2014 03:28 PM, Ulf Zibis wrote: Additionally you could use Character.isSurrogate() and Character.isSupplementaryCodePoint() j.l.C.isSupplementaryCodePoint() checks the upper boundary of the supplementary range, we probably don't need it here, as the returned code point is either ERROR or a valid Unicode code poin

Re: RFR: JDK-8032012, , String.toLowerCase/toUpperCase performance improvement

2014-02-05 Thread Ulf Zibis
Additionally you could use Character.isSurrogate() and Character.isSupplementaryCodePoint() at appropriate places. Both are better optimized for JIT. -Ulf On 05.02.2014 22:30, Xueming Shen wrote: Hi Remi, Good suggestion. Now the "common case" path is much simpler and faster :-) I'm seeing a 5%-10% boo

Re: RFR: JDK-8032012, , String.toLowerCase/toUpperCase performance improvement

2014-02-05 Thread Remi Forax
On 02/05/2014 10:30 PM, Xueming Shen wrote: Hi Remi, Good suggestion. Now the "common case" path is much simpler and faster :-) I'm seeing a 5%-10% boost for the normal non-surrogates case. And it appears the BMP+surrogate mixed case is getting faster as well. Though I would assume it would get slowe

Re: RFR: JDK-8032012, , String.toLowerCase/toUpperCase performance improvement

2014-02-05 Thread Xueming Shen
Hi Remi, Good suggestion. Now the "common case" path is much simpler and faster :-) I'm seeing a 5%-10% boost for the normal non-surrogates case. And it appears the BMP+surrogate mixed case is getting faster as well. Though I would assume it would get slower in case of "no-case-folding" + surrogates. Bu

Re: RFR: JDK-8032012, , String.toLowerCase/toUpperCase performance improvement

2014-02-05 Thread Remi Forax
On 02/05/2014 07:43 PM, Xueming Shen wrote: On 02/05/2014 11:09 AM, Paul Sandoz wrote: On Feb 5, 2014, at 6:58 PM, Xueming Shen wrote: Hi, Let's try to wrap it up, otherwise I may drop the ball somewhere :-) On 01/22/2014 07:20 AM, Paul Sandoz wrote: if (lang == "tr" || lang == "az" || l

Re: RFR: JDK-8032012, , String.toLowerCase/toUpperCase performance improvement

2014-02-05 Thread Xueming Shen
Hi Paul, http://cr.openjdk.java.net/~sherman/8032012/webrev webrev has been updated accordingly. Thanks! -Sherman On 02/05/2014 10:43 AM, Xueming Shen wrote: On 02/05/2014 11:09 AM, Paul Sandoz wrote: On Feb 5, 2014, at 6:58 PM, Xueming Shen wrote: Hi, Let's try to wrap it up, otherwise

Re: RFR: JDK-8032012, , String.toLowerCase/toUpperCase performance improvement

2014-02-05 Thread Xueming Shen
On 02/05/2014 11:09 AM, Paul Sandoz wrote: On Feb 5, 2014, at 6:58 PM, Xueming Shen wrote: Hi, Let's try to wrap it up, otherwise I may drop the ball somewhere :-) On 01/22/2014 07:20 AM, Paul Sandoz wrote:
    if (lang == "tr" || lang == "az" || lang == "lt") {
        // locale dependent
        return

Re: RFR: JDK-8032012, , String.toLowerCase/toUpperCase performance improvement

2014-02-05 Thread Paul Sandoz
On Feb 5, 2014, at 6:58 PM, Xueming Shen wrote:
> Hi,
>
> Let's try to wrap it up, otherwise I may drop the ball somewhere :-)
>
> On 01/22/2014 07:20 AM, Paul Sandoz wrote:
>>
>> if (lang == "tr" || lang == "az" || lang == "lt") {
>>     // locale dependent
>>     return toLowerCaseEx(re

Re: RFR: JDK-8032012, , String.toLowerCase/toUpperCase performance improvement

2014-02-05 Thread Xueming Shen
Hi, Let's try to wrap it up, otherwise I may drop the ball somewhere :-) On 01/22/2014 07:20 AM, Paul Sandoz wrote:
    if (lang == "tr" || lang == "az" || lang == "lt") {
        // locale dependent
        return toLowerCaseEx(result, firstUpper, locale, true);
    }
    // otherwise false is passed to subsequ
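
For readers unfamiliar with why "tr", "az" and "lt" are singled out, the following sketch shows the kind of check the quoted fragment performs, together with one locale-dependent result. The class and the helper needsLocaleRules are hypothetical names; the sketch uses equals() rather than the reference comparison in the quoted fragment, and it calls only the public String.toLowerCase(Locale) API, not the internal toLowerCaseEx.

```java
import java.util.Locale;

final class LocaleSensitiveCase {

    // Hypothetical helper: true for the languages whose case mapping
    // has locale-specific rules (Turkish, Azerbaijani, Lithuanian).
    static boolean needsLocaleRules(Locale locale) {
        String lang = locale.getLanguage();
        return lang.equals("tr") || lang.equals("az") || lang.equals("lt");
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");
        System.out.println(needsLocaleRules(turkish));        // true
        System.out.println(needsLocaleRules(Locale.ENGLISH)); // false
        // In Turkish, upper-case I lower-cases to dotless i (U+0131).
        System.out.println("I".toLowerCase(turkish).equals("\u0131"));   // true
        System.out.println("I".toLowerCase(Locale.ROOT).equals("i"));    // true
    }
}
```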

Re: RFR: JDK-8032012, , String.toLowerCase/toUpperCase performance improvement

2014-01-22 Thread Ulf Zibis
On 22.01.2014 16:20, Paul Sandoz wrote: On Jan 21, 2014, at 11:05 PM, Xueming Shen wrote: On 01/20/2014 09:24 AM, Paul Sandoz wrote: - it would be nice to get rid of the pseudo goto using the "scan" labelled block. webrev has been updated to remove the pseudo goto by checking the "first"

Re: RFR: JDK-8032012, , String.toLowerCase/toUpperCase performance improvement

2014-01-22 Thread Paul Sandoz
On Jan 21, 2014, at 11:05 PM, Xueming Shen wrote:
> Hi Paul, thanks for reviewing the changeset, comment inlined below.
>
> On 01/20/2014 09:24 AM, Paul Sandoz wrote:
>>
>> Some quick comments.
>>
>> In String.toLowerCase:
>>
>> - it would be nice to get rid of the pseudo goto using the "sca

Re: RFR: JDK-8032012, , String.toLowerCase/toUpperCase performance improvement

2014-01-21 Thread Xueming Shen
Hi Paul, thanks for reviewing the changeset, comment inlined below. On 01/20/2014 09:24 AM, Paul Sandoz wrote: Some quick comments. In String.toLowerCase: - it would be nice to get rid of the pseudo goto using the "scan" labelled block. webrev has been updated to remove the pseudo goto by

Re: RFR: JDK-8032012, , String.toLowerCase/toUpperCase performance improvement

2014-01-20 Thread Paul Sandoz
On Jan 16, 2014, at 7:08 PM, Xueming Shen wrote:
> Hi,
>
> The proposed changeset is to improve the performance (both speed and memory
> usage) of String.toLowerCase/toUpperCase, by
>
> (1) to separate the "most likely" use scenario (non supplementary character,
> no special case mapping hand

Re: RFR: JDK-8032012, , String.toLowerCase/toUpperCase performance improvement

2014-01-16 Thread Ulf Zibis
Hi Sherman, nice performance trick :-) Do you remember the discussion about https://bugs.openjdk.java.net/browse/JDK-6939278 ;-) (this bug was originally filed by me!) -Ulf On 16.01.2014 19:08, Xueming Shen wrote: Hi, The proposed changeset is to improve the performance (both speed and

RFR: JDK-8032012, , String.toLowerCase/toUpperCase performance improvement

2014-01-16 Thread Xueming Shen
Hi, The proposed changeset is to improve the performance (both speed and memory usage) of String.toLowerCase/toUpperCase, by
(1) to separate the "most likely" use scenario (non supplementary character, no special case mapping handling) into simple/quick iteration loop code path
(2) to use the p