Re: RFR: 8248655: Support supplementary characters in String case insensitive operations

2020-07-22 Thread Brent Christian
Hi, Naoto The latest changes look good to me. -Brent On 7/22/20 10:23 AM, naoto.s...@oracle.com wrote: Hi, I revised the fix again, based on further suggestions: https://cr.openjdk.java.net/~naoto/8248655.8248434/webrev.05/ Changes from v.04 are (all in StringUTF16.java): - The short cut

Re: RFR: 8248655: Support supplementary characters in String case insensitive operations

2020-07-22 Thread Joe Wang
On 7/22/20 1:43 PM, naoto.s...@oracle.com wrote: Thanks Roger, Ah, I just saw your email just after I sent mine! They probably saw each other crossing and said hi on the way to inboxes ;-) On 7/22/20 1:38 PM, Roger Riggs wrote: Hi Naoto, Looks fine. (with Joe's suggestion) On 7/22/20

Re: RFR: 8248655: Support supplementary characters in String case insensitive operations

2020-07-22 Thread naoto . sato
Thanks Roger, Ah, I just saw your email just after I sent mine! On 7/22/20 1:38 PM, Roger Riggs wrote: Hi Naoto, Looks fine. (with Joe's suggestion) On 7/22/20 4:20 PM, Joe Wang wrote: Hi Naoto, The change looks good to me. "supLower" is indeed super slow :-) The only minor comment I have

Re: RFR: 8248655: Support supplementary characters in String case insensitive operations

2020-07-22 Thread Roger Riggs
Hi Naoto, Looks fine. (with Joe's suggestion) On 7/22/20 4:20 PM, Joe Wang wrote: Hi Naoto, The change looks good to me. "supLower" is indeed super slow :-) The only minor comment I have is that the compareCodePointCI method performs toUpperCase unconditionally. That's not a problem for the

Re: RFR: 8248655: Support supplementary characters in String case insensitive operations

2020-07-22 Thread naoto . sato
Hi Joe, Thank you for the consecutive reviews! On 7/22/20 1:20 PM, Joe Wang wrote: Hi Naoto, The change looks good to me. "supLower" is indeed super slow :-) The only minor comment I have is that the compareCodePointCI method performs toUpperCase unconditionally. That's not a problem for

Re: RFR: 8248655: Support supplementary characters in String case insensitive operations

2020-07-22 Thread Joe Wang
Hi Naoto, The change looks good to me. "supLower" is indeed super slow :-) The only minor comment I have is that the compareCodePointCI method performs toUpperCase unconditionally. That's not a problem for the regular case, where a check on cp1 == cp2 (line 337) is done prior to the method

Re: RFR: 8248655: Support supplementary characters in String case insensitive operations

2020-07-22 Thread naoto . sato
Hi, I revised the fix again, based on further suggestions: https://cr.openjdk.java.net/~naoto/8248655.8248434/webrev.05/ Changes from v.04 are (all in StringUTF16.java): - The short cut now does case insensitive comparison that makes the fix closer to the previous implementation (for BMP

Re: RFR: 8248655: Support supplementary characters in String case insensitive operations

2020-07-21 Thread naoto . sato
Hi Brent, On 7/21/20 2:56 PM, Brent Christian wrote: Hi, Naoto I have a few comments: src/java.base/share/classes/java/lang/StringUTF16.java 379 private static int getSupplementaryCodePoint(byte[] ba, int cp, int index, int start, int end) I think it would be worth a small addition to

Re: RFR: 8248655: Support supplementary characters in String case insensitive operations

2020-07-21 Thread Brent Christian
Hi, Naoto I have a few comments: src/java.base/share/classes/java/lang/StringUTF16.java 379 private static int getSupplementaryCodePoint(byte[] ba, int cp, int index, int start, int end) I think it would be worth a small addition to the comment to reflect that non-surrogate chars are

Re: RFR: 8248655: Support supplementary characters in String case insensitive operations

2020-07-21 Thread naoto . sato
Thank you, Joe. I got it now. Will try out and benchmark. Naoto On 7/21/20 10:05 AM, Joe Wang wrote: On 7/20/2020 8:58 PM, naoto.s...@oracle.com wrote: The short-cut worked well. There's maybe a further optimization we could do to rid us of the performance concern (or the need to run

Re: RFR: 8248655: Support supplementary characters in String case insensitive operations

2020-07-21 Thread naoto . sato
Thank you, Roger. On 7/21/20 7:50 AM, Roger Riggs wrote: Hi Naoto, StringUTF16.java: 343, 348: The masking and comparisons seem awkward. I'd suggest just negating the value and testing for < 0 to flag the less usual case. If you continue with the flag bit, define your own static final

Re: RFR: 8248655: Support supplementary characters in String case insensitive operations

2020-07-21 Thread Mark Davis ☕️
One option is to have a fast path that uses char functions, up to the point where you hit a high surrogate, then drop into the slower codepoint path. That saves time for the high-runner cases. On the other hand, if the times are good enough, you might not need the complication. Mark On Fri,

Re: RFR: 8248655: Support supplementary characters in String case insensitive operations

2020-07-21 Thread Joe Wang
On 7/20/2020 8:58 PM, naoto.s...@oracle.com wrote: The short-cut worked well. There's maybe a further optimization we could do to rid us of the performance concern (or the need to run additional performance tests). Consider the cases where strings in comparison don't contain any sup

Re: RFR: 8248655: Support supplementary characters in String case insensitive operations

2020-07-21 Thread Roger Riggs
Hi Naoto, StringUTF16.java: 343, 348: The masking and comparisons seem awkward. I'd suggest just negating the value and testing for < 0 to flag the less usual case. If you continue with the flag bit, define your own static final constant for that bit; It looks odd to be using

Re: RFR: 8248655: Support supplementary characters in String case insensitive operations

2020-07-20 Thread naoto . sato
Hi Joe, On 7/20/20 7:14 PM, Joe Wang wrote: Hi Naoto, "Unless it showed 100% slower", wow, your tolerance is quite high :-). On the other hand, I do agree it's unlikely to show in specJVM (that's a speculation though). I am not saying 100% slowing is permissible :-) That's an example of

Re: RFR: 8248655: Support supplementary characters in String case insensitive operations

2020-07-20 Thread Joe Wang
Hi Naoto, "Unless it showed 100% slower", wow, your tolerance is quite high :-). On the other hand, I do agree it's unlikely to show in specJVM (that's a speculation though). The short-cut worked well. There's maybe a further optimization we could do to rid us of the performance concern (or

Re: RFR: 8248655: Support supplementary characters in String case insensitive operations

2020-07-20 Thread naoto . sato
Small correction in the updated part: http://cr.openjdk.java.net/~naoto/8248655.8248434/webrev.04/ Naoto On 7/20/20 2:39 PM, naoto.s...@oracle.com wrote: Hi Joe, Thank you for your comments. On 7/20/20 11:20 AM, Joe Wang wrote: Hi Naoto, StringUTF16: line 384 - 388 seem unnecessary since

Re: RFR: 8248655: Support supplementary characters in String case insensitive operations

2020-07-20 Thread naoto . sato
Hi Joe, Thank you for your comments. On 7/20/20 11:20 AM, Joe Wang wrote: Hi Naoto, StringUTF16: line 384 - 388 seem unnecessary since you'd only get there if 389:isHighSurrogate is not true. Good point. But more importantly, StringUTF16 has existing method "codePointAt" you may want to

Re: RFR: 8248655: Support supplementary characters in String case insensitive operations

2020-07-20 Thread Joe Wang
Hi Naoto, StringUTF16: line 384 - 388 seem unnecessary since you'd only get there if 389:isHighSurrogate is not true. But more importantly, StringUTF16 has existing method "codePointAt" you may want to consider instead of adding a new method. Comparing to the base benchmark:

Re: RFR: 8248655: Support supplementary characters in String case insensitive operations

2020-07-19 Thread naoto . sato
Hi Mark, Thank you for your comments. On 7/17/20 8:03 PM, Mark Davis ☕ wrote: One option is to have a fast path that uses char functions, up to the point where you hit a high surrogate, then drop into the slower codepoint path. That saves time for the high-runner cases. On the other hand,

Re: RFR: 8248655: Support supplementary characters in String case insensitive operations

2020-07-17 Thread naoto . sato
Hi, Based on the suggestions, I modified the fix as follows: https://cr.openjdk.java.net/~naoto/8248655.8248434/webrev.01/ Changes from the initial revision are: - Shared the implementation between compareToCI() and regionMatchesCI() - Enabled immediate short cut if two code points match. -

Re: RFR: 8248655: Support supplementary characters in String case insensitive operations

2020-07-15 Thread James Laskey
  > On Jul 15, 2020, at 4:32 PM, Joe Wang wrote: > > Jim: I was referring to the rest of the process (the calls to > toCodePoint/toUpperCase/toLowerCase), not isHighSurrogate itself. But Roger > has a more comprehensive review on performance, and Naoto is planning on > preparing a JMH

Re: RFR: 8248655: Support supplementary characters in String case insensitive operations

2020-07-15 Thread Joe Wang
Jim: I was referring to the rest of the process (the calls to toCodePoint/toUpperCase/toLowerCase), not isHighSurrogate itself. But Roger has a more comprehensive review on performance, and Naoto is planning on preparing a JMH test. -Joe On 7/15/2020 11:46 AM, Jim Laskey wrote: Joe: This is

Re: RFR: 8248655: Support supplementary characters in String case insensitive operations

2020-07-15 Thread Jim Laskey
Joe: This is a defensive approach that I believe has minimal cost. public static boolean isHighSurrogate(char ch) { // Help VM constant-fold; MAX_HIGH_SURROGATE + 1 == MIN_LOW_SURROGATE return ch >= MIN_HIGH_SURROGATE && ch < (MAX_HIGH_SURROGATE + 1); } > On Jul 15,

Re: RFR: 8248655: Support supplementary characters in String case insensitive operations

2020-07-15 Thread naoto . sato
Hi Joe, Thank you for your review. On 7/15/20 10:57 AM, Joe Wang wrote: Hi Naoto, In StringUTF16.java, if one is isHighSurrogate and the other not, you may quickly return without going through the rest of the process, probably not significant as cp1 and cp2 and/or u1 and u2 won't be equal

Re: RFR: 8248655: Support supplementary characters in String case insensitive operations

2020-07-15 Thread naoto . sato
Hi Roger, Thank you for your review and suggestions. On 7/15/20 10:56 AM, Roger Riggs wrote: Hi Naoto, Given the extra tests in the body of the loop, I think its worth finding or creating a JMH test for this and checking the performance. Good point. With performance in mind, I would

Re: RFR: 8248655: Support supplementary characters in String case insensitive operations

2020-07-15 Thread Joe Wang
Hi Naoto, In StringUTF16.java, if one is isHighSurrogate and the other not, you may quickly return without going through the rest of the process, probably not significant as cp1 and cp2 and/or u1 and u2 won't be equal anyways. But it could skip a couple of toCodePoint/toUpperCase/toLowerCase

Re: RFR: 8248655: Support supplementary characters in String case insensitive operations

2020-07-15 Thread Roger Riggs
Hi Naoto, Given the extra tests in the body of the loop, I think its worth finding or creating a JMH test for this and checking the performance. With performance in mind, I would try to fall back to the UC/LC conversions only when the bytes don't match.  See

Re: RFR: 8248655: Support supplementary characters in String case insensitive operations

2020-07-15 Thread naoto . sato
Thank you, Jim, for the quick review! On 7/15/20 9:26 AM, Jim Laskey wrote: I think I'm good with this. +1 Asides: 325 int cp1 = (int)getChar(value, k1); 326 int cp2 = (int)getChar(other, k2); I would be tempted to short cut by exiting when not equal, but I think

Re: RFR: 8248655: Support supplementary characters in String case insensitive operations

2020-07-15 Thread Jim Laskey
I think I'm good with this. +1 Asides: 325 int cp1 = (int)getChar(value, k1); 326 int cp2 = (int)getChar(other, k2); I would be tempted to short cut by exiting when not equal, but I think we agreed we need to allow for upper/lowers on different planes. In the UTF-16