Hi, Naoto
The latest changes look good to me.
-Brent
On 7/22/20 10:23 AM, naoto.s...@oracle.com wrote:
Hi,
I revised the fix again, based on further suggestions:
https://cr.openjdk.java.net/~naoto/8248655.8248434/webrev.05/
Changes from v.04 are (all in StringUTF16.java):
- The short cut n
On 7/22/20 1:43 PM, naoto.s...@oracle.com wrote:
Thanks Roger,
Ah, I just saw your email just after I sent mine!
They probably saw each other crossing and said hi on the way to inboxes ;-)
On 7/22/20 1:38 PM, Roger Riggs wrote:
Hi Naoto,
Looks fine. (with Joe's suggestion)
On 7/22/20 4:
Thanks Roger,
Ah, I just saw your email just after I sent mine!
On 7/22/20 1:38 PM, Roger Riggs wrote:
Hi Naoto,
Looks fine. (with Joe's suggestion)
On 7/22/20 4:20 PM, Joe Wang wrote:
Hi Naoto,
The change looks good to me. "supLower" is indeed super slow :-)
The only minor comment I have
Hi Naoto,
Looks fine. (with Joe's suggestion)
On 7/22/20 4:20 PM, Joe Wang wrote:
Hi Naoto,
The change looks good to me. "supLower" is indeed super slow :-)
The only minor comment I have is that the compareCodePointCI method
performs toUpperCase unconditionally. That's not a problem for the
Hi Joe,
Thank you for the consecutive reviews!
On 7/22/20 1:20 PM, Joe Wang wrote:
Hi Naoto,
The change looks good to me. "supLower" is indeed super slow :-)
The only minor comment I have is that the compareCodePointCI method
performs toUpperCase unconditionally. That's not a problem for the
Hi Naoto,
The change looks good to me. "supLower" is indeed super slow :-)
The only minor comment I have is that the compareCodePointCI method
performs toUpperCase unconditionally. That's not a problem for the
regular case, where a check on cp1 == cp2 (line 337) is done prior to
the method ca
Hi,
I revised the fix again, based on further suggestions:
https://cr.openjdk.java.net/~naoto/8248655.8248434/webrev.05/
Changes from v.04 are (all in StringUTF16.java):
- The short cut now does case insensitive comparison that makes the fix
closer to the previous implementation (for BMP char
Hi Brent,
On 7/21/20 2:56 PM, Brent Christian wrote:
Hi, Naoto
I have a few comments:
src/java.base/share/classes/java/lang/StringUTF16.java
379 private static int getSupplementaryCodePoint(byte[] ba, int cp,
int index, int start, int end)
I think it would be worth a small addition to
Hi, Naoto
I have a few comments:
src/java.base/share/classes/java/lang/StringUTF16.java
379 private static int getSupplementaryCodePoint(byte[] ba, int cp,
int index, int start, int end)
I think it would be worth a small addition to the comment to reflect
that non-surrogate chars are re
Thank you, Joe. I got it now. Will try out and benchmark.
Naoto
On 7/21/20 10:05 AM, Joe Wang wrote:
On 7/20/2020 8:58 PM, naoto.s...@oracle.com wrote:
The short-cut worked well. There's maybe a further optimization we
could do to rid us of the performance concern (or the need to run
additi
Thank you, Roger.
On 7/21/20 7:50 AM, Roger Riggs wrote:
Hi Naoto,
StringUTF16.java:
343, 348: The masking and comparisons seem awkward.
I'd suggest just negating the value and testing for < 0 to flag the less
usual case.
If you continue with the flag bit, define your own static final constan
One option is to have a fast path that uses char functions, up to the point
where you hit a high surrogate, then drop into the slower codepoint path.
That saves time for the high-runner cases.
On the other hand, if the times are good enough, you might not need the
complication.
Mark
On Fri, Jul
On 7/20/2020 8:58 PM, naoto.s...@oracle.com wrote:
The short-cut worked well. There's maybe a further optimization we
could do to rid us of the performance concern (or the need to run
additional performance tests). Consider the cases where strings in
comparison don't contain any sup characte
Hi Naoto,
StringUTF16.java:
343, 348: The masking and comparisons seem awkward.
I'd suggest just negating the value and testing for < 0 to flag the less
usual case.
If you continue with the flag bit, define your own static final constant
for that bit;
It looks odd to be using Integer.MIN_VALU
Hi Joe,
On 7/20/20 7:14 PM, Joe Wang wrote:
Hi Naoto,
"Unless it showed 100% slower", wow, your tolerance is quite high :-).
On the other hand, I do agree it's unlikely to show in specJVM (that's a
speculation though).
I am not saying 100% slowing is permissible :-) That's an example of
de
Hi Naoto,
"Unless it showed 100% slower", wow, your tolerance is quite high :-).
On the other hand, I do agree it's unlikely to show in specJVM (that's a
speculation though).
The short-cut worked well. There's maybe a further optimization we could
do to rid us of the performance concern (or
Small correction in the updated part:
http://cr.openjdk.java.net/~naoto/8248655.8248434/webrev.04/
Naoto
On 7/20/20 2:39 PM, naoto.s...@oracle.com wrote:
Hi Joe,
Thank you for your comments.
On 7/20/20 11:20 AM, Joe Wang wrote:
Hi Naoto,
StringUTF16: line 384 - 388 seem unnecessary since y
Hi Joe,
Thank you for your comments.
On 7/20/20 11:20 AM, Joe Wang wrote:
Hi Naoto,
StringUTF16: line 384 - 388 seem unnecessary since you'd only get there
if 389:isHighSurrogate is not true.
Good point.
But more importantly, StringUTF16
has existing method "codePointAt" you may want to c
Hi Naoto,
StringUTF16: line 384 - 388 seem unnecessary since you'd only get there
if 389:isHighSurrogate is not true. But more importantly, StringUTF16
has existing method "codePointAt" you may want to consider instead of
adding a new method.
Comparing to the base benchmark:
StringCompareToI
Hi Mark,
Thank you for your comments.
On 7/17/20 8:03 PM, Mark Davis ☕ wrote:
One option is to have a fast path that uses char functions, up to the
point where you hit a high surrogate, then drop into the slower
codepoint path. That saves time for the high-runner cases.
On the other hand, if
Hi,
Based on the suggestions, I modified the fix as follows:
https://cr.openjdk.java.net/~naoto/8248655.8248434/webrev.01/
Changes from the initial revision are:
- Shared the implementation between compareToCI() and regionMatchesCI()
- Enabled immediate short cut if two code points match.
- Cr
👍
📱
> On Jul 15, 2020, at 4:32 PM, Joe Wang wrote:
>
> Jim: I was referring to the rest of the process (the calls to
> toCodePoint/toUpperCase/toLowerCase), not isHighSurrogate itself. But Roger
> has a more comprehensive review on performance, and Naoto is planning on
> preparing a JMH tes
Jim: I was referring to the rest of the process (the calls to
toCodePoint/toUpperCase/toLowerCase), not isHighSurrogate itself. But
Roger has a more comprehensive review on performance, and Naoto is
planning on preparing a JMH test.
-Joe
On 7/15/2020 11:46 AM, Jim Laskey wrote:
Joe: This is
Joe: This is a defensive approach that I believe has minimal cost.
public static boolean isHighSurrogate(char ch) {
// Help VM constant-fold; MAX_HIGH_SURROGATE + 1 == MIN_LOW_SURROGATE
return ch >= MIN_HIGH_SURROGATE && ch < (MAX_HIGH_SURROGATE + 1);
}
> On Jul 15, 2020,
Hi Joe,
Thank you for your review.
On 7/15/20 10:57 AM, Joe Wang wrote:
Hi Naoto,
In StringUTF16.java, if one is isHighSurrogate and the other not, you
may quickly return without going through the rest of the process,
probably not significant as cp1 and cp2 and/or u1 and u2 won't be equal
a
Hi Roger,
Thank you for your review and suggestions.
On 7/15/20 10:56 AM, Roger Riggs wrote:
Hi Naoto,
Given the extra tests in the body of the loop, I think its worth finding
or creating
a JMH test for this and checking the performance.
Good point.
With performance in mind, I would try
Hi Naoto,
In StringUTF16.java, if one is isHighSurrogate and the other not, you
may quickly return without going through the rest of the process,
probably not significant as cp1 and cp2 and/or u1 and u2 won't be equal
anyways. But it could skip a couple of
toCodePoint/toUpperCase/toLowerCase
Hi Naoto,
Given the extra tests in the body of the loop, I think its worth finding
or creating
a JMH test for this and checking the performance.
With performance in mind, I would try to fall back to the UC/LC
conversions only
when the bytes don't match. See java.util.Arrays.mismatch(byte[],
Thank you, Jim, for the quick review!
On 7/15/20 9:26 AM, Jim Laskey wrote:
I think I'm good with this. +1
Asides:
325 int cp1 = (int)getChar(value, k1);
326 int cp2 = (int)getChar(other, k2);
I would be tempted to short cut by exiting when not equal, but I think w
I think I'm good with this. +1
Asides:
325 int cp1 = (int)getChar(value, k1);
326 int cp2 = (int)getChar(other, k2);
I would be tempted to short cut by exiting when not equal, but I think we
agreed we need to allow for upper/lowers on different planes.
In the UTF-16
30 matches
Mail list logo