Re: RFR: 8196331: Optimize Character.digit for latin1 input

2018-02-02 Thread Ulf Zibis

Hi Claes,

sorry for the delay, I didn't catch you message before, because you CC'd 
me which causes that my email filter misses the ListID tag to catch.


Thanks for the update, times change with the compact strings feature.

Another approach could be to access the String constant via Unsafe.

-Ulf


Am 30.01.2018 um 11:12 schrieb Claes Redestad:

Hi,

not sure what you're suggesting, exactly, but it seems Charset coders 
generate
String literals which are then unpacked to char[]/byte[]'s in static 
blocks.


While constructing arrays from String literals might have been a startup
optimization in the past, it is now mainly a workaround to method 
bytecode

limits, see https://bugs.openjdk.java.net/browse/JDK-8185104

/Claes

On 2018-01-30 01:10, Ulf Zibis wrote:

Hi,

you may can construct the lookup table as a static String constant to 
slim down the footprint and treat it as a byte[] as we do in the 
Charset coders.


-Ulf


Am 29.01.2018 um 22:04 schrieb Claes Redestad:

I'll experiment and analyze a bit more tomorrow to see if I can find a
good explanation for the observed difference and/or a solution that
allows us to slim down the lookup array.









Re: RFR: 8196331: Optimize Character.digit for latin1 input

2018-01-30 Thread Paul Sandoz
OK! A surprising increase in code size. Thanks for checking.

Paul.

> On Jan 30, 2018, at 2:40 AM, Claes Redestad  wrote:
> 
> The ASM is harder than usual to follow and compare since everything is
> inlined aggressively, but it seems that since CharacterDataLatin1 is only
> invoked for 0 <= ch < 256 (invariant established in CharacterData.of(int ch))
> then the compiler is able to elide bounds check entirely when the byte[] is
> also 256 elements.
> 
> Shrinking the array adds more branches and grows the compiled code size
> for UUID.fromString from 751 to 1341 bytes, so it seems that even from a
> footprint perspective then keeping the array at 256 elements is a win. :-)
> 
> /Claes
> 
> On 2018-01-29 22:04, Claes Redestad wrote:
>> Right, I can't really explain why, but the effect is very visible and
>> reproducible in microbenchmarks. Differences in generated ASM might
>> be indicating that the inlining behavior changes enough to shift the
>> result around; maybe a job for @ForceInline?
>> 
>> I'll experiment and analyze a bit more tomorrow to see if I can find a
>> good explanation for the observed difference and/or a solution that
>> allows us to slim down the lookup array.
>> 
>> /Claes
>> 
>> On 2018-01-29 20:38, Paul Sandoz wrote:
>>> Smaller in only the upper bound? I would an explicit upper bounds check 
>>> would dominate that of the bounds check for the array itself.
>>> 
>>> Paul.
>>> 
 On Jan 29, 2018, at 11:37 AM, Claes Redestad > wrote:
 
 I ran with a smaller byte[] initially and saw about a 10% improvement from 
 removing the branch, which is why I felt the superfluous bytes were 
 motivated.
 
 /Claes
 
 Paul Sandoz > 
 skrev: (29 januari 2018 19:14:44 CET)
 
 
 On Jan 29, 2018, at 9:44 AM, Martin Buchholz
 > wrote:
 Thanks. I might try to shrink the size of the static array,
 by only storing values between '0' and 'z', at the cost of
 slightly greater lookup costs for each char.
 
 I was wondering the same, or just clip the end of the array to’z'
 
 if (ch <= ‘z’ && radix …) { // Might even fold the upper bounds check 
 for DIGITS
value = DIGITS[ch];
...
 }
 
 Paul.
 
 On Mon, Jan 29, 2018 at 3:15 AM, Claes Redestad
 > wrote:
 
 Hi, for the latin1 block of CharacterData we can improve
 the digit(int, int) implementation by providing an
 optimized lookup table. This improves microbenchmarks
 exercising Integer.parseInt, Long.parseLong and
 UUID.fromString etc by around 50%for typical inputs.
 Webrev:
 http://cr.openjdk.java.net/~redestad/8196331/open.00/
 
 Bug: https://bugs.openjdk.java.net/browse/JDK-8196331 The
 lookup array is pre-calculated to minimize startup impact
 (adds 1,027 bytecodes executed during initialization) /Claes
 
 
 -- 
 Sent from my Android device with K-9 Mail. Please excuse my brevity.
>>> 
>> 
> 



Re: RFR: 8196331: Optimize Character.digit for latin1 input

2018-01-30 Thread Xueming Shen
+1

On Jan 30, 2018, at 2:40 AM, Claes Redestad  wrote:

The ASM is harder than usual to follow and compare since everything is
inlined aggressively, but it seems that since CharacterDataLatin1 is only
invoked for 0 <= ch < 256 (invariant established in CharacterData.of(int ch))
then the compiler is able to elide bounds check entirely when the byte[] is
also 256 elements.

Shrinking the array adds more branches and grows the compiled code size
for UUID.fromString from 751 to 1341 bytes, so it seems that even from a
footprint perspective then keeping the array at 256 elements is a win. :-)

/Claes

> On 2018-01-29 22:04, Claes Redestad wrote:
> Right, I can't really explain why, but the effect is very visible and
> reproducible in microbenchmarks. Differences in generated ASM might
> be indicating that the inlining behavior changes enough to shift the
> result around; maybe a job for @ForceInline?
> 
> I'll experiment and analyze a bit more tomorrow to see if I can find a
> good explanation for the observed difference and/or a solution that
> allows us to slim down the lookup array.
> 
> /Claes
> 
>> On 2018-01-29 20:38, Paul Sandoz wrote:
>> Smaller in only the upper bound? I would an explicit upper bounds check 
>> would dominate that of the bounds check for the array itself.
>> 
>> Paul.
>> 
>>> On Jan 29, 2018, at 11:37 AM, Claes Redestad >> > wrote:
>>> 
>>> I ran with a smaller byte[] initially and saw about a 10% improvement from 
>>> removing the branch, which is why I felt the superfluous bytes were 
>>> motivated.
>>> 
>>> /Claes
>>> 
>>> Paul Sandoz > skrev: 
>>> (29 januari 2018 19:14:44 CET)
>>> 
>>> 
>>> On Jan 29, 2018, at 9:44 AM, Martin Buchholz
>>> > wrote:
>>> Thanks. I might try to shrink the size of the static array,
>>> by only storing values between '0' and 'z', at the cost of
>>> slightly greater lookup costs for each char.
>>> 
>>> I was wondering the same, or just clip the end of the array to’z'
>>> 
>>> if (ch <= ‘z’ && radix …) { // Might even fold the upper bounds check 
>>> for DIGITS
>>>value = DIGITS[ch];
>>>...
>>> }
>>> 
>>> Paul.
>>> 
>>> On Mon, Jan 29, 2018 at 3:15 AM, Claes Redestad
>>> >> > wrote:
>>> 
>>> Hi, for the latin1 block of CharacterData we can improve
>>> the digit(int, int) implementation by providing an
>>> optimized lookup table. This improves microbenchmarks
>>> exercising Integer.parseInt, Long.parseLong and
>>> UUID.fromString etc by around 50%for typical inputs.
>>> Webrev:
>>> http://cr.openjdk.java.net/~redestad/8196331/open.00/
>>> 
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8196331 The
>>> lookup array is pre-calculated to minimize startup impact
>>> (adds 1,027 bytecodes executed during initialization) /Claes
>>> 
>>> 
>>> -- 
>>> Sent from my Android device with K-9 Mail. Please excuse my brevity.
>> 
> 




Re: RFR: 8196331: Optimize Character.digit for latin1 input

2018-01-30 Thread Claes Redestad

The ASM is harder than usual to follow and compare since everything is
inlined aggressively, but it seems that since CharacterDataLatin1 is only
invoked for 0 <= ch < 256 (invariant established in CharacterData.of(int 
ch))

then the compiler is able to elide bounds check entirely when the byte[] is
also 256 elements.

Shrinking the array adds more branches and grows the compiled code size
for UUID.fromString from 751 to 1341 bytes, so it seems that even from a
footprint perspective then keeping the array at 256 elements is a win. :-)

/Claes

On 2018-01-29 22:04, Claes Redestad wrote:

Right, I can't really explain why, but the effect is very visible and
reproducible in microbenchmarks. Differences in generated ASM might
be indicating that the inlining behavior changes enough to shift the
result around; maybe a job for @ForceInline?

I'll experiment and analyze a bit more tomorrow to see if I can find a
good explanation for the observed difference and/or a solution that
allows us to slim down the lookup array.

/Claes

On 2018-01-29 20:38, Paul Sandoz wrote:
Smaller in only the upper bound? I would an explicit upper bounds 
check would dominate that of the bounds check for the array itself.


Paul.

On Jan 29, 2018, at 11:37 AM, Claes Redestad 
> wrote:


I ran with a smaller byte[] initially and saw about a 10% 
improvement from removing the branch, which is why I felt the 
superfluous bytes were motivated.


/Claes

Paul Sandoz > 
skrev: (29 januari 2018 19:14:44 CET)



    On Jan 29, 2018, at 9:44 AM, Martin Buchholz
    > wrote:
    Thanks. I might try to shrink the size of the static array,
    by only storing values between '0' and 'z', at the cost of
    slightly greater lookup costs for each char.

    I was wondering the same, or just clip the end of the array to’z'

    if (ch <= ‘z’ && radix …) { // Might even fold the upper bounds 
check for DIGITS

   value = DIGITS[ch];
   ...
    }

    Paul.

    On Mon, Jan 29, 2018 at 3:15 AM, Claes Redestad
    > wrote:

    Hi, for the latin1 block of CharacterData we can improve
    the digit(int, int) implementation by providing an
    optimized lookup table. This improves microbenchmarks
    exercising Integer.parseInt, Long.parseLong and
    UUID.fromString etc by around 50%for typical inputs.
    Webrev:
http://cr.openjdk.java.net/~redestad/8196331/open.00/

    Bug: https://bugs.openjdk.java.net/browse/JDK-8196331 The
    lookup array is pre-calculated to minimize startup impact
    (adds 1,027 bytecodes executed during initialization) 
/Claes



--
Sent from my Android device with K-9 Mail. Please excuse my brevity.








Re: RFR: 8196331: Optimize Character.digit for latin1 input

2018-01-30 Thread Claes Redestad

Hi,

not sure what you're suggesting, exactly, but it seems Charset coders 
generate
String literals which are then unpacked to char[]/byte[]'s in static 
blocks.


While constructing arrays from String literals might have been a startup
optimization in the past, it is now mainly a workaround to method bytecode
limits, see https://bugs.openjdk.java.net/browse/JDK-8185104

/Claes

On 2018-01-30 01:10, Ulf Zibis wrote:

Hi,

you may can construct the lookup table as a static String constant to 
slim down the footprint and treat it as a byte[] as we do in the 
Charset coders.


-Ulf


Am 29.01.2018 um 22:04 schrieb Claes Redestad:

I'll experiment and analyze a bit more tomorrow to see if I can find a
good explanation for the observed difference and/or a solution that
allows us to slim down the lookup array.






Re: RFR: 8196331: Optimize Character.digit for latin1 input

2018-01-29 Thread Ulf Zibis

Hi,

you may can construct the lookup table as a static String constant to 
slim down the footprint and treat it as a byte[] as we do in the Charset 
coders.


-Ulf


Am 29.01.2018 um 22:04 schrieb Claes Redestad:

I'll experiment and analyze a bit more tomorrow to see if I can find a
good explanation for the observed difference and/or a solution that
allows us to slim down the lookup array.




Re: RFR: 8196331: Optimize Character.digit for latin1 input

2018-01-29 Thread Claes Redestad

Right, I can't really explain why, but the effect is very visible and
reproducible in microbenchmarks. Differences in generated ASM might
be indicating that the inlining behavior changes enough to shift the
result around; maybe a job for @ForceInline?

I'll experiment and analyze a bit more tomorrow to see if I can find a
good explanation for the observed difference and/or a solution that
allows us to slim down the lookup array.

/Claes

On 2018-01-29 20:38, Paul Sandoz wrote:
Smaller in only the upper bound? I would an explicit upper bounds 
check would dominate that of the bounds check for the array itself.


Paul.

On Jan 29, 2018, at 11:37 AM, Claes Redestad 
> wrote:


I ran with a smaller byte[] initially and saw about a 10% improvement 
from removing the branch, which is why I felt the superfluous bytes 
were motivated.


/Claes

Paul Sandoz > 
skrev: (29 januari 2018 19:14:44 CET)



On Jan 29, 2018, at 9:44 AM, Martin Buchholz
> wrote:
Thanks. I might try to shrink the size of the static array,
by only storing values between '0' and 'z', at the cost of
slightly greater lookup costs for each char. 



I was wondering the same, or just clip the end of the array to’z'

if (ch <= ‘z’ && radix …) { // Might even fold the upper bounds check for 
DIGITS
   value = DIGITS[ch];
   ...
}

Paul.

On Mon, Jan 29, 2018 at 3:15 AM, Claes Redestad
> wrote:

Hi, for the latin1 block of CharacterData we can improve
the digit(int, int) implementation by providing an
optimized lookup table. This improves microbenchmarks
exercising Integer.parseInt, Long.parseLong and
UUID.fromString etc by around 50%for typical inputs.
Webrev:
http://cr.openjdk.java.net/~redestad/8196331/open.00/

Bug: https://bugs.openjdk.java.net/browse/JDK-8196331 The
lookup array is pre-calculated to minimize startup impact
(adds 1,027 bytecodes executed during initialization) /Claes 




--
Sent from my Android device with K-9 Mail. Please excuse my brevity.






Re: RFR: 8196331: Optimize Character.digit for latin1 input

2018-01-29 Thread Paul Sandoz
Smaller in only the upper bound? I would an explicit upper bounds check would 
dominate that of the bounds check for the array itself.

Paul.

> On Jan 29, 2018, at 11:37 AM, Claes Redestad  
> wrote:
> 
> I ran with a smaller byte[] initially and saw about a 10% improvement from 
> removing the branch, which is why I felt the superfluous bytes were motivated.
> 
> /Claes
> 
> Paul Sandoz  skrev: (29 januari 2018 19:14:44 CET)
> 
> 
>  On Jan 29, 2018, at 9:44 AM, Martin Buchholz  wrote:
>  
>  Thanks.  I might try to shrink the size of the static array, by only
>  storing values between '0' and 'z', at the cost of slightly greater lookup
>  costs for each char.
>  
> 
> I was wondering the same, or just clip the end of the array to’z'
> 
> if (ch <= ‘z’ && radix …) { // Might even fold the upper bounds check for 
> DIGITS
>   value = DIGITS[ch];
>   ...
> }
> 
> Paul.
> 
>  On Mon, Jan 29, 2018 at 3:15 AM, Claes Redestad 
>  wrote:
>  
>  Hi,
>  
>  for the latin1 block of CharacterData we can improve the
>  digit(int, int) implementation by providing an optimized lookup
>  table. This improves microbenchmarks exercising Integer.parseInt,
>  Long.parseLong and UUID.fromString etc by around 50%for typical
>  inputs.
>  
>  Webrev: http://cr.openjdk.java.net/~redestad/8196331/open.00/ 
> 
>  Bug: https://bugs.openjdk.java.net/browse/JDK-8196331 
> 
>  
>  The lookup array is pre-calculated to minimize startup impact
>  (adds 1,027 bytecodes executed during initialization)
>  
>  /Claes
>  
> 
> 
> -- 
> Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: RFR: 8196331: Optimize Character.digit for latin1 input

2018-01-29 Thread Claes Redestad
I ran with a smaller byte[] initially and saw about a 10% improvement from 
removing the branch, which is why I felt the superfluous bytes were motivated.

/Claes

Paul Sandoz  skrev: (29 januari 2018 19:14:44 CET)
>
>
>> On Jan 29, 2018, at 9:44 AM, Martin Buchholz 
>wrote:
>> 
>> Thanks.  I might try to shrink the size of the static array, by only
>> storing values between '0' and 'z', at the cost of slightly greater
>lookup
>> costs for each char.
>> 
>
>I was wondering the same, or just clip the end of the array to’z'
>
>if (ch <= ‘z’ && radix …) { // Might even fold the upper bounds check
>for DIGITS
>  value = DIGITS[ch];
>  ...
>}
>
>Paul.
>
>> On Mon, Jan 29, 2018 at 3:15 AM, Claes Redestad
>
>> wrote:
>> 
>>> Hi,
>>> 
>>> for the latin1 block of CharacterData we can improve the
>>> digit(int, int) implementation by providing an optimized lookup
>>> table. This improves microbenchmarks exercising Integer.parseInt,
>>> Long.parseLong and UUID.fromString etc by around 50%for typical
>>> inputs.
>>> 
>>> Webrev: http://cr.openjdk.java.net/~redestad/8196331/open.00/
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8196331
>>> 
>>> The lookup array is pre-calculated to minimize startup impact
>>> (adds 1,027 bytecodes executed during initialization)
>>> 
>>> /Claes
>>> 

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: RFR: 8196331: Optimize Character.digit for latin1 input

2018-01-29 Thread Paul Sandoz


> On Jan 29, 2018, at 9:44 AM, Martin Buchholz  wrote:
> 
> Thanks.  I might try to shrink the size of the static array, by only
> storing values between '0' and 'z', at the cost of slightly greater lookup
> costs for each char.
> 

I was wondering the same, or just clip the end of the array to’z'

if (ch <= ‘z’ && radix …) { // Might even fold the upper bounds check for DIGITS
  value = DIGITS[ch];
  ...
}

Paul.

> On Mon, Jan 29, 2018 at 3:15 AM, Claes Redestad 
> wrote:
> 
>> Hi,
>> 
>> for the latin1 block of CharacterData we can improve the
>> digit(int, int) implementation by providing an optimized lookup
>> table. This improves microbenchmarks exercising Integer.parseInt,
>> Long.parseLong and UUID.fromString etc by around 50%for typical
>> inputs.
>> 
>> Webrev: http://cr.openjdk.java.net/~redestad/8196331/open.00/
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8196331
>> 
>> The lookup array is pre-calculated to minimize startup impact
>> (adds 1,027 bytecodes executed during initialization)
>> 
>> /Claes
>> 



Re: RFR: 8196331: Optimize Character.digit for latin1 input

2018-01-29 Thread Martin Buchholz
Thanks.  I might try to shrink the size of the static array, by only
storing values between '0' and 'z', at the cost of slightly greater lookup
costs for each char.

On Mon, Jan 29, 2018 at 3:15 AM, Claes Redestad 
wrote:

> Hi,
>
> for the latin1 block of CharacterData we can improve the
> digit(int, int) implementation by providing an optimized lookup
> table. This improves microbenchmarks exercising Integer.parseInt,
> Long.parseLong and UUID.fromString etc by around 50%for typical
> inputs.
>
> Webrev: http://cr.openjdk.java.net/~redestad/8196331/open.00/
> Bug: https://bugs.openjdk.java.net/browse/JDK-8196331
>
> The lookup array is pre-calculated to minimize startup impact
> (adds 1,027 bytecodes executed during initialization)
>
> /Claes
>