Re: RFR 8203279 : Faster calculation of power of two

2018-05-18 Thread Claes Redestad

On 2018-05-17 22:44, Ivan Gerasimov wrote:

The following variant showed slightly better performance on my machine:

    static final int numberOfLeadingZeros(int i) {
    if (i <= 0)
    return i == 0 ? 32 : 0;
    int n = 31;
    if (i >= 1 << 16) { n -= 16; i >>>= 16; }
    if (i >= 1 <<  8) { n -=  8; i >>>=  8; }
    if (i >= 1 <<  4) { n -=  4; i >>>=  4; }
    if (i >= 1 <<  2) { n -=  2; i >>>=  2; }
    return n - (i >>> 1);
    }


Nice, this version also wins on all of -Xint and 
-XX:TieredStopAtLevel=1-3 (my version lost out slightly versus
the baseline on -Xint), so is potentially a startup enhancement even on 
platforms with C2 intrinsics.




I agree that improving Java implementation of numberOfLeadingZeros() 
can be done as a separate RFE. 


I filed https://bugs.openjdk.java.net/browse/JDK-8203352 for this.

/Claes


Re: RFR 8203279 : Faster calculation of power of two

2018-05-17 Thread Ivan Gerasimov

Thank you Claes for your review and the detailed analysis!


On 5/17/18 4:07 AM, Claes Redestad wrote:

Shouldn't this be called "Faster rounding up to nearest power of two"?


Yes, it's more accurate.

Patch looks OK to me, but I'd like to see numbers with the 
numberOfLeadingZeros intrinsic
disabled so that we ensure we're not incurring an unreasonable penalty 
on platforms who don't

have an intrinsic for this.

Running your benchmark with the intrinsic disabled[1] on my machine I 
see a 25-30% penalty
with testNew relative to testOld, which is perhaps a bit toomuch for 
comfort..


So I took a look at profiles for numberOfLeadingZeros with the 
intrinsic disabled and realized

it might be possible to improve:

diff -r de35abdfe5da src/java.base/share/classes/java/lang/Integer.java
--- a/src/java.base/share/classes/java/lang/Integer.java Mon May 14 
16:21:08 2018 +0200
+++ b/src/java.base/share/classes/java/lang/Integer.java Thu May 17 
12:44:53 2018 +0200

@@ -1621,12 +1621,12 @@
 // HD, Figure 5-6
 if (i <= 0)
 return i == 0 ? 32 : 0;
-int n = 1;
-if (i >>> 16 == 0) { n += 16; i <<= 16; }
-if (i >>> 24 == 0) { n +=  8; i <<=  8; }
-if (i >>> 28 == 0) { n +=  4; i <<=  4; }
-if (i >>> 30 == 0) { n +=  2; i <<=  2; }
-n -= i >>> 31;
+int n = 0;
+if (  i < 1 << 16) { n += 16; i <<= 16; }
+if (i >= 0 && i < 1 << 24) { n +=  8; i <<=  8; }
+if (i >= 0 && i < 1 << 28) { n +=  4; i <<=  4; }
+if (i >= 0 && i < 1 << 30) { n +=  2; i <<=  2; }
+if (i >= 0) n++;
 return n;
 }

Adding a case that uses this version to your benchmark[2] and the new 
version is only about
10% slower than the baseline, with the added benefit that other uses 
of numberOfLeadingZeros
might see a speed-upif there's no intrinsic (runs with intrinsic 
disabled[1]):



Interesting.
It can probably be done with fewer comparisons, if the direction of all 
the shifts is inverted:


The following variant showed slightly better performance on my machine:

static final int numberOfLeadingZeros(int i) {
if (i <= 0)
return i == 0 ? 32 : 0;
int n = 31;
if (i >= 1 << 16) { n -= 16; i >>>= 16; }
if (i >= 1 <<  8) { n -=  8; i >>>=  8; }
if (i >= 1 <<  4) { n -=  4; i >>>=  4; }
if (i >= 1 <<  2) { n -=  2; i >>>=  2; }
return n - (i >>> 1);
}

I agree that improving Java implementation of numberOfLeadingZeros() can 
be done as a separate RFE.


With kind regards,
Ivan



Benchmark  (arg)  Mode  Cnt   Score   Error Units
TableFor.testNew   1  avgt6  28.343 ± 0.534  ns/op
TableFor.testNew  42  avgt6  26.458 ± 0.064  ns/op
TableFor.testNew2  1  avgt6  23.969 ± 0.201  ns/op
TableFor.testNew2 42  avgt6  23.934 ± 0.107  ns/op
TableFor.testOld   1  avgt6  21.615 ± 0.803  ns/op
TableFor.testOld  42  avgt6  21.418 ± 0.106  ns/op

So I think with the above patch to Integer.numberOfLeadingZeros we can 
get the benefit for
our supported platforms while staying roughly performance neutral on 
platforms without

an intrinsic. Not strictly necessary to fold it into this RFE, of course.

Thanks!

/Claes

[1] -XX:+UnlockDiagnosticVMOptions 
-XX:DisableIntrinsic=_numberOfLeadingZeros_i

[2] http://cr.openjdk.java.net/~redestad/scratch/TableFor.java

On 2018-05-17 05:48, David Holmes wrote:

Do you think it's good to go?


I think I'd rather defer to a more performance oriented reviewer - 
paging Claes! ;-)


David
- 




--
With kind regards,
Ivan Gerasimov



Re: RFR 8203279 : Faster calculation of power of two

2018-05-17 Thread Doug Lea
On 05/17/2018 07:07 AM, Claes Redestad wrote:
> Shouldn't this be called "Faster rounding up to nearest power of two"?
> 
> Patch looks OK to me, but I'd like to see numbers with the
> numberOfLeadingZeros intrinsic
> disabled so that we ensure we're not incurring an unreasonable penalty
> on platforms who don't
> have an intrinsic for this.
> 
> Running your benchmark with the intrinsic disabled[1] on my machine I
> see a 25-30% penalty
> with testNew relative to testOld, which is perhaps a bit toomuch for
> comfort..

This was the original reason for using the existing code -- hardware
support was uncommon. I like your idea of tweaking non-intrinsic
numberOfLeadingZeros to reduce penalty on the now-uncommon platforms
without support.

-Doug


> 
> So I took a look at profiles for numberOfLeadingZeros with the intrinsic
> disabled and realized
> it might be possible to improve:
> 
> diff -r de35abdfe5da src/java.base/share/classes/java/lang/Integer.java
> --- a/src/java.base/share/classes/java/lang/Integer.java    Mon May
> 14 16:21:08 2018 +0200
> +++ b/src/java.base/share/classes/java/lang/Integer.java    Thu May
> 17 12:44:53 2018 +0200
> @@ -1621,12 +1621,12 @@
>  // HD, Figure 5-6
>  if (i <= 0)
>  return i == 0 ? 32 : 0;
> -    int n = 1;
> -    if (i >>> 16 == 0) { n += 16; i <<= 16; }
> -    if (i >>> 24 == 0) { n +=  8; i <<=  8; }
> -    if (i >>> 28 == 0) { n +=  4; i <<=  4; }
> -    if (i >>> 30 == 0) { n +=  2; i <<=  2; }
> -    n -= i >>> 31;
> +    int n = 0;
> +    if (  i < 1 << 16) { n += 16; i <<= 16; }
> +    if (i >= 0 && i < 1 << 24) { n +=  8; i <<=  8; }
> +    if (i >= 0 && i < 1 << 28) { n +=  4; i <<=  4; }
> +    if (i >= 0 && i < 1 << 30) { n +=  2; i <<=  2; }
> +    if (i >= 0) n++;
>  return n;
>  }
> 
> Adding a case that uses this version to your benchmark[2] and the new
> version is only about
> 10% slower than the baseline, with the added benefit that other uses of
> numberOfLeadingZeros
> might see a speed-upif there's no intrinsic (runs with intrinsic
> disabled[1]):
> 
> Benchmark  (arg)  Mode  Cnt   Score   Error Units
> TableFor.testNew   1  avgt    6  28.343 ± 0.534  ns/op
> TableFor.testNew  42  avgt    6  26.458 ± 0.064  ns/op
> TableFor.testNew2  1  avgt    6  23.969 ± 0.201  ns/op
> TableFor.testNew2 42  avgt    6  23.934 ± 0.107  ns/op
> TableFor.testOld       1  avgt    6  21.615 ± 0.803  ns/op
> TableFor.testOld      42  avgt    6  21.418 ± 0.106  ns/op
> 
> So I think with the above patch to Integer.numberOfLeadingZeros we can
> get the benefit for
> our supported platforms while staying roughly performance neutral on
> platforms without
> an intrinsic. Not strictly necessary to fold it into this RFE, of course.
> 
> Thanks!
> 
> /Claes
> 
> [1] -XX:+UnlockDiagnosticVMOptions
> -XX:DisableIntrinsic=_numberOfLeadingZeros_i
> [2] http://cr.openjdk.java.net/~redestad/scratch/TableFor.java
> 
> On 2018-05-17 05:48, David Holmes wrote:
>>> Do you think it's good to go?
>>
>> I think I'd rather defer to a more performance oriented reviewer -
>> paging Claes! ;-)
>>
>> David
>> - 
> 




Re: RFR 8203279 : Faster calculation of power of two

2018-05-17 Thread Paul Sandoz


> On May 17, 2018, at 4:07 AM, Claes Redestad  wrote:
> 
> Shouldn't this be called "Faster rounding up to nearest power of two"?
> 
> Patch looks OK to me, but I'd like to see numbers with the 
> numberOfLeadingZeros intrinsic
> disabled so that we ensure we're not incurring an unreasonable penalty on 
> platforms who don't
> have an intrinsic for this.
> 

I did a quick check and i believe it is intrinsic for C2 on all hotspot cpu 
platform code, zero does not count! (did not check Graal but i suspect it 
supports it too). 

Extra points for comparing the two approaches with C1 :-) I don’t think that is 
necessary but i am curious.

Paul.
 
> Running your benchmark with the intrinsic disabled[1] on my machine I see a 
> 25-30% penalty
> with testNew relative to testOld, which is perhaps a bit toomuch for comfort..
> 
> So I took a look at profiles for numberOfLeadingZeros with the intrinsic 
> disabled and realized
> it might be possible to improve:
> 
> diff -r de35abdfe5da src/java.base/share/classes/java/lang/Integer.java
> --- a/src/java.base/share/classes/java/lang/Integer.javaMon May 14 
> 16:21:08 2018 +0200
> +++ b/src/java.base/share/classes/java/lang/Integer.javaThu May 17 
> 12:44:53 2018 +0200
> @@ -1621,12 +1621,12 @@
>  // HD, Figure 5-6
>  if (i <= 0)
>  return i == 0 ? 32 : 0;
> -int n = 1;
> -if (i >>> 16 == 0) { n += 16; i <<= 16; }
> -if (i >>> 24 == 0) { n +=  8; i <<=  8; }
> -if (i >>> 28 == 0) { n +=  4; i <<=  4; }
> -if (i >>> 30 == 0) { n +=  2; i <<=  2; }
> -n -= i >>> 31;
> +int n = 0;
> +if (  i < 1 << 16) { n += 16; i <<= 16; }
> +if (i >= 0 && i < 1 << 24) { n +=  8; i <<=  8; }
> +if (i >= 0 && i < 1 << 28) { n +=  4; i <<=  4; }
> +if (i >= 0 && i < 1 << 30) { n +=  2; i <<=  2; }
> +if (i >= 0) n++;
>  return n;
>  }
> 
> Adding a case that uses this version to your benchmark[2] and the new version 
> is only about
> 10% slower than the baseline, with the added benefit that other uses of 
> numberOfLeadingZeros
> might see a speed-upif there's no intrinsic (runs with intrinsic disabled[1]):
> 
> Benchmark  (arg)  Mode  Cnt   Score   Error Units
> TableFor.testNew   1  avgt6  28.343 ± 0.534  ns/op
> TableFor.testNew  42  avgt6  26.458 ± 0.064  ns/op
> TableFor.testNew2  1  avgt6  23.969 ± 0.201  ns/op
> TableFor.testNew2 42  avgt6  23.934 ± 0.107  ns/op
> TableFor.testOld   1  avgt6  21.615 ± 0.803  ns/op
> TableFor.testOld  42  avgt6  21.418 ± 0.106  ns/op
> 
> So I think with the above patch to Integer.numberOfLeadingZeros we can get 
> the benefit for
> our supported platforms while staying roughly performance neutral on 
> platforms without
> an intrinsic. Not strictly necessary to fold it into this RFE, of course.
> 
> Thanks!
> 
> /Claes
> 
> [1] -XX:+UnlockDiagnosticVMOptions 
> -XX:DisableIntrinsic=_numberOfLeadingZeros_i
> [2] http://cr.openjdk.java.net/~redestad/scratch/TableFor.java
> 
> On 2018-05-17 05:48, David Holmes wrote:
>>> Do you think it's good to go?
>> 
>> I think I'd rather defer to a more performance oriented reviewer - paging 
>> Claes! ;-)
>> 
>> David
>> - 
> 



Re: RFR 8203279 : Faster calculation of power of two

2018-05-17 Thread David Holmes

On 17/05/2018 9:24 PM, Claes Redestad wrote:

On 2018-05-17 13:15, David Holmes wrote:

diff -r de35abdfe5da src/java.base/share/classes/java/lang/Integer.java
--- a/src/java.base/share/classes/java/lang/Integer.java Mon May 14 
16:21:08 2018 +0200
+++ b/src/java.base/share/classes/java/lang/Integer.java Thu May 17 
12:44:53 2018 +0200

@@ -1621,12 +1621,12 @@
  // HD, Figure 5-6


The code no longer maps to that from HD so the comment would need fixing.


Seems that's a pre-existing issue, but yes.


The comment is correct for the first edition of the book. It's 5-11 in 
the second edition. :)


David


/Claes


Re: RFR 8203279 : Faster calculation of power of two

2018-05-17 Thread Claes Redestad



On 2018-05-17 13:15, David Holmes wrote:

diff -r de35abdfe5da src/java.base/share/classes/java/lang/Integer.java
--- a/src/java.base/share/classes/java/lang/Integer.java Mon May 14 
16:21:08 2018 +0200
+++ b/src/java.base/share/classes/java/lang/Integer.java Thu May 17 
12:44:53 2018 +0200

@@ -1621,12 +1621,12 @@
  // HD, Figure 5-6


The code no longer maps to that from HD so the comment would need fixing.


Seems that's a pre-existing issue, but yes.

/Claes


Re: RFR 8203279 : Faster calculation of power of two

2018-05-17 Thread David Holmes

On 17/05/2018 9:07 PM, Claes Redestad wrote:

Shouldn't this be called "Faster rounding up to nearest power of two"?

Patch looks OK to me, but I'd like to see numbers with the 
numberOfLeadingZeros intrinsic
disabled so that we ensure we're not incurring an unreasonable penalty 
on platforms who don't

have an intrinsic for this.

Running your benchmark with the intrinsic disabled[1] on my machine I 
see a 25-30% penalty
with testNew relative to testOld, which is perhaps a bit toomuch for 
comfort..


So I took a look at profiles for numberOfLeadingZeros with the intrinsic 
disabled and realized

it might be possible to improve:

diff -r de35abdfe5da src/java.base/share/classes/java/lang/Integer.java
--- a/src/java.base/share/classes/java/lang/Integer.java    Mon May 
14 16:21:08 2018 +0200
+++ b/src/java.base/share/classes/java/lang/Integer.java    Thu May 
17 12:44:53 2018 +0200

@@ -1621,12 +1621,12 @@
  // HD, Figure 5-6


The code no longer maps to that from HD so the comment would need fixing.

David
-


  if (i <= 0)
  return i == 0 ? 32 : 0;
-    int n = 1;
-    if (i >>> 16 == 0) { n += 16; i <<= 16; }
-    if (i >>> 24 == 0) { n +=  8; i <<=  8; }
-    if (i >>> 28 == 0) { n +=  4; i <<=  4; }
-    if (i >>> 30 == 0) { n +=  2; i <<=  2; }
-    n -= i >>> 31;
+    int n = 0;
+    if (  i < 1 << 16) { n += 16; i <<= 16; }
+    if (i >= 0 && i < 1 << 24) { n +=  8; i <<=  8; }
+    if (i >= 0 && i < 1 << 28) { n +=  4; i <<=  4; }
+    if (i >= 0 && i < 1 << 30) { n +=  2; i <<=  2; }
+    if (i >= 0) n++;
  return n;
  }

Adding a case that uses this version to your benchmark[2] and the new 
version is only about
10% slower than the baseline, with the added benefit that other uses of 
numberOfLeadingZeros
might see a speed-upif there's no intrinsic (runs with intrinsic 
disabled[1]):


Benchmark  (arg)  Mode  Cnt   Score   Error Units
TableFor.testNew   1  avgt    6  28.343 ± 0.534  ns/op
TableFor.testNew  42  avgt    6  26.458 ± 0.064  ns/op
TableFor.testNew2  1  avgt    6  23.969 ± 0.201  ns/op
TableFor.testNew2 42  avgt    6  23.934 ± 0.107  ns/op
TableFor.testOld       1  avgt    6  21.615 ± 0.803  ns/op
TableFor.testOld      42  avgt    6  21.418 ± 0.106  ns/op

So I think with the above patch to Integer.numberOfLeadingZeros we can 
get the benefit for
our supported platforms while staying roughly performance neutral on 
platforms without

an intrinsic. Not strictly necessary to fold it into this RFE, of course.

Thanks!

/Claes

[1] -XX:+UnlockDiagnosticVMOptions 
-XX:DisableIntrinsic=_numberOfLeadingZeros_i

[2] http://cr.openjdk.java.net/~redestad/scratch/TableFor.java

On 2018-05-17 05:48, David Holmes wrote:

Do you think it's good to go?


I think I'd rather defer to a more performance oriented reviewer - 
paging Claes! ;-)


David
- 




Re: RFR 8203279 : Faster calculation of power of two

2018-05-17 Thread Claes Redestad

Shouldn't this be called "Faster rounding up to nearest power of two"?

Patch looks OK to me, but I'd like to see numbers with the 
numberOfLeadingZeros intrinsic
disabled so that we ensure we're not incurring an unreasonable penalty 
on platforms who don't

have an intrinsic for this.

Running your benchmark with the intrinsic disabled[1] on my machine I 
see a 25-30% penalty
with testNew relative to testOld, which is perhaps a bit toomuch for 
comfort..


So I took a look at profiles for numberOfLeadingZeros with the intrinsic 
disabled and realized

it might be possible to improve:

diff -r de35abdfe5da src/java.base/share/classes/java/lang/Integer.java
--- a/src/java.base/share/classes/java/lang/Integer.java    Mon May 
14 16:21:08 2018 +0200
+++ b/src/java.base/share/classes/java/lang/Integer.java    Thu May 
17 12:44:53 2018 +0200

@@ -1621,12 +1621,12 @@
 // HD, Figure 5-6
 if (i <= 0)
 return i == 0 ? 32 : 0;
-    int n = 1;
-    if (i >>> 16 == 0) { n += 16; i <<= 16; }
-    if (i >>> 24 == 0) { n +=  8; i <<=  8; }
-    if (i >>> 28 == 0) { n +=  4; i <<=  4; }
-    if (i >>> 30 == 0) { n +=  2; i <<=  2; }
-    n -= i >>> 31;
+    int n = 0;
+    if (  i < 1 << 16) { n += 16; i <<= 16; }
+    if (i >= 0 && i < 1 << 24) { n +=  8; i <<=  8; }
+    if (i >= 0 && i < 1 << 28) { n +=  4; i <<=  4; }
+    if (i >= 0 && i < 1 << 30) { n +=  2; i <<=  2; }
+    if (i >= 0) n++;
 return n;
 }

Adding a case that uses this version to your benchmark[2] and the new 
version is only about
10% slower than the baseline, with the added benefit that other uses of 
numberOfLeadingZeros
might see a speed-upif there's no intrinsic (runs with intrinsic 
disabled[1]):


Benchmark  (arg)  Mode  Cnt   Score   Error Units
TableFor.testNew   1  avgt    6  28.343 ± 0.534  ns/op
TableFor.testNew  42  avgt    6  26.458 ± 0.064  ns/op
TableFor.testNew2  1  avgt    6  23.969 ± 0.201  ns/op
TableFor.testNew2 42  avgt    6  23.934 ± 0.107  ns/op
TableFor.testOld       1  avgt    6  21.615 ± 0.803  ns/op
TableFor.testOld      42  avgt    6  21.418 ± 0.106  ns/op

So I think with the above patch to Integer.numberOfLeadingZeros we can 
get the benefit for
our supported platforms while staying roughly performance neutral on 
platforms without

an intrinsic. Not strictly necessary to fold it into this RFE, of course.

Thanks!

/Claes

[1] -XX:+UnlockDiagnosticVMOptions 
-XX:DisableIntrinsic=_numberOfLeadingZeros_i

[2] http://cr.openjdk.java.net/~redestad/scratch/TableFor.java

On 2018-05-17 05:48, David Holmes wrote:

Do you think it's good to go?


I think I'd rather defer to a more performance oriented reviewer - 
paging Claes! ;-)


David
- 




Re: RFR 8203279 : Faster calculation of power of two

2018-05-16 Thread David Holmes



On 17/05/2018 1:01 PM, Ivan Gerasimov wrote:



On 5/16/18 7:44 PM, David Holmes wrote:

On 17/05/2018 12:30 PM, Ivan Gerasimov wrote:

Hi David!


On 5/16/18 6:12 PM, David Holmes wrote:

Hi Ivan,

Surely you need to back this up with some performance numbers! And 
verify not assume that numberOfLeadingZeroes is intrinsified!



Yes, good point.
Please find the benchmark with the results here:
http://cr.openjdk.java.net/~igerasim/8203279/00/MyBenchmark.java

It shows that the new version of HashMap. tableSizeFor() was 
(approximately) two times faster.


Impressive. :)


Thanks :)
Of course it's not performance critical code, but why not optimize it by 
a tiny bit if possible?


Do you think it's good to go?


I think I'd rather defer to a more performance oriented reviewer - 
paging Claes! ;-)


David
-



With kind regards,
Ivan


Thanks,
David


With kind regards,
Ivan


Cheers,
David

On 17/05/2018 10:32 AM, Ivan Gerasimov wrote:

Hello!

In a few places we have code that rounds an integer up to the 
nearest power of two.


It is done with a series of RSHOTFs and ORs, but it can possibly be 
done faster with the use of Integer.numberOfLeadingZeros (assuming 
it is intrinsified).


Would you please help review this trivial optimization:

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8203279
WEBREV: http://cr.openjdk.java.net/~igerasim/8203279/00/webrev/

For HashMap.tableSizeFor() I created a simple test with a loop from 
Integer.MIN_VALUE to Integer.MAX_VALUE (including), to make sure 
the result is the same.


For TimSort.ensureCapacity() I checked all positive values of 
minCapacity.












Re: RFR 8203279 : Faster calculation of power of two

2018-05-16 Thread David Holmes

On 17/05/2018 12:55 PM, Martin Buchholz wrote:

Hello Ivan,

I've wondered about this myself.
Probably the performance is architecture dependent.
Probably a new method should be added to Integer and Long with
@HotspotIntrinsicCandidate.
That gives David another chance to practice his blind aarch64 assembly
language coding skills.


Ha! :)


It's hard to give a good name for this.  Hacker's delight calls it dclp2
and we can probably do better than that.
http://www.hackersdelight.org/hdcodetxt/clp2.c.txt
But I approve either way.


Interesting that the existing code is almost, but not quite clp2.

And numberOfLeadingZeroes also comes from HD.

If both forms get optimally compiled I'm very surprised that we would 
see such a performance difference.


David



On Wed, May 16, 2018 at 5:32 PM, Ivan Gerasimov 
wrote:


Hello!

In a few places we have code that rounds an integer up to the nearest
power of two.

It is done with a series of RSHOTFs and ORs, but it can possibly be done
faster with the use of Integer.numberOfLeadingZeros (assuming it is
intrinsified).

Would you please help review this trivial optimization:

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8203279
WEBREV: http://cr.openjdk.java.net/~igerasim/8203279/00/webrev/

For HashMap.tableSizeFor() I created a simple test with a loop from
Integer.MIN_VALUE to Integer.MAX_VALUE (including), to make sure the result
is the same.

For TimSort.ensureCapacity() I checked all positive values of minCapacity.

--
With kind regards,
Ivan Gerasimov




Re: RFR 8203279 : Faster calculation of power of two

2018-05-16 Thread Ivan Gerasimov



On 5/16/18 7:44 PM, David Holmes wrote:

On 17/05/2018 12:30 PM, Ivan Gerasimov wrote:

Hi David!


On 5/16/18 6:12 PM, David Holmes wrote:

Hi Ivan,

Surely you need to back this up with some performance numbers! And 
verify not assume that numberOfLeadingZeroes is intrinsified!



Yes, good point.
Please find the benchmark with the results here:
http://cr.openjdk.java.net/~igerasim/8203279/00/MyBenchmark.java

It shows that the new version of HashMap. tableSizeFor() was 
(approximately) two times faster.


Impressive. :)


Thanks :)
Of course it's not performance critical code, but why not optimize it by 
a tiny bit if possible?


Do you think it's good to go?

With kind regards,
Ivan


Thanks,
David


With kind regards,
Ivan


Cheers,
David

On 17/05/2018 10:32 AM, Ivan Gerasimov wrote:

Hello!

In a few places we have code that rounds an integer up to the 
nearest power of two.


It is done with a series of RSHOTFs and ORs, but it can possibly be 
done faster with the use of Integer.numberOfLeadingZeros (assuming 
it is intrinsified).


Would you please help review this trivial optimization:

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8203279
WEBREV: http://cr.openjdk.java.net/~igerasim/8203279/00/webrev/

For HashMap.tableSizeFor() I created a simple test with a loop from 
Integer.MIN_VALUE to Integer.MAX_VALUE (including), to make sure 
the result is the same.


For TimSort.ensureCapacity() I checked all positive values of 
minCapacity.










--
With kind regards,
Ivan Gerasimov



Re: RFR 8203279 : Faster calculation of power of two

2018-05-16 Thread Martin Buchholz
Hello Ivan,

I've wondered about this myself.
Probably the performance is architecture dependent.
Probably a new method should be added to Integer and Long with
@HotspotIntrinsicCandidate.
That gives David another chance to practice his blind aarch64 assembly
language coding skills.
It's hard to give a good name for this.  Hacker's delight calls it dclp2
and we can probably do better than that.
http://www.hackersdelight.org/hdcodetxt/clp2.c.txt
But I approve either way.


On Wed, May 16, 2018 at 5:32 PM, Ivan Gerasimov 
wrote:

> Hello!
>
> In a few places we have code that rounds an integer up to the nearest
> power of two.
>
> It is done with a series of RSHOTFs and ORs, but it can possibly be done
> faster with the use of Integer.numberOfLeadingZeros (assuming it is
> intrinsified).
>
> Would you please help review this trivial optimization:
>
> BUGURL: https://bugs.openjdk.java.net/browse/JDK-8203279
> WEBREV: http://cr.openjdk.java.net/~igerasim/8203279/00/webrev/
>
> For HashMap.tableSizeFor() I created a simple test with a loop from
> Integer.MIN_VALUE to Integer.MAX_VALUE (including), to make sure the result
> is the same.
>
> For TimSort.ensureCapacity() I checked all positive values of minCapacity.
>
> --
> With kind regards,
> Ivan Gerasimov
>
>


Re: RFR 8203279 : Faster calculation of power of two

2018-05-16 Thread David Holmes

On 17/05/2018 12:30 PM, Ivan Gerasimov wrote:

Hi David!


On 5/16/18 6:12 PM, David Holmes wrote:

Hi Ivan,

Surely you need to back this up with some performance numbers! And 
verify not assume that numberOfLeadingZeroes is intrinsified!



Yes, good point.
Please find the benchmark with the results here:
http://cr.openjdk.java.net/~igerasim/8203279/00/MyBenchmark.java

It shows that the new version of HashMap. tableSizeFor() was 
(approximately) two times faster.


Impressive. :)

Thanks,
David


With kind regards,
Ivan


Cheers,
David

On 17/05/2018 10:32 AM, Ivan Gerasimov wrote:

Hello!

In a few places we have code that rounds an integer up to the nearest 
power of two.


It is done with a series of RSHOTFs and ORs, but it can possibly be 
done faster with the use of Integer.numberOfLeadingZeros (assuming it 
is intrinsified).


Would you please help review this trivial optimization:

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8203279
WEBREV: http://cr.openjdk.java.net/~igerasim/8203279/00/webrev/

For HashMap.tableSizeFor() I created a simple test with a loop from 
Integer.MIN_VALUE to Integer.MAX_VALUE (including), to make sure the 
result is the same.


For TimSort.ensureCapacity() I checked all positive values of 
minCapacity.








Re: RFR 8203279 : Faster calculation of power of two

2018-05-16 Thread Ivan Gerasimov

Hi David!


On 5/16/18 6:12 PM, David Holmes wrote:

Hi Ivan,

Surely you need to back this up with some performance numbers! And 
verify not assume that numberOfLeadingZeroes is intrinsified!



Yes, good point.
Please find the benchmark with the results here:
http://cr.openjdk.java.net/~igerasim/8203279/00/MyBenchmark.java

It shows that the new version of HashMap. tableSizeFor() was 
(approximately) two times faster.


With kind regards,
Ivan


Cheers,
David

On 17/05/2018 10:32 AM, Ivan Gerasimov wrote:

Hello!

In a few places we have code that rounds an integer up to the nearest 
power of two.


It is done with a series of RSHOTFs and ORs, but it can possibly be 
done faster with the use of Integer.numberOfLeadingZeros (assuming it 
is intrinsified).


Would you please help review this trivial optimization:

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8203279
WEBREV: http://cr.openjdk.java.net/~igerasim/8203279/00/webrev/

For HashMap.tableSizeFor() I created a simple test with a loop from 
Integer.MIN_VALUE to Integer.MAX_VALUE (including), to make sure the 
result is the same.


For TimSort.ensureCapacity() I checked all positive values of 
minCapacity.






--
With kind regards,
Ivan Gerasimov



Re: RFR 8203279 : Faster calculation of power of two

2018-05-16 Thread David Holmes

Hi Ivan,

Surely you need to back this up with some performance numbers! And 
verify not assume that numberOfLeadingZeroes is intrinsified!


Cheers,
David

On 17/05/2018 10:32 AM, Ivan Gerasimov wrote:

Hello!

In a few places we have code that rounds an integer up to the nearest 
power of two.


It is done with a series of RSHOTFs and ORs, but it can possibly be done 
faster with the use of Integer.numberOfLeadingZeros (assuming it is 
intrinsified).


Would you please help review this trivial optimization:

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8203279
WEBREV: http://cr.openjdk.java.net/~igerasim/8203279/00/webrev/

For HashMap.tableSizeFor() I created a simple test with a loop from 
Integer.MIN_VALUE to Integer.MAX_VALUE (including), to make sure the 
result is the same.


For TimSort.ensureCapacity() I checked all positive values of minCapacity.