Re: How to extract matches from (\d+[hms])+ ?

2014-09-25 Thread Xueming Shen
java/time/Duration.java has the pattern for the duration, which is similar to 
Guy's suggestion.


 On Sep 24, 2014, at 10:08 PM, Wang Weijun weijun.w...@oracle.com wrote:
 
 
 On Sep 25, 2014, at 13:06, Guy Steele guy.ste...@oracle.com wrote:
 
 (A lurker sticking his nose in here! :-)  Is it your intent also to match 
 30s1h or 20m30m as a time duration?
 
 If not, you might be better off with a pattern such as 
 ((\\d+)h)?((\\d+)m)?((\\d+)s)? and then the whole problem caused by the 
 outer + iteration disappears (but you may need to check whether the 
 original string was empty).
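 Guy's suggested pattern could be wired up roughly like this (a sketch; 
 the class and method names are mine, not from the thread):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DurationParse {
    // Each unit is an optional group; group(2)/(4)/(6) capture the digits.
    private static final Pattern P =
            Pattern.compile("((\\d+)h)?((\\d+)m)?((\\d+)s)?");

    /** Returns {hours, minutes, seconds}; rejects "", "30s1h", "20m30m". */
    static int[] parse(String s) {
        Matcher m = P.matcher(s);
        // matches() anchors at both ends; the empty string still matches,
        // so reject it explicitly, as Guy notes.
        if (s.isEmpty() || !m.matches()) {
            throw new IllegalArgumentException("bad duration: " + s);
        }
        return new int[] {
            m.group(2) == null ? 0 : Integer.parseInt(m.group(2)),
            m.group(4) == null ? 0 : Integer.parseInt(m.group(4)),
            m.group(6) == null ? 0 : Integer.parseInt(m.group(6))
        };
    }
}
```

 Because matches() must consume the whole input, out-of-order strings 
 like 30s1h fail without any of the looping machinery below.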
 
 Yes, this is much better.
 
 
 But maybe that takes all the fun out of it.
 
 Let someone else enjoy it then. :-)
 
 Thanks
 Max
 
 
 --Guy Steele
 
 On Sep 25, 2014, at 12:51 AM, Wang Weijun weijun.w...@oracle.com wrote:
 
 Hi Sherman
 
 I want to match a time duration like 1h20m30s and 2h. It looks like if 
 I directly use the pattern ((\\d+)([hms]))+, group(2) and group(3) only 
 return the last match (i.e. 30 and s for 1h20m30s). So I tried multiple 
 matching with (\\d+)([hms]) only, but find() does not always match from 
 the beginning, and lookingAt() does not advance after one call.
 
 This is my code now:
 
 int start = 0;
 while (true) {
     if (!m.find() || m.start() != start) {
         throw new Exception();
     }
     start = m.end();
     print(m.group(1), m.group(2));
     if (m.hitEnd()) break;
 }
 print("Done");
 
 Is this the correct way?
 
 Thanks
 Max
 


Re: RFR (XS) CR 8058643: (str) Re-examine hashCode implementation

2014-09-25 Thread Aleksey Shipilev
Hi Peter,

On 09/25/2014 02:46 AM, Peter Levart wrote:
 http://cr.openjdk.java.net/~plevart/misc/StringHash/HashBench.java

Interesting.

I have to say it once again:
 a) It is *an error* to use static finals as the benchmark input. They
are perfectly constant-foldable in way too many cases. Break this habit,
please.
 b) Explicit Blackholes are not needed, and just returning an int from the
@Benchmark method helps readability a lot. Please break this habit as
well. Having readable and maintainable benchmarks is key.

 This is really great!

 Couldn't this be a tweak via HotSpot, instead of uglifying and bloating
 the Java and hence the byte code?

+1

 This is for HotSpot compiler guys to answer. Theoretically I think it is
 possible. But it would have to be tailored to the very specific use case
 and I don't know if such specific transformation would have wide-enough
 applicability. If it would only speed-up String.hashCode and very
 similar loops, it is less trouble to do that by hand in one or few
 places...

I would think this happens in user-specified hashCode() over arrays.
IDEs would routinely inject the loop like that or delegate to
Arrays.hashCode, that does the same loop.

In other words, I would like to see this fixed on the compiler side. This
seems to be strength reduction playing tricks with loop unrolling;
I'll submit a bug shortly.

-Aleksey.




Re: RFR (XS) CR 8058643: (str) Re-examine hashCode implementation

2014-09-25 Thread Aleksey Shipilev
On 09/25/2014 11:40 AM, Aleksey Shipilev wrote:
 This is for HotSpot compiler guys to answer. Theoretically I think it is
 possible. But it would have to be tailored to the very specific use case
 and I don't know if such specific transformation would have wide-enough
 applicability. If it would only speed-up String.hashCode and very
 similar loops, it is less trouble to do that by hand in one or few
 places...
 
 I would think this happens in user-specified hashCode() over arrays.
 IDEs would routinely inject the loop like that or delegate to
 Arrays.hashCode, that does the same loop.
 
 In other words, I would like to see this fixed on compiler side. This
 seems to be the strength-reduction playing tricks with loop unrolling,
 I'll submit a bug shortly.

https://bugs.openjdk.java.net/browse/JDK-8059113

-Aleksey.



Re: RFR (XS) CR 8058643: (str) Re-examine hashCode implementation

2014-09-25 Thread Peter Levart

On 09/25/2014 09:40 AM, Aleksey Shipilev wrote:

Hi Peter,

On 09/25/2014 02:46 AM, Peter Levart wrote:

http://cr.openjdk.java.net/~plevart/misc/StringHash/HashBench.java

Interesting.

I have to say it once again:
  a) It is *an error* to use static finals as the benchmark input. They
are perfectly constant-foldable in way too many cases. Break this habit,
please.


Hi Aleksey,

The constant in this example is only the reference to the char[] array; 
its content is not constant. In String, this is a final instance field, 
which behaves similarly inside an instance method (usually it is read 
only once per method invocation).



  b) Explicit Blackholes are not needed, and just returning int from
@Benchmark method helps readability a lot. Please break this habit as
well. Having readable and maintainable benchmarks is a key.


Ok, here's a modified benchmark:

http://cr.openjdk.java.net/~plevart/misc/StringHash/HashBench2.java

Which behaves similarly.

Here are its results:

 * Results on JDK9, Linux, i7-2600K CPU, JMH args: -f 1 -wi 5 -i 8 -gc true
 *
 * Benchmark                  Mode  Samples         Score  Score error  Units
 * j.t.HashBench2._hashCode  thrpt        8   8308858.217   353019.084  ops/s
 * j.t.HashBench2.hashCode0  thrpt        8   8207337.729   217048.634  ops/s
 * j.t.HashBench2.hashCode1  thrpt        8  13359572.359   345736.675  ops/s
 * j.t.HashBench2.hashCode2  thrpt        8  15310621.202   238369.017  ops/s
 * j.t.HashBench2.hashCode3  thrpt        8  17637944.829   232155.847  ops/s
 * j.t.HashBench2.hashCode3i thrpt        8  17724181.444   509913.288  ops/s
 * j.t.HashBench2.hashCode3x thrpt        8   8344128.432   159508.813  ops/s
 * j.t.HashBench2.hashCode4  thrpt        8  16526850.489   969549.448  ops/s
 * j.t.HashBench2.hashCode5  thrpt        8  17567765.554   917934.885  ops/s
 * j.t.HashBench2.hashCode6  thrpt        8  17705074.332   419405.652  ops/s
 * j.t.HashBench2.hashCode7  thrpt        8  18805633.563   209181.299  ops/s
 * j.t.HashBench2.hashCode8  thrpt        8  18300123.201   376681.550  ops/s


It would be interesting to see how it behaves on different CPUs.




This is really great!

Couldn't this be a tweak via HotSpot, instead of uglifying and bloating
the Java and hence the byte code?

+1


This is for HotSpot compiler guys to answer. Theoretically I think it is
possible. But it would have to be tailored to the very specific use case
and I don't know if such specific transformation would have wide-enough
applicability. If it would only speed-up String.hashCode and very
similar loops, it is less trouble to do that by hand in one or few
places...

I would think this happens in user-specified hashCode() over arrays.
IDEs would routinely inject the loop like that or delegate to
Arrays.hashCode, that does the same loop.


Arrays.hashCode() can be improved this way too.



In other words, I would like to see this fixed on compiler side. This
seems to be the strength-reduction playing tricks with loop unrolling,
I'll submit a bug shortly.


As I said, I don't think it has anything to do with loop unrolling. The 
transformation I applied in hashCode1, 2, 3, ... 8 produces code that 
executes 2, 3, 4, ... 9 independent multiplications in each chunk, which 
allows the CPU's pipeline to execute them in parallel. I had to manually 
unroll the loop a bit just to achieve this transformation. But my manual 
unrolling does not bring the speed-up per se; the parallelization of the 
multiplications does. This can be seen by observing the score of the 
hashCode3x benchmark, which has the same loop structure as hashCode3, 
but performs the multiplications in a way where each of them depends on 
the result of the previous one, which prevents the CPU from parallelizing them.
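A minimal sketch of the kind of transformation described (my own 
reconstruction, not the actual HashBench2 code): process three chars per 
chunk so the three multiplications in each chunk are independent of one 
another, while the result stays bit-identical to the serial h = 31*h + c loop.

```java
public class HashSketch {
    // Chunked variant: 31^3 * h folds the previous chunks in, while the
    // per-char multiplications within a chunk have no data dependency on
    // each other (the constants fold at compile time).
    static int hash3(char[] value) {
        int h = 0;
        int i = 0;
        for (int max = value.length - 2; i < max; i += 3) {
            h = 31 * 31 * 31 * h
              + 31 * 31 * value[i]
              + 31 * value[i + 1]
              + value[i + 2];
        }
        for (; i < value.length; i++) {   // leftover tail, serial
            h = 31 * h + value[i];
        }
        return h;
    }
}
```

The hashCode3x variant mentioned above chains the same operations so each 
multiplication feeds the next, which is why it scores like the serial 
baseline despite identical loop structure.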


This is not to say that such a transformation couldn't be done on the JIT 
side. I just have a feeling that such a transformation won't be widely 
used because it is very specific. It can only be used within integer 
arithmetic of homogeneous width (since it changes the order of the 
operations applied, the final result depends on which width is used when 
overflow happens). Floating-point arithmetic is equally unsuitable for 
transformations that change the order of operations. It can only help when 
a sequence of operations that depend on one another is changed into a 
sequence of independent operations, and those operations have a weight 
that matters (such as multiplication).


Regards, Peter



-Aleksey.






Re: Lower overhead String encoding/decoding

2014-09-25 Thread Richard Warburton
Hi Alan,

Thanks for the feedback.

The direction seems reasonable but I wonder about the offset/length (and
 destOffset) parameters as this isn't consistent with how ByteBuffers were
 originally intended to be used. That is, when you read the bytes from the
 wire into a ByteBuffer and flip it, the position and limit will delimit
 the bytes to be decoded.

 If the constructor is changed to String(ByteBuffer in, Charset cs) and
 decodes the remaining bytes in the buffer to a String using the specified
 Charset then I think it would be more consistent. Also I think this would give
 you a solution to the underflow case.


I've updated the webrev to reflect this, removing the offset and length
parameters and using position() and limit() instead.

http://cr.openjdk.java.net/~rwarburton/string-patch-webrev-6/
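For context, the position/limit convention Alan describes is the one the 
existing Charset.decode(ByteBuffer) already follows; a proposed 
String(ByteBuffer, Charset) constructor would presumably behave the same 
way. A small sketch (class and method names are mine):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class DecodeRemaining {
    static String decode(ByteBuffer buf) {
        // Decodes the bytes between position and limit; the buffer's
        // position is advanced to the limit as a side effect.
        return StandardCharsets.UTF_8.decode(buf).toString();
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(64);
        buf.put("hello".getBytes(StandardCharsets.UTF_8));
        buf.flip();                      // position = 0, limit = 5
        System.out.println(decode(buf)); // hello
    }
}
```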

Similarly if getBytes is replaced with a getBytes or
 encode(ByteBuffer, Charset cs) then it would encode as many characters
 as possible into the output buffer, and I think it would be more consistent and
 also help with the overflow case.


I've also applied this to the getBytes() method. I chose the getBytes()
method name for consistency with the existing getBytes() method that
returns a byte[]. To my mind encode(), which you mention in your email, is
a more natural name for the method; do people have a preference here?

regards,

  Richard Warburton

  http://insightfullogic.com
  @RichardWarburto http://twitter.com/richardwarburto


Re: RFR (XS) CR 8058643: (str) Re-examine hashCode implementation

2014-09-25 Thread Vitaly Davidovich
Note that the compiler does not use multiplication here.  It is strength-
reduced into a bit shift followed by a subtraction (i.e. h * 31 = h * 32
(i.e. sal 5) - h).  The point about allowing execution units to run in
parallel (given the break in the data dependency) is very valid though.
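The strength reduction Vitaly mentions is easy to check at the source 
level; with int wraparound, the identity holds for every value, including 
ones that overflow:

```java
public class StrengthReduction {
    // What the JIT emits for h * 31: a shift and a subtract (h * 32 - h).
    static int times31(int h) {
        return (h << 5) - h;
    }
}
```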

For kicks and giggles, I looked at what gcc, clang, and icc generate for
similar C code at O2 and O3, and did not observe them making any
transformations to break the data dependency.

While it would be cool if the compiler could do this transformation on its
own, as you say Peter, the scope where this transform is applicable is
somewhat narrow(?).  If speeding up the non-cached String.hashCode() or
Arrays.hashCode() methods is desired, perhaps intrinsics could be
implemented (the utility for String.hashCode seems low, though).  In that
case, one could then do this transformation manually and also vectorize it
for long inputs.


Re: Lower overhead String encoding/decoding

2014-09-25 Thread Xueming Shen

Hi Richard, a couple of comments after a quick scan.

(1) #474, the IOBE is probably no longer needed in the case of the
ByteBuffer parameter.

(2) For the methods that take a ByteBuffer as input, it would be desirable
to specify clearly that the bytes are read from position to limit, and
whether or not the position is advanced after decoding/encoding (important).

(3) getBytes(byte[], offset, charset)
     While I understand it might be useful to have such a method in
certain circumstances, it is usually complicated to make it right/easy for
the user to actually use. Consider the fact that the returned number of
bytes encoded has no straightforward connection to how many underlying
chars have been encoded (so the user of this method really has no idea how
many underlying chars have been encoded into the dest buffer: is that
enough? how big does the buffer need to be to encode the whole string?),
especially given the possibility that the last couple of bytes of space
might fall short of encoding a char. Unlike getChars(), which has an easy,
clear and direct link between the outgoing chars and the underlying chars.
I would suggest it might be better to leave it out.

(4) StringCoding.decode() #239: remaining() should be used to return
limit - position?

(5) In the case of untrusted, it might be more straightforward to get all
the bytes out of the buffer first (you are allocating a byte buffer here
anyway; I don't see an obvious benefit to getting a direct buffer, btw)
and then pass them to the original/existing byte[] -> char[] decoding
implementation. We will probably take a deep look at the implementation
later when the public API has settled.
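As a point of comparison for item (3): the lower-level CharsetEncoder API
does expose how far encoding got, because a partial encode advances the
input CharBuffer's position. A sketch (the buffer size here is mine,
chosen deliberately to force overflow):

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

public class PartialEncode {
    public static void main(String[] args) {
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder();
        CharBuffer in = CharBuffer.wrap("héllo wörld");
        ByteBuffer out = ByteBuffer.allocate(6);   // deliberately too small
        CoderResult r = enc.encode(in, out, true);
        // r.isOverflow(): the output filled up mid-string, and
        // in.position() tells the caller how many chars were consumed.
        System.out.println(r.isOverflow() + " " + in.position());
    }
}
```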

-Sherman






Re: Lower overhead String encoding/decoding

2014-09-25 Thread Xueming Shen

On 9/25/14 9:24 AM, Xueming Shen wrote:

(3) getBytes(byte[], offset, charset)
     While I understand it might be useful to have such a method in
certain circumstances, it is usually complicated to make it right/easy for
the user to actually use. Consider the fact that the returned number of
bytes encoded has no straightforward connection to how many underlying
chars have been encoded (so the user of this method really has no idea how
many underlying chars have been encoded into the dest buffer: is that
enough? how big does the buffer need to be to encode the whole string?),
especially given the possibility that the last couple of bytes of space
might fall short of encoding a char. Unlike getChars(), which has an easy,
clear and direct link between the outgoing chars and the underlying chars.
I would suggest it might be better to leave it out.


I would assume this is true for the ByteBuffer version as well... I just 
doubt the usefulness of a method that might only give you the bytes of 
part of the target string object. Given that the getBytes(ByteBuffer) 
version has the additional position/limit info from the buffer itself, a 
workaround is to return the number of underlying chars copied... then you 
need one more parameter to let the user copy from 'index' for the next round.

-sherman









Re: RFR (XS) CR 8058643: (str) Re-examine hashCode implementation

2014-09-25 Thread Peter Levart


On 09/25/2014 06:00 PM, Vitaly Davidovich wrote:
Note that compiler does not use multiplication here.  This is strength 
reduced into a bit shift followed by a subtraction (i.e. h * 31 = h * 
32 (i.e. sal 5) - h).


I've noticed that in the disassembly files provided by Aleksey. Good to 
know, so one need not bother with bit shifts in situations where 
multiplication / division looks clearer in the source code.


  The point about allowing execution units to run in parallel (given 
the break in data dependency) is very valid though.


For kicks and giggles, I looked at what gcc, clang, and icc generate 
for similar C code at O2 and O3, and did not observe them making any 
transformations to break the data dependency.


So HotSpot can break the ice here ;-)



While it would be cool if compiler could do this transformation on its 
own, as you say Peter, the scope of where this transform is applicable 
is somewhat narrow(?).  If speeding up the non-cached 
String.hashCode() or Arrays.hashCode() methods is desired, perhaps 
intrinsics could be implemented (utility for String.hashCode seems low 
though).  In that case, one could then do this transformation manually 
and also vectorize it for long inputs.


I see there are multiple options, each with its own benefits and 
drawbacks. Hard to choose.


Regards, Peter


