cyb70289 opened a new pull request, #49568:
URL: https://github.com/apache/spark/pull/49568

   This is a trivial change to replace the loop index from `int` to `long`. 
Surprisingly, microbenchmark shows more than double performance uplift.
   
   Analysis
   --------
   The hot loop of `arrayEquals` method is simplifed as below. Loop index `i` 
is defined as `int`, it's compared with `length`, which is a `long`, to 
determine if the loop should end.
   ```
   public static boolean arrayEquals(
       Object leftBase, long leftOffset, Object rightBase, long rightOffset, 
final long length) {
     ......
     int i = 0;
     while (i <= length - 8) {
       if (Platform.getLong(leftBase, leftOffset + i) !=
           Platform.getLong(rightBase, rightOffset + i)) {
             return false;
       }
       i += 8;
     }
     ......
   }
   ```
   
   Strictly speaking, there's a code bug here. If `length` is greater than 2^31 
+ 8, this loop will never end because `i` as a 32 bit integer is at most 2^31 - 
1. But compiler must consider this behaviour as intentional and generate code 
strictly match the logic. It prevents compiler from generating optimal code.
   
   Defining loop index `i` as `long` corrects this issue. Besides more accurate 
code logic, JIT is able to optimize this code much more aggressively. From 
microbenchmark, this trivial change improves performance significantly on both 
Arm and x86 platforms.
   
   Benchmark
   ---------
   Source code:
   https://gist.github.com/cyb70289/258e261f388e22f47e4d961431786d1a
   
   Result on Arm Neoverse N2:
   ```
   Benchmark                             Mode  Cnt    Score   Error  Units
   ArrayEqualsBenchmark.arrayEqualsInt   avgt   10  674.313 ± 0.213  ns/op
   ArrayEqualsBenchmark.arrayEqualsLong  avgt   10  313.563 ± 2.338  ns/op
   ```
   
   Result on Intel Cascake Lake:
   ```
   Benchmark                             Mode  Cnt     Score   Error  Units
   ArrayEqualsBenchmark.arrayEqualsInt   avgt   10  1130.695 ± 0.168  ns/op
   ArrayEqualsBenchmark.arrayEqualsLong  avgt   10   461.979 ± 0.097  ns/op
   ```
   
   Deep dive
   ---------
   Dive deep to the machine code level, we can see why the big gap. Listed 
below are arm64 assembly generated by Openjdk-17 C2 compiler.
   
   For `int i`, the machine code is similar to source code, no deep 
optimization. Safepoint polling is expensive in this short loop.
   ```
   // jit c2 machine code snippet
     0x0000ffff81ba8904:   mov        w15, wzr              // int i = 0
     0x0000ffff81ba8908:   nop
     0x0000ffff81ba890c:   nop
   loop:
     0x0000ffff81ba8910:   ldr        x10, [x13, w15, sxtw] // 
Platform.getLong(leftBase, leftOffset + i)
     0x0000ffff81ba8914:   ldr        x14, [x12, w15, sxtw] // 
Platform.getLong(rightBase, rightOffset + i)
     0x0000ffff81ba8918:   cmp        x10, x14
     0x0000ffff81ba891c:   b.ne       0x0000ffff81ba899c    // return false if 
not equal
     0x0000ffff81ba8920:   ldr        x14, [x28, #848]      // x14 -> safepoint
     0x0000ffff81ba8924:   add        w15, w15, #0x8        // i += 8
     0x0000ffff81ba8928:   ldr        wzr, [x14]            // safepoint polling
     0x0000ffff81ba892c:   sxtw       x10, w15              // extend i to long
     0x0000ffff81ba8930:   cmp        x10, x11
     0x0000ffff81ba8934:   b.le       0x0000ffff81ba8910    // if (i <= length 
- 8) goto loop
   ```
   
   For `long i`, JIT is able to do much more aggressive optimization. E.g, 
below code snippet unrolls the loop by four.
   ```
   // jit c2 machine code snippet
   unrolled_loop:
     0x0000ffff91de6fe0:   sxtw       x10, w7
     0x0000ffff91de6fe4:   add        x23, x22, x10
     0x0000ffff91de6fe8:   add        x24, x21, x10
     0x0000ffff91de6fec:   ldr        x13, [x23]          // unroll-1
     0x0000ffff91de6ff0:   ldr        x14, [x24]
     0x0000ffff91de6ff4:   cmp        x13, x14
     0x0000ffff91de6ff8:   b.ne       0x0000ffff91de70a8
     0x0000ffff91de6ffc:   ldr        x13, [x23, #8]      // unroll-2
     0x0000ffff91de7000:   ldr        x14, [x24, #8]
     0x0000ffff91de7004:   cmp        x13, x14
     0x0000ffff91de7008:   b.ne       0x0000ffff91de70b4
     0x0000ffff91de700c:   ldr        x13, [x23, #16]     // unroll-3
     0x0000ffff91de7010:   ldr        x14, [x24, #16]
     0x0000ffff91de7014:   cmp        x13, x14
     0x0000ffff91de7018:   b.ne       0x0000ffff91de70a4
     0x0000ffff91de701c:   ldr        x13, [x23, #24]     // unroll-4
     0x0000ffff91de7020:   ldr        x14, [x24, #24]
     0x0000ffff91de7024:   cmp        x13, x14
     0x0000ffff91de7028:   b.ne       0x0000ffff91de70b0
     0x0000ffff91de702c:   add        w7, w7, #0x20
     0x0000ffff91de7030:   cmp        w7, w11
     0x0000ffff91de7034:   b.lt       0x0000ffff91de6fe0
   ```
   
   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: 
https://spark.apache.org/contributing.html
     2. Ensure you have added or run the appropriate tests for your PR: 
https://spark.apache.org/developer-tools.html
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., 
'[WIP][SPARK-XXXX] Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a 
faster review.
     7. If you want to add a new configuration, please read the guideline first 
for naming configurations in
        
'core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala'.
     8. If you want to add or modify an error type or message, please read the 
guideline first in
        'common/utils/src/main/resources/error/README.md'.
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section 
is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster 
reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class 
hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other 
DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   A trivial change to replace loop index `i` of method `arrayEquals` from 
`int` to `long`.
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   To improve performance and fix a possible bug.
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as 
new features, bug fixes, or other behavior changes. Documentation-only updates 
are not considered user-facing changes.
   
   If yes, please clarify the previous behavior and the change this PR proposes 
- provide the console output, description and/or an example to show the 
behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to 
the released Spark versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   No.
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some 
test cases that check the changes thoroughly including negative and positive 
cases if possible.
   If it was tested in a way different from regular unit tests, please clarify 
how you tested step by step, ideally copy and paste-able, so that other 
reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why 
it was difficult to add.
   If benchmark tests were added, please run the benchmarks in GitHub Actions 
for the consistent environment, and the instructions could accord to: 
https://spark.apache.org/developer-tools.html#github-workflow-benchmarks.
   -->
   Existing unit tests.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   <!--
   If generative AI tooling has been used in the process of authoring this 
patch, please include the
   phrase: 'Generated-by: ' followed by the name of the tool and its version.
   If no, write 'No'.
   Please refer to the [ASF Generative Tooling 
Guidance](https://www.apache.org/legal/generative-tooling.html) for details.
   -->
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to