LuciferYang opened a new pull request, #37799:
URL: https://github.com/apache/spark/pull/37799

   ### What changes were proposed in this pull request?
   This PR suggests using Java 11+ as runtime of Spark in the `index.md`.
   
   
   ### Why are the changes needed?
   The JIT optimizer in Java 11+ is better than Java 8, running Spark using 
Java 11+ will bring performance benefits.
   
   Spark uses `Whole Stage Code Gen` to improve performance, but the generated 
code may not be friendly to the JIT optimizer. For example, the case mentioned 
in SPARK-40303:
   
   ```scala
     runBenchmark("Benchmark count distinct") {
       withTempPath { dir =>
         val N = 2000000
         val columns = Range(0, 100).map(i => s"id % $i AS id$i")
   
         spark.range(N).selectExpr(columns: 
_*).write.mode("Overwrite").parquet(dir.getCanonicalPath)
   
         Seq(1, 2, 5, 10, 15, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100).foreach 
{ cnt =>
           val selectExps = columns.take(cnt).map(_.split(" ").last).map(c => 
s"count(distinct $c)")
   
           val benchmark = new Benchmark("Benchmark count distinct", N, 
minNumIters = 1)
           benchmark.addCase(s"$cnt count distinct with codegen") { _ =>
             withSQLConf(
               "spark.sql.codegen.wholeStage" -> "true",
               "spark.sql.codegen.factoryMode" -> "FALLBACK") {
               spark.read.parquet(dir.getCanonicalPath).selectExpr(selectExps: 
_*)
                 .write.format("noop").mode("Overwrite").save()
             }
           }
   
           benchmark.addCase(s"$cnt count distinct without codegen") { _ =>
             withSQLConf(
               "spark.sql.codegen.wholeStage" -> "false",
               "spark.sql.codegen.factoryMode" -> "NO_CODEGEN") {
               spark.read.parquet(dir.getCanonicalPath).selectExpr(selectExps: 
_*)
                 .write.format("noop").mode("Overwrite").save()
             }
           }
           benchmark.run()
         }
       }
     }
   ```
   
   When use Java 8 to run the above case in `local[2]` mode, there will be 
obvious negative effects when `cnt` is **35, 40, 50, 60 or 70** (the 
performance after `cnt > 70` is meet expectations due to fallback with 
InternalCompilerException: Code grows beyond 64 KB):
   
   ```
   OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1017-azure
   Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
   Benchmark count distinct:                 Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   
------------------------------------------------------------------------------------------------------------------------
   35 count distinct with codegen                   418321         418321       
    0          0.0      209160.3       1.0X
   35 count distinct without codegen                 69975          69975       
    0          0.0       34987.7       6.0X
   
   OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1017-azure
   Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
   Benchmark count distinct:                 Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   
------------------------------------------------------------------------------------------------------------------------
   40 count distinct with codegen                   626627         626627       
    0          0.0      313313.7       1.0X
   40 count distinct without codegen                 90564          90564       
    0          0.0       45281.9       6.9X
   
   OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1017-azure
   Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
   Benchmark count distinct:                 Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   
------------------------------------------------------------------------------------------------------------------------
   50 count distinct with codegen                   906749         906749       
    0          0.0      453374.7       1.0X
   50 count distinct without codegen                140278         140278       
    0          0.0       70138.8       6.5X
   
   OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1017-azure
   Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
   Benchmark count distinct:                 Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   
------------------------------------------------------------------------------------------------------------------------
   60 count distinct with codegen                   995616         995616       
    0          0.0      497808.2       1.0X
   60 count distinct without codegen                215088         215088       
    0          0.0      107544.2       4.6X
   
   OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1017-azure
   Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
   Benchmark count distinct:                 Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   
------------------------------------------------------------------------------------------------------------------------
   70 count distinct with codegen                  1060273        1060273       
    0          0.0      530136.4       1.0X
   70 count distinct without codegen                290576         290576       
    0          0.0      145287.9       3.6X
   ```
   
   
   But run the above cases using Java 11 or Java 17,  there will be no obvious 
negative effects:
   
   Java 11
   
   ```
   OpenJDK 64-Bit Server VM 11.0.16+8-LTS on Linux 5.15.0-1017-azure
   Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
   Benchmark count distinct:                 Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   
------------------------------------------------------------------------------------------------------------------------
   35 count distinct with codegen                    54733          54733       
    0          0.0       27366.6       1.0X
   35 count distinct without codegen                 97613          97613       
    0          0.0       48806.6       0.6X
   
   OpenJDK 64-Bit Server VM 11.0.16+8-LTS on Linux 5.15.0-1017-azure
   Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
   Benchmark count distinct:                 Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   
------------------------------------------------------------------------------------------------------------------------
   40 count distinct with codegen                    70454          70454       
    0          0.0       35226.8       1.0X
   40 count distinct without codegen                127975         127975       
    0          0.0       63987.3       0.6X
   
   
   OpenJDK 64-Bit Server VM 11.0.16+8-LTS on Linux 5.15.0-1017-azure
   ntel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
   Benchmark count distinct:                 Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   
------------------------------------------------------------------------------------------------------------------------
   50 count distinct with codegen                   104559         104559       
    0          0.0       52279.7       1.0X
   50 count distinct without codegen                182331         182331       
    0          0.0       91165.5       0.6X
   
   OpenJDK 64-Bit Server VM 11.0.16+8-LTS on Linux 5.15.0-1017-azure
   Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
   Benchmark count distinct:                 Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   
------------------------------------------------------------------------------------------------------------------------
   60 count distinct with codegen                   146774         146774       
    0          0.0       73386.8       1.0X
   60 count distinct without codegen                257184         257184       
    0          0.0      128592.1       0.6X
   
   OpenJDK 64-Bit Server VM 11.0.16+8-LTS on Linux 5.15.0-1017-azure
   Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
   Benchmark count distinct:                 Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   
------------------------------------------------------------------------------------------------------------------------
   70 count distinct with codegen                   190883         190883       
    0          0.0       95441.4       1.0X
   70 count distinct without codegen                346295         346295       
    0          0.0      173147.4       0.6X
   ```
   
   **Java 17**
   
   ```
   OpenJDK 64-Bit Server VM 17.0.4+8-LTS on Linux 5.15.0-1017-azure
   Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
   Benchmark count distinct:                 Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   
------------------------------------------------------------------------------------------------------------------------
   35 count distinct with codegen                    45439          45439       
    0          0.0       22719.5       1.0X
   35 count distinct without codegen                 77241          77241       
    0          0.0       38620.5       0.6X
   
   OpenJDK 64-Bit Server VM 17.0.4+8-LTS on Linux 5.15.0-1017-azure
   Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
   Benchmark count distinct:                 Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   
------------------------------------------------------------------------------------------------------------------------
   40 count distinct with codegen                    58347          58347       
    0          0.0       29173.3       1.0X
   40 count distinct without codegen                 98928          98928       
    0          0.0       49464.1       0.6X
   
   
   OpenJDK 64-Bit Server VM 17.0.4+8-LTS on Linux 5.15.0-1017-azure
   Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
   Benchmark count distinct:                 Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   
------------------------------------------------------------------------------------------------------------------------
   50 count distinct with codegen                    85589          85589       
    0          0.0       42794.7       1.0X
   50 count distinct without codegen                151642         151642       
    0          0.0       75820.9       0.6X
   
   OpenJDK 64-Bit Server VM 17.0.4+8-LTS on Linux 5.15.0-1017-azure
   Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
   Benchmark count distinct:                 Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   
------------------------------------------------------------------------------------------------------------------------
   60 count distinct with codegen                   118545         118545       
    0          0.0       59272.5       1.0X
   60 count distinct without codegen                234024         234024       
    0          0.0      117012.2       0.5X
   
   OpenJDK 64-Bit Server VM 17.0.4+8-LTS on Linux 5.15.0-1017-azure
   Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
   Benchmark count distinct:                 Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   
------------------------------------------------------------------------------------------------------------------------
   70 count distinct with codegen                   157151         157151       
    0          0.0       78575.7       1.0X
   70 count distinct without codegen                319582         319582       
    0          0.0      159791.2       0.5X
   ```
   
   After turning on the `-XX:+PrintCompilation` option, we can found the logs 
related to the `XX_doConsume` method compilation failure of the C2 compiler:
   
   ```
   COMPILE SKIPPED: unsupported incoming calling sequence
   
   or 
   
   COMPILE SKIPPED: unsupported calling sequence
   
   ```
   
   When using Java 8, they are identified as `not retryable`, this indicates 
that the compiler deemed this method should not be attempted to compile again 
on any tier of compilation, and because this is an OSR compilation (i.e. loop 
compilation), this will mark the method as "never try to perform OSR 
compilation again on all tiers".  But when using Java 11/17 they are identified 
as `retry at different tier`,  it'll try tier 1, this also makes the 
performance of the above cases seem acceptable.
   
   
   So this pr started to suggest using Java 11+ as runtime environment of Spark 
in the document.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Just change the document.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to