dongjoon-hyun opened a new pull request #26049: [SPARK-25668][SQL][TESTS] 
Refactor TPCDSQueryBenchmark to use main method
URL: https://github.com/apache/spark/pull/26049
 
 
   ### What changes were proposed in this pull request?
   
   This PR aims to refactor `TPCDSQueryBenchmark` to use the main method and to store the results in a file.
   
   ### Why are the changes needed?
   
   Although the generated TPCDS data is random, we can still keep a record of the results. This PR adds a JDK8 result based on the TPCDS ScaleFactor 1G data generated by the following steps.
   ```
   # `spark-tpcds-datagen` needs this.
   $ git clone https://github.com/apache/spark.git -b branch-2.4 --depth 1 spark-2.4
   $ cd spark-2.4
   $ export SPARK_HOME=$PWD
   $ ./build/mvn clean package -DskipTests
   
   # Generate data.
   $ git clone git@github.com:maropu/spark-tpcds-datagen.git
   $ cd spark-tpcds-datagen/
   $ build/mvn clean package
   $ mkdir -p /data/tpcds
   $ ./bin/dsdgen --output-location /data/tpcds/s1  # This needs Spark 2.4
   ```
   
   ### Does this PR introduce any user-facing change?
   
   No. (This is a dev-only test benchmark.)
   
   ### How was this patch tested?
   
   Ran the benchmark manually. Note that you need TPCDS data to run it.
   ```
   $ SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.execution.benchmark.TPCDSQueryBenchmark --data-location /data/tpcds/s1"
   ```
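
   Since the benchmark depends on pre-generated data, it may help to check the data location before launching sbt. The following is only a sketch: `check_tpcds_data` is a hypothetical helper, not part of Spark or `spark-tpcds-datagen`, and it only checks that the directory exists and is non-empty, not that the tables are complete.

   ```shell
   # Hypothetical helper (not part of Spark): fail fast when the TPCDS data
   # directory is missing or empty, instead of waiting for sbt to start.
   check_tpcds_data() {
     data_dir="$1"
     if [ ! -d "$data_dir" ]; then
       echo "missing data directory: $data_dir" >&2
       return 1
     fi
     # `ls -A` prints nothing for an empty directory.
     if [ -z "$(ls -A "$data_dir")" ]; then
       echo "empty data directory: $data_dir" >&2
       return 1
     fi
     return 0
   }
   ```

   With this helper defined, the benchmark command can be guarded as `check_tpcds_data /data/tpcds/s1 && SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "..."`.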

