Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22965#discussion_r232001105

    --- Diff: sql/core/benchmarks/DataSourceReadBenchmark-results.txt ---
    @@ -2,268 +2,268 @@
     SQL Single Numeric Column Scan
     ================================================================================================

    -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
    +OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
     Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
     SQL Single TINYINT Column Scan:          Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
     ------------------------------------------------------------------------------------------------
    -SQL CSV                                     21508 / 22112          0.7        1367.5       1.0X
    -SQL Json                                     8705 / 8825           1.8         553.4       2.5X
    -SQL Parquet Vectorized                        157 / 186          100.0          10.0     136.7X
    -SQL Parquet MR                               1789 / 1794           8.8         113.8      12.0X
    -SQL ORC Vectorized                            156 / 166          100.9           9.9     138.0X
    -SQL ORC Vectorized with copy                  218 / 225           72.1          13.9      98.6X
    -SQL ORC MR                                   1448 / 1492          10.9          92.0      14.9X
    -
    -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
    +SQL CSV                                     26366 / 26562          0.6        1676.3       1.0X
    --- End diff --

    Hi, @HyukjinKwon , @MaxGekk , @cloud-fan , @peter-toth

    This is not related to this PR. CSV shows a consistent performance regression (about 10%) throughout all benchmark cases. The other data sources show reasonable numbers for all types. The baseline was generated on Oct 11th. The following commits are the suspects.

    1. ee03f760b3 [SPARK-25955][TEST] Porting JSON tests for CSV functions
    1. 94de5609be [SPARK-25848][SQL][TEST] Refactor CSVBenchmarks to use main method
    1. 3b4556745e [SPARK-25795][R][EXAMPLE] Fix CSV SparkR SQL Example
    1. 1e6c1d8bfb [SPARK-25493][SQL] Use auto-detection for CRLF in CSV datasource multiline mode
    1. c7eadb5e66 [SPARK-25660][SQL] Fix for the backward slash as CSV fields delimiter
    1. 39872af882 [SPARK-25684][SQL] Organize header related codes in CSV datasource
    1. 46fe40838a [SPARK-25669][SQL] Check CSV header only when it exists
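
    As a rough cross-check, the slowdown for the one CSV row quoted in the diff can be computed directly from its old and new figures. This is only a minimal sketch: the helper name `slowdown_pct` and the dictionary layout are illustrative and not part of the Spark benchmark code, and the numbers are the "SQL CSV" TINYINT values copied from the removed and added lines.

```python
# Figures for the "SQL CSV" row of the TINYINT scan, copied from the diff.
baseline = {"best_ms": 21508, "per_row_ns": 1367.5}  # removed (-) line
current = {"best_ms": 26366, "per_row_ns": 1676.3}   # added (+) line


def slowdown_pct(old: float, new: float) -> float:
    """Return how much slower `new` is than `old`, as a percentage of `old`."""
    return (new - old) / old * 100.0


# Hypothetical report loop: one line per metric present in both rows.
for metric in baseline:
    pct = slowdown_pct(baseline[metric], current[metric])
    print(f"{metric}: {pct:.1f}% slower")
```

    For this single row both metrics come out at roughly 22.6%. The "about 10%" figure above refers to all benchmark cases, most of which are not quoted in this hunk.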