MaxGekk edited a comment on issue #28169: [SPARK-31398][SQL] Speed up reading dates in ORC URL: https://github.com/apache/spark/pull/28169#issuecomment-612206971 I ran the benchmark `DateTimeRebaseBenchmark` on 2.4.6-SNAPSHOT: ``` OpenJDK 64-Bit Server VM 1.8.0_242-8u242-b08-0ubuntu3~18.04-b08 on Linux 4.15.0-1063-aws Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz Load dates from ORC: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ after 1582, vec off 48169 48276 96 2.1 481.7 1.0X after 1582, vec on 5375 5410 41 18.6 53.7 9.0X before 1582, vec off 22353 22482 198 4.5 223.5 2.2X before 1582, vec on 5474 5475 1 18.3 54.7 8.8X ``` Here is the PR https://github.com/MaxGekk/spark/pull/27. - After 1582, it is **~4 times faster** - Before 1582, it is **~2 times faster**
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
