MaxGekk edited a comment on issue #28169: [SPARK-31398][SQL] Speed up reading dates in ORC
URL: https://github.com/apache/spark/pull/28169#issuecomment-612206971
 
 
   I ran the benchmark `DateTimeRebaseBenchmark` on 2.4.6-SNAPSHOT (https://github.com/MaxGekk/spark/commit/965757572bd4bfd04d6f547a6094d9b2891b34d6):
   ```
   OpenJDK 64-Bit Server VM 1.8.0_242-8u242-b08-0ubuntu3~18.04-b08 on Linux 4.15.0-1063-aws
   Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
   Load dates from ORC:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   after 1582, vec off                               48169          48276          96          2.1         481.7       1.0X
   after 1582, vec on                                 5375           5410          41         18.6          53.7       9.0X
   before 1582, vec off                              22353          22482         198          4.5         223.5       2.2X
   before 1582, vec on                                5474           5475           1         18.3          54.7       8.8X
   ```
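   The derived columns of the table (Rate, Per Row, Relative) can be cross-checked from Best Time alone. The sketch below is a hypothetical reconstruction, not Spark's benchmark harness: it assumes each case reads 100,000,000 rows (inferred from 481.7 ns/row × N ≈ 48169 ms) and treats the first case, `after 1582, vec off`, as the 1.0X baseline.

   ```python
   # Hypothetical cross-check of the benchmark's derived columns.
   # ASSUMPTION: 100,000,000 rows per case, inferred from 481.7 ns/row * N = 48169 ms.
   NUM_ROWS = 100_000_000

   # Best Time(ms) values copied from the table above.
   best_ms = {
       "after 1582, vec off": 48169,
       "after 1582, vec on": 5375,
       "before 1582, vec off": 22353,
       "before 1582, vec on": 5474,
   }


   def derived_columns(best_ms, num_rows):
       """Return {name: (rate_m_per_s, per_row_ns, relative)}; the first case is the baseline."""
       baseline = next(iter(best_ms.values()))
       out = {}
       for name, ms in best_ms.items():
           rate = num_rows / (ms / 1000) / 1e6  # million rows per second
           per_row_ns = ms * 1e6 / num_rows     # nanoseconds spent per row
           relative = baseline / ms             # speed-up over the baseline case
           out[name] = (rate, per_row_ns, relative)
       return out


   for name, (rate, per_row, rel) in derived_columns(best_ms, NUM_ROWS).items():
       print(f"{name:<22} {rate:5.1f} M/s {per_row:7.1f} ns {rel:4.1f}X")
   ```

   Since Best Time is printed already rounded to whole milliseconds, a recomputed value may differ from the table in its last digit.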
   
   Here is the PR https://github.com/MaxGekk/spark/pull/27.
   
   - After 1582, it is **~4 times faster**
   - Before 1582, it is **~2 times faster** 
