Maxim Gekk created SPARK-31426:
----------------------------------

             Summary: Regression in loading/saving timestamps from/to ORC files
                 Key: SPARK-31426
                 URL: https://issues.apache.org/jira/browse/SPARK-31426
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: Maxim Gekk


Here are the results of DateTimeRebaseBenchmark on the current master branch:
{code}
Save timestamps to ORC:                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
after 1582                                        59877          59877           0          1.7         598.8       0.0X
before 1582                                       61361          61361           0          1.6         613.6       0.0X

Load timestamps from ORC:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
after 1582, vec off                               48197          48288         118          2.1         482.0       1.0X
after 1582, vec on                                38247          38351         128          2.6         382.5       1.3X
before 1582, vec off                              53179          53359         249          1.9         531.8       0.9X
before 1582, vec on                               44076          44268         269          2.3         440.8       1.1X
{code}

The results of the same benchmark on Spark 2.4.6-SNAPSHOT:
{code}
Save timestamps to ORC:                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
after 1582                                        18858          18858           0          5.3         188.6       1.0X
before 1582                                       18508          18508           0          5.4         185.1       1.0X

Load timestamps from ORC:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
after 1582, vec off                               14063          14177         143          7.1         140.6       1.0X
after 1582, vec on                                 5955           6029         100         16.8          59.5       2.4X
before 1582, vec off                              14119          14126           7          7.1         141.2       1.0X
before 1582, vec on                                5991           6007          25         16.7          59.9       2.3X
{code}
Here is the PR with DateTimeRebaseBenchmark backported to 2.4:
https://github.com/MaxGekk/spark/pull/27
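
To put a number on the regression, the Best Time(ms) columns of the two runs can be divided case by case. The snippet below is only an illustrative sketch: the dictionary names and the case labels are made up for this example, and the values are copied by hand from the tables above. On these numbers, master comes out roughly 3x slower for saves and about 3.4x to 7.4x slower for loads, with the vectorized reader showing the largest gap.

```python
# Best Time (ms) per benchmark case, copied from the two runs above.
# The dictionary names and case labels are illustrative, not part of Spark.
master_ms = {
    "save, after 1582": 59877,
    "save, before 1582": 61361,
    "load, after 1582, vec off": 48197,
    "load, after 1582, vec on": 38247,
    "load, before 1582, vec off": 53179,
    "load, before 1582, vec on": 44076,
}
spark_2_4_6_snapshot_ms = {
    "save, after 1582": 18858,
    "save, before 1582": 18508,
    "load, after 1582, vec off": 14063,
    "load, after 1582, vec on": 5955,
    "load, before 1582, vec off": 14119,
    "load, before 1582, vec on": 5991,
}

# Slowdown factor of master relative to 2.4.6-SNAPSHOT for each case.
slowdowns = {
    case: master_ms[case] / spark_2_4_6_snapshot_ms[case]
    for case in master_ms
}

for case, factor in slowdowns.items():
    print(f"{case:<28} {factor:.2f}x slower on master")
```

Best time is used rather than average time because it is less sensitive to noise from individual iterations; the two columns are close enough here that the picture does not change.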



