MaxGekk commented on a change in pull request #27807: [SPARK-31076][SQL] 
Convert Catalyst's DATE/TIMESTAMP to Java Date/Timestamp via local date-time
URL: https://github.com/apache/spark/pull/27807#discussion_r390761064
 
 

 ##########
 File path: 
sql/core/v1.2/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnVector.java
 ##########
 @@ -136,7 +138,9 @@ public int getInt(int rowId) {
   public long getLong(int rowId) {
     int index = getRowIndex(rowId);
     if (isTimestamp) {
-      return timestampData.time[index] * 1000 + timestampData.nanos[index] / 
1000 % 1000;
+      Timestamp ts = new Timestamp(timestampData.time[index]);
 
 Review comment:
   > Could you elaborate a little more what was the existing bug here?
   
   ORC writer uses `DateTimeUtils.toJavaTimestamp`, see 
https://github.com/apache/spark/blob/300ec1a74cb14867c22e616e657566d510426331/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcSerializer.scala#L144-L148
  but here we don't use opposite function `DateTimeUtils.fromJavaTimestamp`.
   
   And the replaced hand-written code is not equal to `fromJavaTimestamp` 
   
   > Does this PR add a test coverage for this vectorized code path change?
   
   I have to fix this place due test failures, see 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119486/testReport/
   
   The changes covered by the round trip test 
https://github.com/apache/spark/blob/7782b61a31ba49cdeffc35a942bd365bb71b026d/sql/hive/src/test/scala/org/apache/spark/sql/sources/HadoopFsRelationTest.scala#L122
 which runs by `OrcHadoopFsRelationSuite` for vectorized code path.
   
   > can we add a round trip test in OrcQuerySuite to read/write date before 
1582?
   
   @cloud-fan The test I pointed out above generates random dates/timestamps 
before 1582 with high probability. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to