qiuwenxiang0103 opened a new pull request #29420:
URL: https://github.com/apache/spark/pull/29420


   [SPARK-32602][SQL] Fix issue where data with date type are saved into hive 
table with wrong value '1970-01-01'
   
   ### What changes were proposed in this pull request?
   Date values inserted into a Hive table are stored as '1970-01-01', 
regardless of the actual value. To reproduce:
   scala> spark.sql("create table t1(d date)")
   res2: org.apache.spark.sql.DataFrame = []
   scala> spark.sql("insert into table t1 values(cast('2020-08-09' as date))")
   res3: org.apache.spark.sql.DataFrame = []
   scala> spark.sql("select d from t1").show
   +----------+
   |         d|
   +----------+
   |1970-01-01|
   +----------+  
   Spark 3.0 introduced DaysWritable, which extends Hive's DateWritable, to 
handle the date type. DaysWritable.toString() is called to write its value into 
the Hive table. DateWritable.toString() is defined as:
   @Override
   public String toString() {
    // For toString, the time does not matter
    return get(false).toString();
   }
   
   public Date get(boolean doesTimeMatter) {
     return new Date(daysToMillis(daysSinceEpoch, doesTimeMatter));
   }
   
   DaysWritable overrode neither toString() nor get(boolean doesTimeMatter). 
It did override get():
   override def get(): Date = new Date(DateWritable.daysToMillis(julianDays))
   but this has no effect on toString(), which calls get(boolean 
doesTimeMatter) rather than get(). Because daysSinceEpoch in DateWritable is 
always 0, DaysWritable.toString() always returns '1970-01-01', and as a result 
every date value stored into the Hive table is '1970-01-01'.
   
   This pull request overrides DaysWritable.get(boolean doesTimeMatter) so that 
its toString() behaves properly.
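
The dispatch problem described above can be reproduced without Hive or Spark on the classpath. The following is a minimal sketch under stated assumptions: `BaseDateWritable`, `BuggyDaysWritable`, and `FixedDaysWritable` are hypothetical stand-ins rather than the real classes, `java.time.LocalDate` replaces `java.sql.Date` to keep the output independent of the time zone, and the body of the base-class `get()` is assumed. It only illustrates why overriding `get()` alone leaves `toString()` reading the unused base-class field.

```java
import java.time.LocalDate;

// Hypothetical stand-in for Hive's DateWritable; not the real class.
class BaseDateWritable {
    // Never assigned by the subclasses below, so it stays 0.
    protected int daysSinceEpoch = 0;

    // Assumed for this sketch: the no-arg overload delegates to get(boolean).
    public LocalDate get() {
        return get(true);
    }

    public LocalDate get(boolean doesTimeMatter) {
        return LocalDate.ofEpochDay(daysSinceEpoch);
    }

    @Override
    public String toString() {
        // Same shape as the quoted snippet: toString() goes through get(boolean).
        return get(false).toString();
    }
}

// Models the behaviour before the fix: own field, only get() overridden.
class BuggyDaysWritable extends BaseDateWritable {
    protected final int julianDays;

    BuggyDaysWritable(int julianDays) {
        this.julianDays = julianDays;
    }

    @Override
    public LocalDate get() {
        return LocalDate.ofEpochDay(julianDays);
    }
}

// Models the fix: get(boolean) is overridden too, so toString() sees julianDays.
class FixedDaysWritable extends BuggyDaysWritable {
    FixedDaysWritable(int julianDays) {
        super(julianDays);
    }

    @Override
    public LocalDate get(boolean doesTimeMatter) {
        return LocalDate.ofEpochDay(julianDays);
    }
}

class DaysWritableDemo {
    public static void main(String[] args) {
        int days = (int) LocalDate.parse("2020-08-09").toEpochDay();
        System.out.println(new BuggyDaysWritable(days)); // prints 1970-01-01
        System.out.println(new FixedDaysWritable(days)); // prints 2020-08-09
    }
}
```

In this model the inherited `toString()` never calls the overridden `get()`, so the buggy subclass falls back to the base field, which is the same shape as the failure reported in this pull request.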
   
   ### Why are the changes needed?
   Fix the correctness issue described above.
   
   
   ### Does this PR introduce _any_ user-facing change?
   'No'.
   
   
   ### How was this patch tested?
   New test.




