MaxGekk commented on issue #28137: [SPARK-31361][SQL] Rebase datetime in 
parquet/avro according to file written Spark version
URL: https://github.com/apache/spark/pull/28137#issuecomment-609956319
 
 
   I haven't looked at the implementation yet, but the description looks 
dangerous to me:
   > always write the datetime values to parquet/avro without rebasing
   
   @cloud-fan I think there are at least 2 cases:
   1. Spark 3.0 saves parquet files that are supposed to be loaded by Spark 3.0 - no 
need to rebase.
   2. Spark 3.0 saves parquet files that should be loaded by Spark 2.4, 2.3, etc. 
- rebasing is needed.
   
   Do you think the second case is not possible?
   
   > Do not rebase datetime values when reading parquet/avro, if we know the 
file written version and it's > "3.0"
   
   but you don't know the purpose of the written files. Maybe a user wants to 
save parquet files with Spark 3.0 and read them back with Spark 2.4? For example, 
Spark 3.0 and Spark 2.4 can be used in different clusters that prepare data 
for each other.
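   
   To make the second case concrete, here is a minimal sketch of the shift I have in 
mind. It uses plain JDK calendar APIs, not Spark's actual reader/writer code, and 
the day-count math is only an illustration: Spark 3.0 resolves a parquet DATE day 
count with the Proleptic Gregorian calendar, while Spark 2.4 resolves the same 
count with the hybrid Julian/Gregorian calendar, so dates before the 1582-10-15 
cutover come back with different labels if nothing is rebased:
   
   ```scala
   import java.time.LocalDate
   import java.util.{Calendar, GregorianCalendar, Locale, TimeZone}
   
   // Day count a Spark 3.0 writer would store for DATE '1000-01-01':
   // days since 1970-01-01 in the Proleptic Gregorian calendar.
   val written = LocalDate.of(1000, 1, 1)
   val epochDay = written.toEpochDay                      // -354285
   
   // A Spark 2.4 reader maps the same day count through the hybrid
   // Julian/Gregorian calendar (default cutover 1582-10-15).
   val hybrid = new GregorianCalendar(TimeZone.getTimeZone("UTC"), Locale.US)
   hybrid.setTimeInMillis(epochDay * 86400000L)
   
   val readBack = f"${hybrid.get(Calendar.YEAR)}%04d-" +
     f"${hybrid.get(Calendar.MONTH) + 1}%02d-" +
     f"${hybrid.get(Calendar.DAY_OF_MONTH)}%02d"
   println(s"written $written, read back as $readBack")   // 1000-01-01 -> 0999-12-27
   ```
   
   That silent 5-day shift on the 3.0 -> 2.4 path is exactly what rebasing on write 
would compensate for.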
