HyukjinKwon edited a comment on issue #23202: [SPARK-26248][SQL] Infer date 
type from CSV
URL: https://github.com/apache/spark/pull/23202#issuecomment-447715090
 
 
   **Problem 1.**
   
   https://github.com/apache/spark/pull/23202#discussion_r242010787 - I left 
some examples there.
   
   If a partition contains multiple rows and the first row is inferred as date 
type,
   later rows in that partition can no longer be inferred as timestamp.
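
   To make the failure mode concrete, here is a minimal Python sketch of a 
per-partition inference loop. It is a toy model, not Spark's code: the 
fallback order in `CHAIN`, the two formats, and the helper names 
(`matches`, `infer_field`, `infer_partition`) are all assumptions made for 
illustration. The only behavior it is meant to show is that inference moves 
forward along the chain from the type inferred so far and never goes back.

```python
from datetime import datetime

# Hypothetical fallback chain (an assumption for this sketch): inference only
# moves forward from the type inferred so far, never backward.
CHAIN = ["timestamp", "date", "string"]

# Hypothetical fixed formats standing in for timestampFormat / dateFormat.
FORMATS = {
    "timestamp": "%Y-%m-%d %H:%M:%S",
    "date": "%Y-%m-%d",
}


def matches(type_name, field):
    """Return True if `field` parses as `type_name`; anything is a string."""
    if type_name == "string":
        return True
    try:
        datetime.strptime(field, FORMATS[type_name])
        return True
    except ValueError:
        return False


def infer_field(type_so_far, field):
    """Try candidates starting at the type inferred so far."""
    start = 0 if type_so_far is None else CHAIN.index(type_so_far)
    for candidate in CHAIN[start:]:
        if matches(candidate, field):
            return candidate
    return "string"


def infer_partition(rows):
    """Fold infer_field over all rows of one partition."""
    type_so_far = None
    for field in rows:
        type_so_far = infer_field(type_so_far, field)
    return type_so_far


print(infer_partition(["2018-12-02 21:04:00"]))                # timestamp
print(infer_partition(["2018-12-02", "2018-12-02 21:04:00"]))  # string
```

   In this model the second partition ends up as string: once the first row 
fixes the type to date, the timestamp candidate is never tried again.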
   
   
   **Problem 2.**
   
   https://github.com/apache/spark/pull/23202#issuecomment-447701620
   
   If legacy is on, date/timestamp pattern matching is ambiguous because users 
can set the patterns arbitrarily.
   The legacy parser does not do an exact match, so it cannot distinguish 
`yyyy-MM` from `yyyy-MM-dd` for an input such as `2010-10-10`.
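
   The difference between prefix matching and exact matching can be sketched 
in Python as follows. This is only an analogy built on regular expressions, 
not the legacy parser itself: `to_regex`, `prefix_match`, and `exact_match` 
are hypothetical helpers, and only the `yyyy`, `MM`, and `dd` pattern letters 
are handled.

```python
import re

# Regex fragments for the few pattern letters this sketch understands.
TOKENS = {"yyyy": r"\d{4}", "MM": r"\d{2}", "dd": r"\d{2}"}


def to_regex(pattern):
    """Translate a tiny subset of date patterns into a regular expression."""
    # The capturing group keeps the tokens in the split result; everything
    # else (for example "-") is treated as a literal.
    parts = re.split("(yyyy|MM|dd)", pattern)
    return "".join(TOKENS.get(part, re.escape(part)) for part in parts)


def prefix_match(pattern, text):
    """Lenient: succeed if the start of `text` fits the pattern."""
    return re.match(to_regex(pattern), text) is not None


def exact_match(pattern, text):
    """Strict: succeed only if the whole of `text` fits the pattern."""
    return re.fullmatch(to_regex(pattern), text) is not None


value = "2010-10-10"
for pattern in ("yyyy-MM", "yyyy-MM-dd"):
    print(pattern, prefix_match(pattern, value), exact_match(pattern, value))
# yyyy-MM True False
# yyyy-MM-dd True True
```

   With prefix matching both patterns accept `2010-10-10`, so the input alone 
cannot tell them apart; with exact matching only `yyyy-MM-dd` accepts it.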
   
   We can do this only when `spark.sql.legacy.timeParser.enabled` is 
disabled (the default); however, I think it would introduce complexity.
   I was thinking we could do it later, when we remove 
`spark.sql.legacy.timeParser.enabled`. Date type inference isn't super 
important IMHO because we infer timestamps.
   I would like to talk about this further if anyone thinks differently. If the 
change isn't complicated, it should also be okay to go ahead.
   
   **Questions:**
   
   How do we define the precedence between `dateFormat` and `timestampFormat`? 
(For instance, if the patterns are the same, is the value inferred as timestamp 
or date?)
   

