Jonathancui123 commented on code in PR #36871:
URL: https://github.com/apache/spark/pull/36871#discussion_r919291362
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala:
##########
@@ -148,7 +148,28 @@ class CSVOptions(
// A language tag in IETF BCP 47 format
val locale: Locale =
parameters.get("locale").map(Locale.forLanguageTag).getOrElse(Locale.US)
- val dateFormatInRead: Option[String] = parameters.get("dateFormat")
+  /**
+   * Infer columns with all valid date entries as date type (otherwise inferred as timestamp type).
+   * Disabled by default for backwards compatibility and performance. When enabled, date entries in
+   * timestamp columns will be cast to timestamp upon parsing. Not compatible with
+   * legacyTimeParserPolicy == LEGACY since legacy date parser will accept extra trailing characters
+   */
+  val inferDate = {
+    val inferDateFlag = getBool("inferDate")
+    if (SQLConf.get.legacyTimeParserPolicy == LegacyBehaviorPolicy.LEGACY && inferDateFlag) {
Review Comment:
In the [most recent commit](https://github.com/apache/spark/pull/36871/commits/e1170d0ee2027d810d2b23243602de147748838b), I implemented the suggestion from @cloud-fan to always use the non-legacy parser for inference and to allow `inferDate=true` with `legacyTimeParserPolicy = LEGACY`. What do you think?
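
For context, a minimal sketch of how the proposed option would be used from the DataFrame reader, assuming the `inferDate` option lands with the name and semantics shown in this diff (the session setup and file path are illustrative, not part of the PR):

```scala
// Illustrative only: assumes this PR's proposed `inferDate` CSV option.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("inferDateSketch").getOrCreate()

// With inferDate=true, a column whose entries are all valid dates should be
// inferred as DateType rather than TimestampType during schema inference;
// per the latest commit, inference uses the non-legacy parser even when
// spark.sql.legacy.timeParserPolicy is set to LEGACY.
val df = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .option("inferDate", "true")
  .csv("dates.csv")

df.printSchema()
```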
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]