sadikovi commented on code in PR #37147:
URL: https://github.com/apache/spark/pull/37147#discussion_r921809386
##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/csv/UnivocityParserSuite.scala:
##########
@@ -356,6 +356,16 @@ class UnivocityParserSuite extends SparkFunSuite with
SQLHelper {
val optionsWithPattern = new CSVOptions(
Map("timestampFormat" -> "invalid", "dateFormat" -> "invalid"), false,
"UTC")
- check(new UnivocityParser(StructType(Seq.empty), optionsWithPattern))
+
+ // With legacy parser enabled, we are still able to parse dates and
timestamps.
+ check(new UnivocityParser(StructType(Seq.empty), optionsWithPattern) {
+ override val isLegacyParserPolicy: Boolean = true
+ })
+
+ // With legacy parser disabled, parsing results in error.
+ val err = intercept[IllegalArgumentException] {
+ check(new UnivocityParser(StructType(Seq.empty), optionsWithPattern))
+ }
+ assert(err.getMessage.contains("Illegal pattern character: n"))
Review Comment:
With the invalid pattern and before this PR, yes, the fallback code would be
triggered on every pattern mismatch. With the change, we will just throw an
exception parsing those values as nulls. Yes, it does sound like a performance
issue but it has been there for some time.
I agree with copy-paste of stringToDate, I proposed to add a data source
config to keep the old behaviour. What do you think?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]