Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/22374#discussion_r216509913
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala ---
@@ -1700,4 +1700,13 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils with Te
checkCount(2)
countForMalformedCSV(0, Seq(""))
}
+
+ test("SPARK-25387: bad input should not cause NPE") {
+ val schema = StructType(StructField("a", IntegerType) :: Nil)
+ val input = spark.createDataset(Seq("\u0000\u0000\u0001234"))
--- End diff ---
btw, what does "bad input" mean in this title (bad unicode?)? In this case the
CSV parser returns null, and in another case it throws
`com.univocity.parsers.common.TextParsingException`? I just want to understand the
behaviour of the parser here.
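
For context, the diff above is truncated, so here is a minimal sketch of how the rest of the test might read. The assertions are my assumption, not necessarily the actual patch: with an explicit schema and the default PERMISSIVE mode, the unparseable bytes should come back as a null column value instead of causing an NPE. The snippet assumes it lives inside `CSVSuite` (so `spark`, `testImplicits`, and `checkAnswer` are available).

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

test("SPARK-25387: bad input should not cause NPE") {
  import testImplicits._

  val schema = StructType(StructField("a", IntegerType) :: Nil)
  // Input bytes that the CSV parser cannot interpret as a valid record.
  val input = spark.createDataset(Seq("\u0000\u0000\u0001234"))

  // Assumed expectation: in PERMISSIVE mode the malformed record yields a
  // single row with a null value rather than throwing an exception.
  checkAnswer(spark.read.schema(schema).csv(input), Row(null))
}
```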
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]