Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20355#discussion_r163434524
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala ---
@@ -78,4 +78,20 @@ class FileBasedDataSourceSuite extends QueryTest with
SharedSQLContext {
}
}
}
+
+ // Separate test case for text-based formats that support multiLine as
an option.
+ Seq("json", "csv", "text").foreach { format =>
--- End diff --
Actually "text" doesn't support `multiLine` but `wholetext` which runs
another code path. Let's take this out.
How about leaving out the tests for `SPARK-22146` as was, and just adding a
test dedicated for `multiline` here like:
```scala
Seq("json", "csv").foreach { format =>
test("SPARK-23148 read files containing special characters " +
s"using $format multiline enabled") {
withTempDir { dir =>
val tmpFile = s"$dir/$nameWithSpecialChars"
spark.createDataset(Seq("a",
"b")).write.format(format).save(tmpFile)
val reader = spark.read.format(format).option("multiLine", true)
val fileContent = reader.load(tmpFile)
checkAnswer(fileContent, Seq(Row("a"), Row("b")))
}
}
}
```
This PR changes really fix the code paths for both CSV / JSON when
`multiLine` is enabled only .. I am not sure why you don't like this suggestion
.. one test less invasive and more targeted for one JIRA ...
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]