jeff303 commented on a change in pull request #26027: [SPARK-24540][SQL]
Support for multiple character delimiter in Spark CSV read
URL: https://github.com/apache/spark/pull/26027#discussion_r332220789
##########
File path:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/csv/CSVExprUtilsSuite.scala
##########
@@ -58,4 +61,40 @@ class CSVExprUtilsSuite extends SparkFunSuite {
}
assert(exception.getMessage.contains("Delimiter cannot be empty string"))
}
+
+ val testCases = Table(
+ ("input", "separatorStr", "expectedErrorMsg"),
+ // normal tab
+ ("""\t""", Some("\t"), None),
+ // backslash, then tab
+ ("""\\t""", Some("""\t"""), None),
+ // invalid special character (dot)
+ ("""\.""", None, Some("Unsupported special character for delimiter")),
+ // backslash, then dot
+ ("""\\.""", Some("""\."""), None),
+ // nothing special, just straight conversion
+ ("""foo""", Some("foo"), None),
+ // tab in the middle of some other letters
+ ("""ba\tr""", Some("ba\tr"), None),
+ // null character, expressed in Unicode literal syntax
+ ("""\u0000""", Some("\u0000"), None),
+ // and specified directly
+ ("\0", Some("\u0000"), None)
+ )
+
+  test("should correctly produce separator strings, or exceptions, from input") {
+ forAll(testCases) { (input, separatorStr, expectedErrorMsg) =>
+ try {
+ val separator = CSVExprUtils.toDelimiterStr(input)
+ separatorStr.isDefined should equal(true)
Review comment:
Done (got rid of all the `Matchers` stuff).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]