GitHub user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/22654#discussion_r223726757
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVUtils.scala
---
@@ -97,23 +97,22 @@ object CSVUtils {
*/
@throws[IllegalArgumentException]
def toChar(str: String): Char = {
- if (str.charAt(0) == '\\') {
- str.charAt(1)
- match {
- case 't' => '\t'
- case 'r' => '\r'
- case 'b' => '\b'
- case 'f' => '\f'
- case '\"' => '\"' // In case user changes quote char and uses \" as delimiter in options
- case '\'' => '\''
- case 'u' if str == """\u0000""" => '\u0000'
- case _ =>
- throw new IllegalArgumentException(s"Unsupported special character for delimiter: $str")
- }
- } else if (str.length == 1) {
- str.charAt(0)
- } else {
- throw new IllegalArgumentException(s"Delimiter cannot be more than one character: $str")
+ (str: Seq[Char]) match {
+ case Seq() => throw new IllegalArgumentException("Delimiter cannot be empty string")
+ case Seq(c) => c
--- End diff --
I don't see why the case statement had to be restructured like this. I get
that we need to cover more cases, but the old version already had some
duplication and this one has a bit more. What about something like ...
```
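str.length match {
  case 0 => throw new IllegalArgumentException("Delimiter cannot be empty string")
  case 1 => str(0)
  case 2 if str(0) == '\\' =>
    str(1) match {
      case 't' => '\t'
      case 'r' => '\r'
      case 'b' => '\b'
      case 'f' => '\f'
      case c if "\"'\\".contains(c) => c // quote and backslash map to themselves
      case _ =>
        throw new IllegalArgumentException(s"Unsupported special character for delimiter: $str")
    }
  // the escaped-NUL spelling is longer than two characters, so test it outside `case 2`
  case _ if str == """\u0000""" => '\u0000'
  case _ =>
    throw new IllegalArgumentException(s"Delimiter cannot be more than one character: $str")
}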
```
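
One way to trim the duplication further is to move the escape mappings into a lookup table and keep the length-based dispatch for everything else. The following is only a minimal sketch of that idea, not the code that ended up in Spark: the names `DelimiterSketch`, `escapes` and `NulSpelling` are hypothetical, and the error messages are copied from the diff. The NUL spelling is written with an escaped backslash because how unicode escapes inside triple-quoted strings are treated is assumed here to depend on the Scala version.

```scala
// Hypothetical standalone sketch; names are illustrative, not Spark's.
object DelimiterSketch {
  // Two-character escapes: the character after the backslash -> the delimiter it denotes.
  private val escapes: Map[Char, Char] = Map(
    't' -> '\t', 'r' -> '\r', 'b' -> '\b', 'f' -> '\f',
    '"' -> '"', '\'' -> '\''
  )

  // Six-character spelling of NUL: a backslash, the letter u, and four zeros.
  private val NulSpelling = "\\u0000"

  @throws[IllegalArgumentException]
  def toChar(str: String): Char = str.length match {
    case 0 =>
      throw new IllegalArgumentException("Delimiter cannot be empty string")
    case 1 => str(0)
    case 2 if str(0) == '\\' =>
      // getOrElse takes its default by name, so the throw only runs on a miss.
      escapes.getOrElse(str(1), throw new IllegalArgumentException(
        s"Unsupported special character for delimiter: $str"))
    case _ if str == NulSpelling => '\u0000'
    case _ =>
      throw new IllegalArgumentException(s"Delimiter cannot be more than one character: $str")
  }
}
```

With this sketch, `DelimiterSketch.toChar("\\t")` yields the tab character, while `DelimiterSketch.toChar("")` and `DelimiterSketch.toChar("ab")` both throw `IllegalArgumentException`.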
---