Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/22654#discussion_r223749951
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVUtils.scala
---
@@ -97,23 +97,22 @@ object CSVUtils {
*/
@throws[IllegalArgumentException]
def toChar(str: String): Char = {
- if (str.charAt(0) == '\\') {
- str.charAt(1)
- match {
- case 't' => '\t'
- case 'r' => '\r'
- case 'b' => '\b'
- case 'f' => '\f'
- case '\"' => '\"' // In case user changes quote char and uses \"
as delimiter in options
- case '\'' => '\''
- case 'u' if str == """\u0000""" => '\u0000'
- case _ =>
- throw new IllegalArgumentException(s"Unsupported special
character for delimiter: $str")
- }
- } else if (str.length == 1) {
- str.charAt(0)
- } else {
- throw new IllegalArgumentException(s"Delimiter cannot be more than
one character: $str")
+ (str: Seq[Char]) match {
+ case Seq() => throw new IllegalArgumentException("Delimiter cannot
be empty string")
+ case Seq(c) => c
--- End diff --
I would prefer more declarative way and less nested levels of controls. but
this is personal opinion. Let's look at your example:
```
str.length
```
you didn't check that str can be null.
```
case 2 if str(0) == '\\' =>
case 'u' if str == """\u0000""" => '\0'
```
If it has length 2, how `str` could be `"""\u0000"""`?
```
case c if """trbf"'\""".contains(c) => c
```
You should produce control chars not just second char. For example: `\t` ->
Seq('\', 't') -> '\t`.
In my approach, everything is simple. One input case is mapped to one
output. There is no unnecessary complexity.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]