[GitHub] spark pull request #23120: [SPARK-26151][SQL] Return partial results for bad...

cloud-fan Sat, 01 Dec 2018 19:06:01 -0800

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23120#discussion_r238083349
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala ---
    @@ -243,21 +243,27 @@ class UnivocityParser(
             () => getPartialResult(),
             new RuntimeException("Malformed CSV record"))
         } else {
    -      try {
    -        // When the length of the returned tokens is identical to the length of the parsed schema,
    -        // we just need to convert the tokens that correspond to the required columns.
    -        var i = 0
    -        while (i < requiredSchema.length) {
    +      // When the length of the returned tokens is identical to the length of the parsed schema,
    +      // we just need to convert the tokens that correspond to the required columns.
    +      var badRecordException: Option[Throwable] = None
    +      var i = 0
    +      while (i < requiredSchema.length) {
    +        try {
               row(i) = valueConverters(i).apply(getToken(tokens, i))
    -          i += 1
    +        } catch {
    +          case NonFatal(e) =>
    +            badRecordException = badRecordException.orElse(Some(e))
             }
    +        i += 1
    +      }
    +
    +      if (badRecordException.isEmpty) {
             row
    -      } catch {
    -        case NonFatal(e) =>
    -          // For corrupted records with the number of tokens same as the schema,
    -          // CSV reader doesn't support partial results. All fields other than the field
    -          // configured by `columnNameOfCorruptRecord` are set to `null`.
    -          throw BadRecordException(() => getCurrentInput, () => None, e)
    +      } else {
    +        // For corrupted records with the number of tokens same as the schema,
    +        // CSV reader doesn't support partial results. All fields other than the field
    +        // configured by `columnNameOfCorruptRecord` are set to `null`.
    --- End diff --
    
    What do you mean here?



---

