MaxGekk commented on a change in pull request #27239: [SPARK-30530][SQL] Fix filter pushdown for bad CSV records
URL: https://github.com/apache/spark/pull/27239#discussion_r368383814
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala
##########
@@ -230,64 +230,55 @@ class UnivocityParser(
() => getCurrentInput,
() => None,
new RuntimeException("Malformed CSV record"))
- } else if (tokens.length != parsedSchema.length) {
+ }
+
+ var checkedTokens = tokens
+ var badRecordException: Option[Throwable] = None
+
+ if (tokens.length != parsedSchema.length) {
    // If the number of tokens doesn't match the schema, we should treat it as a malformed record.
    // However, we still have chance to parse some of the tokens, by adding extra null tokens in
    // the tail if the number is smaller, or by dropping extra tokens if the number is larger.
- val checkedTokens = if (parsedSchema.length > tokens.length) {
+ checkedTokens = if (parsedSchema.length > tokens.length) {
Review comment:
It seems not. The `if` can be replaced by:
```scala
var badRecordException: Option[Throwable] = if (tokens.length != parsedSchema.length) {
  // If the number of tokens doesn't match the schema, we should treat it as a malformed record.
  Some(new RuntimeException("Malformed CSV record"))
} else None
```
Let me do that in a follow-up PR.
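
For illustration, the suggested refactor can be sketched in isolation. This is a minimal standalone example, not the actual `UnivocityParser` code: the `check` helper and its parameters are hypothetical stand-ins for the parser's `tokens` and `parsedSchema` fields. It shows the pattern of initializing the `Option[Throwable]` at its declaration instead of declaring it as `None` and mutating it in a separate `if`:

```scala
object MalformedRecordCheck {
  // Hypothetical helper: decides whether a CSV row is malformed
  // based purely on the token count vs. the schema length.
  def check(tokens: Array[String], parsedSchema: Seq[String]): Option[Throwable] = {
    // The exception is computed once at declaration, so no `var` reassignment
    // is needed for this branch; padding/dropping of tokens can still happen
    // afterwards to salvage a partial parse.
    val badRecordException: Option[Throwable] =
      if (tokens.length != parsedSchema.length) {
        Some(new RuntimeException("Malformed CSV record"))
      } else None
    badRecordException
  }

  def main(args: Array[String]): Unit = {
    // Two tokens against a three-column schema: malformed.
    assert(check(Array("a", "b"), Seq("c1", "c2", "c3")).isDefined)
    // Token count matches the schema: no exception recorded.
    assert(check(Array("a", "b"), Seq("c1", "c2")).isEmpty)
  }
}
```

The design point is that a single immutable initialization keeps the "is this record malformed?" decision in one place, which is easier to reason about than a `var` that several branches may reassign.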
----------------------------------------------------------------
This is an automated message from the Apache Git Service.