itholic commented on code in PR #39258:
URL: https://github.com/apache/spark/pull/39258#discussion_r1062222396


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala:
##########
@@ -319,15 +319,17 @@ class UnivocityParser(
       throw BadRecordException(
         () => getCurrentInput,
         () => None,
-        QueryExecutionErrors.malformedCSVRecordError())
+        QueryExecutionErrors.malformedCSVRecordError(""))
     }
 
+    val currentInput = getCurrentInput

Review Comment:
   It seems like `getCurrentInput` pops the current input and removes it from the queue, or something similar?
   
   (I guess that's why the previous commit failed: https://github.com/itholic/spark/runs/10428900823)
   
   If we call `getCurrentInput` directly when building the error, the current input is consumed. The code that runs afterwards then can no longer get the proper input, and `getCurrentInput` incorrectly returns null here:
   ```scala
           throw BadRecordException(
             () => getCurrentInput, () => requiredRow.headOption, badRecordException.get)
         } else {
           requiredRow
         }
   ```
   
   As a result, the test eventually fails with:
   ```
   == Results ==
   !== Correct Answer - 2 ==   == Spark Answer - 2 ==
   !struct<>                   struct<from_csv(value):struct<a:int,b:int,_unparsed:string>>
    [[2,12,null]]              [[2,12,null]]
   ![[null,null,"]]            [[null,null,null]]
   ```
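   To make the suspected failure mode concrete, here is a minimal sketch that assumes the hypothesis above, namely that reading the current input consumes it. `ConsumingParser` is hypothetical and is not how Spark's `UnivocityParser` is implemented: the first call returns the record (a single `"`), and a second call returns null, mirroring the null that `getCurrentInput` returns in the later `BadRecordException` thunk.

   ```scala
   import scala.collection.mutable

   // Hypothetical stand-in for a parser whose "current input" accessor is
   // destructive: each call dequeues the buffered record, so a second call
   // sees an empty buffer and returns null. This models the hypothesis in
   // this comment; it is NOT Spark's UnivocityParser implementation.
   final class ConsumingParser(records: Seq[String]) {
     private val buffer = mutable.Queue(records: _*)

     def getCurrentInput: String =
       if (buffer.isEmpty) null else buffer.dequeue()
   }

   object SpentInputDemo {
     def main(args: Array[String]): Unit = {
       val parser = new ConsumingParser(Seq("\""))
       // First call: e.g. used to build the error message.
       val forMessage = parser.getCurrentInput
       // Second call: what a later BadRecordException thunk would see.
       val forException = parser.getCurrentInput
       println(s"message=$forMessage exception=$forException")
       // prints: message=" exception=null
     }
   }
   ```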
   
   Is there any way to keep the value in memory even after calling `getCurrentInput`?
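   One possible way to keep the value in memory, sketched under the same hypothetical consuming-accessor assumption: read the input exactly once into a local `val`, and let both the error message and the exception's record thunk close over that value instead of calling `getCurrentInput` again. `ConsumingParser` and `BadRecord` are hypothetical simplifications, not Spark's actual classes.

   ```scala
   import scala.collection.mutable

   // Hypothetical parser with a destructive accessor (same assumption as
   // the hypothesis above); not Spark's actual implementation.
   final class ConsumingParser(records: Seq[String]) {
     private val buffer = mutable.Queue(records: _*)
     def getCurrentInput: String =
       if (buffer.isEmpty) null else buffer.dequeue()
   }

   // Hypothetical, simplified stand-in for BadRecordException: it keeps
   // only a record thunk and the underlying error.
   final class BadRecord(val record: () => String, val error: Throwable)
     extends Exception(error)

   object CachedInputDemo {
     def failOnMalformed(parser: ConsumingParser): Nothing = {
       // Read the input exactly once and keep it in memory...
       val currentInput = parser.getCurrentInput
       val error = new RuntimeException(s"Malformed CSV record: $currentInput")
       // ...and let the thunk close over the cached value rather than
       // calling getCurrentInput a second time.
       throw new BadRecord(() => currentInput, error)
     }

     def main(args: Array[String]): Unit = {
       try failOnMalformed(new ConsumingParser(Seq("\"")))
       catch {
         case e: BadRecord =>
           println(s"record=${e.record()} message=${e.error.getMessage}")
           // prints: record=" message=Malformed CSV record: "
       }
     }
   }
   ```

   Because the thunk captures `currentInput` rather than re-reading the parser, calling `record()` any number of times returns the same string. Whether this resolves the failing test depends on what the real `getCurrentInput` does, which this sketch does not model.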


