itholic commented on code in PR #39258:
URL: https://github.com/apache/spark/pull/39258#discussion_r1062222396
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala:
##########
@@ -319,15 +319,17 @@ class UnivocityParser(
throw BadRecordException(
() => getCurrentInput,
() => None,
- QueryExecutionErrors.malformedCSVRecordError())
+ QueryExecutionErrors.malformedCSVRecordError(""))
}
+ val currentInput = getCurrentInput
Review Comment:
It seems that `getCurrentInput` consumes the current input, i.e. it pops the
value and removes it from a queue or something similar.
(I guess that's why the previous commit failed:
https://github.com/itholic/spark/runs/10428900823)
If we call `getCurrentInput` directly when building the error, the current
input is already spent, so the code that follows can no longer get the proper
input. `getCurrentInput` then returns null, which is incorrect at:
```scala
throw BadRecordException(
() => getCurrentInput, () => requiredRow.headOption,
badRecordException.get)
} else {
requiredRow
}
```
So, it eventually complains:
```
== Results ==
!== Correct Answer - 2 ==   == Spark Answer - 2 ==
!struct<>                   struct<from_csv(value):struct<a:int,b:int,_unparsed:string>>
 [[2,12,null]]              [[2,12,null]]
![[null,null,"]]            [[null,null,null]]
```
Is there any way to keep the value in memory even after calling
`getCurrentInput`?
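
To illustrate the behaviour I suspect, here is a minimal, self-contained sketch. `OneShotInput`, `captureOnce` and `captureLazily` are hypothetical names made up for this example, not Spark APIs, and whether the real `getCurrentInput` actually clears its value after the first read is only my assumption. The sketch shows why reading the value once into a `val` lets several closures see the same input, while calling the getter lazily from each closure does not:

```scala
// Hypothetical stand-in for a parser whose current input can be read only once.
// Assumption: this mimics what `getCurrentInput` appears to do; it is not Spark code.
final class OneShotInput(private var pending: String) {
  // Returns the pending input and clears it, so a second call yields null.
  def getCurrentInput: String = {
    val result = pending
    pending = null
    result
  }
}

object CaptureOnce {
  // Reads the input eagerly, exactly once, and returns two thunks
  // that both close over the same captured value.
  def captureOnce(source: OneShotInput): (() => String, () => String) = {
    val currentInput = source.getCurrentInput
    (() => currentInput, () => currentInput)
  }

  // Calls the getter lazily from each thunk; whichever thunk runs
  // second observes null because the input was already consumed.
  def captureLazily(source: OneShotInput): (() => String, () => String) =
    (() => source.getCurrentInput, () => source.getCurrentInput)
}
```

If the assumption holds, this matches the `val currentInput = getCurrentInput` line in the diff above: the value is read once and every later closure reuses it.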
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]