bersprockets commented on a change in pull request #30221:
URL: https://github.com/apache/spark/pull/30221#discussion_r516073775
##########
File path:
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroUtils.scala
##########
@@ -171,9 +171,15 @@ private[sql] object AvroUtils extends Logging {
protected val stopPosition: Long
private[this] var completed = false
+ private[this] var interveningNext = true
+ private[this] var prevHasNextRow = false
private[this] var currentRow: Option[InternalRow] = None
def hasNextRow: Boolean = {
+ if (!interveningNext) {
+ // until a row is consumed, return previous result of hasNextRow
+ return prevHasNextRow
+ }
Review comment:
Yeah, this approach is better, because it also handles the case where
nextRow is called multiple times (which previously, would return the same row
over and over).
##########
File path:
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroUtils.scala
##########
@@ -171,9 +171,15 @@ private[sql] object AvroUtils extends Logging {
protected val stopPosition: Long
private[this] var completed = false
+ private[this] var interveningNext = true
+ private[this] var prevHasNextRow = false
private[this] var currentRow: Option[InternalRow] = None
def hasNextRow: Boolean = {
+ if (!interveningNext) {
+ // until a row is consumed, return previous result of hasNextRow
+ return prevHasNextRow
+ }
Review comment:
Yeah, this approach is better, because it also handles the case where
nextRow is called multiple times (which previously, would return the same row
over and over, which, as @HyukjinKwon pointed out would keep the status quo,
but it probably wouldn't be correct, since the users of this code are Iterator
implementations).
##########
File path:
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroUtils.scala
##########
@@ -171,9 +171,15 @@ private[sql] object AvroUtils extends Logging {
protected val stopPosition: Long
private[this] var completed = false
+ private[this] var interveningNext = true
+ private[this] var prevHasNextRow = false
private[this] var currentRow: Option[InternalRow] = None
def hasNextRow: Boolean = {
+ if (!interveningNext) {
+ // until a row is consumed, return previous result of hasNextRow
+ return prevHasNextRow
+ }
Review comment:
>But it sounds weird that nextRow will be called with the same row.
Yes, I agree. As I mentioned above, leaving it that way would keep the
status quo, but probably wouldn't be correct since the users of this RowReader
are Iterator implementations.
##########
File path:
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroUtils.scala
##########
@@ -171,9 +171,15 @@ private[sql] object AvroUtils extends Logging {
protected val stopPosition: Long
private[this] var completed = false
+ private[this] var interveningNext = true
+ private[this] var prevHasNextRow = false
private[this] var currentRow: Option[InternalRow] = None
def hasNextRow: Boolean = {
+ if (!interveningNext) {
+ // until a row is consumed, return previous result of hasNextRow
+ return prevHasNextRow
+ }
Review comment:
Follow up to my previous comment: As @HyukjinKwon pointed out, I was
earlier trying to fix the stated bug without changing other behavior. But I
should probably fix nextRow while I am at it, which makes RowReader follow a
more recognized pattern and makes the actual Iterators that use it more
correct.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]