sadikovi commented on code in PR #45578:
URL: https://github.com/apache/spark/pull/45578#discussion_r1529737022
##########
connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroUtils.scala:
##########
@@ -197,19 +197,20 @@ private[sql] object AvroUtils extends Logging {
def hasNextRow: Boolean = {
while (!completed && currentRow.isEmpty) {
- if (fileReader.pastSync(stopPosition)) {
+ // In some cases of empty blocks in an Avro file, `fileReader.hasNext()` returns false but
Review Comment:
It seems to be a bug in Avro. When blockRemaining is 0, hasNext tries to
load the next block but then still checks blockRemaining != 0, returning false
even when the next block is actually available.
The Avro FileReader API is limited, and the only thing I could do is call
hasNext again. That seems to work for all of the test cases, including those
with empty blocks.
You are right, ideally we should just loop over hasNext until it actually
returns false or we reach EOF. I tried to implement that but could not, because
FileReader does not expose the current stream offset (`tell()` actually returns
the block start, which is different).
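
To illustrate the behavior described above, here is a minimal sketch. It does
not use the Avro API: `MockBlockReader` is a hypothetical stand-in whose
`hasNext` is modeled only on the description in this comment (load at most one
block, then report `blockRemaining != 0`). `HasNextRetryDemo`, `readNaive`, and
`readWithRetry` are likewise made-up names for illustration.

```scala
// Hypothetical stand-in for a block-based reader; NOT the real Avro FileReader.
// Each inner Seq is one block; an empty Seq models an empty block.
final class MockBlockReader(blocks: Seq[Seq[Int]]) {
  private var nextBlock = 0                 // index of the next block to load
  private var current: Seq[Int] = Seq.empty // records of the current block
  private var pos = 0                       // read position inside `current`

  private def blockRemaining: Int = current.length - pos

  // Assumed behavior: when the current block is exhausted, load at most one
  // more block, then report whether that block has records left.
  def hasNext: Boolean = {
    if (blockRemaining == 0 && nextBlock < blocks.length) {
      current = blocks(nextBlock)
      nextBlock += 1
      pos = 0
    }
    blockRemaining != 0
  }

  def next(): Int = {
    if (!hasNext) throw new NoSuchElementException("no more records")
    val value = current(pos)
    pos += 1
    value
  }
}

object HasNextRetryDemo {
  // One hasNext check per record: stops at the first empty block.
  def readNaive(reader: MockBlockReader): List[Int] = {
    val out = List.newBuilder[Int]
    while (reader.hasNext) out += reader.next()
    out.result()
  }

  // Workaround sketched in the comment: call hasNext a second time before
  // concluding that there is no more data.
  def readWithRetry(reader: MockBlockReader): List[Int] = {
    val out = List.newBuilder[Int]
    while (reader.hasNext || reader.hasNext) out += reader.next()
    out.result()
  }

  def main(args: Array[String]): Unit = {
    val blocks = Seq(Seq(1, 2), Seq.empty[Int], Seq(3))
    println(readNaive(new MockBlockReader(blocks)))     // List(1, 2)
    println(readWithRetry(new MockBlockReader(blocks))) // List(1, 2, 3)
  }
}
```

In this mock, a single retry only skips one empty block; two consecutive empty
blocks would still end the read early, which is why looping until EOF would be
the more robust option if the reader exposed enough state to detect it.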