sadikovi commented on code in PR #45578:
URL: https://github.com/apache/spark/pull/45578#discussion_r1529737022


##########
connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroUtils.scala:
##########
@@ -197,19 +197,20 @@ private[sql] object AvroUtils extends Logging {
 
     def hasNextRow: Boolean = {
       while (!completed && currentRow.isEmpty) {
-        if (fileReader.pastSync(stopPosition)) {
+        // In some cases of empty blocks in an Avro file, `fileReader.hasNext()` returns false but

Review Comment:
   It seems to be a bug in Avro. When `blockRemaining` is 0, `hasNext` tries to load the next block but then still checks `blockRemaining != 0`, returning false even when the next block is actually available.
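   To make the failure mode concrete, here is a minimal Scala sketch of the behavior described above. `ModelBlockReader` and `countRowsNaive` are hypothetical names, and the class is a simplified model, not Avro's actual reader. It only mimics the described `hasNext` logic: when `blockRemaining` is 0, load the next block, then return `blockRemaining != 0`. The definitions are script-style (REPL or scala-cli).

```scala
// Hypothetical model of a block-based reader; NOT Avro's actual classes.
// Each element of `blockCounts` is the number of rows in one block (0 = empty block).
final class ModelBlockReader(blockCounts: Seq[Int]) {
  private var nextBlock = 0      // index of the next block to load
  private var blockRemaining = 0 // rows left in the currently loaded block

  // Mirrors the described logic: if the current block is exhausted, load the
  // next block (if any), then report whether that block has rows left.
  def hasNext: Boolean = {
    if (blockRemaining == 0 && nextBlock < blockCounts.length) {
      blockRemaining = blockCounts(nextBlock)
      nextBlock += 1
    }
    blockRemaining != 0
  }

  // Consumes one row from the current block.
  def next(): Unit = {
    require(blockRemaining > 0, "no row available")
    blockRemaining -= 1
  }
}

// Reads until the first `hasNext == false` and returns the number of rows seen.
def countRowsNaive(reader: ModelBlockReader): Int = {
  var rows = 0
  while (reader.hasNext) { reader.next(); rows += 1 }
  rows
}

println(countRowsNaive(new ModelBlockReader(Seq(2, 0, 3)))) // 2: stops at the empty block
println(countRowsNaive(new ModelBlockReader(Seq(2, 3))))    // 5
```

   In this model, blocks of 2, 0 and 3 rows yield only 2 rows. Loading the empty middle block leaves `blockRemaining` at 0, so `hasNext` returns false while the 3-row block is still unread.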
   
   The Avro `FileReader` API is limited, and the only thing I could do is call `hasNext` again; this seems to work for all of the test cases, including empty blocks.
   
   You are right: ideally we should just loop over `hasNext` until it actually returns false or we reach EOF. I tried to implement this but could not, because `FileReader` does not expose the current stream offset (`tell()` actually returns the block start, which is different).
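   A sketch comparing the two strategies on a minimal hypothetical model of a block-based reader (not Avro's API; all names here are illustrative). `hasNextRetryOnce` corresponds to calling `hasNext` a second time. `hasNextUntilEof` is the loop that keeps calling `hasNext` until it returns true or EOF is reached. The `atEof` method is an assumption that stands in for the stream-offset check the real `FileReader` does not provide.

```scala
// Hypothetical model of a block-based reader; NOT Avro's actual classes.
// Each element of `blockCounts` is the number of rows in one block (0 = empty block).
final class ModelBlockReader(blockCounts: Seq[Int]) {
  private var nextBlock = 0      // index of the next block to load
  private var blockRemaining = 0 // rows left in the currently loaded block

  // Load the next block when the current one is exhausted, then report
  // whether the loaded block has rows left (false for an empty block).
  def hasNext: Boolean = {
    if (blockRemaining == 0 && nextBlock < blockCounts.length) {
      blockRemaining = blockCounts(nextBlock)
      nextBlock += 1
    }
    blockRemaining != 0
  }

  def next(): Unit = {
    require(blockRemaining > 0, "no row available")
    blockRemaining -= 1
  }

  // Hypothetical EOF signal: every block loaded and fully consumed.
  def atEof: Boolean = blockRemaining == 0 && nextBlock == blockCounts.length
}

// Workaround: if the first call returns false, call hasNext once more,
// which steps over a single empty block.
def hasNextRetryOnce(r: ModelBlockReader): Boolean = r.hasNext || r.hasNext

// "Ideal" loop: keep calling hasNext until it returns true or EOF is reached.
def hasNextUntilEof(r: ModelBlockReader): Boolean = {
  var found = r.hasNext
  while (!found && !r.atEof) found = r.hasNext
  found
}

// Counts rows using the given hasNext strategy.
def countRows(r: ModelBlockReader, hasNextFn: ModelBlockReader => Boolean): Int = {
  var rows = 0
  while (hasNextFn(r)) { r.next(); rows += 1 }
  rows
}

println(countRows(new ModelBlockReader(Seq(2, 0, 3)), hasNextRetryOnce))    // 5
println(countRows(new ModelBlockReader(Seq(2, 0, 0, 3)), hasNextRetryOnce)) // 2
println(countRows(new ModelBlockReader(Seq(2, 0, 0, 3)), hasNextUntilEof))  // 5
```

   In this model, retrying once recovers the rows after a single empty block. Two consecutive empty blocks still stop it early, because each `hasNext` call loads only one block. The EOF-bounded loop reads every row, but only because `atEof` can tell a false caused by an empty block from a false caused by the end of the file.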



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
