liwensun opened a new pull request #29623:
URL: https://github.com/apache/spark/pull/29623


   ### What changes were proposed in this pull request?
   
   With this change, PropagateEmptyRelation is no longer applied to the Limit operator in streaming queries.
   
   ### Why are the changes needed?
   
   Currently, the Limit operator in a streaming query can be optimized away when its input relation is empty. This breaks stateful streaming: the empty batch then writes no state store files, so the next batch fails with a file-not-found error when it tries to read them.
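The failure mode can be illustrated with a toy model. This is a hypothetical simplification, not Spark's state store API: `ToyStateStore`, `run_batch`, and `MissingStateFile` are illustrative names, and the sketch assumes each batch loads the state committed by the previous batch and then commits its own version. When the stateful Limit is removed for an empty batch, that batch commits nothing, and the following batch cannot find the version it needs.

```python
# Toy model, NOT Spark's StateStore API: each micro-batch that runs the
# stateful Limit loads the state committed by the previous batch and then
# commits a new version of its own.


class MissingStateFile(Exception):
    """Stands in for the file-not-found error raised by the next batch."""


class ToyStateStore:
    def __init__(self):
        # version -> committed state; version 0 is the initial empty state
        self.files = {0: {}}

    def load(self, version):
        if version not in self.files:
            raise MissingStateFile(f"state file for version {version} not found")
        return dict(self.files[version])

    def commit(self, version, state):
        self.files[version] = state


def run_batch(store, batch_id, rows, limit_optimized_away):
    """Run micro-batch `batch_id` (starting at 1) through a stateful Limit."""
    if limit_optimized_away:
        # The stateful operator was pruned from the plan, so this batch
        # neither loads nor commits any state.
        return
    state = store.load(batch_id - 1)
    state["rows_seen"] = state.get("rows_seen", 0) + len(rows)
    store.commit(batch_id, state)


store = ToyStateStore()
run_batch(store, 1, ["a"], limit_optimized_away=False)
run_batch(store, 2, [], limit_optimized_away=True)  # empty batch, Limit removed
try:
    run_batch(store, 3, ["b"], limit_optimized_away=False)
except MissingStateFile as err:
    print(err)  # prints "state file for version 2 not found"
```

If batch 2 keeps its Limit (`limit_optimized_away=False`), it commits version 2 and batch 3 succeeds.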
   
   We should not let PropagateEmptyRelation optimize away the Limit operator 
for streaming queries.
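A minimal sketch of the guard, using a toy plan representation in Python rather than Spark's Catalyst classes (`Relation`, `Limit`, `is_empty`, and `propagate_empty_relation` here are hypothetical stand-ins). An empty non-streaming input still collapses the Limit, while a streaming input keeps it so the stateful operator runs in every batch.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical stand-ins for logical plan nodes, NOT Catalyst classes.
@dataclass(frozen=True)
class Relation:
    rows: tuple
    is_streaming: bool


@dataclass(frozen=True)
class Limit:
    n: int
    child: "Plan"


Plan = Union[Relation, Limit]


def is_empty(plan: Plan) -> bool:
    return isinstance(plan, Relation) and len(plan.rows) == 0


def propagate_empty_relation(plan: Plan) -> Plan:
    """Toy rewrite: replace Limit(n, <empty>) with the empty relation,
    except when the input is streaming, where the Limit must be kept."""
    if isinstance(plan, Limit):
        child = propagate_empty_relation(plan.child)  # rewrite bottom-up
        if is_empty(child) and not child.is_streaming:
            return child
        return Limit(plan.n, child)
    return plan


batch_input = Relation(rows=(), is_streaming=False)
stream_input = Relation(rows=(), is_streaming=True)
print(propagate_empty_relation(Limit(10, batch_input)))
# prints Relation(rows=(), is_streaming=False)
print(propagate_empty_relation(Limit(10, stream_input)))
# prints Limit(n=10, child=Relation(rows=(), is_streaming=True))
```

In this toy model, keeping the Limit for an empty streaming batch costs little, and it lets the stateful operator still commit its state version.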
   
   This PR applies a small, safe fix to PropagateEmptyRelation only. A more fundamental fix that would prevent this problem from recurring, in this rule and in other optimizer rules, would be preferable, but that is a much larger task.
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   Unit tests.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
