HeartSaVioR opened a new pull request, #40845:
URL: https://github.com/apache/spark/pull/40845

   ### What changes were proposed in this pull request?
   
   This PR introduces a new callback "onQueryIdle" to StreamingQueryListener. 
Idle notifications were previously delivered as part of the query progress 
update.
   
   The signature of the new callback method is below:
   
   ```
   def onQueryIdle(event: QueryIdleEvent): Unit
   
   class QueryIdleEvent(val id: UUID, val runId: UUID) extends Event
   ```
   
   This PR proposes to provide a default implementation for onQueryIdle in 
StreamingQueryListener so that it does not break existing implementations of 
streaming query listener in Scala/Java. 
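   
   The compatibility argument can be illustrated with a minimal sketch. The 
types below are simplified stand-ins defined only for this example (they are 
not the real Spark classes), and `LegacyListener`, `IdleAwareListener`, and 
`demo` are hypothetical names. The sketch assumes the default body is a no-op.
   
   ```scala
   import java.util.UUID
   import scala.collection.mutable.ArrayBuffer

   object IdleCallbackSketch {
     // Simplified stand-ins for the Spark event classes; not the real API.
     sealed trait Event
     final class QueryProgressEvent(val id: UUID, val runId: UUID, val batchId: Long)
       extends Event
     final class QueryIdleEvent(val id: UUID, val runId: UUID) extends Event

     // Stand-in listener: onQueryIdle has a no-op default body (an assumption).
     abstract class Listener {
       def onQueryProgress(event: QueryProgressEvent): Unit
       def onQueryIdle(event: QueryIdleEvent): Unit = {}
     }

     // Written before the new callback existed; it still compiles unchanged.
     class LegacyListener extends Listener {
       val seenBatches = ArrayBuffer.empty[Long]
       override def onQueryProgress(event: QueryProgressEvent): Unit = {
         seenBatches += event.batchId
       }
     }

     // Opts in to idle notifications by overriding the new callback.
     class IdleAwareListener extends LegacyListener {
       var idleCount = 0
       override def onQueryIdle(event: QueryIdleEvent): Unit = {
         idleCount += 1
       }
     }

     // Sends one progress event and one idle event to each listener.
     def demo(): (Seq[Long], Seq[Long], Int) = {
       val id = UUID.randomUUID()
       val runId = UUID.randomUUID()
       val legacy = new LegacyListener
       val aware = new IdleAwareListener
       for (listener <- Seq(legacy, aware)) {
         listener.onQueryProgress(new QueryProgressEvent(id, runId, 0L))
         listener.onQueryIdle(new QueryIdleEvent(id, runId))
       }
       (legacy.seenBatches.toSeq, aware.seenBatches.toSeq, aware.idleCount)
     }
   }
   ```
   
   In this sketch the legacy listener silently ignores the idle event, while 
the idle-aware listener counts it.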
   
   Note that this is a behavioral change: when the streaming query has been 
idle for the configured period of time, users will receive a different callback 
(previously they received onQueryProgress). This is worth doing, as described 
in the section "Why are the changes needed?".
   
   ### Why are the changes needed?
   
   People have had a lot of confusion about the query progress event for an 
idle query. It is not only a matter of understanding; it also leads to various 
complaints, because users tend to think the event only happens after a 
microbatch has finished. In addition, the misunderstanding may lead to data 
loss in monitoring: since we report the latest batch ID in the update event on 
idleness, a listener implementation that blindly upserts the information to 
external storage keyed by batch ID risks losing data.
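   
   A minimal sketch of that failure mode follows. `Progress`, `MetricsStore`, 
and the batch ID and row counts are hypothetical, and the sketch assumes the 
idle update reports zero input rows for the reused batch ID.
   
   ```scala
   import scala.collection.mutable

   object UpsertSketch {
     // Hypothetical progress record; numInputRows stands in for any per-batch metric.
     final case class Progress(batchId: Long, numInputRows: Long)

     // Hypothetical external store keyed by batch ID.
     final class MetricsStore {
       private val rows = mutable.Map.empty[Long, Long]
       def upsert(p: Progress): Unit = rows.update(p.batchId, p.numInputRows)
       def get(batchId: Long): Option[Long] = rows.get(batchId)
     }

     // Old behavior: the idle update arrives as a progress event with the last batch ID.
     def oldBehavior(): Option[Long] = {
       val store = new MetricsStore
       store.upsert(Progress(batchId = 7L, numInputRows = 100L)) // real batch
       store.upsert(Progress(batchId = 7L, numInputRows = 0L))   // idle update, same ID
       store.get(7L)
     }

     // New behavior: idleness goes to a separate callback, so nothing is upserted.
     def newBehavior(): Option[Long] = {
       val store = new MetricsStore
       store.upsert(Progress(batchId = 7L, numInputRows = 100L)) // real batch
       // An onQueryIdle callback would fire here; it does not touch the store.
       store.get(7L)
     }
   }
   ```
   
   Under these assumptions, `oldBehavior()` ends with the real metrics for the 
batch overwritten by the idle update, while `newBehavior()` keeps them.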
   
   Emitting progress events for idle queries also complicates the logic, 
because we have to remember the execution for the previous batch, which is 
arguably not necessary.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes. After this change, users won't get query progress update events from 
an idle query. Instead, they will get a query idle event.
   
   ### How was this patch tested?
   
   Modified UTs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

