[GitHub] [spark] HeartSaVioR commented on a diff in pull request #40845: [SPARK-43183][SS] Introduce a new callback "onQueryIdle" to StreamingQueryListener

via GitHub Wed, 19 Apr 2023 14:10:54 -0700


HeartSaVioR commented on code in PR #40845:
URL: https://github.com/apache/spark/pull/40845#discussion_r1171858659



##########
sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryListener.scala:
##########
@@ -55,6 +55,12 @@ abstract class StreamingQueryListener {
    */
   def onQueryProgress(event: QueryProgressEvent): Unit
 
+  /**
+   * Called when the query is idle for a certain time period and waiting for new data to process.
+   * @since 3.5.0
+   */
+  def onQueryIdle(event: QueryIdleEvent): Unit = {}
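A minimal sketch of why the new callback gets a no-op default body (`= {}`) while `onQueryProgress` stays abstract: listener subclasses written before the callback existed keep compiling and simply ignore idle events. The types below (`ProgressEvent`, `IdleEvent`, `QueryListener` and the two listeners) are hypothetical, simplified stand-ins. They are not Spark's actual `StreamingQueryListener` API, which has more callbacks and richer event classes.

```scala
// Hypothetical, simplified stand-ins for Spark's listener event types.
final case class ProgressEvent(batchId: Long, numInputRows: Long)
final case class IdleEvent(timestamp: String)

abstract class QueryListener {
  // Pre-existing abstract callback: every subclass must implement it.
  def onQueryProgress(event: ProgressEvent): Unit

  // New callback added with a no-op default body, so listeners written
  // before it existed still compile and silently ignore idle events.
  def onQueryIdle(event: IdleEvent): Unit = {}
}

// A listener written against the old API: it does not override onQueryIdle.
class LegacyListener extends QueryListener {
  var progressCount = 0
  override def onQueryProgress(event: ProgressEvent): Unit = progressCount += 1
}

// A listener that opts in to the new callback.
class IdleAwareListener extends QueryListener {
  var progressCount = 0
  var idleCount = 0
  override def onQueryProgress(event: ProgressEvent): Unit = progressCount += 1
  override def onQueryIdle(event: IdleEvent): Unit = idleCount += 1
}
```

The design choice mirrors the diff: existing code is unaffected, and only listeners that care about idleness override the new method.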

Review Comment:
   Current behavior of Spark: if the streaming query runs batch N and then gets
stuck waiting for new data, Spark emits another progress update for batch N,
modified slightly so that the number of input rows, output rows, etc. are reset
to 0.
   
   This causes a lot of confusion for users, because we also have no-data
batches, in which the number of input rows is also 0. The two can be
distinguished by looking at the elapsed-time info and checking that it has no
field for micro-batch execution, but I wouldn't expect average users to know
about this. In fact, the feedback I got is that users didn't even know we
produce a progress update event while idle. Honestly, this is an awful UX.
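To make the ambiguity concrete, here is a sketch of the heuristic described above: an idle re-report and a no-data batch both show zero input rows, and only the duration info tells them apart. `Progress` is a hypothetical, simplified model of a progress update. The key name `addBatch` for micro-batch execution time is an assumption here; check it against the keys your Spark version actually reports.

```scala
// Hypothetical, simplified model of a progress update; Spark's real
// StreamingQueryProgress carries many more fields.
final case class Progress(
    batchId: Long,
    numInputRows: Long,
    durationMs: Map[String, Long])

sealed trait ProgressKind
case object DataBatch extends ProgressKind
case object NoDataBatch extends ProgressKind
case object IdleReReport extends ProgressKind

object ProgressClassifier {
  // ASSUMPTION: the duration key under which micro-batch execution is recorded.
  val ExecutionKey = "addBatch"

  def classify(p: Progress): ProgressKind =
    if (!p.durationMs.contains(ExecutionKey)) IdleReReport // nothing was executed
    else if (p.numInputRows == 0L) NoDataBatch             // executed, but no input
    else DataBatch
}
```

Both the idle re-report and the no-data batch have `numInputRows == 0`, so a user who only looks at row counts cannot tell them apart; that is exactly the confusion the comment describes.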
   
   Arguably, re-emitting the latest progress update with some of its values
reset on idle is not useful at all. Users who want it can do it themselves. Why
not avoid the confusion entirely while we can?
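As a sketch of "users can do it themselves": with an idle callback, a user who still wants the old output can cache the latest progress and re-emit a zeroed copy when the query goes idle. The class and event types here (`ProgressSnapshot`, `IdleNotice`, `ZeroedProgressOnIdle`) are hypothetical stand-ins; in Spark itself this logic would go into overrides of `onQueryProgress` and the proposed `onQueryIdle` on a `StreamingQueryListener` subclass.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical, simplified stand-ins for Spark's progress and idle events.
final case class ProgressSnapshot(batchId: Long, numInputRows: Long, numOutputRows: Long)
final case class IdleNotice(timestamp: String)

// User-side sketch that reproduces the old "re-report the latest progress
// with counters reset" behavior, only for users who actually want it.
class ZeroedProgressOnIdle {
  private var latest: Option[ProgressSnapshot] = None
  val emitted = ArrayBuffer.empty[ProgressSnapshot]

  def onQueryProgress(p: ProgressSnapshot): Unit = {
    latest = Some(p) // remember the most recent real progress
    emitted += p
  }

  def onQueryIdle(e: IdleNotice): Unit =
    // Re-emit the last batch's progress with row counters reset to 0;
    // nothing is emitted if no batch has completed yet.
    latest.foreach(p => emitted += p.copy(numInputRows = 0L, numOutputRows = 0L))
}
```

Keeping this opt-in on the user side means the default event stream never contains the ambiguous zeroed update.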




