HeartSaVioR commented on code in PR #40845:
URL: https://github.com/apache/spark/pull/40845#discussion_r1171858659
##########
sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryListener.scala:
##########
@@ -55,6 +55,12 @@ abstract class StreamingQueryListener {
*/
def onQueryProgress(event: QueryProgressEvent): Unit
+ /**
+ * Called when the query is idle for a certain time period and waiting for new data to process.
+ * @since 3.5.0
+ */
+ def onQueryIdle(event: QueryIdleEvent): Unit = {}
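The `= {}` default body is what keeps this addition source compatible. Listener subclasses written before the hook existed don't override it and still compile, while new listeners can opt in. Below is a minimal, stdlib-only sketch of that pattern. `Listener`, `ProgressEvent`, `IdleEvent` and the two listener classes are hypothetical stand-ins, not Spark's real types.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-ins for Spark's listener event types.
final case class ProgressEvent(batchId: Long, numInputRows: Long)
final case class IdleEvent(timestamp: String)

abstract class Listener {
  // Abstract callback: every concrete listener must implement it.
  def onProgress(event: ProgressEvent): Unit

  // Concrete no-op default, mirroring `def onQueryIdle(...): Unit = {}` in the diff.
  // Subclasses that predate this hook inherit it and still compile unchanged.
  def onIdle(event: IdleEvent): Unit = {}
}

// A listener written before the idle hook existed; it does not override onIdle.
class LegacyListener extends Listener {
  val seenBatches: ArrayBuffer[Long] = ArrayBuffer.empty[Long]
  def onProgress(event: ProgressEvent): Unit = {
    seenBatches += event.batchId
  }
}

// A newer listener that opts in to idle notifications.
class IdleAwareListener extends Listener {
  var idleCount: Int = 0
  def onProgress(event: ProgressEvent): Unit = ()
  override def onIdle(event: IdleEvent): Unit = {
    idleCount += 1
  }
}
```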
Review Comment:
Current behavior of Spark: if the streaming query runs batch N and then gets
stuck waiting for data, Spark emits a progress update for batch N, but modifies
it slightly to reset the number of input rows to 0, the number of output rows to
0, and so on.
This causes huge confusion for users, because we also have no-data batches, in
which the number of input rows is also 0. There is a way to distinguish the two:
look at the elapsed-time info (`durationMs`) and see that there is no entry for
micro-batch execution. But I wouldn't expect typical users to know about this.
In fact, the feedback I got is that users didn't even know we produce a progress
update event on idle. This is honestly an awful UX.
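The workaround described above can be sketched as a small classifier. This is a simplified, hypothetical model: `Progress`, `Kind` and `ProgressKinds` are made up for illustration. Treating the presence of an `"addBatch"` duration entry as the marker of an executed micro-batch is an assumption, and should be checked against the Spark version in use.

```scala
// Simplified, hypothetical model of a progress update. Spark's real class is
// StreamingQueryProgress, whose durationMs is a java.util.Map[String, java.lang.Long].
final case class Progress(batchId: Long, numInputRows: Long, durationMs: Map[String, Long])

sealed trait Kind
case object DataBatch extends Kind
case object NoDataBatch extends Kind
case object IdleUpdate extends Kind

object ProgressKinds {
  // Assumption: a trigger that actually ran a micro-batch records this duration key,
  // while an idle update carries no batch-execution phase at all.
  val ExecutionKey: String = "addBatch"

  def classify(p: Progress): Kind =
    if (!p.durationMs.contains(ExecutionKey)) IdleUpdate // nothing was executed
    else if (p.numInputRows == 0L) NoDataBatch           // executed, but with no input
    else DataBatch
}
```

Needing a heuristic like this just to interpret a single event is the UX problem the comment describes. A dedicated idle callback makes that case explicit instead.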
Arguably, emitting the latest progress update with some values reset on idle is
not useful at all. If users want it, they can do it themselves. Why not avoid
the confusion entirely while we can?
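Rebuilding the old behavior on the user side takes little code. Below is a minimal sketch, assuming progress updates and idle notifications arrive in order on one callback thread. `Progress`, `IdleEvent` and `LastProgressTracker` are hypothetical stand-ins, not Spark's `StreamingQueryProgress` and `QueryIdleEvent` classes.

```scala
// Hypothetical stand-ins for Spark's progress and idle event types.
final case class Progress(batchId: Long, numInputRows: Long, numOutputRows: Long)
final case class IdleEvent(timestamp: String)

// Remembers the latest real progress and, when an idle event arrives, derives
// a copy with the per-batch counters reset to 0 (the pre-existing idle behavior).
class LastProgressTracker {
  private var latest: Option[Progress] = None

  def onProgress(p: Progress): Unit = {
    latest = Some(p)
  }

  // Returns None if no batch has run yet, so there is nothing to re-emit.
  def onIdle(e: IdleEvent): Option[Progress] =
    latest.map(_.copy(numInputRows = 0L, numOutputRows = 0L))
}
```

Returning `None` before the first batch avoids inventing a progress update when nothing has run yet.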