anishshri-db opened a new pull request, #43370:
URL: https://github.com/apache/spark/pull/43370

   ### What changes were proposed in this pull request?
   Add an assert and a log message to indicate that a watermark definition is 
required for streaming aggregation queries in append mode.
   
   
   ### Why are the changes needed?
   We have a check, based on the UnsupportedOperationChecker, that ensures 
watermark attributes are specified in append mode. However, in some cases we 
received reports of users hitting this stack trace:
   
   ```
   org.apache.spark.SparkException: Exception thrown in awaitResult: Job aborted due to stage failure: Task 3 in stage 32.0 failed 4 times, most recent failure: Lost task 3.3 in stage 32.0 (TID 606) (10.5.71.29 executor 0): java.util.NoSuchElementException: None.get
           at scala.None$.get(Option.scala:529)
           at scala.None$.get(Option.scala:527)
           at org.apache.spark.sql.execution.streaming.StateStoreSaveExec.$anonfun$doExecute$9(statefulOperators.scala:472)
           at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
           at org.apache.spark.util.Utils$.timeTakenMs(Utils.scala:708)
           at org.apache.spark.sql.execution.streaming.StateStoreWriter.timeTakenMs(statefulOperators.scala:145)
           at org.apache.spark.sql.execution.streaming.StateStoreWriter.timeTakenMs$(statefulOperators.scala:145)
           at org.apache.spark.sql.execution.streaming.StateStoreSaveExec.timeTakenMs(statefulOperators.scala:414)
           at org.apache.spark.sql.execution.streaming.StateStoreSaveExec.$anonfun$doExecute$5(statefulOperators.scala:470)
           at org.apache.spark.sql.execution.streaming.state.package$StateStoreOps.$anonfun$mapPartitionsWithStateStore$1(package.scala:63)
           at org.apache.spark.sql.execution.streaming.state.StateStoreRDD.compute(StateStoreRDD.scala:127)
           at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:406)
   ```
   
   In this case, the reason for the failure is not immediately clear. Hence, 
this PR adds an assert and a log message to indicate why the query failed on 
the executor.
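   
   To illustrate the difference, here is a minimal plain-Scala sketch (not the 
actual Spark code). The object `WatermarkAssertSketch`, its methods, the 
`watermarkMs` parameter, and the assertion message are all hypothetical names 
chosen for this example; only the `None.get` failure mirrors the stack trace 
above.
   
   ```scala
   // Hypothetical stand-in for an operator that holds an optional watermark.
   // None of these names come from Spark; they exist only for illustration.
   object WatermarkAssertSketch {

     // Mirrors the reported failure: calling .get on None throws
     // java.util.NoSuchElementException with no hint about the cause.
     def evictUnchecked(watermarkMs: Option[Long]): Long =
       watermarkMs.get

     // Same lookup, guarded by an assert whose message names the problem.
     // The message text is an assumption, not the wording used in the PR.
     def evictChecked(watermarkMs: Option[Long]): Long = {
       assert(watermarkMs.isDefined,
         "Watermark needs to be defined for streaming aggregation query in append mode")
       watermarkMs.get
     }

     // Runs the body and describes the failure, if any, as a string.
     def failureMessage(body: => Any): String =
       try { body; "no failure" }
       catch { case e: Throwable => s"${e.getClass.getSimpleName}: ${e.getMessage}" }

     def main(args: Array[String]): Unit = {
       val missing: Option[Long] = None
       println(failureMessage(evictUnchecked(missing)))
       println(failureMessage(evictChecked(missing)))
     }
   }
   ```
   
   In this sketch the unchecked variant surfaces only a 
`NoSuchElementException`, while the checked variant fails with an 
`AssertionError` whose message points at the missing watermark.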
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Existing unit tests
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

