xuanyuanking opened a new pull request #29256:
URL: https://github.com/apache/spark/pull/29256
### What changes were proposed in this pull request?
Check the Distinct nodes by assuming it as Aggregate in
`UnsupportOperationChecker` for streaming.
### Why are the changes needed?
Since the union clause in SQL has the requirement of deduplication, the
parser will generate `Distinct(Union)` and the optimizer rule
`ReplaceDistinctWithAggregate` will change it to `Aggregate(Union)`. This logic
is of both batch and streaming queries. However, in the streaming, the
aggregation will be wrapped by state store operations.
Before this change, the SS union queries in Append mode will get the
following confusing error when the watermark is lacking.
```
java.util.NoSuchElementException: None.get
at scala.None$.get(Option.scala:529)
at scala.None$.get(Option.scala:527)
at
org.apache.spark.sql.execution.streaming.StateStoreSaveExec.$anonfun$doExecute$9(statefulOperators.scala:346)
at
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.util.Utils$.timeTakenMs(Utils.scala:561)
at
org.apache.spark.sql.execution.streaming.StateStoreWriter.timeTakenMs(statefulOperators.scala:112)
...
```
### Does this PR introduce _any_ user-facing change?
Yes, return a better error message.
### How was this patch tested?
New UT added.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]