Github user jose-torres commented on a diff in the pull request:
https://github.com/apache/spark/pull/21560#discussion_r196921230
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala
---
@@ -350,7 +350,14 @@ object UnsupportedOperationChecker {
_: TypedFilter) =>
case node if node.nodeName == "StreamingRelationV2" =>
case node =>
- throwError(s"Continuous processing does not support ${node.nodeName} operations.")
+ val aboveSinglePartitionCoalesce = node.find {
--- End diff --
It will allow the first one, and I've added a test to verify.
It ought to allow the second one, but for some reason streaming deduplicate
insists on inserting a shuffle above the coalesce(1). I will address this in a
separate PR, since this seems like suboptimal behavior that isn't restricted
to continuous processing. For now I tweaked the condition to allow only
aggregates.
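
As a rough illustration of the tweaked condition described above (an operator
is accepted only if it is an aggregate sitting above a coalesce(1)), here is a
toy sketch in Scala. It does not use Spark's classes: `Node`, `Source`,
`Coalesce`, `Aggregate`, `Deduplicate`, `exists`, and `isAllowed` are all
hypothetical names made up for this example, and the real check in
UnsupportedOperationChecker may be structured differently.

```scala
object CoalesceCheckSketch {
  // Toy plan tree. These are hypothetical stand-ins, not Spark's LogicalPlan nodes.
  sealed trait Node { def children: Seq[Node] }
  case class Source(name: String) extends Node { def children: Seq[Node] = Nil }
  case class Coalesce(numPartitions: Int, child: Node) extends Node {
    def children: Seq[Node] = Seq(child)
  }
  case class Aggregate(child: Node) extends Node { def children: Seq[Node] = Seq(child) }
  case class Deduplicate(child: Node) extends Node { def children: Seq[Node] = Seq(child) }

  // Pre-order search of the subtree rooted at `node`, including `node` itself.
  def exists(node: Node)(p: Node => Boolean): Boolean =
    p(node) || node.children.exists(c => exists(c)(p))

  // True only for a coalesce down to exactly one partition.
  private val isSinglePartitionCoalesce: Node => Boolean = {
    case Coalesce(1, _) => true
    case _ => false
  }

  // Assumed rule for this sketch: accept only aggregates that have a
  // single-partition coalesce somewhere beneath them; reject everything else.
  def isAllowed(node: Node): Boolean = node match {
    case agg: Aggregate => exists(agg)(isSinglePartitionCoalesce)
    case _ => false
  }

  def main(args: Array[String]): Unit = {
    val single = Coalesce(1, Source("stream"))
    println(isAllowed(Aggregate(single)))           // true
    println(isAllowed(Deduplicate(single)))         // false: not an aggregate
    println(isAllowed(Aggregate(Source("stream")))) // false: no coalesce(1) below
  }
}
```

In this toy model the deduplicate case is rejected simply because it is not an
aggregate, which mirrors the temporary restriction mentioned above rather than
the shuffle-insertion behavior itself.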