HeartSaVioR commented on code in PR #40561:
URL: https://github.com/apache/spark/pull/40561#discussion_r1151433440
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala:
##########
@@ -464,6 +469,19 @@ object UnsupportedOperationChecker extends Logging {
throwError(s"Join type $joinType is not supported with streaming DataFrame/Dataset")
}
+ case d: DeduplicateWithinWatermark if d.isStreaming =>
+ // Find any attributes that are associated with an eventTime watermark.
+ val watermarkAttributes = d.child.output.collect {
+ case a: Attribute if a.metadata.contains(EventTimeWatermark.delayKey) => a
+ }
+
+ // DeduplicateWithinWatermark requires event time column being set in the input DataFrame
+ if (watermarkAttributes.isEmpty) {
+ throwError(
+ "dropDuplicatesWithinWatermark is not supported on streaming DataFrames/DataSets " +
Review Comment:
I'm also OK with going with the new error message pattern if you feel it's clearer.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]