HeartSaVioR commented on a change in pull request #29256:
URL: https://github.com/apache/spark/pull/29256#discussion_r471979320
##########
File path:
sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala
##########
@@ -1106,6 +1107,54 @@ class StreamingQuerySuite extends StreamTest with BeforeAndAfter with Logging wi
     }
   }
+  test("union in streaming query of append mode without watermark") {
+    val inputData1 = MemoryStream[Int]
+    val inputData2 = MemoryStream[Int]
+    withTempView("s1", "s2") {
+      inputData1.toDF().createOrReplaceTempView("s1")
+      inputData2.toDF().createOrReplaceTempView("s2")
+      val unioned = spark.sql(
+        "select s1.value from s1 union select s2.value from s2")
+      checkExceptionMessage(unioned)
+    }
+  }
+
+  test("distinct in streaming query of append mode without watermark") {
+    val inputData = MemoryStream[Int]
+    withTempView("deduptest") {
+      inputData.toDF().toDF("value").createOrReplaceTempView("deduptest")
+      val distinct = spark.sql("select distinct value from deduptest")
+      checkExceptionMessage(distinct)
+    }
+  }
+
+  test("distinct in streaming query of complete mode") {
+    val inputData = MemoryStream[Int]
+    withTempView("deduptest") {
+      inputData.toDF().toDF("value").createOrReplaceTempView("deduptest")
+      val distinct = spark.sql("select distinct value from deduptest")
+
+      testStream(distinct, Complete)(
+        AddData(inputData, 1, 2, 3, 3, 4),
+        CheckAnswer(Row(1), Row(2), Row(3), Row(4))
Review comment:
> It has been made clear in the Structured Streaming guide doc that distinct is unsupported. This also refers to the Dataset API distinct, not the SQL DISTINCT clause.
Are you sure end users will read it the same way? Does the statement mention the Dataset API vs. the SQL clause anywhere?
> Also, the difference in behavior between SQL UNION and Dataset.union that I mentioned here is not specific to streaming. Dataset.union is equivalent to UNION ALL in SQL, and SQL UNION has no corresponding Dataset API yet.
That's clearly described in the doc. The thing is that SQL UNION cannot be
done in streaming: according to the doc, SQL UNION is Dataset.union +
distinct, and distinct is not supported. So this is another case that gets
enabled as a side effect, one you can't express with the Dataset API.
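A minimal batch sketch of the UNION vs. Dataset.union distinction discussed above. It assumes Spark 3.x with spark-sql on the classpath; the object `UnionSemantics`, the method `run`, the view names and the sample data are hypothetical and not part of the PR. It relies only on the documented Dataset.union behavior (equivalent to SQL UNION ALL; a SQL-style set union is Dataset.union followed by distinct) and does not exercise the streaming code path this PR changes.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper for illustration only; not part of the PR or of Spark.
object UnionSemantics {
  // Returns the sorted values produced by each of the four union variants.
  def run(spark: SparkSession): Map[String, Seq[Int]] = {
    import spark.implicits._

    val s1 = Seq(1, 2, 3).toDF("value")
    val s2 = Seq(3, 4).toDF("value")
    s1.createOrReplaceTempView("s1")
    s2.createOrReplaceTempView("s2")

    // Collect a single-int-column DataFrame as sorted values.
    def values(df: DataFrame): Seq[Int] = df.as[Int].collect().toSeq.sorted

    Map(
      // SQL UNION is a set union: the duplicate 3 is removed -> 1, 2, 3, 4
      "sqlUnion" -> values(spark.sql("select value from s1 union select value from s2")),
      // SQL UNION ALL keeps duplicates -> 1, 2, 3, 3, 4
      "sqlUnionAll" -> values(spark.sql("select value from s1 union all select value from s2")),
      // Dataset.union behaves like UNION ALL, not UNION -> 1, 2, 3, 3, 4
      "datasetUnion" -> values(s1.union(s2)),
      // Dataset.union followed by distinct behaves like SQL UNION -> 1, 2, 3, 4
      "datasetUnionDistinct" -> values(s1.union(s2).distinct())
    )
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("union-semantics")
      .getOrCreate()
    try run(spark).foreach { case (k, v) => println(s"$k: ${v.mkString(", ")}") }
    finally spark.stop()
  }
}
```

On a streaming source, the two deduplicating variants are where the unsupported distinct comes in; the sketch deliberately stays in batch mode so its output does not depend on which side of this PR you are on.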
----------------------------------------------------------------
This is an automated message from the Apache Git Service.