Github user tdas commented on a diff in the pull request:
https://github.com/apache/spark/pull/20598#discussion_r168110951
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingRelation.scala
---
@@ -62,7 +64,7 @@ case class StreamingRelation(dataSource: DataSource,
sourceName: String, output:
case class StreamingExecutionRelation(
--- End diff --
They need to extend MultiInstanceRelation because Dataset.join() forces
an analysis to disambiguate the left and right sides in self-joins
([here](https://github.com/apache/spark/blob/357babde5a8eb9710de7016d7ae82dee21fa4ef3/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L914)).
When there is a self-join between two streaming Datasets (i.e. they
contain StreamingRelation/StreamingRelationV2) and the relation does not
extend MultiInstanceRelation, that analysis throws the error (see PR description).
Regarding StreamingExecutionRelation: while the other sources convert
StreamingRelation to StreamingExecutionRelation, the MemoryStream directly
injects StreamingExecutionRelation at the time of Dataset operations. Hence
it's good that StreamingExecutionRelation also extends MultiInstanceRelation.
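
To illustrate the reasoning above, here is a minimal toy sketch of why a leaf relation needs a way to produce a fresh copy of itself before a self-join can be disambiguated. This is not Spark's code: `Attr`, `Plan`, `MultiInstance`, `StreamLeaf`, `FixedLeaf`, `conflicts` and `selfJoin` are hypothetical stand-ins that only mimic the idea of attributes carrying unique ids and of a `newInstance()` hook.

```scala
// Toy model only: every type here is a hypothetical stand-in, not a Spark class.
import java.util.concurrent.atomic.AtomicLong

object SelfJoinSketch {
  private val nextId = new AtomicLong(0)

  // An output attribute identified by a unique id (loosely like an expression id).
  final case class Attr(name: String, id: Long) {
    def withFreshId: Attr = copy(id = nextId.getAndIncrement())
  }
  object Attr {
    def fresh(name: String): Attr = Attr(name, nextId.getAndIncrement())
  }

  trait Plan { def output: Seq[Attr] }

  // Relations that can be copied with fresh attribute ids.
  trait MultiInstance extends Plan { def newInstance(): Plan }

  // A streaming-like leaf that supports re-instantiation.
  final case class StreamLeaf(source: String, output: Seq[Attr]) extends MultiInstance {
    def newInstance(): Plan = copy(output = output.map(_.withFreshId))
  }

  // A leaf without that capability.
  final case class FixedLeaf(source: String, output: Seq[Attr]) extends Plan

  final case class Join(left: Plan, right: Plan) extends Plan {
    def output: Seq[Attr] = left.output ++ right.output
  }

  // Attribute ids that appear on both sides of a prospective join.
  def conflicts(left: Plan, right: Plan): Set[Long] =
    left.output.map(_.id).toSet.intersect(right.output.map(_.id).toSet)

  // Join a plan with itself, replacing the right side by a fresh instance
  // when that is possible, and reporting a failure otherwise.
  def selfJoin(plan: Plan): Either[String, Join] =
    if (conflicts(plan, plan).isEmpty) Right(Join(plan, plan))
    else plan match {
      case m: MultiInstance => Right(Join(plan, m.newInstance()))
      case _ => Left(s"conflicting attribute ids: ${conflicts(plan, plan).mkString(", ")}")
    }
}
```

In this sketch, `selfJoin` succeeds for a `StreamLeaf` because the right side gets new attribute ids, while a `FixedLeaf` with the same output cannot be disambiguated and yields a `Left`.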
---