[GitHub] spark pull request #20598: [SPARK-23406] [SS] Enable stream-stream self-join...

tdas Wed, 14 Feb 2018 01:05:38 -0800

Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20598#discussion_r168110951
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingRelation.scala ---
    @@ -62,7 +64,7 @@ case class StreamingRelation(dataSource: DataSource, sourceName: String, output:
     case class StreamingExecutionRelation(
    --- End diff --
    
    They need to extend MultiInstanceRelation because Dataset.join() forces
    an analysis to disambiguate the left and right sides in self-joins
    ([here](https://github.com/apache/spark/blob/357babde5a8eb9710de7016d7ae82dee21fa4ef3/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L914)).
    When there is a self-join between two streaming Datasets (i.e. they
    contain StreamingRelation/StreamingRelationV2), that analysis throws the
    error without MultiInstanceRelation (see PR description).
    
    Regarding StreamingExecutionRelation: while the other sources convert
    StreamingRelation to StreamingExecutionRelation, the MemoryStream directly
    injects StreamingExecutionRelation at the time of Dataset operations. Hence
    it's good that StreamingExecutionRelation also extends MultiInstanceRelation.
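
To make the reasoning above concrete, here is a minimal, Spark-free sketch of the idea. Every name in it (`Attr`, `MultiInstance`, `ToyStreamingRelation`, `FixedRelation`, `dedupRight`) is a hypothetical stand-in and not Spark's actual class or method; the real analyzer is considerably more involved. The sketch assumes only what the comment states: in a self-join both sides carry the same attributes, and a relation that can produce a fresh instance of itself lets the two sides be told apart.

```scala
import java.util.concurrent.atomic.AtomicLong

object SelfJoinSketch {
  // Hypothetical stand-in for an attribute carrying a unique ID.
  private val nextId = new AtomicLong(0)
  final case class Attr(name: String, id: Long)
  def freshAttr(name: String): Attr = Attr(name, nextId.getAndIncrement())

  sealed trait Relation { def output: Seq[Attr] }

  // Toy version of the trait: a relation that can be copied with fresh attribute IDs.
  trait MultiInstance { def newInstance(): Relation }

  // Toy streaming relation that opts in to re-instantiation.
  final case class ToyStreamingRelation(sourceName: String, output: Seq[Attr])
      extends Relation with MultiInstance {
    def newInstance(): Relation = copy(output = output.map(a => freshAttr(a.name)))
  }

  // Toy relation that does not opt in.
  final case class FixedRelation(output: Seq[Attr]) extends Relation

  // If both sides share attribute IDs, re-instantiate the right side when possible;
  // otherwise report the ambiguity (assumed behaviour, for illustration only).
  def dedupRight(left: Relation, right: Relation): Either[String, (Relation, Relation)] = {
    val conflicts = left.output.map(_.id).toSet.intersect(right.output.map(_.id).toSet)
    if (conflicts.isEmpty) Right((left, right))
    else right match {
      case m: MultiInstance => Right((left, m.newInstance()))
      case _ => Left(s"ambiguous attribute ids: ${conflicts.toSeq.sorted.mkString(", ")}")
    }
  }

  def main(args: Array[String]): Unit = {
    val stream = ToyStreamingRelation("memory", Seq(freshAttr("key"), freshAttr("value")))
    println(dedupRight(stream, stream).isRight) // true: right side gets fresh IDs

    val fixed = FixedRelation(Seq(freshAttr("key")))
    println(dedupRight(fixed, fixed).isLeft)    // true: nothing to disambiguate with
  }
}
```

In this toy model, joining `stream` with itself succeeds only because the relation can hand out a second instance with new IDs, which mirrors why both StreamingRelation and StreamingExecutionRelation need the trait.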


