Github user SimonBin commented on the issue:
https://github.com/apache/spark/pull/18692
Hi, we are very interested in this patch. I wonder if it could detect this
code automatically, without needing to write the explicit join:
```scala
package net.sansa_stack.spark.playground

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}
import org.scalatest._

class TestSparkSqlJoin extends FlatSpec {

  "SPARK SQL processor" should "be capable of handling transitive join conditions" in {
    val spark = SparkSession
      .builder()
      .master("local[1]")
      .getOrCreate()

    val schema = new StructType()
      .add("s", IntegerType, nullable = true)
      .add("p", IntegerType, nullable = true)
      .add("o", IntegerType, nullable = true)

    val data = List((1, 2, 3))
    val dataRDD = spark.sparkContext.parallelize(data)
      .map(attributes => Row(attributes._1, attributes._2, attributes._3))

    spark.createDataFrame(dataRDD, schema).createOrReplaceTempView("T")

    spark.sql("SELECT A.s FROM T A, T B WHERE A.s = 1 AND B.s = 1").explain(true)
  }
}
```
I built this pull request locally, but it still gives me the same issue:
```
== Physical Plan ==
org.apache.spark.sql.AnalysisException: Detected cartesian product for INNER join between logical plans
Project [s#3]
+- Filter (isnotnull(s#3) && (s#3 = 1))
   +- LogicalRDD [s#3, p#4, o#5], false
and
Project
+- Filter (isnotnull(s#25) && (s#25 = 1))
   +- LogicalRDD [s#25, p#26, o#27], false
Join condition is missing or trivial.
Use the CROSS JOIN syntax to allow cartesian products between these relations.;
```
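For reference, a minimal sketch of the workarounds we see today (my own assumptions based on the error message above, not something this branch provides): use the CROSS JOIN syntax the analyzer suggests, spell out the transitive equality `A.s = B.s` by hand so the planner has a real join key, or set `spark.sql.crossJoin.enabled`. The query strings below are only built and printed, not executed against Spark:

```scala
// Sketch of the workarounds; assumes the temp view "T" from the test above.

// 1) The CROSS JOIN syntax suggested by the AnalysisException:
val crossJoinQuery =
  "SELECT A.s FROM T A CROSS JOIN T B WHERE A.s = 1 AND B.s = 1"

// 2) Adding the transitive condition A.s = B.s explicitly, which is
//    exactly the rewrite we hoped the patch could derive automatically:
val explicitJoinQuery =
  "SELECT A.s FROM T A, T B WHERE A.s = 1 AND B.s = 1 AND A.s = B.s"

// 3) Alternatively, allow cartesian products globally (Spark 2.x):
//    spark.conf.set("spark.sql.crossJoin.enabled", "true")

println(crossJoinQuery)
println(explicitJoinQuery)
```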