Github user SimonBin commented on the issue:
https://github.com/apache/spark/pull/18692
Hi, we are very interested in this patch. I wonder if it could detect this
code automatically, without needing to write the explicit join:
```scala
package net.sansa_stack.spark.playground

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}
import org.scalatest._

class TestSparkSqlJoin extends FlatSpec {

  "SPARK SQL processor" should "be capable of handling transitive join conditions" in {
    val spark = SparkSession
      .builder()
      .master("local[1]")
      .getOrCreate()

    val schema = new StructType()
      .add("s", IntegerType, nullable = true)
      .add("p", IntegerType, nullable = true)
      .add("o", IntegerType, nullable = true)

    val data = List((1, 2, 3))
    val dataRDD = spark.sparkContext.parallelize(data)
      .map(attributes => Row(attributes._1, attributes._2, attributes._3))

    spark.createDataFrame(dataRDD, schema).createOrReplaceTempView("T")

    spark.sql("SELECT A.s FROM T A, T B WHERE A.s = 1 AND B.s = 1").explain(true)
  }
}
```
I built this pull request locally, but it still gives me the same issue:
```
== Physical Plan ==
org.apache.spark.sql.AnalysisException: Detected cartesian product for INNER join between logical plans
Project [s#3]
+- Filter (isnotnull(s#3) && (s#3 = 1))
   +- LogicalRDD [s#3, p#4, o#5], false
and
Project
+- Filter (isnotnull(s#25) && (s#25 = 1))
   +- LogicalRDD [s#25, p#26, o#27], false
Join condition is missing or trivial.
Use the CROSS JOIN syntax to allow cartesian products between these relations.;
```
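For reference, a minimal sketch of the workarounds we see today (my own assumptions based on the error message above, not something this branch provides): use the CROSS JOIN syntax the analyzer suggests, spell out the transitive equality `A.s = B.s` by hand so the planner has a real join key, or set `spark.sql.crossJoin.enabled`. The query strings below are only built and printed, not executed against Spark:

```scala
// Sketch of the workarounds; assumes the temp view "T" from the test above.

// 1) The CROSS JOIN syntax suggested by the AnalysisException:
val crossJoinQuery =
  "SELECT A.s FROM T A CROSS JOIN T B WHERE A.s = 1 AND B.s = 1"

// 2) Adding the transitive condition A.s = B.s explicitly, which is
//    exactly the rewrite we hoped the patch could derive automatically:
val explicitJoinQuery =
  "SELECT A.s FROM T A, T B WHERE A.s = 1 AND B.s = 1 AND A.s = B.s"

// 3) Alternatively, allow cartesian products globally (Spark 2.x):
//    spark.conf.set("spark.sql.crossJoin.enabled", "true")

println(crossJoinQuery)
println(explicitJoinQuery)
```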