Xiaoju Wu created SPARK-30298:
---------------------------------

             Summary: bucket join cannot work for self-join with views
                 Key: SPARK-30298
                 URL: https://issues.apache.org/jira/browse/SPARK-30298
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: Xiaoju Wu


The following unit test may fail at its last assertion (the plan2 check):
{code:java}
test("bucket join cannot work for self-join with views") {
    withSQLConf(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "1") {
      withTable("t1") {
        val df = (0 until 20).map(i => (i, i)).toDF("i", "j").as("df")
        df.write
          .format("parquet")
          .bucketBy(8, "i")
          .saveAsTable("t1")

        sql(s"create view v1 as select * from t1").collect()

        val plan1 = sql("SELECT * FROM t1 a JOIN t1 b ON a.i = b.i").queryExecution.executedPlan
        assert(plan1.collect { case exchange: ShuffleExchangeExec => exchange }.isEmpty)

        val plan2 = sql("SELECT * FROM t1 a JOIN v1 b ON a.i = b.i").queryExecution.executedPlan
        assert(plan2.collect { case exchange: ShuffleExchangeExec => exchange }.isEmpty)
      }
    }
  }
{code}

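The cause is that the view adds a Project with Alias expressions on top of the
table scan. The Join's requiredChildDistribution is then expressed in terms of
the aliased attributes, but ProjectExec passes its child's outputPartitioning up
unchanged, so it still refers to the un-aliased attributes. The satisfies check
in EnsureRequirements therefore fails and a shuffle is inserted on the view side.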
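
The mismatch can be sketched with a small toy model. This is not Spark code: the
case classes, the simplified satisfies function, and the remap helper are all
hypothetical stand-ins written for illustration, and remapping partitioning keys
through aliases is only one possible direction for a fix, not a confirmed one.

```scala
// Toy model only: hypothetical stand-ins, not Spark's Catalyst classes.
final case class Attr(name: String, id: Long)      // attribute identified by an expression id
final case class Alias(child: Attr, output: Attr)  // a Project entry: `child AS output`
final case class HashPartitioning(keys: Seq[Attr], numPartitions: Int)
final case class ClusteredDistribution(keys: Seq[Attr])

object AliasAwarePartitioningSketch {
  // Simplified check: every partitioning key must be one of the required clustering keys.
  def satisfies(p: HashPartitioning, d: ClusteredDistribution): Boolean =
    p.keys.nonEmpty && p.keys.forall(k => d.keys.exists(_.id == k.id))

  // Rewrite partitioning keys from child attributes to the aliases a projection exposes.
  def remap(p: HashPartitioning, aliases: Seq[Alias]): HashPartitioning = {
    val byChildId = aliases.map(a => a.child.id -> a.output).toMap
    p.copy(keys = p.keys.map(k => byChildId.getOrElse(k.id, k)))
  }

  val scanI = Attr("i", 1L)  // t1.i as produced by the bucketed scan
  val viewI = Attr("i", 2L)  // v1.i, a fresh attribute introduced by the view's Project
  val bucketed = HashPartitioning(Seq(scanI), 8)
  val required = ClusteredDistribution(Seq(viewI))  // what the join asks of its view-side child

  def main(args: Array[String]): Unit = {
    // Partitioning passed up unchanged: it refers to id 1 while the join wants id 2.
    println(satisfies(bucketed, required)) // prints "false"
    // Partitioning rewritten through the alias: it now refers to id 2.
    println(satisfies(remap(bucketed, Seq(Alias(scanI, viewI))), required)) // prints "true"
  }
}
```

In this model the first check fails for the same reason plan2 contains a
ShuffleExchangeExec: the partitioning and the required distribution describe the
same column under different attribute ids.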




