Github user xuanyuanking commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22326#discussion_r220628624
  
    --- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/python/BatchEvalPythonExecSuite.scala
 ---
    @@ -100,6 +104,28 @@ class BatchEvalPythonExecSuite extends SparkPlanTest 
with SharedSQLContext {
         }
         assert(qualifiedPlanNodes.size == 1)
       }
    +
    +  test("SPARK-25314: Python UDF refers to the attributes from more than 
one child " +
    +    "in join condition") {
    +    def dummyPythonUDFTest(): Unit = {
    +      val df = Seq(("Hello", 4)).toDF("a", "b")
    +      val df2 = Seq(("Hello", 4)).toDF("c", "d")
    +      val joinDF = df.join(df2,
    +        dummyPythonUDF(col("a"), col("c")) === dummyPythonUDF(col("d"), 
col("c")))
    +      val qualifiedPlanNodes = joinDF.queryExecution.executedPlan.collect {
    +        case b: BatchEvalPythonExec => b
    +      }
    +      assert(qualifiedPlanNodes.size == 1)
    +    }
    +    // Test without spark.sql.crossJoin.enabled set
    +    val errMsg = intercept[AnalysisException] {
    +      dummyPythonUDFTest()
    +    }
    +    assert(errMsg.getMessage.startsWith("Detected implicit cartesian 
product"))
    +    // Test with spark.sql.crossJoin.enabled=true
    +    spark.conf.set("spark.sql.crossJoin.enabled", "true")
    --- End diff --
    
    Thanks, done in 7f66954.
    ```
    So I'd prefer having one or 2 end-to-end tests and create a new suite 
testing only the rule and the plan transformation, both for having lower 
testing time and finer grained tests checking that the output plan is indeed 
the expected one (not only checking the result of the query).
    ```
    Make sense, will add a plan test for this rule.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to