[ https://issues.apache.org/jira/browse/SPARK-26864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Josh Rosen updated SPARK-26864:
-------------------------------
    Labels: correctness  (was: )

> Query may return incorrect result when python udf is used as a join condition
> and the udf uses attributes from both legs of left semi join.
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-26864
>                 URL: https://issues.apache.org/jira/browse/SPARK-26864
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Dilip Biswal
>            Assignee: Dilip Biswal
>            Priority: Major
>              Labels: correctness
>             Fix For: 2.4.1, 3.0.0
>
>
> In SPARK-25314, we supported the scenario of having a Python UDF that refers
> to attributes from both legs of a join condition by rewriting the plan to
> convert an inner join or left semi join to a filter over a cross join. In the
> case of a left semi join, this transformation may cause incorrect results
> when the right leg of the join produces duplicate rows based on the join
> condition.
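
The failure mode described in the issue can be illustrated without Spark. The sketch below is a plain-Python model of the two plan shapes, not Spark's actual implementation: the function names, the sample rows, and the equality predicate standing in for the Python UDF are all hypothetical, and the rewritten plan is simplified to emit only the left row for each matching pair.

```python
# Plain-Python model of the two plans; no Spark required.
# The rows and the predicate are made up for illustration.

def left_semi_join(left, right, cond):
    """Keep each left row at most once if any right row satisfies cond."""
    return [l for l in left if any(cond(l, r) for r in right)]

def filter_over_cross_join(left, right, cond):
    """Emit the left row once per matching (left, right) pair."""
    return [l for l in left for r in right if cond(l, r)]

left = [1, 2]
right = [1, 1, 3]  # the right leg has two rows that match left row 1

def cond(l, r):
    # Stand-in for a Python UDF that reads attributes from both legs.
    return l == r

semi = left_semi_join(left, right, cond)
rewritten = filter_over_cross_join(left, right, cond)
print(semi)       # [1]
print(rewritten)  # [1, 1]
```

In this model the semi join returns the matching left row once, while the filter over the cross join returns it once per matching right row, which is the kind of duplicate the issue reports.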