[jira] [Commented] (SPARK-45722) False positive when checking for ambiguous columns

2023-10-30 Thread Alexey Dmitriev (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17780965#comment-17780965
 ] 

Alexey Dmitriev commented on SPARK-45722:
-

Turning off spark.sql.analyzer.failAmbiguousSelfJoin doesn't help, so the 
issue is probably not exactly where I thought it was:
{code:python}
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

session = SparkSession.builder.getOrCreate()
# Disable the ambiguous self-join check to see whether it causes the failure
session.conf.set('spark.sql.analyzer.failAmbiguousSelfJoin', False)
A = session.createDataFrame([(1,)], ['a'])
B = session.createDataFrame([(1,)], ['b'])
A.join(B).select(B.b)  # works
C = A.join(A.join(B), on=F.lit(False), how='leftanti')
C.join(B).select(B.b)  # still fails, now with the exception below
{code}
AnalysisException: Resolved attribute(s) b#2L missing from a#0L,b#12L in 
operator !Project [b#2L]. Attribute(s) with the same name appear in the 
operation: b. Please check if the right attribute(s) are used.;
!Project [b#2L]
+- Join Inner
   :- Join LeftAnti, false
   :  :- LogicalRDD [a#0L], false
   :  +- Join Inner
   :     :- LogicalRDD [a#9L], false
   :     +- LogicalRDD [b#2L], false
   +- LogicalRDD [b#12L], false
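
The plan in the exception above hints at the mechanics: B.b is bound to the attribute b#2L, but the final join exposes only a#0L and b#12L. Below is a toy model of that bookkeeping in plain Python. It is only a sketch under the assumption that attribute ids on the right side of a join are re-issued when they already occur anywhere in the left subtree; it is not Spark's analyzer code, and every name in it (Attr, Plan, relation, join, can_select) is hypothetical.

```python
from dataclasses import dataclass
from itertools import count

fresh_ids = count(9)  # arbitrary starting point for re-issued ids (toy model only)


@dataclass(frozen=True)
class Attr:
    """A column as a name plus a unique expression id, e.g. b#2."""
    name: str
    expr_id: int


@dataclass
class Plan:
    output: list         # attributes visible to the parent operator
    seen_ids: frozenset  # every expression id used anywhere in the subtree


def relation(*attrs):
    return Plan(list(attrs), frozenset(a.expr_id for a in attrs))


def join(left, right, how="inner"):
    # Assumption: ids on the right that already occur anywhere on the left
    # are re-issued, so the two sides never share an expression id.
    right_out = [
        Attr(a.name, next(fresh_ids)) if a.expr_id in left.seen_ids else a
        for a in right.output
    ]
    seen = left.seen_ids | right.seen_ids | {a.expr_id for a in right_out}
    # A left anti join exposes only the left side's columns.
    output = left.output if how == "leftanti" else left.output + right_out
    return Plan(output, seen)


def can_select(plan, attr):
    # A reference bound to an id resolves only if that exact id is in the output.
    return attr in plan.output


a, b = Attr("a", 0), Attr("b", 2)
A, B = relation(a), relation(b)

ok = can_select(join(A, B), b)           # models A.join(B).select(B.b)
C = join(A, join(A, B), how="leftanti")  # b#2 is hidden but still inside C
AB = join(C, B)                          # B's column gets a fresh id here
broken = can_select(AB, b)               # models C.join(B).select(B.b)
```

In this model `ok` is True and `broken` is False: C does not expose b, yet b#2 still lives inside its subtree, so the second copy of B is renumbered and the reference held by B.b no longer matches anything in the join output.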

> False positive when checking for ambiguous columns
> --
>
> Key: SPARK-45722
> URL: https://issues.apache.org/jira/browse/SPARK-45722
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.4.0
> Environment: py3.11 
> pyspark 3.4.0
>Reporter: Alexey Dmitriev
>Priority: Major
>
> I have the following code, which I expect to work:
> {code:python}
> from pyspark.sql import SparkSession
> import pyspark.sql.functions as F
> session = SparkSession.builder.getOrCreate()
> A = session.createDataFrame([(1,)], ['a'])
> B = session.createDataFrame([(1,)], ['b'])
> A.join(B).select(B.b)  # works fine
> # C has the same columns as A (the same columns, not only the same names)
> C = A.join(A.join(B), on=F.lit(False), how='leftanti')
> C.join(B).select(B.b)  # doesn't work, says B.b is ambiguous
> {code}
> Exception below:
> {code:java}
> AnalysisException: Column b#11L are ambiguous. It's probably because you 
> joined several Datasets together, and some of these Datasets are the same. 
> This column points to one of the Datasets but Spark is unable to figure out 
> which one. Please alias the Datasets with different names via `Dataset.as` 
> before joining them, and specify the column using qualified name, e.g. 
> `df.as("a").join(df.as("b"), $"a.id" > $"b.id")`. You can also set 
> spark.sql.analyzer.failAmbiguousSelfJoin to false to disable this check.{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-45722) False positive when checking for ambiguous columns

2023-10-30 Thread Alexey Dmitriev (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17780942#comment-17780942
 ] 

Alexey Dmitriev commented on SPARK-45722:
-

I think the join type should be checked 
[here|https://github.com/apache/spark/blob/b92265a98f241b333467a02f4fffc9889ad3e7da/sql/core/src/main/scala/org/apache/spark/sql/execution/analysis/DetectAmbiguousSelfJoin.scala#L129]
