Re: [PR] [SPARK-47217][SQL] Fix deduplicated expression resolution [spark]

via GitHub Wed, 20 Mar 2024 10:10:00 -0700


peter-toth commented on PR #45552:
URL: https://github.com/apache/spark/pull/45552#issuecomment-2010096626


   I had some time to play with Spark Connect today. As I said, it works well 
with the query in the PR description, but it doesn't seem to support even the 
most basic self joins:
   ```
   @ val df = Seq((1, 2)).toDF("a", "b")
   Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
   df: org.apache.spark.sql.package.DataFrame = [a: int, b: int]
   
   @ val df2 = df.select(df("a").as("aa"), df("b"))
   df2: org.apache.spark.sql.package.DataFrame = [aa: int, b: int]
   
   @ val df3 = df2.join(df, df2("b") === df("b"))
   df3: org.apache.spark.sql.package.DataFrame = Invalid Dataframe; [AMBIGUOUS_COLUMN_REFERENCE] Column "b" is ambiguous. It's because you joined several DataFrame together, and some of these DataFrames are the same.
   This column points to one of the DataFrames but Spark is unable to figure out which one.
   Please alias the DataFrames with different names via `DataFrame.alias` before joining them,
   and specify the column using qualified name, e.g. `df.alias("a").join(df.alias("b"), col("a.id") > col("b.id"))`. SQLSTATE: 42702
   ```
   Am I missing something?
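
   For reference, here is a minimal sketch of the workaround the error message 
suggests: alias both sides and join on qualified column names. The session 
setup, the alias names `l` and `r`, and the app name are assumptions for 
illustration. It builds a classic local session so that it is self-contained, 
and I have not verified that it behaves the same on Connect.
   ```scala
   import org.apache.spark.sql.SparkSession
   import org.apache.spark.sql.functions.col

   // Assumption: a classic local session; in the Connect REPL `spark` already exists.
   val spark = SparkSession.builder()
     .master("local[1]")
     .appName("self-join-alias-sketch") // hypothetical app name
     .getOrCreate()
   import spark.implicits._

   // Same DataFrames as in the transcript above.
   val df = Seq((1, 2)).toDF("a", "b")
   val df2 = df.select(df("a").as("aa"), df("b"))

   // Give each side its own alias ("l" and "r" are illustrative names) ...
   val left = df2.alias("l")
   val right = df.alias("r")

   // ... and refer to the join columns by qualified name instead of df2("b") / df("b").
   val df3 = left.join(right, col("l.b") === col("r.b"))
   df3.show()

   spark.stop()
   ```
   This only sidesteps the ambiguity. It does not answer whether 
`df2("b") === df("b")` should resolve on Connect without aliases.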


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

