peter-toth commented on PR #45552:
URL: https://github.com/apache/spark/pull/45552#issuecomment-2010096626
I had some time to play with Spark Connect today and, as I said, it works
well with the query in the PR description, but it doesn't seem to support
even the most basic self-joins:
```
@ val df = Seq((1, 2)).toDF("a", "b")
Using Spark's default log4j profile:
org/apache/spark/log4j2-defaults.properties
df: org.apache.spark.sql.package.DataFrame = [a: int, b: int]
@ val df2 = df.select(df("a").as("aa"), df("b"))
df2: org.apache.spark.sql.package.DataFrame = [aa: int, b: int]
@ val df3 = df2.join(df, df2("b") === df("b"))
df3: org.apache.spark.sql.package.DataFrame = Invalid Dataframe;
[AMBIGUOUS_COLUMN_REFERENCE] Column "b" is ambiguous. It's because you joined
several DataFrame together, and some of these DataFrames are the same.
This column points to one of the DataFrames but Spark is unable to figure
out which one.
Please alias the DataFrames with different names via `DataFrame.alias`
before joining them,
and specify the column using qualified name, e.g.
`df.alias("a").join(df.alias("b"), col("a.id") > col("b.id"))`. SQLSTATE: 42702
```
Am I missing something?
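
For reference, the error message itself points at an alias-based rewrite. Below is a minimal sketch of that form, assuming the same REPL session as above (`spark` in scope). The alias names `l` and `r` are illustrative and do not come from the PR. It only shows the workaround the message recommends; it says nothing about whether `df2("b") === df("b")` should resolve under this PR.

```scala
// Assumes the same REPL session as above, with `spark` already in scope.
import org.apache.spark.sql.functions.col
import spark.implicits._

val df = Seq((1, 2)).toDF("a", "b")
val df2 = df.select(df("a").as("aa"), df("b"))

// Alias each side and refer to columns by qualified name instead of df("b").
// The alias names "l" and "r" are illustrative.
val joined = df2.alias("l").join(df.alias("r"), col("l.b") === col("r.b"))

// Qualified names also disambiguate the two "b" columns after the join.
val result = joined.select(col("l.aa"), col("l.b"), col("r.a"))
result.show()
```

The trade-off is that column references become name-based (`"l.b"`) rather than tied to a specific DataFrame instance, which is exactly what the `df2("b") === df("b")` form above is trying to avoid.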