zhengruifeng opened a new pull request, #44526:
URL: https://github.com/apache/spark/pull/44526
### What changes were proposed in this pull request?
The column references in `ALSModel.transform` may be ambiguous in some cases.
### Why are the changes needed?
To fix a bug.
Before this fix, the test fails with:
```
======================================================================
ERROR [5.597s]: test_ambiguous_column
(pyspark.ml.tests.test_als.ALSTest.test_ambiguous_column)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/ml/tests/test_als.py",
line 47, in test_ambiguous_column
predictions = loaded_model.transform(users.crossJoin(items))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/ml/base.py", line 260,
in transform
return self._transform(dataset)
^^^^^^^^^^^^^^^^^^^^^^^^
...
raise converted from None
pyspark.errors.exceptions.captured.AnalysisException: Column features#50,
features#46 are ambiguous. It's probably because you joined several Datasets
together, and some of these Datasets are the same. This column points to one of
the Datasets but Spark is unable to figure out which one. Please alias the
Datasets with different names via `Dataset.as` before joining them, and specify
the column using qualified name, e.g. `df.as("a").join(df.as("b"), $"a.id" >
$"b.id")`. You can also set spark.sql.analyzer.failAmbiguousSelfJoin to false
to disable this check.
JVM stacktrace:
org.apache.spark.sql.AnalysisException: Column features#50, features#46 are
ambiguous. It's probably because you joined several Datasets together, and some
of these Datasets are the same. This column points to one of the Datasets but
Spark is unable to figure out which one. Please alias the Datasets with
different names via `Dataset.as` before joining them, and specify the column
using qualified name, e.g. `df.as("a").join(df.as("b"), $"a.id" > $"b.id")`.
You can also set spark.sql.analyzer.failAmbiguousSelfJoin to false to disable
this check.
at
org.apache.spark.sql.errors.QueryCompilationErrors$.ambiguousAttributesInSelfJoinError(QueryCompilationErrors.scala:1998)
```
### Does this PR introduce _any_ user-facing change?
Yes, this is a bug fix.
### How was this patch tested?
Added a unit test (`test_ambiguous_column`).
### Was this patch authored or co-authored using generative AI tooling?
no