zhengruifeng opened a new pull request, #55848:
URL: https://github.com/apache/spark/pull/55848
### What changes were proposed in this pull request?
Add a new gotcha section to `docs/spark-connect-gotchas.md` describing how
Spark Connect resolves DataFrame column references (`df["col"]`) via plan-id
tagging, and how this diverges from Spark Classic once a column has been
shadowed by `withColumn` or `select`.
The section covers:
- Why `df.withColumn("col", ...).select(df["col"])` fails on both Spark
Classic (`MISSING_ATTRIBUTES.RESOLVED_ATTRIBUTE_APPEAR_IN_OPERATION`) and Spark
Connect (`CANNOT_RESOLVE_DATAFRAME_COLUMN`).
- Why users may have seen this query succeed on older Spark Connect builds
(a lenient name-based fallback applied when plan-id resolution does not match
a tagged ancestor).
- The recommended fix: use an untagged `F.col("col")` reference after column
shadowing.
- The opt-in escape hatch:
`spark.sql.analyzer.strictDataFrameColumnResolution=false` (introduced in
SPARK-56614 / apache/spark#55531) to re-enable the lenient fallback.
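To make the mechanism concrete, here is a deliberately simplified, pure-Python toy model of plan-id resolution. It is a sketch under stated assumptions, not Spark's analyzer, and every name in it (`Plan`, `ColRef`, `with_column`, `select`, `resolve`, the `strict` flag) is hypothetical. In the model, `ColRef("col", df.plan_id)` stands in for a tagged `df["col"]`, `ColRef("col")` for an untagged `F.col("col")`, and `strict=False` for setting `spark.sql.analyzer.strictDataFrameColumnResolution=false`.

```python
import itertools
from dataclasses import dataclass
from typing import Iterator, List, Optional

# Toy model only: all names are hypothetical and this is a simplification,
# not Spark's analyzer. Plan ids and expression ids use separate counters.
_plan_ids = itertools.count(1)
_expr_ids = itertools.count(1)


@dataclass(frozen=True)
class Attr:
    """A resolved column: a name plus a unique expression id."""
    name: str
    expr_id: int


@dataclass
class Plan:
    """A plan node tagged with a plan id; `child` links to its input."""
    plan_id: int
    output: List[Attr]
    child: Optional["Plan"] = None

    def lineage(self) -> Iterator["Plan"]:
        # Walk from this node down through its ancestors (inputs).
        node = self
        while node is not None:
            yield node
            node = node.child


@dataclass(frozen=True)
class ColRef:
    """plan_id set ~ tagged df["col"]; plan_id None ~ untagged F.col("col")."""
    name: str
    plan_id: Optional[int] = None


class ResolutionError(Exception):
    pass


def _fresh(name: str) -> Attr:
    return Attr(name, next(_expr_ids))


def _by_name(attrs: List[Attr], name: str) -> Attr:
    for attr in attrs:
        if attr.name == name:
            return attr
    raise ResolutionError(f"no column named {name!r}")


def source(*names: str) -> Plan:
    return Plan(next(_plan_ids), [_fresh(n) for n in names])


def with_column(plan: Plan, name: str) -> Plan:
    """Shadow `name` with a brand-new attribute (or append it if absent)."""
    out = [_fresh(name) if a.name == name else a for a in plan.output]
    if all(a.name != name for a in plan.output):
        out.append(_fresh(name))
    return Plan(next(_plan_ids), out, plan)


def resolve(ref: ColRef, child: Plan, strict: bool = True) -> Attr:
    """Resolve `ref` against the output of `child` (the operator's input)."""
    if ref.plan_id is None:
        return _by_name(child.output, ref.name)  # untagged: plain name lookup
    for node in child.lineage():
        if node.plan_id == ref.plan_id:
            attr = _by_name(node.output, ref.name)
            if attr in child.output:  # tagged attribute is still visible
                return attr
            break  # tagged ancestor found, but its attribute was shadowed
    if strict:
        # Mimics only the name of the Connect error class from the text.
        raise ResolutionError(f"CANNOT_RESOLVE_DATAFRAME_COLUMN: {ref.name!r}")
    return _by_name(child.output, ref.name)  # lenient name-based fallback


def select(child: Plan, *refs: ColRef, strict: bool = True) -> Plan:
    return Plan(next(_plan_ids), [resolve(r, child, strict) for r in refs], child)
```

In this model, `select(with_column(df, "col"), ColRef("col", df.plan_id))` fails because the attribute the tag points at is no longer in the operator's input, while `ColRef("col")` and the `strict=False` fallback both pick up the new `col` by name.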
Also adds a "DataFrame column references" row to the summary table at the
end of the document.
### Why are the changes needed?
The plan-id-based column resolution path is a Spark Connect-specific
contract that is not documented anywhere user-facing. Users migrating workloads
to Spark Connect have been surprised when patterns that previously "worked"
stop resolving, and neither the resulting error class
(`CANNOT_RESOLVE_DATAFRAME_COLUMN`) nor the relevant config
(`strictDataFrameColumnResolution`) has an obvious connection to their code.
This adds explicit guidance and a code-level mitigation alongside the other
Connect-vs-Classic gotchas already documented in this file.
### Does this PR introduce _any_ user-facing change?
No. Documentation-only change.
### How was this patch tested?
Documentation-only change; no automated tests. Verified the markdown renders
correctly and is consistent with the existing four-gotcha layout in
`docs/spark-connect-gotchas.md`.
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Anthropic), claude-opus-4-7