zhengruifeng opened a new pull request #56398:
URL: https://github.com/apache/spark/pull/56398
### What changes were proposed in this pull request?

This PR adds PySpark tests for resolving a DataFrame-scoped column reference (e.g. `df1.zip(df2).select(df1.some_col)`) through `DataFrame.zip`. The tests live in the column test suites, next to the existing `test_resolve_*` resolve-through-operator tests:

- `python/pyspark/sql/tests/test_column.py` (`ColumnTestsMixin`): 7 tests asserting the Classic behavior (selecting each side by its originating DataFrame, reordering, disambiguating duplicate column names, shared-producer dedup, use in an expression, use in a filter, and resolution through chained projections).
- `python/pyspark/sql/tests/connect/test_parity_column.py` (`ColumnParityTests`): 7 overrides asserting the Spark Connect behavior.

On Classic, `df.col` resolves by attribute id, which `ResolveZip` preserves in the merged `Project`, so the reference resolves. On Spark Connect, `df.col` resolves by plan id. `ResolveZip` merges the two sides into a single plan and the per-DataFrame plan-id tags do not survive, so resolving the reference raises `CANNOT_RESOLVE_DATAFRAME_COLUMN` in both strict and lenient resolution modes. This mirrors the existing `test_resolve_after_union` divergence, where a tagged reference also cannot be found below the operator on Connect.

### Why are the changes needed?

`DataFrame.zip` is a newly added API, and its interaction with DataFrame-scoped column references (`DataFrame.col`) was not covered by tests. These tests document the supported Classic behavior and pin down the current Spark Connect limitation, so any future change in resolution behavior is caught.

### Does this PR introduce _any_ user-facing change?

No. This is a test-only change.

### How was this patch tested?

New unit tests, run locally:

- `pyspark.sql.tests.test_column` (Classic `ColumnTests`): the 7 new tests pass.
- `pyspark.sql.tests.connect.test_parity_column` (`ColumnParityTests` and `ColumnParityTestsWithNonStrictDFColResolution`): the 7 overrides pass under both strict and lenient DataFrame column resolution modes (14 invocations, all passing).

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.8
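To make the attribute-id versus plan-id distinction from the description concrete, here is a toy Python model of the two resolution strategies. It is not Spark's implementation. `Attribute`, `Plan`, `zip_merge`, `resolve_by_attr_id`, `resolve_by_plan_id`, and `CannotResolveDataFrameColumn` are hypothetical names. `zip_merge` simply assumes the `ResolveZip` outcome described in the PR: attribute ids are kept, and per-DataFrame plan-id tags are dropped.

```python
from dataclasses import dataclass
from typing import List, Optional


# Toy model only: these classes are hypothetical and are not Spark internals.
@dataclass(frozen=True)
class Attribute:
    name: str
    attr_id: int  # Classic-style: a unique id carried by the column itself


@dataclass
class Plan:
    output: List[Attribute]
    plan_id: Optional[int] = None  # Connect-style: a tag on the plan node


class CannotResolveDataFrameColumn(Exception):
    """Stand-in for the CANNOT_RESOLVE_DATAFRAME_COLUMN error condition."""


def zip_merge(left: Plan, right: Plan) -> Plan:
    # Assumed zip outcome: one merged projection that keeps every attribute id
    # but carries no per-DataFrame plan-id tag (and, in this toy, no children).
    return Plan(output=left.output + right.output, plan_id=None)


def resolve_by_attr_id(plan: Plan, ref: Attribute) -> Attribute:
    # Classic-style lookup: match on attribute id, so duplicate names are fine.
    for attr in plan.output:
        if attr.attr_id == ref.attr_id:
            return attr
    raise CannotResolveDataFrameColumn(ref.name)


def resolve_by_plan_id(plan: Plan, plan_id: int, name: str) -> Attribute:
    # Connect-style lookup: the originating plan-id tag must still be present.
    # Only this node is checked because the toy merged plan has no children.
    if plan.plan_id == plan_id:
        for attr in plan.output:
            if attr.name == name:
                return attr
    raise CannotResolveDataFrameColumn(name)


df1 = Plan([Attribute("id", 1), Attribute("v", 2)], plan_id=10)
df2 = Plan([Attribute("id", 3), Attribute("w", 4)], plan_id=11)
zipped = zip_merge(df1, df2)

# Classic-style: df1.id and df2.id share a name but resolve to distinct attributes.
print(resolve_by_attr_id(zipped, df1.output[0]))  # Attribute(name='id', attr_id=1)
print(resolve_by_attr_id(zipped, df2.output[0]))  # Attribute(name='id', attr_id=3)

# Connect-style: df1's tag (10) did not survive the merge.
try:
    resolve_by_plan_id(zipped, 10, "id")
except CannotResolveDataFrameColumn as e:
    print("cannot resolve:", e)  # cannot resolve: id
```

The model isolates one design point. An attribute id travels with the column, while a plan id lives on a plan node, so in this model any rewrite that replaces the tagged nodes also removes the tag that a DataFrame-scoped reference needs.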
