zhengruifeng opened a new pull request, #56398:
URL: https://github.com/apache/spark/pull/56398

   ### What changes were proposed in this pull request?
   
   This PR adds PySpark tests for resolving a DataFrame-scoped column reference 
(e.g. `df1.zip(df2).select(df1.some_col)`) through `DataFrame.zip`. The tests 
live in the column test suites, alongside the existing `test_resolve_*` tests 
that resolve references through other operators:
   
   - `python/pyspark/sql/tests/test_column.py` (`ColumnTestsMixin`): 7 tests 
asserting the Classic behavior - selecting each side by its originating 
DataFrame, reordering, disambiguating duplicate column names, shared-producer 
dedup, use in an expression, use in a filter, and through chained projections.
   - `python/pyspark/sql/tests/connect/test_parity_column.py` 
(`ColumnParityTests`): 7 overrides asserting the Spark Connect behavior.
   
   On Classic, `df.col` resolves by attribute id, which `ResolveZip` preserves 
in the merged `Project`, so the reference resolves. On Spark Connect, `df.col` 
resolves by plan id; `ResolveZip` merges the two sides into a single plan and 
the per-DataFrame plan-id tags do not survive, so the reference raises 
`CANNOT_RESOLVE_DATAFRAME_COLUMN` in both strict and lenient resolution modes. 
This mirrors the existing `test_resolve_after_union` divergence, where a tagged 
reference also cannot be found below the operator on Connect.
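
The divergence described above can be illustrated with a small pure-Python toy model. This is only a sketch under stated assumptions, not Spark's analyzer: `Attr`, `ToyFrame`, `make_frame`, `toy_zip`, `resolve_by_attr_id`, and `resolve_by_plan_id` are hypothetical names invented for illustration, and the model assumes the merged output keeps each attribute id but drops the per-side plan-id tag.

```python
from dataclasses import dataclass, replace
from itertools import count
from typing import Optional, Tuple

# Hypothetical toy model; none of these names exist in Spark.
_next_attr_id = count()


@dataclass(frozen=True)
class Attr:
    name: str
    attr_id: int             # identity used by the Classic-style lookup
    plan_id: Optional[int]   # tag used by the Connect-style lookup


@dataclass(frozen=True)
class ToyFrame:
    plan_id: int
    output: Tuple[Attr, ...]


def make_frame(plan_id: int, names) -> ToyFrame:
    """Create a frame whose attributes are tagged with the frame's plan id."""
    attrs = tuple(Attr(n, next(_next_attr_id), plan_id) for n in names)
    return ToyFrame(plan_id, attrs)


def toy_zip(left: ToyFrame, right: ToyFrame, plan_id: int) -> ToyFrame:
    """Merge both sides into one output (assumption: attribute ids are
    preserved, per-side plan-id tags are not)."""
    merged = tuple(replace(a, plan_id=None) for a in left.output + right.output)
    return ToyFrame(plan_id, merged)


def resolve_by_attr_id(target: ToyFrame, ref: Attr) -> Attr:
    """Classic-style lookup: match on attribute id."""
    for a in target.output:
        if a.attr_id == ref.attr_id:
            return a
    raise LookupError("attribute id not found")


def resolve_by_plan_id(target: ToyFrame, source_plan_id: int, name: str) -> Attr:
    """Connect-style lookup: match on the originating plan-id tag plus name."""
    for a in target.output:
        if a.plan_id == source_plan_id and a.name == name:
            return a
    raise LookupError("CANNOT_RESOLVE_DATAFRAME_COLUMN")


df1 = make_frame(1, ["id", "v"])
df2 = make_frame(2, ["id", "w"])
zipped = toy_zip(df1, df2, plan_id=3)

# The duplicate name "id" is disambiguated by attribute id.
left_id = resolve_by_attr_id(zipped, df1.output[0])
right_id = resolve_by_attr_id(zipped, df2.output[0])

try:
    resolve_by_plan_id(zipped, df1.plan_id, "id")
    plan_id_lookup_failed = False
except LookupError:
    plan_id_lookup_failed = True
```

In this model the attribute-id lookup finds both `id` columns even though they share a name, while the plan-id lookup fails because no attribute in the merged output still carries the tag of `df1`.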
   
   ### Why are the changes needed?
   
   `DataFrame.zip` is a newly added API, and its interaction with 
DataFrame-scoped column references (`DataFrame.col`) was not covered by tests. 
These tests document the supported Classic behavior and pin down the current 
Spark Connect limitation, so any future change in resolution behavior is caught.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No. This is a test-only change.
   
   ### How was this patch tested?
   
   New unit tests, run locally:
   
   - `pyspark.sql.tests.test_column` (Classic `ColumnTests`) - the 7 new tests 
pass.
   - `pyspark.sql.tests.connect.test_parity_column` (`ColumnParityTests` and 
`ColumnParityTestsWithNonStrictDFColResolution`) - the 7 overrides pass under 
both strict and lenient DataFrame column resolution modes (14 invocations, all 
passing).
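
The invocation count follows from the mixin pattern: each test defined once in a mixin runs once per concrete test class. A minimal, self-contained sketch of that pattern with plain `unittest` is below. The class names, the `strict` attribute, and the stubbed `resolve` helper are hypothetical stand-ins, not the actual PySpark test classes or configuration.

```python
import unittest


class ZipResolveTestsMixin:
    # Hypothetical stand-in for the DataFrame column resolution mode.
    strict = True

    def resolve(self):
        # Stub: in this sketch the reference fails to resolve in both modes.
        raise LookupError("CANNOT_RESOLVE_DATAFRAME_COLUMN")

    def test_resolve_through_zip(self):
        with self.assertRaises(LookupError):
            self.resolve()


class StrictModeTests(ZipResolveTestsMixin, unittest.TestCase):
    strict = True


class LenientModeTests(ZipResolveTestsMixin, unittest.TestCase):
    strict = False


def run_all() -> unittest.TestResult:
    """Run the mixin's tests once per concrete class and return the result."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    for cls in (StrictModeTests, LenientModeTests):
        suite.addTests(loader.loadTestsFromTestCase(cls))
    result = unittest.TestResult()
    suite.run(result)
    return result
```

With one test in the mixin and two concrete classes, `run_all()` executes two invocations; seven tests in the mixin would give fourteen.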
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: Claude Opus 4.8
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

