Re: [PR] [SPARK-55886][PYTHON][TESTS][FOLLOWUP] Add tests for DataFrame.col resolution through DataFrame.zip [spark]

via GitHub Thu, 11 Jun 2026 21:23:52 -0700


zhengruifeng commented on code in PR #56398:
URL: https://github.com/apache/spark/pull/56398#discussion_r3400651185



##########
python/pyspark/sql/tests/connect/test_parity_column.py:
##########
@@ -50,6 +50,63 @@ def test_resolve_after_union(self):
         with self.assertRaisesRegex(AnalysisException, "CANNOT_RESOLVE_DATAFRAME_COLUMN"):
             df1.union(df2).select(df1.c).collect()
 
+    # zip merges the two column-projected sides into a single plan, so the
+    # per-DataFrame plan-id tags do not survive ResolveZip. A tagged left/right
+    # reference can no longer be found and raises in both strict and lenient

Review Comment:
   Ran it - your trace holds exactly when the base's plan root is the relation 
node itself, but not for `createDataFrame`:
   
   - `range` base: the root is the bare `Range` node, which `ResolveZip` reuses 
unchanged as the merged base, so its plan-id tag survives and 
`r.zip(rr).select(r.id)` resolves on Connect in both strict and lenient modes - 
as you predicted.
   - `createDataFrame` base: it analyzes to `Project [a AS a, b AS b]` over a 
`LocalRelation`. The plan-id tag sits on that `Project`, which `analyzeChain` 
dissolves into the merged chain, so `df.zip(right).select(df.a)` raises 
`CANNOT_RESOLVE_DATAFRAME_COLUMN` in both modes, just like the projected sides.
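   
   The boundary between the two base shapes can be sketched with a toy model. 
This is not Spark's code: `Node`, `merge_base`, and `resolves` are hypothetical 
names, and the merge step is simplified to the one behaviour described above, 
namely that a bare relation root is reused unchanged while a `Project` root is 
rebuilt without its tag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a logical plan node."""
    name: str
    child: Optional["Node"] = None
    plan_id: Optional[int] = None  # stand-in for the Connect plan-id tag


def merge_base(base: Node) -> Node:
    """Toy merge step (assumption): mimics only the tag behaviour above."""
    if base.child is None:
        # Bare relation root (e.g. Range): reused unchanged, tag survives.
        return base
    # Project root: dissolved into the merged chain, so the rebuilt node
    # no longer carries the original plan-id tag.
    return Node("MergedProject", child=base.child, plan_id=None)


def resolves(merged: Node, plan_id: int) -> bool:
    """Walk the merged plan looking for a node tagged with plan_id."""
    node: Optional[Node] = merged
    while node is not None:
        if node.plan_id == plan_id:
            return True
        node = node.child
    return False


# range-like base: the tag sits on the bare relation node.
range_base = Node("Range", plan_id=1)
# createDataFrame-like base: the tag sits on the Project above the relation.
cdf_base = Node("Project", child=Node("LocalRelation"), plan_id=2)

range_ok = resolves(merge_base(range_base), 1)
cdf_ok = resolves(merge_base(cdf_base), 2)
```

   In this model `range_ok` is true and `cdf_ok` is false, which mirrors the 
resolve-versus-raise split between the two base shapes.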
   
   Added both shapes as tests in ee26643956f: 
`test_resolve_after_zip_base_side` (createDataFrame; parity override asserts 
the raise) and `test_resolve_after_zip_bare_base_side` (range; resolves 
everywhere, inherited with no override). Also noted the boundary in the PR 
description.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

