gaogaotiantian commented on code in PR #53059:
URL: https://github.com/apache/spark/pull/53059#discussion_r2733371433
##########
python/pyspark/sql/connect/dataframe.py:
##########
@@ -1736,10 +1736,20 @@ def __getattr__(self, name: str) -> "Column":
errorClass="JVM_ATTRIBUTE_NOT_SUPPORTED",
messageParameters={"attr_name": name}
)
-        if name not in self.columns:
-            raise PySparkAttributeError(
-                errorClass="ATTRIBUTE_NOT_SUPPORTED",
-                messageParameters={"attr_name": name}
-            )
+        # Only eagerly validate the column name when:
+        # 1, PYSPARK_VALIDATE_COLUMN_NAME_LEGACY is set to 1; or
+        # 2, name starts with '__', because this is likely a Python internal method and
+        #    an AttributeError might be expected to make hasattr(df, name) work.
Review Comment:
Let's be more accurate here. `hasattr` is not involved in this case. The more common pattern would be
```python
getattr(df, name, None)
# or
df.name
```
and then catching the exception. In `pickle` the code is in C, but it is more equivalent to a direct `getattr` followed by a check for the exception. To make this less confusing, let's avoid talking about `hasattr` and focus on `getattr` or `df.name`. (Below comment as well)
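To illustrate the behavior the review is describing, here is a minimal sketch (not the actual PySpark implementation; `LazyColumns` and its placeholder return value are hypothetical) of why `__getattr__` should raise `AttributeError` for dunder names: callers such as `pickle` effectively do a direct `getattr` and catch the exception, so `getattr(obj, name, default)` only falls back to the default when `AttributeError` (and not some other error) is raised.

```python
class LazyColumns:
    """Toy stand-in for a DataFrame whose column names are resolved lazily."""

    def __init__(self, columns):
        self._columns = columns

    def __getattr__(self, name):
        # Note: __getattr__ is only called when normal attribute lookup fails.
        if name.startswith("__"):
            # Likely a Python-internal probe (e.g. pickle checking for
            # '__reduce_ex__' helpers on the instance). Raising AttributeError
            # lets getattr(df, name, default) return the default and lets
            # try/except AttributeError callers proceed normally.
            raise AttributeError(name)
        # For ordinary names, defer validation and hand back a placeholder
        # (the real DataFrame would return a Column expression here).
        return f"Column<{name}>"


df = LazyColumns(["id", "value"])

# Ordinary attribute access resolves lazily, without eager validation:
print(getattr(df, "id"))                  # Column<id>

# A dunder probe falls back to the default instead of blowing up:
print(getattr(df, "__deepcopy__", None))  # None
```

If `__getattr__` instead raised a non-`AttributeError` exception for dunder names, the `getattr(..., None)` call above would propagate that exception rather than return `None`, which is exactly the confusion the comment wording should avoid attributing to `hasattr`.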
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]