gaogaotiantian commented on code in PR #53059:
URL: https://github.com/apache/spark/pull/53059#discussion_r2733371433
##########
python/pyspark/sql/connect/dataframe.py:
##########
@@ -1736,10 +1736,20 @@ def __getattr__(self, name: str) -> "Column":
errorClass="JVM_ATTRIBUTE_NOT_SUPPORTED",
messageParameters={"attr_name": name}
)
-        if name not in self.columns:
-            raise PySparkAttributeError(
-                errorClass="ATTRIBUTE_NOT_SUPPORTED",
-                messageParameters={"attr_name": name}
-            )
+        # Only eagerly validate the column name when:
+        # 1, PYSPARK_VALIDATE_COLUMN_NAME_LEGACY is set to 1; or
+        # 2, name starts with '__', because this is likely a Python internal method and
+        #    an AttributeError might be expected to make hasattr(df, name) work.
Review Comment:
Let's be more accurate here. `hasattr` is not involved in this case. The more common pattern would be
```python
getattr(df, name, None)
# or
df.name
```
and then catching the exception. In `pickle` the code is in C, but it is more equivalent to a direct `getattr` followed by a check for the exception. To make this less confusing, let's avoid talking about `hasattr` and focus on `getattr` or `df.name`. (Below comment as well)
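To illustrate the behavior the review is describing, here is a minimal sketch (not the actual PySpark implementation; `LazyColumns` and its placeholder return value are hypothetical) of why `__getattr__` should raise `AttributeError` for dunder names: callers such as `pickle` effectively do a direct `getattr` and catch the exception, so `getattr(obj, name, default)` only falls back to the default when `AttributeError` (and not some other error) is raised.

```python
class LazyColumns:
    """Toy stand-in for a DataFrame whose column names are resolved lazily."""

    def __init__(self, columns):
        self._columns = columns

    def __getattr__(self, name):
        # Note: __getattr__ is only called when normal attribute lookup fails.
        if name.startswith("__"):
            # Likely a Python-internal probe (e.g. pickle checking for
            # '__reduce_ex__' helpers on the instance). Raising AttributeError
            # lets getattr(df, name, default) return the default and lets
            # try/except AttributeError callers proceed normally.
            raise AttributeError(name)
        # For ordinary names, defer validation and hand back a placeholder
        # (the real DataFrame would return a Column expression here).
        return f"Column<{name}>"


df = LazyColumns(["id", "value"])

# Ordinary attribute access resolves lazily, without eager validation:
print(getattr(df, "id"))                  # Column<id>

# A dunder probe falls back to the default instead of blowing up:
print(getattr(df, "__deepcopy__", None))  # None
```

If `__getattr__` instead raised a non-`AttributeError` exception for dunder names, the `getattr(..., None)` call above would propagate that exception rather than return `None`, which is exactly the confusion the comment wording should avoid attributing to `hasattr`.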
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]