Re: [PR] [SPARK-55179][PYTHON][CONNECT] Skip eager column name validation in `df.col_name` [spark]

via GitHub Mon, 26 Jan 2026 01:18:47 -0800


zhengruifeng commented on code in PR #53059:
URL: https://github.com/apache/spark/pull/53059#discussion_r2726808232



##########
python/pyspark/sql/connect/dataframe.py:
##########
@@ -169,6 +169,12 @@ def __reduce__(self) -> Tuple:
             },
         )
 
+    def __getstate__(self) -> Any:
+        return self.__dict__.copy()
+
+    def __setstate__(self, state: Any) -> None:

Review Comment:
   this change is needed, otherwise the SERDES of connect dataframe will fail 
after the change in `__getattr__`.
   
   We already have `__reduce__` in this class, I am not very sure how it works 
with `__getstate__`  and `__setstate__`
   
   The official doc said
   
   > Although powerful, implementing 
[__reduce__()](https://docs.python.org/3/library/pickle.html#object.__reduce__) 
directly in your classes is error prone. For this reason, class designers 
should use the high-level interface (i.e., 
[__getnewargs_ex__()](https://docs.python.org/3/library/pickle.html#object.__getnewargs_ex__),
 
[__getstate__()](https://docs.python.org/3/library/pickle.html#object.__getstate__)
 and 
[__setstate__()](https://docs.python.org/3/library/pickle.html#object.__setstate__))
 whenever possible.
   
   I did have a try to remove the `__reduce__`, but cannot make it works.
   
   cc @gaogaotiantian could you please help review this PR? thanks
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [SPARK-55179][PYTHON][CONNECT] Skip eager column name validation in `df.col_name` [spark]

Reply via email to