Re: [PR] [SPARK-54497][PYTHON] Apply `functools.lru_cache` in converter caching [spark]

via GitHub Tue, 25 Nov 2025 00:07:04 -0800


zhengruifeng commented on code in PR #53205:
URL: https://github.com/apache/spark/pull/53205#discussion_r2558914163



##########
python/pyspark/sql/pandas/types.py:
##########
@@ -855,6 +856,7 @@ def _to_corrected_pandas_type(dt: DataType) -> Optional[Any]:
         return None
 
 
+@functools.lru_cache(maxsize=64)

Review Comment:
   The default size of `@lru_cache` is [128](https://docs.python.org/3/library/functools.html#functools.lru_cache).
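For reference, the effective bound of each decorator variant can be checked with `cache_parameters()` (available since Python 3.9). The function names below are placeholders for illustration only:

```python
import functools


@functools.lru_cache  # bare decorator: uses the default maxsize
def default_sized(x):
    return x


@functools.cache  # unbounded; equivalent to lru_cache(maxsize=None)
def unbounded(x):
    return x


@functools.lru_cache(maxsize=64)  # the bound proposed in this PR
def bounded(x):
    return x


print(default_sized.cache_parameters())  # {'maxsize': 128, 'typed': False}
print(unbounded.cache_parameters())      # {'maxsize': None, 'typed': False}
print(bounded.cache_parameters())        # {'maxsize': 64, 'typed': False}
```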
   
   `@cache` also seems reasonable and acceptable, but I don't think `lru_cache(maxsize=64)` will cause a regression, since:
   1. we don't have many distinct data types;
   2. the number of input columns in a UDF is likely fewer than 64.
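A minimal sketch of why a bound of 64 should rarely be reached, assuming the converter factory is keyed on a hashable data type (`lru_cache` requires hashable arguments). `FakeType` and `make_converter` are hypothetical stand-ins, not PySpark's actual `DataType` classes or converter code:

```python
import functools
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class FakeType:
    """Hypothetical stand-in for a hashable Spark DataType."""
    name: str


@functools.lru_cache(maxsize=64)
def make_converter(dt: FakeType) -> Callable[[Any], Any]:
    """Hypothetical converter factory; a real one would be more involved."""
    if dt.name == "string":
        return str
    if dt.name == "int":
        return int
    return lambda v: v  # identity for every other type


# A UDF with 10 input columns drawn from only 3 distinct types.
columns = [FakeType("int"), FakeType("string"), FakeType("double")] * 3 + [FakeType("int")]
converters = [make_converter(dt) for dt in columns]
first = make_converter.cache_info()
print(first.hits, first.misses, first.currsize)  # 7 3 3

# Eviction only starts once more than 64 distinct keys have been seen.
make_converter.cache_clear()  # also resets the hit/miss counters
for i in range(70):
    make_converter(FakeType(f"t{i}"))
print(make_converter.cache_info().currsize)  # 64

# t0 was the least recently used entry, so it was evicted and misses again.
make_converter(FakeType("t0"))
print(make_converter.cache_info().misses)  # 71
```

With few distinct types, every repeated column is a cache hit; only workloads with more than 64 distinct keys pay for eviction, and even then only the least recently used entries are dropped.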



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

