DinhLiu opened a new pull request, #56066: URL: https://github.com/apache/spark/pull/56066
### What changes were proposed in this pull request?

This PR implements support for the pandas extension properties (`_constructor`, `_constructor_sliced`, and `_constructor_expanddim`) in the Pandas API on Spark for both `DataFrame` and `Series`. Specifically, it replaces the hardcoded `DataFrame(...)` and `Series(...)` instantiations inside standard operations (such as `head()`, `_apply_series_op`, `to_frame`, etc.) with `self._constructor(...)` or its dimensionality-aware counterparts.

### Why are the changes needed?

pandas supports extension properties such as `_constructor`, which let downstream libraries that inherit from pandas classes override the type that operations return by default. Prior to this PR, subclassing a PySpark Pandas `DataFrame` or `Series` would break the inheritance chain during standard operations, because the methods returned the base PySpark classes instead of the subclasses. This change improves parity with pandas and allows developers to safely extend PySpark Pandas objects.

### Does this PR introduce _any_ user-facing change?

No for standard end users. Yes for developers extending the API: they can now subclass `pyspark.pandas.DataFrame` and `pyspark.pandas.Series` and retain their custom types after applying transformations.

### How was this patch tested?

Added a new unit test, `test_extension_properties`, in `python/pyspark/pandas/tests/test_extension.py` to verify that operations on subclassed `DataFrame` and `Series` objects return instances of the subclasses.

Tested locally via: `python/run-tests --testnames 'pyspark.pandas.tests.test_extension'`

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Google Gemini 3.1 Pro
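For readers unfamiliar with the `_constructor` pattern, the following is a minimal pure-Python sketch of the idea described above. It is not the PySpark implementation: `Frame`, `AuditedFrame`, and `head_hardcoded` are hypothetical names used only for illustration, and the real classes carry far more state than a list of rows.

```python
class Frame:
    """Toy stand-in for a DataFrame-like base class (hypothetical, not pyspark code)."""

    def __init__(self, rows):
        self.rows = list(rows)

    @property
    def _constructor(self):
        # Subclasses override this to control the type that operations return.
        return Frame

    def head_hardcoded(self, n=5):
        # Old pattern: the base class is hardcoded, so the subclass type is lost.
        return Frame(self.rows[:n])

    def head(self, n=5):
        # New pattern: construction is delegated to the overridable property.
        return self._constructor(self.rows[:n])


class AuditedFrame(Frame):
    """Hypothetical downstream subclass that wants to survive transformations."""

    @property
    def _constructor(self):
        return AuditedFrame


af = AuditedFrame([1, 2, 3])
hardcoded = af.head_hardcoded(2)
preserved = af.head(2)

print(type(hardcoded).__name__)  # Frame
print(type(preserved).__name__)  # AuditedFrame
```

The design choice is that the base class never names itself when building results; it asks the instance which type to build, so a subclass only has to override one property rather than every method.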
