gaogaotiantian commented on code in PR #54017:
URL: https://github.com/apache/spark/pull/54017#discussion_r2734977778
##########
python/pyspark/pandas/data_type_ops/datetime_ops.py:
##########
@@ -128,6 +129,13 @@ def prepare(self, col: pd.Series) -> pd.Series:
"""Prepare column when from_pandas."""
return col
+ def restore(self, col: pd.Series) -> pd.Series:
Review Comment:
So `restore` is a method that gets called by
`InternalFrame.restore_index`, which is the function that converts a
`pyspark.DataFrame` back to a `pandas.DataFrame`. `InternalFrame` keeps a
record of what the pandas dtype of each column should be and tries to restore
those types during the conversion. However, I assume that because we always
use `datetime64[ns]` for `TimestampType`, we never wrote a `restore` function
for `TimestampType`, so those columns always come back as `datetime64[ns]`.
With this newly added method, the column can be restored to the dtype it had
in the original pandas object.
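A rough sketch of the idea, assuming the new method follows the usual
`DataTypeOps` pattern of casting back to the dtype recorded on the ops object
(this is not the exact code in the diff, which is truncated above;
`DatetimeOpsSketch` is a made-up stand-in, and non-nanosecond dtypes need
pandas 2.x):

```python
import pandas as pd

class DatetimeOpsSketch:
    """Hypothetical stand-in for the DataTypeOps subclass touched here."""

    def __init__(self, dtype):
        # InternalFrame records the original pandas dtype at from_pandas time.
        self.dtype = dtype

    def restore(self, col: pd.Series) -> pd.Series:
        """Restore column when to_pandas: cast the datetime64[ns] data
        coming out of Spark back to the recorded dtype."""
        return col.astype(self.dtype)

# The datetime64[ns] column produced by the conversion...
ns_col = pd.Series(pd.to_datetime(["2024-01-01", "2024-01-02"]))
# ...gets cast back to the dtype the original pandas object had.
ops = DatetimeOpsSketch(pd.api.types.pandas_dtype("datetime64[us]"))
assert ops.restore(ns_col).dtype == "datetime64[us]"
```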
In practice, it means something like
`ps.from_pandas(pd.some_df_or_series_or_index())` ends up with the same column
dtype as the original pandas DataFrame/Series/Index.
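A hypothetical round trip showing the effect (assuming pandas 2.x
non-nanosecond resolution and the behavior this PR adds):

```python
import pandas as pd
import pyspark.pandas as ps

pser = pd.Series(pd.to_datetime(["2024-01-01", "2024-01-02"])).astype("datetime64[us]")
psser = ps.from_pandas(pser)

# Previously to_pandas() would always come back as datetime64[ns];
# with restore in place the original resolution is kept.
assert psser.to_pandas().dtype == pser.dtype  # datetime64[us]
```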