xinrong-databricks commented on a change in pull request #33877:
URL: https://github.com/apache/spark/pull/33877#discussion_r699575464
##########
File path: python/pyspark/pandas/typedef/typehints.py
##########
@@ -313,7 +315,9 @@ def pandas_on_spark_type(tpe: Union[str, type, Dtype]) -> Tuple[Dtype, types.Dat
return dtype, spark_type
-def infer_pd_series_spark_type(pser: pd.Series, dtype: Dtype) -> types.DataType:
+def infer_pd_series_spark_type(
+ pser: pd.Series, dtype: Dtype, prefer_timestamp_ntz: bool = False
+) -> types.DataType:
"""Infer Spark DataType from pandas Series dtype.
:param pser: :class:`pandas.Series` to be inferred
Review comment:
nit: Shall we add a docstring entry for the `prefer_timestamp_ntz` parameter?
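A hedged sketch of what the requested docstring addition might look like, following the `:param:` style already used in the function. The body is elided and the exact wording is an assumption, not the PR's actual text:

```python
def infer_pd_series_spark_type(pser, dtype, prefer_timestamp_ntz=False):
    """Infer Spark DataType from pandas Series dtype.

    :param pser: :class:`pandas.Series` to be inferred
    :param dtype: the Series' Dtype
    :param prefer_timestamp_ntz: if True, infer datetime64 values as
        TimestampNTZType (timestamp without time zone) instead of
        TimestampType. Default is False.
    :return: the inferred Spark DataType
    """
    # Body elided; the real implementation lives in
    # python/pyspark/pandas/typedef/typehints.py.
    ...
```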
##########
File path: python/pyspark/pandas/groupby.py
##########
@@ -1439,8 +1439,15 @@ def _make_pandas_df_builder_func(
the same pandas DataFrame as if the pandas-on-Spark DataFrame is
collected to driver side.
The index, column labels, etc. are re-constructed within the function.
"""
+ from pyspark.pandas.utils import default_session
+
     arguments_for_restore_index = psdf._internal.arguments_for_restore_index
+ prefer_timestamp_ntz = (
Review comment:
I am wondering if we could save it as a module-level variable, so it can also be reused at
https://github.com/apache/spark/pull/33877/files#diff-fac4c35e2182657dfceedcaa20fd78963573ad34f08fd597067652e66dad53eeR1455-R1457.
The current approach looks good enough though.
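One way to share the value across call sites, as suggested above, is to compute it once behind a cached helper. This is a minimal sketch, not the PR's code: `_get_timestamp_type_conf` is a hypothetical stand-in for the real session-conf lookup (`default_session().conf.get(...)` via `pyspark.pandas.utils`):

```python
from functools import lru_cache


# Hypothetical stand-in for reading spark.sql.timestampType from the
# session conf; the real code would query the active Spark session instead.
def _get_timestamp_type_conf():
    return "TIMESTAMP_NTZ"


@lru_cache(maxsize=1)
def prefer_timestamp_ntz():
    # Computed once and reused by every later caller, instead of
    # re-reading the session conf at each call site.
    return _get_timestamp_type_conf() == "TIMESTAMP_NTZ"
```

A trade-off worth noting: a module-level cache would not pick up a later change to the session conf until the cache is cleared, which may be why the per-call lookup is acceptable as-is.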
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]