xinrong-databricks commented on a change in pull request #33877:
URL: https://github.com/apache/spark/pull/33877#discussion_r699575464
##########
File path: python/pyspark/pandas/typedef/typehints.py
##########
@@ -313,7 +315,9 @@ def pandas_on_spark_type(tpe: Union[str, type, Dtype]) -> Tuple[Dtype, types.Dat
return dtype, spark_type
-def infer_pd_series_spark_type(pser: pd.Series, dtype: Dtype) -> types.DataType:
+def infer_pd_series_spark_type(
+ pser: pd.Series, dtype: Dtype, prefer_timestamp_ntz: bool = False
+) -> types.DataType:
"""Infer Spark DataType from pandas Series dtype.
:param pser: :class:`pandas.Series` to be inferred
Review comment:
nit: Shall we add a docstring entry for the `prefer_timestamp_ntz` parameter?
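A hedged sketch of what the requested docstring addition might look like, following the `:param:` style already used in the function. The body is elided and the exact wording is an assumption, not the PR's actual text:

```python
def infer_pd_series_spark_type(pser, dtype, prefer_timestamp_ntz=False):
    """Infer Spark DataType from pandas Series dtype.

    :param pser: :class:`pandas.Series` to be inferred
    :param dtype: the Series' Dtype
    :param prefer_timestamp_ntz: if True, infer datetime64 values as
        TimestampNTZType (timestamp without time zone) instead of
        TimestampType. Default is False.
    :return: the inferred Spark DataType
    """
    # Body elided; the real implementation lives in
    # python/pyspark/pandas/typedef/typehints.py.
    ...
```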
##########
File path: python/pyspark/pandas/groupby.py
##########
@@ -1439,8 +1439,15 @@ def _make_pandas_df_builder_func(
the same pandas DataFrame as if the pandas-on-Spark DataFrame is
collected to driver side.
The index, column labels, etc. are re-constructed within the function.
"""
+ from pyspark.pandas.utils import default_session
+
     arguments_for_restore_index = psdf._internal.arguments_for_restore_index
+ prefer_timestamp_ntz = (
Review comment:
I am wondering if we could save it as a module-level variable, so it can also be reused at
https://github.com/apache/spark/pull/33877/files#diff-fac4c35e2182657dfceedcaa20fd78963573ad34f08fd597067652e66dad53eeR1455-R1457.
The current approach looks good enough though.
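One way to share the value across call sites, as suggested above, is to compute it once behind a cached helper. This is a minimal sketch, not the PR's code: `_get_timestamp_type_conf` is a hypothetical stand-in for the real session-conf lookup (`default_session().conf.get(...)` via `pyspark.pandas.utils`):

```python
from functools import lru_cache


# Hypothetical stand-in for reading spark.sql.timestampType from the
# session conf; the real code would query the active Spark session instead.
def _get_timestamp_type_conf():
    return "TIMESTAMP_NTZ"


@lru_cache(maxsize=1)
def prefer_timestamp_ntz():
    # Computed once and reused by every later caller, instead of
    # re-reading the session conf at each call site.
    return _get_timestamp_type_conf() == "TIMESTAMP_NTZ"
```

A trade-off worth noting: a module-level cache would not pick up a later change to the session conf until the cache is cleared, which may be why the per-call lookup is acceptable as-is.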
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]