grundprinzip commented on code in PR #39469:
URL: https://github.com/apache/spark/pull/39469#discussion_r1066570799


##########
python/pyspark/sql/connect/session.py:
##########
@@ -215,7 +215,38 @@ def createDataFrame(
         _inferred_schema: Optional[StructType] = None
 
         if isinstance(data, pd.DataFrame):
-            _table = pa.Table.from_pandas(data)
+            from pandas.api.types import (  # type: ignore[attr-defined]
+                is_datetime64_dtype,
+                is_datetime64tz_dtype,
+            )
+            from pyspark.sql.pandas.types import (
+                _check_series_convert_timestamps_internal,
+                _get_local_timezone,
+            )
+
+            # Copying the frame to avoid modifying it.
+            data_copy = data.copy()

Review Comment:
   I can change this to copy only when necessary. The main issue is that I 
need to copy the frame so that I can modify the pandas dtype before converting 
it to an Arrow table. There is likely a better way to do this that does not 
require touching the pandas DataFrame at all. 
   
   The general question is: is it OK to modify the input DataFrame, or should 
it be treated as immutable? 
   
   Note, however, that even in PySpark we modify the input DataFrame when we 
localize the timezones. 
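
   To illustrate the "copy only when necessary" option, here is a minimal 
sketch. It is not the code in this PR: `normalize_naive_timestamps` and its 
`local_tz` parameter are hypothetical names, and the sketch assumes unique 
column names and handles only timezone-naive `datetime64` columns. 

```python
import pandas as pd
from pandas.api.types import is_datetime64_dtype


def normalize_naive_timestamps(data: pd.DataFrame, local_tz: str = "UTC") -> pd.DataFrame:
    """Hypothetical helper: convert naive timestamps from local_tz to UTC.

    The input frame is never modified. It is copied only if at least one
    timezone-naive datetime64 column needs to be rewritten.
    """
    # is_datetime64_dtype is True for naive datetime64 columns only;
    # timezone-aware columns use DatetimeTZDtype and are skipped here.
    naive_cols = [c for c in data.columns if is_datetime64_dtype(data[c])]
    if not naive_cols:
        # Nothing to rewrite, so no copy is needed.
        return data

    # Deep copy so the column assignments below cannot touch the caller's frame.
    result = data.copy()
    for c in naive_cols:
        result[c] = (
            result[c]
            .dt.tz_localize(local_tz)  # interpret naive values as local time
            .dt.tz_convert("UTC")      # shift to UTC
            .dt.tz_localize(None)      # drop the timezone again
        )
    return result
```

   With this shape the caller's frame stays untouched in both branches, and 
the cost of `copy()` is paid only for frames that actually contain timestamp 
columns. 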



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

