GitHub user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/20213#discussion_r160588603
--- Diff: python/pyspark/sql/session.py ---
@@ -459,21 +459,23 @@ def _convert_from_pandas(self, pdf, schema, timezone):
     # TODO: handle nested timestamps, such as ArrayType(TimestampType())?
     if isinstance(field.dataType, TimestampType):
         s = _check_series_convert_timestamps_tz_local(pdf[field.name], timezone)
-        if not copied and s is not pdf[field.name]:
-            # Copy once if the series is modified to prevent the original Pandas
-            # DataFrame from being updated
-            pdf = pdf.copy()
-            copied = True
-        pdf[field.name] = s
+        if s is not pdf[field.name]:
+            if not copied:
--- End diff --
It looks like the condition was split so that `pdf[field.name] = s` is
assigned only if `s is not pdf[field.name]`.
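
To make the point concrete, here is a minimal sketch of the copy-once pattern being discussed. It is not the code from the pull request: the quoted diff is cut off after `if not copied:`, so the body below is an assumption that mirrors the removed lines. A plain `dict` of lists stands in for the Pandas DataFrame so the sketch runs without pandas, and `convert_columns` and `to_upper_if_str` are hypothetical names used only for illustration.

```python
def convert_columns(table, convert):
    """Apply `convert` to every column, copying `table` at most once.

    `table` is a dict of column name -> list, standing in for a DataFrame.
    `convert` must return the same object when it changes nothing.
    """
    copied = False
    for name in list(table):
        s = convert(table[name])
        # Outer check: only touch the table if the column really changed.
        if s is not table[name]:
            # Inner check: copy once so the caller's table is not updated.
            # (Assumed body; the quoted diff ends at `if not copied:`.)
            if not copied:
                table = dict(table)  # shallow copy, like pdf.copy() in spirit
                copied = True
            table[name] = s
    return table


def to_upper_if_str(column):
    """Hypothetical converter: return a new list only for string columns."""
    if all(isinstance(v, str) for v in column):
        return [v.upper() for v in column]
    return column  # unchanged: hand back the very same object


original = {"id": [1, 2], "name": ["a", "b"]}
result = convert_columns(original, to_upper_if_str)
```

With the nested form, a table whose columns all come back unchanged is returned as the same object, with no copy and no assignment at all. In the single-condition form the assignment ran on every iteration, whether or not the column had changed.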