Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/19459#discussion_r149886093
--- Diff: python/pyspark/serializers.py ---
@@ -213,7 +213,15 @@ def __repr__(self):
return "ArrowSerializer"
-def _create_batch(series):
+def _create_batch(series, copy=False):
--- End diff --
@ueshin the `copy` flag ended up having no effect, so I took it out. For
timestamps, the timezone conversion makes a copy regardless. Ints are only
promoted to floats when they contain null values, in which case we need to
call `fillna(0)`, which makes a copy anyway. So it seems copies are only made
when necessary.
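
A rough pandas-only sketch of the two cases, for illustration. This is not the code in `_create_batch`; the variable names and sample values are made up, and it only assumes that pandas and numpy are installed:

```python
import numpy as np
import pandas as pd

# Case 1: an int column with a missing value is held by pandas as float64 + NaN.
ints_with_null = pd.Series([1, 2, None])

# fillna(0) returns a new Series; the original keeps its NaN.
filled = ints_with_null.fillna(0)
shares_buffer = np.shares_memory(ints_with_null.to_numpy(), filled.to_numpy())

# Case 2: a timezone operation on timestamps also returns a new Series.
naive_ts = pd.Series(pd.to_datetime(["2017-11-09 12:00:00"]))
utc_ts = naive_ts.dt.tz_localize("UTC")
```

In both cases the input Series is left untouched, so an extra explicit copy on top would be redundant.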