Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/19459#discussion_r149760058
--- Diff: python/pyspark/serializers.py ---
@@ -213,7 +213,15 @@ def __repr__(self):
         return "ArrowSerializer"
-def _create_batch(series):
+def _create_batch(series, copy=False):
--- End diff --
Right, I forgot that `fillna` returns a copy. Do you think it would be
worth first checking for any nulls and only calling `fillna` if needed? The
mask of nulls is already created, so we would just need to add a function like
this in `_create_batch`:
```
def fill_series_nulls(s, mask):
    # fillna returns a copy, so only call it when the null mask has any True values
    return s.fillna(0) if mask.any() else s
```
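A quick check of the behavior this suggestion relies on. This is a minimal sketch, assuming pandas and NumPy are available and that `mask` is the boolean null mask from `s.isnull()`. `fill_series_nulls` is the helper proposed in the comment, not an existing PySpark function. When there are no nulls the original Series object is passed through untouched; when there is a null, `fillna` produces a new Series and leaves the input unchanged.

```python
import numpy as np
import pandas as pd


def fill_series_nulls(s, mask):
    """Return ``s`` with nulls replaced by 0, copying only when needed.

    ``mask`` is assumed to be the boolean null mask already computed for ``s``
    (e.g. ``s.isnull()``). ``Series.fillna`` without ``inplace=True`` returns a
    new Series, so skipping it when there are no nulls avoids that copy.
    """
    return s.fillna(0) if mask.any() else s


# A Series with no nulls: the same object comes back, so no copy is made.
no_nulls = pd.Series([1.0, 2.0, 3.0])
out = fill_series_nulls(no_nulls, no_nulls.isnull())
print(out is no_nulls)  # True

# A Series with a null: fillna builds a new, filled Series.
with_nulls = pd.Series([1.0, np.nan, 3.0])
filled = fill_series_nulls(with_nulls, with_nulls.isnull())
print(filled is with_nulls)       # False
print(filled.tolist())            # [1.0, 0.0, 3.0]
print(with_nulls.isnull().sum())  # 1, the input still has its null
```

The trade-off is one extra `mask.any()` scan over a mask that already exists, in exchange for skipping a full copy of the column in the common no-null case.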
What do you think, @ueshin?