GitHub user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19459#discussion_r149760058
  
    --- Diff: python/pyspark/serializers.py ---
    @@ -213,7 +213,15 @@ def __repr__(self):
             return "ArrowSerializer"
     
     
    -def _create_batch(series):
    +def _create_batch(series, copy=False):
    --- End diff --
    
    Right, I forgot that `fillna` returns a copy.  Do you think it would be 
worth it to first check for any nulls and only call `fillna` if needed?  The 
mask of nulls is already created, so we would just need to add a function like 
this in `_create_batch`:
    
    ```
    def fill_series_nulls(s, mask):
        return s.fillna(0) if mask.any() else s
    ```
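
    As a rough, standalone sketch of the idea (the sample series `no_nulls` and 
`with_nulls` are made up for illustration and are not part of the patch), the 
helper hands back the original object untouched when the mask is empty and only 
pays for the `fillna` copy when a null is actually present:

```python
import pandas as pd


def fill_series_nulls(s, mask):
    # Only call fillna (which returns a new Series) when a null is present;
    # otherwise return the input Series itself, with no copy made.
    return s.fillna(0) if mask.any() else s


# Hypothetical sample data, for illustration only.
no_nulls = pd.Series([1.0, 2.0, 3.0])
with_nulls = pd.Series([1.0, None, 3.0])

# No nulls: the very same object comes back.
same = fill_series_nulls(no_nulls, no_nulls.isnull())

# One null: a new, filled Series comes back and the input is left unchanged.
filled = fill_series_nulls(with_nulls, with_nulls.isnull())
```

    The trade-off is one extra `mask.any()` pass over a mask that already 
exists, in exchange for skipping the copy in the case where there are no nulls.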
    What do you think, @ueshin?

