Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20499#discussion_r166300631
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -1532,7 +1532,7 @@ def fillna(self, value, subset=None):
         return DataFrame(self._jdf.na().fill(value,
                                              self._jseq(subset)), self.sql_ctx)

     @since(1.4)
-    def replace(self, to_replace, value=None, subset=None):
+    def replace(self, to_replace, *args, **kwargs):
--- End diff ---
Yea, I think that summarises the issue.

> Can we use an invalid value as the default value for value? Then we can
> throw exception if the value is not set by user.
Yea, we could define a class / instance to indicate no value like NumPy
does -
https://github.com/numpy/numpy/blob/master/numpy/_globals.py#L76. I was
thinking of resembling that approach too, but it is kind of a new approach to
Spark and this is a single case so far.
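For instance, a rough sketch of such a sentinel could look like this (the
names `_NoValueType` and `_NoValue` below are made up for illustration, not
existing PySpark identifiers):

```
class _NoValueType(object):
    """Singleton meaning "the caller did not pass this argument",
    distinct from a user-supplied None."""
    _instance = None

    def __new__(cls):
        # Always hand back the same object so `is` checks are reliable.
        if cls._instance is None:
            cls._instance = object.__new__(cls)
        return cls._instance

    def __repr__(self):
        return "<no value>"


_NoValue = _NoValueType()


def replace(to_replace, value=_NoValue, subset=None):
    # None stays a legitimate replacement value; only the sentinel
    # signals that `value` was omitted.
    if value is _NoValue and not isinstance(to_replace, dict):
        raise TypeError("value is required when to_replace is not a dict")
```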
To get to the point, yea, we could maybe use an invalid value as the default
and treat it as unset when `to_replace` is a dictionary. For example, I could
assign `{}`. But then the problem is the docstring generated by pydoc and the
API documentation. It will show something like:
```
Help on method replace in module pyspark.sql.dataframe:

replace(self, to_replace, value={}, subset=None) method of pyspark.sql.dataframe.DataFrame instance
    Returns a new :class:`DataFrame` replacing a value with another value.
    ...
```
This is pretty confusing. To my knowledge, we can't really override this
signature - I tried a few times before and failed, if I remember correctly.
Maybe this is good enough, but I didn't want to start with that approach,
because strictly speaking the issue @rxin raised sounds like it comes from
`value` having a default value at all.
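As context, the diff above takes the `*args` / `**kwargs` route instead. A
rough sketch of how that route could detect a missing `value` (this parsing is
only illustrative, not the actual code in the PR; duplicate-argument checks
are omitted for brevity):

```
def replace(to_replace, *args, **kwargs):
    if len(args) > 2:
        raise TypeError("replace() takes at most 3 positional arguments")
    unknown = set(kwargs) - {"value", "subset"}
    if unknown:
        raise TypeError("unexpected keyword arguments: %s" % sorted(unknown))
    # Track whether `value` was actually passed, so None can still be a
    # legitimate user-supplied replacement value.
    value_given = len(args) >= 1 or "value" in kwargs
    value = args[0] if len(args) >= 1 else kwargs.get("value")
    subset = args[1] if len(args) >= 2 else kwargs.get("subset")
    if not value_given and not isinstance(to_replace, dict):
        raise TypeError("value argument is required when to_replace is not a dict")
    return to_replace, value, subset
```

The trade-off is that `help(df.replace)` then only shows
`replace(self, to_replace, *args, **kwargs)`, so the real parameters disappear
from pydoc as well.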
To be honest, it seems Pandas's `replace` also has `None` as the default value -
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.replace.html#pandas.DataFrame.replace.
---