GitHub user avi8tr commented on the issue:

    https://github.com/apache/spark/pull/16782
  
    This patch does not solve the problem for PySpark users, because creating 
any of the ML stages in a pipeline is also not thread-safe, due to this same 
wrapper.  Note that the wrapper does two separate things: it enforces 
keyword-only arguments, and it passes the kwargs in an unsafe manner outside 
the call to the wrapped method.  We can fix this by simply omitting the 
wrapper's second (apparently unneeded) feature.  Another benefit of omitting 
it is that wrapped functions do not need to be modified to use the wrapper 
(although the ML methods that have already been modified to depend on the 
input_kwargs introduced by the defective wrapper must be switched back to 
using named arguments).  Note that this would also fix the bug in Pipeline 
where the __init__ method's modifications to stages are lost.  The following 
minimal code, similar to Pipeline, illustrates this approach to a fix:
    
    `from functools import wraps
    
    def keyword_only(func):
        """
        A decorator that forces keyword arguments in the wrapped method
        """
        @wraps(func)
        def wrapper(*args, **kwargs):
            if len(args) > 1:
                raise TypeError("Method %s forces keyword arguments." % func.__name__)
            return func(*args, **kwargs)
        return wrapper
    
    class Mytest:
    
        @keyword_only
        def __init__(self, stages=None):
            """
            __init__(self, stages=None)
            """
            self.setParams(stages=stages)
    
        @keyword_only
        def setParams(self, stages=None):
            """
            setParams(self, stages=None)
            Sets params for Pipeline.
            """
            if stages is None:
                stages = []
            return self._set(stages=stages)
    
        def _set(self,**kwargs):
            for key,value in kwargs.items():
                print ('kwargs contains ' + key + ": " + str(value))
    
    
    if __name__ == "__main__":
        print ()
        print ('zero arguments')
        baz = Mytest()
        print ()
        print ('initParams')
        foo = Mytest(stages='initParams')
        print ()
        print ('setParams')
        bar = Mytest()
        bar.setParams(stages='setParams')
        print ()
        print ('nonKeyword arguments')
        try:
            bar = Mytest('nokeywords')
        except Exception as e:
            print ('Exception: '+e.args[0])
            
        print ()
        print ('initParams with unexpected parameter')
        try:
            bat = Mytest(stages='initParams', unexpectedParameter='foo')
        except Exception as e:
            print ('Exception: '+e.args[0])
    `
    This produces the following output:
    `zero arguments
    kwargs contains stages: []
    
    initParams
    kwargs contains stages: initParams
    
    setParams
    kwargs contains stages: []
    kwargs contains stages: setParams
    
    nonKeyword arguments
    Exception: Method __init__ forces keyword arguments.
    
    initParams with unexpected parameter
    Exception: __init__() got an unexpected keyword argument 'unexpectedParameter'
    `
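
For contrast, here is a minimal sketch of why passing the kwargs outside the call is unsafe. It assumes the defective wrapper stores the kwargs as an attribute on the wrapper function itself (called `_input_kwargs` here) and that the wrapped method reads them back from there. The names `unsafe_keyword_only`, `Stage` and `hook` are hypothetical and exist only for this illustration. The `hook` callback stands in for a second thread that happens to run between the wrapper and the method body, so the interleaving is deterministic:

```python
from functools import wraps


def unsafe_keyword_only(func):
    """Hypothetical, simplified stand-in for the wrapper under discussion."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        if len(args) > 1:
            raise TypeError("Method %s forces keyword arguments." % func.__name__)
        # Assumption: kwargs are stashed on the function object, which is
        # shared by every instance and every thread that calls the method.
        wrapper._input_kwargs = kwargs
        return func(*args, **kwargs)
    return wrapper


class Stage:
    @unsafe_keyword_only
    def setParams(self, name=None, hook=None):
        if hook is not None:
            # Simulates another caller running between wrapper and body.
            hook()
        # Reads the shared attribute instead of its own named arguments.
        self.params = dict(self.setParams._input_kwargs)
        return self


first, second = Stage(), Stage()
# While "first" is mid-call, "second" overwrites the shared kwargs.
first.setParams(name="first", hook=lambda: second.setParams(name="second"))
print(first.params)  # {'name': 'second'}: the arguments of "first" were lost
```

With the wrapper from the comment above, which only checks the positional arguments, there is no shared state to overwrite and each call sees its own named arguments.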

