[GitHub] spark pull request #20058: [SPARK-22126][ML][PySpark] Pyspark portion of the...

holdenk Thu, 28 Dec 2017 18:50:00 -0800

Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20058#discussion_r159020468
  
    --- Diff: python/pyspark/ml/base.py ---
    @@ -47,6 +86,28 @@ def _fit(self, dataset):
             """
             raise NotImplementedError()
     
    +    @since("2.3.0")
    +    def fitMultiple(self, dataset, params):
    +        """
    +        Fits a model to the input dataset for each param map in params.
    +
    +        :param dataset: input dataset, which is an instance of :py:class:`pyspark.sql.DataFrame`.
    +        :param params: A Sequence of param maps.
    +        :return: A thread safe iterable which contains one model for each param map. Each
    +                 call to `next(modelIterator)` will return `(index, model)` where model was fit
    +                 using `params[index]`. Params maps may be fit in an order different than their
    +                 order in params.
    +
    +        .. note:: DeveloperApi
    +        .. note:: Experimental
    +        """
    +        estimator = self.copy()
    +
    +        def fitSingleModel(index):
    +            return estimator.fit(dataset, params[index])
    +
    +        return FitMultipleIterator(fitSingleModel, len(params))
    --- End diff --
    
    So what's the benefit of `FitMultipleIterator` vs. using `imap_unordered`?
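The `FitMultipleIterator` class is not part of this hunk, so the following is only a minimal sketch under stated assumptions. It assumes the iterator claims the next index under a `threading.Lock` and runs the fit outside the lock. `fit_single_model`, `fit_with_iterator` and `fit_with_imap_unordered` are hypothetical names, and `fit_single_model` stands in for `estimator.fit(dataset, params[index])`. The sketch compares a thread pool that pulls from a shared iterator with a plain `imap_unordered` over param-map indices.

```python
import threading
from multiprocessing.pool import ThreadPool


class FitMultipleIterator(object):
    """Hypothetical sketch of a thread-safe iterator yielding (index, model) pairs.

    The lock only guards claiming the next index. The (slow) fit runs outside
    the lock, so several threads can fit different param maps concurrently.
    """

    def __init__(self, fitSingleModel, numModels):
        self.fitSingleModel = fitSingleModel
        self.numModels = numModels
        self.counter = 0
        self.lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        with self.lock:
            index = self.counter
            if index >= self.numModels:
                raise StopIteration("No models remaining.")
            self.counter += 1
        return index, self.fitSingleModel(index)


def fit_single_model(index):
    # Hypothetical stand-in for estimator.fit(dataset, params[index]).
    return "model-%d" % index


def fit_with_iterator(num_models, parallelism):
    # The estimator builds the iterator. Each pool task just asks it for the
    # next (index, model) pair, so the estimator decides what each call fits.
    model_iter = FitMultipleIterator(fit_single_model, num_models)
    pool = ThreadPool(processes=parallelism)
    try:
        return list(pool.imap_unordered(lambda _: next(model_iter), range(num_models)))
    finally:
        pool.close()
        pool.join()


def fit_with_imap_unordered(num_models, parallelism):
    # The caller maps over indices directly. Each task fits exactly one param map.
    pool = ThreadPool(processes=parallelism)
    try:
        return list(pool.imap_unordered(lambda i: (i, fit_single_model(i)), range(num_models)))
    finally:
        pool.close()
        pool.join()


print(sorted(fit_with_iterator(4, 2)))
# [(0, 'model-0'), (1, 'model-1'), (2, 'model-2'), (3, 'model-3')]
```

Both functions return the same set of `(index, model)` pairs, possibly in different orders. The difference is who decides what each task fits. With a plain `imap_unordered` over indices, the caller fixes one independent fit per param map. Pulling from an iterator returned by `fitMultiple` leaves that decision to the estimator. One plausible motivation, not stated in this comment, is that an estimator could override `fitMultiple` to share work across param maps.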


