GitHub user holdenk commented on a diff in the pull request:
https://github.com/apache/spark/pull/20058#discussion_r159020468
--- Diff: python/pyspark/ml/base.py ---
@@ -47,6 +86,28 @@ def _fit(self, dataset):
         """
         raise NotImplementedError()
+    @since("2.3.0")
+    def fitMultiple(self, dataset, params):
+        """
+        Fits a model to the input dataset for each param map in params.
+
+        :param dataset: input dataset, which is an instance of :py:class:`pyspark.sql.DataFrame`.
+        :param params: A Sequence of param maps.
+        :return: A thread safe iterable which contains one model for each param map. Each
+                 call to `next(modelIterator)` will return `(index, model)` where model was fit
+                 using `params[index]`. Params maps may be fit in an order different than their
+                 order in params.
+
+        .. note:: DeveloperApi
+        .. note:: Experimental
+        """
+        estimator = self.copy()
+
+        def fitSingleModel(index):
+            return estimator.fit(dataset, params[index])
+
+        return FitMultipleIterator(fitSingleModel, len(params))
--- End diff --
So what's the benefit of `FitMultipleIterator` vs. using `imap_unordered`?
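
To make the question concrete, here is a rough sketch of the kind of `imap_unordered` alternative I have in mind. It assumes a thread pool from `multiprocessing.pool.ThreadPool`; the names `fit_multiple_unordered`, `fit_single_model`, and `parallelism` are hypothetical and not part of this PR.

```python
from multiprocessing.pool import ThreadPool


def fit_multiple_unordered(fit_single_model, num_models, parallelism=2):
    """Hypothetical helper: yield (index, model) pairs as each fit completes."""

    def fit_indexed(index):
        # Pair each result with its index, since completion order is arbitrary.
        return index, fit_single_model(index)

    # Assumption: the helper owns its own pool; at least one worker is required.
    pool = ThreadPool(processes=max(1, min(parallelism, num_models)))
    try:
        for pair in pool.imap_unordered(fit_indexed, range(num_models)):
            yield pair
    finally:
        # Release the worker threads once the caller is done iterating.
        pool.close()
        pool.join()
```

One difference I can see is that this version creates and owns its pool, whereas, as far as I can tell from the docstring, the iterator in the diff leaves the threading to the caller. Is that the motivation?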