Joseph K. Bradley created SPARK-5844:
----------------------------------------

             Summary: Optimize Pipeline.fit for ParamGrid
                 Key: SPARK-5844
                 URL: https://issues.apache.org/jira/browse/SPARK-5844
             Project: Spark
          Issue Type: Improvement
          Components: ML
    Affects Versions: 1.3.0
            Reporter: Joseph K. Bradley


This issue was brought up by [~prudenko] in [this JIRA|https://issues.apache.org/jira/browse/SPARK-4766].

**Proposal**:
When Pipeline.fit is given an array of ParamMaps, it should operate 
incrementally:
* For each set of parameters applicable to the first PipelineStage,
** Fit/transform that stage using that set of parameters.
** For each set of parameters applicable to the second PipelineStage,
*** and so on for each subsequent PipelineStage.

This is essentially a depth-first search on the parameters, where each 
node/level in the search tree is a PipelineStage and each node's child nodes 
correspond to the set of ParamMaps for that PipelineStage.

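This will avoid recomputing intermediate RDDs during model search: each stage's 
output is computed once per parameter setting and reused by every parameter 
combination of the downstream stages.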
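
The depth-first search described above can be sketched in plain Python. This is only a toy illustration of the traversal order, not Spark code: `fit_grid`, the `(stage_fn, param_sets)` pairs, and the `scale`/`shift` stages are all hypothetical names, and each "stage" is assumed to be a simple function from (data, params) to transformed data.

```python
from typing import Any, Callable, Dict, Iterator, Sequence, Tuple

Params = Dict[str, Any]
# Hypothetical stage: takes (data, params) and returns the transformed data.
StageFn = Callable[[Any, Params], Any]


def fit_grid(
    stages: Sequence[Tuple[StageFn, Sequence[Params]]],
    data: Any,
    chosen: Tuple[Params, ...] = (),
) -> Iterator[Tuple[Tuple[Params, ...], Any]]:
    """Depth-first search over per-stage parameter sets.

    Yields (params chosen for each stage, output of the last stage).
    Each stage runs once per distinct prefix of parameter choices.
    """
    if not stages:
        yield chosen, data
        return
    (stage_fn, param_sets), rest = stages[0], stages[1:]
    for params in param_sets:
        # Fit/transform this stage once, then reuse the result for
        # every parameter combination of the downstream stages.
        transformed = stage_fn(data, params)
        yield from fit_grid(rest, transformed, chosen + (params,))


# Count how often each toy stage runs.
calls = {"scale": 0, "shift": 0}


def scale(data, params):
    calls["scale"] += 1
    return [x * params["factor"] for x in data]


def shift(data, params):
    calls["shift"] += 1
    return [x + params["offset"] for x in data]


stages = [
    (scale, [{"factor": 1}, {"factor": 2}]),
    (shift, [{"offset": 0}, {"offset": 10}, {"offset": 100}]),
]
results = list(fit_grid(stages, [1, 2, 3]))
print(len(results), calls)  # 6 {'scale': 2, 'shift': 6}
```

With 2 settings for the first stage and 3 for the second, the first stage runs 2 times here, whereas fitting the whole pipeline separately for each of the 6 combinations would run it 6 times.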



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
