On 05/20/2013 02:46 PM, Joel Nothman wrote:
I agree. My approach doesn't necessarily exclude this working if one of:
* sorting parameters in descending order is sufficient;
That is estimator dependent.
* we extend the role of _plan_refits to being one of preparation, so
the estimator may set some state like a search range (which would need
to be copied in clone());
* we extend _plan_refits to allow it to return the parameter
settings modified (though this may make implementing the Pipeline
version harder); or
* we extend _plan_refits to allow it to return some additional
information to be passed to fit/refit (this will definitely make
implementing the Pipeline version harder).
Basically your proposal addresses cases where one doesn't need to
touch parts of the pipeline at all.
It wouldn't help us get rid of any of the CV objects, though.
It would also help get rid of anything that may warm start from a
previous solution...
Is there something interesting about StandardScaler, or have you
thrown it in for fun? Or as an example where transform is more
expensive than fit?
Just for fun ;) Basically I thought that was one that you don't
really need to refit at all (for a given fold) as you usually
don't search over any parameters.
Not refitting at all is easy. Not transforming at all is left till later.
So, let's take something like your proposal, but instead of having
lists of values for each parameter (which assumes a grid), we have
lists of parameter settings. So we have a method on each estimator
such as:
    def iter_fits(self, param_iter, X, y=None):
        """Generate models for each of the given parameter settings"""
A default implementation would be an expansion of:
        param_iter, costs = self._plan_refits(param_iter)
        for params in param_iter:
            yield params, self.refit(X, y, params)
(It similarly needs a fit_transform variant.)
Note that the generator yields the parameters (or it could just be the
index into the parameters) as well as the model, so that they may be
reordered; and it would generally yield self as the second element.
By yielding from the generator, we have full access to the model and
its predicting functions.
I like the look of this better, though it means there's no option for
cleverness about multiprocessing. And the recursive execution of a
Pipeline would be somewhat neater and not require memoizing for transform.
What do you mean by "cleverness about multiprocessing"?
Somewhere a decision has to be made which computations should be
parallelized and which should be serial.
The splitting into folds should be in GridSearchCV. So I don't entirely
see how this would work.
Basically GridSearchCV would need to query the estimator to know which
parameters should be searched over serially and in which order,
so it can do the dispatching.
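One way to picture that dispatching is sketched below, under assumptions not stated in the thread: the estimator reports a key (here the made-up 'gamma') whose settings must be fit serially as one warm-start chain, and the search object parallelizes across chains. The helper name and grouping key are purely illustrative.

```python
# Sketch of the dispatch question above: group parameter settings into
# chains that must run serially (e.g. one warm-start chain per 'gamma'
# value), so a search object can dispatch each chain to a worker and
# run the fits within a chain in order. Names are illustrative only.
from itertools import groupby

def serial_chains(settings, chain_key):
    """Group parameter settings into serially-dependent chains."""
    ordered = sorted(settings, key=lambda p: p[chain_key])
    return [list(group)
            for _, group in groupby(ordered, key=lambda p: p[chain_key])]

settings = [{"gamma": 0.1, "C": 1}, {"gamma": 0.1, "C": 10},
            {"gamma": 1.0, "C": 1}, {"gamma": 1.0, "C": 10}]
chains = serial_chains(settings, "gamma")
# Each chain can go to a separate process; within a chain, later fits
# can reuse the earlier solution.
```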
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general