Re: [scikit-learn] How to not recalculate transformer in a Pipeline?

Andreas Mueller Mon, 28 Nov 2016 08:42:11 -0800

Hey Anton.
Yes, that would be great to have.

There is no solution implemented in scikit-learn right now, but there are at least two ways that I know of.

This (ancient and probably now defunct) PR:
https://github.com/scikit-learn/scikit-learn/pull/3951


And using dask:
http://matthewrocklin.com/blog/work/2016/07/12/dask-learn-part-1
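
The common idea behind both is to memoize the fitted transformer, keyed on its parameters and the training data, so that a grid search over the classifier alone hits the cache. Below is a minimal sketch of that idea in plain Python. It is not the code from the PR or from dask-learn: SlowScaler and CachedTransformer are hypothetical names, and the classes only mimic the fit/transform convention rather than using scikit-learn itself.

```python
import copy
import hashlib
import pickle


class SlowScaler:
    """Hypothetical stand-in for an expensive transformer (not a scikit-learn class)."""

    fit_calls = 0  # counts how often the expensive fit really runs

    def __init__(self, factor=2.0):
        self.factor = factor

    def get_params(self):
        return {"factor": self.factor}

    def fit(self, X, y=None):
        SlowScaler.fit_calls += 1
        self.mean_ = sum(X) / len(X)
        return self

    def transform(self, X):
        return [(x - self.mean_) * self.factor for x in X]


class CachedTransformer:
    """Hypothetical wrapper: reuses a fitted copy when params and training data repeat."""

    _cache = {}  # shared across instances, so fresh clones still hit the cache

    def __init__(self, transformer):
        self.transformer = transformer

    def _key(self, X, y):
        # Hash everything that determines the result of fit.
        params = sorted(self.transformer.get_params().items())
        payload = pickle.dumps((type(self.transformer).__name__, params, X, y))
        return hashlib.sha1(payload).hexdigest()

    def fit(self, X, y=None):
        key = self._key(X, y)
        if key not in self._cache:
            # Fit a copy so the cached object is never refit on other data.
            self._cache[key] = copy.deepcopy(self.transformer).fit(X, y)
        self.fitted_ = self._cache[key]
        return self

    def transform(self, X):
        return self.fitted_.transform(X)


# Simulated grid search: 3 classifier settings x 2 CV folds (train, test).
folds = [([1.0, 2.0, 3.0], [4.0]), ([2.0, 3.0, 4.0], [1.0])]
for C in (0.1, 1.0, 10.0):  # classifier parameter under search
    for train, test in folds:
        step = CachedTransformer(SlowScaler(factor=2.0))  # fresh wrapper per fit
        Xt_train = step.fit(train).transform(train)
        Xt_test = step.transform(test)
        # ... fit and score the classifier with parameter C here ...

print(SlowScaler.fit_calls)  # one fit per fold, not one per (C, fold) pair
```

The transformer is still fit separately on each training fold, so the scoring stays honest. Only the repeats across classifier settings are skipped. This sketch caches the fit alone and keeps everything in memory. If transform is the expensive part, its output would need the same treatment, and a real version would likely want an on-disk cache.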

Andy


On 11/28/2016 10:24 AM, Anton Suchaneck wrote:

Hello!
I use a 2-step Pipeline with an expensive transformer and a classification afterwards. On this I do GridSearchCV of the classification parameters.
Now, theoretically GridSearchCV could know that I'm not touching any parameters of the transformer and avoid re-doing work by keeping the transformed X, right?!
Currently, GridSearchCV will do a clean re-run of all Pipeline steps?
Can you recommend the easiest way for me to use GridSearchCV+Pipeline while avoiding recomputation of all transformer steps whose parameters are not in the GridSearch? I realize this may be tricky, but any pointers to realize this most conveniently and compatible with sklearn would be highly appreciated!
(The scoring has to be done on the initial data, so I cannot just manually transform beforehand.)
Regards,
Anton
PS: If that all makes sense, is that a useful feature to include in sklearn?
_______________________________________________
scikit-learn mailing list
[email protected]
https://mail.python.org/mailman/listinfo/scikit-learn
