[scikit-learn] How to not recalculate transformer in a Pipeline?

Anton Suchaneck Mon, 28 Nov 2016 07:27:00 -0800

Hello!

I use a 2-step Pipeline with an expensive transformer followed by a
classifier. On this I run GridSearchCV over the classification parameters.


Now, in theory GridSearchCV could know that I'm not touching any
parameters of the transformer and avoid redoing work by keeping the
transformed X, right?
Or does GridSearchCV currently do a clean re-run of all Pipeline steps?

Can you recommend the easiest way for me to use GridSearchCV+Pipeline while
avoiding recomputation of all transformer steps whose parameters are not in
the GridSearch? I realize this may be tricky, but any pointers on how to do
this conveniently and in a way that is compatible with sklearn would be
highly appreciated!

(The scoring has to be done on the initial data, so I cannot just manually
transform beforehand.)
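
For illustration, here is a minimal sketch of one possible way to get this
behaviour. It assumes a newer scikit-learn release in which Pipeline accepts
a `memory` argument for caching fitted transformers (that argument did not
exist when this message was written). The dataset, the PCA step standing in
for the "expensive transformer", the step names and the grid values are all
hypothetical placeholders, not part of the original setup.

```python
import os
import tempfile

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical toy data; stands in for the real X, y.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Directory where fitted transformers are cached on disk.
cache_dir = tempfile.mkdtemp()

# Assumption: PCA plays the role of the expensive transformer.
# `memory=cache_dir` asks the Pipeline to cache fitted transformer steps,
# so a step is refitted only when its parameters or input data change.
pipe = Pipeline(
    steps=[
        ("reduce", PCA(n_components=5)),
        ("clf", LogisticRegression(max_iter=1000)),
    ],
    memory=cache_dir,
)

# Only classifier parameters are searched; the transformer is untouched.
param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)

# Collect the files written to the cache directory during the search.
cached_files = [name for _, _, files in os.walk(cache_dir) for name in files]
```

With this setup the transformer should still be fitted once per training
fold, since each fold sees different data, but not once per candidate value
of `clf__C`. Scoring is still done on the untransformed data, because the
whole Pipeline is the estimator. The cache directory can be deleted
afterwards, for example with `shutil.rmtree(cache_dir)`.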

Regards,
Anton

PS: If all of that makes sense, would this be a useful feature to include
in sklearn?

_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
