Hey Anton.
Yes, that would be great to have.
There is no solution implemented in scikit-learn right now, but there
are at least two ways that I know of.
This (ancient and probably now defunct) PR:
https://github.com/scikit-learn/scikit-learn/pull/3951
And using dask:
http://matthewrocklin.com/blog/work/2016/07/12/dask-learn-part-1
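From scikit-learn 0.19 onwards, Pipeline also accepts a `memory` argument that caches fitted transformers with joblib. That covers this use case directly. Here is a minimal sketch assuming a current scikit-learn. PCA stands in for the expensive transformer and LogisticRegression for the classifier. Both are chosen only for illustration, and so is the `clf__C` grid.

```python
from shutil import rmtree
from tempfile import mkdtemp

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy data; illustrative only.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Directory where joblib stores the fitted transformers.
cache_dir = mkdtemp()

pipe = Pipeline(
    [
        ("expensive", PCA(n_components=5)),  # stand-in for the costly transformer
        ("clf", LogisticRegression()),       # classifier whose params are searched
    ],
    memory=cache_dir,  # cache transformer fits keyed on (params, X, y)
)

# Only classifier parameters vary, so within each CV fold the transformer
# is fitted once and then reloaded from the cache for every value of C.
param_grid = {"clf__C": [0.1, 1.0, 10.0]}
search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)

# Clean up the cache once it is no longer needed.
rmtree(cache_dir)
```

The whole pipeline is still what gets cross-validated, so scoring happens on the original, untransformed data. That matches the constraint in the question. Caching only pays off when the transformer fit is genuinely expensive, because joblib has to hash the inputs on every call.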
Andy
On 11/28/2016 10:24 AM, Anton Suchaneck wrote:
Hello!
I use a 2-step Pipeline with an expensive transformer and a
classification afterwards. On this I do GridSearchCV of the
classification parameters.
Now, theoretically GridSearchCV could know that I'm not touching any
parameters of the transformer and avoid re-doing work by keeping the
transformed X, right?!
Am I right that GridSearchCV currently re-runs all Pipeline steps from scratch?
Can you recommend the easiest way for me to use GridSearchCV+Pipeline
while avoiding recomputation of all transformer steps whose parameters
are not in the GridSearch? I realize this may be tricky, but any
pointers on how to do this conveniently, in a way that stays compatible
with sklearn, would be highly appreciated!
(The scoring has to be done on the initial data, so I cannot just
manually transform beforehand.)
Regards,
Anton
PS: If that all makes sense, is that a useful feature to include in
sklearn?
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn