Re: [scikit-learn] How to not recalculate transformer in a Pipeline?

Andreas Mueller Mon, 28 Nov 2016 10:48:06 -0800


On 11/28/2016 12:15 PM, Gael Varoquaux wrote:

Or would you cache the return of "fit" as well as "transform"?

Caching fit rather than transform. Fit is usually the costly step.

Caching "fit" with joblib seems non-trivial.

Why? Caching a function that takes the estimator, X and y should do
it. The transformer would clone the estimator on fit, to avoid
side effects that would trigger recomputes.
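A minimal sketch of the cached function described above, assuming joblib's `Memory` and scikit-learn's `clone`. The names `_fit_estimator`, `cached_fit` and `n_fits` and the temporary cache directory are illustrative, not part of any library API. The estimator is cloned before each call, so joblib's cache key reflects only its parameters and the data, never fitted state left over from an earlier run.

```python
import tempfile

import numpy as np
from joblib import Memory
from sklearn.base import clone
from sklearn.decomposition import PCA

# Hypothetical on-disk cache; a fixed path would let results persist across sessions.
memory = Memory(location=tempfile.mkdtemp(), verbose=0)

n_fits = 0  # counts real (non-cached) fits, for illustration only


def _fit_estimator(estimator, X, y=None):
    """Fit and return the estimator; joblib hashes (estimator, X, y) as the key."""
    global n_fits
    n_fits += 1
    return estimator.fit(X, y)


cached_fit = memory.cache(_fit_estimator)

rng = np.random.RandomState(0)
X = rng.rand(100, 10)
pca = PCA(n_components=3, random_state=0)

# Clone first so the argument carries only parameters, never fitted attributes.
fitted = cached_fit(clone(pca), X)
fitted_again = cached_fit(clone(pca), X)  # same key: loaded from disk, no refit
```

The second call hashes to the same key and loads the fitted PCA from disk instead of refitting, and `pca` itself is never modified.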

I guess so. You'd handle parameters using an estimator_params dict in init
and pass that to the caching function?
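One way the estimator_params idea could look as a transformer, as a sketch only: `CachedFitTransformer` and `_fit_estimator` are hypothetical names, and passing a path string or `None` as `memory` is an assumption of this sketch. The cached function stays at module level so joblib can identify it, and the parameters are applied to a fresh clone so the cache key never sees fitted state. (Later scikit-learn releases, from 0.19 on, added a `memory` argument to `Pipeline` that caches fitted transformers along similar lines.)

```python
import tempfile

import numpy as np
from joblib import Memory
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


def _fit_estimator(estimator, X, y=None):
    # Module-level so joblib can key it; the fitted estimator is pickled to disk.
    return estimator.fit(X, y)


class CachedFitTransformer(TransformerMixin, BaseEstimator):
    """Hypothetical wrapper that caches the wrapped transformer's fit."""

    def __init__(self, estimator, estimator_params=None, memory=None):
        self.estimator = estimator
        self.estimator_params = estimator_params
        self.memory = memory

    def fit(self, X, y=None):
        memory = self.memory
        if not isinstance(memory, Memory):
            # None disables caching; a directory path enables an on-disk cache.
            memory = Memory(location=memory, verbose=0)
        # Apply the parameters to a fresh clone: no fitted state enters the key.
        est = clone(self.estimator).set_params(**(self.estimator_params or {}))
        self.estimator_ = memory.cache(_fit_estimator)(est, X, y)
        return self

    def transform(self, X):
        return self.estimator_.transform(X)


rng = np.random.RandomState(0)
X = rng.rand(200, 20)
y = (X[:, 0] > 0.5).astype(int)
cache_dir = tempfile.mkdtemp()

pipe = Pipeline([
    ("reduce", CachedFitTransformer(PCA(random_state=0),
                                    estimator_params={"n_components": 5},
                                    memory=cache_dir)),
    ("clf", LogisticRegression()),
])
for C in [0.1, 1.0, 10.0]:
    # Only the classifier changes, so PCA is fitted once and then reloaded.
    pipe.set_params(clf__C=C).fit(X, y)
```

Keeping `estimator_params` separate from `estimator` means a grid search can vary them through `reduce__estimator_params` while the wrapped estimator object stays untouched.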


It's a pattern that I use often; I've just never coded a good transformer
for it.

In my use cases, it works very well, provided that everything is nicely
seeded. Also, the persistence across sessions is a real time saver.
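Why seeding matters can be checked with `joblib.hash`, assuming (as joblib's documentation describes) that `Memory` keys each call on a hash of its arguments. This sketch compares an integer seed with a shared `numpy.random.RandomState` instance; the variable names are illustrative.

```python
import numpy as np
from joblib import hash as joblib_hash
from sklearn.base import clone
from sklearn.decomposition import PCA

# Integer seed: every clone hashes identically, so a cached fit is reused.
seeded = PCA(n_components=3, random_state=0)
same_key = joblib_hash(clone(seeded)) == joblib_hash(clone(seeded))

# RandomState instance: its internal state is pickled into the hash, so
# drawing any random numbers between calls changes the key and forces a refit.
rng = np.random.RandomState(0)
shared = PCA(n_components=3, random_state=rng)
key_before = joblib_hash(shared)
rng.rand()  # e.g. another step drew from the shared generator
key_after = joblib_hash(shared)
```

With `random_state=None`, by contrast, the key never changes, so a cached result silently freezes whatever random draw happened on the first fit.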

Yeah for sure :)
_______________________________________________
scikit-learn mailing list
[email protected]
https://mail.python.org/mailman/listinfo/scikit-learn

