You would probably be better off using joblib.Memory, which takes care of hashing the arguments and checking for a previously cached result:
https://pythonhosted.org/joblib/memory.html

You can also look at this PR:
https://github.com/scikit-learn/scikit-learn/pull/7990

Guillaume Lemaitre
INRIA Saclay Ile-de-France / Equipe PARIETAL
guillaume.lemai...@inria.fr - https://glemaitre.github.io/

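For illustration, here is a minimal sketch of how joblib.Memory could be applied to a train function like the one in the question. It assumes joblib is installed; the temporary cache directory, the placeholder function body, and the plain dict standing in for the untrained model are all assumptions made for this example, not part of the original code.

```python
import tempfile

from joblib import Memory

# Assumption: a throwaway directory; a real project would use a persistent path.
cache_dir = tempfile.mkdtemp()
memory = Memory(location=cache_dir, verbose=0)

calls = []  # records how often the function body actually runs


@memory.cache
def train(model_params, dataparams):
    # Placeholder for the expensive fitting step.
    calls.append(1)
    return {"model_params": model_params, "dataparams": dataparams}


first = train({"C": 1.0}, {"n_samples": 100})
second = train({"C": 1.0}, {"n_samples": 100})  # same arguments: read from the cache
```

With identical arguments, the second call should be served from the on-disk cache rather than running the function body again.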
I'd like to cache some functions to avoid rebuilding models like so:

    @cached
    def train(model, dataparams):
        ...

model is an (untrained) scikit-learn object and dataparams is a dict. The @cached decorator forms a SHA checksum out of the parameters of the function it decorates and returns the previously calculated result if the parameters match.

The tricky part here is reliably generating a checksum from the parameters. scikit-learn uses Python's pickle for model persistence (http://scikit-learn.org/stable/modules/model_persistence.html), but the pickle library is non-deterministic (the same inputs to pickle.dumps yield differing output! -- *I know*).

So... any suggestions on how to generate checksums from models in Python?

Thanks.
- Stuart

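One possible way to sidestep pickle for the checksum is to hash a canonical JSON rendering of the model's class name and parameters. The sketch below uses only the standard library and relies on the get_params() method that scikit-learn estimators provide. The names params_checksum and StubModel are hypothetical, the @cached decorator is an in-memory guess at the one described above, and StubModel merely stands in for an untrained estimator so the example runs without scikit-learn.

```python
import functools
import hashlib
import json


def params_checksum(model, dataparams):
    """Return a SHA-256 hex digest of the model class, its parameters, and dataparams."""
    payload = {
        "class": type(model).__module__ + "." + type(model).__qualname__,
        "params": model.get_params(),
        "data": dataparams,
    }
    # sort_keys gives a canonical key order. default=repr is a fallback for values
    # JSON cannot encode; it is only stable if their repr has no memory address.
    canonical = json.dumps(payload, sort_keys=True, default=repr)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def cached(func):
    """Hypothetical in-memory version of the @cached decorator from the question."""
    results = {}

    @functools.wraps(func)
    def wrapper(model, dataparams):
        key = params_checksum(model, dataparams)
        if key not in results:
            results[key] = func(model, dataparams)
        return results[key]

    return wrapper


class StubModel:
    """Stand-in for an untrained estimator; only get_params() is needed here."""

    def __init__(self, C=1.0, kernel="rbf"):
        self.C = C
        self.kernel = kernel

    def get_params(self, deep=True):
        return {"C": self.C, "kernel": self.kernel}


train_calls = []  # records how often the function body actually runs


@cached
def train(model, dataparams):
    train_calls.append(1)
    return ("trained", model.get_params()["C"], dataparams["n_samples"])


a = train(StubModel(C=1.0), {"n_samples": 100})
b = train(StubModel(C=1.0), {"n_samples": 100})  # same checksum: cached result
c = train(StubModel(C=2.0), {"n_samples": 100})  # different C: trains again
```

This only captures constructor parameters, which is enough for an untrained model; it would not distinguish two fitted models with the same parameters but different learned state.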
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn