You would probably be better off using joblib.Memory, which takes care of the checking for you.

https://pythonhosted.org/joblib/memory.html
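
A minimal sketch of what that could look like, assuming a recent joblib where Memory takes a `location` argument. The temporary cache directory, the `calls` list, and the plain dict standing in for the untrained model are illustrative assumptions, not part of the original question. joblib hashes the function arguments itself, so a picklable estimator could in principle be passed in the same way.

```python
import tempfile

from joblib import Memory

# Assumption: any writable directory works as the cache location.
memory = Memory(location=tempfile.mkdtemp(), verbose=0)

calls = []  # records each time the function body actually runs


@memory.cache
def train(model_params, dataparams):
    # Stand-in for an expensive fit; a real version would build and fit a model.
    calls.append(1)
    return {"model": model_params, "n_rows": dataparams["n_rows"]}


first = train({"C": 1.0}, {"n_rows": 100})   # computed and stored on disk
second = train({"C": 1.0}, {"n_rows": 100})  # same arguments: loaded from cache
third = train({"C": 2.0}, {"n_rows": 100})   # different arguments: recomputed
```

Here the function body runs only for the first and third calls; the second call returns the stored result.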

You can also check this PR:

https://github.com/scikit-learn/scikit-learn/pull/7990

Guillaume Lemaitre 
INRIA Saclay Ile-de-France / Equipe PARIETAL
guillaume.lemai...@inria.fr - https://glemaitre.github.io/
From: Stuart Reynolds
Sent: Tuesday, 13 December 2016 19:29
To: scikit-learn@python.org
Reply To: Scikit-learn user and developer mailing list
Subject: [scikit-learn] Model checksums

I'd like to cache some functions to avoid rebuilding models like so:

    @cached
    def train(model, dataparams): ...


model is an (untrained) scikit-learn object and dataparams is a dict.
The @cached decorator forms a SHA checksum out of the parameters of the function it decorates and returns the previously calculated result if the parameters match.

The tricky part here is reliably generating a checksum from the parameters. Scikit-learn uses Python's pickle (http://scikit-learn.org/stable/modules/model_persistence.html) but the pickle library is non-deterministic (the same inputs to pickle.dumps yield differing output! -- *I know*).
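
For illustration only, here is one way such a decorator could avoid pickle: hash a sorted JSON dump of the model's class name, its get_params() output, and dataparams. This is a sketch under assumptions, not the actual @cached implementation. `ToyModel`, `checksum`, and the in-memory `store` are hypothetical, and `default=repr` is only stable for values whose repr does not embed a memory address.

```python
import functools
import hashlib
import json


def checksum(model, dataparams):
    """SHA-256 over the model's class name, its parameters, and dataparams."""
    payload = {
        "class": type(model).__qualname__,
        "params": model.get_params(),
        "data": dataparams,
    }
    # sort_keys fixes the key order; default=repr handles non-JSON values.
    blob = json.dumps(payload, sort_keys=True, default=repr)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def cached(func):
    """Return the stored result when the checksum of the arguments matches."""
    store = {}  # assumption: in-memory only; a real cache would persist to disk

    @functools.wraps(func)
    def wrapper(model, dataparams):
        key = checksum(model, dataparams)
        if key not in store:
            store[key] = func(model, dataparams)
        return store[key]

    return wrapper


class ToyModel:
    """Hypothetical stand-in for an untrained estimator exposing get_params()."""

    def __init__(self, C=1.0):
        self.C = C

    def get_params(self):
        return {"C": self.C}


runs = []  # records each real execution of train


@cached
def train(model, dataparams):
    runs.append(1)
    return (model.C, dataparams["n_rows"])


r1 = train(ToyModel(C=1.0), {"n_rows": 100})  # computed
r2 = train(ToyModel(C=1.0), {"n_rows": 100})  # same checksum: reused
r3 = train(ToyModel(C=2.0), {"n_rows": 100})  # different parameter: recomputed
```

Two distinct ToyModel instances with equal parameters produce the same checksum, so the second call never reaches the function body.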

So... any suggestions on how to generate checksums from models in python?

Thanks.
- Stuart



_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
