I have also had some issues with pickle in the past, and I have to admit that 
I don't really trust pickle files ;). Maybe I am too paranoid, but I am always 
afraid of corrupting or losing the data.
It is probably not the most elegant solution, but I typically store estimator 
settings and model parameters as JSON files. With "reproducible research" in 
mind, they have the advantage of staying human-readable in the worst-case 
scenario ;).


For example:


# Model fitting and saving params to JSON

from sklearn.linear_model import LinearRegression 
from sklearn.datasets import load_diabetes

diabetes = load_diabetes()
X, y = diabetes.data, diabetes.target
regr = LinearRegression()
regr.fit(X, y)

import json

with open('./params.json', 'w', encoding='utf-8') as outfile:
    json.dump(regr.get_params(), outfile)
    
with open('./weights.json', 'w', encoding='utf-8') as outfile:
    json.dump(regr.coef_.tolist(), outfile, separators=(',', ':'),
              sort_keys=True, indent=4)
    
with open('./intercept.json', 'w', encoding='utf-8') as outfile:    
    json.dump(regr.intercept_, outfile)  


# In a new session: load the params from the JSON files


import json
from sklearn.linear_model import LinearRegression
from sklearn.datasets import load_diabetes
import numpy as np

diabetes = load_diabetes()
X, y = diabetes.data, diabetes.target

# Read each file inside a context manager so that it is closed afterwards
with open('./params.json', 'r', encoding='utf-8') as infile:
    params = json.load(infile)

with open('./weights.json', 'r', encoding='utf-8') as infile:
    weights = json.load(infile)

with open('./intercept.json', 'r', encoding='utf-8') as infile:
    intercept = json.load(infile)

# Rebuild the estimator: restore the settings first, then the fitted attributes
regr = LinearRegression()
regr.set_params(**params)
regr.intercept_, regr.coef_ = intercept, np.array(weights)

regr.predict(X[:10])

array([ 206.11706979,   68.07234761,  176.88406035,  166.91796559,
        128.45984241,  106.34908972,   73.89417947,  118.85378669,
        158.81033076,  213.58408893])
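
The three files above could also be folded into a single JSON file. The 
following is only a rough sketch of that idea, not anything scikit-learn 
provides: the helper names (estimator_to_dict, save_estimator, load_estimator), 
the fitted_attrs and to_array arguments, and the "params"/"fitted" layout are 
all made up for illustration. It assumes the estimator has get_params() and 
set_params(), and that the listed fitted attributes are either plain Python 
values or NumPy objects with a tolist() method.

```python
import json


def estimator_to_dict(estimator, fitted_attrs=("coef_", "intercept_")):
    """Collect settings and fitted attributes as JSON-friendly values.

    Hypothetical helper: `fitted_attrs` must name attributes that exist
    on the fitted estimator.
    """
    state = {"params": estimator.get_params(), "fitted": {}}
    for name in fitted_attrs:
        value = getattr(estimator, name)
        # NumPy arrays and scalars offer tolist(); other values pass through.
        state["fitted"][name] = value.tolist() if hasattr(value, "tolist") else value
    return state


def save_estimator(estimator, path, fitted_attrs=("coef_", "intercept_")):
    """Write the collected state to one human-readable JSON file."""
    with open(path, "w", encoding="utf-8") as outfile:
        json.dump(estimator_to_dict(estimator, fitted_attrs), outfile, indent=4)


def load_estimator(estimator, path, to_array=None):
    """Restore the saved state onto a fresh estimator of the same class.

    `to_array` is an optional callable (for example np.array) that is
    applied to every list read back from the file.
    """
    with open(path, "r", encoding="utf-8") as infile:
        state = json.load(infile)
    estimator.set_params(**state["params"])
    for name, value in state["fitted"].items():
        if to_array is not None and isinstance(value, list):
            value = to_array(value)
        setattr(estimator, name, value)
    return estimator
```

With the fitted regr from the example, this would be used as 
save_estimator(regr, './model.json') in the first session and 
load_estimator(LinearRegression(), './model.json', to_array=np.array) in the 
new one. Like the three-file version, it only covers estimators whose fitted 
state is a handful of numbers and arrays.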


In any case, I know that this isn't pretty, and I would also be looking forward 
to a better solution!

Best,
Sebastian Raschka


> On Mar 23, 2016, at 12:47 PM, Keith Lehman <kleh...@intercapenergy.com> wrote:
> 
> Hi:
>  
> I’m fairly new to scikit-learn, python, and machine learning. This community 
> has built a great set of libraries though, and is actually a large part of 
> the reason why my company has selected python to experiment with ML.
>  
> As we are developing our product, however, we keep running into trouble 
> saving various objects. When possible, we use pickle to save the objects, but 
> this can cause problems in development – objects saved during a debug session 
> cannot be loaded outside of the debugger. The reason appears to be that 
> even when pickling a “pickleable” object (such as a trained 
> LinearRegression), pickle finds and saves more primitive objects that have 
> been instantiated within the debug environment. Dill and cPickle have the 
> same issue. My question is, does the scikit-learn community plan to add 
> standard load/save or dump/dumps and load/loads methods that would not create 
> these dependencies?
>  
> If there is a better forum for posting questions like these, please let me 
> know and I’ll be happy to post there instead.
>  
> Thanks! 
>  
> Keith Lehman
> Cell: 617-834-2863
> Skype: k.lehman
> e-mail: kleh...@intercapenergy.com
>  
> ------------------------------------------------------------------------------
> Transform Data into Opportunity.
> Accelerate data analysis in your applications with
> Intel Data Analytics Acceleration Library.
> Click to learn more.
> http://pubads.g.doubleclick.net/gampad/clk?id=278785351&iu=/4140
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

