Re: [Scikit-learn-general] Persisting models

2015-09-01 Thread Gael Varoquaux
On Mon, Aug 24, 2015 at 06:02:19PM -0400, Andreas Mueller wrote:
> I think the real solution is to provide backward-compatible ``__getattr__``
> and ``__setattr__``.

It's a lot of work to support this and do QA. I am not sure we want to
add this to our plate.

I would personally rather support PMML I/O, as it has greater value and
is probably of the same order of complexity.

Anyhow, all this is for after 1.0.

G

> Theano seems able to do that (at least that is what I was told).
> It is unclear whether we want to do this. If we want to do this, we probably
> only want it post 1.0.

> On 08/19/2015 02:35 AM, Joel Nothman wrote:

> Frequently the suggestion of supporting PMML or similar is raised, but it's
> not clear whether such models would be importable into scikit-learn, or
> how to translate scikit-learn transformation pipelines into its notation
> without going mad, etc. Still, even a library of exporters for individual
> components would be welcome, IMO, if someone wanted to construct it.

> On 19 August 2015 at 15:08, Sebastian Raschka  
> wrote:

> Oh wow, thanks for the link. I just skimmed over the code, but this is
> an interesting idea and looks like the sort of thing that would make my
> life easier in the future. I will dig into that! That’s great, thanks!





-- 
Gael Varoquaux
Researcher, INRIA Parietal
NeuroSpin/CEA Saclay, Bat 145, 91191 Gif-sur-Yvette France
Phone:  ++ 33-1-69-08-79-68
http://gael-varoquaux.info
http://twitter.com/GaelVaroquaux

--
___
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general


Re: [Scikit-learn-general] Persisting models

2015-08-30 Thread Anders Aagaard
If you want an absolutely bulletproof way of doing it: build and serialize
your model during the Docker build stage. It limits your hosting alternatives,
but it is guaranteed to work.

On Tue, Aug 25, 2015, 00:19 Stefan van der Walt stef...@berkeley.edu
wrote:


 On 2015-08-24 15:08:57, Andreas Mueller t3k...@gmail.com wrote:
   Agreed—this is exactly the type of use case I want to support.
   Pickling won't work here, but using HDF5 like MNE does would
   probably be close to ideal (thanks to Chris Holdgraf for the
   heads-up):
 
  I'm not sure how this solves the issue, can you elaborate?  You
  still need to map the old data structure to the new code, right?

 Yes, I would need to map the old data structure to the new code.
 This may be trivial for some versions (code didn't change),
 slightly harder for other versions (added keywords, etc.) or
 impossible (new API / implementation).  But, given that the team
 works hard to keep the API stable, maintaining this matrix of
 conversions shouldn't be too hard.

 Stéfan




Re: [Scikit-learn-general] Persisting models

2015-08-24 Thread Sebastian Raschka
That’s true. Often, I create a separate venv for each project plus a manifest. I 
also push everything to a private git repo (next to a couple of “regular” backup 
solutions) — I am really paranoid when it comes to backups and version 
control :P.

 But if you didn't snapshot all libraries you are using, the code might 
 not run any more, or give different results ;)

I think this makefile approach is more useful in the context of re-running your 
pipeline after software gets updated, e.g., if there was a bug in a 
certain package you were using, and now you want to reproduce the results with 
the new version to see if your previous results were affected by this bug. Or 
just if a colleague/reviewer wants to reproduce your stuff ;)

 On Aug 24, 2015, at 5:59 PM, Andreas Mueller t3k...@gmail.com wrote:
 
 
 
 On 08/19/2015 12:37 AM, Sebastian Raschka wrote:
  if the unpickling failed,
 what would you do?
 One lesson “scientific research” taught me is to store the code and dataset 
 along with a “make” file under version control (git):). I would just run my 
 make file to re-construct the model and pickle the objects.
 
 But if you didn't snapshot all libraries you are using, the code might 
 not run any more, or give different results ;)
 


Re: [Scikit-learn-general] Persisting models

2015-08-24 Thread Andreas Mueller
Agreed—this is exactly the type of use case I want to support.
Pickling won't work here, but using HDF5 like MNE does would
probably be close to ideal (thanks to Chris Holdgraf for the
heads-up):


I'm not sure how this solves the issue, can you elaborate?
You still need to map the old data structure to the new code, right?



Re: [Scikit-learn-general] Persisting models

2015-08-24 Thread Andreas Mueller


On 08/19/2015 12:37 AM, Sebastian Raschka wrote:
   if the unpickling failed,
 what would you do?
 One lesson “scientific research” taught me is to store the code and dataset 
 along with a “make” file under version control (git):). I would just run my 
 make file to re-construct the model and pickle the objects.

But if you didn't snapshot all libraries you are using, the code might 
not run any more, or give different results ;)



Re: [Scikit-learn-general] Persisting models

2015-08-24 Thread Andreas Mueller
I think the real solution is to provide backward-compatible 
``__getattr__`` and ``__setattr__``.

Theano seems able to do that (at least that is what I was told).
It is unclear whether we want to do this. If we want to do this, we 
probably only want it post 1.0.
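[Editorial note: a minimal sketch of what such a backward-compatible shim could look like. The class and the renamed attribute are hypothetical, not scikit-learn API.]

```python
# Hypothetical sketch: an estimator that transparently maps an old
# attribute name (say ``coef_init`` was renamed to ``coef_``) so that
# objects unpickled from an older release keep working.

class CompatEstimator:
    # old name -> new name; purely illustrative
    _RENAMED = {"coef_init": "coef_"}

    def __init__(self, coef=None):
        self.coef_ = coef

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for legacy names.
        new = CompatEstimator._RENAMED.get(name)
        if new is not None:
            return getattr(self, new)
        raise AttributeError(name)

    def __setattr__(self, name, value):
        # Redirect writes to legacy names (as happens when pickle
        # restores an old object's __dict__) onto the new attribute.
        name = CompatEstimator._RENAMED.get(name, name)
        object.__setattr__(self, name, value)


est = CompatEstimator()
est.coef_init = [1.0, 2.0]   # legacy write lands on coef_
print(est.coef_)             # [1.0, 2.0]
print(est.coef_init)         # legacy read is redirected too
```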


On 08/19/2015 02:35 AM, Joel Nothman wrote:
Frequently the suggestion of supporting PMML or similar is raised, but 
it's not clear whether such models would be importable into 
scikit-learn, or how to translate scikit-learn transformation 
pipelines into its notation without going mad, etc. Still, even a 
library of exporters for individual components would be welcome, IMO, 
if someone wanted to construct it.


On 19 August 2015 at 15:08, Sebastian Raschka se.rasc...@gmail.com wrote:


Oh wow, thanks for the link. I just skimmed over the code, but
this is an interesting idea and looks like the sort of thing that
would make my life easier in the future. I will dig into that! That’s
great, thanks!







Re: [Scikit-learn-general] Persisting models

2015-08-24 Thread Stefan van der Walt

On 2015-08-24 15:08:57, Andreas Mueller t3k...@gmail.com wrote:
  Agreed—this is exactly the type of use case I want to support. 
  Pickling won't work here, but using HDF5 like MNE does would 
  probably be close to ideal (thanks to Chris Holdgraf for the 
  heads-up):

 I'm not sure how this solves the issue, can you elaborate?  You 
 still need to map the old data structure to the new code, right?

Yes, I would need to map the old data structure to the new code. 
This may be trivial for some versions (code didn't change), 
slightly harder for other versions (added keywords, etc.) or 
impossible (new API / implementation).  But, given that the team 
works hard to keep the API stable, maintaining this matrix of 
conversions shouldn't be too hard.
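[Editorial note: the "matrix of conversions" above could be sketched as a chain of small per-version converters. The version numbers and keyword changes below are made up for illustration.]

```python
# Hypothetical sketch: each converter upgrades a dict of stored
# parameters one version step, so loading an old model is a chain of
# small, individually testable conversions.

def _0_15_to_0_16(params):
    # e.g. a keyword was renamed in this (made-up) release
    params = dict(params)
    params["C"] = params.pop("penalty_weight", 1.0)
    return params

def _0_16_to_0_17(params):
    # e.g. a new keyword gained a default
    params = dict(params)
    params.setdefault("fit_intercept", True)
    return params

CONVERTERS = [
    ("0.15", "0.16", _0_15_to_0_16),
    ("0.16", "0.17", _0_16_to_0_17),
]

def upgrade(params, from_version, to_version):
    """Apply every converter on the path from_version -> to_version."""
    current = from_version
    for src, dst, convert in CONVERTERS:
        if src == current:
            params = convert(params)
            current = dst
        if current == to_version:
            break
    if current != to_version:
        raise ValueError(f"no conversion path from {from_version} to {to_version}")
    return params

old = {"penalty_weight": 0.5}
print(upgrade(old, "0.15", "0.17"))
# {'C': 0.5, 'fit_intercept': True}
```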

Stéfan



Re: [Scikit-learn-general] Persisting models

2015-08-20 Thread Joel Nothman
I suspect supporting PMML import is a separate and low-priority project.
Higher priority is support for transformers (in pipelines / feature
unions), other predictors, and tests that verify the model against an
existing PMML predictor.

On 21 August 2015 at 01:37, Dale Smith dsm...@nexidia.com wrote:

 Package sklearn_pmml appeared on github:

 https://github.com/alex-pirozhenko/sklearn-pmml

 It's still in the early stages. I have yet to experiment with it, and I
 don't think it supports PMML import.

 Dale Smith, Ph.D.
 Data Scientist


 d. 404.495.7220 x 4008   f. 404.795.7221
 Nexidia Corporate | 3565 Piedmont Road, Building Two, Suite 400 | Atlanta,
 GA 30305








Re: [Scikit-learn-general] Persisting models

2015-08-20 Thread Alexandre Gramfort
hi,

 Agreed—this is exactly the type of use case I want to support.
 Pickling won't work here, but using HDF5 like MNE does would
 probably be close to ideal (thanks to Chris Holdgraf for the
 heads-up):

 https://github.com/mne-tools/mne-python/blob/master/mne/_hdf5.py

For your info Eric Larson has put the file in a separate project to make it
easier to improve and reuse.

https://github.com/h5io/h5io
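[Editorial note: a hedged sketch of the approach, using h5py directly rather than h5io; the helper names and fields are hypothetical, not part of either library. The point is to persist only plain arrays plus metadata, not a pickled object.]

```python
# Store a fitted linear model as raw arrays + metadata in HDF5, so the
# file stays readable even if the original library version is long gone.
import h5py
import numpy as np

def save_linear_model(path, coef, intercept, library_version):
    with h5py.File(path, "w") as f:
        f.create_dataset("coef", data=np.asarray(coef))
        f.create_dataset("intercept", data=np.asarray(intercept))
        f.attrs["library_version"] = library_version

def load_linear_model(path):
    with h5py.File(path, "r") as f:
        return {
            "coef": f["coef"][...],
            "intercept": f["intercept"][...],
            "library_version": f.attrs["library_version"],
        }

save_linear_model("model.h5", [[0.5, -1.2]], [0.1], "0.16.1")
print(load_linear_model("model.h5")["coef"])
```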

Alex



Re: [Scikit-learn-general] Persisting models

2015-08-19 Thread Joel Nothman
Frequently the suggestion of supporting PMML or similar is raised, but it's
not clear whether such models would be importable into scikit-learn, or
how to translate scikit-learn transformation pipelines into its notation
without going mad, etc. Still, even a library of exporters for individual
components would be welcome, IMO, if someone wanted to construct it.

On 19 August 2015 at 15:08, Sebastian Raschka se.rasc...@gmail.com wrote:

 Oh wow, thanks for the link. I just skimmed over the code, but this is an
 interesting idea and looks like the sort of thing that would make my life
 easier in the future. I will dig into that! That’s great, thanks!


 
 


Re: [Scikit-learn-general] Persisting models

2015-08-19 Thread Joel Nothman
See https://github.com/scikit-learn/scikit-learn/issues/1596






Re: [Scikit-learn-general] Persisting models

2015-08-18 Thread Stefan van der Walt
On 2015-08-18 20:26:15, JAGANADH GOPINADHAN jagana...@gmail.com 
wrote:
 Use joblib or pickle to achieve this

Any method that relies on pickle breaks when you upgrade 
scikit-learn or move to a different system.  I would like to 
persist models for as long as possible; having to deal with 
pickling errors, which don't even let you extract the 
coefficients when the model cannot be loaded, is not 
an option.

Stéfan



Re: [Scikit-learn-general] Persisting models

2015-08-18 Thread Stefan van der Walt
On 2015-08-18 21:37:41, Sebastian Raschka se.rasc...@gmail.com 
wrote:
 I think for “simple” linear models, it would be not a bad idea 
 to save the weight coefficients in a log file or so. Here, I 
 think that your model is really not that dependent on the 
 changes in the scikit-learn code base (for example, imagine that 
 you trained a model 10 years ago and published the results in a 
 research paper, and today, someone asked you about this 
 model). I mean, you know all about how a logistic regression, 
 SVM etc. works, in the worst case you just use those weights to 
 make the prediction on new data — I think in a typical “model 
 persistence” case you don’t “update” your model anyways so 
 “efficiency” would not be that big of a deal in a typical “worst 
 case use case”.

Agreed—this is exactly the type of use case I want to support. 
Pickling won't work here, but using HDF5 like MNE does would 
probably be close to ideal (thanks to Chris Holdgraf for the 
heads-up):

https://github.com/mne-tools/mne-python/blob/master/mne/_hdf5.py

Stéfan



Re: [Scikit-learn-general] Persisting models

2015-08-18 Thread Sebastian Raschka
Stefan, I have no experience with this problem in particular since I am not 
pickling objects that often. However, I deployed a webapp some time ago on 
Pythonanywhere (http://raschkas.pythonanywhere.com) and meanwhile they upgraded 
their scikit-learn module; I was curious and just checked it out: it seems that 
it still works.
Which pickle protocol are you using?

 On Aug 18, 2015, at 11:38 PM, Stefan van der Walt stef...@berkeley.edu 
 wrote:
 
 On 2015-08-18 20:26:15, JAGANADH GOPINADHAN jagana...@gmail.com 
 wrote:
 Use joblib or pickle to achieve this
 
 Any method that relies on pickle is broken when you upgrade 
 scikit-learn or move to a different system.  I would like to 
 persist models for as long as possible; thus having to deal with 
 pickling errors, which don't allow you to easily extract 
 coefficients in the case when the model cannot be loaded, is not 
 an option.
 
 Stéfan
 


Re: [Scikit-learn-general] Persisting models

2015-08-18 Thread Stefan van der Walt
Hi Sebastian

On 2015-08-18 20:47:12, Sebastian Raschka se.rasc...@gmail.com 
wrote:
 Stefan, I have no experience with this problem in particular 
 since I am not pickling objects that often. However, I deployed 
 a webapp some time ago on Pythonanywhere 
 (http://raschkas.pythonanywhere.com 
 http://raschkas.pythonanywhere.com/) and meanwhile they 
 upgraded their scikit-learn module; I was curious and just 
 checked it out: it seems that it still works.

It would depend a lot on whether and how the underlying class code 
got modified.  One question to ask is: if the unpickling failed, 
what would you do?

I imagine that the ideal way of storing each model (coefficients, 
parameters, etc.) would differ from model to model.  But having 
that knowledge stored in the scikit-learn code base would, from a 
user standpoint at least, be better than trying to maintain it 
outside.

It's a non-trivial problem, since you'll have to track any changes 
in API carefully and somehow determine which versions are 
compatible with which (or can be made compatible with a few basic 
assumptions w.r.t. parameters etc.).

Stéfan



Re: [Scikit-learn-general] Persisting models

2015-08-18 Thread Sebastian Raschka
  if the unpickling failed, 
 what would you do?

One lesson “scientific research” taught me is to store the code and dataset 
along with a “make” file under version control (git) :). I would just run my 
make file to re-construct the model and pickle the objects.

 I imagine that the ideal way of storing each model (coefficients, 
 parameters, etc.) would differ from model to model.  But having 
 that knowledge stored in the scikit-learn code base would, from a 
 user standpoint at least, be better than trying to maintain it 
 outside.

I think for “simple” linear models, it would be not a bad idea to save the 
weight coefficients in a log file or so. Here, I think that your model is 
really not that dependent on the changes in the scikit-learn code base (for 
example, imagine that you trained a model 10 years ago and published the 
results in a research paper, and today, someone asked you about this model). I 
mean, you know all about how a logistic regression, SVM etc. works, in the 
worst case you just use those weights to make the prediction on new data — I 
think in a typical “model persistence” case you don’t “update” your model 
anyways so “efficiency” would not be that big of a deal in a typical “worst 
case use case”.
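[Editorial note: the weight-logging idea above, as a stdlib-only sketch. The weights, file name, and two-feature logistic model are made up for illustration.]

```python
# Dump a linear model's weights to a plain JSON "log" and use them
# directly for prediction later, with no dependency on the original
# library version.
import json
import math

weights = {"coef": [1.5, -2.0], "intercept": 0.25}
with open("logreg_weights.json", "w") as f:
    json.dump(weights, f)

# Ten years later: reload and predict by hand.
with open("logreg_weights.json") as f:
    w = json.load(f)

def predict_proba(x):
    z = sum(wi * xi for wi, xi in zip(w["coef"], x)) + w["intercept"]
    return 1.0 / (1.0 + math.exp(-z))   # logistic function

print(predict_proba([1.0, 0.5]))  # probability of the positive class
```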


 On Aug 19, 2015, at 12:16 AM, Stefan van der Walt stef...@berkeley.edu 
 wrote:
 
 Hi Sebastian
 
 On 2015-08-18 20:47:12, Sebastian Raschka se.rasc...@gmail.com 
 wrote:
 Stefan, I have no experience with this problem in particular 
 since I am not pickling objects that often. However, I deployed 
 a webapp some time ago on Pythonanywhere 
 (http://raschkas.pythonanywhere.com 
 http://raschkas.pythonanywhere.com/) and meanwhile they 
 upgraded their scikit-learn module; I was curious and just 
 checked it out: it seems that it still works.
 
 It would depend a lot on whether and how the underlying class code 
 got modified.  One question to ask is: if the unpickling failed, 
 what would you do?
 
 I imagine that the ideal way of storing each model (coefficients, 
 parameters, etc.) would differ from model to model.  But having 
 that knowledge stored in the scikit-learn code base would, from a 
 user standpoint at least, be better than trying to maintain it 
 outside.
 
 It's a non-trivial problem, since you'll have to track any changes 
 in API carefully and somehow determine which versions are 
 compatible with which (or can be made compatible with a few basic 
 assumptions w.r.t. parameters etc.).
 
 Stéfan
 


[Scikit-learn-general] Persisting models

2015-08-18 Thread Stefan van der Walt
Hi all,

What, currently, is the recommended way for storing trained models 
to disk for later use?

Regards
Stéfan



Re: [Scikit-learn-general] Persisting models

2015-08-18 Thread Sebastian Raschka
I would still say the best way to go is joblib (because of the NumPy arrays). 
But I would also be interested in better alternatives (if there are any?).

PS: One caveat: joblib didn’t work well for me in the past when I was deploying 
it on Apache servers for webapps on “cheap” hosted server racks such as 
BlueHost or even Pythonanywhere. I am not sure why exactly this is, maybe the 
hardware architecture. There I just used regular (c)Pickle and it worked.
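[Editorial note: for reference, the joblib calls are joblib.dump(model, path) and joblib.load(path). The stdlib pickle round-trip below has the same shape; the dict is a stand-in for a fitted estimator.]

```python
# Persist and restore an object with stdlib pickle; a real scikit-learn
# estimator would go through joblib.dump / joblib.load the same way.
import pickle

model = {"coef": [0.3, 0.7], "classes": [0, 1]}  # stand-in for a fitted estimator

with open("model.pkl", "wb") as f:
    pickle.dump(model, f, protocol=pickle.HIGHEST_PROTOCOL)

with open("model.pkl", "rb") as f:
    restored = pickle.load(f)

print(restored == model)  # True
```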

Best,
Sebastian


 On Aug 18, 2015, at 6:18 PM, Stefan van der Walt stef...@berkeley.edu wrote:
 
 Hi all,
 
 What, currently, is the recommended way for storing trained models 
 to disk for later use?
 
 Regards
 Stéfan
 


Re: [Scikit-learn-general] Persisting models

2015-08-18 Thread Sebastian Raschka
Oh wow, thanks for the link. I just skimmed over the code, but this is an 
interesting idea and looks like the sort of thing that would make my life 
easier in the future. I will dig into that! That’s great, thanks!


 On Aug 19, 2015, at 12:58 AM, Stefan van der Walt stef...@berkeley.edu 
 wrote:
 
 On 2015-08-18 21:37:41, Sebastian Raschka se.rasc...@gmail.com 
 wrote:
 I think for “simple” linear models, it would be not a bad idea 
 to save the weight coefficients in a log file or so. Here, I 
 think that your model is really not that dependent on the 
 changes in the scikit-learn code base (for example, imagine that 
 you trained a model 10 years ago and published the results in a 
 research paper, and today, someone asked you about this 
 model). I mean, you know all about how a logistic regression, 
 SVM etc. works, in the worst case you just use those weights to 
 make the prediction on new data — I think in a typical “model 
 persistence” case you don’t “update” your model anyways so 
 “efficiency” would not be that big of a deal in a typical “worst 
 case use case”.
 
 Agreed—this is exactly the type of use case I want to support. 
 Pickling won't work here, but using HDF5 like MNE does would 
 probably be close to ideal (thanks to Chris Holdgraf for the 
 heads-up):
 
 https://github.com/mne-tools/mne-python/blob/master/mne/_hdf5.py
 
 Stéfan
 