On Tue, Jan 31, 2012 at 10:57:50PM +0100, Andreas wrote:
> That might be a stupid question, but what are the kinds of things that 
> break the models?
> I imagine it would be things like renaming and removing attributes. What 
> else is there?
> 
> Having code that says "Attribute A was called B in the last version and
> C in the one before that" seems not desirable to me.
> 
> How about providing extra functions that convert old models to new
> models if incompatible changes were introduced?
> 
> One more question:
> Did I understand correctly that the main reason for not using pickles
> would be to be interoperable with 3rd party software?
> Or are there other reasons not to use pickle for storing models?

<rant>

There are a lot of reasons not to use pickles, among them:

 - The implementations in the standard library are, in my experience,
   horribly buggy and in many cases really stupidly written. The exceptions
   that are deliberately raised rarely tell you anything useful about what
   the problem is, and I used to routinely get exceptions from cPickle
   that told me nothing other than some flag had been set but PyExc_Whatever
   hadn't been called properly.

   Another thing: it traverses the object graph recursively, so with Python's
   default recursion limit of 1000, even a moderately deep object graph (a long
   linked list, say) can break it.  This isn't really a problem in
   scikit-learn's case, but it's indicative of a wider problem: the developers
   of pickle were either asleep at the wheel or content with their module being
   nothing more than a toy.

 - Unpickling a large object from disk can take roughly 2X its memory, because
   the pickled bytes are materialized in memory before the pickle opcode
   program that rebuilds the object from them is executed. Joblib's wrappers
   solve this problem for the case of huge ndarrays, but once again: toy.

 - They break when your code changes (e.g. an attribute is renamed or removed,
   or a class is moved to another module), and recovering from such breakages
   is a pain. This can be remedied with __setstate__ hackery, but really, it's
   not worth the trouble.

 - The pickle format is actually a small program that is executed on a
   stack-based VM, so loading a pickle from an untrusted source is insecure: a
   maliciously crafted pickle can run arbitrary code. They warn you about this
   in the documentation, but again: toy.
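
To make the recursion point concrete, here is a minimal sketch assuming a
current CPython 3 (the exact depth at which pickling fails varies by Python
version). A list nested a million levels deep holds nothing but tiny lists,
yet pickle.dumps gives up with RecursionError. The helpers nested_list and
pickles_ok are hypothetical names made up for this illustration.

```python
import pickle
import sys


def nested_list(depth):
    """Build [[[...[]...]]] nested `depth` levels deep.

    Built iteratively, so the construction itself does not recurse.
    """
    obj = []
    for _ in range(depth):
        obj = [obj]
    return obj


def pickles_ok(depth):
    """Return True if pickle.dumps copes with a list nested `depth` deep."""
    try:
        pickle.dumps(nested_list(depth))
    except RecursionError:
        return False
    return True


print(sys.getrecursionlimit())  # typically 1000 unless it was changed
print(pickles_ok(100))          # shallow graph: pickles fine
print(pickles_ok(1_000_000))    # very deep graph: RecursionError inside pickle
```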
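
On the memory point: Joblib's trick for huge ndarrays is, roughly, to keep the
array data out of the pickle opcode stream so it can be read back directly or
memory-mapped. A similar effect with plain NumPy is sketched below, assuming
NumPy is installed; this is not Joblib's code, and the helper save_and_map and
the file name weights.npy are hypothetical.

```python
import os
import tempfile

import numpy as np


def save_and_map(arr, directory):
    """Save `arr` as a .npy file and reopen it memory-mapped, read-only."""
    path = os.path.join(directory, "weights.npy")  # hypothetical file name
    np.save(path, arr)
    # mmap_mode="r" maps the file instead of reading it into a new buffer,
    # so the OS pages data in lazily rather than copying it all up front.
    return np.load(path, mmap_mode="r")


workdir = tempfile.mkdtemp()
weights = np.arange(1_000_000, dtype=np.float64)
mapped = save_and_map(weights, workdir)
print(type(mapped).__name__, mapped[:3])
```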
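
For the "__setstate__ hackery" mentioned above, a minimal sketch, assuming a
hypothetical Estimator class whose attribute coef was renamed to coef_ in a
later version. The old pickle is simulated by building an instance with the
old attribute layout; __setstate__ migrates the state dict on load. Every
rename needs another branch like this, which is exactly the "Attribute A was
called B in the last version" code the question wants to avoid.

```python
import pickle


class Estimator:
    """Hypothetical class: attribute `coef` was renamed to `coef_`."""

    def __init__(self, coef_):
        self.coef_ = coef_

    def __setstate__(self, state):
        # Pickles written by the old version store the value as "coef".
        if "coef" in state and "coef_" not in state:
            state["coef_"] = state.pop("coef")
        self.__dict__.update(state)


# Simulate a pickle produced by the old version of the class.
old = Estimator.__new__(Estimator)
old.__dict__["coef"] = [1.0, 2.0]
blob = pickle.dumps(old)

restored = pickle.loads(blob)  # pickle calls Estimator.__setstate__ here
print(restored.coef_)
```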
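
To show why "executed on a VM" means "insecure", here is a deliberately
harmless sketch. The hypothetical Payload class uses __reduce__, which tells
pickle "to rebuild me, call this callable with these arguments". Note that
the receiving side never needs the Payload class: the pickle only names
builtins.eval, and pickle.loads calls it.

```python
import pickle
import pickletools


class Payload:
    """Hypothetical 'malicious' object, made harmless for illustration."""

    def __reduce__(self):
        # A real attack would name something like os.system here; eval on a
        # harmless expression is enough to show that loads() runs the call.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())
pickletools.dis(blob)        # the opcode program the unpickler will execute
result = pickle.loads(blob)  # calls eval("6 * 7") while loading
print(result)
```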

TL;DR unless you're doing trivial stuff from a data size/API
stability/security/everything perspective, pickle is next to worthless.

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
