On Tue, Jan 31, 2012 at 10:57:50PM +0100, Andreas wrote:
> That might be a stupid question, but what are the kinds of things that
> break the models?
> I imagine it would be things like renaming and removing attributes. What
> else is there?
>
> Having code that says "Attribute A was called B in the last version and
> C in the one before that"
> seems not desirable to me.
>
> How about providing extra functions that convert old models to new
> models if incompatible changes
> were introduced?
>
> One more question:
> Did I understand correctly that the main reason for not using pickles
> would be
> to be interoperable with 3rd party software?
> Or are there other reasons not to use pickle for storing models?

<rant>

There are a lot of reasons not to use pickles, among them that

- The implementations in the standard library are, in my experience,
  horribly buggy and in many cases really stupidly written. The
  exceptions that are deliberately raised rarely tell you anything
  useful about what the problem is, and I used to routinely get
  exceptions from cPickle that told me nothing other than that some
  flag had been set but PyExc_Whatever hadn't been called properly.
  Another thing: it uses recursion to traverse the object graph, so
  with Python's default recursion limit of 1000, even moderately deep
  object graphs can break it. This isn't really a problem in
  scikit-learn's case, but it's indicative of a wider problem: the
  developers of pickle were either asleep at the wheel or content with
  their module being nothing more than a toy.

- Unpickling from disk will use 2X the memory, as the pickled
  representation is first loaded into memory and then the
  pseudolanguage is executed. Joblib's wrappers solve this problem for
  the case of huge ndarrays, but once again: toy.

- They break when your code changes, and recovering from such breakages
  is a pain. This can be remedied with __setstate__ hackery but really,
  it's not worth the trouble.

- The pickle format is actually a small language that is executed on a
  VM, so it is insecure: a maliciously crafted pickle can run arbitrary
  code when it is loaded. They warn you about this in the
  documentation, but again: toy.

TL;DR unless you're doing trivial stuff from a data size/API
stability/security/everything perspective, pickle is next to worthless.

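To make the last point concrete, here is a minimal sketch, assuming
CPython's standard pickle and pickletools modules. The Payload class is
hypothetical and exists only for illustration; eval stands in for
whatever callable an attacker would actually pick. It shows that
loading a pickle runs a callable named in the data stream, and that the
stream is a sequence of VM opcodes.

```python
import pickle
import pickletools


class Payload:
    """Hypothetical stand-in for an attacker-crafted object."""

    def __reduce__(self):
        # Instructs pickle: "to rebuild this object, call eval('6 * 7')".
        # A harmless expression is used here; any importable callable
        # with any arguments could be named instead.
        return (eval, ("6 * 7",))


# Serialize: the byte stream stores a reference to eval plus its argument.
data = pickle.dumps(Payload())

# Deserialize: the pickle VM calls eval, so we get its return value back,
# not a Payload instance.
result = pickle.loads(data)

# List the opcodes the pickle VM would execute for this stream.
opcode_names = [opcode.name for opcode, arg, pos in pickletools.genops(data)]

print(result)
print(opcode_names)
```

The REDUCE opcode in the listing is the "call this callable with these
arguments" instruction. This is why a pickle from an untrusted source
should never be loaded.
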
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
