On Wed, Oct 26, 2011 at 01:35:07PM +0100, Brian Holt wrote:
> My question is; is there a way to improve the performance of loading
> classifiers, either using different pickle options (of which I don't
> know any, but there may be), or by using a different scheme
> (marshalling sounds promising based on [1]), or any other way?
> Perhaps I can implement a pickle loader in cython?

I may sound like I am repeating myself, but I would _really_ like to
stay away from such options. In my opinion, the best approach is to
represent the tree as a small set of arrays, and use this representation
to do the I/O. I/O is then going to be lightning fast.

The problem is more general than I/O. The current representation is
fundamentally a C-based representation that is exposed in Python via a
forest of objects. This is a very inefficient representation in Python:
any large manipulation on these objects will be slow. This will come up
when you use these objects in parallel computing, for instance.

There are two ways down the road. Either you start writing everything in
C (or Cython), or you use a Python-friendly representation: a small
number of objects, most likely arrays, carrying all the information.
This representation can be either an intermediate representation or the
core representation (arrays are easy to expose in C).

I guess that I would have originally liked us to spend time on having an
array-based representation :).

As an example, you can have a look at the hierarchical clustering code,
which does not have this problem. Pickling is very fast with it,
especially if you use joblib's pickler (joblib.dump/joblib.load).

My two cents.

Gaël
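
P.S. To make the array idea concrete, here is a rough sketch of what I
have in mind. It is not the scikit-learn tree code: the Node class, the
flatten and predict_one helpers, and the exact array layout (feature,
threshold, value, left, right, with -1 marking a leaf) are all
hypothetical names chosen for illustration. It flattens a small tree of
Python objects into a handful of NumPy arrays, pickles those arrays, and
predicts from the restored copy.

```python
import pickle

import numpy as np


class Node:
    """Hypothetical object-based tree node (one Python object per node)."""

    def __init__(self, feature=-1, threshold=0.0, value=0.0,
                 left=None, right=None):
        self.feature = feature
        self.threshold = threshold
        self.value = value
        self.left = left
        self.right = right


def flatten(root):
    """Convert a Node tree into parallel arrays, numbering nodes in preorder.

    Assumed layout: left[i] and right[i] hold child indices, -1 for a leaf.
    """
    feature, threshold, value, left, right = [], [], [], [], []

    def visit(node):
        idx = len(feature)
        feature.append(node.feature)
        threshold.append(node.threshold)
        value.append(node.value)
        left.append(-1)
        right.append(-1)
        if node.left is not None:  # internal node: has both children
            left[idx] = visit(node.left)
            right[idx] = visit(node.right)
        return idx

    visit(root)
    return {
        "feature": np.array(feature, dtype=np.intp),
        "threshold": np.array(threshold, dtype=np.float64),
        "value": np.array(value, dtype=np.float64),
        "left": np.array(left, dtype=np.intp),
        "right": np.array(right, dtype=np.intp),
    }


def predict_one(tree, x):
    """Walk the array-based tree from the root (index 0) down to a leaf."""
    node = 0
    while tree["left"][node] != -1:
        if x[tree["feature"][node]] <= tree["threshold"][node]:
            node = tree["left"][node]
        else:
            node = tree["right"][node]
    return tree["value"][node]


# A tiny tree: split on feature 0, then on feature 1 in the right branch.
root = Node(feature=0, threshold=0.5,
            left=Node(value=1.0),
            right=Node(feature=1, threshold=2.0,
                       left=Node(value=2.0),
                       right=Node(value=3.0)))

arrays = flatten(root)

# Only five arrays go through the pickler, however many nodes there are.
payload = pickle.dumps(arrays, protocol=pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(payload)
```

The same dictionary of arrays could be written to disk with
joblib.dump(arrays, filename) and read back with joblib.load(filename).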
