On Wed, Oct 26, 2011 at 01:35:07PM +0100, Brian Holt wrote:
> My question is: is there a way to improve the performance of loading
> classifiers, either using different pickle options (of which I don't
> know any, but there may be), or by using a different scheme
> (marshalling sounds promising based on [1]), or any other way?
> Perhaps I can implement a pickle loader in Cython?

I may sound like I am repeating myself, but I would _really_ like to stay
away from such an option.

In my opinion, the best approach is to represent the tree as a small set
of arrays, and use this representation to do the I/O. I/O is then going
to be lightning fast. 

The problem is more general than I/O. The current representation is
fundamentally a C-based representation that is exposed in Python via a
forest of objects. This is a very inefficient representation in Python.
Any large manipulation on these objects will be inefficient. This will
come up when you use these objects in parallel computing, for instance.

There are two ways forward. Either you start writing everything in
C (or Cython), or you use a Python-friendly representation: a small
number of objects, most likely arrays, carrying all the information. This
representation can be either an intermediate representation or the core
representation (arrays are easy to expose in C). I guess I would have
liked us to spend time on an array-based representation from the
start :).
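
As a rough sketch of the "intermediate representation" variant: the Node
class below is a hypothetical stand-in for an object-per-node tree, and
flatten/predict are illustrative helpers, not existing scikit-learn
functions. The point is only that the object tree can be converted once
into parallel arrays, after which everything runs on the arrays.

```python
import numpy as np


class Node:
    """Hypothetical object-per-node tree (stand-in for the object forest)."""

    def __init__(self, feature=-1, threshold=0.0, value=0.0,
                 left=None, right=None):
        self.feature = feature
        self.threshold = threshold
        self.value = value
        self.left = left
        self.right = right


def flatten(root):
    """Convert a Node tree into parallel arrays, one entry per node."""
    feature, threshold, value, left, right = [], [], [], [], []

    def visit(node):
        idx = len(feature)
        feature.append(node.feature)
        threshold.append(node.threshold)
        value.append(node.value)
        left.append(-1)   # -1 marks "no child", i.e. a leaf
        right.append(-1)
        if node.left is not None:
            left[idx] = visit(node.left)
            right[idx] = visit(node.right)
        return idx

    visit(root)
    return (np.array(feature), np.array(threshold), np.array(value),
            np.array(left), np.array(right))


def predict(arrays, X):
    """Route all samples down the tree at once, using only array indexing."""
    feature, threshold, value, left, right = arrays
    node = np.zeros(X.shape[0], dtype=np.intp)   # every sample starts at the root
    active = left[node] != -1
    while active.any():
        idx = np.nonzero(active)[0]
        cur = node[idx]
        go_left = X[idx, feature[cur]] <= threshold[cur]
        node[idx] = np.where(go_left, left[cur], right[cur])
        active = left[node] != -1
    return value[node]


root = Node(feature=0, threshold=0.5,
            left=Node(value=10.0),
            right=Node(feature=1, threshold=1.5,
                       left=Node(value=20.0), right=Node(value=30.0)))
arrays = flatten(root)
X = np.array([[0.2, 0.0], [0.9, 1.0], [0.9, 2.0]])
predictions = predict(arrays, X)
```

The five arrays are all that needs to be pickled or passed to a worker
process; the Node objects can be thrown away after flattening.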

As an example, you can have a look at the hierarchical clustering code,
which does not have this problem. Pickling is very fast with it,
especially if you use joblib's pickler (joblib.dump/joblib.load).
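
For reference, a small usage sketch of joblib's pickler on an array-based
object. The dict and its keys are invented for the example and are not the
attributes of the clustering estimators; joblib.dump and joblib.load are
the real entry points.

```python
import os
import tempfile

import joblib
import numpy as np

# Hypothetical array-based model: a couple of arrays instead of many objects.
model = {
    "children": np.arange(20).reshape(10, 2),
    "distances": np.linspace(0.0, 1.0, 10),
}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "model.joblib")
    joblib.dump(model, path)       # write the object, arrays included
    restored = joblib.load(path)   # read it back
```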

My two cents.

Gaël
