Hello everyone,

I wrote a class which takes a base estimator in its constructor. For
efficiency reasons, it is best if the estimator supports dense input.
I would therefore like to issue a warning if the given estimator
supports only sparse input (as is the case for e.g.
svm.sparse.LinearSVC). This raises the question of how we should make
this information available in the scikit. One solution would be to use
class variables in all estimators. For example:

class NaiveBayes(BaseEstimator):
    __ndarray_input__ = True
    __csr_matrix_input__ = True

    [...]

This would require tagging each class with the input types it
supports. This seems like a fairly reasonable solution to me.
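
To make the use case concrete, here is a minimal sketch of how my class
could consume such tags. All the names below (MetaEstimator,
supports_dense and the two toy estimators) are hypothetical and only
serve the illustration; treating an untagged estimator as
dense-capable is also just an assumption.

```python
import warnings


class DenseAndSparseEstimator:
    # Hypothetical estimator tagged as proposed above.
    __ndarray_input__ = True
    __csr_matrix_input__ = True


class SparseOnlyEstimator:
    # Hypothetical estimator that only accepts sparse input.
    __ndarray_input__ = False
    __csr_matrix_input__ = True


def supports_dense(estimator):
    # Assumption: an estimator without the tag is treated as dense-capable.
    return getattr(estimator, "__ndarray_input__", True)


class MetaEstimator:
    # Hypothetical stand-in for the class that wraps a base estimator.
    def __init__(self, base_estimator):
        if not supports_dense(base_estimator):
            warnings.warn(
                "%s supports only sparse input; expect slower fitting."
                % type(base_estimator).__name__
            )
        self.base_estimator = base_estimator
```

With this, MetaEstimator(SparseOnlyEstimator()) emits a warning while
MetaEstimator(DenseAndSparseEstimator()) stays silent.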

Another idea would be to have a dedicated method:

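import numpy as np

def accepts(self, klass):
    # Build a tiny random dataset and convert X to the container type
    # under test (e.g. np.asarray or sp.csr_matrix).
    X = np.random.random((3, 2))
    X = klass(X)
    y = np.random.random((3,))
    # Fit a fresh instance so that the current estimator is left untouched.
    clf = self.__class__()
    try:
        clf.fit(X, y)
        return True
    except Exception:
        # Any failure in fit is taken to mean "input type not supported".
        return False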

>>> clf = NaiveBayes()
>>> clf.accepts(sp.csr_matrix)
True

This solution is a bit of a hack but should work as each fit method
should do input validation anyway.

We should keep in mind that, in the future, other data containers may
be supported: PyTables, carray, ...

Also, the class reference is currently indexed by module name, but it
would be nice if it could also be indexed by input support (which
classes support dense / sparse / both inputs).
With either mechanism above, we could generate the rst file
automatically.
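
As a rough sketch of what that generation could look like with the
class-variable approach: the helpers input_support and render_rst, the
section titles and the two toy classes are all made up for
illustration, and a real script would have to collect the estimator
classes from the package itself.

```python
class NaiveBayes:
    # Tagged as in the class-variable example earlier in this mail.
    __ndarray_input__ = True
    __csr_matrix_input__ = True


class DenseOnlyEstimator:
    # Hypothetical estimator, for illustration only.
    __ndarray_input__ = True
    __csr_matrix_input__ = False


class SparseOnlyEstimator:
    # Hypothetical estimator, for illustration only.
    __ndarray_input__ = False
    __csr_matrix_input__ = True


def input_support(klass):
    # Assumption: a missing tag means the input type is not supported.
    dense = getattr(klass, "__ndarray_input__", False)
    sparse = getattr(klass, "__csr_matrix_input__", False)
    if dense and sparse:
        return "both"
    if dense:
        return "dense"
    if sparse:
        return "sparse"
    return "unknown"


def render_rst(classes):
    # Group class names by input support and emit one rst section per group.
    titles = [
        ("dense", "Dense input only"),
        ("sparse", "Sparse input only"),
        ("both", "Dense and sparse input"),
    ]
    groups = {}
    for klass in classes:
        groups.setdefault(input_support(klass), []).append(klass.__name__)
    lines = []
    for key, title in titles:
        names = sorted(groups.get(key, []))
        if not names:
            continue  # skip empty sections
        lines += [title, "-" * len(title), ""]
        lines += ["* " + name for name in names]
        lines.append("")
    # Classes with no supported input type ("unknown") are left out.
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_rst([NaiveBayes, DenseOnlyEstimator, SparseOnlyEstimator]))
```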

Mathieu

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
