I think we could have a `classes=None` constructor parameter in
SGDClassifier and possibly many other classifiers. When provided, it
would be used instead of the traditional `self.classes_ = np.unique(y)`
idiom already implemented in some classifiers of the project (but not
all).
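
To make the idea concrete, here is a minimal sketch of what I have in
mind. `ToyClassifier` is a made-up majority-class model, not an existing
scikit-learn estimator, and sorting the user-supplied labels with
`np.unique` is my assumption rather than a settled design:

```python
import numpy as np


class ToyClassifier:
    """Hypothetical majority-class classifier, for illustration only."""

    def __init__(self, classes=None):
        self.classes = classes

    def fit(self, X, y):
        y = np.asarray(y)
        if self.classes is None:
            # Traditional idiom: infer the label set from the targets.
            self.classes_ = np.unique(y)
        else:
            # Proposed behavior: use the label set given at construction
            # (sorted and deduplicated here, which is an assumption).
            self.classes_ = np.unique(self.classes)
        # Remember the most frequent training label.
        values, counts = np.unique(y, return_counts=True)
        self.majority_ = values[np.argmax(counts)]
        return self

    def predict(self, X):
        # Always predict the majority label seen at fit time.
        return np.full(len(X), self.majority_)
```

With `classes=[0, 1, 2]` and a `y` that only contains 0 and 1,
`classes_` would still list all three labels.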

+1 also for raising a ValueError when `classes is not None` and the
`y` provided at fit time has some values not in `classes`. However, we
need to check with some benchmarks that this integrity check is not
too costly.
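
A possible shape for that integrity check, as a sketch only: the helper
name `check_targets_in_classes` and the error message are hypothetical,
and `np.setdiff1d` is just one way to do it.

```python
import numpy as np


def check_targets_in_classes(y, classes):
    """Hypothetical helper: raise ValueError if y has labels outside classes."""
    if classes is None:
        # Nothing to validate when no label set was declared.
        return
    # Sorted unique values of y that do not appear in classes.
    unknown = np.setdiff1d(y, classes)
    if unknown.size > 0:
        raise ValueError(
            "y contains labels not in classes: %r" % unknown.tolist()
        )
```

`np.setdiff1d` deduplicates `y` by sorting it, so the cost grows with
the number of samples; timing it with `timeit` on realistic `y` sizes
would tell us whether that matters.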

This constructor parameter could be overridden by a `fit_param` to
preserve backward compat, especially for classifier models with a
`partial_fit` method.
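
One way the precedence could work, purely as an assumption on my side
(the helper `resolve_classes` and its argument names are hypothetical):
the fit-time value wins, then the constructor value, then the usual
`np.unique(y)` fallback.

```python
import numpy as np


def resolve_classes(constructor_classes, fit_classes, y):
    """Hypothetical precedence rule for picking the label set."""
    if fit_classes is not None:
        # A value passed at fit / partial_fit time overrides everything.
        return np.unique(fit_classes)
    if constructor_classes is not None:
        # Otherwise fall back to the constructor parameter.
        return np.unique(constructor_classes)
    # Finally, infer the labels from the targets as done today.
    return np.unique(y)
```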

The expected behavior for a classifier that is passed a non-None
`classes` constructor param would be to never predict a class value
that was missing from `y` at fit time.
For the predict_proba method, the probabilities of the classes missing
at fit time should be 0.0.
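
For predict_proba, the zero-filling could look like the sketch below.
`expand_proba` is a hypothetical helper; it assumes the fitted classes
are a subset of `classes` and that columns follow the sorted order of
`classes`.

```python
import numpy as np


def expand_proba(proba, fitted_classes, classes):
    """Hypothetical helper: spread probabilities over the full label set."""
    classes = np.unique(classes)  # sorted full label set
    fitted_classes = np.asarray(fitted_classes)
    proba = np.asarray(proba, dtype=float)
    # Start from zeros so classes missing at fit time get probability 0.0.
    full = np.zeros((proba.shape[0], classes.size))
    # Column index of each fitted class inside the sorted full label set.
    cols = np.searchsorted(classes, fitted_classes)
    full[:, cols] = proba
    return full
```

For example, a model fitted on labels 0 and 2 with `classes=[0, 1, 2]`
would report a middle column of zeros for label 1.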

This protocol (including expected exception types and error messages)
should be formalized as a series of common tests in
sklearn/tests/test_common.py, and redundant bookkeeping code should be
factored out into the ClassifierMixin class of sklearn/base.py IMHO.

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
