I think we could have a `classes=None` constructor parameter in SGDClassifier and possibly many other classifiers. When it is provided, we would not use the traditional `self.classes_ = np.unique(y)` idiom already implemented in some (but not all) classifiers of the project.
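As a rough sketch of that idea: the class name `ClassesParamSketch` below is made up for illustration and is not an existing scikit-learn estimator, and normalizing the user-supplied labels with `np.unique` is an assumption rather than a settled design.

```python
import numpy as np


class ClassesParamSketch:
    """Hypothetical estimator skeleton; not scikit-learn's actual API."""

    def __init__(self, classes=None):
        # Optional, user-supplied set of all labels the model should know.
        self.classes = classes

    def fit(self, X, y):
        y = np.asarray(y)
        if self.classes is None:
            # Traditional idiom: infer the label set from the training data.
            self.classes_ = np.unique(y)
        else:
            # Assumption: sort and deduplicate the supplied labels, but do
            # not restrict them to the labels that happen to appear in y.
            self.classes_ = np.unique(self.classes)
        return self
```

With `classes=[0, 1, 2]` and a `y` that only contains 0 and 1, `classes_` would still list all three labels, whereas the default `classes=None` would fall back to the labels seen in `y`.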
+1 also for raising a ValueError when `classes` is not None and the `y` provided at fit time has some values not in `classes`. However, we need to check with some benchmarks that this integrity check is not too costly.

This constructor parameter could be overridden by a `fit_param` to preserve backward compatibility, especially for classifier models with a `partial_fit` method.

The expected behavior for a classifier that is passed a non-None `classes` constructor param would be to never predict a class value outside `classes`. For the `predict_proba` method, the probabilities of classes missing at fit time should be 0.0.

This protocol (including expected exception types and error messages) should be formalized as a series of common tests in sklearn/tests/test_common.py, and redundant bookkeeping code should be factored out into the ClassifierMixin class in sklearn/base.py, IMHO.

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
