2012/2/2 Mathieu Blondel <[email protected]>:
> On Thu, Feb 2, 2012 at 7:17 PM, Gael Varoquaux
> <[email protected]> wrote:
>> Just a heads up: I am going to merge in label propagation
>> https://github.com/scikit-learn/scikit-learn/pull/547 in the next hour
>> unless somebody has concerns with the code.
>
> I personally don't like using -1 to encode unlabeled data. I would
> prefer np.nan (which requires y to be np.float) or -2 (if you prefer y
> to be np.int).
>
> -1 is commonly used to encode the negative class in binary
> classification so it's confusing. Moreover, for Naive Bayes, it would
> be very natural to use the same class for the supervised and
> semi-supervised settings. In the absence of unlabeled data, the
> algorithm can downgrade gracefully to supervised learning. Therefore,
> it would be better not to use a label encoding which is commonly used
> in supervised learning.

I am -1 for using np.nan and floats for classification. I am ok-ish
with using -2 as the default unlabeled marker as long as we keep it
configurable as a constructor param.

Alternatively we could switch back to the fit(X_labeled, y_labeled,
X_unlabeled=None) convention, but that will impact the code quite a bit.

I wonder which representation is the nicest for the end user? Keeping
the unlabeled data in a separate variable might be more natural, but
that will probably hurt pipeline-ability and cross-validation, since
X_unlabeled.shape[0] won't be the same as X_labeled.shape[0] and
y_labeled.shape[0].

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
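
To make the two conventions concrete, here is a minimal sketch in plain
NumPy. The helper split_labeled and its unlabeled_marker argument are
hypothetical names chosen for illustration; they are not the API of the
pull request. The sketch assumes integer labels where -1 is a genuine
negative class and -2 marks unlabeled samples.

```python
import numpy as np


def split_labeled(X, y, unlabeled_marker=-2):
    """Hypothetical helper: split a single (X, y) pair into labeled and
    unlabeled parts, using a configurable marker for unlabeled samples."""
    X = np.asarray(X)
    y = np.asarray(y)
    unlabeled = y == unlabeled_marker
    return X[~unlabeled], y[~unlabeled], X[unlabeled]


# Single-array convention: X and y always have the same number of rows,
# so row-wise slicing (e.g. in cross-validation) stays aligned.
X = np.arange(12, dtype=float).reshape(6, 2)
y = np.array([1, -1, -2, 1, -2, -1])  # -1 is a real class, -2 is unlabeled

X_labeled, y_labeled, X_unlabeled = split_labeled(X, y, unlabeled_marker=-2)

# Separate-variable convention: the row counts no longer match.
print(X.shape[0], y.shape[0])
print(X_labeled.shape[0], y_labeled.shape[0], X_unlabeled.shape[0])
```

With the marker set to -2, the samples labeled -1 stay in the labeled
part as an ordinary negative class. After the split, X_unlabeled has a
different number of rows than X_labeled and y_labeled, which is the
mismatch that makes the separate-variable convention awkward for
pipelines and cross-validation.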
