On 02/02/2012 12:34 PM, Olivier Grisel wrote:
> 2012/2/2 Mathieu Blondel:
>
>> On Thu, Feb 2, 2012 at 8:15 PM, Olivier Grisel wrote:
>>
>>> I wonder which representation is the nicest for the end user? It might
>>> be the case that keeping the unlabeled data as a separate variable
>>> might be more natural but that will probably impact pipeline-ability
>>> and cross-validation since X_unlabeled.shape[0] won't be the same as
>>> X_labeled.shape[0] and y_labeled.shape[0].
>>
>> cross-validation will probably break anyway as the unlabeled examples
>> cannot be used in the test set. This also shows that we should
>> probably have a library-wide default encoding for unlabeled data (this
>> way, we will be able to make sure that all the unlabeled data goes to
>> the training set).
>>
>> Keeping the label propagation and semi-supervised NB PRs on hold
>> forever doesn't help. We should merge them and keep in mind that their
>> API is a work-in-progress.
>
> Alright: this should be made explicit in the whats_new.
>
> Larsmans pointed out that the -1 is already used in One-Class SVM (and
> I think one other place).

I agree that there is a conflict between using -1 as the unlabeled marker and its use as a class label in binary classification. But I also think this is not really part of the PR. I would consider changing the unlabeled marker to -2, but only after merging the two semi-supervised algorithms, and in a separate PR.
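
To make the two points in this thread concrete, here is a rough sketch in plain NumPy. It is not the API of either PR: the `UNLABELED` constant and the `semi_supervised_kfold` helper are hypothetical names made up for illustration. It assumes unlabeled samples are encoded with a marker value in `y`, shows how a -1 marker collides with a {-1, +1} binary label encoding, and shows a k-fold split in which unlabeled samples only ever go to the training set.

```python
import numpy as np

# Assumed marker for "no label"; the thread also floats -2 as an alternative.
UNLABELED = -1


def semi_supervised_kfold(y, n_folds=3, unlabeled=UNLABELED, seed=0):
    """Yield (train_idx, test_idx) pairs over the samples of y.

    Hypothetical helper: only labeled samples are split into test folds;
    every unlabeled sample is appended to each training set.
    """
    y = np.asarray(y)
    labeled_idx = np.flatnonzero(y != unlabeled)
    unlabeled_idx = np.flatnonzero(y == unlabeled)
    rng = np.random.RandomState(seed)
    shuffled = rng.permutation(labeled_idx)
    for test_idx in np.array_split(shuffled, n_folds):
        # Labeled samples not held out in this fold.
        train_labeled = np.setdiff1d(labeled_idx, test_idx)
        train_idx = np.concatenate([train_labeled, unlabeled_idx])
        yield np.sort(train_idx), np.sort(test_idx)


# Three classes-free toy target: 6 labeled samples, 3 unlabeled ones.
y = np.array([0, 1, -1, 1, -1, 0, 1, -1, 0])
folds = list(semi_supervised_kfold(y, n_folds=3))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)

# The conflict discussed above: with a {-1, +1} label encoding, the same
# mask treats every negative example as if it were unlabeled.
y_binary = np.array([-1, 1, 1, -1])
n_mistaken = int(np.sum(y_binary == UNLABELED))
print("negatives mistaken for unlabeled:", n_mistaken)
```

With a different marker (for example -2, as suggested above) the mask in the last step would no longer match the negative class, which is the motivation for a library-wide encoding.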
