2012/2/2 Mathieu Blondel <[email protected]>:
> On Thu, Feb 2, 2012 at 8:15 PM, Olivier Grisel <[email protected]>
> wrote:
>
>> I wonder which representation is the nicest for the end user? It might
>> be the case that keeping the unlabeled data as a separate variable
>> might be more natural but that will probably impact pipeline-ability
>> and cross-validation since X_unlabeled.shape[0] won't be the same as
>> X_labeled.shape[0] and y_labeled.shape[0].
>
> Cross-validation will probably break anyway as the unlabeled examples
> cannot be used in the test set. This also shows that we should
> probably have a library-wide default encoding for unlabeled data (this
> way, we will be able to make sure that all the unlabeled data goes to
> the training set).
>
> Keeping the label propagation and semi-supervised NB PRs on hold
> forever doesn't help. We should merge them and keep in mind that their
> API is a work-in-progress.

Alright: this should be made explicit in the whats_new.

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
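
To make the point about a library-wide encoding concrete, here is a minimal sketch of how a cross-validation splitter could keep every unlabeled sample in the training set. It assumes unlabeled samples are marked with y == -1; the thread does not settle on a marker, so that value is only an assumption. The helper name `semi_supervised_kfold` is hypothetical and is not a scikit-learn API.

```python
import numpy as np

UNLABELED = -1  # assumed marker for unlabeled samples (not decided in the thread)


def semi_supervised_kfold(y, n_folds=3, unlabeled=UNLABELED, seed=0):
    """Yield (train_idx, test_idx) pairs (hypothetical helper).

    Only labeled samples are distributed across the test folds;
    every unlabeled sample is appended to each training set.
    """
    y = np.asarray(y)
    labeled_idx = np.flatnonzero(y != unlabeled)
    unlabeled_idx = np.flatnonzero(y == unlabeled)

    # Shuffle the labeled indices, then cut them into n_folds test folds.
    rng = np.random.RandomState(seed)
    shuffled = rng.permutation(labeled_idx)

    for test_idx in np.array_split(shuffled, n_folds):
        # Labeled samples outside the current test fold go to training.
        train_labeled = np.setdiff1d(labeled_idx, test_idx)
        train_idx = np.concatenate([train_labeled, unlabeled_idx])
        yield np.sort(train_idx), np.sort(test_idx)


if __name__ == "__main__":
    y = np.array([0, 1, -1, 1, -1, 0, -1, 1])
    for train_idx, test_idx in semi_supervised_kfold(y, n_folds=3):
        print("train:", train_idx, "test:", test_idx)
```

With a single X and y of equal length, shapes stay consistent for pipelines, and the splitter only needs to know the marker value to route unlabeled rows to the training side.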
