On Thu, Feb 2, 2012 at 8:15 PM, Olivier Grisel <[email protected]> wrote:
> I wonder which representation is the nicest for the end user? It might
> be the case that keeping the unlabeled data as a separate variable
> might be more natural but that will probably impact pipeline-ability
> and cross-validation since X_unlabeled.shape[0] won't be the same as
> X_labeled.shape[0] and y_labeled.shape[0].

Cross-validation will probably break anyway, as the unlabeled examples cannot be used in the test set. This also shows that we should probably have a library-wide default encoding for unlabeled data (this way, we will be able to make sure that all the unlabeled data goes to the training set).

Keeping the label propagation and semi-supervised NB PRs on hold forever doesn't help. We should merge them and keep in mind that their API is a work-in-progress.

Mathieu

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
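
To make the encoding idea above concrete, here is a rough sketch of a cross-validation splitter that sends every unlabeled sample to the training set. It is only an illustration under stated assumptions: the marker value -1 and the function name semi_supervised_kfold are hypothetical, not an existing scikit-learn API, and only NumPy is used.

```python
import numpy as np

UNLABELED = -1  # assumed library-wide marker; the thread does not fix a value


def semi_supervised_kfold(y, n_splits=3, unlabeled=UNLABELED, seed=0):
    """Yield (train_idx, test_idx) pairs whose test folds hold labeled samples only.

    Hypothetical helper: labeled samples are shuffled and split into folds,
    and all unlabeled samples are appended to every training set.
    """
    y = np.asarray(y)
    labeled_idx = np.flatnonzero(y != unlabeled)
    unlabeled_idx = np.flatnonzero(y == unlabeled)

    # Shuffle only the labeled indices, then cut them into n_splits folds.
    rng = np.random.RandomState(seed)
    folds = np.array_split(rng.permutation(labeled_idx), n_splits)

    for i in range(n_splits):
        test_idx = np.sort(folds[i])
        rest = [folds[j] for j in range(n_splits) if j != i]
        # Unlabeled samples always go to the training side.
        train_idx = np.sort(np.concatenate(rest + [unlabeled_idx]))
        yield train_idx, test_idx


if __name__ == "__main__":
    y = np.array([0, 1, -1, 0, -1, 1, 1, 0])
    for train_idx, test_idx in semi_supervised_kfold(y, n_splits=3):
        print("train:", train_idx, "test:", test_idx)
```

With a single X and a y that uses the marker, X.shape[0] and y.shape[0] stay equal, so the shape mismatch raised in the quoted message does not arise; the cost is that every estimator and splitter has to agree on the same marker.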
