2012/2/2 Mathieu Blondel <[email protected]>:
> On Thu, Feb 2, 2012 at 8:15 PM, Olivier Grisel <[email protected]> 
> wrote:
>
>> I wonder which representation is nicest for the end user. Keeping the
>> unlabeled data as a separate variable might be more natural, but that
>> will probably hurt pipeline-ability and cross-validation, since
>> X_unlabeled.shape[0] won't be the same as X_labeled.shape[0] and
>> y_labeled.shape[0].
>
> Cross-validation will probably break anyway, as the unlabeled examples
> cannot be used in the test set. This also shows that we should
> probably have a library-wide default encoding for unlabeled data (this
> way, we will be able to make sure that all the unlabeled data goes to
> the training set).
>
> Keeping the label propagation and semi-supervised NB PRs on hold
> forever doesn't help. We should merge them and keep in mind that their
> API is a work-in-progress.

Alright: this should be made explicit in the whats_new.
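
To make the cross-validation constraint discussed above concrete, here is a
rough sketch of a K-fold split that respects it. It assumes a single y in
which unlabeled samples carry a marker value (-1 here, purely as an
assumption; no library-wide encoding has been agreed on in this thread), so
X and y keep the same number of rows. The name semi_supervised_kfold is
hypothetical and is not part of scikit-learn.

```python
import random

UNLABELED = -1  # assumed marker for "no label"; not an agreed convention


def semi_supervised_kfold(y, n_splits=3, seed=0):
    """Yield (train, test) index lists.

    Only labeled samples are split into folds; every unlabeled sample is
    appended to the training side of each split and never appears in test.
    """
    labeled = [i for i, label in enumerate(y) if label != UNLABELED]
    unlabeled = [i for i, label in enumerate(y) if label == UNLABELED]
    if n_splits < 2 or n_splits > len(labeled):
        raise ValueError("n_splits must be between 2 and the number of "
                         "labeled samples")

    # Shuffle the labeled indices reproducibly, then deal them into folds.
    random.Random(seed).shuffle(labeled)
    folds = [labeled[k::n_splits] for k in range(n_splits)]

    for k, test in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield sorted(train + unlabeled), sorted(test)


y = [0, 1, UNLABELED, 1, UNLABELED, 0, 1, UNLABELED, 0]
splits = list(semi_supervised_kfold(y, n_splits=3))
```

Each (train, test) pair can index X and y directly, which is the main
argument for a single encoded y over a separate X_unlabeled variable.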

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
