2012/2/2 Mathieu Blondel <[email protected]>:
> On Thu, Feb 2, 2012 at 7:17 PM, Gael Varoquaux
> <[email protected]> wrote:
>> Just a heads up: I am going to merge in label propagation
>> https://github.com/scikit-learn/scikit-learn/pull/547 in the next hour
>> unless somebody has concerns with the code.
>
> I personally don't like using -1 to encode unlabeled data. I would
> prefer np.nan (which requires y to be np.float) or -2 (if you prefer y
> to be np.int).
>
> -1 is commonly used to encode the negative class in binary
> classification so it's confusing. Moreover, for Naive Bayes, it would
> be very natural to use the same class for the supervised and
> semi-supervised settings. In the absence of unlabeled data, the
> algorithm can downgrade gracefully to supervised learning. Therefore,
> it would be better not to use a label encoding which is commonly used
> in supervised learning.
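To make the collision concrete, here is a small NumPy-only sketch. The array values are made up for illustration and nothing here is scikit-learn API. It shows that a -1 marker cannot be told apart from the -1 negative class, and what the two proposed alternatives (-2 with integer y, np.nan with float y) look like.

```python
import numpy as np

# Binary labels in the common {-1, +1} convention (illustrative values).
y_binary = np.array([1, -1, 1, -1])

# If -1 also meant "unlabeled", this mask would flag both negative-class
# samples as unlabeled: the two meanings cannot be separated.
mask_minus_one = y_binary == -1
print(mask_minus_one.sum())

# Alternative 1: -2 as the marker keeps y integer-typed.
y_int = np.array([1, -1, -2, -2])
unlabeled_int = y_int == -2

# Alternative 2: np.nan as the marker forces y to a float dtype, and the
# mask must use np.isnan because np.nan == np.nan is False.
y_float = np.array([1.0, -1.0, np.nan, np.nan])
unlabeled_float = np.isnan(y_float)
```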

I am -1 for using np.nan and floats for classification. I am ok-ish to
use -2 as the default unlabeled marker as long as we keep it
configurable as a constructor param.
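A minimal sketch of what a configurable marker could look like, assuming an estimator that only needs to separate labeled from unlabeled rows. The class name SemiSupervisedSketch and the parameter name unlabeled_marker are hypothetical, not the API of the pull request or of scikit-learn.

```python
import numpy as np


class SemiSupervisedSketch:
    """Hypothetical estimator: only the unlabeled-marker handling is shown."""

    def __init__(self, unlabeled_marker=-2):
        # Integer value marking unlabeled samples in y; -2 by default,
        # but overridable by the user.
        self.unlabeled_marker = unlabeled_marker

    def fit(self, X, y):
        X = np.asarray(X)
        y = np.asarray(y)
        unlabeled = y == self.unlabeled_marker
        # Classes come from the labeled samples only.
        self.classes_ = np.unique(y[~unlabeled])
        self.n_labeled_ = int((~unlabeled).sum())
        self.n_unlabeled_ = int(unlabeled.sum())
        # A real algorithm would use X[unlabeled] here (e.g. to propagate
        # labels); this sketch stops after the split.
        return self


# Usage: default marker, then a user who prefers -1 anyway.
X = np.zeros((4, 2))
clf = SemiSupervisedSketch().fit(X, [0, 1, -2, -2])
clf_minus_one = SemiSupervisedSketch(unlabeled_marker=-1).fit(X[:3], [0, 1, -1])
```

Keeping the marker in __init__ rather than in fit follows the usual scikit-learn habit of putting configuration in constructor parameters, so it can be set once and carried through cloning or grid search.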

Alternatively, we could switch back to the fit(X_labeled, y_labeled,
X_unlabeled=None) convention, but that would impact the code quite a
bit.
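The two conventions carry the same information, so converting between them is mechanical. Below is a sketch assuming -2 as the marker; the helper names split_by_marker and merge_with_marker and the UNLABELED constant are hypothetical, for illustration only.

```python
import numpy as np

UNLABELED = -2  # assumed marker value, as proposed in this thread


def split_by_marker(X, y, marker=UNLABELED):
    """Single (X, y) with a marker -> (X_labeled, y_labeled, X_unlabeled)."""
    X, y = np.asarray(X), np.asarray(y)
    unlabeled = y == marker
    return X[~unlabeled], y[~unlabeled], X[unlabeled]


def merge_with_marker(X_labeled, y_labeled, X_unlabeled=None, marker=UNLABELED):
    """(X_labeled, y_labeled, X_unlabeled) -> single (X, y) with a marker."""
    X_labeled = np.asarray(X_labeled)
    y_labeled = np.asarray(y_labeled)
    if X_unlabeled is None:
        # No unlabeled data: this is just the supervised case.
        return X_labeled, y_labeled
    X_unlabeled = np.asarray(X_unlabeled)
    X = np.vstack([X_labeled, X_unlabeled])
    # Append one marker entry per unlabeled row.
    y = np.concatenate(
        [y_labeled, np.full(X_unlabeled.shape[0], marker, dtype=y_labeled.dtype)]
    )
    return X, y
```

Note that merging puts all labeled rows first, so a round trip preserves content but not the original row order.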

I wonder which representation is nicest for the end user. Keeping the
unlabeled data in a separate variable might be more natural, but it
would probably hurt pipeline-ability and cross-validation, since
X_unlabeled.shape[0] won't be the same as X_labeled.shape[0] and
y_labeled.shape[0].
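The cross-validation point can be sketched with hand-rolled folds (plain NumPy, not any particular scikit-learn splitter), assuming -2 marks unlabeled samples. With a single X and y of equal length, one set of fold indices slices both arrays; with a separate X_unlabeled there is no matching y to slice with the same indices.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.rand(6, 2)
y = np.array([0, 1, -2, 0, -2, 1])  # -2 marks unlabeled samples (assumed)

n_samples = X.shape[0]
# Simple 3-fold split over contiguous sample indices.
folds = np.array_split(np.arange(n_samples), 3)

scored_per_fold = []
for test_idx in folds:
    train_idx = np.setdiff1d(np.arange(n_samples), test_idx)
    # The same indices apply to X and y because they share shape[0];
    # the training part may still contain unlabeled rows.
    X_train, y_train = X[train_idx], y[train_idx]
    X_test, y_test = X[test_idx], y[test_idx]
    assert X_train.shape[0] == y_train.shape[0]
    # Only labeled test samples can be scored.
    scored = y_test != -2
    scored_per_fold.append(int(scored.sum()))
```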

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
