2014-02-02 Andy <t3k...@gmail.com>:
> Now, with respect to sinning: there is really no additional information
> in the labels that could be used during learning.

Actually there is: the presence of classes outside the training set
affects the predicted probability distributions. Lidstone-smoothed
multinomial and Bernoulli naive Bayes, as well as all (other) variants of
logistic/softmax regression, never output zero probabilities, so they
must assign some fraction of the probability mass to the unseen classes
(without affecting the Bayes-optimal decision, so the output of predict
is unchanged). For closed-form and zero-initialized models, the
distribution over the unseen classes will be uniform, but I'm not sure
how neural nets will fare, since those are initialized randomly.
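To make that concrete, here is a minimal sketch (my own snippet, not from
this thread) using scikit-learn's MultinomialNB: class 2 is declared via
the `classes` argument of partial_fit but never appears in y. With
fit_prior=False the class prior stays uniform, so the Lidstone-smoothed
feature counts give the unseen class a small but strictly positive
probability while predict still returns only the observed classes.

    # Sketch only: a declared-but-unseen class still receives probability mass.
    import numpy as np
    from sklearn.naive_bayes import MultinomialNB

    X = np.array([[2, 1, 0],
                  [0, 1, 3],
                  [1, 0, 2],
                  [3, 2, 1]])
    y = np.array([0, 1, 1, 0])

    # fit_prior=False keeps the class prior uniform; otherwise the prior for
    # the unseen class would be estimated from a zero count.
    clf = MultinomialNB(alpha=1.0, fit_prior=False)
    clf.partial_fit(X, y, classes=[0, 1, 2])  # class 2 never occurs in y

    print(clf.predict_proba(X))  # column for class 2 is small but > 0
    print(clf.predict(X))        # argmax unchanged: only classes 0 and 1 appear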

> The only case when that could
> be important is if the labels have some meaningful labeling and it is
> important to know the position of the labels with respect to the
> previous ones.
> But that is somewhat of a weird thing to encode here anyhow.

... because we don't support sequence/structured prediction.

Is the original problem related to evaluation?
