On 02/03/2014 11:01 AM, Lars Buitinck wrote:
> 2014-02-02 Andy <t3k...@gmail.com>:
>> Now, with respect to sinning: there is really no additional information
>> in the labels that could be used during learning.
> Actually there is: the presence of classes outside the training set
> affects probability distributions. Lidstone-smoothed multinomial and
> Bernoulli naive Bayes, as well as all (other) variants of
> logistic/softmax regression, never output zero probabilities, so they
> must assign some fraction of probability mass to the unseen classes
> (without affecting the Bayes optimal decision, so predict output is
> unchanged). For closed-form and zero-initialized models, the
> distribution over the unseen classes will be uniform, but I'm not sure
> how neural nets will fare, since those are initialized randomly.
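For concreteness, the effect Lars describes can be seen with MultinomialNB via partial_fit. Here is a minimal sketch; the toy counts and the fit_prior=False setting are my own choices, since with the default fit_prior=True the unseen class would get a zero estimated prior, which would mask the smoothing effect:

import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Toy count data: only labels 0 and 1 occur, but class 2 is declared below.
X = np.array([[3, 0, 1],
              [2, 1, 0],
              [0, 4, 1],
              [1, 3, 0]])
y = np.array([0, 0, 1, 1])

# fit_prior=False keeps the class prior uniform; with the default
# fit_prior=True the never-seen class would get a zero estimated prior.
clf = MultinomialNB(alpha=1.0, fit_prior=False)
clf.partial_fit(X, y, classes=[0, 1, 2])

x_new = np.array([[2, 1, 1]])
# Lidstone smoothing gives class 2 a uniform feature distribution,
# so its predicted probability is strictly positive...
print(clf.predict_proba(x_new))
# ...while here the argmax is still one of the seen classes.
print(clf.predict(x_new))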
I agree that the probability predicted by the model for an unseen class would not be zero; that is not what I meant to claim. What I tried to say is that the fitted model will be essentially the same: there will be additional columns of zeros in the weight matrix, plus a bias term for the unseen class that depends on the regularization. [I think this will even be the case for the neural net.]
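If someone wants to poke at this claim empirically, a rough sketch along these lines would do. SGDClassifier with logistic loss stands in for the linear model here (note that scikit-learn fits it one-vs-rest rather than as a true softmax), and the toy data is invented:

import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.RandomState(0)
X = rng.randn(200, 4)
y = (X[:, 0] > 0).astype(int)  # only labels 0 and 1 are ever observed

# Same data, fit once without and once with the extra class declared.
# (loss="log_loss" is spelled loss="log" in older scikit-learn releases.)
a = SGDClassifier(loss="log_loss", random_state=0).partial_fit(X, y, classes=[0, 1])
b = SGDClassifier(loss="log_loss", random_state=0).partial_fit(X, y, classes=[0, 1, 2])

print(a.coef_, a.intercept_)  # binary fit: class 1 vs class 0
# One row per class below; the class-1 row is fit on the same binary
# problem, so it should come out close to a.coef_, while the class-2 row
# only ever sees negative examples plus the regularization penalty.
print(b.coef_, b.intercept_)
print(b.predict_proba(X[:3]))  # class 2 still receives nonzero mass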