On 02/03/2014 11:01 AM, Lars Buitinck wrote:
2014-02-02 Andy :
> Now, with respect to sinning: there is really no additional information
> in the labels that could be used during learning.
Actually there is: the presence of classes outside the training set
affects probability distributions. Lidstone-smoothed multinomial and
Bernoulli naive Bayes reserve probability mass for outcomes that never occur
in the training set, so their estimates depend on how many classes the model
knows about.
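
To make that concrete, here is a minimal numpy sketch of Lidstone smoothing
(illustrative only, not code from the thread; the function name is mine): the
smoothed class prior (count_c + alpha) / (N + alpha * K) depends on K, the
number of classes the model is told about, so declaring an extra,
never-observed class changes the whole distribution.

import numpy as np

def lidstone_class_prior(counts, n_classes, alpha=1.0):
    # (count_c + alpha) / (N + alpha * K): every known class gets some mass,
    # including classes whose count is zero in the training set.
    counts = np.asarray(counts, dtype=float)
    counts = np.concatenate([counts, np.zeros(n_classes - len(counts))])
    return (counts + alpha) / (counts.sum() + alpha * n_classes)

print(lidstone_class_prior([80, 20], n_classes=2))  # ~ [0.794, 0.206]
print(lidstone_class_prior([80, 20], n_classes=3))  # ~ [0.786, 0.204, 0.010]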
On 01/11/2014 06:49 PM, Christian Jauvin wrote:
Another take on my previous question is this other question:
Is fitting a LabelEncoder on the *entire* dataset (instead of only on
the training set) an equivalent "sin" (i.e. a common ML mistake) as
say doing so with a Scaler or some other preprocessing technique?
If the answer is yes (which is w
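
For comparison, the usual way to avoid that "sin" with a Scaler (a generic
sketch of the pattern in question, not code from this thread): fit the
preprocessing on the training portion only and reuse the fitted statistics on
the test portion.

import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
X = rng.randn(100, 3)
X_train, X_test = X[:75], X[75:]        # simple hold-out split for illustration

scaler = StandardScaler().fit(X_train)  # mean/std estimated from the training set only
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)     # the test set never influences the fitted statistics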
Hi,
If a LabelEncoder has been fitted on a training set, it might break if it
encounters new values when used on a test set.
The only solution I could come up with for this is to map everything new in
the test set (i.e. not belonging to any existing class) to "<unknown>", and
then explicitly add a corresponding "<unknown>" class to the encoder.
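
A minimal sketch of that workaround (the "<unknown>" placeholder and the way
of extending classes_ below are illustrative assumptions, not the exact code
discussed):

import numpy as np
from sklearn.preprocessing import LabelEncoder

train_labels = ["cat", "dog", "dog", "bird"]
test_labels = ["dog", "ferret", "cat"]              # "ferret" never seen during fit

le = LabelEncoder().fit(train_labels)
# le.transform(test_labels) would raise ValueError because of "ferret".

# Explicitly add a placeholder class, keeping classes_ sorted so transform
# keeps working across versions.
le.classes_ = np.sort(np.append(le.classes_, "<unknown>"))

# Map anything the encoder has never seen to the placeholder before transforming.
known = set(le.classes_)
mapped = [x if x in known else "<unknown>" for x in test_labels]
print(le.transform(mapped))                         # [3 0 2]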