Re: [Scikit-learn-general] LabelEncoder with never seen before values

2014-02-04 Thread Andy
On 02/03/2014 11:01 AM, Lars Buitinck wrote:
> 2014-02-02 Andy:
>> Now, with respect to sinning: there is really no additional information
>> in the labels that could be used during learning.
> Actually there is: the presence of classes outside the training set
> affects probability distributions.

Re: [Scikit-learn-general] LabelEncoder with never seen before values

2014-02-03 Thread Lars Buitinck
2014-02-02 Andy:
> Now, with respect to sinning: there is really no additional information
> in the labels that could be used during learning.

Actually there is: the presence of classes outside the training set affects probability distributions. Lidstone-smoothed multinomial and Bernoulli naive B

Re: [Scikit-learn-general] LabelEncoder with never seen before values

2014-02-02 Thread Andy
On 01/11/2014 06:49 PM, Christian Jauvin wrote:
> Another take on my previous question is this other question:
>
> Is fitting a LabelEncoder on the *entire* dataset (instead of only on
> the training set) an equivalent "sin" (i.e. a common ML mistake) as
> say doing so with a Scaler or some other p

Re: [Scikit-learn-general] LabelEncoder with never seen before values

2014-01-11 Thread Christian Jauvin
Another take on my previous question is this other question: Is fitting a LabelEncoder on the *entire* dataset (instead of only on the training set) an equivalent "sin" (i.e. a common ML mistake) as say doing so with a Scaler or some other preprocessing technique? If the answer is yes (which is w

[Scikit-learn-general] LabelEncoder with never seen before values

2014-01-09 Thread Christian Jauvin
Hi, If a LabelEncoder has been fitted on a training set, it might break if it encounters new values when used on a test set. The only solution I could come up with for this is to map everything new in the test set (i.e. not belonging to any existing class) to "", and then explicitly add a corresp
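
For readers who want to try the workaround described in the message above, here is a minimal sketch of one way it could look. The placeholder string "<unknown>" and the variable names are assumptions made for illustration; the archived message does not show which placeholder was actually used, and this is not presented as the poster's own code.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical placeholder for values never seen during fitting (an assumption).
UNKNOWN = "<unknown>"

train_values = ["cat", "dog", "cat", "bird"]
test_values = ["dog", "fish", "cat"]  # "fish" does not occur in the training set

# Fit on the training values plus the placeholder, so the encoder
# has an explicit class reserved for anything new.
encoder = LabelEncoder()
encoder.fit(train_values + [UNKNOWN])

# Replace every test value the encoder does not know with the placeholder,
# then transform as usual.
known = set(encoder.classes_)
test_mapped = [v if v in known else UNKNOWN for v in test_values]
encoded = encoder.transform(test_mapped)

print(list(encoder.classes_))
print(list(encoded))
```

Without the mapping step, calling `transform` on the raw test values raises a `ValueError`, because "fish" is not among the fitted classes. Since only the training values and a fixed placeholder are used for fitting, this sketch does not look at the test set when building the encoding.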