[Scikit-learn-general] LabelEncoder with never seen before values

Christian Jauvin Thu, 09 Jan 2014 12:23:57 -0800

Hi,

If a LabelEncoder has been fitted on a training set, its transform() fails
(it raises a ValueError) when it encounters values in a test set that were
not present during fitting.


The only solution I could come up with for this is to map everything new in
the test set (i.e. not belonging to any existing class) to "<unknown>", and
then explicitly add a corresponding class to the LabelEncoder afterward:

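# train and test are pandas DataFrames and c is the name of a column
import numpy as np
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
train[c] = le.fit_transform(train[c])
# replace values not seen during fit with a placeholder label
test[c] = test[c].map(lambda s: s if s in le.classes_ else '<unknown>')
# register the placeholder as an extra class (appended last, so the
# training codes are unchanged but classes_ may no longer be sorted)
le.classes_ = np.append(le.classes_, '<unknown>')
test[c] = le.transform(test[c])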

This works, but is there a better solution?
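
One possible alternative, sketched here in plain Python rather than with
scikit-learn, is to build the label-to-code mapping explicitly and reserve
one extra code for anything unseen, so that classes_ never has to be
modified. The helper names fit_classes and encode_with_unknown and the
sample city values are hypothetical, chosen only for illustration; this is
a rough sketch under those assumptions, not a scikit-learn API.

```python
# Hypothetical helpers, not part of scikit-learn: a plain-Python sketch
# of label encoding that reserves one extra code for unseen values.

def fit_classes(values):
    """Return the sorted unique values, similar to what LabelEncoder keeps in classes_."""
    return sorted(set(values))


def encode_with_unknown(values, classes):
    """Map each value to its index in classes; unseen values get len(classes)."""
    index = {label: i for i, label in enumerate(classes)}
    unknown_code = len(classes)
    return [index.get(v, unknown_code) for v in values]


# Made-up sample data standing in for train[c] and test[c]
train_values = ["paris", "tokyo", "paris", "amsterdam"]
test_values = ["tokyo", "berlin", "amsterdam"]

classes = fit_classes(train_values)
train_codes = encode_with_unknown(train_values, classes)
test_codes = encode_with_unknown(test_values, classes)  # "berlin" gets code 3
```

With pandas, the same dictionary could presumably be applied to a column
with something like test[c].map(index).fillna(len(classes)).astype(int),
since Series.map leaves unmapped values as NaN.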


_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
