Hi,
If a LabelEncoder has been fitted on a training set, transform() raises a
ValueError when it encounters values in a test set that it did not see
during fit.
The only solution I could come up with for this is to map everything new in
the test set (i.e. not belonging to any existing class) to "<unknown>", and
then explicitly add a corresponding class to the LabelEncoder afterward:
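import numpy as np
from sklearn.preprocessing import LabelEncoder

# train and test are pandas DataFrames and c is the name of any column
le = LabelEncoder()
train[c] = le.fit_transform(train[c])
# Replace every test value not seen during fit with a placeholder
known = set(le.classes_)
test[c] = test[c].map(lambda s: s if s in known else '<unknown>')
# Register the placeholder as an extra class (classes_ may no longer be sorted)
le.classes_ = np.append(le.classes_, '<unknown>')
test[c] = le.transform(test[c])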
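A variant of the same idea, as a minimal sketch: include the placeholder at
fit time instead of appending it afterward, so that classes_ stays sorted (as
far as I can tell, some versions of transform() look labels up by binary
search and so depend on that order). The helper names fit_with_unknown and
transform_with_unknown and the UNKNOWN constant are made up for illustration
and are not part of scikit-learn. The sketch also assumes the placeholder
never occurs in the real data.

```python
from sklearn.preprocessing import LabelEncoder

UNKNOWN = "<unknown>"  # hypothetical placeholder; assumed absent from real data


def fit_with_unknown(train_values):
    """Fit a LabelEncoder whose classes include the placeholder from the start."""
    le = LabelEncoder()
    # Fitting on the placeholder as well lets fit() sort it into classes_.
    le.fit(list(train_values) + [UNKNOWN])
    return le


def transform_with_unknown(le, values):
    """Encode values, mapping anything not seen during fit to the placeholder."""
    known = set(le.classes_)
    cleaned = [v if v in known else UNKNOWN for v in values]
    return le.transform(cleaned)


# Toy data standing in for train[c] and test[c]
train_col = ["red", "green", "blue", "green"]
test_col = ["green", "purple", "red"]

le = fit_with_unknown(train_col)
train_codes = transform_with_unknown(le, train_col)
test_codes = transform_with_unknown(le, test_col)  # "purple" gets the placeholder's code
```

The trade-off is that the placeholder takes up one integer code even when the
test set contains nothing new.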
This works, but is there a better solution?
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general