Hi,

hm, I think that dropping a column from one-hot encoded features is quite uncommon in machine learning practice, at least based on the applications and implementations I've seen. My guess is that the one-hot encoded features are multicollinear anyway!? There may be certain algorithms that benefit from dropping a column, though (a simple example is linear regression with an intercept, where the full set of dummy columns for a variable sums to the intercept column). For instance, pandas' get_dummies has a "drop_first" parameter ... I think it would make sense to have such a parameter in the OneHotEncoder as well, e.g., for working with pipelines.
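To illustrate the pandas behavior I mean, here is a minimal sketch. The "color" column and its values are made up for the example, and the rank check with an added intercept column is just my way of showing the redundancy, not anything scikit-learn does:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one categorical feature with three levels.
df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# Full one-hot encoding: one column per category.
full = pd.get_dummies(df["color"], dtype=int)

# drop_first=True removes the first category (in sorted order).
reduced = pd.get_dummies(df["color"], drop_first=True, dtype=int)

print(full.columns.tolist())     # ['blue', 'green', 'red']
print(reduced.columns.tolist())  # ['green', 'red']

# Every row of the full encoding sums to 1, so together with an
# intercept column the design matrix is rank deficient.
intercept = np.ones(len(df))
X_full = np.column_stack([intercept, full.to_numpy()])
X_reduced = np.column_stack([intercept, reduced.to_numpy()])

print(X_full.shape[1], np.linalg.matrix_rank(X_full))        # 4 3
print(X_reduced.shape[1], np.linalg.matrix_rank(X_reduced))  # 3 3
```

With the first column dropped, the intercept plus the remaining dummy columns have full column rank, which is the case where dropping helps.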
Best,
Sebastian

> On Jun 25, 2017, at 7:48 AM, Parminder Singh <parmsingh...@gmail.com> wrote:
>
> Hy Sci-kittens! :-)
>
> I was doing machine learning a-z course on Udemy, there they told that every
> time one-hot encoding is done, one of the columns should be dropped as it is
> like doubling same category twice and redundant to model. I thought if
> instead of having user find the index and drop it after preprocessing,
> OneHotEncoder had a drop_one variable, and it automatically removed the last
> column. What are your thoughts about this? I am new to this community, would
> like to contribute this myself if it is possible addition.
>
> Thanks,
> Trion129
> _______________________________________________
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn