Hi,

hm, I think that dropping a column from one-hot encoded features is quite 
uncommon in machine learning practice, at least based on the applications and 
implementations I've seen. My guess is that the one-hot encoded features are 
multicollinear anyway!? There may be certain algorithms that benefit from 
dropping a column, though (e.g., linear regression with an intercept, where 
the full set of dummy columns sums to the intercept column). For instance, 
pandas' get_dummies has a "drop_first" parameter.
I think it would make sense to have such a parameter in the OneHotEncoder as 
well, e.g., for working with pipelines.
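
To make the linear regression case a bit more concrete, here is a minimal 
sketch using pandas' get_dummies. The toy "color" column and the 
with_intercept helper are made up for illustration; the only library feature 
relied on is the "drop_first" parameter mentioned above.

```python
import numpy as np
import pandas as pd

# Toy data; the column name "color" and its values are hypothetical.
df = pd.DataFrame({"color": ["red", "green", "blue", "green", "red"]})

# One dummy column per category (blue, green, red).
full = pd.get_dummies(df["color"], dtype=float)
# Same encoding with the first category (blue) dropped.
reduced = pd.get_dummies(df["color"], drop_first=True, dtype=float)


def with_intercept(X):
    """Prepend a column of ones, as a linear model with an intercept would."""
    return np.column_stack([np.ones(len(X)), X.to_numpy()])


# With all dummy columns, the columns sum to the intercept column,
# so the design matrix has more columns than its rank.
full_rank = np.linalg.matrix_rank(with_intercept(full))
reduced_rank = np.linalg.matrix_rank(with_intercept(reduced))

print("full:   ", full.shape[1] + 1, "columns, rank", full_rank)
print("reduced:", reduced.shape[1] + 1, "columns, rank", reduced_rank)
```

With the full encoding the design matrix is rank deficient once an intercept 
is added; after dropping one column it has full column rank. Whether that 
matters depends on the estimator, which is probably why dropping is uncommon 
in practice.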

Best,
Sebastian


> On Jun 25, 2017, at 7:48 AM, Parminder Singh <parmsingh...@gmail.com> wrote:
> 
> Hy Sci-kittens! :-)
> 
> I was doing the Machine Learning A-Z course on Udemy, where they said that 
> every time one-hot encoding is done, one of the columns should be dropped, 
> because it repeats information already carried by the other columns and is 
> redundant to the model. Instead of having the user find the index and drop 
> the column after preprocessing, I thought OneHotEncoder could have a 
> drop_one variable that automatically removes the last column. What are your 
> thoughts about this? I am new to this community and would like to 
> contribute this myself if it is a possible addition.
> 
> Thanks,
> Trion129
> _______________________________________________
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

