Hi Sean,
Thanks for the reply!
Is there anything already available in Spark that can fix the depth of
categorical variables? The OneHotEncoder changes the length of the vector
it creates depending on the number of distinct values coming in the stream.
Is there any parameter available with the
Yeah, for this to work, you need to know the number of distinct values
a categorical feature will take on, ever. Sometimes that's known,
sometimes it's not.
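
To illustrate the known-values case, here is a minimal plain-Python sketch (not Spark's OneHotEncoder). It assumes the full category list is declared up front; the helper name one_hot_fixed, the COLORS list, and the extra "unseen" slot are all hypothetical choices made for the example.

```python
from typing import List, Sequence


def one_hot_fixed(value: str, categories: Sequence[str]) -> List[float]:
    """Encode `value` against a fixed, pre-declared category list.

    The vector length is always len(categories) + 1. The last slot is an
    assumed catch-all for values that were not declared up front.
    """
    vec = [0.0] * (len(categories) + 1)
    try:
        idx = categories.index(value)
    except ValueError:
        idx = len(categories)  # unseen value goes to the catch-all slot
    vec[idx] = 1.0
    return vec


# Hypothetical category list, fixed before any data arrives.
COLORS = ["red", "green", "blue"]

# Two batches with different sets of distinct values give the same width.
batch_1 = [one_hot_fixed(v, COLORS) for v in ["red", "blue"]]
batch_2 = [one_hot_fixed(v, COLORS) for v in ["green", "purple"]]
```

Because the width comes from the declared list rather than from the data, each batch in the stream produces vectors of the same length.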
One option is to use an algorithm that can use categorical features
directly, like decision trees.
You could consider hashing your features
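
A rough sketch of the hashing idea, in plain Python rather than any Spark API: each "name=value" pair is hashed into one of a fixed number of buckets, so the vector width never depends on how many distinct values show up. The function name hash_features, the bucket count of 16, and the use of MD5 as the hash are assumptions for illustration only.

```python
import hashlib
from typing import Dict, List


def hash_features(features: Dict[str, str], num_buckets: int = 16) -> List[float]:
    """Map categorical features into a fixed-width vector via hashing."""
    vec = [0.0] * num_buckets
    for name, value in features.items():
        token = f"{name}={value}".encode("utf-8")
        # Use a stable hash; Python's built-in hash() is salted per process.
        digest = hashlib.md5(token).digest()
        idx = int.from_bytes(digest[:8], "big") % num_buckets
        vec[idx] += 1.0  # distinct values may collide in the same bucket
    return vec


row_a = hash_features({"country": "US", "browser": "firefox"})
row_b = hash_features({"country": "a-never-seen-before-value", "browser": "chrome"})
```

The trade-off is collisions: two different values can land in the same bucket, and a larger num_buckets makes that less likely at the cost of a wider vector.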