Re: Handling categorical variables in StreamingLogisticRegressionwithSGD

2016-07-13 Thread kundan kumar

Hi Sean, thanks for the reply! Is there anything already available in Spark that can fix the depth of categorical variables? The OneHotEncoder changes the length of the vector it creates depending on the number of distinct values coming in the stream. Is there any parameter available with the

Re: Handling categorical variables in StreamingLogisticRegressionwithSGD

2016-07-12 Thread Sean Owen

Yeah, for this to work, you need to know the number of distinct values a categorical feature will take on, ever. Sometimes that's known, sometimes it's not. One option is to use an algorithm that can use categorical features directly, like decision trees. You could consider hashing your features
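
To illustrate the hashing idea mentioned above, here is a minimal sketch in plain Python (no Spark required). It maps each categorical value to one of a fixed number of buckets, so the vector length never changes no matter how many distinct values show up in the stream. The bucket count, the helper names `bucket_index` and `hash_features`, and the sample field names are all assumptions made for illustration; they are not taken from the thread or from Spark's API.

```python
import hashlib

# Assumed fixed vector width; in practice pick it large enough to keep collisions rare.
NUM_BUCKETS = 16


def bucket_index(feature: str, value: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a (feature, value) pair to a stable bucket in [0, num_buckets)."""
    key = f"{feature}={value}".encode("utf-8")
    # md5 is used only as a stable hash; Python's built-in hash() is salted per process.
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def hash_features(record: dict, num_buckets: int = NUM_BUCKETS) -> list:
    """Encode a record of categorical fields as a fixed-length count vector."""
    vec = [0.0] * num_buckets
    for feature, value in record.items():
        # Colliding pairs simply add up in the same bucket.
        vec[bucket_index(feature, str(value), num_buckets)] += 1.0
    return vec


# Hypothetical records: the second contains a value never seen before.
seen = hash_features({"country": "IN", "device": "mobile"})
unseen = hash_features({"country": "ZZ", "device": "mobile"})
print(len(seen), len(unseen))
```

Both vectors have the same length even though the second record carries a previously unseen value, which is the property a streaming model needs. The trade-off is that distinct values can collide in one bucket. Spark's `HashingTF` (with its `numFeatures` parameter) applies the same general idea, though whether it fits this particular pipeline would need checking.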