Re: Does spark's random forest need categorical features to be one hot encoded?

2017-03-23 Thread Ryan
No, you don't need one-hot encoding. But the features column is a Vector, and a Vector only accepts numbers, so if your feature is a string you need a StringIndexer to convert it to numeric indices first. Here's an example: http://spark.apache.org/docs/latest/ml-classification-regression.html#random-forest-classifier
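To make the indexing step concrete, here is a rough plain-Python sketch of the kind of mapping a StringIndexer produces with its default frequency-based ordering (most frequent label gets index 0). This is only an illustration, not Spark code: the function name `fit_string_index`, the sample `colors` data, and the alphabetical tie-break are assumptions made for the example.

```python
from collections import Counter


def fit_string_index(values):
    """Map each distinct string to a float index, most frequent label first.

    Hypothetical helper that mimics the idea behind StringIndexer's default
    ordering; it is not Spark's implementation.
    """
    counts = Counter(values)
    # Sort by descending frequency; ties are broken alphabetically
    # (an assumption made here so the result is deterministic).
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: float(i) for i, label in enumerate(ordered)}


# Made-up categorical column for illustration.
colors = ["red", "blue", "red", "green", "red", "blue"]

mapping = fit_string_index(colors)
indexed = [mapping[c] for c in colors]

print(mapping)
print(indexed)
```

In an actual Spark pipeline the indexed column would then be assembled into the features Vector and passed to the random forest, which can split on the category indices directly, so no one-hot step is involved.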

Does spark's random forest need categorical features to be one hot encoded?

2017-03-23 Thread Aseem Bansal
I was reading http://datascience.stackexchange.com/questions/5226/strings-as-features-in-decision-tree-random-forest and found that one-hot encoding needs to be done in sklearn. Is that also required in Spark?