Hi Bala,

Can't you build a simple dictionary and map those values to numbers?
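A minimal sketch of that dictionary idea, in plain Python with no Spark dependency. The sample rows, `categorical_cols`, and the helper names are all hypothetical and only illustrate the approach; they are not taken from the actual dataset. It assumes each row holds features only (no label column).

```python
# Hypothetical sample rows: the values are made up for illustration.
rows = [
    ["red", "small", 1.5],
    ["blue", "large", 2.0],
    ["red", "large", 0.5],
]
categorical_cols = [0, 1]  # indices of the string-valued columns (assumed)


def build_mappings(rows, categorical_cols):
    """Build one dict per categorical column: string value -> integer code."""
    mappings = {col: {} for col in categorical_cols}
    for row in rows:
        for col in categorical_cols:
            codes = mappings[col]
            # Assign the next unused integer the first time a value is seen.
            codes.setdefault(row[col], len(codes))
    return mappings


def encode(row, mappings):
    """Replace each categorical string with its integer code, as a float."""
    return [float(mappings[i][v]) if i in mappings else float(v)
            for i, v in enumerate(row)]


mappings = build_mappings(rows, categorical_cols)
encoded = [encode(r, mappings) for r in rows]

# Feature index -> number of distinct categories. This is the shape that
# MLlib's RandomForest.trainClassifier takes as categoricalFeaturesInfo.
categorical_features_info = {col: len(codes) for col, codes in mappings.items()}
```

Passing the category counts through `categoricalFeaturesInfo` should let the random forest treat the codes as unordered categories rather than as ordered numbers; without it, the integer codes would be read as continuous values.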
Cheers
Guillaume

On 5 November 2015 at 09:54, Balachandar R.A. <balachandar...@gmail.com> wrote:

> Hi,
>
> I am new to Spark MLlib and machine learning. I have a CSV file that
> consists of around 100 thousand rows and 20 columns. Of these 20 columns,
> 10 contain string values. The values in these columns are not necessarily
> unique. They are categorical, that is, each value is one among, say, 10
> possible values. To start with, I could run the examples, especially the
> random forest algorithm, on my local Spark (1.5.1) platform. However, I
> have a challenge with my dataset due to these strings, as the APIs take
> numerical values. Can anyone tell me how I can map these categorical
> values (strings) into numbers and use them with random forest algorithms?
> Any example will be greatly appreciated.
>
> regards
>
> Bala

--
PGP KeyID: 2048R/EA31CFC9 subkeys.pgp.net