If your corpus is large (NLP), this is indeed the best solution; otherwise
(a few words, i.e. categories) I guess you will end up with the same result.
On Friday, 6 November 2015, Balachandar R.A. wrote:
Hi Guillaume,
This is always an option. However, I read about HashingTF, which does
exactly this quite efficiently and can scale too. Hence, I am looking for a
solution using this technique.
Regards,
Bala
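For readers unfamiliar with the hashing approach mentioned above, here is a minimal plain-Python sketch of the idea: each string is hashed into one of a fixed number of buckets, so no dictionary has to be built or stored. This is only an illustration, not Spark's implementation. The helper names (`hash_bucket`, `hashing_tf`), the use of MD5, the bucket count of 16 and the sample row are all assumptions made for the example. In Spark itself the corresponding class is `HashingTF` (for example `pyspark.mllib.feature.HashingTF`), which uses its own hash function.

```python
import hashlib


def hash_bucket(term, num_features):
    """Map a string to a bucket index in [0, num_features).

    MD5 is used here only because it is deterministic across runs;
    it is an assumption of this sketch, not what Spark uses.
    """
    digest = hashlib.md5(term.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_features


def hashing_tf(terms, num_features=16):
    """Return a sparse term-frequency vector as {bucket_index: count}."""
    vec = {}
    for term in terms:
        i = hash_bucket(term, num_features)
        vec[i] = vec.get(i, 0.0) + 1.0
    return vec


# Hypothetical row of categorical string values (made up for illustration).
row = ["red", "large", "red"]
features = hashing_tf(row, num_features=16)
```

Note that two different strings can land in the same bucket. With only a handful of categories and a reasonably large number of buckets this is unlikely, but unlike a dictionary the mapping cannot be inverted.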
On 5 November 2015 at 18:50, tog wrote:
Hi Bala
Can't you build a simple dictionary and map those values to numbers?
Cheers
Guillaume
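The dictionary suggestion above could look roughly like the following plain-Python sketch. The function name `build_index` and the sample column are hypothetical, and a real job on 100 thousand rows would apply the same idea per column.

```python
def build_index(values):
    """Assign each distinct string an integer id, in order of first appearance."""
    index = {}
    for value in values:
        if value not in index:
            index[value] = len(index)
    return index


# Hypothetical categorical column (values made up for illustration).
column = ["red", "green", "red", "blue", "green"]
index = build_index(column)

# Replace every string by its integer id.
encoded = [index[value] for value in column]
```

The resulting ids are arbitrary labels, so they should be treated as categorical rather than ordered numeric values when passed to a learning algorithm.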
On 5 November 2015 at 09:54, Balachandar R.A. wrote:
Hi,
I am new to Spark MLlib and machine learning. I have a CSV file that
consists of around 100 thousand rows and 20 columns. Of these 20 columns,
10 contain string values. The values in these columns are not necessarily
unique. They are kind of categorical, that is, the values could be one
amoun