How many labels does your dataset have? -Xiangrui

On Sat, Apr 26, 2014 at 6:03 PM, DB Tsai <dbt...@stanford.edu> wrote:
> Which version of MLlib are you using? In Spark 1.0, MLlib will
> support sparse feature vectors, which will improve performance a lot
> when computing the distance between points and centroids.
>
> Sincerely,
>
> DB Tsai
> -------------------------------------------------------
> My Blog: https://www.dbtsai.com
> LinkedIn: https://www.linkedin.com/in/dbtsai
>
>
> On Sat, Apr 26, 2014 at 5:49 AM, John King <usedforprinting...@gmail.com> wrote:
>> I'm just wondering: are the SparseVector calculations really taking the
>> sparsity into account, or just converting to dense?
>>
>>
>> On Fri, Apr 25, 2014 at 10:06 PM, John King <usedforprinting...@gmail.com> wrote:
>>>
>>> I've been trying to use the Naive Bayes classifier. Each example in the
>>> dataset has about 2 million features, only about 20-50 of which are non-zero,
>>> so the vectors are very sparse. I keep running out of memory, though, even
>>> with only about 1000 examples on 30 GB of RAM, while the entire dataset is
>>> 4 million examples. I would also like to note that I'm using the sparse
>>> vector class.
>>
>>

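For a rough sense of the numbers discussed in this thread, here is a
back-of-the-envelope sketch in plain Python (no Spark needed). It rests on
assumptions that the thread does not confirm: 8-byte doubles, 4-byte
integer indices, no object overhead, and a Naive Bayes model that keeps one
dense row of per-feature weights for each label. The function names are
made up for illustration and are not MLlib API.

```python
# Rough memory estimates; all sizes ignore JVM/object overhead (assumption).
BYTES_PER_DOUBLE = 8
BYTES_PER_INT = 4


def dense_vector_bytes(num_features):
    """One double per feature, whether or not it is zero."""
    return num_features * BYTES_PER_DOUBLE


def sparse_vector_bytes(num_nonzero):
    """One (index, value) pair per non-zero feature."""
    return num_nonzero * (BYTES_PER_INT + BYTES_PER_DOUBLE)


def dense_model_bytes(num_labels, num_features):
    """Assumed model layout: one dense weight row per label."""
    return num_labels * dense_vector_bytes(num_features)


num_features = 2_000_000   # from the original question
num_nonzero = 50           # upper end of the 20-50 range in the question

print(dense_vector_bytes(num_features))   # 16000000 bytes per example
print(sparse_vector_bytes(num_nonzero))   # 600 bytes per example

# The number of labels is not given in the thread; these are hypothetical.
for num_labels in (2, 100, 1000):
    gigabytes = dense_model_bytes(num_labels, num_features) / 1e9
    print(num_labels, "labels:", gigabytes, "GB")
```

Under these assumptions a single densified example costs about 16 MB, so
1000 of them would already take about 16 GB, and the per-label weight rows
grow linearly with the number of labels. Both effects are only possible
explanations, not a diagnosis of the actual job.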