Re: [mllib] GradientDescent requires huge memory for storing weight vector

2015-01-12 Thread Reza Zadeh
I guess you're not actually using that many distinct features (e.g. < 10M),
and hashing the index just makes it look that way; is that correct?

If so, a simple dictionary that maps each feature index -> rank can be
broadcast and used everywhere, so you can pass MLlib the feature's rank as
its index.

Reza

On Mon, Jan 12, 2015 at 4:26 PM, Tianshuo Deng wrote:

> Hi,
> Currently in GradientDescent.scala, the weights variable is constructed as
> a dense vector:
>
> initialWeights = Vectors.dense(new Array[Double](numFeatures))
>
> numFeatures is determined in loadLibSVMFile as the maximum feature index.
>
> But when a hash function is used to compute feature indices, this results
> in a huge dense vector being generated, taking up a lot of memory.
>
> Any suggestions?
>
>


[mllib] GradientDescent requires huge memory for storing weight vector

2015-01-12 Thread Tianshuo Deng
Hi,
Currently in GradientDescent.scala, the weights variable is constructed as a
dense vector:

initialWeights = Vectors.dense(new Array[Double](numFeatures))

numFeatures is determined in loadLibSVMFile as the maximum feature index.

But when a hash function is used to compute feature indices, this results in
a huge dense vector being generated, taking up a lot of memory.
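
For illustration only (the 2^24-bucket hash below is a hypothetical example,
not our actual setup): even if only a few thousand distinct features occur,
the maximum hashed index can land near the top of the hash range, so sizing
by max index allocates on the order of 16M doubles (~128 MB):

import scala.util.hashing.MurmurHash3
import org.apache.spark.mllib.linalg.Vectors

// Hypothetical hashing-trick indexing into 2^24 buckets
val numBuckets = 1 << 24
def featureIndex(name: String): Int =
  (MurmurHash3.stringHash(name) & Int.MaxValue) % numBuckets

// loadLibSVMFile-style sizing takes numFeatures = max index + 1, which can
// approach numBuckets, so GradientDescent allocates ~2^24 doubles (~128 MB)
val initialWeights = Vectors.dense(new Array[Double](numBuckets))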

Any suggestions?