I guess you're not actually using that many features (e.g., fewer than 10M);
hashing the indices just makes it look that way. Is that correct?
If so, a simple dictionary mapping each feature index -> rank can be
broadcast and used everywhere, so you can pass MLlib the feature's
rank as its index.
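A minimal sketch of this idea in plain Scala (the variable names and the sample
hashed indices below are illustrative, not from the original thread; in Spark the
map would be shared via sc.broadcast and applied inside a map() over the RDD):

```scala
object RankRemap {
  def main(args: Array[String]): Unit = {
    // Hypothetical hashed feature indices observed in the data. The hash
    // values are large, but only a few distinct features actually occur.
    val hashedIndices = Seq(1000003, 42, 999999937, 42, 1000003)

    // Build the dictionary: hashed index -> dense rank (0-based, sorted order).
    val rankOf: Map[Int, Int] =
      hashedIndices.distinct.sorted.zipWithIndex.toMap

    // Remap each feature to its rank, yielding compact indices in
    // [0, number of distinct features), suitable as MLlib vector indices.
    val remapped = hashedIndices.map(rankOf)

    println(remapped)
  }
}
```

With the ranks as indices, numFeatures equals the number of distinct features
rather than the maximum hash value, so the dense weight vector stays small.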
Reza
Hi,
Currently in GradientDescent.scala, weights is constructed as a dense
vector:
initialWeights = Vectors.dense(new Array[Double](numFeatures))
And numFeatures is determined in loadLibSVMFile as the maximum index of the
features.
But in the case of using hash function to compute feature ind