Well, yes, the hack below works (it's all I have time for), but it is not
satisfactory: it is not safe, it is verbose and cumbersome to use, it does not
handle the SparseVector case separately, and it is not complete either.

Out of the hundreds of users on this list, someone must have come up with a
better solution. Could you share it, please?


import breeze.linalg.{DenseVector => BDV, SparseVector => BSV, Vector => BV}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.{Vector => SparkVector}

// Every conversion goes through a dense array, so sparsity is lost.
def toBreeze(v: SparkVector): BV[Double] = BV(v.toArray)

def fromBreeze(bv: BV[Double]): SparkVector = Vectors.dense(bv.toArray)

def add(v1: SparkVector, v2: SparkVector): SparkVector =
  fromBreeze(toBreeze(v1) + toBreeze(v2))

def subtract(v1: SparkVector, v2: SparkVector): SparkVector =
  fromBreeze(toBreeze(v1) - toBreeze(v2))

// Scale every component of v by a (scalar kept on the right of the Breeze vector).
def scalarMultiply(a: Double, v: SparkVector): SparkVector =
  fromBreeze(toBreeze(v) * a)
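
For the SparseVector gap mentioned above, here is a minimal sketch of one
possible direction, not a tested or definitive solution. It assumes the public
fields of MLlib's SparseVector (size, indices, values) and Breeze's
SparseVector (index, data, used). The object name VectorOps and the method
name scale are hypothetical, chosen only for illustration.

```scala
import breeze.linalg.{DenseVector => BDV, SparseVector => BSV, Vector => BV}
import org.apache.spark.mllib.linalg.{SparseVector, Vectors, Vector => SparkVector}

// Hypothetical helper object; not part of Spark or Breeze.
object VectorOps {

  // Convert to Breeze, keeping sparse input sparse.
  // The sparse case shares the underlying arrays with the Spark vector,
  // so the Breeze vector should not be mutated in place.
  def toBreeze(v: SparkVector): BV[Double] = v match {
    case sv: SparseVector => new BSV[Double](sv.indices, sv.values, sv.size)
    case _                => new BDV[Double](v.toArray)
  }

  // Convert back to Spark, keeping a sparse Breeze result sparse.
  def fromBreeze(bv: BV[Double]): SparkVector = bv match {
    case sv: BSV[Double] =>
      // Breeze may allocate more slots than it uses; keep only the active ones.
      Vectors.sparse(sv.length, sv.index.take(sv.used), sv.data.take(sv.used))
    case _ => Vectors.dense(bv.toArray)
  }

  // Fail early with a readable message if the sizes differ.
  private def checkSizes(v1: SparkVector, v2: SparkVector): Unit =
    require(v1.size == v2.size, s"size mismatch: ${v1.size} vs ${v2.size}")

  def add(v1: SparkVector, v2: SparkVector): SparkVector = {
    checkSizes(v1, v2)
    fromBreeze(toBreeze(v1) + toBreeze(v2))
  }

  def subtract(v1: SparkVector, v2: SparkVector): SparkVector = {
    checkSizes(v1, v2)
    fromBreeze(toBreeze(v1) - toBreeze(v2))
  }

  def scale(a: Double, v: SparkVector): SparkVector =
    fromBreeze(toBreeze(v) * a)
}
```

With this sketch, VectorOps.add(Vectors.sparse(4, Array(1, 3), Array(2.0, 4.0)),
Vectors.dense(1.0, 1.0, 1.0, 1.0)) should accept mixed sparse and dense input.
Whether the result comes back sparse or dense depends on which Breeze
implementation is picked at runtime, so callers should not rely on it.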


On Tue, Aug 25, 2015 at 9:41 AM, Sonal Goyal <sonalgoy...@gmail.com> wrote:

> From what I have understood, you probably need to convert your vector to
> breeze and do your operations there. Check
> stackoverflow.com/questions/28232829/addition-of-two-rddmllib-linalg-vectors
> On Aug 25, 2015 7:06 PM, "Kristina Rogale Plazonic" <kpl...@gmail.com>
> wrote:
>
>> Hi all,
>>
>> I'm still not clear on what the best (or ANY) way is to add/subtract
>> two org.apache.spark.mllib.linalg.Vector objects in Scala.
>>
>> Ok, I understand there was a conscious Spark decision not to support
>> linear algebra operations in Scala and leave it to the user to choose a
>> linear algebra library.
>>
>> But, for any newcomer from R or Python, where you don't think twice about
>> adding two vectors, it is such a productivity shot in the foot to have to
>> write your own + operation. I mean, there is support in Spark for p-norm of
>> Vectors, for sqdist between two Vectors, but not for +/-? As I said, I'm a
>> newcomer to linear algebra in Scala and am not familiar with Breeze or
>> apache.commons - I am willing to learn, but would really benefit from
>> guidance from more experienced users. I am also not used to optimizing
>> low-level code and am sure that any hack I do will be just horrible.
>>
>> So, please, could somebody point me to a blog post, documentation, or
>> just patches for this really basic functionality. What do you do to get
>> around it? Am I the only one to have a problem? (And, would it really be so
>> onerous to add +/- to Spark? After all, even org.apache.spark.sql.Column
>> class does have +,-,*,/  )
>>
>> My stupid little use case is to generate some toy data for KMeans, and I
>> need to translate a Gaussian blob to another center (for streaming and
>> nonstreaming KMeans both).
>>
>> Many thanks! (I am REALLY embarrassed to ask such a simple question...)
>>
>> Kristina
>>
>
