Re: spark mllib kmeans

2015-05-11 Thread Driesprong, Fokko
Hi Paul,

I would say that it should be possible, but you'll need a different distance measure that conforms to your coordinate system.

2015-05-11 14:59 GMT+02:00 Pa Rö:
> hi,
>
> it is possible to use a custom distance measure and a other data typ as
> vector?
> i want cluster temporal geo dat
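
A minimal sketch of what such a distance could look like for temporal geo data. It is plain Python, not the Spark MLlib API. Assumptions: points are hypothetical (lat, lon, t_hours) tuples, the great-circle (haversine) distance handles the spherical coordinates, and an assumed `km_per_hour` weight folds the time gap into the same unit. Because the arithmetic mean of the points is not guaranteed to minimise a non-Euclidean distance, the update step swaps in medoids (a k-medoids-style variant) instead of the usual k-means centroid.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius in kilometres


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def spatio_temporal_distance(p, q, km_per_hour=1.0):
    """Hypothetical metric for (lat, lon, t_hours) points: great-circle distance
    plus the time gap, scaled by an assumed km_per_hour weight."""
    return haversine_km(p[0], p[1], q[0], q[1]) + km_per_hour * abs(p[2] - q[2])


def assign(points, centers, dist):
    """Assignment step: index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda k: dist(p, centers[k])) for p in points]


def medoid(members, dist):
    """Update step: the member with the smallest total distance to the others.
    Used instead of the arithmetic mean, which need not minimise a non-Euclidean distance."""
    return min(members, key=lambda c: sum(dist(c, x) for x in members))


def k_medoids(points, initial_centers, dist=spatio_temporal_distance, max_iter=20):
    """Alternate assignment and medoid updates until the labels stop changing."""
    centers = list(initial_centers)
    labels = assign(points, centers, dist)
    for _ in range(max_iter):
        # Empty clusters keep their previous center.
        centers = [
            medoid([p for p, lab in zip(points, labels) if lab == k], dist) if k in labels else c
            for k, c in enumerate(centers)
        ]
        new_labels = assign(points, centers, dist)
        if new_labels == labels:
            break
        labels = new_labels
    return centers, labels


if __name__ == "__main__":
    # (lat, lon, hours): two points near Amsterdam, two near Paris
    pts = [(52.37, 4.90, 0.0), (52.38, 4.91, 1.0), (48.85, 2.35, 0.0), (48.86, 2.34, 2.0)]
    centers, labels = k_medoids(pts, [pts[0], pts[2]])
    print(labels)  # [0, 0, 1, 1]
```

The `km_per_hour` weight is the main design choice: it decides how many kilometres one hour of separation is worth, and would need tuning for real data.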

Re: MLLib SVM probability

2015-05-04 Thread Driesprong, Fokko
Hi Robert,

I would say that the sign of the numbers represents the class of the input vector. What kind of data are you using, and what kind of training set? Fundamentally, an SVM can only separate two classes; you can do one-vs-rest, as you mentioned. I don't see how LVQ can
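
A plain-Python sketch of both ideas from the reply: a linear SVM's class is the sign of its raw margin w·x + b, and one-vs-rest extends this to k classes by training one binary model per class and picking the largest margin. The weights below are made-up illustration values, not trained models. Spark MLlib's `SVMModel` should behave similarly (thresholding the margin, at 0.0 by default, and returning the raw score after `clearThreshold()`), but this code does not use that API.

```python
def margin(weights, intercept, x):
    """Raw decision value w . x + b of a linear SVM."""
    return sum(w * xi for w, xi in zip(weights, x)) + intercept


def predict_binary(weights, intercept, x, threshold=0.0):
    """Binary prediction: which side of the hyperplane x lies on (1.0 or 0.0)."""
    return 1.0 if margin(weights, intercept, x) > threshold else 0.0


def predict_one_vs_rest(models, x):
    """One-vs-rest: `models` maps each class label to the (weights, intercept) of a
    binary SVM trained as "this class vs. all others"; the largest raw margin wins."""
    def score(label):
        weights, intercept = models[label]
        return margin(weights, intercept, x)
    return max(models, key=score)


if __name__ == "__main__":
    print(predict_binary([1.0, -1.0], 0.0, [0.2, 0.9]))  # 0.0 (margin is -0.7)
    ovr = {"a": ([1.0, 0.0], 0.0), "b": ([0.0, 1.0], 0.0), "c": ([-1.0, -1.0], 0.0)}
    print(predict_one_vs_rest(ovr, [2.0, 0.5]))  # a
```

Comparing raw margins across independently trained binary models assumes their scores are on a comparable scale, which is a known weakness of one-vs-rest.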

Re: spark 1.3.1

2015-05-04 Thread Driesprong, Fokko
Hi Saurabh,

Did you check the Maven log?

2015-05-04 15:17 GMT+02:00 Saurabh Gupta:
> HI,
>
> I am trying to build a example code given at
>
> https://spark.apache.org/docs/latest/sql-programming-guide.html#interoperating-with-rdds
>
> code is:
>
> // Import factory methods provided by DataTy

Re: Compute pairwise distance

2015-04-30 Thread Driesprong, Fokko
>> This is my first thought, please suggest any further improvement:
>> 1. Create a rdd of your dataset
>> 2. Do an cross join to generate pairs
>> 3. Apply reducebykey and compute distance. You will get a rdd with
>> keypairs and distance
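
A minimal, single-machine sketch of the three quoted steps in plain Python (not the Spark API). In Spark, the corresponding RDD operations would presumably be `zipWithIndex`, `cartesian`, `filter` and `map`. The filter on i < j exploits symmetry, and the distance step is a plain map, so `reduceByKey` does not seem necessary for it. Note that the cross join still enumerates all n * n ordered pairs before the filter discards half of them.

```python
import itertools
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def pairwise_distances(points, dist=euclidean):
    """Return {(i, j): dist(points[i], points[j])} for every unordered pair i < j."""
    # 1. Attach an index to each point so pairs can be keyed.
    indexed = list(enumerate(points))
    # 2. Cross join the dataset with itself: all n * n ordered pairs.
    pairs = itertools.product(indexed, indexed)
    # Keep each unordered pair once; drops self-pairs and mirrored duplicates.
    unique = ((a, b) for a, b in pairs if a[0] < b[0])
    # 3. Map every pair to its key and distance (no aggregation needed here).
    return {(a[0], b[0]): dist(a[1], b[1]) for a, b in unique}


if __name__ == "__main__":
    print(pairwise_distances([(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]))
    # {(0, 1): 5.0, (0, 2): 10.0, (1, 2): 5.0}
```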

Compute pairwise distance

2015-04-29 Thread Driesprong, Fokko
Dear Sparkers,

I am working on an algorithm that requires the pairwise distance between all points (e.g. DBSCAN, LOF, etc.). Computing this for *n* points will produce an n^2 matrix. If the distance measure is symmetric, this can be reduced to roughly (n^2)/2. What would be the most efficient way of co
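
The saving from symmetry can be made exact. Assuming a metric with d(x_i, x_i) = 0 and d(x_i, x_j) = d(x_j, x_i), only the strict upper triangle of the n × n matrix has to be computed:

```latex
\[
\#\{(i, j) : 1 \le i < j \le n\} = \binom{n}{2} = \frac{n(n-1)}{2} = \frac{n^2 - n}{2} \approx \frac{n^2}{2}
\]
\[
n = 10^6 \;\Rightarrow\; \frac{10^6\,(10^6 - 1)}{2} = 499\,999\,500\,000 \approx 5 \times 10^{11}
\]
```

So symmetry halves the work, but the number of distances still grows quadratically in n.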