[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2015-05-18 Thread derrickburns
Github user derrickburns commented on the pull request: https://github.com/apache/spark/pull/2634#issuecomment-103178283 Ok thx ! Sent from my iPhone > On May 18, 2015, at 8:47 PM, Sean Owen wrote: > > We can't directly but there's an aut

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2015-05-18 Thread derrickburns
Github user derrickburns commented on the pull request: https://github.com/apache/spark/pull/2634#issuecomment-103169503 Sure, but I'm traveling now. Would you mind closing it for me? Sent from my iPhone > On May 18, 2015, at 8:19 PM, Sean Owen wrote: >

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2015-01-25 Thread derrickburns
Github user derrickburns commented on a diff in the pull request: https://github.com/apache/spark/pull/2634#discussion_r23507379 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/metrics/FastEuclideanOps.scala --- @@ -0,0 +1,77 @@ +/* + * Licensed to the

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2015-01-24 Thread derrickburns
Github user derrickburns commented on a diff in the pull request: https://github.com/apache/spark/pull/2634#discussion_r23495418 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/metrics/FastEuclideanOps.scala --- @@ -0,0 +1,77 @@ +/* + * Licensed to the

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2015-01-22 Thread derrickburns
Github user derrickburns commented on a diff in the pull request: https://github.com/apache/spark/pull/2634#discussion_r23430407 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/metrics/FastEuclideanOps.scala --- @@ -0,0 +1,77 @@ +/* + * Licensed to the

[GitHub] spark pull request: [SPARK-3424][MLLIB] cache point distances duri...

2015-01-21 Thread derrickburns
Github user derrickburns commented on the pull request: https://github.com/apache/spark/pull/4144#issuecomment-70969423 @mengxr You may want to refer to the newer code in GitHub <https://github.com/derrickburns/generalized-kmeans-clustering> instead of the old PR

[GitHub] spark pull request: [SPARK-3424][MLLIB] cache point distances duri...

2015-01-21 Thread derrickburns
Github user derrickburns commented on the pull request: https://github.com/apache/spark/pull/4144#issuecomment-70948894 @mengxr FYI, I'm about to work on the performance of clustering millions of sparse vectors of very high dimension particularly when using KL diver

[GitHub] spark pull request: [SPARK-3424][MLLIB] cache point distances duri...

2015-01-21 Thread derrickburns
Github user derrickburns commented on the pull request: https://github.com/apache/spark/pull/4144#issuecomment-70946876 @mengxr A nit: I'd pull the range creation on line 331 out of the inner loop. Sent from my iPhone > On Jan 21, 2015, at 3:37 PM,

[GitHub] spark pull request: [SPARK-3424][MLLIB] cache point distances duri...

2015-01-21 Thread derrickburns
Github user derrickburns commented on the pull request: https://github.com/apache/spark/pull/4144#issuecomment-70946101 @mengxr It looks like the final costs rdd is still persisted in exit from the initialization method. Sent from my iPhone > On Jan 21, 2

[GitHub] spark pull request: [SPARK-3424][MLLIB] cache point distances duri...

2015-01-21 Thread derrickburns
Github user derrickburns commented on the pull request: https://github.com/apache/spark/pull/4144#issuecomment-70941228 @mengxr I back-ported your port to my com.massivedatascience.clusterer GitHub project (modulo the conversion of data to dense form). :) --- If your project is set

[GitHub] spark pull request: [SPARK-3424][MLLIB] cache point distances duri...

2015-01-21 Thread derrickburns
Github user derrickburns commented on the pull request: https://github.com/apache/spark/pull/4144#issuecomment-70940665 The conversion of vectors to dense form will only work if the dimension of the space is small, in which case there was little need to provide vectors in sparse form

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2015-01-19 Thread derrickburns
Github user derrickburns commented on a diff in the pull request: https://github.com/apache/spark/pull/2634#discussion_r23198417 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/metrics/FastEuclideanOps.scala --- @@ -0,0 +1,77 @@ +/* + * Licensed to the

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2015-01-19 Thread derrickburns
Github user derrickburns commented on a diff in the pull request: https://github.com/apache/spark/pull/2634#discussion_r23198020 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/metrics/FastEuclideanOps.scala --- @@ -0,0 +1,77 @@ +/* + * Licensed to the

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2015-01-19 Thread derrickburns
Github user derrickburns commented on the pull request: https://github.com/apache/spark/pull/2634#issuecomment-70589276 In my application (n-gram contexts), the sparse vectors can be of extremely high dimension. To make the problem manageable, I select the k most important dimensions

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2015-01-19 Thread derrickburns
Github user derrickburns commented on the pull request: https://github.com/apache/spark/pull/2634#issuecomment-70583162 @mengxr One more thing regarding sparse vectors. Sparse vectors can become dense under cluster creation, which, in turn, can cause the running time of
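
The densification effect described above can be illustrated with a minimal sketch. It assumes sparse vectors are represented as index-to-value maps; the names `SparseCentroidSketch`, `add`, and `centroid` are hypothetical and do not come from the pull request. The point is only that a cluster mean has a non-zero entry wherever any member does, so its support is the union of the members' supports.

```scala
object SparseCentroidSketch {
  // Illustrative representation: dimension index -> non-zero value.
  type Sparse = Map[Int, Double]

  // Element-wise sum; the result's support is the union of both supports.
  def add(a: Sparse, b: Sparse): Sparse =
    b.foldLeft(a) { case (acc, (i, v)) => acc.updated(i, acc.getOrElse(i, 0.0) + v) }

  // Mean of a non-empty collection of sparse vectors.
  def centroid(points: Seq[Sparse]): Sparse = {
    require(points.nonEmpty, "need at least one point")
    val n = points.size.toDouble
    points.reduce(add).map { case (i, v) => i -> v / n }
  }
}
```

With three points that each have two non-zero entries in disjoint dimensions, the centroid carries six non-zero entries, so every distance computation against it touches more coordinates than a computation against any single point.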

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2015-01-18 Thread derrickburns
Github user derrickburns commented on the pull request: https://github.com/apache/spark/pull/2634#issuecomment-70443890 @mengxr I have implemented several variants of Kullback-Leibler divergence in my separate GitHub repository <https://github.com/derrickbu

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2015-01-16 Thread derrickburns
Github user derrickburns commented on the pull request: https://github.com/apache/spark/pull/2634#issuecomment-70345238 I see the problem with sparse vectors and the KL divergence. I implemented a smoothing operation to approximate KL divergence. Sent from my iPhone
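
The comment does not show which smoothing operation was implemented. As a hedged illustration only, the sketch below uses plain additive smoothing, which is one common choice and not necessarily the author's: the unsmoothed divergence KL(p || q) = sum_i p_i ln(p_i / q_i) is infinite whenever q_i = 0 and p_i > 0, which happens routinely with sparse vectors. The names `SmoothedKLSketch`, `smooth`, `kl`, and the default `eps` are assumptions.

```scala
object SmoothedKLSketch {
  // Add a small constant eps to every coordinate, then renormalize to sum to 1.
  def smooth(p: Array[Double], eps: Double): Array[Double] = {
    require(eps > 0.0, "eps must be positive")
    val shifted = p.map(_ + eps)
    val total = shifted.sum
    shifted.map(_ / total)
  }

  // KL(p || q) computed on smoothed copies, so no coordinate of q is exactly zero.
  def kl(p: Array[Double], q: Array[Double], eps: Double = 1e-9): Double = {
    require(p.length == q.length, "dimension mismatch")
    val ps = smooth(p, eps)
    val qs = smooth(q, eps)
    ps.indices.map(i => ps(i) * math.log(ps(i) / qs(i))).sum
  }
}
```

The result is an approximation whose value depends on `eps`: a smaller `eps` penalizes mismatched supports more heavily, and the sketch assumes non-negative inputs.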

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2015-01-11 Thread derrickburns
Github user derrickburns commented on a diff in the pull request: https://github.com/apache/spark/pull/2634#discussion_r22774169 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/package.scala --- @@ -0,0 +1,145 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2015-01-11 Thread derrickburns
Github user derrickburns commented on a diff in the pull request: https://github.com/apache/spark/pull/2634#discussion_r22768878 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/package.scala --- @@ -0,0 +1,145 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2015-01-10 Thread derrickburns
Github user derrickburns commented on a diff in the pull request: https://github.com/apache/spark/pull/2634#discussion_r22764827 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/metrics/FastEuclideanOps.scala --- @@ -0,0 +1,77 @@ +/* + * Licensed to the

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2015-01-09 Thread derrickburns
Github user derrickburns commented on the pull request: https://github.com/apache/spark/pull/2634#issuecomment-69404412 Thanks for the information on the speedup that you obtained by eliminating Breeze. I was unaware that the performance is so poor. To what do you attribute

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2015-01-06 Thread derrickburns
Github user derrickburns commented on the pull request: https://github.com/apache/spark/pull/2634#issuecomment-68985129 The pull request that you integrated on December 3 is redundant with this one. Therefore you need not worry about merging in those changes. Simply select this

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2015-01-05 Thread derrickburns
Github user derrickburns commented on the pull request: https://github.com/apache/spark/pull/2634#issuecomment-68804845 I've said this before, so please forgive me for being repetitive. The new implementation is a rewrite, not a patch, so it is not possible to parce

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-12-30 Thread derrickburns
Github user derrickburns commented on the pull request: https://github.com/apache/spark/pull/2634#issuecomment-68425655 Thx for taking this on. Sent from my iPhone > On Dec 30, 2014, at 3:23 PM, Xiangrui Meng wrote: > > @derrickburns I'm goin

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-12-30 Thread derrickburns
Github user derrickburns commented on the pull request: https://github.com/apache/spark/pull/2634#issuecomment-68414436 No problem. Thx! Sent from my iPhone > On Dec 30, 2014, at 3:23 PM, Xiangrui Meng wrote: > > @derrickburns I'm goin

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-12-28 Thread derrickburns
Github user derrickburns commented on the pull request: https://github.com/apache/spark/pull/2634#issuecomment-68218597 That would be great! On Sat, Dec 27, 2014 at 12:59 PM, Nicholas Chammas wrote: > @mengxr <https://github.com/mengxr> Now that 1.2.0 is ou

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-10-23 Thread derrickburns
Github user derrickburns commented on the pull request: https://github.com/apache/spark/pull/2634#issuecomment-60336780 @mengxr Is there interest in this pull request? Should I delete it?

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-10-08 Thread derrickburns
Github user derrickburns commented on the pull request: https://github.com/apache/spark/pull/2634#issuecomment-58449199 The closure data capture problem occurs at MultiKMeans.scala:105.

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-10-06 Thread derrickburns
Github user derrickburns commented on the pull request: https://github.com/apache/spark/pull/2634#issuecomment-58126789 @mengxr I restored the KMeans class and public methods on that class. I did not mark them for deprecation. I also re-formatted the code to follow the Spark

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-10-06 Thread derrickburns
Github user derrickburns commented on the pull request: https://github.com/apache/spark/pull/2634#issuecomment-58122976 Ah, I see. Will fix. Sent from my iPhone > On Oct 6, 2014, at 5:55 PM, Xiangrui Meng wrote: > > @derrickburns I don't know

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-10-06 Thread derrickburns
Github user derrickburns commented on the pull request: https://github.com/apache/spark/pull/2634#issuecomment-58106509 @mengxr Is there an IntelliJ or Eclipse configuration that I can use to reformat the code according to the guidelines? The breaking change in

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-10-06 Thread derrickburns
Github user derrickburns commented on the pull request: https://github.com/apache/spark/pull/2634#issuecomment-58106085 I know exactly which closure is exceeding the size limitation. The problem is that I cannot see how to make the closure capture less data! On Mon, Oct 6
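
The closure in question is not shown in this thread, so the following is only a generic sketch of how a Scala closure can capture more than intended; it is not the code at the reported location. It assumes a hypothetical `Model` class holding a large array and a small parameter, and it measures size with plain Java serialization, which is also how Spark serializes task closures. A function that reads a field captures `this`, and with it every other field; copying the field to a local `val` first limits the capture to that value.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

object ClosureCaptureSketch {
  // Hypothetical holder: a large table plus one small parameter.
  class Model(val table: Array[Double], val scale: Double) extends Serializable {
    // Reads the field `scale`, so the function captures `this` (and with it `table`).
    def scaleViaField: Double => Double = x => x * scale

    // Copies the field to a local val first, so only that Double is captured.
    def scaleViaLocal: Double => Double = {
      val s = scale
      x => x * s
    }
  }

  // Size in bytes of an object under Java serialization.
  def serializedSize(obj: AnyRef): Int = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    bytes.size()
  }
}
```

Comparing `serializedSize(model.scaleViaField)` with `serializedSize(model.scaleViaLocal)` for a model with a large `table` shows the difference: the first grows with the table, the second does not.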

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-10-03 Thread derrickburns
Github user derrickburns commented on the pull request: https://github.com/apache/spark/pull/2634#issuecomment-57894117 I ran the style tests. They pass. Is there something else in the style guide that is not captured in the tests? I have expended much effort to avoid

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-10-02 Thread derrickburns
Github user derrickburns commented on the pull request: https://github.com/apache/spark/pull/2634#issuecomment-57714029 @mengxr This is as expected. I need help in solving the closure data capture problem. Sent from my iPhone > On Oct 2, 2014, at 2:10

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-10-02 Thread derrickburns
Github user derrickburns commented on the pull request: https://github.com/apache/spark/pull/2634#issuecomment-57698333 @mengxr I created a new clean pull request. I *still need help* to understand/fix a closure that is capturing too much data.

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-10-02 Thread derrickburns
GitHub user derrickburns opened a pull request: https://github.com/apache/spark/pull/2634 [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-3424] [RESUBMIT] MLLIB K-Means Clusterer This commit introduces a general distance function trait, `PointOps`, for the Spark K-Means clusterer
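
The description is cut off before the `PointOps` interface appears, so its real signature is unknown here. As a rough sketch of the general idea of a pluggable distance function, assuming a hypothetical trait `Divergence` with a single `distance` method (not the PR's API), a clusterer can be written against the trait and handed a concrete distance at the call site:

```scala
object DistanceTraitSketch {
  // Hypothetical stand-in for a pluggable distance; not the PR's actual `PointOps` signature.
  trait Divergence {
    def distance(point: Array[Double], center: Array[Double]): Double
  }

  // Squared Euclidean distance, the usual K-Means choice.
  object SquaredEuclidean extends Divergence {
    def distance(point: Array[Double], center: Array[Double]): Double = {
      require(point.length == center.length, "dimension mismatch")
      var sum = 0.0
      var i = 0
      while (i < point.length) {
        val d = point(i) - center(i)
        sum += d * d
        i += 1
      }
      sum
    }
  }

  // Index of the closest center under whichever divergence is supplied.
  def closest(point: Array[Double], centers: IndexedSeq[Array[Double]], ops: Divergence): Int = {
    require(centers.nonEmpty, "need at least one center")
    centers.indices.minBy(i => ops.distance(point, centers(i)))
  }
}
```

Swapping in another implementation of the trait changes the assignment step without touching `closest`.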

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-10-02 Thread derrickburns
Github user derrickburns commented on the pull request: https://github.com/apache/spark/pull/2419#issuecomment-57696626 I will close this pull request and create another.

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-10-02 Thread derrickburns
Github user derrickburns closed the pull request at: https://github.com/apache/spark/pull/2419

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-10-02 Thread derrickburns
Github user derrickburns commented on the pull request: https://github.com/apache/spark/pull/2419#issuecomment-57691032 Uh oh. Sent from my iPhone > On Oct 2, 2014, at 11:35 AM, Nicholas Chammas wrote: > > @derrickburns Side note: It looks like

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-09-30 Thread derrickburns
Github user derrickburns commented on the pull request: https://github.com/apache/spark/pull/2419#issuecomment-57404998 @mengxr 1. I fixed the merge issue and also remerged to capture more recent changes. 2. I did as you suggested and introduced a local variable to hold a

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-09-29 Thread derrickburns
Github user derrickburns commented on the pull request: https://github.com/apache/spark/pull/2419#issuecomment-57238377 Will do Sent from my iPhone > On Sep 29, 2014, at 2:32 PM, Xiangrui Meng wrote: > > @derrickburns Could you merge the PR cleanly

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-09-26 Thread derrickburns
Github user derrickburns commented on the pull request: https://github.com/apache/spark/pull/2419#issuecomment-57023670 @mengxr I ran the tests according to your instructions. One issue remains that I cannot resolve. *I need help with that issue*. ## Issues First

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-09-25 Thread derrickburns
Github user derrickburns commented on the pull request: https://github.com/apache/spark/pull/2419#issuecomment-56907546 @mengxr I will do as you suggest. On Thu, Sep 25, 2014 at 4:15 PM, Xiangrui Meng wrote: > @derrickburns <https://github.com/derrickburns

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-09-24 Thread derrickburns
Github user derrickburns commented on the pull request: https://github.com/apache/spark/pull/2419#issuecomment-56775648 I know that this may not be the most pressing issue for anyone on Spark, but I would like to complete this work. I cannot do that without some help. @mengxr

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-09-22 Thread derrickburns
Github user derrickburns commented on the pull request: https://github.com/apache/spark/pull/2419#issuecomment-56422112 I still need some help with determining why some tests fail.

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-09-20 Thread derrickburns
Github user derrickburns commented on the pull request: https://github.com/apache/spark/pull/2419#issuecomment-56285643 I deleted that file in my original pull request. — Sent from Mailbox On Fri, Sep 19, 2014 at 4:32 PM, Nicholas Chammas wrote: > FYI

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-09-18 Thread derrickburns
Github user derrickburns commented on the pull request: https://github.com/apache/spark/pull/2419#issuecomment-56127880 I don't understand the test failure. Can someone help me? Sent from my iPhone > On Sep 16, 2014, at 6:59 PM, Nicholas Chammas

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-09-17 Thread derrickburns
Github user derrickburns commented on the pull request: https://github.com/apache/spark/pull/2419#issuecomment-55996597 @mengxr, can someone help me to understand the test failure? The test "task size should be small in both training and prediction" in

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-09-17 Thread derrickburns
Github user derrickburns commented on the pull request: https://github.com/apache/spark/pull/2419#issuecomment-55992851 Jenkins, retest this please. On Wed, Sep 17, 2014 at 9:08 PM, Derrick Burns wrote: > Tests fixed. Please re-run. > > On We

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-09-17 Thread derrickburns
Github user derrickburns commented on the pull request: https://github.com/apache/spark/pull/2419#issuecomment-55992818 Tests fixed. Please re-run. On Wed, Sep 17, 2014 at 7:04 PM, Apache Spark QA wrote: > QA tests have finished >

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-09-17 Thread derrickburns
Github user derrickburns commented on the pull request: https://github.com/apache/spark/pull/2419#issuecomment-55972212 @mengxr per your request, here is a pull request that addresses many of the outstanding issues with the 1.1.0 Spark K-Means clusterer.

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-09-17 Thread derrickburns
Github user derrickburns commented on the pull request: https://github.com/apache/spark/pull/2419#issuecomment-55971460 To understand and evaluate this pull request, I would suggest that a reviewer do the following: 1) Look at the `PointOps` trait and its `FastEuclideanOps

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-09-17 Thread derrickburns
Github user derrickburns commented on the pull request: https://github.com/apache/spark/pull/2419#issuecomment-55968915 @srowen, I moved most of the line comments into code comments that I have committed. Thx!

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-09-17 Thread derrickburns
Github user derrickburns commented on the pull request: https://github.com/apache/spark/pull/2419#issuecomment-55898846 I agree that some of my comments should go in the code. As for the Big Bang change, I understand your concern. The distance function touches practically

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-09-16 Thread derrickburns
Github user derrickburns commented on a diff in the pull request: https://github.com/apache/spark/pull/2419#discussion_r17640478 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/MultiKMeans.scala --- @@ -0,0 +1,129 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-09-16 Thread derrickburns
Github user derrickburns commented on a diff in the pull request: https://github.com/apache/spark/pull/2419#discussion_r17640379 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/package.scala --- @@ -0,0 +1,140 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-09-16 Thread derrickburns
Github user derrickburns commented on a diff in the pull request: https://github.com/apache/spark/pull/2419#discussion_r17640296 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/clustering/KMeansSuite.scala --- @@ -87,7 +87,7 @@ class KMeansSuite extends FunSuite with

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-09-16 Thread derrickburns
Github user derrickburns commented on a diff in the pull request: https://github.com/apache/spark/pull/2419#discussion_r17640286 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/clustering/KMeansSuite.scala --- @@ -75,7 +75,7 @@ class KMeansSuite extends FunSuite with

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-09-16 Thread derrickburns
Github user derrickburns commented on a diff in the pull request: https://github.com/apache/spark/pull/2419#discussion_r17640161 --- Diff: mllib/src/test/java/org/apache/spark/mllib/clustering/JavaKMeansSuite.java --- @@ -76,15 +76,12 @@ public void runKMeansUsingConstructor

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-09-16 Thread derrickburns
Github user derrickburns commented on a diff in the pull request: https://github.com/apache/spark/pull/2419#discussion_r17640135 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/package.scala --- @@ -0,0 +1,140 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-09-16 Thread derrickburns
Github user derrickburns commented on a diff in the pull request: https://github.com/apache/spark/pull/2419#discussion_r17640123 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/metrics/EuclideanOps.scala --- @@ -0,0 +1,64 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-09-16 Thread derrickburns
Github user derrickburns commented on a diff in the pull request: https://github.com/apache/spark/pull/2419#discussion_r17640107 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/MultiKMeansClusterer.scala --- @@ -0,0 +1,28 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-09-16 Thread derrickburns
Github user derrickburns commented on a diff in the pull request: https://github.com/apache/spark/pull/2419#discussion_r17640097 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/MultiKMeans.scala --- @@ -0,0 +1,129 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-09-16 Thread derrickburns
Github user derrickburns commented on a diff in the pull request: https://github.com/apache/spark/pull/2419#discussion_r17640049 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/MultiKMeans.scala --- @@ -0,0 +1,129 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-09-16 Thread derrickburns
Github user derrickburns commented on a diff in the pull request: https://github.com/apache/spark/pull/2419#discussion_r17640037 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/MultiKMeans.scala --- @@ -0,0 +1,129 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-09-16 Thread derrickburns
Github user derrickburns commented on a diff in the pull request: https://github.com/apache/spark/pull/2419#discussion_r17640018 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LocalKMeans.scala --- @@ -1,127 +0,0 @@ -/* - * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-09-16 Thread derrickburns
Github user derrickburns commented on a diff in the pull request: https://github.com/apache/spark/pull/2419#discussion_r17639992 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LocalKMeans.scala --- @@ -1,127 +0,0 @@ -/* - * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-09-16 Thread derrickburns
Github user derrickburns commented on a diff in the pull request: https://github.com/apache/spark/pull/2419#discussion_r17639973 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeansRandom.scala --- @@ -0,0 +1,40 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-09-16 Thread derrickburns
Github user derrickburns commented on a diff in the pull request: https://github.com/apache/spark/pull/2419#discussion_r17639919 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeansPlusPlus.scala --- @@ -0,0 +1,198 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-09-16 Thread derrickburns
Github user derrickburns commented on a diff in the pull request: https://github.com/apache/spark/pull/2419#discussion_r17639873 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeansParallel.scala --- @@ -0,0 +1,152 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-09-16 Thread derrickburns
Github user derrickburns commented on a diff in the pull request: https://github.com/apache/spark/pull/2419#discussion_r17639860 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeansParallel.scala --- @@ -0,0 +1,152 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-09-16 Thread derrickburns
Github user derrickburns commented on a diff in the pull request: https://github.com/apache/spark/pull/2419#discussion_r17639842 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeansParallel.scala --- @@ -0,0 +1,152 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: This commit addresses SPARK-3218, SPARK-3219, ...

2014-09-16 Thread derrickburns
Github user derrickburns commented on the pull request: https://github.com/apache/spark/pull/2419#issuecomment-55830264 Thanks @nchammas, I added the Apache license headers.

[GitHub] spark pull request: This commit addresses SPARK-3218, SPARK-3219, ...

2014-09-16 Thread derrickburns
GitHub user derrickburns opened a pull request: https://github.com/apache/spark/pull/2419 This commit addresses SPARK-3218, SPARK-3219, SPARK-3261, and SPARK-3424... This commit introduces a general distance function (PointOps) for the KMeans clusterer. There are no