[GitHub] spark pull request: [SPARK-8519][SPARK-11560][SPARK-11559] [ML] [M...

2016-04-12 Thread avulanov
Github user avulanov commented on the pull request: https://github.com/apache/spark/pull/10806#issuecomment-209193632 Good point. According to matrix multiplication benchmarks, we can get peak performance on modern CPUs with square matrices somewhere between 4Kx4K and 8Kx8K. So, it w

[GitHub] spark pull request: [SPARK-8519][SPARK-11560][SPARK-11559] [ML] [M...

2016-04-12 Thread thunterdb
Github user thunterdb commented on the pull request: https://github.com/apache/spark/pull/10806#issuecomment-209081278 I see that the size of the blocks can be tuned and is fairly small by default (128). Out of curiosity, how did you pick this number, instead of the full size of the p
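
For readers unfamiliar with the blocking being asked about, here is a minimal sketch of splitting a partition's rows into fixed-size blocks. It is not the code from this PR: the helper name `split_into_blocks` is hypothetical, and only the default of 128 comes from the comment above.

```python
def split_into_blocks(rows, block_size=128):
    """Yield consecutive slices of at most block_size rows (hypothetical helper)."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    for start in range(0, len(rows), block_size):
        yield rows[start:start + block_size]


# 300 two-dimensional points stand in for the rows of one partition.
rows = [[float(i), 0.0] for i in range(300)]
block_sizes = [len(block) for block in split_into_blocks(rows)]
```

A larger `block_size` means fewer, bigger matrix products per partition; using the full partition size would correspond to a single block.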

[GitHub] spark pull request: [SPARK-8519][SPARK-11560][SPARK-11559] [ML] [M...

2016-03-24 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/10806#issuecomment-201114983 I will make another pass soon:) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not

[GitHub] spark pull request: [SPARK-8519][SPARK-11560][SPARK-11559] [ML] [M...

2016-03-23 Thread NarineK
Github user NarineK commented on the pull request: https://github.com/apache/spark/pull/10806#issuecomment-200571650 Thanks @yanboliang, I'll create the JIRA soon!

[GitHub] spark pull request: [SPARK-8519][SPARK-11560][SPARK-11559] [ML] [M...

2016-03-23 Thread yanboliang
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/10806#issuecomment-200341696 @mengxr After this PR, ```KMeans``` can get 2.5 times as fast as the original version. Here is the latest performance test result: After this PR(with optimized BL

[GitHub] spark pull request: [SPARK-8519][SPARK-11560][SPARK-11559] [ML] [M...

2016-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10806#issuecomment-200324564 Build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-8519][SPARK-11560][SPARK-11559] [ML] [M...

2016-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10806#issuecomment-200324565 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/

[GitHub] spark pull request: [SPARK-8519][SPARK-11560][SPARK-11559] [ML] [M...

2016-03-23 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10806#issuecomment-200324279 **[Test build #53922 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53922/consoleFull)** for PR 10806 at commit [`5b76bd9`](https://g

[GitHub] spark pull request: [SPARK-8519][SPARK-11560][SPARK-11559] [ML] [M...

2016-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10806#issuecomment-200291772 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-8519][SPARK-11560][SPARK-11559] [ML] [M...

2016-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10806#issuecomment-200291776 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/

[GitHub] spark pull request: [SPARK-8519][SPARK-11560][SPARK-11559] [ML] [M...

2016-03-23 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10806#issuecomment-200291628 **[Test build #53926 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53926/consoleFull)** for PR 10806 at commit [`85b4122`](https://g

[GitHub] spark pull request: [SPARK-8519][SPARK-11560][SPARK-11559] [ML] [M...

2016-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10806#issuecomment-200291569 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/

[GitHub] spark pull request: [SPARK-8519][SPARK-11560][SPARK-11559] [ML] [M...

2016-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10806#issuecomment-200291567 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-8519][SPARK-11560][SPARK-11559] [ML] [M...

2016-03-23 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10806#issuecomment-200291410 **[Test build #53924 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53924/consoleFull)** for PR 10806 at commit [`e166e86`](https://g

[GitHub] spark pull request: [SPARK-8519][SPARK-11560][SPARK-11559] [ML] [M...

2016-03-23 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10806#issuecomment-200276706 **[Test build #53926 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53926/consoleFull)** for PR 10806 at commit [`85b4122`](https://gi

[GitHub] spark pull request: [SPARK-8519][SPARK-11560][SPARK-11559] [ML] [M...

2016-03-23 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10806#issuecomment-200274750 **[Test build #53924 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53924/consoleFull)** for PR 10806 at commit [`e166e86`](https://gi

[GitHub] spark pull request: [SPARK-8519][SPARK-11560][SPARK-11559] [ML] [M...

2016-03-23 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10806#issuecomment-200272126 **[Test build #53922 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53922/consoleFull)** for PR 10806 at commit [`5b76bd9`](https://gi

[GitHub] spark pull request: [SPARK-8519][SPARK-11560][SPARK-11559] [ML] [M...

2016-03-21 Thread yanboliang
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/10806#issuecomment-199585578 @mengxr Sorry for the late response. I will update it and post the latest performance results soon.

[GitHub] spark pull request: [SPARK-8519][SPARK-11560][SPARK-11559] [ML] [M...

2016-03-21 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/10806#issuecomment-199444265 @yanboliang Do you have time to update this PR? Could you also post the latest performance results after the update? Thanks! @NarineK Yes, I think it is useful t

[GitHub] spark pull request: [SPARK-8519][SPARK-11560][SPARK-11559] [ML] [M...

2016-03-08 Thread NarineK
Github user NarineK commented on the pull request: https://github.com/apache/spark/pull/10806#issuecomment-194033385 Hi everyone, @yanboliang, thanks for optimizing KMeans. I have a question. Is it possible to add the Within Cluster Sum of Squares (in total and for each individual cluster)
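
As an illustration of the metric requested above, the following minimal sketch computes the within-cluster sum of squares both in total and per cluster. It assumes dense points and a precomputed cluster assignment; the function name `wcss` and the sample data are hypothetical and are not part of Spark's API.

```python
def wcss(points, centers, assignments):
    """Return (total, per_cluster) within-cluster sums of squared distances."""
    per_cluster = [0.0] * len(centers)
    for point, k in zip(points, assignments):
        # Squared Euclidean distance from the point to its assigned center.
        per_cluster[k] += sum((p - c) ** 2 for p, c in zip(point, centers[k]))
    return sum(per_cluster), per_cluster


points = [[0.0, 0.0], [2.0, 0.0], [10.0, 0.0]]
centers = [[1.0, 0.0], [10.0, 0.0]]
assignments = [0, 0, 1]  # index of the center each point belongs to
total, per_cluster = wcss(points, centers, assignments)
```

Here the first cluster contributes 1 + 1 and the second contributes 0, so the total equals the first cluster's value.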

[GitHub] spark pull request: [SPARK-8519][SPARK-11560][SPARK-11559] [ML] [M...

2016-03-02 Thread yanboliang
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/10806#issuecomment-191623707 @avulanov Thanks for your comments. I will update my PR soon.

[GitHub] spark pull request: [SPARK-8519][SPARK-11560][SPARK-11559] [ML] [M...

2016-03-02 Thread avulanov
Github user avulanov commented on the pull request: https://github.com/apache/spark/pull/10806#issuecomment-191536127 @yanboliang @mengxr I made one pass.

[GitHub] spark pull request: [SPARK-8519][SPARK-11560][SPARK-11559] [ML] [M...

2016-03-02 Thread avulanov
Github user avulanov commented on a diff in the pull request: https://github.com/apache/spark/pull/10806#discussion_r54825748 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/clustering/PowerIterationClusteringSuite.scala --- @@ -96,14 +96,14 @@ class PowerIterationClustering

[GitHub] spark pull request: [SPARK-8519][SPARK-11560][SPARK-11559] [ML] [M...

2016-03-02 Thread avulanov
Github user avulanov commented on a diff in the pull request: https://github.com/apache/spark/pull/10806#discussion_r54825291 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala --- @@ -224,142 +261,133 @@ class KMeans private ( /** * Implem

[GitHub] spark pull request: [SPARK-8519][SPARK-11560][SPARK-11559] [ML] [M...

2016-03-02 Thread avulanov
Github user avulanov commented on a diff in the pull request: https://github.com/apache/spark/pull/10806#discussion_r54824442 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala --- @@ -204,17 +204,54 @@ class KMeans private ( + " parent RDDs

[GitHub] spark pull request: [SPARK-8519][SPARK-11560][SPARK-11559] [ML] [M...

2016-03-02 Thread avulanov
Github user avulanov commented on a diff in the pull request: https://github.com/apache/spark/pull/10806#discussion_r54823847 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala --- @@ -224,142 +261,133 @@ class KMeans private ( /** * Implem

[GitHub] spark pull request: [SPARK-8519][SPARK-11560][SPARK-11559] [ML] [M...

2016-03-02 Thread avulanov
Github user avulanov commented on a diff in the pull request: https://github.com/apache/spark/pull/10806#discussion_r54823716 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala --- @@ -224,142 +261,133 @@ class KMeans private ( /** * Implem

[GitHub] spark pull request: [SPARK-8519][SPARK-11560][SPARK-11559] [ML] [M...

2016-03-02 Thread avulanov
Github user avulanov commented on a diff in the pull request: https://github.com/apache/spark/pull/10806#discussion_r54822719 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala --- @@ -224,142 +261,133 @@ class KMeans private ( /** * Implem

[GitHub] spark pull request: [SPARK-8519][SPARK-11560][SPARK-11559] [ML] [M...

2016-03-02 Thread avulanov
Github user avulanov commented on a diff in the pull request: https://github.com/apache/spark/pull/10806#discussion_r54822694 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala --- @@ -224,142 +261,133 @@ class KMeans private ( /** * Implem

[GitHub] spark pull request: [SPARK-8519][SPARK-11560][SPARK-11559] [ML] [M...

2016-03-02 Thread avulanov
Github user avulanov commented on a diff in the pull request: https://github.com/apache/spark/pull/10806#discussion_r54822242 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala --- @@ -224,142 +261,133 @@ class KMeans private ( /** * Implem

[GitHub] spark pull request: [SPARK-8519][SPARK-11560][SPARK-11559] [ML] [M...

2016-03-02 Thread avulanov
Github user avulanov commented on a diff in the pull request: https://github.com/apache/spark/pull/10806#discussion_r54821633 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala --- @@ -224,142 +261,133 @@ class KMeans private ( /** * Implem

[GitHub] spark pull request: [SPARK-8519][SPARK-11560][SPARK-11559] [ML] [M...

2016-03-02 Thread avulanov
Github user avulanov commented on a diff in the pull request: https://github.com/apache/spark/pull/10806#discussion_r54820556 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala --- @@ -224,142 +261,133 @@ class KMeans private ( /** * Implem

[GitHub] spark pull request: [SPARK-8519][SPARK-11560][SPARK-11559] [ML] [M...

2016-03-02 Thread avulanov
Github user avulanov commented on a diff in the pull request: https://github.com/apache/spark/pull/10806#discussion_r54820188 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala --- @@ -204,17 +204,54 @@ class KMeans private ( + " parent RDDs

[GitHub] spark pull request: [SPARK-8519][SPARK-11560][SPARK-11559] [ML] [M...

2016-03-02 Thread avulanov
Github user avulanov commented on a diff in the pull request: https://github.com/apache/spark/pull/10806#discussion_r54819341 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala --- @@ -204,17 +204,54 @@ class KMeans private ( + " parent RDDs

[GitHub] spark pull request: [SPARK-8519][SPARK-11560][SPARK-11559] [ML] [M...

2016-03-02 Thread avulanov
Github user avulanov commented on a diff in the pull request: https://github.com/apache/spark/pull/10806#discussion_r54819088 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala --- @@ -204,17 +204,54 @@ class KMeans private ( + " parent RDDs

[GitHub] spark pull request: [SPARK-8519][SPARK-11560][SPARK-11559] [ML] [M...

2016-02-26 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/10806#issuecomment-189500672 cc: @avulanov

[GitHub] spark pull request: [SPARK-8519][SPARK-11560][SPARK-11559] [ML] [M...

2016-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10806#issuecomment-172586988 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-8519][SPARK-11560][SPARK-11559] [ML] [M...

2016-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10806#issuecomment-172586998 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/

[GitHub] spark pull request: [SPARK-8519][SPARK-11560][SPARK-11559] [ML] [M...

2016-01-18 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10806#issuecomment-172586516 **[Test build #49598 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49598/consoleFull)** for PR 10806 at commit [`68d830c`](https://g

[GitHub] spark pull request: [SPARK-8519][SPARK-11560][SPARK-11559] [ML] [M...

2016-01-18 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10806#issuecomment-172573688 **[Test build #49598 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49598/consoleFull)** for PR 10806 at commit [`68d830c`](https://gi

[GitHub] spark pull request: [SPARK-8519][SPARK-11560][SPARK-11559] [ML] [M...

2016-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10806#issuecomment-172566618 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/

[GitHub] spark pull request: [SPARK-8519][SPARK-11560][SPARK-11559] [ML] [M...

2016-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10806#issuecomment-172566615 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-8519][SPARK-11560][SPARK-11559] [ML] [M...

2016-01-18 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10806#issuecomment-172566565 **[Test build #49596 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49596/consoleFull)** for PR 10806 at commit [`d0653cb`](https://g

[GitHub] spark pull request: [SPARK-8519][SPARK-11560][SPARK-11559] [ML] [M...

2016-01-18 Thread yanboliang
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/10306#issuecomment-172562607 @mengxr I have a new and advanced implementation for this issue at #10806; let's move the discussion there. I will close this PR now.

[GitHub] spark pull request: [SPARK-8519][SPARK-11560][SPARK-11559] [ML] [M...

2016-01-18 Thread yanboliang
Github user yanboliang closed the pull request at: https://github.com/apache/spark/pull/10306

[GitHub] spark pull request: [SPARK-8519][SPARK-11560][SPARK-11559] [ML] [M...

2016-01-18 Thread yanboliang
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/10306#issuecomment-172558579 @mengxr I found the misconfiguration of my test environment and updated it, thanks! Now ```gemm``` is about 20-30 times faster than ```axpy/dot``` in the update

[GitHub] spark pull request: [SPARK-8519][SPARK-11560][SPARK-11559] [ML] [M...

2016-01-18 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10806#issuecomment-172556491 **[Test build #49596 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49596/consoleFull)** for PR 10806 at commit [`d0653cb`](https://gi

[GitHub] spark pull request: [SPARK-8519][SPARK-11560][SPARK-11559] [ML] [M...

2016-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10806#issuecomment-172555661 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-8519][SPARK-11560][SPARK-11559] [ML] [M...

2016-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10806#issuecomment-172555666 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/

[GitHub] spark pull request: [SPARK-8519][SPARK-11560][SPARK-11559] [ML] [M...

2016-01-18 Thread yanboliang
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/10806 [SPARK-8519][SPARK-11560][SPARK-11559] [ML] [MLlib] Optimize KMeans implementation * Use BLAS Level 3 matrix-matrix multiplications to compute pairwise distance in k-means. * Remove runs re
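
The first bullet above can be illustrated with a minimal sketch. It is not the code from this PR: it assumes dense points, uses plain Python lists where a real implementation would call a BLAS `gemm` routine, and the names `matmul_transposed` and `pairwise_sq_dists` are hypothetical. It relies on the identity ||x - c||^2 = ||x||^2 + ||c||^2 - 2 x·c, so every cross term comes out of one matrix product of the points with the transposed centers.

```python
def matmul_transposed(a, b):
    """Return a * b^T for lists of rows (stand-in for a BLAS gemm call)."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b] for row_a in a]


def pairwise_sq_dists(points, centers):
    """Squared Euclidean distance from every point to every center."""
    point_norms = [sum(v * v for v in p) for p in points]
    center_norms = [sum(v * v for v in c) for c in centers]
    cross = matmul_transposed(points, centers)  # the matrix-matrix step
    return [
        # Clamp at zero so rounding error cannot produce a negative distance.
        [max(point_norms[i] + center_norms[j] - 2.0 * cross[i][j], 0.0)
         for j in range(len(centers))]
        for i in range(len(points))
    ]


points = [[0.0, 0.0], [3.0, 4.0]]
centers = [[0.0, 0.0], [3.0, 0.0]]
dists = pairwise_sq_dists(points, centers)
```

Each row of `dists` holds one point's squared distances to all centers, so the nearest center is the index of the row minimum.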

[GitHub] spark pull request: [SPARK-8519][SPARK-11560][SPARK-11559] [ML] [M...

2016-01-15 Thread yanboliang
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/10306#issuecomment-171942083 @mengxr Thanks for the prompt. I will check my environment and re-run the test.

[GitHub] spark pull request: [SPARK-8519][SPARK-11560][SPARK-11559] [ML] [M...

2016-01-14 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/10306#issuecomment-171834736 Regarding your local performance test: 1. Make sure you installed an optimized BLAS on your system and that it is loaded correctly in the JVM via netlib-java. The difference should