Github user avulanov commented on the pull request:
https://github.com/apache/spark/pull/10806#issuecomment-209193632
Good point. According to matrix multiplication benchmarks, we can get peak
performance on modern CPUs with square matrices somewhere between 4Kx4K and
8Kx8K. So, it w
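Some back-of-the-envelope arithmetic makes the size trade-off concrete. The sketch below makes three assumptions:
- the operands are dense double-precision (8-byte) n x n matrices;
- the multiply costs the textbook 2n³ flops for C = A * B;
- each of A, B and C is read or written once.

It ignores caches and the internal blocking done by real BLAS libraries, so it is a rough model only. It is not taken from the benchmark mentioned above.

```python
def gemm_stats(n: int, bytes_per_elem: int = 8):
    """Rough cost model for a dense n x n times n x n multiply C = A * B.

    Assumes 2*n^3 floating-point operations and that each of the three
    n x n matrices (A, B, C) is touched once in memory.
    """
    flops = 2 * n ** 3
    bytes_moved = 3 * n * n * bytes_per_elem
    # Arithmetic intensity (flops per byte of matrix data): the higher it is,
    # the more likely the multiply is compute-bound rather than memory-bound.
    intensity = flops / bytes_moved
    mib_per_matrix = n * n * bytes_per_elem / 2 ** 20
    return flops, mib_per_matrix, intensity


for n in (128, 4096, 8192):
    flops, mib, ai = gemm_stats(n)
    print(f"n={n:5d}  flops={flops:.3e}  MiB/matrix={mib:8.3f}  flops/byte={ai:7.2f}")
```

Under this model intensity is n/12. A 128-wide block therefore does 32 times less arithmetic per byte than a 4096-wide one. That is consistent with peak GEMM throughput only appearing at larger sizes. The price is memory: 128 MiB per 4Kx4K matrix and 512 MiB per 8Kx8K matrix.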
Github user thunterdb commented on the pull request:
https://github.com/apache/spark/pull/10806#issuecomment-209081278
I see that the size of the blocks can be tuned and is fairly small by
default (128). Out of curiosity, how did you pick this number, instead of the
full size of the p
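The block size matters because of how points are grouped. Points are packed into row blocks of at most `blockSize` rows, and each block is multiplied against the whole center matrix in a single call. The helper `to_blocks` below is hypothetical (it is not the PR's code). It only shows how n points split into blocks, using the default of 128 mentioned above.

```python
from typing import List, Sequence


def to_blocks(points: Sequence[Sequence[float]],
              block_size: int = 128) -> List[List[List[float]]]:
    """Group row vectors into consecutive blocks of at most block_size rows.

    Hypothetical helper for illustration: each block would become the left
    operand of one matrix-matrix multiply against the k x d center matrix.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [
        [list(p) for p in points[i:i + block_size]]
        for i in range(0, len(points), block_size)
    ]


pts = [[float(i), float(i + 1)] for i in range(300)]
blocks = to_blocks(pts)
print([len(b) for b in blocks])  # [128, 128, 44]
```

A larger block means fewer, bigger multiplies, which is closer to the sizes where GEMM peaks. Each block must still fit comfortably in executor memory, though.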
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/10806#issuecomment-201114983
I will make another pass soon:)
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not
Github user NarineK commented on the pull request:
https://github.com/apache/spark/pull/10806#issuecomment-200571650
Thanks @yanboliang, I'll create the jira soon!
Github user yanboliang commented on the pull request:
https://github.com/apache/spark/pull/10806#issuecomment-200341696
@mengxr After this PR, ```KMeans``` can be 2.5 times as fast as the
original version. Here is the latest performance test result:
After this PR (with optimized BL
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10806#issuecomment-200324564
Build finished. Test PASSed.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10806#issuecomment-200324565
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10806#issuecomment-200324279
**[Test build #53922 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53922/consoleFull)**
for PR 10806 at commit
[`5b76bd9`](https://g
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10806#issuecomment-200291772
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10806#issuecomment-200291776
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10806#issuecomment-200291628
**[Test build #53926 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53926/consoleFull)**
for PR 10806 at commit
[`85b4122`](https://g
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10806#issuecomment-200291569
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10806#issuecomment-200291567
Merged build finished. Test PASSed.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10806#issuecomment-200291410
**[Test build #53924 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53924/consoleFull)**
for PR 10806 at commit
[`e166e86`](https://g
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10806#issuecomment-200276706
**[Test build #53926 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53926/consoleFull)**
for PR 10806 at commit
[`85b4122`](https://gi
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10806#issuecomment-200274750
**[Test build #53924 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53924/consoleFull)**
for PR 10806 at commit
[`e166e86`](https://gi
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10806#issuecomment-200272126
**[Test build #53922 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53922/consoleFull)**
for PR 10806 at commit
[`5b76bd9`](https://gi
Github user yanboliang commented on the pull request:
https://github.com/apache/spark/pull/10806#issuecomment-199585578
@mengxr Sorry for the late response, I will update it and post the latest
performance results soon.
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/10806#issuecomment-199444265
@yanboliang Do you have time to update this PR? Could you also post the
latest performance results after the update? Thanks!
@NarineK Yes, I think it is useful t
Github user NarineK commented on the pull request:
https://github.com/apache/spark/pull/10806#issuecomment-194033385
Hi everyone,
@yanboliang, thanks for optimizing KMeans.
I have a question. Is it possible to add Within Cluster Sum of Squares (in
total and for individual clusters)
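The quantity in this request can be sketched in a few lines. The within-cluster sum of squares is the sum of squared Euclidean distances from each point to its assigned center. The sketch takes points, centers and each point's center index, and reports this both per cluster and in total. The function name `wcss` and its signature are hypothetical. For reference, spark.mllib's `KMeansModel.computeCost` already returns the total figure, computed against each point's nearest center.

```python
from typing import List, Sequence, Tuple


def wcss(points: Sequence[Sequence[float]],
         centers: Sequence[Sequence[float]],
         labels: Sequence[int]) -> Tuple[List[float], float]:
    """Return (per-cluster within-cluster sum of squares, total).

    labels[i] is the index into centers of the cluster point i belongs to.
    """
    per_cluster = [0.0] * len(centers)
    for p, j in zip(points, labels):
        c = centers[j]
        per_cluster[j] += sum((pi - ci) ** 2 for pi, ci in zip(p, c))
    return per_cluster, sum(per_cluster)


points = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [10.0, 12.0]]
centers = [[1.0, 0.0], [10.0, 11.0]]
labels = [0, 0, 1, 1]
print(wcss(points, centers, labels))  # ([2.0, 2.0], 4.0)
```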
Github user yanboliang commented on the pull request:
https://github.com/apache/spark/pull/10806#issuecomment-191623707
@avulanov Thanks for your comments. I will update my PR soon.
Github user avulanov commented on the pull request:
https://github.com/apache/spark/pull/10806#issuecomment-191536127
@yanboliang @mengxr I made one pass.
Github user avulanov commented on a diff in the pull request:
https://github.com/apache/spark/pull/10806#discussion_r54825748
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/clustering/PowerIterationClusteringSuite.scala
---
@@ -96,14 +96,14 @@ class PowerIterationClustering
Github user avulanov commented on a diff in the pull request:
https://github.com/apache/spark/pull/10806#discussion_r54825291
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala ---
@@ -224,142 +261,133 @@ class KMeans private (
/**
* Implem
Github user avulanov commented on a diff in the pull request:
https://github.com/apache/spark/pull/10806#discussion_r54824442
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala ---
@@ -204,17 +204,54 @@ class KMeans private (
+ " parent RDDs
Github user avulanov commented on a diff in the pull request:
https://github.com/apache/spark/pull/10806#discussion_r54823847
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala ---
@@ -224,142 +261,133 @@ class KMeans private (
/**
* Implem
Github user avulanov commented on a diff in the pull request:
https://github.com/apache/spark/pull/10806#discussion_r54823716
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala ---
@@ -224,142 +261,133 @@ class KMeans private (
/**
* Implem
Github user avulanov commented on a diff in the pull request:
https://github.com/apache/spark/pull/10806#discussion_r54822719
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala ---
@@ -224,142 +261,133 @@ class KMeans private (
/**
* Implem
Github user avulanov commented on a diff in the pull request:
https://github.com/apache/spark/pull/10806#discussion_r54822694
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala ---
@@ -224,142 +261,133 @@ class KMeans private (
/**
* Implem
Github user avulanov commented on a diff in the pull request:
https://github.com/apache/spark/pull/10806#discussion_r54822242
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala ---
@@ -224,142 +261,133 @@ class KMeans private (
/**
* Implem
Github user avulanov commented on a diff in the pull request:
https://github.com/apache/spark/pull/10806#discussion_r54821633
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala ---
@@ -224,142 +261,133 @@ class KMeans private (
/**
* Implem
Github user avulanov commented on a diff in the pull request:
https://github.com/apache/spark/pull/10806#discussion_r54820556
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala ---
@@ -224,142 +261,133 @@ class KMeans private (
/**
* Implem
Github user avulanov commented on a diff in the pull request:
https://github.com/apache/spark/pull/10806#discussion_r54820188
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala ---
@@ -204,17 +204,54 @@ class KMeans private (
+ " parent RDDs
Github user avulanov commented on a diff in the pull request:
https://github.com/apache/spark/pull/10806#discussion_r54819341
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala ---
@@ -204,17 +204,54 @@ class KMeans private (
+ " parent RDDs
Github user avulanov commented on a diff in the pull request:
https://github.com/apache/spark/pull/10806#discussion_r54819088
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala ---
@@ -204,17 +204,54 @@ class KMeans private (
+ " parent RDDs
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/10806#issuecomment-189500672
cc: @avulanov
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10806#issuecomment-172586988
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10806#issuecomment-172586998
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10806#issuecomment-172586516
**[Test build #49598 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49598/consoleFull)**
for PR 10806 at commit
[`68d830c`](https://g
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10806#issuecomment-172573688
**[Test build #49598 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49598/consoleFull)**
for PR 10806 at commit
[`68d830c`](https://gi
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10806#issuecomment-172566618
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10806#issuecomment-172566615
Merged build finished. Test FAILed.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10806#issuecomment-172566565
**[Test build #49596 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49596/consoleFull)**
for PR 10806 at commit
[`d0653cb`](https://g
Github user yanboliang commented on the pull request:
https://github.com/apache/spark/pull/10306#issuecomment-172562607
@mengxr I have a new and improved implementation for this issue at #10806;
let's move the discussion there. I will close this PR now.
Github user yanboliang closed the pull request at:
https://github.com/apache/spark/pull/10306
Github user yanboliang commented on the pull request:
https://github.com/apache/spark/pull/10306#issuecomment-172558579
@mengxr I found the misconfiguration of my test environment and updated it,
thanks!
Now ```gemm``` is about 20-30 times faster than ```axpy/dot``` in the
update
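For context, the ```axpy/dot``` baseline follows a level-1 pattern. In one Lloyd iteration, each point needs k separate dot products to find its nearest center. It then needs one axpy to add itself into that center's running sum. The sketch below is a simplified illustration of that pattern, not the code Spark actually ran. `dot` and `axpy` are pure-Python stand-ins with the semantics of the BLAS routines of the same names, and empty clusters simply keep their old center.

```python
from typing import List, Sequence, Tuple


def dot(x: Sequence[float], y: Sequence[float]) -> float:
    """Level-1 BLAS-style dot product x . y."""
    return sum(a * b for a, b in zip(x, y))


def axpy(a: float, x: Sequence[float], y: List[float]) -> None:
    """Level-1 BLAS-style in-place update y := a * x + y."""
    for i, xi in enumerate(x):
        y[i] += a * xi


def lloyd_step_level1(points: Sequence[Sequence[float]],
                      centers: Sequence[Sequence[float]]
                      ) -> Tuple[List[List[float]], List[int]]:
    """One k-means (Lloyd) iteration built only from dot and axpy."""
    k, d = len(centers), len(centers[0])
    center_norms = [dot(c, c) for c in centers]
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k
    for p in points:
        p_norm = dot(p, p)
        # k separate dot products per point: the level-1 bottleneck.
        dists = [p_norm + center_norms[j] - 2.0 * dot(p, centers[j])
                 for j in range(k)]
        best = min(range(k), key=dists.__getitem__)
        axpy(1.0, p, sums[best])  # accumulate the point into its cluster sum
        counts[best] += 1
    # Empty clusters keep their previous center in this sketch.
    new_centers = [
        [s / counts[j] for s in sums[j]] if counts[j] else list(centers[j])
        for j in range(k)
    ]
    return new_centers, counts


pts = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [10.0, 12.0]]
new_centers, counts = lloyd_step_level1(pts, [[0.0, 0.0], [10.0, 10.0]])
print(new_centers, counts)  # [[1.0, 0.0], [10.0, 11.0]] [2, 2]
```

Every distance evaluation here is its own small level-1 call. A level-3 approach replaces the inner loop over centers with one matrix-matrix product per block of points.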
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10806#issuecomment-172556491
**[Test build #49596 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49596/consoleFull)**
for PR 10806 at commit
[`d0653cb`](https://gi
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10806#issuecomment-172555661
Merged build finished. Test FAILed.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10806#issuecomment-172555666
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/
GitHub user yanboliang opened a pull request:
https://github.com/apache/spark/pull/10806
[SPARK-8519][SPARK-11560][SPARK-11559] [ML] [MLlib] Optimize KMeans
implementation
* Use BLAS Level 3 matrix-matrix multiplications to compute pairwise
distance in k-means.
* Remove runs re
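The first bullet relies on the identity ‖x − c‖² = ‖x‖² + ‖c‖² − 2 x·c. For a block of points X (n x d) and centers C (k x d), all n·k cross terms come from the single product X·Cᵀ. That is the operation a level-3 BLAS `gemm` call performs.

In the sketch below, `matmul_nt` is a naive pure-Python stand-in for `gemm`; a real implementation would call an optimized BLAS. The names are hypothetical and not taken from the PR.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def matmul_nt(a: Sequence[Sequence[float]],
              b: Sequence[Sequence[float]]) -> Matrix:
    """Compute A * B^T (A is n x d, B is k x d); stand-in for a BLAS gemm call."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b]
            for row_a in a]


def pairwise_sq_dists(points: Sequence[Sequence[float]],
                      centers: Sequence[Sequence[float]]) -> Matrix:
    """n x k squared Euclidean distances via ||x||^2 + ||c||^2 - 2 x.c."""
    cross = matmul_nt(points, centers)  # the single level-3 step
    p_norms = [sum(v * v for v in p) for p in points]
    c_norms = [sum(v * v for v in c) for c in centers]
    return [
        # Clamp at 0: floating-point cancellation can yield tiny negatives.
        [max(p_norms[i] + c_norms[j] - 2.0 * cross[i][j], 0.0)
         for j in range(len(centers))]
        for i in range(len(points))
    ]


def assign(points: Sequence[Sequence[float]],
           centers: Sequence[Sequence[float]]) -> List[int]:
    """Index of the nearest center for each point."""
    return [min(range(len(row)), key=row.__getitem__)
            for row in pairwise_sq_dists(points, centers)]


points = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [10.0, 12.0]]
centers = [[1.0, 0.0], [10.0, 11.0]]
print(assign(points, centers))  # [0, 0, 1, 1]
```

The expansion trades some numerical accuracy for speed. When x and c are nearly equal and large in norm, the subtraction cancels badly, which is why the sketch clamps results at zero.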
Github user yanboliang commented on the pull request:
https://github.com/apache/spark/pull/10306#issuecomment-171942083
@mengxr Thanks for the prompt. I will check my environment and re-run the
test.
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/10306#issuecomment-171834736
Regarding your local performance test:
1. Make sure you installed an optimized BLAS on your system and that it is loaded
correctly in the JVM via netlib-java. The difference should