[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-29 Thread rezazadeh
Github user rezazadeh commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-57206298 Thanks for the review @mengxr ! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-29 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1778

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-29 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-57204030 LGTM. Merged into master! Thanks @rezazadeh !

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-57036641 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20889/consoleFull) for PR 1778 at commit [`404c64c`](https://github.com/a

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-57036647 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-57033325 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20889/consoleFull) for PR 1778 at commit [`404c64c`](https://github.com/ap

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-57021242 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20873/consoleFull) for PR 1778 at commit [`4eb71c6`](https://github.com/a

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-57021246 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-57014394 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20873/consoleFull) for PR 1778 at commit [`4eb71c6`](https://github.com/ap

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-26 Thread rezazadeh
Github user rezazadeh commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-57013563 Jenkins, test this please.

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-57008409 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-57008403 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20865/consoleFull) for PR 1778 at commit [`4eb71c6`](https://github.com/a

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-57001181 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20865/consoleFull) for PR 1778 at commit [`4eb71c6`](https://github.com/ap

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-26 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-56995508 @rezazadeh Could you set the exclusion rules in `dev/MimaExcludes.scala`?

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-26 Thread rezazadeh
Github user rezazadeh commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-56991070 Only the binary compatibility test is failing, which is expected.

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-56940694 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-56940687 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20851/consoleFull) for PR 1778 at commit [`ee8bd65`](https://github.com/a

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-56934015 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20851/consoleFull) for PR 1778 at commit [`ee8bd65`](https://github.com/ap

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-26 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-56933417 test this please

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-56910017 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-56909148 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-56909130 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-25 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-56909126 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20831/consoleFull) for PR 1778 at commit [`3467cff`](https://github.com/a

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-25 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-56908641 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20830/consoleFull) for PR 1778 at commit [`aea0247`](https://github.com/a

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-56908646 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-25 Thread rezazadeh
Github user rezazadeh commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-56908521 @mengxr I also added broadcasting of p and v to further optimize space usage. Also now we're avoiding divide by zero if there is a column with zero magnitude.
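The divide-by-zero guard mentioned in this comment can be illustrated with a small sketch. This is not the Scala code from the pull request: it is a plain-Python illustration on a dense row-major matrix, the function names `column_magnitudes` and `cosine_similarity` are hypothetical, and returning 0.0 for an all-zero column is an assumption made here for illustration.

```python
import math


def column_magnitudes(rows):
    """Return the L2 norm of each column of a dense row-major matrix."""
    n_cols = len(rows[0])
    sq_sums = [0.0] * n_cols
    for row in rows:
        for j, value in enumerate(row):
            sq_sums[j] += value * value
    return [math.sqrt(s) for s in sq_sums]


def cosine_similarity(rows, i, j):
    """Cosine similarity of columns i and j, guarded against zero magnitude."""
    mags = column_magnitudes(rows)
    # Assumption: a column with zero magnitude is treated as having
    # similarity 0.0 with every column, instead of dividing by zero.
    if mags[i] == 0.0 or mags[j] == 0.0:
        return 0.0
    dot = sum(row[i] * row[j] for row in rows)
    return dot / (mags[i] * mags[j])
```

Without the guard, any pair involving an all-zero column would raise `ZeroDivisionError` in this sketch.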

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-25 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-56905126 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20831/consoleFull) for PR 1778 at commit [`3467cff`](https://github.com/ap

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-25 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-56904496 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20830/consoleFull) for PR 1778 at commit [`aea0247`](https://github.com/ap

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-25 Thread rezazadeh
Github user rezazadeh commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-56904264 @mengxr Merged in your changes and added ability for the threshold to be larger with a warning. Tests pass.

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-25 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-56897059 @rezazadeh I sent a PR to your repo at: https://github.com/rezazadeh/spark/pull/1 . Could you check the changes and merge it if they are correct (hopefully) and look good

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-56894886 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-25 Thread rezazadeh
Github user rezazadeh commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-56894583 @mengxr Thanks for the optimizations. I merged the latest master into my branch and pushed to here. Would you like me to merge your branch into mine? There is n

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-25 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-56789080 @rezazadeh I made some changes in a local branch: https://github.com/mengxr/spark/blob/rezazadeh-dimsumv2/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/Row

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-22 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-56396971 The code looks good for sparse input but for dense input is there any issue with using activeIterator ? I understand that due to dimsum threshold check we have to iter

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-21 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-56321520 ahh I saw it in the code now...that will do...no need for absolute values...numbers are positive for me..thanks..

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-21 Thread rezazadeh
Github user rezazadeh commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-56316095 Why do you say normL1 is not implemented? I have implemented normL1 in MultivariateOnlineSummarizer, with tests. Do you want a version without absolute values? If so, j

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-21 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-56311340 The colMags right now have sqrt(sum(column1)^2 + sum(column2)^2 + ... + sum(columnN)^2) It will be good to have (sum(column1) + sum(column2) + ... + sum(columnN))
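For readers following the L1 versus L2 discussion in this comment and the replies to it, here is a minimal sketch of computing both per-column norms in one pass. It uses the standard definitions (L1 as the sum of absolute values, L2 as the square root of the sum of squares) and is not the `MultivariateOnlineSummarizer` code from the pull request; the function name `column_norms` is hypothetical.

```python
import math


def column_norms(rows):
    """One pass over a dense row-major matrix.

    Returns (l1, l2): two lists with one entry per column.
    """
    n_cols = len(rows[0])
    abs_sums = [0.0] * n_cols  # running sum of |a_ij| for each column j
    sq_sums = [0.0] * n_cols   # running sum of a_ij^2 for each column j
    for row in rows:
        for j, value in enumerate(row):
            abs_sums[j] += abs(value)
            sq_sums[j] += value * value
    return abs_sums, [math.sqrt(s) for s in sq_sums]
```

When every entry is non-negative, the L1 norm equals the plain column sum, which is why a variant without absolute values makes no difference for such data.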

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-20 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-56262006 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20608/consoleFull) for PR 1778 at commit [`0e4eda4`](https://github.com/a

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-20 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17818724 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/distributed/RowMatrixSuite.scala --- @@ -95,6 +95,40 @@ class RowMatrixSuite extends FunSuite

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-20 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17818718 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/MultivariateStatisticalSummary.scala --- @@ -53,4 +53,14 @@ trait MultivariateStatisticalSummary

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-20 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17818659 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -390,6 +393,113 @@ class RowMatrix( new RowMatrix(

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-20 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17818661 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -390,6 +393,113 @@ class RowMatrix( new RowMatrix(

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-20 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17818657 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -390,6 +393,113 @@ class RowMatrix( new RowMatrix(

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-20 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17818655 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -390,6 +393,113 @@ class RowMatrix( new RowMatrix(

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-20 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17818651 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -390,6 +393,113 @@ class RowMatrix( new RowMatrix(

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-20 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17818650 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -390,6 +393,113 @@ class RowMatrix( new RowMatrix(

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-20 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17818645 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -18,6 +18,7 @@ package org.apache.spark.mllib.linalg.d

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-20 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17818640 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/distributed/RowMatrixSuite.scala --- @@ -95,6 +95,33 @@ class RowMatrixSuite extends FunSuite

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-20 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17818648 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -27,10 +28,12 @@ import com.github.fommil.netlib.BLAS.{getI

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-20 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-56260508 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20608/consoleFull) for PR 1778 at commit [`0e4eda4`](https://github.com/ap

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-15 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-55685262 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20358/consoleFull) for PR 1778 at commit [`25e9d0d`](https://github.com/a

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-15 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17579689 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/distributed/RowMatrixSuite.scala --- @@ -95,6 +95,40 @@ class RowMatrixSuite extends FunSuite wit

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-15 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17579690 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/distributed/RowMatrixSuite.scala --- @@ -95,6 +95,33 @@ class RowMatrixSuite extends FunSuite wit

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-15 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17579687 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/MultivariateStatisticalSummary.scala --- @@ -53,4 +53,14 @@ trait MultivariateStatisticalSummary {

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-15 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17579662 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -390,6 +393,113 @@ class RowMatrix( new RowMatrix(AB,

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-15 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17579648 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -27,10 +28,12 @@ import com.github.fommil.netlib.BLAS.{getInst

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-15 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17579676 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -390,6 +393,113 @@ class RowMatrix( new RowMatrix(AB,

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-15 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17579669 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -390,6 +393,113 @@ class RowMatrix( new RowMatrix(AB,

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-15 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17579677 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -390,6 +393,113 @@ class RowMatrix( new RowMatrix(AB,

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-15 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17579647 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -18,6 +18,7 @@ package org.apache.spark.mllib.linalg.dist

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-15 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17579667 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -390,6 +393,113 @@ class RowMatrix( new RowMatrix(AB,

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-15 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17579671 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -390,6 +393,113 @@ class RowMatrix( new RowMatrix(AB,

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-15 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-55681033 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20358/consoleFull) for PR 1778 at commit [`25e9d0d`](https://github.com/ap

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-15 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-55680636 test this please

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-14 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-55544053 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20312/consoleFull) for PR 1778 at commit [`25e9d0d`](https://github.com/a

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-14 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-55542822 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20312/consoleFull) for PR 1778 at commit [`25e9d0d`](https://github.com/ap

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-14 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17523330 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -390,6 +393,79 @@ class RowMatrix( new RowMatrix(A

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-14 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-55542478 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20311/consoleFull) for PR 1778 at commit [`fb296f6`](https://github.com/a

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-14 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-55542450 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20311/consoleFull) for PR 1778 at commit [`fb296f6`](https://github.com/ap

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-14 Thread rezazadeh
Github user rezazadeh commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-55542396 @mengxr All requested changes made. All tests are passing locally. However, I expect Jenkins to complain because of the new normL1 and normL2 methods added to Multivari

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-14 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17523235 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -27,10 +27,13 @@ import com.github.fommil.netlib.BLAS.{getI

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-14 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17523233 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -390,6 +393,79 @@ class RowMatrix( new RowMatrix(A

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-14 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17523232 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -390,6 +393,79 @@ class RowMatrix( new RowMatrix(A

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-14 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17523214 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -390,6 +393,79 @@ class RowMatrix( new RowMatrix(A

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-14 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17523212 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -390,6 +393,79 @@ class RowMatrix( new RowMatrix(A

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-14 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17523206 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -390,6 +393,79 @@ class RowMatrix( new RowMatrix(A

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-14 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17523204 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -390,6 +393,79 @@ class RowMatrix( new RowMatrix(A

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-14 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17523197 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/distributed/RowMatrixSuite.scala --- @@ -95,6 +95,33 @@ class RowMatrixSuite extends FunSuite

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-02 Thread CanoeFZH
Github user CanoeFZH commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17029601 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -390,6 +393,79 @@ class RowMatrix( new RowMatrix(AB

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-02 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-5437 @rezazadeh I understand that it is easier to implement the algorithm on a row-oriented format to compute similar columns. But it still sounds more natural to me to compute

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-02 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17017222 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -390,6 +393,79 @@ class RowMatrix( new RowMatrix(AB,

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-02 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17017229 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/distributed/RowMatrixSuite.scala --- @@ -95,6 +95,33 @@ class RowMatrixSuite extends FunSuite wit

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-02 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17017219 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -390,6 +393,79 @@ class RowMatrix( new RowMatrix(AB,

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-02 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17017214 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -390,6 +393,79 @@ class RowMatrix( new RowMatrix(AB,

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-02 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17017206 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -390,6 +393,79 @@ class RowMatrix( new RowMatrix(AB,

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-02 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17017210 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -390,6 +393,79 @@ class RowMatrix( new RowMatrix(AB,

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-02 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17017216 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -390,6 +393,79 @@ class RowMatrix( new RowMatrix(AB,

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-02 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17017204 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -27,10 +27,13 @@ import com.github.fommil.netlib.BLAS.{getInst

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-08-30 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-53976440 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19528/consoleFull) for PR 1778 at commit [`75a0b51`](https://github.com/a

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-08-30 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-53976161 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19527/consoleFull) for PR 1778 at commit [`0f12ade`](https://github.com/a

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-08-30 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-53975553 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19528/consoleFull) for PR 1778 at commit [`75a0b51`](https://github.com/ap

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-08-30 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-53975264 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19527/consoleFull) for PR 1778 at commit [`0f12ade`](https://github.com/ap

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-08-30 Thread rezazadeh
Github user rezazadeh commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-53975250 Style changes made. Experimental results below. We run DIMSUM daily on a production-scale ads dataset. After replacing the traditional cosine similarity computa

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-08-29 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-53943844 @rezazadeh Could you update the PR to follow [Spark Code Style Guide](https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide)? Thanks!