[GitHub] spark pull request #17459: [SPARK-20109][MLlib] Added toBlockMatrixDense to ...

2017-03-28 Thread johnc1231
GitHub user johnc1231 opened a pull request: https://github.com/apache/spark/pull/17459 [SPARK-20109][MLlib] Added toBlockMatrixDense to IndexedRowMatrix ## What changes were proposed in this pull request? -I added the method `toBlockMatrixDense` to the IndexedRowMatrix

[GitHub] spark issue #17459: [SPARK-20109][MLlib] Added toBlockMatrixDense to Indexed...

2017-04-04 Thread johnc1231
Github user johnc1231 commented on the issue: https://github.com/apache/spark/pull/17459 @viirya I made changes exactly as you did in your prototype, plus a few style edits. But yeah, I think this is a good, easy to use implementation that will be better in all use cases than current

[GitHub] spark issue #17459: [SPARK-20109][MLlib] Added toBlockMatrixDense to Indexed...

2017-04-03 Thread johnc1231
Github user johnc1231 commented on the issue: https://github.com/apache/spark/pull/17459 Alright, I agree with this. We'll switch off Dense or Sparse matrix backings based on what the type of the first vector in the iterator is. I'd be happy to take on making these adjustments

[GitHub] spark issue #17459: [SPARK-20109][MLlib] Added toBlockMatrixDense to Indexed...

2017-04-03 Thread johnc1231
Github user johnc1231 commented on the issue: https://github.com/apache/spark/pull/17459 @viirya I think we definitely care about giving users the ability to make either dense or sparse Block matrices. I made a 100k by 10k IndexedRowMatrix of random doubles, then converted

[GitHub] spark issue #17459: [SPARK-20109][MLlib] Added toBlockMatrixDense to Indexed...

2017-04-04 Thread johnc1231
Github user johnc1231 commented on the issue: https://github.com/apache/spark/pull/17459 I think your prototype looks good. I'm pretty much just gonna do exactly that then. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub

[GitHub] spark issue #17459: [SPARK-20109][MLlib] Rewrote toBlockMatrix method on Ind...

2017-04-11 Thread johnc1231
Github user johnc1231 commented on the issue: https://github.com/apache/spark/pull/17459 @viirya Do you have any more comments on this, or are you happy with it? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your

[GitHub] spark pull request #17459: [SPARK-20109][MLlib] Added toBlockMatrixDense to ...

2017-04-03 Thread johnc1231
Github user johnc1231 commented on a diff in the pull request: https://github.com/apache/spark/pull/17459#discussion_r109417523 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/IndexedRowMatrix.scala --- @@ -113,6 +114,67 @@ class IndexedRowMatrix @Since

[GitHub] spark issue #17459: [SPARK-20109][MLlib] Added toBlockMatrixDense to Indexed...

2017-04-03 Thread johnc1231
Github user johnc1231 commented on the issue: https://github.com/apache/spark/pull/17459 Thanks for the feedback guys. All comments addressed, though if anyone else has feedback on discussion me and @viirya are having about whether there should be a separate `toBlockMatrixDense

[GitHub] spark issue #17459: [SPARK-20109][MLlib] Added toBlockMatrixDense to Indexed...

2017-04-03 Thread johnc1231
Github user johnc1231 commented on the issue: https://github.com/apache/spark/pull/17459 Did some thinking about this, and I think that to make the API cleaner maybe we could deprecate the regular `toBlockMatrix` method and add `toBlockMatrixSparse`. Until it's removed, we could just

[GitHub] spark pull request #17459: [SPARK-20109][MLlib] Added toBlockMatrixDense to ...

2017-04-02 Thread johnc1231
Github user johnc1231 commented on a diff in the pull request: https://github.com/apache/spark/pull/17459#discussion_r109320386 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/distributed/IndexedRowMatrixSuite.scala --- @@ -89,11 +89,42 @@ class IndexedRowMatrixSuite

[GitHub] spark pull request #17459: [SPARK-20109][MLlib] Added toBlockMatrixDense to ...

2017-04-02 Thread johnc1231
Github user johnc1231 commented on a diff in the pull request: https://github.com/apache/spark/pull/17459#discussion_r109320642 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/IndexedRowMatrix.scala --- @@ -113,6 +114,67 @@ class IndexedRowMatrix @Since

[GitHub] spark issue #17459: [SPARK-20109][MLlib] Added toBlockMatrixDense to Indexed...

2017-04-02 Thread johnc1231
Github user johnc1231 commented on the issue: https://github.com/apache/spark/pull/17459 Addressed comments where everything was clear, replied to the last one about only having one toBlockMatrix. Back to you @viirya . Thanks for feedback. --- If your project is set up for it, you

[GitHub] spark pull request #17459: [SPARK-20109][MLlib] Added toBlockMatrixDense to ...

2017-04-02 Thread johnc1231
Github user johnc1231 commented on a diff in the pull request: https://github.com/apache/spark/pull/17459#discussion_r109320701 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/distributed/IndexedRowMatrixSuite.scala --- @@ -89,11 +89,42 @@ class IndexedRowMatrixSuite

[GitHub] spark pull request #17459: [SPARK-20109][MLlib] Added toBlockMatrixDense to ...

2017-04-02 Thread johnc1231
Github user johnc1231 commented on a diff in the pull request: https://github.com/apache/spark/pull/17459#discussion_r109320468 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/IndexedRowMatrix.scala --- @@ -113,6 +114,67 @@ class IndexedRowMatrix @Since

[GitHub] spark pull request #17459: [SPARK-20109][MLlib] Rewrote toBlockMatrix method...

2017-04-06 Thread johnc1231
Github user johnc1231 commented on a diff in the pull request: https://github.com/apache/spark/pull/17459#discussion_r110163778 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/distributed/IndexedRowMatrixSuite.scala --- @@ -89,11 +89,19 @@ class IndexedRowMatrixSuite

[GitHub] spark issue #17459: [SPARK-20109][MLlib] Rewrote toBlockMatrix method on Ind...

2017-04-06 Thread johnc1231
Github user johnc1231 commented on the issue: https://github.com/apache/spark/pull/17459 @viirya Added appropriate tests, back to you. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark pull request #17459: [SPARK-20109][MLlib] Rewrote toBlockMatrix method...

2017-04-12 Thread johnc1231
Github user johnc1231 commented on a diff in the pull request: https://github.com/apache/spark/pull/17459#discussion_r111302756 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/distributed/IndexedRowMatrixSuite.scala --- @@ -87,21 +87,74 @@ class IndexedRowMatrixSuite

[GitHub] spark pull request #17459: [SPARK-20109][MLlib] Rewrote toBlockMatrix method...

2017-04-12 Thread johnc1231
Github user johnc1231 commented on a diff in the pull request: https://github.com/apache/spark/pull/17459#discussion_r111302391 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/IndexedRowMatrix.scala --- @@ -108,8 +108,64 @@ class IndexedRowMatrix @Since

[GitHub] spark pull request #17459: [SPARK-20109][MLlib] Rewrote toBlockMatrix method...

2017-04-13 Thread johnc1231
Github user johnc1231 commented on a diff in the pull request: https://github.com/apache/spark/pull/17459#discussion_r111492414 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/distributed/IndexedRowMatrixSuite.scala --- @@ -87,19 +87,92 @@ class IndexedRowMatrixSuite

[GitHub] spark issue #17459: [SPARK-20109][MLlib] Rewrote toBlockMatrix method on Ind...

2017-04-19 Thread johnc1231
Github user johnc1231 commented on the issue: https://github.com/apache/spark/pull/17459 @viirya Gonna fix that last test tonight, but just doing a quick timing on my laptop, (2014 Macbook Pro with 2.2Hz i7), the difference was as follows on a 10k by 10k matrix of random doubles

[GitHub] spark issue #17459: [SPARK-20109][MLlib] Rewrote toBlockMatrix method on Ind...

2017-04-24 Thread johnc1231
Github user johnc1231 commented on the issue: https://github.com/apache/spark/pull/17459 @viirya I fixed the test as you asked, so please take a look when you get a chance. I'm having a little bit of trouble with my local spark build for some reason, but I'll do that other benchmark

[GitHub] spark issue #17459: [SPARK-20109][MLlib] Rewrote toBlockMatrix method on Ind...

2017-04-27 Thread johnc1231
Github user johnc1231 commented on the issue: https://github.com/apache/spark/pull/17459 @viirya Any more feedback on this? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #17459: [SPARK-20109][MLlib] Rewrote toBlockMatrix method on Ind...

2017-05-10 Thread johnc1231
Github user johnc1231 commented on the issue: https://github.com/apache/spark/pull/17459 Also, made changes suggested by @srowen . Don't know if he now has to sign off on those changes being done. --- If your project is set up for it, you can reply to this email and have your reply

[GitHub] spark issue #17459: [SPARK-20109][MLlib] Rewrote toBlockMatrix method on Ind...

2017-05-10 Thread johnc1231
Github user johnc1231 commented on the issue: https://github.com/apache/spark/pull/17459 @viirya Now that style nitpicks and sparse benchmarks are done, are you good with this? Also, per your recommendation, CCing @MLnick and @jkbradley for review of this. Should be easy to review

[GitHub] spark issue #17459: [SPARK-20109][MLlib] Rewrote toBlockMatrix method on Ind...

2017-05-15 Thread johnc1231
Github user johnc1231 commented on the issue: https://github.com/apache/spark/pull/17459 This has been reviewed pretty thoroughly at this point. Can a committer give this a quick look? @srowen @MLnick @jkbradley I think it's basically ready to go in. --- If your project is set up

[GitHub] spark pull request #17459: [SPARK-20109][MLlib] Rewrote toBlockMatrix method...

2017-05-26 Thread johnc1231
Github user johnc1231 commented on a diff in the pull request: https://github.com/apache/spark/pull/17459#discussion_r118793567 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/distributed/IndexedRowMatrixSuite.scala --- @@ -87,19 +87,96 @@ class IndexedRowMatrixSuite

[GitHub] spark pull request #17459: [SPARK-20109][MLlib] Rewrote toBlockMatrix method...

2017-05-26 Thread johnc1231
Github user johnc1231 commented on a diff in the pull request: https://github.com/apache/spark/pull/17459#discussion_r118793614 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/IndexedRowMatrix.scala --- @@ -108,8 +108,69 @@ class IndexedRowMatrix @Since

[GitHub] spark issue #17459: [SPARK-20109][MLlib] Rewrote toBlockMatrix method on Ind...

2017-05-26 Thread johnc1231
Github user johnc1231 commented on the issue: https://github.com/apache/spark/pull/17459 @srowen Addressed comments, back to you. And thanks for taking time to look this over. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub

[GitHub] spark pull request #17459: [SPARK-20109][MLlib] Rewrote toBlockMatrix method...

2017-05-26 Thread johnc1231
Github user johnc1231 commented on a diff in the pull request: https://github.com/apache/spark/pull/17459#discussion_r118793854 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/IndexedRowMatrix.scala --- @@ -108,8 +108,69 @@ class IndexedRowMatrix @Since

[GitHub] spark issue #17459: [SPARK-20109][MLlib] Rewrote toBlockMatrix method on Ind...

2017-05-26 Thread johnc1231
Github user johnc1231 commented on the issue: https://github.com/apache/spark/pull/17459 @srowen Fixed both, back to you --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark pull request #17459: [SPARK-20109][MLlib] Rewrote toBlockMatrix method...

2017-05-20 Thread johnc1231
Github user johnc1231 commented on a diff in the pull request: https://github.com/apache/spark/pull/17459#discussion_r117620801 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/IndexedRowMatrix.scala --- @@ -108,8 +108,64 @@ class IndexedRowMatrix @Since

[GitHub] spark pull request #17459: [SPARK-20109][MLlib] Rewrote toBlockMatrix method...

2017-05-20 Thread johnc1231
Github user johnc1231 commented on a diff in the pull request: https://github.com/apache/spark/pull/17459#discussion_r117621336 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/IndexedRowMatrix.scala --- @@ -108,8 +108,64 @@ class IndexedRowMatrix @Since

[GitHub] spark issue #17459: [SPARK-20109][MLlib] Rewrote toBlockMatrix method on Ind...

2017-05-31 Thread johnc1231
Github user johnc1231 commented on the issue: https://github.com/apache/spark/pull/17459 @srowen @viirya All comments addressed, back to you guys. Hopefully we've just about reached something ready to commit. --- If your project is set up for it, you can reply to this email

[GitHub] spark pull request #17459: [SPARK-20109][MLlib] Rewrote toBlockMatrix method...

2017-05-31 Thread johnc1231
Github user johnc1231 commented on a diff in the pull request: https://github.com/apache/spark/pull/17459#discussion_r119463845 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/IndexedRowMatrix.scala --- @@ -108,8 +108,69 @@ class IndexedRowMatrix @Since

[GitHub] spark issue #17459: [SPARK-20109][MLlib] Rewrote toBlockMatrix method on Ind...

2017-05-05 Thread johnc1231
Github user johnc1231 commented on the issue: https://github.com/apache/spark/pull/17459 @viirya Addressed style nitpicks and did spark benchmarks. Think that should be everything. --- If your project is set up for it, you can reply to this email and have your reply appear

[GitHub] spark issue #17459: [SPARK-20109][MLlib] Rewrote toBlockMatrix method on Ind...

2017-05-05 Thread johnc1231
Github user johnc1231 commented on the issue: https://github.com/apache/spark/pull/17459 Did a sparse benchmark (2014 Macbook Pro with 2.2Hz i7) 60 partitions, 10k by 10k matrix with mostly 0's, 10% 1's, made of SparseVectors. Both old method and new method took about 7.5 seconds