[GitHub] spark pull request: [MLlib]OnlineLDA Performance Improvements

2015-07-21 Thread feynmanliang
Github user feynmanliang commented on the pull request: https://github.com/apache/spark/pull/7454#issuecomment-123412447 I played with it this morning. The bugs were occurring because `ids = List()`; apparently Breeze calls `dgemv` with an invalid `LDA` parameter when you row-index

[GitHub] spark pull request: [MLlib]OnlineLDA Performance Improvements

2015-07-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7454#issuecomment-123413737 Merged build started. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not

[GitHub] spark pull request: [MLlib]OnlineLDA Performance Improvements

2015-07-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7454#issuecomment-123413707 Merged build triggered.

[GitHub] spark pull request: [MLlib]OnlineLDA Performance Improvements

2015-07-21 Thread feynmanliang
Github user feynmanliang commented on the pull request: https://github.com/apache/spark/pull/7454#issuecomment-123414657 Ran some local perf tests. Before PR: ``` bin/run-example mllib.LDAExample docs/*.md --maxIterations 100 --algorithm online --vocabSize 100 --k 3 ```

[GitHub] spark pull request: [MLlib]OnlineLDA Performance Improvements

2015-07-21 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7454#issuecomment-123414568 [Test build #37968 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37968/consoleFull) for PR 7454 at commit

[GitHub] spark pull request: [MLlib]OnlineLDA Performance Improvements

2015-07-21 Thread feynmanliang
Github user feynmanliang commented on a diff in the pull request: https://github.com/apache/spark/pull/7454#discussion_r35131297 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -387,39 +387,32 @@ final class OnlineLDAOptimizer extends

[GitHub] spark pull request: [MLlib]OnlineLDA Performance Improvements

2015-07-21 Thread feynmanliang
Github user feynmanliang commented on a diff in the pull request: https://github.com/apache/spark/pull/7454#discussion_r35131292 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -387,39 +387,32 @@ final class OnlineLDAOptimizer extends

[GitHub] spark pull request: [MLlib]OnlineLDA Performance Improvements

2015-07-21 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/7454#issuecomment-123413852 Oh, I see. Thanks for investigating! In my example, the number of terms is limited to 10 (so I could print the topics), probably making some documents empty.

[GitHub] spark pull request: [MLlib]OnlineLDA Performance Improvements

2015-07-20 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/7454#issuecomment-123090642 I'll make a pass. Can you please make a JIRA for this and put it in the title? Also, can you please test this to verify the speedups? It sounds like local

[GitHub] spark pull request: [MLlib]OnlineLDA Performance Improvements

2015-07-20 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/7454#discussion_r35070522 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -387,39 +387,32 @@ final class OnlineLDAOptimizer extends

[GitHub] spark pull request: [MLlib]OnlineLDA Performance Improvements

2015-07-20 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/7454#issuecomment-123165337 Ohh, actually, it might be from me trying to stats...which might be some weird Breeze object which does not implement toString properly. Let me retry

[GitHub] spark pull request: [MLlib]OnlineLDA Performance Improvements

2015-07-20 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/7454#issuecomment-123165239 I think there's a bug. I tried running the LDAExample as follows, and it failed with the following exception: I ran: ``` bin/run-example

[GitHub] spark pull request: [MLlib]OnlineLDA Performance Improvements

2015-07-20 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/7454#discussion_r35070523 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -387,39 +387,32 @@ final class OnlineLDAOptimizer extends

[GitHub] spark pull request: [MLlib]OnlineLDA Performance Improvements

2015-07-20 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/7454#issuecomment-123165267 I'm wondering if it's a mis-matched shape issue.

[GitHub] spark pull request: [MLlib]OnlineLDA Performance Improvements

2015-07-20 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/7454#issuecomment-123173789 Hm, no, I think something is wrong. Can you try running the example as I wrote above?

[GitHub] spark pull request: [MLlib]OnlineLDA Performance Improvements

2015-07-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7454#issuecomment-122212916 Merged build started.

[GitHub] spark pull request: [MLlib]OnlineLDA Performance Improvements

2015-07-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7454#issuecomment-122212897 Merged build triggered.

[GitHub] spark pull request: [MLlib]OnlineLDA Performance Improvements

2015-07-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7454#issuecomment-12260 Merged build finished. Test PASSed.

[GitHub] spark pull request: [MLlib]OnlineLDA Performance Improvements

2015-07-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7454#issuecomment-12204 [Test build #37610 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37610/console) for PR 7454 at commit

[GitHub] spark pull request: [MLlib]OnlineLDA Performance Improvements

2015-07-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7454#issuecomment-122213493 [Test build #37610 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37610/consoleFull) for PR 7454 at commit

[GitHub] spark pull request: [MLlib]OnlineLDA Performance Improvements

2015-07-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7454#issuecomment-122132823 Merged build started.

[GitHub] spark pull request: [MLlib]OnlineLDA Performance Improvements

2015-07-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7454#issuecomment-122132814 Merged build triggered.

[GitHub] spark pull request: [MLlib]OnlineLDA Performance Improvements

2015-07-16 Thread feynmanliang
GitHub user feynmanliang opened a pull request: https://github.com/apache/spark/pull/7454 [MLlib]OnlineLDA Performance Improvements Use range-slicing (coalesced memory access) and in-place updates, and reduce the number of transposes in the OnlineLDA implementation.

[GitHub] spark pull request: [MLlib]OnlineLDA Performance Improvements

2015-07-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7454#issuecomment-122138072 Merged build finished. Test PASSed.

[GitHub] spark pull request: [MLlib]OnlineLDA Performance Improvements

2015-07-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7454#issuecomment-122137146 [Test build #37553 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37553/consoleFull) for PR 7454 at commit

[GitHub] spark pull request: [MLlib]OnlineLDA Performance Improvements

2015-07-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7454#issuecomment-122137002 Merged build triggered.

[GitHub] spark pull request: [MLlib]OnlineLDA Performance Improvements

2015-07-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7454#issuecomment-122137013 Merged build started.

[GitHub] spark pull request: [MLlib]OnlineLDA Performance Improvements

2015-07-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7454#issuecomment-122138031 [Test build #37550 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37550/console) for PR 7454 at commit

[GitHub] spark pull request: [MLlib]OnlineLDA Performance Improvements

2015-07-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7454#issuecomment-122141480 Merged build finished. Test PASSed.

[GitHub] spark pull request: [MLlib]OnlineLDA Performance Improvements

2015-07-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7454#issuecomment-122133223 [Test build #37550 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37550/consoleFull) for PR 7454 at commit

[GitHub] spark pull request: [MLlib]OnlineLDA Performance Improvements

2015-07-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7454#issuecomment-122141399 [Test build #37553 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37553/console) for PR 7454 at commit