[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]collapsed Gibbs sampli...

2014-09-26 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-56989958 @witgo Since we are converging on a GraphX-based implementation and distributed representation of the topic model, do you mind closing this PR? Thanks! --- If your

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]collapsed Gibbs sampli...

2014-09-26 Thread witgo
Github user witgo closed the pull request at: https://github.com/apache/spark/pull/1983 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-09-15 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-5274 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20321/consoleFull) for PR 1983 at commit

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-09-14 Thread allwefantasy
Github user allwefantasy commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-55549348 @witgo I have seen your new performance test configuration. I will try your new code and test it on my data today.

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-09-14 Thread allwefantasy
Github user allwefantasy commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-1073 @witgo I have tried your latest code on my corpus. It no longer gets stuck in broadcasting. However, some exceptions are thrown.

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-09-14 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-3050 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20321/consoleFull) for PR 1983 at commit

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-09-13 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-55514423 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20294/consoleFull) for PR 1983 at commit

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-09-13 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-55515317 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20294/consoleFull) for PR 1983 at commit

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-09-12 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-55376707 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20222/consoleFull) for PR 1983 at commit

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-09-12 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-55381531 @witgo @allwefantasy English | Machine-translated Chinese | Let's try to keep the comments in English as much as possible. |

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-09-12 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-55381621 @witgo @allwefantasy We had an offline discussion about LDA's implementation. Please check the JIRA page for the notes.

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-09-12 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-55382369 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20222/consoleFull) for PR 1983 at commit

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-09-12 Thread witgo
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-55391092 @mengxr @allwefantasy The current broadcast-based implementation suffers a serious performance loss, especially when the corpus is large. Next week I will

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-09-12 Thread witgo
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/1983#discussion_r17490277 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDA.scala --- @@ -0,0 +1,397 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-09-12 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-55478709 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20246/consoleFull) for PR 1983 at commit

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-09-12 Thread witgo
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-55479363 @allwefantasy I have updated the code, you can try the latest code.

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-09-12 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-55479772 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20246/consoleFull) for PR 1983 at commit

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-09-11 Thread allwefantasy
Github user allwefantasy commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-55343319 @witgo OK. Please let me know when there is an update, and I can test it right away on my side.

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-09-11 Thread witgo
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-55223673 @allwefantasy Spark lets you adjust the number of tasks an executor runs concurrently. If you want each executor to run 17 tasks at the same time,

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-09-11 Thread allwefantasy
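Github user allwefantasy commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-55238978 @witgo Thanks for sharing this tip. I have run into another problem. Yesterday you asked how many words my 240,000 documents contain; I counted, and it is 24 million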

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-09-11 Thread witgo
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-55280269 @allwefantasy The existing code creates too many TopicModel instances during the iterative computation; I am currently trying to fix this. Thanks for your feedback.

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-09-10 Thread allwefantasy
Github user allwefantasy commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-55089256 @witgo I looked at your performance test. It does not mention the number of iterations. How many iterations was it? It finished in just one hour.

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-09-10 Thread witgo
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-55094784 @allwefantasy My test corpus has `196558` documents and `7897767` words. The number of iterations is `100`. How many words do your 240,000 documents contain in total? You

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-09-10 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-55095034 (Pardon, I think it's important to also summarize in English, as the lingua franca of the project, for the benefit of other readers.)

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-09-10 Thread witgo
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-55095890 @allwefantasy I think this code, `Document(parts(0).toInt, (0 until wordInfo.value.size).map(k => values.getOrElse(k, 0)).toArray)`, is somewhat problematic..
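The preview above is truncated, so the exact problem witgo had in mind is not recoverable from this thread. One plausible reading, labeled here as an assumption, is that the quoted expression builds a dense per-vocabulary count vector, while a later snippet in this thread (`val term = content(i)`) treats `content` as one term id per token. The sketch below contrasts the two encodings; `Document`, `denseCounts`, and `tokenIds` are hypothetical names for illustration and are not taken from the PR.

```scala
// Hypothetical stand-in for the PR's Document class; the real definition
// is not shown in this thread.
final case class Document(docId: Int, content: Array[Int])

object DocumentEncoding {
  // Dense encoding, as in the quoted expression: one count per vocabulary
  // entry, so the array length equals the vocabulary size.
  def denseCounts(counts: Map[Int, Int], vocabSize: Int): Array[Int] =
    (0 until vocabSize).map(k => counts.getOrElse(k, 0)).toArray

  // Token encoding (assumed): each term id repeated once per occurrence,
  // so the array length equals the number of tokens in the document.
  def tokenIds(counts: Map[Int, Int]): Array[Int] =
    counts.toSeq.sortBy(_._1).flatMap { case (term, n) => Seq.fill(n)(term) }.toArray
}
```

For a vocabulary of 5 terms and counts `Map(0 -> 2, 3 -> 1)`, `denseCounts` yields five entries (`2, 0, 0, 1, 0`) while `tokenIds` yields three (`0, 0, 3`). With a large vocabulary the dense form grows with the vocabulary size for every document, which may or may not be the concern raised here.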

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-09-10 Thread witgo
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-55096559 @srowen I will try to translate the comments into English

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-09-10 Thread allwefantasy
Github user allwefantasy commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-55129263 @witgo Then I made a mistake: I misunderstood `content` in `Document`. I thought `content`

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-09-10 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-55213212 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20133/consoleFull) for PR 1983 at commit

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-09-10 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-55216611 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20133/consoleFull) for PR 1983 at commit

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-09-10 Thread allwefantasy
Github user allwefantasy commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-55221324 @witgo Can the following code be multi-threaded? for (i <- 0 until content.length) { val term = content(i) val
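The quoted loop is cut off after `val term = content(i)`, so the PR's actual loop body is not known. As background for the question, here is a minimal sketch of the textbook collapsed Gibbs update for LDA over one document's tokens. It is not the PR's code, and every name in it (`sweep`, `docTopic`, `topicTerm`, `topicTotal`, `alpha`, `beta`) is an assumption made for illustration. The point it shows is that each iteration reads counts that the previous iteration just modified, so the loop carries a dependency from token to token.

```scala
import scala.util.Random

object GibbsSweepSketch {
  /** One sequential sweep over a single document's tokens (illustrative only). */
  def sweep(
      content: Array[Int],          // term id per token
      topics: Array[Int],           // current topic per token, updated in place
      docTopic: Array[Int],         // topic counts for this document
      topicTerm: Array[Array[Int]], // counts indexed by (topic, term)
      topicTotal: Array[Int],       // total tokens assigned to each topic
      alpha: Double,
      beta: Double,
      rng: Random): Unit = {
    val numTopics = docTopic.length
    val vocabSize = topicTerm(0).length
    val cumulative = new Array[Double](numTopics)
    for (i <- 0 until content.length) {
      val term = content(i)
      val old = topics(i)
      // Remove the token's current assignment from all counts.
      docTopic(old) -= 1
      topicTerm(old)(term) -= 1
      topicTotal(old) -= 1
      // Cumulative unnormalized conditional probabilities for each topic.
      var sum = 0.0
      var k = 0
      while (k < numTopics) {
        sum += (docTopic(k) + alpha) * (topicTerm(k)(term) + beta) /
          (topicTotal(k) + vocabSize * beta)
        cumulative(k) = sum
        k += 1
      }
      // Draw the new topic by inverting the cumulative weights.
      val u = rng.nextDouble() * sum
      var next = 0
      while (next < numTopics - 1 && cumulative(next) < u) next += 1
      // Add the token back under its new assignment.
      docTopic(next) += 1
      topicTerm(next)(term) += 1
      topicTotal(next) += 1
      topics(i) = next
    }
  }
}
```

Under these assumptions, running iterations of this loop on separate threads would let them read and write the same counters concurrently. Parallel variants of the sampler exist, but they generally split work across documents or partitions and reconcile the shared counts afterwards, rather than threading the per-token loop itself.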

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-09-09 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-54952703 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20037/consoleFull) for PR 1983 at commit

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-09-09 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-54957948 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20037/consoleFull) for PR 1983 at commit

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-09-06 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-54720047 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19918/consoleFull) for PR 1983 at commit

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-09-06 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-54722195 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19918/consoleFull) for PR 1983 at commit

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-09-05 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-54696954 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19874/consoleFull) for PR 1983 at commit

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-09-05 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-54698906 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19874/consoleFull) for PR 1983 at commit

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-09-01 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-54064930 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19562/consoleFull) for PR 1983 at commit

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-09-01 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-54065121 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19562/consoleFull) for PR 1983 at commit

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-09-01 Thread witgo
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-54065708 test this please

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-09-01 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-54066312 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19563/consoleFull) for PR 1983 at commit

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-09-01 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-54072638 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19563/consoleFull) for PR 1983 at commit

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-08-29 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-53855239 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19467/consoleFull) for PR 1983 at commit

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-08-29 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-53864655 **[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19467/consoleFull)** after a configured wait of `120m`.

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-08-28 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-53698415 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19395/consoleFull) for PR 1983 at commit

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-08-28 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-53699785 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19396/consoleFull) for PR 1983 at commit

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-08-28 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-53699939 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19396/consoleFull) for PR 1983 at commit

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-08-28 Thread witgo
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-53700053 @mengxr This patch removes the `accumulable` operation, fixes formula errors in the `dropOneDistSampler` method, and adds some performance optimizations. About how I

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-08-28 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-53704442 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19398/consoleFull) for PR 1983 at commit

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-08-28 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-53704512 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19395/consoleFull) for PR 1983 at commit

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-08-28 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-53709038 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19398/consoleFull) for PR 1983 at commit

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-08-27 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-53547793 @witgo Thanks for working on LDA! Could you briefly describe what you changed in this PR? The major feedback on #476 was about how we store the model, which may be worth more

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-08-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-53440660 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19213/consoleFull) for PR 1983 at commit

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-08-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-53449444 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19213/consoleFull) for PR 1983 at commit

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-08-25 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-53273744 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19139/consoleFull) for PR 1983 at commit

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-08-25 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-53280797 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19139/consoleFull) for PR 1983 at commit

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-08-18 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-52587323 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18809/consoleFull) for PR 1983 at commit

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-08-18 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-52589954 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18809/consoleFull) for PR 1983 at commit

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-08-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-52419415 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18702/consoleFull) for PR 1983 at commit

[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-08-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-52420410 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18702/consoleFull) for PR 1983 at commit