[jira] [Commented] (SPARK-5563) LDA with online variational inference
[ https://issues.apache.org/jira/browse/SPARK-5563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14365073#comment-14365073 ]

Matthew Willson commented on SPARK-5563:
----------------------------------------

No, thank you for working on this! Great stuff. Vowpal Wabbit is indeed scarily fast :)

If you're looking for more suggestions, one other quick win for LDA is the ability to seed topics with specific keywords by specifying non-symmetric Dirichlet priors for some of the topic-word distributions. This was very easy to add to gensim's online LDA implementation, for example.

> LDA with online variational inference
> -------------------------------------
>
>                 Key: SPARK-5563
>                 URL: https://issues.apache.org/jira/browse/SPARK-5563
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 1.3.0
>            Reporter: Joseph K. Bradley
>            Assignee: yuhao yang
>
> Latent Dirichlet Allocation (LDA) parameters can be inferred using online variational inference, as in Hoffman, Blei and Bach. "Online Learning for Latent Dirichlet Allocation." NIPS, 2010. This algorithm should be very efficient and should be able to handle much larger datasets than batch algorithms for LDA. It will also be important for supporting streaming versions of LDA.
> The implementation will ideally use the same API as the existing LDA but use a different underlying optimizer. This will require hooking into the existing mllib.optimization frameworks. It will also require some discussion about whether batch versions of online variational inference should be supported, as well as what variational approximation should be used now or in the future.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
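The topic-seeding idea above amounts to replacing the usual symmetric Dirichlet prior on the topic-word distributions with a per-topic prior that puts extra mass on chosen keywords. A minimal sketch of how such a prior matrix could be built (the vocabulary, seed words, and helper name are hypothetical, purely for illustration; this is not MLlib or gensim API):

```python
import numpy as np

# Hypothetical toy vocabulary -- illustrative only.
vocab = ["game", "team", "score", "market", "stock", "price"]
num_topics = 2

def seeded_eta(vocab, num_topics, seed_words, base=0.01, boost=1.0):
    """Build a non-symmetric Dirichlet prior over topic-word
    distributions: every entry starts at `base`, and the entries for
    words seeded into a topic get an extra `boost`, nudging that
    topic toward the chosen keywords."""
    eta = np.full((num_topics, len(vocab)), base)
    index = {w: j for j, w in enumerate(vocab)}
    for k, words in seed_words.items():
        for w in words:
            eta[k, index[w]] += boost
    return eta

# Seed topic 0 toward sports words, topic 1 toward finance words.
eta = seeded_eta(vocab, num_topics,
                 {0: ["game", "team"], 1: ["market", "stock"]})
```

The resulting matrix would be passed to the inference routine in place of a single scalar concentration parameter; gensim's online LDA accepts an asymmetric prior in essentially this shape.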
[jira] [Commented] (SPARK-5563) LDA with online variational inference
[ https://issues.apache.org/jira/browse/SPARK-5563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364350#comment-14364350 ]

yuhao yang commented on SPARK-5563:
-----------------------------------

Matthew Willson, thanks for the attention and the ideas. Apart from gensim, vowpal-wabbit also has a distributed implementation provided by Matthew D. Hoffman, which seems to be amazingly fast. I'll refer to those libraries as much as possible, and suggestions are always welcome.
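For reference, the heart of Hoffman et al.'s online algorithm being discussed here is a stochastic step on the topic-word variational parameter lambda: each minibatch yields an estimate lambda-hat, which is blended into the running value with a decaying step size. A minimal sketch (parameter names follow the paper; the helpers themselves are illustrative, not any library's API):

```python
import numpy as np

def learning_rate(t, tau0=1.0, kappa=0.7):
    # Step size rho_t = (tau0 + t)^(-kappa); choosing kappa in
    # (0.5, 1] satisfies the Robbins-Monro convergence conditions.
    return (tau0 + t) ** (-kappa)

def online_update(lam, lam_hat, t, tau0=1.0, kappa=0.7):
    # Blend the old topic-word parameter lambda with lambda-hat,
    # the estimate computed from the current minibatch alone:
    # lambda <- (1 - rho_t) * lambda + rho_t * lambda-hat
    rho = learning_rate(t, tau0, kappa)
    return (1.0 - rho) * lam + rho * lam_hat
```

Because each step touches only one minibatch, the per-iteration cost is independent of the corpus size, which is what makes the online variant attractive for large datasets and for a streaming LDA.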
[jira] [Commented] (SPARK-5563) LDA with online variational inference
[ https://issues.apache.org/jira/browse/SPARK-5563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363308#comment-14363308 ]

Matthew Willson commented on SPARK-5563:
----------------------------------------

Definitely keen on this! In case it's useful for reference, gensim (https://github.com/piskvorky/gensim) has a great Python/numpy implementation of this, alongside various other goodies including an online version of HDP (Hierarchical Dirichlet Process) LDA, which doesn't require the number of topics to be fixed in advance. (I've been told that if you're going to implement online variational inference for LDA, there isn't much extra cost in going the whole way and implementing it for HDP-LDA...)
[jira] [Commented] (SPARK-5563) LDA with online variational inference
[ https://issues.apache.org/jira/browse/SPARK-5563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363578#comment-14363578 ]

Joseph K. Bradley commented on SPARK-5563:
------------------------------------------

[~matthjw] That's a good point: For most inference algorithms (sampling, batch EM), it's significantly more expensive/difficult to go from LDA to the HDP, but it may be a lot easier for online variational inference.
[jira] [Commented] (SPARK-5563) LDA with online variational inference
[ https://issues.apache.org/jira/browse/SPARK-5563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308584#comment-14308584 ]

Apache Spark commented on SPARK-5563:
-------------------------------------

User 'hhbyyh' has created a pull request for this issue:
https://github.com/apache/spark/pull/4419
[jira] [Commented] (SPARK-5563) LDA with online variational inference
[ https://issues.apache.org/jira/browse/SPARK-5563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305115#comment-14305115 ]

yuhao yang commented on SPARK-5563:
-----------------------------------

Thanks, Joseph, for helping create the JIRA. Pasting the previous [comment link|https://issues.apache.org/jira/browse/SPARK-1405?focusedCommentId=14302952&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14302952] here, and sharing the current implementation at https://github.com/hhbyyh/OnlineLDA_Spark. I agree with the suggestions listed above and will propose a PR for more detailed discussion soon. Thanks.
[jira] [Commented] (SPARK-5563) LDA with online variational inference
[ https://issues.apache.org/jira/browse/SPARK-5563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305199#comment-14305199 ]

yuhao yang commented on SPARK-5563:
-----------------------------------

BTW, a batch version of online variational inference is useful when processing small data sets (especially toy data in unit tests).
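One way to see why a batch mode falls out almost for free: in Hoffman et al.'s update, setting the decay exponent kappa to 0 makes the step size rho_t equal 1 on every iteration, so each pass simply replaces lambda with the estimate from the current "minibatch". If that minibatch is the whole (small) corpus, every pass is a standard batch variational Bayes update. A hedged sketch of just that relationship (the function is illustrative, not MLlib's API):

```python
import numpy as np

def online_update(lam, lam_hat, t, tau0=1.0, kappa=0.7):
    # Hoffman et al.'s step size: rho_t = (tau0 + t)^(-kappa).
    rho = (tau0 + t) ** (-kappa)
    return (1.0 - rho) * lam + rho * lam_hat

# With kappa = 0, rho_t = 1 for every t: the old lambda is discarded
# and replaced by the estimate computed from the current batch. When
# that batch is the full corpus, this is batch variational Bayes.
batch_step = online_update(np.array([1.0, 2.0]),
                           np.array([5.0, 6.0]), t=3, kappa=0.0)
```

So supporting a batch mode need not mean a separate code path; it can be a degenerate configuration of the online optimizer.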