[jira] [Closed] (HIVEMALL-86) Change Hadoop version dependencies to v2.4.0

2017-04-20 Thread Makoto Yui (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVEMALL-86?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Makoto Yui closed HIVEMALL-86.
--
Resolution: Fixed
  Assignee: Makoto Yui

> Change Hadoop version dependencies to v2.4.0
> 
>
> Key: HIVEMALL-86
> URL: https://issues.apache.org/jira/browse/HIVEMALL-86
> Project: Hivemall
> Issue Type: Improvement
> Reporter: Makoto Yui
> Assignee: Makoto Yui
>
> Change Hadoop version dependencies to v2.4.0
> For historical reasons, Hivemall depends on Hadoop 0.20.2-cdh3u6 in 
> "provided" scope as follows:
> {code}
> $ find . -type f | grep pom.xml | xargs grep cdh
> ./core/pom.xml: 0.20.2-cdh3u6
> ./mixserv/pom.xml:  0.20.2-cdh3u6
> ./nlp/pom.xml:  0.20.2-cdh3u6
> ./spark/spark-common/pom.xml:   0.20.2-cdh3u6
> {code}
> It would be better to change the version dependency to Hadoop v2.4.0 (not v2.6.x). 
> This changes the dependent packages, so careful verification is required.
> The following branch has already changed the dependencies to v2.4.0:
> https://github.com/myui/hivemall/tree/dev/yarnkit
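
The `find`/`grep` check quoted in the issue can also be scripted. The following is a minimal Python sketch, not part of Hivemall: the helper name `find_legacy_poms` and its parameters are hypothetical, and it only assumes that the legacy version string `0.20.2-cdh3u6` appears literally in the affected `pom.xml` files, as the quoted output suggests.

```python
from pathlib import Path

# Version string quoted in the issue; everything else here is illustrative.
LEGACY_VERSION = "0.20.2-cdh3u6"


def find_legacy_poms(root, needle=LEGACY_VERSION):
    """Return sorted POSIX-style relative paths of pom.xml files under
    `root` whose text contains `needle` (hypothetical helper)."""
    root = Path(root)
    hits = []
    for pom in root.rglob("pom.xml"):
        # Read leniently so an oddly encoded file does not abort the scan.
        text = pom.read_text(encoding="utf-8", errors="replace")
        if needle in text:
            hits.append(pom.relative_to(root).as_posix())
    return sorted(hits)
```

Calling `find_legacy_poms(".")` from the repository root would list the modules that still reference the old version; an empty list after the change is one quick sanity check, though it does not replace the careful verification the issue asks for.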



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (HIVEMALL-91) Implement Online LDA

2017-04-20 Thread Makoto Yui (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVEMALL-91?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Makoto Yui closed HIVEMALL-91.
--
Resolution: Fixed

> Implement Online LDA
> 
>
> Key: HIVEMALL-91
> URL: https://issues.apache.org/jira/browse/HIVEMALL-91
> Project: Hivemall
> Issue Type: New Feature
> Reporter: Makoto Yui
> Assignee: Takuya Kitazawa
>
> Implement OnlineLDA [1,2].
> Online Learning for Latent Dirichlet Allocation
> [1] http://dl.acm.org/citation.cfm?id=2997285
> https://wellecks.wordpress.com/2014/10/26/ldaoverflow-with-online-lda/
> http://mlwave.com/tutorial-online-lda-with-vowpal-wabbit/
> https://github.com/miberk/jolda
> https://github.com/blei-lab/onlineldavb
> http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.LatentDirichletAllocation.html
> Streaming LDA is an improved version of online LDA.
> https://github.com/jessykate/streamLDA
> [2] http://kzhai.github.io/paper/2013_icml.pdf 
> A quick first implementation:
> https://github.com/NaokiStones/hivemall/tree/dev/lda





[GitHub] incubator-hivemall issue #66: [HIVEMALL-91] Implement Online LDA

2017-04-20 Thread myui
Github user myui commented on the issue:

https://github.com/apache/incubator-hivemall/pull/66
  
@takuti Merged w/ some refactoring. Great work! Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hivemall pull request #66: [HIVEMALL-91] Implement Online LDA

2017-04-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-hivemall/pull/66




[GitHub] incubator-hivemall issue #66: [HIVEMALL-91] Implement Online LDA

2017-04-20 Thread takuti
Github user takuti commented on the issue:

https://github.com/apache/incubator-hivemall/pull/66
  
**Note on the performance**

For [news20-multiclass](https://github.com/apache/incubator-hivemall/tree/master/core/src/test/resources/hivemall/classifier) data,
I have translated [our Java test case](https://github.com/takuti/incubator-hivemall/blob/709848d5626f0df7e7361511224e0e9284b3484d/core/src/test/java/hivemall/topicmodel/OnlineLDAModelTest.java#L147-L223)
to a [Python scikit-learn implementation](https://github.com/takuti-sandbox/tmp/blob/57f740a3d0283e5586cc2cd170a8dd15b9cf96ac/python/lda/news20.py)
w/ (almost) the same settings.

In our Java code, the unit test finishes in **8 sec** w/ approximately 30 
iterations. By contrast, the Python implementation takes around **15 sec** for 
30 iterations. Thus, it is natural for `train_lda()` to take a very long time on 
large-scale data. A larger `-delta`, smaller `-iteration`, or smaller `-eps` 
option could hopefully reduce the running time (though it may end up w/ poor results).

* The Python code actually creates and handles a huge, sparse 20-by-62061 
matrix. The comparison might be unfair, but the Java code, for its part, has many 
inefficient Map and Array accesses.
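
To put the two timings on the same footing, here is a small Python sketch of the per-iteration arithmetic. The figures are the approximate ones reported above (8 sec and 15 sec, each for about 30 iterations); the function and variable names are only illustrative, and the result should be read as a rough ratio rather than a benchmark.

```python
def seconds_per_iteration(total_seconds, iterations):
    """Average wall-clock time per iteration."""
    return total_seconds / iterations


# Approximate figures quoted in the comment above.
java_per_iter = seconds_per_iteration(8.0, 30)     # Java unit test
python_per_iter = seconds_per_iteration(15.0, 30)  # scikit-learn script

# How many times longer the Python run takes per iteration.
ratio = python_per_iter / java_per_iter

print(f"Java:   {java_per_iter:.3f} s/iteration")
print(f"Python: {python_per_iter:.3f} s/iteration")
print(f"Python / Java: {ratio:.3f}x")
```

With these numbers the Python run takes a little under twice as long per iteration as the Java test.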




[GitHub] incubator-hivemall issue #66: [HIVEMALL-91] Implement Online LDA

2017-04-20 Thread coveralls
Github user coveralls commented on the issue:

https://github.com/apache/incubator-hivemall/pull/66
  

[![Coverage Status](https://coveralls.io/builds/11159512/badge)](https://coveralls.io/builds/11159512)

Coverage increased (+1.04%) to 38.063% when pulling 
**97adc5ce3d22e10e485c4f190b0a488db69d99e5 on takuti:lda** into 
**bba252ac10fccda022b630e3137460dd8d2f9302 on apache:master**.





[GitHub] incubator-hivemall issue #66: [HIVEMALL-91] Implement Online LDA

2017-04-20 Thread coveralls
Github user coveralls commented on the issue:

https://github.com/apache/incubator-hivemall/pull/66
  

[![Coverage Status](https://coveralls.io/builds/11159290/badge)](https://coveralls.io/builds/11159290)

Coverage increased (+1.3%) to 38.364% when pulling 
**d781b6602538577202fcb571b12b4ffd3e5ab92d on takuti:lda** into 
**bba252ac10fccda022b630e3137460dd8d2f9302 on apache:master**.


