[jira] [Closed] (HIVEMALL-86) Change Hadoop version dependencies to v2.4.0
[ https://issues.apache.org/jira/browse/HIVEMALL-86?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Makoto Yui closed HIVEMALL-86.
--
Resolution: Fixed
Assignee: Makoto Yui

> Change Hadoop version dependencies to v2.4.0
>
> Key: HIVEMALL-86
> URL: https://issues.apache.org/jira/browse/HIVEMALL-86
> Project: Hivemall
> Issue Type: Improvement
> Reporter: Makoto Yui
> Assignee: Makoto Yui
>
> Change Hadoop version dependencies to v2.4.0
> For historical reasons, Hivemall depends on Hadoop 0.20.2-cdh3u6 in the
> "provided" scope, as follows:
> {code}
> $ find . -type f | grep pom.xml | xargs grep cdh
> ./core/pom.xml: 0.20.2-cdh3u6
> ./mixserv/pom.xml: 0.20.2-cdh3u6
> ./nlp/pom.xml: 0.20.2-cdh3u6
> ./spark/spark-common/pom.xml: 0.20.2-cdh3u6
> {code}
> It would be better to change the version dependencies to Hadoop v2.4.0 (not v2.6.x).
> The dependency packages will then change, so careful verification is required.
> This branch changed the dependencies to v2.4.0:
> https://github.com/myui/hivemall/tree/dev/yarnkit

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Closed] (HIVEMALL-91) Implement Online LDA
[ https://issues.apache.org/jira/browse/HIVEMALL-91?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Makoto Yui closed HIVEMALL-91.
--
Resolution: Fixed

> Implement Online LDA
>
> Key: HIVEMALL-91
> URL: https://issues.apache.org/jira/browse/HIVEMALL-91
> Project: Hivemall
> Issue Type: New Feature
> Reporter: Makoto Yui
> Assignee: Takuya Kitazawa
>
> Implement OnlineLDA [1,2].
> [1] Online Learning for Latent Dirichlet Allocation
> http://dl.acm.org/citation.cfm?id=2997285
> https://wellecks.wordpress.com/2014/10/26/ldaoverflow-with-online-lda/
> http://mlwave.com/tutorial-online-lda-with-vowpal-wabbit/
> https://github.com/miberk/jolda
> https://github.com/blei-lab/onlineldavb
> http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.LatentDirichletAllocation.html
> Streaming LDA is an improved version of online LDA.
> https://github.com/jessykate/streamLDA
> [2] http://kzhai.github.io/paper/2013_icml.pdf
> Rough implementation:
> https://github.com/NaokiStones/hivemall/tree/dev/lda
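The core of online LDA, as described in the Online Learning for Latent Dirichlet Allocation paper cited as [1] (Hoffman, Blei and Bach), is a stochastic update of the topic-word parameters lambda after each mini-batch: lambda is blended with a mini-batch estimate using a decaying step size rho_t = (tau0 + t)^(-kappa). The following is a minimal Java sketch of that global step only. The class and method names are hypothetical, the E-step that produces the expected word counts is assumed to have run already, and this is not the implementation that was eventually merged into Hivemall.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of the global step of online variational Bayes for LDA.
 * Not Hivemall's implementation; the E-step that produces expectedCounts is omitted.
 */
public final class OnlineLdaUpdateSketch {

    /** Step size rho_t = (tau0 + t)^(-kappa); the paper requires kappa in (0.5, 1]. */
    public static double learningRate(double tau0, double kappa, long t) {
        return Math.pow(tau0 + t, -kappa);
    }

    /**
     * lambda <- (1 - rho) * lambda + rho * lambdaHat, where
     * lambdaHat[k][w] = eta + (D / S) * expectedCounts[k][w].
     * expectedCounts[k][w] is assumed to hold sum_d n_dw * phi_dwk over the
     * S documents of the mini-batch; D is the (estimated) corpus size.
     */
    public static void updateLambda(double[][] lambda, double[][] expectedCounts, double eta,
                                    long numDocs, int batchSize, double rho) {
        double scale = (double) numDocs / batchSize;
        for (int k = 0; k < lambda.length; k++) {
            for (int w = 0; w < lambda[k].length; w++) {
                double lambdaHat = eta + scale * expectedCounts[k][w];
                lambda[k][w] = (1.0 - rho) * lambda[k][w] + rho * lambdaHat;
            }
        }
    }

    public static void main(String[] args) {
        double[][] lambda = {{1.0, 1.0}, {1.0, 1.0}}; // 2 topics x 2 words
        double[][] counts = {{2.0, 0.0}, {0.0, 3.0}}; // expected counts from one mini-batch
        for (long t = 0; t < 3; t++) {
            double rho = learningRate(1.0, 0.7, t);
            updateLambda(lambda, counts, 0.01, 100, 10, rho);
            System.out.println("t=" + t + " rho=" + rho + " lambda=" + Arrays.deepToString(lambda));
        }
    }
}
```

With tau0 = 1 and kappa = 0.7, the first step has rho = 1 and simply replaces lambda with the mini-batch estimate; later mini-batches are blended in with shrinking weights, which is what lets the model keep learning from a stream without revisiting old documents.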
[GitHub] incubator-hivemall issue #66: [HIVEMALL-91] Implement Online LDA
Github user myui commented on the issue: https://github.com/apache/incubator-hivemall/pull/66 @takuti Merged w/ some refactoring. Great work! Thanks. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-hivemall pull request #66: [HIVEMALL-91] Implement Online LDA
Github user asfgit closed the pull request at: https://github.com/apache/incubator-hivemall/pull/66
[GitHub] incubator-hivemall issue #66: [HIVEMALL-91] Implement Online LDA
Github user takuti commented on the issue: https://github.com/apache/incubator-hivemall/pull/66

**Note on the performance**

For the [news20-multiclass](https://github.com/apache/incubator-hivemall/tree/master/core/src/test/resources/hivemall/classifier) data, I have translated [our Java test case](https://github.com/takuti/incubator-hivemall/blob/709848d5626f0df7e7361511224e0e9284b3484d/core/src/test/java/hivemall/topicmodel/OnlineLDAModelTest.java#L147-L223) into a [Python scikit-learn implementation](https://github.com/takuti-sandbox/tmp/blob/57f740a3d0283e5586cc2cd170a8dd15b9cf96ac/python/lda/news20.py) with (almost) the same settings.

Our Java unit test finishes in **8 sec** with approximately 30 iterations. By contrast, the Python implementation takes around **15 sec** for 30 iterations. So if `train_lda()` takes a very long time on large-scale data, that is to be expected. A larger `-delta`, a smaller `-iteration` or a smaller `-eps` option could reduce the running time (at the cost of poorer results).

* The Python code actually creates and handles a huge 20-by-62061 sparse matrix, so the comparison might be unfair; on the other hand, the Java code has many inefficient Map and array accesses.
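The trade-off between the convergence options, the number of iterations and the running time comes from the nested loops of online LDA: every mini-batch runs an inner per-document E-step until it converges. Below is a minimal Java sketch of that inner loop, modeled on the reference onlineldavb code linked in HIVEMALL-91, where the topic weights gamma of a document are iterated until their mean absolute change falls below a threshold. It *assumes* that Hivemall's `-delta` plays a role similar to that threshold; `LdaEStepSketch`, `inferGamma` and the other names are hypothetical, and this is not the code merged in this pull request.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of the per-document variational E-step of online LDA,
 * modeled on the loop in blei-lab/onlineldavb. Not Hivemall's implementation.
 */
public final class LdaEStepSketch {

    /** Digamma: shift x up with psi(x) = psi(x + 1) - 1/x, then use the asymptotic series (x > 0). */
    public static double digamma(double x) {
        double result = 0.0;
        while (x < 6.0) {
            result -= 1.0 / x;
            x += 1.0;
        }
        double inv2 = 1.0 / (x * x);
        return result + Math.log(x) - 0.5 / x
                - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252));
    }

    /** exp(E[log beta_kw]) = exp(psi(lambda_kw) - psi(sum_w lambda_kw)). */
    public static double[][] expElogBeta(double[][] lambda) {
        double[][] out = new double[lambda.length][];
        for (int k = 0; k < lambda.length; k++) {
            double rowSum = 0.0;
            for (double v : lambda[k]) {
                rowSum += v;
            }
            double psiSum = digamma(rowSum);
            out[k] = new double[lambda[k].length];
            for (int w = 0; w < lambda[k].length; w++) {
                out[k][w] = Math.exp(digamma(lambda[k][w]) - psiSum);
            }
        }
        return out;
    }

    /**
     * Iterates gamma (topic weights of one document) until the mean absolute change
     * falls below delta or maxIter is reached. Updates gamma in place and returns
     * the number of iterations run. A looser delta stops earlier.
     */
    public static int inferGamma(int[] wordIds, double[] counts, double[][] expElogBeta,
                                 double alpha, double delta, int maxIter, double[] gamma) {
        int numTopics = gamma.length;
        Arrays.fill(gamma, 1.0); // deterministic init; onlineldavb initializes gamma randomly
        double[] expElogTheta = new double[numTopics];
        double[] phiNorm = new double[wordIds.length];
        for (int iter = 1; iter <= maxIter; iter++) {
            double gammaSum = 0.0;
            for (double g : gamma) {
                gammaSum += g;
            }
            double psiSum = digamma(gammaSum);
            for (int k = 0; k < numTopics; k++) {
                expElogTheta[k] = Math.exp(digamma(gamma[k]) - psiSum);
            }
            // phiNorm[i] normalizes the topic responsibilities phi for word i
            for (int i = 0; i < wordIds.length; i++) {
                double s = 1e-100;
                for (int k = 0; k < numTopics; k++) {
                    s += expElogTheta[k] * expElogBeta[k][wordIds[i]];
                }
                phiNorm[i] = s;
            }
            double totalChange = 0.0;
            for (int k = 0; k < numTopics; k++) {
                double acc = 0.0;
                for (int i = 0; i < wordIds.length; i++) {
                    acc += counts[i] / phiNorm[i] * expElogBeta[k][wordIds[i]];
                }
                double updated = alpha + expElogTheta[k] * acc;
                totalChange += Math.abs(updated - gamma[k]);
                gamma[k] = updated;
            }
            if (totalChange / numTopics < delta) {
                return iter;
            }
        }
        return maxIter;
    }

    public static void main(String[] args) {
        double[][] eb = expElogBeta(new double[][] {{5.0, 1.0, 1.0}, {1.0, 1.0, 5.0}});
        int[] wordIds = {0, 2};
        double[] counts = {3.0, 1.0};
        double[] gamma = new double[2];
        for (double delta : new double[] {1e-1, 1e-3, 1e-5}) {
            int iters = inferGamma(wordIds, counts, eb, 0.1, delta, 1000, gamma);
            System.out.println("delta=" + delta + " iterations=" + iters
                    + " gamma=" + Arrays.toString(gamma));
        }
    }
}
```

Running `main` prints the iteration count for three thresholds. Because every threshold follows the same gamma trajectory from the same starting point, a looser threshold can never need more iterations than a tighter one, which is the running-time effect described in the note above; the price is a less converged gamma per document.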
[GitHub] incubator-hivemall issue #66: [HIVEMALL-91] Implement Online LDA
Github user coveralls commented on the issue: https://github.com/apache/incubator-hivemall/pull/66 [![Coverage Status](https://coveralls.io/builds/11159512/badge)](https://coveralls.io/builds/11159512) Coverage increased (+1.04%) to 38.063% when pulling **97adc5ce3d22e10e485c4f190b0a488db69d99e5 on takuti:lda** into **bba252ac10fccda022b630e3137460dd8d2f9302 on apache:master**.
[GitHub] incubator-hivemall issue #66: [HIVEMALL-91] Implement Online LDA
Github user coveralls commented on the issue: https://github.com/apache/incubator-hivemall/pull/66 [![Coverage Status](https://coveralls.io/builds/11159290/badge)](https://coveralls.io/builds/11159290) Coverage increased (+1.3%) to 38.364% when pulling **d781b6602538577202fcb571b12b4ffd3e5ab92d on takuti:lda** into **bba252ac10fccda022b630e3137460dd8d2f9302 on apache:master**.