[ANNOUNCE] Apache Hivemall 0.6.0-incubating
Hi all,

The Apache Hivemall (incubating) project team is proud to announce that Apache Hivemall 0.6.0-incubating has been released.

Apache Hivemall is a scalable machine learning library implemented as Hive UDFs/UDAFs/UDTFs. Hivemall runs on Hadoop-based data processing frameworks, specifically Apache Hive, Apache Spark, and Apache Pig.

This is our 3rd Apache release as an Apache Incubator project. The major feature introduced in this release is support for XGBoost training/prediction.

The release artifacts and ChangeLog can be found at:
https://github.com/apache/incubator-hivemall/blob/master/ChangeLog.md

Release artifacts in Maven Central:
https://search.maven.org/#search%7Cga%7C1%7Cg%3A%22org.apache.hivemall%22%20AND%20v%3A%220.6.0-incubating%22

Find more about our project at:
- Project Site: https://hivemall.incubator.apache.org/
- Github mirror: https://github.com/apache/incubator-hivemall
- Mailing list(s):
  d...@hivemall.incubator.apache.org
  u...@hivemall.incubator.apache.org

Thanks,
Makoto
on behalf of Apache Hivemall PPMC

--
Makoto YUI
Principal Engineer, Arm Treasure Data.
http://myui.github.io/
Apache Hivemall v0.5.0 (incubating) release voting
Hi,

We have started the release vote for Apache Hivemall v0.5.0-rc3:
https://www.mail-archive.com/general@incubator.apache.org/msg62599.html

Hivemall is a Hive-related incubator project. We need three +1 votes from IPMC members for our Incubator release. Please join the vote if you are interested.

Thanks,
Makoto

--
Makoto YUI
Research Engineer, Treasure Data, Inc.
http://myui.github.io/
Re: Roaring Bitmap UDFs
Approximate Counting using HLL+ is supported in Apache Hivemall. http://hivemall.incubator.apache.org/userguide/misc/approx.html FYI Makoto -- Makoto YUI Research Engineer, Treasure Data, Inc. http://myui.github.io/
Re: FYI: Backports of Hive UDFs
Alan,

Putting the backported UDFs into Hive branch-1 would tie them to that specific branch, that is, to the next stable v1.x release. The artifact should be a distinct jar that includes only the backported UDFs, so that it can be used in existing Hive clusters. It is better to support all Hive versions from v0.13.0 onward, so it should be a distinct Maven submodule.

Edward,

Gems-like dynamic plugin loading from a Maven repository (or from GitHub repos via jitpack.io) is possible with Eclipse Aether, but dynamic plugin/class loading involves security issues.
https://stackoverflow.com/questions/35598239/load-maven-artifact-via-classloader
https://github.com/treasure-data/digdag/tree/master/digdag-core/src/main/java/io/digdag/core/plugin

Thanks,
Makoto

2017-06-03 3:26 GMT+09:00 Edward Capriolo :
> Don't we currently support features that load functions from external places
> like maven http server etc? I wonder if it would be easier to back port that
> back port a handful of functions ?
>
> On Fri, Jun 2, 2017 at 2:22 PM, Alan Gates wrote:
>>
>> Rather than put that code in hive/contrib I was thinking that you could
>> just backport the Hive 2.2 UDFs into the same locations in Hive 1 branch.
>> That seems better than putting them into different locations on different
>> branches.
>>
>> If you are willing to do the porting and post the patches (including
>> relevant unit tests so we know they work) I and other Hive committers can
>> review the patches and commit them to branch-1.
>>
>> Alan.

--
Makoto YUI
Research Engineer, Treasure Data, Inc.
http://myui.github.io/
Re: FYI: Backports of Hive UDFs
That would be a help for existing Hive users.
You are welcome to put it into hive/contrib or somewhere else.

The minimum dependencies are Hive 0.13.0 and Hadoop 2.4.0.
It will work in any Hive environment, version 0.13.0 or later.
https://github.com/myui/hive-udf-backports/blob/master/pom.xml#L49

Thanks,
Makoto

--
Makoto YUI
Research Engineer, Treasure Data, Inc.
http://myui.github.io/

2017-06-02 2:24 GMT+09:00 Alan Gates :
> I'm curious why these can't be backported inside Hive. If someone is
> willing to do the work to do the backport we can check them into the Hive 1
> branch.
>
> On Thu, Jun 1, 2017 at 1:44 AM, Makoto Yui wrote:
>>
>> Hi,
>>
>> I created a repository for backporting recent Hive UDFs (as of v2.2.0)
>> to legacy Hive environment (v0.13.0 or later).
>>
>> https://github.com/myui/hive-udf-backports
>>
>> Hope this helps for those who are using old Hive env :-(
>>
>> FYI
>>
>> Makoto
>>
>> --
>> Makoto YUI
>> Research Engineer, Treasure Data, Inc.
>> http://myui.github.io/
FYI: Backports of Hive UDFs
Hi, I created a repository for backporting recent Hive UDFs (as of v2.2.0) to legacy Hive environment (v0.13.0 or later). https://github.com/myui/hive-udf-backports Hope this helps for those who are using old Hive env :-( FYI Makoto -- Makoto YUI Research Engineer, Treasure Data, Inc. http://myui.github.io/
Re: Requesting write access to the Hive wiki
Lefty, Thanks! Makoto 2016-12-03 15:02 GMT+09:00 Lefty Leverenz : > You now have edit permissions. Welcome to the Hive wiki team, Makoto! > > -- Lefty > > > On Fri, Dec 2, 2016 at 9:33 PM, Makoto Yui wrote: >> >> Hi, >> >> I would like to edit Hive wiki page. >> https://cwiki.apache.org/confluence/display/Hive/RelatedProjects >> >> My ASF wiki account id is 'myui'. >> >> Thanks, >> Makoto > >
Requesting write access to the Hive wiki
Hi, I would like to edit Hive wiki page. https://cwiki.apache.org/confluence/display/Hive/RelatedProjects My ASF wiki account id is 'myui'. Thanks, Makoto
[ANN] Hivemall entered Apache Incubator
Hello all,

I would like to announce that the Hivemall project has moved to the Apache Incubator.

Apache Hivemall is a scalable machine learning library that runs on Apache Hive, Apache Spark, and Apache Pig.

http://hivemall.incubator.apache.org/ (project top page)
http://www.slideshare.net/myui/dots20161029-myui/11 (latest slides)

While the project is mainly for machine learning, we welcome contributions of your generic UDFs/UDAFs/UDTFs.

Fork us on https://github.com/apache/incubator-hivemall

We plan to make the first Apache release in Q1 2017.

BTW, I would like to edit Hive's related projects page [1]. Can someone please give me permission to edit the Hive wiki pages? My ASF wiki account name is 'myui'.

[1] https://cwiki.apache.org/confluence/display/Hive/RelatedProjects

Thanks,
Makoto
[ANN] Hivemall v0.4.0 is now available
Hello all,

We released a newer version of Hivemall, v0.4.0.

Hivemall provides machine learning functionality over Hive UDFs/UDAFs/UDTFs. Hivemall is easy to use because every machine learning step is done within HiveQL.
https://github.com/myui/hivemall

In the latest release (v0.4.0), we introduced

o Factorization Machine classification/regression
  https://github.com/myui/hivemall/wiki/Movielens-Rating-Prediction-using-Factorization-Machine
o RandomForest classification/regression
  https://github.com/myui/hivemall/wiki/Iris-multi-class-classification-using-RandomForest
  https://github.com/myui/hivemall/wiki/Kaggle-Titanic-binary-classification-using-Random-Forest

Aside from machine learning, Hivemall also provides a UDTF to run Top-K processing efficiently on Apache Hive.
https://github.com/myui/hivemall/wiki/Efficient-Top-k-computation-on-Apache-Hive-using-Hivemall-UDTF

Hope you enjoy the release. Feedback and pull requests are welcome.

Thanks,
Makoto

--
Makoto YUI
Research Engineer, Treasure Data, Inc.
http://myui.github.io/
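As a rough illustration of what per-group Top-K processing means, here is a small Python sketch that keeps only the k highest-scoring rows of each group in a bounded min-heap instead of sorting every group in full. It is not the Hivemall UDTF and says nothing about how that UDTF is implemented; the function name `top_k_per_group` and the sample rows are hypothetical.

```python
import heapq
from collections import defaultdict


def top_k_per_group(rows, k):
    """Return {group: [(score, value), ...]} with at most k rows per group,
    highest score first. `rows` is an iterable of (group, score, value)."""
    heaps = defaultdict(list)  # group -> min-heap of (score, seq, value)
    for seq, (group, score, value) in enumerate(rows):
        heap = heaps[group]
        if len(heap) < k:
            heapq.heappush(heap, (score, seq, value))
        elif score > heap[0][0]:
            # The new row beats the smallest kept score, so swap it in.
            heapq.heapreplace(heap, (score, seq, value))
    return {
        group: [(score, value) for score, _, value in sorted(heap, reverse=True)]
        for group, heap in heaps.items()
    }


# Hypothetical input rows: (group, score, value).
sample_rows = [("a", 3, "x"), ("a", 5, "y"), ("a", 1, "z"), ("b", 2, "w")]
top2 = top_k_per_group(sample_rows, k=2)
```

Each group holds at most k entries at any time, so memory stays proportional to k times the number of groups rather than to the input size.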
[ANN] Hivemall v0.3.2 is now available
Hello all,

We released a newer version of Hivemall, v0.3.2.

Hivemall provides machine learning functionality over Hive UDFs/UDAFs/UDTFs. Hivemall is easy to use because every machine learning step is done within HiveQL.
https://github.com/myui/hivemall

In the latest release (v0.3.2), we introduced

o Anomaly Detection using Local Outlier Factor, and
o Polynomial features that are useful for non-linear regression/classification.

Anomaly Detection in Hivemall [1] is very easy to use.

1) Just prepare a table (e.g., a table containing sensor data) as follows.

| rowid | features |
| ----- | -------- |
| 1 | ["reflectance:0.5252967","specific_heat:0.19863537","weight:0.0"] |
| 2 | ["reflectance:0.5950446","specific_heat:0.09166764","weight:0.052084323"] |
| 3 | ["reflectance:0.6797837","specific_heat:0.12567581","weight:0.13255163"] |
| 4 | ... |

2) Run a query to find the top-K outliers. You then get the outlier candidates.

| rowid | LOF value |
| ----- | --------- |
| 87 | 3.031143750623693 | (<- rowid 87 is an outlier in this case)
| 16 | 1.975556449228491 |
| 1 | 1.8415763677073722 |

Hope you enjoy the release! Feedback and pull requests are welcome.

Last but not least, we have changed the license of Hivemall from LGPL v2 to Apache License v2 since v0.3.1.

[1] https://github.com/myui/hivemall/wiki/Outlier-Detection-using-Local-Outlier-Factor

Thanks,
Makoto

--
Makoto YUI
Research Engineer, Treasure Data, Inc.
http://myui.github.io/
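To make the LOF values above a little more concrete, the following Python sketch computes a textbook Local Outlier Factor over rows in the same "name:value" feature format. It is a simplified stand-in, not Hivemall's implementation: it uses exactly k nearest neighbours (ignoring distance ties), assumes no duplicate points, and rows 4 and 99 as well as the function names are hypothetical additions for illustration.

```python
import math


def parse_features(features):
    """Turn ["name:value", ...] strings into a {name: float} dict."""
    return {name: float(value)
            for name, value in (f.rsplit(":", 1) for f in features)}


def distance(a, b):
    """Euclidean distance between two sparse feature dicts."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(key, 0.0) - b.get(key, 0.0)) ** 2 for key in keys))


def lof_scores(rows, k=2):
    """rows: {rowid: feature dict}. Returns {rowid: LOF value}."""
    ids = list(rows)
    # k nearest neighbours of each row as sorted (distance, neighbour id) pairs.
    knn = {i: sorted((distance(rows[i], rows[j]), j) for j in ids if j != i)[:k]
           for i in ids}
    k_dist = {i: knn[i][-1][0] for i in ids}
    # Local reachability density: inverse of the mean reachability distance.
    lrd = {}
    for i in ids:
        reach = [max(k_dist[j], d) for d, j in knn[i]]
        lrd[i] = len(reach) / sum(reach)
    # LOF: average density of the neighbours relative to the row's own density.
    return {i: sum(lrd[j] for _, j in knn[i]) / (len(knn[i]) * lrd[i])
            for i in ids}


rows = {
    1: parse_features(["reflectance:0.5252967", "specific_heat:0.19863537", "weight:0.0"]),
    2: parse_features(["reflectance:0.5950446", "specific_heat:0.09166764", "weight:0.052084323"]),
    3: parse_features(["reflectance:0.6797837", "specific_heat:0.12567581", "weight:0.13255163"]),
    4: parse_features(["reflectance:0.60", "specific_heat:0.15", "weight:0.05"]),  # hypothetical
    99: parse_features(["reflectance:5.0", "specific_heat:4.0", "weight:3.0"]),    # hypothetical outlier
}
scores = lof_scores(rows, k=2)
```

Rows whose score is close to 1 sit in a neighbourhood about as dense as their neighbours' neighbourhoods; a score well above 1, as for the hypothetical row 99 here, marks an outlier candidate.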
map-side join fails when a serialized table contains arrays
Hi,

I got the error below on a map-side join where a serialized table contains an array column.

The exception does not occur when I turn off the optimized map-join hash table by setting hive.mapjoin.optimized.hashtable=false.

It seems that a wrong ObjectInspector was set at CommonJoinOperator#initializeOp.

I am using Hive 1.0.0 (Tez 0.6) on Hadoop 2.6.0.

I found a similar report at
http://stackoverflow.com/questions/28606244/issues-upgrading-to-hdinsight-3-2-hive-0-14-0-tez-0-5-2

Is this a known issue/bug?

Thanks,
Makoto

task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"gid":1,"userid":4422,"movieid":1213,"rating":5}
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"gid":1,"userid":4422,"movieid":1213,"rating":5}
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:294)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:163)
    ... 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"gid":1,"userid":4422,"movieid":1213,"rating":5}
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:503)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:83)
    ... 17 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected exception: Unexpected exception: org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryArray cannot be cast to [Ljava.lang.Object;
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:311)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
    at org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:120)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
    at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493)
    ... 18 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected exception: org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryArray cannot be cast to [Ljava.lang.Object;
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:311)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
    at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:638)
    at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genAllOneUniqueJoinObject(CommonJoinOperator.java:670)
    at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:748)
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:299)
[ANN] Hivemall v0.3 is now available
Hello all, We are excited to announce that a new stable version of Hivemall (v0.3.0) is now available. https://github.com/myui/hivemall/releases/tag/v0.3.0 Hivemall provides a collection of machine learning algorithms as Hive UDFs/UDAFs/UDTFs. The main enhancement in v0.3.0 is the support for matrix factorization. Hope you enjoy the release! Feedback and pull requests are welcome. Thanks, Makoto
[ANN] Hivemall (a scalable machine learning library for Apache Hive) v0.3-beta
Hello all,

We have released a newer version of Hivemall, v0.3-beta2.

Hivemall is an open-source, scalable machine learning library that runs on Hive/Hadoop.
https://github.com/myui/hivemall
http://bit.ly/hivemall-hadoopsummit14 (slides from Hadoop Summit'14)

Hivemall is easy to use if you have a Hive environment because every machine learning step is done within HiveQL.

In the latest release (v0.3), we added support for the following state-of-the-art convex optimization algorithms (please refer to the project site for the complete list of supported algorithms):

o AdaGrad
o AdaGradRDA
o AdaDelta

Moreover, Hivemall v0.3 now supports parameter mixing for more stable prediction performance and faster convergence of the learning process.
https://github.com/myui/hivemall/wiki/How-to-use-Model-Mixing

With the MIX protocol, distributed learners (run as distinct Hadoop tasks) communicate with each other through an external communication support service. With the MIX protocol (and Hivemall's amplifier method), iterations are no longer mandatory, and machine learning runs well on plain Hadoop/Hive.

Hivemall runs on Tez as well.

Hope you enjoy the release! Feedback and pull requests are welcome.

Thanks,
Makoto
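For readers unfamiliar with AdaGrad, the first optimizer listed above, here is a minimal Python sketch of its per-feature update rule: each weight gets its own effective learning rate, scaled down by the accumulated squared gradients of that feature. This shows the generic algorithm only; the function name `adagrad_step` and the `eta` and `eps` values are assumptions for illustration, not Hivemall's code or defaults.

```python
import math


def adagrad_step(weights, grad_sq_sums, gradient, eta=0.1, eps=1e-6):
    """Apply one AdaGrad update in place for a sparse gradient {feature: g}.

    weights      -- {feature: current weight}
    grad_sq_sums -- {feature: running sum of squared gradients}
    """
    for feature, g in gradient.items():
        grad_sq_sums[feature] = grad_sq_sums.get(feature, 0.0) + g * g
        step = eta * g / (math.sqrt(grad_sq_sums[feature]) + eps)
        weights[feature] = weights.get(feature, 0.0) - step
    return weights


# Two identical gradients in a row: the second step is smaller than the first,
# because the accumulated squared gradient has grown.
w, g2 = {}, {}
adagrad_step(w, g2, {"x": 1.0})
first_step = w["x"]
adagrad_step(w, g2, {"x": 1.0})
second_step = w["x"] - first_step
```

Frequently updated features therefore take progressively smaller steps, while rarely seen features keep a comparatively large learning rate, which suits sparse feature vectors.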
Re: [ANN] Hivemall: Hive scalable machine learning library
Hi,

I added support for state-of-the-art classifiers (which are not yet supported in Mahout), as well as Hivemall's cute(!?) logo, in Hivemall 0.1-rc3.

Newly supported classifiers include

- Confidence Weighted (CW)
- Adaptive Regularization of Weight Vectors (AROW)
- Soft Confidence Weighted (SCW1, SCW2)

These classifiers are much smarter than the standard SGD-based or passive-aggressive classifiers. Please check them out yourself.

Thanks,
Makoto

(2013/10/11 4:28), Clark Yang (杨卓荦) wrote:
> I looks really cool, I think I will try it on.
>
> Cheers,
> Zhuoluo (Clark) Yang
Re: [ANN] Hivemall: Hive scalable machine learning library
Hi Edward,

Thank you for your interest.

The Hivemall project does not plan to have a dedicated mailing list. I will answer questions/comments on Twitter or through Github issues (with a question label).

BTW, I just added a CTR (Click-Through-Rate) prediction example based on the dataset provided by a commercial search engine provider for KDDCup 2012 track 2.
https://github.com/myui/hivemall/wiki/KDDCup-2012-track-2-CTR-prediction-dataset

I guess many of you are working on ad CTR/CVR prediction. This example might help you understand how to do it entirely within Hive.

Thanks,
Makoto
@myui

(2013/10/04 23:02), Edward Capriolo wrote:
> Looks cool im already starting to play with it.
Re: [ANN] Hivemall: Hive scalable machine learning library
Hi Dean, Thank you for your interest in Hivemall. Twitter's paper actually influenced me in developing Hivemall and I initially implemented such functionality as Pig UDFs. Though my Pig ML library is not released, you can find a similar attempt for Pig in https://github.com/y-tag/java-pig-MyUDFs Thanks, Makoto 2013/10/3 Dean Wampler : > This is great news! I know that Twitter has done something similar with UDFs > for Pig, as described in this paper: > http://www.umiacs.umd.edu/~jimmylin/publications/Lin_Kolcz_SIGMOD2012.pdf > > I'm glad to see the same thing start with Hive. > > Dean > > > On Wed, Oct 2, 2013 at 10:21 AM, Makoto YUI wrote: >> >> Hello all, >> >> My employer, AIST, has given the thumbs up to open source our machine >> learning library, named Hivemall. >> >> Hivemall is a scalable machine learning library running on Hive/Hadoop, >> licensed under the LGPL 2.1. >> >> https://github.com/myui/hivemall >> >> Hivemall provides machine learning functionality as well as feature >> engineering functions through UDFs/UDAFs/UDTFs of Hive. It is designed >> to be scalable to the number of training instances as well as the number >> of training features. >> >> Hivemall is very easy to use as every machine learning step is done >> within HiveQL. >> >> -- Installation is just as follows: >> add jar /tmp/hivemall.jar; >> source /tmp/define-all.hive; >> >> -- Logistic regression is performed by a query. >> SELECT >> feature, >> avg(weight) as weight >> FROM >> (SELECT logress(features,label) as (feature,weight) FROM >> training_features) t >> GROUP BY feature; >> >> You can find detailed examples on our wiki pages. >> https://github.com/myui/hivemall/wiki/_pages >> >> Though we consider that Hivemall is much easier to use and more scalable >> than Mahout for classification/regression tasks, please check it by >> yourself. If you have a Hive environment, you can evaluate Hivemall >> within 5 minutes or so. >> >> Hope you enjoy the release! 
Feedback (and pull request) is always welcome. >> >> Thank you, >> Makoto > > > > > -- > Dean Wampler, Ph.D. > @deanwampler > http://polyglotprogramming.com
[ANN] Hivemall: Hive scalable machine learning library
Hello all,

My employer, AIST, has given the thumbs up to open source our machine learning library, named Hivemall.

Hivemall is a scalable machine learning library running on Hive/Hadoop, licensed under the LGPL 2.1.

https://github.com/myui/hivemall

Hivemall provides machine learning functionality as well as feature engineering functions through UDFs/UDAFs/UDTFs of Hive. It is designed to scale with the number of training instances as well as the number of training features.

Hivemall is very easy to use as every machine learning step is done within HiveQL.

-- Installation is just as follows:
add jar /tmp/hivemall.jar;
source /tmp/define-all.hive;

-- Logistic regression is performed by a query.
SELECT
  feature,
  avg(weight) as weight
FROM
  (SELECT logress(features,label) as (feature,weight) FROM training_features) t
GROUP BY feature;

You can find detailed examples on our wiki pages.
https://github.com/myui/hivemall/wiki/_pages

Though we consider Hivemall to be much easier to use and more scalable than Mahout for classification/regression tasks, please check it yourself. If you have a Hive environment, you can evaluate Hivemall within 5 minutes or so.

Hope you enjoy the release! Feedback (and pull requests) are always welcome.

Thank you,
Makoto
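The outer part of the logistic regression query above, avg(weight) with GROUP BY feature, can be read as model averaging: (feature, weight) rows, presumably emitted by learners running in parallel tasks, are combined into a single weight per feature. The Python sketch below mimics only that aggregation step plus a plain logistic prediction. The function names and sample rows are hypothetical, and the sketch treats every feature as a binary indicator, which is a simplification rather than Hivemall's feature format.

```python
import math
from collections import defaultdict


def average_weights(emitted):
    """Mimic `SELECT feature, avg(weight) ... GROUP BY feature`.

    emitted -- iterable of (feature, weight) rows from all learners.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for feature, weight in emitted:
        sums[feature] += weight
        counts[feature] += 1
    return {feature: sums[feature] / counts[feature] for feature in sums}


def predict(model, features):
    """Logistic prediction: sigmoid of the summed weights of present features.
    Features missing from the model contribute a weight of 0."""
    z = sum(model.get(feature, 0.0) for feature in features)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical rows, as if two learners had each emitted a weight for "age".
emitted = [("age", 0.4), ("height", -0.2), ("age", 0.6), ("bias", 0.1)]
model = average_weights(emitted)
probability = predict(model, ["age", "bias"])
```

With these sample rows the averaged weight for "age" is 0.5, so the prediction for a row containing "age" and "bias" is the sigmoid of 0.6.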