[ANNOUNCE] Apache Hivemall 0.6.0-incubating

2019-12-25 Thread Makoto Yui
Hi all,

The Apache Hivemall (incubating) project team is proud to announce
Apache Hivemall 0.6.0-incubating has been released.

Apache Hivemall is a scalable machine learning library implemented as
Hive UDFs/UDAFs/UDTFs. Hivemall runs on Hadoop-based data processing
frameworks, specifically on Apache Hive, Apache Spark, and Apache Pig.

This is the third Apache release as an Apache Incubator project.
A major feature introduced in this release is support for
XGBoost training and prediction.

The release artifacts and ChangeLog can be found at:
https://github.com/apache/incubator-hivemall/blob/master/ChangeLog.md

Release artifacts in Maven Central:
https://search.maven.org/#search%7Cga%7C1%7Cg%3A%22org.apache.hivemall%22%20AND%20v%3A%220.6.0-incubating%22

Find more about our project at:
 - Project Site:  https://hivemall.incubator.apache.org/
 - Github mirror: https://github.com/apache/incubator-hivemall
 - Mailing list(s): d...@hivemall.incubator.apache.org
  u...@hivemall.incubator.apache.org

Thanks,
Makoto
on behalf of Apache Hivemall PPMC

-- 
Makoto YUI 
Principal Engineer, Arm Treasure Data.
http://myui.github.io/


Apache Hivemall v0.5.0 (incubating) release voting

2018-02-27 Thread Makoto Yui
Hi,

We started the Apache Hivemall v0.5.0-rc3 release vote in the
following thread:

  https://www.mail-archive.com/general@incubator.apache.org/msg62599.html

It's a Hive-related incubator project.

We need three +1 votes from IPMC members for our Incubator release.
Please join the vote if you are interested.

Thanks,
Makoto

-- 
Makoto YUI 
Research Engineer, Treasure Data, Inc.
http://myui.github.io/


Re: Roaring Bitmap UDFs

2018-02-05 Thread Makoto Yui
Approximate Counting using HLL+ is supported in Apache Hivemall.
http://hivemall.incubator.apache.org/userguide/misc/approx.html
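
For readers unfamiliar with the technique, here is a minimal sketch of
plain HyperLogLog in Python. It is not Hivemall's implementation and it
omits the bias corrections that HLL+ adds; the class name, the SHA-1
based hash, and the default precision p=12 are all assumptions made for
illustration.

```python
import hashlib
import math


class HyperLogLog:
    """Plain HyperLogLog sketch (illustrative; no HLL+ bias correction)."""

    def __init__(self, p=12):
        self.p = p                      # number of index bits (assumed default)
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m
        # Standard bias constant, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        # Derive a 64-bit hash from SHA-1 (a choice made for this sketch).
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        width = 64 - self.p
        idx = h >> width                        # first p bits pick a register
        rest = h & ((1 << width) - 1)           # remaining bits
        rank = width - rest.bit_length() + 1    # leading zeros + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self):
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            estimate = self.m * math.log(self.m / zeros)
        return estimate


if __name__ == "__main__":
    hll = HyperLogLog()
    for i in range(50000):
        hll.add("user-%d" % (i % 5000))   # 5000 distinct values, repeated
    print(round(hll.count()))
```

The sketch keeps only 4096 small integers regardless of input size,
which is why this kind of counting fits well inside an aggregate
function.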

FYI

Makoto

-- 
Makoto YUI 
Research Engineer, Treasure Data, Inc.
http://myui.github.io/


Re: FYI: Backports of Hive UDFs

2017-06-05 Thread Makoto Yui
Alan,

Putting the backported UDFs into Hive branch-1 would tie them to that
specific branch, that is, to the next stable v1.x release.
The artifact should be a distinct jar that includes only the backported
UDFs, so that it can be used in existing Hive clusters.

It would be better to support all Hive versions from v0.13.0 onward,
so a distinct Maven submodule seems preferable.

Edward,

Gems-like dynamic plugin loading from a Maven repository (or from GitHub
repos via jitpack.io) is possible using Eclipse Aether, but
dynamic plugin/class loading raises security issues.
https://stackoverflow.com/questions/35598239/load-maven-artifact-via-classloader
https://github.com/treasure-data/digdag/tree/master/digdag-core/src/main/java/io/digdag/core/plugin

Thanks,
Makoto

2017-06-03 3:26 GMT+09:00 Edward Capriolo :
> Don't we currently support features that load functions from external places
> like maven http server etc? I wonder if it would be easier to back port that
> than back port a handful of functions?
>
> On Fri, Jun 2, 2017 at 2:22 PM, Alan Gates  wrote:
>>
>> Rather than put that code in hive/contrib I was thinking that you could
>> just backport the Hive 2.2 UDFs into the same locations in Hive 1 branch.
>> That seems better than putting them into different locations on different
>> branches.
>>
>> If you are willing to do the porting and post the patches (including
>> relevant unit tests so we know they work) I and other Hive committers can
>> review the patches and commit them to branch-1.
>>
>> Alan.
>>
>> On Thu, Jun 1, 2017 at 6:36 PM, Makoto Yui  wrote:
>>>
>>> That would be a help for existing Hive users.
>>> You are welcome to put it into hive/contrib or somewhere else.
>>>
>>> Minimum dependencies are hive 0.13.0 and hadoop 2.4.0.
>>> It'll work for any Hive environment, version 0.13.0 or later.
>>> https://github.com/myui/hive-udf-backports/blob/master/pom.xml#L49
>>>
>>> Thanks,
>>> Makoto
>>>
>>> --
>>> Makoto YUI 
>>> Research Engineer, Treasure Data, Inc.
>>> http://myui.github.io/
>>>
>>> 2017-06-02 2:24 GMT+09:00 Alan Gates :
>>> > I'm curious why these can't be backported inside Hive.  If someone is
>>> > willing to do the work to do the backport we can check them into the
>>> > Hive 1
>>> > branch.
>>> >
>>> > On Thu, Jun 1, 2017 at 1:44 AM, Makoto Yui  wrote:
>>> >>
>>> >> Hi,
>>> >>
>>> >> I created a repository for backporting recent Hive UDFs (as of v2.2.0)
>>> >> to legacy Hive environment (v0.13.0 or later).
>>> >>
>>> >>https://github.com/myui/hive-udf-backports
>>> >>
>>> >> Hope this helps for those who are using old Hive env :-(
>>> >>
>>> >> FYI
>>> >>
>>> >> Makoto
>>> >>
>>> >> --
>>> >> Makoto YUI 
>>> >> Research Engineer, Treasure Data, Inc.
>>> >> http://myui.github.io/
>>> >
>>> >
>>
>>
>



-- 
Makoto YUI 
Research Engineer, Treasure Data, Inc.
http://myui.github.io/


Re: FYI: Backports of Hive UDFs

2017-06-01 Thread Makoto Yui
That would be a help for existing Hive users.
You are welcome to put it into hive/contrib or somewhere else.

Minimum dependencies are hive 0.13.0 and hadoop 2.4.0.
It'll work for any Hive environment, version 0.13.0 or later.
https://github.com/myui/hive-udf-backports/blob/master/pom.xml#L49

Thanks,
Makoto

-- 
Makoto YUI 
Research Engineer, Treasure Data, Inc.
http://myui.github.io/

2017-06-02 2:24 GMT+09:00 Alan Gates :
> I'm curious why these can't be backported inside Hive.  If someone is
> willing to do the work to do the backport we can check them into the Hive 1
> branch.
>
> On Thu, Jun 1, 2017 at 1:44 AM, Makoto Yui  wrote:
>>
>> Hi,
>>
>> I created a repository for backporting recent Hive UDFs (as of v2.2.0)
>> to legacy Hive environment (v0.13.0 or later).
>>
>>https://github.com/myui/hive-udf-backports
>>
>> Hope this helps for those who are using old Hive env :-(
>>
>> FYI
>>
>> Makoto
>>
>> --
>> Makoto YUI 
>> Research Engineer, Treasure Data, Inc.
>> http://myui.github.io/
>
>


FYI: Backports of Hive UDFs

2017-06-01 Thread Makoto Yui
Hi,

I created a repository for backporting recent Hive UDFs (as of v2.2.0)
to legacy Hive environment (v0.13.0 or later).

   https://github.com/myui/hive-udf-backports

Hope this helps for those who are using old Hive env :-(

FYI

Makoto

-- 
Makoto YUI 
Research Engineer, Treasure Data, Inc.
http://myui.github.io/


Re: Requesting write access to the Hive wiki

2016-12-03 Thread Makoto Yui
Lefty,

Thanks!

Makoto

2016-12-03 15:02 GMT+09:00 Lefty Leverenz :
> You now have edit permissions.  Welcome to the Hive wiki team, Makoto!
>
> -- Lefty
>
>
> On Fri, Dec 2, 2016 at 9:33 PM, Makoto Yui  wrote:
>>
>> Hi,
>>
>> I would like to edit the following Hive wiki page.
>> https://cwiki.apache.org/confluence/display/Hive/RelatedProjects
>>
>> My ASF wiki account id is 'myui'.
>>
>> Thanks,
>> Makoto
>
>


Requesting write access to the Hive wiki

2016-12-02 Thread Makoto Yui
Hi,

I would like to edit the following Hive wiki page.
https://cwiki.apache.org/confluence/display/Hive/RelatedProjects

My ASF wiki account id is 'myui'.

Thanks,
Makoto


[ANN] Hivemall entered Apache Incubator

2016-12-01 Thread Makoto Yui
Hello all,

I would like to announce that the Hivemall project has moved to the
Apache Incubator.

Apache Hivemall is a scalable machine learning library that runs on
Apache Hive, Apache Spark, and Apache Pig.

  http://hivemall.incubator.apache.org/ (project top page)
  http://www.slideshare.net/myui/dots20161029-myui/11 (latest slide)

While it's mainly for machine learning, we welcome contributions of
generic UDFs/UDAFs/UDTFs to our project. Fork us at

  https://github.com/apache/incubator-hivemall

We plan to make the first Apache release in Q1 2017.

BTW, I would like to edit Hive's related projects page [1].
Can someone please give me the permission to edit the Hive wiki pages?
My ASF wiki account name is 'myui'.

[1] https://cwiki.apache.org/confluence/display/Hive/RelatedProjects

Thanks,
Makoto


[ANN] Hivemall v0.4.0 is now available

2015-11-19 Thread Makoto Yui
Hello all,

We released a newer version of Hivemall, v0.4.0.

Hivemall provides machine learning functionality over Hive UDFs/UDAFs/UDTFs.
Hivemall is easy to use because every machine learning step is done
within HiveQL.

   https://github.com/myui/hivemall

In the latest release (v0.4.0), we introduced

   o Factorization Machine classification/regression
     https://github.com/myui/hivemall/wiki/Movielens-Rating-Prediction-using-Factorization-Machine
   o RandomForest classification/regression
     https://github.com/myui/hivemall/wiki/Iris-multi-class-classification-using-RandomForest
     https://github.com/myui/hivemall/wiki/Kaggle-Titanic-binary-classification-using-Random-Forest

Aside from machine learning, Hivemall also provides a UDTF to run
Top-K processing efficiently on Apache Hive.
https://github.com/myui/hivemall/wiki/Efficient-Top-k-computation-on-Apache-Hive-using-Hivemall-UDTF
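
To illustrate why a dedicated Top-K function can be efficient, here is a
minimal Python sketch of per-group top-k selection with a bounded heap.
It is not Hivemall's UDTF; the function name top_k_per_group and the
(group, score, value) row layout are assumptions made for this example.
The point is that each group keeps at most k rows in memory, so no full
sort of the group is needed.

```python
import heapq
from collections import defaultdict


def top_k_per_group(rows, k):
    """Return {group: [(score, value), ...]} with the k highest scores per group.

    rows is an iterable of (group, score, value) tuples (hypothetical layout).
    """
    heaps = defaultdict(list)
    for group, score, value in rows:
        heap = heaps[group]
        if len(heap) < k:
            heapq.heappush(heap, (score, value))
        elif (score, value) > heap[0]:
            # The new row beats the smallest kept row, so swap it in.
            heapq.heapreplace(heap, (score, value))
    # Present each group's rows from highest to lowest score.
    return {g: sorted(h, reverse=True) for g, h in heaps.items()}


if __name__ == "__main__":
    rows = [
        ("a", 0.9, "x1"), ("a", 0.1, "x2"), ("a", 0.5, "x3"),
        ("b", 0.3, "y1"), ("b", 0.8, "y2"),
    ]
    print(top_k_per_group(rows, 2))
```

Memory use is proportional to k times the number of groups, and each row
costs at most O(log k) work.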

Hope you enjoy the release. Feedback and pull requests are welcome.

Thanks,
Makoto

--
Makoto YUI
Research Engineer, Treasure Data, Inc.
http://myui.github.io/


[ANN] Hivemall v0.3.2 is now available

2015-06-10 Thread Makoto Yui
Hello all,

We released a newer version of Hivemall, v0.3.2.

Hivemall provides machine learning functionality over Hive UDFs/UDAFs/UDTFs.
Hivemall is easy to use because every machine learning step is done
within HiveQL.

   https://github.com/myui/hivemall

In the latest release (v0.3.2), we introduced

   o Anomaly Detection using Local Outlier Factor, and
   o Polynomial features, which are useful for non-linear
     regression/classification.

Anomaly Detection in Hivemall [1] is very easy to use.

1) Just prepare a table (e.g., a table containing sensor data) as follows.

| rowid | features
|-------|----------
| 1     | ["reflectance:0.5252967","specific_heat:0.19863537","weight:0.0"]
| 2     | ["reflectance:0.5950446","specific_heat:0.09166764","weight:0.052084323"]
| 3     | ["reflectance:0.6797837","specific_heat:0.12567581","weight:0.13255163"]
| 4     | ...

2) Run a query to find top-K outliers. Then, you can get outlier candidates.

| rowid | LOF value
|-------|-----------
| 87    | 3.031143750623693  (<- rowid 87 is an outlier in this case)
| 16    | 1.975556449228491
| 1     | 1.8415763677073722
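
For reference, the LOF value itself can be sketched in a few lines of
Python. This is a brute-force illustration of the textbook definition,
not the Hivemall query; the function name lof_scores is hypothetical,
ties in the k-nearest-neighbor set are broken by input order, and
duplicate points are not handled.

```python
import math


def lof_scores(points, k):
    """Compute the Local Outlier Factor of each point (brute force, O(n^2))."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    neighbors = []   # indices of the k nearest neighbors of each point
    k_distance = []  # distance from each point to its k-th nearest neighbor
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        knn = order[:k]
        neighbors.append(knn)
        k_distance.append(dist[i][knn[-1]])

    def reach_dist(i, j):
        # Reachability distance of point i with respect to neighbor j.
        return max(k_distance[j], dist[i][j])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        total = sum(reach_dist(i, j) for j in neighbors[i])
        lrd.append(len(neighbors[i]) / total)

    # LOF: average density of the neighbors relative to the point's own density.
    return [
        sum(lrd[j] for j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5), (10.0, 10.0)]
    for point, score in zip(pts, lof_scores(pts, k=2)):
        print(point, round(score, 3))
```

Values close to 1 mean a point is about as dense as its neighbors;
values well above 1, like the top rows in the table above, mark outlier
candidates.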

Hope you enjoy the release! Feedback and pull requests are welcome.

Last but not least, we have changed the license of Hivemall from LGPL
v2 to Apache License v2 since v0.3.1.

[1] https://github.com/myui/hivemall/wiki/Outlier-Detection-using-Local-Outlier-Factor

Thanks,
Makoto

--
Makoto YUI
Research Engineer, Treasure Data, Inc.
http://myui.github.io/


map-side join fails when a serialized table contains arrays

2015-03-02 Thread Makoto Yui
Hi,

I got the attached error on a map-side join where a serialized table
contains an array column.

When I disable the optimized hash table for map-side joins by setting
hive.mapjoin.optimized.hashtable=false, the exception does not occur.

It seems that a wrong ObjectInspector was set at
CommonJoinOperator#initializeOp.

I am using Hive 1.0.0 (Tez 0.6) on Hadoop 2.6.0.

I found a similar report at
http://stackoverflow.com/questions/28606244/issues-upgrading-to-hdinsight-3-2-hive-0-14-0-tez-0-5-2


Is this a known issue/bug?

Thanks,
Makoto


task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"gid":1,"userid":4422,"movieid":1213,"rating":5}
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"gid":1,"userid":4422,"movieid":1213,"rating":5}
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:294)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:163)
    ... 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"gid":1,"userid":4422,"movieid":1213,"rating":5}
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:503)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:83)
    ... 17 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected exception: Unexpected exception: org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryArray cannot be cast to [Ljava.lang.Object;
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:311)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
    at org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:120)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
    at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493)
    ... 18 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected exception: org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryArray cannot be cast to [Ljava.lang.Object;
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:311)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
    at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:638)
    at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genAllOneUniqueJoinObject(CommonJoinOperator.java:670)
    at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:748)
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:299)
 

[ANN] Hivemall v0.3 is now available

2015-02-06 Thread Makoto Yui
Hello all,

We are excited to announce that a new stable version of Hivemall
(v0.3.0) is now available.

https://github.com/myui/hivemall/releases/tag/v0.3.0

Hivemall provides a collection of machine learning algorithms as Hive
UDFs/UDAFs/UDTFs.

The main enhancement in v0.3.0 is the support for matrix factorization.

Hope you enjoy the release! Feedback and pull requests are welcome.

Thanks,
Makoto


[ANN] Hivemall (a scalable machine learning library for Apache Hive) v0.3-beta

2014-09-10 Thread Makoto Yui
Hello all,

We have released a newer version of Hivemall, v0.3-beta2.

Hivemall is an open-source, scalable machine learning library
that runs on Hive/Hadoop.

  https://github.com/myui/hivemall
  http://bit.ly/hivemall-hadoopsummit14 (slide at Hadoop Summit'14)

Hivemall is easy to use if you have a Hive environment because every
machine learning step is done within HiveQL.

In the latest release (v0.3), we added support for the following
state-of-the-art convex optimization algorithms (please refer to the
project site for the complete list of supported algorithms):

  o AdaGrad
  o AdaGradRDA
  o AdaDelta

Moreover, Hivemall v0.3 now supports parameter mixing for better
stability and prediction performance, and for faster convergence of the
learning process.
https://github.com/myui/hivemall/wiki/How-to-use-Model-Mixing

With the MIX protocol, distributed learners (run as distinct Hadoop
tasks) communicate with each other by using an external communication
support service.

By using the MIX protocol (and Hivemall's amplifier method), iterations
are no longer mandatory, and machine learning runs well on plain
Hadoop/Hive. Hivemall runs on Tez as well.


Hope you enjoy the release! Feedback and pull requests are welcome.

Thanks,
Makoto


Re: [ANN] Hivemall: Hive scalable machine learning library

2013-10-11 Thread Makoto YUI

Hi,

In Hivemall 0.1-rc3, I added support for state-of-the-art classifiers
(which are not yet supported in Mahout), as well as Hivemall's cute(!?)
logo.


Newly supported classifiers include
- Confidence Weighted (CW)
- Adaptive Regularization of Weight Vectors (AROW)
- Soft Confidence Weighted (SCW1, SCW2)

These classifiers are much smarter than the standard SGD-based or
passive-aggressive classifiers. Please check them out for yourself.


Thanks,
Makoto

(2013/10/11 4:28), Clark Yang (杨卓荦) wrote:

It looks really cool; I think I will try it out.

Cheers,
Zhuoluo (Clark) Yang



Re: [ANN] Hivemall: Hive scalable machine learning library

2013-10-04 Thread Makoto YUI

Hi Edward,

Thank you for your interest.

The Hivemall project has no plans for a dedicated mailing list. I will
answer follow-up questions and comments on Twitter or through GitHub
issues (with a question label).


BTW, I just added a CTR (click-through rate) prediction example based on
the dataset that a commercial search engine provider supplied for
KDD Cup 2012 track 2.

https://github.com/myui/hivemall/wiki/KDDCup-2012-track-2-CTR-prediction-dataset

I guess many of you are working on ad CTR/CVR prediction. This example
may help in understanding how to do it entirely within Hive.


Thanks,
Makoto @myui

(2013/10/04 23:02), Edward Capriolo wrote:

Looks cool, I'm already starting to play with it.

On Friday, October 4, 2013, Makoto Yui <yuin...@gmail.com> wrote:
 > Hi Dean,
 >
 > Thank you for your interest in Hivemall.
 >
 > Twitter's paper actually influenced me in developing Hivemall and I
 > initially implemented such functionality as Pig UDFs.
 >
 > Though my Pig ML library is not released, you can find a similar
 > attempt for Pig in
 > https://github.com/y-tag/java-pig-MyUDFs
 >
 > Thanks,
 > Makoto
 >
 > 2013/10/3 Dean Wampler <deanwamp...@gmail.com>:
 >> This is great news! I know that Twitter has done something similar with UDFs
 >> for Pig, as described in this paper:
 >> http://www.umiacs.umd.edu/~jimmylin/publications/Lin_Kolcz_SIGMOD2012.pdf
 >>
 >> I'm glad to see the same thing start with Hive.
 >>
 >> Dean
 >>
 >>
 >> On Wed, Oct 2, 2013 at 10:21 AM, Makoto YUI <yuin...@gmail.com> wrote:
 >>>
 >>> Hello all,
 >>>
 >>> My employer, AIST, has given the thumbs up to open source our machine
 >>> learning library, named Hivemall.
 >>>
 >>> Hivemall is a scalable machine learning library running on Hive/Hadoop,
 >>> licensed under the LGPL 2.1.
 >>>
 >>> https://github.com/myui/hivemall
 >>>
 >>> Hivemall provides machine learning functionality as well as feature
 >>> engineering functions through UDFs/UDAFs/UDTFs of Hive. It is designed
 >>> to be scalable to the number of training instances as well as the number
 >>> of training features.
 >>>
 >>> Hivemall is very easy to use as every machine learning step is done
 >>> within HiveQL.
 >>>
 >>> -- Installation is just as follows:
 >>> add jar /tmp/hivemall.jar;
 >>> source /tmp/define-all.hive;
 >>>
 >>> -- Logistic regression is performed by a query.
 >>> SELECT
 >>>   feature,
 >>>   avg(weight) as weight
 >>> FROM
 >>>  (SELECT logress(features,label) as (feature,weight) FROM
 >>> training_features) t
 >>> GROUP BY feature;
 >>>
 >>> You can find detailed examples on our wiki pages.
 >>> https://github.com/myui/hivemall/wiki/_pages
 >>>
 >>> Though we consider that Hivemall is much easier to use and more scalable
 >>> than Mahout for classification/regression tasks, please check it by
 >>> yourself. If you have a Hive environment, you can evaluate Hivemall
 >>> within 5 minutes or so.
 >>>
 >>> Hope you enjoy the release! Feedback (and pull request) is always welcome.
 >>>
 >>> Thank you,
 >>> Makoto
 >>
 >>
 >>
 >>
 >> --
 >> Dean Wampler, Ph.D.
 >> @deanwampler
 >> http://polyglotprogramming.com
 >




Re: [ANN] Hivemall: Hive scalable machine learning library

2013-10-03 Thread Makoto Yui
Hi Dean,

Thank you for your interest in Hivemall.

Twitter's paper actually influenced me in developing Hivemall and I
initially implemented such functionality as Pig UDFs.

Though my Pig ML library is not released, you can find a similar
attempt for Pig in
https://github.com/y-tag/java-pig-MyUDFs

Thanks,
Makoto

2013/10/3 Dean Wampler :
> This is great news! I know that Twitter has done something similar with UDFs
> for Pig, as described in this paper:
> http://www.umiacs.umd.edu/~jimmylin/publications/Lin_Kolcz_SIGMOD2012.pdf
>
> I'm glad to see the same thing start with Hive.
>
> Dean
>
>
> On Wed, Oct 2, 2013 at 10:21 AM, Makoto YUI  wrote:
>>
>> Hello all,
>>
>> My employer, AIST, has given the thumbs up to open source our machine
>> learning library, named Hivemall.
>>
>> Hivemall is a scalable machine learning library running on Hive/Hadoop,
>> licensed under the LGPL 2.1.
>>
>>   https://github.com/myui/hivemall
>>
>> Hivemall provides machine learning functionality as well as feature
>> engineering functions through UDFs/UDAFs/UDTFs of Hive. It is designed
>> to be scalable to the number of training instances as well as the number
>> of training features.
>>
>> Hivemall is very easy to use as every machine learning step is done
>> within HiveQL.
>>
>> -- Installation is just as follows:
>> add jar /tmp/hivemall.jar;
>> source /tmp/define-all.hive;
>>
>> -- Logistic regression is performed by a query.
>> SELECT
>>   feature,
>>   avg(weight) as weight
>> FROM
>>  (SELECT logress(features,label) as (feature,weight) FROM
>> training_features) t
>> GROUP BY feature;
>>
>> You can find detailed examples on our wiki pages.
>> https://github.com/myui/hivemall/wiki/_pages
>>
>> Though we consider that Hivemall is much easier to use and more scalable
>> than Mahout for classification/regression tasks, please check it by
>> yourself. If you have a Hive environment, you can evaluate Hivemall
>> within 5 minutes or so.
>>
>> Hope you enjoy the release! Feedback (and pull request) is always welcome.
>>
>> Thank you,
>> Makoto
>
>
>
>
> --
> Dean Wampler, Ph.D.
> @deanwampler
> http://polyglotprogramming.com


[ANN] Hivemall: Hive scalable machine learning library

2013-10-02 Thread Makoto YUI
Hello all,

My employer, AIST, has given the thumbs up to open source our machine
learning library, named Hivemall.

Hivemall is a scalable machine learning library running on Hive/Hadoop,
licensed under the LGPL 2.1.

  https://github.com/myui/hivemall

Hivemall provides machine learning functionality as well as feature
engineering functions through UDFs/UDAFs/UDTFs of Hive. It is designed
to be scalable to the number of training instances as well as the number
of training features.

Hivemall is very easy to use as every machine learning step is done
within HiveQL.

-- Installation is just as follows:
add jar /tmp/hivemall.jar;
source /tmp/define-all.hive;

-- Logistic regression is performed by a query.
SELECT
  feature,
  avg(weight) as weight
FROM
 (SELECT logress(features,label) as (feature,weight) FROM
training_features) t
GROUP BY feature;
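
The shape of the query above (each task emits (feature, weight) rows and
the outer query averages them per feature) can be mimicked in plain
Python. This is only a sketch of the averaging idea, not Hivemall's
logress implementation; the function names, the learning rate eta, and
the dict-based feature representation are all assumptions.

```python
import math
from collections import defaultdict


def train_shard(rows, eta=0.5):
    """One SGD pass of logistic regression over a shard; emits (feature, weight)."""
    w = defaultdict(float)
    for features, label in rows:
        z = sum(w[f] * v for f, v in features.items())
        p = 1.0 / (1.0 + math.exp(-z))     # predicted probability
        gradient = label - p
        for f, v in features.items():
            w[f] += eta * gradient * v
    return list(w.items())


def average_weights(emitted):
    """Equivalent of SELECT feature, avg(weight) ... GROUP BY feature."""
    total, count = defaultdict(float), defaultdict(int)
    for feature, weight in emitted:
        total[feature] += weight
        count[feature] += 1
    return {f: total[f] / count[f] for f in total}


def predict(model, features):
    """Probability of label 1 under the averaged model."""
    z = sum(model.get(f, 0.0) * v for f, v in features.items())
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    shard_a = [({"x": 1.0}, 1), ({"x": -1.0}, 0)] * 10
    shard_b = [({"x": 2.0}, 1), ({"x": -0.5}, 0)] * 10
    model = average_weights(train_shard(shard_a) + train_shard(shard_b))
    print(model, predict(model, {"x": 1.0}))
```

Each shard is trained independently, so the shards can run as separate
tasks, and only the small list of emitted weights has to be combined at
the end.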

You can find detailed examples on our wiki pages.
https://github.com/myui/hivemall/wiki/_pages

We believe that Hivemall is much easier to use and more scalable
than Mahout for classification/regression tasks, but please check for
yourself. If you have a Hive environment, you can evaluate Hivemall
within 5 minutes or so.

Hope you enjoy the release! Feedback and pull requests are always welcome.

Thank you,
Makoto