Re: [DISCUSS] Looking to the future hivemall graduation

2017-03-03 Thread Edward Capriolo
On Fri, Mar 3, 2017 at 9:39 AM, Makoto Yui  wrote:

> Edward,
>
> Thank you for your suggestion. It's certainly an option for our
> project graduation.
> I'm discussing it with the other PPMC members of Hivemall.
>
> My concerns are:
> 1) Hivemall is not only for Hive; it also targets Spark and Pig as
> runtimes.
>  - It has some Spark DataFrame-related features:
>    http://hivemall.incubator.apache.org/userguide/spark/misc/topk_join.html
> 2) Project management (e.g., release process) for a subproject.
>  - Artifacts would be better kept separate from Hive (separation of concerns).
>  - It seems that Apache DB subprojects are distinct projects.
>    https://db.apache.org/newproject.html
>
> We are now at a very early stage in Apache incubation, planning the
> first release in early Q2.
> It might be too early to discuss this, but we welcome your suggestion.
>
> Thanks,
> Makoto
>
> 2017-03-03 15:15 GMT+09:00 Edward Capriolo :
> > Hivemall in the incubator has a fairly impressive set of features that do
> > machine learning directly from Hive.
> >
> > http://hivemall.incubator.apache.org/overview.html
> > https://github.com/myui/hivemall/wiki/Logistic-regression-dataset-generation
> >
> > While we cannot put the cart before the horse, I can imagine that upon
> > graduation Hivemall would be a natural fit to become part of Hive (maybe
> > as a subproject).
> >
> > I could imagine we set it up like we did for hcat, where we make a subtree
> > and give commit rights to the tree, eventually converting those interested
> > in other parts of Hive to Hive committers as well.
> >
> > In any case, Hivemall devs, amazing work!
> >
> > Thanks,
> > Edward
>

Those are fair concerns. I can say this much:

1) I do not think this is a large issue for us. I believe we have
submodules that link to other things outside of Hive for testing.

2) Our storage-api, which lives inside the Hive source code, is released
separately from Hive.

I understand that your graduation is far off, and when it happens you
will make the choice that is right for your project (top-level, part of
Hive, or something else). I only wanted to say that I would do my best to
clear up any technical or organizational concerns you have if you decide
that landing in Hive is the right course for you.


Re: [DISCUSS] Looking to the future hivemall graduation

2017-03-03 Thread Makoto Yui
Edward,

Thank you for your suggestion. It's certainly an option for our
project graduation.
I'm discussing it with the other PPMC members of Hivemall.

My concerns are:
1) Hivemall is not only for Hive; it also targets Spark and Pig as runtimes.
 - It has some Spark DataFrame-related features:
   http://hivemall.incubator.apache.org/userguide/spark/misc/topk_join.html
2) Project management (e.g., release process) for a subproject.
 - Artifacts would be better kept separate from Hive (separation of concerns).
 - It seems that Apache DB subprojects are distinct projects.
   https://db.apache.org/newproject.html

We are now at a very early stage in Apache incubation, planning the
first release in early Q2.
It might be too early to discuss this, but we welcome your suggestion.

Thanks,
Makoto

2017-03-03 15:15 GMT+09:00 Edward Capriolo :
> Hivemall in the incubator has a fairly impressive set of features that do
> machine learning directly from Hive.
>
> http://hivemall.incubator.apache.org/overview.html
> https://github.com/myui/hivemall/wiki/Logistic-regression-dataset-generation
>
> While we cannot put the cart before the horse, I can imagine that upon
> graduation Hivemall would be a natural fit to become part of Hive (maybe as
> a subproject).
>
> I could imagine we set it up like we did for hcat, where we make a subtree
> and give commit rights to the tree, eventually converting those interested
> in other parts of Hive to Hive committers as well.
>
> In any case, Hivemall devs, amazing work!
>
> Thanks,
> Edward


[DISCUSS] Looking to the future hivemall graduation

2017-03-02 Thread Edward Capriolo
Hivemall in the incubator has a fairly impressive set of features that do
machine learning directly from Hive.

http://hivemall.incubator.apache.org/overview.html
https://github.com/myui/hivemall/wiki/Logistic-regression-dataset-generation

While we cannot put the cart before the horse, I can imagine that upon
graduation Hivemall would be a natural fit to become part of Hive (maybe as
a subproject).

I could imagine we set it up like we did for hcat, where we make a subtree
and give commit rights to the tree, eventually converting those interested
in other parts of Hive to Hive committers as well.

In any case, Hivemall devs, amazing work!

Thanks,
Edward