Re: No JVM - new Runner?

2018-07-17 Thread Austin Bennett
Hi Henning,

Helped me unearth: https://s.apache.org/beam-job-api

and can dig into:
https://github.com/apache/beam/tree/master/model

Additionally, see embedded ->


On Tue, Jul 17, 2018, 12:02 PM Henning Rohde  wrote:

> There are essentially 2 complementary portability API surfaces that you'd
> need to implement: job management incl. job submission and execution as
> well as some worker deployment plumbing specific to the runner. Note that
> the source of truth is the model protos -- the design docs linked from
> https://beam.apache.org/contribute/portability/ and (even more so) the
> website guides are not always up to date.
>

Sounds about right :-)

Haven't done anything with grpc/proto, so gotta begin digging through that.




>
> Currently, all runners are in Java and share numerous components and
> utilities. A non-JVM runner would have to build all that from scratch --
> although, as you mention, if you're using Go or Python the corresponding
> SDKs likely have many pieces that can be reused. A minor potential hiccup
> is that gRPC/protobuf is not natively supported everywhere, so you may end
> up interoperating with the C versions of the libraries if you pick a
> non-supported language. A separate challenge regardless of the language
> is how directly the Beam model and primitives map to the engine.
>
> All that said, I think it's definitely feasible to do something
> interesting. Are you specifically thinking of a Go Wallaroo runner?
>

That was my initial thought (a Go-based runner, given the likely ease of
compatibility as well as potentially wider reusability), though there are
certainly other things to consider (like using Pony, given that it is the
core of Wallaroo).  Much of this is beyond what I could implement on my
own, so there is a lot of homework for me, and I would eventually rely on
both the runner and Beam communities for guidance, should this be well
received all around and ever come to fruition.  Really, much homework for
me at this point!

Ultimately, it is easiest for me to rely on Beam's Python SDK for my
ML-focused workflows (when writing Beam pipelines; I am eagerly awaiting
full py3 support, which is potentially a more impactful immediate area for
me to contribute), so there's that compatibility to consider too.

> Thanks,
>  Henning
>

Thanks!
Austin



>
> On Tue, Jul 17, 2018 at 9:26 AM Austin Bennett <
> whatwouldausti...@gmail.com> wrote:
>
>> Sweet; that led me to:
>> https://beam.apache.org/contribute/runner-guide/#the-runner-api (which I
>> can't believe I missed).
>>
>>
>>
>> On Tue, Jul 17, 2018 at 9:21 AM, Jean-Baptiste Onofré 
>> wrote:
>>
>>> Hi Austin,
>>>
>>> If your runner provides the gRPC portability layer (allowing any SDK to
>>> "interact" with the runner), it will work no matter how the runner is
>>> implemented (JVM or not).
>>>
>>> However, it means that you will have to mimic the Runner API for the
>>> translation.
>>>
>>> Regards
>>> JB
>>>
>>> On 17/07/2018 18:19, Austin Bennett wrote:
>>> > Hi Beam Devs,
>>> >
>>> > I still don't quite understand:
>>> >
>>> > "Apache Beam provides a portable API layer for building sophisticated
>>> > data-parallel processing pipelines that may be executed across a
>>> > diversity of execution engines, or /runners/."
>>> >
>>> > (from https://beam.apache.org/documentation/runners/capability-matrix/)
>>> >
>>> > And specifically, close reading
>>> > of: https://beam.apache.org/contribute/portability/
>>> >
>>> > What if I'd like to implement a runner that is non-JVM?  Though would
>>> > leverage the Python and Go SDKs?  Specifically, thinking of:
>>> >  https://www.wallaroolabs.com (I am out in NY meeting with friends there
>>> > later this week, and wanted to get a sense of, feasibility, work
>>> > involved, etc -- to propose that we add a new Wallaroo runner).
>>> >
>>> > Is there a way to keep java out of the mix completely and still work
>>> > with Beam on a non JVM runner (seems maybe eventually, but what about
>>> > currently/near future)?
>>> >
>>> > Any input, thoughts, ideas, other pages or info to explore -- all
>>> > appreciated; thanks!
>>> > Austin
>>> >
>>> >
>>>
>>> --
>>> Jean-Baptiste Onofré
>>> jbono...@apache.org
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com
>>>
>>
>>


Re: No JVM - new Runner?

2018-07-17 Thread Henning Rohde
There are essentially 2 complementary portability API surfaces that you'd
need to implement: job management incl. job submission and execution as
well as some worker deployment plumbing specific to the runner. Note that
the source of truth is the model protos -- the design docs linked from
https://beam.apache.org/contribute/portability/ and (even more so) the
website guides are not always up to date.
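
To make the job management surface a bit more concrete, here is a minimal
in-memory sketch in Python of a prepare / run / get-state lifecycle. It is
only an illustration: the real surface is a gRPC service defined in the
model protos, and every name below (InMemoryJobService, JobState, prepare,
run, get_state) is hypothetical and not taken from Beam. The "pipeline" is
assumed to be a plain list of callables standing in for stages.

```python
import enum
import itertools
from dataclasses import dataclass


class JobState(enum.Enum):
    """Hypothetical job states; not Beam's actual enum."""
    PREPARED = "PREPARED"
    RUNNING = "RUNNING"
    DONE = "DONE"
    FAILED = "FAILED"


@dataclass
class Job:
    name: str
    pipeline: list  # assumption: zero-argument callables standing in for stages
    state: JobState = JobState.PREPARED


class InMemoryJobService:
    """Toy stand-in for a runner's job management endpoint (no gRPC here)."""

    def __init__(self):
        self._jobs = {}
        self._ids = itertools.count(1)

    def prepare(self, name, pipeline):
        # Register the job and hand back an id the client uses afterwards.
        job_id = f"job-{next(self._ids)}"
        self._jobs[job_id] = Job(name, pipeline)
        return job_id

    def run(self, job_id):
        # Execute the stages in order; any exception marks the job FAILED.
        job = self._jobs[job_id]
        job.state = JobState.RUNNING
        try:
            for stage in job.pipeline:
                stage()
            job.state = JobState.DONE
        except Exception:
            job.state = JobState.FAILED
        return job.state

    def get_state(self, job_id):
        return self._jobs[job_id].state


if __name__ == "__main__":
    service = InMemoryJobService()
    job_id = service.prepare("demo", [lambda: None])
    service.run(job_id)
    print(job_id, service.get_state(job_id).value)
```

A real runner would expose this over gRPC and execute asynchronously on the
engine; the sketch runs synchronously only to keep the lifecycle visible.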

Currently, all runners are in Java and share numerous components and
utilities. A non-JVM runner would have to build all that from scratch --
although, as you mention, if you're using Go or Python the corresponding
SDKs likely have many pieces that can be reused. A minor potential hiccup
is that gRPC/protobuf is not natively supported everywhere, so you may end
up interoperating with the C versions of the libraries if you pick a
non-supported language. A separate challenge regardless of the language is
how directly the Beam model and primitives map to the engine.

All that said, I think it's definitely feasible to do something
interesting. Are you specifically thinking of a Go Wallaroo runner?

Thanks,
 Henning

On Tue, Jul 17, 2018 at 9:26 AM Austin Bennett 
wrote:

> Sweet; that led me to:
> https://beam.apache.org/contribute/runner-guide/#the-runner-api (which I
> can't believe I missed).
>
>
>
> On Tue, Jul 17, 2018 at 9:21 AM, Jean-Baptiste Onofré 
> wrote:
>
>> Hi Austin,
>>
>> If your runner provides the gRPC portability layer (allowing any SDK to
>> "interact" with the runner), it will work no matter how the runner is
>> implemented (JVM or not).
>>
>> However, it means that you will have to mimic the Runner API for the
>> translation.
>>
>> Regards
>> JB
>>
>> On 17/07/2018 18:19, Austin Bennett wrote:
>> > Hi Beam Devs,
>> >
>> > I still don't quite understand:
>> >
>> > "Apache Beam provides a portable API layer for building sophisticated
>> > data-parallel processing pipelines that may be executed across a
>> > diversity of execution engines, or /runners/."
>> >
>> > (from https://beam.apache.org/documentation/runners/capability-matrix/)
>> >
>> > And specifically, close reading
>> > of: https://beam.apache.org/contribute/portability/
>> >
>> > What if I'd like to implement a runner that is non-JVM?  Though would
>> > leverage the Python and Go SDKs?  Specifically, thinking of:
>> >  https://www.wallaroolabs.com (I am out in NY meeting with friends there
>> > later this week, and wanted to get a sense of, feasibility, work
>> > involved, etc -- to propose that we add a new Wallaroo runner).
>> >
>> > Is there a way to keep java out of the mix completely and still work
>> > with Beam on a non JVM runner (seems maybe eventually, but what about
>> > currently/near future)?
>> >
>> > Any input, thoughts, ideas, other pages or info to explore -- all
>> > appreciated; thanks!
>> > Austin
>> >
>> >
>>
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>
>


Re: No JVM - new Runner?

2018-07-17 Thread Austin Bennett
Sweet; that led me to:
https://beam.apache.org/contribute/runner-guide/#the-runner-api (which I
can't believe I missed).



On Tue, Jul 17, 2018 at 9:21 AM, Jean-Baptiste Onofré 
wrote:

> Hi Austin,
>
> If your runner provides the gRPC portability layer (allowing any SDK to
> "interact" with the runner), it will work no matter how the runner is
> implemented (JVM or not).
>
> However, it means that you will have to mimic the Runner API for the
> translation.
>
> Regards
> JB
>
> On 17/07/2018 18:19, Austin Bennett wrote:
> > Hi Beam Devs,
> >
> > I still don't quite understand:
> >
> > "Apache Beam provides a portable API layer for building sophisticated
> > data-parallel processing pipelines that may be executed across a
> > diversity of execution engines, or /runners/."
> >
> > (from https://beam.apache.org/documentation/runners/capability-matrix/)
> >
> > And specifically, close reading
> > of: https://beam.apache.org/contribute/portability/
> >
> > What if I'd like to implement a runner that is non-JVM?  Though would
> > leverage the Python and Go SDKs?  Specifically, thinking of:
> >  https://www.wallaroolabs.com (I am out in NY meeting with friends there
> > later this week, and wanted to get a sense of, feasibility, work
> > involved, etc -- to propose that we add a new Wallaroo runner).
> >
> > Is there a way to keep java out of the mix completely and still work
> > with Beam on a non JVM runner (seems maybe eventually, but what about
> > currently/near future)?
> >
> > Any input, thoughts, ideas, other pages or info to explore -- all
> > appreciated; thanks!
> > Austin
> >
> >
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: No JVM - new Runner?

2018-07-17 Thread Jean-Baptiste Onofré
Hi Austin,

If your runner provides the gRPC portability layer (allowing any SDK to
"interact" with the runner), it will work no matter how the runner is
implemented (JVM or not).

However, it means that you will have to mimic the Runner API for the
translation.
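
As a rough idea of what that translation step involves, here is a minimal
sketch in Python, under stated assumptions: the pipeline is represented as
a plain dict rather than the Runner API protos, the URN strings are assumed
examples, and translate() and the handler table are hypothetical names. The
runner walks the transforms in dependency order and maps each URN to an
engine-specific step.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Assumed example URNs; check the model protos for the authoritative list.
IMPULSE = "beam:transform:impulse:v1"
PARDO = "beam:transform:pardo:v1"
GBK = "beam:transform:group_by_key:v1"

# Hypothetical handlers: each returns a description of an engine step.
HANDLERS = {
    IMPULSE: lambda tid, inputs: f"source({tid})",
    PARDO: lambda tid, inputs: f"map({tid} <- {', '.join(inputs)})",
    GBK: lambda tid, inputs: f"shuffle({tid} <- {', '.join(inputs)})",
}


def translate(transforms, handlers):
    """Turn {id: {"urn": str, "inputs": [ids]}} into an ordered engine plan."""
    # TopologicalSorter expects node -> predecessors, so producers come first.
    graph = {tid: set(t["inputs"]) for tid, t in transforms.items()}
    plan = []
    for tid in TopologicalSorter(graph).static_order():
        urn = transforms[tid]["urn"]
        if urn not in handlers:
            # An engine that cannot express a primitive has to reject the job.
            raise NotImplementedError(f"unsupported transform URN: {urn}")
        plan.append(handlers[urn](tid, transforms[tid]["inputs"]))
    return plan


if __name__ == "__main__":
    pipeline = {
        "gbk": {"urn": GBK, "inputs": ["map"]},
        "read": {"urn": IMPULSE, "inputs": []},
        "map": {"urn": PARDO, "inputs": ["read"]},
    }
    for step in translate(pipeline, HANDLERS):
        print(step)
```

Dispatching on the URN rather than on SDK-specific classes is what keeps
the runner independent of the language the pipeline was written in.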

Regards
JB

On 17/07/2018 18:19, Austin Bennett wrote:
> Hi Beam Devs,
> 
> I still don't quite understand:
> 
> "Apache Beam provides a portable API layer for building sophisticated
> data-parallel processing pipelines that may be executed across a
> diversity of execution engines, or /runners/."
> 
> (from https://beam.apache.org/documentation/runners/capability-matrix/)
> 
> And specifically, close reading
> of: https://beam.apache.org/contribute/portability/
> 
> What if I'd like to implement a runner that is non-JVM?  Though would
> leverage the Python and Go SDKs?  Specifically, thinking of:
>  https://www.wallaroolabs.com (I am out in NY meeting with friends there
> later this week, and wanted to get a sense of, feasibility, work
> involved, etc -- to propose that we add a new Wallaroo runner).  
> 
> Is there a way to keep java out of the mix completely and still work
> with Beam on a non JVM runner (seems maybe eventually, but what about
> currently/near future)?  
> 
> Any input, thoughts, ideas, other pages or info to explore -- all
> appreciated; thanks!
> Austin
> 
> 

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


No JVM - new Runner?

2018-07-17 Thread Austin Bennett
Hi Beam Devs,

I still don't quite understand:

"Apache Beam provides a portable API layer for building sophisticated
data-parallel processing pipelines that may be executed across a diversity
of execution engines, or *runners*."

(from https://beam.apache.org/documentation/runners/capability-matrix/)

And specifically, close reading of:
https://beam.apache.org/contribute/portability/

What if I'd like to implement a non-JVM runner that would still leverage
the Python and Go SDKs?  Specifically, I am thinking of
https://www.wallaroolabs.com (I am out in NY meeting with friends there
later this week, and wanted to get a sense of feasibility, work involved,
etc., in order to propose that we add a new Wallaroo runner).

Is there a way to keep Java out of the mix completely and still work with
Beam on a non-JVM runner (it seems maybe eventually, but what about
currently or in the near future)?

Any input, thoughts, ideas, other pages or info to explore -- all
appreciated; thanks!
Austin