Re: No JVM - new Runner?
Hi Henning,

That helped me unearth https://s.apache.org/beam-job-api, and I can dig into https://github.com/apache/beam/tree/master/model. Further replies are inline below.

On Tue, Jul 17, 2018, 12:02 PM Henning Rohde wrote:

> There are essentially 2 complementary portability API surfaces that you'd
> need to implement: job management incl. job submission and execution as
> well as some worker deployment plumbing specific to the runner. Note that
> the source of truth is the model protos -- the design docs linked from
> https://beam.apache.org/contribute/portability/ and (even more so) the
> website guides are not always up to date.

Sounds about right :-) I haven't done anything with gRPC/proto yet, so I have to begin digging through that.

> Currently, all runners are in Java and share numerous components and
> utilities. A non-JVM runner would have to build all that from scratch --
> although, as you mention, if you're using Go or Python the corresponding
> SDKs likely have many pieces that can be reused. A minor potential hiccup
> is that gRPC/protobuf is not natively supported everywhere, so you may end
> up interoperating with the C versions of the libraries if you pick a
> non-supported language. A separate challenge regardless of the language
> is how directly the Beam model and primitives map to the engine.
>
> All that said, I think it's definitely feasible to do something
> interesting. Are you specifically thinking of a Go Wallaroo runner?

That was my initial thought (a Go-based runner, given the likely ease of compatibility as well as potentially wider reusability), though there are certainly other things to consider (like using Pony, given that it is the core of Wallaroo). Much of this is beyond what I could implement on my own, so a lot of it is homework for me, and I would eventually rely on both the runner and Beam communities for guidance, should this be well received all around and if it is ever to come to fruition. Really, lots of homework for me at this point!

Ultimately, it is easiest for me to rely on Beam's Python SDK for my ML-focused workflows (when writing Beam -- though I am eagerly awaiting full py3 support, so that is potentially a more impactful immediate area to contribute), so there's that compatibility to consider too.

> Thanks,
> Henning

Thanks!
Austin
Re: No JVM - new Runner?
There are essentially 2 complementary portability API surfaces that you'd need to implement: job management incl. job submission and execution as well as some worker deployment plumbing specific to the runner. Note that the source of truth is the model protos -- the design docs linked from https://beam.apache.org/contribute/portability/ and (even more so) the website guides are not always up to date.

Currently, all runners are in Java and share numerous components and utilities. A non-JVM runner would have to build all that from scratch -- although, as you mention, if you're using Go or Python the corresponding SDKs likely have many pieces that can be reused. A minor potential hiccup is that gRPC/protobuf is not natively supported everywhere, so you may end up interoperating with the C versions of the libraries if you pick a non-supported language. A separate challenge regardless of the language is how directly the Beam model and primitives map to the engine.

All that said, I think it's definitely feasible to do something interesting. Are you specifically thinking of a Go Wallaroo runner?

Thanks,
Henning
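As a rough illustration of the two surfaces described in this message (job management, plus worker deployment plumbing specific to the runner), here is a minimal in-memory sketch in Python. Everything in it is hypothetical: the class `InMemoryJobService`, its method names, the `JobState` values, and the `launch_workers` hook are assumptions made for illustration and are not the actual gRPC service definitions. The model protos remain the source of truth, and a real runner would serve this surface over gRPC rather than as plain method calls.

```python
import enum
import itertools
from dataclasses import dataclass
from typing import Callable, Dict


class JobState(enum.Enum):
    # Hypothetical lifecycle states; the real set is defined by the model protos.
    PREPARED = "PREPARED"
    RUNNING = "RUNNING"
    DONE = "DONE"
    FAILED = "FAILED"
    CANCELLED = "CANCELLED"


@dataclass
class Job:
    job_id: str
    pipeline: dict  # stand-in for the pipeline definition an SDK would submit
    state: JobState = JobState.PREPARED


class InMemoryJobService:
    """Toy job-management surface (assumed names, not the real API)."""

    def __init__(self, launch_workers: Callable[[Job], None]) -> None:
        # launch_workers is the runner-specific worker deployment hook.
        self._launch_workers = launch_workers
        self._jobs: Dict[str, Job] = {}
        self._ids = itertools.count(1)

    def prepare(self, pipeline: dict) -> str:
        """Register a submitted pipeline and hand back a job id."""
        job_id = f"job-{next(self._ids)}"
        self._jobs[job_id] = Job(job_id, pipeline)
        return job_id

    def run(self, job_id: str) -> JobState:
        """Execute a prepared job synchronously (a real runner would not block)."""
        job = self._jobs[job_id]
        if job.state is not JobState.PREPARED:
            raise ValueError(f"{job_id} is {job.state.name}, expected PREPARED")
        job.state = JobState.RUNNING
        try:
            self._launch_workers(job)
        except Exception:
            job.state = JobState.FAILED
        else:
            job.state = JobState.DONE
        return job.state

    def get_state(self, job_id: str) -> JobState:
        """Report the current state of a known job."""
        return self._jobs[job_id].state

    def cancel(self, job_id: str) -> JobState:
        """Cancel a job that has not finished; finished jobs keep their state."""
        job = self._jobs[job_id]
        if job.state in (JobState.PREPARED, JobState.RUNNING):
            job.state = JobState.CANCELLED
        return job.state
```

To try it, construct the service with any callable as the deployment hook, for example `InMemoryJobService(launch_workers=lambda job: None)`, then call `prepare` followed by `run`. Keeping the hook separate from job management mirrors the split described above: the first surface is what SDKs talk to, the second is whatever the engine needs in order to start workers.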
Re: No JVM - new Runner?
Sweet; that led me to: https://beam.apache.org/contribute/runner-guide/#the-runner-api (which I can't believe I missed).
Re: No JVM - new Runner?
Hi Austin,

If your runner provides the gRPC portability layer (allowing any SDK to "interact" with the runner), it will work no matter how the runner is implemented (JVM or not).

However, it means that you will have to mimic the Runner API for the translation.

Regards
JB

--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
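To make the translation step a little more concrete, below is a small Python sketch of what a runner might do after receiving a pipeline graph: order the transforms so that producers come before consumers, then map each one onto an engine-specific step. This is only a sketch under stated assumptions. The `Transform` dataclass, the `example:*` URN strings, and the `translators` mapping are invented for illustration and do not reproduce the actual Runner API messages. It relies on `graphlib` from the standard library, so it needs Python 3.9 or later.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Transform:
    # Hypothetical, simplified stand-in for a transform in the pipeline graph.
    urn: str                  # identifies the kind of primitive (made-up values here)
    inputs: Tuple[str, ...]   # ids of the collections this transform consumes
    outputs: Tuple[str, ...]  # ids of the collections this transform produces


def translation_order(transforms: Dict[str, Transform]) -> List[str]:
    """Return transform names ordered so every producer precedes its consumers."""
    producer = {pc: name for name, t in transforms.items() for pc in t.outputs}
    deps = {
        name: {producer[pc] for pc in t.inputs if pc in producer}
        for name, t in transforms.items()
    }
    return list(TopologicalSorter(deps).static_order())


def translate(
    transforms: Dict[str, Transform],
    translators: Dict[str, Callable[[str, Transform], str]],
) -> List[str]:
    """Map each transform onto an engine step, failing loudly on unsupported URNs."""
    steps = []
    for name in translation_order(transforms):
        t = transforms[name]
        if t.urn not in translators:
            raise NotImplementedError(f"no translation for {t.urn!r} ({name!r})")
        steps.append(translators[t.urn](name, t))
    return steps
```

The `NotImplementedError` branch is where the question of how directly the Beam primitives map onto the engine would show up in practice: any primitive the engine cannot express has no entry in `translators`.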
No JVM - new Runner?
Hi Beam Devs,

I still don't quite understand:

"Apache Beam provides a portable API layer for building sophisticated data-parallel processing pipelines that may be executed across a diversity of execution engines, or *runners*."

(from https://beam.apache.org/documentation/runners/capability-matrix/)

And specifically, after a close reading of: https://beam.apache.org/contribute/portability/

What if I'd like to implement a runner that is non-JVM, though one that would leverage the Python and Go SDKs? Specifically, I am thinking of https://www.wallaroolabs.com (I am out in NY meeting with friends there later this week, and wanted to get a sense of feasibility, work involved, etc. -- to propose that we add a new Wallaroo runner).

Is there a way to keep Java out of the mix completely and still work with Beam on a non-JVM runner (it seems maybe eventually, but what about currently or in the near future)?

Any input, thoughts, ideas, other pages or info to explore -- all appreciated; thanks!

Austin