Re: to a modular embedded java runner to replace the direct runner?

2018-03-05 Thread Romain Manni-Bucau
Interesting view, Thomas, and it makes a lot of sense. Would you rather see
2 modules? embedded-runner + portable-runner + direct-runner (with inheritance
in between)? That would work for me.


Romain Manni-Bucau


2018-03-05 19:43 GMT+01:00 Thomas Groh :

> The portable Java 'DirectRunner' is already in progress, and has been for
> several months - it's tracked by
> https://issues.apache.org/jira/browse/BEAM-2899
>
> My expectation is that the actual portability augmentations are unlikely to
> require significant changes to the DirectRunner implementations. I'd prefer
> to avoid any major refactors while that effort is underway - it's likely to
> add a significant amount of overhead, and I don't think that this
> refactoring will improve the velocity for the portability changes. In the
> meantime, the checks (immutability, encodability) can be disabled with
> flags.
>
> After the portability runner goes in, I'm not opposed to considering a
> refactoring - but I think that splitting "Model Enforcements" into separate
> modules might be overkill for things of that scope.

Re: to a modular embedded java runner to replace the direct runner?

2018-03-05 Thread Thomas Groh
The portable Java 'DirectRunner' is already in progress, and has been for
several months - it's tracked by
https://issues.apache.org/jira/browse/BEAM-2899

My expectation is that the actual portability augmentations are unlikely to
require significant changes to the DirectRunner implementations. I'd prefer
to avoid any major refactors while that effort is underway - it's likely to
add a significant amount of overhead, and I don't think that this
refactoring will improve the velocity for the portability changes. In the
meantime, the checks (immutability, encodability) can be disabled with
flags.

After the portability runner goes in, I'm not opposed to considering a
refactoring - but I think that splitting "Model Enforcements" into separate
modules might be overkill for things of that scope.


On Mon, Mar 5, 2018 at 10:25 AM Romain Manni-Bucau 
wrote:

> Hi Lukasz,
>
> concretely it is pretty simple - if not, let me know and I'll try to gist
> some code, but I don't think we need it:
>
> (I'll use module names; let's not discuss them, they are just there to
> share the idea.) I see it as follows:
>
> 1. beam-java-runner: bare API impl (extracted from the direct runner, so
> this is not a new impl; the advantage is that the new portable Java runner
> and the direct runner converge)
> 2. beam-java-runner-immutability-extension: adds the option
> EnforceImmutability
> 3. beam-java-runner-encodability: adds the option EnforceEncodability
> 4. beam-java-runner-portableapi: adds ProtoTranslation (+ a few other
> parts probably); this one will lead more or less to the portable one
> 5. beam-java-direct-runner (current one)
>
> The idea is to have a *unique*, production-proof embedded Java runner which
> has composable extensions: the full-blown flavor (with all extensions) is
> the direct runner, and an intermediate flavor is the portable runner.
> The advantage is being able to keep adding validations and harnessing to
> the direct runner without degrading all the other use cases.
> This leads to keeping a light embedded runner as a Beam reference
> implementation which is usable in prod until the volumes require more.
>
> If we don't go that way, we should think about what the reference
> implementation is, and maybe just drop some usages of the direct runner and
> enhance another runner supporting embedded runs to support the whole Beam
> API (for instance the Flink runner).
>
> Does that make it clearer?

Re: to a modular embedded java runner to replace the direct runner?

2018-03-05 Thread Romain Manni-Bucau
Hi Lukasz,

concretely it is pretty simple - if not, let me know and I'll try to gist some
code, but I don't think we need it:

(I'll use module names; let's not discuss them, they are just there to share
the idea.) I see it as follows:

1. beam-java-runner: bare API impl (extracted from the direct runner, so this
is not a new impl; the advantage is that the new portable Java runner and the
direct runner converge)
2. beam-java-runner-immutability-extension: adds the option
EnforceImmutability
3. beam-java-runner-encodability: adds the option EnforceEncodability
4. beam-java-runner-portableapi: adds ProtoTranslation (+ a few other parts
probably); this one will lead more or less to the portable one
5. beam-java-direct-runner (current one)

The idea is to have a *unique*, production-proof embedded Java runner which
has composable extensions: the full-blown flavor (with all extensions) is
the direct runner, and an intermediate flavor is the portable runner.
The advantage is being able to keep adding validations and harnessing to the
direct runner without degrading all the other use cases.
This leads to keeping a light embedded runner as a Beam reference
implementation which is usable in prod until the volumes require more.
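
To make the "one core, several flavors" idea a bit more concrete, here is a
minimal sketch in plain Java. It is only an illustration under assumptions:
the Extension hook, the EmbeddedRunner class and the embedded()/direct()
factories are hypothetical names invented for this example and are not
existing Beam types. The immutability check only mimics the spirit of
EnforceImmutability on a toy element type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: none of these types exist in Beam.
class RunnerFlavorsSketch {

  /** Hypothetical extension hook, called after the user function processed one element. */
  interface Extension {
    void afterElement(List<String> snapshotBefore, List<String> elementAfter);
  }

  /** Fails when the user function mutated its input (in the spirit of EnforceImmutability). */
  static final Extension IMMUTABILITY = (before, after) -> {
    if (!before.equals(after)) {
      throw new IllegalStateException("Input element was mutated: " + before + " -> " + after);
    }
  };

  /** Bare runner: applies the function, then delegates to whatever extensions were composed in. */
  static final class EmbeddedRunner {
    private final List<Extension> extensions;

    EmbeddedRunner(Extension... extensions) {
      this.extensions = Arrays.asList(extensions);
    }

    int run(List<String> element, Function<List<String>, Integer> fn) {
      List<String> snapshot = new ArrayList<>(element); // copy taken before the user code runs
      int result = fn.apply(element);
      for (Extension extension : extensions) {
        extension.afterElement(snapshot, element);
      }
      return result;
    }
  }

  /** Light flavor: same core, no validation overhead. */
  static EmbeddedRunner embedded() {
    return new EmbeddedRunner();
  }

  /** Full flavor: same core plus validations, playing the role of the direct runner. */
  static EmbeddedRunner direct() {
    return new EmbeddedRunner(IMMUTABILITY);
  }

  public static void main(String[] args) {
    List<String> element = new ArrayList<>(Arrays.asList("a", "b"));
    System.out.println(embedded().run(element, List::size));
    System.out.println(direct().run(element, List::size));
  }
}
```

With this shape, adding a new validation to the "direct" flavor does not
touch the code path of the light flavor, which is the point of the proposal.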

If we don't go that way, we should think about what the reference
implementation is, and maybe just drop some usages of the direct runner and
enhance another runner supporting embedded runs to support the whole Beam API
(for instance the Flink runner).

Does that make it clearer?




Romain Manni-Bucau


2018-03-04 20:15 GMT+01:00 Lukasz Cwik :

> Feel free to document what you would like the extension mechanism to do
> and provide some skeleton interfaces for APIs that you would like to
> support.

Re: to a modular embedded java runner to replace the direct runner?

2018-03-02 Thread Romain Manni-Bucau
On 2 March 2018 at 22:22, "Lukasz Cwik" wrote:

> To my knowledge, no one has discussed an extension mechanism for the direct
> runner but the difficulty is in how to get extensions to interact with the
> internals of the direct runner cleanly.
> Note that the direct runner currently accepts a set of flags which
> enable/disable validation and control how it runs like
> "--enforceImmutability":
> https://github.com/apache/beam/blob/master/runners/direct-java/src/main/java/org/apache/beam/runners/direct/DirectOptions.java#L49
> Would it be easier to just add more flags which control how the direct
> runner works?

No, the idea is to guarantee a behavior and prevent regressions whatever is
added for other purposes.

> As for having a direct runner using portability to be able to execute
> Python / Go / Java SDKs, you should look at
> https://issues.apache.org/jira/browse/BEAM-2899



Re: to a modular embedded java runner to replace the direct runner?

2018-03-02 Thread Lukasz Cwik
To my knowledge, no one has discussed an extension mechanism for the direct
runner but the difficulty is in how to get extensions to interact with the
internals of the direct runner cleanly.
Note that the direct runner currently accepts a set of flags which
enable/disable validation and control how it runs like
"--enforceImmutability":
https://github.com/apache/beam/blob/master/runners/direct-java/src/main/java/org/apache/beam/runners/direct/DirectOptions.java#L49
Would it be easier to just add more flags which control how the direct
runner works?

As for having a direct runner using portability to be able to execute
Python / Go / Java SDKs, you should look at
https://issues.apache.org/jira/browse/BEAM-2899

On Fri, Mar 2, 2018 at 12:53 PM, Romain Manni-Bucau 
wrote:

> Hi guys,
>
> I wonder if you have discussed or thought about breaking down what is today
> called the direct runner into an embedded runner which would be modular and
> extensible.
>
> What I have in mind is the following:
>
> 1. have a strong embedded runner implementing the whole Beam API but
> limited to a single JVM
> 2. keep a strong test-oriented runner (what we call the direct runner today)
>
> The overall design would be to ensure 1 and 2 share the common code and
> avoid creating yet another runner. This means several extension points
> should be defined to:
>
> 1. add the serialization validation
> 2. add the portability validation
> 3. add the execution randomization
>
> I haven't yet thought about what the extension points would be (they can
> probably just be replacements, or specific extension points, which would be
> less elegant but reach the same goal).
>
> The base runner (let's call it "EmbeddedRunner" to name it here) would
> have its EmbeddedRunnerOptions, which would have a --modules option to
> activate all potential extension points (declared in
> META-INF/org/apache/beam/embeddedrunner/extensions/xxx, xxx being the
> extension name to use in --modules, for instance).
>
> This would give users an embedded runner that is more usable for light/small
> but production-oriented environments, and would also start to align with
> the work done for portability (thinking of the recent Python enhancements
> in runners) without losing the strong validation done in tests or preprod
> envs.
>
> Has this already been mentioned or considered? If not, what do you think?
>
> Romain Manni-Bucau
>


to a modular embedded java runner to replace the direct runner?

2018-03-02 Thread Romain Manni-Bucau
Hi guys,

I wonder if you have discussed or thought about breaking down what is today
called the direct runner into an embedded runner which would be modular and
extensible.

What I have in mind is the following:

1. have a strong embedded runner implementing the whole Beam API but
limited to a single JVM
2. keep a strong test-oriented runner (what we call the direct runner today)

The overall design would be to ensure 1 and 2 share the common code and
avoid creating yet another runner. This means several extension points should
be defined to:

1. add the serialization validation
2. add the portability validation
3. add the execution randomization

I haven't yet thought about what the extension points would be (they can
probably just be replacements, or specific extension points, which would be
less elegant but reach the same goal).

The base runner (let's call it "EmbeddedRunner" to name it here) would have
its EmbeddedRunnerOptions, which would have a --modules option to activate
all potential extension points (declared in
META-INF/org/apache/beam/embeddedrunner/extensions/xxx, xxx being the
extension name to use in --modules, for instance).
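
As a rough illustration of the --modules idea, here is a minimal sketch in
plain Java of how a comma-separated option value could be resolved to
extensions. Everything in it is an assumption made for the example: the
RunnerExtension interface, the three extension names, and the in-memory map
standing in for the META-INF lookup are hypothetical and not existing Beam
APIs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these types exist in Beam.
class EmbeddedRunnerModulesSketch {

  /** Hypothetical extension point; not an existing Beam interface. */
  interface RunnerExtension {
    String name();
  }

  /** Stand-in for extensions that would be discovered on the classpath. */
  static Map<String, RunnerExtension> discovered() {
    Map<String, RunnerExtension> byName = new LinkedHashMap<>();
    for (String name : new String[] {"immutability", "encodability", "randomization"}) {
      byName.put(name, () -> name);
    }
    return byName;
  }

  /** Resolves a comma-separated --modules value, failing fast on unknown names. */
  static List<RunnerExtension> resolve(String modulesOption) {
    Map<String, RunnerExtension> known = discovered();
    List<RunnerExtension> active = new ArrayList<>();
    if (modulesOption == null || modulesOption.trim().isEmpty()) {
      return active; // bare embedded runner, no extension activated
    }
    for (String raw : modulesOption.split(",")) {
      String name = raw.trim();
      if (name.isEmpty()) {
        continue;
      }
      RunnerExtension extension = known.get(name);
      if (extension == null) {
        throw new IllegalArgumentException("Unknown module: " + name);
      }
      active.add(extension);
    }
    return active;
  }

  public static void main(String[] args) {
    for (RunnerExtension extension : resolve("immutability, encodability")) {
      System.out.println("activated: " + extension.name());
    }
  }
}
```

Failing on unknown names is one possible design choice here; it keeps a typo
in --modules from silently disabling a validation.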

This would give users an embedded runner that is more usable for light/small
but production-oriented environments, and would also start to align with
the work done for portability (thinking of the recent Python enhancements
in runners) without losing the strong validation done in tests or preprod
envs.

Has this already been mentioned or considered? If not, what do you think?

Romain Manni-Bucau