Re: Hazelcast Jet Runner

2019-07-11 Thread Maximilian Michels
I believe that is the case. Thanks Kenn.

On 10.07.19 21:35, Ismaël Mejía wrote:
> Yes please!

Re: Hazelcast Jet Runner

2019-07-10 Thread Ismaël Mejía
Yes please!

On Wed, Jul 10, 2019 at 8:38 PM Kenneth Knowles wrote:

Re: Hazelcast Jet Runner

2019-07-10 Thread Kenneth Knowles
Just to make sure we have closed on the Jet runner, my understanding is: I
was the main person asking for "runners-jet-experimental" but I am
convinced to go with plain "runners-jet". It seems everyone else is already
fine with this, so go ahead?

On Tue, Jul 9, 2019 at 1:23 PM Maximilian Michels wrote:


Re: Hazelcast Jet Runner

2019-07-09 Thread Maximilian Michels
We should fork the discussion around removing instances of @Experimental, but 
it was good to mention it here.

As for the Jet runner, I can only second Ismael: The Jet runner is the first 
runner I can think of that came with ValidatesRunner and Nexmark out of the 
box. Of course that doesn't mean the runner is "battle-tested", but we do not
have other means to test its maturity.

For the future, we could come up with other criteria, e.g. a "probation 
period", but enforcing this now seems arbitrary.

If the authors of a runner decide that it is experimental, so be it. 
Otherwise I would leave it to the user to decide (it might be helpful to list 
the inception date of each runner). That said, I value your concern, Kenn. I 
could see us establishing a consistent onboarding process for new runners, 
which may involve marking them experimental for a while.

-Max
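
For readers unfamiliar with the suites mentioned above: ValidatesRunner is
Beam's cross-runner conformance suite, and Nexmark is its benchmark suite of
streaming queries. A ValidatesRunner case is roughly a JUnit test like the
minimal sketch below (assuming JUnit 4 and the Java SDK's testing utilities;
the class and test are illustrative, not taken from the Jet runner):

    import org.apache.beam.sdk.testing.PAssert;
    import org.apache.beam.sdk.testing.TestPipeline;
    import org.apache.beam.sdk.testing.ValidatesRunner;
    import org.apache.beam.sdk.transforms.Create;
    import org.apache.beam.sdk.transforms.Sum;
    import org.apache.beam.sdk.values.PCollection;
    import org.junit.Rule;
    import org.junit.Test;
    import org.junit.experimental.categories.Category;
    import org.junit.runner.RunWith;
    import org.junit.runners.JUnit4;

    @RunWith(JUnit4.class)
    public class JetRunnerSmokeTest {

      // TestPipeline resolves the runner under test from system properties,
      // so the same test runs against every runner that opts in.
      @Rule public final transient TestPipeline p = TestPipeline.create();

      @Test
      @Category(ValidatesRunner.class)
      public void sumsIntegersGlobally() {
        PCollection<Integer> sum =
            p.apply(Create.of(1, 2, 3)).apply(Sum.integersGlobally());
        // The assertion executes on the runner itself, so passing the suite
        // exercises the runner's translation of these primitives.
        PAssert.that(sum).containsInAnyOrder(6);
        p.run().waitUntilFinish();
      }
    }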

On 01.07.19 22:20, Kenneth Knowles wrote:

Re: Hazelcast Jet Runner

2019-07-01 Thread Kenneth Knowles
On Wed, Jun 12, 2019 at 2:32 AM Ismaël Mejía wrote:

> Seems the discussion moved a bit away from my original intent, which was
> to make the Jet runner directory just runners/jet and to mark the
> 'experimental' part of it in documentation, as we do for all other things
> in Beam.
>

Thanks for returning to the one question at hand. We don't have to make an
overall decision about all "experimental" things.


> Can we do this or is there still any considerable argument to not do it?
>

I think we actually have some competing goals:

>     I agree 100% on the arguments, but let's think of it in reverse:
>     highlighting lack of maturity can work against the intended goal of
>     use and adoption, even if done for a noble reason. It is basic priming
>     101 [1].


_My_ goal is exactly to highlight lack of maturity so that users are not
harmed by either (1) necessary breaking changes or (2) permanent low
quality. Only users who are willing to follow along with the project and
update their own code regularly should use experimental features.

Evaluating the Jet runner I am convinced by your arguments, because looking
at the two dangers:
(1) necessary breaking changes -- runners don't really have their own APIs
to break, except their own small set of APIs and pipeline options
(2) permanent low quality -- because there is no API design possible,
there's no risk of permanent low quality except by fundamental mismatches.
Plus as you mention the testing is already quite good.

So I am OK to not call it experimental. But I have a slight remaining
concern that it did not really go through what other runners went through.
I hope this just means it is more mature. I hope it does not indicate that
we are reducing rigor.

Kenn
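
The "small set of APIs and pipeline options" a runner owns usually amounts to
a PipelineOptions subinterface plus the runner class itself; a hypothetical
sketch follows (the option names are invented for illustration, not the
actual Jet runner options):

    import org.apache.beam.sdk.options.Default;
    import org.apache.beam.sdk.options.Description;
    import org.apache.beam.sdk.options.PipelineOptions;

    // The whole runner-owned surface that could break between releases is
    // typically a handful of accessors like these.
    public interface JetLikePipelineOptions extends PipelineOptions {

      @Description("Addresses of the cluster members to submit the job to.")
      String getClusterAddresses();
      void setClusterAddresses(String value);

      @Description("Degree of local parallelism per cluster member.")
      @Default.Integer(2)
      Integer getLocalParallelism();
      void setLocalParallelism(Integer value);
    }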



Re: Hazelcast Jet Runner

2019-06-12 Thread Ismaël Mejía
Seems the discussion moved a bit away from my original intent, which was
to make the Jet runner directory just runners/jet and to mark the
'experimental' part of it in documentation, as we do for all other things
in Beam.
Can we do this or is there still any considerable argument to not do it?

On Wed, May 29, 2019 at 3:02 PM Reza Rokni wrote:

Re: Hazelcast Jet Runner

2019-05-29 Thread Reza Rokni
Hi,

Over 800 usages under Java; might be worth doing a few PRs...

I also suggest we use a very light review process: the first round goes for
low-hanging fruit; if anyone gives a -1 on a change, we leave it for round two.
round two.

Thoughts?

Cheers

Reza

On Wed, 29 May 2019 at 12:05, Kenneth Knowles wrote:


Re: Hazelcast Jet Runner

2019-05-28 Thread Kenneth Knowles
On Mon, May 27, 2019 at 4:05 PM Reza Rokni wrote:

> "Many APIs that have been in place for years and are used by most Beam
> users are still marked Experimental."
>
> Should there be a formal process in place to start 'graduating' features
> out of @Experimental? Perhaps even target an upcoming release with a PR to
> remove the annotation from well-established APIs?
>

Good idea. I think a PR like this would be an opportunity to discuss
whether the feature is non-experimental. Probably many of them are ready.
It would help to address Ismael's very good point that this new practice
could make users think the old Experimental stuff is not experimental.
Maybe it is true that it is not really still Experimental.

Kenn

Re: Hazelcast Jet Runner

2019-05-28 Thread Kenneth Knowles
On Mon, May 27, 2019 at 3:44 PM Reuven Lax wrote:

> We generally use Experimental for two different things, which leads to
> confusion.
>   1. Features that work stably, but where we think we might still make
> some changes to the API.
>   2. New features that we think might not yet be stable.
>

Part of my point is that these tend to be related. Often you discover that
you cannot achieve high quality without changing the API. I think once
quality is achieved, verified, and assured it can graduate from being
Experimental. We may still have a better idea later, but we can probably
just give it a different name.

Kenn

Re: Hazelcast Jet Runner

2019-05-27 Thread Reza Rokni
"Many APIs that have been in place for years and are used by most Beam
users are still marked Experimental."

Should there be a formal process in place to start 'graduating' features
out of @Experimental? Perhaps even target an upcoming release with a PR to
remove the annotation from well-established APIs?

On Tue, 28 May 2019 at 06:44, Reuven Lax wrote:


Re: Hazelcast Jet Runner

2019-05-27 Thread Reuven Lax
We generally use Experimental for two different things, which leads to
confusion.
  1. Features that work stably, but where we think we might still make some
changes to the API.
  2. New features that we think might not yet be stable.

This dual usage leads to a lot of confusion IMO. The fact that we tend to
forget to remove the @Experimental tag also makes it somewhat useless. Many
APIs that have been in place for years and are used by most Beam users are
still marked Experimental.

Reuven
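
For context, the annotation in question is a plain Java marker, and
"graduating" a feature is just deleting it, which is invisible at the
artifact level. The sketch below assumes the rough shape of
org.apache.beam.sdk.annotations.Experimental at the time of this thread; the
exact targets and Kind values may differ:

    import java.lang.annotation.Documented;
    import java.lang.annotation.ElementType;
    import java.lang.annotation.Retention;
    import java.lang.annotation.RetentionPolicy;
    import java.lang.annotation.Target;

    @Documented
    @Retention(RetentionPolicy.CLASS)
    @Target({
      ElementType.TYPE,
      ElementType.METHOD,
      ElementType.CONSTRUCTOR,
      ElementType.FIELD,
      ElementType.PACKAGE
    })
    public @interface Experimental {

      // Rough grouping of why an API is experimental (illustrative subset).
      Kind value() default Kind.UNSPECIFIED;

      enum Kind {
        UNSPECIFIED,
        SOURCE_SINK,
        SCHEMAS
      }
    }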

On Mon, May 27, 2019 at 2:16 PM Ismaël Mejía wrote:


Re: Hazelcast Jet Runner

2019-05-27 Thread Ismaël Mejía
> Personally, I think that it is good that moving from experimental to 
> non-experimental is a breaking change in the dependency - one has 
> backwards-incompatible changes and the other does not. If artifacts had 
> separate versioning we could use 0.x for this.

In theory it seems so, but in practice it is an annoyance to an end
user who already took the ‘risk’ of using an experimental feature.
Awareness is probably not the most important reason to break existing
code (even if it could be easily fixed). The alternative of doing this
with version numbers at least seems less disruptive, but it can be
confusing.

> But biggest motivation for me are these:
>
>  - using experimental features should be opt-in
>  - should be impossible to use an experimental feature without knowing it (so 
> "opt-in" to a normal-looking feature is not enough)
> - developers of an experimental feature should be motivated to "graduate" it

The fundamental problem of this approach is inconsistency with our
present/past. So far we have ‘Experimental’ features everywhere, so
suddenly becoming opt-in leaves us in an inconsistent state. For example,
all IOs are marked internally as Experimental but not at the level of
directories/artifacts. Adding this suffix to a new IO, apart from adding
fear of use for end users, may also give the false impression that
the older ones not explicitly marked are not experimental.

And what will the state be for runner modules that contain both mature,
well-tested runners, like the old Flink and Spark runners, and the more
experimental new translations for portability? Again, more confusion.

> FWIW I don't think "experimental" should be viewed as a bad thing. It just 
> means you are able to make backwards-incompatible changes, and that users 
> should be aware that they will need to adjust APIs (probably only a little) 
> with new releases. Most software is not very good until it has been around 
> for a long time, and in my experience the problem is missing the mark on 
> abstractions, so backwards compatibility *must* be broken to achieve quality. 
> Freezing it early dooms it to never achieving high quality. I know of 
> projects where the users explicitly requested that the developers not freeze 
> the API but instead prioritize speed and quality.

I agree 100% on the arguments, but let's think of it in reverse:
highlighting lack of maturity can work against the intended goal of
use and adoption, even if done for a noble reason. It is basic priming 101
[1].

> Maybe the word is just too negative-sounding? Alternatives might be 
> "unstable" or "incubating".

Yes! “experimental” should not be viewed as a bad thing, unless you are
a company with fewer resources trying to protect its investment; in that
case you may hesitate to use it. Here “incubating” is probably the better
term, because it has less of the ‘tentative’ connotation associated with
Experimental.

> Now, for the Jet runner, most runners sit on a branch for a while, not being 
> released at all, and move to master as their "graduation". I think releasing 
> under an "experimental" name is an improvement, making it available to users 
> to try out. But we probably should have discussed before doing something 
> different than all the other runners.

There is something I don’t get in the case of the Jet runner. From the
discussion in this thread it seems it has everything required to not
be ‘experimental’. It passes ValidatesRunner and can even run Nexmark;
that’s more than some runners already merged into master, so I still
don’t see why we want to give it a different connotation.

[1] https://en.wikipedia.org/wiki/Priming_(psychology)

On Sun, May 26, 2019 at 4:43 AM Kenneth Knowles  wrote:
>
> Personally, I think that it is good that moving from experimental to 
> non-experimental is a breaking change in the dependency - one has 
> backwards-incompatible changes and the other does not. If artifacts had 
> separate versioning we could use 0.x for this.
>
> But biggest motivation for me are these:
>
>  - using experimental features should be opt-in
>  - should be impossible to use an experimental feature without knowing it (so 
> "opt-in" to a normal-looking feature is not enough)
>  - developers of an experimental feature should be motivated to "graduate" it
>
> So I think a user of an experimental feature should have to actually type the 
> word "experimental" either on the command line or in their dependencies. 
> That's just my opinion. In the thread [1] myself and Robert were the ones 
> that went in this direction of opt-in. But it was mostly lazy consensus, plus 
> the review on the pull request, that got us to this state. Definitely worth 
> discussing more.
>
> FWIW I don't think "experimental" should be viewed as a bad thing. It just 
> means you are able to make backwards-incompatible changes, and that users 
> should be aware that they will need to adjust APIs (probably only a little) 
> with new releases. Most software is not very good until it has been around
> for a long time.

Re: Hazelcast Jet Runner

2019-05-27 Thread Robert Burke
(minor related tangent for additional perspective)
+1 from the perspective of SDKs on moving from experimental to production
versions being a breaking change.
I've long posited that the Go SDK, as it's currently experimental, is v0.X,
and some breaking changes have been made accordingly.

Once the Go SDK's versioning and dependency management is meaningfully
implemented, this would become easier.

On Mon, May 27, 2019, 8:13 AM Robert Bradshaw  wrote:

> I also favor explicit opt-in, especially when you're mixing mature and
> new components.
>
> A differently-named, but still published, artifact seems preferable
> IMHO to long-lived branches. I don't have a handle on how problematic
> this would be in practice. (E.g. how would a user know to update the
> name? Would they encounter strange errors building the latest
> Beam with an old, still-"latest" incubating package? How hard would
> it be to write code that can build against versions that span this
> transition?) My feeling is that we have sufficient tooling to do a
> good job here.
>
> On Sun, May 26, 2019 at 4:43 AM Kenneth Knowles  wrote:
> >
> > Personally, I think that it is good that moving from experimental to
> non-experimental is a breaking change in the dependency - one has
> backwards-incompatible changes and the other does not. If artifacts had
> separate versioning we could use 0.x for this.
> >
> > But biggest motivation for me are these:
> >
> >  - using experimental features should be opt-in
> >  - should be impossible to use an experimental feature without knowing
> it (so "opt-in" to a normal-looking feature is not enough)
> >  - developers of an experimental feature should be motivated to
> "graduate" it
> >
> > So I think a user of an experimental feature should have to actually
> type the word "experimental" either on the command line or in their
> dependencies. That's just my opinion. In the thread [1] myself and Robert
> were the ones that went in this direction of opt-in. But it was mostly lazy
> consensus, plus the review on the pull request, that got us to this state.
> Definitely worth discussing more.
> >
> > FWIW I don't think "experimental" should be viewed as a bad thing. It
> just means you are able to make backwards-incompatible changes, and that
> users should be aware that they will need to adjust APIs (probably only a
> little) with new releases. Most software is not very good until it has been
> around for a long time, and in my experience the problem is missing the
> mark on abstractions, so backwards compatibility *must* be broken to
> achieve quality. Freezing it early dooms it to never achieving high
> quality. I know of projects where the users explicitly requested that the
> developers not freeze the API but instead prioritize speed and quality.
> >
> > Maybe the word is just too negative-sounding? Alternatives might be
> "unstable" or "incubating".
> >
> > Now, for the Jet runner, most runners sit on a branch for a while, not
> being released at all, and move to master as their "graduation". I think
> releasing under an "experimental" name is an improvement, making it
> available to users to try out. But we probably should have discussed before
> doing something different than all the other runners.
> >
> > Kenn
> >
> > [1]
> https://lists.apache.org/thread.html/302bd51c77feb5c9ce39882316d391535a0fc92e7608a623d9139160@%3Cdev.beam.apache.org%3E
> >
> > On Sat, May 25, 2019 at 1:03 AM Ismaël Mejía  wrote:
> >>
> >> Including the experimental suffix in artifact names is not a good idea
> >> either because once we decide that it is not experimental anymore this
> >> will be a breaking change for users who will then need to update their
> >> dependencies. Also it is error-prone to use different mappings
> >> for directories and artifacts (even if possible).
> >>
> >> May we reconsider this Kenn? I understand the motivation but I hardly
> >> see this making things better or clearer. Any runner user will end
> >> up reading the runner documentation and capability matrix, so they will
> >> catch the current status that way.
> >>
> >>
> >>
> >> On Sat, May 25, 2019 at 8:35 AM Jozsef Bartok 
> wrote:
> >> >
> >> > I missed Kenn's input when writing my previous mail. Sorry.
> >> > So, to recap: I should remove "experimental" from any directory
> names, but find another way of configuring the artifact so that it still
> has "experimental" in its name.
> >> > Right?
> >> >
> >> > On Sat, May 25, 2019 at 9:32 AM Jozsef Bartok 
> wrote:
> >> >>
> >> >> Yes, I'll gladly fix it, we aren't particularly keen to be labeled
> as experimental either..
> >> >>
> >> >> Btw. initially the "experimental" word was only in the Gradle module
> name, but then there was some change
> >> >> ([BEAM-4046] decouple gradle project names and maven artifact ids -
> 4/2/19) which kind of ended up
> >> >> putting it in the directory name. Maybe I should have merged with
> that differently, but this is how
> >> >> it seemed consistent.
> 

Re: Hazelcast Jet Runner

2019-05-25 Thread Kenneth Knowles
Personally, I think that it is good that moving from experimental to
non-experimental is a breaking change in the dependency - one has
backwards-incompatible changes and the other does not. If artifacts had
separate versioning we could use 0.x for this.

But biggest motivation for me are these:

 - using experimental features should be opt-in
 - should be impossible to use an experimental feature without knowing it
(so "opt-in" to a normal-looking feature is not enough)
 - developers of an experimental feature should be motivated to "graduate"
it

So I think a user of an experimental feature should have to actually type
the word "experimental" either on the command line or in their
dependencies. That's just my opinion. In the thread [1] myself and Robert
were the ones that went in this direction of opt-in. But it was mostly lazy
consensus, plus the review on the pull request, that got us to this state.
Definitely worth discussing more.
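
To make the opt-in concrete: the user would have to spell the status out
in their own build file, for example (a minimal Gradle sketch using the
beam-runners-jet-experimental artifact name proposed elsewhere in this
thread; the version is a placeholder):

    dependencies {
      // The word "experimental" is something the user must literally type,
      // so there is no invisible dependence on an experimental runner.
      runtimeOnly "org.apache.beam:beam-runners-jet-experimental:2.14.0"
    }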

FWIW I don't think "experimental" should be viewed as a bad thing. It just
means you are able to make backwards-incompatible changes, and that users
should be aware that they will need to adjust APIs (probably only a little)
with new releases. Most software is not very good until it has been around
for a long time, and in my experience the problem is missing the mark on
abstractions, so backwards compatibility *must* be broken to achieve
quality. Freezing it early dooms it to never achieving high quality. I know
of projects where the users explicitly requested that the developers not
freeze the API but instead prioritize speed and quality.

Maybe the word is just too negative-sounding? Alternatives might be
"unstable" or "incubating".

Now, for the Jet runner, most runners sit on a branch for a while, not
being released at all, and move to master as their "graduation". I think
releasing under an "experimental" name is an improvement, making it
available to users to try out. But we probably should have discussed before
doing something different than all the other runners.

Kenn

[1]
https://lists.apache.org/thread.html/302bd51c77feb5c9ce39882316d391535a0fc92e7608a623d9139160@%3Cdev.beam.apache.org%3E

On Sat, May 25, 2019 at 1:03 AM Ismaël Mejía  wrote:

> Including the experimental suffix in artifact names is not a good idea
> either because once we decide that it is not experimental anymore this
> will be a breaking change for users who will then need to update their
> dependencies. Also it is error-prone to use different mappings
> for directories and artifacts (even if possible).
>
> May we reconsider this Kenn? I understand the motivation but I hardly
> see this making things better or clearer. Any runner user will end
> up reading the runner documentation and capability matrix, so they will
> catch the current status that way.
>
>
>
> On Sat, May 25, 2019 at 8:35 AM Jozsef Bartok  wrote:
> >
> > I missed Kenn's input when writing my previous mail. Sorry.
> > So, to recap: I should remove "experimental" from any directory names,
> but find another way of configuring the artifact so that it still has
> "experimental" in its name.
> > Right?
> >
> > On Sat, May 25, 2019 at 9:32 AM Jozsef Bartok 
> wrote:
> >>
> >> Yes, I'll gladly fix it, we aren't particularly keen to be labeled as
> experimental either..
> >>
> >> Btw. initially the "experimental" word was only in the Gradle module
> name, but then there was some change
> >> ([BEAM-4046] decouple gradle project names and maven artifact ids -
> 4/2/19) which kind of ended up
> >> putting it in the directory name. Maybe I should have merged with that
> differently, but this is how
> >> it seemed consistent.
> >>
> >> Anyways, will fix it in my next PR.
> >>
> >> On Fri, May 24, 2019 at 5:53 PM Ismaël Mejía  wrote:
> >>>
> >>> I see thanks Jozsef, marking things as Experimental was discussed but
> >>> we never agreed on doing this at the directory level. We can cover the
> >>> same ground by putting an annotation in the classes (in particular the
> >>> JetRunner and JetPipelineOptions classes which are the real public
> >>> interface, or in the documentation (in particular website), I do not
> >>> see how putting this in the directory name helps and if so we may need
> >>> to put this in many other directories which is far from ideal. Any
> >>> chance this can be fixed (jet-experimental -> jet) ?
> >>>
> >>> On Fri, May 24, 2019 at 9:08 AM Jozsef Bartok 
> wrote:
> >>> >
> >>> > Hi Ismaël!
> >>> >
> >>> > Quoting Kenn (from PR-8410): "We discussed on list that it would be
> better to have new things always start as experimental in a way that
> clearly distinguishes them from the core."
> >>> >
> >>> > Rgds
> >>> >
> >>> > On Thu, May 23, 2019 at 10:44 PM Ismaël Mejía 
> wrote:
> >>> >>
> >>> >> I saw that the runner was merged but I don’t get why the folder is
> >>> >> called ‘runners/jet experimental’ and not simply ‘runners/jet’. Is
> it
> >>> >> because the runner does not pass ValidatesRunner? Or because the
> >>> >> contributors are few?

Re: Hazelcast Jet Runner

2019-05-25 Thread Ismaël Mejía
Including the experimental suffix in artifact names is not a good idea
either because once we decide that it is not experimental anymore this
will be a breaking change for users who will then need to update their
dependencies. Also it is error-prone to use different mappings
for directories and artifacts (even if possible).

May we reconsider this Kenn? I understand the motivation but I hardly
see this making things better or clearer. Any runner user will end
up reading the runner documentation and capability matrix, so they will
catch the current status that way.



On Sat, May 25, 2019 at 8:35 AM Jozsef Bartok  wrote:
>
> I missed Kenn's input when writing my previous mail. Sorry.
> So, to recap: I should remove "experimental" from any directory names, but
> find another way of configuring the artifact so that it still has
> "experimental" in its name.
> Right?
>
> On Sat, May 25, 2019 at 9:32 AM Jozsef Bartok  wrote:
>>
>> Yes, I'll gladly fix it, we aren't particularly keen to be labeled as 
>> experimental either..
>>
>> Btw. initially the "experimental" word was only in the Gradle module name, 
>> but then there was some change
>> ([BEAM-4046] decouple gradle project names and maven artifact ids - 4/2/19) 
>> which kind of ended up
>> putting it in the directory name. Maybe I should have merged with that 
>> differently, but this is how
>> it seemed consistent.
>>
>> Anyways, will fix it in my next PR.
>>
>> On Fri, May 24, 2019 at 5:53 PM Ismaël Mejía  wrote:
>>>
>>> I see thanks Jozsef, marking things as Experimental was discussed but
>>> we never agreed on doing this at the directory level. We can cover the
>>> same ground by putting an annotation in the classes (in particular the
>>> JetRunner and JetPipelineOptions classes which are the real public
>>> interface, or in the documentation (in particular website), I do not
>>> see how putting this in the directory name helps and if so we may need
>>> to put this in many other directories which is far from ideal. Any
>>> chance this can be fixed (jet-experimental -> jet) ?
>>>
>>> On Fri, May 24, 2019 at 9:08 AM Jozsef Bartok  wrote:
>>> >
>>> > Hi Ismaël!
>>> >
>>> > Quoting Kenn (from PR-8410): "We discussed on list that it would be 
>>> > better to have new things always start as experimental in a way that 
>>> > clearly distinguishes them from the core."
>>> >
>>> > Rgds
>>> >
>>> > On Thu, May 23, 2019 at 10:44 PM Ismaël Mejía  wrote:
>>> >>
>>> >> I saw that the runner was merged but I don’t get why the folder is
>>> >> called ‘runners/jet experimental’ and not simply ‘runners/jet’. Is it
>>> >> because the runner does not pass ValidatesRunner? Or because the
>>> >> contributors are few? I don’t really see any reason behind this
>>> >> suffix. And even if the status is not mature that’s not different from
>>> >> other already merged runners.
>>> >>
>>> >> On Fri, Apr 26, 2019 at 9:43 PM Kenneth Knowles  wrote:
>>> >> >
>>> >> > Nice! That is *way* more than the PR I was looking for. I just meant 
>>> >> > that you could update the website/ directory. It is fine to keep the 
>>> >> > runner in your own repository if you want.
>>> >> >
>>> >> > But I think it is great if you want to contribute it to Apache Beam 
>>> >> > (hence donate it to the Apache Software Foundation). The benefits 
>>> >> > include: low-latency testing, free updates when someone does a 
>>> >> > refactor. Things to consider are: subject to ASF / Beam governance, 
>>> >> > PMC, committers, subject to Beam's release cadence (and we might
>>> >> > exclude from Beam releases for a little bit). Typically, we have kept 
>>> >> > runners on a branch until they are somewhat stable. I don't feel 
>>> >> > strongly about this for disjoint codebases that can easily be excluded 
>>> >> > from releases. We might want to suffix `-experimental` to the 
>>> >> > artifacts for some time.
>>> >> >
>>> >> > I commented on the PR about the necessary i.p. clearance steps.
>>> >> >
>>> >> > Kenn
>>> >> >
>>> >> > On Fri, Apr 26, 2019 at 3:59 AM jo...@hazelcast.com 
>>> >> >  wrote:
>>> >> >>
>>> >> >> Hi Kenn.
>>> >> >>
>>> >> >> It took me a while to migrate our code to the Beam repo, but I 
>>> >> >> finally have been able to create the Pull Request you asked for, this 
>>> >> >> is it: https://github.com/apache/beam/pull/8410
>>> >> >>
>>> >> >> Looking forward to your feedback!
>>> >> >>
>>> >> >> Best regards,
>>> >> >> Jozsef
>>> >> >>
>>> >> >> On 2019/04/19 20:52:42, Kenneth Knowles  wrote:
>>> >> >> > The ValidatesRunner tests are the best source we have for knowing 
>>> >> >> > the
>>> >> >> > capabilities of a runner. Are there instructions for running the 
>>> >> >> > tests?
>>> >> >> >
>>> >> >> > Assuming we can check it out, then just open a PR to the website 
>>> >> >> > with the
>>> >> >> > current capabilities and caveats. Since it is a big deal and could 
>>> >> >> > use lots
>>> >> >> > of eyes, I would share the PR link on this thread.
>>> >> >> >
>>> >> >> > Kenn
>>> 

Re: Hazelcast Jet Runner

2019-05-25 Thread Jozsef Bartok
I missed Kenn's input when writing my previous mail. Sorry.
So, to recap: I should remove "experimental" from any directory names, but
find another way of configuring the artifact so that it still has
"experimental" in its name.
Right?

On Sat, May 25, 2019 at 9:32 AM Jozsef Bartok  wrote:

> Yes, I'll gladly fix it, we aren't particularly keen to be labeled as
> experimental either..
>
> Btw. initially the "experimental" word was only in the Gradle module name,
> but then there was some change
> ([BEAM-4046] decouple gradle project names and maven artifact ids -
> 4/2/19) which kind of ended up
> putting it in the directory name. Maybe I should have merged with that
> differently, but this is how
> it seemed consistent.
>
> Anyways, will fix it in my next PR.
>
> On Fri, May 24, 2019 at 5:53 PM Ismaël Mejía  wrote:
>
>> I see thanks Jozsef, marking things as Experimental was discussed but
>> we never agreed on doing this at the directory level. We can cover the
>> same ground by putting an annotation in the classes (in particular the
>> JetRunner and JetPipelineOptions classes which are the real public
>> interface, or in the documentation (in particular website), I do not
>> see how putting this in the directory name helps and if so we may need
>> to put this in many other directories which is far from ideal. Any
>> chance this can be fixed (jet-experimental -> jet) ?
>>
>> On Fri, May 24, 2019 at 9:08 AM Jozsef Bartok 
>> wrote:
>> >
>> > Hi Ismaël!
>> >
>> > Quoting Kenn (from PR-8410): "We discussed on list that it would be
>> better to have new things always start as experimental in a way that
>> clearly distinguishes them from the core."
>> >
>> > Rgds
>> >
>> > On Thu, May 23, 2019 at 10:44 PM Ismaël Mejía 
>> wrote:
>> >>
>> >> I saw that the runner was merged but I don’t get why the folder is
>> >> called ‘runners/jet experimental’ and not simply ‘runners/jet’. Is it
>> >> because the runner does not pass ValidatesRunner? Or because the
>> >> contributors are few? I don’t really see any reason behind this
>> >> suffix. And even if the status is not mature that’s not different from
>> >> other already merged runners.
>> >>
>> >> On Fri, Apr 26, 2019 at 9:43 PM Kenneth Knowles 
>> wrote:
>> >> >
>> >> > Nice! That is *way* more than the PR I was looking for. I just meant
>> that you could update the website/ directory. It is fine to keep the runner
>> in your own repository if you want.
>> >> >
>> >> > But I think it is great if you want to contribute it to Apache Beam
>> (hence donate it to the Apache Software Foundation). The benefits include:
>> low-latency testing, free updates when someone does a refactor. Things to
>> consider are: subject to ASF / Beam governance, PMC, committers, subject to
>> Beam's release cadence (and we might exclude from Beam releases for a
>> little bit). Typically, we have kept runners on a branch until they are
>> somewhat stable. I don't feel strongly about this for disjoint codebases
>> that can easily be excluded from releases. We might want to suffix
>> `-experimental` to the artifacts for some time.
>> >> >
>> >> > I commented on the PR about the necessary i.p. clearance steps.
>> >> >
>> >> > Kenn
>> >> >
>> >> > On Fri, Apr 26, 2019 at 3:59 AM jo...@hazelcast.com <
>> jo...@hazelcast.com> wrote:
>> >> >>
>> >> >> Hi Kenn.
>> >> >>
>> >> >> It took me a while to migrate our code to the Beam repo, but I
>> finally have been able to create the Pull Request you asked for, this is
>> it: https://github.com/apache/beam/pull/8410
>> >> >>
>> >> >> Looking forward to your feedback!
>> >> >>
>> >> >> Best regards,
>> >> >> Jozsef
>> >> >>
>> >> >> On 2019/04/19 20:52:42, Kenneth Knowles  wrote:
>> >> >> > The ValidatesRunner tests are the best source we have for knowing
>> the
>> >> >> > capabilities of a runner. Are there instructions for running the
>> tests?
>> >> >> >
>> >> >> > Assuming we can check it out, then just open a PR to the website
>> with the
>> >> >> > current capabilities and caveats. Since it is a big deal and
>> could use lots
>> >> >> > of eyes, I would share the PR link on this thread.
>> >> >> >
>> >> >> > Kenn
>> >> >> >
>> >> >> > On Thu, Apr 18, 2019 at 11:53 AM Jozsef Bartok <
>> jo...@hazelcast.com> wrote:
>> >> >> >
>> >> >> > > Hi. We at Hazelcast Jet have been working for a while now to
>> implement a
>> >> >> > > Java Beam Runner (non-portable) based on Hazelcast Jet (
>> >> >> > > https://jet.hazelcast.org/). The process is still ongoing (
>> >> >> > > https://github.com/hazelcast/hazelcast-jet-beam-runner), but
>> we are
>> >> >> > > aiming for a fully functional, reliable Runner which can
>> proudly join the
>> >> >> > > Capability Matrix. For that purpose I would like to ask what’s
>> your process
>> >> >> > > of validating runners? We are already running the
>> @ValidatesRunner tests
>> >> >> > > and the Nexmark test suite, but beyond that what other steps do
>> we need to
>> >> >> > > take to get our Runner to the level it needs to be at?

Re: Hazelcast Jet Runner

2019-05-25 Thread Jozsef Bartok
Yes, I'll gladly fix it, we aren't particularly keen to be labeled as
experimental either..

Btw. initially the "experimental" word was only in the Gradle module name,
but then there was some change
([BEAM-4046] decouple gradle project names and maven artifact ids - 4/2/19)
which kind of ended up
putting it in the directory name. Maybe I should have merged with that
differently, but this is how
it seemed consistent.

Anyways, will fix it in my next PR.

On Fri, May 24, 2019 at 5:53 PM Ismaël Mejía  wrote:

> I see thanks Jozsef, marking things as Experimental was discussed but
> we never agreed on doing this at the directory level. We can cover the
> same ground by putting an annotation in the classes (in particular the
> JetRunner and JetPipelineOptions classes which are the real public
> interface, or in the documentation (in particular website), I do not
> see how putting this in the directory name helps and if so we may need
> to put this in many other directories which is far from ideal. Any
> chance this can be fixed (jet-experimental -> jet) ?
>
> On Fri, May 24, 2019 at 9:08 AM Jozsef Bartok  wrote:
> >
> > Hi Ismaël!
> >
> > Quoting Kenn (from PR-8410): "We discussed on list that it would be
> better to have new things always start as experimental in a way that
> clearly distinguishes them from the core."
> >
> > Rgds
> >
> > On Thu, May 23, 2019 at 10:44 PM Ismaël Mejía  wrote:
> >>
> >> I saw that the runner was merged but I don’t get why the folder is
> >> called ‘runners/jet experimental’ and not simply ‘runners/jet’. Is it
> >> because the runner does not pass ValidatesRunner? Or because the
> >> contributors are few? I don’t really see any reason behind this
> >> suffix. And even if the status is not mature that’s not different from
> >> other already merged runners.
> >>
> >> On Fri, Apr 26, 2019 at 9:43 PM Kenneth Knowles 
> wrote:
> >> >
> >> > Nice! That is *way* more than the PR I was looking for. I just meant
> that you could update the website/ directory. It is fine to keep the runner
> in your own repository if you want.
> >> >
> >> > But I think it is great if you want to contribute it to Apache Beam
> (hence donate it to the Apache Software Foundation). The benefits include:
> low-latency testing, free updates when someone does a refactor. Things to
> consider are: subject to ASF / Beam governance, PMC, committers, subject to
> Beam's release cadence (and we might exclude from Beam releases for a
> little bit). Typically, we have kept runners on a branch until they are
> somewhat stable. I don't feel strongly about this for disjoint codebases
> that can easily be excluded from releases. We might want to suffix
> `-experimental` to the artifacts for some time.
> >> >
> >> > I commented on the PR about the necessary i.p. clearance steps.
> >> >
> >> > Kenn
> >> >
> >> > On Fri, Apr 26, 2019 at 3:59 AM jo...@hazelcast.com <
> jo...@hazelcast.com> wrote:
> >> >>
> >> >> Hi Kenn.
> >> >>
> >> >> It took me a while to migrate our code to the Beam repo, but I
> finally have been able to create the Pull Request you asked for, this is
> it: https://github.com/apache/beam/pull/8410
> >> >>
> >> >> Looking forward to your feedback!
> >> >>
> >> >> Best regards,
> >> >> Jozsef
> >> >>
> >> >> On 2019/04/19 20:52:42, Kenneth Knowles  wrote:
> >> >> > The ValidatesRunner tests are the best source we have for knowing
> the
> >> >> > capabilities of a runner. Are there instructions for running the
> tests?
> >> >> >
> >> >> > Assuming we can check it out, then just open a PR to the website
> with the
> >> >> > current capabilities and caveats. Since it is a big deal and could
> use lots
> >> >> > of eyes, I would share the PR link on this thread.
> >> >> >
> >> >> > Kenn
> >> >> >
> >> >> > On Thu, Apr 18, 2019 at 11:53 AM Jozsef Bartok <
> jo...@hazelcast.com> wrote:
> >> >> >
> >> >> > > Hi. We at Hazelcast Jet have been working for a while now to
> implement a
> >> >> > > Java Beam Runner (non-portable) based on Hazelcast Jet (
> >> >> > > https://jet.hazelcast.org/). The process is still ongoing (
> >> >> > > https://github.com/hazelcast/hazelcast-jet-beam-runner), but we
> are
> >> >> > > aiming for a fully functional, reliable Runner which can proudly
> join the
> >> >> > > Capability Matrix. For that purpose I would like to ask what’s
> your process
> >> >> > > of validating runners? We are already running the
> @ValidatesRunner tests
> >> >> > > and the Nexmark test suite, but beyond that what other steps do
> we need to
> >> >> > > take to get our Runner to the level it needs to be at?
> >> >> > >
> >> >> >
>


Re: Hazelcast Jet Runner

2019-05-24 Thread Kenneth Knowles
My request was that the artifact be beam-runners-jet-experimental or
beam-runners-experimental-jet so that a user was clearly opting in to
experimental functionality, per the discussion. I try not to have a strong
opinion about the mechanism. Probably the most natural thing to do is just
configure the publishing { } block to make it explicit.
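
For illustration, such a configuration might look like the following in
the runner's build.gradle (a sketch only, assuming the maven-publish
plugin; Beam's actual build wires artifact ids through its own
BeamModulePlugin, so treat the names here as illustrative):

    publishing {
      publications {
        mavenJava(MavenPublication) {
          from components.java
          // Decouple the published artifact id from the directory/project
          // name, making the experimental status explicit to consumers.
          artifactId = 'beam-runners-jet-experimental'
        }
      }
    }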

Kenn

On Fri, May 24, 2019 at 7:53 AM Ismaël Mejía  wrote:

> I see thanks Jozsef, marking things as Experimental was discussed but
> we never agreed on doing this at the directory level. We can cover the
> same ground by putting an annotation in the classes (in particular the
> JetRunner and JetPipelineOptions classes which are the real public
> interface, or in the documentation (in particular website), I do not
> see how putting this in the directory name helps and if so we may need
> to put this in many other directories which is far from ideal. Any
> chance this can be fixed (jet-experimental -> jet) ?
>
> On Fri, May 24, 2019 at 9:08 AM Jozsef Bartok  wrote:
> >
> > Hi Ismaël!
> >
> > Quoting Kenn (from PR-8410): "We discussed on list that it would be
> better to have new things always start as experimental in a way that
> clearly distinguishes them from the core."
> >
> > Rgds
> >
> > On Thu, May 23, 2019 at 10:44 PM Ismaël Mejía  wrote:
> >>
> >> I saw that the runner was merged but I don’t get why the folder is
> >> called ‘runners/jet experimental’ and not simply ‘runners/jet’. Is it
> >> because the runner does not pass ValidatesRunner? Or because the
> >> contributors are few? I don’t really see any reason behind this
> >> suffix. And even if the status is not mature that’s not different from
> >> other already merged runners.
> >>
> >> On Fri, Apr 26, 2019 at 9:43 PM Kenneth Knowles 
> wrote:
> >> >
> >> > Nice! That is *way* more than the PR I was looking for. I just meant
> that you could update the website/ directory. It is fine to keep the runner
> in your own repository if you want.
> >> >
> >> > But I think it is great if you want to contribute it to Apache Beam
> (hence donate it to the Apache Software Foundation). The benefits include:
> low-latency testing, free updates when someone does a refactor. Things to
> consider are: subject to ASF / Beam governance, PMC, committers, subject to
> Beam's release cadence (and we might exclude from Beam releases for a
> little bit). Typically, we have kept runners on a branch until they are
> somewhat stable. I don't feel strongly about this for disjoint codebases
> that can easily be excluded from releases. We might want to suffix
> `-experimental` to the artifacts for some time.
> >> >
> >> > I commented on the PR about the necessary i.p. clearance steps.
> >> >
> >> > Kenn
> >> >
> >> > On Fri, Apr 26, 2019 at 3:59 AM jo...@hazelcast.com <
> jo...@hazelcast.com> wrote:
> >> >>
> >> >> Hi Kenn.
> >> >>
> >> >> It took me a while to migrate our code to the Beam repo, but I
> finally have been able to create the Pull Request you asked for, this is
> it: https://github.com/apache/beam/pull/8410
> >> >>
> >> >> Looking forward to your feedback!
> >> >>
> >> >> Best regards,
> >> >> Jozsef
> >> >>
> >> >> On 2019/04/19 20:52:42, Kenneth Knowles  wrote:
> >> >> > The ValidatesRunner tests are the best source we have for knowing
> the
> >> >> > capabilities of a runner. Are there instructions for running the
> tests?
> >> >> >
> >> >> > Assuming we can check it out, then just open a PR to the website
> with the
> >> >> > current capabilities and caveats. Since it is a big deal and could
> use lots
> >> >> > of eyes, I would share the PR link on this thread.
> >> >> >
> >> >> > Kenn
> >> >> >
> >> >> > On Thu, Apr 18, 2019 at 11:53 AM Jozsef Bartok <
> jo...@hazelcast.com> wrote:
> >> >> >
> >> >> > > Hi. We at Hazelcast Jet have been working for a while now to
> implement a
> >> >> > > Java Beam Runner (non-portable) based on Hazelcast Jet (
> >> >> > > https://jet.hazelcast.org/). The process is still ongoing (
> >> >> > > https://github.com/hazelcast/hazelcast-jet-beam-runner), but we
> are
> >> >> > > aiming for a fully functional, reliable Runner which can proudly
> join the
> >> >> > > Capability Matrix. For that purpose I would like to ask what’s
> your process
> >> >> > > of validating runners? We are already running the
> @ValidatesRunner tests
> >> >> > > and the Nexmark test suite, but beyond that what other steps do
> we need to
> >> >> > > take to get our Runner to the level it needs to be at?
> >> >> > >
> >> >> >
>


Re: Hazelcast Jet Runner

2019-05-24 Thread Ismaël Mejía
I see thanks Jozsef, marking things as Experimental was discussed but
we never agreed on doing this at the directory level. We can cover the
same ground by putting an annotation in the classes (in particular the
JetRunner and JetPipelineOptions classes which are the real public
interface, or in the documentation (in particular website), I do not
see how putting this in the directory name helps and if so we may need
to put this in many other directories which is far from ideal. Any
chance this can be fixed (jet-experimental -> jet) ?

On Fri, May 24, 2019 at 9:08 AM Jozsef Bartok  wrote:
>
> Hi Ismaël!
>
> Quoting Kenn (from PR-8410): "We discussed on list that it would be better to 
> have new things always start as experimental in a way that clearly 
> distinguishes them from the core."
>
> Rgds
>
> On Thu, May 23, 2019 at 10:44 PM Ismaël Mejía  wrote:
>>
>> I saw that the runner was merged but I don’t get why the folder is
>> called ‘runners/jet experimental’ and not simply ‘runners/jet’. Is it
>> because the runner does not pass ValidatesRunner? Or because the
>> contributors are few? I don’t really see any reason behind this
>> suffix. And even if the status is not mature that’s not different from
>> other already merged runners.
>>
>> On Fri, Apr 26, 2019 at 9:43 PM Kenneth Knowles  wrote:
>> >
>> > Nice! That is *way* more than the PR I was looking for. I just meant that 
>> > you could update the website/ directory. It is fine to keep the runner in 
>> > your own repository if you want.
>> >
>> > But I think it is great if you want to contribute it to Apache Beam (hence 
>> > donate it to the Apache Software Foundation). The benefits include: 
>> > low-latency testing, free updates when someone does a refactor. Things to 
>> > consider are: subject to ASF / Beam governance, PMC, committers, subject to
>> > Beam's release cadence (and we might exclude from Beam releases for a 
>> > little bit). Typically, we have kept runners on a branch until they are 
>> > somewhat stable. I don't feel strongly about this for disjoint codebases 
>> > that can easily be excluded from releases. We might want to suffix 
>> > `-experimental` to the artifacts for some time.
>> >
>> > I commented on the PR about the necessary i.p. clearance steps.
>> >
>> > Kenn
>> >
>> > On Fri, Apr 26, 2019 at 3:59 AM jo...@hazelcast.com  
>> > wrote:
>> >>
>> >> Hi Kenn.
>> >>
>> >> It took me a while to migrate our code to the Beam repo, but I finally 
>> >> have been able to create the Pull Request you asked for, this is it: 
>> >> https://github.com/apache/beam/pull/8410
>> >>
>> >> Looking forward to your feedback!
>> >>
>> >> Best regards,
>> >> Jozsef
>> >>
>> >> On 2019/04/19 20:52:42, Kenneth Knowles  wrote:
>> >> > The ValidatesRunner tests are the best source we have for knowing the
>> >> > capabilities of a runner. Are there instructions for running the tests?
>> >> >
>> >> > Assuming we can check it out, then just open a PR to the website with 
>> >> > the
>> >> > current capabilities and caveats. Since it is a big deal and could use 
>> >> > lots
>> >> > of eyes, I would share the PR link on this thread.
>> >> >
>> >> > Kenn
>> >> >
>> >> > On Thu, Apr 18, 2019 at 11:53 AM Jozsef Bartok  
>> >> > wrote:
>> >> >
>> >> > > Hi. We at Hazelcast Jet have been working for a while now to 
>> >> > > implement a
>> >> > > Java Beam Runner (non-portable) based on Hazelcast Jet (
>> >> > > https://jet.hazelcast.org/). The process is still ongoing (
>> >> > > https://github.com/hazelcast/hazelcast-jet-beam-runner), but we are
>> >> > > aiming for a fully functional, reliable Runner which can proudly join 
>> >> > > the
>> >> > > Capability Matrix. For that purpose I would like to ask what’s your 
>> >> > > process
>> >> > > of validating runners? We are already running the @ValidatesRunner 
>> >> > > tests
>> >> > > and the Nexmark test suite, but beyond that what other steps do we 
>> >> > > need to
>> >> > > take to get our Runner to the level it needs to be at?
>> >> > >
>> >> >


Re: Hazelcast Jet Runner

2019-05-24 Thread Jozsef Bartok
Hi Ismaël!

Quoting Kenn (from PR-8410 ): "We
discussed on list that it would be better to have new things always start
as experimental in a way that clearly distinguishes them from the core."

Rgds

On Thu, May 23, 2019 at 10:44 PM Ismaël Mejía  wrote:

> I saw that the runner was merged but I don’t get why the folder is
> called ‘runners/jet experimental’ and not simply ‘runners/jet’. Is it
> because the runner does not pass ValidatesRunner? Or because the
> contributors are few? I don’t really see any reason behind this
> suffix. And even if the status is not mature that’s not different from
> other already merged runners.
>
> On Fri, Apr 26, 2019 at 9:43 PM Kenneth Knowles  wrote:
> >
> > Nice! That is *way* more than the PR I was looking for. I just meant
> that you could update the website/ directory. It is fine to keep the runner
> in your own repository if you want.
> >
> > But I think it is great if you want to contribute it to Apache Beam
> (hence donate it to the Apache Software Foundation). The benefits include:
> low-latency testing, free updates when someone does a refactor. Things to
> consider are: subject to ASF / Beam governance, PMC, committers, subject to
> Beam's release cadence (and we might exclude from Beam releases for a
> little bit). Typically, we have kept runners on a branch until they are
> somewhat stable. I don't feel strongly about this for disjoint codebases
> that can easily be excluded from releases. We might want to suffix
> `-experimental` to the artifacts for some time.
> >
> > I commented on the PR about the necessary i.p. clearance steps.
> >
> > Kenn
> >
> > On Fri, Apr 26, 2019 at 3:59 AM jo...@hazelcast.com 
> wrote:
> >>
> >> Hi Kenn.
> >>
> >> It took me a while to migrate our code to the Beam repo, but I finally
> have been able to create the Pull Request you asked for, this is it:
> https://github.com/apache/beam/pull/8410
> >>
> >> Looking forward to your feedback!
> >>
> >> Best regards,
> >> Jozsef
> >>
> >> On 2019/04/19 20:52:42, Kenneth Knowles  wrote:
> >> > The ValidatesRunner tests are the best source we have for knowing the
> >> > capabilities of a runner. Are there instructions for running the
> tests?
> >> >
> >> > Assuming we can check it out, then just open a PR to the website with
> the
> >> > current capabilities and caveats. Since it is a big deal and could
> use lots
> >> > of eyes, I would share the PR link on this thread.
> >> >
> >> > Kenn
> >> >
> >> > On Thu, Apr 18, 2019 at 11:53 AM Jozsef Bartok 
> wrote:
> >> >
> >> > > Hi. We at Hazelcast Jet have been working for a while now to
> implement a
> >> > > Java Beam Runner (non-portable) based on Hazelcast Jet (
> >> > > https://jet.hazelcast.org/). The process is still ongoing (
> >> > > https://github.com/hazelcast/hazelcast-jet-beam-runner), but we are
> >> > > aiming for a fully functional, reliable Runner which can proudly
> join the
> >> > > Capability Matrix. For that purpose I would like to ask what’s your
> process
> >> > > of validating runners? We are already running the @ValidatesRunner
> tests
> >> > > and the Nexmark test suite, but beyond that what other steps do we
> need to
> >> > > take to get our Runner to the level it needs to be at?
> >> > >
> >> >
>


Re: Hazelcast Jet Runner

2019-05-23 Thread Ismaël Mejía
I saw that the runner was merged but I don’t get why the folder is
called ‘runners/jet experimental’ and not simply ‘runners/jet’. Is it
because the runner does not pass ValidatesRunner? Or because the
contributors are few? I don’t really see any reason behind this
suffix. And even if the status is not mature that’s not different from
other already merged runners.

On Fri, Apr 26, 2019 at 9:43 PM Kenneth Knowles  wrote:
>
> Nice! That is *way* more than the PR I was looking for. I just meant that you 
> could update the website/ directory. It is fine to keep the runner in your 
> own repository if you want.
>
> But I think it is great if you want to contribute it to Apache Beam (hence 
> donate it to the Apache Software Foundation). The benefits include: 
> low-latency testing, free updates when someone does a refactor. Things to 
> consider are: subject to ASF / Beam governance, PMC, committers, subject to
> Beam's release cadence (and we might exclude from Beam releases for a little 
> bit). Typically, we have kept runners on a branch until they are somewhat 
> stable. I don't feel strongly about this for disjoint codebases that can 
> easily be excluded from releases. We might want to suffix `-experimental` to 
> the artifacts for some time.
>
> I commented on the PR about the necessary i.p. clearance steps.
>
> Kenn
>
> On Fri, Apr 26, 2019 at 3:59 AM jo...@hazelcast.com  
> wrote:
>>
>> Hi Kenn.
>>
>> It took me a while to migrate our code to the Beam repo, but I finally have 
>> been able to create the Pull Request you asked for, this is it: 
>> https://github.com/apache/beam/pull/8410
>>
>> Looking forward to your feedback!
>>
>> Best regards,
>> Jozsef
>>
>> On 2019/04/19 20:52:42, Kenneth Knowles  wrote:
>> > The ValidatesRunner tests are the best source we have for knowing the
>> > capabilities of a runner. Are there instructions for running the tests?
>> >
>> > Assuming we can check it out, then just open a PR to the website with the
>> > current capabilities and caveats. Since it is a big deal and could use lots
>> > of eyes, I would share the PR link on this thread.
>> >
>> > Kenn
>> >
>> > On Thu, Apr 18, 2019 at 11:53 AM Jozsef Bartok  wrote:
>> >
>> > > Hi. We at Hazelcast Jet have been working for a while now to implement a
>> > > Java Beam Runner (non-portable) based on Hazelcast Jet (
>> > > https://jet.hazelcast.org/). The process is still ongoing (
>> > > https://github.com/hazelcast/hazelcast-jet-beam-runner), but we are
>> > > aiming for a fully functional, reliable Runner which can proudly join the
>> > > Capability Matrix. For that purpose I would like to ask what’s your 
>> > > process
>> > > of validating runners? We are already running the @ValidatesRunner tests
>> > > and the Nexmark test suite, but beyond that what other steps do we need 
>> > > to
>> > > take to get our Runner to the level it needs to be at?
>> > >
>> >


Re: Hazelcast Jet Runner

2019-04-26 Thread Kenneth Knowles
Nice! That is *way* more than the PR I was looking for. I just meant that
you could update the website/ directory. It is fine to keep the runner in
your own repository if you want.

But I think it is great if you want to contribute it to Apache Beam (hence
donate it to the Apache Software Foundation). The benefits include:
low-latency testing, free updates when someone does a refactor. Things to
consider are: subject to ASF / Beam governance, PMC, committers, subject to
Beam's release cadence (and we might exclude from Beam releases for a
little bit). Typically, we have kept runners on a branch until they are
somewhat stable. I don't feel strongly about this for disjoint codebases
that can easily be excluded from releases. We might want to suffix
`-experimental` to the artifacts for some time.

I commented on the PR about the necessary i.p. clearance steps.

Kenn

On Fri, Apr 26, 2019 at 3:59 AM jo...@hazelcast.com 
wrote:

> Hi Kenn.
>
> It took me a while to migrate our code to the Beam repo, but I finally
> have been able to create the Pull Request you asked for, this is it:
> https://github.com/apache/beam/pull/8410
>
> Looking forward to your feedback!
>
> Best regards,
> Jozsef
>
> On 2019/04/19 20:52:42, Kenneth Knowles  wrote:
> > The ValidatesRunner tests are the best source we have for knowing the
> > capabilities of a runner. Are there instructions for running the tests?
> >
> > Assuming we can check it out, then just open a PR to the website with the
> > current capabilities and caveats. Since it is a big deal and could use
> lots
> > of eyes, I would share the PR link on this thread.
> >
> > Kenn
> >
> > On Thu, Apr 18, 2019 at 11:53 AM Jozsef Bartok 
> wrote:
> >
> > > Hi. We at Hazelcast Jet have been working for a while now to implement
> a
> > > Java Beam Runner (non-portable) based on Hazelcast Jet (
> > > https://jet.hazelcast.org/). The process is still ongoing (
> > > https://github.com/hazelcast/hazelcast-jet-beam-runner), but we are
> > > aiming for a fully functional, reliable Runner which can proudly join
> the
> > > Capability Matrix. For that purpose I would like to ask what’s your
> process
> > > of validating runners? We are already running the @ValidatesRunner
> tests
> > > and the Nexmark test suite, but beyond that what other steps do we
> need to
> > > take to get our Runner to the level it needs to be at?
> > >
> >
>


Re: Hazelcast Jet Runner

2019-04-22 Thread Maximilian Michels

Hi Jozsef,

If the Runner supports the complete set of ValidatesRunner tests and the
Nexmark suite, it is already in a very good state. Like Kenn already 
suggested, we can definitely add it to the capability matrix then.


Thanks,
Max

On 19.04.19 22:52, Kenneth Knowles wrote:
The ValidatesRunner tests are the best source we have for knowing the 
capabilities of a runner. Are there instructions for running the tests?


Assuming we can check it out, then just open a PR to the website with 
the current capabilities and caveats. Since it is a big deal and could 
use lots of eyes, I would share the PR link on this thread.


Kenn

On Thu, Apr 18, 2019 at 11:53 AM Jozsef Bartok wrote:


Hi. We at Hazelcast Jet have been working for a while now to
implement a Java Beam Runner (non-portable) based on Hazelcast Jet
(https://jet.hazelcast.org/). The process is still ongoing
(https://github.com/hazelcast/hazelcast-jet-beam-runner), but we are
aiming for a fully functional, reliable Runner which can proudly
join the Capability Matrix. For that purpose I would like to ask
what’s your process of validating runners? We are already running
the @ValidatesRunner tests and the Nexmark test suite, but beyond
that what other steps do we need to take to get our Runner to the
level it needs to be at?



Re: Hazelcast Jet Runner

2019-04-19 Thread Kenneth Knowles
The ValidatesRunner tests are the best source we have for knowing the
capabilities of a runner. Are there instructions for running the tests?

Assuming we can check it out, then just open a PR to the website with the
current capabilities and caveats. Since it is a big deal and could use lots
of eyes, I would share the PR link on this thread.

Kenn

On Thu, Apr 18, 2019 at 11:53 AM Jozsef Bartok  wrote:

> Hi. We at Hazelcast Jet have been working for a while now to implement a
> Java Beam Runner (non-portable) based on Hazelcast Jet (
> https://jet.hazelcast.org/). The process is still ongoing (
> https://github.com/hazelcast/hazelcast-jet-beam-runner), but we are
> aiming for a fully functional, reliable Runner which can proudly join the
> Capability Matrix. For that purpose I would like to ask what’s your process
> of validating runners? We are already running the @ValidatesRunner tests
> and the Nexmark test suite, but beyond that what other steps do we need to
> take to get our Runner to the level it needs to be at?
>


Re: Hazelcast Jet Runner - validation tests

2019-04-05 Thread Kenneth Knowles
Robert - that appears to be a test of state & timers, not triggers. Should
work for testing that the watermark at least advances. We do already have
similar java-based ValidatesRunner tests in ParDoTest.

The results of triggering, while nondeterministic, should generally fall
into a testable equivalence class, since the different outputs ought to
form a meaningful changelog of the evolution of the result. That is why
PAssert has methods like inCombinedNonLatePanes. For a triggered combine,
any test that examines the result more closely is overspecified. One
problem is that today the burden of ensuring that the results fall into a
well-defined equivalence class is split between the runner and the user.
The user needs to know about the trigger and accumulation mode and adjust
for it. This is not a blocker: the user in this case is the test, so it can
set things up appropriately, and that is how that PAssert method and others
work.

But none of that allows you to test that triggers are really
affecting the result. Partly this is because a runner is not *required* to
actually produce early output. The trigger *allows* the runner to fire, but
does not require it. It is not practical, nor the purpose of triggers, to
require that exact particular elements be produced. Our runners, and their
streaming vs batch mode, have very different behavior in this regard, and
it is generally coupled to bundling where they also differ.

All that said, the testing situation could be better. IMO implementing
TestStream would be the best way, but requires implementing knowledge of
global quiescence (or at least a reliable-enough approximation). This
feature would, itself, be useful for users debugging stuckness...
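
For concreteness, this is the shape such a test takes once a runner
supports TestStream: elements and watermark movement are interleaved
deterministically, and assertions are made per pane rather than on exact
firings (a sketch only, assuming JUnit 4 and the Beam Java testing APIs):

    import org.apache.beam.sdk.coders.VarIntCoder;
    import org.apache.beam.sdk.testing.PAssert;
    import org.apache.beam.sdk.testing.TestPipeline;
    import org.apache.beam.sdk.testing.TestStream;
    import org.apache.beam.sdk.transforms.Sum;
    import org.apache.beam.sdk.transforms.windowing.AfterWatermark;
    import org.apache.beam.sdk.transforms.windowing.FixedWindows;
    import org.apache.beam.sdk.transforms.windowing.IntervalWindow;
    import org.apache.beam.sdk.transforms.windowing.Window;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.beam.sdk.values.TimestampedValue;
    import org.joda.time.Duration;
    import org.joda.time.Instant;
    import org.junit.Rule;
    import org.junit.Test;

    public class TriggerSmokeTest {
      @Rule public final transient TestPipeline p = TestPipeline.create();

      @Test
      public void sumIsEmittedInTheOnTimePane() {
        // Deterministically interleave elements and watermark movement.
        TestStream<Integer> events =
            TestStream.create(VarIntCoder.of())
                .addElements(
                    TimestampedValue.of(1, new Instant(1000)),
                    TimestampedValue.of(2, new Instant(2000)))
                .advanceWatermarkTo(new Instant(10000)) // end of the window
                .advanceWatermarkToInfinity();

        PCollection<Integer> sums =
            p.apply(events)
                .apply(Window.<Integer>into(
                        FixedWindows.of(Duration.standardSeconds(10)))
                    .triggering(AfterWatermark.pastEndOfWindow())
                    .withAllowedLateness(Duration.ZERO)
                    .discardingFiredPanes())
                .apply(Sum.integersGlobally().withoutDefaults());

        // Assert on the pane's equivalence class, not on exact firings.
        IntervalWindow w =
            new IntervalWindow(new Instant(0), new Instant(10000));
        PAssert.that(sums).inOnTimePane(w).containsInAnyOrder(3);

        p.run().waitUntilFinish();
      }
    }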

If you use ReduceFnRunner to execute triggers then it has a lot of testing,
so you just need to make sure your integration is basically right. There's
a Jira to convert these to TestStream-based tests - they were written
before TestStream existed.

Kenn

On Fri, Apr 5, 2019 at 1:37 AM Robert Bradshaw  wrote:

> On Thu, Apr 4, 2019 at 6:38 PM Lukasz Cwik  wrote:
> >
> > The issue with unbounded tests that rely on triggers/late data/early
> firings/processing time is that these are several sources of
> non-determinism. The sources make non-deterministic decisions around when
> to produce data, checkpoint, and resume and runners make non-deterministic
> decisions around when to output elements, in which order, and when to
> evaluate triggers. UsesTestStream is the best set of tests we currently
> have for making non-deterministic processing decisions deterministic but
> are more difficult to write than the other ValidatesRunner tests and also
> not well supported because of the special nature of UsesTestStream needing
> special hooks within the runner to control when to output and when to
> advance time.
> >
> > I'm not aware of any tests that we currently have that run a non
> deterministic pipeline and evaluate it against all possible outcomes that
> could have been produced and check that the output was valid. We would
> welcome ideas in how to improve this space to get more runners being tested
> for non-deterministic pipelines.
>
> Python has some tests of this nature, e.g.
>
>
> https://github.com/apache/beam/blob/release-2.12.0/sdks/python/apache_beam/runners/portability/fn_api_runner_test.py#L308
>
> I'd imagine we could do similar for Java.
>
> > On Thu, Apr 4, 2019 at 3:36 AM Jozsef Bartok 
> wrote:
> >>
> >> Hi.
> >>
> >> My name is Jozsef, I've been working on Runners based on Hazelcast Jet.
> Plural because we have both an "old-style" and a "portable" Runner in
> development (https://github.com/hazelcast/hazelcast-jet-beam-runner).
> >>
> >> While our portable one isn't even functional yet, the "old-style" type
> of Runner is a bit more mature. It handles only bounded data, but for that
> case it passes all Beam tests of ValidatesRunner category and runs the
> Nexmark suite successfully too (I'm referring only to correctness, because
> performance is not yet where it can be, we aren't doing any Pipeline
> surgery yet and no other optimizations either).
> >>
> >> A few days ago we started extending it for unbounded data, so we
> have started adding support for things like triggers, watermarks and such
> and we are wondering how come we can't find ValidatesRunner tests specific
> to unbounded data. Tests from the UsesTestStream category seem to be kind
> of a candidate for this, but they have nowhere near the coverage and
> completeness provided by the ValidatesRunner ones.
> >>
> >> I think we are missing something and I don't know what... Could you
> pls. advise?
> >>
> >> Rgds,
> >> Jozsef
>


Re: Hazelcast Jet Runner - validation tests

2019-04-05 Thread Robert Bradshaw
On Thu, Apr 4, 2019 at 6:38 PM Lukasz Cwik  wrote:
>
> The issue with unbounded tests that rely on triggers/late data/early 
> firings/processing time is that these are several sources of non-determinism. 
> The sources make non-deterministic decisions around when to produce data, 
> checkpoint, and resume and runners make non-deterministic decisions around 
> when to output elements, in which order, and when to evaluate triggers. 
> UsesTestStream is the best set of tests we currently have for making 
> non-deterministic processing decisions deterministic but are more difficult 
> to write than the other ValidatesRunner tests and also not well supported
> because of the special nature of UsesTestStream needing special hooks within 
> the runner to control when to output and when to advance time.
>
> I'm not aware of any tests that we currently have that run a non 
> deterministic pipeline and evaluate it against all possible outcomes that 
> could have been produced and check that the output was valid. We would 
> welcome ideas in how to improve this space to get more runners being tested 
> for non-deterministic pipelines.

Python has some tests of this nature, e.g.

https://github.com/apache/beam/blob/release-2.12.0/sdks/python/apache_beam/runners/portability/fn_api_runner_test.py#L308

I'd imagine we could do similar for Java.

> On Thu, Apr 4, 2019 at 3:36 AM Jozsef Bartok  wrote:
>>
>> Hi.
>>
>> My name is Jozsef, I've been working on Runners based on Hazelcast Jet. 
>> Plural because we have both an "old-style" and a "portable" Runner in 
>> development (https://github.com/hazelcast/hazelcast-jet-beam-runner).
>>
>> While our portable one isn't even functional yet, the "old-style" type of 
>> Runner is a bit more mature. It handles only bounded data, but for that case 
>> it passes all Beam tests of ValidatesRunner category and runs the Nexmark 
>> suite successfully too (I'm referring only to correctness, because
>> performance is not yet where it can be, we aren't doing any Pipeline surgery 
>> yet and no other optimizations either).
>>
>> A few days ago we started extending it for unbounded data, so we have
>> started adding support for things like triggers, watermarks and such and we 
>> are wondering how come we can't find ValidatesRunner tests specific to 
>> unbounded data. Tests from the UsesTestStream category seem to be kind of a 
>> candidate for this, but they have nowhere near the coverage and completeness 
>> provided by the ValidatesRunner ones.
>>
>> I think we are missing something and I don't know what... Could you pls. 
>> advise?
>>
>> Rgds,
>> Jozsef


Re: Hazelcast Jet Runner - validation tests

2019-04-04 Thread Lukasz Cwik
The issue with unbounded tests that rely on triggers/late data/early
firings/processing time is that these are several sources of
non-determinism. The sources make non-deterministic decisions around when
to produce data, checkpoint, and resume and runners make non-deterministic
decisions around when to output elements, in which order, and when to
evaluate triggers. UsesTestStream is the best set of tests we currently
have for making non-deterministic processing decisions deterministic but
are more difficult to write than the other ValidatesRunner tests and also
not well supported because of the special nature of UsesTestStream needing
special hooks within the runner to control when to output and when to
advance time.

I'm not aware of any tests that we currently have that run a non
deterministic pipeline and evaluate it against all possible outcomes that
could have been produced and check that the output was valid. We would
welcome ideas in how to improve this space to get more runners being tested
for non-deterministic pipelines.

On Thu, Apr 4, 2019 at 3:36 AM Jozsef Bartok  wrote:

> Hi.
>
> My name is Jozsef, I've been working on Runners based on Hazelcast Jet.
> Plural because we have both an "old-style" and a "portable" Runner in
> development (https://github.com/hazelcast/hazelcast-jet-beam-runner).
>
> While our portable one isn't even functional yet, the "old-style" type of
> Runner is a bit more mature. It handles only bounded data, but for that
> case it passes all Beam tests of ValidatesRunner category and runs the
> Nexmark suite successfully too (I'm referring only to correctness, because
> performance is not yet where it can be, we aren't doing any Pipeline
> surgery yet and no other optimizations either).
>
> A few days ago we started extending it for unbounded data, so we
> have started adding support for things like triggers, watermarks and such
> and we are wondering how come we can't find ValidatesRunner tests specific
> to unbounded data. Tests from the UsesTestStream category seem to be kind
> of a candidate for this, but they have nowhere near the coverage and
> completeness provided by the ValidatesRunner ones.
>
> I think we are missing something and I don't know what... Could you pls.
> advise?
>
> Rgds,
> Jozsef
>


Re: Hazelcast Jet Runner

2019-03-20 Thread Ankur Goenka
Hi Can,

Like the GreedyPipelineFuser, we have added many more components which make
building a portable runner easier. Here is a link [1] to slides which
explain at a very high level what is needed to add a new portable runner.
Adding a portable runner will still be more complex than adding a native
runner, but with these components it should be easier than
originally expected.

[1]
https://docs.google.com/presentation/d/1JRNUSpOC8qaA4uLDuyGsuuyf6Tk8Xi9LAukhgl-hT_w/edit?usp=sharing
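
As a rough sketch of how the fuser is invoked (assuming the
runners-core-construction APIs of this era, e.g. PipelineTranslation.toProto
and GreedyPipelineFuser.fuse; treat the exact signatures as approximate):

import org.apache.beam.model.pipeline.v1.RunnerApi;
import org.apache.beam.runners.core.construction.PipelineTranslation;
import org.apache.beam.runners.core.construction.graph.ExecutableStage;
import org.apache.beam.runners.core.construction.graph.FusedPipeline;
import org.apache.beam.runners.core.construction.graph.GreedyPipelineFuser;
import org.apache.beam.sdk.Pipeline;

public class FuserSketch {
  public static void printStages(Pipeline pipeline) {
    // Translate the SDK pipeline to its portable proto representation.
    RunnerApi.Pipeline proto = PipelineTranslation.toProto(pipeline);
    // Greedily fuse SDK-executed transforms into executable stages; only
    // the boundaries between stages need runner-side materialization.
    FusedPipeline fused = GreedyPipelineFuser.fuse(proto);
    for (ExecutableStage stage : fused.getFusedStages()) {
      System.out.println(
          "stage with " + stage.getTransforms().size() + " fused transforms");
    }
  }
}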

Thanks,
Ankur

On Wed, Mar 20, 2019 at 7:19 AM Maximilian Michels  wrote:

> Documentation on portability is still a bit sparse although there are
> many design documents:
> https://beam.apache.org/contribute/design-documents/#portability
>
> The structure of portable Runners is not fundamentally different, but
> some of the operations are deferred to the SDK which runs code for all
> supported languages. The Runner needs to provide an integration with it.
>
> Eventually, the old Runners will become obsolete, though that won't
> happen very soon. Performance should be slightly better on the old Runners.
>
> I think writing an old-style Runner now will give you enough experience
> to port it to the new language-portable style later on.
>
> Cheers,
> Max
>
> On 20.03.19 14:52, Can Gencer wrote:
> > I had a look at "GreedyPipelineFuser" and indeed this was exactly what I
> > was talking about.
> >
> > Is https://beam.apache.org/roadmap/portability/ still the best
> > information about the portable runners or is there a more in-depth guide
> > available anywhere?
> >
> > On Wed, Mar 20, 2019 at 2:29 PM Can Gencer  wrote:
> >
> > Hi Max,
> >
> > Thanks. When you say "old-style runner", does this mean that this
> > style of runners will become obsolete and only the portable one will be
> > supported? The documentation for portable runners wasn't quite
> > complete, the barrier to entry for writing an old-style runner
> > seemed lower to us, and the old-style runner should have better
> > performance?
> >
> > On Wed, Mar 20, 2019 at 1:36 PM Maximilian Michels  wrote:
> >
> > Hi Can,
> >
> > Thanks for the update. Interesting question. Flink has an
> > optimization
> > built in called chaining which works together nicely with Beam.
> > Essentially, operators which share the same partitioning get
> > executed
> > one after another inside a master operator. This saves resources.
> >
> > Interestingly, Beam's Fuser for portable Runners does something
> > similar.
> > AFAIK there is no built-in solution for the old-style Runners. I
> > think
> > it would be possible to build something like this on top of the
> > existing
> > translation.
> >
> > Cheers,
> > Max
> >
> > On 20.03.19 13:07, Can Gencer wrote:
> >  > Hi again,
> >  >
> >  > We've made some progress on the runner since writing this
> > more than a
> >  > month ago, the repo is available here publicly:
> >  > https://github.com/hazelcast/hazelcast-jet-beam-runner
> >  >
> >  > Still very much a work in progress though. One of the issues
> > I wanted to
> >  > raise is that currently we're translating each PTransform to
> > a Jet
> >  > Vertex (could be considered analogous to a Flink operator or a
> > vertex in
> >  > Tez). This is sub-optimal, since Beam creates lots of
> > transforms for
> >  > computations that could be performed inside the same Vertex,
> > such as
> >  > subsequent mapping transforms and many others. Ideally you
> > only need
> >  > distinct vertices where the data is re-partitioned and/or
> > shuffled. I'm
> >  > curious if Beam offers some way of translating the PTransform
> > graph to a
> >  > more minimal set of transforms, i.e. some kind of planner or
> > would this
> >  > have to be custom code? We've done a similar integration with
> > Cascading
> >  > in the past and it offered a planner which given a set of
> > rules would
> >  > partition the Cascading DAG into a minimal set of vertices
> > for the same
> >  > DAG. Curious if Beam has any similar functionality?
> >  >
> >  >
> >  >
> >  > On Sat, Feb 16, 2019 at 4:50 AM Kenneth Knowles  wrote:
> >  >
> >  > Elaborating on what Robert alluded to: when I wrote that
> > runner
> >  > author guide, portability was in its infancy. Now Beam
> > Python can be
> >  > run on Flink. So that guide is primarily focused on the
> > 

Re: Hazelcast Jet Runner

2019-03-20 Thread Maximilian Michels
Documentation on portability is still a bit sparse although there are 
many design documents: 
https://beam.apache.org/contribute/design-documents/#portability


The structure of portable Runners is not fundamentally different, but 
some of the operations are deferred to the SDK which runs code for all 
supported languages. The Runner needs to provide an integration with it.


Eventually, the old Runners will become obsolete, though that won't 
happen very soon. Performance should be slightly better on the old Runners.


I think writing an old-style Runner now will give you enough experience 
to port it to the new language-portable style later on.


Cheers,
Max

On 20.03.19 14:52, Can Gencer wrote:
I had a look at "GreedyPipelineFuser" and indeed this was exactly what I 
was talking about.


Is https://beam.apache.org/roadmap/portability/ still the best 
information about the portable runners or is there a more in-depth guide 
available anywhere?


On Wed, Mar 20, 2019 at 2:29 PM Can Gencer  wrote:


Hi Max,

Thanks. When you say "old-style runner", does this mean that this
style of runners will become obsolete and only the portable one will be
supported? The documentation for portable runners wasn't quite
complete, the barrier to entry for writing an old-style runner
seemed lower to us, and the old-style runner should have better
performance?

On Wed, Mar 20, 2019 at 1:36 PM Maximilian Michels  wrote:

Hi Can,

Thanks for the update. Interesting question. Flink has an
optimization
built in called chaining which works together nicely with Beam.
Essentially, operators which share the same partitioning get
executed
one after another inside a master operator. This saves resources.

Interestingly, Beam's Fuser for portable Runners does something
similar.
AFAIK there is no built-in solution for the old-style Runners. I
think
it would be possible to build something like this on top of the
existing
translation.

Cheers,
Max

On 20.03.19 13:07, Can Gencer wrote:
 > Hi again,
 >
 > We've made some progress on the runner since writing this
more than a
 > month ago, the repo is available here publicly:
 > https://github.com/hazelcast/hazelcast-jet-beam-runner
 >
 > Still very much a work in progress though. One of the issues
I wanted to
 > raise is that currently we're translating each PTransform to
a Jet
 > Vertex (could be considered analogous to a Flink operator or a
vertex in
 > Tez). This is sub-optimal, since Beam creates lots of
transforms for
 > computations that could be performed inside the same Vertex,
such as
 > subsequent mapping transforms and many others. Ideally you
only need
 > distinct vertices where the data is re-partitioned and/or
shuffled. I'm
 > curious if Beam offers some way of translating the PTransform
graph to a
 > more minimal set of transforms, i.e. some kind of planner or
would this
 > have to be custom code? We've done a similar integration with
Cascading
 > in the past and it offered a planner which given a set of
rules would
 > partition the Cascading DAG into a minimal set of vertices
for the same
 > DAG. Curious if Beam has any similar functionality?
 >
 >
 >
 > On Sat, Feb 16, 2019 at 4:50 AM Kenneth Knowles  wrote:
 >
 >     Elaborating on what Robert alluded to: when I wrote that
runner
 >     author guide, portability was in its infancy. Now Beam
Python can be
 >     run on Flink. So that guide is primarily focused on the
"deserialize
 >     a Java DoFn and call its methods" approach. A decent
amount of it is
 >     still really important to know, but is now the
responsibility of the
 >     "SDK harness", aka language-specific coprocessor. For
Python & Go &
 >      you really want to use the
 >     portability protos and the portable Flink runner is the
best model.
 >
 >     Kenn
 >
 >
 >     On Fri, Feb 15, 2019 at 2:08 AM Robert Bradshaw  wrote:
 >
 >         On Fri, Feb 15, 2019 at 7:36 AM Can Gencer  wrote:
 >          >
 >          > We at Hazelcast are 

Re: Hazelcast Jet Runner

2019-03-20 Thread Can Gencer
I had a look at "GreedyPipelineFuser" and indeed this was exactly what I
was talking about.

Is https://beam.apache.org/roadmap/portability/ still the best information
about the portable runners or is there a more in-depth guide available
anywhere?

On Wed, Mar 20, 2019 at 2:29 PM Can Gencer  wrote:

> Hi Max,
>
> Thanks. When you say "old-style runner", does this mean that this style of
> runners will become obsolete and only the portable one will be supported?
> The documentation for portable runners wasn't quite complete, the barrier
> to entry for writing an old-style runner seemed lower to us, and the
> old-style runner should have better performance?
>
> On Wed, Mar 20, 2019 at 1:36 PM Maximilian Michels  wrote:
>
>> Hi Can,
>>
>> Thanks for the update. Interesting question. Flink has an optimization
>> built in called chaining which works together nicely with Beam.
>> Essentially, operators which share the same partitioning get executed
>> one after another inside a master operator. This saves resources.
>>
>> Interestingly, Beam's Fuser for portable Runners does something similar.
>> AFAIK there is no built-in solution for the old-style Runners. I think
>> it would be possible to build something like this on top of the existing
>> translation.
>>
>> Cheers,
>> Max
>>
>> On 20.03.19 13:07, Can Gencer wrote:
>> > Hi again,
>> >
>> > We've made some progress on the runner since writing this more than a
>> > month ago, the repo is available here publicly:
>> > https://github.com/hazelcast/hazelcast-jet-beam-runner
>> >
>> > Still very much a work in progress though. One of the issues I wanted
>> to
>> > raise is that currently we're translating each PTransform to a Jet
>> > Vertex (could be considered analogous to a Flink operator or a vertex in
>> > Tez). This is sub-optimal, since Beam creates lots of transforms for
>> > computations that could be performed inside the same Vertex, such as
>> > subsequent mapping transforms and many others. Ideally you only need
>> > distinct vertices where the data is re-partitioned and/or shuffled. I'm
>> > curious if Beam offers some way of translating the PTransform graph to
>> a
>> > more minimal set of transforms, i.e. some kind of planner or would this
>> > have to be custom code? We've done a similar integration with Cascading
>> > in the past and it offered a planner which given a set of rules would
>> > partition the Cascading DAG into a minimal set of vertices for the same
>> > DAG. Curious if Beam has any similar functionality?
>> >
>> >
>> >
>> > On Sat, Feb 16, 2019 at 4:50 AM Kenneth Knowles  wrote:
>> >
>> > Elaborating on what Robert alluded to: when I wrote that runner
>> > author guide, portability was in its infancy. Now Beam Python can be
>> > run on Flink. So that guide is primarily focused on the "deserialize
>> > a Java DoFn and call its methods" approach. A decent amount of it is
>> > still really important to know, but is now the responsibility of the
>> > "SDK harness", aka language-specific coprocessor. For Python & Go &
>> >  you really want to use the
>> > portability protos and the portable Flink runner is the best model.
>> >
>> > Kenn
>> >
>> >
>> > On Fri, Feb 15, 2019 at 2:08 AM Robert Bradshaw  wrote:
>> >
>> > On Fri, Feb 15, 2019 at 7:36 AM Can Gencer  wrote:
>> >  >
>> >  > We at Hazelcast are looking into writing a Beam runner for
>> > Hazelcast Jet (https://github.com/hazelcast/hazelcast-jet). I
>> > wanted to introduce myself as we'll likely have questions as we
>> > start development.
>> >
>> > Welcome!
>> >
>> > Hazelcast looks interesting, a Beam runner for it would be very
>> > cool.
>> >
>> >  > Some of the things I'm wondering about currently:
>> >  >
>> >  > * Currently there seems to be a guide available at
>> > https://beam.apache.org/contribute/runner-guide/ , is this up
>> to
>> > date? Is there anything in specific to be aware of when starting
>> > with a new runner that's not covered here?
>> >
>> > That looks like a pretty good starting point. At a quick
>> glance, I
>> > don't see anything that looks out of date. Another resource that
>> > might
>> > be helpful is a talk from last year on writing an SDK (but as it
>> > mostly covers the runner-sdk interaction, it's also quite
>> useful for
>> > understanding the runner side:
>> >
>> https://docs.google.com/presentation/d/1Cso0XP9dmj77OD9Bd53C1M3W1sPJF0ZnA20gzb2BPhE/edit#slide=id.p
>> > And please feel free to ask any questions on this list as well;
>> we'd
>> > be happy to help.
>> >
>> >  > * Should we be targeting the latest master which is at
>> > 2.12-SNAPSHOT or a stable version?
>> >
>> > 

Re: Hazelcast Jet Runner

2019-03-20 Thread Can Gencer
Hi Max,

Thanks. When you say "old-style runner", does this mean that this style of
runners will become obsolete and only the portable one will be supported?
The documentation for portable runners wasn't quite complete, the barrier
to entry for writing an old-style runner seemed lower to us, and the
old-style runner should have better performance?

On Wed, Mar 20, 2019 at 1:36 PM Maximilian Michels  wrote:

> Hi Can,
>
> Thanks for the update. Interesting question. Flink has an optimization
> built in called chaining which works together nicely with Beam.
> Essentially, operators which share the same partitioning get executed
> one after another inside a master operator. This saves resources.
>
> Interestingly, Beam's Fuser for portable Runners does something similar.
> AFAIK there is no built-in solution for the old-style Runners. I think
> it would be possible to build something like this on top of the existing
> translation.
>
> Cheers,
> Max
>
> On 20.03.19 13:07, Can Gencer wrote:
> > Hi again,
> >
> > We've made some progress on the runner since writing this more than a
> > month ago, the repo is available here publicly:
> > https://github.com/hazelcast/hazelcast-jet-beam-runner
> >
> > Still very much a work in progress though. One of the issues I wanted to
> > raise is that currently we're translating each PTransform to a Jet
> > Vertex (could be considered analogous to a Flink operator or a vertex in
> > Tez). This is sub-optimal, since Beam creates lots of transforms for
> > computations that could be performed inside the same Vertex, such as
> > subsequent mapping transforms and many others. Ideally you only need
> > distinct vertices where the data is re-partitioned and/or shuffled. I'm
> > curious if Beam offers some way of translating the PTransform graph to a
> > more minimal set of transforms, i.e. some kind of planner or would this
> > have to be custom code? We've done a similar integration with Cascading
> > in the past and it offered a planner which given a set of rules would
> > partition the Cascading DAG into a minimal set of vertices for the same
> > DAG. Curious if Beam has any similar functionality?
> >
> >
> >
> > On Sat, Feb 16, 2019 at 4:50 AM Kenneth Knowles  wrote:
> >
> > Elaborating on what Robert alluded to: when I wrote that runner
> > author guide, portability was in its infancy. Now Beam Python can be
> > run on Flink. So that guide is primarily focused on the "deserialize
> > a Java DoFn and call its methods" approach. A decent amount of it is
> > still really important to know, but is now the responsibility of the
> > "SDK harness", aka language-specific coprocessor. For Python & Go &
> >  you really want to use the
> > portability protos and the portable Flink runner is the best model.
> >
> > Kenn
> >
> >
> > On Fri, Feb 15, 2019 at 2:08 AM Robert Bradshaw  wrote:
> >
> > On Fri, Feb 15, 2019 at 7:36 AM Can Gencer  wrote:
> >  >
> >  > We at Hazelcast are looking into writing a Beam runner for
> > Hazelcast Jet (https://github.com/hazelcast/hazelcast-jet). I
> > wanted to introduce myself as we'll likely have questions as we
> > start development.
> >
> > Welcome!
> >
> > Hazelcast looks interesting, a Beam runner for it would be very
> > cool.
> >
> >  > Some of the things I'm wondering about currently:
> >  >
> >  > * Currently there seems to be a guide available at
> > https://beam.apache.org/contribute/runner-guide/ , is this up to
> > date? Is there anything in specific to be aware of when starting
> > with a new runner that's not covered here?
> >
> > That looks like a pretty good starting point. At a quick glance,
> I
> > don't see anything that looks out of date. Another resource that
> > might
> > be helpful is a talk from last year on writing an SDK (but as it
> > mostly covers the runner-sdk interaction, it's also quite useful
> for
> > understanding the runner side:
> >
> https://docs.google.com/presentation/d/1Cso0XP9dmj77OD9Bd53C1M3W1sPJF0ZnA20gzb2BPhE/edit#slide=id.p
> > And please feel free to ask any questions on this list as well;
> we'd
> > be happy to help.
> >
> >  > * Should we be targeting the latest master which is at
> > 2.12-SNAPSHOT or a stable version?
> >
> > I would target the latest master.
> >
> >  > * After a runner is developed, how is the maintenance
> > typically handled, as the runners seems to be part of Beam
> codebase?
> >
> > Either is possible. Several runner adapters are part of the Beam
> > codebase, but for example the IBM Streams Beam runner is not. There
> There
> > are certainly pros and cons (certainly early on when the APIs
> > 

Re: Hazelcast Jet Runner

2019-03-20 Thread Maximilian Michels

Hi Can,

Thanks for the update. Interesting question. Flink has an optimization 
built in called chaining which works together nicely with Beam. 
Essentially, operators which share the same partitioning get executed 
one after another inside a master operator. This saves resources.


Interestingly, Beam's Fuser for portable Runners does something similar. 
AFAIK there is no built-in solution for the old-style Runners. I think 
it would be possible to build something like this on top of the existing 
translation.
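
A hedged sketch of what such chaining could look like on top of the
existing translation; the Node type and isChainable predicate below are
hypothetical stand-ins for the runner's own translation classes, not an
existing Beam API:

import java.util.ArrayList;
import java.util.List;

final class ChainBuilder {
  // Greedily group a topologically sorted list of translated nodes into
  // chains; each resulting chain could become a single Jet vertex.
  static List<List<Node>> buildChains(List<Node> topologicallySorted) {
    List<List<Node>> chains = new ArrayList<>();
    List<Node> current = new ArrayList<>();
    for (Node node : topologicallySorted) {
      if (!current.isEmpty() && !isChainable(current.get(current.size() - 1), node)) {
        chains.add(current); // close the chain at a fusion barrier
        current = new ArrayList<>();
      }
      current.add(node);
    }
    if (!current.isEmpty()) {
      chains.add(current);
    }
    return chains;
  }

  // Chainable when the downstream node is the sole consumer of the upstream
  // one and no repartitioning (e.g. a GroupByKey) sits between the two.
  static boolean isChainable(Node upstream, Node downstream) {
    return upstream.consumers().size() == 1
        && upstream.consumers().get(0) == downstream
        && !downstream.requiresShuffle();
  }

  // Hypothetical view of a translated node, for illustration only.
  interface Node {
    List<Node> consumers();
    boolean requiresShuffle();
  }
}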


Cheers,
Max

On 20.03.19 13:07, Can Gencer wrote:

Hi again,

We've made some progress on the runner since writing this more than a 
month ago, the repo is available here publicly: 
https://github.com/hazelcast/hazelcast-jet-beam-runner


Still very much a work in progress though. One of the issues I wanted to 
raise is that currently we're translating each PTransform to a Jet 
Vertex (could be considered analogous to a Flink operator or a vertex in 
Tez). This is sub-optimal, since Beam creates lots of transforms for 
computations that could be performed inside the same Vertex, such as 
subsequent mapping transforms and many others. Ideally you only need 
distinct vertices where the data is re-partitioned and/or shuffled. I'm 
curious if Beam offers some way of translating the PTransform graph to a 
more minimal set of transforms, i.e. some kind of planner or would this 
have to be custom code? We've done a similar integration with Cascading 
in the past and it offered a planner which given a set of rules would 
partition the Cascading DAG into a minimal set of vertices for the same 
DAG. Curious if Beam has any similar functionality?




On Sat, Feb 16, 2019 at 4:50 AM Kenneth Knowles  wrote:


Elaborating on what Robert alluded to: when I wrote that runner
author guide, portability was in its infancy. Now Beam Python can be
run on Flink. So that guide is primarily focused on the "deserialize
a Java DoFn and call its methods" approach. A decent amount of it is
still really important to know, but is now the responsibility of the
"SDK harness", aka language-specific coprocessor. For Python & Go &
 you really want to use the
portability protos and the portable Flink runner is the best model.

Kenn


On Fri, Feb 15, 2019 at 2:08 AM Robert Bradshaw  wrote:

On Fri, Feb 15, 2019 at 7:36 AM Can Gencer  wrote:
 >
 > We at Hazelcast are looking into writing a Beam runner for
Hazelcast Jet (https://github.com/hazelcast/hazelcast-jet). I
wanted to introduce myself as we'll likely have questions as we
start development.

Welcome!

Hazelcast looks interesting, a Beam runner for it would be very
cool.

 > Some of the things I'm wondering about currently:
 >
 > * Currently there seems to be a guide available at
https://beam.apache.org/contribute/runner-guide/ , is this up to
date? Is there anything in specific to be aware of when starting
with a new runner that's not covered here?

That looks like a pretty good starting point. At a quick glance, I
don't see anything that looks out of date. Another resource that
might
be helpful is a talk from last year on writing an SDK (but as it
mostly covers the runner-sdk interaction, it's also quite useful for
understanding the runner side:

https://docs.google.com/presentation/d/1Cso0XP9dmj77OD9Bd53C1M3W1sPJF0ZnA20gzb2BPhE/edit#slide=id.p
And please feel free to ask any questions on this list as well; we'd
be happy to help.

 > * Should we be targeting the latest master which is at
2.12-SNAPSHOT or a stable version?

I would target the latest master.

 > * After a runner is developed, how is the maintenance
typically handled, as the runners seems to be part of Beam codebase?

Either is possible. Several runner adapters are part of the Beam
> codebase, but for example the IBM Streams Beam runner is not. There
are certainly pros and cons (certainly early on when the APIs
themselves were under heavy development it was easier to keep things
in sync in the same codebase, but things have mostly stabilized
now).
A runner only becomes part of the Beam codebase if there are members
of the community committed to maintaining it (which could include
you). Both approaches are fine.

- Robert



Re: Hazelcast Jet Runner

2019-02-15 Thread Kenneth Knowles
Elaborating on what Robert alluded to: when I wrote that runner author
guide, portability was in its infancy. Now Beam Python can be run on Flink.
So that guide is primarily focused on the "deserialize a Java DoFn and call
its methods" approach. A decent amount of it is still really important to
know, but is now the responsibility of the "SDK harness", aka
language-specific coprocessor. For Python & Go &  you really want to use the portability protos and the portable Flink
runner is the best model.

Kenn


On Fri, Feb 15, 2019 at 2:08 AM Robert Bradshaw  wrote:

> On Fri, Feb 15, 2019 at 7:36 AM Can Gencer  wrote:
> >
> > We at Hazelcast are looking into writing a Beam runner for Hazelcast Jet
> (https://github.com/hazelcast/hazelcast-jet). I wanted to introduce
> myself as we'll likely have questions as we start development.
>
> Welcome!
>
> Hazelcast looks interesting, a Beam runner for it would be very cool.
>
> > Some of the things I'm wondering about currently:
> >
> > * Currently there seems to be a guide available at
> https://beam.apache.org/contribute/runner-guide/ , is this up to date? Is
> there anything in specific to be aware of when starting with a new runner
> that's not covered here?
>
> That looks like a pretty good starting point. At a quick glance, I
> don't see anything that looks out of date. Another resource that might
> be helpful is a talk from last year on writing an SDK (but as it
> mostly covers the runner-sdk interaction, it's also quite useful for
> understanding the runner side:
>
> https://docs.google.com/presentation/d/1Cso0XP9dmj77OD9Bd53C1M3W1sPJF0ZnA20gzb2BPhE/edit#slide=id.p
> And please feel free to ask any questions on this list as well; we'd
> be happy to help.
>
> > * Should we be targeting the latest master which is at 2.12-SNAPSHOT or
> a stable version?
>
> I would target the latest master.
>
> > * After a runner is developed, how is the maintenance typically handled,
> as the runners seems to be part of Beam codebase?
>
> Either is possible. Several runner adapters are part of the Beam
> codebase, but for example the IBM Streams Beam runner is not. There
> are certainly pros and cons (certainly early on when the APIs
> themselves were under heavy development it was easier to keep things
> in sync in the same codebase, but things have mostly stabilized now).
> A runner only becomes part of the Beam codebase if there are members
> of the community committed to maintaining it (which could include
> you). Both approaches are fine.
>
> - Robert
>


Re: Hazelcast Jet Runner

2019-02-15 Thread Robert Bradshaw
On Fri, Feb 15, 2019 at 7:36 AM Can Gencer  wrote:
>
> We at Hazelcast are looking into writing a Beam runner for Hazelcast Jet 
> (https://github.com/hazelcast/hazelcast-jet). I wanted to introduce myself as 
> we'll likely have questions as we start development.

Welcome!

Hazelcast looks interesting, a Beam runner for it would be very cool.

> Some of the things I'm wondering about currently:
>
> * Currently there seems to be a guide available at 
> https://beam.apache.org/contribute/runner-guide/ , is this up to date? Is 
> there anything in specific to be aware of when starting with a new runner 
> that's not covered here?

That looks like a pretty good starting point. At a quick glance, I
don't see anything that looks out of date. Another resource that might
be helpful is a talk from last year on writing an SDK (but as it
mostly covers the runner-sdk interaction, it's also quite useful for
understanding the runner side:
https://docs.google.com/presentation/d/1Cso0XP9dmj77OD9Bd53C1M3W1sPJF0ZnA20gzb2BPhE/edit#slide=id.p
And please feel free to ask any questions on this list as well; we'd
be happy to help.

> * Should we be targeting the latest master which is at 2.12-SNAPSHOT or a 
> stable version?

I would target the latest master.

> * After a runner is developed, how is the maintenance typically handled, as 
> the runners seems to be part of Beam codebase?

Either is possible. Several runner adapters are part of the Beam
codebase, but for example the IBM Streams Beam runner is not. There
are certainly pros and cons (certainly early on when the APIs
themselves were under heavy development it was easier to keep things
in sync in the same codebase, but things have mostly stabilized now).
A runner only becomes part of the Beam codebase if there are members
of the community committed to maintaining it (which could include
you). Both approaches are fine.

- Robert