Re: Terminating streaming test

2018-02-08 Thread Aljoscha Krettek
Hi,

The job is not explicitly stopped, bringing down the cluster will also bring 
down the job. (Which is maybe not the nicest way of doing things but it works.)

Sources can trigger pipeline termination by returning from their run() method.

Best,
Aljoscha

> On 7. Feb 2018, at 21:15, Thomas Weise  wrote:
> 
> Thanks! It would indeed be nice to have this as framework that makes test
> fun and easy to write ;-)
> 
> Looking at SavepointMigrationTestBase, I see that the cluster is eventually
> stopped in teardown, but I don't find where the individual job is
> terminated after the expected results are in? Also, CheckingRestoringSource
> will run until cancelled, is there a way where the source can trigger
> pipeline termination?
> 
> Thanks,
> Thomas
> 
> 
> On Wed, Feb 7, 2018 at 7:56 AM, Aljoscha Krettek 
> wrote:
> 
>> There is StatefulJobSavepointMigrationITCase, which executes a proper
>> unbounded pipeline on a locally started cluster and "listens" for some
>> criteria via accumulators before cancelling the job and shutting down the
>> cluster. The communication with the cluster is quite custom here, but I
>> would really like to have a framework that comes with Flink that allows
>> writing such tests. Somewhat similar to how PAssert works in Beam.
>> 
>> Best,
>> Aljoscha
>> 
>>> On 7. Feb 2018, at 04:34, Thomas Weise  wrote:
>>> 
>>> Hi Ken,
>>> 
>>> Thanks! I would expect more folks to run into this and hence surprised to
>>> not find this in LocalStreamEnvironment. Is there a reason for that?
>>> 
>>> In the specific case, we have an unbounded source (Kinesis), but for
>>> testing we would like to make it bounded. Hence the earlier question
>>> whether it is possible to terminate a topology with a final watermark or
>>> different means from within the source, similar to how a bounded source
>> in
>>> Beam would behave.
>>> 
>>> Thanks,
>>> Thomas
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On Tue, Feb 6, 2018 at 5:16 PM, Ken Krugler >> 
>>> wrote:
>>> 
 Hi Thomas,
 
 Normally the streaming job will terminate when the sources are exhausted
 and all records have been processed.
 
 I assume you have some unbounded source(s), thus this doesn’t work for
 your case.
 
 We’d run into a similar situation with a streaming job that has
>> iterations.
 
 Our solution was your option #1 below, where we created a modified
>> version
 of LocalStreamEnvironment  that supports async execution.
 
 — Ken
 
 
> On Feb 6, 2018, at 4:21 PM, Thomas Weise  wrote:
> 
> Hi,
> 
> I'm looking for an example of an integration test that runs a streaming
 job
> and terminates when the expected result becomes available. I could
>> think
 of
> 2 approaches:
> 
> 1. Modified version of LocalStreamEnvironment that executes the job
> asynchronously and polls for the result or
> 
> 2. Source that emits a final watermark that causes the topology to
> terminate after the watermark has traversed the topology. Is that
 possible
> with Flink?
> 
> But probably this is a rather common testing need that's already
>> solved?!
> 
> Thanks,
> Thomas
 
 --
 Ken Krugler
 http://www.scaleunlimited.com
 custom big data solutions & training
 Hadoop, Cascading, Cassandra & Solr
 
 
>> 
>> 



Re: Terminating streaming test

2018-02-07 Thread Thomas Weise
Thanks! It would indeed be nice to have this as framework that makes test
fun and easy to write ;-)

Looking at SavepointMigrationTestBase, I see that the cluster is eventually
stopped in teardown, but I don't find where the individual job is
terminated after the expected results are in? Also, CheckingRestoringSource
will run until cancelled, is there a way where the source can trigger
pipeline termination?

Thanks,
Thomas


On Wed, Feb 7, 2018 at 7:56 AM, Aljoscha Krettek 
wrote:

> There is StatefulJobSavepointMigrationITCase, which executes a proper
> unbounded pipeline on a locally started cluster and "listens" for some
> criteria via accumulators before cancelling the job and shutting down the
> cluster. The communication with the cluster is quite custom here, but I
> would really like to have a framework that comes with Flink that allows
> writing such tests. Somewhat similar to how PAssert works in Beam.
>
> Best,
> Aljoscha
>
> > On 7. Feb 2018, at 04:34, Thomas Weise  wrote:
> >
> > Hi Ken,
> >
> > Thanks! I would expect more folks to run into this and hence surprised to
> > not find this in LocalStreamEnvironment. Is there a reason for that?
> >
> > In the specific case, we have an unbounded source (Kinesis), but for
> > testing we would like to make it bounded. Hence the earlier question
> > whether it is possible to terminate a topology with a final watermark or
> > different means from within the source, similar to how a bounded source
> in
> > Beam would behave.
> >
> > Thanks,
> > Thomas
> >
> >
> >
> >
> >
> >
> > On Tue, Feb 6, 2018 at 5:16 PM, Ken Krugler  >
> > wrote:
> >
> >> Hi Thomas,
> >>
> >> Normally the streaming job will terminate when the sources are exhausted
> >> and all records have been processed.
> >>
> >> I assume you have some unbounded source(s), thus this doesn’t work for
> >> your case.
> >>
> >> We’d run into a similar situation with a streaming job that has
> iterations.
> >>
> >> Our solution was your option #1 below, where we created a modified
> version
> >> of LocalStreamEnvironment  >> ScaleUnlimited/flink-crawler/master/src/main/java/org/
> >> apache/flink/streaming/api/environment/LocalStreamEnvironmentWithAsyn
> >> cExecution.java> that supports async execution.
> >>
> >> — Ken
> >>
> >>
> >>> On Feb 6, 2018, at 4:21 PM, Thomas Weise  wrote:
> >>>
> >>> Hi,
> >>>
> >>> I'm looking for an example of an integration test that runs a streaming
> >> job
> >>> and terminates when the expected result becomes available. I could
> think
> >> of
> >>> 2 approaches:
> >>>
> >>> 1. Modified version of LocalStreamEnvironment that executes the job
> >>> asynchronously and polls for the result or
> >>>
> >>> 2. Source that emits a final watermark that causes the topology to
> >>> terminate after the watermark has traversed the topology. Is that
> >> possible
> >>> with Flink?
> >>>
> >>> But probably this is a rather common testing need that's already
> solved?!
> >>>
> >>> Thanks,
> >>> Thomas
> >>
> >> --
> >> Ken Krugler
> >> http://www.scaleunlimited.com
> >> custom big data solutions & training
> >> Hadoop, Cascading, Cassandra & Solr
> >>
> >>
>
>


Re: Terminating streaming test

2018-02-07 Thread Aljoscha Krettek
There is StatefulJobSavepointMigrationITCase, which executes a proper unbounded 
pipeline on a locally started cluster and "listens" for some criteria via 
accumulators before cancelling the job and shutting down the cluster. The 
communication with the cluster is quite custom here, but I would really like to 
have a framework that comes with Flink that allows writing such tests. Somewhat 
similar to how PAssert works in Beam.

Best,
Aljoscha

> On 7. Feb 2018, at 04:34, Thomas Weise  wrote:
> 
> Hi Ken,
> 
> Thanks! I would expect more folks to run into this and hence surprised to
> not find this in LocalStreamEnvironment. Is there a reason for that?
> 
> In the specific case, we have an unbounded source (Kinesis), but for
> testing we would like to make it bounded. Hence the earlier question
> whether it is possible to terminate a topology with a final watermark or
> different means from within the source, similar to how a bounded source in
> Beam would behave.
> 
> Thanks,
> Thomas
> 
> 
> 
> 
> 
> 
> On Tue, Feb 6, 2018 at 5:16 PM, Ken Krugler 
> wrote:
> 
>> Hi Thomas,
>> 
>> Normally the streaming job will terminate when the sources are exhausted
>> and all records have been processed.
>> 
>> I assume you have some unbounded source(s), thus this doesn’t work for
>> your case.
>> 
>> We’d run into a similar situation with a streaming job that has iterations.
>> 
>> Our solution was your option #1 below, where we created a modified version
>> of LocalStreamEnvironment > ScaleUnlimited/flink-crawler/master/src/main/java/org/
>> apache/flink/streaming/api/environment/LocalStreamEnvironmentWithAsyn
>> cExecution.java> that supports async execution.
>> 
>> — Ken
>> 
>> 
>>> On Feb 6, 2018, at 4:21 PM, Thomas Weise  wrote:
>>> 
>>> Hi,
>>> 
>>> I'm looking for an example of an integration test that runs a streaming
>> job
>>> and terminates when the expected result becomes available. I could think
>> of
>>> 2 approaches:
>>> 
>>> 1. Modified version of LocalStreamEnvironment that executes the job
>>> asynchronously and polls for the result or
>>> 
>>> 2. Source that emits a final watermark that causes the topology to
>>> terminate after the watermark has traversed the topology. Is that
>> possible
>>> with Flink?
>>> 
>>> But probably this is a rather common testing need that's already solved?!
>>> 
>>> Thanks,
>>> Thomas
>> 
>> --
>> Ken Krugler
>> http://www.scaleunlimited.com
>> custom big data solutions & training
>> Hadoop, Cascading, Cassandra & Solr
>> 
>> 



Re: Terminating streaming test

2018-02-06 Thread Thomas Weise
Hi Ken,

Thanks! I would expect more folks to run into this and hence surprised to
not find this in LocalStreamEnvironment. Is there a reason for that?

In the specific case, we have an unbounded source (Kinesis), but for
testing we would like to make it bounded. Hence the earlier question
whether it is possible to terminate a topology with a final watermark or
different means from within the source, similar to how a bounded source in
Beam would behave.

Thanks,
Thomas






On Tue, Feb 6, 2018 at 5:16 PM, Ken Krugler 
wrote:

> Hi Thomas,
>
> Normally the streaming job will terminate when the sources are exhausted
> and all records have been processed.
>
> I assume you have some unbounded source(s), thus this doesn’t work for
> your case.
>
> We’d run into a similar situation with a streaming job that has iterations.
>
> Our solution was your option #1 below, where we created a modified version
> of LocalStreamEnvironment  ScaleUnlimited/flink-crawler/master/src/main/java/org/
> apache/flink/streaming/api/environment/LocalStreamEnvironmentWithAsyn
> cExecution.java> that supports async execution.
>
> — Ken
>
>
> > On Feb 6, 2018, at 4:21 PM, Thomas Weise  wrote:
> >
> > Hi,
> >
> > I'm looking for an example of an integration test that runs a streaming
> job
> > and terminates when the expected result becomes available. I could think
> of
> > 2 approaches:
> >
> > 1. Modified version of LocalStreamEnvironment that executes the job
> > asynchronously and polls for the result or
> >
> > 2. Source that emits a final watermark that causes the topology to
> > terminate after the watermark has traversed the topology. Is that
> possible
> > with Flink?
> >
> > But probably this is a rather common testing need that's already solved?!
> >
> > Thanks,
> > Thomas
>
> --
> Ken Krugler
> http://www.scaleunlimited.com
> custom big data solutions & training
> Hadoop, Cascading, Cassandra & Solr
>
>


Re: Terminating streaming test

2018-02-06 Thread Ken Krugler
Hi Thomas,

Normally the streaming job will terminate when the sources are exhausted and 
all records have been processed.

I assume you have some unbounded source(s), thus this doesn’t work for your 
case.

We’d run into a similar situation with a streaming job that has iterations.

Our solution was your option #1 below, where we created a modified version of 
LocalStreamEnvironment 

 that supports async execution.

— Ken


> On Feb 6, 2018, at 4:21 PM, Thomas Weise  wrote:
> 
> Hi,
> 
> I'm looking for an example of an integration test that runs a streaming job
> and terminates when the expected result becomes available. I could think of
> 2 approaches:
> 
> 1. Modified version of LocalStreamEnvironment that executes the job
> asynchronously and polls for the result or
> 
> 2. Source that emits a final watermark that causes the topology to
> terminate after the watermark has traversed the topology. Is that possible
> with Flink?
> 
> But probably this is a rather common testing need that's already solved?!
> 
> Thanks,
> Thomas

--
Ken Krugler
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr