Re: How can I cancel a Flink job safely without a special stop message in the stream?

2017-08-27 Thread Zor X.L.

Hi,


We use Kafka because:

- it is a high-throughput message queue

  - we want about 2 GB/s in/write and 7 GB/s out/read performance
    (at ~400 B/msg, that is roughly 5 million messages/s in and
    18 million messages/s out)

- it is popular (well, this is kind of important...)

- the input has a start and an end, but we want to process new data as
  soon as possible


Would NiFi be a possible alternative? Are there any other options?



*From:* Aljoscha Krettek 
*Subject:* How can I cancel a Flink job safely without a special stop 
message in the stream?

*Date:* Friday, Aug 25, 2017 19:45 GMT+0800
*To:* Nico Kruber 
*Cc:* user@flink.apache.org, Zor X.L. 






Re: How can I cancel a Flink job safely without a special stop message in the stream?

2017-08-25 Thread Aljoscha Krettek
Hi,

I don't think implementing a StoppableSource is an option here since you want 
to use the Flink Kafka source. What is your motivation for this? Especially, 
why are you using Kafka if the input is bounded and you will shut down the job 
at some point?

Also, regarding StoppableSource: this will not tell the source to read all 
remaining input and then stop. It will just ask the source to stop at some 
opportune time.

Best,
Aljoscha




Re: How can I cancel a Flink job safely without a special stop message in the stream?

2017-08-14 Thread Nico Kruber
Hi,
have you tried letting your source also implement the StoppableFunction 
interface as suggested by the SourceFunction javadoc?

If a source is stopped, e.g. after identifying some special signal from the 
outside, it will continue processing all remaining events and the Flink 
program will shut down gracefully.
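
For reference, a minimal sketch of such a stoppable source (a hypothetical
custom SourceFunction, not the bundled Kafka connector, which does not
implement the interface):

import org.apache.flink.api.common.functions.StoppableFunction;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

/**
 * Hypothetical custom source that also implements StoppableFunction, so
 * the job can be ended gracefully via `bin/flink stop <jobID>` instead
 * of being cancelled.
 */
public class StoppableCounterSource implements SourceFunction<Long>, StoppableFunction {

    private volatile boolean running = true;
    private long counter = 0;

    @Override
    public void run(SourceContext<Long> ctx) throws Exception {
        while (running) {
            // emit under the checkpoint lock to stay consistent with checkpointing
            synchronized (ctx.getCheckpointLock()) {
                ctx.collect(counter++);
            }
        }
        // returning from run() lets the pipeline drain and finish gracefully
    }

    @Override
    public void stop() {
        // called on `flink stop`: a request to stop at an opportune moment
        running = false;
    }

    @Override
    public void cancel() {
        // called on `flink cancel`: hard cancellation
        running = false;
    }
}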

Is that what you intend to do?


Nico





Re: How can I cancel a Flink job safely without a special stop message in the stream?

2017-08-14 Thread Zor X.L.

Bump...


On 2017/8/11 9:36, Zor X.L. wrote:


Hi,

What we want to do is cancel the Flink job after all upstream data 
have been processed.
We use Kafka as our input and output, and we also use the SQL 
capability of the Table API.


A possible solution is:

* embed a stop message at the tail of the upstream

* do what should be done in the Flink job

* propagate this stop message downstream untouched after all data
  are processed (see the sketch after this list)

* a downstream monitoring program can thus know when all subtasks have
  finished processing all upstream data

* then cancel the job
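
A minimal sketch of the propagation step (the Event POJO and its isStop
flag are illustrative, not our actual schema):

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.util.Collector;

// Event.java
/** Hypothetical record: a payload plus a stop-marker flag. */
public class Event {
    public String payload;
    public boolean isStop;
}

// ProcessOrForward.java
/**
 * One processing stage: normal events are transformed, while the stop
 * marker is forwarded untouched so a downstream monitor can observe it
 * on the output topic.
 */
public class ProcessOrForward implements FlatMapFunction<Event, Event> {
    @Override
    public void flatMap(Event e, Collector<Event> out) {
        if (e.isStop) {
            out.collect(e); // pass the sentinel through unchanged
            return;
        }
        Event result = new Event();
        result.payload = e.payload.toUpperCase(); // stand-in for the real work
        result.isStop = false;
        out.collect(result);
    }
}

Note that with parallelism > 1 a single sentinel only reaches one
parallel subtask, so in practice one sentinel per input partition (and
a monitor that counts them) would be needed.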

*What we want to do is cancel the job safely without utilizing this 
kind of stop message.*


*But I find this hard or inefficient to implement in Flink… is it 
possible?*


P.S. If not utilizing Flink, a possible solution is (a sketch follows
the list):

* the upstream program writes a stop signal somewhere after all data
  have been written to Kafka
   o the data carry a unique index for exactly-once semantics
   o the signal should include the last data's index for every partition

* when the job receives the upstream stop signal
   o if the last data of a partition has been processed, that partition
     is finished
   o once all partitions are finished, the job can be cancelled
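
A sketch of the external monitor for this variant (all names are
hypothetical; how per-partition progress is tracked is left open, e.g.
reading the output topic, and the cancel step uses the standard
`bin/flink cancel <jobID>` CLI):

import java.util.Map;

/**
 * Sketch: compare per-partition progress against the last indices
 * announced in the upstream stop signal; cancel the job once every
 * partition has caught up.
 */
public class StopMonitor {

    /** Last index per partition, taken from the upstream stop signal. */
    private final Map<Integer, Long> lastIndexPerPartition;

    public StopMonitor(Map<Integer, Long> lastIndexPerPartition) {
        this.lastIndexPerPartition = lastIndexPerPartition;
    }

    /** Hypothetical hook: highest index processed so far for a partition. */
    long processedIndex(int partition) {
        return -1; // e.g. derive from the job's output topic
    }

    boolean allPartitionsFinished() {
        return lastIndexPerPartition.entrySet().stream()
                .allMatch(e -> processedIndex(e.getKey()) >= e.getValue());
    }

    void cancelJob(String jobId) throws Exception {
        // cancel via the CLI; the JobManager REST API would also work
        new ProcessBuilder("bin/flink", "cancel", jobId)
                .inheritIO().start().waitFor();
    }
}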
