My naive solution can't work because a dump can be quite long.
So, yes, I have to find a way to stop consumption from the topic used for
streaming mode while a dump is in progress :(
Terry, I've tried to implement something based on your reply and on this
thread:
https://stackoverflow.com/questions/59201503/flink-kafka-gracefully-close-flink-consuming-messages-from-kafka-source-after-a
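For context, the approach in that Stack Overflow thread boils down to publishing a control record and having the consumer treat it as end-of-stream (in the older Flink Kafka connector this maps to `KafkaDeserializationSchema#isEndOfStream`). A minimal, Flink-free Python sketch of the stopping logic (the marker value and function name are illustrative):

```python
# Sketch: consume records until an end-of-dump control marker arrives.
# In Flink this corresponds to returning True from isEndOfStream for
# the marker record, which finishes the source gracefully.

END_MARKER = "__DUMP_DONE__"  # illustrative control record

def consume_until_marker(records):
    """Yield records until the end-of-dump marker is seen, then stop."""
    for rec in records:
        if rec == END_MARKER:
            break  # the source finishes; the streaming job can take over
        yield rec

consumed = list(consume_until_marker(["a", "b", END_MARKER, "c"]))
# → ["a", "b"]; everything after the marker is never consumed
```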
 
Any suggestions are welcome.
Thx.

David

On 2020/01/06 09:35:37, David Morin <morin.david....@gmail.com> wrote: 
> Hi,
> 
> Thanks for your replies.
> Yes Terry, you are right: I can try to create a custom source.
> But perhaps, given my use case, I can use a technical field in my data
> instead. It is a timestamp, and I think I just have to ignore late events,
> either with watermarks or later in the pipeline, based on metadata stored
> in Flink state. I'm testing it now...
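The idea of ignoring late events via a timestamp field can be sketched in plain Python (the dict stands in for Flink keyed state; all names are illustrative):

```python
# Sketch: drop events whose timestamp is older than the last-seen
# timestamp for the same key, mimicking a check against keyed state.

def drop_late_events(events):
    """events: iterable of (key, ts, payload); keep only non-late ones."""
    last_ts = {}  # stand-in for Flink keyed ValueState
    kept = []
    for key, ts, payload in events:
        if ts >= last_ts.get(key, float("-inf")):
            last_ts[key] = ts
            kept.append((key, ts, payload))
        # else: late event (older than state), ignore it
    return kept

kept = drop_late_events([("k1", 1, "dump"), ("k1", 3, "dump"), ("k1", 2, "cdc")])
# → the ("k1", 2, "cdc") event is dropped as late
```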
> Thx
> 
> David
> 
> On 2020/01/03 15:44:08, Chesnay Schepler <ches...@apache.org> wrote: 
> > Are you asking how to detect from within the job whether the dump is 
> > complete, or how to combine these 2 jobs?
> > 
> > If you had a way to detect whether the dump is complete, then I would
> > suggest creating a custom source that wraps 2 Kafka sources and switches
> > between them at will based on your conditions.
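That wrapping idea can be sketched without Flink as a generator that drains the dump source until a completion condition holds, then hands over to the CDC source (names are illustrative; a real version would implement a custom Flink source delegating to two Kafka sources):

```python
# Sketch: a wrapper "source" that reads the dump records first and
# switches to the CDC records once the dump reports completion.

def wrapped_source(dump_records, cdc_records, dump_done):
    """Emit dump records until dump_done(rec) is true, then CDC records."""
    for rec in dump_records:
        if dump_done(rec):
            break  # switch point: dump is complete
        yield rec
    yield from cdc_records  # resume normal streaming consumption

out = list(wrapped_source(["d1", "d2", "EOF"], ["c1", "c2"],
                          lambda rec: rec == "EOF"))
# → ["d1", "d2", "c1", "c2"]
```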
> > 
> > 
> > On 03/01/2020 03:53, Terry Wang wrote:
> > > Hi,
> > >
> > > I’d like to share my opinion here. It seems that you need the two
> > > Kafka consumers to communicate with each other: when you begin the dump
> > > process, you notify the CDC-topic consumer to wait idle.
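This suggestion amounts to a shared signal that the dump process raises and the CDC consumer checks before polling. A small illustrative Python sketch (in a real deployment the flag would live in an external store rather than an in-process `threading.Event`):

```python
# Sketch: the dump process sets a flag; the CDC consumer idles
# (emits nothing) while the flag is set.
import threading

dump_in_progress = threading.Event()  # stand-in for an external signal

def cdc_poll(records):
    """Skip polling while a dump is running; otherwise pass records through."""
    if dump_in_progress.is_set():
        return []  # wait idle: consume nothing during the dump
    return list(records)

dump_in_progress.set()      # dump starts
during = cdc_poll(["c1"])   # → []
dump_in_progress.clear()    # dump finished
after = cdc_poll(["c1"])    # → ["c1"]
```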
> > >
> > >
> > > Best,
> > > Terry Wang
> > >
> > >
> > >
> > >> On 2 Jan 2020, at 16:49, David Morin <morin.david....@gmail.com> wrote:
> > >>
> > >> Hi,
> > >>
> > >> Is there a way to temporarily stop consuming one Kafka source in
> > >> streaming mode?
> > >> Use case: I have to consume 2 topics, but one of them has higher
> > >> priority.
> > >> One of these topics is dedicated to ingesting data from a database
> > >> (change data capture) and the other to synchronization (a dump, i.e. a
> > >> SELECT ... from the database). At the moment the dump is performed by
> > >> one Flink job, and we start it after manually stopping the previous
> > >> one (CDC).
> > >> I want to merge these 2 modes and automatically stop consumption of
> > >> the topic dedicated to the CDC mode while a dump is running.
> > >> How can I handle that with Flink in a streaming way? Backpressure? ...
> > >> Thx in advance for your insights
> > >>
> > >> David
> > >
> > 
> > 
> 
