Hello,

Are there plans to support checkpoints in batch mode? I currently load the state back via the DataStream API, but this keeps getting more complicated and doesn't always lead to a perfect state restore (not as complete as Flink itself could have produced). This is one of my most wanted Flink features these days.
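For what it's worth, the State Processor API is one way to pre-build that state today: a batch job writes out a savepoint, and the streaming job starts from it. Below is a rough sketch against the 1.14-era (DataSet-based) API; readCountsFromHive is a hypothetical stand-in for reading the batch output, and the uid and paths are placeholders that must match your streaming job.

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.runtime.state.hashmap.HashMapStateBackend;
import org.apache.flink.state.api.BootstrapTransformation;
import org.apache.flink.state.api.OperatorTransformation;
import org.apache.flink.state.api.Savepoint;
import org.apache.flink.state.api.functions.KeyedStateBootstrapFunction;

public class BootstrapSavepointJob {

    // Seeds each key's ValueState<Long> with the count the batch job produced.
    static class CountBootstrapper
            extends KeyedStateBootstrapFunction<String, Tuple2<String, Long>> {
        private transient ValueState<Long> count;

        @Override
        public void open(Configuration parameters) {
            count = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("count", Long.class));
        }

        @Override
        public void processElement(Tuple2<String, Long> value, Context ctx) throws Exception {
            count.update(value.f1); // the streaming job resumes from N, not 0
        }
    }

    // Hypothetical stand-in for reading the batch output; replace with a real source.
    static DataSet<Tuple2<String, Long>> readCountsFromHive(ExecutionEnvironment env) {
        return env.fromElements(Tuple2.of("some-key", 42L));
    }

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        BootstrapTransformation<Tuple2<String, Long>> bootstrap = OperatorTransformation
                .bootstrapWith(readCountsFromHive(env))
                .keyBy(new KeySelector<Tuple2<String, Long>, String>() {
                    @Override
                    public String getKey(Tuple2<String, Long> t) {
                        return t.f0;
                    }
                })
                .transform(new CountBootstrapper());

        Savepoint.create(new HashMapStateBackend(), 128)   // 128 = max parallelism
                // the uid must match the stateful operator in the streaming job
                .withOperator("count-operator", bootstrap)
                .write("hdfs:///savepoints/bootstrap");

        env.execute("write bootstrap savepoint");
    }
}

The streaming job is then started with the usual savepoint flag (flink run -s hdfs:///savepoints/bootstrap ...), and the operator whose uid matches "count-operator" picks up the seeded counts.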
Regards,
Jörn

On Thu, Dec 2, 2021 at 9:24 AM Yun Gao <yungao...@aliyun.com> wrote:

> Hi Vtygoss,
>
> Many thanks for sharing the scenarios!
>
> Checkpoints are currently not supported in batch mode, so no snapshot can
> be created once the job finishes. However, there are some alternative
> solutions:
>
> 1. Hybrid source [1] targets reading first from a bounded source and then
> switching to an unbounded one, which seems to fit this case. However, it
> might not support the Table / SQL API yet; that might be done in 1.15.
> (A sketch follows at the end of this thread.)
> 2. The batch job could first write its result to an intermediate table.
> The unbounded streaming job could then either load that table into state
> with the DataStream API on startup, or use a dimension join to continue
> processing new records. (A sketch of the first variant also follows
> below.)
>
> Best,
> Yun
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-150%3A+Introduce+Hybrid+Source
>
> ------------------ Original Mail ------------------
> Sender: vtygoss <vtyg...@126.com>
> Send Date: Wed Dec 1 17:52:17 2021
> Recipients: Alexander Preuß <alexanderpre...@ververica.com>
> CC: user@flink.apache.org <user@flink.apache.org>
> Subject: Re: how to run streaming process after batch process is completed?
>
>> Hi Alexander,
>>
>> This is my ideal data pipeline:
>>
>> 1. Sqoop transfers the bounded data from the database to Hive. Since
>> Flink's batch mode is more efficient than streaming for this, I want to
>> process the bounded data in batch mode and write the result to HiveTable2.
>>
>> 2. Other tools transfer the CDC / binlog stream to Kafka and write the
>> incremental, unbounded data to HiveTable1. I want to process this
>> unbounded data in streaming mode and update the incremental result in
>> HiveTable2.
>>
>> The problem is that the Flink streaming SQL application cannot be
>> restored from the batch application. E.g. for the SQL "insert into
>> table_2 select count(1) from table_1": in batch mode the result stored in
>> table_2 is N, and I expect the accumulator to start from N, not 0, when
>> the streaming process starts.
>>
>> Thanks for your reply.
>>
>> Best regards!
>>
>> On Nov 30, 2021 at 21:42, Alexander Preuß <alexanderpre...@ververica.com> wrote:
>>
>> Hi Vtygoss,
>>
>> Can you explain a bit more about your ideal pipeline? Is the batch data
>> bounded, or could you also process it in streaming execution mode? And is
>> the streaming data derived from the batch data, or do you just want to
>> ensure that the batch has finished before the processing of the streaming
>> data runs?
>>
>> Best regards,
>> Alexander
>>
>> (Sending again because I accidentally left out the user ML in the reply
>> on the first try.)
>>
>> On Tue, Nov 30, 2021 at 12:38 PM vtygoss <vtyg...@126.com> wrote:
>>
>>> Hi, community!
>>>
>>> With Flink, I want to unify batch and streaming processing in a data
>>> production pipeline: a batch job processes the inventory data, then a
>>> streaming job processes the incremental data. But I have hit a problem:
>>> the batch job leaves no state behind, so the result is wrong if I start
>>> the streaming process directly.
>>>
>>> So how can I run the streaming process accurately after the batch
>>> process has completed? Is there any doc or demo for this scenario?
>>>
>>> Thanks for any reply or suggestion!
>>>
>>> Best Regards!
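To make Yun's first suggestion concrete, here is a minimal DataStream sketch of a HybridSource (available since Flink 1.14) that drains the historical files before switching to Kafka. The file path, topic, and broker address are placeholders, and as Yun notes this is DataStream-only for now.

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.base.source.hybrid.HybridSource;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.connector.file.src.reader.TextLineFormat;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class HybridSourceJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Bounded part: drain the historical files first.
        FileSource<String> fileSource = FileSource
                .forRecordStreamFormat(new TextLineFormat(), new Path("hdfs:///warehouse/table_1"))
                .build();

        // Unbounded part: switch to the changelog topic once the files are exhausted.
        KafkaSource<String> kafkaSource = KafkaSource.<String>builder()
                .setBootstrapServers("kafka:9092")
                .setTopics("table_1_changelog")
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        HybridSource<String> hybrid = HybridSource.builder(fileSource)
                .addSource(kafkaSource)
                .build();

        env.fromSource(hybrid, WatermarkStrategy.noWatermarks(), "hybrid-source")
                .print();

        env.execute("hybrid source demo");
    }
}

The switch happens only after the bounded FileSource reports that it is finished, so no Kafka records are read before the historical data is drained.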
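And a rough sketch of the "load the table into state on startup" variant from Yun's second suggestion, assuming (key, count) pairs on both inputs; the two fromElements streams are placeholders for the bounded scan of the intermediate table and the unbounded CDC stream. Note there is no built-in guarantee that the bootstrap side is fully read before live records arrive for the same key, which is one reason this kind of restore stays imperfect.

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.KeyedCoProcessFunction;
import org.apache.flink.util.Collector;

public class StateBootstrapJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholders: in practice, a bounded read of the intermediate table
        // and the unbounded CDC topic.
        DataStream<Tuple2<String, Long>> bootstrapStream =
                env.fromElements(Tuple2.of("k", 42L));                      // batch result N = 42
        DataStream<Tuple2<String, Long>> updateStream =
                env.fromElements(Tuple2.of("k", 1L), Tuple2.of("k", 1L));   // incremental deltas

        bootstrapStream
                .connect(updateStream)
                .keyBy(t -> t.f0, t -> t.f0, Types.STRING)
                .process(new KeyedCoProcessFunction<String, Tuple2<String, Long>,
                        Tuple2<String, Long>, Tuple2<String, Long>>() {
                    private transient ValueState<Long> count;

                    @Override
                    public void open(Configuration parameters) {
                        count = getRuntimeContext().getState(
                                new ValueStateDescriptor<>("count", Long.class));
                    }

                    // Batch side: seed the state with the result N from the intermediate table.
                    @Override
                    public void processElement1(Tuple2<String, Long> seed, Context ctx,
                                                Collector<Tuple2<String, Long>> out) throws Exception {
                        count.update(seed.f1);
                    }

                    // Streaming side: keep accumulating on top of N.
                    @Override
                    public void processElement2(Tuple2<String, Long> delta, Context ctx,
                                                Collector<Tuple2<String, Long>> out) throws Exception {
                        Long current = count.value();
                        long next = (current == null ? 0L : current) + delta.f1;
                        count.update(next);
                        out.collect(Tuple2.of(delta.f0, next));
                    }
                })
                .print();

        env.execute("bootstrap then stream");
    }
}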