Hello,

Are there plans to support checkpoints in batch mode? I currently load the state back via the DataStream API, but this keeps getting more complicated and doesn't always lead to a perfect state restore (not as complete as Flink itself could have produced). This is one of my most wanted Flink features these days.
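For what it's worth, the State Processor API is one way to pre-build that state today: a batch job writes out a savepoint, and the streaming job starts from it. Below is a rough sketch against the 1.14-era (DataSet-based) API; readCountsFromHive is a hypothetical stand-in for reading the batch output, and the uid and paths are placeholders that must match your streaming job.

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.runtime.state.hashmap.HashMapStateBackend;
import org.apache.flink.state.api.BootstrapTransformation;
import org.apache.flink.state.api.OperatorTransformation;
import org.apache.flink.state.api.Savepoint;
import org.apache.flink.state.api.functions.KeyedStateBootstrapFunction;

public class BootstrapSavepointJob {

    // Seeds each key's ValueState<Long> with the count the batch job produced.
    static class CountBootstrapper
            extends KeyedStateBootstrapFunction<String, Tuple2<String, Long>> {
        private transient ValueState<Long> count;

        @Override
        public void open(Configuration parameters) {
            count = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("count", Long.class));
        }

        @Override
        public void processElement(Tuple2<String, Long> value, Context ctx) throws Exception {
            count.update(value.f1); // the streaming job resumes from N, not 0
        }
    }

    // Hypothetical stand-in for reading the batch output; replace with a real source.
    static DataSet<Tuple2<String, Long>> readCountsFromHive(ExecutionEnvironment env) {
        return env.fromElements(Tuple2.of("some-key", 42L));
    }

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        BootstrapTransformation<Tuple2<String, Long>> bootstrap = OperatorTransformation
                .bootstrapWith(readCountsFromHive(env))
                .keyBy(new KeySelector<Tuple2<String, Long>, String>() {
                    @Override
                    public String getKey(Tuple2<String, Long> t) {
                        return t.f0;
                    }
                })
                .transform(new CountBootstrapper());

        Savepoint.create(new HashMapStateBackend(), 128)   // 128 = max parallelism
                // the uid must match the stateful operator in the streaming job
                .withOperator("count-operator", bootstrap)
                .write("hdfs:///savepoints/bootstrap");

        env.execute("write bootstrap savepoint");
    }
}

The streaming job is then started with the usual savepoint flag (flink run -s hdfs:///savepoints/bootstrap ...), and the operator whose uid matches "count-operator" picks up the seeded counts.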
Regards,
Jörn

On Thu, Dec 2, 2021 at 9:24 AM Yun Gao <yungao...@aliyun.com> wrote:

> Hi Vtygoss,
>
> Many thanks for sharing the scenarios!
>
> Checkpoints are currently not supported in batch mode, so no snapshot can
> be created once the job finishes. However, there are some alternative
> solutions:
>
> 1. Hybrid source [1] targets reading first from a bounded source and then
> switching to an unbounded one, which seems to fit this case. However, it
> might not support the Table / SQL API yet; that might be done in 1.15.
> (A sketch follows at the end of this thread.)
> 2. The batch job could first write its result to an intermediate table.
> The unbounded streaming job could then either load that table into state
> with the DataStream API on startup, or use a dimension join to continue
> processing new records. (A sketch of the first variant also follows
> below.)
>
> Best,
> Yun
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-150%3A+Introduce+Hybrid+Source
>
> ------------------ Original Mail ------------------
> Sender: vtygoss <vtyg...@126.com>
> Send Date: Wed Dec 1 17:52:17 2021
> Recipients: Alexander Preuß <alexanderpre...@ververica.com>
> CC: user@flink.apache.org <user@flink.apache.org>
> Subject: Re: how to run streaming process after batch process is completed?
>
>> Hi Alexander,
>>
>> This is my ideal data pipeline:
>>
>> 1. Sqoop transfers the bounded data from the database to Hive. Since
>> Flink's batch mode is more efficient than streaming for this, I want to
>> process the bounded data in batch mode and write the result to HiveTable2.
>>
>> 2. Other tools transfer the CDC / binlog stream to Kafka and write the
>> incremental, unbounded data to HiveTable1. I want to process this
>> unbounded data in streaming mode and update the incremental result in
>> HiveTable2.
>>
>> The problem is that the Flink streaming SQL application cannot be
>> restored from the batch application. E.g. for the SQL "insert into
>> table_2 select count(1) from table_1": in batch mode the result stored in
>> table_2 is N, and I expect the accumulator to start from N, not 0, when
>> the streaming process starts.
>>
>> Thanks for your reply.
>>
>> Best regards!
>>
>> On Nov 30, 2021 at 21:42, Alexander Preuß <alexanderpre...@ververica.com> wrote:
>>
>> Hi Vtygoss,
>>
>> Can you explain a bit more about your ideal pipeline? Is the batch data
>> bounded, or could you also process it in streaming execution mode? And is
>> the streaming data derived from the batch data, or do you just want to
>> ensure that the batch has finished before the processing of the streaming
>> data runs?
>>
>> Best regards,
>> Alexander
>>
>> (Sending again because I accidentally left out the user ML in the reply
>> on the first try.)
>>
>> On Tue, Nov 30, 2021 at 12:38 PM vtygoss <vtyg...@126.com> wrote:
>>
>>> Hi, community!
>>>
>>> With Flink, I want to unify batch and streaming processing in a data
>>> production pipeline: a batch job processes the inventory data, then a
>>> streaming job processes the incremental data. But I have hit a problem:
>>> the batch job leaves no state behind, so the result is wrong if I start
>>> the streaming process directly.
>>>
>>> So how can I run the streaming process accurately after the batch
>>> process has completed? Is there any doc or demo for this scenario?
>>>
>>> Thanks for any reply or suggestion!
>>>
>>> Best Regards!
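To make Yun's first suggestion concrete, here is a minimal DataStream sketch of a HybridSource (available since Flink 1.14) that drains the historical files before switching to Kafka. The file path, topic, and broker address are placeholders, and as Yun notes this is DataStream-only for now.

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.base.source.hybrid.HybridSource;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.connector.file.src.reader.TextLineFormat;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class HybridSourceJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Bounded part: drain the historical files first.
        FileSource<String> fileSource = FileSource
                .forRecordStreamFormat(new TextLineFormat(), new Path("hdfs:///warehouse/table_1"))
                .build();

        // Unbounded part: switch to the changelog topic once the files are exhausted.
        KafkaSource<String> kafkaSource = KafkaSource.<String>builder()
                .setBootstrapServers("kafka:9092")
                .setTopics("table_1_changelog")
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        HybridSource<String> hybrid = HybridSource.builder(fileSource)
                .addSource(kafkaSource)
                .build();

        env.fromSource(hybrid, WatermarkStrategy.noWatermarks(), "hybrid-source")
                .print();

        env.execute("hybrid source demo");
    }
}

The switch happens only after the bounded FileSource reports that it is finished, so no Kafka records are read before the historical data is drained.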
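And a rough sketch of the "load the table into state on startup" variant from Yun's second suggestion, assuming (key, count) pairs on both inputs; the two fromElements streams are placeholders for the bounded scan of the intermediate table and the unbounded CDC stream. Note there is no built-in guarantee that the bootstrap side is fully read before live records arrive for the same key, which is one reason this kind of restore stays imperfect.

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.KeyedCoProcessFunction;
import org.apache.flink.util.Collector;

public class StateBootstrapJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholders: in practice, a bounded read of the intermediate table
        // and the unbounded CDC topic.
        DataStream<Tuple2<String, Long>> bootstrapStream =
                env.fromElements(Tuple2.of("k", 42L));                      // batch result N = 42
        DataStream<Tuple2<String, Long>> updateStream =
                env.fromElements(Tuple2.of("k", 1L), Tuple2.of("k", 1L));   // incremental deltas

        bootstrapStream
                .connect(updateStream)
                .keyBy(t -> t.f0, t -> t.f0, Types.STRING)
                .process(new KeyedCoProcessFunction<String, Tuple2<String, Long>,
                        Tuple2<String, Long>, Tuple2<String, Long>>() {
                    private transient ValueState<Long> count;

                    @Override
                    public void open(Configuration parameters) {
                        count = getRuntimeContext().getState(
                                new ValueStateDescriptor<>("count", Long.class));
                    }

                    // Batch side: seed the state with the result N from the intermediate table.
                    @Override
                    public void processElement1(Tuple2<String, Long> seed, Context ctx,
                                                Collector<Tuple2<String, Long>> out) throws Exception {
                        count.update(seed.f1);
                    }

                    // Streaming side: keep accumulating on top of N.
                    @Override
                    public void processElement2(Tuple2<String, Long> delta, Context ctx,
                                                Collector<Tuple2<String, Long>> out) throws Exception {
                        Long current = count.value();
                        long next = (current == null ? 0L : current) + delta.f1;
                        count.update(next);
                        out.collect(Tuple2.of(delta.f0, next));
                    }
                })
                .print();

        env.execute("bootstrap then stream");
    }
}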