Hi,

This one: https://issues.apache.org/jira/browse/FLINK-2491
1. What if you set `org.apache.flink.streaming.api.functions.source.FileProcessingMode#PROCESS_CONTINUOUSLY`? This will prevent the split source from finishing, so checkpointing should work fine. The downside is that you would have to determine on your own, manually, whether the job has finished/completed or not.

Other things that come to my mind would require some coding:

2. Look at `org.apache.flink.streaming.api.environment.StreamExecutionEnvironment#createFileInput`, copy its code, and replace `ContinuousFileMonitoringFunction` with something that finishes on some custom event/action/condition. The code that you would have to modify/replace is alongside the usages of `FileProcessingMode monitoringMode`.

3. Probably even more complicated: you could modify `ContinuousFileReaderOperator` to be a source function with a statically precomputed list of files/splits to process (the splits would have to be assigned/distributed taking parallelism into account). Your source functions would then complete not when the splits are generated, but when they have finished reading them.

Piotrek

> On 14 May 2018, at 20:29, Tao Xia <t...@udacity.com> wrote:
>
> Thanks for the reply Piotr. Which Jira ticket were you referring to?
> We were trying to use the same code as our normal stream processing to process very old historical backfill data.
> The problem for me right now is that backfilling x years of data will be very slow, and I cannot have any checkpoint during the whole time since the FileSource is "Finished". When anything goes wrong in the middle, the whole pipeline starts over from the beginning again.
> Is there any way I can skip the checkpoint of "Source: Custom File Source" but still have checkpoints on "Split Reader: Custom File Source"?
> Thanks,
> Tao
>
> On Fri, May 11, 2018 at 4:34 AM, Piotr Nowojski <pi...@data-artisans.com> wrote:
> Hi,
>
> It's not considered a bug, only a missing, not-yet-implemented feature (check my previous responses for the Jira ticket). Generally speaking, using file input streams for DataStream programs is not very popular, so this has so far been low on our priority list.
>
> Piotrek
>
> > On 10 May 2018, at 06:26, xiatao123 <t...@udacity.com> wrote:
> >
> > I ran into a similar issue.
> >
> > Since it is a "Custom File Source", the first source just lists folder/file paths for all existing files. The next operator, "Split Reader", reads the content of the files.
> > The "Custom File Source" went to the "finished" state after the first couple of seconds. That's why we got the error message "Custom File Source (1/1) is not being executed at the moment. Aborting checkpoint" — the "Custom File Source" had already finished.
> >
> > Is this by design? Although the "Custom File Source" finishes in seconds, the rest of the pipeline can run for hours or days. Whenever anything goes wrong, the pipeline restarts and starts reading from the beginning again, since there isn't any checkpoint.
> >
> > --
> > Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
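PS: A minimal sketch of option 1, for reference. The input path, `TextInputFormat`, checkpoint interval, and 10-second scan interval are all illustrative assumptions — adjust them for your job. With `PROCESS_CONTINUOUSLY`, the monitoring source stays in the RUNNING state, so checkpoints are no longer aborted:

```java
import org.apache.flink.api.java.io.TextInputFormat;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.FileProcessingMode;

public class ContinuousBackfillJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Checkpointing keeps working because the file source never finishes.
        env.enableCheckpointing(60_000L);

        String inputPath = "hdfs:///backfill/input"; // hypothetical path
        TextInputFormat format = new TextInputFormat(new Path(inputPath));

        // PROCESS_CONTINUOUSLY re-scans the directory (here every 10s), so the
        // monitoring source stays RUNNING instead of switching to FINISHED.
        // Caveat: a modified file is re-processed in its entirety in this mode.
        DataStream<String> lines = env.readFile(
                format, inputPath, FileProcessingMode.PROCESS_CONTINUOUSLY, 10_000L);

        lines.print();
        env.execute("continuous file backfill");
    }
}
```

As noted above, the trade-off is that the job never reaches FINISHED on its own, so you must decide externally when the backfill is done and cancel it (ideally with a savepoint).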