FileSource Usage

Meghajit Mazumdar Wed, 19 Jan 2022 02:06:02 -0800

Hello,

We are using FileSource
<https://nightlies.apache.org/flink/flink-docs-release-1.14/api/java/> to
process Parquet files and have a few questions about it. We would really
appreciate it if somebody could help answer them:


1. For a given file, does FileSource read its contents in order? In other
words, in what order are the file splits generated from the contents of
the file?

2. We want to provide a GCS bucket URL to the FileSource so that it can
read Parquet files from there. The bucket contains multiple Parquet files.
In what order will the files be picked up and processed by this
FileSource? Can we provide an ordering strategy ourselves, for example,
processing files by creation time?

3. Is it possible, and is it good practice, to apply checkpointing and
watermarking to a bounded source like FileSource?

-- 
*Regards,*
*Meghajit*
