Hi Jason,
Yes, I write the files inside the mapPartition function. Note that a single
partition can contain multiple key groups, so you have to maintain your own
map from key group to writer.
The Flink DAG ends with a DiscardingSink, after the mapPartition.
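To illustrate the bookkeeping described above, here is a minimal, hypothetical sketch of the per-key-group writer map you'd maintain inside mapPartition. It deliberately omits the Flink API (no `MapPartitionFunction` signature, no sink) and just shows the core pattern: lazily open one writer per key group seen in the partition, write each record to its group's writer, and close everything when the partition's iterator is exhausted. The record shape (`Map.Entry<String, String>` of key group and payload) and the `createWriter` factory are assumptions for the example.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class KeyGroupWriters {

    // Hypothetical stand-in for the body of mapPartition: route each record
    // to a writer for its key group, creating writers lazily as new groups
    // appear, then flush/close all of them when the partition ends.
    public static Map<String, Writer> writeByKeyGroup(
            Iterable<Map.Entry<String, String>> records,
            Function<String, Writer> createWriter) throws IOException {

        Map<String, Writer> writers = new HashMap<>();
        for (Map.Entry<String, String> record : records) {
            // Lazily open a writer the first time we see this key group.
            Writer w = writers.computeIfAbsent(record.getKey(), createWriter);
            w.write(record.getValue());
            w.write('\n');
        }
        // The partition is done: close every writer we opened.
        for (Writer w : writers.values()) {
            w.close();
        }
        return writers;
    }
}
```

In the real job, `createWriter` would open the actual output file for that key group, and the function would emit nothing downstream (hence the DiscardingSink).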
And no, we didn’t noti
FWIW I had to do something similar in the past. My solution was to…
1. Create a custom reader that added the source directory to the input data (so
I had a Tuple2
2. Create a job that reads from all source directories, using HadoopInputFormat
for text
3. Constrain the parallelism of this initial
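As a rough illustration of steps 1 and 2, here is a hypothetical stdlib-only sketch of what the custom reader produces: every line of every file under a source directory, tagged with that directory, i.e. the (sourceDir, line) pairs the Tuple2 records would carry. The real version would wrap HadoopInputFormat inside the Flink job rather than read files directly; the class and method names here are made up for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TaggedSource {

    // Hypothetical sketch: read all regular files in each source directory
    // and emit (sourceDir, line) pairs, mimicking a reader that adds the
    // source directory to the input data.
    public static List<Map.Entry<String, String>> readTagged(List<Path> sourceDirs)
            throws IOException {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (Path dir : sourceDirs) {
            try (Stream<Path> files = Files.list(dir)) {
                List<Path> regular = files.filter(Files::isRegularFile)
                                          .sorted()
                                          .collect(Collectors.toList());
                for (Path file : regular) {
                    for (String line : Files.readAllLines(file)) {
                        // Tag each line with the directory it came from.
                        out.add(Map.entry(dir.toString(), line));
                    }
                }
            }
        }
        return out;
    }
}
```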