Hi Jason,
Yes, I write the files inside the mapPartition function. Note that a single
partition can contain multiple key groups, so you have to maintain your own
map from key group to writer.
The Flink DAG ends with a DiscardingSink, after the mapPartition.
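To illustrate the bookkeeping described above, here is a minimal, hypothetical sketch of the per-key-group writer map you'd maintain inside mapPartition. It deliberately omits the Flink API (no `MapPartitionFunction` signature, no sink) and just shows the core pattern: lazily open one writer per key group seen in the partition, write each record to its group's writer, and close everything when the partition's iterator is exhausted. The record shape (`Map.Entry<String, String>` of key group and payload) and the `createWriter` factory are assumptions for the example.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class KeyGroupWriters {

    // Hypothetical stand-in for the body of mapPartition: route each record
    // to a writer for its key group, creating writers lazily as new groups
    // appear, then flush/close all of them when the partition ends.
    public static Map<String, Writer> writeByKeyGroup(
            Iterable<Map.Entry<String, String>> records,
            Function<String, Writer> createWriter) throws IOException {

        Map<String, Writer> writers = new HashMap<>();
        for (Map.Entry<String, String> record : records) {
            // Lazily open a writer the first time we see this key group.
            Writer w = writers.computeIfAbsent(record.getKey(), createWriter);
            w.write(record.getValue());
            w.write('\n');
        }
        // The partition is done: close every writer we opened.
        for (Writer w : writers.values()) {
            w.close();
        }
        return writers;
    }
}
```

In the real job, `createWriter` would open the actual output file for that key group, and the function would emit nothing downstream (hence the DiscardingSink).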
And no, we didn’t noti
FWIW I had to do something similar in the past. My solution was to…
1. Create a custom reader that added the source directory to the input data (so
I had a Tuple2
2. Create a job that reads from all source directories, using HadoopInputFormat
for text
3. Constrain the parallelism of this initial
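As a rough illustration of steps 1 and 2, here is a hypothetical stdlib-only sketch of what the custom reader produces: every line of every file under a source directory, tagged with that directory, i.e. the (sourceDir, line) pairs the Tuple2 records would carry. The real version would wrap HadoopInputFormat inside the Flink job rather than read files directly; the class and method names here are made up for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TaggedSource {

    // Hypothetical sketch: read all regular files in each source directory
    // and emit (sourceDir, line) pairs, mimicking a reader that adds the
    // source directory to the input data.
    public static List<Map.Entry<String, String>> readTagged(List<Path> sourceDirs)
            throws IOException {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (Path dir : sourceDirs) {
            try (Stream<Path> files = Files.list(dir)) {
                List<Path> regular = files.filter(Files::isRegularFile)
                                          .sorted()
                                          .collect(Collectors.toList());
                for (Path file : regular) {
                    for (String line : Files.readAllLines(file)) {
                        // Tag each line with the directory it came from.
                        out.add(Map.entry(dir.toString(), line));
                    }
                }
            }
        }
        return out;
    }
}
```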