Hi Mike,

This could be caused by how the data is partitioned.

If possible, could you share your pipeline code at a high level?
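
In the meantime, here is my rough guess at what the write path looks like,
pieced together from the description further down in the thread. To be clear,
this is only a sketch, not your code: the topic, brokers, output path, String
payloads, allowed lateness, and discarding-panes settings are all assumptions
on my part; only the daily windows, the 40,000-record early firing, and
withNumShards(1) come from your mail.

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.transforms.Values;
import org.apache.beam.sdk.transforms.windowing.AfterPane;
import org.apache.beam.sdk.transforms.windowing.AfterWatermark;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.joda.time.Duration;

public class DailyFilesSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create();

    p.apply(KafkaIO.<String, String>read()
            .withBootstrapServers("kafka:9092")               // placeholder
            .withTopic("events")                              // placeholder
            .withKeyDeserializer(StringDeserializer.class)
            .withValueDeserializer(StringDeserializer.class)
            .withoutMetadata())
        .apply(Values.<String>create())
        // Daily windows, firing early every 40,000 elements so that no single
        // pane (and hence file) grows much past ~40,000 records.
        .apply(Window.<String>into(FixedWindows.of(Duration.standardDays(1)))
            .triggering(AfterWatermark.pastEndOfWindow()
                .withEarlyFirings(AfterPane.elementCountAtLeast(40_000)))
            .withAllowedLateness(Duration.ZERO)                // assumption
            .discardingFiredPanes())                           // assumption
        // withNumShards(1): one shard per window pane, i.e. roughly one file
        // per day plus one file per early firing.
        .apply(FileIO.<String>write()
            .via(TextIO.sink())
            .to("/output/daily")                               // placeholder
            .withNumShards(1));

    p.run().waitUntilFinish();
  }
}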

On Mon, Jun 10, 2019 at 12:58 PM Mike Kaplinskiy <[email protected]>
wrote:

>
> On Mon, Jun 10, 2019 at 6:51 AM Maximilian Michels <[email protected]> wrote:
>
>> Hi Mike,
>>
>> If you set the number of shards to 1, you should get one shard per
>> window; unless you have "ignore windows" set to true.
>>
>
> Right, that makes sense and is what I expected. The thing I find a
> little puzzling is that a single Flink sub-task receives all of the data -
> I'd expect the actual work to be spread across sub-tasks, since each
> window is independent.
>
>
>>
>> > (The way I'm checking this is via the Flink UI)
>>
>> I'm curious, how do you check this via the Flink UI?
>>
>
> Attached a screenshot
>
>
>>
>> Cheers,
>> Max
>>
>> On 09.06.19 22:33, Mike Kaplinskiy wrote:
>> > Hi everyone,
>> >
>> > I’m using a Kafka source with a lot of watermark skew (i.e. new
>> > partitions were added to the topic over time). The sink is a
>> > FileIO.Write().withNumShards(1) to get ~ 1 file per day & an early
>> > trigger to write at most 40,000 records per file. Unfortunately it looks
>> > like there's only 1 shard trying to write files for all the various days
>> > instead of writing multiple days' files in parallel. (The way I'm
>> > checking this is via the Flink UI). Is there anything I could do here to
>> > parallelize the process? All of this is with the Flink runner.
>> >
>> > Mike.
>> >
>> >
>>
>>
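
To make sure we are looking at the same write, this is the shape I have in
mind (again only a sketch; the element type, sink, and output directory are
placeholders). As Max says above, with a windowed input the shard count
applies per window, so withNumShards(1) still means one shard for each daily
window:

import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.values.PCollection;

class DailyWriteSketch {
  // Sketch only: with windowed input, withNumShards(1) yields one shard
  // (one writer, one file) per window pane - roughly one file per day here.
  static void writeDaily(PCollection<String> dailyWindowedLines, String outputDir) {
    dailyWindowedLines.apply(FileIO.<String>write()
        .via(TextIO.sink())
        .to(outputDir)            // placeholder
        .withNumShards(1));
  }
}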
