Hi Song, Guo,

Thanks for the information.
I will first upgrade our Flink cluster to 1.10.0 and try again.
Currently, we are running into a dependency conflict, possibly with
Tranquility, but that is a separate issue.

For your information (also as I described in the previous email):
*What Flink deployment are you using? (Standalone/K8s/Yarn/Mesos):* We are
running a standalone cluster with version 1.9.0.
*How many times have you tried with and without
`cluster.evenly-spread-out-slots`?* The behavior without the option is what
we observe all the time; this was the first time we tried setting it. Before
we changed the config, a large share of the subtasks of the source (11
subtasks) was allocated to a single task manager, and the remaining subtasks
of that source were spread unevenly across the rest of the task managers.
After changing the configuration, the subtasks of this source took all the
slots on 4 of our task managers, which was more "skewed" than before.
*How many TMs do you have? And how many slots does each TM have?* We have 15
task managers with 15 slots each.
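
For reference, this is roughly how the relevant part of our flink-conf.yaml
looks (a minimal sketch with only the settings mentioned above; everything
else is omitted):

    # standalone cluster, currently Flink 1.9.0 (upgrading to 1.10.0)
    taskmanager.numberOfTaskSlots: 15
    # spread slots across TMs instead of filling up one TM first;
    # per FLINK-12122 this only takes effect from 1.9.2 / 1.10.0
    cluster.evenly-spread-out-slots: true

If the option works as intended after the upgrade, I would expect the 60
source subtasks to land at roughly 60 / 15 = 4 per task manager, rather than
15 on each of 4 task managers as we observed.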

I will try to reproduce this tomorrow (JST) when I have time.

Best regards,
Mu

On Wed, Jul 8, 2020 at 11:01 AM Xintong Song <tonysong...@gmail.com> wrote:

> Hi Mu,
> Regarding your questions.
>
>    - The feature `spread out tasks evenly across task managers` is
>    introduced in Flink 1.10.0, and backported to Flink 1.9.2, per the JIRA
>    ticket [1]. That means if you configure this option in Flink 1.9.0, it
>    should have no effect.
>    - Please be aware that this feature ATM only works for standalone
>    deployment (including standalone Kubernetes deployment). For the native
>    Kubernetes, Yarn and Mesos deployment, it is a known issue that this
>    feature does not work as expected.
>    - Regarding the scheduling behavior changes, we would need more
>    information to explain this. The easiest way to provide it is probably
>    to share the jobmanager log files, if you're okay with that. If you
>    cannot share the logs, it would help to answer the following questions:
>       - What Flink deployment are you using? (Standalone/K8s/Yarn/Mesos)
>       - How many times have you tried with and without
>       `cluster.evenly-spread-out-slots`? In other words, can the described
>       behaviors before and after setting `cluster.evenly-spread-out-slots`
>       be stably reproduced?
>       - How many TMs do you have? And how many slots does each TM have?
>
>
> Thank you~
>
> Xintong Song
>
>
> [1] https://issues.apache.org/jira/browse/FLINK-12122
>
> On Tue, Jul 7, 2020 at 8:33 PM Mu Kong <kong.mu....@gmail.com> wrote:
>
>> Hi, Guo,
>>
>> Thanks for helping out.
>>
>> My application has a Kafka source with 60 subtasks (parallelism 60), and
>> we have 15 task managers with 15 slots each.
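>>
>> (To illustrate how the source parallelism is set -- this is a simplified
>> sketch, not our actual job code; topic name and properties are
>> placeholders:)
>>
>> Properties kafkaProps = new Properties();
>> kafkaProps.setProperty("bootstrap.servers", "...");  // placeholder
>>
>> StreamExecutionEnvironment env =
>>         StreamExecutionEnvironment.getExecutionEnvironment();
>> env.addSource(new FlinkKafkaConsumer<>("my-topic",
>>                 new SimpleStringSchema(), kafkaProps))
>>         .name("kafka source")
>>         .setParallelism(60);  // 60 Kafka partitions -> 60 source subtasks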
>>
>> *Before I applied cluster.evenly-spread-out-slots,* meaning it was set to
>> the default false, the operator "kafka source" had 11 subtasks allocated
>> to a single task manager, while the remaining 49 subtasks of "kafka
>> source" were distributed across the remaining 14 task managers.
>>
>> *After I set cluster.evenly-spread-out-slots to true*, the 60 subtasks
>> of "kafka source" were allocated to only 4 task managers, and they took 15
>> slots on each of these 4 TMs.
>>
>> What I expected was that this config would spread the subtasks of one
>> operator more evenly among the task managers, but instead it seems to have
>> packed them into as few task managers as possible.
>>
>> The version I'm deploying is 1.9.0.
>>
>> Best regards,
>> Mu
>>
>> On Tue, Jul 7, 2020 at 7:10 PM Yangze Guo <karma...@gmail.com> wrote:
>>
>>> Hi, Mu,
>>>
>>> IIUC, cluster.evenly-spread-out-slots should fulfill your demand. Why do
>>> you think it does the opposite of what you want? Do you run your job in
>>> active mode? If so, cluster.evenly-spread-out-slots might not work very
>>> well because there could be insufficient task managers when requesting
>>> slots from the ResourceManager. This has been discussed in
>>> https://issues.apache.org/jira/browse/FLINK-12122 .
>>>
>>>
>>> Best,
>>> Yangze Guo
>>>
>>> On Tue, Jul 7, 2020 at 5:44 PM Mu Kong <kong.mu....@gmail.com> wrote:
>>> >
>>> > Hi community,
>>> >
>>> > I'm running an application that consumes data from Kafka, processes it,
>>> > and then writes the data to Druid.
>>> > I wonder if there is a way to spread the source (consumer) subtasks
>>> > evenly across the task managers, to maximize the use of each task
>>> > manager's network bandwidth.
>>> >
>>> > So, for example, I have 15 task managers and I set the parallelism of
>>> > the Kafka source to 60, since I have 60 partitions in the Kafka topic.
>>> > What I want is for the Flink cluster to put 4 Kafka source subtasks on
>>> > each task manager.
>>> >
>>> > Is that possible? I have gone through the documentation, and the only
>>> > thing I found is
>>> >
>>> > cluster.evenly-spread-out-slots
>>> >
>>> > which does exactly the opposite of what I want. It seems to pack the
>>> > subtasks of the same operator onto as few task managers as possible.
>>> >
>>> > So, is some kind of manual resource allocation available?
>>> > Thanks in advance!
>>> >
>>> >
>>> > Best regards,
>>> > Mu
>>>
>>
