Hi, Andrey

Thanks for your response.

I have checked this Jira ticket, and I think the fix works in standalone mode, 
where the TaskManagers are already started before tasks are scheduled.
However, we are currently running Flink on YARN in per-job cluster mode.

I noticed that this issue has already been raised. I will keep watching this 
ticket. 

Thanks again.

Best
Weihua Hu

> On Jan 15, 2020, at 17:53, Andrey Zagrebin <azagre...@apache.org> wrote:
> 
> Hi HuWeihua,
> 
> I think your issue should be resolved in 1.9.2 and 1.10 (neither is released 
> yet, but both are in progress).
> You can check the related Jira ticket [1].
> 
> Best,
> Andrey
> 
> [1] https://jira.apache.org/jira/browse/FLINK-12122 
> 
> On Wed, Jan 15, 2020 at 10:08 AM HuWeihua <huweihua....@gmail.com> wrote:
> Hi all,
> 
> We encountered some problems while upgrading from Flink 1.5 to Flink 1.9, 
> because Flink's scheduling strategy has changed. Flink 1.9 prefers centralized 
> scheduling (packing tasks onto as few TaskManagers as possible), while Flink 1.5 
> prefers decentralized scheduling (spreading tasks across TaskManagers). This 
> change has caused a resource imbalance and blocked our upgrade plan, and we 
> have thousands of jobs that need to be upgraded.
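A toy model may help make "centralized" versus "decentralized" concrete. The sketch below is not Flink's scheduler code: `pack`, `spread`, and `summarize` are hypothetical helpers, and it assumes one task per slot (ignoring slot sharing) with all sources requested before the sinks.

```python
def pack(tasks, num_tms, slots_per_tm):
    """Centralized (hypothetical): fill TaskManager 0 completely, then TaskManager 1, and so on."""
    if len(tasks) > num_tms * slots_per_tm:
        raise ValueError("not enough slots")
    return {task: i // slots_per_tm for i, task in enumerate(tasks)}


def spread(tasks, num_tms, slots_per_tm):
    """Decentralized (hypothetical): put each task on the TaskManager with the fewest used slots."""
    used = [0] * num_tms
    placement = {}
    for task in tasks:
        # min() returns the first minimum, so ties go to the lowest TaskManager index.
        tm = min(range(num_tms), key=lambda t: used[t])
        if used[tm] >= slots_per_tm:
            raise ValueError("not enough slots")
        used[tm] += 1
        placement[task] = tm
    return placement


def summarize(placement, num_tms):
    """Return a (sources, sinks) count for each TaskManager."""
    counts = [[0, 0] for _ in range(num_tms)]
    for task, tm in placement.items():
        counts[tm][0 if task.startswith("source") else 1] += 1
    return [tuple(c) for c in counts]


# The example job: 10 sources and 100 sinks on 11 TaskManagers with 10 slots each.
tasks = [f"source-{i}" for i in range(10)] + [f"sink-{i}" for i in range(100)]
print(summarize(pack(tasks, 11, 10), 11))
print(summarize(spread(tasks, 11, 10), 11))
```

With this task order, `pack` puts all 10 sources on TaskManager 0 and 10 sinks on each of the other 10 TaskManagers, while `spread` gives TaskManagers 0 to 9 one source and 9 sinks each and leaves 10 sinks on the last one.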
> 
> For example, consider a job with 10 sources and 100 sinks. Each source needs 
> 1 core and each sink needs 0.1 core.
> We run this job on YARN with numberOfTaskSlots set to 10 and 
> yarn.containers.vcores set to 2.
> 
> When using Flink 1.5:
> Each TaskManager runs 1 source and 9 sinks, which need 1.9 cores in total, so 
> the job works well with this configuration. The scheduling result is shown in 
> Figure 1.
> When using Flink 1.9:
> The 10 sources are scheduled to one TaskManager, and the 100 sinks are 
> scheduled to the other 10 TaskManagers. The scheduling result is shown in 
> Figure 2.
> In this scenario, the TaskManager running the sources needs 10 cores, while 
> the other TaskManagers need 1 core each. Because all TaskManagers must be 
> configured identically, we need 11 TaskManagers with 10 cores each.
> This wastes (10 - 2) * 11 = 88 more cores than Flink 1.5.
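The core arithmetic can be checked with a short sketch. It counts demand in tenths of a core to avoid floating-point rounding, and it assumes (as the 11-TaskManager figure implies) that in the Flink 1.5 layout an eleventh TaskManager holds the remaining 10 sinks; `vcores_per_tm` is a hypothetical helper, not a Flink API.

```python
import math

# Per-task demand in tenths of a core (integers avoid results like 1.9000000000000001).
SOURCE = 10  # 1.0 core
SINK = 1     # 0.1 core


def vcores_per_tm(loads):
    """All TaskManagers share one configuration, so the busiest one sets yarn.containers.vcores."""
    return math.ceil(max(loads) / 10)


# Flink 1.5 (Figure 1): ten TaskManagers with 1 source + 9 sinks each;
# assumption: an eleventh TaskManager holds the last 10 sinks.
loads_15 = [SOURCE + 9 * SINK] * 10 + [10 * SINK]
# Flink 1.9 (Figure 2): one TaskManager with all 10 sources, ten with 10 sinks each.
loads_19 = [10 * SOURCE] + [10 * SINK] * 10

total_15 = len(loads_15) * vcores_per_tm(loads_15)  # 11 TaskManagers * 2 vcores
total_19 = len(loads_19) * vcores_per_tm(loads_19)  # 11 TaskManagers * 10 vcores
wasted = total_19 - total_15

print(total_15, total_19, wasted)  # 22 110 88
```

The busiest TaskManager drives the per-container vcore setting, which is why one overloaded TaskManager inflates the cost of all eleven.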
> 
> In addition to the wasted resources, we also encountered other problems 
> caused by the centralized scheduling strategy:
> - Network bandwidth. Tasks of the same type are scheduled onto the same 
>   TaskManager, which causes excessive network traffic on that machine.
> - Local sinks. Some jobs need to sink to a local agent. With centralized 
>   scheduling, the limited processing capacity of a single machine causes a 
>   consumption backlog.
> 
> In summary, we think a decentralized scheduling strategy is necessary.
> 
> 
> Figure 1. Flink 1.5 scheduling result
> <Pasted Graphic-3.tiff>
> 
> Figure 2. Flink 1.9 scheduling result
> <Pasted Graphic-4.tiff>
> 
> 
> 
> Best
> Weihua Hu
> 
