Re: Dataset and eager scheduling

Zhijiang(wangzhijiang999) Thu, 02 Mar 2017 23:41:40 -0800

Hi,
    From my understanding, if you do not mind wasting resources and are sure
there are enough resources in the cluster, you can set the EAGER schedule mode
for a batch job.
    From the optimizer's perspective, if the PIPELINED_FORCED hint is not set
for the ExecutionMode, the optimizer may choose the BATCH DataExchangeMode for
certain topologies to avoid the risk of deadlock. That means the producer tasks
are deployed first and write out their data. Only after the producer tasks
finish are the consumer tasks scheduled, and then they start consuming the
data. This is exactly the case for the LAZY_FROM_SOURCES schedule mode. If you
use EAGER mode instead in this case, the consumer tasks may sit idle after
startup until the producer tasks finish, which wastes resources. For the
PIPELINED DataExchangeMode, however, the EAGER schedule mode makes sense,
because the consumer task can request data as soon as the producer task
outputs its first record.
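To make the resource argument above concrete, here is a toy Java model of a
single producer -> consumer edge. It is only a sketch under simplifying
assumptions, not Flink code: the enums merely mirror Flink's names, the
method consumerIdleTime is hypothetical, and lazy scheduling is simplified
to "deploy the consumer exactly when its input becomes readable".

```java
// Toy model only: these enums mirror Flink's names but are NOT Flink classes.
final class ScheduleWasteSketch {

    enum ScheduleMode { LAZY_FROM_SOURCES, EAGER }

    enum ExchangeMode { PIPELINED, BATCH }

    /**
     * Time a consumer task holds a slot before it can read its first record.
     *
     * @param producerRuntime time the producer needs to finish
     * @param firstOutputTime time at which the producer emits its first record
     */
    static long consumerIdleTime(ScheduleMode schedule, ExchangeMode exchange,
                                 long producerRuntime, long firstOutputTime) {
        // BATCH: data becomes readable only after the producer has finished.
        // PIPELINED: data becomes readable once the first record is emitted.
        long dataAvailableAt =
                (exchange == ExchangeMode.BATCH) ? producerRuntime : firstOutputTime;
        // EAGER deploys every task at t = 0; the simplified lazy mode deploys
        // the consumer only when its input is available, so it never waits.
        long deployedAt = (schedule == ScheduleMode.EAGER) ? 0 : dataAvailableAt;
        return dataAvailableAt - deployedAt;
    }

    public static void main(String[] args) {
        long producerRuntime = 100; // hypothetical producer runtime
        long firstOutputTime = 1;   // hypothetical time to first record
        for (ScheduleMode s : ScheduleMode.values()) {
            for (ExchangeMode e : ExchangeMode.values()) {
                System.out.printf("%-17s + %-9s -> consumer idle for %d time units%n",
                        s, e, consumerIdleTime(s, e, producerRuntime, firstOutputTime));
            }
        }
    }
}
```

In this model, EAGER + BATCH keeps the consumer's slot idle for the whole
producer runtime, while EAGER + PIPELINED only waits for the first record,
which matches the reasoning above.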
    My understanding may not be entirely accurate, so any discussion is welcome!


Cheers,
zhijiang
------------------------------------------------------------------
From: CPC <acha...@gmail.com>
Sent: Thursday, March 2, 2017, 18:52
To: dev <dev@flink.apache.org>
Subject: Dataset and eager scheduling
Hi all,

Our team is currently trying to implement a runtime operator and is also
experimenting with the scheduler. We are trying to understand the batch
optimizer, but that will take some time. What we want to know is whether
changing the batch scheduling mode from LAZY_FROM_SOURCES to EAGER could affect
the optimizer. In other words, does the optimizer make any strong assumption
that the scheduling mode for batch jobs is always LAZY_FROM_SOURCES?

Thanks...
