Hi,

From my understanding, if you do not care about wasted resources and you are sure the cluster has enough resources, you can set the EAGER schedule mode for a batch job.

From the optimizer's side, if the PIPELINED_FORCED hint is not set for ExecutionMode, the optimizer may choose the BATCH DataExchangeMode for some special topologies to avoid the risk of deadlock. That means the producer tasks are deployed first and write their output. Only after the producer tasks finish are the consumer tasks scheduled, and then they start to consume the data. This is exactly how the LAZY_FROM_SOURCES schedule mode behaves. If you replace it with EAGER mode in this case, the consumer tasks may do nothing after startup until the producer tasks finish, so resources are wasted.

For the PIPELINED DataExchangeMode, however, the EAGER schedule mode makes sense, because a consumer task can request data as soon as the producer task outputs its first data.

My understanding may not be fully accurate, so any discussion is welcome!
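To make the trade-off concrete, here is a toy Python model of how long a consumer task holds a slot without any data to read. It is only a sketch of the reasoning above, not Flink code: the function name, its parameters, and the timing model are my own assumptions, and the strings merely mirror the mode names used in this thread.

```python
def consumer_idle_time(schedule_mode, exchange_mode,
                       producer_runtime, first_output_delay):
    """Time a consumer task occupies a slot before it can read any data.

    Hypothetical toy model: one producer, one consumer, and the job
    starts at time 0. Not derived from Flink's scheduler code.
    """
    # When does the producer's result become consumable?
    if exchange_mode == "BATCH":
        # Blocking exchange: readable only after the producer finishes.
        data_ready = producer_runtime
    elif exchange_mode == "PIPELINED":
        # Pipelined exchange: readable once the first data is emitted.
        data_ready = first_output_delay
    else:
        raise ValueError(f"unknown exchange mode: {exchange_mode}")

    # When is the consumer deployed?
    if schedule_mode == "EAGER":
        # All tasks are deployed together at job start.
        deployed_at = 0
    elif schedule_mode == "LAZY_FROM_SOURCES":
        # The consumer is deployed once its input is consumable.
        deployed_at = data_ready
    else:
        raise ValueError(f"unknown schedule mode: {schedule_mode}")

    return data_ready - deployed_at


if __name__ == "__main__":
    # Assumed numbers: producer runs 100 time units, first data after 1.
    for schedule in ("EAGER", "LAZY_FROM_SOURCES"):
        for exchange in ("BATCH", "PIPELINED"):
            idle = consumer_idle_time(schedule, exchange, 100, 1)
            print(schedule, exchange, "consumer idle time:", idle)
```

In this model the only expensive combination is EAGER with BATCH exchange, where the consumer sits idle for the whole producer runtime. With PIPELINED exchange the idle time under EAGER is just the delay until the first data arrives.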
Cheers,
zhijiang

------------------------------------------------------------------
From: CPC <acha...@gmail.com>
Sent: Thursday, 2 March 2017, 18:52
To: dev <dev@flink.apache.org>
Subject: Dataset and eager scheduling

Hi all,

Currently our team is trying to implement a runtime operator and is also playing with the scheduler. We are trying to understand the batch optimizer, but it will take some time. What we want to know is whether changing the batch scheduling mode from LAZY_FROM_SOURCES to EAGER could affect the optimizer. I mean, does the optimizer make any strong assumptions that the scheduling mode of batch jobs is always LAZY_FROM_SOURCES?

Thanks...