Hi,

We have seen the same behaviour on YARN. It turned out that the default setting for yarn.maximum-failed-containers was not optimal:

yarn.maximum-failed-containers: The maximum number of failed containers the ApplicationMaster accepts until it fails the YARN session. Default: The number of initially requested TaskManagers (-n).

So try to look up this setting in the configuration of your system. The next step is to investigate why the task manager is being killed.
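To make the effect of that default concrete, here is a minimal Python model of the rule described above. It is not Flink code: the function names `effective_max_failed` and `session_survives` are hypothetical, and the assumption that the session fails once the failed-container count *exceeds* the limit (rather than reaches it) is mine, not taken from the Flink source.

```python
from typing import Optional


def effective_max_failed(initial_task_managers: int,
                         configured: Optional[int] = None) -> int:
    """Hypothetical helper: the limit on failed containers is the configured
    yarn.maximum-failed-containers value or, if unset, the number of
    initially requested TaskManagers (-n)."""
    return configured if configured is not None else initial_task_managers


def session_survives(failed_containers: int,
                     initial_task_managers: int,
                     configured: Optional[int] = None) -> bool:
    """Hypothetical model: the YARN session is failed once the number of
    failed containers exceeds the limit. Whether the real check is '>' or
    '>=' is an assumption of this sketch."""
    limit = effective_max_failed(initial_task_managers, configured)
    return failed_containers <= limit


if __name__ == "__main__":
    # Session started with -n 2 and no explicit limit: a TaskManager that
    # keeps being killed and re-launched uses up the budget quickly.
    for failures in range(1, 5):
        print(failures, session_survives(failures, initial_task_managers=2))
    # Raising the limit (here to 10) tolerates more container failures.
    print(session_survives(4, initial_task_managers=2, configured=10))
```

With a small -n, only a handful of killed TaskManager containers is enough to bring down the whole session, which matches the "whole cluster dies" symptom. The real limit is set via yarn.maximum-failed-containers in flink-conf.yaml.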
Best regards
Lasse Nedergaard

> On 20 Aug 2018, at 16:34, Dominik Wosiński <[email protected]> wrote:
>
> Hey,
> Can you please provide a little more information about your setup and maybe
> logs showing when the crash occurs?
> Best Regards,
> Dominik
>
> 2018-08-20 16:23 GMT+02:00 Siew Wai Yow <[email protected]>:
>> Hi,
>>
>> When one of the task managers is killed, the whole cluster dies. Is this
>> expected? We are using Flink 1.4. Thank you.
>>
>> Regards,
>> Yow
