Hi,

Could you post the full job manager and task manager logs, from startup until the first signs of the problem?
Thanks, Piotrek

> On 15 Jan 2018, at 11:21, Reza Samee <reza.sa...@gmail.com> wrote:
>
> Thanks for the response, and sorry for the delay.
>
> The ports logged by the JobManager and TaskManager are open!
>
> Is this log OK?
>
> 2018-01-15 13:40:03,455 INFO org.apache.flink.runtime.webmonitor.JobManagerRetriever - New leader reachable under akka.tcp://flink@172.16.20.18:6123/user/jobmanager:null.
>
> When I kill the task manager, the JobManager logs:
>
> 2018-01-15 13:32:41,419 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@stage_dbq_1:45532] has failed, address is now gated for [5000] ms. Reason: [Disassociated]
>
> But it does not decrement the number of available task managers! And when I start my single task manager again, it logs:
>
> 2018-01-15 13:32:52,753 INFO org.apache.flink.runtime.instance.InstanceManager - Registered TaskManager at ??? (akka://flink/deadLetters) as 626846ae27a833cb094eeeb047a6a72c. Current number of registered hosts is 2. Current number of alive task slots is 40.
>
> On Wed, Jan 10, 2018 at 11:36 AM, Piotr Nowojski <pi...@data-artisans.com> wrote:
> Hi,
>
> Search both the job manager and task manager logs for the IP address(es) and port(s) that have timed out. First of all, make sure that the nodes are visible to each other using a simple ping. Afterwards, please check that the ports that timed out are open and not blocked by a firewall (telnet).
>
> You can search the documentation for the configuration parameters with “port” in the name:
> https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/config.html
> But note that many of them are random by default.
>
> Piotrek
>
>> On 9 Jan 2018, at 17:56, Reza Samee <reza.sa...@gmail.com> wrote:
>>
>> I'm running a Flink cluster (a mini one with just one node), but the problem is that my TaskManager can't reach my JobManager!
>>
>> Here are logs from the TaskManager:
>> ...
>> Trying to register at JobManager akka.tcp://flink@MY_PRIV_IP/user/jobmanager (attempt 20, timeout: 30 seconds)
>> Trying to register at JobManager akka.tcp://flink@MY_PRIV_IP/user/jobmanager (attempt 21, timeout: 30 seconds)
>> Trying to register at JobManager akka.tcp://flink@MY_PRIV_IP/user/jobmanager (attempt 22, timeout: 30 seconds)
>> Trying to register at JobManager akka.tcp://flink@MY_PRIV_IP/user/jobmanager (attempt 23, timeout: 30 seconds)
>> Trying to register at JobManager akka.tcp://flink@MY_PRIV_IP/user/jobmanager (attempt 24, timeout: 30 seconds)
>> ...
>>
>> My "JobManager UI" shows my TaskManager with this Path & ID: "akka://flink/deadLetters" (in the TaskManagers tab).
>> And I found these lines in my JobManager stdout:
>>
>> Resource Manager associating with leading JobManager Actor[akka://flink/user/jobmanager#-275619168] - leader session null
>> TaskManager ResourceID{resourceId='1132cbdaf2d8204e5e42e321e8592754'} has started.
>> Registered TaskManager at MY_PRIV_IP (akka://flink/deadLetters) as 7d9568445b4557a74d05a0771a08ad9c. Current number of registered hosts is 1. Current number of alive task slots is 20.
>>
>> What's the meaning of these lines? Where should I look for the solution?
>>
>> --
>> رضا سامعی / http://samee.blog.ir
>
> --
> رضا سامعی / http://samee.blog.ir
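
For anyone following the ping/telnet advice quoted above, here is a minimal sketch of the same TCP reachability check in Python, in case telnet is not installed on the node. The helper names `port_is_open` and `check_endpoints` are made up for illustration and are not part of Flink. A successful connection only shows that something accepts TCP connections on that port; it does not prove that the Akka registration itself will succeed.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_endpoints(endpoints):
    """Map each (host, port) pair to whether it accepted a TCP connection."""
    return {(host, port): port_is_open(host, port) for host, port in endpoints}
```

Run it from the TaskManager host against the JobManager address taken from your own logs, for example `check_endpoints([("172.16.20.18", 6123)])` for the address in the log line quoted above, and then from the JobManager host against the TaskManager port. Because many Flink ports are random by default, the TaskManager port has to be read from the current logs each time the process restarts.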