Hi,

Could you post full job manager and task manager logs from startup until the 
first signs of the problem?

Thanks, Piotrek

> On 15 Jan 2018, at 11:21, Reza Samee <reza.sa...@gmail.com> wrote:
> 
> Thanks for the response, and sorry for the delay.
> 
> The ports logged by the JobManager and TaskManager are open!
> 
> 
> Is this log OK?
> 2018-01-15 13:40:03,455 INFO  
> org.apache.flink.runtime.webmonitor.JobManagerRetriever       - New leader 
> reachable under akka.tcp://flink@172.16.20.18:6123/user/jobmanager:null.
> 
> When I kill the task manager, the JobManager logs:
> 2018-01-15 13:32:41,419 WARN  akka.remote.ReliableDeliverySupervisor          
>               - Association with remote system 
> [akka.tcp://flink@stage_dbq_1:45532] has failed, address is now gated for 
> [5000] ms. Reason: [Disassociated] 
> 
> But it does not decrement the number of available task managers!
> And when I start my single task manager again, it logs:
> 
> 2018-01-15 13:32:52,753 INFO  
> org.apache.flink.runtime.instance.InstanceManager             - Registered 
> TaskManager at ??? (akka://flink/deadLetters) as 
> 626846ae27a833cb094eeeb047a6a72c. Current number of registered hosts is 2. 
> Current number of alive task slots is 40.
> 
> 
> On Wed, Jan 10, 2018 at 11:36 AM, Piotr Nowojski <pi...@data-artisans.com> wrote:
> Hi,
> 
> Search both the job manager and task manager logs for the IP address(es) and
> port(s) that timed out. First, make sure the nodes are visible to each
> other using a simple ping. Then check that the ports that timed out are
> open and not blocked by a firewall (for example, with telnet).
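
The check described above could be scripted roughly as follows. This is only a sketch: the function names `extract_endpoints` and `is_port_open` are made up for illustration, and the regular expression assumes addresses of the form akka.tcp://flink@host:port, as in the log lines quoted in this thread. It is not part of Flink.

```python
import re
import socket

# Assumption: endpoints appear as akka.tcp://<system>@<host>:<port> in the logs.
ENDPOINT_RE = re.compile(r"akka\.tcp://[^@\s]+@([\w.\-]+):(\d+)")


def extract_endpoints(log_text):
    """Return unique (host, port) pairs in order of first appearance."""
    seen = []
    for host, port in ENDPOINT_RE.findall(log_text):
        endpoint = (host, int(port))
        if endpoint not in seen:
            seen.append(endpoint)
    return seen


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False
```

Feed the contents of a job manager or task manager log to `extract_endpoints` and call `is_port_open` on each pair. A successful connection only shows that the port accepts TCP connections from the machine running the script, so it is most useful when run on the host that reported the timeout.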
> 
> You can search the documentation for the configuration parameters with “port”
> in their name:
> https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/config.html
> But note that many of these ports are chosen randomly by default.
> 
> Piotrek
> 
>> On 9 Jan 2018, at 17:56, Reza Samee <reza.sa...@gmail.com> wrote:
>> 
>> 
>> I'm running a Flink cluster (a mini one with just one node), but the problem
>> is that my TaskManager can't reach my JobManager!
>> 
>> Here are the logs from the TaskManager:
>> ...
>> Trying to register at JobManager akka.tcp://flink@MY_PRIV_IP/user/jobmanager (attempt 20, timeout: 30 seconds)
>> Trying to register at JobManager akka.tcp://flink@MY_PRIV_IP/user/jobmanager (attempt 21, timeout: 30 seconds)
>> Trying to register at JobManager akka.tcp://flink@MY_PRIV_IP/user/jobmanager (attempt 22, timeout: 30 seconds)
>> Trying to register at JobManager akka.tcp://flink@MY_PRIV_IP/user/jobmanager (attempt 23, timeout: 30 seconds)
>> Trying to register at JobManager akka.tcp://flink@MY_PRIV_IP/user/jobmanager (attempt 24, timeout: 30 seconds)
>> ...
>> 
>> My "JobManager UI" shows my TaskManager with this Path & ID:
>> "akka://flink/deadLetters" (in the TaskManagers tab).
>> And I found these lines in my JobManager stdout:
>> 
>> Resource Manager associating with leading JobManager
>> Actor[akka://flink/user/jobmanager#-275619168] - leader session null
>> TaskManager ResourceID{resourceId='1132cbdaf2d8204e5e42e321e8592754'} has
>> started.
>> Registered TaskManager at MY_PRIV_IP (akka://flink/deadLetters) as
>> 7d9568445b4557a74d05a0771a08ad9c. Current number of registered hosts is 1.
>> Current number of alive task slots is 20.
>> 
>> 
>> What's the meaning of these lines? Where should I look for the solution?
>> 
>> 
>> 
>> 
>> -- 
>> رضا سامعی / http://samee.blog.ir <http://samee.blog.ir/>
> 
> 
> 
> -- 
> رضا سامعی / http://samee.blog.ir <http://samee.blog.ir/>
