Hi,

Let me also investigate that? Did you observe this in 1.3.2 and not in 1.3.0 
and/or 1.3.1 or did you directly go from 1.2.x to 1.3.2?

Best,
Aljoscha

> On 10. Aug 2017, at 13:31, Gyula Fóra <gyula.f...@gmail.com> wrote:
> 
> Hi!
> In some cases it seems to take a long time for the job to start the zookeeper 
> based job recovery after recovering from a JM failure.
> Looking at the logs there is a 2 minute gap between the last recovered TM was 
> started successfully and the job recovery:
> 
> 2017-08-10 13:14:06,369 INFO  
> org.apache.flink.runtime.instance.InstanceManager             - Registered 
> TaskManager at splat31 
> (akka.tcp://fl...@splat31.sto.midasplayer.com:41821/user/taskmanager 
> <http://fl...@splat31.sto.midasplayer.com:41821/user/taskmanager>) as 
> 2de2376edd4670b48600f29df9e821a1. Current number of registered hosts is 15. 
> Current number of alive task slots is 120.
> 2017-08-10 13:14:08,198 INFO  org.apache.flink.yarn.YarnFlinkResourceManager  
>               - TaskManager container_e22_1495032222473_0939_02_000002 has 
> started.
> 2017-08-10 13:14:08,198 INFO  
> org.apache.flink.runtime.instance.InstanceManager             - Registered 
> TaskManager at splat19 
> (akka.tcp://fl...@splat19.sto.midasplayer.com:36377/user/taskmanager 
> <http://fl...@splat19.sto.midasplayer.com:36377/user/taskmanager>) as 
> 06ed7098060f4d539150c1c26a32ac07. Current number of registered hosts is 16. 
> Current number of alive task slots is 128.
> 2017-08-10 13:14:56,710 INFO  kafka.client.ClientUtils$                       
>               - Fetching metadata from broker 
> id:0,host:kafka10.sto.midasplayer.com 
> <http://kafka10.sto.midasplayer.com/>,port:9092 with correlation id 0 for 1 
> topic(s) Set(meter.rbea.log)
> 2017-08-10 13:14:56,714 INFO  kafka.producer.SyncProducer                     
>               - Connected to kafka10.sto.midasplayer.com:9092 
> <http://kafka10.sto.midasplayer.com:9092/> for producing
> 2017-08-10 13:14:56,734 INFO  kafka.producer.SyncProducer                     
>               - Disconnecting from kafka10.sto.midasplayer.com:9092 
> <http://kafka10.sto.midasplayer.com:9092/>
> 2017-08-10 13:14:56,780 INFO  kafka.producer.SyncProducer                     
>               - Connected to kafka10.sto.midasplayer.com:9092 
> <http://kafka10.sto.midasplayer.com:9092/> for producing
> 2017-08-10 13:14:56,791 INFO  kafka.producer.SyncProducer                     
>               - Connected to kafka21.sto.midasplayer.com:9092 
> <http://kafka21.sto.midasplayer.com:9092/> for producing
> 2017-08-10 13:14:56,796 INFO  kafka.producer.SyncProducer                     
>               - Connected to kafka11.sto.midasplayer.com:9092 
> <http://kafka11.sto.midasplayer.com:9092/> for producing
> 2017-08-10 13:14:56,800 INFO  kafka.producer.SyncProducer                     
>               - Connected to kafka09.sto.midasplayer.com:9092 
> <http://kafka09.sto.midasplayer.com:9092/> for producing
> 2017-08-10 13:14:56,806 INFO  kafka.producer.SyncProducer                     
>               - Connected to kafka23.sto.midasplayer.com:9092 
> <http://kafka23.sto.midasplayer.com:9092/> for producing
> 2017-08-10 13:14:56,826 INFO  kafka.producer.SyncProducer                     
>               - Connected to kafka22.sto.midasplayer.com:9092 
> <http://kafka22.sto.midasplayer.com:9092/> for producing
> 2017-08-10 13:14:56,830 INFO  kafka.producer.SyncProducer                     
>               - Connected to kafka08.sto.midasplayer.com:9092 
> <http://kafka08.sto.midasplayer.com:9092/> for producing
> 2017-08-10 13:14:56,845 INFO  kafka.producer.SyncProducer                     
>               - Connected to kafka20.sto.midasplayer.com:9092 
> <http://kafka20.sto.midasplayer.com:9092/> for producing
> 2017-08-10 13:14:56,852 INFO  kafka.producer.SyncProducer                     
>               - Connected to kafka15.sto.midasplayer.com:9092 
> <http://kafka15.sto.midasplayer.com:9092/> for producing
> 2017-08-10 13:14:56,857 INFO  kafka.producer.SyncProducer                     
>               - Connected to kafka17.sto.midasplayer.com:9092 
> <http://kafka17.sto.midasplayer.com:9092/> for producing
> 2017-08-10 13:14:56,861 INFO  kafka.producer.SyncProducer                     
>               - Connected to kafka18.sto.midasplayer.com:9092 
> <http://kafka18.sto.midasplayer.com:9092/> for producing
> 2017-08-10 13:15:56,814 INFO  org.apache.flink.yarn.YarnJobManager            
>               - Attempting to recover all jobs.
> 2017-08-10 13:15:56,820 INFO  org.apache.flink.yarn.YarnJobManager            
>               - There are 1 jobs to recover. Starting the job recovery.
> 2017-08-10 13:15:56,821 INFO  org.apache.flink.yarn.YarnJobManager            
>               - Attempting to recover job fea2eb82ef98849be2a8ac1731379193.
> 2017-08-10 13:15:56,886 INFO  kafka.producer.SyncProducer                     
>               - Connected to kafka13.sto.midasplayer.com:9092 
> <http://kafka13.sto.midasplayer.com:9092/> for producing
> 2017-08-10 13:15:56,889 INFO  kafka.producer.SyncProducer                     
>               - Connected to kafka14.sto.midasplayer.com:9092 
> <http://kafka14.sto.midasplayer.com:9092/> for producing
> 2017-08-10 13:15:56,943 INFO  kafka.producer.SyncProducer                     
>               - Connected to kafka12.sto.midasplayer.com:9092 
> <http://kafka12.sto.midasplayer.com:9092/> for producing
> Disregard the kafka logs, they belong to a metrics reporter.
> 
> Any idea what might cause the delay here? What is happenning in the 
> background?
> Thanks,
> Gyula

Reply via email to