Hi Navneeth,

First of all, I suggest upgrading Flink to the latest version. You can refer to [1] for savepoint compatibility when upgrading Flink.

As for the connection failure, you could log in to your pod and run 'nslookup jobmanager' to see whether the host can be resolved. You can also check whether the 'jobmanager' service works as expected via 'kubectl get svc'.

[1] https://ci.apache.org/projects/flink/flink-docs-stable/ops/upgrading.html#compatibility-table

Best
Yun Tang

________________________________
From: Navneeth Krishnan <[email protected]>
Sent: Friday, August 28, 2020 17:00
To: user <[email protected]>
Subject: Flink Migration

Hi All,

We are currently on a very old version of Flink, 1.4.0, and it has worked pretty well. But lately we have been facing checkpoint timeout issues. We would like to minimize any changes to the current pipelines and go ahead with the migration. With that said, our first pick was to migrate to 1.5.6 and later migrate to a newer version.

Do you guys think a more recent version like 1.6 or 1.7 might work? We did try 1.8 but it requires some changes in the pipelines.

When we tried 1.5.6 with docker compose, we were unable to get the task manager attached to the jobmanager. Are there some specific configurations required for newer versions?

Logs:

8-28 07:36:30.834 [main] INFO org.apache.flink.runtime.util.LeaderRetrievalUtils - TaskManager will try to connect for 10000 milliseconds before falling back to heuristics
2020-08-28 07:36:30.853 [main] INFO org.apache.flink.runtime.net.ConnectionUtils - Retrieved new target address jobmanager/172.21.0.8:6123.
2020-08-28 07:36:31.279 [main] INFO org.apache.flink.runtime.net.ConnectionUtils - Trying to connect to address jobmanager/172.21.0.8:6123
2020-08-28 07:36:31.280 [main] INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address 'e6f9104cdc61/172.21.0.9': Connection refused (Connection refused)
2020-08-28 07:36:31.281 [main] INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/172.21.0.9': Connection refused (Connection refused)
2020-08-28 07:36:31.281 [main] INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/172.21.0.9': Connection refused (Connection refused)
2020-08-28 07:36:31.282 [main] INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/127.0.0.1': Invalid argument (connect failed)
2020-08-28 07:36:31.283 [main] INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/172.21.0.9': Connection refused (Connection refused)
2020-08-28 07:36:31.284 [main] INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/127.0.0.1': Invalid argument (connect failed)
2020-08-28 07:36:31.684 [main] INFO org.apache.flink.runtime.net.ConnectionUtils - Trying to connect to address jobmanager/172.21.0.8:6123
2020-08-28 07:36:31.686 [main] INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address 'e6f9104cdc61/172.21.0.9': Connection refused (Connection refused)
2020-08-28 07:36:31.687 [main] INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/172.21.0.9': Connection refused (Connection refused)
2020-08-28 07:36:31.688 [main] INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/172.21.0.9': Connection refused (Connection refused)
2020-08-28 07:36:31.688 [main] INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/127.0.0.1': Invalid argument (connect failed)
2020-08-28 07:36:31.689 [main] INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/172.21.0.9': Connection refused (Connection refused)
2020-08-28 07:36:31.690 [main] INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/127.0.0.1': Invalid argument (connect failed)
2020-08-28 07:36:32.490 [main] INFO org.apache.flink.runtime.net.ConnectionUtils - Trying to connect to address jobmanager/172.21.0.8:6123
2020-08-28 07:36:32.491 [main] INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address 'e6f9104cdc61/172.21.0.9': Connection refused (Connection refused)
2020-08-28 07:36:32.493 [main] INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/172.21.0.9': Connection refused (Connection refused)
2020-08-28 07:36:32.494 [main] INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/172.21.0.9': Connection refused (Connection refused)
2020-08-28 07:36:32.495 [main] INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/127.0.0.1': Invalid argument (connect failed)
2020-08-28 07:36:32.496 [main] INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/172.21.0.9': Connection refused (Connection refused)
2020-08-28 07:36:32.497 [main] INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/127.0.0.1': Invalid argument (connect failed)
2020-08-28 07:36:34.099 [main] INFO org.apache.flink.runtime.net.ConnectionUtils - Trying to connect to address jobmanager/172.21.0.8:6123
2020-08-28 07:36:34.100 [main] INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - TaskManager will use hostname/address 'e6f9104cdc61' (172.21.0.9) for communication.

Flink Conf

jobmanager.rpc.address: jobmanager
rest.address: jobmanager

Thanks
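
The name-resolution check suggested in the reply at the top of this thread, combined with a TCP connection test against the JobManager RPC port, could be sketched in Python roughly as follows. This is only a minimal sketch, assuming Python 3 is available inside the TaskManager container (it may well not be in a stock Flink image). The host name `jobmanager` and port 6123 are taken from the configuration and logs in this thread; the helper names `resolve_host` and `can_connect` are made up for illustration and are not part of Flink.

```python
import socket


def resolve_host(host):
    """Return the sorted addresses the host name resolves to, or [] if lookup fails."""
    try:
        infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and resolution failures.
        return False


if __name__ == "__main__":
    # Assumed values: host from jobmanager.rpc.address, port from the log output.
    host, port = "jobmanager", 6123
    addresses = resolve_host(host)
    print(f"{host} resolves to: {addresses or 'nothing (lookup failed)'}")
    if addresses:
        state = "ok" if can_connect(host, port) else "refused or timed out"
        print(f"TCP connect to {host}:{port}: {state}")
```

If the name resolves but the connection is refused, which is what the log lines above appear to show, name resolution is probably not the issue, and it may be worth checking whether the JobManager process is actually up and listening on that port.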
