hello 这个报错看上去并不是状态不兼容的报错。 我看代码 Sink 算子设置了uid 理论上是可以正确恢复的。
kong <[email protected]> 于2021年10月21日周四 上午10:26写道: > hi,我遇到flink修改sink并行度后,无法从checkpoint restore问题 > > > flink 版本: 1.13.1 > flink on yarn > DataStream api方式写的java job > > > 试验1:不修改任何代码,cancel job后,能从指定的checkpoint恢复 > dataStream.addSink(new Sink(config)).name("xxxx").uid("xxxx"); > 试验2:只修改sink端的并行度,job无法启动,一直是Initiating状态 > dataStream.addSink(new > Sink(config)).name("xxxx").uid("xxxx").setParallelism(2); > > > 日志异常 > 2021-10-21 09:52:57,076 INFO > org.apache.flink.runtime.rpc.akka.AkkaRpcService [] - Starting > RPC endpoint for > org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager at > akka://flink/user/rpc/resourcemanager_0 . > 2021-10-21 09:52:57,132 INFO > org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - > Starting DefaultLeaderElectionService with > ZooKeeperLeaderElectionDriver{leaderPath='/leader/dispatcher_lock'}. > 2021-10-21 09:52:57,133 INFO > org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - > Starting the resource manager. > 2021-10-21 09:52:57,134 INFO > org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService [] - > Starting DefaultLeaderRetrievalService with > ZookeeperLeaderRetrievalDriver{retrievalPath='/leader/resource_manager_lock'}. > 2021-10-21 09:52:57,135 INFO > org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService [] - > Starting DefaultLeaderRetrievalService with > ZookeeperLeaderRetrievalDriver{retrievalPath='/leader/dispatcher_lock'}. > 2021-10-21 09:52:57,142 INFO > org.apache.flink.runtime.dispatcher.runner.JobDispatcherLeaderProcess [] - > Start JobDispatcherLeaderProcess. > 2021-10-21 09:52:57,151 INFO > org.apache.flink.runtime.rpc.akka.AkkaRpcService [] - Starting > RPC endpoint for org.apache.flink.runtime.dispatcher.MiniDispatcher at > akka://flink/user/rpc/dispatcher_1 . > 2021-10-21 09:52:57,228 INFO > org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - > Starting DefaultLeaderElectionService with > ZooKeeperLeaderElectionDriver{leaderPath='/leader/57645e97919d2efebfab67e2846696e7/job_manager_lock'}. > 2021-10-21 09:52:57,312 WARN akka.remote.ReliableDeliverySupervisor > [] - Association with remote system > [akka.tcp://[email protected]:42535] has failed, address is now gated > for [50] ms. Reason: [Association failed with > [akka.tcp://[email protected]:42535]] > Caused by: [java.net.ConnectException: Connection refused: > node2.hadoop/xx.xx.xx.xx:42535] > 2021-10-21 09:52:57,313 WARN akka.remote.transport.netty.NettyTransport > [] - Remote connection to [null] failed with > java.net.ConnectException: Connection refused: > node2.hadoop/xx.xx.xx.xx:42535 > 2021-10-21 09:52:57,352 INFO > org.apache.flink.yarn.YarnResourceManagerDriver [] - Recovered > 0 containers from previous attempts ([]). > 2021-10-21 09:52:57,353 INFO > org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - > Recovered 0 workers from previous attempt. > 2021-10-21 09:52:57,379 INFO org.apache.hadoop.conf.Configuration > [] - resource-types.xml not found > 2021-10-21 09:52:57,379 INFO > org.apache.hadoop.yarn.util.resource.ResourceUtils [] - Unable to > find 'resource-types.xml'. > 2021-10-21 09:52:57,394 INFO > org.apache.flink.runtime.externalresource.ExternalResourceUtils [] - > Enabled external resources: [] > 2021-10-21 09:52:57,395 WARN akka.remote.transport.netty.NettyTransport > [] - Remote connection to [null] failed with > java.net.ConnectException: Connection refused: > node2.hadoop/xx.xx.xx.xx:42535 > 2021-10-21 09:52:57,396 WARN akka.remote.ReliableDeliverySupervisor > [] - Association with remote system > [akka.tcp://[email protected]:42535] has failed, address is now gated > for [50] ms. Reason: [Association failed with > [akka.tcp://[email protected]:42535]] > Caused by: [java.net.ConnectException: Connection refused: > node2.hadoop/xx.xx.xx.xx:42535] > 2021-10-21 09:52:57,400 INFO > org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl [] - Upper > bound of the thread pool size is 500 > 2021-10-21 09:52:57,403 INFO > org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - > Starting DefaultLeaderElectionService with > ZooKeeperLeaderElectionDriver{leaderPath='/leader/resource_manager_lock'}. > 2021-10-21 09:52:57,407 INFO > org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - > ResourceManager akka.tcp://[email protected]:43978/user/rpc/resourcemanager_0 > was granted leadership with fencing token ae27185bb3fda634d2c109510ab54ba4 > > > > > >
