[jira] [Updated] (FLINK-8899) Submitting YARN job with FLIP-6 may lead to ApplicationAttemptNotFoundException
    [ https://issues.apache.org/jira/browse/FLINK-8899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chesnay Schepler updated FLINK-8899:
------------------------------------
    Fix Version/s:     (was: 1.6.5)

> Submitting YARN job with FLIP-6 may lead to ApplicationAttemptNotFoundException
> -------------------------------------------------------------------------------
>
>                 Key: FLINK-8899
>                 URL: https://issues.apache.org/jira/browse/FLINK-8899
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / YARN, Runtime / Coordination
>    Affects Versions: 1.5.0
>            Reporter: Nico Kruber
>            Priority: Major
>              Labels: flip-6
>             Fix For: 1.7.3
>
>
> Occasionally, running a simple word count as this
> {code}
> ./bin/flink run -m yarn-cluster -yjm 768 -ytm 3072 -ys 2 -p 20 -c org.apache.flink.streaming.examples.wordcount.WordCount ./examples/streaming/WordCount.jar --input /usr/share/doc/rsync-3.0.6/COPYING
> {code}
> leads to an {{ApplicationAttemptNotFoundException}} in the logs:
> {code}
> 2018-03-08 16:18:08,507 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph - Job Streaming WordCount (df707a3c9817ddf5936efe56d427e2bd) switched from state RUNNING to FINISHED.
> 2018-03-08 16:18:08,508 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Stopping checkpoint coordinator for job df707a3c9817ddf5936efe56d427e2bd
> 2018-03-08 16:18:08,508 INFO  org.apache.flink.runtime.checkpoint.StandaloneCompletedCheckpointStore - Shutting down
> 2018-03-08 16:18:08,536 INFO  org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Job df707a3c9817ddf5936efe56d427e2bd reached globally terminal state FINISHED.
> 2018-03-08 16:18:08,611 INFO  org.apache.flink.runtime.jobmaster.JobMaster - Stopping the JobMaster for job Streaming WordCount(df707a3c9817ddf5936efe56d427e2bd).
> 2018-03-08 16:18:08,634 INFO  org.apache.flink.runtime.jobmaster.JobMaster - Close ResourceManager connection dcfdc329d61aae0ace2de26292c8916b: JobManager is shutting down..
> 2018-03-08 16:18:08,634 INFO  org.apache.flink.yarn.YarnResourceManager - Disconnect job manager 0...@akka.tcp://fl...@ip-172-31-2-0.eu-west-1.compute.internal:38555/user/jobmanager_0 for job df707a3c9817ddf5936efe56d427e2bd from the resource manager.
> 2018-03-08 16:18:08,664 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPool - Suspending SlotPool.
> 2018-03-08 16:18:08,664 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPool - Stopping SlotPool.
> 2018-03-08 16:18:08,664 INFO  org.apache.flink.runtime.jobmaster.JobManagerRunner - JobManagerRunner already shutdown.
> 2018-03-08 16:18:09,650 INFO  org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Register TaskManager adc8090bdb3f7052943ff86bde7d2a7b at the SlotManager.
> 2018-03-08 16:18:09,654 INFO  org.apache.flink.yarn.YarnResourceManager - Replacing old instance of worker for ResourceID container_1519984124671_0090_01_05
> 2018-03-08 16:18:09,654 INFO  org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Unregister TaskManager adc8090bdb3f7052943ff86bde7d2a7b from the SlotManager.
> 2018-03-08 16:18:09,654 INFO  org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Register TaskManager b975dbd16e0fd59c1168d978490a4b76 at the SlotManager.
> 2018-03-08 16:18:09,654 INFO  org.apache.flink.yarn.YarnResourceManager - The target with resource ID container_1519984124671_0090_01_05 is already been monitored.
> 2018-03-08 16:18:09,992 INFO  org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Register TaskManager 73c258a0dbad236501b8391971c330ba at the SlotManager.
> 2018-03-08 16:18:10,000 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.
> 2018-03-08 16:18:10,028 ERROR org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl - Exception on heartbeat
> org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException: Application attempt appattempt_1519984124671_0090_01 doesn't exist in ApplicationMasterService cache.
> 	at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:403)
> 	at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
> 	at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
> 	at
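In the log quoted above, TaskManagers are still registering after the job has reached its globally terminal state, and the heartbeat error is logged just after the SIGTERM. To check whether another log shows the same ordering, a small script along these lines might help. This is only a sketch: `parse_line` and `summarize_shutdown` are hypothetical helper names, not part of Flink or Hadoop, and the regular expression assumes one log entry per line in the `<timestamp> <LEVEL> <logger> - <message>` layout seen in the report.

```python
import re
from datetime import datetime

# Assumed layout: "<date> <time>,<millis> <LEVEL> <logger> - <message>", one entry per line.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+(?P<logger>\S+)\s+-\s+(?P<msg>.*)$"
)


def parse_line(line):
    """Return (timestamp, level, logger, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
    return ts, m.group("level"), m.group("logger"), m.group("msg")


def summarize_shutdown(lines):
    """Count TaskManager registrations after job completion and heartbeat errors after SIGTERM."""
    finished_at = None
    sigterm_at = None
    late_registrations = 0
    heartbeat_errors_after_sigterm = 0
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # stack-trace lines and other continuation lines are skipped
        ts, level, _logger, msg = parsed
        if "reached globally terminal state" in msg:
            finished_at = ts
        elif "RECEIVED SIGNAL 15" in msg:
            sigterm_at = ts
        elif finished_at is not None and msg.startswith("Register TaskManager"):
            late_registrations += 1
        elif sigterm_at is not None and level == "ERROR" and "Exception on heartbeat" in msg:
            heartbeat_errors_after_sigterm += 1
    gap = None
    if finished_at is not None and sigterm_at is not None:
        gap = (sigterm_at - finished_at).total_seconds()
    return {
        "late_registrations": late_registrations,
        "heartbeat_errors_after_sigterm": heartbeat_errors_after_sigterm,
        "seconds_from_finished_to_sigterm": gap,
    }


# Four entries copied from the log in the report, joined onto single lines.
SAMPLE = [
    "2018-03-08 16:18:08,536 INFO  org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Job df707a3c9817ddf5936efe56d427e2bd reached globally terminal state FINISHED.",
    "2018-03-08 16:18:09,650 INFO  org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Register TaskManager adc8090bdb3f7052943ff86bde7d2a7b at the SlotManager.",
    "2018-03-08 16:18:10,000 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.",
    "2018-03-08 16:18:10,028 ERROR org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl - Exception on heartbeat",
]

if __name__ == "__main__":
    print(summarize_shutdown(SAMPLE))
```

The script only reports the ordering of events; it does not establish why the application attempt was missing from the ApplicationMasterService cache.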
[jira] [Updated] (FLINK-8899) Submitting YARN job with FLIP-6 may lead to ApplicationAttemptNotFoundException
    [ https://issues.apache.org/jira/browse/FLINK-8899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sunjincheng updated FLINK-8899:
-------------------------------
    Fix Version/s:     (was: 1.6.4)
                       1.6.5

> Submitting YARN job with FLIP-6 may lead to ApplicationAttemptNotFoundException
> -------------------------------------------------------------------------------
>
>                 Key: FLINK-8899
>                 URL: https://issues.apache.org/jira/browse/FLINK-8899
>             Project: Flink
>          Issue Type: Bug
>          Components: ResourceManager, YARN
>    Affects Versions: 1.5.0
>            Reporter: Nico Kruber
>            Priority: Major
>              Labels: flip-6
>             Fix For: 1.7.3, 1.8.0, 1.6.5
[jira] [Updated] (FLINK-8899) Submitting YARN job with FLIP-6 may lead to ApplicationAttemptNotFoundException
    [ https://issues.apache.org/jira/browse/FLINK-8899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tzu-Li (Gordon) Tai updated FLINK-8899:
---------------------------------------
    Fix Version/s:     (was: 1.7.2)
                       1.7.3

> Submitting YARN job with FLIP-6 may lead to ApplicationAttemptNotFoundException
> -------------------------------------------------------------------------------
>
>                 Key: FLINK-8899
>                 URL: https://issues.apache.org/jira/browse/FLINK-8899
>             Project: Flink
>          Issue Type: Bug
>          Components: ResourceManager, YARN
>    Affects Versions: 1.5.0
>            Reporter: Nico Kruber
>            Priority: Major
>              Labels: flip-6
>             Fix For: 1.6.4, 1.7.3, 1.8.0
[jira] [Updated] (FLINK-8899) Submitting YARN job with FLIP-6 may lead to ApplicationAttemptNotFoundException
    [ https://issues.apache.org/jira/browse/FLINK-8899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Weise updated FLINK-8899:
--------------------------------
    Fix Version/s:     (was: 1.5.6)

> Submitting YARN job with FLIP-6 may lead to ApplicationAttemptNotFoundException
> -------------------------------------------------------------------------------
>
>                 Key: FLINK-8899
>                 URL: https://issues.apache.org/jira/browse/FLINK-8899
>             Project: Flink
>          Issue Type: Bug
>          Components: ResourceManager, YARN
>    Affects Versions: 1.5.0
>            Reporter: Nico Kruber
>            Priority: Major
>              Labels: flip-6
>             Fix For: 1.6.4, 1.7.2, 1.8.0
[jira] [Updated] (FLINK-8899) Submitting YARN job with FLIP-6 may lead to ApplicationAttemptNotFoundException
    [ https://issues.apache.org/jira/browse/FLINK-8899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tzu-Li (Gordon) Tai updated FLINK-8899:
---------------------------------------
    Fix Version/s:     (was: 1.6.3)
                       1.6.4

> Submitting YARN job with FLIP-6 may lead to ApplicationAttemptNotFoundException
> -------------------------------------------------------------------------------
>
>                 Key: FLINK-8899
>                 URL: https://issues.apache.org/jira/browse/FLINK-8899
>             Project: Flink
>          Issue Type: Bug
>          Components: ResourceManager, YARN
>    Affects Versions: 1.5.0
>            Reporter: Nico Kruber
>            Priority: Major
>              Labels: flip-6
>             Fix For: 1.5.6, 1.6.4, 1.7.2, 1.8.0
[jira] [Updated] (FLINK-8899) Submitting YARN job with FLIP-6 may lead to ApplicationAttemptNotFoundException
    [ https://issues.apache.org/jira/browse/FLINK-8899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Rohrmann updated FLINK-8899:
---------------------------------
    Fix Version/s:     (was: 1.7.0)

> Submitting YARN job with FLIP-6 may lead to ApplicationAttemptNotFoundException
> -------------------------------------------------------------------------------
>
>                 Key: FLINK-8899
>                 URL: https://issues.apache.org/jira/browse/FLINK-8899
>             Project: Flink
>          Issue Type: Bug
>          Components: ResourceManager, YARN
>    Affects Versions: 1.5.0
>            Reporter: Nico Kruber
>            Priority: Major
>              Labels: flip-6
>             Fix For: 1.5.6, 1.6.3, 1.8.0, 1.7.1
[jira] [Updated] (FLINK-8899) Submitting YARN job with FLIP-6 may lead to ApplicationAttemptNotFoundException
    [ https://issues.apache.org/jira/browse/FLINK-8899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Rohrmann updated FLINK-8899:
---------------------------------
    Fix Version/s: 1.7.1

> Submitting YARN job with FLIP-6 may lead to ApplicationAttemptNotFoundException
> -------------------------------------------------------------------------------
>
>                 Key: FLINK-8899
>                 URL: https://issues.apache.org/jira/browse/FLINK-8899
>             Project: Flink
>          Issue Type: Bug
>          Components: ResourceManager, YARN
>    Affects Versions: 1.5.0
>            Reporter: Nico Kruber
>            Priority: Major
>              Labels: flip-6
>             Fix For: 1.5.6, 1.6.3, 1.7.0, 1.8.0, 1.7.1
[jira] [Updated] (FLINK-8899) Submitting YARN job with FLIP-6 may lead to ApplicationAttemptNotFoundException
[ https://issues.apache.org/jira/browse/FLINK-8899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Rohrmann updated FLINK-8899:
---------------------------------
    Fix Version/s: 1.8.0

> Submitting YARN job with FLIP-6 may lead to ApplicationAttemptNotFoundException
> ---
>
> Key: FLINK-8899
> URL: https://issues.apache.org/jira/browse/FLINK-8899
> Project: Flink
> Issue Type: Bug
> Components: ResourceManager, YARN
> Affects Versions: 1.5.0
> Reporter: Nico Kruber
> Priority: Major
> Labels: flip-6
> Fix For: 1.5.6, 1.6.3, 1.7.0, 1.8.0
>
>
> Occasionally, running a simple word count as this
> {code}
> ./bin/flink run -m yarn-cluster -yjm 768 -ytm 3072 -ys 2 -p 20 -c org.apache.flink.streaming.examples.wordcount.WordCount ./examples/streaming/WordCount.jar --input /usr/share/doc/rsync-3.0.6/COPYING
> {code}
> leads to an {{ApplicationAttemptNotFoundException}} in the logs:
> {code}
> 2018-03-08 16:18:08,507 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job Streaming WordCount (df707a3c9817ddf5936efe56d427e2bd) switched from state RUNNING to FINISHED.
> 2018-03-08 16:18:08,508 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Stopping checkpoint coordinator for job df707a3c9817ddf5936efe56d427e2bd
> 2018-03-08 16:18:08,508 INFO org.apache.flink.runtime.checkpoint.StandaloneCompletedCheckpointStore - Shutting down
> 2018-03-08 16:18:08,536 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Job df707a3c9817ddf5936efe56d427e2bd reached globally terminal state FINISHED.
> 2018-03-08 16:18:08,611 INFO org.apache.flink.runtime.jobmaster.JobMaster - Stopping the JobMaster for job Streaming WordCount(df707a3c9817ddf5936efe56d427e2bd).
> 2018-03-08 16:18:08,634 INFO org.apache.flink.runtime.jobmaster.JobMaster - Close ResourceManager connection dcfdc329d61aae0ace2de26292c8916b: JobManager is shutting down..
> 2018-03-08 16:18:08,634 INFO org.apache.flink.yarn.YarnResourceManager - Disconnect job manager 0...@akka.tcp://fl...@ip-172-31-2-0.eu-west-1.compute.internal:38555/user/jobmanager_0 for job df707a3c9817ddf5936efe56d427e2bd from the resource manager.
> 2018-03-08 16:18:08,664 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPool - Suspending SlotPool.
> 2018-03-08 16:18:08,664 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPool - Stopping SlotPool.
> 2018-03-08 16:18:08,664 INFO org.apache.flink.runtime.jobmaster.JobManagerRunner - JobManagerRunner already shutdown.
> 2018-03-08 16:18:09,650 INFO org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Register TaskManager adc8090bdb3f7052943ff86bde7d2a7b at the SlotManager.
> 2018-03-08 16:18:09,654 INFO org.apache.flink.yarn.YarnResourceManager - Replacing old instance of worker for ResourceID container_1519984124671_0090_01_05
> 2018-03-08 16:18:09,654 INFO org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Unregister TaskManager adc8090bdb3f7052943ff86bde7d2a7b from the SlotManager.
> 2018-03-08 16:18:09,654 INFO org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Register TaskManager b975dbd16e0fd59c1168d978490a4b76 at the SlotManager.
> 2018-03-08 16:18:09,654 INFO org.apache.flink.yarn.YarnResourceManager - The target with resource ID container_1519984124671_0090_01_05 is already been monitored.
> 2018-03-08 16:18:09,992 INFO org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Register TaskManager 73c258a0dbad236501b8391971c330ba at the SlotManager.
> 2018-03-08 16:18:10,000 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.
> 2018-03-08 16:18:10,028 ERROR org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl - Exception on heartbeat
> org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException: Application attempt appattempt_1519984124671_0090_01 doesn't exist in ApplicationMasterService cache.
>     at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:403)
>     at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>     at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>     at
[jira] [Updated] (FLINK-8899) Submitting YARN job with FLIP-6 may lead to ApplicationAttemptNotFoundException
[ https://issues.apache.org/jira/browse/FLINK-8899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Rohrmann updated FLINK-8899:
---------------------------------
    Fix Version/s: 1.6.2

> Submitting YARN job with FLIP-6 may lead to ApplicationAttemptNotFoundException
> ---
>
> Key: FLINK-8899
> URL: https://issues.apache.org/jira/browse/FLINK-8899
> Project: Flink
> Issue Type: Bug
> Components: ResourceManager, YARN
> Affects Versions: 1.5.0
> Reporter: Nico Kruber
> Priority: Major
> Labels: flip-6
> Fix For: 1.7.0, 1.6.2, 1.5.5
[jira] [Updated] (FLINK-8899) Submitting YARN job with FLIP-6 may lead to ApplicationAttemptNotFoundException
[ https://issues.apache.org/jira/browse/FLINK-8899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Rohrmann updated FLINK-8899:
---------------------------------
    Fix Version/s: (was: 1.6.1)

> Submitting YARN job with FLIP-6 may lead to ApplicationAttemptNotFoundException
> ---
>
> Key: FLINK-8899
> URL: https://issues.apache.org/jira/browse/FLINK-8899
> Project: Flink
> Issue Type: Bug
> Components: ResourceManager, YARN
> Affects Versions: 1.5.0
> Reporter: Nico Kruber
> Priority: Major
> Labels: flip-6
> Fix For: 1.7.0, 1.6.2, 1.5.5
[jira] [Updated] (FLINK-8899) Submitting YARN job with FLIP-6 may lead to ApplicationAttemptNotFoundException
[ https://issues.apache.org/jira/browse/FLINK-8899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Rohrmann updated FLINK-8899:
---------------------------------
    Fix Version/s: (was: 1.5.4)

> Submitting YARN job with FLIP-6 may lead to ApplicationAttemptNotFoundException
> ---
>
> Key: FLINK-8899
> URL: https://issues.apache.org/jira/browse/FLINK-8899
> Project: Flink
> Issue Type: Bug
> Components: ResourceManager, YARN
> Affects Versions: 1.5.0
> Reporter: Nico Kruber
> Priority: Major
> Labels: flip-6
> Fix For: 1.6.1, 1.7.0, 1.5.5
[jira] [Updated] (FLINK-8899) Submitting YARN job with FLIP-6 may lead to ApplicationAttemptNotFoundException
[ https://issues.apache.org/jira/browse/FLINK-8899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Rohrmann updated FLINK-8899:
---------------------------------
    Fix Version/s: 1.5.5

> Submitting YARN job with FLIP-6 may lead to ApplicationAttemptNotFoundException
> ---
>
> Key: FLINK-8899
> URL: https://issues.apache.org/jira/browse/FLINK-8899
> Project: Flink
> Issue Type: Bug
> Components: ResourceManager, YARN
> Affects Versions: 1.5.0
> Reporter: Nico Kruber
> Priority: Major
> Labels: flip-6
> Fix For: 1.6.1, 1.7.0, 1.5.4, 1.5.5
[jira] [Updated] (FLINK-8899) Submitting YARN job with FLIP-6 may lead to ApplicationAttemptNotFoundException
[ https://issues.apache.org/jira/browse/FLINK-8899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Rohrmann updated FLINK-8899:
---------------------------------
    Fix Version/s: (was: 1.6.0)
                   1.7.0
                   1.6.1

> Submitting YARN job with FLIP-6 may lead to ApplicationAttemptNotFoundException
> ---
>
> Key: FLINK-8899
> URL: https://issues.apache.org/jira/browse/FLINK-8899
> Project: Flink
> Issue Type: Bug
> Components: ResourceManager, YARN
> Affects Versions: 1.5.0
> Reporter: Nico Kruber
> Priority: Major
> Labels: flip-6
> Fix For: 1.5.3, 1.6.1, 1.7.0
[jira] [Updated] (FLINK-8899) Submitting YARN job with FLIP-6 may lead to ApplicationAttemptNotFoundException
[ https://issues.apache.org/jira/browse/FLINK-8899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Rohrmann updated FLINK-8899:
---------------------------------
    Fix Version/s: 1.5.1
                   1.6.0

> Submitting YARN job with FLIP-6 may lead to ApplicationAttemptNotFoundException
> ---
>
> Key: FLINK-8899
> URL: https://issues.apache.org/jira/browse/FLINK-8899
> Project: Flink
> Issue Type: Bug
> Components: ResourceManager, YARN
> Affects Versions: 1.5.0
> Reporter: Nico Kruber
> Priority: Major
> Labels: flip-6
> Fix For: 1.6.0, 1.5.1
[jira] [Updated] (FLINK-8899) Submitting YARN job with FLIP-6 may lead to ApplicationAttemptNotFoundException
[ https://issues.apache.org/jira/browse/FLINK-8899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Rohrmann updated FLINK-8899:
---------------------------------
    Priority: Major (was: Minor)

> Submitting YARN job with FLIP-6 may lead to ApplicationAttemptNotFoundException
> ---
>
> Key: FLINK-8899
> URL: https://issues.apache.org/jira/browse/FLINK-8899
> Project: Flink
> Issue Type: Bug
> Components: ResourceManager, YARN
> Affects Versions: 1.5.0
> Reporter: Nico Kruber
> Priority: Major
> Labels: flip-6
>
> Occasionally, running a simple word count as this
> {code}
> ./bin/flink run -m yarn-cluster -yjm 768 -ytm 3072 -ys 2 -p 20 -c org.apache.flink.streaming.examples.wordcount.WordCount ./examples/streaming/WordCount.jar --input /usr/share/doc/rsync-3.0.6/COPYING
> {code}
> leads to an {{ApplicationAttemptNotFoundException}} in the logs:
> {code}
> 2018-03-08 16:18:08,507 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job Streaming WordCount (df707a3c9817ddf5936efe56d427e2bd) switched from state RUNNING to FINISHED.
> 2018-03-08 16:18:08,508 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Stopping checkpoint coordinator for job df707a3c9817ddf5936efe56d427e2bd
> 2018-03-08 16:18:08,508 INFO org.apache.flink.runtime.checkpoint.StandaloneCompletedCheckpointStore - Shutting down
> 2018-03-08 16:18:08,536 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Job df707a3c9817ddf5936efe56d427e2bd reached globally terminal state FINISHED.
> 2018-03-08 16:18:08,611 INFO org.apache.flink.runtime.jobmaster.JobMaster - Stopping the JobMaster for job Streaming WordCount(df707a3c9817ddf5936efe56d427e2bd).
> 2018-03-08 16:18:08,634 INFO org.apache.flink.runtime.jobmaster.JobMaster - Close ResourceManager connection dcfdc329d61aae0ace2de26292c8916b: JobManager is shutting down..
> 2018-03-08 16:18:08,634 INFO org.apache.flink.yarn.YarnResourceManager - Disconnect job manager 0...@akka.tcp://fl...@ip-172-31-2-0.eu-west-1.compute.internal:38555/user/jobmanager_0 for job df707a3c9817ddf5936efe56d427e2bd from the resource manager.
> 2018-03-08 16:18:08,664 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPool - Suspending SlotPool.
> 2018-03-08 16:18:08,664 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPool - Stopping SlotPool.
> 2018-03-08 16:18:08,664 INFO org.apache.flink.runtime.jobmaster.JobManagerRunner - JobManagerRunner already shutdown.
> 2018-03-08 16:18:09,650 INFO org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Register TaskManager adc8090bdb3f7052943ff86bde7d2a7b at the SlotManager.
> 2018-03-08 16:18:09,654 INFO org.apache.flink.yarn.YarnResourceManager - Replacing old instance of worker for ResourceID container_1519984124671_0090_01_05
> 2018-03-08 16:18:09,654 INFO org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Unregister TaskManager adc8090bdb3f7052943ff86bde7d2a7b from the SlotManager.
> 2018-03-08 16:18:09,654 INFO org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Register TaskManager b975dbd16e0fd59c1168d978490a4b76 at the SlotManager.
> 2018-03-08 16:18:09,654 INFO org.apache.flink.yarn.YarnResourceManager - The target with resource ID container_1519984124671_0090_01_05 is already been monitored.
> 2018-03-08 16:18:09,992 INFO org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Register TaskManager 73c258a0dbad236501b8391971c330ba at the SlotManager.
> 2018-03-08 16:18:10,000 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.
> 2018-03-08 16:18:10,028 ERROR org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl - Exception on heartbeat
> org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException: Application attempt appattempt_1519984124671_0090_01 doesn't exist in ApplicationMasterService cache.
>     at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:403)
>     at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>     at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447)
>     at
[jira] [Updated] (FLINK-8899) Submitting YARN job with FLIP-6 may lead to ApplicationAttemptNotFoundException
[ https://issues.apache.org/jira/browse/FLINK-8899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Piotr Nowojski updated FLINK-8899:
---
         Priority: Minor  (was: Blocker)
    Fix Version/s:     (was: 1.5.0)

> Submitting YARN job with FLIP-6 may lead to ApplicationAttemptNotFoundException
> ---
>
>                 Key: FLINK-8899
>                 URL: https://issues.apache.org/jira/browse/FLINK-8899
>             Project: Flink
>          Issue Type: Bug
>          Components: ResourceManager, YARN
>    Affects Versions: 1.5.0
>            Reporter: Nico Kruber
>            Assignee: Piotr Nowojski
>            Priority: Minor
>              Labels: flip-6
[jira] [Updated] (FLINK-8899) Submitting YARN job with FLIP-6 may lead to ApplicationAttemptNotFoundException
[ https://issues.apache.org/jira/browse/FLINK-8899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Piotr Nowojski updated FLINK-8899:
---
    Affects Version/s:     (was: 1.6.0)

> Submitting YARN job with FLIP-6 may lead to ApplicationAttemptNotFoundException
> ---
>
>                 Key: FLINK-8899
>                 URL: https://issues.apache.org/jira/browse/FLINK-8899
>             Project: Flink
>          Issue Type: Bug
>          Components: ResourceManager, YARN
>    Affects Versions: 1.5.0
>            Reporter: Nico Kruber
>            Priority: Blocker
>              Labels: flip-6
>             Fix For: 1.5.0