Hi Team,

We have Flink jobs running in cluster mode. When I cancel the job and check
for status it still shows as running.

Below is logs generated.  I do not see any useful information. Could you
guys point me in a right direction to debug this , our flink jobs do heavy
processing , my understanding is on cancelling flink waits till operator
completes processing to store save point. So is it happening because one of
the operator is hung waiting for resource and is there a way to configure
timeout.


Appreciate your help.



10 Jan 2019 20:36:51,490 INFO org.apache.flink.runtime.taskmanager.Task -
Triggering cancellation of task code Source: control -> control-filter
(1/1) (c6f50328bb09ce1ad389e5ea614fc8a9).
10 Jan 2019 20:36:51,490 INFO org.apache.flink.runtime.taskmanager.Task -
Attempting to cancel task interaction-details (1/1)
(dc946a863e34f66a95df3a8e8d540cab).
10 Jan 2019 20:36:51,490 INFO org.apache.flink.runtime.taskmanager.Task -
interaction-details (1/1) (dc946a863e34f66a95df3a8e8d540cab) switched from
RUNNING to CANCELING.
10 Jan 2019 20:36:51,491 INFO org.apache.flink.runtime.taskmanager.Task -
Triggering cancellation of task code interaction-details (1/1)
(dc946a863e34f66a95df3a8e8d540cab).
10 Jan 2019 20:36:51,491 INFO org.apache.flink.runtime.taskmanager.Task -
Attempting to cancel task current-longests (1/1)
(2f1cee63ab479a8e9e492c990311ccc1).
10 Jan 2019 20:36:51,491 INFO org.apache.flink.runtime.taskmanager.Task -
current-longests (1/1) (2f1cee63ab479a8e9e492c990311ccc1) switched from
RUNNING to CANCELING.
10 Jan 2019 20:36:51,491 INFO org.apache.flink.runtime.taskmanager.Task -
Triggering cancellation of task code current-longests (1/1)
(2f1cee63ab479a8e9e492c990311ccc1).
10 Jan 2019 20:36:51,492 INFO org.apache.flink.runtime.taskmanager.Task -
Attempting to cancel task interactions-count (1/1)
(aff734ce398978cdc3ac40a1e530d0ed).
10 Jan 2019 20:36:51,492 INFO org.apache.flink.runtime.taskmanager.Task -
interactions-count (1/1) (aff734ce398978cdc3ac40a1e530d0ed) switched from
RUNNING to CANCELING.
10 Jan 2019 20:36:51,628 INFO org.apache.flink.runtime.taskmanager.Task -
Triggering cancellation of task code interactions-count (1/1)
(aff734ce398978cdc3ac40a1e530d0ed).
10 Jan 2019 20:36:51,680 INFO org.apache.flink.runtime.taskmanager.Task -
Attempting to cancel task agents-working-on-interactions (1/1)
(15aa5e81a66a46127ab85719b9df7279).
10 Jan 2019 20:36:51,680 INFO org.apache.flink.runtime.taskmanager.Task -
agents-working-on-interactions (1/1) (15aa5e81a66a46127ab85719b9df7279)
switched from RUNNING to CANCELING.
10 Jan 2019 20:36:51,680 INFO org.apache.flink.runtime.taskmanager.Task -
Triggering cancellation of task code agents-working-on-interactions (1/1)
(15aa5e81a66a46127ab85719b9df7279).
10 Jan 2019 20:36:51,811 ERROR
org.apache.flink.streaming.runtime.tasks.StreamTask - Could not shut down
timer service
java.lang.InterruptedException
        at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2067)
        at
java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1475)
        at
org.apache.flink.streaming.runtime.tasks.SystemProcessingTimeService.shutdownAndAwaitPending(SystemProcessingTimeService.java:197)
        at
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:317)
        at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718)
        at java.lang.Thread.run(Thread.java:748)
10 Jan 2019 20:36:51,811 ERROR
org.apache.flink.streaming.runtime.tasks.StreamTask - Could not shut down
timer service
java.lang.InterruptedException
        at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2067)
        at
java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1475)
        at
org.apache.flink.streaming.runtime.tasks.SystemProcessingTimeService.shutdownAndAwaitPending(SystemProcessingTimeService.java:197)
        at
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:317)
        at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718)
        at java.lang.Thread.run(Thread.java:748)
10 Jan 2019 20:36:51,813 INFO org.apache.flink.runtime.taskmanager.Task -
Attempting to cancel task subscription-management (1/1)
(f5e529744e645e6acf00d2c5aab2006e).
10 Jan 2019 20:37:07,156 INFO org.apache.flink.runtime.taskmanager.Task -
subscription-management (1/1) (f5e529744e645e6acf00d2c5aab2006e) switched
from RUNNING to CANCELING.
10 Jan 2019 20:37:07,157 INFO org.apache.flink.runtime.taskmanager.Task -
Triggering cancellation of task code subscription-management (1/1)
(f5e529744e645e6acf00d2c5aab2006e).
10 Jan 2019 20:37:07,164 INFO org.apache.flink.runtime.taskmanager.Task -
Attempting to cancel task filter-business-metrics -> Sink: data_feed (1/1)
(6c229624ea4735215fe98f5458c58c8e).
10 Jan 2019 20:37:07,164 INFO org.apache.flink.runtime.taskmanager.Task -
filter-business-metrics -> Sink: data_feed (1/1)
(6c229624ea4735215fe98f5458c58c8e) switched from RUNNING to CANCELING.
10 Jan 2019 20:37:07,164 INFO org.apache.flink.runtime.taskmanager.Task -
Triggering cancellation of task code filter-business-metrics -> Sink:
data_feed (1/1) (6c229624ea4735215fe98f5458c58c8e).
10 Jan 2019 20:37:07,182 INFO org.apache.flink.runtime.taskmanager.Task -
Attempting to cancel task interaction-details (1/1)
(dc946a863e34f66a95df3a8e8d540cab).
10 Jan 2019 20:37:07,182 INFO org.apache.flink.runtime.taskmanager.Task -
Task interaction-details (1/1) is already in state CANCELING
10 Jan 2019 20:37:07,182 INFO org.apache.flink.runtime.taskmanager.Task -
Attempting to cancel task current-longests (1/1)
(2f1cee63ab479a8e9e492c990311ccc1).
10 Jan 2019 20:37:07,182 INFO org.apache.flink.runtime.taskmanager.Task -
Task current-longests (1/1) is already in state CANCELING
10 Jan 2019 20:37:07,182 INFO org.apache.flink.runtime.taskmanager.Task -
Attempting to cancel task interactions-count (1/1)
(aff734ce398978cdc3ac40a1e530d0ed).
10 Jan 2019 20:37:07,182 INFO org.apache.flink.runtime.taskmanager.Task -
Task interactions-count (1/1) is already in state CANCELING
10 Jan 2019 20:37:07,182 INFO org.apache.flink.runtime.taskmanager.Task -
Attempting to cancel task agents-working-on-interactions (1/1)
(15aa5e81a66a46127ab85719b9df7279).
10 Jan 2019 20:37:07,182 INFO org.apache.flink.runtime.taskmanager.Task -
Task agents-working-on-interactions (1/1) is already in state CANCELING
10 Jan 2019 20:37:07,182 INFO org.apache.flink.runtime.taskmanager.Task -
Attempting to cancel task subscription-management (1/1)
(f5e529744e645e6acf00d2c5aab2006e).
10 Jan 2019 20:37:07,182 INFO org.apache.flink.runtime.taskmanager.Task -
Task subscription-management (1/1) is already in state CANCELING
10 Jan 2019 20:37:07,182 INFO org.apache.flink.runtime.taskmanager.Task -
Attempting to cancel task filter-business-metrics -> Sink: data_feed (1/1)
(6c229624ea4735215fe98f5458c58c8e).
10 Jan 2019 20:37:07,182 INFO org.apache.flink.runtime.taskmanager.Task -
Task filter-business-metrics -> Sink: data_feed (1/1) is already in state
CANCELING





Regards,
Mahantesh Patil.

Reply via email to