Hello,
After running a successful batch job on Flink 1.10.1 with Beam 2.22.0 I was
expecting to see the job in the "completed" section of the Flink
webinterface. That was not the case, the following log of the
taskmanager at DEBUG level shows that something within the shutdown of the
job might went off - it looks like this is a pure Flink issue to me:
2020-07-23 04:33:15,992 DEBUG org.apache.beam.sdk.io.cassandra.CassandraIO
- Waiting for a batch of 100 Cassandra writes to be
executed...
2020-07-23 04:33:16,001 DEBUG org.apache.beam.sdk.io.cassandra.CassandraIO
- Waiting for a batch of 100 Cassandra writes to be
executed...
2020-07-23 04:33:16,010 DEBUG org.apache.beam.sdk.io.cassandra.CassandraIO
- Waiting for a batch of 100 Cassandra writes to be
executed...
2020-07-23 04:33:16,018 DEBUG
org.apache.flink.runtime.io.network.partition.ResultPartition -
ReleaseOnConsumptionResultPartition
978f00b319eb407349b22999d5930c1a@a66baf0819fc88d57b3031111927c170
[PIPELINED, 1 subpartitions, 0 pending consumptions]: Received consumed
notification for subpartition 0.
2020-07-23 04:33:16,018 DEBUG
org.apache.flink.runtime.io.network.partition.ResultPartitionManager -
Received consume notification from ReleaseOnConsumptionResultPartition
978f00b319eb407349b22999d5930c1a@a66baf0819fc88d57b3031111927c170
[PIPELINED, 1 subpartitions, 0 pending consumptions].
2020-07-23 04:33:16,018 DEBUG
org.apache.flink.runtime.io.network.partition.ResultPartition - FlatMap
(FlatMap at
BigQueryIO.TypedRead/PassThroughThenCleanup/ParMultiDo(Identity).out0)
(1/1) (a66baf0819fc88d57b3031111927c170): Releasing
ReleaseOnConsumptionResultPartition
978f00b319eb407349b22999d5930c1a@a66baf0819fc88d57b3031111927c170
[PIPELINED, 1 subpartitions, 0 pending consumptions].
2020-07-23 04:33:16,018 DEBUG
org.apache.flink.runtime.io.network.partition.PipelinedSubpartition -
FlatMap (FlatMap at
BigQueryIO.TypedRead/PassThroughThenCleanup/ParMultiDo(Identity).out0)
(1/1) (a66baf0819fc88d57b3031111927c170): Released PipelinedSubpartition#0
[number of buffers: 10653067 (349079658869 bytes), number of buffers in
backlog: 0, finished? true, read view? false].
2020-07-23 04:33:16,018 DEBUG
org.apache.flink.runtime.io.network.partition.ResultPartitionManager -
Released partition 978f00b319eb407349b22999d5930c1a produced by
a66baf0819fc88d57b3031111927c170.
2020-07-23 04:33:16,022 DEBUG com.datastax.driver.core.Connection
-
Connection[cluster1-dc1-service.cass-operator.svc.cluster.local/10.70.67.5:9042-1,
inFlight=0, closed=true] closing connection
2020-07-23 04:33:16,022 DEBUG com.datastax.driver.core.Host.STATES
- [cluster1-dc1-service.cass-operator.svc.cluster.local/
10.70.67.5:9042]
Connection[cluster1-dc1-service.cass-operator.svc.cluster.local/10.70.67.5:9042-1,
inFlight=0, closed=true] closed, remaining = 0
2020-07-23 04:33:16,023 DEBUG com.datastax.driver.core.Connection
-
Connection[cluster1-dc1-service.cass-operator.svc.cluster.local/10.70.69.3:9042-1,
inFlight=0, closed=true] closing connection
2020-07-23 04:33:16,023 DEBUG com.datastax.driver.core.Host.STATES
- [cluster1-dc1-service.cass-operator.svc.cluster.local/
10.70.69.3:9042]
Connection[cluster1-dc1-service.cass-operator.svc.cluster.local/10.70.69.3:9042-1,
inFlight=0, closed=true] closed, remaining = 0
2020-07-23 04:33:16,027 DEBUG com.datastax.driver.core.Connection
-
Connection[cluster1-dc1-service.cass-operator.svc.cluster.local/10.70.68.3:9042-2,
inFlight=0, closed=true] closing connection
2020-07-23 04:33:16,027 DEBUG com.datastax.driver.core.Host.STATES
- [cluster1-dc1-service.cass-operator.svc.cluster.local/
10.70.68.3:9042]
Connection[cluster1-dc1-service.cass-operator.svc.cluster.local/10.70.68.3:9042-2,
inFlight=0, closed=true] closed, remaining = 1
2020-07-23 04:33:16,036 DEBUG com.datastax.driver.core.Cluster
- Shutting down
2020-07-23 04:33:16,037 DEBUG com.datastax.driver.core.Connection
-
Connection[cluster1-dc1-service.cass-operator.svc.cluster.local/10.70.68.3:9042-1,
inFlight=0, closed=true] closing connection
2020-07-23 04:33:16,037 DEBUG com.datastax.driver.core.Host.STATES
- [cluster1-dc1-service.cass-operator.svc.cluster.local/
10.70.68.3:9042]
Connection[cluster1-dc1-service.cass-operator.svc.cluster.local/10.70.68.3:9042-1,
inFlight=0, closed=true] closed, remaining = 0
2020-07-23 04:33:16,706 DEBUG com.datastax.driver.core.RequestHandler
- onTimeout triggered but the response was completed by
another thread, cancelling (retryCount = 0, queryState =
QueryState(count=0, inProgress=false, cancelled=false), queryStateRef =
QueryState(count=0, inProgress=false, cancelled=false))
2020-07-23 04:33:17,404 DEBUG
org.apache.flink.runtime.taskexecutor.TaskExecutor - Received
heartbeat request from 71a048c9dccd2470373e40752eacd515.
2020-07-23 04:33:17,524 DEBUG
org.apache.flink.runtime.taskexecutor.TaskExecutor - Received
heartbeat request from ccbcdcf9b664f13ebdf97a20848bad64.
2020-07-23 04:33:18,254 DEBUG io.netty.buffer.PoolThreadCache
- Freed 113 thread-local buffer(s) from thread:
cluster1-nio-worker-0
2020-07-23 04:33:18,255 DEBUG io.netty.buffer.PoolThreadCache
- Freed 141 thread-local buffer(s) from thread:
cluster1-nio-worker-1
2020-07-23 04:33:18,260 DEBUG org.apache.flink.runtime.operators.BatchTask
- Finished task code: FlatMap (FlatMap at
CassandraIO.Write/ParDo(Write)/ParMultiDo(Write).output) (1/1)
2020-07-23 04:33:18,260 DEBUG org.apache.flink.runtime.operators.BatchTask
- Releasing all broadcast variables.: CHAIN MapPartition
(MapPartition at CassandraIO.Write/ParDo(Write)/ParMultiDo(Write)) ->
FlatMap (FlatMap at
CassandraIO.Write/ParDo(Write)/ParMultiDo(Write).output) (1/1)
2020-07-23 04:33:18,260 DEBUG org.apache.flink.runtime.operators.BatchTask
- Finished task code.: CHAIN MapPartition (MapPartition
at CassandraIO.Write/ParDo(Write)/ParMultiDo(Write)) -> FlatMap (FlatMap at
CassandraIO.Write/ParDo(Write)/ParMultiDo(Write).output) (1/1)
2020-07-23 04:33:18,260 DEBUG
org.apache.flink.runtime.io.network.partition.PipelinedSubpartition -
CHAIN MapPartition (MapPartition at
CassandraIO.Write/ParDo(Write)/ParMultiDo(Write)) -> FlatMap (FlatMap at
CassandraIO.Write/ParDo(Write)/ParMultiDo(Write).output) (1/1)
(119e10b4eac40a98a3a205bd77ebe1b6): Finished PipelinedSubpartition#0
[number of buffers: 1 (0 bytes), number of buffers in backlog: 0, finished?
true, read view? false].
2020-07-23 04:33:18,260 INFO org.apache.flink.runtime.taskmanager.Task
- CHAIN MapPartition (MapPartition at
CassandraIO.Write/ParDo(Write)/ParMultiDo(Write)) -> FlatMap (FlatMap at
CassandraIO.Write/ParDo(Write)/ParMultiDo(Write).output) (1/1)
(119e10b4eac40a98a3a205bd77ebe1b6) switched from RUNNING to FINISHED.
2020-07-23 04:33:18,260 INFO org.apache.flink.runtime.taskmanager.Task
- Freeing task resources for CHAIN MapPartition
(MapPartition at CassandraIO.Write/ParDo(Write)/ParMultiDo(Write)) ->
FlatMap (FlatMap at
CassandraIO.Write/ParDo(Write)/ParMultiDo(Write).output) (1/1)
(119e10b4eac40a98a3a205bd77ebe1b6).
2020-07-23 04:33:18,260 DEBUG org.apache.flink.runtime.taskmanager.Task
- Release task CHAIN MapPartition (MapPartition at
CassandraIO.Write/ParDo(Write)/ParMultiDo(Write)) -> FlatMap (FlatMap at
CassandraIO.Write/ParDo(Write)/ParMultiDo(Write).output) (1/1) network
resources (state: FINISHED).
2020-07-23 04:33:18,260 DEBUG
org.apache.flink.runtime.io.network.TaskEventDispatcher -
unregistering
bb8a9a9603b8bec194c3f866ad62bd2c@119e10b4eac40a98a3a205bd77ebe1b6
2020-07-23 04:33:18,260 DEBUG
org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate -
CHAIN MapPartition (MapPartition at
CassandraIO.Write/ParDo(Write)/ParMultiDo(Write)) -> FlatMap (FlatMap at
CassandraIO.Write/ParDo(Write)/ParMultiDo(Write).output) (1/1)
(119e10b4eac40a98a3a205bd77ebe1b6): Releasing
org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate@63f8ba66.
2020-07-23 04:33:18,261 INFO org.apache.flink.runtime.taskmanager.Task
- Ensuring all FileSystem streams are closed for task CHAIN
MapPartition (MapPartition at
CassandraIO.Write/ParDo(Write)/ParMultiDo(Write)) -> FlatMap (FlatMap at
CassandraIO.Write/ParDo(Write)/ParMultiDo(Write).output) (1/1)
(119e10b4eac40a98a3a205bd77ebe1b6) [FINISHED]
2020-07-23 04:33:18,262 INFO
org.apache.flink.runtime.taskexecutor.TaskExecutor -
Un-registering task and sending final execution state FINISHED to
JobManager for task CHAIN MapPartition (MapPartition at
CassandraIO.Write/ParDo(Write)/ParMultiDo(Write)) -> FlatMap (FlatMap at
CassandraIO.Write/ParDo(Write)/ParMultiDo(Write).output) (1/1)
119e10b4eac40a98a3a205bd77ebe1b6.
2020-07-23 04:33:18,270 DEBUG
org.apache.flink.runtime.state.TaskExecutorLocalStateStoresManager -
Registered new local state store with configuration
LocalRecoveryConfig{localRecoveryMode=false,
localStateDirectories=LocalRecoveryDirectoryProvider{rootDirectories=[/tmp/localState/aid_5d0f2f5a96259f7000807b5a950623d4],
jobID=cb2fd3aa33389243e9ca27713f6b5321,
jobVertexID=1a1e117db913815bacd014f274ca9ba6, subtaskIndex=0}} for
cb2fd3aa33389243e9ca27713f6b5321 - 1a1e117db913815bacd014f274ca9ba6 - 0
under allocation id 5d0f2f5a96259f7000807b5a950623d4.
2020-07-23 04:33:18,275 DEBUG
org.apache.flink.runtime.io.network.partition.consumer.SingleInputGateFactory
- DataSink (org.apache.flink.api.java.io.DiscardingOutputFormat@2b56f5f8)
(1/1) (e7f7ccdd0e5dd88608088b623e37bb81): Created 1 input channels (local:
1, remote: 0, unknown: 0).
2020-07-23 04:33:18,277 INFO
org.apache.flink.runtime.taskexecutor.TaskExecutor - Received
task DataSink (org.apache.flink.api.java.io.DiscardingOutputFormat@2b56f5f8)
(1/1).
2020-07-23 04:33:18,277 INFO org.apache.flink.runtime.taskmanager.Task
- DataSink
(org.apache.flink.api.java.io.DiscardingOutputFormat@2b56f5f8) (1/1)
(e7f7ccdd0e5dd88608088b623e37bb81) switched from CREATED to DEPLOYING.
2020-07-23 04:33:18,278 INFO org.apache.flink.runtime.taskmanager.Task
- Creating FileSystem stream leak safety net for task
DataSink (org.apache.flink.api.java.io.DiscardingOutputFormat@2b56f5f8)
(1/1) (e7f7ccdd0e5dd88608088b623e37bb81) [DEPLOYING]
2020-07-23 04:33:18,278 INFO org.apache.flink.runtime.taskmanager.Task
- Loading JAR files for task DataSink
(org.apache.flink.api.java.io.DiscardingOutputFormat@2b56f5f8) (1/1)
(e7f7ccdd0e5dd88608088b623e37bb81) [DEPLOYING].
2020-07-23 04:33:18,279 DEBUG org.apache.flink.runtime.taskmanager.Task
- Getting user code class loader for task
e7f7ccdd0e5dd88608088b623e37bb81 at library cache manager took 1
milliseconds
2020-07-23 04:33:18,280 INFO org.apache.flink.runtime.taskmanager.Task
- Registering task at network: DataSink
(org.apache.flink.api.java.io.DiscardingOutputFormat@2b56f5f8) (1/1)
(e7f7ccdd0e5dd88608088b623e37bb81) [DEPLOYING].
2020-07-23 04:33:18,280 DEBUG
org.apache.flink.runtime.io.network.buffer.LocalBufferPool - Using a
local buffer pool with 0-8 buffers
2020-07-23 04:33:18,280 DEBUG
org.apache.flink.runtime.io.network.partition.consumer.LocalInputChannel -
LocalInputChannel
[bb8a9a9603b8bec194c3f866ad62bd2c@119e10b4eac40a98a3a205bd77ebe1b6]:
Requesting LOCAL subpartition 0 of partition
bb8a9a9603b8bec194c3f866ad62bd2c@119e10b4eac40a98a3a205bd77ebe1b6.
2020-07-23 04:33:18,280 DEBUG
org.apache.flink.runtime.io.network.partition.ResultPartitionManager -
Requesting subpartition 0 of ReleaseOnConsumptionResultPartition
bb8a9a9603b8bec194c3f866ad62bd2c@119e10b4eac40a98a3a205bd77ebe1b6
[PIPELINED, 1 subpartitions, 1 pending consumptions].
2020-07-23 04:33:18,280 DEBUG
org.apache.flink.runtime.io.network.partition.PipelinedSubpartition -
CHAIN MapPartition (MapPartition at
CassandraIO.Write/ParDo(Write)/ParMultiDo(Write)) -> FlatMap (FlatMap at
CassandraIO.Write/ParDo(Write)/ParMultiDo(Write).output) (1/1)
(119e10b4eac40a98a3a205bd77ebe1b6): Creating read view for subpartition 0
of partition
bb8a9a9603b8bec194c3f866ad62bd2c@119e10b4eac40a98a3a205bd77ebe1b6.
2020-07-23 04:33:18,280 DEBUG
org.apache.flink.runtime.io.network.partition.ResultPartition - Created
PipelinedSubpartitionView(index: 0) of ResultPartition
bb8a9a9603b8bec194c3f866ad62bd2c@119e10b4eac40a98a3a205bd77ebe1b6
2020-07-23 04:33:18,280 INFO org.apache.flink.runtime.taskmanager.Task
- DataSink
(org.apache.flink.api.java.io.DiscardingOutputFormat@2b56f5f8) (1/1)
(e7f7ccdd0e5dd88608088b623e37bb81) switched from DEPLOYING to RUNNING.
2020-07-23 04:33:18,280 DEBUG
org.apache.flink.runtime.operators.DataSinkTask - Start
registering input and output: DataSink
(org.apache.flink.api.java.io.DiscardingOutputFormat@2b56f5f8) (1/1)
2020-07-23 04:33:18,359 DEBUG
org.apache.flink.runtime.operators.DataSinkTask - Finished
registering input and output: DataSink
(org.apache.flink.api.java.io.DiscardingOutputFormat@2b56f5f8) (1/1)
2020-07-23 04:33:18,359 DEBUG
org.apache.flink.runtime.operators.DataSinkTask - Starting
data sink operator: DataSink
(org.apache.flink.api.java.io.DiscardingOutputFormat@2b56f5f8) (1/1)
2020-07-23 04:33:18,361 DEBUG
org.apache.flink.runtime.operators.DataSinkTask - Starting to
produce output: DataSink
(org.apache.flink.api.java.io.DiscardingOutputFormat@2b56f5f8) (1/1)
2020-07-23 04:33:18,361 DEBUG
org.apache.flink.runtime.io.network.partition.ResultPartition -
ReleaseOnConsumptionResultPartition
bb8a9a9603b8bec194c3f866ad62bd2c@119e10b4eac40a98a3a205bd77ebe1b6
[PIPELINED, 1 subpartitions, 0 pending consumptions]: Received consumed
notification for subpartition 0.
2020-07-23 04:33:18,361 DEBUG
org.apache.flink.runtime.io.network.partition.ResultPartitionManager -
Received consume notification from ReleaseOnConsumptionResultPartition
bb8a9a9603b8bec194c3f866ad62bd2c@119e10b4eac40a98a3a205bd77ebe1b6
[PIPELINED, 1 subpartitions, 0 pending consumptions].
2020-07-23 04:33:18,361 DEBUG
org.apache.flink.runtime.io.network.partition.ResultPartition - CHAIN
MapPartition (MapPartition at
CassandraIO.Write/ParDo(Write)/ParMultiDo(Write)) -> FlatMap (FlatMap at
CassandraIO.Write/ParDo(Write)/ParMultiDo(Write).output) (1/1)
(119e10b4eac40a98a3a205bd77ebe1b6): Releasing
ReleaseOnConsumptionResultPartition
bb8a9a9603b8bec194c3f866ad62bd2c@119e10b4eac40a98a3a205bd77ebe1b6
[PIPELINED, 1 subpartitions, 0 pending consumptions].
2020-07-23 04:33:18,361 DEBUG
org.apache.flink.runtime.io.network.partition.PipelinedSubpartition -
CHAIN MapPartition (MapPartition at
CassandraIO.Write/ParDo(Write)/ParMultiDo(Write)) -> FlatMap (FlatMap at
CassandraIO.Write/ParDo(Write)/ParMultiDo(Write).output) (1/1)
(119e10b4eac40a98a3a205bd77ebe1b6): Released PipelinedSubpartition#0
[number of buffers: 1 (4 bytes), number of buffers in backlog: 0, finished?
true, read view? false].
2020-07-23 04:33:18,361 DEBUG
org.apache.flink.runtime.io.network.partition.ResultPartitionManager -
Released partition bb8a9a9603b8bec194c3f866ad62bd2c produced by
119e10b4eac40a98a3a205bd77ebe1b6.
2020-07-23 04:33:18,361 DEBUG
org.apache.flink.runtime.operators.DataSinkTask - Finished
data sink operator: DataSink
(org.apache.flink.api.java.io.DiscardingOutputFormat@2b56f5f8) (1/1)
2020-07-23 04:33:18,361 INFO org.apache.flink.runtime.taskmanager.Task
- DataSink
(org.apache.flink.api.java.io.DiscardingOutputFormat@2b56f5f8) (1/1)
(e7f7ccdd0e5dd88608088b623e37bb81) switched from RUNNING to FINISHED.
2020-07-23 04:33:18,361 INFO org.apache.flink.runtime.taskmanager.Task
- Freeing task resources for DataSink
(org.apache.flink.api.java.io.DiscardingOutputFormat@2b56f5f8) (1/1)
(e7f7ccdd0e5dd88608088b623e37bb81).
2020-07-23 04:33:18,361 DEBUG org.apache.flink.runtime.taskmanager.Task
- Release task DataSink
(org.apache.flink.api.java.io.DiscardingOutputFormat@2b56f5f8) (1/1)
network resources (state: FINISHED).
2020-07-23 04:33:18,361 DEBUG
org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate -
DataSink (org.apache.flink.api.java.io.DiscardingOutputFormat@2b56f5f8)
(1/1) (e7f7ccdd0e5dd88608088b623e37bb81): Releasing
org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate@5df1a0e6.
2020-07-23 04:33:18,362 INFO org.apache.flink.runtime.taskmanager.Task
- Ensuring all FileSystem streams are closed for task
DataSink (org.apache.flink.api.java.io.DiscardingOutputFormat@2b56f5f8)
(1/1) (e7f7ccdd0e5dd88608088b623e37bb81) [FINISHED]
2020-07-23 04:33:18,362 INFO
org.apache.flink.runtime.taskexecutor.TaskExecutor -
Un-registering task and sending final execution state FINISHED to
JobManager for task DataSink
(org.apache.flink.api.java.io.DiscardingOutputFormat@2b56f5f8) (1/1)
e7f7ccdd0e5dd88608088b623e37bb81.
2020-07-23 04:33:18,546 DEBUG
org.apache.flink.runtime.taskexecutor.TaskExecutor - Free slot
with allocation id 5d0f2f5a96259f7000807b5a950623d4 because: Stopping
JobMaster for job
backload-root-0722083835-3f5b221(cb2fd3aa33389243e9ca27713f6b5321).
2020-07-23 04:33:18,556 DEBUG
org.apache.flink.runtime.taskexecutor.slot.TaskSlotTableImpl - Free slot
TaskSlot(index:0, state:ACTIVE, resource profile:
ResourceProfile{cpuCores=1.0000000000000000, taskHeapMemory=4.000gb
(4294967228 bytes), taskOffHeapMemory=0 bytes, managedMemory=3.400gb
(3650722256 bytes), networkMemory=870.400mb (912680564 bytes)},
allocationId: 5d0f2f5a96259f7000807b5a950623d4, jobId:
cb2fd3aa33389243e9ca27713f6b5321).
org.apache.flink.util.FlinkException: Stopping JobMaster for job
backload-root-0722083835-3f5b221(cb2fd3aa33389243e9ca27713f6b5321).
at org.apache.flink.runtime.jobmaster.JobMaster.onStop(JobMaster.java:343)
at
org.apache.flink.runtime.rpc.RpcEndpoint.internalCallOnStop(RpcEndpoint.java:218)
at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StartedState.terminate(AkkaRpcActor.java:514)
at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleControlMessage(AkkaRpcActor.java:176)
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
at akka.actor.ActorCell.invoke(ActorCell.scala:561)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
at akka.dispatch.Mailbox.run(Mailbox.scala:225)
at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at
akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at
akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2020-07-23 04:33:18,575 INFO
org.apache.flink.runtime.taskexecutor.JobLeaderService - Remove job
cb2fd3aa33389243e9ca27713f6b5321 from job leader monitoring.
2020-07-23 04:33:18,575 INFO
org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService
- Stopping ZooKeeperLeaderRetrievalService
/leader/cb2fd3aa33389243e9ca27713f6b5321/job_manager_lock.
2020-07-23 04:33:18,575 DEBUG
org.apache.flink.runtime.taskexecutor.TaskExecutor - Close
JobManager connection for job cb2fd3aa33389243e9ca27713f6b5321.
org.apache.flink.util.FlinkException: TaskExecutor akka.tcp://
[email protected]:6122/user/taskmanager_0 has no more allocated slots for
job cb2fd3aa33389243e9ca27713f6b5321.
at
org.apache.flink.runtime.taskexecutor.TaskExecutor.closeJobManagerConnectionIfNoAllocatedResources(TaskExecutor.java:1518)
at
org.apache.flink.runtime.taskexecutor.TaskExecutor.freeSlotInternal(TaskExecutor.java:1496)
at
org.apache.flink.runtime.taskexecutor.TaskExecutor.freeSlot(TaskExecutor.java:885)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:284)
at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:199)
at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152)
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
at akka.actor.ActorCell.invoke(ActorCell.scala:561)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
at akka.dispatch.Mailbox.run(Mailbox.scala:225)
at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at
akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at
akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2020-07-23 04:33:18,578 DEBUG
org.apache.flink.runtime.state.TaskExecutorLocalStateStoresManager -
Releasing local state under allocation id 5d0f2f5a96259f7000807b5a950623d4.
2020-07-23 04:33:18,586 DEBUG
org.apache.flink.runtime.taskexecutor.TaskExecutor - Close
JobManager connection for job cb2fd3aa33389243e9ca27713f6b5321.
org.apache.flink.util.FlinkException: Stopping JobMaster for job
backload-root-0722083835-3f5b221(cb2fd3aa33389243e9ca27713f6b5321).
at org.apache.flink.runtime.jobmaster.JobMaster.onStop(JobMaster.java:343)
at
org.apache.flink.runtime.rpc.RpcEndpoint.internalCallOnStop(RpcEndpoint.java:218)
at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StartedState.terminate(AkkaRpcActor.java:514)
at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleControlMessage(AkkaRpcActor.java:176)
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
at akka.actor.ActorCell.invoke(ActorCell.scala:561)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
at akka.dispatch.Mailbox.run(Mailbox.scala:225)
at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at
akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at
akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2020-07-23 04:33:18,587 INFO
org.apache.flink.runtime.taskexecutor.JobLeaderService - Cannot
reconnect to job cb2fd3aa33389243e9ca27713f6b5321 because it is not
registered.
2020-07-23 04:33:18,796 DEBUG
org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn - Got
ping response for sessionid: 0x10000014c9b0002 after 3ms
2020-07-23 04:33:27,424 DEBUG
org.apache.flink.runtime.taskexecutor.TaskExecutor - Received
heartbeat request from 71a048c9dccd2470373e40752eacd515.
This is the log from the jobmanager at the same time:
2020-07-23 04:33:18,548 INFO
org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService -
Stopping ZooKeeperLeaderElectionService
ZooKeeperLeaderElectionService{leaderPath='/leader/cb2fd3aa33389243e9ca27713f6b5321/job_manager_lock'}.
2020-07-23 04:33:18,547 INFO
org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl - Stopping
SlotPool.
2020-07-23 04:33:18,547 INFO
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager -
Disconnect job manager 84a046e3f81952c6a6f77ed20ff2457f
@akka.tcp://flink@flink-jobmanager:46110/user/jobmanager_0 for job
cb2fd3aa33389243e9ca27713f6b5321 from the resource manager.
"org.apache.flink.util.FlinkException: JobManager is shutting down.
at org.apache.flink.runtime.jobmaster.JobMaster.onStop(JobMaster.java:350)
at
org.apache.flink.runtime.rpc.RpcEndpoint.internalCallOnStop(RpcEndpoint.java:218)
at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StartedState.terminate(AkkaRpcActor.java:514)
at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleControlMessage(AkkaRpcActor.java:176)
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
at akka.actor.ActorCell.invoke(ActorCell.scala:561)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
at akka.dispatch.Mailbox.run(Mailbox.scala:225)
at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at
akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at
akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)"
2020-07-23 04:33:18,546 DEBUG org.apache.flink.runtime.jobmaster.JobMaster
- Close ResourceManager connection
71a048c9dccd2470373e40752eacd515.
2020-07-23 04:33:18,546 INFO
org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl - Suspending
SlotPool.
2020-07-23 04:33:18,540 INFO
org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService
- Stopping ZooKeeperLeaderRetrievalService /leader/resource_manager_lock.
2020-07-23 04:33:18,536 DEBUG org.apache.flink.runtime.jobmaster.JobMaster
- Disconnect TaskExecutor e85966367b1449d3948212bc6e7e2203
because: Stopping JobMaster for job
backload-root-0722083835-3f5b221(cb2fd3aa33389243e9ca27713f6b5321).
2020-07-23 04:33:18,536 INFO org.apache.flink.runtime.jobmaster.JobMaster
- Stopping the JobMaster for job
backload-root-0722083835-3f5b221(cb2fd3aa33389243e9ca27713f6b5321).
2020-07-23 04:33:18,521 INFO
org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Job
cb2fd3aa33389243e9ca27713f6b5321 reached globally terminal state FINISHED.
2020-07-23 04:33:18,520 DEBUG
org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn -
Reading reply sessionid:0x10000014c9b0000, packet:: clientPath:null
serverPath:null finished:false header:: 127,5 replyHeader:: 127,1040005,0
request::
'/flink_dev/flink_dev/running_job_registry/cb2fd3aa33389243e9ca27713f6b5321,#444f4e45,-1
response::
s{1040002,1040005,1595407490568,1595478798438,2,0,0,0,4,0,1040002}
2020-07-23 04:33:18,437 DEBUG
org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn -
Reading reply sessionid:0x10000014c9b0000, packet:: clientPath:null
serverPath:null finished:false header:: 126,3 replyHeader:: 126,1040004,0
request::
'/flink_dev/flink_dev/running_job_registry/cb2fd3aa33389243e9ca27713f6b5321,F
response::
s{1040002,1040003,1595407490568,1595407490577,1,0,0,0,7,0,1040002}
2020-07-23 04:33:18,437 DEBUG
org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn -
Reading reply sessionid:0x10000014c9b0000, packet:: clientPath:null
serverPath:null finished:false header:: 125,3 replyHeader:: 125,1040004,0
request:: '/flink_dev/flink_dev/running_job_registry,F response::
s{43,43,1594585599374,1594585599374,0,33,0,0,0,1,1040002}
2020-07-23 04:33:18,435 DEBUG
org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn -
Reading reply sessionid:0x10000014c9b0000, packet:: clientPath:null
serverPath:null finished:false header:: 124,3 replyHeader:: 124,1040004,0
request:: '/flink_dev/flink_dev,F response::
s{4,4,1594585221436,1594585221436,0,6,0,0,0,6,43}
2020-07-23 04:33:18,434 DEBUG
org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn -
Reading reply sessionid:0x10000014c9b0000, packet:: clientPath:null
serverPath:null finished:false header:: 123,3 replyHeader:: 123,1040004,0
request:: '/flink_dev,F response::
s{2,2,1594585221402,1594585221402,0,1,0,0,0,1,4}
2020-07-23 04:33:18,428 DEBUG
org.apache.flink.runtime.highavailability.zookeeper.ZooKeeperRunningJobsRegistry
- Setting scheduling state for job cb2fd3aa33389243e9ca27713f6b5321 to
DONE.
2020-07-23 04:33:18,420 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph - Job
backload-root-0722083835-3f5b221 (cb2fd3aa33389243e9ca27713f6b5321)
switched from state RUNNING to FINISHED.
2020-07-23 04:33:18,365 DEBUG
org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl - Adding
returned slot [5d0f2f5a96259f7000807b5a950623d4] to available slots
2020-07-23 04:33:18,365 DEBUG
org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl - Releasing
slot [SlotRequestId{1929430ebc67637965a78e9f6ccac6da}] because: Release
multi task slot because all children have been released.
2020-07-23 04:33:18,365 DEBUG
org.apache.flink.runtime.executiongraph.ExecutionGraph - Ignoring
transition of vertex DataSink
(org.apache.flink.api.java.io.DiscardingOutputFormat@2b56f5f8) (1/1) -
execution #0 to FAILED while being FINISHED.
2020-07-23 04:33:18,364 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph - DataSink
(org.apache.flink.api.java.io.DiscardingOutputFormat@2b56f5f8) (1/1)
(e7f7ccdd0e5dd88608088b623e37bb81) switched from RUNNING to FINISHED.
2020-07-23 04:33:18,282 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph - DataSink
(org.apache.flink.api.java.io.DiscardingOutputFormat@2b56f5f8) (1/1)
(e7f7ccdd0e5dd88608088b623e37bb81) switched from DEPLOYING to RUNNING.
2020-07-23 04:33:18,268 DEBUG
org.apache.flink.runtime.executiongraph.ExecutionGraph - Ignoring
transition of vertex CHAIN MapPartition (MapPartition at
CassandraIO.Write/ParDo(Write)/ParMultiDo(Write)) -> FlatMap (FlatMap at
CassandraIO.Write/ParDo(Write)/ParMultiDo(Write).output) (1/1) - execution
#0 to FAILED while being FINISHED.
2020-07-23 04:33:18,268 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph - CHAIN
MapPartition (MapPartition at
CassandraIO.Write/ParDo(Write)/ParMultiDo(Write)) -> FlatMap (FlatMap at
CassandraIO.Write/ParDo(Write)/ParMultiDo(Write).output) (1/1)
(119e10b4eac40a98a3a205bd77ebe1b6) switched from RUNNING to FINISHED.
2020-07-23 04:33:18,265 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph - Deploying
DataSink (org.apache.flink.api.java.io.DiscardingOutputFormat@2b56f5f8)
(1/1) (attempt #0) to e85966367b1449d3948212bc6e7e2203 @
flink-taskmanager-75775df955-vpwn6 (dataPort=42349)
2020-07-23 04:33:18,265 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph - DataSink
(org.apache.flink.api.java.io.DiscardingOutputFormat@2b56f5f8) (1/1)
(e7f7ccdd0e5dd88608088b623e37bb81) switched from SCHEDULED to DEPLOYING.
2020-07-23 04:33:18,264 DEBUG
org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager - Create
single task slot [SlotRequestId{dd6d5ded8886fe0a88a8a2915087dd62}] in multi
task slot [SlotRequestId{49150b51beabcbf5fbab1beb404600d9}] for group
1a1e117db913815bacd014f274ca9ba6.
2020-07-23 04:33:18,264 DEBUG
org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl - Received
slot request [SlotRequestId{dd6d5ded8886fe0a88a8a2915087dd62}] for task:
null
2020-07-23 04:33:18,264 DEBUG
org.apache.flink.runtime.scheduler.DefaultExecutionSlotAllocator -
Allocate slot with id SlotRequestId{dd6d5ded8886fe0a88a8a2915087dd62} for
execution 1a1e117db913815bacd014f274ca9ba6_0
2020-07-23 04:33:18,264 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph - DataSink
(org.apache.flink.api.java.io.DiscardingOutputFormat@2b56f5f8) (1/1)
(e7f7ccdd0e5dd88608088b623e37bb81) switched from CREATED to SCHEDULED.
2020-07-23 04:33:17,525 DEBUG org.apache.flink.runtime.jobmaster.JobMaster
- Received heartbeat from e85966367b1449d3948212bc6e7e2203.
2020-07-23 04:33:17,522 DEBUG org.apache.flink.runtime.jobmaster.JobMaster
- Trigger heartbeat request.
2020-07-23 04:33:17,405 DEBUG
org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl -
Received slot report from instance c015c75144e5c4fd0ad13d9d8e67f796:
SlotReport{slotsStatus=[SlotStatus{slotID=e85966367b1449d3948212bc6e7e2203_0,
resourceProfile=ResourceProfile{cpuCores=1.0000000000000000,
taskHeapMemory=4.000gb (4294967228 bytes), taskOffHeapMemory=0 bytes,
managedMemory=3.400gb (3650722256 bytes), networkMemory=870.400mb
(912680564 bytes)}, allocationID=5d0f2f5a96259f7000807b5a950623d4,
jobID=cb2fd3aa33389243e9ca27713f6b5321}]}.
2020-07-23 04:33:17,405 DEBUG
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager -
Received heartbeat from e85966367b1449d3948212bc6e7e2203.
2020-07-23 04:33:17,402 DEBUG
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager -
Trigger heartbeat request.
2020-07-23 04:33:17,262 DEBUG
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager -
Received heartbeat from ccbcdcf9b664f13ebdf97a20848bad64.
2020-07-23 04:33:17,262 DEBUG org.apache.flink.runtime.jobmaster.JobMaster
- Received heartbeat request from
71a048c9dccd2470373e40752eacd515.
2020-07-23 04:33:17,262 DEBUG
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager -
Trigger heartbeat request.
2020-07-23 04:33:13,539 DEBUG
org.apache.flink.runtime.rest.handler.legacy.metrics.MetricFetcherImpl -
Query metrics for akka.tcp://
[email protected]:37635/user/MetricQueryService_e85966367b1449d3948212bc6e7e2203
.
2020-07-23 04:33:13,537 DEBUG
org.apache.flink.runtime.rpc.akka.AkkaRpcService - Try to
connect to remote RPC endpoint with address akka.tcp://
[email protected]:37635/user/MetricQueryService_e85966367b1449d3948212bc6e7e2203.
Returning a
org.apache.flink.runtime.webmonitor.retriever.MetricQueryServiceGateway
gateway.
2020-07-23 04:33:13,533 DEBUG
org.apache.flink.runtime.rest.handler.legacy.metrics.MetricFetcherImpl -
Query metrics for akka.tcp://flink-metrics@flink-jobmanager
:39975/user/MetricQueryService.
2020-07-23 04:33:13,533 DEBUG
org.apache.flink.runtime.rpc.akka.AkkaRpcService - Try to
connect to remote RPC endpoint with address
akka.tcp://flink-metrics@flink-jobmanager:39975/user/MetricQueryService.
Returning a
org.apache.flink.runtime.webmonitor.retriever.MetricQueryServiceGateway
gateway.
2020-07-23 04:33:13,533 DEBUG
org.apache.flink.runtime.rest.handler.legacy.metrics.MetricFetcherImpl -
Retrieve metric query service gateway for
akka.tcp://flink-metrics@flink-jobmanager:39975/user/MetricQueryService
2020-07-23 04:33:13,533 DEBUG
org.apache.flink.runtime.rest.handler.legacy.metrics.MetricFetcherImpl -
Start fetching metrics.