Hi everyone,

We are attempting to run flink 1.2 in a distributed dockerized environment
and are running into issues when running jobs in parallel.

The exception we are getting fairly quickly after start up is:

org.apache.flink.runtime.io.network.partition.PartitionNotFoundException:
Partition d3d8404aa26bedafd77e88bdfd88375b@84037703da6706cd1017f53fd8b818cd
not found.
        at 
org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.failPartitionRequest(RemoteInputChannel.java:204)
        at 
org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.retriggerSubpartitionRequest(RemoteInputChannel.java:129)
        at 
org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.retriggerPartitionRequest(SingleInputGate.java:331)
        at 
org.apache.flink.runtime.taskmanager.Task.onPartitionStateUpdate(Task.java:1244)
        at org.apache.flink.runtime.taskmanager.Task$2.apply(Task.java:1082)
        at org.apache.flink.runtime.taskmanager.Task$2.apply(Task.java:1077)
        at 
org.apache.flink.runtime.concurrent.impl.FlinkFuture$5.onComplete(FlinkFuture.java:259)
        at akka.dispatch.OnComplete.internal(Future.scala:248)
        at akka.dispatch.OnComplete.internal(Future.scala:245)
        at akka.dispatch.japi$CallbackBridge.apply(Future.scala:175)
        at akka.dispatch.japi$CallbackBridge.apply(Future.scala:172)
        at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
        at 
akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
        at 
akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91)
        at 
akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
        at 
akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
        at 
scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
        at 
akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:90)
        at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
        at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

This only occurs when running in parallel but I don't have a lot to go on
from the exception. We have configured the following ports:
jobmanager.rpc.port: 6123
taskmanager.rpc.port: 6122
taskmanager.data.port: 6121

And have mapped the docker ports 6121 and 6122 on the task managers as well
as 6123 on the job manager.

Does anyone have any suggestions for other places to look or settings to
try?

Thanks,
David

Reply via email to