Hi Kyle,

Thanks for the reply. It turns out I had not started Docker on the TaskManager nodes, so `docker run hello-world` failed there. After `sudo systemctl start docker`, `sudo docker run hello-world` ran successfully on the TaskManager nodes. However, even with this change, the original error persists when I rerun the job. Note that I am able to run the pipeline successfully in local mode on the JobManager (i.e. `--runner=FlinkRunner --environment_type=LOOPBACK`); it only fails on the distributed cluster. In case it helps, I've added a stripped-down sketch of how the pipeline is set up at the very bottom of this message, below the quoted thread.

Best,
Mike

On Wed, Oct 21, 2020 at 12:09 PM Kyle Weaver <[email protected]> wrote: > Are you able to SSH into your TaskManager nodes and run `docker run > hello-world` successfully? > > On Wed, Oct 21, 2020 at 11:04 AM Mike Lo <[email protected]> wrote: > >> Hi all, >> >> I'm following these instructions >> <https://beam.apache.org/documentation/runners/flink/> to try to run a >> Python SDK Beam job on a distributed Flink (Version: 1.8.2) cluster built >> using AWS EC2 instances. I've pulled the Flink Job Service Docker image (Flink >> 1.8 <https://hub.docker.com/r/apache/beam_flink1.8_job_server>) on my >> JobManager node and installed Docker on the other TaskManager nodes (but >> did not pull the job service image). Has anyone else run into this problem >> before or has some clue as to what the problem is? Any input is >> appreciated. Below are some more details. >> >> I submit the job using this command: >> >>> python -m meeting_detector \ >>> --input=/path/to/sample-data.csv \ >>> --output=/path/to/result.csv \ >>> --job_name=python-flink-job-test \ >>> --job_endpoint=localhost:8099 \ >>> --runner=PortableRunner \ >>> --environment_type=DOCKER >> >> >> And this is the traceback: >> >> INFO:apache_beam.runners.portability.portable_runner:Job state changed to >>> STOPPED >>> INFO:apache_beam.runners.portability.portable_runner:Job state changed >>> to STARTING >>> INFO:apache_beam.runners.portability.portable_runner:Job state changed >>> to RUNNING >>> ERROR:root:java.io.IOException: error=2, No such file or directory >>> INFO:apache_beam.runners.portability.portable_runner:Job state changed >>> to FAILED >>> Traceback (most recent call last): >>> File "/home/anaconda3/envs/fraud/lib/python3.6/runpy.py", line 193, in >>> _run_module_as_main >>> "__main__", mod_spec) >>> File "/home/anaconda3/envs/fraud/lib/python3.6/runpy.py", line 85, in >>> _run_code >>> exec(code, run_globals) >>> File "/data/jobs/meeting_detector.py", line 106, in <module> >>> run() >>> File "/data/jobs/meeting_detector.py", line 98, in run >>> | 'WriteUserScoreSums' >> beam.io.WriteToText(args.output) >>> File >>> "/home/anaconda3/envs/fraud/lib/python3.6/site-packages/apache_beam/pipeline.py", >>> line 556, in __exit__ >>> self.result.wait_until_finish() >>> File >>> "/home/anaconda3/envs/fraud/lib/python3.6/site-packages/apache_beam/runners/portability/portable_runner.py", >>> line 547, in wait_until_finish >>> raise self._runtime_exception >>> RuntimeError: Pipeline >>> python-flink-job-test_1faab21e-5b8a-4d62-ad67-879453b78d1e failed in state >>> FAILED: java.io.IOException: error=2, No such file or directory >> >> >> Below is the trace from the Docker Flink Job Service: >> >> [flink-akka.actor.default-dispatcher-2] INFO >>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Job >>> python-flink-job-test (7a03cd7dbe75dc23e2d00377ee9c2e1e) switched from >>> state FAILING to FAILED. 
>>> org.apache.flink.runtime.JobException: Recovery is suppressed by >>> NoRestartBackoffTimeStrategy >>> at >>> org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:110) >>> at >>> org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:76) >>> at >>> org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:192) >>> at >>> org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:186) >>> at >>> org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:180) >>> at >>> org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:496) >>> at >>> org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:380) >>> at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source) >>> at >>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >>> at java.lang.reflect.Method.invoke(Method.java:498) >>> at >>> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:284) >>> at >>> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:199) >>> at >>> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74) >>> at >>> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152) >>> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) >>> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) >>> at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) >>> at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) >>> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) >>> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) >>> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) >>> at akka.actor.Actor$class.aroundReceive(Actor.scala:517) >>> at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) >>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) >>> at akka.actor.ActorCell.invoke(ActorCell.scala:561) >>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) >>> at akka.dispatch.Mailbox.run(Mailbox.scala:225) >>> at akka.dispatch.Mailbox.exec(Mailbox.scala:235) >>> at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) >>> at >>> akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) >>> at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) >>> at >>> akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) >>> Caused by: java.lang.Exception: The user defined 'open()' method caused >>> an exception: java.io.IOException: Cannot run program "docker": error=2, No >>> such file or directory >>> at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:499) >>> at >>> org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:369) >>> at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:708) >>> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:533) >>> at java.lang.Thread.run(Thread.java:748) >>> Caused by: >>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException: >>> java.io.IOException: Cannot run program "docker": error=2, No such file or 
>>> directory >>> at >>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4966) >>> at >>> org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.<init>(DefaultJobBundleFactory.java:447) >>> at >>> org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.<init>(DefaultJobBundleFactory.java:432) >>> at >>> org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory.forStage(DefaultJobBundleFactory.java:299) >>> at >>> org.apache.beam.runners.fnexecution.control.DefaultExecutableStageContext.getStageBundleFactory(DefaultExecutableStageContext.java:38) >>> at >>> org.apache.beam.runners.fnexecution.control.ReferenceCountingExecutableStageContextFactory$WrappedContext.getStageBundleFactory(ReferenceCountingExecutableStageContextFactory.java:199) >>> at >>> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.open(FlinkExecutableStageFunction.java:137) >>> at >>> org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36) >>> at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:495) >>> ... 4 more >>> Caused by: java.io.IOException: Cannot run program "docker": error=2, No >>> such file or directory >>> at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048) >>> at >>> org.apache.beam.runners.fnexecution.environment.DockerCommand.runShortCommand(DockerCommand.java:186) >>> at >>> org.apache.beam.runners.fnexecution.environment.DockerCommand.runShortCommand(DockerCommand.java:168) >>> at >>> org.apache.beam.runners.fnexecution.environment.DockerCommand.runImage(DockerCommand.java:92) >>> at >>> org.apache.beam.runners.fnexecution.environment.DockerEnvironmentFactory.createEnvironment(DockerEnvironmentFactory.java:131) >>> at >>> org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:248) >>> at >>> org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:227) >>> at >>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3528) >>> at >>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2277) >>> at >>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2154) >>> at >>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2044) >>> at >>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache.get(LocalCache.java:3952) >>> at >>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3974) >>> at >>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4958) >>> at >>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4964) >>> ... 12 more >>> Caused by: java.io.IOException: error=2, No such file or directory >>> at java.lang.UNIXProcess.forkAndExec(Native Method) >>> at java.lang.UNIXProcess.<init>(UNIXProcess.java:247) >>> at java.lang.ProcessImpl.start(ProcessImpl.java:134) >>> at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029) >>> ... 
26 more >>> [flink-akka.actor.default-dispatcher-2] INFO >>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Discarding the >>> results produced by task execution edead64f179a6403320feb57ec4ed0db. >>> [flink-akka.actor.default-dispatcher-2] INFO >>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Discarding the >>> results produced by task execution edead64f179a6403320feb57ec4ed0db. >>> [flink-akka.actor.default-dispatcher-3] INFO >>> org.apache.flink.runtime.minicluster.MiniCluster - Shutting down Flink Mini >>> Cluster >>> [flink-akka.actor.default-dispatcher-2] INFO >>> org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Job >>> 7a03cd7dbe75dc23e2d00377ee9c2e1e reached globally terminal state FAILED. >>> [flink-akka.actor.default-dispatcher-3] INFO >>> org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Shutting down >>> rest endpoint. >>> [flink-akka.actor.default-dispatcher-2] INFO >>> org.apache.flink.runtime.taskexecutor.TaskExecutor - Stopping TaskExecutor >>> akka://flink/user/taskmanager_2. >>> [flink-akka.actor.default-dispatcher-2] INFO >>> org.apache.flink.runtime.taskexecutor.TaskExecutor - Close ResourceManager >>> connection 13d8396855b02af0c138796ff9fe4120. >>> [flink-akka.actor.default-dispatcher-4] INFO >>> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - >>> Closing TaskExecutor connection efe9264c-ea4a-49a1-a091-ad5266ed00da >>> because: The TaskExecutor is shutting down. >>> [flink-akka.actor.default-dispatcher-2] INFO >>> org.apache.flink.runtime.taskexecutor.slot.TaskSlotTableImpl - Free slot >>> TaskSlot(index:15, state:ACTIVE, resource profile: >>> ResourceProfile{managedMemory=8.000mb (8388608 bytes), >>> networkMemory=4.000mb (4194304 bytes)}, allocationId: >>> ab8d50ec71c77142a07556c2c778b36b, jobId: 7a03cd7dbe75dc23e2d00377ee9c2e1e). >>> [flink-akka.actor.default-dispatcher-2] INFO >>> org.apache.flink.runtime.taskexecutor.slot.TaskSlotTableImpl - Free slot >>> TaskSlot(index:13, state:ACTIVE, resource profile: >>> ResourceProfile{managedMemory=8.000mb (8388608 bytes), >>> networkMemory=4.000mb (4194304 bytes)}, allocationId: >>> 924a7b5b69bf9ce045b90c3779cacb8b, jobId: 7a03cd7dbe75dc23e2d00377ee9c2e1e). >>> [flink-akka.actor.default-dispatcher-2] INFO >>> org.apache.flink.runtime.taskexecutor.slot.TaskSlotTableImpl - Free slot >>> TaskSlot(index:14, state:ACTIVE, resource profile: >>> ResourceProfile{managedMemory=8.000mb (8388608 bytes), >>> networkMemory=4.000mb (4194304 bytes)}, allocationId: >>> f20cbec15d2eae109199b4a7f381ea6d, jobId: 7a03cd7dbe75dc23e2d00377ee9c2e1e). >>> [flink-akka.actor.default-dispatcher-2] INFO >>> org.apache.flink.runtime.taskexecutor.slot.TaskSlotTableImpl - Free slot >>> TaskSlot(index:10, state:ACTIVE, resource profile: >>> ResourceProfile{managedMemory=8.000mb (8388608 bytes), >>> networkMemory=4.000mb (4194304 bytes)}, allocationId: >>> 21724dc21eeb3b989cc61218b2ed0d33, jobId: 7a03cd7dbe75dc23e2d00377ee9c2e1e). >>> [flink-akka.actor.default-dispatcher-2] INFO >>> org.apache.flink.runtime.taskexecutor.slot.TaskSlotTableImpl - Free slot >>> TaskSlot(index:3, state:ACTIVE, resource profile: >>> ResourceProfile{managedMemory=8.000mb (8388608 bytes), >>> networkMemory=4.000mb (4194304 bytes)}, allocationId: >>> dfb7dbaf74369b8a88c1d3f4458d99fa, jobId: 7a03cd7dbe75dc23e2d00377ee9c2e1e). 
>>> [flink-akka.actor.default-dispatcher-2] INFO >>> org.apache.flink.runtime.taskexecutor.slot.TaskSlotTableImpl - Free slot >>> TaskSlot(index:7, state:ACTIVE, resource profile: >>> ResourceProfile{managedMemory=8.000mb (8388608 bytes), >>> networkMemory=4.000mb (4194304 bytes)}, allocationId: >>> ee95a6d3407ad30b8b92b96ed54ffd62, jobId: 7a03cd7dbe75dc23e2d00377ee9c2e1e). >>> [flink-akka.actor.default-dispatcher-2] INFO >>> org.apache.flink.runtime.taskexecutor.slot.TaskSlotTableImpl - Free slot >>> TaskSlot(index:12, state:ACTIVE, resource profile: >>> ResourceProfile{managedMemory=8.000mb (8388608 bytes), >>> networkMemory=4.000mb (4194304 bytes)}, allocationId: >>> e6b5f2defe813f0d5fa0ac6864d6fabf, jobId: 7a03cd7dbe75dc23e2d00377ee9c2e1e). >>> [flink-akka.actor.default-dispatcher-2] INFO >>> org.apache.flink.runtime.taskexecutor.slot.TaskSlotTableImpl - Free slot >>> TaskSlot(index:2, state:ACTIVE, resource profile: >>> ResourceProfile{managedMemory=8.000mb (8388608 bytes), >>> networkMemory=4.000mb (4194304 bytes)}, allocationId: >>> 0ac66cecfbeec45c08aa7574713fe713, jobId: 7a03cd7dbe75dc23e2d00377ee9c2e1e). >>> [flink-akka.actor.default-dispatcher-2] INFO >>> org.apache.flink.runtime.taskexecutor.slot.TaskSlotTableImpl - Free slot >>> TaskSlot(index:4, state:ACTIVE, resource profile: >>> ResourceProfile{managedMemory=8.000mb (8388608 bytes), >>> networkMemory=4.000mb (4194304 bytes)}, allocationId: >>> 2065cd8bd1148e8872907ba77253f11c, jobId: 7a03cd7dbe75dc23e2d00377ee9c2e1e). >>> [flink-akka.actor.default-dispatcher-2] INFO >>> org.apache.flink.runtime.taskexecutor.slot.TaskSlotTableImpl - Free slot >>> TaskSlot(index:11, state:ACTIVE, resource profile: >>> ResourceProfile{managedMemory=8.000mb (8388608 bytes), >>> networkMemory=4.000mb (4194304 bytes)}, allocationId: >>> e3410cb80b2049f07c6b28181fc0c8f7, jobId: 7a03cd7dbe75dc23e2d00377ee9c2e1e). >>> [flink-akka.actor.default-dispatcher-2] INFO >>> org.apache.flink.runtime.taskexecutor.slot.TaskSlotTableImpl - Free slot >>> TaskSlot(index:1, state:ACTIVE, resource profile: >>> ResourceProfile{managedMemory=8.000mb (8388608 bytes), >>> networkMemory=4.000mb (4194304 bytes)}, allocationId: >>> 81965b19791ad7d681d274e16a178f36, jobId: 7a03cd7dbe75dc23e2d00377ee9c2e1e). >>> [flink-akka.actor.default-dispatcher-2] INFO >>> org.apache.flink.runtime.taskexecutor.slot.TaskSlotTableImpl - Free slot >>> TaskSlot(index:5, state:ACTIVE, resource profile: >>> ResourceProfile{managedMemory=8.000mb (8388608 bytes), >>> networkMemory=4.000mb (4194304 bytes)}, allocationId: >>> ffb3f5ec8f8af914156caa70aa1976f5, jobId: 7a03cd7dbe75dc23e2d00377ee9c2e1e). >>> [flink-akka.actor.default-dispatcher-2] INFO >>> org.apache.flink.runtime.taskexecutor.slot.TaskSlotTableImpl - Free slot >>> TaskSlot(index:9, state:ACTIVE, resource profile: >>> ResourceProfile{managedMemory=8.000mb (8388608 bytes), >>> networkMemory=4.000mb (4194304 bytes)}, allocationId: >>> 833c6fb3b038dbca13c9507b9c9fe3e1, jobId: 7a03cd7dbe75dc23e2d00377ee9c2e1e). >>> [flink-akka.actor.default-dispatcher-2] INFO >>> org.apache.flink.runtime.taskexecutor.slot.TaskSlotTableImpl - Free slot >>> TaskSlot(index:0, state:ACTIVE, resource profile: >>> ResourceProfile{managedMemory=8.000mb (8388608 bytes), >>> networkMemory=4.000mb (4194304 bytes)}, allocationId: >>> 00a67feee8fd2a9dae42fa21224e4e1f, jobId: 7a03cd7dbe75dc23e2d00377ee9c2e1e). 
>>> [flink-akka.actor.default-dispatcher-2] INFO >>> org.apache.flink.runtime.taskexecutor.slot.TaskSlotTableImpl - Free slot >>> TaskSlot(index:6, state:ACTIVE, resource profile: >>> ResourceProfile{managedMemory=8.000mb (8388608 bytes), >>> networkMemory=4.000mb (4194304 bytes)}, allocationId: >>> bda8240ec3870e8408383388983b15f5, jobId: 7a03cd7dbe75dc23e2d00377ee9c2e1e). >>> [flink-akka.actor.default-dispatcher-2] INFO >>> org.apache.flink.runtime.taskexecutor.slot.TaskSlotTableImpl - Free slot >>> TaskSlot(index:8, state:ACTIVE, resource profile: >>> ResourceProfile{managedMemory=8.000mb (8388608 bytes), >>> networkMemory=4.000mb (4194304 bytes)}, allocationId: >>> 037a99d1e2c6e53b14a1131c86ec707c, jobId: 7a03cd7dbe75dc23e2d00377ee9c2e1e). >>> [flink-akka.actor.default-dispatcher-4] INFO >>> org.apache.flink.runtime.jobmaster.JobMaster - Stopping the JobMaster for >>> job python-flink-job-test(7a03cd7dbe75dc23e2d00377ee9c2e1e). >>> [flink-runner-job-invoker] ERROR >>> org.apache.beam.runners.jobsubmission.JobInvocation - Error during job >>> invocation python-flink-job-test_1faab21e-5b8a-4d62-ad67-879453b78d1e. >>> [flink-akka.actor.default-dispatcher-5] INFO >>> org.apache.flink.runtime.taskexecutor.JobLeaderService - Stop job leader >>> service. >>> [flink-akka.actor.default-dispatcher-5] INFO >>> org.apache.flink.runtime.state.TaskExecutorLocalStateStoresManager - >>> Shutting down TaskExecutorLocalStateStoresManager. >>> java.util.concurrent.ExecutionException: >>> org.apache.flink.runtime.client.JobExecutionException: Job execution failed. >>> at >>> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) >>> at >>> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) >>> at >>> org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:864) >>> at >>> org.apache.beam.runners.flink.FlinkBatchPortablePipelineTranslator$BatchTranslationContext.execute(FlinkBatchPortablePipelineTranslator.java:194) >>> at >>> org.apache.beam.runners.flink.FlinkPipelineRunner.runPipelineWithTranslator(FlinkPipelineRunner.java:116) >>> at >>> org.apache.beam.runners.flink.FlinkPipelineRunner.run(FlinkPipelineRunner.java:83) >>> at >>> org.apache.beam.runners.jobsubmission.JobInvocation.runPipeline(JobInvocation.java:83) >>> at >>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125) >>> at >>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57) >>> at >>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78) >>> at >>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) >>> at >>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) >>> at java.lang.Thread.run(Thread.java:748) >>> Caused by: org.apache.flink.runtime.client.JobExecutionException: Job >>> execution failed. 
>>> at >>> org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:147) >>> at >>> org.apache.flink.client.program.PerJobMiniClusterFactory$PerJobMiniClusterJobClient.lambda$getJobExecutionResult$2(PerJobMiniClusterFactory.java:175) >>> at >>> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602) >>> at >>> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577) >>> at >>> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) >>> at >>> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) >>> at >>> org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:874) >>> at akka.dispatch.OnComplete.internal(Future.scala:264) >>> at akka.dispatch.OnComplete.internal(Future.scala:261) >>> at akka.dispatch.japi$CallbackBridge.apply(Future.scala:191) >>> at akka.dispatch.japi$CallbackBridge.apply(Future.scala:188) >>> at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36) >>> at >>> org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:74) >>> at >>> scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44) >>> at >>> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252) >>> at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:572) >>> at >>> akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:22) >>> at >>> akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:21) >>> at scala.concurrent.Future$$anonfun$andThen$1.apply(Future.scala:436) >>> at scala.concurrent.Future$$anonfun$andThen$1.apply(Future.scala:435) >>> at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36) >>> at >>> akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55) >>> at >>> akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91) >>> at >>> akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91) >>> at >>> akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91) >>> at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72) >>> at >>> akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:90) >>> at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40) >>> at >>> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44) >>> at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) >>> at >>> akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) >>> at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) >>> at >>> akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) >>> Caused by: org.apache.flink.runtime.JobException: Recovery is suppressed >>> by NoRestartBackoffTimeStrategy >>> at >>> org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:110) >>> at >>> org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:76) >>> at >>> org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:192) >>> at >>> org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:186) >>> at >>> 
org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:180) >>> at >>> org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:496) >>> at >>> org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:380) >>> at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source) >>> at >>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >>> at java.lang.reflect.Method.invoke(Method.java:498) >>> at >>> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:284) >>> at >>> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:199) >>> at >>> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74) >>> at >>> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152) >>> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) >>> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) >>> at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) >>> at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) >>> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) >>> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) >>> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) >>> at akka.actor.Actor$class.aroundReceive(Actor.scala:517) >>> at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) >>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) >>> at akka.actor.ActorCell.invoke(ActorCell.scala:561) >>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) >>> at akka.dispatch.Mailbox.run(Mailbox.scala:225) >>> at akka.dispatch.Mailbox.exec(Mailbox.scala:235) >>> ... 
4 more >>> Caused by: java.lang.Exception: The user defined 'open()' method caused >>> an exception: java.io.IOException: Cannot run program "docker": error=2, No >>> such file or directory >>> at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:499) >>> at >>> org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:369) >>> at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:708) >>> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:533) >>> at java.lang.Thread.run(Thread.java:748) >>> Caused by: >>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException: >>> java.io.IOException: Cannot run program "docker": error=2, No such file or >>> directory >>> at >>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4966) >>> at >>> org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.<init>(DefaultJobBundleFactory.java:447) >>> at >>> org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.<init>(DefaultJobBundleFactory.java:432) >>> at >>> org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory.forStage(DefaultJobBundleFactory.java:299) >>> at >>> org.apache.beam.runners.fnexecution.control.DefaultExecutableStageContext.getStageBundleFactory(DefaultExecutableStageContext.java:38) >>> at >>> org.apache.beam.runners.fnexecution.control.ReferenceCountingExecutableStageContextFactory$WrappedContext.getStageBundleFactory(ReferenceCountingExecutableStageContextFactory.java:199) >>> at >>> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.open(FlinkExecutableStageFunction.java:137) >>> at >>> org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36) >>> at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:495) >>> ... 
4 more >>> Caused by: java.io.IOException: Cannot run program "docker": error=2, No >>> such file or directory >>> at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048) >>> at >>> org.apache.beam.runners.fnexecution.environment.DockerCommand.runShortCommand(DockerCommand.java:186) >>> at >>> org.apache.beam.runners.fnexecution.environment.DockerCommand.runShortCommand(DockerCommand.java:168) >>> at >>> org.apache.beam.runners.fnexecution.environment.DockerCommand.runImage(DockerCommand.java:92) >>> at >>> org.apache.beam.runners.fnexecution.environment.DockerEnvironmentFactory.createEnvironment(DockerEnvironmentFactory.java:131) >>> at >>> org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:248) >>> at >>> org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:227) >>> at >>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3528) >>> at >>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2277) >>> at >>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2154) >>> at >>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2044) >>> at >>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache.get(LocalCache.java:3952) >>> at >>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3974) >>> at >>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4958) >>> at >>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4964) >>> ... 12 more >>> Caused by: java.io.IOException: error=2, No such file or directory >>> at java.lang.UNIXProcess.forkAndExec(Native Method) >>> at java.lang.UNIXProcess.<init>(UNIXProcess.java:247) >>> at java.lang.ProcessImpl.start(ProcessImpl.java:134) >>> at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029) >>> ... 26 more >>> [flink-akka.actor.default-dispatcher-4] INFO >>> org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl - Suspending >>> SlotPool. >>> [flink-akka.actor.default-dispatcher-4] INFO >>> org.apache.flink.runtime.jobmaster.JobMaster - Close ResourceManager >>> connection 13d8396855b02af0c138796ff9fe4120: JobManager is shutting down.. >>> [flink-akka.actor.default-dispatcher-4] INFO >>> org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl - Stopping >>> SlotPool. >>> [flink-akka.actor.default-dispatcher-3] INFO >>> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - >>> Disconnect job manager >>> ac69ec5dfd8c2b5d90240c1af14c46de@akka://flink/user/jobmanager_3 >>> for job 7a03cd7dbe75dc23e2d00377ee9c2e1e from the resource manager. >>> [ForkJoinPool.commonPool-worker-2] INFO >>> org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Removing cache >>> directory /tmp/flink-web-ui >>> [ForkJoinPool.commonPool-worker-2] INFO >>> org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Shut down >>> complete. >>> [flink-akka.actor.default-dispatcher-2] INFO >>> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Shut >>> down cluster because application is in CANCELED, diagnostics >>> DispatcherResourceManagerComponent has been closed.. 
>>> [flink-akka.actor.default-dispatcher-2] INFO >>> org.apache.flink.runtime.entrypoint.component.DispatcherResourceManagerComponent >>> - Closing components. >>> [flink-akka.actor.default-dispatcher-2] INFO >>> org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess - >>> Stopping SessionDispatcherLeaderProcess. >>> [flink-akka.actor.default-dispatcher-3] INFO >>> org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Stopping >>> dispatcher akka://flink/user/dispatcher. >>> [flink-akka.actor.default-dispatcher-3] INFO >>> org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Stopping all >>> currently running jobs of dispatcher akka://flink/user/dispatcher. >>> [flink-akka.actor.default-dispatcher-5] INFO >>> org.apache.flink.runtime.io.disk.FileChannelManagerImpl - >>> FileChannelManager removed spill file directory >>> /tmp/flink-io-d968bae7-ae23-4528-8e03-32a9173acdd0 >>> [flink-akka.actor.default-dispatcher-4] INFO >>> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl - >>> Closing the SlotManager. >>> [flink-akka.actor.default-dispatcher-4] INFO >>> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl - >>> Suspending the SlotManager. >>> [flink-akka.actor.default-dispatcher-5] INFO >>> org.apache.flink.runtime.io.network.NettyShuffleEnvironment - Shutting down >>> the network environment and its components. >>> [flink-akka.actor.default-dispatcher-3] INFO >>> org.apache.flink.runtime.rest.handler.legacy.backpressure.BackPressureRequestCoordinator >>> - Shutting down back pressure request coordinator. >>> [flink-akka.actor.default-dispatcher-3] INFO >>> org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Stopped >>> dispatcher akka://flink/user/dispatcher. >>> [flink-akka.actor.default-dispatcher-5] INFO >>> org.apache.flink.runtime.io.disk.FileChannelManagerImpl - >>> FileChannelManager removed spill file directory >>> /tmp/flink-netty-shuffle-9663a939-6608-4337-94b2-f32bfdca367d >>> [flink-akka.actor.default-dispatcher-5] INFO >>> org.apache.flink.runtime.taskexecutor.KvStateService - Shutting down the >>> kvState service and its components. >>> [flink-akka.actor.default-dispatcher-5] INFO >>> org.apache.flink.runtime.taskexecutor.JobLeaderService - Stop job leader >>> service. >>> [flink-akka.actor.default-dispatcher-5] INFO >>> org.apache.flink.runtime.filecache.FileCache - removed file cache directory >>> /tmp/flink-dist-cache-a640b6a1-c444-4886-8b6f-98aed046bf2b >>> [flink-akka.actor.default-dispatcher-5] INFO >>> org.apache.flink.runtime.taskexecutor.TaskExecutor - Stopped TaskExecutor >>> akka://flink/user/taskmanager_2. >>> [flink-akka.actor.default-dispatcher-5] INFO >>> org.apache.flink.runtime.rpc.akka.AkkaRpcService - Stopping Akka RPC >>> service. >>> [flink-metrics-2] INFO >>> akka.remote.RemoteActorRefProvider$RemotingTerminator - Shutting down >>> remote daemon. >>> [flink-metrics-2] INFO >>> akka.remote.RemoteActorRefProvider$RemotingTerminator - Remote daemon shut >>> down; proceeding with flushing remote transports. >>> [flink-metrics-2] INFO >>> akka.remote.RemoteActorRefProvider$RemotingTerminator - Remoting shut down. >>> [flink-metrics-2] INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService >>> - Stopping Akka RPC service. >>> [flink-metrics-2] INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService >>> - Stopped Akka RPC service. 
>>> [flink-akka.actor.default-dispatcher-3] INFO >>> org.apache.flink.runtime.blob.PermanentBlobCache - Shutting down BLOB cache >>> [flink-akka.actor.default-dispatcher-3] INFO >>> org.apache.flink.runtime.blob.TransientBlobCache - Shutting down BLOB cache >>> [flink-akka.actor.default-dispatcher-3] INFO >>> org.apache.flink.runtime.blob.BlobServer - Stopped BLOB server at >>> 0.0.0.0:42851 >>> [flink-akka.actor.default-dispatcher-3] INFO >>> org.apache.flink.runtime.rpc.akka.AkkaRpcService - Stopped Akka RPC service. >> >> >> Best, >> Mike >> >
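
P.S. In case it helps, here is a stripped-down sketch of how the pipeline is set up and submitted. It is illustrative only: the actual meeting_detector transforms are replaced with a plain read/write so that the pipeline options are easier to see.

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # Same flags I pass on the command line. With --runner=FlinkRunner and
    # --environment_type=LOOPBACK the pipeline runs fine locally on the
    # JobManager; PortableRunner with --environment_type=DOCKER is the
    # combination that fails on the distributed cluster.
    options = PipelineOptions([
        "--runner=PortableRunner",
        "--job_name=python-flink-job-test",
        "--job_endpoint=localhost:8099",
        "--environment_type=DOCKER",
    ])

    with beam.Pipeline(options=options) as pipeline:
        (pipeline
         | "ReadInput" >> beam.io.ReadFromText("/path/to/sample-data.csv")
         | "WriteResults" >> beam.io.WriteToText("/path/to/result.csv"))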
