Hi Kyle,

Thanks for the reply. It turns out I had not started Docker on the TaskManager
nodes, so `docker run hello-world` failed. After

`sudo systemctl start docker`
`sudo docker run hello-world`

hello-world runs successfully on the TaskManager nodes. However, even with
this change, the original error remains when I try to run the job again.

Note that I was able to run the pipeline successfully in local mode on the
JobManager node (i.e. `--runner=FlinkRunner --environment_type=LOOPBACK`);
it only fails on the distributed cluster.
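
For reference, the successful local run is essentially the same command as in
my original message below, only with the runner and environment type switched
(roughly):

`python -m meeting_detector --input=/path/to/sample-data.csv --output=/path/to/result.csv --job_name=python-flink-job-test --runner=FlinkRunner --environment_type=LOOPBACK`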

Best,
Mike


On Wed, Oct 21, 2020 at 12:09 PM Kyle Weaver <[email protected]> wrote:

> Are you able to SSH into your TaskManager nodes and run `docker run
> hello-world` successfully?
>
> On Wed, Oct 21, 2020 at 11:04 AM Mike Lo <[email protected]> wrote:
>
>> Hi all,
>>
>> I'm following these instructions
>> <https://beam.apache.org/documentation/runners/flink/> to run a
>> Python SDK Beam job on a distributed Flink (version 1.8.2) cluster built
>> on AWS EC2 instances. I've pulled the Flink Job Service Docker image (Flink
>> 1.8 <https://hub.docker.com/r/apache/beam_flink1.8_job_server>) on my
>> JobManager node and installed Docker on the other TaskManager nodes (but
>> did not pull the job service image there). Has anyone run into this
>> problem before, or have a clue as to what the cause might be? Any input is
>> appreciated. More details are below.
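>>
>> For reference, the job service container was started on the JobManager node
>> roughly following the linked instructions (a sketch from memory; exact
>> flags may have differed):
>>
>>> docker run --net=host apache/beam_flink1.8_job_server:latest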
>>
>> I submit the job using this command:
>>
>>> python -m meeting_detector \
>>> --input=/path/to/sample-data.csv \
>>> --output=/path/to/result.csv \
>>> --job_name=python-flink-job-test \
>>> --job_endpoint=localhost:8099 \
>>> --runner=PortableRunner \
>>> --environment_type=DOCKER
>>
>>
>> And this is the traceback:
>>
>> INFO:apache_beam.runners.portability.portable_runner:Job state changed to
>>> STOPPED
>>> INFO:apache_beam.runners.portability.portable_runner:Job state changed
>>> to STARTING
>>> INFO:apache_beam.runners.portability.portable_runner:Job state changed
>>> to RUNNING
>>> ERROR:root:java.io.IOException: error=2, No such file or directory
>>> INFO:apache_beam.runners.portability.portable_runner:Job state changed
>>> to FAILED
>>> Traceback (most recent call last):
>>>   File "/home/anaconda3/envs/fraud/lib/python3.6/runpy.py", line 193, in
>>> _run_module_as_main
>>>     "__main__", mod_spec)
>>>   File "/home/anaconda3/envs/fraud/lib/python3.6/runpy.py", line 85, in
>>> _run_code
>>>     exec(code, run_globals)
>>>   File "/data/jobs/meeting_detector.py", line 106, in <module>
>>>     run()
>>>   File "/data/jobs/meeting_detector.py", line 98, in run
>>>     | 'WriteUserScoreSums' >> beam.io.WriteToText(args.output)
>>>   File
>>> "/home/anaconda3/envs/fraud/lib/python3.6/site-packages/apache_beam/pipeline.py",
>>> line 556, in __exit__
>>>     self.result.wait_until_finish()
>>>   File
>>> "/home/anaconda3/envs/fraud/lib/python3.6/site-packages/apache_beam/runners/portability/portable_runner.py",
>>> line 547, in wait_until_finish
>>>     raise self._runtime_exception
>>> RuntimeError: Pipeline
>>> python-flink-job-test_1faab21e-5b8a-4d62-ad67-879453b78d1e failed in state
>>> FAILED: java.io.IOException: error=2, No such file or directory
>>
>>
>> Below is the log from the Flink Job Service Docker container:
>>
>> [flink-akka.actor.default-dispatcher-2] INFO
>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Job
>>> python-flink-job-test (7a03cd7dbe75dc23e2d00377ee9c2e1e) switched from
>>> state FAILING to FAILED.
>>> org.apache.flink.runtime.JobException: Recovery is suppressed by
>>> NoRestartBackoffTimeStrategy
>>> at
>>> org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:110)
>>> at
>>> org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:76)
>>> at
>>> org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:192)
>>> at
>>> org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:186)
>>> at
>>> org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:180)
>>> at
>>> org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:496)
>>> at
>>> org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:380)
>>> at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source)
>>> at
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> at java.lang.reflect.Method.invoke(Method.java:498)
>>> at
>>> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:284)
>>> at
>>> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:199)
>>> at
>>> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
>>> at
>>> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152)
>>> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
>>> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
>>> at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>>> at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
>>> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
>>> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
>>> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
>>> at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
>>> at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
>>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
>>> at akka.actor.ActorCell.invoke(ActorCell.scala:561)
>>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
>>> at akka.dispatch.Mailbox.run(Mailbox.scala:225)
>>> at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
>>> at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>> at
>>> akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>> at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>> at
>>> akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>> Caused by: java.lang.Exception: The user defined 'open()' method caused
>>> an exception: java.io.IOException: Cannot run program "docker": error=2, No
>>> such file or directory
>>> at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:499)
>>> at
>>> org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:369)
>>> at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:708)
>>> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:533)
>>> at java.lang.Thread.run(Thread.java:748)
>>> Caused by:
>>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException:
>>> java.io.IOException: Cannot run program "docker": error=2, No such file or
>>> directory
>>> at
>>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4966)
>>> at
>>> org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.<init>(DefaultJobBundleFactory.java:447)
>>> at
>>> org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.<init>(DefaultJobBundleFactory.java:432)
>>> at
>>> org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory.forStage(DefaultJobBundleFactory.java:299)
>>> at
>>> org.apache.beam.runners.fnexecution.control.DefaultExecutableStageContext.getStageBundleFactory(DefaultExecutableStageContext.java:38)
>>> at
>>> org.apache.beam.runners.fnexecution.control.ReferenceCountingExecutableStageContextFactory$WrappedContext.getStageBundleFactory(ReferenceCountingExecutableStageContextFactory.java:199)
>>> at
>>> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.open(FlinkExecutableStageFunction.java:137)
>>> at
>>> org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
>>> at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:495)
>>> ... 4 more
>>> Caused by: java.io.IOException: Cannot run program "docker": error=2, No
>>> such file or directory
>>> at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
>>> at
>>> org.apache.beam.runners.fnexecution.environment.DockerCommand.runShortCommand(DockerCommand.java:186)
>>> at
>>> org.apache.beam.runners.fnexecution.environment.DockerCommand.runShortCommand(DockerCommand.java:168)
>>> at
>>> org.apache.beam.runners.fnexecution.environment.DockerCommand.runImage(DockerCommand.java:92)
>>> at
>>> org.apache.beam.runners.fnexecution.environment.DockerEnvironmentFactory.createEnvironment(DockerEnvironmentFactory.java:131)
>>> at
>>> org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:248)
>>> at
>>> org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:227)
>>> at
>>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3528)
>>> at
>>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2277)
>>> at
>>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2154)
>>> at
>>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2044)
>>> at
>>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache.get(LocalCache.java:3952)
>>> at
>>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3974)
>>> at
>>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4958)
>>> at
>>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4964)
>>> ... 12 more
>>> Caused by: java.io.IOException: error=2, No such file or directory
>>> at java.lang.UNIXProcess.forkAndExec(Native Method)
>>> at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
>>> at java.lang.ProcessImpl.start(ProcessImpl.java:134)
>>> at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
>>> ... 26 more
>>> [flink-akka.actor.default-dispatcher-2] INFO
>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Discarding the
>>> results produced by task execution edead64f179a6403320feb57ec4ed0db.
>>> [flink-akka.actor.default-dispatcher-2] INFO
>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Discarding the
>>> results produced by task execution edead64f179a6403320feb57ec4ed0db.
>>> [flink-akka.actor.default-dispatcher-3] INFO
>>> org.apache.flink.runtime.minicluster.MiniCluster - Shutting down Flink Mini
>>> Cluster
>>> [flink-akka.actor.default-dispatcher-2] INFO
>>> org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Job
>>> 7a03cd7dbe75dc23e2d00377ee9c2e1e reached globally terminal state FAILED.
>>> [flink-akka.actor.default-dispatcher-3] INFO
>>> org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Shutting down
>>> rest endpoint.
>>> [flink-akka.actor.default-dispatcher-2] INFO
>>> org.apache.flink.runtime.taskexecutor.TaskExecutor - Stopping TaskExecutor
>>> akka://flink/user/taskmanager_2.
>>> [flink-akka.actor.default-dispatcher-2] INFO
>>> org.apache.flink.runtime.taskexecutor.TaskExecutor - Close ResourceManager
>>> connection 13d8396855b02af0c138796ff9fe4120.
>>> [flink-akka.actor.default-dispatcher-4] INFO
>>> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager -
>>> Closing TaskExecutor connection efe9264c-ea4a-49a1-a091-ad5266ed00da
>>> because: The TaskExecutor is shutting down.
>>> [flink-akka.actor.default-dispatcher-2] INFO
>>> org.apache.flink.runtime.taskexecutor.slot.TaskSlotTableImpl - Free slot
>>> TaskSlot(index:15, state:ACTIVE, resource profile:
>>> ResourceProfile{managedMemory=8.000mb (8388608 bytes),
>>> networkMemory=4.000mb (4194304 bytes)}, allocationId:
>>> ab8d50ec71c77142a07556c2c778b36b, jobId: 7a03cd7dbe75dc23e2d00377ee9c2e1e).
>>> [flink-akka.actor.default-dispatcher-2] INFO
>>> org.apache.flink.runtime.taskexecutor.slot.TaskSlotTableImpl - Free slot
>>> TaskSlot(index:13, state:ACTIVE, resource profile:
>>> ResourceProfile{managedMemory=8.000mb (8388608 bytes),
>>> networkMemory=4.000mb (4194304 bytes)}, allocationId:
>>> 924a7b5b69bf9ce045b90c3779cacb8b, jobId: 7a03cd7dbe75dc23e2d00377ee9c2e1e).
>>> [flink-akka.actor.default-dispatcher-2] INFO
>>> org.apache.flink.runtime.taskexecutor.slot.TaskSlotTableImpl - Free slot
>>> TaskSlot(index:14, state:ACTIVE, resource profile:
>>> ResourceProfile{managedMemory=8.000mb (8388608 bytes),
>>> networkMemory=4.000mb (4194304 bytes)}, allocationId:
>>> f20cbec15d2eae109199b4a7f381ea6d, jobId: 7a03cd7dbe75dc23e2d00377ee9c2e1e).
>>> [flink-akka.actor.default-dispatcher-2] INFO
>>> org.apache.flink.runtime.taskexecutor.slot.TaskSlotTableImpl - Free slot
>>> TaskSlot(index:10, state:ACTIVE, resource profile:
>>> ResourceProfile{managedMemory=8.000mb (8388608 bytes),
>>> networkMemory=4.000mb (4194304 bytes)}, allocationId:
>>> 21724dc21eeb3b989cc61218b2ed0d33, jobId: 7a03cd7dbe75dc23e2d00377ee9c2e1e).
>>> [flink-akka.actor.default-dispatcher-2] INFO
>>> org.apache.flink.runtime.taskexecutor.slot.TaskSlotTableImpl - Free slot
>>> TaskSlot(index:3, state:ACTIVE, resource profile:
>>> ResourceProfile{managedMemory=8.000mb (8388608 bytes),
>>> networkMemory=4.000mb (4194304 bytes)}, allocationId:
>>> dfb7dbaf74369b8a88c1d3f4458d99fa, jobId: 7a03cd7dbe75dc23e2d00377ee9c2e1e).
>>> [flink-akka.actor.default-dispatcher-2] INFO
>>> org.apache.flink.runtime.taskexecutor.slot.TaskSlotTableImpl - Free slot
>>> TaskSlot(index:7, state:ACTIVE, resource profile:
>>> ResourceProfile{managedMemory=8.000mb (8388608 bytes),
>>> networkMemory=4.000mb (4194304 bytes)}, allocationId:
>>> ee95a6d3407ad30b8b92b96ed54ffd62, jobId: 7a03cd7dbe75dc23e2d00377ee9c2e1e).
>>> [flink-akka.actor.default-dispatcher-2] INFO
>>> org.apache.flink.runtime.taskexecutor.slot.TaskSlotTableImpl - Free slot
>>> TaskSlot(index:12, state:ACTIVE, resource profile:
>>> ResourceProfile{managedMemory=8.000mb (8388608 bytes),
>>> networkMemory=4.000mb (4194304 bytes)}, allocationId:
>>> e6b5f2defe813f0d5fa0ac6864d6fabf, jobId: 7a03cd7dbe75dc23e2d00377ee9c2e1e).
>>> [flink-akka.actor.default-dispatcher-2] INFO
>>> org.apache.flink.runtime.taskexecutor.slot.TaskSlotTableImpl - Free slot
>>> TaskSlot(index:2, state:ACTIVE, resource profile:
>>> ResourceProfile{managedMemory=8.000mb (8388608 bytes),
>>> networkMemory=4.000mb (4194304 bytes)}, allocationId:
>>> 0ac66cecfbeec45c08aa7574713fe713, jobId: 7a03cd7dbe75dc23e2d00377ee9c2e1e).
>>> [flink-akka.actor.default-dispatcher-2] INFO
>>> org.apache.flink.runtime.taskexecutor.slot.TaskSlotTableImpl - Free slot
>>> TaskSlot(index:4, state:ACTIVE, resource profile:
>>> ResourceProfile{managedMemory=8.000mb (8388608 bytes),
>>> networkMemory=4.000mb (4194304 bytes)}, allocationId:
>>> 2065cd8bd1148e8872907ba77253f11c, jobId: 7a03cd7dbe75dc23e2d00377ee9c2e1e).
>>> [flink-akka.actor.default-dispatcher-2] INFO
>>> org.apache.flink.runtime.taskexecutor.slot.TaskSlotTableImpl - Free slot
>>> TaskSlot(index:11, state:ACTIVE, resource profile:
>>> ResourceProfile{managedMemory=8.000mb (8388608 bytes),
>>> networkMemory=4.000mb (4194304 bytes)}, allocationId:
>>> e3410cb80b2049f07c6b28181fc0c8f7, jobId: 7a03cd7dbe75dc23e2d00377ee9c2e1e).
>>> [flink-akka.actor.default-dispatcher-2] INFO
>>> org.apache.flink.runtime.taskexecutor.slot.TaskSlotTableImpl - Free slot
>>> TaskSlot(index:1, state:ACTIVE, resource profile:
>>> ResourceProfile{managedMemory=8.000mb (8388608 bytes),
>>> networkMemory=4.000mb (4194304 bytes)}, allocationId:
>>> 81965b19791ad7d681d274e16a178f36, jobId: 7a03cd7dbe75dc23e2d00377ee9c2e1e).
>>> [flink-akka.actor.default-dispatcher-2] INFO
>>> org.apache.flink.runtime.taskexecutor.slot.TaskSlotTableImpl - Free slot
>>> TaskSlot(index:5, state:ACTIVE, resource profile:
>>> ResourceProfile{managedMemory=8.000mb (8388608 bytes),
>>> networkMemory=4.000mb (4194304 bytes)}, allocationId:
>>> ffb3f5ec8f8af914156caa70aa1976f5, jobId: 7a03cd7dbe75dc23e2d00377ee9c2e1e).
>>> [flink-akka.actor.default-dispatcher-2] INFO
>>> org.apache.flink.runtime.taskexecutor.slot.TaskSlotTableImpl - Free slot
>>> TaskSlot(index:9, state:ACTIVE, resource profile:
>>> ResourceProfile{managedMemory=8.000mb (8388608 bytes),
>>> networkMemory=4.000mb (4194304 bytes)}, allocationId:
>>> 833c6fb3b038dbca13c9507b9c9fe3e1, jobId: 7a03cd7dbe75dc23e2d00377ee9c2e1e).
>>> [flink-akka.actor.default-dispatcher-2] INFO
>>> org.apache.flink.runtime.taskexecutor.slot.TaskSlotTableImpl - Free slot
>>> TaskSlot(index:0, state:ACTIVE, resource profile:
>>> ResourceProfile{managedMemory=8.000mb (8388608 bytes),
>>> networkMemory=4.000mb (4194304 bytes)}, allocationId:
>>> 00a67feee8fd2a9dae42fa21224e4e1f, jobId: 7a03cd7dbe75dc23e2d00377ee9c2e1e).
>>> [flink-akka.actor.default-dispatcher-2] INFO
>>> org.apache.flink.runtime.taskexecutor.slot.TaskSlotTableImpl - Free slot
>>> TaskSlot(index:6, state:ACTIVE, resource profile:
>>> ResourceProfile{managedMemory=8.000mb (8388608 bytes),
>>> networkMemory=4.000mb (4194304 bytes)}, allocationId:
>>> bda8240ec3870e8408383388983b15f5, jobId: 7a03cd7dbe75dc23e2d00377ee9c2e1e).
>>> [flink-akka.actor.default-dispatcher-2] INFO
>>> org.apache.flink.runtime.taskexecutor.slot.TaskSlotTableImpl - Free slot
>>> TaskSlot(index:8, state:ACTIVE, resource profile:
>>> ResourceProfile{managedMemory=8.000mb (8388608 bytes),
>>> networkMemory=4.000mb (4194304 bytes)}, allocationId:
>>> 037a99d1e2c6e53b14a1131c86ec707c, jobId: 7a03cd7dbe75dc23e2d00377ee9c2e1e).
>>> [flink-akka.actor.default-dispatcher-4] INFO
>>> org.apache.flink.runtime.jobmaster.JobMaster - Stopping the JobMaster for
>>> job python-flink-job-test(7a03cd7dbe75dc23e2d00377ee9c2e1e).
>>> [flink-runner-job-invoker] ERROR
>>> org.apache.beam.runners.jobsubmission.JobInvocation - Error during job
>>> invocation python-flink-job-test_1faab21e-5b8a-4d62-ad67-879453b78d1e.
>>> [flink-akka.actor.default-dispatcher-5] INFO
>>> org.apache.flink.runtime.taskexecutor.JobLeaderService - Stop job leader
>>> service.
>>> [flink-akka.actor.default-dispatcher-5] INFO
>>> org.apache.flink.runtime.state.TaskExecutorLocalStateStoresManager -
>>> Shutting down TaskExecutorLocalStateStoresManager.
>>> java.util.concurrent.ExecutionException:
>>> org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
>>> at
>>> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
>>> at
>>> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>>> at
>>> org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:864)
>>> at
>>> org.apache.beam.runners.flink.FlinkBatchPortablePipelineTranslator$BatchTranslationContext.execute(FlinkBatchPortablePipelineTranslator.java:194)
>>> at
>>> org.apache.beam.runners.flink.FlinkPipelineRunner.runPipelineWithTranslator(FlinkPipelineRunner.java:116)
>>> at
>>> org.apache.beam.runners.flink.FlinkPipelineRunner.run(FlinkPipelineRunner.java:83)
>>> at
>>> org.apache.beam.runners.jobsubmission.JobInvocation.runPipeline(JobInvocation.java:83)
>>> at
>>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
>>> at
>>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
>>> at
>>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>> at java.lang.Thread.run(Thread.java:748)
>>> Caused by: org.apache.flink.runtime.client.JobExecutionException: Job
>>> execution failed.
>>> at
>>> org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:147)
>>> at
>>> org.apache.flink.client.program.PerJobMiniClusterFactory$PerJobMiniClusterJobClient.lambda$getJobExecutionResult$2(PerJobMiniClusterFactory.java:175)
>>> at
>>> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
>>> at
>>> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
>>> at
>>> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>>> at
>>> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
>>> at
>>> org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:874)
>>> at akka.dispatch.OnComplete.internal(Future.scala:264)
>>> at akka.dispatch.OnComplete.internal(Future.scala:261)
>>> at akka.dispatch.japi$CallbackBridge.apply(Future.scala:191)
>>> at akka.dispatch.japi$CallbackBridge.apply(Future.scala:188)
>>> at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
>>> at
>>> org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:74)
>>> at
>>> scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
>>> at
>>> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
>>> at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:572)
>>> at
>>> akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:22)
>>> at
>>> akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:21)
>>> at scala.concurrent.Future$$anonfun$andThen$1.apply(Future.scala:436)
>>> at scala.concurrent.Future$$anonfun$andThen$1.apply(Future.scala:435)
>>> at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
>>> at
>>> akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
>>> at
>>> akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91)
>>> at
>>> akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
>>> at
>>> akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
>>> at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
>>> at
>>> akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:90)
>>> at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
>>> at
>>> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
>>> at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>> at
>>> akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>> at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>> at
>>> akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>> Caused by: org.apache.flink.runtime.JobException: Recovery is suppressed
>>> by NoRestartBackoffTimeStrategy
>>> at
>>> org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:110)
>>> at
>>> org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:76)
>>> at
>>> org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:192)
>>> at
>>> org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:186)
>>> at
>>> org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:180)
>>> at
>>> org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:496)
>>> at
>>> org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:380)
>>> at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source)
>>> at
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> at java.lang.reflect.Method.invoke(Method.java:498)
>>> at
>>> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:284)
>>> at
>>> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:199)
>>> at
>>> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
>>> at
>>> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152)
>>> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
>>> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
>>> at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>>> at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
>>> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
>>> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
>>> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
>>> at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
>>> at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
>>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
>>> at akka.actor.ActorCell.invoke(ActorCell.scala:561)
>>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
>>> at akka.dispatch.Mailbox.run(Mailbox.scala:225)
>>> at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
>>> ... 4 more
>>> Caused by: java.lang.Exception: The user defined 'open()' method caused
>>> an exception: java.io.IOException: Cannot run program "docker": error=2, No
>>> such file or directory
>>> at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:499)
>>> at
>>> org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:369)
>>> at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:708)
>>> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:533)
>>> at java.lang.Thread.run(Thread.java:748)
>>> Caused by:
>>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException:
>>> java.io.IOException: Cannot run program "docker": error=2, No such file or
>>> directory
>>> at
>>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4966)
>>> at
>>> org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.<init>(DefaultJobBundleFactory.java:447)
>>> at
>>> org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.<init>(DefaultJobBundleFactory.java:432)
>>> at
>>> org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory.forStage(DefaultJobBundleFactory.java:299)
>>> at
>>> org.apache.beam.runners.fnexecution.control.DefaultExecutableStageContext.getStageBundleFactory(DefaultExecutableStageContext.java:38)
>>> at
>>> org.apache.beam.runners.fnexecution.control.ReferenceCountingExecutableStageContextFactory$WrappedContext.getStageBundleFactory(ReferenceCountingExecutableStageContextFactory.java:199)
>>> at
>>> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.open(FlinkExecutableStageFunction.java:137)
>>> at
>>> org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
>>> at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:495)
>>> ... 4 more
>>> Caused by: java.io.IOException: Cannot run program "docker": error=2, No
>>> such file or directory
>>> at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
>>> at
>>> org.apache.beam.runners.fnexecution.environment.DockerCommand.runShortCommand(DockerCommand.java:186)
>>> at
>>> org.apache.beam.runners.fnexecution.environment.DockerCommand.runShortCommand(DockerCommand.java:168)
>>> at
>>> org.apache.beam.runners.fnexecution.environment.DockerCommand.runImage(DockerCommand.java:92)
>>> at
>>> org.apache.beam.runners.fnexecution.environment.DockerEnvironmentFactory.createEnvironment(DockerEnvironmentFactory.java:131)
>>> at
>>> org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:248)
>>> at
>>> org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:227)
>>> at
>>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3528)
>>> at
>>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2277)
>>> at
>>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2154)
>>> at
>>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2044)
>>> at
>>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache.get(LocalCache.java:3952)
>>> at
>>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3974)
>>> at
>>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4958)
>>> at
>>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4964)
>>> ... 12 more
>>> Caused by: java.io.IOException: error=2, No such file or directory
>>> at java.lang.UNIXProcess.forkAndExec(Native Method)
>>> at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
>>> at java.lang.ProcessImpl.start(ProcessImpl.java:134)
>>> at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
>>> ... 26 more
>>> [flink-akka.actor.default-dispatcher-4] INFO
>>> org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl - Suspending
>>> SlotPool.
>>> [flink-akka.actor.default-dispatcher-4] INFO
>>> org.apache.flink.runtime.jobmaster.JobMaster - Close ResourceManager
>>> connection 13d8396855b02af0c138796ff9fe4120: JobManager is shutting down..
>>> [flink-akka.actor.default-dispatcher-4] INFO
>>> org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl - Stopping
>>> SlotPool.
>>> [flink-akka.actor.default-dispatcher-3] INFO
>>> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager -
>>> Disconnect job manager 
>>> ac69ec5dfd8c2b5d90240c1af14c46de@akka://flink/user/jobmanager_3
>>> for job 7a03cd7dbe75dc23e2d00377ee9c2e1e from the resource manager.
>>> [ForkJoinPool.commonPool-worker-2] INFO
>>> org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Removing cache
>>> directory /tmp/flink-web-ui
>>> [ForkJoinPool.commonPool-worker-2] INFO
>>> org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Shut down
>>> complete.
>>> [flink-akka.actor.default-dispatcher-2] INFO
>>> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Shut
>>> down cluster because application is in CANCELED, diagnostics
>>> DispatcherResourceManagerComponent has been closed..
>>> [flink-akka.actor.default-dispatcher-2] INFO
>>> org.apache.flink.runtime.entrypoint.component.DispatcherResourceManagerComponent
>>> - Closing components.
>>> [flink-akka.actor.default-dispatcher-2] INFO
>>> org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess -
>>> Stopping SessionDispatcherLeaderProcess.
>>> [flink-akka.actor.default-dispatcher-3] INFO
>>> org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Stopping
>>> dispatcher akka://flink/user/dispatcher.
>>> [flink-akka.actor.default-dispatcher-3] INFO
>>> org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Stopping all
>>> currently running jobs of dispatcher akka://flink/user/dispatcher.
>>> [flink-akka.actor.default-dispatcher-5] INFO
>>> org.apache.flink.runtime.io.disk.FileChannelManagerImpl -
>>> FileChannelManager removed spill file directory
>>> /tmp/flink-io-d968bae7-ae23-4528-8e03-32a9173acdd0
>>> [flink-akka.actor.default-dispatcher-4] INFO
>>> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl -
>>> Closing the SlotManager.
>>> [flink-akka.actor.default-dispatcher-4] INFO
>>> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl -
>>> Suspending the SlotManager.
>>> [flink-akka.actor.default-dispatcher-5] INFO
>>> org.apache.flink.runtime.io.network.NettyShuffleEnvironment - Shutting down
>>> the network environment and its components.
>>> [flink-akka.actor.default-dispatcher-3] INFO
>>> org.apache.flink.runtime.rest.handler.legacy.backpressure.BackPressureRequestCoordinator
>>> - Shutting down back pressure request coordinator.
>>> [flink-akka.actor.default-dispatcher-3] INFO
>>> org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Stopped
>>> dispatcher akka://flink/user/dispatcher.
>>> [flink-akka.actor.default-dispatcher-5] INFO
>>> org.apache.flink.runtime.io.disk.FileChannelManagerImpl -
>>> FileChannelManager removed spill file directory
>>> /tmp/flink-netty-shuffle-9663a939-6608-4337-94b2-f32bfdca367d
>>> [flink-akka.actor.default-dispatcher-5] INFO
>>> org.apache.flink.runtime.taskexecutor.KvStateService - Shutting down the
>>> kvState service and its components.
>>> [flink-akka.actor.default-dispatcher-5] INFO
>>> org.apache.flink.runtime.taskexecutor.JobLeaderService - Stop job leader
>>> service.
>>> [flink-akka.actor.default-dispatcher-5] INFO
>>> org.apache.flink.runtime.filecache.FileCache - removed file cache directory
>>> /tmp/flink-dist-cache-a640b6a1-c444-4886-8b6f-98aed046bf2b
>>> [flink-akka.actor.default-dispatcher-5] INFO
>>> org.apache.flink.runtime.taskexecutor.TaskExecutor - Stopped TaskExecutor
>>> akka://flink/user/taskmanager_2.
>>> [flink-akka.actor.default-dispatcher-5] INFO
>>> org.apache.flink.runtime.rpc.akka.AkkaRpcService - Stopping Akka RPC
>>> service.
>>> [flink-metrics-2] INFO
>>> akka.remote.RemoteActorRefProvider$RemotingTerminator - Shutting down
>>> remote daemon.
>>> [flink-metrics-2] INFO
>>> akka.remote.RemoteActorRefProvider$RemotingTerminator - Remote daemon shut
>>> down; proceeding with flushing remote transports.
>>> [flink-metrics-2] INFO
>>> akka.remote.RemoteActorRefProvider$RemotingTerminator - Remoting shut down.
>>> [flink-metrics-2] INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService
>>> - Stopping Akka RPC service.
>>> [flink-metrics-2] INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService
>>> - Stopped Akka RPC service.
>>> [flink-akka.actor.default-dispatcher-3] INFO
>>> org.apache.flink.runtime.blob.PermanentBlobCache - Shutting down BLOB cache
>>> [flink-akka.actor.default-dispatcher-3] INFO
>>> org.apache.flink.runtime.blob.TransientBlobCache - Shutting down BLOB cache
>>> [flink-akka.actor.default-dispatcher-3] INFO
>>> org.apache.flink.runtime.blob.BlobServer - Stopped BLOB server at
>>> 0.0.0.0:42851
>>> [flink-akka.actor.default-dispatcher-3] INFO
>>> org.apache.flink.runtime.rpc.akka.AkkaRpcService - Stopped Akka RPC service.
>>
>>
>> Best,
>> Mike
>>
>
