Thank you for the reply.

> Why not manually docker pull the image (remember to adjust the container
> location to omit the registry) locally first?
Actually, when using docker I have already pulled the image from the remote
repository (GCR). Is there a way to make the Flink runner skip "docker pull"
when the image is already present locally?

Thanks,
Yu Watanabe

On Fri, Sep 20, 2019 at 7:48 PM Benjamin Tan <[email protected]> wrote:

> Seems like some file is missing. Why not manually docker pull the image
> (remember to adjust the container location to omit the registry) locally
> first? At least you can eliminate another source of trouble.
>
> Also, depending on your pipeline, if you are running distributed, you must
> make sure your files are accessible. For example, local paths won't work
> at all.
>
> So maybe you can force a single worker first too.
>
> Sent from my iPhone
>
> On 20 Sep 2019, at 5:11 PM, Yu Watanabe <[email protected]> wrote:
>
> Ankur,
>
> Thank you for the advice. You're right: looking at the task manager's log,
> it looks like the first "docker pull" fails as the yarn user, and a couple
> of errors follow; as a result, "docker run" seems to fail. I have been
> working on this the whole week and still have not managed to get the yarn
> session authenticated against Google Container Registry...
>
> ==============================================================================
> 2019-09-19 06:47:38,196 INFO  org.apache.flink.runtime.taskmanager.Task
>     - MapPartition (MapPartition at [3]{Create, ParDo(EsOutputFn)}) (1/1)
>     (d2f0d79e4614c3b0cb5a8cbd38de37da) switched from DEPLOYING to RUNNING.
> 2019-09-19 06:47:41,181 WARN  org.apache.beam.runners.fnexecution.environment.DockerCommand
>     - Unable to pull docker image asia.gcr.io/PROJECTNAME/beam/python3:latest,
>     cause: Received exit code 1 for command
>     'docker pull asia.gcr.io/creationline001/beam/python3:latest'.
>     stderr: Error response from daemon: unauthorized: You don't have the
>     needed permissions to perform this operation, and you may have invalid
>     credentials. To authenticate your request, follow the steps in:
>     https://cloud.google.com/container-registry/docs/advanced-authentication
> 2019-09-19 06:47:44,035 INFO  org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService
>     - GetManifest for /tmp/artifactsknnvmjj8/job_fc5fff58-4408-4e0d-833b-675215218234/MANIFEST
> 2019-09-19 06:47:44,037 INFO  org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService
>     - Loading manifest for retrieval token /tmp/artifactsknnvmjj8/job_fc5fff58-4408-4e0d-833b-675215218234/MANIFEST
> 2019-09-19 06:47:44,046 INFO  org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService
>     - GetManifest for /tmp/artifactsknnvmjj8/job_fc5fff58-4408-4e0d-833b-675215218234/MANIFEST failed
> java.util.concurrent.ExecutionException: java.io.FileNotFoundException:
>     /tmp/artifactsknnvmjj8/job_fc5fff58-4408-4e0d-833b-675215218234/MANIFEST (No such file or directory)
>         at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:531)
>         at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:492)
>         ...
> Caused by: java.io.FileNotFoundException:
>     /tmp/artifactsknnvmjj8/job_fc5fff58-4408-4e0d-833b-675215218234/MANIFEST (No such file or directory)
>         at java.io.FileInputStream.open0(Native Method)
>         at java.io.FileInputStream.open(FileInputStream.java:195)
>         at java.io.FileInputStream.<init>(FileInputStream.java:138)
>         at org.apache.beam.sdk.io.LocalFileSystem.open(LocalFileSystem.java:114)
>         ...
> 2019-09-19 06:48:43,952 ERROR org.apache.flink.runtime.operators.BatchTask
>     - Error in task code: MapPartition (MapPartition at [3]{Create, ParDo(EsOutputFn)}) (1/1)
> java.lang.Exception: The user defined 'open()' method caused an exception:
>     java.io.IOException: Received exit code 1 for command 'docker inspect -f
>     {{.State.Running}} 7afbdcfd241629d24872ba1c74ef10f3d07c854c9cc675a65d4d16b9fdbde752'.
>     stderr: Error: No such object: 7afbdcfd241629d24872ba1c74ef10f3d07c854c9cc675a65d4d16b9fdbde752
>         at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:498)
>         at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:368)
>         at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
>         at java.lang.Thread.run(Thread.java:748)
>         ...
> ==============================================================================
>
> Thanks,
> Yu Watanabe
>
> On Thu, Sep 19, 2019 at 3:47 AM Ankur Goenka <[email protected]> wrote:
>
>> Adding to the previous suggestions.
>> You can also add "--retain_docker_container" to your pipeline options and
>> later log in to the machine to check the docker container log.
>>
>> Also, in my experience running on yarn, the yarn user sometimes does not
>> have access to use docker. I would suggest checking whether the yarn user
>> on the TaskManagers has permission to use docker.
>>
>> On Wed, Sep 18, 2019 at 11:23 AM Kyle Weaver <[email protected]> wrote:
>>
>>> > Per your suggestion, I read the design sheet and it states that the
>>> > harness container is a mandatory setting for all TaskManagers.
>>>
>>> That doc is out of date. As Benjamin said, it's no longer strictly
>>> required to use Docker. However, it is still recommended, as Docker
>>> makes managing dependencies a lot easier, whereas PROCESS mode involves
>>> managing dependencies via shell scripts.
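[Editor's note: Ankur's suggestion above — verifying that the yarn user on the TaskManagers can actually use docker — can be scripted. The sketch below is illustrative and not part of Beam; it only checks that a `docker` binary is on PATH and that the invoking user can reach the daemon.]

```python
import shutil
import subprocess


def docker_usable() -> bool:
    """Return True if a `docker` CLI is on PATH and `docker info` succeeds."""
    if shutil.which("docker") is None:
        # Corresponds to the 'Cannot run program "docker"' error in this thread.
        return False
    try:
        result = subprocess.run(["docker", "info"],
                                capture_output=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    # A non-zero exit code here usually means a permission problem,
    # e.g. the user is not in the docker group.
    return result.returncode == 0


print(docker_usable())
```

Run it (or plain `docker info`) as the same user the TaskManager runs under, e.g. `sudo -u yarn python3 check_docker.py` on each node (the script name is hypothetical).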
>>>
>>> > Caused by: java.lang.Exception: The user defined 'open()' method
>>> > caused an exception: java.io.IOException: Received exit code 1 for
>>> > command 'docker inspect -f {{.State.Running}}
>>> > 3e7e0d2d9d8362995d4b566ce1611834d56c5ca550ae89ae698a279271e4c33b'.
>>> > stderr: Error: No such object:
>>> > 3e7e0d2d9d8362995d4b566ce1611834d56c5ca550ae89ae698a279271e4c33b
>>>
>>> This means your Docker container is failing to start up for some reason.
>>> I recommend either a) running the container manually and inspecting the
>>> logs, or b) using the master or Beam 2.16 branches, which have better
>>> Docker logging (https://github.com/apache/beam/pull/9389).
>>>
>>> Kyle Weaver | Software Engineer | github.com/ibzib | [email protected]
>>>
>>> On Wed, Sep 18, 2019 at 8:04 AM Yu Watanabe <[email protected]> wrote:
>>>
>>>> Thank you for the reply.
>>>>
>>>> I see files named "boot" under the directories below, but these seem
>>>> to be intended for use inside the containers.
>>>>
>>>> (python) admin@ip-172-31-9-89:~/beam-release-2.15.0$ find ./ -name "boot" -exec ls -l {} \;
>>>> lrwxrwxrwx 1 admin admin 23 Sep 16 23:43 ./sdks/python/container/.gogradle/project_gopath/src/github.com/apache/beam/sdks/python/boot -> ../../../../../../../..
>>>> -rwxr-xr-x 1 admin admin 16543786 Sep 16 23:48 ./sdks/python/container/build/target/launcher/linux_amd64/boot
>>>> -rwxr-xr-x 1 admin admin 16358928 Sep 16 23:48 ./sdks/python/container/build/target/launcher/darwin_amd64/boot
>>>> -rwxr-xr-x 1 admin admin 16543786 Sep 16 23:48 ./sdks/python/container/py3/build/docker/target/linux_amd64/boot
>>>> -rwxr-xr-x 1 admin admin 16358928 Sep 16 23:48 ./sdks/python/container/py3/build/docker/target/darwin_amd64/boot
>>>> -rwxr-xr-x 1 admin admin 16543786 Sep 16 23:48 ./sdks/python/container/py3/build/target/linux_amd64/boot
>>>> -rwxr-xr-x 1 admin admin 16358928 Sep 16 23:48 ./sdks/python/container/py3/build/target/darwin_amd64/boot
>>>>
>>>> On Wed, Sep 18, 2019 at 11:37 PM Benjamin Tan <[email protected]> wrote:
>>>>
>>>>> Try this as part of PipelineOptions:
>>>>>
>>>>> --environment_config={\"command\":\"/opt/apache/beam/boot\"}
>>>>>
>>>>> On 2019/09/18 10:40:42, Yu Watanabe <[email protected]> wrote:
>>>>> > Hello.
>>>>> >
>>>>> > I am trying to run the FlinkRunner (2.15.0) on an AWS EC2 instance
>>>>> > and submit a job to AWS EMR (5.26.0).
>>>>> >
>>>>> > However, the pipeline fails with the error below when I run it.
>>>>> >
>>>>> > ========================================================
>>>>> > Caused by: java.lang.Exception: The user defined 'open()' method caused an
>>>>> > exception: java.io.IOException: Cannot run program "docker": error=2, No
>>>>> > such file or directory
>>>>> >         at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:498)
>>>>> >         at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:368)
>>>>> >         at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
>>>>> >         ... 1 more
>>>>> > Caused by: org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException:
>>>>> > java.io.IOException: Cannot run program "docker": error=2, No such file or directory
>>>>> >         at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4966)
>>>>> >         at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.<init>(DefaultJobBundleFactory.java:211)
>>>>> >         at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.<init>(DefaultJobBundleFactory.java:202)
>>>>> >         at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory.forStage(DefaultJobBundleFactory.java:185)
>>>>> >         at org.apache.beam.runners.flink.translation.functions.FlinkDefaultExecutableStageContext.getStageBundleFactory(FlinkDefaultExecutableStageContext.java:49)
>>>>> >         at org.apache.beam.runners.flink.translation.functions.ReferenceCountingFlinkExecutableStageContextFactory$WrappedContext.getStageBundleFactory(ReferenceCountingFlinkExecutableStageContextFactory.java:203)
>>>>> >         at org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.open(FlinkExecutableStageFunction.java:129)
>>>>> >         at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
>>>>> >         at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:494)
>>>>> >         ... 3 more
>>>>> > Caused by: java.io.IOException: Cannot run program "docker": error=2, No such file or directory
>>>>> >         at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
>>>>> >         at org.apache.beam.runners.fnexecution.environment.DockerCommand.runShortCommand(DockerCommand.java:141)
>>>>> >         at org.apache.beam.runners.fnexecution.environment.DockerCommand.runImage(DockerCommand.java:92)
>>>>> >         at org.apache.beam.runners.fnexecution.environment.DockerEnvironmentFactory.createEnvironment(DockerEnvironmentFactory.java:152)
>>>>> >         at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:178)
>>>>> >         at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:162)
>>>>> >         at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3528)
>>>>> >         at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2277)
>>>>> >         at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2154)
>>>>> >         at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2044)
>>>>> >         at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache.get(LocalCache.java:3952)
>>>>> >         at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3974)
>>>>> >         at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4958)
>>>>> >         at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4964)
>>>>> >         ... 11 more
>>>>> > Caused by: java.io.IOException: error=2, No such file or directory
>>>>> >         at java.lang.UNIXProcess.forkAndExec(Native Method)
>>>>> >         at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
>>>>> >         at java.lang.ProcessImpl.start(ProcessImpl.java:134)
>>>>> >         at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
>>>>> >         ... 24 more
>>>>> > ========================================================
>>>>> >
>>>>> > Pipeline options are below.
>>>>> > ========================================================
>>>>> > options = PipelineOptions([
>>>>> >     "--runner=FlinkRunner",
>>>>> >     "--flink_version=1.8",
>>>>> >     "--flink_master_url=ip-172-31-1-84.ap-northeast-1.compute.internal:43581",
>>>>> >     "--environment_config=asia.gcr.io/PROJECTNAME/beam/python3",
>>>>> >     "--experiments=beam_fn_api"
>>>>> > ])
>>>>> >
>>>>> > p = beam.Pipeline(options=options)
>>>>> > ========================================================
>>>>> >
>>>>> > I am able to run "docker info" as ec2-user on the server where the
>>>>> > script is running.
>>>>> >
>>>>> > ========================================================
>>>>> > (python) [ec2-user@ip-172-31-2-121 ~]$ docker info
>>>>> > Containers: 0
>>>>> >  Running: 0
>>>>> >  Paused: 0
>>>>> >  Stopped: 0
>>>>> > ...
>>>>> > ========================================================
>>>>> >
>>>>> > I used the "debian-stretch" AMI.
>>>>> >
>>>>> > ========================================================
>>>>> > debian-stretch-hvm-x86_64-gp2-2019-09-08-17994-572488bb-fc09-4638-8628-e1e1d26436f4-ami-0ed2d2283aa1466df.4
>>>>> > (ami-06f16171199d98c63)
>>>>> > ========================================================
>>>>> >
>>>>> > This does not seem to happen when Flink runs locally.
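[Editor's note: the `environment_config` suggested earlier in the thread takes a small JSON object when using PROCESS mode, which is easy to get wrong with shell escaping. A hedged sketch of building both variants of the option list follows — flag names as used by the Beam 2.15 Python SDK; the host and image names are the placeholders from this thread, not verified values.]

```python
import json

# PROCESS-mode worker: run the SDK harness boot binary directly on each
# TaskManager instead of inside a Docker container.
process_env_config = json.dumps({"command": "/opt/apache/beam/boot"})

common = [
    "--runner=FlinkRunner",
    "--flink_version=1.8",
    "--flink_master_url=ip-172-31-1-84.ap-northeast-1.compute.internal:43581",
    "--experiments=beam_fn_api",
]

# DOCKER mode (the default): environment_config is just the image name,
# which must be pullable by the user the TaskManager runs as.
docker_args = common + ["--environment_config=asia.gcr.io/PROJECTNAME/beam/python3"]

# PROCESS mode: pass the JSON config built above.
process_args = common + [
    "--environment_type=PROCESS",
    "--environment_config=" + process_env_config,
]

print(process_args[-1])
# --environment_config={"command": "/opt/apache/beam/boot"}
```

Building the JSON with `json.dumps` sidesteps the backslash escaping shown in Benjamin's message above.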
>>>>> >
>>>>> > ========================================================
>>>>> > admin@ip-172-31-9-89:/opt/flink$ sudo ss -atunp | grep 8081
>>>>> > tcp  LISTEN  0  128  :::8081  :::*  users:(("java",pid=18420,fd=82))
>>>>> > admin@ip-172-31-9-89:/opt/flink$ sudo ps -ef | grep java | head -1
>>>>> > admin 17698 1 0 08:59 ? 00:00:12 java -jar /home/admin/.apache_beam/cache/beam-runners-flink-1.8-job-server-2.15.0.jar --flink-master-url ip-172-31-1-84.ap-northeast-1.compute.internal:43581 --artifacts-dir /tmp/artifactskj47j8yn --job-port 48205 --artifact-port 0 --expansion-port 0
>>>>> > admin@ip-172-31-9-89:/opt/flink$
>>>>> > ========================================================
>>>>> >
>>>>> > Would there be any other setting I need to look for when running on
>>>>> > an EC2 instance?
>>>>> >
>>>>> > Thanks,
>>>>> > Yu Watanabe

--
Yu Watanabe
Weekend Freelancer who loves to challenge building data platform
[email protected]
LinkedIn: https://www.linkedin.com/in/yuwatanabe1
Twitter: https://twitter.com/yuwtennis
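[Editor's note: on the GCR "unauthorized ... you may have invalid credentials" pull failure reported above, one documented approach is to log the node's Docker client in to the registry with a service-account JSON key. This is a sketch under the assumption that a key file has already been distributed to each TaskManager node; the key path is a placeholder. Run it as the same user the TaskManager uses (e.g. yarn), since Docker stores credentials per user in ~/.docker/config.json.]

```shell
#!/bin/sh
# Hedged sketch: authenticate Docker to Google Container Registry on a
# worker node using a service-account key. KEY_FILE is a placeholder path.
KEY_FILE=/etc/beam/gcr-service-account.json

# _json_key is the GCR-documented username for key-file logins.
docker login -u _json_key --password-stdin https://asia.gcr.io < "${KEY_FILE}"

# Pre-pull the harness image so the runner's own `docker pull` succeeds
# (or is a fast no-op) at job time.
docker pull asia.gcr.io/PROJECTNAME/beam/python3:latest
```

Alternatively, `gcloud auth configure-docker` sets up a credential helper when the Cloud SDK is installed on the nodes.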
