Ankur

Thank you for the advice.
You're right. Looking at the task manager's log, it looks like "docker
pull" fails first for the yarn user, and a couple of errors follow after that.
As a result, "docker run" seems to fail.
I have been working on this all week and still have not managed to get the
yarn session authenticated against Google Container Registry...

==============================================================================================
2019-09-19 06:47:38,196 INFO  org.apache.flink.runtime.taskmanager.Task - MapPartition (MapPartition at [3]{Create, ParDo(EsOutputFn)}) (1/1) (d2f0d79e4614c3b0cb5a8cbd38de37da) switched from DEPLOYING to RUNNING.
2019-09-19 06:47:41,181 WARN  org.apache.beam.runners.fnexecution.environment.DockerCommand - Unable to pull docker image asia.gcr.io/PROJECTNAME/beam/python3:latest, cause: Received exit code 1 for command 'docker pull asia.gcr.io/creationline001/beam/python3:latest'. stderr: Error response from daemon: unauthorized: You don't have the needed permissions to perform this operation, and you may have invalid credentials. To authenticate your request, follow the steps in: https://cloud.google.com/container-registry/docs/advanced-authentication
2019-09-19 06:47:44,035 INFO  org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService - GetManifest for /tmp/artifactsknnvmjj8/job_fc5fff58-4408-4e0d-833b-675215218234/MANIFEST
2019-09-19 06:47:44,037 INFO  org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService - Loading manifest for retrieval token /tmp/artifactsknnvmjj8/job_fc5fff58-4408-4e0d-833b-675215218234/MANIFEST
2019-09-19 06:47:44,046 INFO  org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService - GetManifest for /tmp/artifactsknnvmjj8/job_fc5fff58-4408-4e0d-833b-675215218234/MANIFEST failed
java.util.concurrent.ExecutionException: java.io.FileNotFoundException: /tmp/artifactsknnvmjj8/job_fc5fff58-4408-4e0d-833b-675215218234/MANIFEST (No such file or directory)
        at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:531)
        at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:492)
...
Caused by: java.io.FileNotFoundException: /tmp/artifactsknnvmjj8/job_fc5fff58-4408-4e0d-833b-675215218234/MANIFEST (No such file or directory)
        at java.io.FileInputStream.open0(Native Method)
        at java.io.FileInputStream.open(FileInputStream.java:195)
        at java.io.FileInputStream.<init>(FileInputStream.java:138)
        at org.apache.beam.sdk.io.LocalFileSystem.open(LocalFileSystem.java:114)
...
2019-09-19 06:48:43,952 ERROR org.apache.flink.runtime.operators.BatchTask - Error in task code: MapPartition (MapPartition at [3]{Create, ParDo(EsOutputFn)}) (1/1)
java.lang.Exception: The user defined 'open()' method caused an exception: java.io.IOException: Received exit code 1 for command 'docker inspect -f {{.State.Running}} 7afbdcfd241629d24872ba1c74ef10f3d07c854c9cc675a65d4d16b9fdbde752'. stderr: Error: No such object: 7afbdcfd241629d24872ba1c74ef10f3d07c854c9cc675a65d4d16b9fdbde752
        at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:498)
        at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:368)
        at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
        at java.lang.Thread.run(Thread.java:748)
...
==============================================================================================
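
My next step is to authenticate the yarn user on each TaskManager node
against GCR. I am planning to try roughly the following (the key file path
is just a placeholder for wherever I put the service-account key):

    # run as the yarn user on every TaskManager node
    cat /path/to/gcr-service-account-key.json | docker login -u _json_key --password-stdin https://asia.gcr.io

    # confirm that the pull now works for that user
    docker pull asia.gcr.io/PROJECTNAME/beam/python3:latest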

Thanks,
Yu Watanabe

On Thu, Sep 19, 2019 at 3:47 AM Ankur Goenka <[email protected]> wrote:

> Adding to the previous suggestions:
> You can also add "--retain_docker_container" to your pipeline options and
> later log in to the machine to check the docker container logs.
>
> Also, in my experience running on yarn, the yarn user sometimes does not
> have access to docker. I would suggest checking whether the yarn user on
> the TaskManagers has permission to use docker.
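>
> A quick way to check this, roughly (the exact user and group names depend
> on the cluster setup):
>
>   sudo -u yarn docker ps        # should succeed without a permission error
>   sudo usermod -aG docker yarn  # a common fix if it fails with "permission denied"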
>
>
> On Wed, Sep 18, 2019 at 11:23 AM Kyle Weaver <[email protected]> wrote:
>
>> > Per your suggestion, I read the design sheet, and it states that the
>> > harness container is a mandatory setting for all TaskManagers.
>>
>> That doc is out of date. As Benjamin said, it's not strictly required any
>> more to use Docker. However, it is still recommended, as Docker makes
>> managing dependencies a lot easier, whereas PROCESS mode involves managing
>> dependencies via shell scripts.
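>>
>> If you do want to try PROCESS mode, the relevant options look roughly
>> like this (the boot path is only an example; it must point at the SDK
>> harness boot binary that you place on every TaskManager yourself):
>>
>>   --environment_type=PROCESS
>>   --environment_config='{"command": "/opt/apache/beam/boot"}'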
>>
>> > Caused by: java.lang.Exception: The user defined 'open()' method caused
>> > an exception: java.io.IOException: Received exit code 1 for command
>> > 'docker inspect -f {{.State.Running}}
>> > 3e7e0d2d9d8362995d4b566ce1611834d56c5ca550ae89ae698a279271e4c33b'.
>> > stderr: Error: No such object:
>> > 3e7e0d2d9d8362995d4b566ce1611834d56c5ca550ae89ae698a279271e4c33b
>>
>> This means your Docker container is failing to start up for some reason.
>> I recommend either a) running the container manually and inspecting the
>> logs, or b) using the master or Beam 2.16 branches, which have better
>> Docker logging (https://github.com/apache/beam/pull/9389).
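>>
>> For (a), something like the following usually shows why the container
>> dies right away (image name taken from your logs):
>>
>>   docker run --rm asia.gcr.io/PROJECTNAME/beam/python3:latest
>>
>> or, if you run with --retain_docker_container, after a failed job on a
>> TaskManager host:
>>
>>   docker ps -a              # find the exited SDK harness container
>>   docker logs <container-id>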
>>
>> Kyle Weaver | Software Engineer | github.com/ibzib | [email protected]
>>
>>
>> On Wed, Sep 18, 2019 at 8:04 AM Yu Watanabe <[email protected]>
>> wrote:
>>
>>> Thank you for the reply.
>>>
>>> I see files "boot" under below directories.
>>> But these seems to be used for containers.
>>>
>>>   (python) admin@ip-172-31-9-89:~/beam-release-2.15.0$ find ./ -name
>>> "boot" -exec ls -l {} \;
>>> lrwxrwxrwx 1 admin admin 23 Sep 16 23:43
>>> ./sdks/python/container/.gogradle/project_gopath/src/
>>> github.com/apache/beam/sdks/python/boot -> ../../../../../../../..
>>> -rwxr-xr-x 1 admin admin 16543786 Sep 16 23:48
>>> ./sdks/python/container/build/target/launcher/linux_amd64/boot
>>> -rwxr-xr-x 1 admin admin 16358928 Sep 16 23:48
>>> ./sdks/python/container/build/target/launcher/darwin_amd64/boot
>>> -rwxr-xr-x 1 admin admin 16543786 Sep 16 23:48
>>> ./sdks/python/container/py3/build/docker/target/linux_amd64/boot
>>> -rwxr-xr-x 1 admin admin 16358928 Sep 16 23:48
>>> ./sdks/python/container/py3/build/docker/target/darwin_amd64/boot
>>> -rwxr-xr-x 1 admin admin 16543786 Sep 16 23:48
>>> ./sdks/python/container/py3/build/target/linux_amd64/boot
>>> -rwxr-xr-x 1 admin admin 16358928 Sep 16 23:48
>>> ./sdks/python/container/py3/build/target/darwin_amd64/boot
>>>
>>> On Wed, Sep 18, 2019 at 11:37 PM Benjamin Tan <
>>> [email protected]> wrote:
>>>
>>>> Try this as part of PipelineOptions:
>>>>
>>>> --environment_config={\"command\":\"/opt/apache/beam/boot\"}
>>>>
>>>> On 2019/09/18 10:40:42, Yu Watanabe <[email protected]> wrote:
>>>> > Hello.
>>>> >
>>>> > I am trying to run FlinkRunner (2.15.0) on an AWS EC2 instance and
>>>> > submit the job to AWS EMR (5.26.0).
>>>> >
>>>> > However, the pipeline fails with the error below.
>>>> >
>>>> > ========================================================-
>>>> > Caused by: java.lang.Exception: The user defined 'open()' method caused an exception: java.io.IOException: Cannot run program "docker": error=2, No such file or directory
>>>> >         at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:498)
>>>> >         at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:368)
>>>> >         at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
>>>> >         ... 1 more
>>>> > Caused by: org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException: java.io.IOException: Cannot run program "docker": error=2, No such file or directory
>>>> >         at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4966)
>>>> >         at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.<init>(DefaultJobBundleFactory.java:211)
>>>> >         at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.<init>(DefaultJobBundleFactory.java:202)
>>>> >         at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory.forStage(DefaultJobBundleFactory.java:185)
>>>> >         at org.apache.beam.runners.flink.translation.functions.FlinkDefaultExecutableStageContext.getStageBundleFactory(FlinkDefaultExecutableStageContext.java:49)
>>>> >         at org.apache.beam.runners.flink.translation.functions.ReferenceCountingFlinkExecutableStageContextFactory$WrappedContext.getStageBundleFactory(ReferenceCountingFlinkExecutableStageContextFactory.java:203)
>>>> >         at org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.open(FlinkExecutableStageFunction.java:129)
>>>> >         at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
>>>> >         at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:494)
>>>> >         ... 3 more
>>>> > Caused by: java.io.IOException: Cannot run program "docker": error=2, No such file or directory
>>>> >         at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
>>>> >         at org.apache.beam.runners.fnexecution.environment.DockerCommand.runShortCommand(DockerCommand.java:141)
>>>> >         at org.apache.beam.runners.fnexecution.environment.DockerCommand.runImage(DockerCommand.java:92)
>>>> >         at org.apache.beam.runners.fnexecution.environment.DockerEnvironmentFactory.createEnvironment(DockerEnvironmentFactory.java:152)
>>>> >         at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:178)
>>>> >         at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:162)
>>>> >         at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3528)
>>>> >         at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2277)
>>>> >         at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2154)
>>>> >         at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2044)
>>>> >         at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache.get(LocalCache.java:3952)
>>>> >         at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3974)
>>>> >         at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4958)
>>>> >         at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4964)
>>>> >         ... 11 more
>>>> > Caused by: java.io.IOException: error=2, No such file or directory
>>>> >         at java.lang.UNIXProcess.forkAndExec(Native Method)
>>>> >         at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
>>>> >         at java.lang.ProcessImpl.start(ProcessImpl.java:134)
>>>> >         at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
>>>> >         ... 24 more
>>>> >  ========================================================-
>>>> >
>>>> > Pipeline options are below.
>>>> >  ========================================================-
>>>> >         options = PipelineOptions([
>>>> >                       "--runner=FlinkRunner",
>>>> >                       "--flink_version=1.8",
>>>> >                       "--flink_master_url=ip-172-31-1-84.ap-northeast-1.compute.internal:43581",
>>>> >                       "--environment_config=asia.gcr.io/PROJECTNAME/beam/python3",
>>>> >                       "--experiments=beam_fn_api"
>>>> >                   ])
>>>> >
>>>> >
>>>> >         p = beam.Pipeline(options=options)
>>>> >  ========================================================-
>>>> >
>>>> > I am able to run "docker info" as ec2-user on the server where the
>>>> > script is running.
>>>> >
>>>> >  ========================================================-
>>>> > (python) [ec2-user@ip-172-31-2-121 ~]$ docker info
>>>> > Containers: 0
>>>> >  Running: 0
>>>> >  Paused: 0
>>>> >  Stopped: 0
>>>> > ...
>>>> >  ========================================================-
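>>>> >
>>>> > I have not yet confirmed whether docker is also installed on the EMR
>>>> > core nodes where the TaskManagers run, so I plan to check there with
>>>> > something like:
>>>> >
>>>> >   which docker || echo "docker is not installed on this node"
>>>> >   sudo docker info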
>>>> >
>>>> > I used  "debian-stretch" .
>>>> >
>>>> > ========================================================-
>>>> > debian-stretch-hvm-x86_64-gp2-2019-09-08-17994-572488bb-fc09-4638-8628-e1e1d26436f4-ami-0ed2d2283aa1466df.4
>>>> > (ami-06f16171199d98c63)
>>>> > ========================================================-
>>>> >
>>>> > This does not seem to happen when Flink runs locally.
>>>> >
>>>> > ========================================================-
>>>> > admin@ip-172-31-9-89:/opt/flink$ sudo ss -atunp | grep 8081
>>>> > tcp    LISTEN     0      128      :::8081                 :::*
>>>> >       users:(("java",pid=18420,fd=82))
>>>> > admin@ip-172-31-9-89:/opt/flink$ sudo ps -ef | grep java | head -1
>>>> > admin    17698     1  0 08:59 ?        00:00:12 java -jar /home/admin/.apache_beam/cache/beam-runners-flink-1.8-job-server-2.15.0.jar --flink-master-url ip-172-31-1-84.ap-northeast-1.compute.internal:43581 --artifacts-dir /tmp/artifactskj47j8yn --job-port 48205 --artifact-port 0 --expansion-port 0
>>>> > admin@ip-172-31-9-89:/opt/flink$
>>>> > ========================================================-
>>>> >
>>>> > Is there any other setting I need to look at when running on an EC2
>>>> > instance?
>>>> >
>>>> > Thanks,
>>>> > Yu Watanabe
>>>> >
>>>> > --
>>>> > Yu Watanabe
>>>> > Weekend Freelancer who loves to challenge building data platform
>>>> > [email protected]
>>>> > LinkedIn: https://www.linkedin.com/in/yuwatanabe1
>>>> > Twitter: https://twitter.com/yuwtennis
>>>> >
>>>>
>>>
>>>
>>> --
>>> Yu Watanabe
>>> Weekend Freelancer who loves to challenge building data platform
>>> [email protected]
>>> LinkedIn: https://www.linkedin.com/in/yuwatanabe1
>>> Twitter: https://twitter.com/yuwtennis
>>>
>>

-- 
Yu Watanabe
Weekend Freelancer who loves to challenge building data platform
[email protected]
LinkedIn: https://www.linkedin.com/in/yuwatanabe1
Twitter: https://twitter.com/yuwtennis
