Hello.

I am working with the Flink runner (2.15.0) and would like to ask how to
solve an error I am seeing.

Currently, I have a remote cluster deployed as shown below (please see
slide 1). All master and worker nodes are installed on different servers
from the one running Apache Beam.

https://drive.google.com/file/d/1vBULp6kiEfQNGVV3Nl2mMKAZZKYsb11h/view?usp=sharing

When I run the Beam pipeline, the harness container tries to start up but
fails immediately with the error below on the Docker side.
=====================================================================================
Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: 2019/09/23 12:04:05 Initializing python harness: /opt/apache/beam/boot --id=1 --logging_endpoint=localhost:34227 --artifact_endpoint=localhost:45303 --provision_endpoint=localhost:44585 --control_endpoint=localhost:43869
Sep 23 21:04:05 ip-172-31-0-143 dockerd: time="2019-09-23T21:04:05.380942292+09:00" level=debug msg=event module=libcontainerd namespace=moby topic=/tasks/start
Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: 2019/09/23 12:04:05 Failed to retrieve staged files: failed to get manifest
Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: #011caused by:
Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: rpc error: code = Unknown desc =
=====================================================================================

At the same time, the task manager logs the error below.
=====================================================================================
2019-09-23 21:04:05,525 INFO  org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService  - GetManifest for /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST
2019-09-23 21:04:05,526 INFO  org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService  - Loading manifest for retrieval token /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST
2019-09-23 21:04:05,531 INFO  org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService  - GetManifest for /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST failed
java.util.concurrent.ExecutionException: java.io.FileNotFoundException: /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST (No such file or directory)
        at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:531)
...
=====================================================================================

I see this artifact directory on the server where the Beam pipeline is
executed, but not on the worker node.
=====================================================================================
# Beam server
(python-bm-2150) admin@ip-172-31-9-89:~$ sudo ls -ld /tmp/artifactsfkyik3us
drwx------ 3 admin admin 4096 Sep 23 12:03 /tmp/artifactsfkyik3us

# Flink worker node
[ec2-user@ip-172-31-0-143 flink]$ sudo ls -ld /tmp/artifactsfkyik3us
ls: cannot access /tmp/artifactsfkyik3us: No such file or directory
 
=====================================================================================
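The same check can be scripted so that it is easy to repeat on each host. The following is only a minimal sketch, not part of Beam or Flink: the helper names `manifest_paths` and `check_manifests` are hypothetical, and it assumes the task manager log text is available as a string. It pulls every MANIFEST path out of the log and reports whether that file exists on the host where the script runs.

```python
import os
import re

# Hypothetical pattern: any absolute path without spaces that ends in /MANIFEST.
MANIFEST_RE = re.compile(r"(/\S+/MANIFEST)\b")


def manifest_paths(log_text):
    """Return the distinct MANIFEST paths mentioned in the log, in first-seen order."""
    seen = []
    for path in MANIFEST_RE.findall(log_text):
        if path not in seen:
            seen.append(path)
    return seen


def check_manifests(log_text):
    """Map each MANIFEST path found in the log to whether it exists on this host."""
    return {path: os.path.isfile(path) for path in manifest_paths(log_text)}


if __name__ == "__main__":
    # Sample line taken from the task manager log above.
    sample = (
        "GetManifest for "
        "/tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST"
        " failed"
    )
    for path, present in check_manifests(sample).items():
        print("found  " if present else "missing", path)
```

Running it on both the Beam server and the Flink worker node shows which host can actually see the staged manifest.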

From the error, it seems that the container is not starting up correctly
because the manifest file is missing.
What would be a good approach to make the artifact directory accessible from
the worker node?
I would appreciate any advice.

Best Regards,
Yu Watanabe
