Hello. I am working with the Flink runner (Beam 2.15.0) and would like to ask how to resolve an error.
Currently, I have a remote cluster deployed as shown below (please see slide 1). The Flink master and worker nodes all run on servers separate from the one where Apache Beam runs.

https://drive.google.com/file/d/1vBULp6kiEfQNGVV3Nl2mMKAZZKYsb11h/view?usp=sharing

When I run a Beam pipeline, the harness container tries to start up but fails immediately with the error below on the Docker side.

=====================================================================================
Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: 2019/09/23 12:04:05 Initializing python harness: /opt/apache/beam/boot --id=1 --logging_endpoint=localhost:34227 --artifact_endpoint=localhost:45303 --provision_endpoint=localhost:44585 --control_endpoint=localhost:43869
Sep 23 21:04:05 ip-172-31-0-143 dockerd: time="2019-09-23T21:04:05.380942292+09:00" level=debug msg=event module=libcontainerd namespace=moby topic=/tasks/start
Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: 2019/09/23 12:04:05 Failed to retrieve staged files: failed to get manifest
Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: #011caused by:
Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: rpc error: code = Unknown desc =
=====================================================================================

At the same time, the task manager logs the error below.
=====================================================================================
2019-09-23 21:04:05,525 INFO org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService - GetManifest for /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST
2019-09-23 21:04:05,526 INFO org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService - Loading manifest for retrieval token /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST
2019-09-23 21:04:05,531 INFO org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService - GetManifest for /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST failed
java.util.concurrent.ExecutionException: java.io.FileNotFoundException: /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST (No such file or directory)
        at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:531)
        ...
=====================================================================================

I can see this artifact directory on the server where the Beam pipeline is executed, but not on the worker node.

=====================================================================================
# Beam server
(python-bm-2150) admin@ip-172-31-9-89:~$ sudo ls -ld /tmp/artifactsfkyik3us
drwx------ 3 admin admin 4096 Sep 23 12:03 /tmp/artifactsfkyik3us

# Flink worker node
[ec2-user@ip-172-31-0-143 flink]$ sudo ls -ld /tmp/artifactsfkyik3us
ls: cannot access /tmp/artifactsfkyik3us: No such file or directory
=====================================================================================

From the error, it seems that the container does not start up correctly because the manifest file is missing on the worker node. What would be a good approach to make the artifact directory accessible from the worker node?

I would appreciate any advice.
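For reference, the directory check I ran by hand on both hosts can be sketched in Python as below. This is only a minimal sketch: the function name `check_retrieval_token` is hypothetical, and it assumes the retrieval token is a plain local filesystem path of the form <staging dir>/<job dir>/MANIFEST, as it appears in the task manager log.

```python
import os


def check_retrieval_token(token_path):
    """Report whether the manifest and its staging directory exist on this host.

    Hypothetical helper: assumes token_path looks like
    <staging dir>/<job dir>/MANIFEST on the local filesystem.
    """
    # Two levels up from MANIFEST is the staging directory (e.g. /tmp/artifacts...).
    staging_dir = os.path.dirname(os.path.dirname(token_path))
    return {
        "staging_dir_exists": os.path.isdir(staging_dir),
        "manifest_exists": os.path.isfile(token_path),
    }


if __name__ == "__main__":
    # Path copied from the task manager log above.
    token = (
        "/tmp/artifactsfkyik3us/"
        "job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST"
    )
    print(check_retrieval_token(token))
```

Running it on the Beam server and on the Flink worker node should show the same difference as the `ls` output above. Note that it may need the same privileges as the `sudo ls` calls, since the staging directory is mode drwx------.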
Best Regards,
Yu Watanabe

--
Yu Watanabe
Weekend Freelancer who loves to challenge building data platform
LinkedIn: https://www.linkedin.com/in/yuwatanabe1
Twitter: https://twitter.com/yuwtennis
