When starting up the job server, you need to point the artifact directory at a distributed filesystem that is also accessible to the workers.
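For example (just a sketch: the mount point below is made up, and the jar and option names are from my memory of the 2.15 job server, so please double-check them with --help for your version), if you have an NFS share mounted at the same path on the job server host and on every Flink task manager, you can stage artifacts there instead of /tmp:

=====================================================================================
# Job server host: stage artifacts on the shared mount (hypothetical path /mnt/beam-artifacts,
# mounted identically on all Flink task managers)
java -jar beam-runners-flink-1.8-job-server-2.15.0.jar \
    --flink-master-url=<jobmanager-host>:8081 \
    --job-host=0.0.0.0 \
    --artifacts-dir=/mnt/beam-artifacts

# Client: submit the Python pipeline against that job endpoint as usual
python my_pipeline.py \
    --runner=PortableRunner \
    --job_endpoint=<job-server-host>:8099
=====================================================================================

The important part is that the retrieval token the task managers receive (e.g. /mnt/beam-artifacts/.../MANIFEST) resolves to the same files on every worker, which is why a /tmp path on the submitting machine fails. An HDFS or S3 location should work the same way, as long as the corresponding Beam filesystem is available to both the job server and the workers.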
On Mon, Sep 23, 2019 at 5:43 AM Yu Watanabe <[email protected]> wrote:

> Hello.
>
> I am working on the Flink runner (2.15.0) and would like to ask a question about how to solve my error.
>
> Currently, I have a remote cluster deployed as below (please see slide 1). All master and worker nodes are installed on a different server from Apache Beam.
>
> https://drive.google.com/file/d/1vBULp6kiEfQNGVV3Nl2mMKAZZKYsb11h/view?usp=sharing
>
> When I run the Beam pipeline, the harness container tries to start up but fails immediately with the error below on the Docker side.
>
> =====================================================================================
> Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: 2019/09/23 12:04:05 Initializing python harness: /opt/apache/beam/boot --id=1 --logging_endpoint=localhost:34227 --artifact_endpoint=localhost:45303 --provision_endpoint=localhost:44585 --control_endpoint=localhost:43869
> Sep 23 21:04:05 ip-172-31-0-143 dockerd: time="2019-09-23T21:04:05.380942292+09:00" level=debug msg=event module=libcontainerd namespace=moby topic=/tasks/start
> Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: 2019/09/23 12:04:05 Failed to retrieve staged files: failed to get manifest
> Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: #011caused by:
> Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: rpc error: code = Unknown desc =
> =====================================================================================
>
> At the same time, the task manager logs the error below.
>
> =====================================================================================
> 2019-09-23 21:04:05,525 INFO org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService - GetManifest for /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST
> 2019-09-23 21:04:05,526 INFO org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService - Loading manifest for retrieval token /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST
> 2019-09-23 21:04:05,531 INFO org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService - GetManifest for /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST failed
> java.util.concurrent.ExecutionException: java.io.FileNotFoundException: /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST (No such file or directory)
>         at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:531)
>         ...
> =====================================================================================
>
> I see this artifact directory on the server where the Beam pipeline is executed, but not on the worker node.
>
> =====================================================================================
> # Beam server
> (python-bm-2150) admin@ip-172-31-9-89:~$ sudo ls -ld /tmp/artifactsfkyik3us
> drwx------ 3 admin admin 4096 Sep 23 12:03 /tmp/artifactsfkyik3us
>
> # Flink worker node
> [ec2-user@ip-172-31-0-143 flink]$ sudo ls -ld /tmp/artifactsfkyik3us
> ls: cannot access /tmp/artifactsfkyik3us: No such file or directory
> =====================================================================================
>
> From the error, it seems that the container is not starting up correctly because the manifest file is missing. What would be a good approach to reference the artifact directory from the worker node?
>
> I would appreciate it if I could get some advice.
>
> Best Regards,
> Yu Watanabe
>
> --
> Yu Watanabe
> Weekend Freelancer who loves to challenge building data platform
> [email protected]
> LinkedIn: https://www.linkedin.com/in/yuwatanabe1
> Twitter: https://twitter.com/yuwtennis
