You need to set your artifact directory (when starting up the job server) to
point to a distributed filesystem that is also accessible to the workers.
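
For example, something along these lines when launching the job server (just a
sketch: the jar name and flag spellings may differ slightly across Beam
versions, and /mnt/shared/beam-artifacts is a placeholder for whatever shared
or distributed path both the job server and the Flink task managers can read):

  java -jar beam-runners-flink-1.8-job-server-2.15.0.jar \
    --flink-master-url=<jobmanager-host>:8081 \
    --artifacts-dir=/mnt/shared/beam-artifacts

That way the MANIFEST is staged somewhere the
BeamFileSystemArtifactRetrievalService running in the task manager can actually
read, instead of under a /tmp directory that only exists on the machine
submitting the pipeline.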

On Mon, Sep 23, 2019 at 5:43 AM Yu Watanabe <[email protected]> wrote:

> Hello.
>
> I am working with the Flink runner (2.15.0) and would like to ask a question
> about how to solve an error I am seeing.
>
> Currently, I have a remote cluster deployed as below (please see slide 1).
> All master and worker nodes are installed on different servers from the one
> running Apache Beam.
>
>
> https://drive.google.com/file/d/1vBULp6kiEfQNGVV3Nl2mMKAZZKYsb11h/view?usp=sharing
>
> When I run the Beam pipeline, the harness container tries to start up but
> fails immediately with the error below on the Docker side.
>
> =====================================================================================
> Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: 2019/09/23 12:04:05 Initializing python harness: /opt/apache/beam/boot --id=1 --logging_endpoint=localhost:34227 --artifact_endpoint=localhost:45303 --provision_endpoint=localhost:44585 --control_endpoint=localhost:43869
> Sep 23 21:04:05 ip-172-31-0-143 dockerd: time="2019-09-23T21:04:05.380942292+09:00" level=debug msg=event module=libcontainerd namespace=moby topic=/tasks/start
> Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: 2019/09/23 12:04:05 Failed to retrieve staged files: failed to get manifest
> Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: #011caused by:
> Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: rpc error: code = Unknown desc =
>
> =====================================================================================
>
> At the same time, the task manager logs the error below.
>
> =====================================================================================
> 2019-09-23 21:04:05,525 INFO  org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService - GetManifest for /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST
> 2019-09-23 21:04:05,526 INFO  org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService - Loading manifest for retrieval token /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST
> 2019-09-23 21:04:05,531 INFO  org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService - GetManifest for /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST failed
> java.util.concurrent.ExecutionException: java.io.FileNotFoundException: /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST (No such file or directory)
>         at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:531)
> ...
>
> =====================================================================================
>
> I see this artifact directory on the server where the Beam pipeline is
> executed, but not on the worker node.
>
> =====================================================================================
> # Beam server
> (python-bm-2150) admin@ip-172-31-9-89:~$ sudo ls -ld
> /tmp/artifactsfkyik3us
> drwx------ 3 admin admin 4096 Sep 23 12:03 /tmp/artifactsfkyik3us
>
> # Flink worker node
> [ec2-user@ip-172-31-0-143 flink]$ sudo ls -ld /tmp/artifactsfkyik3us
> ls: cannot access /tmp/artifactsfkyik3us: No such file or directory
>
>  
> =====================================================================================
>
> From the error, it seems that the container is not starting up correctly
> because the manifest file is missing.
> What would be a good approach for referencing the artifact directory from
> the worker node?
> I would appreciate it if I could get some advice.
>
> Best Regards,
> Yu Watanabe
>
> --
> Yu Watanabe
> Weekend Freelancer who loves to challenge building data platform
> [email protected]
> LinkedIn: <https://www.linkedin.com/in/yuwatanabe1>
> Twitter: <https://twitter.com/yuwtennis>
>
