Ankur. Thank you for the comment.
Manual start up with job server looks more solid. Setting TMPDIR then using Flink Runner sounded bit hacky . Thanks, Yu On Tue, Sep 24, 2019 at 2:06 PM Ankur Goenka <[email protected]> wrote: > Flink job server does not have artifacts-dir option yet. > We have a PR to add it https://github.com/apache/beam/pull/9648 > > However, for now you can do a few manual steps to achieve this. > > Start Job server. > > 1. Download > https://repo.maven.apache.org/maven2/org/apache/beam/beam-runners-flink-1.8-job-server/2.15.0/beam-runners-flink-1.8-job-server-2.15.0.jar > > 2. Start the job server > java -jar > /home/admin/.apache_beam/cache/beam-runners-flink-1.8-job-server-2.15.0.jar > --flink-master-url ip-172-31-12-113.ap-northeast-1.compute.internal:8081 > --artifacts-dir /home/ec2-user --job-port 8099 > > 3. Submit your pipeline > options = PipelineOptions([ > "--runner=PortableRunner", > "--environment_config= > asia.gcr.io/creationline001/beam/python3:latest", > "--environment_type=DOCKER", > "--experiments=beam_fn_api" > ]) > > On Mon, Sep 23, 2019 at 8:20 PM Yu Watanabe <[email protected]> wrote: > >> Kyle. >> >> Thank you for the assistance. >> >> Looks like "--artifacts-dir" is not parsed. Below is the DEBUG log of the >> pipline. >> ==================================================================== >> WARNING:root:Discarding unparseable args: ['--flink_version=1.8', >> '--flink_master_url=ip-172-31-12-113.ap-northeast-1.compute.internal:8081', >> '--artifacts-dir=/home/ec2-user'] >> ... >> WARNING:root:Downloading job server jar from >> https://repo.maven.apache.org/maven2/org/apache/beam/beam-runners-flink-1.8-job-server/2.15.0/beam-runners-flink-1.8-job-server-2.15.0.jar >> DEBUG:root:Starting job service with ['java', '-jar', >> '/home/admin/.apache_beam/cache/beam-runners-flink-1.8-job-server-2.15.0.jar', >> '--flink-master-url', >> 'ip-172-31-12-113.ap-northeast-1.compute.internal:8081', '--artifacts-dir', >> '/tmp/artifactsicq2kj8c', '--job-port', 41757, '--artifact-port', 0, >> '--expansion-port', 0] >> DEBUG:root:Waiting for jobs grpc channel to be ready at localhost:41757. >> ... >> DEBUG:root:Runner option 'job_endpoint' was already added >> DEBUG:root:Runner option 'sdk_worker_parallelism' was already added >> WARNING:root:Discarding unparseable args: ['--artifacts-dir=/home/admin'] >> ... >> ==================================================================== >> >> My pipeline option . >> ==================================================================== >> options = PipelineOptions([ >> "--runner=FlinkRunner", >> "--flink_version=1.8", >> >> "--flink_master_url=ip-172-31-12-113.ap-northeast-1.compute.internal:8081" >> , >> "--environment_config= >> asia.gcr.io/creationline001/beam/python3:latest", >> "--environment_type=DOCKER", >> "--experiments=beam_fn_api", >> "--artifacts-dir=/home/admin" >> ]) >> ==================================================================== >> >> Tracing from the log , looks like artifacts dir respects the default >> tempdir of OS. >> Thus, to adjust it I will need environment variable instead. I used >> 'TMPDIR' in my case. >> ==================================================================== >> >> https://github.com/apache/beam/blob/7931ec055e2da7214c82e368ef7d7fd679faaef1/sdks/python/apache_beam/runners/portability/job_server.py#L203 >> artifacts_dir = self.local_temp_dir(prefix='artifacts') >> >> => >> >> https://github.com/apache/beam/blob/7931ec055e2da7214c82e368ef7d7fd679faaef1/sdks/python/apache_beam/runners/portability/job_server.py#L157 >> return tempfile.mkdtemp(dir=self._local_temp_root, **kwargs) >> >> => >> >> https://github.com/apache/beam/blob/7931ec055e2da7214c82e368ef7d7fd679faaef1/sdks/python/apache_beam/runners/portability/job_server.py#L102 >> self._local_temp_root = None >> ==================================================================== >> >> Then it worked. >> ==================================================================== >> (python-bm-2150) admin@ip-172-31-9-89:~$ env | grep TMPDIR >> TMPDIR=/home/admin >> >> => >> DEBUG:root:Starting job service with ['java', '-jar', >> '/home/admin/.apache_beam/cache/beam-runners-flink-1.8-job-server-2.15.0.jar', >> '--flink-master-url', >> 'ip-172-31-12-113.ap-northeast-1.compute.internal:8081', '--artifacts-dir', >> '/home/admin/artifacts18unmki3', '--job-port', 46767, '--artifact-port', 0, >> '--expansion-port', 0] >> ==================================================================== >> >> I will use nfs server for now to share the artifact directory with worker >> nodes. >> >> Thanks. >> Yu Watanabe >> >> On Tue, Sep 24, 2019 at 9:50 AM Kyle Weaver <[email protected]> wrote: >> >>> The relevant configuration flag for the job server is `--artifacts-dir`. >>> >>> @Robert Bradshaw <[email protected]> I added this info to the log >>> message: https://github.com/apache/beam/pull/9646 >>> >>> Kyle Weaver | Software Engineer | github.com/ibzib | [email protected] >>> >>> >>> On Mon, Sep 23, 2019 at 11:36 AM Robert Bradshaw <[email protected]> >>> wrote: >>> >>>> You need to set your artifact directory to point to a >>>> distributed filesystem that's also accessible to the workers (when starting >>>> up the job server). >>>> >>>> On Mon, Sep 23, 2019 at 5:43 AM Yu Watanabe <[email protected]> >>>> wrote: >>>> >>>>> Hello. >>>>> >>>>> I am working on flink runner (2.15.0) and would like to ask question >>>>> about how to solve my error. >>>>> >>>>> Currently , I have a remote cluster deployed as below . (please see >>>>> slide1) >>>>> All master and worker nodes are installed on different server from >>>>> apache beam. >>>>> >>>>> >>>>> https://drive.google.com/file/d/1vBULp6kiEfQNGVV3Nl2mMKAZZKYsb11h/view?usp=sharing >>>>> >>>>> When I run beam pipeline, harness container tries to start up, >>>>> however, fails immediately with below error on docker side. >>>>> >>>>> ===================================================================================== >>>>> Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: 2019/09/23 >>>>> 12:04:05 Initializing python harness: /opt/apache/beam/boot --id=1 >>>>> --logging_endpoint=localhost:34227 --artifact_endpoint=localhost:45303 >>>>> --provision_endpoint=localhost:44585 --control_endpoint=localhost:43869 >>>>> Sep 23 21:04:05 ip-172-31-0-143 dockerd: >>>>> time="2019-09-23T21:04:05.380942292+09:00" level=debug msg=event >>>>> module=libcontainerd namespace=moby topic=/tasks/start >>>>> Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: 2019/09/23 >>>>> 12:04:05 Failed to retrieve staged files: failed to get manifest >>>>> Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: #011caused by: >>>>> Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: rpc error: code = >>>>> Unknown desc = >>>>> >>>>> ===================================================================================== >>>>> >>>>> At the same time, task manager logs below error. >>>>> >>>>> ===================================================================================== >>>>> 2019-09-23 21:04:05,525 INFO >>>>> >>>>> org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService >>>>> - GetManifest for >>>>> /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST >>>>> 2019-09-23 21:04:05,526 INFO >>>>> >>>>> org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService >>>>> - Loading manifest for retrieval token >>>>> /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST >>>>> 2019-09-23 21:04:05,531 INFO >>>>> >>>>> org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService >>>>> - GetManifest for >>>>> /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST >>>>> failed >>>>> java.util.concurrent.ExecutionException: >>>>> java.io.FileNotFoundException: >>>>> /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST >>>>> (No such file or directory) >>>>> at >>>>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:531) >>>>> ... >>>>> >>>>> ===================================================================================== >>>>> >>>>> I see this artifact directory on the server where beam pipeline is >>>>> executed but not on worker node. >>>>> >>>>> ===================================================================================== >>>>> # Beam server >>>>> (python-bm-2150) admin@ip-172-31-9-89:~$ sudo ls -ld >>>>> /tmp/artifactsfkyik3us >>>>> drwx------ 3 admin admin 4096 Sep 23 12:03 /tmp/artifactsfkyik3us >>>>> >>>>> # Flink worker node >>>>> [ec2-user@ip-172-31-0-143 flink]$ sudo ls -ld /tmp/artifactsfkyik3us >>>>> ls: cannot access /tmp/artifactsfkyik3us: No such file or directory >>>>> >>>>> >>>>> ===================================================================================== >>>>> >>>>> From the error, it seems that container is not starting up correctly >>>>> due to manifest file is missing. >>>>> What would be a good approach to reference artifact directory from >>>>> worker node? >>>>> I appreciate if I could get some advice . >>>>> >>>>> Best Regards, >>>>> Yu Watanabe >>>>> >>>>> -- >>>>> Yu Watanabe >>>>> Weekend Freelancer who loves to challenge building data platform >>>>> [email protected] >>>>> [image: LinkedIn icon] <https://www.linkedin.com/in/yuwatanabe1> [image: >>>>> Twitter icon] <https://twitter.com/yuwtennis> >>>>> >>>> >> >> -- >> Yu Watanabe >> Weekend Freelancer who loves to challenge building data platform >> [email protected] >> [image: LinkedIn icon] <https://www.linkedin.com/in/yuwatanabe1> [image: >> Twitter icon] <https://twitter.com/yuwtennis> >> > -- Yu Watanabe Weekend Freelancer who loves to challenge building data platform [email protected] [image: LinkedIn icon] <https://www.linkedin.com/in/yuwatanabe1> [image: Twitter icon] <https://twitter.com/yuwtennis>
