Flink job server does not have artifacts-dir option yet. We have a PR to add it https://github.com/apache/beam/pull/9648
However, for now you can do a few manual steps to achieve this. Start Job server. 1. Download https://repo.maven.apache.org/maven2/org/apache/beam/beam-runners-flink-1.8-job-server/2.15.0/beam-runners-flink-1.8-job-server-2.15.0.jar 2. Start the job server java -jar /home/admin/.apache_beam/cache/beam-runners-flink-1.8-job-server-2.15.0.jar --flink-master-url ip-172-31-12-113.ap-northeast-1.compute.internal:8081 --artifacts-dir /home/ec2-user --job-port 8099 3. Submit your pipeline options = PipelineOptions([ "--runner=PortableRunner", "--environment_config= asia.gcr.io/creationline001/beam/python3:latest", "--environment_type=DOCKER", "--experiments=beam_fn_api" ]) On Mon, Sep 23, 2019 at 8:20 PM Yu Watanabe <[email protected]> wrote: > Kyle. > > Thank you for the assistance. > > Looks like "--artifacts-dir" is not parsed. Below is the DEBUG log of the > pipline. > ==================================================================== > WARNING:root:Discarding unparseable args: ['--flink_version=1.8', > '--flink_master_url=ip-172-31-12-113.ap-northeast-1.compute.internal:8081', > '--artifacts-dir=/home/ec2-user'] > ... > WARNING:root:Downloading job server jar from > https://repo.maven.apache.org/maven2/org/apache/beam/beam-runners-flink-1.8-job-server/2.15.0/beam-runners-flink-1.8-job-server-2.15.0.jar > DEBUG:root:Starting job service with ['java', '-jar', > '/home/admin/.apache_beam/cache/beam-runners-flink-1.8-job-server-2.15.0.jar', > '--flink-master-url', > 'ip-172-31-12-113.ap-northeast-1.compute.internal:8081', '--artifacts-dir', > '/tmp/artifactsicq2kj8c', '--job-port', 41757, '--artifact-port', 0, > '--expansion-port', 0] > DEBUG:root:Waiting for jobs grpc channel to be ready at localhost:41757. > ... > DEBUG:root:Runner option 'job_endpoint' was already added > DEBUG:root:Runner option 'sdk_worker_parallelism' was already added > WARNING:root:Discarding unparseable args: ['--artifacts-dir=/home/admin'] > ... > ==================================================================== > > My pipeline option . > ==================================================================== > options = PipelineOptions([ > "--runner=FlinkRunner", > "--flink_version=1.8", > > "--flink_master_url=ip-172-31-12-113.ap-northeast-1.compute.internal:8081" > , > "--environment_config= > asia.gcr.io/creationline001/beam/python3:latest", > "--environment_type=DOCKER", > "--experiments=beam_fn_api", > "--artifacts-dir=/home/admin" > ]) > ==================================================================== > > Tracing from the log , looks like artifacts dir respects the default > tempdir of OS. > Thus, to adjust it I will need environment variable instead. I used > 'TMPDIR' in my case. > ==================================================================== > > https://github.com/apache/beam/blob/7931ec055e2da7214c82e368ef7d7fd679faaef1/sdks/python/apache_beam/runners/portability/job_server.py#L203 > artifacts_dir = self.local_temp_dir(prefix='artifacts') > > => > > https://github.com/apache/beam/blob/7931ec055e2da7214c82e368ef7d7fd679faaef1/sdks/python/apache_beam/runners/portability/job_server.py#L157 > return tempfile.mkdtemp(dir=self._local_temp_root, **kwargs) > > => > > https://github.com/apache/beam/blob/7931ec055e2da7214c82e368ef7d7fd679faaef1/sdks/python/apache_beam/runners/portability/job_server.py#L102 > self._local_temp_root = None > ==================================================================== > > Then it worked. > ==================================================================== > (python-bm-2150) admin@ip-172-31-9-89:~$ env | grep TMPDIR > TMPDIR=/home/admin > > => > DEBUG:root:Starting job service with ['java', '-jar', > '/home/admin/.apache_beam/cache/beam-runners-flink-1.8-job-server-2.15.0.jar', > '--flink-master-url', > 'ip-172-31-12-113.ap-northeast-1.compute.internal:8081', '--artifacts-dir', > '/home/admin/artifacts18unmki3', '--job-port', 46767, '--artifact-port', 0, > '--expansion-port', 0] > ==================================================================== > > I will use nfs server for now to share the artifact directory with worker > nodes. > > Thanks. > Yu Watanabe > > On Tue, Sep 24, 2019 at 9:50 AM Kyle Weaver <[email protected]> wrote: > >> The relevant configuration flag for the job server is `--artifacts-dir`. >> >> @Robert Bradshaw <[email protected]> I added this info to the log >> message: https://github.com/apache/beam/pull/9646 >> >> Kyle Weaver | Software Engineer | github.com/ibzib | [email protected] >> >> >> On Mon, Sep 23, 2019 at 11:36 AM Robert Bradshaw <[email protected]> >> wrote: >> >>> You need to set your artifact directory to point to a >>> distributed filesystem that's also accessible to the workers (when starting >>> up the job server). >>> >>> On Mon, Sep 23, 2019 at 5:43 AM Yu Watanabe <[email protected]> >>> wrote: >>> >>>> Hello. >>>> >>>> I am working on flink runner (2.15.0) and would like to ask question >>>> about how to solve my error. >>>> >>>> Currently , I have a remote cluster deployed as below . (please see >>>> slide1) >>>> All master and worker nodes are installed on different server from >>>> apache beam. >>>> >>>> >>>> https://drive.google.com/file/d/1vBULp6kiEfQNGVV3Nl2mMKAZZKYsb11h/view?usp=sharing >>>> >>>> When I run beam pipeline, harness container tries to start up, however, >>>> fails immediately with below error on docker side. >>>> >>>> ===================================================================================== >>>> Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: 2019/09/23 12:04:05 >>>> Initializing python harness: /opt/apache/beam/boot --id=1 >>>> --logging_endpoint=localhost:34227 --artifact_endpoint=localhost:45303 >>>> --provision_endpoint=localhost:44585 --control_endpoint=localhost:43869 >>>> Sep 23 21:04:05 ip-172-31-0-143 dockerd: >>>> time="2019-09-23T21:04:05.380942292+09:00" level=debug msg=event >>>> module=libcontainerd namespace=moby topic=/tasks/start >>>> Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: 2019/09/23 12:04:05 >>>> Failed to retrieve staged files: failed to get manifest >>>> Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: #011caused by: >>>> Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: rpc error: code = >>>> Unknown desc = >>>> >>>> ===================================================================================== >>>> >>>> At the same time, task manager logs below error. >>>> >>>> ===================================================================================== >>>> 2019-09-23 21:04:05,525 INFO >>>> >>>> org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService >>>> - GetManifest for >>>> /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST >>>> 2019-09-23 21:04:05,526 INFO >>>> >>>> org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService >>>> - Loading manifest for retrieval token >>>> /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST >>>> 2019-09-23 21:04:05,531 INFO >>>> >>>> org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService >>>> - GetManifest for >>>> /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST >>>> failed >>>> java.util.concurrent.ExecutionException: java.io.FileNotFoundException: >>>> /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST >>>> (No such file or directory) >>>> at >>>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:531) >>>> ... >>>> >>>> ===================================================================================== >>>> >>>> I see this artifact directory on the server where beam pipeline is >>>> executed but not on worker node. >>>> >>>> ===================================================================================== >>>> # Beam server >>>> (python-bm-2150) admin@ip-172-31-9-89:~$ sudo ls -ld >>>> /tmp/artifactsfkyik3us >>>> drwx------ 3 admin admin 4096 Sep 23 12:03 /tmp/artifactsfkyik3us >>>> >>>> # Flink worker node >>>> [ec2-user@ip-172-31-0-143 flink]$ sudo ls -ld /tmp/artifactsfkyik3us >>>> ls: cannot access /tmp/artifactsfkyik3us: No such file or directory >>>> >>>> >>>> ===================================================================================== >>>> >>>> From the error, it seems that container is not starting up correctly >>>> due to manifest file is missing. >>>> What would be a good approach to reference artifact directory from >>>> worker node? >>>> I appreciate if I could get some advice . >>>> >>>> Best Regards, >>>> Yu Watanabe >>>> >>>> -- >>>> Yu Watanabe >>>> Weekend Freelancer who loves to challenge building data platform >>>> [email protected] >>>> [image: LinkedIn icon] <https://www.linkedin.com/in/yuwatanabe1> [image: >>>> Twitter icon] <https://twitter.com/yuwtennis> >>>> >>> > > -- > Yu Watanabe > Weekend Freelancer who loves to challenge building data platform > [email protected] > [image: LinkedIn icon] <https://www.linkedin.com/in/yuwatanabe1> [image: > Twitter icon] <https://twitter.com/yuwtennis> >
