https://issues.apache.org/jira/browse/BEAM-8312 should help with this. In summary, I would say that portable beam-on-Flink is basically feature complete, but we're still smoothing out some of the ease-of-use issues. And feedback like this really helps, so thanks!
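In the meantime, the workaround from the quoted messages below can be sketched roughly as follows. This is only an illustration, not Beam code: the two helper functions are hypothetical names of my own, while the flags, paths, host names and image are the ones quoted in this thread. It assumes the artifacts directory is a location the Flink workers can also read (for example an NFS mount).

```python
# Sketch only: the helper names are hypothetical, the flags are the ones quoted in this thread.

def job_server_command(jar_path, flink_master, artifacts_dir, job_port=8099):
    """Build the argument list for starting the Flink job server by hand."""
    return [
        "java", "-jar", jar_path,
        "--flink-master-url", flink_master,
        # Assumed to be a directory shared with the Flink workers.
        "--artifacts-dir", artifacts_dir,
        "--job-port", str(job_port),
    ]


def portable_runner_args(image, job_host="localhost", job_port=8099):
    """Build pipeline arguments that point at an already running job server."""
    return [
        "--runner=PortableRunner",
        "--environment_config=" + image,
        "--environment_type=DOCKER",
        "--experiments=beam_fn_api",
        # Without job_endpoint the runner tries to start its own job server.
        "--job_endpoint={}:{}".format(job_host, job_port),
    ]


if __name__ == "__main__":
    cmd = job_server_command(
        "/home/admin/.apache_beam/cache/beam-runners-flink-1.8-job-server-2.15.0.jar",
        "ip-172-31-12-113.ap-northeast-1.compute.internal:8081",
        "/home/ec2-user",
    )
    print(" ".join(cmd))
    print(portable_runner_args("asia.gcr.io/creationline001/beam/python3:latest"))
    # With Beam installed, the second list would be passed on as:
    #   from apache_beam.options.pipeline_options import PipelineOptions
    #   options = PipelineOptions(portable_runner_args(...))
```

Keeping the port in one place avoids the job server and `job_endpoint` drifting apart.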
On Tue, Sep 24, 2019 at 8:10 PM Yu Watanabe <[email protected]> wrote:
> I needed a small adjustment to the pipeline options to make it work.
>
> The pipeline options require 'job_endpoint' when the job server is started
> manually from the jar file.
> =========================================================
> options = PipelineOptions([
>     "--runner=PortableRunner",
>     "--environment_config=asia.gcr.io/creationline001/beam/python3:latest",
>     "--environment_type=DOCKER",
>     "--experiments=beam_fn_api",
>     "--job_endpoint=localhost:8099"
> ])
> =========================================================
>
> Otherwise, the runner spins up a job server container.
> ==========================================================
> ERROR:root:Starting job service with ['docker', 'run', '-v',
> '/usr/bin/docker:/bin/docker', '-v',
> '/var/run/docker.sock:/var/run/docker.sock', '--network=host',
> 'admin-docker-apache.bintray.io/beam/flink-job-server:latest',
> '--job-host', 'localhost', '--job-port', '42915', '--artifact-port',
> '56561', '--expansion-port', '53857']
> ERROR:root:Error bringing up job service
> ==========================================================
>
> Without the "job_endpoint" option, the default job server is used.
> ==========================================================
> https://github.com/apache/beam/blob/7931ec055e2da7214c82e368ef7d7fd679faaef1/sdks/python/apache_beam/runners/portability/portable_runner.py#L176
> server = self.default_job_server(options)
>
> =>
>
> https://github.com/apache/beam/blob/d963aeb91a63f165b5ff1ebf6add8275aec204f1/sdks/python/apache_beam/runners/portability/job_server.py#L172
> "-docker-apache.bintray.io/beam/flink-job-server:latest"
> ==========================================================
>
> Thanks,
> Yu
>
> On Tue, Sep 24, 2019 at 2:06 PM Ankur Goenka <[email protected]> wrote:
>
>> Flink job server does not have artifacts-dir option yet.
>> We have a PR to add it: https://github.com/apache/beam/pull/9648
>>
>> However, for now you can do a few manual steps to achieve this.
>>
>> Start the job server.
>>
>> 1. Download
>> https://repo.maven.apache.org/maven2/org/apache/beam/beam-runners-flink-1.8-job-server/2.15.0/beam-runners-flink-1.8-job-server-2.15.0.jar
>>
>> 2. Start the job server
>> java -jar
>> /home/admin/.apache_beam/cache/beam-runners-flink-1.8-job-server-2.15.0.jar
>> --flink-master-url ip-172-31-12-113.ap-northeast-1.compute.internal:8081
>> --artifacts-dir /home/ec2-user --job-port 8099
>>
>> 3. Submit your pipeline
>> options = PipelineOptions([
>>     "--runner=PortableRunner",
>>     "--environment_config=asia.gcr.io/creationline001/beam/python3:latest",
>>     "--environment_type=DOCKER",
>>     "--experiments=beam_fn_api"
>> ])
>>
>> On Mon, Sep 23, 2019 at 8:20 PM Yu Watanabe <[email protected]>
>> wrote:
>>
>>> Kyle.
>>>
>>> Thank you for the assistance.
>>>
>>> Looks like "--artifacts-dir" is not parsed. Below is the DEBUG log of
>>> the pipeline.
>>> ====================================================================
>>> WARNING:root:Discarding unparseable args: ['--flink_version=1.8',
>>> '--flink_master_url=ip-172-31-12-113.ap-northeast-1.compute.internal:8081',
>>> '--artifacts-dir=/home/ec2-user']
>>> ...
>>> WARNING:root:Downloading job server jar from
>>> https://repo.maven.apache.org/maven2/org/apache/beam/beam-runners-flink-1.8-job-server/2.15.0/beam-runners-flink-1.8-job-server-2.15.0.jar
>>> DEBUG:root:Starting job service with ['java', '-jar',
>>> '/home/admin/.apache_beam/cache/beam-runners-flink-1.8-job-server-2.15.0.jar',
>>> '--flink-master-url',
>>> 'ip-172-31-12-113.ap-northeast-1.compute.internal:8081', '--artifacts-dir',
>>> '/tmp/artifactsicq2kj8c', '--job-port', 41757, '--artifact-port', 0,
>>> '--expansion-port', 0]
>>> DEBUG:root:Waiting for jobs grpc channel to be ready at localhost:41757.
>>> ...
>>> DEBUG:root:Runner option 'job_endpoint' was already added
>>> DEBUG:root:Runner option 'sdk_worker_parallelism' was already added
>>> WARNING:root:Discarding unparseable args: ['--artifacts-dir=/home/admin']
>>> ...
>>> ====================================================================
>>>
>>> My pipeline options:
>>> ====================================================================
>>> options = PipelineOptions([
>>>     "--runner=FlinkRunner",
>>>     "--flink_version=1.8",
>>>     "--flink_master_url=ip-172-31-12-113.ap-northeast-1.compute.internal:8081",
>>>     "--environment_config=asia.gcr.io/creationline001/beam/python3:latest",
>>>     "--environment_type=DOCKER",
>>>     "--experiments=beam_fn_api",
>>>     "--artifacts-dir=/home/admin"
>>> ])
>>> ====================================================================
>>>
>>> Tracing from the log, it looks like the artifacts dir follows the default
>>> temp dir of the OS.
>>> Thus, to change it I need an environment variable instead. I used
>>> 'TMPDIR' in my case.
>>> ====================================================================
>>> https://github.com/apache/beam/blob/7931ec055e2da7214c82e368ef7d7fd679faaef1/sdks/python/apache_beam/runners/portability/job_server.py#L203
>>> artifacts_dir = self.local_temp_dir(prefix='artifacts')
>>>
>>> =>
>>>
>>> https://github.com/apache/beam/blob/7931ec055e2da7214c82e368ef7d7fd679faaef1/sdks/python/apache_beam/runners/portability/job_server.py#L157
>>> return tempfile.mkdtemp(dir=self._local_temp_root, **kwargs)
>>>
>>> =>
>>>
>>> https://github.com/apache/beam/blob/7931ec055e2da7214c82e368ef7d7fd679faaef1/sdks/python/apache_beam/runners/portability/job_server.py#L102
>>> self._local_temp_root = None
>>> ====================================================================
>>>
>>> Then it worked.
>>> ====================================================================
>>> (python-bm-2150) admin@ip-172-31-9-89:~$ env | grep TMPDIR
>>> TMPDIR=/home/admin
>>>
>>> =>
>>> DEBUG:root:Starting job service with ['java', '-jar',
>>> '/home/admin/.apache_beam/cache/beam-runners-flink-1.8-job-server-2.15.0.jar',
>>> '--flink-master-url',
>>> 'ip-172-31-12-113.ap-northeast-1.compute.internal:8081', '--artifacts-dir',
>>> '/home/admin/artifacts18unmki3', '--job-port', 46767, '--artifact-port', 0,
>>> '--expansion-port', 0]
>>> ====================================================================
>>>
>>> I will use an NFS server for now to share the artifact directory with
>>> the worker nodes.
>>>
>>> Thanks.
>>> Yu Watanabe
>>>
>>> On Tue, Sep 24, 2019 at 9:50 AM Kyle Weaver <[email protected]> wrote:
>>>
>>>> The relevant configuration flag for the job server is `--artifacts-dir`.
>>>>
>>>> @Robert Bradshaw <[email protected]> I added this info to the log
>>>> message: https://github.com/apache/beam/pull/9646
>>>>
>>>> Kyle Weaver | Software Engineer | github.com/ibzib |
>>>> [email protected]
>>>>
>>>>
>>>> On Mon, Sep 23, 2019 at 11:36 AM Robert Bradshaw <[email protected]>
>>>> wrote:
>>>>
>>>>> You need to set your artifact directory to point to a
>>>>> distributed filesystem that's also accessible to the workers (when
>>>>> starting up the job server).
>>>>>
>>>>> On Mon, Sep 23, 2019 at 5:43 AM Yu Watanabe <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hello.
>>>>>>
>>>>>> I am working with the Flink runner (2.15.0) and would like to ask a
>>>>>> question about how to solve my error.
>>>>>>
>>>>>> Currently, I have a remote cluster deployed as below (please see
>>>>>> slide1).
>>>>>> All master and worker nodes are installed on different servers from
>>>>>> Apache Beam.
>>>>>>
>>>>>> https://drive.google.com/file/d/1vBULp6kiEfQNGVV3Nl2mMKAZZKYsb11h/view?usp=sharing
>>>>>>
>>>>>> When I run the Beam pipeline, the harness container tries to start up
>>>>>> but fails immediately with the error below on the Docker side.
>>>>>>
>>>>>> =====================================================================================
>>>>>> Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: 2019/09/23
>>>>>> 12:04:05 Initializing python harness: /opt/apache/beam/boot --id=1
>>>>>> --logging_endpoint=localhost:34227 --artifact_endpoint=localhost:45303
>>>>>> --provision_endpoint=localhost:44585 --control_endpoint=localhost:43869
>>>>>> Sep 23 21:04:05 ip-172-31-0-143 dockerd:
>>>>>> time="2019-09-23T21:04:05.380942292+09:00" level=debug msg=event
>>>>>> module=libcontainerd namespace=moby topic=/tasks/start
>>>>>> Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: 2019/09/23
>>>>>> 12:04:05 Failed to retrieve staged files: failed to get manifest
>>>>>> Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: #011caused by:
>>>>>> Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: rpc error: code =
>>>>>> Unknown desc =
>>>>>>
>>>>>> =====================================================================================
>>>>>>
>>>>>> At the same time, the task manager logs the error below.
>>>>>>
>>>>>> =====================================================================================
>>>>>> 2019-09-23 21:04:05,525 INFO
>>>>>> org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService
>>>>>> - GetManifest for
>>>>>> /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST
>>>>>> 2019-09-23 21:04:05,526 INFO
>>>>>> org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService
>>>>>> - Loading manifest for retrieval token
>>>>>> /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST
>>>>>> 2019-09-23 21:04:05,531 INFO
>>>>>> org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService
>>>>>> - GetManifest for
>>>>>> /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST
>>>>>> failed
>>>>>> java.util.concurrent.ExecutionException:
>>>>>> java.io.FileNotFoundException:
>>>>>> /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST
>>>>>> (No such file or directory)
>>>>>> at
>>>>>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:531)
>>>>>> ...
>>>>>>
>>>>>> =====================================================================================
>>>>>>
>>>>>> I see this artifact directory on the server where the Beam pipeline is
>>>>>> executed, but not on the worker node.
>>>>>>
>>>>>> =====================================================================================
>>>>>> # Beam server
>>>>>> (python-bm-2150) admin@ip-172-31-9-89:~$ sudo ls -ld /tmp/artifactsfkyik3us
>>>>>> drwx------ 3 admin admin 4096 Sep 23 12:03 /tmp/artifactsfkyik3us
>>>>>>
>>>>>> # Flink worker node
>>>>>> [ec2-user@ip-172-31-0-143 flink]$ sudo ls -ld /tmp/artifactsfkyik3us
>>>>>> ls: cannot access /tmp/artifactsfkyik3us: No such file or directory
>>>>>>
>>>>>> =====================================================================================
>>>>>>
>>>>>> From the error, it seems that the container is not starting up correctly
>>>>>> because the manifest file is missing.
>>>>>> What would be a good approach to reference the artifact directory from
>>>>>> the worker node?
>>>>>> I would appreciate any advice.
>>>>>>
>>>>>> Best Regards,
>>>>>> Yu Watanabe
>>>>>>
>>>>>> --
>>>>>> Yu Watanabe
>>>>>> Weekend Freelancer who loves to challenge building data platform
>>>>>> [email protected]
>>>>>> [image: LinkedIn icon] <https://www.linkedin.com/in/yuwatanabe1> [image:
>>>>>> Twitter icon] <https://twitter.com/yuwtennis>
