Hello Kyle. Thank you for the reply, and happy new year to you too!
Yes. I got it working by sharing all container networks with the host
kernel. My docker-compose file is below.

https://github.com/yuwtennis/beam-deployment/blob/master/flink-session-cluster/docker/docker-compose.yml

I confirmed it by writing data to a file on GCS.

====================================================================
with beam.Pipeline(options=options) as p:
    lines = (
        p
        | beam.Create(['Hello World.', 'Apache beam'])
    )

    # Write to GCS
    (
        lines
        | WriteToText('gs://{}/sample.txt'.format(project_id))
    )
====================================================================

https://github.com/yuwtennis/beam-deployment/blob/master/flink-session-cluster/docker/samples/src/sample.py

RESULT

====================================================================
$ gsutil cat gs://${PROJECT_ID}/sample.txt-00000-of-00001
Hello World.
Apache beam
====================================================================

Thanks,
Yu Watanabe

On Fri, Jan 3, 2020 at 5:58 AM Kyle Weaver <[email protected]> wrote:

> This is the root cause:
>
> > python-sdk_1 | 2019/12/31 02:59:45 Failed to obtain provisioning
> > information: failed to dial server at localhost:45759
>
> The Flink task manager and Beam SDK harness use connections over
> `localhost` to communicate.
>
> You will have to put `taskmanager` and `python-sdk` on the same host.
> Maybe you can try using `--networking=host` so they will share the
> namespace. https://docs.docker.com/network/host/
>
> Happy new year!
>
> Kyle
>
> On Mon, Dec 30, 2019 at 7:21 PM Yu Watanabe <[email protected]> wrote:
>
>> Hello.
>>
>> I would like to ask a question about the error I am facing with the
>> worker pool of the SDK container.
>> I get the below error when I run the pipeline.
>>
>> ----------------------------------------------------------------------------------------
>> python-sdk_1 | 2019/12/31 02:57:26 Starting worker pool 1: python -m
>> apache_beam.runners.worker.worker_pool_main --service_port=50000
>> --container_executable=/opt/apache/beam/boot
>> python-sdk_1 | INFO:root:Started worker pool servicer at port:
>> localhost:50000 with executable: /opt/apache/beam/boot
>> python-sdk_1 | WARNING:root:Starting worker with command
>> ['/opt/apache/beam/boot', '--id=1-1',
>> '--logging_endpoint=localhost:35615',
>> '--artifact_endpoint=localhost:42723',
>> '--provision_endpoint=localhost:45759',
>> '--control_endpoint=localhost:43185']
>> python-sdk_1 | 2019/12/31 02:57:45 Initializing python harness:
>> /opt/apache/beam/boot --id=1-1 --logging_endpoint=localhost:35615
>> --artifact_endpoint=localhost:42723
>> --provision_endpoint=localhost:45759
>> --control_endpoint=localhost:43185
>> python-sdk_1 | 2019/12/31 02:59:45 Failed to obtain provisioning
>> information: failed to dial server at localhost:45759
>> python-sdk_1 | caused by:
>> python-sdk_1 | context deadline exceeded
>> ----------------------------------------------------------------------------------------
>>
>> In the Flink taskmanager's log, it keeps waiting for a response from
>> the SDK container.
>>
>> ----------------------------------------------------------------------------------------
>> taskmanager_1 | 2019-12-31 02:57:45,445 INFO
>> org.apache.flink.configuration.GlobalConfiguration -
>> Loading configuration property: query.server.port, 6125
>> taskmanager_1 | 2019-12-31 02:58:26,678 INFO
>> org.apache.beam.runners.fnexecution.environment.ExternalEnvironmentFactory
>> - Still waiting for startup of environment from python-sdk:50000 for
>> worker id 1-1
>> ----------------------------------------------------------------------------------------
>>
>> Looking at the Flink jobmanager's log, the error is logged after the
>> map transform starts.
>> So it looks like the request from the taskmanager reached the SDK
>> container but was not processed correctly.
>> It sounds like I am missing some setting for the SDK container.
>>
>> ----------------------------------------------------------------------------------------
>> jobmanager_1 | 2019-12-31 02:57:44,987 INFO
>> org.apache.flink.runtime.executiongraph.ExecutionGraph -
>> DataSource (Impulse) (1/1) (4e6f68fd31bafa066b740943bc3ea736) switched
>> from RUNNING to FINISHED.
>> jobmanager_1 | 2019-12-31 02:57:44,989 INFO
>> org.apache.flink.runtime.executiongraph.ExecutionGraph -
>> MapPartition (MapPartition at [2]Create/{FlatMap(<lambda at
>> core.py:2468>), Map(decode)}) (1/1) (067ed452ebd15c1175ecde0ae40e8ac7)
>> switched from DEPLOYING to RUNNING.
>> ----------------------------------------------------------------------------------------
>>
>> The command line for building the SDK container is
>>
>> ----------------------------------------------------------------------------------------
>> ./gradlew :sdks:python:container:py37:dockerPush
>> -Pdocker-repository-root=${GCR_HOSTNAME}/${PROJECT_ID}
>> -Pdocker-tag=release-2.16.0
>> ----------------------------------------------------------------------------------------
>>
>> My docker compose:
>>
>> https://github.com/yuwtennis/beam-deployment/blob/master/flink-session-cluster/docker/docker-compose.yml
>>
>> My pipeline code:
>>
>> https://github.com/yuwtennis/beam-deployment/blob/master/flink-session-cluster/docker/samples/src/sample.py
>>
>> Would there be any settings I need to use for starting up the SDK container?
>>
>> Best Regards,
>> Yu Watanabe
>>
>> --
>> Yu Watanabe
>> [email protected]
>>

--
Yu Watanabe
[email protected]
LinkedIn: https://www.linkedin.com/in/yuwatanabe1
Twitter: https://twitter.com/yuwtennis
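For later readers of this thread: the change Yu describes, sharing the host
network namespace with every container, corresponds roughly to the compose
fragment below. This is a sketch of the idea only, not the actual file
(which is at the linked repository); image names and tags are assumptions.

```yaml
# Sketch, assuming a Flink 1.8 session cluster and a custom SDK image
# pushed by the gradlew command quoted above (names may differ).
version: "2.4"
services:
  jobmanager:
    image: flink:1.8.2
    network_mode: host   # shares the host network stack; no `ports:` mappings allowed
  taskmanager:
    image: flink:1.8.2
    network_mode: host
  python-sdk:
    image: ${GCR_HOSTNAME}/${PROJECT_ID}/python3.7_sdk:release-2.16.0
    network_mode: host
```

With `network_mode: host`, `taskmanager` and `python-sdk` see the same
loopback interface, so endpoints such as `localhost:45759` that the worker
pool hands out resolve to the same network stack on both sides.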

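On the submission side, a pipeline using an external worker pool like the
one in the logs above is pointed at it through Beam's standard portable
pipeline options. A minimal sketch of assembling those options; the
endpoint values are assumptions (localhost:50000 is the worker pool port
from the logs, 8099 is the Flink job server's default port):

```python
def flink_portable_args(job_endpoint="localhost:8099",
                        worker_pool="localhost:50000"):
    """Build pipeline args for a Flink session cluster whose SDK
    workers run in an external worker pool (EXTERNAL environment)."""
    return [
        "--runner=PortableRunner",
        "--job_endpoint={}".format(job_endpoint),
        "--environment_type=EXTERNAL",
        "--environment_config={}".format(worker_pool),
    ]

# These args would then be passed to the pipeline, e.g.:
#   options = PipelineOptions(flink_portable_args())
#   with beam.Pipeline(options=options) as p: ...
args = flink_portable_args()
print(args)
```

`EXTERNAL` tells the runner not to launch SDK containers itself but to ask
the long-running worker pool at `environment_config` for workers instead.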