We updated from Beam 2.18.0 to 2.24.0 and have been having issues using the Python ReadFromPubSub external transform on Flink 1.10. It seems to start up just fine, but it doesn't consume any messages.
I tried to reduce it to a simple example and tested back to Beam 2.22.0, but got the same result (no messages read). I have a hard time believing it's been broken for that many versions, but I can't seem to identify what I've done wrong.

Steps to reproduce:

1) Spin up the expansion service:

docker run -d --rm --network host apache/beam_flink1.10_job_server:2.24.0

2) Create a simple pipeline using apache_beam.io.external.gcp.pubsub.ReadFromPubSub. Here's mine: https://gist.github.com/sambvfx/a8582f4805e468a97331b0eb13911ebf

3) Run the pipeline:

python -m pubsub_example --runner=FlinkRunner --save_main_session --flink_submit_uber_jar --environment_type=DOCKER --environment_config=apache/beam_python3.7_sdk:2.24.0 --checkpointing_interval=10000 --streaming

4) Emit a few Pub/Sub messages:

python -m pubsub_example --msg hello
python -m pubsub_example --msg world

What am I missing here?

-Sam

------------------------------

Some debugging things I've tried:

- I can run ReadFromPubSub (the DirectRunner version) just fine.
- I confirmed that the gcloud credentials make it into the Java SDK container that spins up. Without these you get credential-type errors.
- I modified the Java DockerEnvironmentFactory to instead mount the GOOGLE_APPLICATION_CREDENTIALS service account .json and set the env var.
- I've tried a variety of different Flink flags.
