We updated from beam 2.18.0 to 2.24.0 and have been having issues using the
python ReadFromPubSub external transform in flink 1.10. It seems like it
starts up just fine, but it doesn’t consume any messages.

I tried to reduce it to a simple example and tested back to beam 2.22.0 but
have gotten the same results (of no messages being read).

I have a hard time believing that it’s been broken for so many versions but
I can’t seem to identify what I’ve done wrong.

Steps to test:

1) Spin up the expansion service

docker run -d --rm --network host apache/beam_flink1.10_job_server:2.24.0

2) Create a simple pipeline using
apache_beam.io.external.gcp.pubsub.ReadFromPubSub
Here’s mine:
https://gist.github.com/sambvfx/a8582f4805e468a97331b0eb13911ebf

3) Run the pipeline

python -m pubsub_example --runner=FlinkRunner --save_main_session
--flink_submit_uber_jar --environment_type=DOCKER
--environment_config=apache/beam_python3.7_sdk:2.24.0
--checkpointing_interval=10000 --streaming

4) Emit a few pubsub messages

python -m pubsub_example --msg hello
python -m pubsub_example --msg world

What am I missing here?
-Sam
------------------------------

Some debugging things I’ve tried:

   - I can run ReadFromPubSub (the DirectRunner version just fine).
   - I confirmed that the gcloud credentials make it into the java sdk
   container that spins up. Without these you get credential type errors.
   - I modified the java DockerEnvironmentFactory to instead mount
   GOOGLE_APPLICATION_CREDENTIALS service account .json and set the env var.
   - I’ve tried a variety of different flink flags.

Reply via email to