Hi, I am trying to run Beam on a small Spark cluster. I set up Spark (a master plus one worker). I am using the portable runner and invoke the Beam pipeline with:
python -m apache_beam.examples.wordcount gs://datapipeline-output/shakespeare-alls-11.txt --output gs://datapipeline-output/output/ --project august-ascent-325423 --runner PortableRunner --job_endpoint=localhost:8099 --environment_type=DOCKER

I always get an error:

Caused by: java.util.concurrent.TimeoutException: Timed out while waiting for command 'docker run -d --network=host --env=DOCKER_MAC_CONTAINER=null apache/beam_python3.8_sdk:2.32.0 --id=4-1 --provision_endpoint=localhost:46757'

It takes ~2.5 minutes to pull the Beam image, which should be enough. So I pulled the image manually (docker pull apache/beam_python3.8_sdk:2.32.0) and then ran the pipeline again. Now I get a different error:

java.io.FileNotFoundException: /tmp/beam-artifact-staging/60321f712323c195764ab31b3e205b228a405fbb80b50fafa67b38b21959c63f/1-ref_Environment_default_e-pickled_main_session (No such file or directory)

and then further down:

ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException: java.lang.IllegalStateException: No container running for id 7014a9ea98dc0b3f453a9d3860aff43ba42214195d2240d7cefcefcfabf93879

(Here is the full stack trace: https://drive.google.com/file/d/1mRzt8G7I9Akkya48KfAbrqPp8wRCzXDe/view)

Any pointers or ideas are appreciated (sorry if this is something obvious; I'm still pretty new to Beam/Spark).

Thanks,
Mark
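
P.S. In case it is useful, this is roughly the programmatic equivalent of the command above (just a sketch; the flag values mirror my CLI invocation, except that I pass the input via --input here instead of positionally):

    from apache_beam.examples import wordcount

    # Same options as the CLI run, assembled as a flag list; wordcount.run()
    # forwards the unrecognized flags to PipelineOptions.
    wordcount.run([
        '--input=gs://datapipeline-output/shakespeare-alls-11.txt',
        '--output=gs://datapipeline-output/output/',
        '--project=august-ascent-325423',
        '--runner=PortableRunner',
        '--job_endpoint=localhost:8099',
        '--environment_type=DOCKER',
    ])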
