Hi,

I am trying to run Beam on a small Spark cluster. I set up Spark (master
plus one slave). I am using the portable runner and invoke the Beam
pipeline with:

python -m apache_beam.examples.wordcount \
  gs://datapipeline-output/shakespeare-alls-11.txt \
  --output gs://datapipeline-output/output/ \
  --project august-ascent-325423 \
  --runner PortableRunner \
  --job_endpoint=localhost:8099 \
  --environment_type=DOCKER

I always get an error:
Caused by: java.util.concurrent.TimeoutException: Timed out while waiting
for command 'docker run -d --network=host --env=DOCKER_MAC_CONTAINER=null
apache/beam_python3.8_sdk:2.32.0 --id=4-1
--provision_endpoint=localhost:46757'

Pulling the Beam image takes ~2.5 minutes, which should be fast enough. To
rule out the pull as the cause, I pulled the image manually
(docker pull apache/beam_python3.8_sdk:2.32.0) and ran the pipeline again.

Now, when I run the pipeline I get an error:
java.io.FileNotFoundException:
/tmp/beam-artifact-staging/60321f712323c195764ab31b3e205b228a405fbb80b50fafa67b38b21959c63f/1-ref_Environment_default_e-pickled_main_session
(No such file or directory)

and then further down

ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException:
java.lang.IllegalStateException: No container running for id
7014a9ea98dc0b3f453a9d3860aff43ba42214195d2240d7cefcefcfabf93879

(here is the full stack trace:
https://drive.google.com/file/d/1mRzt8G7I9Akkya48KfAbrqPp8wRCzXDe/view)
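
In case it helps with diagnosis, a small check like the one below could be
run on both the master and the slave to see whether the staging path from
the FileNotFoundException exists on that machine. This is only a sketch:
describe_staging_path is a hypothetical helper name, and the path has to be
copied from the error message and passed as the first argument.

```python
import os
import sys


def describe_staging_path(path):
    """Return a short status string for a staged-artifact path on this host."""
    parent = os.path.dirname(path)
    if os.path.isfile(path):
        return "file present"
    if os.path.isdir(parent):
        return "staging directory present, file missing"
    return "staging directory missing"


if __name__ == "__main__":
    # Pass the path from the FileNotFoundException as the first argument.
    if len(sys.argv) > 1:
        print(describe_staging_path(sys.argv[1]))
```

If the result differs between the two machines, that would suggest the
staging directory is local to one of them rather than shared.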

Any pointers or ideas are appreciated (sorry if this is something obvious;
I'm still pretty new to Beam/Spark).

Thanks
      Mark
