Thanks Dian, that seemed to do the trick. I built a Docker image using just:

FROM apache/beam_python3.8_sdk:2.31.0
COPY flink_data/input.txt .

I specified nothing in the Pipeline options other than:

"--runner=FlinkRunner",
"--flink_master=localhost:8081",
"--environment_type=DOCKER",
"--environment_config=xor/beam_worker:latest"

This seemed to do the trick!
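
As an illustration, here is a minimal sketch of how those four flags might be passed to a pipeline from Python instead of on the command line. The helper name flink_docker_options, the run function, and the word-splitting logic are assumptions made for this example and are not taken from the wordcount module. The relative path input.txt only resolves if the file exists in the worker image's working directory, which is what the COPY line above provides.

```python
def flink_docker_options(flink_master, worker_image):
    """Return the four flags listed above as a list of strings."""
    return [
        "--runner=FlinkRunner",
        f"--flink_master={flink_master}",
        "--environment_type=DOCKER",
        f"--environment_config={worker_image}",
    ]


def run(input_path, output_prefix):
    """Count words in input_path and write the counts under output_prefix."""
    # Imported here so the helper above can be used without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(
        flink_docker_options("localhost:8081", "xor/beam_worker:latest")
    )
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "Read" >> beam.io.ReadFromText(input_path)
            | "Split" >> beam.FlatMap(lambda line: line.split())
            | "Count" >> beam.combiners.Count.PerElement()
            | "Format" >> beam.MapTuple(lambda word, count: f"{word}: {count}")
            | "Write" >> beam.io.WriteToText(output_prefix)
        )
```

Calling run("input.txt", "counts") would submit the job to the Flink server at localhost:8081, assuming that server is running and the xor/beam_worker:latest image is available to Docker locally.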

I’m curious whether you have any recommendations on typical strategies for 
supplying artifacts like this at runtime, rather than having to build an image 
for each job. Ideally, we would have a pipeline running continuously somewhere, 
and we could submit data (say, CSV files) for the pipeline to process as a 
job is kicked off. Given how much discussion I’ve seen around Kafka, I imagine 
this is usually done with some sort of distributed messaging framework.

If anyone can offer suggestions or resources, I’d very much appreciate it!

Thanks again for the help!

Best,
Adam


From: Dian Fu <dian0511...@gmail.com>
Date: Friday, September 3, 2021 at 2:10 AM
To: Adam Pearce <adam.pea...@xorsecurity.com>
Cc: user@flink.apache.org <user@flink.apache.org>
Subject: [EXTERNAL]Re: [Question] Basic Python examples.wordcount on local FlinkRunner

This seems more like a Beam issue although it uses Flink runner. It would be 
helpful to also send it to the Beam user mailing list.

Regarding the issue itself, could you check whether input.txt is accessible in 
the Docker container?
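
One possible way to run that check, sketched in Python around the docker CLI. The function names are hypothetical. The sketch assumes the worker image is the default apache/beam_python3.8_sdk:2.31.0 mentioned in the original message, and that a relative path such as input.txt resolves against the image's working directory. The image's entrypoint is overridden with ls so the container lists the file instead of starting the Beam boot program.

```python
import subprocess


def file_check_command(image, path):
    """Build a `docker run` command that lists `path` inside `image`."""
    # --rm removes the throwaway container once ls exits.
    # --entrypoint ls replaces the image's normal entrypoint.
    return ["docker", "run", "--rm", "--entrypoint", "ls", image, "-l", path]


def file_in_image(image, path):
    """Return True if ls finds `path` inside a container started from `image`."""
    result = subprocess.run(
        file_check_command(image, path),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0
```

For example, file_in_image("apache/beam_python3.8_sdk:2.31.0", "input.txt") returns False when the file is missing from the image, because ls exits with a non-zero status.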

Regards,
Dian


On September 3, 2021, at 5:19 AM, Adam Pearce 
<adam.pea...@xorsecurity.com> wrote:

Hello all,

I’m attempting to run a simple, minimally viable example of a Beam pipeline on 
Flink. I have installed flink-1.13.2 following the setup instructions, and have 
successfully run the server and navigated to localhost:8081 to view the Web UI. 
I can see the job submitted successfully and running. It runs to completion, 
but the results are not what I expect and the final output is never produced. I have 
enabled DEBUG logging for further output, but I don’t really see anything that 
would indicate issues other than what I am showing below. I’ve attached the 
complete log.

I am running the following from a Python 3.8.12 virtualenv with apache-beam 
2.31.0 installed via pip:

python -m apache_beam.examples.wordcount --input input.txt --output counts 
--runner FlinkRunner --flink_master="localhost:8081" [--flink_submit_uber_jar]

I have tried with and without “--flink_submit_uber_jar”, with no change. The 
local embedded run of this pipeline works (same command as above, omitting 
“--flink_master”). I understand that running this through the Flink server will 
use docker to stand up containers to perform the work. It appears to be 
successfully pulling the image: “apache/beam_python3.8_sdk:2.31.0”. I’m curious 
if there is some issue with the Python container, because in the logs, I am 
seeing:

2021/09/02 20:57:54 Initializing python harness: /opt/apache/beam/boot --id=5-1 
--provision_endpoint=host.docker.internal:55847
2021/09/02 20:57:54 Downloaded: /tmp/staged/pickled_main_session 
(sha256:4e9a1199bade55ad73ae6872c8f156c69227ef23d8155f19dead745264999084, size: 
3029)
2021/09/02 20:57:54 Found artifact: pickled_main_session
2021/09/02 20:57:54 Installing setup packages ...
2021/09/02 20:57:54 Executing: python -m 
apache_beam.runners.worker.sdk_worker_main
2021/09/02 20:57:55 Python exited: <nil>

I’ve searched high and low for that “Python exited: <nil>” but have found very 
little.

Further context:
Operating system: macOS 11.5.2
Docker: Docker version 20.10.8, build 3967b7d
Python: 3.8.12 (virtualenv)
Beam: 2.31.0
Flink: 1.13.2
This communication is the property of XOR Security and may contain confidential 
or privileged information. Unauthorized use of this communication is strictly 
prohibited and may be unlawful. If you have received this communication in 
error, please immediately notify the sender by reply e-mail and destroy all 
copies of the communication and any attachments. 
<flink-root-standalonesession-3-MacBook-Pro.local.log><flink-root-taskexecutor-3-MacBook-Pro.local.log>

