Hi Levan,

For your setup1 & 2, it looks like the python environment is not ready.
Have you tried python -m pip install apache-flink for the first 2 setups?
For your setup3, as you are trying to use `flink run ...` command, it will
try to connect to a launched flink cluster but I guess you did not launch
the flink cluster. You can do `start-cluster.sh` first to launch a
standalone flink cluster and then try the `flink run ...` command.
For your setup4, the reason why it works well is that it will use the
default mini cluster to run the pyflink job. So even you haven't started a
standalone cluster, it can work as well.

Best,
Biao Geng

Levan Huyen <lvhu...@gmail.com> 于2022年10月19日周三 17:07写道:

> Hi,
>
> I'm new to PyFlink, and I couldn't run a basic example that shipped with
> Flink.
> This is the command I tried:
>
> ./bin/flink run -py examples/python/datastream/word_count.py
>
> Here below are the results I got with different setups:
>
> 1. On AWS EMR 6.8.0 (Flink 1.15.1):
> *Error: No module named 'google'*I tried with the Flink shipped with EMR,
> or the binary v1.15.1/v1.15.2 downloaded from Flink site. I got that same
> error message in all cases.
>
> Traceback (most recent call last):
>
>   File "/usr/lib64/python3.7/runpy.py", line 193, in _run_module_as_main
>
>     "__main__", mod_spec)
>
>   File "/usr/lib64/python3.7/runpy.py", line 85, in _run_code
>
>     exec(code, run_globals)
>
>   File
> "/tmp/pyflink/622f19cc-6616-45ba-b7cb-20a1462c4c3f/a29eb7a4-fff1-4d98-9b38-56b25f97b70a/word_count.py",
> line 134, in <module>
>
>     word_count(known_args.input, known_args.output)
>
>   File
> "/tmp/pyflink/622f19cc-6616-45ba-b7cb-20a1462c4c3f/a29eb7a4-fff1-4d98-9b38-56b25f97b70a/word_count.py",
> line 89, in word_count
>
>     ds = ds.flat_map(split) \
>
>   File
> "/usr/lib/flink/opt/python/pyflink.zip/pyflink/datastream/data_stream.py",
> line 333, in flat_map
>
>   File
> "/usr/lib/flink/opt/python/pyflink.zip/pyflink/datastream/data_stream.py",
> line 557, in process
>
>   File
> "/usr/lib/flink/opt/python/pyflink.zip/pyflink/fn_execution/flink_fn_execution_pb2.py",
> line 23, in <module>
>
> ModuleNotFoundError: No module named 'google'
>
> org.apache.flink.client.program.ProgramAbortException:
> java.lang.RuntimeException: Python process exits with code: 1
>
> 2. On my Mac, without a virtual environment (so `*-pyclientexec python3*`
> is included in the run command): got the same error as with EMR, but
> there's a stdout line from `*print()*` in the Python script
>
>   File
> "/Users/lvh/dev/tmp/flink-1.15.2/opt/python/pyflink.zip/pyflink/datastream/data_stream.py",
> line 557, in process
>
>   File "<frozen zipimport>", line 259, in load_module
>
>   File
> "/Users/lvh/dev/tmp/flink-1.15.2/opt/python/pyflink.zip/pyflink/fn_execution/flink_fn_execution_pb2.py",
> line 23, in <module>
>
> ModuleNotFoundError: No module named 'google'
>
> Executing word_count example with default input data set.
>
> Use --input to specify file input.
>
> org.apache.flink.client.program.ProgramAbortException:
> java.lang.RuntimeException: Python process exits with code: 1
>
> 3. On my Mac, with a virtual environment and Python package `
> *apache-flink`* installed: Flink tried to connect to localhost:8081 (I
> don't know why), and failed with 'connection refused'.
>
> Caused by: org.apache.flink.util.concurrent.FutureUtils$RetryException:
> Could not complete the operation. Number of retries has been exhausted.
>
> at
> org.apache.flink.util.concurrent.FutureUtils.lambda$retryOperationWithDelay$9(FutureUtils.java:395)
>
> ... 21 more
>
> Caused by: java.util.concurrent.CompletionException:
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AnnotatedConnectException:
> Connection refused: localhost/127.0.0.1:8081
>
>  4. If I run that same example job using Python: `*python word_count.py*`
> then it runs well.
>
> I tried with both v1.15.2 and 1.15.1, Python 3.7 & 3.8, and got the same
> result.
>
> Could someone please help?
>
> Thanks.
>

Reply via email to