Hello,

I'm having quite a bit of trouble running PyFlink from the default Flink
distribution tarballs. I'd expect the Python examples to work as long as
Python is installed and we've got the distribution. Some Python
dependencies are included in the Flink distribution tarballs
(cloudpickle, py4j and pyflink are in opt/python), but others, e.g.
protobuf, are not.

Now that I'm looking, I see that the PyFlink installation instructions
<https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/python/installation/>
say to install via pip.

I'm doing this in Docker for use with the flink-kubernetes-operator. The
Using Flink Python on Docker
<https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/resource-providers/standalone/docker/#using-flink-python-on-docker>
instructions include a 'pip3 install apache-flink' step. I find this
strange, since I'd expect the 'FROM flink:1.15.2' part to be sufficient.

If you pip install apache-flink on top of that base image, the resulting
Docker image will have the Flink distribution installed at /opt/flink, with
FLINK_HOME set to /opt/flink
<https://github.com/apache/flink-docker/blob/master/1.16/scala_2.12-java11-ubuntu/Dockerfile>,
BUT ALSO the Flink lib jars installed at e.g.
/usr/local/lib/python3.7/dist-packages/pyflink/lib!
So, by following those instructions, Flink is effectively installed twice
into the Docker image.
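For anyone who wants to check this in their own image, here's a minimal
sketch of a hypothetical helper script (not something from the Flink docs).
It assumes the pip package keeps its jars under pyflink/lib, as in the path
above, and simply lists every existing Flink lib/ directory it finds under
FLINK_HOME and under the importable pyflink package:

```python
import importlib.util
import os
from typing import List, Optional


def flink_lib_dirs(flink_home: Optional[str], pyflink_pkg_dir: Optional[str]) -> List[str]:
    """Return each existing Flink lib/ directory under FLINK_HOME and under
    the pip-installed pyflink package (assumed layout: pyflink/lib)."""
    candidates = []
    if flink_home:
        candidates.append(os.path.join(flink_home, "lib"))
    if pyflink_pkg_dir:
        candidates.append(os.path.join(pyflink_pkg_dir, "lib"))
    # Keep only directories that actually exist, de-duplicated by real path
    # so a symlinked lib/ is not counted twice.
    found: List[str] = []
    for path in candidates:
        real = os.path.realpath(path)
        if os.path.isdir(real) and real not in found:
            found.append(real)
    return found


def pyflink_package_dir() -> Optional[str]:
    """Locate the directory of the importable pyflink package, or None."""
    spec = importlib.util.find_spec("pyflink")
    if spec is None or not spec.submodule_search_locations:
        return None
    return list(spec.submodule_search_locations)[0]


if __name__ == "__main__":
    dirs = flink_lib_dirs(os.environ.get("FLINK_HOME"), pyflink_package_dir())
    for d in dirs:
        jars = sorted(f for f in os.listdir(d) if f.endswith(".jar"))
        print(f"{d}: {len(jars)} jars")
    if len(dirs) > 1:
        print("Flink jars appear to be installed more than once.")
```

Running it inside the image built from the Docker instructions should, if
I'm reading things right, report both /opt/flink/lib and the
dist-packages/pyflink/lib directory.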

Am I correct or am I missing something?

Is using PyFlink from the Flink distribution tarball (without pip) not a
supported way to use PyFlink?

Thanks!
-Andrew Otto
 Wikimedia Foundation
