Hello,

I'm having quite a bit of trouble running pyflink from the default Flink distribution tarballs. I'd expect the Python examples to work as long as Python is installed and we have the distribution. However, only some of the Python dependencies are included in the Flink distribution tarballs: cloudpickle, py4j and pyflink are in opt/python, but others, e.g. protobuf, are not.
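
A quick way to see which of these modules the interpreter can actually find is a sketch like the one below. It assumes that whatever ships in opt/python has already been put on PYTHONPATH. The helper name find_missing is only illustrative, and the module list is just the four packages mentioned above (protobuf is imported as google.protobuf).

```python
import importlib.util

# Modules named in this message; protobuf is imported as "google.protobuf".
MODULES = ["cloudpickle", "py4j", "pyflink", "google.protobuf"]


def find_missing(modules):
    """Return the module names that cannot be located on the current sys.path."""
    missing = []
    for name in modules:
        try:
            spec = importlib.util.find_spec(name)
        except ModuleNotFoundError:
            # Raised for dotted names when the parent package is itself absent.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("missing:", find_missing(MODULES))
```

Run with the same python3 that Flink would use; an empty list means all four modules are resolvable.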
Now that I'm looking, I see that the pyflink installation instructions <https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/python/installation/> say to install via pip.

I'm doing this in Docker for use with the flink-kubernetes-operator. The Using Flink Python on Docker <https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/resource-providers/standalone/docker/#using-flink-python-on-docker> instructions include a 'pip3 install apache-flink' step. I find this strange, since I'd expect the 'FROM flink:1.15.2' part to be sufficient.

After pip installing apache-flink, the Docker image has the Flink distro installed at /opt/flink, with FLINK_HOME set to /opt/flink <https://github.com/apache/flink-docker/blob/master/1.16/scala_2.12-java11-ubuntu/Dockerfile>, BUT ALSO has the Flink lib jars installed at e.g. /usr/local/lib/python3.7/dist-packages/pyflink/lib! So, by following those instructions, Flink is effectively installed twice into the Docker image.

Am I correct, or am I missing something?

Is using pyflink from the Flink distribution tarball (without pip) not a supported way to use pyflink?

Thanks!
-Andrew Otto
Wikimedia Foundation
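
P.S. For anyone who wants to check the duplication on their own image, here is a rough sketch that lists jar file names present in both lib directories. The two paths are simply the ones mentioned above and may differ per image and Python version. The helper names jar_names and duplicated_jars are made up for illustration, and matching by file name alone is an assumption (it does not compare file contents).

```python
import os
from pathlib import Path


def jar_names(directory):
    """Return the set of .jar file names directly inside directory (empty if absent)."""
    path = Path(directory)
    if not path.is_dir():
        return set()
    return {p.name for p in path.glob("*.jar")}


def duplicated_jars(dist_lib, pip_lib):
    """Return the sorted jar file names that appear in both lib directories."""
    return sorted(jar_names(dist_lib) & jar_names(pip_lib))


if __name__ == "__main__":
    # Both paths are assumptions taken from this message; adjust for your image.
    dist_lib = Path(os.environ.get("FLINK_HOME", "/opt/flink")) / "lib"
    pip_lib = Path("/usr/local/lib/python3.7/dist-packages/pyflink/lib")
    for name in duplicated_jars(dist_lib, pip_lib):
        print(name)
```

Any names it prints are jars that exist under both FLINK_HOME/lib and the pip-installed pyflink/lib.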