Great, thank you so much for your responses.  It all makes sense now. :)

On Mon, Jan 30, 2023 at 10:41 PM Dian Fu <dian0511...@gmail.com> wrote:

> >> What is the reason for including
> opt/python/{pyflink.zip,cloudpickle.zip,py4j.zip} in the base
> distribution then?  Oh, a guess: to make it easier for TaskManagers to run
> pyflink without having pyflink installed themselves?  Somehow I'd guess
> this wouldn't work tho; I'd assume TaskManagers would also need some python
> transitive dependencies, e.g. google protobuf.
>
> This is for historical reasons. In the first version (1.9.x), which did not
> provide Python UDF support, it was not necessary to install PyFlink on the
> TaskManager nodes. Since 1.10, which supports Python UDFs, users have to
> install PyFlink on the TaskManager nodes, as there are many transitive
> dependencies, e.g. Apache Beam, protobuf, pandas, etc. However, we have not
> removed these packages, as they are still useful on the client node, which is
> responsible for compiling jobs (it's not necessary to install PyFlink on the
> client node).
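
A minimal Python sketch of the client-side idea, assuming only the layout named above ($FLINK_HOME/opt/python holding the pyflink, cloudpickle and py4j zips) and assuming each zip has its package at the archive root so Python's zip import can load it. It builds a PYTHONPATH that puts those zips first. The helper name `build_client_pythonpath` is hypothetical, and this is an illustration of the mechanism, not Flink's own launcher code. It does nothing for the transitive dependencies (Apache Beam, protobuf, pandas) that TaskManagers need.

```python
import os
from pathlib import Path


def build_client_pythonpath(flink_home: str, existing: str = "") -> str:
    """Return a PYTHONPATH with every zip under <flink_home>/opt/python first.

    Hypothetical helper: it globs *.zip instead of hard-coding pyflink.zip,
    cloudpickle.zip and py4j.zip, since a release may add version suffixes.
    """
    opt_python = Path(flink_home) / "opt" / "python"
    entries = sorted(str(p) for p in opt_python.glob("*.zip"))
    if existing:
        # Keep whatever PYTHONPATH the caller already had, after the zips.
        entries.append(existing)
    return os.pathsep.join(entries)


# Example (assumes FLINK_HOME points at an unpacked distribution):
# env = dict(os.environ, PYTHONPATH=build_client_pythonpath(
#     os.environ["FLINK_HOME"], os.environ.get("PYTHONPATH", "")))
```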
>
> >> Since we're building our own Docker image, I'm going the other way
> around: just install pyflink, and symlink /opt/flink ->
> /usr/lib/python3.7/dist-packages/pyflink.  So far so good, but I'm
> worried that something will be fishy when trying to run JVM apps via
> pyflink.
>
> Good idea! The PyFlink package contains everything needed to run JVM apps,
> so I think you could just try this approach.
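
A sketch of the symlink approach discussed above, written in Python so it could run as an image build step, assuming `apache-flink` is already pip-installed for the interpreter that runs it. Rather than hard-coding /usr/lib/python3.7/dist-packages/pyflink, it locates the installed `pyflink` package with `importlib.util.find_spec` and points the link at that directory. The helper names `package_dir` and `link_flink_home` are hypothetical; in a Dockerfile the same effect is a single `ln -s` in a RUN step.

```python
import importlib.util
from pathlib import Path


def package_dir(name: str) -> Path:
    """Directory of an installed package (top-level names are not imported)."""
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        raise ModuleNotFoundError(f"{name!r} is not an installed package")
    return Path(list(spec.submodule_search_locations)[0]).resolve()


def link_flink_home(link: str, package: str = "pyflink") -> Path:
    """Create `link` as a symlink to the installed package directory.

    Hypothetical helper; it refuses to replace anything already at `link`.
    """
    target = package_dir(package)
    link_path = Path(link)
    if link_path.exists() or link_path.is_symlink():
        raise FileExistsError(f"{link_path} already exists")
    link_path.symlink_to(target, target_is_directory=True)
    return target


# Example, run once at image build time:
# link_flink_home("/opt/flink")
```

Refusing to overwrite is deliberate: the official image already has a real /opt/flink, and silently replacing it would hide which install is in use.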
>
> Regards,
> Dian
>
> On Mon, Jan 30, 2023 at 9:58 PM Andrew Otto <o...@wikimedia.org> wrote:
>
>> Thanks Dian!
>>
>> > >> Is using pyflink from the flink distribution tarball (without pip)
>> not a supported way to use pyflink?
>> > You are right.
>>
>> What is the reason for including
>> opt/python/{pyflink.zip,cloudpickle.zip,py4j.zip} in the base
>> distribution then?  Oh, a guess: to make it easier for TaskManagers to run
>> pyflink without having pyflink installed themselves?  Somehow I'd guess
>> this wouldn't work tho; I'd assume TaskManagers would also need some python
>> transitive dependencies, e.g. google protobuf.
>>
>> > you could remove the JAR packages located under
>> /usr/local/lib/python3.7/dist-packages/pyflink/lib manually after `pip
>> install apache-flink`
>>
>> Since we're building our own Docker image, I'm going the other way
>> around: just install pyflink, and symlink /opt/flink ->
>> /usr/lib/python3.7/dist-packages/pyflink.  So far so good, but I'm worried
>> that something will be fishy when trying to run JVM apps via pyflink.
>>
>> -Ao
>>
>>
>>
>> On Sun, Jan 29, 2023 at 1:43 AM Dian Fu <dian0511...@gmail.com> wrote:
>>
>>> Hi Andrew,
>>>
>>> >> By pip installing apache-flink, this docker image will have the flink
>>> distro installed at /opt/flink and FLINK_HOME set to /opt/flink
>>> <https://github.com/apache/flink-docker/blob/master/1.16/scala_2.12-java11-ubuntu/Dockerfile>.
>>> BUT ALSO flink lib jars will be installed at e.g.
>>> /usr/local/lib/python3.7/dist-packages/pyflink/lib!
>>> So, by following those instructions, flink is effectively installed
>>> twice into the docker image.
>>>
>>> Yes, your understanding is correct. The base image `flink:1.15.2`
>>> doesn't include PyFlink, so you need to build a custom image if you want
>>> to use PyFlink. Regarding the JAR packages that are installed twice, you
>>> could manually remove the JAR packages located under
>>> /usr/local/lib/python3.7/dist-packages/pyflink/lib after `pip install
>>> apache-flink`. PyFlink will then use the JAR packages located under
>>> $FLINK_HOME/lib.
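
A minimal sketch of that cleanup, assuming the pyflink/lib path quoted above and a FLINK_HOME whose lib/ already holds the distribution JARs. The helper `remove_bundled_jars` is hypothetical: it deletes only `*.jar` files directly inside the given directory, and only after checking that the FLINK_HOME side is not empty, then returns the removed names so a build step can log them. Note that in Docker, deleting files in a later layer does not shrink the image, so the removal belongs in the same RUN step as the `pip install`.

```python
from pathlib import Path
from typing import List


def remove_bundled_jars(pyflink_lib: str, flink_home_lib: str) -> List[str]:
    """Delete *.jar files in pyflink_lib, but only if flink_home_lib has JARs.

    Hypothetical helper. The guard avoids leaving the image with no Flink JARs
    at all if FLINK_HOME is unset or points somewhere empty.
    """
    if not any(Path(flink_home_lib).glob("*.jar")):
        raise RuntimeError(f"no JARs found under {flink_home_lib}; refusing to delete")
    removed = []
    for jar in sorted(Path(pyflink_lib).glob("*.jar")):
        jar.unlink()
        removed.append(jar.name)
    return removed


# Example:
# remove_bundled_jars("/usr/local/lib/python3.7/dist-packages/pyflink/lib",
#                     "/opt/flink/lib")
```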
>>>
>>> >> Is using pyflink from the flink distribution tarball (without pip)
>>> not a supported way to use pyflink?
>>> You are right.
>>>
>>> Regards,
>>> Dian
>>>
>>>
>>> On Thu, Jan 26, 2023 at 11:12 PM Andrew Otto <o...@wikimedia.org> wrote:
>>>
>>>> Ah, oops, my original email had a typo:
>>>> > Some python dependencies are not included in the flink distribution
>>>> tarballs: cloudpickle, py4j and pyflink are in opt/python.
>>>>
>>>> Should read:
>>>> > Some python dependencies ARE included in the flink distribution
>>>> tarballs: cloudpickle, py4j and pyflink are in opt/python.
>>>>
>>>> On Thu, Jan 26, 2023 at 10:10 AM Andrew Otto <o...@wikimedia.org>
>>>> wrote:
>>>>
>>>>> Let me ask a related question:
>>>>>
>>>>> We are building our own base Flink docker image.  We will be deploying
>>>>> both JVM and python apps via flink-kubernetes-operator.
>>>>>
>>>>> Is there any reason not to install Flink in this image via `pip
>>>>> install apache-flink` and use it for JVM apps?
>>>>>
>>>>> -Andrew Otto
>>>>>  Wikimedia Foundation
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Jan 24, 2023 at 4:26 PM Andrew Otto <o...@wikimedia.org>
>>>>> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I'm having quite a bit of trouble running pyflink from the default
>>>>>> flink distribution tarballs.  I'd expect the python examples to work as
>>>>>> long as python is installed, and we've got the distribution.  Some python
>>>>>> dependencies are not included in the flink distribution tarballs:
>>>>>> cloudpickle, py4j and pyflink are in opt/python.  Others are not, e.g.
>>>>>> protobuf.
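
To see which of these dependencies a given interpreter can actually find, a small check like the following may help. It is a hypothetical diagnostic sketch built on `importlib.util.find_spec`; the helper name `check_importable` is an assumption, not part of PyFlink. For dotted names such as `google.protobuf`, `find_spec` imports the parent package, and a missing parent raises `ModuleNotFoundError` instead of returning None, so the sketch treats that case as "not found".

```python
import importlib.util
from typing import Dict, Iterable


def check_importable(modules: Iterable[str]) -> Dict[str, bool]:
    """Map each dotted module name to whether it can be found on sys.path."""
    result = {}
    for name in modules:
        try:
            result[name] = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # Raised when a parent package (e.g. "google") is missing.
            result[name] = False
    return result


# Example:
# check_importable(["pyflink", "py4j", "cloudpickle", "google.protobuf"])
```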
>>>>>>
>>>>>> Now that I'm looking, I see that the pyflink installation instructions
>>>>>> <https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/python/installation/>
>>>>>> are to install via pip.
>>>>>>
>>>>>> I'm doing this in Docker for use with the flink-kubernetes-operator.
>>>>>> In the Using Flink Python on Docker
>>>>>> <https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/resource-providers/standalone/docker/#using-flink-python-on-docker>
>>>>>> instructions, there is a `pip3 install apache-flink` step. I find this
>>>>>> strange, since I'd expect the `FROM flink:1.15.2` part to be sufficient.
>>>>>>
>>>>>> By pip installing apache-flink, this docker image will have the flink
>>>>>> distro installed at /opt/flink and FLINK_HOME set to /opt/flink
>>>>>> <https://github.com/apache/flink-docker/blob/master/1.16/scala_2.12-java11-ubuntu/Dockerfile>.
>>>>>> BUT ALSO flink lib jars will be installed at e.g.
>>>>>> /usr/local/lib/python3.7/dist-packages/pyflink/lib!
>>>>>> So, by following those instructions, flink is effectively installed
>>>>>> twice into the docker image.
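
One way to confirm and size the duplication described above is to compare the JAR file names in the two lib directories. This is a hypothetical diagnostic sketch (`jar_overlap` is an illustrative name); it compares names only, not contents or versions, so differently named builds of the same component are not counted.

```python
from pathlib import Path
from typing import List, Tuple


def jar_overlap(flink_lib: str, pyflink_lib: str) -> Tuple[List[str], int]:
    """Return JAR names present in both dirs and their total size in pyflink_lib."""
    in_flink = {p.name for p in Path(flink_lib).glob("*.jar")}
    in_pyflink = {p.name: p for p in Path(pyflink_lib).glob("*.jar")}
    # dict_keys supports set intersection with any iterable.
    shared = sorted(in_pyflink.keys() & in_flink)
    duplicate_bytes = sum(in_pyflink[n].stat().st_size for n in shared)
    return shared, duplicate_bytes


# Example:
# jar_overlap("/opt/flink/lib",
#             "/usr/local/lib/python3.7/dist-packages/pyflink/lib")
```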
>>>>>>
>>>>>> Am I correct or am I missing something?
>>>>>>
>>>>>> Is using pyflink from the flink distribution tarball (without pip)
>>>>>> not a supported way to use pyflink?
>>>>>>
>>>>>> Thanks!
>>>>>> -Andrew Otto
>>>>>>  Wikimedia Foundation
>>>>>>
>>>>>>
