[jira] [Updated] (FLINK-33225) Python API incorrectly passes `JVM_ARGS` as single argument

2023-10-09 Thread Deepyaman Datta (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepyaman Datta updated FLINK-33225:

Description: 
In the same vein as https://issues.apache.org/jira/browse/FLINK-31915, 
`JVM_ARGS` needs to be split into an array of arguments rather than passed as 
a single string. For example, the current behavior of 
`export JVM_ARGS='-XX:CompressedClassSpaceSize=100M -XX:MaxMetaspaceSize=200M'` 
is:

{{>               raise RuntimeError(}}
{{                    "Java gateway process exited before sending its port number.\nStderr:\n"}}
{{                    + stderr_info}}
{{                )}}
{{E               RuntimeError: Java gateway process exited before sending its port number.}}
{{E               Stderr:}}
{{E               Improperly specified VM option 'CompressedClassSpaceSize=100M -XX:MaxMetaspaceSize=200M'}}
{{E               Error: Could not create the Java Virtual Machine.}}
{{E               Error: A fatal exception has occurred. Program will exit.}}

  was:
In the same vein as https://issues.apache.org/jira/browse/FLINK-31915, 
`JVM_ARGS` needs to be split into an array of arguments rather than passed as 
a single string. For example, the current behavior of 
`export JVM_ARGS='-XX:CompressedClassSpaceSize=100M -XX:MaxMetaspaceSize=200M'` 
is:

```
>               raise RuntimeError(
                    "Java gateway process exited before sending its port number.\nStderr:\n"
                    + stderr_info
                )
E               RuntimeError: Java gateway process exited before sending its port number.
E               Stderr:
E               Improperly specified VM option 'CompressedClassSpaceSize=100M -XX:MaxMetaspaceSize=200M'
E               Error: Could not create the Java Virtual Machine.
E               Error: A fatal exception has occurred. Program will exit.
```


> Python API incorrectly passes `JVM_ARGS` as single argument
> ---
>
> Key: FLINK-33225
> URL: https://issues.apache.org/jira/browse/FLINK-33225
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.18.0, 1.17.1, 1.18.1
>Reporter: Deepyaman Datta
>Priority: Major
>  Labels: github-pullrequest
>
> In the same vein as https://issues.apache.org/jira/browse/FLINK-31915, 
> `JVM_ARGS` needs to be split into an array of arguments rather than passed 
> as a single string. For example, the current behavior of 
> `export JVM_ARGS='-XX:CompressedClassSpaceSize=100M 
> -XX:MaxMetaspaceSize=200M'` is:
> {{>               raise RuntimeError(}}
> {{                    "Java gateway process exited before sending its port number.\nStderr:\n"}}
> {{                    + stderr_info}}
> {{                )}}
> {{E               RuntimeError: Java gateway process exited before sending its port number.}}
> {{E               Stderr:}}
> {{E               Improperly specified VM option 'CompressedClassSpaceSize=100M -XX:MaxMetaspaceSize=200M'}}
> {{E               Error: Could not create the Java Virtual Machine.}}
> {{E               Error: A fatal exception has occurred. Program will exit.}}
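As a minimal sketch of the implied fix (hypothetical names; the actual change
belongs in PyFlink's Java gateway launch code), the variable should be
tokenized into separate arguments before being appended to the launch command:

```python
# Sketch only: demonstrates splitting JVM_ARGS into separate argv entries.
# "command" and the trailing arguments are hypothetical stand-ins.
import os
import shlex

command = ["java"]

# Buggy: the whole string becomes one argv entry, so the JVM sees
# '-XX:CompressedClassSpaceSize=100M -XX:MaxMetaspaceSize=200M' as a
# single, malformed option (matching the error quoted above).
# command.append(os.environ.get("JVM_ARGS", ""))

# Fixed: split on shell-style whitespace so each option is its own argument.
command.extend(shlex.split(os.environ.get("JVM_ARGS", "")))

command.extend(["-cp", "flink.jar", "org.example.Main"])  # hypothetical tail
print(command)
```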





[jira] [Updated] (FLINK-33225) Python API incorrectly passes `JVM_ARGS` as single argument

2023-10-09 Thread Deepyaman Datta (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepyaman Datta updated FLINK-33225:

Description: 
In the same vein as https://issues.apache.org/jira/browse/FLINK-31915, 
`JVM_ARGS` needs to be split into an array of arguments rather than passed as 
a single string. For example, the current behavior of 
`export JVM_ARGS='-XX:CompressedClassSpaceSize=100M -XX:MaxMetaspaceSize=200M'` 
is:

```
>               raise RuntimeError(
                    "Java gateway process exited before sending its port number.\nStderr:\n"
                    + stderr_info
                )
E               RuntimeError: Java gateway process exited before sending its port number.
E               Stderr:
E               Improperly specified VM option 'CompressedClassSpaceSize=100M -XX:MaxMetaspaceSize=200M'
E               Error: Could not create the Java Virtual Machine.
E               Error: A fatal exception has occurred. Program will exit.
```

> Python API incorrectly passes `JVM_ARGS` as single argument
> ---
>
> Key: FLINK-33225
> URL: https://issues.apache.org/jira/browse/FLINK-33225
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.18.0, 1.17.1, 1.18.1
>Reporter: Deepyaman Datta
>Priority: Major
>  Labels: github-pullrequest
>
> In the same vein as https://issues.apache.org/jira/browse/FLINK-31915, 
> `JVM_ARGS` needs to be split into an array of arguments rather than passed 
> as a single string. For example, the current behavior of 
> `export JVM_ARGS='-XX:CompressedClassSpaceSize=100M 
> -XX:MaxMetaspaceSize=200M'` is:
> ```
> >               raise RuntimeError(
>                     "Java gateway process exited before sending its port number.\nStderr:\n"
>                     + stderr_info
>                 )
> E               RuntimeError: Java gateway process exited before sending its port number.
> E               Stderr:
> E               Improperly specified VM option 'CompressedClassSpaceSize=100M -XX:MaxMetaspaceSize=200M'
> E               Error: Could not create the Java Virtual Machine.
> E               Error: A fatal exception has occurred. Program will exit.
> ```





[jira] [Updated] (FLINK-33225) Python API incorrectly passes `JVM_ARGS` as single argument

2023-10-09 Thread Deepyaman Datta (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepyaman Datta updated FLINK-33225:

Affects Version/s: 1.17.1
   1.18.0
   1.18.1

> Python API incorrectly passes `JVM_ARGS` as single argument
> ---
>
> Key: FLINK-33225
> URL: https://issues.apache.org/jira/browse/FLINK-33225
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.18.0, 1.17.1, 1.18.1
>Reporter: Deepyaman Datta
>Priority: Major
>  Labels: github-pullrequest
>






[jira] [Updated] (FLINK-33225) Python API incorrectly passes `JVM_ARGS` as single argument

2023-10-09 Thread Deepyaman Datta (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepyaman Datta updated FLINK-33225:

Labels: github-pullrequest  (was: )

> Python API incorrectly passes `JVM_ARGS` as single argument
> ---
>
> Key: FLINK-33225
> URL: https://issues.apache.org/jira/browse/FLINK-33225
> Project: Flink
>  Issue Type: Bug
>Reporter: Deepyaman Datta
>Priority: Major
>  Labels: github-pullrequest
>






[jira] [Created] (FLINK-33225) Python API incorrectly passes `JVM_ARGS` as single argument

2023-10-09 Thread Deepyaman Datta (Jira)
Deepyaman Datta created FLINK-33225:
---

 Summary: Python API incorrectly passes `JVM_ARGS` as single 
argument
 Key: FLINK-33225
 URL: https://issues.apache.org/jira/browse/FLINK-33225
 Project: Flink
  Issue Type: Bug
Reporter: Deepyaman Datta








[jira] [Comment Edited] (FLINK-32758) PyFlink bounds are overly restrictive and outdated

2023-08-29 Thread Deepyaman Datta (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17759987#comment-17759987
 ] 

Deepyaman Datta edited comment on FLINK-32758 at 8/29/23 3:04 PM:
--

[~dianfu] I'm happy with the `\!=1.8.0` constraint.


was (Author: deepyaman):
[~dianfu] I'm happy with the `!=1.8.0` constraint!

> PyFlink bounds are overly restrictive and outdated
> --
>
> Key: FLINK-32758
> URL: https://issues.apache.org/jira/browse/FLINK-32758
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python
>Affects Versions: 1.17.1, 1.19.0
>Reporter: Deepyaman Datta
>Assignee: Deepyaman Datta
>Priority: Blocker
>  Labels: pull-request-available, test-stability
> Attachments: image-2023-08-29-10-19-37-977.png
>
>
> Hi! I am part of a team building the Flink backend for Ibis 
> ([https://github.com/ibis-project/ibis]). We would like to leverage PyFlink 
> under the hood for execution; however, PyFlink's requirements are 
> incompatible with several other Ibis requirements. Beyond Ibis, PyFlink's 
> outdated and restrictive requirements prevent it from being used alongside 
> most recent releases of Python data libraries.
> Some of the major libraries we (and likely others in the Python community 
> interested in using PyFlink alongside other libraries) need compatibility 
> with:
>  * PyArrow (at least >=10.0.0, but there's no reason not to also be 
> compatible with the latest)
>  * pandas (should be compatible with the 2.x series, but also probably with 
> 1.4.x, released January 2022, and 1.5.x)
>  * numpy (1.22 was released in December 2021)
>  * Newer releases of Apache Beam
>  * Newer releases of cython
> Furthermore, uncapped dependencies could be more generally preferable, as 
> they avoid the need for frequent PyFlink releases as newer versions of 
> libraries are released. A common (and great) argument for not upper-bounding 
> dependencies, especially for libraries: 
> [https://iscinumpy.dev/post/bound-version-constraints/]
> I am currently testing removing upper bounds in 
> [https://github.com/apache/flink/pull/23141]; so far, builds pass without 
> issue in 
> [b65c072|https://github.com/apache/flink/pull/23141/commits/b65c0723ed66e01e83d718f770aa916f41f34581],
>  and I'm currently waiting on 
> [c8eb15c|https://github.com/apache/flink/pull/23141/commits/c8eb15cbc371dc259fb4fda5395f0f55e08ea9c6]
>  to see if I can get PyArrow to resolve >=10.0.0. Solving the proposed 
> dependencies results in:
> {{#}}
> {{# This file is autogenerated by pip-compile with Python 3.8}}
> {{# by the following command:}}
> {{#}}
> {{#    pip-compile --config=pyproject.toml 
> --output-file=dev/compiled-requirements.txt dev/dev-requirements.txt}}
> {{#}}
> {{apache-beam==2.49.0}}
> {{    # via -r dev/dev-requirements.txt}}
> {{avro-python3==1.10.2}}
> {{    # via -r dev/dev-requirements.txt}}
> {{certifi==2023.7.22}}
> {{    # via requests}}
> {{charset-normalizer==3.2.0}}
> {{    # via requests}}
> {{cloudpickle==2.2.1}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{crcmod==1.7}}
> {{    # via apache-beam}}
> {{cython==3.0.0}}
> {{    # via -r dev/dev-requirements.txt}}
> {{dill==0.3.1.1}}
> {{    # via apache-beam}}
> {{dnspython==2.4.1}}
> {{    # via pymongo}}
> {{docopt==0.6.2}}
> {{    # via hdfs}}
> {{exceptiongroup==1.1.2}}
> {{    # via pytest}}
> {{fastavro==1.8.2}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{fasteners==0.18}}
> {{    # via apache-beam}}
> {{find-libpython==0.3.0}}
> {{    # via pemja}}
> {{grpcio==1.56.2}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{    #   grpcio-tools}}
> {{grpcio-tools==1.56.2}}
> {{    # via -r dev/dev-requirements.txt}}
> {{hdfs==2.7.0}}
> {{    # via apache-beam}}
> {{httplib2==0.22.0}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{idna==3.4}}
> {{    # via requests}}
> {{iniconfig==2.0.0}}
> {{    # via pytest}}
> {{numpy==1.24.4}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{    #   pandas}}
> {{    #   pyarrow}}
> {{objsize==0.6.1}}
> {{    # via apache-beam}}
> {{orjson==3.9.2}}
> {{    # via apache-beam}}
> {{packaging==23.1}}
> {{    # via pytest}}
> {{pandas==2.0.3}}
> {{    # via -r dev/dev-requirements.txt}}
> {{pemja==0.3.0 ; platform_system != "Windows"}}
> {{    # via -r dev/dev-requirements.txt}}
> {{pluggy==1.2.0}}
> {{    # via pytest}}
> {{proto-plus==1.22.3}}
> {{    # via apache-beam}}
> {{protobuf==4.23.4}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{    #   grpcio-tools}}
> {{    #   proto-plus}}
> 

[jira] [Comment Edited] (FLINK-32758) PyFlink bounds are overly restrictive and outdated

2023-08-29 Thread Deepyaman Datta (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17759987#comment-17759987
 ] 

Deepyaman Datta edited comment on FLINK-32758 at 8/29/23 3:04 PM:
--

[~dianfu] I'm happy with the `!=1.8.0` constraint!


was (Author: deepyaman):
[~dianfu] I'm happy with the `\!=1.8.0` constraint.

> PyFlink bounds are overly restrictive and outdated
> --
>
> Key: FLINK-32758
> URL: https://issues.apache.org/jira/browse/FLINK-32758
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python
>Affects Versions: 1.17.1, 1.19.0
>Reporter: Deepyaman Datta
>Assignee: Deepyaman Datta
>Priority: Blocker
>  Labels: pull-request-available, test-stability
> Attachments: image-2023-08-29-10-19-37-977.png
>
>
> Hi! I am part of a team building the Flink backend for Ibis 
> ([https://github.com/ibis-project/ibis]). We would like to leverage PyFlink 
> under the hood for execution; however, PyFlink's requirements are 
> incompatible with several other Ibis requirements. Beyond Ibis, PyFlink's 
> outdated and restrictive requirements prevent it from being used alongside 
> most recent releases of Python data libraries.
> Some of the major libraries we (and likely others in the Python community 
> interested in using PyFlink alongside other libraries) need compatibility 
> with:
>  * PyArrow (at least >=10.0.0, but there's no reason not to also be 
> compatible with the latest)
>  * pandas (should be compatible with the 2.x series, but also probably with 
> 1.4.x, released January 2022, and 1.5.x)
>  * numpy (1.22 was released in December 2021)
>  * Newer releases of Apache Beam
>  * Newer releases of cython
> Furthermore, uncapped dependencies could be more generally preferable, as 
> they avoid the need for frequent PyFlink releases as newer versions of 
> libraries are released. A common (and great) argument for not upper-bounding 
> dependencies, especially for libraries: 
> [https://iscinumpy.dev/post/bound-version-constraints/]
> I am currently testing removing upper bounds in 
> [https://github.com/apache/flink/pull/23141]; so far, builds pass without 
> issue in 
> [b65c072|https://github.com/apache/flink/pull/23141/commits/b65c0723ed66e01e83d718f770aa916f41f34581],
>  and I'm currently waiting on 
> [c8eb15c|https://github.com/apache/flink/pull/23141/commits/c8eb15cbc371dc259fb4fda5395f0f55e08ea9c6]
>  to see if I can get PyArrow to resolve >=10.0.0. Solving the proposed 
> dependencies results in:
> {{#}}
> {{# This file is autogenerated by pip-compile with Python 3.8}}
> {{# by the following command:}}
> {{#}}
> {{#    pip-compile --config=pyproject.toml 
> --output-file=dev/compiled-requirements.txt dev/dev-requirements.txt}}
> {{#}}
> {{apache-beam==2.49.0}}
> {{    # via -r dev/dev-requirements.txt}}
> {{avro-python3==1.10.2}}
> {{    # via -r dev/dev-requirements.txt}}
> {{certifi==2023.7.22}}
> {{    # via requests}}
> {{charset-normalizer==3.2.0}}
> {{    # via requests}}
> {{cloudpickle==2.2.1}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{crcmod==1.7}}
> {{    # via apache-beam}}
> {{cython==3.0.0}}
> {{    # via -r dev/dev-requirements.txt}}
> {{dill==0.3.1.1}}
> {{    # via apache-beam}}
> {{dnspython==2.4.1}}
> {{    # via pymongo}}
> {{docopt==0.6.2}}
> {{    # via hdfs}}
> {{exceptiongroup==1.1.2}}
> {{    # via pytest}}
> {{fastavro==1.8.2}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{fasteners==0.18}}
> {{    # via apache-beam}}
> {{find-libpython==0.3.0}}
> {{    # via pemja}}
> {{grpcio==1.56.2}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{    #   grpcio-tools}}
> {{grpcio-tools==1.56.2}}
> {{    # via -r dev/dev-requirements.txt}}
> {{hdfs==2.7.0}}
> {{    # via apache-beam}}
> {{httplib2==0.22.0}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{idna==3.4}}
> {{    # via requests}}
> {{iniconfig==2.0.0}}
> {{    # via pytest}}
> {{numpy==1.24.4}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{    #   pandas}}
> {{    #   pyarrow}}
> {{objsize==0.6.1}}
> {{    # via apache-beam}}
> {{orjson==3.9.2}}
> {{    # via apache-beam}}
> {{packaging==23.1}}
> {{    # via pytest}}
> {{pandas==2.0.3}}
> {{    # via -r dev/dev-requirements.txt}}
> {{pemja==0.3.0 ; platform_system != "Windows"}}
> {{    # via -r dev/dev-requirements.txt}}
> {{pluggy==1.2.0}}
> {{    # via pytest}}
> {{proto-plus==1.22.3}}
> {{    # via apache-beam}}
> {{protobuf==4.23.4}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{    #   grpcio-tools}}
> {{    #   proto-plus}}
> 

[jira] [Comment Edited] (FLINK-32758) PyFlink bounds are overly restrictive and outdated

2023-08-29 Thread Deepyaman Datta (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17759987#comment-17759987
 ] 

Deepyaman Datta edited comment on FLINK-32758 at 8/29/23 3:04 PM:
--

[~dianfu] I'm happy with the `\!=1.8.0` constraint!


was (Author: deepyaman):
[~dianfu] I'm happy with the `!=1.8.0` constraint!

> PyFlink bounds are overly restrictive and outdated
> --
>
> Key: FLINK-32758
> URL: https://issues.apache.org/jira/browse/FLINK-32758
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python
>Affects Versions: 1.17.1, 1.19.0
>Reporter: Deepyaman Datta
>Assignee: Deepyaman Datta
>Priority: Blocker
>  Labels: pull-request-available, test-stability
> Attachments: image-2023-08-29-10-19-37-977.png
>
>
> Hi! I am part of a team building the Flink backend for Ibis 
> ([https://github.com/ibis-project/ibis]). We would like to leverage PyFlink 
> under the hood for execution; however, PyFlink's requirements are 
> incompatible with several other Ibis requirements. Beyond Ibis, PyFlink's 
> outdated and restrictive requirements prevent it from being used alongside 
> most recent releases of Python data libraries.
> Some of the major libraries we (and likely others in the Python community 
> interested in using PyFlink alongside other libraries) need compatibility 
> with:
>  * PyArrow (at least >=10.0.0, but there's no reason not to also be 
> compatible with the latest)
>  * pandas (should be compatible with the 2.x series, but also probably with 
> 1.4.x, released January 2022, and 1.5.x)
>  * numpy (1.22 was released in December 2021)
>  * Newer releases of Apache Beam
>  * Newer releases of cython
> Furthermore, uncapped dependencies could be more generally preferable, as 
> they avoid the need for frequent PyFlink releases as newer versions of 
> libraries are released. A common (and great) argument for not upper-bounding 
> dependencies, especially for libraries: 
> [https://iscinumpy.dev/post/bound-version-constraints/]
> I am currently testing removing upper bounds in 
> [https://github.com/apache/flink/pull/23141]; so far, builds pass without 
> issue in 
> [b65c072|https://github.com/apache/flink/pull/23141/commits/b65c0723ed66e01e83d718f770aa916f41f34581],
>  and I'm currently waiting on 
> [c8eb15c|https://github.com/apache/flink/pull/23141/commits/c8eb15cbc371dc259fb4fda5395f0f55e08ea9c6]
>  to see if I can get PyArrow to resolve >=10.0.0. Solving the proposed 
> dependencies results in:
> {{#}}
> {{# This file is autogenerated by pip-compile with Python 3.8}}
> {{# by the following command:}}
> {{#}}
> {{#    pip-compile --config=pyproject.toml 
> --output-file=dev/compiled-requirements.txt dev/dev-requirements.txt}}
> {{#}}
> {{apache-beam==2.49.0}}
> {{    # via -r dev/dev-requirements.txt}}
> {{avro-python3==1.10.2}}
> {{    # via -r dev/dev-requirements.txt}}
> {{certifi==2023.7.22}}
> {{    # via requests}}
> {{charset-normalizer==3.2.0}}
> {{    # via requests}}
> {{cloudpickle==2.2.1}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{crcmod==1.7}}
> {{    # via apache-beam}}
> {{cython==3.0.0}}
> {{    # via -r dev/dev-requirements.txt}}
> {{dill==0.3.1.1}}
> {{    # via apache-beam}}
> {{dnspython==2.4.1}}
> {{    # via pymongo}}
> {{docopt==0.6.2}}
> {{    # via hdfs}}
> {{exceptiongroup==1.1.2}}
> {{    # via pytest}}
> {{fastavro==1.8.2}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{fasteners==0.18}}
> {{    # via apache-beam}}
> {{find-libpython==0.3.0}}
> {{    # via pemja}}
> {{grpcio==1.56.2}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{    #   grpcio-tools}}
> {{grpcio-tools==1.56.2}}
> {{    # via -r dev/dev-requirements.txt}}
> {{hdfs==2.7.0}}
> {{    # via apache-beam}}
> {{httplib2==0.22.0}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{idna==3.4}}
> {{    # via requests}}
> {{iniconfig==2.0.0}}
> {{    # via pytest}}
> {{numpy==1.24.4}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{    #   pandas}}
> {{    #   pyarrow}}
> {{objsize==0.6.1}}
> {{    # via apache-beam}}
> {{orjson==3.9.2}}
> {{    # via apache-beam}}
> {{packaging==23.1}}
> {{    # via pytest}}
> {{pandas==2.0.3}}
> {{    # via -r dev/dev-requirements.txt}}
> {{pemja==0.3.0 ; platform_system != "Windows"}}
> {{    # via -r dev/dev-requirements.txt}}
> {{pluggy==1.2.0}}
> {{    # via pytest}}
> {{proto-plus==1.22.3}}
> {{    # via apache-beam}}
> {{protobuf==4.23.4}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{    #   grpcio-tools}}
> {{    #   proto-plus}}
> 

[jira] [Commented] (FLINK-32758) PyFlink bounds are overly restrictive and outdated

2023-08-29 Thread Deepyaman Datta (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17759987#comment-17759987
 ] 

Deepyaman Datta commented on FLINK-32758:
-

[~dianfu] I'm happy with the `!=1.8.0` constraint!
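For readers skimming the thread, a hedged illustration of how such a PEP 440
exclusion behaves (the `>=1.4.7` lower bound is assumed for the example; only
the `!=1.8.0` part comes from this discussion):

```python
# Illustration only: how a PEP 440 "!=" exclusion behaves. Requires the
# third-party "packaging" library (pip install packaging). The >=1.4.7
# lower bound is assumed for the example; only !=1.8.0 is from the thread.
from packaging.specifiers import SpecifierSet

spec = SpecifierSet(">=1.4.7,!=1.8.0")
for version in ["1.7.4", "1.8.0", "1.8.2"]:
    print(version, version in spec)  # 1.8.0 -> False; the others -> True
```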

> PyFlink bounds are overly restrictive and outdated
> --
>
> Key: FLINK-32758
> URL: https://issues.apache.org/jira/browse/FLINK-32758
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python
>Affects Versions: 1.17.1, 1.19.0
>Reporter: Deepyaman Datta
>Assignee: Deepyaman Datta
>Priority: Blocker
>  Labels: pull-request-available, test-stability
> Attachments: image-2023-08-29-10-19-37-977.png
>
>
> Hi! I am part of a team building the Flink backend for Ibis 
> ([https://github.com/ibis-project/ibis]). We would like to leverage PyFlink 
> under the hood for execution; however, PyFlink's requirements are 
> incompatible with several other Ibis requirements. Beyond Ibis, PyFlink's 
> outdated and restrictive requirements prevent it from being used alongside 
> most recent releases of Python data libraries.
> Some of the major libraries we (and likely others in the Python community 
> interested in using PyFlink alongside other libraries) need compatibility 
> with:
>  * PyArrow (at least >=10.0.0, but there's no reason not to also be 
> compatible with the latest)
>  * pandas (should be compatible with the 2.x series, but also probably with 
> 1.4.x, released January 2022, and 1.5.x)
>  * numpy (1.22 was released in December 2021)
>  * Newer releases of Apache Beam
>  * Newer releases of cython
> Furthermore, uncapped dependencies could be more generally preferable, as 
> they avoid the need for frequent PyFlink releases as newer versions of 
> libraries are released. A common (and great) argument for not upper-bounding 
> dependencies, especially for libraries: 
> [https://iscinumpy.dev/post/bound-version-constraints/]
> I am currently testing removing upper bounds in 
> [https://github.com/apache/flink/pull/23141]; so far, builds pass without 
> issue in 
> [b65c072|https://github.com/apache/flink/pull/23141/commits/b65c0723ed66e01e83d718f770aa916f41f34581],
>  and I'm currently waiting on 
> [c8eb15c|https://github.com/apache/flink/pull/23141/commits/c8eb15cbc371dc259fb4fda5395f0f55e08ea9c6]
>  to see if I can get PyArrow to resolve >=10.0.0. Solving the proposed 
> dependencies results in:
> {{#}}
> {{# This file is autogenerated by pip-compile with Python 3.8}}
> {{# by the following command:}}
> {{#}}
> {{#    pip-compile --config=pyproject.toml 
> --output-file=dev/compiled-requirements.txt dev/dev-requirements.txt}}
> {{#}}
> {{apache-beam==2.49.0}}
> {{    # via -r dev/dev-requirements.txt}}
> {{avro-python3==1.10.2}}
> {{    # via -r dev/dev-requirements.txt}}
> {{certifi==2023.7.22}}
> {{    # via requests}}
> {{charset-normalizer==3.2.0}}
> {{    # via requests}}
> {{cloudpickle==2.2.1}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{crcmod==1.7}}
> {{    # via apache-beam}}
> {{cython==3.0.0}}
> {{    # via -r dev/dev-requirements.txt}}
> {{dill==0.3.1.1}}
> {{    # via apache-beam}}
> {{dnspython==2.4.1}}
> {{    # via pymongo}}
> {{docopt==0.6.2}}
> {{    # via hdfs}}
> {{exceptiongroup==1.1.2}}
> {{    # via pytest}}
> {{fastavro==1.8.2}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{fasteners==0.18}}
> {{    # via apache-beam}}
> {{find-libpython==0.3.0}}
> {{    # via pemja}}
> {{grpcio==1.56.2}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{    #   grpcio-tools}}
> {{grpcio-tools==1.56.2}}
> {{    # via -r dev/dev-requirements.txt}}
> {{hdfs==2.7.0}}
> {{    # via apache-beam}}
> {{httplib2==0.22.0}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{idna==3.4}}
> {{    # via requests}}
> {{iniconfig==2.0.0}}
> {{    # via pytest}}
> {{numpy==1.24.4}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{    #   pandas}}
> {{    #   pyarrow}}
> {{objsize==0.6.1}}
> {{    # via apache-beam}}
> {{orjson==3.9.2}}
> {{    # via apache-beam}}
> {{packaging==23.1}}
> {{    # via pytest}}
> {{pandas==2.0.3}}
> {{    # via -r dev/dev-requirements.txt}}
> {{pemja==0.3.0 ; platform_system != "Windows"}}
> {{    # via -r dev/dev-requirements.txt}}
> {{pluggy==1.2.0}}
> {{    # via pytest}}
> {{proto-plus==1.22.3}}
> {{    # via apache-beam}}
> {{protobuf==4.23.4}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{    #   grpcio-tools}}
> {{    #   proto-plus}}
> {{py4j==0.10.9.7}}
> {{    # via -r dev/dev-requirements.txt}}
> {{pyarrow==11.0.0}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}

[jira] [Commented] (FLINK-32758) PyFlink bounds are overly restrictive and outdated

2023-08-28 Thread Deepyaman Datta (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17759585#comment-17759585
 ] 

Deepyaman Datta commented on FLINK-32758:
-

[~Sergey Nuyanzin] This looks to be related to 
[https://github.com/fastavro/fastavro/issues/701]; while we pin `cython<3` for 
PyFlink, `fastavro` is getting built separately with Cython 3. One possible 
solution is to do something like 
[https://stackoverflow.com/a/76837035/1093967], where `cython<3` is installed 
globally in the environment and used for building all of the libraries (I 
think). I'm not sure how you all feel about that, but I can try to raise a PR 
for that, if helpful. It seems the failing test is in the nightly build, which 
runs a lot more checks; I'm not sure how I could verify that a potential fix 
works. Can I trigger these tests manually?

The other possibility is to check why `fastavro>=1.8.1` isn't getting picked 
and `fastavro==1.8.0` is being used instead. The newer versions have the 
Cython pin in their build requirements, so we wouldn't need to do a `pip wheel 
--no-build-isolation`. I can try to check this later today.
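A rough sketch of the first workaround (an assumed workflow, not the project's
actual build script): pin Cython below 3 in the environment itself, then build
the affected wheel without build isolation so the pre-installed Cython is used:

```python
# Sketch of the "global cython<3 + --no-build-isolation" workaround
# described above; the package list and output directory are illustrative.
import subprocess
import sys

def pip(*args):
    subprocess.check_call([sys.executable, "-m", "pip", *args])

pip("install", "cython<3")  # pin Cython in the (global) build environment
# Build fastavro with the already-installed Cython instead of letting pip
# create an isolated build env that would pull in Cython 3.
pip("wheel", "--no-build-isolation", "fastavro==1.8.0", "-w", "wheelhouse")
```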

> PyFlink bounds are overly restrictive and outdated
> --
>
> Key: FLINK-32758
> URL: https://issues.apache.org/jira/browse/FLINK-32758
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python
>Affects Versions: 1.17.1, 1.19.0
>Reporter: Deepyaman Datta
>Assignee: Deepyaman Datta
>Priority: Blocker
>  Labels: pull-request-available, test-stability
>
> Hi! I am part of a team building the Flink backend for Ibis 
> ([https://github.com/ibis-project/ibis]). We would like to leverage PyFlink 
> under the hood for execution; however, PyFlink's requirements are 
> incompatible with several other Ibis requirements. Beyond Ibis, PyFlink's 
> outdated and restrictive requirements prevent it from being used alongside 
> most recent releases of Python data libraries.
> Some of the major libraries we (and likely others in the Python community 
> interested in using PyFlink alongside other libraries) need compatibility 
> with:
>  * PyArrow (at least >=10.0.0, but there's no reason not to also be 
> compatible with the latest)
>  * pandas (should be compatible with the 2.x series, but also probably with 
> 1.4.x, released January 2022, and 1.5.x)
>  * numpy (1.22 was released in December 2021)
>  * Newer releases of Apache Beam
>  * Newer releases of cython
> Furthermore, uncapped dependencies could be more generally preferable, as 
> they avoid the need for frequent PyFlink releases as newer versions of 
> libraries are released. A common (and great) argument for not upper-bounding 
> dependencies, especially for libraries: 
> [https://iscinumpy.dev/post/bound-version-constraints/]
> I am currently testing removing upper bounds in 
> [https://github.com/apache/flink/pull/23141]; so far, builds pass without 
> issue in 
> [b65c072|https://github.com/apache/flink/pull/23141/commits/b65c0723ed66e01e83d718f770aa916f41f34581],
>  and I'm currently waiting on 
> [c8eb15c|https://github.com/apache/flink/pull/23141/commits/c8eb15cbc371dc259fb4fda5395f0f55e08ea9c6]
>  to see if I can get PyArrow to resolve >=10.0.0. Solving the proposed 
> dependencies results in:
> {{#}}
> {{# This file is autogenerated by pip-compile with Python 3.8}}
> {{# by the following command:}}
> {{#}}
> {{#    pip-compile --config=pyproject.toml 
> --output-file=dev/compiled-requirements.txt dev/dev-requirements.txt}}
> {{#}}
> {{apache-beam==2.49.0}}
> {{    # via -r dev/dev-requirements.txt}}
> {{avro-python3==1.10.2}}
> {{    # via -r dev/dev-requirements.txt}}
> {{certifi==2023.7.22}}
> {{    # via requests}}
> {{charset-normalizer==3.2.0}}
> {{    # via requests}}
> {{cloudpickle==2.2.1}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{crcmod==1.7}}
> {{    # via apache-beam}}
> {{cython==3.0.0}}
> {{    # via -r dev/dev-requirements.txt}}
> {{dill==0.3.1.1}}
> {{    # via apache-beam}}
> {{dnspython==2.4.1}}
> {{    # via pymongo}}
> {{docopt==0.6.2}}
> {{    # via hdfs}}
> {{exceptiongroup==1.1.2}}
> {{    # via pytest}}
> {{fastavro==1.8.2}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{fasteners==0.18}}
> {{    # via apache-beam}}
> {{find-libpython==0.3.0}}
> {{    # via pemja}}
> {{grpcio==1.56.2}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{    #   grpcio-tools}}
> {{grpcio-tools==1.56.2}}
> {{    # via -r dev/dev-requirements.txt}}
> {{hdfs==2.7.0}}
> {{    # via apache-beam}}
> {{httplib2==0.22.0}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{idna==3.4}}
> {{    # via requests}}
> {{iniconfig==2.0.0}}
> {{    # via 

[jira] [Created] (FLINK-32758) PyFlink bounds are overly restrictive and outdated

2023-08-04 Thread Deepyaman Datta (Jira)
Deepyaman Datta created FLINK-32758:
---

 Summary: PyFlink bounds are overly restrictive and outdated
 Key: FLINK-32758
 URL: https://issues.apache.org/jira/browse/FLINK-32758
 Project: Flink
  Issue Type: Improvement
  Components: API / Python
Affects Versions: 1.17.1
Reporter: Deepyaman Datta


Hi! I am part of a team building the Flink backend for Ibis 
([https://github.com/ibis-project/ibis]). We would like to leverage PyFlink 
under the hood for execution; however, PyFlink's requirements are incompatible 
with several other Ibis requirements. Beyond Ibis, PyFlink's outdated and 
restrictive requirements prevent it from being used alongside most recent 
releases of Python data libraries.

Some of the major libraries we (and likely others in the Python community 
interested in using PyFlink alongside other libraries) need compatibility with:
 * PyArrow (at least >=10.0.0, but there's no reason not to also be 
compatible with the latest)
 * pandas (should be compatible with the 2.x series, but also probably with 
1.4.x, released January 2022, and 1.5.x)
 * numpy (1.22 was released in December 2021)
 * Newer releases of Apache Beam
 * Newer releases of cython

Furthermore, uncapped dependencies could be more generally preferable, as they 
avoid the need for frequent PyFlink releases as newer versions of libraries are 
released. A common (and great) argument for not upper-bounding dependencies, 
especially for libraries: 
[https://iscinumpy.dev/post/bound-version-constraints/]

I am currently testing removing upper bounds in 
[https://github.com/apache/flink/pull/23141]; so far, builds pass without issue 
in 
[b65c072|https://github.com/apache/flink/pull/23141/commits/b65c0723ed66e01e83d718f770aa916f41f34581],
 and I'm currently waiting on 
[c8eb15c|https://github.com/apache/flink/pull/23141/commits/c8eb15cbc371dc259fb4fda5395f0f55e08ea9c6]
 to see if I can get PyArrow to resolve >=10.0.0. Solving the proposed 
dependencies results in:



{{#}}
{{# This file is autogenerated by pip-compile with Python 3.8}}
{{# by the following command:}}
{{#}}
{{#    pip-compile --config=pyproject.toml 
--output-file=dev/compiled-requirements.txt dev/dev-requirements.txt}}
{{#}}
{{apache-beam==2.49.0}}
{{    # via -r dev/dev-requirements.txt}}
{{avro-python3==1.10.2}}
{{    # via -r dev/dev-requirements.txt}}
{{certifi==2023.7.22}}
{{    # via requests}}
{{charset-normalizer==3.2.0}}
{{    # via requests}}
{{cloudpickle==2.2.1}}
{{    # via}}
{{    #   -r dev/dev-requirements.txt}}
{{    #   apache-beam}}
{{crcmod==1.7}}
{{    # via apache-beam}}
{{cython==3.0.0}}
{{    # via -r dev/dev-requirements.txt}}
{{dill==0.3.1.1}}
{{    # via apache-beam}}
{{dnspython==2.4.1}}
{{    # via pymongo}}
{{docopt==0.6.2}}
{{    # via hdfs}}
{{exceptiongroup==1.1.2}}
{{    # via pytest}}
{{fastavro==1.8.2}}
{{    # via}}
{{    #   -r dev/dev-requirements.txt}}
{{    #   apache-beam}}
{{fasteners==0.18}}
{{    # via apache-beam}}
{{find-libpython==0.3.0}}
{{    # via pemja}}
{{grpcio==1.56.2}}
{{    # via}}
{{    #   -r dev/dev-requirements.txt}}
{{    #   apache-beam}}
{{    #   grpcio-tools}}
{{grpcio-tools==1.56.2}}
{{    # via -r dev/dev-requirements.txt}}
{{hdfs==2.7.0}}
{{    # via apache-beam}}
{{httplib2==0.22.0}}
{{    # via}}
{{    #   -r dev/dev-requirements.txt}}
{{    #   apache-beam}}
{{idna==3.4}}
{{    # via requests}}
{{iniconfig==2.0.0}}
{{    # via pytest}}
{{numpy==1.24.4}}
{{    # via}}
{{    #   -r dev/dev-requirements.txt}}
{{    #   apache-beam}}
{{    #   pandas}}
{{    #   pyarrow}}
{{objsize==0.6.1}}
{{    # via apache-beam}}
{{orjson==3.9.2}}
{{    # via apache-beam}}
{{packaging==23.1}}
{{    # via pytest}}
{{pandas==2.0.3}}
{{    # via -r dev/dev-requirements.txt}}
{{pemja==0.3.0 ; platform_system != "Windows"}}
{{    # via -r dev/dev-requirements.txt}}
{{pluggy==1.2.0}}
{{    # via pytest}}
{{proto-plus==1.22.3}}
{{    # via apache-beam}}
{{protobuf==4.23.4}}
{{    # via}}
{{    #   -r dev/dev-requirements.txt}}
{{    #   apache-beam}}
{{    #   grpcio-tools}}
{{    #   proto-plus}}
{{py4j==0.10.9.7}}
{{    # via -r dev/dev-requirements.txt}}
{{pyarrow==11.0.0}}
{{    # via}}
{{    #   -r dev/dev-requirements.txt}}
{{    #   apache-beam}}
{{pydot==1.4.2}}
{{    # via apache-beam}}
{{pymongo==4.4.1}}
{{    # via apache-beam}}
{{pyparsing==3.1.1}}
{{    # via}}
{{    #   httplib2}}
{{    #   pydot}}
{{pytest==7.4.0}}
{{    # via -r dev/dev-requirements.txt}}
{{python-dateutil==2.8.2}}
{{    # via}}
{{    #   -r dev/dev-requirements.txt}}
{{    #   apache-beam}}
{{    #   pandas}}
{{pytz==2023.3}}
{{    # via}}
{{    #   -r dev/dev-requirements.txt}}
{{    #   apache-beam}}
{{    #   pandas}}
{{regex==2023.6.3}}
{{    # via apache-beam}}
{{requests==2.31.0}}
{{    # via}}
{{    #   apache-beam}}
{{    #   hdfs}}
{{six==1.16.0}}
{{    # via}}
{{    #   hdfs}}
{{    #   python-dateutil}}

[jira] [Commented] (FLINK-23159) Correlated sql subquery on the source created via fromValues() failed to compile

2023-03-17 Thread Deepyaman Datta (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-23159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702027#comment-17702027
 ] 

Deepyaman Datta commented on FLINK-23159:
-

Hello! I believe I'm affected by this issue, as shared on the Apache Flink 
Slack ([https://apache-flink.slack.com/archives/C03G7LJTS2G/p1679089054243999]).

If I'm understanding correctly, would 
[https://github.com/apache/flink/blob/release-1.17.0-rc3/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/plan/rules/logical/SubQueryDecorrelator.java#L893-L901]
 need to be filled out (to not return `null`)? Is this something that would be 
reasonable for a new contributor to tackle? I'm unfortunately not at all 
familiar with the Flink codebase, but I can give it a shot, if reasonable.

Alternatively, are there any suggested workarounds to this issue?

> Correlated sql subquery on the source created via fromValues() failed to 
> compile
> 
>
> Key: FLINK-23159
> URL: https://issues.apache.org/jira/browse/FLINK-23159
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.13.0
>Reporter: Yun Gao
>Priority: Major
>
> Correlated subquery like 
> {code:java}
> import org.apache.flink.table.api.DataTypes;
> import org.apache.flink.table.api.EnvironmentSettings;
> import org.apache.flink.table.api.Table;
> import org.apache.flink.table.api.TableEnvironment;
> import org.apache.flink.table.types.DataType;
> import org.apache.flink.types.Row;
> import java.util.ArrayList;
> import java.util.List;
> public class SQLQueryTest {
>   public static void main(String[] args) {
> EnvironmentSettings settings = 
> EnvironmentSettings.newInstance().useBlinkPlanner().inBatchMode()
>   .build();
> TableEnvironment tableEnvironment = TableEnvironment.create(settings);
> DataType row = DataTypes.ROW(
>   DataTypes.FIELD("flag", DataTypes.STRING()),
>   DataTypes.FIELD("id", DataTypes.INT()),
>   DataTypes.FIELD("name", DataTypes.STRING())
> );
> Table table = tableEnvironment.fromValues(row, new 
> MyListSource("table1").builder());
> tableEnvironment.createTemporaryView("table1", table);
> table = tableEnvironment.fromValues(row, new 
> MyListSource("table2").builder());
> tableEnvironment.createTemporaryView("table2", table);
> String sql = "select t1.flag from table1 t1 where t1.name in (select 
> t2.name from table2 t2 where t2.id = t1.id)";
> tableEnvironment.explainSql(sql);
>   }
>   public static class MyListSource {
> private String flag;
> public MyListSource(String flag) {
>   this.flag = flag;
> }
> public List<Row> builder() {
>   List<Row> rows = new ArrayList<>();
>   for (int i = 2; i < 3; i++) {
> Row row = new Row(3);
> row.setField(0, flag);
> row.setField(1, i);
> row.setField(2, "me");
> rows.add(row);
>   }
>   return rows;
> }
>   }
> }
> {code}
> would throws
> {code:java}
> Exception in thread "main" org.apache.flink.table.api.TableException: 
> unexpected correlate variable $cor0 in the plan
>   at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkDecorrelateProgram.checkCorrelVariableExists(FlinkDecorrelateProgram.scala:57)
>   at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkDecorrelateProgram.optimize(FlinkDecorrelateProgram.scala:42)
>   at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram$$anonfun$optimize$1$$anonfun$apply$1.apply(FlinkGroupProgram.scala:63)
>   at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram$$anonfun$optimize$1$$anonfun$apply$1.apply(FlinkGroupProgram.scala:60)
>   at 
> scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
>   at 
> scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at 
> scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
>   at scala.collection.AbstractTraversable.foldLeft(Traversable.scala:104)
>   at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram$$anonfun$optimize$1.apply(FlinkGroupProgram.scala:60)
>   at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram$$anonfun$optimize$1.apply(FlinkGroupProgram.scala:55)
>   at 
> scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
>   at 
>