[jira] [Updated] (FLINK-33225) Python API incorrectly passes `JVM_ARGS` as single argument
[ https://issues.apache.org/jira/browse/FLINK-33225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deepyaman Datta updated FLINK-33225:

Description:
In the same vein as https://issues.apache.org/jira/browse/FLINK-31915, `JVM_ARGS` needs to be passed as an array of arguments rather than a single string. For example, the current behavior of `export JVM_ARGS='-XX:CompressedClassSpaceSize=100M -XX:MaxMetaspaceSize=200M'` is:

{{> raise RuntimeError(}}
{{      "Java gateway process exited before sending its port number.\nStderr:\n"}}
{{      + stderr_info}}
{{  )}}
{{E RuntimeError: Java gateway process exited before sending its port number.}}
{{E Stderr:}}
{{E Improperly specified VM option 'CompressedClassSpaceSize=100M -XX:MaxMetaspaceSize=200M'}}
{{E Error: Could not create the Java Virtual Machine.}}
{{E Error: A fatal exception has occurred. Program will exit.}}

> Python API incorrectly passes `JVM_ARGS` as single argument
>
> Key: FLINK-33225
> URL: https://issues.apache.org/jira/browse/FLINK-33225
> Project: Flink
> Issue Type: Bug
> Affects Versions: 1.18.0, 1.17.1, 1.18.1
> Reporter: Deepyaman Datta
> Priority: Major
> Labels: github-pullrequest

-- This message was sent by Atlassian Jira (v8.20.10#820010)
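The fix the issue calls for (splitting `JVM_ARGS` into separate arguments before launching the gateway JVM) can be sketched as follows. Note that `build_java_command` is a hypothetical stand-in for PyFlink's gateway-launch logic, not the actual implementation:

```python
import os
import shlex


def build_java_command(java_executable="java"):
    # Hypothetical stand-in for the gateway launch logic: instead of
    # appending os.environ["JVM_ARGS"] as one opaque argument (which
    # makes the JVM see a single malformed option, as in the error
    # above), split it into a list of arguments first.
    command = [java_executable]
    jvm_args = os.environ.get("JVM_ARGS", "")
    # shlex.split honors shell-style quoting, so quoted values that
    # contain spaces still survive as single arguments.
    command.extend(shlex.split(jvm_args))
    return command


os.environ["JVM_ARGS"] = (
    "-XX:CompressedClassSpaceSize=100M -XX:MaxMetaspaceSize=200M"
)
print(build_java_command())
# → ['java', '-XX:CompressedClassSpaceSize=100M', '-XX:MaxMetaspaceSize=200M']
```

Each `-XX:` option then reaches the JVM as its own argv entry, which is what the "Improperly specified VM option" error shows was not happening.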
[jira] [Updated] (FLINK-33225) Python API incorrectly passes `JVM_ARGS` as single argument
[ https://issues.apache.org/jira/browse/FLINK-33225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deepyaman Datta updated FLINK-33225:

Affects Version/s: 1.17.1, 1.18.0, 1.18.1
[jira] [Updated] (FLINK-33225) Python API incorrectly passes `JVM_ARGS` as single argument
[ https://issues.apache.org/jira/browse/FLINK-33225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deepyaman Datta updated FLINK-33225:

Labels: github-pullrequest (was: )
[jira] [Created] (FLINK-33225) Python API incorrectly passes `JVM_ARGS` as single argument
Deepyaman Datta created FLINK-33225:

Summary: Python API incorrectly passes `JVM_ARGS` as single argument
Key: FLINK-33225
URL: https://issues.apache.org/jira/browse/FLINK-33225
Project: Flink
Issue Type: Bug
Reporter: Deepyaman Datta
[jira] [Comment Edited] (FLINK-32758) PyFlink bounds are overly restrictive and outdated
[ https://issues.apache.org/jira/browse/FLINK-32758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17759987#comment-17759987 ] Deepyaman Datta edited comment on FLINK-32758 at 8/29/23 3:04 PM:

[~dianfu] I'm happy with the `\!=1.8.0` constraint.

was (Author: deepyaman): [~dianfu] I'm happy with the `!=1.8.0` constraint!

> PyFlink bounds are overly restrictive and outdated
>
> Key: FLINK-32758
> URL: https://issues.apache.org/jira/browse/FLINK-32758
> Project: Flink
> Issue Type: Improvement
> Components: API / Python
> Affects Versions: 1.17.1, 1.19.0
> Reporter: Deepyaman Datta
> Assignee: Deepyaman Datta
> Priority: Blocker
> Labels: pull-request-available, test-stability
> Attachments: image-2023-08-29-10-19-37-977.png
>
> Hi! I am part of a team building the Flink backend for Ibis
> ([https://github.com/ibis-project/ibis]). We would like to leverage PyFlink
> under the hood for execution; however, PyFlink's requirements are
> incompatible with several other Ibis requirements. Beyond Ibis, PyFlink's
> outdated and restrictive requirements prevent it from being used alongside
> most recent releases of Python data libraries.
>
> Some of the major libraries we (and likely others in the Python community
> interested in using PyFlink alongside other libraries) need compatibility with:
> * PyArrow (at least >=10.0.0, but there's no reason not to also be compatible with the latest)
> * pandas (should be compatible with the 2.x series, but probably also with 1.4.x, released January 2022, and 1.5.x)
> * numpy (1.22 was released in December 2021)
> * Newer releases of Apache Beam
> * Newer releases of cython
>
> Furthermore, uncapped dependencies would be preferable more generally, as
> they avoid the need for frequent PyFlink releases as newer versions of
> libraries are released.
A common (and great) argument for not upper-bounding > dependencies, especially for libraries: > [https://iscinumpy.dev/post/bound-version-constraints/] > I am currently testing removing upper bounds in > [https://github.com/apache/flink/pull/23141]; so far, builds pass without > issue in > [b65c072|https://github.com/apache/flink/pull/23141/commits/b65c0723ed66e01e83d718f770aa916f41f34581], > and I'm currently waiting on > [c8eb15c|https://github.com/apache/flink/pull/23141/commits/c8eb15cbc371dc259fb4fda5395f0f55e08ea9c6] > to see if I can get PyArrow to resolve >=10.0.0. Solving the proposed > dependencies results in: > {{#}} > {{# This file is autogenerated by pip-compile with Python 3.8}} > {{# by the following command:}} > {{#}} > {{# pip-compile --config=pyproject.toml > --output-file=dev/compiled-requirements.txt dev/dev-requirements.txt}} > {{#}} > {{apache-beam==2.49.0}} > {{ # via -r dev/dev-requirements.txt}} > {{avro-python3==1.10.2}} > {{ # via -r dev/dev-requirements.txt}} > {{certifi==2023.7.22}} > {{ # via requests}} > {{charset-normalizer==3.2.0}} > {{ # via requests}} > {{cloudpickle==2.2.1}} > {{ # via}} > {{ # -r dev/dev-requirements.txt}} > {{ # apache-beam}} > {{crcmod==1.7}} > {{ # via apache-beam}} > {{cython==3.0.0}} > {{ # via -r dev/dev-requirements.txt}} > {{dill==0.3.1.1}} > {{ # via apache-beam}} > {{dnspython==2.4.1}} > {{ # via pymongo}} > {{docopt==0.6.2}} > {{ # via hdfs}} > {{exceptiongroup==1.1.2}} > {{ # via pytest}} > {{fastavro==1.8.2}} > {{ # via}} > {{ # -r dev/dev-requirements.txt}} > {{ # apache-beam}} > {{fasteners==0.18}} > {{ # via apache-beam}} > {{find-libpython==0.3.0}} > {{ # via pemja}} > {{grpcio==1.56.2}} > {{ # via}} > {{ # -r dev/dev-requirements.txt}} > {{ # apache-beam}} > {{ # grpcio-tools}} > {{grpcio-tools==1.56.2}} > {{ # via -r dev/dev-requirements.txt}} > {{hdfs==2.7.0}} > {{ # via apache-beam}} > {{httplib2==0.22.0}} > {{ # via}} > {{ # -r dev/dev-requirements.txt}} > {{ # apache-beam}} > 
{{idna==3.4}} > {{ # via requests}} > {{iniconfig==2.0.0}} > {{ # via pytest}} > {{numpy==1.24.4}} > {{ # via}} > {{ # -r dev/dev-requirements.txt}} > {{ # apache-beam}} > {{ # pandas}} > {{ # pyarrow}} > {{objsize==0.6.1}} > {{ # via apache-beam}} > {{orjson==3.9.2}} > {{ # via apache-beam}} > {{packaging==23.1}} > {{ # via pytest}} > {{pandas==2.0.3}} > {{ # via -r dev/dev-requirements.txt}} > {{pemja==0.3.0 ; platform_system != "Windows"}} > {{ # via -r dev/dev-requirements.txt}} > {{pluggy==1.2.0}} > {{ # via pytest}} > {{proto-plus==1.22.3}} > {{ # via apache-beam}} > {{protobuf==4.23.4}} > {{ # via}} > {{ # -r dev/dev-requirements.txt}} > {{ # apache-beam}} > {{ # grpcio-tools}} > {{ # proto-plus}} >
[jira] [Commented] (FLINK-32758) PyFlink bounds are overly restrictive and outdated
[ https://issues.apache.org/jira/browse/FLINK-32758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17759987#comment-17759987 ] Deepyaman Datta commented on FLINK-32758:

[~dianfu] I'm happy with the `!=1.8.0` constraint!
[jira] [Commented] (FLINK-32758) PyFlink bounds are overly restrictive and outdated
[ https://issues.apache.org/jira/browse/FLINK-32758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17759585#comment-17759585 ] Deepyaman Datta commented on FLINK-32758:

[~Sergey Nuyanzin] This looks to be related to [https://github.com/fastavro/fastavro/issues/701]; while we pin `cython<3` for PyFlink, `fastavro` is being built separately with Cython 3. One possible solution is to do something like [https://stackoverflow.com/a/76837035/1093967], where `cython<3` is installed globally in the environment and used for building all of the libraries (I think). I'm not sure how you all feel about that, but I can try to raise a PR with that, if helpful. It seems the failing test is part of a nightly build that runs many more checks; how can I verify that a potential fix would work? Can I trigger these tests manually?

The other possibility is to check why `fastavro>=1.8.1` isn't getting picked up and `fastavro==1.8.0` is being used instead. The newer versions have the Cython pin in their build requirements, so we wouldn't need to do a `pip wheel --no-build-isolation`. I can try to check this later today.
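As the comment above notes, newer fastavro releases carry the Cython pin in their build requirements, so builds with PEP 517 build isolation pick up a compatible Cython automatically. As an illustration only (this is not fastavro's actual file), such a PEP 518 build pin in `pyproject.toml` looks like:

```toml
# Illustrative pyproject.toml build-system table (hypothetical, not
# fastavro's actual file): with build isolation enabled, pip installs
# this pinned Cython into the isolated build environment instead of
# using whatever Cython happens to be installed globally.
[build-system]
requires = ["setuptools", "wheel", "Cython<3"]
build-backend = "setuptools.build_meta"
```

With a pin like this in place, the `pip wheel --no-build-isolation` workaround mentioned above becomes unnecessary.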
[jira] [Created] (FLINK-32758) PyFlink bounds are overly restrictive and outdated
Deepyaman Datta created FLINK-32758:

Summary: PyFlink bounds are overly restrictive and outdated
Key: FLINK-32758
URL: https://issues.apache.org/jira/browse/FLINK-32758
Project: Flink
Issue Type: Improvement
Components: API / Python
Affects Versions: 1.17.1
Reporter: Deepyaman Datta

Hi! I am part of a team building the Flink backend for Ibis ([https://github.com/ibis-project/ibis]). We would like to leverage PyFlink under the hood for execution; however, PyFlink's requirements are incompatible with several other Ibis requirements. Beyond Ibis, PyFlink's outdated and restrictive requirements prevent it from being used alongside most recent releases of Python data libraries.

Some of the major libraries we (and likely others in the Python community interested in using PyFlink alongside other libraries) need compatibility with:
* PyArrow (at least >=10.0.0, but there's no reason not to also be compatible with the latest)
* pandas (should be compatible with the 2.x series, but probably also with 1.4.x, released January 2022, and 1.5.x)
* numpy (1.22 was released in December 2021)
* Newer releases of Apache Beam
* Newer releases of cython

Furthermore, uncapped dependencies would be preferable more generally, as they avoid the need for frequent PyFlink releases as newer versions of libraries are released. A common (and great) argument for not upper-bounding dependencies, especially for libraries: [https://iscinumpy.dev/post/bound-version-constraints/]

I am currently testing removing upper bounds in [https://github.com/apache/flink/pull/23141]; so far, builds pass without issue in [b65c072|https://github.com/apache/flink/pull/23141/commits/b65c0723ed66e01e83d718f770aa916f41f34581], and I'm currently waiting on [c8eb15c|https://github.com/apache/flink/pull/23141/commits/c8eb15cbc371dc259fb4fda5395f0f55e08ea9c6] to see if I can get PyArrow to resolve >=10.0.0.
Solving the proposed dependencies results in:
{code}
#
# This file is autogenerated by pip-compile with Python 3.8
# by the following command:
#
#    pip-compile --config=pyproject.toml --output-file=dev/compiled-requirements.txt dev/dev-requirements.txt
#
apache-beam==2.49.0
    # via -r dev/dev-requirements.txt
avro-python3==1.10.2
    # via -r dev/dev-requirements.txt
certifi==2023.7.22
    # via requests
charset-normalizer==3.2.0
    # via requests
cloudpickle==2.2.1
    # via
    #   -r dev/dev-requirements.txt
    #   apache-beam
crcmod==1.7
    # via apache-beam
cython==3.0.0
    # via -r dev/dev-requirements.txt
dill==0.3.1.1
    # via apache-beam
dnspython==2.4.1
    # via pymongo
docopt==0.6.2
    # via hdfs
exceptiongroup==1.1.2
    # via pytest
fastavro==1.8.2
    # via
    #   -r dev/dev-requirements.txt
    #   apache-beam
fasteners==0.18
    # via apache-beam
find-libpython==0.3.0
    # via pemja
grpcio==1.56.2
    # via
    #   -r dev/dev-requirements.txt
    #   apache-beam
    #   grpcio-tools
grpcio-tools==1.56.2
    # via -r dev/dev-requirements.txt
hdfs==2.7.0
    # via apache-beam
httplib2==0.22.0
    # via
    #   -r dev/dev-requirements.txt
    #   apache-beam
idna==3.4
    # via requests
iniconfig==2.0.0
    # via pytest
numpy==1.24.4
    # via
    #   -r dev/dev-requirements.txt
    #   apache-beam
    #   pandas
    #   pyarrow
objsize==0.6.1
    # via apache-beam
orjson==3.9.2
    # via apache-beam
packaging==23.1
    # via pytest
pandas==2.0.3
    # via -r dev/dev-requirements.txt
pemja==0.3.0 ; platform_system != "Windows"
    # via -r dev/dev-requirements.txt
pluggy==1.2.0
    # via pytest
proto-plus==1.22.3
    # via apache-beam
protobuf==4.23.4
    # via
    #   -r dev/dev-requirements.txt
    #   apache-beam
    #   grpcio-tools
    #   proto-plus
py4j==0.10.9.7
    # via -r dev/dev-requirements.txt
pyarrow==11.0.0
    # via
    #   -r dev/dev-requirements.txt
    #   apache-beam
pydot==1.4.2
    # via apache-beam
pymongo==4.4.1
    # via apache-beam
pyparsing==3.1.1
    # via
    #   httplib2
    #   pydot
pytest==7.4.0
    # via -r dev/dev-requirements.txt
python-dateutil==2.8.2
    # via
    #   -r dev/dev-requirements.txt
    #   apache-beam
    #   pandas
pytz==2023.3
    # via
    #   -r dev/dev-requirements.txt
    #   apache-beam
    #   pandas
regex==2023.6.3
    # via apache-beam
requests==2.31.0
    # via
    #   apache-beam
    #   hdfs
six==1.16.0
    # via
    #   hdfs
    #   python-dateutil
{code}
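As a hedged aside on the lower-bound-only approach discussed above (the helper names below are illustrative, not part of PyFlink or the PR): with upper caps removed, the only check that matters is whether each resolved version meets its floor. A minimal stdlib-only sketch, using the bounds from the description and the versions pip-compile resolved:

```python
# Hedged sketch: check that resolved versions satisfy lower-bound-only
# constraints. parse_version/satisfies_lower_bound are hypothetical helpers
# for illustration; real tooling would use packaging.specifiers instead.

def parse_version(v: str) -> tuple:
    """Turn '11.0.0' into (11, 0, 0) for tuple comparison."""
    return tuple(int(part) for part in v.split("."))

def satisfies_lower_bound(resolved: str, minimum: str) -> bool:
    """True if resolved >= minimum; no upper cap is applied."""
    return parse_version(resolved) >= parse_version(minimum)

# Floors proposed in the description (no caps):
lower_bounds = {"pyarrow": "10.0.0", "pandas": "1.4.0", "numpy": "1.22.0"}

# Versions resolved by pip-compile in the output above:
resolved = {"pyarrow": "11.0.0", "pandas": "2.0.3", "numpy": "1.24.4"}

for name, minimum in lower_bounds.items():
    ok = satisfies_lower_bound(resolved[name], minimum)
    print(f"{name}: {resolved[name]} >= {minimum} -> {ok}")
```

Note that plain tuple comparison only covers simple `X.Y.Z` versions; pre-releases and post-releases need the full PEP 440 rules.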
[jira] [Commented] (FLINK-23159) Correlated sql subquery on the source created via fromValues() failed to compile
[ https://issues.apache.org/jira/browse/FLINK-23159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702027#comment-17702027 ]

Deepyaman Datta commented on FLINK-23159:
-

Hello! I believe I'm affected by this issue, as shared on the Apache Flink Slack ([https://apache-flink.slack.com/archives/C03G7LJTS2G/p1679089054243999]). If I'm understanding correctly, would [https://github.com/apache/flink/blob/release-1.17.0-rc3/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/plan/rules/logical/SubQueryDecorrelator.java#L893-L901] need to be filled out (to not return `null`)? Is this something that would be reasonable for a new contributor to tackle? I'm unfortunately not at all familiar with the Flink codebase, but I can give it a shot, if that seems reasonable. Alternatively, are there any suggested workarounds for this issue?

> Correlated sql subquery on the source created via fromValues() failed to compile
> --------
>
> Key: FLINK-23159
> URL: https://issues.apache.org/jira/browse/FLINK-23159
> Project: Flink
> Issue Type: Bug
> Components: Table SQL / Planner
> Affects Versions: 1.13.0
> Reporter: Yun Gao
> Priority: Major
>
> Correlated subquery like
> {code:java}
> import org.apache.flink.table.api.DataTypes;
> import org.apache.flink.table.api.EnvironmentSettings;
> import org.apache.flink.table.api.Table;
> import org.apache.flink.table.api.TableEnvironment;
> import org.apache.flink.table.types.DataType;
> import org.apache.flink.types.Row;
>
> import java.util.ArrayList;
> import java.util.List;
>
> public class SQLQueryTest {
>     public static void main(String[] args) {
>         EnvironmentSettings settings =
>             EnvironmentSettings.newInstance().useBlinkPlanner().inBatchMode().build();
>         TableEnvironment tableEnvironment = TableEnvironment.create(settings);
>         DataType row = DataTypes.ROW(
>             DataTypes.FIELD("flag", DataTypes.STRING()),
>             DataTypes.FIELD("id", DataTypes.INT()),
>             DataTypes.FIELD("name", DataTypes.STRING())
>         );
>         Table table = tableEnvironment.fromValues(row, new MyListSource("table1").builder());
>         tableEnvironment.createTemporaryView("table1", table);
>         table = tableEnvironment.fromValues(row, new MyListSource("table2").builder());
>         tableEnvironment.createTemporaryView("table2", table);
>         String sql = "select t1.flag from table1 t1 where t1.name in (select t2.name from table2 t2 where t2.id = t1.id)";
>         tableEnvironment.explainSql(sql);
>     }
>
>     public static class MyListSource {
>         private String flag;
>
>         public MyListSource(String flag) {
>             this.flag = flag;
>         }
>
>         public List<Row> builder() {
>             List<Row> rows = new ArrayList<>();
>             for (int i = 2; i < 3; i++) {
>                 Row row = new Row(3);
>                 row.setField(0, flag);
>                 row.setField(1, i);
>                 row.setField(2, "me");
>                 rows.add(row);
>             }
>             return rows;
>         }
>     }
> }
> {code}
> would throw
> {code:java}
> Exception in thread "main" org.apache.flink.table.api.TableException: unexpected correlate variable $cor0 in the plan
>     at org.apache.flink.table.planner.plan.optimize.program.FlinkDecorrelateProgram.checkCorrelVariableExists(FlinkDecorrelateProgram.scala:57)
>     at org.apache.flink.table.planner.plan.optimize.program.FlinkDecorrelateProgram.optimize(FlinkDecorrelateProgram.scala:42)
>     at org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram$$anonfun$optimize$1$$anonfun$apply$1.apply(FlinkGroupProgram.scala:63)
>     at org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram$$anonfun$optimize$1$$anonfun$apply$1.apply(FlinkGroupProgram.scala:60)
>     at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
>     at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
>     at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>     at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>     at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>     at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>     at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
>     at scala.collection.AbstractTraversable.foldLeft(Traversable.scala:104)
>     at org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram$$anonfun$optimize$1.apply(FlinkGroupProgram.scala:60)
>     at org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram$$anonfun$optimize$1.apply(FlinkGroupProgram.scala:55)
>     at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
>     at
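As a hedged aside on possible workarounds (this is not from the issue thread, just an illustration of the semantics): the correlated IN subquery that fails to decorrelate here is semantically a semi-join on `(id, name)`, which is the shape the planner's decorrelation would normally rewrite it into. The equivalence can be checked in plain Python on data mirroring the toy rows built by `MyListSource`:

```python
# Hedged sketch (not from the issue): the query
#   select t1.flag from table1 t1
#   where t1.name in (select t2.name from table2 t2 where t2.id = t1.id)
# is semantically a semi-join on (id, name). Rows mirror MyListSource above.

table1 = [{"flag": "table1", "id": 2, "name": "me"}]
table2 = [{"flag": "table2", "id": 2, "name": "me"}]

# Correlated-subquery form: for each t1 row, probe table2 with t1.id.
subquery_result = [
    t1["flag"]
    for t1 in table1
    if t1["name"] in {t2["name"] for t2 in table2 if t2["id"] == t1["id"]}
]

# Semi-join form: build the key set once, then filter (no correlation).
keys = {(t2["id"], t2["name"]) for t2 in table2}
semijoin_result = [t1["flag"] for t1 in table1 if (t1["id"], t1["name"]) in keys]

print(subquery_result == semijoin_result)  # both yield ["table1"]
```

Whether rewriting the SQL into an explicit join form sidesteps the `$cor0` planner error on a `fromValues()` source would need to be verified against the affected Flink version.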