[jira] [Updated] (FLINK-35302) Flink REST server throws exception on unknown fields in RequestBody

2024-05-13 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu updated FLINK-35302:

Fix Version/s: 1.20.0
   (was: 1.19.1)

> Flink REST server throws exception on unknown fields in RequestBody
> ---
>
> Key: FLINK-35302
> URL: https://issues.apache.org/jira/browse/FLINK-35302
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / REST
>Affects Versions: 1.19.0
>Reporter: Juntao Hu
>Assignee: Juntao Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>
> As 
> [FLIP-401|https://cwiki.apache.org/confluence/display/FLINK/FLIP-401%3A+REST+API+JSON+response+deserialization+unknown+field+tolerance]
>  and FLINK-33268 mentioned, when an old-version REST client receives a response 
> from a new-version REST server, a strict JSON mapper makes the client 
> throw exceptions on newly added fields, which is inconvenient for 
> situations where a centralized client deals with REST servers of different 
> versions (e.g. the k8s operator).
> But this incompatibility can also happen on the server side, when a new-version 
> REST client sends requests to an old-version REST server with additional 
> fields. Making the server tolerant of unknown fields saves clients from 
> backward-compatibility code.
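For illustration only, a small Python sketch of the idea (the actual change concerns the Java REST server's JSON mapper, and the field names below are hypothetical): a tolerant deserializer keeps the fields it knows and drops the rest, instead of rejecting the whole request body.
```
import json

KNOWN_FIELDS = {"jobId", "parallelism"}  # hypothetical request-body schema

def parse_request_body(raw: str) -> dict:
    body = json.loads(raw)
    # A strict mapper raises on keys outside KNOWN_FIELDS;
    # a tolerant one simply drops them and keeps the known fields.
    return {k: v for k, v in body.items() if k in KNOWN_FIELDS}

# A newer client may send an extra field that an older server does not know about.
print(parse_request_body('{"jobId": "abc", "parallelism": 2, "newOption": true}'))
```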



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-35112) Membership for Row class does not include field names

2024-05-10 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845526#comment-17845526
 ] 

Dian Fu edited comment on FLINK-35112 at 5/11/24 2:56 AM:
--

Fix in 
- master via b4d71144de8e3772257804b6ed8ad688076430d6

- release-1.19 via d16b20e4fb4fb906927188ca11af599edd0953c1


was (Author: dianfu):
Fix in master via b4d71144de8e3772257804b6ed8ad688076430d6

> Membership for Row class does not include field names
> -
>
> Key: FLINK-35112
> URL: https://issues.apache.org/jira/browse/FLINK-35112
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.18.1
>Reporter: Wouter Zorgdrager
>Assignee: Wouter Zorgdrager
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>
> In the Row class in PyFlink I cannot do a membership check for field names. 
> This minimal example will show the unexpected behavior:
> ```
> from pyflink.common import Row
> row = Row(name="Alice", age=11)
> # Expected to be True, but is False
> print("name" in row)
> person = Row("name", "age")
> # This is True, as expected
> print('name' in person)
> ```
> The related code in the Row class is:
> ```
>     def __contains__(self, item):
>         return item in self._values
> ```
> It should be relatively easy to fix with the following code:
> ```
>     def __contains__(self, item):
>         if hasattr(self, "_fields"):
>             return item in self._fields
>         else:
>             return item in self._values
> ```
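A small sketch of the behaviour this would give (assuming the proposed `__contains__` fix above): keyword-constructed Rows are checked against their field names, while positional Rows keep the existing value check.
```
from pyflink.common import Row

row = Row(name="Alice", age=11)   # keyword-constructed: has _fields
person = Row("name", "age")       # positional: values only

assert "name" in row          # True with the fix (was False): checks field names
assert "Alice" not in row     # field-name check, no longer a value check
assert "name" in person       # unchanged: positional Rows still check values
```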



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-35112) Membership for Row class does not include field names

2024-05-10 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu updated FLINK-35112:

Fix Version/s: 1.19.1

> Membership for Row class does not include field names
> -
>
> Key: FLINK-35112
> URL: https://issues.apache.org/jira/browse/FLINK-35112
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.18.1
>Reporter: Wouter Zorgdrager
>Assignee: Wouter Zorgdrager
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.20.0, 1.19.1
>
>
> In the Row class in PyFlink I cannot do a membership check for field names. 
> This minimal example will show the unexpected behavior:
> ```
> from pyflink.common import Row
> row = Row(name="Alice", age=11)
> # Expected to be True, but is False
> print("name" in row)
> person = Row("name", "age")
> # This is True, as expected
> print('name' in person)
> ```
> The related code in the Row class is:
> ```
>     def __contains__(self, item):
>         return item in self._values
> ```
> It should be relatively easy to fix with the following code:
> ```
>     def __contains__(self, item):
>         if hasattr(self, "_fields"):
>             return item in self._fields
>         else:
>             return item in self._values
> ```



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-35112) Membership for Row class does not include field names

2024-05-10 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu closed FLINK-35112.
---
Fix Version/s: 1.20.0
   Resolution: Fixed

Fix in master via b4d71144de8e3772257804b6ed8ad688076430d6

> Membership for Row class does not include field names
> -
>
> Key: FLINK-35112
> URL: https://issues.apache.org/jira/browse/FLINK-35112
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.18.1
>Reporter: Wouter Zorgdrager
>Assignee: Wouter Zorgdrager
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>
> In the Row class in PyFlink I cannot do a membership check for field names. 
> This minimal example will show the unexpected behavior:
> ```
> from pyflink.common import Row
> row = Row(name="Alice", age=11)
> # Expected to be True, but is False
> print("name" in row)
> person = Row("name", "age")
> # This is True, as expected
> print('name' in person)
> ```
> The related code in the Row class is:
> ```
>     def __contains__(self, item):
>         return item in self._values
> ```
> It should be relatively easy to fix with the following code:
> ```
>     def __contains__(self, item):
>         if hasattr(self, "_fields"):
>             return item in self._fields
>         else:
>             return item in self._values
> ```



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-35112) Membership for Row class does not include field names

2024-05-10 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu reassigned FLINK-35112:
---

Assignee: Wouter Zorgdrager

> Membership for Row class does not include field names
> -
>
> Key: FLINK-35112
> URL: https://issues.apache.org/jira/browse/FLINK-35112
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.18.1
>Reporter: Wouter Zorgdrager
>Assignee: Wouter Zorgdrager
>Priority: Minor
>  Labels: pull-request-available
>
> In the Row class in PyFlink I cannot do a membership check for field names. 
> This minimal example will show the unexpected behavior:
> ```
> from pyflink.common import Row
> row = Row(name="Alice", age=11)
> # Expected to be True, but is False
> print("name" in row)
> person = Row("name", "age")
> # This is True, as expected
> print('name' in person)
> ```
> The related code in the Row class is:
> ```
>     def __contains__(self, item):
>         return item in self._values
> ```
> It should be relatively easy to fix with the following code:
> ```
>     def __contains__(self, item):
>         if hasattr(self, "_fields"):
>             return item in self._fields
>         else:
>             return item in self._values
> ```



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-35302) Flink REST server throws exception on unknown fields in RequestBody

2024-05-07 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu reassigned FLINK-35302:
---

Assignee: Juntao Hu

> Flink REST server throws exception on unknown fields in RequestBody
> ---
>
> Key: FLINK-35302
> URL: https://issues.apache.org/jira/browse/FLINK-35302
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / REST
>Affects Versions: 1.19.0
>Reporter: Juntao Hu
>Assignee: Juntao Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.1
>
>
> As 
> [FLIP-401|https://cwiki.apache.org/confluence/display/FLINK/FLIP-401%3A+REST+API+JSON+response+deserialization+unknown+field+tolerance]
>  and FLINK-33268 mentioned, when an old-version REST client receives a response 
> from a new-version REST server, a strict JSON mapper makes the client 
> throw exceptions on newly added fields, which is inconvenient for 
> situations where a centralized client deals with REST servers of different 
> versions (e.g. the k8s operator).
> But this incompatibility can also happen on the server side, when a new-version 
> REST client sends requests to an old-version REST server with additional 
> fields. Making the server tolerant of unknown fields saves clients from 
> backward-compatibility code.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-35208) Respect pipeline.cached-files during processing Python dependencies

2024-04-28 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu closed FLINK-35208.
---
Fix Version/s: 1.20.0
   Resolution: Fixed

Merged to master via 0f01a0e9a6856781d5a0e33b26172bb913ec1928

> Respect pipeline.cached-files during processing Python dependencies
> ---
>
> Key: FLINK-35208
> URL: https://issues.apache.org/jira/browse/FLINK-35208
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Reporter: Dian Fu
>Assignee: Dian Fu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>
> Currently, PyFlink makes use of the distributed cache 
> (StreamExecutionEnvironment#cachedFiles) when handling Python 
> dependencies (see 
> [https://github.com/apache/flink/blob/master/flink-python/src/main/java/org/apache/flink/python/util/PythonDependencyUtils.java#L339]
>  for more details). 
> However, if pipeline.cached-files is configured, it will clear 
> StreamExecutionEnvironment#cachedFiles (see 
> [https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/StreamExecutionEnvironment.java#L1132]
>  for more details), which may break the above functionality.
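A minimal PyFlink sketch of the two mechanisms that collide here (the paths and the cached-files entry are made up for illustration): Python dependencies are tracked through StreamExecutionEnvironment#cachedFiles, and applying pipeline.cached-files re-populates that same list, which previously dropped the dependency entries.
```
from pyflink.common import Configuration
from pyflink.datastream import StreamExecutionEnvironment

config = Configuration()
# Explicitly configured cached files; applying this option rebuilds cachedFiles.
config.set_string(
    "pipeline.cached-files",
    "name:extra_data,path:file:///tmp/extra_data.txt",
)
env = StreamExecutionEnvironment.get_execution_environment(config)

# Registered internally via the distributed cache (cachedFiles); before the fix
# this entry could be lost once pipeline.cached-files was applied.
env.add_python_file("/tmp/my_udfs.py")
```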



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-35208) Respect pipeline.cached-files during processing Python dependencies

2024-04-22 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu updated FLINK-35208:

Description: 
Currently, PyFlink makes use of the distributed cache 
(StreamExecutionEnvironment#cachedFiles) when handling Python 
dependencies (see 
[https://github.com/apache/flink/blob/master/flink-python/src/main/java/org/apache/flink/python/util/PythonDependencyUtils.java#L339]
 for more details). 

However, if pipeline.cached-files is configured, it will clear 
StreamExecutionEnvironment#cachedFiles (see 
[https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/StreamExecutionEnvironment.java#L1132]
 for more details), which may break the above functionality.

  was:
Currently, PyFlink will make use of distributed cache (update 
StreamExecutionEnvironment#cachedFiles) during handling the Python 
dependencies(See 
[https://github.com/apache/flink/blob/master/flink-python/src/main/java/org/apache/flink/python/util/PythonDependencyUtils.java#L339]
 for more details). 

However, if pipeline.cached-files is configured, it will clear 
StreamExecutionEnvironment#cachedFiles(see 
[https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/StreamExecutionEnvironment.java#L1132]
 for more details) which may break the above functionalities.


> Respect pipeline.cached-files during processing Python dependencies
> ---
>
> Key: FLINK-35208
> URL: https://issues.apache.org/jira/browse/FLINK-35208
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Reporter: Dian Fu
>Assignee: Dian Fu
>Priority: Major
>
> Currently, PyFlink makes use of the distributed cache 
> (StreamExecutionEnvironment#cachedFiles) when handling Python 
> dependencies (see 
> [https://github.com/apache/flink/blob/master/flink-python/src/main/java/org/apache/flink/python/util/PythonDependencyUtils.java#L339]
>  for more details). 
> However, if pipeline.cached-files is configured, it will clear 
> StreamExecutionEnvironment#cachedFiles (see 
> [https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/StreamExecutionEnvironment.java#L1132]
>  for more details), which may break the above functionality.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35208) Respect pipeline.cached-files during processing Python dependencies

2024-04-22 Thread Dian Fu (Jira)
Dian Fu created FLINK-35208:
---

 Summary: Respect pipeline.cached-files during processing Python 
dependencies
 Key: FLINK-35208
 URL: https://issues.apache.org/jira/browse/FLINK-35208
 Project: Flink
  Issue Type: Bug
  Components: API / Python
Reporter: Dian Fu
Assignee: Dian Fu


Currently, PyFlink makes use of the distributed cache (it updates 
StreamExecutionEnvironment#cachedFiles) when handling Python 
dependencies (see 
[https://github.com/apache/flink/blob/master/flink-python/src/main/java/org/apache/flink/python/util/PythonDependencyUtils.java#L339]
 for more details). 

However, if pipeline.cached-files is configured, it will clear 
StreamExecutionEnvironment#cachedFiles (see 
[https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/StreamExecutionEnvironment.java#L1132]
 for more details), which may break the above functionality.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35112) Membership for Row class does not include field names

2024-04-16 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837965#comment-17837965
 ] 

Dian Fu commented on FLINK-35112:
-

Great (y). Thanks [~wzorgdrager] 

> Membership for Row class does not include field names
> -
>
> Key: FLINK-35112
> URL: https://issues.apache.org/jira/browse/FLINK-35112
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.18.1
>Reporter: Wouter Zorgdrager
>Priority: Minor
>
> In the Row class in PyFlink I cannot do a membership check for field names. 
> This minimal example will show the unexpected behavior:
> ```
> from pyflink.common import Row
> row = Row(name="Alice", age=11)
> # Expected to be True, but is False
> print("name" in row)
> person = Row("name", "age")
> # This is True, as expected
> print('name' in person)
> ```
> The related code in the Row class is:
> ```
>     def __contains__(self, item):
>         return item in self._values
> ```
> It should be relatively easy to fix with the following code:
> ```
>     def __contains__(self, item):
>         if hasattr(self, "_fields"):
>             return item in self._fields
>         else:
>             return item in self._values
> ```



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35112) Membership for Row class does not include field names

2024-04-15 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837512#comment-17837512
 ] 

Dian Fu commented on FLINK-35112:
-

Sounds reasonable for me. [~wzorgdrager] Do you want to submit a PR for this 
issue?

> Membership for Row class does not include field names
> -
>
> Key: FLINK-35112
> URL: https://issues.apache.org/jira/browse/FLINK-35112
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.18.1
>Reporter: Wouter Zorgdrager
>Priority: Minor
>
> In the Row class in PyFlink I cannot do a membership check for field names. 
> This minimal example will show the unexpected behavior:
> ```
> from pyflink.common import Row
> row = Row(name="Alice", age=11)
> # Expected to be True, but is False
> print("name" in row)
> person = Row("name", "age")
> # This is True, as expected
> print('name' in person)
> ```
> The related code in the Row class is:
> ```
>     def __contains__(self, item):
>         return item in self._values
> ```
> It should be relatively easy to fix with the following code:
> ```
>     def __contains__(self, item):
>         if hasattr(self, "_fields"):
>             return item in self._fields
>         else:
>             return item in self._values
> ```



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34985) It doesn't support to access fields by name for map function in thread mode

2024-04-01 Thread Dian Fu (Jira)
Dian Fu created FLINK-34985:
---

 Summary: It doesn't support to access fields by name for map 
function in thread mode
 Key: FLINK-34985
 URL: https://issues.apache.org/jira/browse/FLINK-34985
 Project: Flink
  Issue Type: Bug
  Components: API / Python
Reporter: Dian Fu


Reported in slack channel: 
[https://apache-flink.slack.com/archives/C065944F9M2/p1711640068929589]

hi all, I seem to be running into an issue when switching to thread mode in 
PyFlink. In a UDF the {{Row}} seems to get converted into a tuple and you 
cannot access fields by their name anymore. In process mode it works fine. This 
bug can easily be reproduced using this minimal example, which is close to the 
PyFlink docs:
```
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.common import Row
from pyflink.table import StreamTableEnvironment, DataTypes
from pyflink.table.udf import udf

env = StreamExecutionEnvironment.get_execution_environment()
t_env = StreamTableEnvironment.create(env)
t_env.get_config().set("parallelism.default", "1")

# This does work:
t_env.get_config().set("python.execution-mode", "process")

# This doesn't work:
# t_env.get_config().set("python.execution-mode", "thread")


def map_function(a: Row) -> Row:
    return Row(a.a + 1, a.b * a.b)


# map operation with a python general scalar function
func = udf(
    map_function,
    result_type=DataTypes.ROW(
        [
            DataTypes.FIELD("a", DataTypes.BIGINT()),
            DataTypes.FIELD("b", DataTypes.BIGINT()),
        ]
    ),
)
table = (
    t_env.from_elements(
        [(2, 4), (0, 0)],
        schema=DataTypes.ROW(
            [
                DataTypes.FIELD("a", DataTypes.BIGINT()),
                DataTypes.FIELD("b", DataTypes.BIGINT()),
            ]
        ),
    )
    .map(func)
    .alias("a", "b")
    .execute()
    .print()
)
```
 
The exception I get in this execution mode is:
2024-03-28 16:32:10 Caused by: pemja.core.PythonException: : 'tuple' object has no attribute 'a'
2024-03-28 16:32:10 at 
/usr/local/lib/python3.10/dist-packages/pyflink/fn_execution/embedded/operation_utils.process_element(operation_utils.py:72)
2024-03-28 16:32:10 at 
/usr/local/lib/python3.10/dist-packages/pyflink/fn_execution/table/operations.process_element(operations.py:102)
2024-03-28 16:32:10 at .(:1)
2024-03-28 16:32:10 at 
/opt/flink/wouter/minimal_example.map_function(minimal_example.py:19)
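A possible workaround sketch, not from the ticket: until thread mode preserves the Row type, index-based access works whether the input arrives as a Row or as a plain tuple, while attribute access (a.a) only works for Row.
```
from pyflink.common import Row

def map_function(a) -> Row:
    # a[0] / a[1] work in both process and thread mode; a.a / a.b fail in
    # thread mode because the input arrives as a plain tuple there.
    return Row(a[0] + 1, a[1] * a[1])
```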



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33220) PyFlink support for Datagen connector

2024-03-20 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829346#comment-17829346
 ] 

Dian Fu commented on FLINK-33220:
-

[~liu.chong]  I missed this ticket. Feel free to submit the PR and ping me~

> PyFlink support for Datagen connector
> -
>
> Key: FLINK-33220
> URL: https://issues.apache.org/jira/browse/FLINK-33220
> Project: Flink
>  Issue Type: New Feature
>  Components: API / Python
>Reporter: Liu Chong
>Priority: Minor
>
> This is a simple Jira to propose the support of Datagen in PyFlink datastream 
> API as a built-in source connector



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-33220) PyFlink support for Datagen connector

2024-03-20 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829346#comment-17829346
 ] 

Dian Fu edited comment on FLINK-33220 at 3/21/24 1:24 AM:
--

[~liu.chong]  Sorry, I missed this ticket. Feel free to submit the PR and ping 
me~


was (Author: dianfu):
[~liu.chong]  I missed this ticket. Feel free to submit the PR and ping me~

> PyFlink support for Datagen connector
> -
>
> Key: FLINK-33220
> URL: https://issues.apache.org/jira/browse/FLINK-33220
> Project: Flink
>  Issue Type: New Feature
>  Components: API / Python
>Reporter: Liu Chong
>Priority: Minor
>
> This is a simple Jira to propose the support of Datagen in PyFlink datastream 
> API as a built-in source connector



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34202) python tests take suspiciously long in some of the cases

2024-02-18 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818326#comment-17818326
 ] 

Dian Fu commented on FLINK-34202:
-

[~lincoln.86xy] It randomly selects one Python version to test, and all the 
Python versions will eventually be tested, though maybe not in a single nightly CI run. 
Personally I think this should be acceptable, as it rarely happens that a test 
fails only on one specific Python version. Even if it does, the failure will 
still be found, just with a one- or two-day delay, which should be 
acceptable.

> python tests take suspiciously long in some of the cases
> 
>
> Key: FLINK-34202
> URL: https://issues.apache.org/jira/browse/FLINK-34202
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.17.2, 1.19.0, 1.18.1
>Reporter: Matthias Pohl
>Assignee: Xingbo Huang
>Priority: Critical
>  Labels: pull-request-available, test-stability
>
> [This release-1.18 
> build|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56603=logs=3e4dd1a2-fe2f-5e5d-a581-48087e718d53=b4612f28-e3b5-5853-8a8b-610ae894217a]
>  has the python stage running into a timeout without any obvious reason. The 
> [python stage run for 
> JDK17|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56603=logs=b53e1644-5cb4-5a3b-5d48-f523f39bcf06]
>  was also getting close to the 4h timeout.
> I'm creating this issue for documentation purposes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34202) python tests take suspiciously long in some of the cases

2024-02-17 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818242#comment-17818242
 ] 

Dian Fu commented on FLINK-34202:
-

[~hxbks2ks] Could you help to take a look at this issue?

> python tests take suspiciously long in some of the cases
> 
>
> Key: FLINK-34202
> URL: https://issues.apache.org/jira/browse/FLINK-34202
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.17.2, 1.19.0, 1.18.1
>Reporter: Matthias Pohl
>Priority: Critical
>  Labels: test-stability
>
> [This release-1.18 
> build|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56603=logs=3e4dd1a2-fe2f-5e5d-a581-48087e718d53=b4612f28-e3b5-5853-8a8b-610ae894217a]
>  has the python stage running into a timeout without any obvious reason. The 
> [python stage run for 
> JDK17|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56603=logs=b53e1644-5cb4-5a3b-5d48-f523f39bcf06]
>  was also getting close to the 4h timeout.
> I'm creating this issue for documentation purposes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-33613) Python UDF Runner process leak in Process Mode

2023-11-24 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu closed FLINK-33613.
---
Fix Version/s: 1.19.0
   1.18.1
   1.17.3
   Resolution: Fixed

Fixed in:
- master via 977463cce3ea0f88e2f184c30720bf4e8e97fd4a
- release-1.18 via f6d005681e3f5f83ab1074660c4d9878dabb9176
- release-1.17 via c1818f530617af8996f4a74bb064e203186fd98e

> Python UDF Runner process leak in Process Mode
> --
>
> Key: FLINK-33613
> URL: https://issues.apache.org/jira/browse/FLINK-33613
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.17.0
>Reporter: Yu Chen
>Assignee: Dian Fu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0, 1.18.1, 1.17.3
>
> Attachments: ps-ef.txt, streaming_word_count-1.py
>
>
> While working with PyFlink, we found that in Process Mode, Python UDF 
> processes may leak after a failover of the job. This leads to a rising number of 
> processes and threads on the host machine, which eventually results in a 
> failure to create new threads.
>  
> You can try to reproduce it with the attached test job 
> `streaming_word_count.py`.
> (Note that the job will keep failing over, and you can watch the process 
> leak with `ps -ef` on the TaskManager.)
>  
> Our test environment:
>  * K8S Application Mode
>  * 4 TaskManagers with 12 slots/TM
>  * Job parallelism was set to 48 
> The number of UDF runner processes (`pyflink.fn_execution.beam.beam_boot`) should be 
> consistent with the number of slots per TM (12), but we found 180 processes on one 
> TaskManager after several failovers.
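A small diagnostic sketch (not part of the ticket) for watching the leak the same way as the `ps -ef` check above: count the beam_boot runner processes on a TaskManager host, which should stay close to the number of slots per TaskManager.
```
import subprocess

ps_output = subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout
runners = [line for line in ps_output.splitlines()
           if "pyflink.fn_execution.beam.beam_boot" in line]
print(f"beam_boot processes: {len(runners)}")
```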



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-33613) Python UDF Runner process leak in Process Mode

2023-11-23 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu reassigned FLINK-33613:
---

Assignee: Dian Fu

> Python UDF Runner process leak in Process Mode
> --
>
> Key: FLINK-33613
> URL: https://issues.apache.org/jira/browse/FLINK-33613
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.17.0
>Reporter: Yu Chen
>Assignee: Dian Fu
>Priority: Major
>  Labels: pull-request-available
> Attachments: ps-ef.txt, streaming_word_count-1.py
>
>
> While working with PyFlink, we found that in Process Mode, Python UDF 
> processes may leak after a failover of the job. This leads to a rising number of 
> processes and threads on the host machine, which eventually results in a 
> failure to create new threads.
>  
> You can try to reproduce it with the attached test job 
> `streaming_word_count.py`.
> (Note that the job will keep failing over, and you can watch the process 
> leak with `ps -ef` on the TaskManager.)
>  
> Our test environment:
>  * K8S Application Mode
>  * 4 TaskManagers with 12 slots/TM
>  * Job parallelism was set to 48 
> The number of UDF runner processes (`pyflink.fn_execution.beam.beam_boot`) should be 
> consistent with the number of slots per TM (12), but we found 180 processes on one 
> TaskManager after several failovers.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-33225) Python API incorrectly passes `JVM_ARGS` as single argument

2023-11-22 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu closed FLINK-33225.
---
Fix Version/s: 1.19.0
   1.18.1
   Resolution: Fixed

Merged to
- master via b1bfd70ad8a9e4e1a710dc5775837ba7102d4b70
- release-1.18 via 73273ae1669c9f01b61667dedb85a7e745d6bbe2

> Python API incorrectly passes `JVM_ARGS` as single argument
> ---
>
> Key: FLINK-33225
> URL: https://issues.apache.org/jira/browse/FLINK-33225
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.18.0, 1.17.1, 1.18.1
>Reporter: Deepyaman Datta
>Assignee: Deepyaman Datta
>Priority: Major
>  Labels: github-pullrequest, pull-request-available
> Fix For: 1.19.0, 1.18.1
>
>
> In the same vein as https://issues.apache.org/jira/browse/FLINK-31915, 
> `JVM_ARGS` needs to be passed as an array. For example, the current behavior 
> with `export JVM_ARGS='-XX:CompressedClassSpaceSize=100M 
> -XX:MaxMetaspaceSize=200M'` is:
> {{>               raise RuntimeError(}}
> {{                    "Java gateway process exited before sending its port 
> number.\nStderr:\n"}}
> {{                    + stderr_info}}
> {{                )}}
> {{E               RuntimeError: Java gateway process exited before sending 
> its port number.}}
> {{E               Stderr:}}
> {{E               Improperly specified VM option 
> 'CompressedClassSpaceSize=100M -XX:MaxMetaspaceSize=200M'}}
> {{E               Error: Could not create the Java Virtual Machine.}}
> {{E               Error: A fatal exception has occurred. Program will exit.}}
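A minimal sketch of the underlying problem (not PyFlink's actual launcher code, and it assumes a JVM on the PATH): when the whole JVM_ARGS string is passed as a single argv element, the JVM sees one malformed option instead of two separate ones; splitting the string first avoids that.
```
import os
import shlex
import subprocess

jvm_args = os.environ.get("JVM_ARGS", "")

# Broken: the full string becomes a single argument, e.g.
#   '-XX:CompressedClassSpaceSize=100M -XX:MaxMetaspaceSize=200M'
# subprocess.run(["java", jvm_args, "-version"], check=True)

# Correct: split into separate arguments before building the command line.
subprocess.run(["java", *shlex.split(jvm_args), "-version"], check=True)
```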



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-33225) Python API incorrectly passes `JVM_ARGS` as single argument

2023-11-22 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu reassigned FLINK-33225:
---

Assignee: Deepyaman Datta

> Python API incorrectly passes `JVM_ARGS` as single argument
> ---
>
> Key: FLINK-33225
> URL: https://issues.apache.org/jira/browse/FLINK-33225
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.18.0, 1.17.1, 1.18.1
>Reporter: Deepyaman Datta
>Assignee: Deepyaman Datta
>Priority: Major
>  Labels: github-pullrequest, pull-request-available
>
> In the same vein as https://issues.apache.org/jira/browse/FLINK-31915, 
> `JVM_ARGS` needs to be passed as an array. For example, the current behavior 
> with `export JVM_ARGS='-XX:CompressedClassSpaceSize=100M 
> -XX:MaxMetaspaceSize=200M'` is:
> {{>               raise RuntimeError(}}
> {{                    "Java gateway process exited before sending its port 
> number.\nStderr:\n"}}
> {{                    + stderr_info}}
> {{                )}}
> {{E               RuntimeError: Java gateway process exited before sending 
> its port number.}}
> {{E               Stderr:}}
> {{E               Improperly specified VM option 
> 'CompressedClassSpaceSize=100M -XX:MaxMetaspaceSize=200M'}}
> {{E               Error: Could not create the Java Virtual Machine.}}
> {{E               Error: A fatal exception has occurred. Program will exit.}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33528) Externalize Python connector code

2023-11-15 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17786595#comment-17786595
 ] 

Dian Fu commented on FLINK-33528:
-

Thanks very much for the efforts (y)

> Externalize Python connector code
> -
>
> Key: FLINK-33528
> URL: https://issues.apache.org/jira/browse/FLINK-33528
> Project: Flink
>  Issue Type: Technical Debt
>  Components: API / Python, Connectors / Common
>Affects Versions: 1.18.0
>Reporter: Márton Balassi
>Assignee: Peter Vary
>Priority: Major
> Fix For: 1.19.0
>
>
> During the connector externalization effort, end-to-end tests for the Python 
> connectors were left in the main repository under:
> [https://github.com/apache/flink/tree/master/flink-python/pyflink/datastream/connectors]
> These include both the Python connector implementations and their tests. Currently they 
> depend on a previously released version of the underlying connectors; 
> otherwise they would introduce a circular dependency, given that they are in 
> the flink repo at the moment.
> This setup prevents us from propagating any breaking change to the PublicEvolving 
> and Internal APIs used by the connectors, as such changes break the Python 
> e2e tests. We ran into this while implementing FLINK-25857.
> Note that we decided to turn off the Python tests when merging 
> FLINK-25857, so now we have to fix this by 1.19 so that we can 
> re-enable the test runs - now in the externalized connector repos.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33529) PyFlink fails with "No module named 'cloudpickle"

2023-11-13 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu updated FLINK-33529:

Affects Version/s: 1.17.2

> PyFlink fails with "No module named 'cloudpickle"
> -
>
> Key: FLINK-33529
> URL: https://issues.apache.org/jira/browse/FLINK-33529
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.18.0, 1.17.2
> Environment: Python 3.7.16 or Python 3.9
> YARN
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0, 1.18.1, 1.17.3
>
> Attachments: batch_wc.py, flink1.17-get_site_packages.py, 
> flink1.18-get_site_packages.py
>
>
> PyFlink fails with "No module named 'cloudpickle" on Flink 1.18. The same 
> program works fine on Flink 1.17. This is after the change 
> (https://issues.apache.org/jira/browse/FLINK-32034).
> *Repro:*
> {code}
> [hadoop@ip-1-2-3-4 ~]$ python --version
> Python 3.7.16
> [hadoop@ip-1-2-3-4 ~]$ rpm -qa | grep flink
> flink-1.18.0-1.amzn2.x86_64
> [hadoop@ip-1-2-3-4 ~]$ flink-yarn-session -d
> [hadoop@ip-1-2-3-4 ~]$ flink run -py /tmp/batch_wc.py --output 
> s3://prabhuflinks3/OUT2/
> {code}
> *Error*
> {code}
> ModuleNotFoundError: No module named 'cloudpickle'
>   at 
> org.apache.flink.streaming.api.runners.python.beam.BeamPythonFunctionRunner.createStageBundleFactory(BeamPythonFunctionRunner.java:656)
>   at 
> org.apache.flink.streaming.api.runners.python.beam.BeamPythonFunctionRunner.open(BeamPythonFunctionRunner.java:281)
>   at 
> org.apache.flink.streaming.api.operators.python.process.AbstractExternalPythonFunctionOperator.open(AbstractExternalPythonFunctionOperator.java:57)
>   at 
> org.apache.flink.table.runtime.operators.python.AbstractStatelessFunctionOperator.open(AbstractStatelessFunctionOperator.java:92)
>   at 
> org.apache.flink.table.runtime.operators.python.table.PythonTableFunctionOperator.open(PythonTableFunctionOperator.java:114)
>   at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:107)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:753)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.call(StreamTaskActionExecutor.java:100)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:728)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:693)
>   at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:953)
>   at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:922)
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:746)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562)
> {code}
> *Analysis*
> 1. On Flink 1.17 and Python 3.7.16, 
> PythonEnvironmentManagerUtils#getSitePackagesPath used to return the following 
> two paths:
> {code}
> [root@ip-172-31-45-97 tmp]# python flink1.17-get_site_packages.py /tmp
> /tmp/lib/python3.7/site-packages
> /tmp/lib64/python3.7/site-packages
> {code}
> whereas Flink 1.18 (FLINK-32034) changed 
> PythonEnvironmentManagerUtils#getSitePackagesPath so that only one path is 
> returned:
> {code}
> [root@ip-172-31-45-97 tmp]# python flink1.18-get_site_packages.py /tmp
> /tmp/lib64/python3.7/site-packages
> [root@ip-172-31-45-97 tmp]#
> {code}
> The pyflink dependencies are installed in "/tmp/lib/python3.7/site-packages", 
> which is not returned by getSitePackagesPath in Flink 1.18, causing the 
> pyflink job failure.
> *Attached batch_wc.py, flink1.17-get_site_packages.py and 
> flink1.18-get_site_packages.py.*
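A hedged sketch of why two paths matter here (not Flink's implementation): on distributions that patch Python to use lib64, pure-Python and platform-specific packages can live in different site-packages directories under the same prefix, so resolving only one of them misses installed dependencies.
```
import sys
import sysconfig

prefix = sys.argv[1] if len(sys.argv) > 1 else sys.prefix

# purelib and platlib may differ (lib vs lib64) on patched interpreters.
candidates = {
    sysconfig.get_path("purelib", vars={"base": prefix, "platbase": prefix}),
    sysconfig.get_path("platlib", vars={"base": prefix, "platbase": prefix}),
}
for path in sorted(candidates):
    print(path)
```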



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-33529) PyFlink fails with "No module named 'cloudpickle"

2023-11-13 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu closed FLINK-33529.
---
Fix Version/s: 1.19.0
   1.18.1
   1.17.3
   Resolution: Fixed

> PyFlink fails with "No module named 'cloudpickle"
> -
>
> Key: FLINK-33529
> URL: https://issues.apache.org/jira/browse/FLINK-33529
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.18.0
> Environment: Python 3.7.16 or Python 3.9
> YARN
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0, 1.18.1, 1.17.3
>
> Attachments: batch_wc.py, flink1.17-get_site_packages.py, 
> flink1.18-get_site_packages.py
>
>
> PyFlink fails with "No module named 'cloudpickle" on Flink 1.18. The same 
> program works fine on Flink 1.17. This is after the change 
> (https://issues.apache.org/jira/browse/FLINK-32034).
> *Repro:*
> {code}
> [hadoop@ip-1-2-3-4 ~]$ python --version
> Python 3.7.16
> [hadoop@ip-1-2-3-4 ~]$ rpm -qa | grep flink
> flink-1.18.0-1.amzn2.x86_64
> [hadoop@ip-1-2-3-4 ~]$ flink-yarn-session -d
> [hadoop@ip-1-2-3-4 ~]$ flink run -py /tmp/batch_wc.py --output 
> s3://prabhuflinks3/OUT2/
> {code}
> *Error*
> {code}
> ModuleNotFoundError: No module named 'cloudpickle'
>   at 
> org.apache.flink.streaming.api.runners.python.beam.BeamPythonFunctionRunner.createStageBundleFactory(BeamPythonFunctionRunner.java:656)
>   at 
> org.apache.flink.streaming.api.runners.python.beam.BeamPythonFunctionRunner.open(BeamPythonFunctionRunner.java:281)
>   at 
> org.apache.flink.streaming.api.operators.python.process.AbstractExternalPythonFunctionOperator.open(AbstractExternalPythonFunctionOperator.java:57)
>   at 
> org.apache.flink.table.runtime.operators.python.AbstractStatelessFunctionOperator.open(AbstractStatelessFunctionOperator.java:92)
>   at 
> org.apache.flink.table.runtime.operators.python.table.PythonTableFunctionOperator.open(PythonTableFunctionOperator.java:114)
>   at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:107)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:753)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.call(StreamTaskActionExecutor.java:100)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:728)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:693)
>   at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:953)
>   at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:922)
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:746)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562)
> {code}
> *Analysis*
> 1. On Flink 1.17 and Python 3.7.16, 
> PythonEnvironmentManagerUtils#getSitePackagesPath used to return the following 
> two paths:
> {code}
> [root@ip-172-31-45-97 tmp]# python flink1.17-get_site_packages.py /tmp
> /tmp/lib/python3.7/site-packages
> /tmp/lib64/python3.7/site-packages
> {code}
> whereas Flink 1.18 (FLINK-32034) changed 
> PythonEnvironmentManagerUtils#getSitePackagesPath so that only one path is 
> returned:
> {code}
> [root@ip-172-31-45-97 tmp]# python flink1.18-get_site_packages.py /tmp
> /tmp/lib64/python3.7/site-packages
> [root@ip-172-31-45-97 tmp]#
> {code}
> The pyflink dependencies are installed in "/tmp/lib/python3.7/site-packages", 
> which is not returned by getSitePackagesPath in Flink 1.18, causing the 
> pyflink job failure.
> *Attached batch_wc.py, flink1.17-get_site_packages.py and 
> flink1.18-get_site_packages.py.*



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33529) PyFlink fails with "No module named 'cloudpickle"

2023-11-13 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17785727#comment-17785727
 ] 

Dian Fu commented on FLINK-33529:
-

Fixed in:
 * master via 34d9594ab1e5412371b77912f120b7949c92dcdd
 * release-1.18 via 2844ea232accece60158b455079920fb2d78f448
 * release-1.17 via 01b1ce4f349315e1942b3290c0fa81ab0a6b183e

> PyFlink fails with "No module named 'cloudpickle"
> -
>
> Key: FLINK-33529
> URL: https://issues.apache.org/jira/browse/FLINK-33529
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.18.0
> Environment: Python 3.7.16 or Python 3.9
> YARN
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>  Labels: pull-request-available
> Attachments: batch_wc.py, flink1.17-get_site_packages.py, 
> flink1.18-get_site_packages.py
>
>
> PyFlink fails with "No module named 'cloudpickle" on Flink 1.18. The same 
> program works fine on Flink 1.17. This is after the change 
> (https://issues.apache.org/jira/browse/FLINK-32034).
> *Repro:*
> {code}
> [hadoop@ip-1-2-3-4 ~]$ python --version
> Python 3.7.16
> [hadoop@ip-1-2-3-4 ~]$ rpm -qa | grep flink
> flink-1.18.0-1.amzn2.x86_64
> [hadoop@ip-1-2-3-4 ~]$ flink-yarn-session -d
> [hadoop@ip-1-2-3-4 ~]$ flink run -py /tmp/batch_wc.py --output 
> s3://prabhuflinks3/OUT2/
> {code}
> *Error*
> {code}
> ModuleNotFoundError: No module named 'cloudpickle'
>   at 
> org.apache.flink.streaming.api.runners.python.beam.BeamPythonFunctionRunner.createStageBundleFactory(BeamPythonFunctionRunner.java:656)
>   at 
> org.apache.flink.streaming.api.runners.python.beam.BeamPythonFunctionRunner.open(BeamPythonFunctionRunner.java:281)
>   at 
> org.apache.flink.streaming.api.operators.python.process.AbstractExternalPythonFunctionOperator.open(AbstractExternalPythonFunctionOperator.java:57)
>   at 
> org.apache.flink.table.runtime.operators.python.AbstractStatelessFunctionOperator.open(AbstractStatelessFunctionOperator.java:92)
>   at 
> org.apache.flink.table.runtime.operators.python.table.PythonTableFunctionOperator.open(PythonTableFunctionOperator.java:114)
>   at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:107)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:753)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.call(StreamTaskActionExecutor.java:100)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:728)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:693)
>   at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:953)
>   at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:922)
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:746)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562)
> {code}
> *Analysis*
> 1. On Flink 1.17 and Python 3.7.16, 
> PythonEnvironmentManagerUtils#getSitePackagesPath used to return the following 
> two paths:
> {code}
> [root@ip-172-31-45-97 tmp]# python flink1.17-get_site_packages.py /tmp
> /tmp/lib/python3.7/site-packages
> /tmp/lib64/python3.7/site-packages
> {code}
> whereas Flink 1.18 (FLINK-32034) changed 
> PythonEnvironmentManagerUtils#getSitePackagesPath so that only one path is 
> returned:
> {code}
> [root@ip-172-31-45-97 tmp]# python flink1.18-get_site_packages.py /tmp
> /tmp/lib64/python3.7/site-packages
> [root@ip-172-31-45-97 tmp]#
> {code}
> The pyflink dependencies are installed in "/tmp/lib/python3.7/site-packages", 
> which is not returned by getSitePackagesPath in Flink 1.18, causing the 
> pyflink job failure.
> *Attached batch_wc.py, flink1.17-get_site_packages.py and 
> flink1.18-get_site_packages.py.*



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-33529) PyFlink fails with "No module named 'cloudpickle"

2023-11-13 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu reassigned FLINK-33529:
---

Assignee: Prabhu Joseph

> PyFlink fails with "No module named 'cloudpickle"
> -
>
> Key: FLINK-33529
> URL: https://issues.apache.org/jira/browse/FLINK-33529
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.18.0
> Environment: Python 3.7.16 or Python 3.9
> YARN
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>  Labels: pull-request-available
> Attachments: batch_wc.py, flink1.17-get_site_packages.py, 
> flink1.18-get_site_packages.py
>
>
> PyFlink fails with "No module named 'cloudpickle" on Flink 1.18. The same 
> program works fine on Flink 1.17. This is after the change 
> (https://issues.apache.org/jira/browse/FLINK-32034).
> *Repro:*
> {code}
> [hadoop@ip-1-2-3-4 ~]$ python --version
> Python 3.7.16
> [hadoop@ip-1-2-3-4 ~]$ rpm -qa | grep flink
> flink-1.18.0-1.amzn2.x86_64
> [hadoop@ip-1-2-3-4 ~]$ flink-yarn-session -d
> [hadoop@ip-1-2-3-4 ~]$ flink run -py /tmp/batch_wc.py --output 
> s3://prabhuflinks3/OUT2/
> {code}
> *Error*
> {code}
> ModuleNotFoundError: No module named 'cloudpickle'
>   at 
> org.apache.flink.streaming.api.runners.python.beam.BeamPythonFunctionRunner.createStageBundleFactory(BeamPythonFunctionRunner.java:656)
>   at 
> org.apache.flink.streaming.api.runners.python.beam.BeamPythonFunctionRunner.open(BeamPythonFunctionRunner.java:281)
>   at 
> org.apache.flink.streaming.api.operators.python.process.AbstractExternalPythonFunctionOperator.open(AbstractExternalPythonFunctionOperator.java:57)
>   at 
> org.apache.flink.table.runtime.operators.python.AbstractStatelessFunctionOperator.open(AbstractStatelessFunctionOperator.java:92)
>   at 
> org.apache.flink.table.runtime.operators.python.table.PythonTableFunctionOperator.open(PythonTableFunctionOperator.java:114)
>   at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:107)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:753)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.call(StreamTaskActionExecutor.java:100)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:728)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:693)
>   at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:953)
>   at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:922)
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:746)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562)
> {code}
> *Analysis*
> 1. On Flink 1.17 and Python 3.7.16, 
> PythonEnvironmentManagerUtils#getSitePackagesPath used to return the following 
> two paths:
> {code}
> [root@ip-172-31-45-97 tmp]# python flink1.17-get_site_packages.py /tmp
> /tmp/lib/python3.7/site-packages
> /tmp/lib64/python3.7/site-packages
> {code}
> whereas Flink 1.18 (FLINK-32034) changed 
> PythonEnvironmentManagerUtils#getSitePackagesPath so that only one path is 
> returned:
> {code}
> [root@ip-172-31-45-97 tmp]# python flink1.18-get_site_packages.py /tmp
> /tmp/lib64/python3.7/site-packages
> [root@ip-172-31-45-97 tmp]#
> {code}
> The pyflink dependencies are installed in "/tmp/lib/python3.7/site-packages", 
> which is not returned by getSitePackagesPath in Flink 1.18, causing the 
> pyflink job failure.
> *Attached batch_wc.py, flink1.17-get_site_packages.py and 
> flink1.18-get_site_packages.py.*



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33528) Externalize Python connector code

2023-11-13 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17785708#comment-17785708
 ] 

Dian Fu commented on FLINK-33528:
-

Yes, I think so.

> Externalize Python connector code
> -
>
> Key: FLINK-33528
> URL: https://issues.apache.org/jira/browse/FLINK-33528
> Project: Flink
>  Issue Type: Technical Debt
>  Components: API / Python, Connectors / Common
>Affects Versions: 1.18.0
>Reporter: Márton Balassi
>Priority: Major
> Fix For: 1.19.0
>
>
> During the connector externalization effort, end-to-end tests for the Python 
> connectors were left in the main repository under:
> [https://github.com/apache/flink/tree/master/flink-python/pyflink/datastream/connectors]
> These include both the Python connector implementations and their tests. Currently they 
> depend on a previously released version of the underlying connectors; 
> otherwise they would introduce a circular dependency, given that they are in 
> the flink repo at the moment.
> This setup prevents us from propagating any breaking change to the PublicEvolving 
> and Internal APIs used by the connectors, as such changes break the Python 
> e2e tests. We ran into this while implementing FLINK-25857.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33528) Externalize Python connector code

2023-11-13 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17785456#comment-17785456
 ] 

Dian Fu commented on FLINK-33528:
-

I think it's fair enough to externalize the Python connectors. Previously we 
also found that the Java APIs of some connectors residing in external 
repositories were changed in an incompatible way, which broke the Python 
connector APIs that still live in the Flink repository.

> Externalize Python connector code
> -
>
> Key: FLINK-33528
> URL: https://issues.apache.org/jira/browse/FLINK-33528
> Project: Flink
>  Issue Type: Technical Debt
>  Components: API / Python, Connectors / Common
>Affects Versions: 1.18.0
>Reporter: Márton Balassi
>Priority: Major
> Fix For: 1.19.0
>
>
> During the connector externalization effort, end-to-end tests for the Python 
> connectors were left in the main repository under:
> [https://github.com/apache/flink/tree/master/flink-python/pyflink/datastream/connectors]
> These include both the Python connector implementations and their tests. Currently they 
> depend on a previously released version of the underlying connectors; 
> otherwise they would introduce a circular dependency, given that they are in 
> the flink repo at the moment.
> This setup prevents us from propagating any breaking change to the PublicEvolving 
> and Internal APIs used by the connectors, as such changes break the Python 
> e2e tests. We ran into this while implementing FLINK-25857.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-32962) Failure to install python dependencies from requirements file

2023-09-07 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762927#comment-17762927
 ] 

Dian Fu commented on FLINK-32962:
-

+1 to backport to Flink 1.17 and 1.18, especially Flink 1.18.

> Failure to install python dependencies from requirements file
> -
>
> Key: FLINK-32962
> URL: https://issues.apache.org/jira/browse/FLINK-32962
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.15.4, 1.16.2, 1.18.0, 1.17.1
>Reporter: Aleksandr Pilipenko
>Priority: Major
>  Labels: pull-request-available
>
> We have encountered an issue where Flink fails to install Python dependencies 
> from a requirements file if the Python environment contains a setuptools 
> version of 67.5.0 or above.
> The Flink job fails with the following error:
> {code:java}
> py4j.protocol.Py4JJavaError: An error occurred while calling o118.await.
> : java.util.concurrent.ExecutionException: 
> org.apache.flink.table.api.TableException: Failed to wait job finish
> at 
> java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
> at 
> java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
> at 
> org.apache.flink.table.api.internal.TableResultImpl.awaitInternal(TableResultImpl.java:118)
> at 
> org.apache.flink.table.api.internal.TableResultImpl.await(TableResultImpl.java:81)
> ...
> Caused by: java.util.concurrent.ExecutionException: 
> org.apache.flink.client.program.ProgramInvocationException: Job failed 
> (JobID: 2ca4026944022ac4537c503464d4c47f)
> at 
> java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
> at 
> java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
> at 
> org.apache.flink.table.api.internal.InsertResultProvider.hasNext(InsertResultProvider.java:83)
> ... 6 more
> Caused by: org.apache.flink.client.program.ProgramInvocationException: Job 
> failed (JobID: 2ca4026944022ac4537c503464d4c47f)
> at 
> org.apache.flink.client.deployment.ClusterClientJobClientAdapter.lambda$null$6(ClusterClientJobClientAdapter.java:130)
> at 
> java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:642)
> at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
> at 
> java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
> ...
> Caused by: java.io.IOException: java.io.IOException: Failed to execute the 
> command: /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install 
> --ignore-installed -r 
> /var/folders/rb/q_3h54_94b57gz_qkbp593vwgn/T/tm_localhost:63904-4178d3/blobStorage/job_2ca4026944022ac4537c503464d4c47f/blob_p-feb04bed919e628d98b1ef085111482f05bf43a1-1f5c3fda7cbdd6842e950e04d9d807c5
>  --install-option 
> --prefix=/var/folders/rb/q_3h54_94b57gz_qkbp593vwgn/T/python-dist-c17912db-5aba-439f-9e39-69cb8c957bec/python-requirements
> output:
> Usage:
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] 
>  [package-index-options] ...
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] -r 
>  [package-index-options] ...
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] 
> [-e]  ...
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] 
> [-e]  ...
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] 
>  ...
> no such option: --install-option
> at 
> org.apache.flink.python.util.PythonEnvironmentManagerUtils.pipInstallRequirements(PythonEnvironmentManagerUtils.java:144)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager.installRequirements(AbstractPythonEnvironmentManager.java:215)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager.lambda$open$0(AbstractPythonEnvironmentManager.java:126)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager$PythonEnvResources.createResource(AbstractPythonEnvironmentManager.java:435)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager$PythonEnvResources.getOrAllocateSharedResource(AbstractPythonEnvironmentManager.java:402)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager.open(AbstractPythonEnvironmentManager.java:114)
> at 
> org.apache.flink.streaming.api.runners.python.beam.BeamPythonFunctionRunner.open(BeamPythonFunctionRunner.java:238)
> at 
> org.apache.flink.streaming.api.operators.python.process.AbstractExternalPythonFunctionOperator.open(AbstractExternalPythonFunctionOperator.java:57)
> at 
> 
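A hedged sketch of the pip invocation change behind this error (the paths are hypothetical and this is not Flink's code): newer pip releases removed `--install-option`, while the target prefix can be passed to pip directly.
```
import subprocess
import sys

requirements = "/tmp/requirements.txt"       # hypothetical path
target_prefix = "/tmp/python-requirements"   # hypothetical path

# Rejected by newer pip versions ("no such option: --install-option"):
# [sys.executable, "-m", "pip", "install", "-r", requirements,
#  "--install-option", f"--prefix={target_prefix}"]

# Works across pip versions: pass --prefix as a regular pip option.
subprocess.run(
    [sys.executable, "-m", "pip", "install", "--ignore-installed",
     "-r", requirements, f"--prefix={target_prefix}"],
    check=True,
)
```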

[jira] [Assigned] (FLINK-32962) Failure to install python dependencies from requirements file

2023-09-07 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu reassigned FLINK-32962:
---

Assignee: Aleksandr Pilipenko

> Failure to install python dependencies from requirements file
> -
>
> Key: FLINK-32962
> URL: https://issues.apache.org/jira/browse/FLINK-32962
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.15.4, 1.16.2, 1.18.0, 1.17.1
>Reporter: Aleksandr Pilipenko
>Assignee: Aleksandr Pilipenko
>Priority: Major
>  Labels: pull-request-available
>
> We have encountered an issue where Flink fails to install python dependencies 
> from a requirements file if the python environment contains a setuptools 
> dependency of version 67.5.0 or above.
> The Flink job fails with the following error:
> {code:java}
> py4j.protocol.Py4JJavaError: An error occurred while calling o118.await.
> : java.util.concurrent.ExecutionException: 
> org.apache.flink.table.api.TableException: Failed to wait job finish
> at 
> java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
> at 
> java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
> at 
> org.apache.flink.table.api.internal.TableResultImpl.awaitInternal(TableResultImpl.java:118)
> at 
> org.apache.flink.table.api.internal.TableResultImpl.await(TableResultImpl.java:81)
> ...
> Caused by: java.util.concurrent.ExecutionException: 
> org.apache.flink.client.program.ProgramInvocationException: Job failed 
> (JobID: 2ca4026944022ac4537c503464d4c47f)
> at 
> java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
> at 
> java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
> at 
> org.apache.flink.table.api.internal.InsertResultProvider.hasNext(InsertResultProvider.java:83)
> ... 6 more
> Caused by: org.apache.flink.client.program.ProgramInvocationException: Job 
> failed (JobID: 2ca4026944022ac4537c503464d4c47f)
> at 
> org.apache.flink.client.deployment.ClusterClientJobClientAdapter.lambda$null$6(ClusterClientJobClientAdapter.java:130)
> at 
> java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:642)
> at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
> at 
> java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
> ...
> Caused by: java.io.IOException: java.io.IOException: Failed to execute the 
> command: /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install 
> --ignore-installed -r 
> /var/folders/rb/q_3h54_94b57gz_qkbp593vwgn/T/tm_localhost:63904-4178d3/blobStorage/job_2ca4026944022ac4537c503464d4c47f/blob_p-feb04bed919e628d98b1ef085111482f05bf43a1-1f5c3fda7cbdd6842e950e04d9d807c5
>  --install-option 
> --prefix=/var/folders/rb/q_3h54_94b57gz_qkbp593vwgn/T/python-dist-c17912db-5aba-439f-9e39-69cb8c957bec/python-requirements
> output:
> Usage:
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] <requirement specifier> [package-index-options] ...
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] -r <requirements file> [package-index-options] ...
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] [-e] <vcs project url> ...
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] [-e] <local project path> ...
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] <archive url/path> ...
> no such option: --install-option
> at 
> org.apache.flink.python.util.PythonEnvironmentManagerUtils.pipInstallRequirements(PythonEnvironmentManagerUtils.java:144)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager.installRequirements(AbstractPythonEnvironmentManager.java:215)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager.lambda$open$0(AbstractPythonEnvironmentManager.java:126)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager$PythonEnvResources.createResource(AbstractPythonEnvironmentManager.java:435)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager$PythonEnvResources.getOrAllocateSharedResource(AbstractPythonEnvironmentManager.java:402)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager.open(AbstractPythonEnvironmentManager.java:114)
> at 
> org.apache.flink.streaming.api.runners.python.beam.BeamPythonFunctionRunner.open(BeamPythonFunctionRunner.java:238)
> at 
> org.apache.flink.streaming.api.operators.python.process.AbstractExternalPythonFunctionOperator.open(AbstractExternalPythonFunctionOperator.java:57)
> at 
> 
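The failure above comes from pip rejecting the removed `--install-option` flag. For illustration only, a minimal sketch of installing a requirements file into a dedicated prefix without that flag, using pip's own `--prefix` option (the helper name is hypothetical and this is not the actual Flink change):

{code:python}
import subprocess
import sys


def pip_install_requirements(requirements_path: str, target_prefix: str) -> None:
    # Install the requirements into a dedicated prefix; passing --prefix directly
    # avoids the "--install-option --prefix=..." combination that newer pip rejects.
    cmd = [
        sys.executable, "-m", "pip", "install",
        "--ignore-installed",
        "-r", requirements_path,
        "--prefix", target_prefix,
    ]
    subprocess.run(cmd, check=True)
{code}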

[jira] [Commented] (FLINK-32758) PyFlink bounds are overly restrictive and outdated

2023-08-29 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17760240#comment-17760240
 ] 

Dian Fu commented on FLINK-32758:
-

Merged to:
- release-1.18 via 1f7796ee50cfbea4fb633692e6be01070ed45c6f and 
8551a39ee46054d3ec05f3d31758f7ad39b69a39
- release-1.17 via 30eeb91c3d2048b88e0a9903d9c973085df2c2ea and 
870ac98dcdb92774fed783254a3bf4d8ddc317aa

> PyFlink bounds are overly restrictive and outdated
> --
>
> Key: FLINK-32758
> URL: https://issues.apache.org/jira/browse/FLINK-32758
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python
>Affects Versions: 1.17.1, 1.19.0
>Reporter: Deepyaman Datta
>Assignee: Deepyaman Datta
>Priority: Blocker
>  Labels: pull-request-available, test-stability
> Attachments: image-2023-08-29-10-19-37-977.png
>
>
> Hi! I am part of a team building the Flink backend for Ibis 
> ([https://github.com/ibis-project/ibis]). We would like to leverage PyFlink 
> under the hood for execution; however, PyFlink's requirements are 
> incompatible with several other Ibis requirements. Beyond Ibis, PyFlink's 
> outdated and restrictive requirements prevent it from being used alongside 
> most recent releases of Python data libraries.
> Some of the major libraries we (and likely others in the Python community 
> interested in using PyFlink alongside other libraries) need compatibility 
> with:
>  * PyArrow (at least >=10.0.0, but there's no reason not to also be 
> compatible with latest)
>  * pandas (should be compatible with 2.x series, but also probably with 
> 1.4.x, released January 2022, and 1.5.x)
>  * numpy (1.22 was released in December 2022)
>  * Newer releases of Apache Beam
>  * Newer releases of cython
> Furthermore, uncapped dependencies could be more generally preferable, as 
> they avoid the need for frequent PyFlink releases as newer versions of 
> libraries are released. A common (and great) argument for not upper-bounding 
> dependencies, especially for libraries: 
> [https://iscinumpy.dev/post/bound-version-constraints/]
> I am currently testing removing upper bounds in 
> [https://github.com/apache/flink/pull/23141]; so far, builds pass without 
> issue in 
> [b65c072|https://github.com/apache/flink/pull/23141/commits/b65c0723ed66e01e83d718f770aa916f41f34581],
>  and I'm currently waiting on 
> [c8eb15c|https://github.com/apache/flink/pull/23141/commits/c8eb15cbc371dc259fb4fda5395f0f55e08ea9c6]
>  to see if I can get PyArrow to resolve >=10.0.0. Solving the proposed 
> dependencies results in:
> {{#}}
> {{# This file is autogenerated by pip-compile with Python 3.8}}
> {{# by the following command:}}
> {{#}}
> {{#    pip-compile --config=pyproject.toml 
> --output-file=dev/compiled-requirements.txt dev/dev-requirements.txt}}
> {{#}}
> {{apache-beam==2.49.0}}
> {{    # via -r dev/dev-requirements.txt}}
> {{avro-python3==1.10.2}}
> {{    # via -r dev/dev-requirements.txt}}
> {{certifi==2023.7.22}}
> {{    # via requests}}
> {{charset-normalizer==3.2.0}}
> {{    # via requests}}
> {{cloudpickle==2.2.1}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{crcmod==1.7}}
> {{    # via apache-beam}}
> {{cython==3.0.0}}
> {{    # via -r dev/dev-requirements.txt}}
> {{dill==0.3.1.1}}
> {{    # via apache-beam}}
> {{dnspython==2.4.1}}
> {{    # via pymongo}}
> {{docopt==0.6.2}}
> {{    # via hdfs}}
> {{exceptiongroup==1.1.2}}
> {{    # via pytest}}
> {{fastavro==1.8.2}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{fasteners==0.18}}
> {{    # via apache-beam}}
> {{find-libpython==0.3.0}}
> {{    # via pemja}}
> {{grpcio==1.56.2}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{    #   grpcio-tools}}
> {{grpcio-tools==1.56.2}}
> {{    # via -r dev/dev-requirements.txt}}
> {{hdfs==2.7.0}}
> {{    # via apache-beam}}
> {{httplib2==0.22.0}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{idna==3.4}}
> {{    # via requests}}
> {{iniconfig==2.0.0}}
> {{    # via pytest}}
> {{numpy==1.24.4}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{    #   pandas}}
> {{    #   pyarrow}}
> {{objsize==0.6.1}}
> {{    # via apache-beam}}
> {{orjson==3.9.2}}
> {{    # via apache-beam}}
> {{packaging==23.1}}
> {{    # via pytest}}
> {{pandas==2.0.3}}
> {{    # via -r dev/dev-requirements.txt}}
> {{pemja==0.3.0 ; platform_system != "Windows"}}
> {{    # via -r dev/dev-requirements.txt}}
> {{pluggy==1.2.0}}
> {{    # via pytest}}
> {{proto-plus==1.22.3}}
> {{    # via apache-beam}}
> {{protobuf==4.23.4}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{    #   grpcio-tools}}
> {{    #   

[jira] [Closed] (FLINK-32758) PyFlink bounds are overly restrictive and outdated

2023-08-29 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu closed FLINK-32758.
---
Fix Version/s: 1.18.0
   1.17.2
   Resolution: Fixed

> PyFlink bounds are overly restrictive and outdated
> --
>
> Key: FLINK-32758
> URL: https://issues.apache.org/jira/browse/FLINK-32758
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python
>Affects Versions: 1.17.1, 1.19.0
>Reporter: Deepyaman Datta
>Assignee: Deepyaman Datta
>Priority: Blocker
>  Labels: pull-request-available, test-stability
> Fix For: 1.18.0, 1.17.2
>
> Attachments: image-2023-08-29-10-19-37-977.png
>
>
> Hi! I am part of a team building the Flink backend for Ibis 
> ([https://github.com/ibis-project/ibis]). We would like to leverage PyFlink 
> under the hood for execution; however, PyFlink's requirements are 
> incompatible with several other Ibis requirements. Beyond Ibis, PyFlink's 
> outdated and restrictive requirements prevent it from being used alongside 
> most recent releases of Python data libraries.
> Some of the major libraries we (and likely others in the Python community 
> interested in using PyFlink alongside other libraries) need compatibility 
> with:
>  * PyArrow (at least >=10.0.0, but there's no reason not to also be 
> compatible with latest)
>  * pandas (should be compatible with 2.x series, but also probably with 
> 1.4.x, released January 2022, and 1.5.x)
>  * numpy (1.22 was released in December 2022)
>  * Newer releases of Apache Beam
>  * Newer releases of cython
> Furthermore, uncapped dependencies could be more generally preferable, as 
> they avoid the need for frequent PyFlink releases as newer versions of 
> libraries are released. A common (and great) argument for not upper-bounding 
> dependencies, especially for libraries: 
> [https://iscinumpy.dev/post/bound-version-constraints/]
> I am currently testing removing upper bounds in 
> [https://github.com/apache/flink/pull/23141]; so far, builds pass without 
> issue in 
> [b65c072|https://github.com/apache/flink/pull/23141/commits/b65c0723ed66e01e83d718f770aa916f41f34581],
>  and I'm currently waiting on 
> [c8eb15c|https://github.com/apache/flink/pull/23141/commits/c8eb15cbc371dc259fb4fda5395f0f55e08ea9c6]
>  to see if I can get PyArrow to resolve >=10.0.0. Solving the proposed 
> dependencies results in:
> {{#}}
> {{# This file is autogenerated by pip-compile with Python 3.8}}
> {{# by the following command:}}
> {{#}}
> {{#    pip-compile --config=pyproject.toml 
> --output-file=dev/compiled-requirements.txt dev/dev-requirements.txt}}
> {{#}}
> {{apache-beam==2.49.0}}
> {{    # via -r dev/dev-requirements.txt}}
> {{avro-python3==1.10.2}}
> {{    # via -r dev/dev-requirements.txt}}
> {{certifi==2023.7.22}}
> {{    # via requests}}
> {{charset-normalizer==3.2.0}}
> {{    # via requests}}
> {{cloudpickle==2.2.1}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{crcmod==1.7}}
> {{    # via apache-beam}}
> {{cython==3.0.0}}
> {{    # via -r dev/dev-requirements.txt}}
> {{dill==0.3.1.1}}
> {{    # via apache-beam}}
> {{dnspython==2.4.1}}
> {{    # via pymongo}}
> {{docopt==0.6.2}}
> {{    # via hdfs}}
> {{exceptiongroup==1.1.2}}
> {{    # via pytest}}
> {{fastavro==1.8.2}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{fasteners==0.18}}
> {{    # via apache-beam}}
> {{find-libpython==0.3.0}}
> {{    # via pemja}}
> {{grpcio==1.56.2}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{    #   grpcio-tools}}
> {{grpcio-tools==1.56.2}}
> {{    # via -r dev/dev-requirements.txt}}
> {{hdfs==2.7.0}}
> {{    # via apache-beam}}
> {{httplib2==0.22.0}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{idna==3.4}}
> {{    # via requests}}
> {{iniconfig==2.0.0}}
> {{    # via pytest}}
> {{numpy==1.24.4}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{    #   pandas}}
> {{    #   pyarrow}}
> {{objsize==0.6.1}}
> {{    # via apache-beam}}
> {{orjson==3.9.2}}
> {{    # via apache-beam}}
> {{packaging==23.1}}
> {{    # via pytest}}
> {{pandas==2.0.3}}
> {{    # via -r dev/dev-requirements.txt}}
> {{pemja==0.3.0 ; platform_system != "Windows"}}
> {{    # via -r dev/dev-requirements.txt}}
> {{pluggy==1.2.0}}
> {{    # via pytest}}
> {{proto-plus==1.22.3}}
> {{    # via apache-beam}}
> {{protobuf==4.23.4}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{    #   grpcio-tools}}
> {{    #   proto-plus}}
> {{py4j==0.10.9.7}}
> {{    # via -r dev/dev-requirements.txt}}
> {{pyarrow==11.0.0}}
> {{    # via}}
> {{    #   -r 

[jira] [Commented] (FLINK-32758) PyFlink bounds are overly restrictive and outdated

2023-08-29 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17760238#comment-17760238
 ] 

Dian Fu commented on FLINK-32758:
-

Fixed in master via 5b5a0af15d57ed4424cf8dd744808433e397ebc4

> PyFlink bounds are overly restrictive and outdated
> --
>
> Key: FLINK-32758
> URL: https://issues.apache.org/jira/browse/FLINK-32758
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python
>Affects Versions: 1.17.1, 1.19.0
>Reporter: Deepyaman Datta
>Assignee: Deepyaman Datta
>Priority: Blocker
>  Labels: pull-request-available, test-stability
> Attachments: image-2023-08-29-10-19-37-977.png
>
>
> Hi! I am part of a team building the Flink backend for Ibis 
> ([https://github.com/ibis-project/ibis]). We would like to leverage PyFlink 
> under the hood for execution; however, PyFlink's requirements are 
> incompatible with several other Ibis requirements. Beyond Ibis, PyFlink's 
> outdated and restrictive requirements prevent it from being used alongside 
> most recent releases of Python data libraries.
> Some of the major libraries we (and likely others in the Python community 
> interested in using PyFlink alongside other libraries) need compatibility 
> with:
>  * PyArrow (at least >=10.0.0, but there's no reason not to also be 
> compatible with latest)
>  * pandas (should be compatible with 2.x series, but also probably with 
> 1.4.x, released January 2022, and 1.5.x)
>  * numpy (1.22 was released in December 2022)
>  * Newer releases of Apache Beam
>  * Newer releases of cython
> Furthermore, uncapped dependencies could be more generally preferable, as 
> they avoid the need for frequent PyFlink releases as newer versions of 
> libraries are released. A common (and great) argument for not upper-bounding 
> dependencies, especially for libraries: 
> [https://iscinumpy.dev/post/bound-version-constraints/]
> I am currently testing removing upper bounds in 
> [https://github.com/apache/flink/pull/23141]; so far, builds pass without 
> issue in 
> [b65c072|https://github.com/apache/flink/pull/23141/commits/b65c0723ed66e01e83d718f770aa916f41f34581],
>  and I'm currently waiting on 
> [c8eb15c|https://github.com/apache/flink/pull/23141/commits/c8eb15cbc371dc259fb4fda5395f0f55e08ea9c6]
>  to see if I can get PyArrow to resolve >=10.0.0. Solving the proposed 
> dependencies results in:
> {{#}}
> {{# This file is autogenerated by pip-compile with Python 3.8}}
> {{# by the following command:}}
> {{#}}
> {{#    pip-compile --config=pyproject.toml 
> --output-file=dev/compiled-requirements.txt dev/dev-requirements.txt}}
> {{#}}
> {{apache-beam==2.49.0}}
> {{    # via -r dev/dev-requirements.txt}}
> {{avro-python3==1.10.2}}
> {{    # via -r dev/dev-requirements.txt}}
> {{certifi==2023.7.22}}
> {{    # via requests}}
> {{charset-normalizer==3.2.0}}
> {{    # via requests}}
> {{cloudpickle==2.2.1}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{crcmod==1.7}}
> {{    # via apache-beam}}
> {{cython==3.0.0}}
> {{    # via -r dev/dev-requirements.txt}}
> {{dill==0.3.1.1}}
> {{    # via apache-beam}}
> {{dnspython==2.4.1}}
> {{    # via pymongo}}
> {{docopt==0.6.2}}
> {{    # via hdfs}}
> {{exceptiongroup==1.1.2}}
> {{    # via pytest}}
> {{fastavro==1.8.2}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{fasteners==0.18}}
> {{    # via apache-beam}}
> {{find-libpython==0.3.0}}
> {{    # via pemja}}
> {{grpcio==1.56.2}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{    #   grpcio-tools}}
> {{grpcio-tools==1.56.2}}
> {{    # via -r dev/dev-requirements.txt}}
> {{hdfs==2.7.0}}
> {{    # via apache-beam}}
> {{httplib2==0.22.0}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{idna==3.4}}
> {{    # via requests}}
> {{iniconfig==2.0.0}}
> {{    # via pytest}}
> {{numpy==1.24.4}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{    #   pandas}}
> {{    #   pyarrow}}
> {{objsize==0.6.1}}
> {{    # via apache-beam}}
> {{orjson==3.9.2}}
> {{    # via apache-beam}}
> {{packaging==23.1}}
> {{    # via pytest}}
> {{pandas==2.0.3}}
> {{    # via -r dev/dev-requirements.txt}}
> {{pemja==0.3.0 ; platform_system != "Windows"}}
> {{    # via -r dev/dev-requirements.txt}}
> {{pluggy==1.2.0}}
> {{    # via pytest}}
> {{proto-plus==1.22.3}}
> {{    # via apache-beam}}
> {{protobuf==4.23.4}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{    #   grpcio-tools}}
> {{    #   proto-plus}}
> {{py4j==0.10.9.7}}
> {{    # via -r dev/dev-requirements.txt}}
> {{pyarrow==11.0.0}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{  

[jira] [Commented] (FLINK-32758) PyFlink bounds are overly restrictive and outdated

2023-08-29 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17759969#comment-17759969
 ] 

Dian Fu commented on FLINK-32758:
-

Have verified that upgrading *cibuildwheel* doesn't work, while excluding 
fastavro 1.8.0 works: fastavro>=1.1.0,!=1.8.0.

[~deepyaman] What's your thought?
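For illustration, the proposed constraint can be sanity-checked with the `packaging` library; a minimal sketch using the versions discussed above:

{code:python}
from packaging.specifiers import SpecifierSet
from packaging.version import Version

spec = SpecifierSet(">=1.1.0,!=1.8.0")

# 1.8.0 is excluded, while the surrounding releases still satisfy the constraint.
print(Version("1.8.0") in spec)  # False
print(Version("1.7.4") in spec)  # True
print(Version("1.8.2") in spec)  # True
{code}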

> PyFlink bounds are overly restrictive and outdated
> --
>
> Key: FLINK-32758
> URL: https://issues.apache.org/jira/browse/FLINK-32758
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python
>Affects Versions: 1.17.1, 1.19.0
>Reporter: Deepyaman Datta
>Assignee: Deepyaman Datta
>Priority: Blocker
>  Labels: pull-request-available, test-stability
> Attachments: image-2023-08-29-10-19-37-977.png
>
>
> Hi! I am part of a team building the Flink backend for Ibis 
> ([https://github.com/ibis-project/ibis]). We would like to leverage PyFlink 
> under the hood for execution; however, PyFlink's requirements are 
> incompatible with several other Ibis requirements. Beyond Ibis, PyFlink's 
> outdated and restrictive requirements prevent it from being used alongside 
> most recent releases of Python data libraries.
> Some of the major libraries we (and likely others in the Python community 
> interested in using PyFlink alongside other libraries) need compatibility 
> with:
>  * PyArrow (at least >=10.0.0, but there's no reason not to also be 
> compatible with latest)
>  * pandas (should be compatible with 2.x series, but also probably with 
> 1.4.x, released January 2022, and 1.5.x)
>  * numpy (1.22 was released in December 2022)
>  * Newer releases of Apache Beam
>  * Newer releases of cython
> Furthermore, uncapped dependencies could be more generally preferable, as 
> they avoid the need for frequent PyFlink releases as newer versions of 
> libraries are released. A common (and great) argument for not upper-bounding 
> dependencies, especially for libraries: 
> [https://iscinumpy.dev/post/bound-version-constraints/]
> I am currently testing removing upper bounds in 
> [https://github.com/apache/flink/pull/23141]; so far, builds pass without 
> issue in 
> [b65c072|https://github.com/apache/flink/pull/23141/commits/b65c0723ed66e01e83d718f770aa916f41f34581],
>  and I'm currently waiting on 
> [c8eb15c|https://github.com/apache/flink/pull/23141/commits/c8eb15cbc371dc259fb4fda5395f0f55e08ea9c6]
>  to see if I can get PyArrow to resolve >=10.0.0. Solving the proposed 
> dependencies results in:
> {{#}}
> {{# This file is autogenerated by pip-compile with Python 3.8}}
> {{# by the following command:}}
> {{#}}
> {{#    pip-compile --config=pyproject.toml 
> --output-file=dev/compiled-requirements.txt dev/dev-requirements.txt}}
> {{#}}
> {{apache-beam==2.49.0}}
> {{    # via -r dev/dev-requirements.txt}}
> {{avro-python3==1.10.2}}
> {{    # via -r dev/dev-requirements.txt}}
> {{certifi==2023.7.22}}
> {{    # via requests}}
> {{charset-normalizer==3.2.0}}
> {{    # via requests}}
> {{cloudpickle==2.2.1}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{crcmod==1.7}}
> {{    # via apache-beam}}
> {{cython==3.0.0}}
> {{    # via -r dev/dev-requirements.txt}}
> {{dill==0.3.1.1}}
> {{    # via apache-beam}}
> {{dnspython==2.4.1}}
> {{    # via pymongo}}
> {{docopt==0.6.2}}
> {{    # via hdfs}}
> {{exceptiongroup==1.1.2}}
> {{    # via pytest}}
> {{fastavro==1.8.2}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{fasteners==0.18}}
> {{    # via apache-beam}}
> {{find-libpython==0.3.0}}
> {{    # via pemja}}
> {{grpcio==1.56.2}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{    #   grpcio-tools}}
> {{grpcio-tools==1.56.2}}
> {{    # via -r dev/dev-requirements.txt}}
> {{hdfs==2.7.0}}
> {{    # via apache-beam}}
> {{httplib2==0.22.0}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{idna==3.4}}
> {{    # via requests}}
> {{iniconfig==2.0.0}}
> {{    # via pytest}}
> {{numpy==1.24.4}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{    #   pandas}}
> {{    #   pyarrow}}
> {{objsize==0.6.1}}
> {{    # via apache-beam}}
> {{orjson==3.9.2}}
> {{    # via apache-beam}}
> {{packaging==23.1}}
> {{    # via pytest}}
> {{pandas==2.0.3}}
> {{    # via -r dev/dev-requirements.txt}}
> {{pemja==0.3.0 ; platform_system != "Windows"}}
> {{    # via -r dev/dev-requirements.txt}}
> {{pluggy==1.2.0}}
> {{    # via pytest}}
> {{proto-plus==1.22.3}}
> {{    # via apache-beam}}
> {{protobuf==4.23.4}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{    #   grpcio-tools}}
> {{    #   proto-plus}}
> {{py4j==0.10.9.7}}
> {{    # via -r 

[jira] [Commented] (FLINK-32981) Add python dynamic Flink home detection

2023-08-29 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17759855#comment-17759855
 ] 

Dian Fu commented on FLINK-32981:
-

[~gaborgsomogyi] [~mbalassi] It seems that this PR breaks the PyFlink wheel 
package build (https://issues.apache.org/jira/browse/FLINK-32989). Could you 
help to take a look?

> Add python dynamic Flink home detection
> ---
>
> Key: FLINK-32981
> URL: https://issues.apache.org/jira/browse/FLINK-32981
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python
>Affects Versions: 1.19.0
>Reporter: Gabor Somogyi
>Assignee: Gabor Somogyi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> During `pyflink` library compilation the Flink home is calculated from the 
> provided `pyflink` version, which is normally something like `1.19.dev0`. 
> In such a case `.dev0` is replaced with `-SNAPSHOT`, which ends up as a 
> hardcoded home directory: 
> `../../flink-dist/target/flink-1.18-SNAPSHOT-bin/flink-1.18-SNAPSHOT`. This 
> is fine as long as one uses the basic version types described 
> [here](https://peps.python.org/pep-0440/#developmental-releases). In order to 
> support any kind of `pyflink` version, one can dynamically determine the Flink 
> home directory through globbing.
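For illustration, a minimal sketch of such glob-based detection (the helper name and path pattern are assumptions mirroring the layout above, not the exact implementation; the error text matches the message later reported in FLINK-32989):

{code:python}
import glob
import os


def find_flink_home(flink_dist_target: str) -> str:
    # Look for exactly one extracted distribution such as
    # flink-dist/target/flink-<version>-bin/flink-<version>.
    candidates = [
        path
        for path in glob.glob(os.path.join(flink_dist_target, "flink-*-bin", "flink-*"))
        if os.path.isdir(path)
    ]
    if len(candidates) != 1:
        raise RuntimeError(
            "Exactly one Flink home directory must exist, but found: %s" % candidates
        )
    return candidates[0]
{code}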



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-32989) PyFlink wheel package build failed

2023-08-29 Thread Dian Fu (Jira)
Dian Fu created FLINK-32989:
---

 Summary: PyFlink wheel package build failed
 Key: FLINK-32989
 URL: https://issues.apache.org/jira/browse/FLINK-32989
 Project: Flink
  Issue Type: Bug
  Components: API / Python
Affects Versions: 1.19.0
Reporter: Dian Fu


{code}
Compiling pyflink/fn_execution/coder_impl_fast.pyx because it changed. 
Compiling pyflink/fn_execution/table/aggregate_fast.pyx because it changed. 
Compiling pyflink/fn_execution/table/window_aggregate_fast.pyx because it 
changed. 
Compiling pyflink/fn_execution/stream_fast.pyx because it changed. 
Compiling pyflink/fn_execution/beam/beam_stream_fast.pyx because it changed. 
Compiling pyflink/fn_execution/beam/beam_coder_impl_fast.pyx because it 
changed. 
Compiling pyflink/fn_execution/beam/beam_operations_fast.pyx because it 
changed. 
[1/7] Cythonizing pyflink/fn_execution/beam/beam_coder_impl_fast.pyx 
[2/7] Cythonizing pyflink/fn_execution/beam/beam_operations_fast.pyx 
[3/7] Cythonizing pyflink/fn_execution/beam/beam_stream_fast.pyx 
[4/7] Cythonizing pyflink/fn_execution/coder_impl_fast.pyx 
[5/7] Cythonizing pyflink/fn_execution/stream_fast.pyx 
[6/7] Cythonizing pyflink/fn_execution/table/aggregate_fast.pyx 
[7/7] Cythonizing pyflink/fn_execution/table/window_aggregate_fast.pyx 
/home/vsts/work/1/s/flink-python/dev/.conda/envs/3.7/lib/python3.7/site-packages/Cython/Compiler/Main.py:369:
 FutureWarning: Cython directive 'language_level' not set, using 2 for now 
(Py2). This will change in a later release! File: 
/home/vsts/work/1/s/flink-python/pyflink/fn_execution/table/window_aggregate_fast.pxd
 
 tree = Parsing.p_module(s, pxd, full_module_name) 
Exactly one Flink home directory must exist, but found: []
{code}

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=52740=logs=d15e2b2e-10cd-5f59-7734-42d57dc5564d=4a86776f-e6e1-598a-f75a-c43d8b819662



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-32758) PyFlink bounds are overly restrictive and outdated

2023-08-28 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17759777#comment-17759777
 ] 

Dian Fu commented on FLINK-32758:
-

[~deepyaman] Actually the tests have passed on the CI:
!image-2023-08-29-10-19-37-977.png!

It's failing when building the wheel package for macOS.

It uses the third-party tool *cibuildwheel* to build wheel packages, see 
[https://github.com/apache/flink/blob/master/tools/azure-pipelines/build-python-wheels.yml#L41]
 for more details. Currently it's using version 2.8.0. I will check whether the latest 
version (2.15.0) works on my CI.

 

> PyFlink bounds are overly restrictive and outdated
> --
>
> Key: FLINK-32758
> URL: https://issues.apache.org/jira/browse/FLINK-32758
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python
>Affects Versions: 1.17.1, 1.19.0
>Reporter: Deepyaman Datta
>Assignee: Deepyaman Datta
>Priority: Blocker
>  Labels: pull-request-available, test-stability
> Attachments: image-2023-08-29-10-19-37-977.png
>
>
> Hi! I am part of a team building the Flink backend for Ibis 
> ([https://github.com/ibis-project/ibis]). We would like to leverage PyFlink 
> under the hood for execution; however, PyFlink's requirements are 
> incompatible with several other Ibis requirements. Beyond Ibis, PyFlink's 
> outdated and restrictive requirements prevent it from being used alongside 
> most recent releases of Python data libraries.
> Some of the major libraries we (and likely others in the Python community 
> interested in using PyFlink alongside other libraries) need compatibility 
> with:
>  * PyArrow (at least >=10.0.0, but there's no reason not to also be 
> compatible with latest)
>  * pandas (should be compatible with 2.x series, but also probably with 
> 1.4.x, released January 2022, and 1.5.x)
>  * numpy (1.22 was released in December 2022)
>  * Newer releases of Apache Beam
>  * Newer releases of cython
> Furthermore, uncapped dependencies could be more generally preferable, as 
> they avoid the need for frequent PyFlink releases as newer versions of 
> libraries are released. A common (and great) argument for not upper-bounding 
> dependencies, especially for libraries: 
> [https://iscinumpy.dev/post/bound-version-constraints/]
> I am currently testing removing upper bounds in 
> [https://github.com/apache/flink/pull/23141]; so far, builds pass without 
> issue in 
> [b65c072|https://github.com/apache/flink/pull/23141/commits/b65c0723ed66e01e83d718f770aa916f41f34581],
>  and I'm currently waiting on 
> [c8eb15c|https://github.com/apache/flink/pull/23141/commits/c8eb15cbc371dc259fb4fda5395f0f55e08ea9c6]
>  to see if I can get PyArrow to resolve >=10.0.0. Solving the proposed 
> dependencies results in:
> {{#}}
> {{# This file is autogenerated by pip-compile with Python 3.8}}
> {{# by the following command:}}
> {{#}}
> {{#    pip-compile --config=pyproject.toml 
> --output-file=dev/compiled-requirements.txt dev/dev-requirements.txt}}
> {{#}}
> {{apache-beam==2.49.0}}
> {{    # via -r dev/dev-requirements.txt}}
> {{avro-python3==1.10.2}}
> {{    # via -r dev/dev-requirements.txt}}
> {{certifi==2023.7.22}}
> {{    # via requests}}
> {{charset-normalizer==3.2.0}}
> {{    # via requests}}
> {{cloudpickle==2.2.1}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{crcmod==1.7}}
> {{    # via apache-beam}}
> {{cython==3.0.0}}
> {{    # via -r dev/dev-requirements.txt}}
> {{dill==0.3.1.1}}
> {{    # via apache-beam}}
> {{dnspython==2.4.1}}
> {{    # via pymongo}}
> {{docopt==0.6.2}}
> {{    # via hdfs}}
> {{exceptiongroup==1.1.2}}
> {{    # via pytest}}
> {{fastavro==1.8.2}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{fasteners==0.18}}
> {{    # via apache-beam}}
> {{find-libpython==0.3.0}}
> {{    # via pemja}}
> {{grpcio==1.56.2}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{    #   grpcio-tools}}
> {{grpcio-tools==1.56.2}}
> {{    # via -r dev/dev-requirements.txt}}
> {{hdfs==2.7.0}}
> {{    # via apache-beam}}
> {{httplib2==0.22.0}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{idna==3.4}}
> {{    # via requests}}
> {{iniconfig==2.0.0}}
> {{    # via pytest}}
> {{numpy==1.24.4}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{    #   pandas}}
> {{    #   pyarrow}}
> {{objsize==0.6.1}}
> {{    # via apache-beam}}
> {{orjson==3.9.2}}
> {{    # via apache-beam}}
> {{packaging==23.1}}
> {{    # via pytest}}
> {{pandas==2.0.3}}
> {{    # via -r dev/dev-requirements.txt}}
> {{pemja==0.3.0 ; platform_system != "Windows"}}
> {{    # via -r dev/dev-requirements.txt}}
> {{pluggy==1.2.0}}
> {{    # 

[jira] [Updated] (FLINK-32758) PyFlink bounds are overly restrictive and outdated

2023-08-28 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu updated FLINK-32758:

Attachment: image-2023-08-29-10-19-37-977.png

> PyFlink bounds are overly restrictive and outdated
> --
>
> Key: FLINK-32758
> URL: https://issues.apache.org/jira/browse/FLINK-32758
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python
>Affects Versions: 1.17.1, 1.19.0
>Reporter: Deepyaman Datta
>Assignee: Deepyaman Datta
>Priority: Blocker
>  Labels: pull-request-available, test-stability
> Attachments: image-2023-08-29-10-19-37-977.png
>
>
> Hi! I am part of a team building the Flink backend for Ibis 
> ([https://github.com/ibis-project/ibis]). We would like to leverage PyFlink 
> under the hood for execution; however, PyFlink's requirements are 
> incompatible with several other Ibis requirements. Beyond Ibis, PyFlink's 
> outdated and restrictive requirements prevent it from being used alongside 
> most recent releases of Python data libraries.
> Some of the major libraries we (and likely others in the Python community 
> interested in using PyFlink alongside other libraries) need compatibility 
> with:
>  * PyArrow (at least >=10.0.0, but there's no reason not to also be 
> compatible with latest)
>  * pandas (should be compatible with 2.x series, but also probably with 
> 1.4.x, released January 2022, and 1.5.x)
>  * numpy (1.22 was released in December 2022)
>  * Newer releases of Apache Beam
>  * Newer releases of cython
> Furthermore, uncapped dependencies could be more generally preferable, as 
> they avoid the need for frequent PyFlink releases as newer versions of 
> libraries are released. A common (and great) argument for not upper-bounding 
> dependencies, especially for libraries: 
> [https://iscinumpy.dev/post/bound-version-constraints/]
> I am currently testing removing upper bounds in 
> [https://github.com/apache/flink/pull/23141]; so far, builds pass without 
> issue in 
> [b65c072|https://github.com/apache/flink/pull/23141/commits/b65c0723ed66e01e83d718f770aa916f41f34581],
>  and I'm currently waiting on 
> [c8eb15c|https://github.com/apache/flink/pull/23141/commits/c8eb15cbc371dc259fb4fda5395f0f55e08ea9c6]
>  to see if I can get PyArrow to resolve >=10.0.0. Solving the proposed 
> dependencies results in:
> {{#}}
> {{# This file is autogenerated by pip-compile with Python 3.8}}
> {{# by the following command:}}
> {{#}}
> {{#    pip-compile --config=pyproject.toml 
> --output-file=dev/compiled-requirements.txt dev/dev-requirements.txt}}
> {{#}}
> {{apache-beam==2.49.0}}
> {{    # via -r dev/dev-requirements.txt}}
> {{avro-python3==1.10.2}}
> {{    # via -r dev/dev-requirements.txt}}
> {{certifi==2023.7.22}}
> {{    # via requests}}
> {{charset-normalizer==3.2.0}}
> {{    # via requests}}
> {{cloudpickle==2.2.1}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{crcmod==1.7}}
> {{    # via apache-beam}}
> {{cython==3.0.0}}
> {{    # via -r dev/dev-requirements.txt}}
> {{dill==0.3.1.1}}
> {{    # via apache-beam}}
> {{dnspython==2.4.1}}
> {{    # via pymongo}}
> {{docopt==0.6.2}}
> {{    # via hdfs}}
> {{exceptiongroup==1.1.2}}
> {{    # via pytest}}
> {{fastavro==1.8.2}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{fasteners==0.18}}
> {{    # via apache-beam}}
> {{find-libpython==0.3.0}}
> {{    # via pemja}}
> {{grpcio==1.56.2}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{    #   grpcio-tools}}
> {{grpcio-tools==1.56.2}}
> {{    # via -r dev/dev-requirements.txt}}
> {{hdfs==2.7.0}}
> {{    # via apache-beam}}
> {{httplib2==0.22.0}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{idna==3.4}}
> {{    # via requests}}
> {{iniconfig==2.0.0}}
> {{    # via pytest}}
> {{numpy==1.24.4}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{    #   pandas}}
> {{    #   pyarrow}}
> {{objsize==0.6.1}}
> {{    # via apache-beam}}
> {{orjson==3.9.2}}
> {{    # via apache-beam}}
> {{packaging==23.1}}
> {{    # via pytest}}
> {{pandas==2.0.3}}
> {{    # via -r dev/dev-requirements.txt}}
> {{pemja==0.3.0 ; platform_system != "Windows"}}
> {{    # via -r dev/dev-requirements.txt}}
> {{pluggy==1.2.0}}
> {{    # via pytest}}
> {{proto-plus==1.22.3}}
> {{    # via apache-beam}}
> {{protobuf==4.23.4}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{    #   grpcio-tools}}
> {{    #   proto-plus}}
> {{py4j==0.10.9.7}}
> {{    # via -r dev/dev-requirements.txt}}
> {{pyarrow==11.0.0}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{pydot==1.4.2}}
> {{    # 

[jira] [Commented] (FLINK-32758) PyFlink bounds are overly restrictive and outdated

2023-08-28 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17759774#comment-17759774
 ] 

Dian Fu commented on FLINK-32758:
-

[~Sergey Nuyanzin] [~deepyaman] I have submitted a hotfix temporarily limiting 
fastavro < 1.8 to make the CI green (verified on my CI): 
[https://github.com/apache/flink/commit/345dece9a8fd58d6ea1c829052fb2f3c68516b48]

> PyFlink bounds are overly restrictive and outdated
> --
>
> Key: FLINK-32758
> URL: https://issues.apache.org/jira/browse/FLINK-32758
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python
>Affects Versions: 1.17.1, 1.19.0
>Reporter: Deepyaman Datta
>Assignee: Deepyaman Datta
>Priority: Blocker
>  Labels: pull-request-available, test-stability
>
> Hi! I am part of a team building the Flink backend for Ibis 
> ([https://github.com/ibis-project/ibis]). We would like to leverage PyFlink 
> under the hood for execution; however, PyFlink's requirements are 
> incompatible with several other Ibis requirements. Beyond Ibis, PyFlink's 
> outdated and restrictive requirements prevent it from being used alongside 
> most recent releases of Python data libraries.
> Some of the major libraries we (and likely others in the Python community 
> interested in using PyFlink alongside other libraries) need compatibility 
> with:
>  * PyArrow (at least >=10.0.0, but there's no reason not to also be 
> compatible with latest)
>  * pandas (should be compatible with 2.x series, but also probably with 
> 1.4.x, released January 2022, and 1.5.x)
>  * numpy (1.22 was released in December 2022)
>  * Newer releases of Apache Beam
>  * Newer releases of cython
> Furthermore, uncapped dependencies could be more generally preferable, as 
> they avoid the need for frequent PyFlink releases as newer versions of 
> libraries are released. A common (and great) argument for not upper-bounding 
> dependencies, especially for libraries: 
> [https://iscinumpy.dev/post/bound-version-constraints/]
> I am currently testing removing upper bounds in 
> [https://github.com/apache/flink/pull/23141]; so far, builds pass without 
> issue in 
> [b65c072|https://github.com/apache/flink/pull/23141/commits/b65c0723ed66e01e83d718f770aa916f41f34581],
>  and I'm currently waiting on 
> [c8eb15c|https://github.com/apache/flink/pull/23141/commits/c8eb15cbc371dc259fb4fda5395f0f55e08ea9c6]
>  to see if I can get PyArrow to resolve >=10.0.0. Solving the proposed 
> dependencies results in:
> {{#}}
> {{# This file is autogenerated by pip-compile with Python 3.8}}
> {{# by the following command:}}
> {{#}}
> {{#    pip-compile --config=pyproject.toml 
> --output-file=dev/compiled-requirements.txt dev/dev-requirements.txt}}
> {{#}}
> {{apache-beam==2.49.0}}
> {{    # via -r dev/dev-requirements.txt}}
> {{avro-python3==1.10.2}}
> {{    # via -r dev/dev-requirements.txt}}
> {{certifi==2023.7.22}}
> {{    # via requests}}
> {{charset-normalizer==3.2.0}}
> {{    # via requests}}
> {{cloudpickle==2.2.1}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{crcmod==1.7}}
> {{    # via apache-beam}}
> {{cython==3.0.0}}
> {{    # via -r dev/dev-requirements.txt}}
> {{dill==0.3.1.1}}
> {{    # via apache-beam}}
> {{dnspython==2.4.1}}
> {{    # via pymongo}}
> {{docopt==0.6.2}}
> {{    # via hdfs}}
> {{exceptiongroup==1.1.2}}
> {{    # via pytest}}
> {{fastavro==1.8.2}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{fasteners==0.18}}
> {{    # via apache-beam}}
> {{find-libpython==0.3.0}}
> {{    # via pemja}}
> {{grpcio==1.56.2}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{    #   grpcio-tools}}
> {{grpcio-tools==1.56.2}}
> {{    # via -r dev/dev-requirements.txt}}
> {{hdfs==2.7.0}}
> {{    # via apache-beam}}
> {{httplib2==0.22.0}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{idna==3.4}}
> {{    # via requests}}
> {{iniconfig==2.0.0}}
> {{    # via pytest}}
> {{numpy==1.24.4}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{    #   pandas}}
> {{    #   pyarrow}}
> {{objsize==0.6.1}}
> {{    # via apache-beam}}
> {{orjson==3.9.2}}
> {{    # via apache-beam}}
> {{packaging==23.1}}
> {{    # via pytest}}
> {{pandas==2.0.3}}
> {{    # via -r dev/dev-requirements.txt}}
> {{pemja==0.3.0 ; platform_system != "Windows"}}
> {{    # via -r dev/dev-requirements.txt}}
> {{pluggy==1.2.0}}
> {{    # via pytest}}
> {{proto-plus==1.22.3}}
> {{    # via apache-beam}}
> {{protobuf==4.23.4}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{    #   grpcio-tools}}
> {{    #   proto-plus}}
> {{py4j==0.10.9.7}}
> {{    # via -r 

[jira] [Commented] (FLINK-32758) PyFlink bounds are overly restrictive and outdated

2023-08-24 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17758790#comment-17758790
 ] 

Dian Fu commented on FLINK-32758:
-

Will backport to the release-1.17 and release-1.18 branches after one official 
nightly test of the master branch.

> PyFlink bounds are overly restrictive and outdated
> --
>
> Key: FLINK-32758
> URL: https://issues.apache.org/jira/browse/FLINK-32758
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python
>Affects Versions: 1.17.1
>Reporter: Deepyaman Datta
>Assignee: Deepyaman Datta
>Priority: Major
>  Labels: pull-request-available
>
> Hi! I am part of a team building the Flink backend for Ibis 
> ([https://github.com/ibis-project/ibis]). We would like to leverage PyFlink 
> under the hood for execution; however, PyFlink's requirements are 
> incompatible with several other Ibis requirements. Beyond Ibis, PyFlink's 
> outdated and restrictive requirements prevent it from being used alongside 
> most recent releases of Python data libraries.
> Some of the major libraries we (and likely others in the Python community 
> interested in using PyFlink alongside other libraries) need compatibility 
> with:
>  * PyArrow (at least >=10.0.0, but there's no reason not to also be 
> compatible with latest)
>  * pandas (should be compatible with 2.x series, but also probably with 
> 1.4.x, released January 2022, and 1.5.x)
>  * numpy (1.22 was released in December 2022)
>  * Newer releases of Apache Beam
>  * Newer releases of cython
> Furthermore, uncapped dependencies could be more generally preferable, as 
> they avoid the need for frequent PyFlink releases as newer versions of 
> libraries are released. A common (and great) argument for not upper-bounding 
> dependencies, especially for libraries: 
> [https://iscinumpy.dev/post/bound-version-constraints/]
> I am currently testing removing upper bounds in 
> [https://github.com/apache/flink/pull/23141]; so far, builds pass without 
> issue in 
> [b65c072|https://github.com/apache/flink/pull/23141/commits/b65c0723ed66e01e83d718f770aa916f41f34581],
>  and I'm currently waiting on 
> [c8eb15c|https://github.com/apache/flink/pull/23141/commits/c8eb15cbc371dc259fb4fda5395f0f55e08ea9c6]
>  to see if I can get PyArrow to resolve >=10.0.0. Solving the proposed 
> dependencies results in:
> {{#}}
> {{# This file is autogenerated by pip-compile with Python 3.8}}
> {{# by the following command:}}
> {{#}}
> {{#    pip-compile --config=pyproject.toml 
> --output-file=dev/compiled-requirements.txt dev/dev-requirements.txt}}
> {{#}}
> {{apache-beam==2.49.0}}
> {{    # via -r dev/dev-requirements.txt}}
> {{avro-python3==1.10.2}}
> {{    # via -r dev/dev-requirements.txt}}
> {{certifi==2023.7.22}}
> {{    # via requests}}
> {{charset-normalizer==3.2.0}}
> {{    # via requests}}
> {{cloudpickle==2.2.1}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{crcmod==1.7}}
> {{    # via apache-beam}}
> {{cython==3.0.0}}
> {{    # via -r dev/dev-requirements.txt}}
> {{dill==0.3.1.1}}
> {{    # via apache-beam}}
> {{dnspython==2.4.1}}
> {{    # via pymongo}}
> {{docopt==0.6.2}}
> {{    # via hdfs}}
> {{exceptiongroup==1.1.2}}
> {{    # via pytest}}
> {{fastavro==1.8.2}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{fasteners==0.18}}
> {{    # via apache-beam}}
> {{find-libpython==0.3.0}}
> {{    # via pemja}}
> {{grpcio==1.56.2}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{    #   grpcio-tools}}
> {{grpcio-tools==1.56.2}}
> {{    # via -r dev/dev-requirements.txt}}
> {{hdfs==2.7.0}}
> {{    # via apache-beam}}
> {{httplib2==0.22.0}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{idna==3.4}}
> {{    # via requests}}
> {{iniconfig==2.0.0}}
> {{    # via pytest}}
> {{numpy==1.24.4}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{    #   pandas}}
> {{    #   pyarrow}}
> {{objsize==0.6.1}}
> {{    # via apache-beam}}
> {{orjson==3.9.2}}
> {{    # via apache-beam}}
> {{packaging==23.1}}
> {{    # via pytest}}
> {{pandas==2.0.3}}
> {{    # via -r dev/dev-requirements.txt}}
> {{pemja==0.3.0 ; platform_system != "Windows"}}
> {{    # via -r dev/dev-requirements.txt}}
> {{pluggy==1.2.0}}
> {{    # via pytest}}
> {{proto-plus==1.22.3}}
> {{    # via apache-beam}}
> {{protobuf==4.23.4}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{    #   grpcio-tools}}
> {{    #   proto-plus}}
> {{py4j==0.10.9.7}}
> {{    # via -r dev/dev-requirements.txt}}
> {{pyarrow==11.0.0}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> 

[jira] [Commented] (FLINK-32758) PyFlink bounds are overly restrictive and outdated

2023-08-24 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17758788#comment-17758788
 ] 

Dian Fu commented on FLINK-32758:
-

Merged to master via 0dd6f9745b8df005e3aef286ae73092696ca2799

> PyFlink bounds are overly restrictive and outdated
> --
>
> Key: FLINK-32758
> URL: https://issues.apache.org/jira/browse/FLINK-32758
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python
>Affects Versions: 1.17.1
>Reporter: Deepyaman Datta
>Priority: Major
>  Labels: pull-request-available
>
> Hi! I am part of a team building the Flink backend for Ibis 
> ([https://github.com/ibis-project/ibis]). We would like to leverage PyFlink 
> under the hood for execution; however, PyFlink's requirements are 
> incompatible with several other Ibis requirements. Beyond Ibis, PyFlink's 
> outdated and restrictive requirements prevent it from being used alongside 
> most recent releases of Python data libraries.
> Some of the major libraries we (and likely others in the Python community 
> interested in using PyFlink alongside other libraries) need compatibility 
> with:
>  * PyArrow (at least >=10.0.0, but there's no reason not to also be 
> compatible with latest)
>  * pandas (should be compatible with 2.x series, but also probably with 
> 1.4.x, released January 2022, and 1.5.x)
>  * numpy (1.22 was released in December 2022)
>  * Newer releases of Apache Beam
>  * Newer releases of cython
> Furthermore, uncapped dependencies could be more generally preferable, as 
> they avoid the need for frequent PyFlink releases as newer versions of 
> libraries are released. A common (and great) argument for not upper-bounding 
> dependencies, especially for libraries: 
> [https://iscinumpy.dev/post/bound-version-constraints/]
> I am currently testing removing upper bounds in 
> [https://github.com/apache/flink/pull/23141]; so far, builds pass without 
> issue in 
> [b65c072|https://github.com/apache/flink/pull/23141/commits/b65c0723ed66e01e83d718f770aa916f41f34581],
>  and I'm currently waiting on 
> [c8eb15c|https://github.com/apache/flink/pull/23141/commits/c8eb15cbc371dc259fb4fda5395f0f55e08ea9c6]
>  to see if I can get PyArrow to resolve >=10.0.0. Solving the proposed 
> dependencies results in:
> {{#}}
> {{# This file is autogenerated by pip-compile with Python 3.8}}
> {{# by the following command:}}
> {{#}}
> {{#    pip-compile --config=pyproject.toml 
> --output-file=dev/compiled-requirements.txt dev/dev-requirements.txt}}
> {{#}}
> {{apache-beam==2.49.0}}
> {{    # via -r dev/dev-requirements.txt}}
> {{avro-python3==1.10.2}}
> {{    # via -r dev/dev-requirements.txt}}
> {{certifi==2023.7.22}}
> {{    # via requests}}
> {{charset-normalizer==3.2.0}}
> {{    # via requests}}
> {{cloudpickle==2.2.1}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{crcmod==1.7}}
> {{    # via apache-beam}}
> {{cython==3.0.0}}
> {{    # via -r dev/dev-requirements.txt}}
> {{dill==0.3.1.1}}
> {{    # via apache-beam}}
> {{dnspython==2.4.1}}
> {{    # via pymongo}}
> {{docopt==0.6.2}}
> {{    # via hdfs}}
> {{exceptiongroup==1.1.2}}
> {{    # via pytest}}
> {{fastavro==1.8.2}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{fasteners==0.18}}
> {{    # via apache-beam}}
> {{find-libpython==0.3.0}}
> {{    # via pemja}}
> {{grpcio==1.56.2}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{    #   grpcio-tools}}
> {{grpcio-tools==1.56.2}}
> {{    # via -r dev/dev-requirements.txt}}
> {{hdfs==2.7.0}}
> {{    # via apache-beam}}
> {{httplib2==0.22.0}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{idna==3.4}}
> {{    # via requests}}
> {{iniconfig==2.0.0}}
> {{    # via pytest}}
> {{numpy==1.24.4}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{    #   pandas}}
> {{    #   pyarrow}}
> {{objsize==0.6.1}}
> {{    # via apache-beam}}
> {{orjson==3.9.2}}
> {{    # via apache-beam}}
> {{packaging==23.1}}
> {{    # via pytest}}
> {{pandas==2.0.3}}
> {{    # via -r dev/dev-requirements.txt}}
> {{pemja==0.3.0 ; platform_system != "Windows"}}
> {{    # via -r dev/dev-requirements.txt}}
> {{pluggy==1.2.0}}
> {{    # via pytest}}
> {{proto-plus==1.22.3}}
> {{    # via apache-beam}}
> {{protobuf==4.23.4}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{    #   grpcio-tools}}
> {{    #   proto-plus}}
> {{py4j==0.10.9.7}}
> {{    # via -r dev/dev-requirements.txt}}
> {{pyarrow==11.0.0}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{pydot==1.4.2}}
> {{    # via apache-beam}}
> {{pymongo==4.4.1}}
> {{    # via apache-beam}}
> 

[jira] [Assigned] (FLINK-32758) PyFlink bounds are overly restrictive and outdated

2023-08-24 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu reassigned FLINK-32758:
---

Assignee: Deepyaman Datta

> PyFlink bounds are overly restrictive and outdated
> --
>
> Key: FLINK-32758
> URL: https://issues.apache.org/jira/browse/FLINK-32758
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python
>Affects Versions: 1.17.1
>Reporter: Deepyaman Datta
>Assignee: Deepyaman Datta
>Priority: Major
>  Labels: pull-request-available
>
> Hi! I am part of a team building the Flink backend for Ibis 
> ([https://github.com/ibis-project/ibis]). We would like to leverage PyFlink 
> under the hood for execution; however, PyFlink's requirements are 
> incompatible with several other Ibis requirements. Beyond Ibis, PyFlink's 
> outdated and restrictive requirements prevent it from being used alongside 
> most recent releases of Python data libraries.
> Some of the major libraries we (and likely others in the Python community 
> interested in using PyFlink alongside other libraries) need compatibility 
> with:
>  * PyArrow (at least >=10.0.0, but there's no reason not to also be 
> compatible with latest)
>  * pandas (should be compatible with 2.x series, but also probably with 
> 1.4.x, released January 2022, and 1.5.x)
>  * numpy (1.22 was released in December 2022)
>  * Newer releases of Apache Beam
>  * Newer releases of cython
> Furthermore, uncapped dependencies could be more generally preferable, as 
> they avoid the need for frequent PyFlink releases as newer versions of 
> libraries are released. A common (and great) argument for not upper-bounding 
> dependencies, especially for libraries: 
> [https://iscinumpy.dev/post/bound-version-constraints/]
> I am currently testing removing upper bounds in 
> [https://github.com/apache/flink/pull/23141]; so far, builds pass without 
> issue in 
> [b65c072|https://github.com/apache/flink/pull/23141/commits/b65c0723ed66e01e83d718f770aa916f41f34581],
>  and I'm currently waiting on 
> [c8eb15c|https://github.com/apache/flink/pull/23141/commits/c8eb15cbc371dc259fb4fda5395f0f55e08ea9c6]
>  to see if I can get PyArrow to resolve >=10.0.0. Solving the proposed 
> dependencies results in:
> {{#}}
> {{# This file is autogenerated by pip-compile with Python 3.8}}
> {{# by the following command:}}
> {{#}}
> {{#    pip-compile --config=pyproject.toml 
> --output-file=dev/compiled-requirements.txt dev/dev-requirements.txt}}
> {{#}}
> {{apache-beam==2.49.0}}
> {{    # via -r dev/dev-requirements.txt}}
> {{avro-python3==1.10.2}}
> {{    # via -r dev/dev-requirements.txt}}
> {{certifi==2023.7.22}}
> {{    # via requests}}
> {{charset-normalizer==3.2.0}}
> {{    # via requests}}
> {{cloudpickle==2.2.1}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{crcmod==1.7}}
> {{    # via apache-beam}}
> {{cython==3.0.0}}
> {{    # via -r dev/dev-requirements.txt}}
> {{dill==0.3.1.1}}
> {{    # via apache-beam}}
> {{dnspython==2.4.1}}
> {{    # via pymongo}}
> {{docopt==0.6.2}}
> {{    # via hdfs}}
> {{exceptiongroup==1.1.2}}
> {{    # via pytest}}
> {{fastavro==1.8.2}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{fasteners==0.18}}
> {{    # via apache-beam}}
> {{find-libpython==0.3.0}}
> {{    # via pemja}}
> {{grpcio==1.56.2}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{    #   grpcio-tools}}
> {{grpcio-tools==1.56.2}}
> {{    # via -r dev/dev-requirements.txt}}
> {{hdfs==2.7.0}}
> {{    # via apache-beam}}
> {{httplib2==0.22.0}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{idna==3.4}}
> {{    # via requests}}
> {{iniconfig==2.0.0}}
> {{    # via pytest}}
> {{numpy==1.24.4}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{    #   pandas}}
> {{    #   pyarrow}}
> {{objsize==0.6.1}}
> {{    # via apache-beam}}
> {{orjson==3.9.2}}
> {{    # via apache-beam}}
> {{packaging==23.1}}
> {{    # via pytest}}
> {{pandas==2.0.3}}
> {{    # via -r dev/dev-requirements.txt}}
> {{pemja==0.3.0 ; platform_system != "Windows"}}
> {{    # via -r dev/dev-requirements.txt}}
> {{pluggy==1.2.0}}
> {{    # via pytest}}
> {{proto-plus==1.22.3}}
> {{    # via apache-beam}}
> {{protobuf==4.23.4}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{    #   grpcio-tools}}
> {{    #   proto-plus}}
> {{py4j==0.10.9.7}}
> {{    # via -r dev/dev-requirements.txt}}
> {{pyarrow==11.0.0}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{pydot==1.4.2}}
> {{    # via apache-beam}}
> {{pymongo==4.4.1}}
> {{    # via apache-beam}}
> {{pyparsing==3.1.1}}
> {{    # 

[jira] [Created] (FLINK-32724) Mark CEP API classes as Public / PublicEvolving

2023-08-01 Thread Dian Fu (Jira)
Dian Fu created FLINK-32724:
---

 Summary: Mark CEP API classes as Public / PublicEvolving
 Key: FLINK-32724
 URL: https://issues.apache.org/jira/browse/FLINK-32724
 Project: Flink
  Issue Type: Improvement
  Components: Library / CEP
Reporter: Dian Fu


Currently most CEP API classes, e.g. Pattern, PatternSelectFunction, etc., are not 
annotated as Public / PublicEvolving. We should improve this to make it clear 
which classes are part of the public API.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-32544) PythonFunctionFactoryTest fails on Java 17

2023-07-05 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu reassigned FLINK-32544:
---

Assignee: Dian Fu

> PythonFunctionFactoryTest fails on Java 17
> --
>
> Key: FLINK-32544
> URL: https://issues.apache.org/jira/browse/FLINK-32544
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Python, Legacy Components / Flink on Tez
>Affects Versions: 1.18.0
>Reporter: Chesnay Schepler
>Assignee: Dian Fu
>Priority: Major
>
> https://dev.azure.com/chesnay/flink/_build/results?buildId=3676=logs=fba17979-6d2e-591d-72f1-97cf42797c11=727942b6-6137-54f7-1ef9-e66e706ea068
> {code}
> Jul 05 10:17:23 Exception in thread "main" 
> java.lang.reflect.InaccessibleObjectException: Unable to make field private 
> static java.util.IdentityHashMap java.lang.ApplicationShutdownHooks.hooks 
> accessible: module java.base does not "opens java.lang" to unnamed module 
> @1880a322
> Jul 05 10:17:23   at 
> java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:354)
> Jul 05 10:17:23   at 
> java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:297)
> Jul 05 10:17:23   at 
> java.base/java.lang.reflect.Field.checkCanSetAccessible(Field.java:178)
> Jul 05 10:17:23   at 
> java.base/java.lang.reflect.Field.setAccessible(Field.java:172)
> Jul 05 10:17:23   at 
> org.apache.flink.client.python.PythonFunctionFactoryTest.closeStartedPythonProcess(PythonFunctionFactoryTest.java:115)
> Jul 05 10:17:23   at 
> org.apache.flink.client.python.PythonFunctionFactoryTest.cleanEnvironment(PythonFunctionFactoryTest.java:79)
> Jul 05 10:17:23   at 
> org.apache.flink.client.python.PythonFunctionFactoryTest.main(PythonFunctionFactoryTest.java:52)
> {code}
> Side-notes:
> * maybe re-evaluate if the test could be run through maven now
> * The shutdown hooks business is quite sketchy, and AFAICT would be 
> unnecessary if the test were an ITCase



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-32536) Python tests fail with Arrow DirectBuffer exception

2023-07-05 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu updated FLINK-32536:

Fix Version/s: 1.18.0

> Python tests fail with Arrow DirectBuffer exception
> ---
>
> Key: FLINK-32536
> URL: https://issues.apache.org/jira/browse/FLINK-32536
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Python, Tests
>Affects Versions: 1.18.0
>Reporter: Chesnay Schepler
>Assignee: Dian Fu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.0
>
>
> https://dev.azure.com/chesnay/flink/_build/results?buildId=3674=logs=fba17979-6d2e-591d-72f1-97cf42797c11=727942b6-6137-54f7-1ef9-e66e706ea068
> {code}
> 2023-07-04T12:54:15.5296754Z Jul 04 12:54:15 E   
> py4j.protocol.Py4JJavaError: An error occurred while calling 
> z:org.apache.flink.table.runtime.arrow.ArrowUtils.collectAsPandasDataFrame.
> 2023-07-04T12:54:15.5299579Z Jul 04 12:54:15 E   : 
> java.lang.RuntimeException: Arrow depends on DirectByteBuffer.<init>(long, 
> int) which is not available. Please set the system property 
> 'io.netty.tryReflectionSetAccessible' to 'true'.
> 2023-07-04T12:54:15.5302307Z Jul 04 12:54:15 Eat 
> org.apache.flink.table.runtime.arrow.ArrowUtils.checkArrowUsable(ArrowUtils.java:184)
> 2023-07-04T12:54:15.5302859Z Jul 04 12:54:15 Eat 
> org.apache.flink.table.runtime.arrow.ArrowUtils.collectAsPandasDataFrame(ArrowUtils.java:546)
> 2023-07-04T12:54:15.5303177Z Jul 04 12:54:15 Eat 
> jdk.internal.reflect.GeneratedMethodAccessor287.invoke(Unknown Source)
> 2023-07-04T12:54:15.5303515Z Jul 04 12:54:15 Eat 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2023-07-04T12:54:15.5303929Z Jul 04 12:54:15 Eat 
> java.base/java.lang.reflect.Method.invoke(Method.java:568)
> 2023-07-04T12:54:15.5307338Z Jul 04 12:54:15 Eat 
> org.apache.flink.api.python.shaded.py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
> 2023-07-04T12:54:15.5309888Z Jul 04 12:54:15 Eat 
> org.apache.flink.api.python.shaded.py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
> 2023-07-04T12:54:15.5310306Z Jul 04 12:54:15 Eat 
> org.apache.flink.api.python.shaded.py4j.Gateway.invoke(Gateway.java:282)
> 2023-07-04T12:54:15.5337220Z Jul 04 12:54:15 Eat 
> org.apache.flink.api.python.shaded.py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
> 2023-07-04T12:54:15.5341859Z Jul 04 12:54:15 Eat 
> org.apache.flink.api.python.shaded.py4j.commands.CallCommand.execute(CallCommand.java:79)
> 2023-07-04T12:54:15.5342363Z Jul 04 12:54:15 Eat 
> org.apache.flink.api.python.shaded.py4j.GatewayConnection.run(GatewayConnection.java:238)
> 2023-07-04T12:54:15.5344866Z Jul 04 12:54:15 Eat 
> java.base/java.lang.Thread.run(Thread.java:833)
> {code}
> {code}
> 2023-07-04T12:54:15.5663559Z Jul 04 12:54:15 FAILED 
> pyflink/table/tests/test_pandas_conversion.py::BatchPandasConversionTests::test_empty_to_pandas
> 2023-07-04T12:54:15.5663891Z Jul 04 12:54:15 FAILED 
> pyflink/table/tests/test_pandas_conversion.py::BatchPandasConversionTests::test_from_pandas
> 2023-07-04T12:54:15.5664299Z Jul 04 12:54:15 FAILED 
> pyflink/table/tests/test_pandas_conversion.py::BatchPandasConversionTests::test_to_pandas
> 2023-07-04T12:54:15.5664655Z Jul 04 12:54:15 FAILED 
> pyflink/table/tests/test_pandas_conversion.py::BatchPandasConversionTests::test_to_pandas_for_retract_table
> 2023-07-04T12:54:15.5665003Z Jul 04 12:54:15 FAILED 
> pyflink/table/tests/test_pandas_conversion.py::StreamPandasConversionTests::test_empty_to_pandas
> 2023-07-04T12:54:15.5665360Z Jul 04 12:54:15 FAILED 
> pyflink/table/tests/test_pandas_conversion.py::StreamPandasConversionTests::test_from_pandas
> 2023-07-04T12:54:15.5665704Z Jul 04 12:54:15 FAILED 
> pyflink/table/tests/test_pandas_conversion.py::StreamPandasConversionTests::test_to_pandas
> 2023-07-04T12:54:15.5666045Z Jul 04 12:54:15 FAILED 
> pyflink/table/tests/test_pandas_conversion.py::StreamPandasConversionTests::test_to_pandas_for_retract_table
> 2023-07-04T12:54:15.5666415Z Jul 04 12:54:15 FAILED 
> pyflink/table/tests/test_pandas_conversion.py::StreamPandasConversionTests::test_to_pandas_with_event_time
> 2023-07-04T12:54:15.5666840Z Jul 04 12:54:15 FAILED 
> pyflink/table/tests/test_pandas_udaf.py::BatchPandasUDAFITTests::test_group_aggregate_function
> 2023-07-04T12:54:15.5667189Z Jul 04 12:54:15 FAILED 
> 

[jira] [Closed] (FLINK-32536) Python tests fail with Arrow DirectBuffer exception

2023-07-05 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu closed FLINK-32536.
---
Resolution: Fixed

Fixed in master via 7f34df40c0de8e7bffd149196851f89012faf842

> Python tests fail with Arrow DirectBuffer exception
> ---
>
> Key: FLINK-32536
> URL: https://issues.apache.org/jira/browse/FLINK-32536
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Python, Tests
>Affects Versions: 1.18.0
>Reporter: Chesnay Schepler
>Assignee: Dian Fu
>Priority: Major
>  Labels: pull-request-available
>
> https://dev.azure.com/chesnay/flink/_build/results?buildId=3674=logs=fba17979-6d2e-591d-72f1-97cf42797c11=727942b6-6137-54f7-1ef9-e66e706ea068
> {code}
> 2023-07-04T12:54:15.5296754Z Jul 04 12:54:15 E   
> py4j.protocol.Py4JJavaError: An error occurred while calling 
> z:org.apache.flink.table.runtime.arrow.ArrowUtils.collectAsPandasDataFrame.
> 2023-07-04T12:54:15.5299579Z Jul 04 12:54:15 E   : 
> java.lang.RuntimeException: Arrow depends on DirectByteBuffer.<init>(long, 
> int) which is not available. Please set the system property 
> 'io.netty.tryReflectionSetAccessible' to 'true'.
> 2023-07-04T12:54:15.5302307Z Jul 04 12:54:15 Eat 
> org.apache.flink.table.runtime.arrow.ArrowUtils.checkArrowUsable(ArrowUtils.java:184)
> 2023-07-04T12:54:15.5302859Z Jul 04 12:54:15 Eat 
> org.apache.flink.table.runtime.arrow.ArrowUtils.collectAsPandasDataFrame(ArrowUtils.java:546)
> 2023-07-04T12:54:15.5303177Z Jul 04 12:54:15 Eat 
> jdk.internal.reflect.GeneratedMethodAccessor287.invoke(Unknown Source)
> 2023-07-04T12:54:15.5303515Z Jul 04 12:54:15 Eat 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2023-07-04T12:54:15.5303929Z Jul 04 12:54:15 Eat 
> java.base/java.lang.reflect.Method.invoke(Method.java:568)
> 2023-07-04T12:54:15.5307338Z Jul 04 12:54:15 Eat 
> org.apache.flink.api.python.shaded.py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
> 2023-07-04T12:54:15.5309888Z Jul 04 12:54:15 Eat 
> org.apache.flink.api.python.shaded.py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
> 2023-07-04T12:54:15.5310306Z Jul 04 12:54:15 Eat 
> org.apache.flink.api.python.shaded.py4j.Gateway.invoke(Gateway.java:282)
> 2023-07-04T12:54:15.5337220Z Jul 04 12:54:15 Eat 
> org.apache.flink.api.python.shaded.py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
> 2023-07-04T12:54:15.5341859Z Jul 04 12:54:15 Eat 
> org.apache.flink.api.python.shaded.py4j.commands.CallCommand.execute(CallCommand.java:79)
> 2023-07-04T12:54:15.5342363Z Jul 04 12:54:15 Eat 
> org.apache.flink.api.python.shaded.py4j.GatewayConnection.run(GatewayConnection.java:238)
> 2023-07-04T12:54:15.5344866Z Jul 04 12:54:15 Eat 
> java.base/java.lang.Thread.run(Thread.java:833)
> {code}
> {code}
> 2023-07-04T12:54:15.5663559Z Jul 04 12:54:15 FAILED 
> pyflink/table/tests/test_pandas_conversion.py::BatchPandasConversionTests::test_empty_to_pandas
> 2023-07-04T12:54:15.5663891Z Jul 04 12:54:15 FAILED 
> pyflink/table/tests/test_pandas_conversion.py::BatchPandasConversionTests::test_from_pandas
> 2023-07-04T12:54:15.5664299Z Jul 04 12:54:15 FAILED 
> pyflink/table/tests/test_pandas_conversion.py::BatchPandasConversionTests::test_to_pandas
> 2023-07-04T12:54:15.5664655Z Jul 04 12:54:15 FAILED 
> pyflink/table/tests/test_pandas_conversion.py::BatchPandasConversionTests::test_to_pandas_for_retract_table
> 2023-07-04T12:54:15.5665003Z Jul 04 12:54:15 FAILED 
> pyflink/table/tests/test_pandas_conversion.py::StreamPandasConversionTests::test_empty_to_pandas
> 2023-07-04T12:54:15.5665360Z Jul 04 12:54:15 FAILED 
> pyflink/table/tests/test_pandas_conversion.py::StreamPandasConversionTests::test_from_pandas
> 2023-07-04T12:54:15.5665704Z Jul 04 12:54:15 FAILED 
> pyflink/table/tests/test_pandas_conversion.py::StreamPandasConversionTests::test_to_pandas
> 2023-07-04T12:54:15.5666045Z Jul 04 12:54:15 FAILED 
> pyflink/table/tests/test_pandas_conversion.py::StreamPandasConversionTests::test_to_pandas_for_retract_table
> 2023-07-04T12:54:15.5666415Z Jul 04 12:54:15 FAILED 
> pyflink/table/tests/test_pandas_conversion.py::StreamPandasConversionTests::test_to_pandas_with_event_time
> 2023-07-04T12:54:15.5666840Z Jul 04 12:54:15 FAILED 
> pyflink/table/tests/test_pandas_udaf.py::BatchPandasUDAFITTests::test_group_aggregate_function
> 2023-07-04T12:54:15.5667189Z Jul 04 12:54:15 FAILED 
> 

[jira] [Closed] (FLINK-32327) Python Kafka connector runs into strange NullPointerException

2023-07-04 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu closed FLINK-32327.
---
Fix Version/s: 1.18.0
   Resolution: Fixed

Fixed in master via 940cb746b91d5022f52c49d4e8e09e15ffea4709

> Python Kafka connector runs into strange NullPointerException
> -
>
> Key: FLINK-32327
> URL: https://issues.apache.org/jira/browse/FLINK-32327
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Python
>Reporter: Chesnay Schepler
>Assignee: Dian Fu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.0
>
>
> The following error occurs when running the python kafka tests:
> (this uses a slightly modified version of the code, but the error also 
> happens without it)
> {code:python}
>  def set_record_serializer(self, record_serializer: 
> 'KafkaRecordSerializationSchema') \
>  -> 'KafkaSinkBuilder':
>  """
>  Sets the :class:`KafkaRecordSerializationSchema` that transforms 
> incoming records to kafka
>  producer records.
>  
>  :param record_serializer: The 
> :class:`KafkaRecordSerializationSchema`.
>  """
>  # NOTE: If topic selector is a generated first-column selector, do 
> extra preprocessing
>  j_topic_selector = 
> get_field_value(record_serializer._j_serialization_schema,
> 'topicSelector')
>  
>  caching_name_suffix = 
> 'KafkaRecordSerializationSchemaBuilder.CachingTopicSelector'
>  if 
> j_topic_selector.getClass().getCanonicalName().endswith(caching_name_suffix):
>  class_name = get_field_value(j_topic_selector, 'topicSelector')\
>  .getClass().getCanonicalName()
>  >   if class_name.startswith('com.sun.proxy') or 
> class_name.startswith('jdk.proxy'):
>  E   AttributeError: 'NoneType' object has no attribute 'startswith'
> {code}
> My assumption is that {{getCanonicalName}} returns {{null}} for some objects, 
> and this set of objects may have increased in Java 17. I tried adding a null 
> check, but that caused other tests to fail.
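For illustration, the failure mode above can be reproduced without a JVM: Py4J maps a Java {{null}} returned by {{getCanonicalName()}} (e.g. for anonymous or hidden classes) to Python's {{None}}, so an unguarded {{startswith}} call raises {{AttributeError}}. A minimal, purely illustrative sketch of the kind of defensive guard involved (not necessarily the change that was merged in the commit above):

{code:python}
# Purely illustrative sketch, not the committed fix: Py4J maps a Java null
# from getCanonicalName() to Python's None, so an unguarded check fails.
def is_proxy_class(canonical_name):
    # Coerce a possible None (anonymous/hidden Java classes) to ''.
    name = canonical_name or ''
    return name.startswith('com.sun.proxy') or name.startswith('jdk.proxy')


print(is_proxy_class('jdk.proxy2.$Proxy42'))  # True
print(is_proxy_class(None))                   # False instead of AttributeError
{code}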



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-32327) Python Kafka connector runs into strange NullPointerException

2023-07-04 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu reassigned FLINK-32327:
---

Assignee: Dian Fu

> Python Kafka connector runs into strange NullPointerException
> -
>
> Key: FLINK-32327
> URL: https://issues.apache.org/jira/browse/FLINK-32327
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Python
>Reporter: Chesnay Schepler
>Assignee: Dian Fu
>Priority: Major
>  Labels: pull-request-available
>
> The following error occurs when running the python kafka tests:
> (this uses a slightly modified version of the code, but the error also 
> happens without it)
> {code:python}
>  def set_record_serializer(self, record_serializer: 
> 'KafkaRecordSerializationSchema') \
>  -> 'KafkaSinkBuilder':
>  """
>  Sets the :class:`KafkaRecordSerializationSchema` that transforms 
> incoming records to kafka
>  producer records.
>  
>  :param record_serializer: The 
> :class:`KafkaRecordSerializationSchema`.
>  """
>  # NOTE: If topic selector is a generated first-column selector, do 
> extra preprocessing
>  j_topic_selector = 
> get_field_value(record_serializer._j_serialization_schema,
> 'topicSelector')
>  
>  caching_name_suffix = 
> 'KafkaRecordSerializationSchemaBuilder.CachingTopicSelector'
>  if 
> j_topic_selector.getClass().getCanonicalName().endswith(caching_name_suffix):
>  class_name = get_field_value(j_topic_selector, 'topicSelector')\
>  .getClass().getCanonicalName()
>  >   if class_name.startswith('com.sun.proxy') or 
> class_name.startswith('jdk.proxy'):
>  E   AttributeError: 'NoneType' object has no attribute 'startswith'
> {code}
> My assumption is that {{getCanonicalName}} returns {{null}} for some objects, 
> and this set of objects may have increased in Java 17. I tried adding a null 
> check, but that caused other tests to fail.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-32327) Python Kafka connector runs into strange NullPointerException

2023-06-29 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17738519#comment-17738519
 ] 

Dian Fu commented on FLINK-32327:
-

Thanks (y). I will find some time to investigate this issue next week.

> Python Kafka connector runs into strange NullPointerException
> -
>
> Key: FLINK-32327
> URL: https://issues.apache.org/jira/browse/FLINK-32327
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Python
>Reporter: Chesnay Schepler
>Priority: Major
>
> The following error occurs when running the python kafka tests:
> (this uses a slightly modified version of the code, but the error also 
> happens without it)
> {code:python}
>  def set_record_serializer(self, record_serializer: 
> 'KafkaRecordSerializationSchema') \
>  -> 'KafkaSinkBuilder':
>  """
>  Sets the :class:`KafkaRecordSerializationSchema` that transforms 
> incoming records to kafka
>  producer records.
>  
>  :param record_serializer: The 
> :class:`KafkaRecordSerializationSchema`.
>  """
>  # NOTE: If topic selector is a generated first-column selector, do 
> extra preprocessing
>  j_topic_selector = 
> get_field_value(record_serializer._j_serialization_schema,
> 'topicSelector')
>  
>  caching_name_suffix = 
> 'KafkaRecordSerializationSchemaBuilder.CachingTopicSelector'
>  if 
> j_topic_selector.getClass().getCanonicalName().endswith(caching_name_suffix):
>  class_name = get_field_value(j_topic_selector, 'topicSelector')\
>  .getClass().getCanonicalName()
>  >   if class_name.startswith('com.sun.proxy') or 
> class_name.startswith('jdk.proxy'):
>  E   AttributeError: 'NoneType' object has no attribute 'startswith'
> {code}
> My assumption is that {{getCanonicalName}} returns {{null}} for some objects, 
> and this set of objects may have increased in Java 17. I tried adding a null 
> check, but that caused other tests to fail.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-32327) Python Kafka connector runs into strange NullPointerException

2023-06-17 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17733729#comment-17733729
 ] 

Dian Fu commented on FLINK-32327:
-

PS: I can help to dig into this problem if we know which ones are failing.

> Python Kafka connector runs into strange NullPointerException
> -
>
> Key: FLINK-32327
> URL: https://issues.apache.org/jira/browse/FLINK-32327
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Python
>Reporter: Chesnay Schepler
>Priority: Major
>
> The following error occurs when running the python kafka tests:
> (this uses a slightly modified version of the code, but the error also 
> happens without it)
> {code:python}
>  def set_record_serializer(self, record_serializer: 
> 'KafkaRecordSerializationSchema') \
>  -> 'KafkaSinkBuilder':
>  """
>  Sets the :class:`KafkaRecordSerializationSchema` that transforms 
> incoming records to kafka
>  producer records.
>  
>  :param record_serializer: The 
> :class:`KafkaRecordSerializationSchema`.
>  """
>  # NOTE: If topic selector is a generated first-column selector, do 
> extra preprocessing
>  j_topic_selector = 
> get_field_value(record_serializer._j_serialization_schema,
> 'topicSelector')
>  
>  caching_name_suffix = 
> 'KafkaRecordSerializationSchemaBuilder.CachingTopicSelector'
>  if 
> j_topic_selector.getClass().getCanonicalName().endswith(caching_name_suffix):
>  class_name = get_field_value(j_topic_selector, 'topicSelector')\
>  .getClass().getCanonicalName()
>  >   if class_name.startswith('com.sun.proxy') or 
> class_name.startswith('jdk.proxy'):
>  E   AttributeError: 'NoneType' object has no attribute 'startswith'
> {code}
> My assumption is that {{getCanonicalName}} returns {{null}} for some objects, 
> and this set of objects may have increased in Java 17. I tried adding a null 
> check, but that caused other tests to fail.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-32327) Python Kafka connector runs into strange NullPointerException

2023-06-17 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17733728#comment-17733728
 ] 

Dian Fu commented on FLINK-32327:
-

[~chesnay] It seems that you have disabled all of the Python tests for Java 17. 
Have you seen any other Python tests failing on Java 17?

> Python Kafka connector runs into strange NullPointerException
> -
>
> Key: FLINK-32327
> URL: https://issues.apache.org/jira/browse/FLINK-32327
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Python
>Reporter: Chesnay Schepler
>Priority: Major
>
> The following error occurs when running the python kafka tests:
> (this uses a slightly modified version of the code, but the error also 
> happens without it)
> {code:python}
>  def set_record_serializer(self, record_serializer: 
> 'KafkaRecordSerializationSchema') \
>  -> 'KafkaSinkBuilder':
>  """
>  Sets the :class:`KafkaRecordSerializationSchema` that transforms 
> incoming records to kafka
>  producer records.
>  
>  :param record_serializer: The 
> :class:`KafkaRecordSerializationSchema`.
>  """
>  # NOTE: If topic selector is a generated first-column selector, do 
> extra preprocessing
>  j_topic_selector = 
> get_field_value(record_serializer._j_serialization_schema,
> 'topicSelector')
>  
>  caching_name_suffix = 
> 'KafkaRecordSerializationSchemaBuilder.CachingTopicSelector'
>  if 
> j_topic_selector.getClass().getCanonicalName().endswith(caching_name_suffix):
>  class_name = get_field_value(j_topic_selector, 'topicSelector')\
>  .getClass().getCanonicalName()
>  >   if class_name.startswith('com.sun.proxy') or 
> class_name.startswith('jdk.proxy'):
>  E   AttributeError: 'NoneType' object has no attribute 'startswith'
> {code}
> My assumption is that {{getCanonicalName}} returns {{null}} for some objects, 
> and this set of objects may have increased in Java 17. I tried adding a null 
> check, but that caused other tests to fail.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-32136) Pyflink gateway server launch fails when purelib != platlib

2023-06-17 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu closed FLINK-32136.
---
Fix Version/s: 1.18.0
   1.16.3
   1.17.2
 Assignee: Dian Fu
   Resolution: Fixed

Fixed in:
- master via f64563bc1a7ff698acd708b61e9e80ae9c3e848f
- release-1.17 via 1b3f25432a29005370b0f51aaa7d4ee79a5edd58
- release-1.16 via f9394025fb756c844ec5f2615971f227d40b9244

> Pyflink gateway server launch fails when purelib != platlib
> ---
>
> Key: FLINK-32136
> URL: https://issues.apache.org/jira/browse/FLINK-32136
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.13.3
>Reporter: William Ashley
>Assignee: Dian Fu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.0, 1.16.3, 1.17.2
>
>
> On distros where python's {{purelib}} is different than {{platlib}} (e.g. 
> Amazon Linux 2, but from my research it's all of the Redhat-based ones), you 
> wind up with components of packages being installed across two different 
> locations (e.g. {{/usr/local/lib/python3.7/site-packages/pyflink}} and 
> {{{}/usr/local/lib64/python3.7/site-packages/pyflink{}}}).
> {{_find_flink_home}} 
> [handles|https://github.com/apache/flink/blob/06688f345f6793a8964ec2175f44cda13c33/flink-python/pyflink/find_flink_home.py#L58C63-L60]
>  this, and in flink releases <= 1.13.2 its setting of the {{FLINK_LIB_DIR}} 
> environment variable was the one being used. However, from 1.13.3, a 
> refactoring of {{launch_gateway_server_process}} 
> ([1.13.2,|https://github.com/apache/flink/blob/release-1.13.2/flink-python/pyflink/pyflink_gateway_server.py#L200]
>  
> [1.13.3|https://github.com/apache/flink/blob/release-1.13.3/flink-python/pyflink/pyflink_gateway_server.py#L280])
>  re-ordered some method calls. {{{}prepare_environment_variable{}}}'s 
> [non-awareness|https://github.com/apache/flink/blob/release-1.13.3/flink-python/pyflink/pyflink_gateway_server.py#L94C67-L95]
>  of multiple homes and setting of {{FLINK_LIB_DIR}} now is the one that 
> matters, and it is the incorrect location.
> I've confirmed this problem on Amazon Linux 2 and 2023. The problem does not 
> exist on, for example, Ubuntu 20 and 22 (for which {{platlib}} == 
> {{{}purelib{}}}).
> Repro steps on Amazon Linux 2
> {quote}{{yum -y install python3 java-11}}
> {{pip3 install apache-flink==1.13.3}}
> {{python3 -c 'from pyflink.table import EnvironmentSettings ; 
> EnvironmentSettings.new_instance()'}}
> {quote}
> The resulting error is
> {quote}{{The flink-python jar is not found in the opt folder of the 
> FLINK_HOME: /usr/local/lib64/python3.7/site-packages/pyflink}}
> {{Error: Could not find or load main class 
> org.apache.flink.client.python.PythonGatewayServer}}
> {{Caused by: java.lang.ClassNotFoundException: 
> org.apache.flink.client.python.PythonGatewayServer}}
> {{Traceback (most recent call last):}}
> {{  File "", line 1, in }}
> {{  File 
> "/usr/local/lib64/python3.7/site-packages/pyflink/table/environment_settings.py",
>  line 214, in new_instance}}
> {{    return EnvironmentSettings.Builder()}}
> {{  File 
> "/usr/local/lib64/python3.7/site-packages/pyflink/table/environment_settings.py",
>  line 48, in __init__}}
> {{    gateway = get_gateway()}}
> {{  File "/usr/local/lib64/python3.7/site-packages/pyflink/java_gateway.py", 
> line 62, in get_gateway}}
> {{    _gateway = launch_gateway()}}
> {{  File "/usr/local/lib64/python3.7/site-packages/pyflink/java_gateway.py", 
> line 112, in launch_gateway}}
> {{    raise Exception("Java gateway process exited before sending its port 
> number")}}
> {{Exception: Java gateway process exited before sending its port number}}
> {quote}
> The flink home under /lib64/ does not contain the jar, but it is in the /lib/ 
> location
> {quote}{{bash-4.2# find /usr/local/lib64/python3.7/site-packages/pyflink 
> -name "flink-python*.jar"}}
> {{bash-4.2# find /usr/local/lib/python3.7/site-packages/pyflink -name 
> "flink-python*.jar"}}
> {{/usr/local/lib/python3.7/site-packages/pyflink/opt/flink-python_2.11-1.13.3.jar}}
> {quote}
>  
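To make the purelib/platlib split concrete, the following is a hedged sketch (plain sysconfig, not the lookup code in pyflink_gateway_server.py) that prints the two locations a robust FLINK_HOME / FLINK_LIB_DIR lookup has to consider on such distros:

{code:python}
# Hedged sketch: on the affected distros 'purelib' and 'platlib' resolve to
# different site-packages trees, so one package's files can be split across both.
import sysconfig

paths = sysconfig.get_paths()
print("purelib:", paths['purelib'])   # e.g. /usr/local/lib/python3.7/site-packages
print("platlib:", paths['platlib'])   # e.g. /usr/local/lib64/python3.7/site-packages
{code}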



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-32136) Pyflink gateway server launch fails when purelib != platlib

2023-06-15 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17733327#comment-17733327
 ] 

Dian Fu commented on FLINK-32136:
-

[~wash] I have submitted a PR. Could you help to review? It would be great if 
you could also help to verify it~

> Pyflink gateway server launch fails when purelib != platlib
> ---
>
> Key: FLINK-32136
> URL: https://issues.apache.org/jira/browse/FLINK-32136
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.13.3
>Reporter: William Ashley
>Priority: Major
>  Labels: pull-request-available
>
> On distros where python's {{purelib}} is different than {{platlib}} (e.g. 
> Amazon Linux 2, but from my research it's all of the Redhat-based ones), you 
> wind up with components of packages being installed across two different 
> locations (e.g. {{/usr/local/lib/python3.7/site-packages/pyflink}} and 
> {{{}/usr/local/lib64/python3.7/site-packages/pyflink{}}}).
> {{_find_flink_home}} 
> [handles|https://github.com/apache/flink/blob/06688f345f6793a8964ec2175f44cda13c33/flink-python/pyflink/find_flink_home.py#L58C63-L60]
>  this, and in flink releases <= 1.13.2 its setting of the {{FLINK_LIB_DIR}} 
> environment variable was the one being used. However, from 1.13.3, a 
> refactoring of {{launch_gateway_server_process}} 
> ([1.13.2,|https://github.com/apache/flink/blob/release-1.13.2/flink-python/pyflink/pyflink_gateway_server.py#L200]
>  
> [1.13.3|https://github.com/apache/flink/blob/release-1.13.3/flink-python/pyflink/pyflink_gateway_server.py#L280])
>  re-ordered some method calls. {{{}prepare_environment_variable{}}}'s 
> [non-awareness|https://github.com/apache/flink/blob/release-1.13.3/flink-python/pyflink/pyflink_gateway_server.py#L94C67-L95]
>  of multiple homes and setting of {{FLINK_LIB_DIR}} now is the one that 
> matters, and it is the incorrect location.
> I've confirmed this problem on Amazon Linux 2 and 2023. The problem does not 
> exist on, for example, Ubuntu 20 and 22 (for which {{platlib}} == 
> {{{}purelib{}}}).
> Repro steps on Amazon Linux 2
> {quote}{{yum -y install python3 java-11}}
> {{pip3 install apache-flink==1.13.3}}
> {{python3 -c 'from pyflink.table import EnvironmentSettings ; 
> EnvironmentSettings.new_instance()'}}
> {quote}
> The resulting error is
> {quote}{{The flink-python jar is not found in the opt folder of the 
> FLINK_HOME: /usr/local/lib64/python3.7/site-packages/pyflink}}
> {{Error: Could not find or load main class 
> org.apache.flink.client.python.PythonGatewayServer}}
> {{Caused by: java.lang.ClassNotFoundException: 
> org.apache.flink.client.python.PythonGatewayServer}}
> {{Traceback (most recent call last):}}
> {{  File "", line 1, in }}
> {{  File 
> "/usr/local/lib64/python3.7/site-packages/pyflink/table/environment_settings.py",
>  line 214, in new_instance}}
> {{    return EnvironmentSettings.Builder()}}
> {{  File 
> "/usr/local/lib64/python3.7/site-packages/pyflink/table/environment_settings.py",
>  line 48, in __init__}}
> {{    gateway = get_gateway()}}
> {{  File "/usr/local/lib64/python3.7/site-packages/pyflink/java_gateway.py", 
> line 62, in get_gateway}}
> {{    _gateway = launch_gateway()}}
> {{  File "/usr/local/lib64/python3.7/site-packages/pyflink/java_gateway.py", 
> line 112, in launch_gateway}}
> {{    raise Exception("Java gateway process exited before sending its port 
> number")}}
> {{Exception: Java gateway process exited before sending its port number}}
> {quote}
> The flink home under /lib64/ does not contain the jar, but it is in the /lib/ 
> location
> {quote}{{bash-4.2# find /usr/local/lib64/python3.7/site-packages/pyflink 
> -name "flink-python*.jar"}}
> {{bash-4.2# find /usr/local/lib/python3.7/site-packages/pyflink -name 
> "flink-python*.jar"}}
> {{/usr/local/lib/python3.7/site-packages/pyflink/opt/flink-python_2.11-1.13.3.jar}}
> {quote}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-32136) Pyflink gateway server launch fails when purelib != platlib

2023-06-15 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17733327#comment-17733327
 ] 

Dian Fu edited comment on FLINK-32136 at 6/16/23 5:25 AM:
--

[~wash] I have submitted a PR (https://github.com/apache/flink/pull/22802). 
Could you help to review? It would be great if you could also help to verify it~


was (Author: dianfu):
[~wash] I have submitted a PR. Could you help to review? It would be great if 
you could also help to verify it~

> Pyflink gateway server launch fails when purelib != platlib
> ---
>
> Key: FLINK-32136
> URL: https://issues.apache.org/jira/browse/FLINK-32136
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.13.3
>Reporter: William Ashley
>Priority: Major
>  Labels: pull-request-available
>
> On distros where python's {{purelib}} is different than {{platlib}} (e.g. 
> Amazon Linux 2, but from my research it's all of the Redhat-based ones), you 
> wind up with components of packages being installed across two different 
> locations (e.g. {{/usr/local/lib/python3.7/site-packages/pyflink}} and 
> {{{}/usr/local/lib64/python3.7/site-packages/pyflink{}}}).
> {{_find_flink_home}} 
> [handles|https://github.com/apache/flink/blob/06688f345f6793a8964ec2175f44cda13c33/flink-python/pyflink/find_flink_home.py#L58C63-L60]
>  this, and in flink releases <= 1.13.2 its setting of the {{FLINK_LIB_DIR}} 
> environment variable was the one being used. However, from 1.13.3, a 
> refactoring of {{launch_gateway_server_process}} 
> ([1.13.2,|https://github.com/apache/flink/blob/release-1.13.2/flink-python/pyflink/pyflink_gateway_server.py#L200]
>  
> [1.13.3|https://github.com/apache/flink/blob/release-1.13.3/flink-python/pyflink/pyflink_gateway_server.py#L280])
>  re-ordered some method calls. {{{}prepare_environment_variable{}}}'s 
> [non-awareness|https://github.com/apache/flink/blob/release-1.13.3/flink-python/pyflink/pyflink_gateway_server.py#L94C67-L95]
>  of multiple homes and setting of {{FLINK_LIB_DIR}} now is the one that 
> matters, and it is the incorrect location.
> I've confirmed this problem on Amazon Linux 2 and 2023. The problem does not 
> exist on, for example, Ubuntu 20 and 22 (for which {{platlib}} == 
> {{{}purelib{}}}).
> Repro steps on Amazon Linux 2
> {quote}{{yum -y install python3 java-11}}
> {{pip3 install apache-flink==1.13.3}}
> {{python3 -c 'from pyflink.table import EnvironmentSettings ; 
> EnvironmentSettings.new_instance()'}}
> {quote}
> The resulting error is
> {quote}{{The flink-python jar is not found in the opt folder of the 
> FLINK_HOME: /usr/local/lib64/python3.7/site-packages/pyflink}}
> {{Error: Could not find or load main class 
> org.apache.flink.client.python.PythonGatewayServer}}
> {{Caused by: java.lang.ClassNotFoundException: 
> org.apache.flink.client.python.PythonGatewayServer}}
> {{Traceback (most recent call last):}}
> {{  File "", line 1, in }}
> {{  File 
> "/usr/local/lib64/python3.7/site-packages/pyflink/table/environment_settings.py",
>  line 214, in new_instance}}
> {{    return EnvironmentSettings.Builder()}}
> {{  File 
> "/usr/local/lib64/python3.7/site-packages/pyflink/table/environment_settings.py",
>  line 48, in __init__}}
> {{    gateway = get_gateway()}}
> {{  File "/usr/local/lib64/python3.7/site-packages/pyflink/java_gateway.py", 
> line 62, in get_gateway}}
> {{    _gateway = launch_gateway()}}
> {{  File "/usr/local/lib64/python3.7/site-packages/pyflink/java_gateway.py", 
> line 112, in launch_gateway}}
> {{    raise Exception("Java gateway process exited before sending its port 
> number")}}
> {{Exception: Java gateway process exited before sending its port number}}
> {quote}
> The flink home under /lib64/ does not contain the jar, but it is in the /lib/ 
> location
> {quote}{{bash-4.2# find /usr/local/lib64/python3.7/site-packages/pyflink 
> -name "flink-python*.jar"}}
> {{bash-4.2# find /usr/local/lib/python3.7/site-packages/pyflink -name 
> "flink-python*.jar"}}
> {{/usr/local/lib/python3.7/site-packages/pyflink/opt/flink-python_2.11-1.13.3.jar}}
> {quote}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-32034) Python's DistUtils is deprecated as of 3.10

2023-06-15 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu closed FLINK-32034.
---
Fix Version/s: 1.18.0
   1.17.2
 Assignee: Colten Pilgreen
   Resolution: Fixed

Fixed in:
- master via 6ee1912e949cf290e5e1e620b1f4bae6552b428b
- release-1.17 via dd1dde377f775c43fcd95eed1bae91bf5fcfee2e

> Python's DistUtils is deprecated as of 3.10
> ---
>
> Key: FLINK-32034
> URL: https://issues.apache.org/jira/browse/FLINK-32034
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.17.0
> Environment: Kubernetes
> Java 11
> Python 3.10.9
>Reporter: Colten Pilgreen
>Assignee: Colten Pilgreen
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.18.0, 1.17.2
>
> Attachments: get_site_packages_path_script.py, 
> get_site_packages_path_script_shortened.py
>
>
> I recently went through an upgrade from 1.13 to 1.17, and along with that 
> I upgraded the Python version on our Flink session server. Almost everything 
> in our workflow works, except for Python dependency management.
> After some digging, I found that the cause is the DeprecationWarning that is 
> printed when trying to get the site-packages path. The script is 
> GET_SITE_PACKAGES_PATH_SCRIPT and it is executed in the getSitePackagesPath 
> method of the PythonEnvironmentManagerUtils class. The issue is that the 
> DeprecationWarning is included in the PYTHONPATH environment variable that 
> is passed to the Beam runner. The warning text breaks Python's ability to 
> find the site-packages because it contains characters that are not allowed 
> in filesystem paths.
>  
> Example of the PYTHONPATH environment variable:
> PYTHONPATH == :1: DeprecationWarning: The distutils package is 
> deprecated and slated for removal in Python 3.12. Use setuptools or check PEP 
> 632 for potential 
> alternatives:/tmp/python-dist-c63e1464-925c-4289-bb71-c6f50e83186f/python-requirements/lib/python3.10/site-packages
> HADOOP_CONF_DIR == /opt/flink/conf
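For illustration, one way to keep warning text out of PYTHONPATH is for the script to print nothing but the site-packages path and to avoid (or silence) the distutils import. The sketch below is an assumption about the shape of such a fix, not the exact script that was merged into PythonEnvironmentManagerUtils:

{code:python}
# Illustrative sketch only: emit nothing but the site-packages path on stdout,
# so no DeprecationWarning text can end up in PYTHONPATH.
import sysconfig

# Preferred on Python 3.10+: no distutils import, hence no DeprecationWarning.
print(sysconfig.get_paths()['purelib'])

# If distutils must still be used for older interpreters, silence the warning
# around the import so it cannot leak into captured output.
import warnings

with warnings.catch_warnings():
    warnings.simplefilter('ignore', DeprecationWarning)
    from distutils.sysconfig import get_python_lib
    print(get_python_lib())
{code}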



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-32136) Pyflink gateway server launch fails when purelib != platlib

2023-06-15 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17733017#comment-17733017
 ] 

Dian Fu commented on FLINK-32136:
-

[~wash] Good catch! This seems like a critical problem. Would you like to open 
a PR to fix this issue?

> Pyflink gateway server launch fails when purelib != platlib
> ---
>
> Key: FLINK-32136
> URL: https://issues.apache.org/jira/browse/FLINK-32136
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.13.3
>Reporter: William Ashley
>Priority: Major
>
> On distros where python's {{purelib}} is different than {{platlib}} (e.g. 
> Amazon Linux 2, but from my research it's all of the Redhat-based ones), you 
> wind up with components of packages being installed across two different 
> locations (e.g. {{/usr/local/lib/python3.7/site-packages/pyflink}} and 
> {{{}/usr/local/lib64/python3.7/site-packages/pyflink{}}}).
> {{_find_flink_home}} 
> [handles|https://github.com/apache/flink/blob/06688f345f6793a8964ec2175f44cda13c33/flink-python/pyflink/find_flink_home.py#L58C63-L60]
>  this, and in flink releases <= 1.13.2 its setting of the {{FLINK_LIB_DIR}} 
> environment variable was the one being used. However, from 1.13.3, a 
> refactoring of {{launch_gateway_server_process}} 
> ([1.13.2,|https://github.com/apache/flink/blob/release-1.13.2/flink-python/pyflink/pyflink_gateway_server.py#L200]
>  
> [1.13.3|https://github.com/apache/flink/blob/release-1.13.3/flink-python/pyflink/pyflink_gateway_server.py#L280])
>  re-ordered some method calls. {{{}prepare_environment_variable{}}}'s 
> [non-awareness|https://github.com/apache/flink/blob/release-1.13.3/flink-python/pyflink/pyflink_gateway_server.py#L94C67-L95]
>  of multiple homes and setting of {{FLINK_LIB_DIR}} now is the one that 
> matters, and it is the incorrect location.
> I've confirmed this problem on Amazon Linux 2 and 2023. The problem does not 
> exist on, for example, Ubuntu 20 and 22 (for which {{platlib}} == 
> {{{}purelib{}}}).
> Repro steps on Amazon Linux 2
> {quote}{{yum -y install python3 java-11}}
> {{pip3 install apache-flink==1.13.3}}
> {{python3 -c 'from pyflink.table import EnvironmentSettings ; 
> EnvironmentSettings.new_instance()'}}
> {quote}
> The resulting error is
> {quote}{{The flink-python jar is not found in the opt folder of the 
> FLINK_HOME: /usr/local/lib64/python3.7/site-packages/pyflink}}
> {{Error: Could not find or load main class 
> org.apache.flink.client.python.PythonGatewayServer}}
> {{Caused by: java.lang.ClassNotFoundException: 
> org.apache.flink.client.python.PythonGatewayServer}}
> {{Traceback (most recent call last):}}
> {{  File "", line 1, in }}
> {{  File 
> "/usr/local/lib64/python3.7/site-packages/pyflink/table/environment_settings.py",
>  line 214, in new_instance}}
> {{    return EnvironmentSettings.Builder()}}
> {{  File 
> "/usr/local/lib64/python3.7/site-packages/pyflink/table/environment_settings.py",
>  line 48, in __init__}}
> {{    gateway = get_gateway()}}
> {{  File "/usr/local/lib64/python3.7/site-packages/pyflink/java_gateway.py", 
> line 62, in get_gateway}}
> {{    _gateway = launch_gateway()}}
> {{  File "/usr/local/lib64/python3.7/site-packages/pyflink/java_gateway.py", 
> line 112, in launch_gateway}}
> {{    raise Exception("Java gateway process exited before sending its port 
> number")}}
> {{Exception: Java gateway process exited before sending its port number}}
> {quote}
> The flink home under /lib64/ does not contain the jar, but it is in the /lib/ 
> location
> {quote}{{bash-4.2# find /usr/local/lib64/python3.7/site-packages/pyflink 
> -name "flink-python*.jar"}}
> {{bash-4.2# find /usr/local/lib/python3.7/site-packages/pyflink -name 
> "flink-python*.jar"}}
> {{/usr/local/lib/python3.7/site-packages/pyflink/opt/flink-python_2.11-1.13.3.jar}}
> {quote}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-32040) The WatermarkStrategy defined with the Function(with_idleness) report an error

2023-06-14 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu updated FLINK-32040:

Affects Version/s: 1.17.0
   1.16.0

> The WatermarkStrategy defined with the Function(with_idleness) report an error
> --
>
> Key: FLINK-32040
> URL: https://issues.apache.org/jira/browse/FLINK-32040
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.16.0, 1.17.0
>Reporter: Joekwal
>Priority: Major
>
> *Version:* upgraded from pyflink 1.15.2 to pyflink 1.16.1
>  
> *Reported error:*
> Record has Java Long.MIN_VALUE timestamp (= no timestamp marker). Is the time 
> characteristic set to 'ProcessingTime', or did you forget to call 
> 'data_stream.assign_timestamps_and_watermarks(...)'?
> The application never reported this error while running on version 1.15.2.
>  
> *Example:*
> {code:java}
> ```python```
> class MyTimestampAssigner(TimestampAssigner):    
>def extract_timestamp(self, value, record_timestamp: int) -> int:        
>return value['version']
> sql="""
> select columns,version(milliseconds) from kafka_source
> """
> table = st_env.sql_query(sql)
> stream = st_env.to_changelog_stream(table)
> stream = stream.assign_timestamps_and_watermarks(            
> WatermarkStrategy.for_bounded_out_of_orderness(Duration.of_minutes(1))        
>     
> .with_timestamp_assigner(MyTimestampAssigner()).with_idleness(Duration.of_seconds(10)))
> stream = stream.key_by(CommonKeySelector()) \
> .window(TumblingEventTimeWindows.of(Time.days(1), Time.hours(-8))) \
> .process(WindowFunction(), typeInfo){code}
>  
> Debugging into 
> ??pyflink.datastream.data_stream.DataStream.assign_timestamps_and_watermarks??
>  shows that ??watermark_strategy._timestamp_assigner?? is None.
> *Solution:*
> Remove the function ??with_idleness(Duration.of_seconds(10))??
> {code:java}
> stream = stream.assign_timestamps_and_watermarks(
> WatermarkStrategy.for_bounded_out_of_orderness(Duration.of_minutes(1))
> .with_timestamp_assigner(MyTimestampAssigner())) {code}
> Is this a bug?
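A condensed, self-contained sketch of the report above (illustrative only; the attribute name and the behaviour on the affected versions are taken from the ticket, not independently verified):

{code:python}
# Condensed repro sketch of the report above (PyFlink 1.16.x): after
# with_idleness(), the Python-side timestamp assigner reference is gone.
from pyflink.common import Duration, WatermarkStrategy
from pyflink.common.watermark_strategy import TimestampAssigner


class MyTimestampAssigner(TimestampAssigner):
    def extract_timestamp(self, value, record_timestamp: int) -> int:
        return value['version']


ws = WatermarkStrategy.for_bounded_out_of_orderness(Duration.of_minutes(1)) \
    .with_timestamp_assigner(MyTimestampAssigner()) \
    .with_idleness(Duration.of_seconds(10))

# On affected versions this prints None, so extract_timestamp is never called
# and records keep the Long.MIN_VALUE timestamp downstream.
print(ws._timestamp_assigner)
{code}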



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-32040) The WatermarkStrategy defined with the Function(with_idleness) report an error

2023-06-14 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732524#comment-17732524
 ] 

Dian Fu commented on FLINK-32040:
-

Good catch! I think you are right that this is a bug. cc [~Juntao Hu] 

> The WatermarkStrategy defined with the Function(with_idleness) report an error
> --
>
> Key: FLINK-32040
> URL: https://issues.apache.org/jira/browse/FLINK-32040
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Reporter: Joekwal
>Priority: Major
>
> *Version:* upgraded from pyflink 1.15.2 to pyflink 1.16.1
>  
> *Reported error:*
> Record has Java Long.MIN_VALUE timestamp (= no timestamp marker). Is the time 
> characteristic set to 'ProcessingTime', or did you forget to call 
> 'data_stream.assign_timestamps_and_watermarks(...)'?
> The application never reported this error while running on version 1.15.2.
>  
> *Example:*
> {code:java}
> ```python```
> class MyTimestampAssigner(TimestampAssigner):    
>def extract_timestamp(self, value, record_timestamp: int) -> int:        
>return value['version']
> sql="""
> select columns,version(milliseconds) from kafka_source
> """
> table = st_env.sql_query(sql)
> stream = st_env.to_changelog_stream(table)
> stream = stream.assign_timestamps_and_watermarks(            
> WatermarkStrategy.for_bounded_out_of_orderness(Duration.of_minutes(1))        
>     
> .with_timestamp_assigner(MyTimestampAssigner()).with_idleness(Duration.of_seconds(10)))
> stream = stream.key_by(CommonKeySelector()) \
> .window(TumblingEventTimeWindows.of(Time.days(1), Time.hours(-8))) \
> .process(WindowFunction(), typeInfo){code}
>  
> Debugging into 
> ??pyflink.datastream.data_stream.DataStream.assign_timestamps_and_watermarks??
>  shows that ??watermark_strategy._timestamp_assigner?? is None.
> *Solution:*
> Remove the function ??with_idleness(Duration.of_seconds(10))??
> {code:java}
> stream = stream.assign_timestamps_and_watermarks(
> WatermarkStrategy.for_bounded_out_of_orderness(Duration.of_minutes(1))
> .with_timestamp_assigner(MyTimestampAssigner())) {code}
> Is this a bug?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-32040) The WatermarkStrategy defined with the Function(with_idleness) report an error

2023-06-14 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu updated FLINK-32040:

Priority: Major  (was: Blocker)

> The WatermarkStrategy defined with the Function(with_idleness) report an error
> --
>
> Key: FLINK-32040
> URL: https://issues.apache.org/jira/browse/FLINK-32040
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Reporter: Joekwal
>Priority: Major
>
> *Version:* upgraded from pyflink 1.15.2 to pyflink 1.16.1
>  
> *Reported error:*
> Record has Java Long.MIN_VALUE timestamp (= no timestamp marker). Is the time 
> characteristic set to 'ProcessingTime', or did you forget to call 
> 'data_stream.assign_timestamps_and_watermarks(...)'?
> The application never reported this error while running on version 1.15.2.
>  
> *Example:*
> {code:java}
> ```python```
> class MyTimestampAssigner(TimestampAssigner):    
>def extract_timestamp(self, value, record_timestamp: int) -> int:        
>return value['version']
> sql="""
> select columns,version(milliseconds) from kafka_source
> """
> table = st_env.sql_query(sql)
> stream = st_env.to_changelog_stream(table)
> stream = stream.assign_timestamps_and_watermarks(            
> WatermarkStrategy.for_bounded_out_of_orderness(Duration.of_minutes(1))        
>     
> .with_timestamp_assigner(MyTimestampAssigner()).with_idleness(Duration.of_seconds(10)))
> stream = stream.key_by(CommonKeySelector()) \
> .window(TumblingEventTimeWindows.of(Time.days(1), Time.hours(-8))) \
> .process(WindowFunction(), typeInfo){code}
>  
> Debugging into 
> ??pyflink.datastream.data_stream.DataStream.assign_timestamps_and_watermarks??
>  shows that ??watermark_strategy._timestamp_assigner?? is None.
> *Solution:*
> Remove the function ??with_idleness(Duration.of_seconds(10))??
> {code:java}
> stream = stream.assign_timestamps_and_watermarks(
> WatermarkStrategy.for_bounded_out_of_orderness(Duration.of_minutes(1))
> .with_timestamp_assigner(MyTimestampAssigner())) {code}
> Is this a bug?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-32034) Python's DistUtils is deprecated as of 3.10

2023-06-14 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732518#comment-17732518
 ] 

Dian Fu commented on FLINK-32034:
-

[~coltenp] Thanks for reporting this issue and the finding. Would you like to 
create a PR for this issue?

> Python's DistUtils is deprecated as of 3.10
> ---
>
> Key: FLINK-32034
> URL: https://issues.apache.org/jira/browse/FLINK-32034
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.17.0
> Environment: Kubernetes
> Java 11
> Python 3.10.9
>Reporter: Colten Pilgreen
>Priority: Minor
> Attachments: get_site_packages_path_script.py, 
> get_site_packages_path_script_shortened.py
>
>
> I recently went through an upgrade from 1.13 to 1.17, and along with that 
> I upgraded the Python version on our Flink session server. Almost everything 
> in our workflow works, except for Python dependency management.
> After some digging, I found that the cause is the DeprecationWarning that is 
> printed when trying to get the site-packages path. The script is 
> GET_SITE_PACKAGES_PATH_SCRIPT and it is executed in the getSitePackagesPath 
> method of the PythonEnvironmentManagerUtils class. The issue is that the 
> DeprecationWarning is included in the PYTHONPATH environment variable that 
> is passed to the Beam runner. The warning text breaks Python's ability to 
> find the site-packages because it contains characters that are not allowed 
> in filesystem paths.
>  
> Example of the PYTHONPATH environment variable:
> PYTHONPATH == :1: DeprecationWarning: The distutils package is 
> deprecated and slated for removal in Python 3.12. Use setuptools or check PEP 
> 632 for potential 
> alternatives:/tmp/python-dist-c63e1464-925c-4289-bb71-c6f50e83186f/python-requirements/lib/python3.10/site-packages
> HADOOP_CONF_DIR == /opt/flink/conf



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-32207) Error import Pyflink.table.descriptors due to python3.10 version mismatch

2023-06-14 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu closed FLINK-32207.
---
Resolution: Duplicate

> Error import Pyflink.table.descriptors due to python3.10 version mismatch
> -
>
> Key: FLINK-32207
> URL: https://issues.apache.org/jira/browse/FLINK-32207
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.17.0
> Environment: Colab
> Python3.10 
>Reporter: Alireza Omidvar
>Priority: Major
> Fix For: 1.17.1
>
> Attachments: image (1).png, image (2).png
>
>
> Gentlemen,
>  
>  
>  
> I have a problem with some apache-flink modules. I am running apache-flink 
> 1.17.0 and writing test code in Colab, and I ran into a problem importing modules 
>  
>  
>  
>  
> from pyflink.table import DataTypes 
>  
> from pyflink.table.descriptors import Schema, Kafka, Json, Rowtime 
>  
> from pyflink.table.catalog import FileSystem 
>  
>  
>  
>  
> not working for me (python version 3.10) 
>  
>  
> Any help is highly appreciated. The strange thing is that other modules import 
> fine. I checked your GitHub sources and could not find these classes there 
> either, which means they are no longer inside your descriptors.py. I think the 
> connectors need to be installed, but that failed too. 
>  
>  
>  
>  
> Please see the link below: 
>  
>  
>  
>  
>  
> [https://github.com/aomidvar/scrapper-price-comparison/blob/d8a10f74101bf96974e769813c33b83d7a71f02b/kafkaconsumer1.ipynb]
>  
>  
>  
>  
> I am running a test after producing the stream 
> ([https://github.com/aomidvar/scrapper-price-comparison/blob/main/kafkaproducer1.ipynb])
>  to Confluent server and I like to do a flink job but the above mentioned 
> modules are not found with the following links in collab:
>  
>  
> That is not probably a bug. Only version of apache-flink now working on colab 
> is 1.17.0. I prefer 3.10 but installed a virtual python 3.8 env and between 
> different modules found out that Kafka and Json modules are not in 
> descriptors.py of version 1.17 Apache-flink default. But modules exist in 
> Apache-flink 1.13 version.
> [https://colab.research.google.com/drive/1aHKv8WA6RA10zTdwdzUubB5K0anEmOws?usp=sharing]
> [https://colab.research.google.com/drive/1eCHJlsb8AjdmJtPc95X3H4btmFVSoCL4?usp=sharing]
>  
> I've got this error for Json, Kafka ...
> ---
>  
>  ImportError                               Traceback (most recent call last)
> in <module>()
>       1 from pyflink.table import DataTypes
> ----> 2 from pyflink.table.descriptors import Schema, Kafka, Json, Rowtime
>       3 from pyflink.table.catalog import FileSystem
>  
> ImportError: cannot import name 'Kafka' from 'pyflink.table.descriptors' 
> (/usr/local/lib/python3.10/dist-packages/pyflink/table/descriptors.py) 
>  
> ---
>  
>  NOTE: If your import is failing due to a missing package, you can manually 
> install dependencies using either !pip or !apt. To view examples of 
> installing some common dependencies, click the "Open Examples" button below.
>  
>  ---
>  
> I suspect the current error is related to the version and dependencies. 
>  
> In that case I have to ask the developers: if I switch to a Python 3.8 env, is 
> it possible to get this solved?
>  
>  
> Thanks for your time ,
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-32207) Error import Pyflink.table.descriptors due to python3.10 version mismatch

2023-06-14 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732515#comment-17732515
 ] 

Dian Fu commented on FLINK-32207:
-

This seems to be a duplicate of FLINK-32206. Closing this ticket~. Feel free to 
reopen it if I misunderstood the problem.

> Error import Pyflink.table.descriptors due to python3.10 version mismatch
> -
>
> Key: FLINK-32207
> URL: https://issues.apache.org/jira/browse/FLINK-32207
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.17.0
> Environment: Colab
> Python3.10 
>Reporter: Alireza Omidvar
>Priority: Major
> Fix For: 1.17.1
>
> Attachments: image (1).png, image (2).png
>
>
> Gentlemen,
>  
>  
>  
> I have a problem with some apache-flink modules. I am running apache-flink 
> 1.17.0 and writing test code in Colab, and I ran into a problem importing modules 
>  
>  
>  
>  
> from pyflink.table import DataTypes 
>  
> from pyflink.table.descriptors import Schema, Kafka, Json, Rowtime 
>  
> from pyflink.table.catalog import FileSystem 
>  
>  
>  
>  
> not working for me (python version 3.10) 
>  
>  
> Any help is highly appreciated. The strange thing is that other modules import 
> fine. I checked your GitHub sources and could not find these classes there 
> either, which means they are no longer inside your descriptors.py. I think the 
> connectors need to be installed, but that failed too. 
>  
>  
>  
>  
> Please see the link below: 
>  
>  
>  
>  
>  
> [https://github.com/aomidvar/scrapper-price-comparison/blob/d8a10f74101bf96974e769813c33b83d7a71f02b/kafkaconsumer1.ipynb]
>  
>  
>  
>  
> I am running a test after producing the stream 
> ([https://github.com/aomidvar/scrapper-price-comparison/blob/main/kafkaproducer1.ipynb])
>  to Confluent server and I like to do a flink job but the above mentioned 
> modules are not found with the following links in collab:
>  
>  
> That is not probably a bug. Only version of apache-flink now working on colab 
> is 1.17.0. I prefer 3.10 but installed a virtual python 3.8 env and between 
> different modules found out that Kafka and Json modules are not in 
> descriptors.py of version 1.17 Apache-flink default. But modules exist in 
> Apache-flink 1.13 version.
> [https://colab.research.google.com/drive/1aHKv8WA6RA10zTdwdzUubB5K0anEmOws?usp=sharing]
> [https://colab.research.google.com/drive/1eCHJlsb8AjdmJtPc95X3H4btmFVSoCL4?usp=sharing]
>  
> I've got this error for Json, Kafka ...
> ---
>  
>  ImportError                               Traceback (most recent call last)
> in <module>()
>       1 from pyflink.table import DataTypes
> ----> 2 from pyflink.table.descriptors import Schema, Kafka, Json, Rowtime
>       3 from pyflink.table.catalog import FileSystem
>  
> ImportError: cannot import name 'Kafka' from 'pyflink.table.descriptors' 
> (/usr/local/lib/python3.10/dist-packages/pyflink/table/descriptors.py) 
>  
> ---
>  
>  NOTE: If your import is failing due to a missing package, you can manually 
> install dependencies using either !pip or !apt. To view examples of 
> installing some common dependencies, click the "Open Examples" button below.
>  
>  ---
>  
> I suspect the current error is related to the version and dependencies. 
>  
> In that case I have to ask the developers: if I switch to a Python 3.8 env, is 
> it possible to get this solved?
>  
>  
> Thanks for your time ,
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-32206) ModuleNotFoundError for Pyflink.table.descriptors wheel version mismatch

2023-06-14 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732513#comment-17732513
 ] 

Dian Fu commented on FLINK-32206:
-

 'Kafka' has already been removed from 'pyflink.table.descriptors'. I guess you 
are referring to an outdated example.
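
For anyone hitting the same ImportError: the removed descriptor classes were 
superseded by the builder-style TableDescriptor/Schema API. Below is a minimal 
sketch of that replacement, assuming the Kafka SQL connector jar is on the 
classpath; the topic, server and column names are illustrative placeholders only.

{code:python}
# Sketch of the API that replaced the removed Kafka/Json descriptor classes.
# Connector options, topic and column names below are illustrative only.
from pyflink.table import (DataTypes, EnvironmentSettings, Schema,
                           TableDescriptor, TableEnvironment)

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

t_env.create_temporary_table(
    "orders",
    TableDescriptor.for_connector("kafka")
    .schema(Schema.new_builder()
            .column("user_name", DataTypes.STRING())
            .column("amount", DataTypes.DOUBLE())
            .build())
    .option("topic", "orders")
    .option("properties.bootstrap.servers", "localhost:9092")
    .option("scan.startup.mode", "earliest-offset")
    .format("json")  # replaces the old Json descriptor
    .build())
{code}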

> ModuleNotFoundError for Pyflink.table.descriptors wheel version mismatch
> 
>
> Key: FLINK-32206
> URL: https://issues.apache.org/jira/browse/FLINK-32206
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python, Connectors / Kafka, Connectors / MongoDB
>Affects Versions: 1.17.0
>Reporter: Alireza Omidvar
>Priority: Major
> Attachments: image (1).png, image (2).png
>
>
> Gentlemen,
> I have a problem with some apache-flink modules. I am running apache-flink 
> 1.17.0 and writing test code in Colab, and I ran into a problem importing the 
> Kafka, Json and FileSystem modules 
>  
> from pyflink.table.descriptors import Schema, Kafka, Json, Rowtime 
> from pyflink.table.catalog import FileSystem 
>  
> not working for me (python version 3.10) 
>  
> Any help is highly appreciated. The strange thing is that other modules import 
> fine. I checked your GitHub and didn't find these classes in the official 
> version either, which means they are no longer inside descriptors.py in the 
> newer versions. 
>  
> Please see the link below: 
>  
> [https://github.com/aomidvar/scrapper-price-comparison/blob/d8a10f74101bf96974e769813c33b83d7a71f02b/kafkaconsumer1.ipynb]
>  
>  
> I am running a test after producing the stream 
> ([https://github.com/aomidvar/scrapper-price-comparison/blob/main/kafkaproducer1.ipynb])
>  to Confluent server and I like to do a flink job but the above mentioned 
> modules are not found with the following links in collab:
>  
> That is probably an easy bug to fix. The only version of apache-flink currently 
> working on Colab is 1.17.0. I prefer Python 3.10, but I also installed a 
> virtual Python 3.8 env and, comparing the different modules, found that the 
> Kafka and Json classes are not in descriptors.py of Apache-flink 1.17 by 
> default, although they do exist in Apache-flink 1.13.
> [https://colab.research.google.com/drive/1aHKv8WA6RA10zTdwdzUubB5K0anEmOws?usp=sharing]
> [https://colab.research.google.com/drive/1eCHJlsb8AjdmJtPc95X3H4btmFVSoCL4?usp=sharing]
>  
> I've got this error for Json, Kafka ...
> ---
>  
>  ImportError Traceback (most recent call last)  
> in () 1 from pyflink.table import DataTypes > 2 from 
> pyflink.table.descriptors import Schema, Kafka, Json, Rowtime 3 from 
> pyflink.table.catalog import FileSystem ImportError: cannot import name 
> 'Kafka' from 'pyflink.table.descriptors' 
> (/usr/local/lib/python3.10/dist-packages/pyflink/table/descriptors.py) 
>  
> ---
>  
>  NOTE: If your import is failing due to a missing package, you can manually 
> install dependencies using either !pip or !apt. To view examples of 
> installing some common dependencies, click the "Open Examples" button below.
>  
>  ---
>  
> I suspect the current error is related to the version and dependencies.
>  
> I would like to ask the developers: if I switch to a Python 3.8 environment, 
> could this be resolved?
>  
>  
> Thanks for your time ,
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (FLINK-31916) Python API only respects deprecated env.java.opts key

2023-06-14 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu resolved FLINK-31916.
-
Resolution: Fixed

Merged to master via 626d70d86d3bee944edf548606b38042f31e09b6

> Python API only respects deprecated env.java.opts key
> -
>
> Key: FLINK-31916
> URL: https://issues.apache.org/jira/browse/FLINK-31916
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Python, Runtime / Configuration
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.0
>
>
> pyflink_gateway_server.py is only reading the deprecated env.java.opts from 
> the configuration.
> This key should only be used as a fallback, with env.java.opts.tm/jm/client 
> being the actual keys to support.
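
A minimal sketch of the intended fallback behaviour (not the actual 
pyflink_gateway_server.py code); the helper name and the plain-dict 
configuration are assumptions for illustration:

{code:python}
# Prefer the scoped client key; use the deprecated umbrella key only as a fallback.
def resolve_client_java_opts(config: dict) -> str:
    return (config.get("env.java.opts.client")
            or config.get("env.java.opts", ""))  # deprecated key as fallback

assert resolve_client_java_opts({"env.java.opts": "-Xmx1g"}) == "-Xmx1g"
assert resolve_client_java_opts({"env.java.opts.client": "-Dfoo=1",
                                 "env.java.opts": "-Xmx1g"}) == "-Dfoo=1"
{code}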



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-32056) Update the used Pulsar connector in flink-python to 4.0.0

2023-05-23 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu closed FLINK-32056.
---
Fix Version/s: 1.18.0
   1.17.2
   Resolution: Fixed

Fixed in:
- master via fbf7b91424ec626ae56dd2477347a7759db6d5fe
- release-1.17 via d3a3755a7eef5708871580671169fd6bd2babf28

> Update the used Pulsar connector in flink-python to 4.0.0
> -
>
> Key: FLINK-32056
> URL: https://issues.apache.org/jira/browse/FLINK-32056
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python, Connectors / Pulsar
>Affects Versions: 1.18.0, 1.17.1
>Reporter: Martijn Visser
>Assignee: Martijn Visser
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.0, 1.17.2
>
>
> flink-python still references and tests flink-connector-pulsar:3.0.0, while 
> it should be using flink-connector-pulsar:4.0.0. That's because the newer 
> version is the only version compatible with Flink 1.17 and it doesn't rely on 
> flink-shaded. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-32131) Support specifying hadoop config dir for Python HiveCatalog

2023-05-21 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu closed FLINK-32131.
---
Resolution: Fixed

Merged to master via 5a89d22c146f451f12fd0c6de64804d315e1f4b6

> Support specifying hadoop config dir for Python HiveCatalog
> ---
>
> Key: FLINK-32131
> URL: https://issues.apache.org/jira/browse/FLINK-32131
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python
>Reporter: Dian Fu
>Assignee: Dian Fu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.0
>
>
> The Hadoop config directory can be specified for HiveCatalog in Java; however, 
> this is still not supported in the Python HiveCatalog. This JIRA is to align them.
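
A sketch of what the aligned Python API could look like; the hadoop_conf_dir 
parameter name is an assumption mirroring the Java constructor's hadoopConfDir 
argument, and the paths are placeholders:

{code:python}
from pyflink.table import EnvironmentSettings, TableEnvironment
from pyflink.table.catalog import HiveCatalog

t_env = TableEnvironment.create(EnvironmentSettings.in_batch_mode())
catalog = HiveCatalog(
    "my_hive",
    default_database="default",
    hive_conf_dir="/opt/hive/conf",
    # the option this ticket adds; the keyword name is assumed
    hadoop_conf_dir="/opt/hadoop/etc/hadoop")
t_env.register_catalog("my_hive", catalog)
{code}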



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-32131) Support specifying hadoop config dir for Python HiveCatalog

2023-05-18 Thread Dian Fu (Jira)
Dian Fu created FLINK-32131:
---

 Summary: Support specifying hadoop config dir for Python 
HiveCatalog
 Key: FLINK-32131
 URL: https://issues.apache.org/jira/browse/FLINK-32131
 Project: Flink
  Issue Type: Improvement
  Components: API / Python
Reporter: Dian Fu
Assignee: Dian Fu
 Fix For: 1.18.0


The Hadoop config directory can be specified for HiveCatalog in Java; however, 
this is still not supported in the Python HiveCatalog. This JIRA is to align them.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-20794) Support to select distinct columns in the Table API

2023-05-14 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu closed FLINK-20794.
---
Fix Version/s: (was: 1.18.0)
   Resolution: Not A Problem

> Support to select distinct columns in the Table API
> ---
>
> Key: FLINK-20794
> URL: https://issues.apache.org/jira/browse/FLINK-20794
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / API
>Reporter: Dian Fu
>Assignee: Vishaal Selvaraj
>Priority: Major
> Attachments: screenshot-1.png
>
>
> Currently, there is no corresponding functionality in Table API for the 
> following SQL:
> {code:java}
> SELECT DISTINCT users FROM Orders
> {code}
> For example, for the following job:
> {code:java}
> table.select("distinct a")
> {code}
> It will throw the following exception:
> {code:java}
> org.apache.flink.table.api.ExpressionParserException: Could not parse 
> expression at column 10: ',' expected but 'a' 
> foundorg.apache.flink.table.api.ExpressionParserException: Could not parse 
> expression at column 10: ',' expected but 'a' founddistinct a         ^
>  at 
> org.apache.flink.table.expressions.PlannerExpressionParserImpl$.throwError(PlannerExpressionParserImpl.scala:726)
>  at 
> org.apache.flink.table.expressions.PlannerExpressionParserImpl$.parseExpressionList(PlannerExpressionParserImpl.scala:710)
>  at 
> org.apache.flink.table.expressions.PlannerExpressionParserImpl.parseExpressionList(PlannerExpressionParserImpl.scala:47)
>  at 
> org.apache.flink.table.expressions.ExpressionParser.parseExpressionList(ExpressionParser.java:40)
>  at 
> org.apache.flink.table.api.internal.TableImpl.select(TableImpl.java:121){code}
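
With the expression DSL the same query can be written without string 
expressions. A minimal sketch, using an in-memory table in place of the Orders 
source from the ticket:

{code:python}
from pyflink.table import EnvironmentSettings, TableEnvironment
from pyflink.table.expressions import col

t_env = TableEnvironment.create(EnvironmentSettings.in_batch_mode())
orders = t_env.from_elements([("alice",), ("bob",), ("alice",)], ["users"])

# Equivalent of SELECT DISTINCT users FROM Orders
distinct_users = orders.select(col("users")).distinct()
distinct_users.execute().print()
{code}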



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-20794) Support to select distinct columns in the Table API

2023-05-14 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17722623#comment-17722623
 ] 

Dian Fu commented on FLINK-20794:
-

Thanks for the analysis. Since the string-based interfaces have been removed, I 
guess we could close this ticket~

> Support to select distinct columns in the Table API
> ---
>
> Key: FLINK-20794
> URL: https://issues.apache.org/jira/browse/FLINK-20794
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / API
>Reporter: Dian Fu
>Assignee: Vishaal Selvaraj
>Priority: Major
> Fix For: 1.18.0
>
> Attachments: screenshot-1.png
>
>
> Currently, there is no corresponding functionality in Table API for the 
> following SQL:
> {code:java}
> SELECT DISTINCT users FROM Orders
> {code}
> For example, for the following job:
> {code:java}
> table.select("distinct a")
> {code}
> It will throw the following exception:
> {code:java}
> org.apache.flink.table.api.ExpressionParserException: Could not parse 
> expression at column 10: ',' expected but 'a' 
> foundorg.apache.flink.table.api.ExpressionParserException: Could not parse 
> expression at column 10: ',' expected but 'a' founddistinct a         ^
>  at 
> org.apache.flink.table.expressions.PlannerExpressionParserImpl$.throwError(PlannerExpressionParserImpl.scala:726)
>  at 
> org.apache.flink.table.expressions.PlannerExpressionParserImpl$.parseExpressionList(PlannerExpressionParserImpl.scala:710)
>  at 
> org.apache.flink.table.expressions.PlannerExpressionParserImpl.parseExpressionList(PlannerExpressionParserImpl.scala:47)
>  at 
> org.apache.flink.table.expressions.ExpressionParser.parseExpressionList(ExpressionParser.java:40)
>  at 
> org.apache.flink.table.api.internal.TableImpl.select(TableImpl.java:121){code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-31968) [PyFlink] 1.17.0 version on M1 processor

2023-05-04 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17719599#comment-17719599
 ] 

Dian Fu edited comment on FLINK-31968 at 5/5/23 2:27 AM:
-

[~gradysnik] Thanks for the confirmation. I'm closing this ticket since it 
should have been addressed in FLINK-28786. Feel free to reopen it if this issue 
still happens after 1.17.1.


was (Author: dianfu):
[~gradysnik] Thanks for the confirmation. I'm closing this ticket since it has 
been addressed in FLINK-28786. Feel free to reopen it if this issue still 
happens after 1.17.1.

> [PyFlink] 1.17.0 version on M1 processor
> 
>
> Key: FLINK-31968
> URL: https://issues.apache.org/jira/browse/FLINK-31968
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.17.0
>Reporter: Yury Smirnov
>Priority: Major
> Attachments: 1170_output.log
>
>
> PyFlink version 1.17.0
> NumPy version 1.21.4 and 1.21.6
> Python versions 3.8, 3.9, 3.9
> Slack thread: 
> [https://apache-flink.slack.com/archives/C03G7LJTS2G/p1682508650487979] 
> While running any of [PyFlink 
> Examples|https://github.com/apache/flink/blob/release-1.17/flink-python/pyflink/examples]
>  , getting error:
> {code:java}
> /Users/ysmirnov/Documents/Repos/flink-python-jobs/venv/bin/python 
> /Users/ysmirnov/Documents/Repos/flink-python-jobs/word_count/word_count.py 
> Traceback (most recent call last):
>   File 
> "/Users/ysmirnov/Documents/Repos/flink-python-jobs/word_count/word_count.py", 
> line 76, in 
>     basic_operations()
>   File 
> "/Users/ysmirnov/Documents/Repos/flink-python-jobs/word_count/word_count.py", 
> line 53, in basic_operations
>     show(ds.map(update_tel), env)
>   File 
> "/Users/ysmirnov/Documents/Repos/flink-python-jobs/word_count/word_count.py", 
> line 28, in show
>     env.execute()
>   File 
> "/Users/ysmirnov/Documents/Repos/flink-python-jobs/venv/lib/python3.9/site-packages/pyflink/datastream/stream_execution_environment.py",
>  line 764, in execute
>     return 
> JobExecutionResult(self._j_stream_execution_environment.execute(j_stream_graph))
>   File 
> "/Users/ysmirnov/Documents/Repos/flink-python-jobs/venv/lib/python3.9/site-packages/py4j/java_gateway.py",
>  line 1322, in __call__
>     return_value = get_return_value(
>   File 
> "/Users/ysmirnov/Documents/Repos/flink-python-jobs/venv/lib/python3.9/site-packages/pyflink/util/exceptions.py",
>  line 146, in deco
>     return f(*a, **kw)
>   File 
> "/Users/ysmirnov/Documents/Repos/flink-python-jobs/venv/lib/python3.9/site-packages/py4j/protocol.py",
>  line 326, in get_return_value
>     raise Py4JJavaError(
> py4j.protocol.Py4JJavaError: An error occurred while calling o10.execute.
> : org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
>     at 
> org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144)
>     at 
> org.apache.flink.runtime.minicluster.MiniClusterJobClient.lambda$getJobExecutionResult$3(MiniClusterJobClient.java:141)
>     at 
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
>     at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
>     at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
>     at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
>     at 
> org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$1(AkkaInvocationHandler.java:267)
>     at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
>     at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
>     at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
>     at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
>     at 
> org.apache.flink.util.concurrent.FutureUtils.doForward(FutureUtils.java:1277)
>     at 
> org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$null$1(ClassLoadingUtils.java:93)
>     at 
> org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68)
>     at 
> org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$2(ClassLoadingUtils.java:92)
>     at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
>     at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
>     at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
>     at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
>     at 
> 

[jira] [Commented] (FLINK-31968) [PyFlink] 1.17.0 version on M1 processor

2023-05-04 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17719599#comment-17719599
 ] 

Dian Fu commented on FLINK-31968:
-

[~gradysnik] Thanks for the confirmation. I'm closing this ticket since it has 
been addressed in FLINK-28786. Feel free to reopen it if this issue still 
happens after 1.17.1.

> [PyFlink] 1.17.0 version on M1 processor
> 
>
> Key: FLINK-31968
> URL: https://issues.apache.org/jira/browse/FLINK-31968
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.17.0
>Reporter: Yury Smirnov
>Priority: Major
> Attachments: 1170_output.log
>
>
> PyFlink version 1.17.0
> NumPy version 1.21.4 and 1.21.6
> Python versions 3.8, 3.9, 3.9
> Slack thread: 
> [https://apache-flink.slack.com/archives/C03G7LJTS2G/p1682508650487979] 
> While running any of [PyFlink 
> Examples|https://github.com/apache/flink/blob/release-1.17/flink-python/pyflink/examples]
>  , getting error:
> {code:java}
> /Users/ysmirnov/Documents/Repos/flink-python-jobs/venv/bin/python 
> /Users/ysmirnov/Documents/Repos/flink-python-jobs/word_count/word_count.py 
> Traceback (most recent call last):
>   File 
> "/Users/ysmirnov/Documents/Repos/flink-python-jobs/word_count/word_count.py", 
> line 76, in 
>     basic_operations()
>   File 
> "/Users/ysmirnov/Documents/Repos/flink-python-jobs/word_count/word_count.py", 
> line 53, in basic_operations
>     show(ds.map(update_tel), env)
>   File 
> "/Users/ysmirnov/Documents/Repos/flink-python-jobs/word_count/word_count.py", 
> line 28, in show
>     env.execute()
>   File 
> "/Users/ysmirnov/Documents/Repos/flink-python-jobs/venv/lib/python3.9/site-packages/pyflink/datastream/stream_execution_environment.py",
>  line 764, in execute
>     return 
> JobExecutionResult(self._j_stream_execution_environment.execute(j_stream_graph))
>   File 
> "/Users/ysmirnov/Documents/Repos/flink-python-jobs/venv/lib/python3.9/site-packages/py4j/java_gateway.py",
>  line 1322, in __call__
>     return_value = get_return_value(
>   File 
> "/Users/ysmirnov/Documents/Repos/flink-python-jobs/venv/lib/python3.9/site-packages/pyflink/util/exceptions.py",
>  line 146, in deco
>     return f(*a, **kw)
>   File 
> "/Users/ysmirnov/Documents/Repos/flink-python-jobs/venv/lib/python3.9/site-packages/py4j/protocol.py",
>  line 326, in get_return_value
>     raise Py4JJavaError(
> py4j.protocol.Py4JJavaError: An error occurred while calling o10.execute.
> : org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
>     at 
> org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144)
>     at 
> org.apache.flink.runtime.minicluster.MiniClusterJobClient.lambda$getJobExecutionResult$3(MiniClusterJobClient.java:141)
>     at 
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
>     at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
>     at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
>     at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
>     at 
> org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$1(AkkaInvocationHandler.java:267)
>     at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
>     at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
>     at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
>     at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
>     at 
> org.apache.flink.util.concurrent.FutureUtils.doForward(FutureUtils.java:1277)
>     at 
> org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$null$1(ClassLoadingUtils.java:93)
>     at 
> org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68)
>     at 
> org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$2(ClassLoadingUtils.java:92)
>     at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
>     at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
>     at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
>     at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
>     at 
> org.apache.flink.runtime.concurrent.akka.AkkaFutureUtils$1.onComplete(AkkaFutureUtils.java:47)
>     at akka.dispatch.OnComplete.internal(Future.scala:300)
>     at akka.dispatch.OnComplete.internal(Future.scala:297)
>     at akka.dispatch.japi$CallbackBridge.apply(Future.scala:224)
>     at 

[jira] [Closed] (FLINK-31968) [PyFlink] 1.17.0 version on M1 processor

2023-05-04 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu closed FLINK-31968.
---
Resolution: Duplicate

> [PyFlink] 1.17.0 version on M1 processor
> 
>
> Key: FLINK-31968
> URL: https://issues.apache.org/jira/browse/FLINK-31968
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.17.0
>Reporter: Yury Smirnov
>Priority: Major
> Attachments: 1170_output.log
>
>
> PyFlink version 1.17.0
> NumPy version 1.21.4 and 1.21.6
> Python versions 3.8, 3.9, 3.9
> Slack thread: 
> [https://apache-flink.slack.com/archives/C03G7LJTS2G/p1682508650487979] 
> While running any of [PyFlink 
> Examples|https://github.com/apache/flink/blob/release-1.17/flink-python/pyflink/examples]
>  , getting error:
> {code:java}
> /Users/ysmirnov/Documents/Repos/flink-python-jobs/venv/bin/python 
> /Users/ysmirnov/Documents/Repos/flink-python-jobs/word_count/word_count.py 
> Traceback (most recent call last):
>   File 
> "/Users/ysmirnov/Documents/Repos/flink-python-jobs/word_count/word_count.py", 
> line 76, in 
>     basic_operations()
>   File 
> "/Users/ysmirnov/Documents/Repos/flink-python-jobs/word_count/word_count.py", 
> line 53, in basic_operations
>     show(ds.map(update_tel), env)
>   File 
> "/Users/ysmirnov/Documents/Repos/flink-python-jobs/word_count/word_count.py", 
> line 28, in show
>     env.execute()
>   File 
> "/Users/ysmirnov/Documents/Repos/flink-python-jobs/venv/lib/python3.9/site-packages/pyflink/datastream/stream_execution_environment.py",
>  line 764, in execute
>     return 
> JobExecutionResult(self._j_stream_execution_environment.execute(j_stream_graph))
>   File 
> "/Users/ysmirnov/Documents/Repos/flink-python-jobs/venv/lib/python3.9/site-packages/py4j/java_gateway.py",
>  line 1322, in __call__
>     return_value = get_return_value(
>   File 
> "/Users/ysmirnov/Documents/Repos/flink-python-jobs/venv/lib/python3.9/site-packages/pyflink/util/exceptions.py",
>  line 146, in deco
>     return f(*a, **kw)
>   File 
> "/Users/ysmirnov/Documents/Repos/flink-python-jobs/venv/lib/python3.9/site-packages/py4j/protocol.py",
>  line 326, in get_return_value
>     raise Py4JJavaError(
> py4j.protocol.Py4JJavaError: An error occurred while calling o10.execute.
> : org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
>     at 
> org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144)
>     at 
> org.apache.flink.runtime.minicluster.MiniClusterJobClient.lambda$getJobExecutionResult$3(MiniClusterJobClient.java:141)
>     at 
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
>     at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
>     at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
>     at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
>     at 
> org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$1(AkkaInvocationHandler.java:267)
>     at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
>     at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
>     at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
>     at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
>     at 
> org.apache.flink.util.concurrent.FutureUtils.doForward(FutureUtils.java:1277)
>     at 
> org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$null$1(ClassLoadingUtils.java:93)
>     at 
> org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68)
>     at 
> org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$2(ClassLoadingUtils.java:92)
>     at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
>     at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
>     at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
>     at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
>     at 
> org.apache.flink.runtime.concurrent.akka.AkkaFutureUtils$1.onComplete(AkkaFutureUtils.java:47)
>     at akka.dispatch.OnComplete.internal(Future.scala:300)
>     at akka.dispatch.OnComplete.internal(Future.scala:297)
>     at akka.dispatch.japi$CallbackBridge.apply(Future.scala:224)
>     at akka.dispatch.japi$CallbackBridge.apply(Future.scala:221)
>     at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
>     at 
> 

[jira] [Updated] (FLINK-31988) Implement Python wrapper for new KDS source

2023-05-04 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu updated FLINK-31988:

Component/s: API / Python
 Connectors / Kinesis

> Implement Python wrapper for new KDS source
> ---
>
> Key: FLINK-31988
> URL: https://issues.apache.org/jira/browse/FLINK-31988
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Python, Connectors / Kinesis
>Reporter: Hong Liang Teoh
>Priority: Major
>
> *What?*
> - Implement Python wrapper for KDS source
> - Write tests for this KDS source
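
Such a wrapper would presumably follow the usual PyFlink pattern of holding a 
reference to the underlying Java object built through the Py4J gateway. The 
following is a pattern sketch only; the class names, package path and builder 
methods are hypothetical and are not the final API of this ticket:

{code:python}
from pyflink.java_gateway import get_gateway


class KinesisStreamsSource(object):
    """Thin Python wrapper around a (hypothetical) Java KinesisStreamsSource."""

    def __init__(self, j_source):
        self._j_source = j_source  # the underlying Java source object

    def get_java_object(self):
        # handed over to the DataStream wrappers, e.g. env.from_source(...)
        return self._j_source

    @staticmethod
    def builder() -> 'KinesisStreamsSourceBuilder':
        # fully-qualified class name is illustrative
        j_builder = get_gateway().jvm \
            .org.apache.flink.connector.kinesis.source.KinesisStreamsSource.builder()
        return KinesisStreamsSourceBuilder(j_builder)


class KinesisStreamsSourceBuilder(object):

    def __init__(self, j_builder):
        self._j_builder = j_builder

    def set_stream_arn(self, arn: str) -> 'KinesisStreamsSourceBuilder':
        self._j_builder.setStreamArn(arn)  # delegate to the Java builder
        return self

    def build(self) -> KinesisStreamsSource:
        return KinesisStreamsSource(self._j_builder.build())
{code}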



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-20794) Support to select distinct columns in the Table API

2023-05-04 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17719245#comment-17719245
 ] 

Dian Fu commented on FLINK-20794:
-

[~supercmmetry] Thanks for taking this ticket. Have assigned it to you~

> Support to select distinct columns in the Table API
> ---
>
> Key: FLINK-20794
> URL: https://issues.apache.org/jira/browse/FLINK-20794
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / API
>Reporter: Dian Fu
>Assignee: Vishaal Selvaraj
>Priority: Major
> Fix For: 1.18.0
>
>
> Currently, there is no corresponding functionality in Table API for the 
> following SQL:
> {code:java}
> SELECT DISTINCT users FROM Orders
> {code}
> For example, for the following job:
> {code:java}
> table.select("distinct a")
> {code}
> It will throw the following exception:
> {code:java}
> org.apache.flink.table.api.ExpressionParserException: Could not parse 
> expression at column 10: ',' expected but 'a' 
> foundorg.apache.flink.table.api.ExpressionParserException: Could not parse 
> expression at column 10: ',' expected but 'a' founddistinct a         ^
>  at 
> org.apache.flink.table.expressions.PlannerExpressionParserImpl$.throwError(PlannerExpressionParserImpl.scala:726)
>  at 
> org.apache.flink.table.expressions.PlannerExpressionParserImpl$.parseExpressionList(PlannerExpressionParserImpl.scala:710)
>  at 
> org.apache.flink.table.expressions.PlannerExpressionParserImpl.parseExpressionList(PlannerExpressionParserImpl.scala:47)
>  at 
> org.apache.flink.table.expressions.ExpressionParser.parseExpressionList(ExpressionParser.java:40)
>  at 
> org.apache.flink.table.api.internal.TableImpl.select(TableImpl.java:121){code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-20794) Support to select distinct columns in the Table API

2023-05-04 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu reassigned FLINK-20794:
---

Assignee: Vishaal Selvaraj

> Support to select distinct columns in the Table API
> ---
>
> Key: FLINK-20794
> URL: https://issues.apache.org/jira/browse/FLINK-20794
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / API
>Reporter: Dian Fu
>Assignee: Vishaal Selvaraj
>Priority: Major
> Fix For: 1.18.0
>
>
> Currently, there is no corresponding functionality in Table API for the 
> following SQL:
> {code:java}
> SELECT DISTINCT users FROM Orders
> {code}
> For example, for the following job:
> {code:java}
> table.select("distinct a")
> {code}
> It will throw the following exception:
> {code:java}
> org.apache.flink.table.api.ExpressionParserException: Could not parse 
> expression at column 10: ',' expected but 'a' 
> foundorg.apache.flink.table.api.ExpressionParserException: Could not parse 
> expression at column 10: ',' expected but 'a' founddistinct a         ^
>  at 
> org.apache.flink.table.expressions.PlannerExpressionParserImpl$.throwError(PlannerExpressionParserImpl.scala:726)
>  at 
> org.apache.flink.table.expressions.PlannerExpressionParserImpl$.parseExpressionList(PlannerExpressionParserImpl.scala:710)
>  at 
> org.apache.flink.table.expressions.PlannerExpressionParserImpl.parseExpressionList(PlannerExpressionParserImpl.scala:47)
>  at 
> org.apache.flink.table.expressions.ExpressionParser.parseExpressionList(ExpressionParser.java:40)
>  at 
> org.apache.flink.table.api.internal.TableImpl.select(TableImpl.java:121){code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-28786) Cannot run PyFlink 1.16 on MacOS with M1 chip

2023-04-27 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17716967#comment-17716967
 ] 

Dian Fu edited comment on FLINK-28786 at 4/27/23 10:04 AM:
---

Fixed in:
 - master via a3368635e3d06f764d144f8c8e2e06e499e79665
 - release-1.17 via 52c9742eed7128284278b07c40785bf1c4e30139
 - release-1.16 via ae998dda4ee60e2268a8c4f8bfdbbd46cc1a0746


was (Author: dianfu):
Fixed in:
- master via a3368635e3d06f764d144f8c8e2e06e499e79665
- release-1.7 via 52c9742eed7128284278b07c40785bf1c4e30139

> Cannot run PyFlink 1.16 on MacOS with M1 chip
> -
>
> Key: FLINK-28786
> URL: https://issues.apache.org/jira/browse/FLINK-28786
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.16.0
>Reporter: Ran Tao
>Assignee: Huang Xingbo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.17.0, 1.16.2
>
>
> I have tested it with 2 M1 machines; it reproduces the bug 100% of the time.
> 1.m1 machine
> macos bigsur 11.5.1 & jdk8 * & jdk11 & python 3.8 & python 3.9
> 1.m1 machine
> macos monterey 12.1 & jdk8 * & jdk11 & python 3.8 & python 3.9
> reproduce step:
> 1.python -m pip install -r flink-python/dev/dev-requirements.txt
> 2.cd flink-python; python setup.py sdist bdist_wheel; cd 
> apache-flink-libraries; python setup.py sdist; cd ..;
> 3.python -m pip install apache-flink-libraries/dist/*.tar.gz
> 4.python -m pip install dist/*.whl
> when run 
> [word_count.py|https://nightlies.apache.org/flink/flink-docs-master/docs/dev/python/table_api_tutorial/]
>  it will cause
> {code:java}
> :219: RuntimeWarning: 
> apache_beam.coders.coder_impl.StreamCoderImpl size changed, may indicate 
> binary incompatibility. Expected 24 from C header, got 32 from PyObject
> Traceback (most recent call last):
>   File "/Users/chucheng/GitLab/pyflink-demo/table/streaming/word_count.py", 
> line 129, in 
> word_count(known_args.input, known_args.output)
>   File "/Users/chucheng/GitLab/pyflink-demo/table/streaming/word_count.py", 
> line 49, in word_count
> t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
>   File 
> "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/table/table_environment.py",
>  line 121, in create
> return TableEnvironment(j_tenv)
>   File 
> "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/table/table_environment.py",
>  line 100, in __init__
> self._open()
>   File 
> "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/table/table_environment.py",
>  line 1637, in _open
> startup_loopback_server()
>   File 
> "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/table/table_environment.py",
>  line 1628, in startup_loopback_server
> from pyflink.fn_execution.beam.beam_worker_pool_service import \
>   File 
> "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/fn_execution/beam/beam_worker_pool_service.py",
>  line 44, in 
> from pyflink.fn_execution.beam import beam_sdk_worker_main  # noqa # 
> pylint: disable=unused-import
>   File 
> "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/fn_execution/beam/beam_sdk_worker_main.py",
>  line 21, in 
> import pyflink.fn_execution.beam.beam_operations # noqa # pylint: 
> disable=unused-import
>   File 
> "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/fn_execution/beam/beam_operations.py",
>  line 27, in 
> from pyflink.fn_execution.state_impl import RemoteKeyedStateBackend, 
> RemoteOperatorStateBackend
>   File 
> "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/fn_execution/state_impl.py",
>  line 33, in 
> from pyflink.fn_execution.beam.beam_coders import FlinkCoder
>   File 
> "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/fn_execution/beam/beam_coders.py",
>  line 27, in 
> from pyflink.fn_execution.beam import beam_coder_impl_fast as 
> beam_coder_impl
>   File "pyflink/fn_execution/beam/beam_coder_impl_fast.pyx", line 1, in init 
> pyflink.fn_execution.beam.beam_coder_impl_fast
> KeyError: '__pyx_vtable__'
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-28786) Cannot run PyFlink 1.16 on MacOS with M1 chip

2023-04-27 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu closed FLINK-28786.
---
Fix Version/s: 1.17.1
   (was: 1.17.0)
   Resolution: Fixed

> Cannot run PyFlink 1.16 on MacOS with M1 chip
> -
>
> Key: FLINK-28786
> URL: https://issues.apache.org/jira/browse/FLINK-28786
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.16.0
>Reporter: Ran Tao
>Assignee: Huang Xingbo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.16.2, 1.17.1
>
>
> I have tested it with 2 M1 machines; it reproduces the bug 100% of the time.
> 1.m1 machine
> macos bigsur 11.5.1 & jdk8 * & jdk11 & python 3.8 & python 3.9
> 1.m1 machine
> macos monterey 12.1 & jdk8 * & jdk11 & python 3.8 & python 3.9
> reproduce step:
> 1.python -m pip install -r flink-python/dev/dev-requirements.txt
> 2.cd flink-python; python setup.py sdist bdist_wheel; cd 
> apache-flink-libraries; python setup.py sdist; cd ..;
> 3.python -m pip install apache-flink-libraries/dist/*.tar.gz
> 4.python -m pip install dist/*.whl
> when run 
> [word_count.py|https://nightlies.apache.org/flink/flink-docs-master/docs/dev/python/table_api_tutorial/]
>  it will cause
> {code:java}
> :219: RuntimeWarning: 
> apache_beam.coders.coder_impl.StreamCoderImpl size changed, may indicate 
> binary incompatibility. Expected 24 from C header, got 32 from PyObject
> Traceback (most recent call last):
>   File "/Users/chucheng/GitLab/pyflink-demo/table/streaming/word_count.py", 
> line 129, in 
> word_count(known_args.input, known_args.output)
>   File "/Users/chucheng/GitLab/pyflink-demo/table/streaming/word_count.py", 
> line 49, in word_count
> t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
>   File 
> "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/table/table_environment.py",
>  line 121, in create
> return TableEnvironment(j_tenv)
>   File 
> "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/table/table_environment.py",
>  line 100, in __init__
> self._open()
>   File 
> "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/table/table_environment.py",
>  line 1637, in _open
> startup_loopback_server()
>   File 
> "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/table/table_environment.py",
>  line 1628, in startup_loopback_server
> from pyflink.fn_execution.beam.beam_worker_pool_service import \
>   File 
> "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/fn_execution/beam/beam_worker_pool_service.py",
>  line 44, in 
> from pyflink.fn_execution.beam import beam_sdk_worker_main  # noqa # 
> pylint: disable=unused-import
>   File 
> "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/fn_execution/beam/beam_sdk_worker_main.py",
>  line 21, in 
> import pyflink.fn_execution.beam.beam_operations # noqa # pylint: 
> disable=unused-import
>   File 
> "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/fn_execution/beam/beam_operations.py",
>  line 27, in 
> from pyflink.fn_execution.state_impl import RemoteKeyedStateBackend, 
> RemoteOperatorStateBackend
>   File 
> "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/fn_execution/state_impl.py",
>  line 33, in 
> from pyflink.fn_execution.beam.beam_coders import FlinkCoder
>   File 
> "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/fn_execution/beam/beam_coders.py",
>  line 27, in 
> from pyflink.fn_execution.beam import beam_coder_impl_fast as 
> beam_coder_impl
>   File "pyflink/fn_execution/beam/beam_coder_impl_fast.pyx", line 1, in init 
> pyflink.fn_execution.beam.beam_coder_impl_fast
> KeyError: '__pyx_vtable__'
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-28786) Cannot run PyFlink 1.16 on MacOS with M1 chip

2023-04-26 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17716967#comment-17716967
 ] 

Dian Fu commented on FLINK-28786:
-

Fixed in:
- master via a3368635e3d06f764d144f8c8e2e06e499e79665
- release-1.7 via 52c9742eed7128284278b07c40785bf1c4e30139

> Cannot run PyFlink 1.16 on MacOS with M1 chip
> -
>
> Key: FLINK-28786
> URL: https://issues.apache.org/jira/browse/FLINK-28786
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.16.0
>Reporter: Ran Tao
>Assignee: Huang Xingbo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.17.0, 1.16.2
>
>
> I have tested it with 2 M1 machines; it reproduces the bug 100% of the time.
> 1.m1 machine
> macos bigsur 11.5.1 & jdk8 * & jdk11 & python 3.8 & python 3.9
> 1.m1 machine
> macos monterey 12.1 & jdk8 * & jdk11 & python 3.8 & python 3.9
> reproduce step:
> 1.python -m pip install -r flink-python/dev/dev-requirements.txt
> 2.cd flink-python; python setup.py sdist bdist_wheel; cd 
> apache-flink-libraries; python setup.py sdist; cd ..;
> 3.python -m pip install apache-flink-libraries/dist/*.tar.gz
> 4.python -m pip install dist/*.whl
> when run 
> [word_count.py|https://nightlies.apache.org/flink/flink-docs-master/docs/dev/python/table_api_tutorial/]
>  it will cause
> {code:java}
> :219: RuntimeWarning: 
> apache_beam.coders.coder_impl.StreamCoderImpl size changed, may indicate 
> binary incompatibility. Expected 24 from C header, got 32 from PyObject
> Traceback (most recent call last):
>   File "/Users/chucheng/GitLab/pyflink-demo/table/streaming/word_count.py", 
> line 129, in 
> word_count(known_args.input, known_args.output)
>   File "/Users/chucheng/GitLab/pyflink-demo/table/streaming/word_count.py", 
> line 49, in word_count
> t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
>   File 
> "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/table/table_environment.py",
>  line 121, in create
> return TableEnvironment(j_tenv)
>   File 
> "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/table/table_environment.py",
>  line 100, in __init__
> self._open()
>   File 
> "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/table/table_environment.py",
>  line 1637, in _open
> startup_loopback_server()
>   File 
> "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/table/table_environment.py",
>  line 1628, in startup_loopback_server
> from pyflink.fn_execution.beam.beam_worker_pool_service import \
>   File 
> "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/fn_execution/beam/beam_worker_pool_service.py",
>  line 44, in 
> from pyflink.fn_execution.beam import beam_sdk_worker_main  # noqa # 
> pylint: disable=unused-import
>   File 
> "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/fn_execution/beam/beam_sdk_worker_main.py",
>  line 21, in 
> import pyflink.fn_execution.beam.beam_operations # noqa # pylint: 
> disable=unused-import
>   File 
> "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/fn_execution/beam/beam_operations.py",
>  line 27, in 
> from pyflink.fn_execution.state_impl import RemoteKeyedStateBackend, 
> RemoteOperatorStateBackend
>   File 
> "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/fn_execution/state_impl.py",
>  line 33, in 
> from pyflink.fn_execution.beam.beam_coders import FlinkCoder
>   File 
> "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/fn_execution/beam/beam_coders.py",
>  line 27, in 
> from pyflink.fn_execution.beam import beam_coder_impl_fast as 
> beam_coder_impl
>   File "pyflink/fn_execution/beam/beam_coder_impl_fast.pyx", line 1, in init 
> pyflink.fn_execution.beam.beam_coder_impl_fast
> KeyError: '__pyx_vtable__'
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-31949) The state of CountTumblingWindowAssigner of Python DataStream API were never purged

2023-04-26 Thread Dian Fu (Jira)
Dian Fu created FLINK-31949:
---

 Summary: The state of CountTumblingWindowAssigner of Python 
DataStream API were never purged
 Key: FLINK-31949
 URL: https://issues.apache.org/jira/browse/FLINK-31949
 Project: Flink
  Issue Type: Bug
  Components: API / Python
Reporter: Dian Fu


Posted By Urs from user-mailing list:
{code:java}
"In FLINK-26444, a couple of convenience window assigners were added to
the Python Datastream API, including CountTumblingWindowAssigner. This
assigner uses a CountTrigger by default, which produces TriggerResult.FIRE.

As such, using this window assigner on a data stream will always produce
a "state leak" since older count windows will always be retained without
any chance to work on the elements again."
{code}
See [https://lists.apache.org/thread/ql8x283xzgd98z0vsqr9npl5j74hscsm] for more 
details.
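
One possible direction, sketched against the public Trigger/TriggerResult 
interface: wrap the default trigger and upgrade FIRE to FIRE_AND_PURGE, in the 
spirit of Java's PurgingTrigger, so the window state is dropped once the window 
has fired. Whether the ticket is ultimately fixed this way is not decided here; 
the wrapper below is an illustration only.

{code:python}
from pyflink.datastream.window import Trigger, TriggerResult


class PurgingWrapperTrigger(Trigger):
    """Delegates to a nested trigger and purges the window whenever it fires."""

    def __init__(self, nested: Trigger):
        self._nested = nested

    def _purge_on_fire(self, result: TriggerResult) -> TriggerResult:
        return TriggerResult.FIRE_AND_PURGE if result == TriggerResult.FIRE else result

    def on_element(self, element, timestamp, window, ctx) -> TriggerResult:
        return self._purge_on_fire(self._nested.on_element(element, timestamp, window, ctx))

    def on_processing_time(self, time, window, ctx) -> TriggerResult:
        return self._purge_on_fire(self._nested.on_processing_time(time, window, ctx))

    def on_event_time(self, time, window, ctx) -> TriggerResult:
        return self._purge_on_fire(self._nested.on_event_time(time, window, ctx))

    def on_merge(self, window, ctx) -> None:
        self._nested.on_merge(window, ctx)

    def clear(self, window, ctx) -> None:
        self._nested.clear(window, ctx)
{code}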



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-28786) Cannot run PyFlink 1.16 on MacOS with M1 chip

2023-04-26 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17716757#comment-17716757
 ] 

Dian Fu edited comment on FLINK-28786 at 4/26/23 2:41 PM:
--

Reopen it as one user reported an issue related to Mac M1 on 1.17.0 
(https://apache-flink.slack.com/archives/C03G7LJTS2G/p1679904702297129):

{code}
Traceback (most recent call last):
  File 
"/Users/dianfu/venv/examples-37/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
 line 287, in _execute
response = task()
  File 
"/Users/dianfu/venv/examples-37/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
 line 360, in 
lambda: self.create_worker().do_instruction(request), request)
  File 
"/Users/dianfu/venv/examples-37/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
 line 597, in do_instruction
getattr(request, request_type), request.instruction_id)
  File 
"/Users/dianfu/venv/examples-37/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
 line 634, in process_bundle
bundle_processor.process_bundle(instruction_id))
  File 
"/Users/dianfu/venv/examples-37/lib/python3.7/site-packages/apache_beam/runners/worker/bundle_processor.py",
 line 1004, in process_bundle
element.data)
  File 
"/Users/dianfu/venv/examples-37/lib/python3.7/site-packages/apache_beam/runners/worker/bundle_processor.py",
 line 226, in process_encoded
input_stream, True)
  File "apache_beam/coders/coder_impl.py", line 1519, in 
apache_beam.coders.coder_impl.ParamWindowedValueCoderImpl.decode_from_stream
  File "apache_beam/coders/coder_impl.py", line 1520, in 
apache_beam.coders.coder_impl.ParamWindowedValueCoderImpl.decode_from_stream
  File "apache_beam/coders/coder_impl.py", line 135, in 
apache_beam.coders.coder_impl.CoderImpl.decode_from_stream
  File 
"/Users/dianfu/venv/examples-37/lib/python3.7/site-packages/pyflink/fn_execution/beam/beam_coder_impl_slow.py",
 line 34, in decode_from_stream
return self._value_coder.decode_from_stream(in_stream, nested)
  File 
"/Users/dianfu/venv/examples-37/lib/python3.7/site-packages/pyflink/fn_execution/beam/beam_coder_impl_slow.py",
 line 58, in decode_from_stream
return self._value_coder.decode_from_stream(data_input_stream)
TypeError: Argument 'input_stream' has incorrect type (expected 
pyflink.fn_execution.stream_fast.LengthPrefixInputStream, got BeamInputStream)
{code}


was (Author: dianfu):
Reopen it as one user reported an issue related to Mac M1 
(https://apache-flink.slack.com/archives/C03G7LJTS2G/p1679904702297129):

{code}
Traceback (most recent call last):
  File 
"/Users/dianfu/venv/examples-37/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
 line 287, in _execute
response = task()
  File 
"/Users/dianfu/venv/examples-37/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
 line 360, in 
lambda: self.create_worker().do_instruction(request), request)
  File 
"/Users/dianfu/venv/examples-37/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
 line 597, in do_instruction
getattr(request, request_type), request.instruction_id)
  File 
"/Users/dianfu/venv/examples-37/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
 line 634, in process_bundle
bundle_processor.process_bundle(instruction_id))
  File 
"/Users/dianfu/venv/examples-37/lib/python3.7/site-packages/apache_beam/runners/worker/bundle_processor.py",
 line 1004, in process_bundle
element.data)
  File 
"/Users/dianfu/venv/examples-37/lib/python3.7/site-packages/apache_beam/runners/worker/bundle_processor.py",
 line 226, in process_encoded
input_stream, True)
  File "apache_beam/coders/coder_impl.py", line 1519, in 
apache_beam.coders.coder_impl.ParamWindowedValueCoderImpl.decode_from_stream
  File "apache_beam/coders/coder_impl.py", line 1520, in 
apache_beam.coders.coder_impl.ParamWindowedValueCoderImpl.decode_from_stream
  File "apache_beam/coders/coder_impl.py", line 135, in 
apache_beam.coders.coder_impl.CoderImpl.decode_from_stream
  File 
"/Users/dianfu/venv/examples-37/lib/python3.7/site-packages/pyflink/fn_execution/beam/beam_coder_impl_slow.py",
 line 34, in decode_from_stream
return self._value_coder.decode_from_stream(in_stream, nested)
  File 
"/Users/dianfu/venv/examples-37/lib/python3.7/site-packages/pyflink/fn_execution/beam/beam_coder_impl_slow.py",
 line 58, in decode_from_stream
return self._value_coder.decode_from_stream(data_input_stream)
TypeError: Argument 'input_stream' has incorrect type (expected 
pyflink.fn_execution.stream_fast.LengthPrefixInputStream, got BeamInputStream)
{code}

> Cannot run PyFlink 1.16 on MacOS with M1 chip
> -
>
> Key: FLINK-28786
> URL: https://issues.apache.org/jira/browse/FLINK-28786
> Project: 

[jira] [Reopened] (FLINK-28786) Cannot run PyFlink 1.16 on MacOS with M1 chip

2023-04-26 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu reopened FLINK-28786:
-

Reopen it as one user reported an issue related to Mac M1 
(https://apache-flink.slack.com/archives/C03G7LJTS2G/p1679904702297129):

{code}
Traceback (most recent call last):
  File 
"/Users/dianfu/venv/examples-37/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
 line 287, in _execute
response = task()
  File 
"/Users/dianfu/venv/examples-37/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
 line 360, in 
lambda: self.create_worker().do_instruction(request), request)
  File 
"/Users/dianfu/venv/examples-37/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
 line 597, in do_instruction
getattr(request, request_type), request.instruction_id)
  File 
"/Users/dianfu/venv/examples-37/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
 line 634, in process_bundle
bundle_processor.process_bundle(instruction_id))
  File 
"/Users/dianfu/venv/examples-37/lib/python3.7/site-packages/apache_beam/runners/worker/bundle_processor.py",
 line 1004, in process_bundle
element.data)
  File 
"/Users/dianfu/venv/examples-37/lib/python3.7/site-packages/apache_beam/runners/worker/bundle_processor.py",
 line 226, in process_encoded
input_stream, True)
  File "apache_beam/coders/coder_impl.py", line 1519, in 
apache_beam.coders.coder_impl.ParamWindowedValueCoderImpl.decode_from_stream
  File "apache_beam/coders/coder_impl.py", line 1520, in 
apache_beam.coders.coder_impl.ParamWindowedValueCoderImpl.decode_from_stream
  File "apache_beam/coders/coder_impl.py", line 135, in 
apache_beam.coders.coder_impl.CoderImpl.decode_from_stream
  File 
"/Users/dianfu/venv/examples-37/lib/python3.7/site-packages/pyflink/fn_execution/beam/beam_coder_impl_slow.py",
 line 34, in decode_from_stream
return self._value_coder.decode_from_stream(in_stream, nested)
  File 
"/Users/dianfu/venv/examples-37/lib/python3.7/site-packages/pyflink/fn_execution/beam/beam_coder_impl_slow.py",
 line 58, in decode_from_stream
return self._value_coder.decode_from_stream(data_input_stream)
TypeError: Argument 'input_stream' has incorrect type (expected 
pyflink.fn_execution.stream_fast.LengthPrefixInputStream, got BeamInputStream)
{code}

> Cannot run PyFlink 1.16 on MacOS with M1 chip
> -
>
> Key: FLINK-28786
> URL: https://issues.apache.org/jira/browse/FLINK-28786
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.16.0
>Reporter: Ran Tao
>Assignee: Huang Xingbo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.17.0, 1.16.2
>
>
> I have tested it with 2 M1 machines; it reproduces the bug 100% of the time.
> 1.m1 machine
> macos bigsur 11.5.1 & jdk8 * & jdk11 & python 3.8 & python 3.9
> 1.m1 machine
> macos monterey 12.1 & jdk8 * & jdk11 & python 3.8 & python 3.9
> reproduce step:
> 1.python -m pip install -r flink-python/dev/dev-requirements.txt
> 2.cd flink-python; python setup.py sdist bdist_wheel; cd 
> apache-flink-libraries; python setup.py sdist; cd ..;
> 3.python -m pip install apache-flink-libraries/dist/*.tar.gz
> 4.python -m pip install dist/*.whl
> when run 
> [word_count.py|https://nightlies.apache.org/flink/flink-docs-master/docs/dev/python/table_api_tutorial/]
>  it will cause
> {code:java}
> :219: RuntimeWarning: 
> apache_beam.coders.coder_impl.StreamCoderImpl size changed, may indicate 
> binary incompatibility. Expected 24 from C header, got 32 from PyObject
> Traceback (most recent call last):
>   File "/Users/chucheng/GitLab/pyflink-demo/table/streaming/word_count.py", 
> line 129, in 
> word_count(known_args.input, known_args.output)
>   File "/Users/chucheng/GitLab/pyflink-demo/table/streaming/word_count.py", 
> line 49, in word_count
> t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
>   File 
> "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/table/table_environment.py",
>  line 121, in create
> return TableEnvironment(j_tenv)
>   File 
> "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/table/table_environment.py",
>  line 100, in __init__
> self._open()
>   File 
> "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/table/table_environment.py",
>  line 1637, in _open
> startup_loopback_server()
>   File 
> "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/table/table_environment.py",
>  line 1628, in startup_loopback_server
> from pyflink.fn_execution.beam.beam_worker_pool_service import \
>   File 
> "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/fn_execution/beam/beam_worker_pool_service.py",
>  line 44, in 
> from pyflink.fn_execution.beam 

[jira] [Updated] (FLINK-31905) Exception thrown when accessing nested field of the result of Python UDF with complex result type

2023-04-24 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu updated FLINK-31905:

Description: 
For the following job:
{code}
import logging, sys

from pyflink.common import Row
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.table import Schema, DataTypes, TableDescriptor, StreamTableEnvironment
from pyflink.table.expressions import col, row
from pyflink.table.udf import ACC, T, udaf, AggregateFunction, udf

logging.basicConfig(stream=sys.stdout, level=logging.ERROR, format="%(message)s")


class EmitLastState(AggregateFunction):
    """
    Aggregator that emits the latest state for the purpose of
    enabling parallelism on CDC tables.
    """

    def create_accumulator(self) -> ACC:
        return Row(None, None)

    def accumulate(self, accumulator: ACC, *args):
        key, obj = args
        if (accumulator[0] is None) or (key > accumulator[0]):
            accumulator[0] = key
            accumulator[1] = obj

    def retract(self, accumulator: ACC, *args):
        pass

    def get_value(self, accumulator: ACC) -> T:
        return accumulator[1]


some_complex_inner_type = DataTypes.ROW(
    [
        DataTypes.FIELD("f0", DataTypes.STRING()),
        DataTypes.FIELD("f1", DataTypes.STRING())
    ]
)

some_complex_type = DataTypes.ROW(
    [
        DataTypes.FIELD(k, DataTypes.ARRAY(some_complex_inner_type))
        for k in ("f0", "f1", "f2")
    ]
    + [
        DataTypes.FIELD("f3", DataTypes.DATE()),
        DataTypes.FIELD("f4", DataTypes.VARCHAR(32)),
        DataTypes.FIELD("f5", DataTypes.VARCHAR(2)),
    ]
)

@udf(input_types=DataTypes.STRING(), result_type=some_complex_type)
def complex_udf(s):
    return Row(f0=None, f1=None, f2=None, f3=None, f4=None, f5=None)


if __name__ == "__main__":
    env = StreamExecutionEnvironment.get_execution_environment()
    table_env = StreamTableEnvironment.create(env)
    table_env.get_config().set('pipeline.classpaths', 'file:///Users/dianfu/code/src/workspace/pyflink-examples/flink-sql-connector-postgres-cdc-2.1.1.jar')

    # Create schema
    _schema = {
        "p_key": DataTypes.INT(False),
        "modified_id": DataTypes.INT(False),
        "content": DataTypes.STRING()
    }
    schema = Schema.new_builder().from_fields(
        *zip(*[(k, v) for k, v in _schema.items()])
    ).\
        primary_key("p_key").\
        build()

    # Create table descriptor
    descriptor = TableDescriptor.for_connector("postgres-cdc").\
        option("hostname", "host.docker.internal").\
        option("port", "5432").\
        option("database-name", "flink_issue").\
        option("username", "root").\
        option("password", "root").\
        option("debezium.plugin.name", "pgoutput").\
        option("schema-name", "flink_schema").\
        option("table-name", "flink_table").\
        option("slot.name", "flink_slot").\
        schema(schema).\
        build()

    table_env.create_temporary_table("flink_table", descriptor)

    # Create changelog stream
    stream = table_env.from_path("flink_table")

    # Define UDAF
    accumulator_type = DataTypes.ROW(
        [
            DataTypes.FIELD("f0", DataTypes.INT(False)),
            DataTypes.FIELD("f1", DataTypes.ROW([DataTypes.FIELD(k, v) for k, v in _schema.items()])),
        ]
    )
    result_type = DataTypes.ROW([DataTypes.FIELD(k, v) for k, v in _schema.items()])
    emit_last = udaf(EmitLastState(), accumulator_type=accumulator_type, result_type=result_type)

    # Emit last state based on modified_id to enable parallel processing
    stream = stream.\
        group_by(col("p_key")).\
        select(
            col("p_key"),
            emit_last(col("modified_id"),row(*(col(k) for k in _schema.keys())).cast(result_type)).alias("tmp_obj")
        )

    # Select the elements of the objects
    stream = stream.select(*(col("tmp_obj").get(k).alias(k) for k in _schema.keys()))

    # We apply a UDF which parses the xml and returns a complex nested structure
    stream = stream.select(col("p_key"), complex_udf(col("content")).alias("nested_obj"))

    # We select an element from the nested structure in order to flatten it
    # The next line is the line causing issues, commenting the next line will make the pipeline work
    stream = stream.select(col("p_key"), col("nested_obj").get("f0"))

    # Interestingly, the below part does work...
    # stream = stream.select(col("nested_obj").get("f0"))

    table_env.to_changelog_stream(stream).print()

    # Execute
    env.execute_async()
{code}

{code}
py4j.protocol.Py4JJavaError: An error occurred while calling 
o8.toChangelogStream.
: java.lang.IndexOutOfBoundsException: Index 1 out of bounds for length 1
at java.base/jdk.internal.util.Preconditions.outOfBounds(Unknown Source)
at 
java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Unknown Source)
at 

[jira] [Created] (FLINK-31905) Exception thrown when accessing nested field of the result of Python UDF with complex type

2023-04-24 Thread Dian Fu (Jira)
Dian Fu created FLINK-31905:
---

 Summary: Exception thrown when accessing nested field of the 
result of Python UDF with complex type
 Key: FLINK-31905
 URL: https://issues.apache.org/jira/browse/FLINK-31905
 Project: Flink
  Issue Type: Bug
  Components: API / Python
Reporter: Dian Fu


For the following job:
{code}
import logging, sys

from pyflink.common import Row
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.table import Schema, DataTypes, TableDescriptor, StreamTableEnvironment
from pyflink.table.expressions import col, row
from pyflink.table.udf import ACC, T, udaf, AggregateFunction, udf

logging.basicConfig(stream=sys.stdout, level=logging.ERROR, format="%(message)s")


class EmitLastState(AggregateFunction):
    """
    Aggregator that emits the latest state for the purpose of
    enabling parallelism on CDC tables.
    """

    def create_accumulator(self) -> ACC:
        return Row(None, None)

    def accumulate(self, accumulator: ACC, *args):
        key, obj = args
        if (accumulator[0] is None) or (key > accumulator[0]):
            accumulator[0] = key
            accumulator[1] = obj

    def retract(self, accumulator: ACC, *args):
        pass

    def get_value(self, accumulator: ACC) -> T:
        return accumulator[1]


some_complex_inner_type = DataTypes.ROW(
    [
        DataTypes.FIELD("f0", DataTypes.STRING()),
        DataTypes.FIELD("f1", DataTypes.STRING())
    ]
)

some_complex_type = DataTypes.ROW(
    [
        DataTypes.FIELD(k, DataTypes.ARRAY(some_complex_inner_type))
        for k in ("f0", "f1", "f2")
    ]
    + [
        DataTypes.FIELD("f3", DataTypes.DATE()),
        DataTypes.FIELD("f4", DataTypes.VARCHAR(32)),
        DataTypes.FIELD("f5", DataTypes.VARCHAR(2)),
    ]
)

@udf(input_types=DataTypes.STRING(), result_type=some_complex_type)
def complex_udf(s):
    return Row(f0=None, f1=None, f2=None, f3=None, f4=None, f5=None)


if __name__ == "__main__":
    env = StreamExecutionEnvironment.get_execution_environment()
    table_env = StreamTableEnvironment.create(env)
    table_env.get_config().set('pipeline.classpaths', 'file:///Users/dianfu/code/src/workspace/pyflink-examples/flink-sql-connector-postgres-cdc-2.1.1.jar')

    # Create schema
    _schema = {
        "p_key": DataTypes.INT(False),
        "modified_id": DataTypes.INT(False),
        "content": DataTypes.STRING()
    }
    schema = Schema.new_builder().from_fields(
        *zip(*[(k, v) for k, v in _schema.items()])
    ).\
        primary_key("p_key").\
        build()

    # Create table descriptor
    descriptor = TableDescriptor.for_connector("postgres-cdc").\
        option("hostname", "host.docker.internal").\
        option("port", "5432").\
        option("database-name", "flink_issue").\
        option("username", "root").\
        option("password", "root").\
        option("debezium.plugin.name", "pgoutput").\
        option("schema-name", "flink_schema").\
        option("table-name", "flink_table").\
        option("slot.name", "flink_slot").\
        schema(schema).\
        build()

    table_env.create_temporary_table("flink_table", descriptor)

    # Create changelog stream
    stream = table_env.from_path("flink_table")

    # Define UDAF
    accumulator_type = DataTypes.ROW(
        [
            DataTypes.FIELD("f0", DataTypes.INT(False)),
            DataTypes.FIELD("f1", DataTypes.ROW([DataTypes.FIELD(k, v) for k, v in _schema.items()])),
        ]
    )
    result_type = DataTypes.ROW([DataTypes.FIELD(k, v) for k, v in _schema.items()])
    emit_last = udaf(EmitLastState(), accumulator_type=accumulator_type, result_type=result_type)

    # Emit last state based on modified_id to enable parallel processing
    stream = stream.\
        group_by(col("p_key")).\
        select(
            col("p_key"),
            emit_last(col("modified_id"),row(*(col(k) for k in _schema.keys())).cast(result_type)).alias("tmp_obj")
        )

    # Select the elements of the objects
    stream = stream.select(*(col("tmp_obj").get(k).alias(k) for k in _schema.keys()))

    # We apply a UDF which parses the xml and returns a complex nested structure
    stream = stream.select(col("p_key"), complex_udf(col("content")).alias("nested_obj"))

    # We select an element from the nested structure in order to flatten it
    # The next line is the line causing issues, commenting the next line will make the pipeline work
    stream = stream.select(col("p_key"), col("nested_obj").get("f0"))

    # Interestingly, the below part does work...
    # stream = stream.select(col("nested_obj").get("f0"))

    table_env.to_changelog_stream(stream).print()

    # Execute
    env.execute_async()
{code}

{code}
py4j.protocol.Py4JJavaError: An error occurred while calling 
o8.toChangelogStream.
: java.lang.IndexOutOfBoundsException: Index 1 out of bounds for length 1
at 

[jira] [Updated] (FLINK-31905) Exception thrown when accessing nested field of the result of Python UDF with complex result type

2023-04-24 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu updated FLINK-31905:

Summary: Exception thrown when accessing nested field of the result of 
Python UDF with complex result type  (was: Exception thrown when accessing 
nested field of the result of Python UDF with complex type)

> Exception thrown when accessing nested field of the result of Python UDF with 
> complex result type
> -
>
> Key: FLINK-31905
> URL: https://issues.apache.org/jira/browse/FLINK-31905
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Reporter: Dian Fu
>Priority: Major
>
> For the following job:
> {code}
> import logging, sys
> from pyflink.common import Row
> from pyflink.datastream import StreamExecutionEnvironment
> from pyflink.table import Schema, DataTypes, TableDescriptor, 
> StreamTableEnvironment
> from pyflink.table.expressions import col, row
> from pyflink.table.udf import ACC, T, udaf, AggregateFunction, udf
> logging.basicConfig(stream=sys.stdout, level=logging.ERROR, 
> format="%(message)s")
> class EmitLastState(AggregateFunction):
> """
> Aggregator that emits the latest state for the purpose of
> enabling parallelism on CDC tables.
> """
> def create_accumulator(self) -> ACC:
> return Row(None, None)
> def accumulate(self, accumulator: ACC, *args):
> key, obj = args
> if (accumulator[0] is None) or (key > accumulator[0]):
> accumulator[0] = key
> accumulator[1] = obj
> def retract(self, accumulator: ACC, *args):
> pass
> def get_value(self, accumulator: ACC) -> T:
> return accumulator[1]
> some_complex_inner_type = DataTypes.ROW(
> [
> DataTypes.FIELD("f0", DataTypes.STRING()),
> DataTypes.FIELD("f1", DataTypes.STRING())
> ]
> )
> some_complex_type = DataTypes.ROW(
> [
> DataTypes.FIELD(k, DataTypes.ARRAY(some_complex_inner_type))
> for k in ("f0", "f1", "f2")
> ]
> + [
> DataTypes.FIELD("f3", DataTypes.DATE()),
> DataTypes.FIELD("f4", DataTypes.VARCHAR(32)),
> DataTypes.FIELD("f5", DataTypes.VARCHAR(2)),
> ]
> )
> @udf(input_types=DataTypes.STRING(), result_type=some_complex_type)
> def complex_udf(s):
> return Row(f0=None, f1=None, f2=None, f3=None, f4=None, f5=None)
> if __name__ == "__main__":
> env = StreamExecutionEnvironment.get_execution_environment()
> table_env = StreamTableEnvironment.create(env)
> table_env.get_config().set('pipeline.classpaths', 
> 'file:///Users/dianfu/code/src/workspace/pyflink-examples/flink-sql-connector-postgres-cdc-2.1.1.jar')
> # Create schema
> _schema = {
> "p_key": DataTypes.INT(False),
> "modified_id": DataTypes.INT(False),
> "content": DataTypes.STRING()
> }
> schema = Schema.new_builder().from_fields(
> *zip(*[(k, v) for k, v in _schema.items()])
> ).\
> primary_key("p_key").\
> build()
> # Create table descriptor
> descriptor = TableDescriptor.for_connector("postgres-cdc").\
> option("hostname", "host.docker.internal").\
> option("port", "5432").\
> option("database-name", "flink_issue").\
> option("username", "root").\
> option("password", "root").\
> option("debezium.plugin.name", "pgoutput").\
> option("schema-name", "flink_schema").\
> option("table-name", "flink_table").\
> option("slot.name", "flink_slot").\
> schema(schema).\
> build()
> table_env.create_temporary_table("flink_table", descriptor)
> # Create changelog stream
> stream = table_env.from_path("flink_table")\
> # Define UDAF
> accumulator_type = DataTypes.ROW(
> [
> DataTypes.FIELD("f0", DataTypes.INT(False)),
> DataTypes.FIELD("f1", DataTypes.ROW([DataTypes.FIELD(k, v) for k, 
> v in _schema.items()])),
> ]
> )
> result_type = DataTypes.ROW([DataTypes.FIELD(k, v) for k, v in 
> _schema.items()])
> emit_last = udaf(EmitLastState(), accumulator_type=accumulator_type, 
> result_type=result_type)
> # Emit last state based on modified_id to enable parallel processing
> stream = stream.\
> group_by(col("p_key")).\
> select(
> col("p_key"),
> emit_last(col("modified_id"),row(*(col(k) for k in 
> _schema.keys())).cast(result_type)).alias("tmp_obj")
> )
> # Select the elements of the objects
> stream = stream.select(*(col("tmp_obj").get(k).alias(k) for k in 
> _schema.keys()))
> # We apply a UDF which parses the xml and returns a complex nested 
> structure
> stream = stream.select(col("p_key"), 
> 

[jira] [Created] (FLINK-31861) Introduce ByteArraySchema which serialize/deserialize data of type byte array

2023-04-19 Thread Dian Fu (Jira)
Dian Fu created FLINK-31861:
---

 Summary: Introduce ByteArraySchema which serialize/deserialize 
data of type byte array
 Key: FLINK-31861
 URL: https://issues.apache.org/jira/browse/FLINK-31861
 Project: Flink
  Issue Type: Improvement
  Components: API / Core
Reporter: Dian Fu


The aim of this ticket is to introduce a ByteArraySchema which 
serializes/deserializes data of type byte array, so that users can get the raw 
bytes from a data source. See 
[https://apache-flink.slack.com/archives/C03G7LJTS2G/p1681928862762699] for 
more details.
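
As a rough usage sketch, assuming such a schema were exposed in PyFlink 
alongside SimpleStringSchema (the helper function, class name and Kafka 
settings below are illustrative assumptions, not existing API):
{code}
from pyflink.datastream.connectors.kafka import KafkaSource


def build_raw_bytes_source(byte_array_schema):
    # byte_array_schema would be an instance of the proposed ByteArraySchema,
    # handing each record to the job as raw bytes. Today only string-oriented
    # schemas such as SimpleStringSchema ship with Flink/PyFlink.
    return KafkaSource.builder() \
        .set_bootstrap_servers("localhost:9092") \
        .set_topics("raw-events") \
        .set_value_only_deserializer(byte_array_schema) \
        .build()
{code}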



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-28528) Table.getSchema fails on table with watermark

2023-04-06 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17709597#comment-17709597
 ] 

Dian Fu commented on FLINK-28528:
-

The exception happens when converting TableSchema to ResolvedSchema [1]. To 
resolve this issue, I guess we may need to support ResolvedSchema in PyFlink 
instead of TableSchema which is already deprecated in Java.

[1] 
https://github.com/apache/flink/blob/b30b12d97a00ee5a338b2bd4233df1e2b38ce87f/flink-table/flink-table-common/src/main/java/org/apache/flink/table/api/TableSchema.java#L430

> Table.getSchema fails on table with watermark
> -
>
> Key: FLINK-28528
> URL: https://issues.apache.org/jira/browse/FLINK-28528
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.15.1
>Reporter: Xuannan Su
>Assignee: Xingbo Huang
>Priority: Major
>
> The bug can be reproduced with the following test. The test can pass if we 
> use the commented way to define the watermark.
> {code:python}
> def test_flink_2(self):
> env = StreamExecutionEnvironment.get_execution_environment()
> t_env = StreamTableEnvironment.create(env)
> table = t_env.from_descriptor(
> TableDescriptor.for_connector("filesystem")
> .schema(
> Schema.new_builder()
> .column("name", DataTypes.STRING())
> .column("cost", DataTypes.INT())
> .column("distance", DataTypes.INT())
> .column("time", DataTypes.TIMESTAMP(3))
> .watermark("time", expr.col("time") - expr.lit(60).seconds)
> # .watermark("time", "`time` - INTERVAL '60' SECOND")
> .build()
> )
> .format("csv")
> .option("path", "./input.csv")
> .build()
> )
> print(table.get_schema())
> {code}
> It causes the following exception
> {code:none}
> E   pyflink.util.exceptions.TableException: 
> org.apache.flink.table.api.TableException: Expression 'minus(time, 6)' is 
> not string serializable. Currently, only expressions that originated from a 
> SQL expression have a well-defined string representation.
> E at 
> org.apache.flink.table.expressions.ResolvedExpression.asSerializableString(ResolvedExpression.java:51)
> E at 
> org.apache.flink.table.api.TableSchema.lambda$fromResolvedSchema$13(TableSchema.java:455)
> E at 
> java.util.Collections$SingletonList.forEach(Collections.java:4824)
> E at 
> org.apache.flink.table.api.TableSchema.fromResolvedSchema(TableSchema.java:451)
> E at org.apache.flink.table.api.Table.getSchema(Table.java:101)
> E at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> E at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> E at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> E at java.lang.reflect.Method.invoke(Method.java:498)
> E at 
> org.apache.flink.api.python.shaded.py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
> E at 
> org.apache.flink.api.python.shaded.py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
> E at 
> org.apache.flink.api.python.shaded.py4j.Gateway.invoke(Gateway.java:282)
> E at 
> org.apache.flink.api.python.shaded.py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
> E at 
> org.apache.flink.api.python.shaded.py4j.commands.CallCommand.execute(CallCommand.java:79)
> E at 
> org.apache.flink.api.python.shaded.py4j.GatewayConnection.run(GatewayConnection.java:238)
> E at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-28528) Table.getSchema fails on table with watermark

2023-04-06 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17709595#comment-17709595
 ] 

Dian Fu commented on FLINK-28528:
-

The root cause is that currently table.get_schema() only supports expressions 
defined via SQL string. For example, the following statements should work:
* .watermark("time", "`time` - INTERVAL '60' SECOND")
* .column_by_expression('TM', 'parse_bq_datetime_udf(window_start)') 
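
Applied to the reproducer below, a minimal sketch of the first form (only the 
watermark declaration changes) would be:
{code}
from pyflink.table import DataTypes, Schema

# Declaring the watermark as a SQL string keeps it string-serializable,
# so Table.get_schema() can convert it back without the exception.
schema = Schema.new_builder() \
    .column("name", DataTypes.STRING()) \
    .column("cost", DataTypes.INT()) \
    .column("distance", DataTypes.INT()) \
    .column("time", DataTypes.TIMESTAMP(3)) \
    .watermark("time", "`time` - INTERVAL '60' SECOND") \
    .build()
{code}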

> Table.getSchema fails on table with watermark
> -
>
> Key: FLINK-28528
> URL: https://issues.apache.org/jira/browse/FLINK-28528
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.15.1
>Reporter: Xuannan Su
>Assignee: Xingbo Huang
>Priority: Major
>
> The bug can be reproduced with the following test. The test can pass if we 
> use the commented way to define the watermark.
> {code:python}
> def test_flink_2(self):
> env = StreamExecutionEnvironment.get_execution_environment()
> t_env = StreamTableEnvironment.create(env)
> table = t_env.from_descriptor(
> TableDescriptor.for_connector("filesystem")
> .schema(
> Schema.new_builder()
> .column("name", DataTypes.STRING())
> .column("cost", DataTypes.INT())
> .column("distance", DataTypes.INT())
> .column("time", DataTypes.TIMESTAMP(3))
> .watermark("time", expr.col("time") - expr.lit(60).seconds)
> # .watermark("time", "`time` - INTERVAL '60' SECOND")
> .build()
> )
> .format("csv")
> .option("path", "./input.csv")
> .build()
> )
> print(table.get_schema())
> {code}
> It causes the following exception
> {code:none}
> E   pyflink.util.exceptions.TableException: 
> org.apache.flink.table.api.TableException: Expression 'minus(time, 6)' is 
> not string serializable. Currently, only expressions that originated from a 
> SQL expression have a well-defined string representation.
> E at 
> org.apache.flink.table.expressions.ResolvedExpression.asSerializableString(ResolvedExpression.java:51)
> E at 
> org.apache.flink.table.api.TableSchema.lambda$fromResolvedSchema$13(TableSchema.java:455)
> E at 
> java.util.Collections$SingletonList.forEach(Collections.java:4824)
> E at 
> org.apache.flink.table.api.TableSchema.fromResolvedSchema(TableSchema.java:451)
> E at org.apache.flink.table.api.Table.getSchema(Table.java:101)
> E at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> E at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> E at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> E at java.lang.reflect.Method.invoke(Method.java:498)
> E at 
> org.apache.flink.api.python.shaded.py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
> E at 
> org.apache.flink.api.python.shaded.py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
> E at 
> org.apache.flink.api.python.shaded.py4j.Gateway.invoke(Gateway.java:282)
> E at 
> org.apache.flink.api.python.shaded.py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
> E at 
> org.apache.flink.api.python.shaded.py4j.commands.CallCommand.execute(CallCommand.java:79)
> E at 
> org.apache.flink.api.python.shaded.py4j.GatewayConnection.run(GatewayConnection.java:238)
> E at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-28528) Table.getSchema fails on table with watermark

2023-04-06 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17709591#comment-17709591
 ] 

Dian Fu commented on FLINK-28528:
-

A similar issue reported in slack channel: 
https://apache-flink.slack.com/archives/C03G7LJTS2G/p1680795285070859

> Table.getSchema fails on table with watermark
> -
>
> Key: FLINK-28528
> URL: https://issues.apache.org/jira/browse/FLINK-28528
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.15.1
>Reporter: Xuannan Su
>Assignee: Xingbo Huang
>Priority: Major
>
> The bug can be reproduced with the following test. The test can pass if we 
> use the commented way to define the watermark.
> {code:python}
> def test_flink_2(self):
> env = StreamExecutionEnvironment.get_execution_environment()
> t_env = StreamTableEnvironment.create(env)
> table = t_env.from_descriptor(
> TableDescriptor.for_connector("filesystem")
> .schema(
> Schema.new_builder()
> .column("name", DataTypes.STRING())
> .column("cost", DataTypes.INT())
> .column("distance", DataTypes.INT())
> .column("time", DataTypes.TIMESTAMP(3))
> .watermark("time", expr.col("time") - expr.lit(60).seconds)
> # .watermark("time", "`time` - INTERVAL '60' SECOND")
> .build()
> )
> .format("csv")
> .option("path", "./input.csv")
> .build()
> )
> print(table.get_schema())
> {code}
> It causes the following exception
> {code:none}
> E   pyflink.util.exceptions.TableException: 
> org.apache.flink.table.api.TableException: Expression 'minus(time, 6)' is 
> not string serializable. Currently, only expressions that originated from a 
> SQL expression have a well-defined string representation.
> E at 
> org.apache.flink.table.expressions.ResolvedExpression.asSerializableString(ResolvedExpression.java:51)
> E at 
> org.apache.flink.table.api.TableSchema.lambda$fromResolvedSchema$13(TableSchema.java:455)
> E at 
> java.util.Collections$SingletonList.forEach(Collections.java:4824)
> E at 
> org.apache.flink.table.api.TableSchema.fromResolvedSchema(TableSchema.java:451)
> E at org.apache.flink.table.api.Table.getSchema(Table.java:101)
> E at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> E at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> E at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> E at java.lang.reflect.Method.invoke(Method.java:498)
> E at 
> org.apache.flink.api.python.shaded.py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
> E at 
> org.apache.flink.api.python.shaded.py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
> E at 
> org.apache.flink.api.python.shaded.py4j.Gateway.invoke(Gateway.java:282)
> E at 
> org.apache.flink.api.python.shaded.py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
> E at 
> org.apache.flink.api.python.shaded.py4j.commands.CallCommand.execute(CallCommand.java:79)
> E at 
> org.apache.flink.api.python.shaded.py4j.GatewayConnection.run(GatewayConnection.java:238)
> E at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-31707) Constant string cannot be used as input arguments of Pandas UDAF

2023-04-03 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu updated FLINK-31707:

Fix Version/s: 1.16.2

> Constant string cannot be used as input arguments of Pandas UDAF
> 
>
> Key: FLINK-31707
> URL: https://issues.apache.org/jira/browse/FLINK-31707
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Reporter: Dian Fu
>Assignee: Dian Fu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.16.2, 1.18.0, 1.17.1
>
>
> It will throw exceptions as following when using constant strings in Pandas 
> UDAF:
> {code}
> E   raise ValueError("field_type %s is not supported." % 
> field_type)
> E   ValueError: field_type type_name: CHAR
> E   char_info {
> E length: 3
> E   }
> Eis not supported.
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-31707) Constant string cannot be used as input arguments of Pandas UDAF

2023-04-03 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17708188#comment-17708188
 ] 

Dian Fu edited comment on FLINK-31707 at 4/4/23 5:26 AM:
-

Fixed in:
- master via 7c6d8b0134cbcdc60d56b87d39ff2f28c310b1eb
- release-1.17 via 9c5ca0590806932e4e8f9d3f942f0a2a5442fe2d
- release-1.16 via 3291e4d6f9afff40e1e9718e23388610577de741


was (Author: dianfu):
Fixed in:
- master via 7c6d8b0134cbcdc60d56b87d39ff2f28c310b1eb
- release-1.17 via 9c5ca0590806932e4e8f9d3f942f0a2a5442fe2d

> Constant string cannot be used as input arguments of Pandas UDAF
> 
>
> Key: FLINK-31707
> URL: https://issues.apache.org/jira/browse/FLINK-31707
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Reporter: Dian Fu
>Assignee: Dian Fu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.0, 1.17.1
>
>
> It will throw exceptions as following when using constant strings in Pandas 
> UDAF:
> {code}
> E   raise ValueError("field_type %s is not supported." % 
> field_type)
> E   ValueError: field_type type_name: CHAR
> E   char_info {
> E length: 3
> E   }
> Eis not supported.
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-31707) Constant string cannot be used as input arguments of Pandas UDAF

2023-04-03 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu closed FLINK-31707.
---
Fix Version/s: 1.18.0
   1.17.1
   Resolution: Fixed

Fixed in:
- master via 7c6d8b0134cbcdc60d56b87d39ff2f28c310b1eb
- release-1.17 via 9c5ca0590806932e4e8f9d3f942f0a2a5442fe2d

> Constant string cannot be used as input arguments of Pandas UDAF
> 
>
> Key: FLINK-31707
> URL: https://issues.apache.org/jira/browse/FLINK-31707
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Reporter: Dian Fu
>Assignee: Dian Fu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.0, 1.17.1
>
>
> It will throw exceptions as following when using constant strings in Pandas 
> UDAF:
> {code}
> E   raise ValueError("field_type %s is not supported." % 
> field_type)
> E   ValueError: field_type type_name: CHAR
> E   char_info {
> E length: 3
> E   }
> Eis not supported.
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-31707) Constant string cannot be used as input arguments of Pandas UDAF

2023-04-03 Thread Dian Fu (Jira)
Dian Fu created FLINK-31707:
---

 Summary: Constant string cannot be used as input arguments of 
Pandas UDAF
 Key: FLINK-31707
 URL: https://issues.apache.org/jira/browse/FLINK-31707
 Project: Flink
  Issue Type: Bug
  Components: API / Python
Reporter: Dian Fu
Assignee: Dian Fu


It will throw an exception like the following when using constant strings in 
Pandas UDAF:
{code}
E   raise ValueError("field_type %s is not supported." % 
field_type)
E   ValueError: field_type type_name: CHAR
E   char_info {
E length: 3
E   }
Eis not supported.
{code}
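
A minimal sketch of the kind of call that triggers this (reconstructed for 
illustration; the example table, UDAF and literal below are assumptions, not 
taken from an actual report):
{code}
from pyflink.table import DataTypes, EnvironmentSettings, TableEnvironment
from pyflink.table.expressions import col, lit
from pyflink.table.udf import udaf

t_env = TableEnvironment.create(EnvironmentSettings.in_batch_mode())


@udaf(result_type=DataTypes.FLOAT(), func_type="pandas")
def mean_with_unit(v, unit):
    # `unit` arrives as a constant (CHAR) argument, which is what the
    # Python-side type conversion currently rejects.
    return v.mean()


t = t_env.from_elements([(1, 2.0), (1, 4.0)], ['id', 'v'])
result = t.group_by(col('id')).select(mean_with_unit(col('v'), lit('abc')))
{code}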



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-31690) The current key is not set for KeyedCoProcessOperator

2023-04-02 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu closed FLINK-31690.
---
Fix Version/s: 1.16.2
   1.18.0
   1.17.1
   Resolution: Fixed

Fixed in:
- master via 6c1ffe544e31bb67df94175a559f2f40362795a4
- release-1.17 via e3d612e7e98bedde42c365df3f2ed2a2ca76aefa
- release-1.16 via 01cdaee25cdc41773a5c42638f4c5209373b5aa4

> The current key is not set for KeyedCoProcessOperator
> -
>
> Key: FLINK-31690
> URL: https://issues.apache.org/jira/browse/FLINK-31690
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Reporter: Dian Fu
>Assignee: Dian Fu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.16.2, 1.18.0, 1.17.1
>
>
> See https://apache-flink.slack.com/archives/C03G7LJTS2G/p1680294701254239 for 
> more details.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-31690) The current key is not set for KeyedCoProcessOperator

2023-04-02 Thread Dian Fu (Jira)
Dian Fu created FLINK-31690:
---

 Summary: The current key is not set for KeyedCoProcessOperator
 Key: FLINK-31690
 URL: https://issues.apache.org/jira/browse/FLINK-31690
 Project: Flink
  Issue Type: Bug
  Components: API / Python
Reporter: Dian Fu
Assignee: Dian Fu


See https://apache-flink.slack.com/archives/C03G7LJTS2G/p1680294701254239 for 
more details.
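
For context, the affected operator is the one generated for keyed connected 
streams handled by a KeyedCoProcessFunction. A generic sketch of that pattern 
(not the reproducer from the Slack thread) is:
{code}
from pyflink.common import Types
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.functions import KeyedCoProcessFunction, RuntimeContext
from pyflink.datastream.state import ValueStateDescriptor


class JoinLatest(KeyedCoProcessFunction):
    """Keeps per-key state, which is where a missing current key shows up."""

    def open(self, runtime_context: RuntimeContext):
        self.latest = runtime_context.get_state(
            ValueStateDescriptor("latest", Types.STRING()))

    def process_element1(self, value, ctx):
        self.latest.update(value[1])

    def process_element2(self, value, ctx):
        yield value[0], self.latest.value()


env = StreamExecutionEnvironment.get_execution_environment()
left = env.from_collection([("a", "x")], Types.TUPLE([Types.STRING(), Types.STRING()]))
right = env.from_collection([("a", "y")], Types.TUPLE([Types.STRING(), Types.STRING()]))
left.connect(right) \
    .key_by(lambda t: t[0], lambda t: t[0]) \
    .process(JoinLatest()) \
    .print()
env.execute()
{code}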



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-31532) Support StreamExecutionEnvironment.socketTextStream in Python DataStream API

2023-03-21 Thread Dian Fu (Jira)
Dian Fu created FLINK-31532:
---

 Summary: Support StreamExecutionEnvironment.socketTextStream in 
Python DataStream API
 Key: FLINK-31532
 URL: https://issues.apache.org/jira/browse/FLINK-31532
 Project: Flink
  Issue Type: Improvement
  Components: API / Python
Reporter: Dian Fu


Currently, StreamExecutionEnvironment.socketTextStream is still missing from 
the Python DataStream API. It would be great to support it, as it can be 
helpful in special cases such as testing.
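
Until that lands, a possible interim sketch (it goes through PyFlink internals, 
so the private attribute below is an assumption about the current 
implementation rather than public API):
{code}
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.data_stream import DataStream


def socket_text_stream(env: StreamExecutionEnvironment, hostname: str, port: int,
                       delimiter: str = "\n", max_retry: int = 0) -> DataStream:
    # Delegates to the Java StreamExecutionEnvironment#socketTextStream overload
    # that takes (hostname, port, delimiter, maxRetry).
    j_ds = env._j_stream_execution_environment.socketTextStream(
        hostname, port, delimiter, max_retry)
    return DataStream(j_ds)
{code}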



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-31503) "org.apache.beam.sdk.options.PipelineOptionsRegistrar: Provider org.apache.beam.sdk.options.DefaultPipelineOptionsRegistrar not a subtype" is thrown when executing Python

2023-03-19 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu closed FLINK-31503.
---
Fix Version/s: 1.16.2
   1.18.0
   1.17.1
   Resolution: Fixed

Fixed in:
- master via de258f3ce01e720d23bec67c20892133f89293d3
- release-1.17 via 3163b8f9caa53e9487ce062eba2c3d399dfe08a2
- release-1.16 via d438b3bdc48a0456088594700d438725d0fb1480

> "org.apache.beam.sdk.options.PipelineOptionsRegistrar: Provider 
> org.apache.beam.sdk.options.DefaultPipelineOptionsRegistrar not a subtype" is 
> thrown when executing Python UDFs in SQL Client 
> --
>
> Key: FLINK-31503
> URL: https://issues.apache.org/jira/browse/FLINK-31503
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Reporter: Dian Fu
>Assignee: Dian Fu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.16.2, 1.18.0, 1.17.1
>
>
> The following exception will be thrown when executing SQL statements 
> containing Python UDFs in SQL Client:
> {code}
> Caused by: java.util.ServiceConfigurationError: 
> org.apache.beam.sdk.options.PipelineOptionsRegistrar: Provider 
> org.apache.beam.sdk.options.DefaultPipelineOptionsRegistrar not a subtype
> at java.util.ServiceLoader.fail(ServiceLoader.java:239)
> at java.util.ServiceLoader.access$300(ServiceLoader.java:185)
> at 
> java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:376)
> at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
> at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
> at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableCollection$Builder.addAll(ImmutableCollection.java:415)
> at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableSet$Builder.addAll(ImmutableSet.java:507)
> at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableSortedSet$Builder.addAll(ImmutableSortedSet.java:528)
> at 
> org.apache.beam.sdk.util.common.ReflectHelpers.loadServicesOrdered(ReflectHelpers.java:199)
> at 
> org.apache.beam.sdk.options.PipelineOptionsFactory$Cache.initializeRegistry(PipelineOptionsFactory.java:2089)
> at 
> org.apache.beam.sdk.options.PipelineOptionsFactory$Cache.(PipelineOptionsFactory.java:2083)
> at 
> org.apache.beam.sdk.options.PipelineOptionsFactory$Cache.(PipelineOptionsFactory.java:2047)
> at 
> org.apache.beam.sdk.options.PipelineOptionsFactory.resetCache(PipelineOptionsFactory.java:581)
> at 
> org.apache.beam.sdk.options.PipelineOptionsFactory.(PipelineOptionsFactory.java:547)
> at 
> org.apache.flink.streaming.api.runners.python.beam.BeamPythonFunctionRunner.open(BeamPythonFunctionRunner.java:241)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-31503) "org.apache.beam.sdk.options.PipelineOptionsRegistrar: Provider org.apache.beam.sdk.options.DefaultPipelineOptionsRegistrar not a subtype" is thrown when executing Pytho

2023-03-19 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu updated FLINK-31503:

Affects Version/s: 1.15.0

> "org.apache.beam.sdk.options.PipelineOptionsRegistrar: Provider 
> org.apache.beam.sdk.options.DefaultPipelineOptionsRegistrar not a subtype" is 
> thrown when executing Python UDFs in SQL Client 
> --
>
> Key: FLINK-31503
> URL: https://issues.apache.org/jira/browse/FLINK-31503
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.15.0
>Reporter: Dian Fu
>Assignee: Dian Fu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.16.2, 1.18.0, 1.17.1
>
>
> The following exception will be thrown when executing SQL statements 
> containing Python UDFs in SQL Client:
> {code}
> Caused by: java.util.ServiceConfigurationError: 
> org.apache.beam.sdk.options.PipelineOptionsRegistrar: Provider 
> org.apache.beam.sdk.options.DefaultPipelineOptionsRegistrar not a subtype
> at java.util.ServiceLoader.fail(ServiceLoader.java:239)
> at java.util.ServiceLoader.access$300(ServiceLoader.java:185)
> at 
> java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:376)
> at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
> at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
> at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableCollection$Builder.addAll(ImmutableCollection.java:415)
> at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableSet$Builder.addAll(ImmutableSet.java:507)
> at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableSortedSet$Builder.addAll(ImmutableSortedSet.java:528)
> at 
> org.apache.beam.sdk.util.common.ReflectHelpers.loadServicesOrdered(ReflectHelpers.java:199)
> at 
> org.apache.beam.sdk.options.PipelineOptionsFactory$Cache.initializeRegistry(PipelineOptionsFactory.java:2089)
> at 
> org.apache.beam.sdk.options.PipelineOptionsFactory$Cache.(PipelineOptionsFactory.java:2083)
> at 
> org.apache.beam.sdk.options.PipelineOptionsFactory$Cache.(PipelineOptionsFactory.java:2047)
> at 
> org.apache.beam.sdk.options.PipelineOptionsFactory.resetCache(PipelineOptionsFactory.java:581)
> at 
> org.apache.beam.sdk.options.PipelineOptionsFactory.(PipelineOptionsFactory.java:547)
> at 
> org.apache.flink.streaming.api.runners.python.beam.BeamPythonFunctionRunner.open(BeamPythonFunctionRunner.java:241)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-31503) "org.apache.beam.sdk.options.PipelineOptionsRegistrar: Provider org.apache.beam.sdk.options.DefaultPipelineOptionsRegistrar not a subtype" is thrown when executing Pyt

2023-03-17 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17701861#comment-17701861
 ] 

Dian Fu commented on FLINK-31503:
-

If your Flink version doesn't contain this fix, you could simply work around 
this issue by adding the following configuration:

SET 'classloader.parent-first-patterns.additional' = 'org.apache.beam.';

> "org.apache.beam.sdk.options.PipelineOptionsRegistrar: Provider 
> org.apache.beam.sdk.options.DefaultPipelineOptionsRegistrar not a subtype" is 
> thrown when executing Python UDFs in SQL Client 
> --
>
> Key: FLINK-31503
> URL: https://issues.apache.org/jira/browse/FLINK-31503
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Reporter: Dian Fu
>Assignee: Dian Fu
>Priority: Major
>  Labels: pull-request-available
>
> The following exception will be thrown when executing SQL statements 
> containing Python UDFs in SQL Client:
> {code}
> Caused by: java.util.ServiceConfigurationError: 
> org.apache.beam.sdk.options.PipelineOptionsRegistrar: Provider 
> org.apache.beam.sdk.options.DefaultPipelineOptionsRegistrar not a subtype
> at java.util.ServiceLoader.fail(ServiceLoader.java:239)
> at java.util.ServiceLoader.access$300(ServiceLoader.java:185)
> at 
> java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:376)
> at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
> at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
> at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableCollection$Builder.addAll(ImmutableCollection.java:415)
> at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableSet$Builder.addAll(ImmutableSet.java:507)
> at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableSortedSet$Builder.addAll(ImmutableSortedSet.java:528)
> at 
> org.apache.beam.sdk.util.common.ReflectHelpers.loadServicesOrdered(ReflectHelpers.java:199)
> at 
> org.apache.beam.sdk.options.PipelineOptionsFactory$Cache.initializeRegistry(PipelineOptionsFactory.java:2089)
> at 
> org.apache.beam.sdk.options.PipelineOptionsFactory$Cache.(PipelineOptionsFactory.java:2083)
> at 
> org.apache.beam.sdk.options.PipelineOptionsFactory$Cache.(PipelineOptionsFactory.java:2047)
> at 
> org.apache.beam.sdk.options.PipelineOptionsFactory.resetCache(PipelineOptionsFactory.java:581)
> at 
> org.apache.beam.sdk.options.PipelineOptionsFactory.(PipelineOptionsFactory.java:547)
> at 
> org.apache.flink.streaming.api.runners.python.beam.BeamPythonFunctionRunner.open(BeamPythonFunctionRunner.java:241)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-31478) TypeError: a bytes-like object is required, not 'JavaList' is thrown when ds.execute_and_collect() is called on a KeyedStream

2023-03-17 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu closed FLINK-31478.
---
Fix Version/s: 1.16.2
   1.18.0
   1.17.1
   Resolution: Fixed

Fixed in:
- master via 5e059efee864e17939a33f29272a848d00598531
- release-1.17 via ec5a09b3ce56426d1bdc8eeac4bf52cac9be015b
- release-1.16 via cadf4b35fb6f20c8cba310fa54626d0b9bae1361

> TypeError: a bytes-like object is required, not 'JavaList' is thrown when 
> ds.execute_and_collect() is called on a KeyedStream
> -
>
> Key: FLINK-31478
> URL: https://issues.apache.org/jira/browse/FLINK-31478
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Reporter: Dian Fu
>Assignee: Dian Fu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.16.2, 1.18.0, 1.17.1
>
>
> {code}
> 
> #  Licensed to the Apache Software Foundation (ASF) under one
> #  or more contributor license agreements.  See the NOTICE file
> #  distributed with this work for additional information
> #  regarding copyright ownership.  The ASF licenses this file
> #  to you under the Apache License, Version 2.0 (the
> #  "License"); you may not use this file except in compliance
> #  with the License.  You may obtain a copy of the License at
> #
> #  http://www.apache.org/licenses/LICENSE-2.0
> #
> #  Unless required by applicable law or agreed to in writing, software
> #  distributed under the License is distributed on an "AS IS" BASIS,
> #  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> #  See the License for the specific language governing permissions and
> # limitations under the License.
> 
> import argparse
> import logging
> import sys
> from pyflink.common import WatermarkStrategy, Encoder, Types
> from pyflink.datastream import StreamExecutionEnvironment, 
> RuntimeExecutionMode
> from pyflink.datastream.connectors.file_system import (FileSource, 
> StreamFormat, FileSink,
>OutputFileConfig, 
> RollingPolicy)
> word_count_data = ["To be, or not to be,--that is the question:--",
>"Whether 'tis nobler in the mind to suffer",
>"The slings and arrows of outrageous fortune",
>"Or to take arms against a sea of troubles,",
>"And by opposing end them?--To die,--to sleep,--",
>"No more; and by a sleep to say we end",
>"The heartache, and the thousand natural shocks",
>"That flesh is heir to,--'tis a consummation",
>"Devoutly to be wish'd. To die,--to sleep;--",
>"To sleep! perchance to dream:--ay, there's the rub;",
>"For in that sleep of death what dreams may come,",
>"When we have shuffled off this mortal coil,",
>"Must give us pause: there's the respect",
>"That makes calamity of so long life;",
>"For who would bear the whips and scorns of time,",
>"The oppressor's wrong, the proud man's contumely,",
>"The pangs of despis'd love, the law's delay,",
>"The insolence of office, and the spurns",
>"That patient merit of the unworthy takes,",
>"When he himself might his quietus make",
>"With a bare bodkin? who would these fardels bear,",
>"To grunt and sweat under a weary life,",
>"But that the dread of something after death,--",
>"The undiscover'd country, from whose bourn",
>"No traveller returns,--puzzles the will,",
>"And makes us rather bear those ills we have",
>"Than fly to others that we know not of?",
>"Thus conscience does make cowards of us all;",
>"And thus the native hue of resolution",
>"Is sicklied o'er with the pale cast of thought;",
>"And enterprises of great pith and moment,",
>"With this regard, their currents turn awry,",
>"And lose the name of action.--Soft you now!",
>"The fair Ophelia!--Nymph, in thy orisons",
>"Be all my sins remember'd."]
> def word_count(input_path, output_path):
> env = StreamExecutionEnvironment.get_execution_environment()
> 

[jira] [Created] (FLINK-31503) "org.apache.beam.sdk.options.PipelineOptionsRegistrar: Provider org.apache.beam.sdk.options.DefaultPipelineOptionsRegistrar not a subtype" is thrown when executing Pytho

2023-03-17 Thread Dian Fu (Jira)
Dian Fu created FLINK-31503:
---

 Summary: "org.apache.beam.sdk.options.PipelineOptionsRegistrar: 
Provider org.apache.beam.sdk.options.DefaultPipelineOptionsRegistrar not a 
subtype" is thrown when executing Python UDFs in SQL Client 
 Key: FLINK-31503
 URL: https://issues.apache.org/jira/browse/FLINK-31503
 Project: Flink
  Issue Type: Bug
  Components: API / Python
Reporter: Dian Fu
Assignee: Dian Fu


The following exception will be thrown when executing SQL statements containing 
Python UDFs in SQL Client:
{code}
Caused by: java.util.ServiceConfigurationError: 
org.apache.beam.sdk.options.PipelineOptionsRegistrar: Provider 
org.apache.beam.sdk.options.DefaultPipelineOptionsRegistrar not a subtype
at java.util.ServiceLoader.fail(ServiceLoader.java:239)
at java.util.ServiceLoader.access$300(ServiceLoader.java:185)
at 
java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:376)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
at 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableCollection$Builder.addAll(ImmutableCollection.java:415)
at 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableSet$Builder.addAll(ImmutableSet.java:507)
at 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableSortedSet$Builder.addAll(ImmutableSortedSet.java:528)
at 
org.apache.beam.sdk.util.common.ReflectHelpers.loadServicesOrdered(ReflectHelpers.java:199)
at 
org.apache.beam.sdk.options.PipelineOptionsFactory$Cache.initializeRegistry(PipelineOptionsFactory.java:2089)
at 
org.apache.beam.sdk.options.PipelineOptionsFactory$Cache.(PipelineOptionsFactory.java:2083)
at 
org.apache.beam.sdk.options.PipelineOptionsFactory$Cache.(PipelineOptionsFactory.java:2047)
at 
org.apache.beam.sdk.options.PipelineOptionsFactory.resetCache(PipelineOptionsFactory.java:581)
at 
org.apache.beam.sdk.options.PipelineOptionsFactory.(PipelineOptionsFactory.java:547)
at 
org.apache.flink.streaming.api.runners.python.beam.BeamPythonFunctionRunner.open(BeamPythonFunctionRunner.java:241)
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-31478) TypeError: a bytes-like object is required, not 'JavaList' is thrown when ds.execute_and_collect() is called on a KeyedStream

2023-03-15 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu updated FLINK-31478:

Description: 
{code}

#  Licensed to the Apache Software Foundation (ASF) under one
#  or more contributor license agreements.  See the NOTICE file
#  distributed with this work for additional information
#  regarding copyright ownership.  The ASF licenses this file
#  to you under the Apache License, Version 2.0 (the
#  "License"); you may not use this file except in compliance
#  with the License.  You may obtain a copy of the License at
#
#  http://www.apache.org/licenses/LICENSE-2.0
#
#  Unless required by applicable law or agreed to in writing, software
#  distributed under the License is distributed on an "AS IS" BASIS,
#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#  See the License for the specific language governing permissions and
# limitations under the License.

import argparse
import logging
import sys

from pyflink.common import WatermarkStrategy, Encoder, Types
from pyflink.datastream import StreamExecutionEnvironment, RuntimeExecutionMode
from pyflink.datastream.connectors.file_system import (FileSource, 
StreamFormat, FileSink,
   OutputFileConfig, 
RollingPolicy)


word_count_data = ["To be, or not to be,--that is the question:--",
   "Whether 'tis nobler in the mind to suffer",
   "The slings and arrows of outrageous fortune",
   "Or to take arms against a sea of troubles,",
   "And by opposing end them?--To die,--to sleep,--",
   "No more; and by a sleep to say we end",
   "The heartache, and the thousand natural shocks",
   "That flesh is heir to,--'tis a consummation",
   "Devoutly to be wish'd. To die,--to sleep;--",
   "To sleep! perchance to dream:--ay, there's the rub;",
   "For in that sleep of death what dreams may come,",
   "When we have shuffled off this mortal coil,",
   "Must give us pause: there's the respect",
   "That makes calamity of so long life;",
   "For who would bear the whips and scorns of time,",
   "The oppressor's wrong, the proud man's contumely,",
   "The pangs of despis'd love, the law's delay,",
   "The insolence of office, and the spurns",
   "That patient merit of the unworthy takes,",
   "When he himself might his quietus make",
   "With a bare bodkin? who would these fardels bear,",
   "To grunt and sweat under a weary life,",
   "But that the dread of something after death,--",
   "The undiscover'd country, from whose bourn",
   "No traveller returns,--puzzles the will,",
   "And makes us rather bear those ills we have",
   "Than fly to others that we know not of?",
   "Thus conscience does make cowards of us all;",
   "And thus the native hue of resolution",
   "Is sicklied o'er with the pale cast of thought;",
   "And enterprises of great pith and moment,",
   "With this regard, their currents turn awry,",
   "And lose the name of action.--Soft you now!",
   "The fair Ophelia!--Nymph, in thy orisons",
   "Be all my sins remember'd."]


def word_count(input_path, output_path):
env = StreamExecutionEnvironment.get_execution_environment()
env.set_runtime_mode(RuntimeExecutionMode.BATCH)
# write all the data to one file
env.set_parallelism(1)

# define the source
if input_path is not None:
ds = env.from_source(

source=FileSource.for_record_stream_format(StreamFormat.text_line_format(),
   input_path)
 .process_static_file_set().build(),
watermark_strategy=WatermarkStrategy.for_monotonous_timestamps(),
source_name="file_source"
)
else:
print("Executing word_count example with default input data set.")
print("Use --input to specify file input.")
ds = env.from_collection(word_count_data)

def split(line):
yield from line.split()

# compute word count
ds = ds.flat_map(split) \
   .map(lambda i: (i, 1), output_type=Types.TUPLE([Types.STRING(), 
Types.INT()])) \
   .key_by(lambda i: i[0])
   # .reduce(lambda i, j: (i[0], i[1] + j[1]))

# define the sink
if 
