[jira] [Updated] (FLINK-19913) The precision of `INTERVAL DAY(p1) TO SECOND(p2)` is inconsistent between the documentation and the code

2020-11-01 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng updated FLINK-19913:

Description: 
The precision of `INTERVAL DAY(p1) TO SECOND(p2)` is inconsistent between the 
documentation and the code. In doc:
{code:java}
INTERVAL DAY(p1) TO SECOND(p2)

The type can be declared using the above combinations where p1 is the number of 
digits of days (day precision) and p2 is the number of digits of fractional 
seconds (fractional precision). p1 must have a value between 1 and 6 (both 
inclusive). p2 must have a value between 0 and 9 (both inclusive). If no p1 is 
specified, it is equal to 2 by default. If no p2 is specified, it is equal to 6 
by default. 

{code}
In code:
{code:java}
case typeName if DAY_INTERVAL_TYPES.contains(typeName) =>
  if (relDataType.getPrecision > 3) {
    throw new TableException(
      s"DAY_INTERVAL_TYPES precision is not supported: ${relDataType.getPrecision}")
  }
{code}
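
For illustration, a minimal PyFlink sketch of the mismatch (hedged: the environment setup and the interval literal are illustrative, not taken from the issue; a 1.12-era API is assumed):
{code:python}
# Per the docs, a day precision of 4 is legal (1 <= p1 <= 6), but the planner
# code above rejects any day precision greater than 3.
from pyflink.table import EnvironmentSettings, TableEnvironment

settings = EnvironmentSettings.new_instance().in_streaming_mode() \
    .use_blink_planner().build()
t_env = TableEnvironment.create(settings)

# Expected per the documentation: a valid INTERVAL DAY(4) TO SECOND(3) value.
# Actual: TableException: DAY_INTERVAL_TYPES precision is not supported: 4
t_env.execute_sql(
    "SELECT INTERVAL '1234 10:00:00.000' DAY(4) TO SECOND(3)"
).print()
{code}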

BTW: We can also refer to Oracle's definition of INTERVAL support:
 
[https://oracle-base.com/articles/misc/oracle-dates-timestamps-and-intervals#interval]

  was:
The precision of `INTERVAL DAY(p1) TO SECOND(p2)` is inconsistent between the 
documentation and the code. In doc:
{code:java}
INTERVAL DAY(p1) TO SECOND(p2)

The type can be declared using the above combinations where p1 is the number of 
digits of days (day precision) and p2 is the number of digits of fractional 
seconds (fractional precision). p1 must have a value between 1 and 6 (both 
inclusive). p2 must have a value between 0 and 9 (both inclusive). If no p1 is 
specified, it is equal to 2 by default. If no p2 is specified, it is equal to 6 
by default. 

{code}

In code:
{code}
case typeName if DAY_INTERVAL_TYPES.contains(typeName) =>
  if (relDataType.getPrecision > 3) {
    throw new TableException(
      s"DAY_INTERVAL_TYPES precision is not supported: ${relDataType.getPrecision}")
  }
{code}

We can also refer to Oracle's definition of INTERVAL support:
https://oracle-base.com/articles/misc/oracle-dates-timestamps-and-intervals#interval




> The precision of `INTERVAL DAY(p1) TO SECOND(p2)` is inconsistent between the 
> documentation and the code
> ---
>
> Key: FLINK-19913
> URL: https://issues.apache.org/jira/browse/FLINK-19913
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / API
>Affects Versions: 1.12.0, 1.11.1
>Reporter: sunjincheng
>Priority: Major
>
> The precision of `INTERVAL DAY(p1) TO SECOND(p2)` is inconsistent between the 
> documentation and the code. In doc:
> {code:java}
> INTERVAL DAY(p1) TO SECOND(p2)
> The type can be declared using the above combinations where p1 is the number 
> of digits of days (day precision) and p2 is the number of digits of 
> fractional seconds (fractional precision). p1 must have a value between 1 and 
> 6 (both inclusive). p2 must have a value between 0 and 9 (both inclusive). If 
> no p1 is specified, it is equal to 2 by default. If no p2 is specified, it is 
> equal to 6 by default. 
> {code}
> In code:
> {code:java}
> case typeName if DAY_INTERVAL_TYPES.contains(typeName) =>
>   if (relDataType.getPrecision > 3) {
>     throw new TableException(
>       s"DAY_INTERVAL_TYPES precision is not supported: ${relDataType.getPrecision}")
>   }
> {code}
> BTW: We can also refer to Oracle's definition of INTERVAL support:
>  
> [https://oracle-base.com/articles/misc/oracle-dates-timestamps-and-intervals#interval]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-19913) The precision of `INTERVAL DAY(p1) TO SECOND(p2)` is inconsistent between the documentation and the code

2020-11-01 Thread sunjincheng (Jira)
sunjincheng created FLINK-19913:
---

 Summary: The precision of `INTERVAL DAY(p1) TO SECOND(p2)` is 
inconsistent between the documentation and the code
 Key: FLINK-19913
 URL: https://issues.apache.org/jira/browse/FLINK-19913
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / API
Affects Versions: 1.11.1, 1.12.0
Reporter: sunjincheng


The precision of `INTERVAL DAY(p1) TO SECOND(p2)` is inconsistent between the 
documentation and the code. In doc:
{code:java}
INTERVAL DAY(p1) TO SECOND(p2)

The type can be declared using the above combinations where p1 is the number of 
digits of days (day precision) and p2 is the number of digits of fractional 
seconds (fractional precision). p1 must have a value between 1 and 6 (both 
inclusive). p2 must have a value between 0 and 9 (both inclusive). If no p1 is 
specified, it is equal to 2 by default. If no p2 is specified, it is equal to 6 
by default. 

{code}

In code:
{code}
case typeName if DAY_INTERVAL_TYPES.contains(typeName) =>
  if (relDataType.getPrecision > 3) {
    throw new TableException(
      s"DAY_INTERVAL_TYPES precision is not supported: ${relDataType.getPrecision}")
  }
{code}

We can also refer to Oracle's definition of INTERVAL support:
https://oracle-base.com/articles/misc/oracle-dates-timestamps-and-intervals#interval





--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (FLINK-19337) Make a small improvement to the PyFlink package structure and class names.

2020-09-21 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng closed FLINK-19337.
---
Resolution: Fixed

> Make a small improvement to the PyFlink package structure and class names.
> ---
>
> Key: FLINK-19337
> URL: https://issues.apache.org/jira/browse/FLINK-19337
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python
>Reporter: sunjincheng
>Priority: Major
> Fix For: 1.12.0
>
> Attachments: image-2020-09-22-11-55-34-451.png
>
>
> It would be great to make a small improvement to the PyFlink package 
> structure and class names. I prefer `streaming` as the root package for the 
> data stream API, and removing the "DataStreamPython" prefix from the class 
> names. What do you think?
> !image-2020-09-22-11-55-34-451.png! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-19337) Make a small improvement to the PyFlink package structure and class names.

2020-09-21 Thread sunjincheng (Jira)
sunjincheng created FLINK-19337:
---

 Summary: Make a small improvement to the PyFlink package 
structure and class names.
 Key: FLINK-19337
 URL: https://issues.apache.org/jira/browse/FLINK-19337
 Project: Flink
  Issue Type: Improvement
  Components: API / Python
Reporter: sunjincheng
 Fix For: 1.12.0
 Attachments: image-2020-09-22-11-55-34-451.png

It would be great to make a small improvement to the PyFlink package structure 
and class names. I prefer `streaming` as the root package for the data stream 
API, and removing the "DataStreamPython" prefix from the class names. What do 
you think?

!image-2020-09-22-11-55-34-451.png! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (FLINK-17877) Add support for Python 3.8

2020-09-07 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng closed FLINK-17877.
---
Fix Version/s: 1.12.0
 Assignee: sunjincheng
   Resolution: Fixed

> Add support for Python 3.8
> --
>
> Key: FLINK-17877
> URL: https://issues.apache.org/jira/browse/FLINK-17877
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python
>Affects Versions: 1.10.1
>Reporter: Robert Metzger
>Assignee: sunjincheng
>Priority: Major
> Fix For: 1.12.0
>
>
> While trying out PyFlink on Ubuntu 20.04, I noticed that PyFlink does not 
> support Python 3.8 yet. 
> It fails with the following error
> {code}
> ubuntu@ubn-latest-testing:~$ python -m pip install apache-flink
> Collecting apache-flink
>   Downloading apache-flink-1.10.1.tar.gz (284.5 MB)
> Collecting apache-beam==2.15.0
>   Downloading apache-beam-2.15.0.zip (1.6 MB)
> Collecting avro-python3<=1.9.1,>=1.8.1
>   Downloading avro-python3-1.9.1.tar.gz (36 kB)
> Collecting cloudpickle==1.2.2
>   Downloading cloudpickle-1.2.2-py2.py3-none-any.whl (25 kB)
> Collecting py4j==0.10.8.1
>   Downloading py4j-0.10.8.1-py2.py3-none-any.whl (196 kB)
> Collecting pyarrow<0.14.0,>=0.11.1
>   Downloading pyarrow-0.13.0.tar.gz (4.8 MB)
>   Installing build dependencies ... done
>   Getting requirements to build wheel ... done
>   Preparing wheel metadata ... done
> Collecting python-dateutil==2.8.0
>   Downloading python_dateutil-2.8.0-py2.py3-none-any.whl (226 kB)
> Collecting crcmod<2.0,>=1.7
>   Downloading crcmod-1.7.tar.gz (89 kB)
> Collecting dill<0.2.10,>=0.2.9
>   Downloading dill-0.2.9.tar.gz (150 kB)
> Collecting fastavro<0.22,>=0.21.4
>   Downloading fastavro-0.21.24.tar.gz (496 kB)
> Collecting future<1.0.0,>=0.16.0
>   Downloading future-0.18.2.tar.gz (829 kB)
> Collecting grpcio<2,>=1.8
>   Downloading grpcio-1.29.0-cp38-cp38-manylinux2010_x86_64.whl (3.0 MB)
> Collecting hdfs<3.0.0,>=2.1.0
>   Downloading hdfs-2.5.8.tar.gz (41 kB)
> Collecting httplib2<=0.12.0,>=0.8
>   Downloading httplib2-0.12.0.tar.gz (218 kB)
> Collecting mock<3.0.0,>=1.0.1
>   Downloading mock-2.0.0-py2.py3-none-any.whl (56 kB)
> Collecting oauth2client<4,>=2.0.1
>   Downloading oauth2client-3.0.0.tar.gz (77 kB)
> Collecting protobuf<4,>=3.5.0.post1
>   Downloading protobuf-3.12.1-cp38-cp38-manylinux1_x86_64.whl (1.3 MB)
> Collecting pydot<2,>=1.2.0
>   Downloading pydot-1.4.1-py2.py3-none-any.whl (19 kB)
> Collecting pymongo<4.0.0,>=3.8.0
>   Downloading pymongo-3.10.1-cp38-cp38-manylinux2014_x86_64.whl (480 kB)
> Collecting pytz>=2018.3
>   Downloading pytz-2020.1-py2.py3-none-any.whl (510 kB)
> Collecting pyyaml<4.0.0,>=3.12
>   Downloading PyYAML-3.13.tar.gz (270 kB)
> Requirement already satisfied: six>=1.0.0 in /usr/lib/python3/dist-packages 
> (from pyarrow<0.14.0,>=0.11.1->apache-flink) (1.14.0)
> Collecting numpy>=1.14
>   Downloading numpy-1.18.4-cp38-cp38-manylinux1_x86_64.whl (20.7 MB)
> Collecting docopt
>   Downloading docopt-0.6.2.tar.gz (25 kB)
> Requirement already satisfied: requests>=2.7.0 in 
> /usr/lib/python3/dist-packages (from 
> hdfs<3.0.0,>=2.1.0->apache-beam==2.15.0->apache-flink) (2.22.0)
> Collecting pbr>=0.11
>   Downloading pbr-5.4.5-py2.py3-none-any.whl (110 kB)
> Requirement already satisfied: pyasn1-modules>=0.0.5 in 
> /usr/lib/python3/dist-packages (from 
> oauth2client<4,>=2.0.1->apache-beam==2.15.0->apache-flink) (0.2.1)
> Requirement already satisfied: pyasn1>=0.1.7 in 
> /usr/lib/python3/dist-packages (from 
> oauth2client<4,>=2.0.1->apache-beam==2.15.0->apache-flink) (0.4.2)
> Collecting rsa>=3.1.4
>   Downloading 

[jira] [Closed] (FLINK-9786) Publish Python libraries to pypi

2020-09-07 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-9786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng closed FLINK-9786.
--
Fix Version/s: 1.10.0
   Resolution: Done

We can install PyFlink via "pip install apache-flink" now, so I am closing this issue.

> Publish Python libraries to pypi
> 
>
> Key: FLINK-9786
> URL: https://issues.apache.org/jira/browse/FLINK-9786
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python
>Reporter: Jakob Homan
>Priority: Major
> Fix For: 1.10.0
>
>
> Right now the Python libraries are only available either via the source or 
> within the python api jar.  Python uses [pypi|https://pypi.org/] to publish 
> artifacts and track dependencies.  We should publish the Python library there 
> as part of each version that's released.  This would allow users to bring in 
> the Python code via standard dependency management techniques.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-19131) Add py38 support in PyFlink

2020-09-06 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng updated FLINK-19131:

Labels: pull-request-available  (was: )

> Add py38 support in PyFlink
> ---
>
> Key: FLINK-19131
> URL: https://issues.apache.org/jira/browse/FLINK-19131
> Project: Flink
>  Issue Type: New Feature
>  Components: API / Python
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>
> At present, Python 3.9 has been released, and many open source projects, such 
> as Beam, Arrow, and Pandas, already support Python 3.8, so it would be great 
> to support Python 3.8 in PyFlink as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-19131) Add py38 support in PyFlink

2020-09-06 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng updated FLINK-19131:

Labels:   (was: python)

> Add py38 support in PyFlink
> ---
>
> Key: FLINK-19131
> URL: https://issues.apache.org/jira/browse/FLINK-19131
> Project: Flink
>  Issue Type: New Feature
>  Components: API / Python
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 1.12.0
>
>
> At present, Python 3.9 has been released, and many open source projects, such 
> as Beam, Arrow, and Pandas, already support Python 3.8, so it would be great 
> to support Python 3.8 in PyFlink as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-19131) Add py38 support in PyFlink

2020-09-06 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng updated FLINK-19131:

Labels: python  (was: pull-request-available)

> Add py38 support in PyFlink
> ---
>
> Key: FLINK-19131
> URL: https://issues.apache.org/jira/browse/FLINK-19131
> Project: Flink
>  Issue Type: New Feature
>  Components: API / Python
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Labels: python
> Fix For: 1.12.0
>
>
> At present, Python 3.9 has been released, and many open source projects, such 
> as Beam, Arrow, and Pandas, already support Python 3.8, so it would be great 
> to support Python 3.8 in PyFlink as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-19131) Add py38 support in PyFlink

2020-09-06 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng updated FLINK-19131:

Environment: (was: At present, Python 3.9 has been released, and many open 
source projects, such as Beam, Arrow, and Pandas, already support Python 3.8, 
so it would be great to support Python 3.8 in PyFlink.)

> Add py38 support in PyFlink
> ---
>
> Key: FLINK-19131
> URL: https://issues.apache.org/jira/browse/FLINK-19131
> Project: Flink
>  Issue Type: New Feature
>  Components: API / Python
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 1.12.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-19131) Add py38 support in PyFlink

2020-09-06 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng updated FLINK-19131:

Description: At present, Python 3.9 has been released, and many open source 
projects, such as Beam, Arrow, and Pandas, already support Python 3.8, so it 
would be great to support Python 3.8 in PyFlink as well.

> Add py38 support in PyFlink
> ---
>
> Key: FLINK-19131
> URL: https://issues.apache.org/jira/browse/FLINK-19131
> Project: Flink
>  Issue Type: New Feature
>  Components: API / Python
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 1.12.0
>
>
> At present, Python 3.9 has been released, and many open source projects, such 
> as Beam, Arrow, and Pandas, already support Python 3.8, so it would be great 
> to support Python 3.8 in PyFlink as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (FLINK-19118) Support Expression in the operations of the Python Table API

2020-09-03 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng closed FLINK-19118.
---
Resolution: Fixed

Fixed in Master: a8cc62a901dabe6c4d877b97db6024715b68174a

> Support Expression in the operations of the Python Table API
> 
>
> Key: FLINK-19118
> URL: https://issues.apache.org/jira/browse/FLINK-19118
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Python
>Reporter: Dian Fu
>Assignee: Dian Fu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>
> Currently, the operations of the Python Table API only accept strings. 
> For example:
> {code}
> >>> tab.group_by("key").select("key, value.avg")
> {code}
> After introducing the Expression class in FLINK-19114, it becomes possible to 
> use Expression objects in the operations of the Python Table API, e.g.
> {code}
> >>> tab.group_by(col("key")).select(col("key"), col("value").avg)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-19131) Add py38 support in PyFlink

2020-09-03 Thread sunjincheng (Jira)
sunjincheng created FLINK-19131:
---

 Summary: Add py38 support in PyFlink
 Key: FLINK-19131
 URL: https://issues.apache.org/jira/browse/FLINK-19131
 Project: Flink
  Issue Type: New Feature
  Components: API / Python
 Environment: At present, Python 3.9 has been released, and many open source 
projects, such as Beam, Arrow, and Pandas, already support Python 3.8, so it 
would be great to support Python 3.8 in PyFlink as well.
Reporter: sunjincheng
Assignee: sunjincheng
 Fix For: 1.12.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (FLINK-18910) Create the new document structure for Python documentation according to FLIP-133

2020-08-16 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng reassigned FLINK-18910:
---

Assignee: Wei Zhong

> Create the new document structure for Python documentation according to 
> FLIP-133
> 
>
> Key: FLINK-18910
> URL: https://issues.apache.org/jira/browse/FLINK-18910
> Project: Flink
>  Issue Type: Sub-task
>  Components: Documentation
>Reporter: Wei Zhong
>Assignee: Wei Zhong
>Priority: Major
>  Labels: pull-request-available
>
> Create the following structure under the "Application Development" 
> section:
> *Application Development*
>   *-* *Python API* 
>        *-* Getting Started
>        *-* User Guide
>           *-* Table API



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (FLINK-18775) Rework PyFlink Documentation

2020-08-11 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng reassigned FLINK-18775:
---

Assignee: Wei Zhong  (was: sunjincheng)

> Rework PyFlink Documentation
> 
>
> Key: FLINK-18775
> URL: https://issues.apache.org/jira/browse/FLINK-18775
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python, Documentation
>Affects Versions: 1.11.0, 1.11.1
>Reporter: sunjincheng
>Assignee: Wei Zhong
>Priority: Major
>  Labels: beginner
> Fix For: 1.11.0, 1.12.0, 1.11.1, 1.11.2
>
>
> Since the release of Flink 1.11, the number of PyFlink users has continued to 
> grow. According to the feedback we have received, the current Flink 
> documentation is not very friendly to PyFlink users. There are two shortcomings:
>  # Python related content is mixed in the Java/Scala documentation, which 
> makes it difficult for users who only focus on PyFlink to read.
>  # There is already a "Python Table API" section in the Table API document to 
> store PyFlink documents, but the number of articles is small and the content 
> is fragmented. It is difficult for beginners to learn from it.
> In addition, 
> [FLIP-130|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158866298]
>  introduced the Python DataStream API. Many documents will be added for those 
> new APIs. In order to increase the readability and maintainability of the 
> PyFlink document, we would like to rework it via this umbrella JIRA.
>  
> The detail can be found in 
> [FLIP-133|https://cwiki.apache.org/confluence/display/FLINK/FLIP-133%3A+Rework+PyFlink+Documentation]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (FLINK-18816) Correct API usage in Pyflink Dependency Management page

2020-08-04 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng closed FLINK-18816.
---
Fix Version/s: 1.11.2
   1.12.0
   Resolution: Fixed

Fixed in Master: 456d5ba5619c79f05f61979d7967e6db95f9ab6d
Fixed in release-1.11: 043d93f9e2e8d5a86ac4ed9bd7c8c5c19f69d05c

> Correct API usage in Pyflink Dependency Management page
> ---
>
> Key: FLINK-18816
> URL: https://issues.apache.org/jira/browse/FLINK-18816
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python, Documentation
>Affects Versions: 1.11.0, 1.11.1
>Reporter: sunjincheng
>Assignee: Zhenhua Yang
>Priority: Major
>  Labels: beginner, pull-request-available
> Fix For: 1.12.0, 1.11.2
>
>
> Correct the API usage on the doc page [1]. Change:
> from `table_env.get_config().set_configuration` to 
> `table_env.get_config().get_configuration().set_string`.
> [1] 
> https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/python/dependency_management.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-18816) Correct API usage in Pyflink Dependency Management page

2020-08-04 Thread sunjincheng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17171240#comment-17171240
 ] 

sunjincheng commented on FLINK-18816:
-

Thanks for the PR [~huaouo] 

> Correct API usage in Pyflink Dependency Management page
> ---
>
> Key: FLINK-18816
> URL: https://issues.apache.org/jira/browse/FLINK-18816
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python, Documentation
>Affects Versions: 1.11.0, 1.11.1
>Reporter: sunjincheng
>Priority: Major
>  Labels: beginner, pull-request-available
>
> Correct the API usage on the doc page [1]. Change:
> from `table_env.get_config().set_configuration` to 
> `table_env.get_config().get_configuration().set_string`.
> [1] 
> https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/python/dependency_management.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (FLINK-18816) Correct API usage in Pyflink Dependency Management page

2020-08-04 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng reassigned FLINK-18816:
---

Assignee: Zhenhua Yang

> Correct API usage in Pyflink Dependency Management page
> ---
>
> Key: FLINK-18816
> URL: https://issues.apache.org/jira/browse/FLINK-18816
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python, Documentation
>Affects Versions: 1.11.0, 1.11.1
>Reporter: sunjincheng
>Assignee: Zhenhua Yang
>Priority: Major
>  Labels: beginner, pull-request-available
>
> Correct the API usage on the doc page [1]. Change:
> from `table_env.get_config().set_configuration` to 
> `table_env.get_config().get_configuration().set_string`.
> [1] 
> https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/python/dependency_management.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (FLINK-18816) Correct API usage in Pyflink Dependency Management page

2020-08-04 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng reassigned FLINK-18816:
---

Assignee: (was: sunjincheng)

> Correct API usage in Pyflink Dependency Management page
> ---
>
> Key: FLINK-18816
> URL: https://issues.apache.org/jira/browse/FLINK-18816
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python, Documentation
>Affects Versions: 1.11.0, 1.11.1
>Reporter: sunjincheng
>Priority: Major
>  Labels: beginner
>
> Correct the API usage on the doc page [1]. Change:
> from `table_env.get_config().set_configuration` to 
> `table_env.get_config().get_configuration().set_string`.
> [1] 
> https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/python/dependency_management.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (FLINK-18816) Correct API usage in Pyflink Dependency Management page

2020-08-04 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng reassigned FLINK-18816:
---

Assignee: sunjincheng

> Correct API usage in Pyflink Dependency Management page
> ---
>
> Key: FLINK-18816
> URL: https://issues.apache.org/jira/browse/FLINK-18816
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python, Documentation
>Affects Versions: 1.11.0, 1.11.1
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Labels: beginner
>
> Correct the API usage on the doc page [1]. Change:
> from `table_env.get_config().set_configuration` to 
> `table_env.get_config().get_configuration().set_string`.
> [1] 
> https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/python/dependency_management.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-18816) Correct API usage in Pyflink Dependency Management page

2020-08-04 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng updated FLINK-18816:

Labels: beginner  (was: )

> Correct API usage in Pyflink Dependency Management page
> ---
>
> Key: FLINK-18816
> URL: https://issues.apache.org/jira/browse/FLINK-18816
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python, Documentation
>Affects Versions: 1.11.0, 1.11.1
>Reporter: sunjincheng
>Priority: Major
>  Labels: beginner
>
> Correct the API usage on the doc page [1]. Change:
> from `table_env.get_config().set_configuration` to 
> `table_env.get_config().get_configuration().set_string`.
> [1] 
> https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/python/dependency_management.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-18816) Correct API usage in Pyflink Dependency Management page

2020-08-04 Thread sunjincheng (Jira)
sunjincheng created FLINK-18816:
---

 Summary: Correct API usage in Pyflink Dependency Management page
 Key: FLINK-18816
 URL: https://issues.apache.org/jira/browse/FLINK-18816
 Project: Flink
  Issue Type: Improvement
  Components: API / Python, Documentation
Affects Versions: 1.11.1, 1.11.0
Reporter: sunjincheng


Correct the API usage on the doc page [1]. Change:

From `table_env.get_config().set_configuration` to 
`table_env.get_config().get_configuration().set_string`.

[1] 
https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/python/dependency_management.html
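
For clarity, a short hedged sketch of the corrected call (the environment setup and the option key/value below are illustrative, not taken from the page):
{code:python}
from pyflink.table import EnvironmentSettings, TableEnvironment

settings = EnvironmentSettings.new_instance().in_streaming_mode() \
    .use_blink_planner().build()
table_env = TableEnvironment.create(settings)

# Wrong (as previously documented): TableConfig has no set_configuration.
# table_env.get_config().set_configuration(...)

# Correct: obtain the underlying Configuration, then set the option on it.
table_env.get_config().get_configuration().set_string(
    "python.fn-execution.bundle.size", "1000")
{code}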



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-18775) Rework PyFlink Documentation

2020-07-30 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng updated FLINK-18775:

Description: 
Since the release of Flink 1.11, the number of PyFlink users has continued to 
grow. According to the feedback we have received, the current Flink documentation 
is not very friendly to PyFlink users. There are two shortcomings:
 # Python related content is mixed in the Java/Scala documentation, which makes 
it difficult for users who only focus on PyFlink to read.
 # There is already a "Python Table API" section in the Table API document to 
store PyFlink documents, but the number of articles is small and the content is 
fragmented. It is difficult for beginners to learn from it.

In addition, 
[FLIP-130|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158866298]
 introduced the Python DataStream API. Many documents will be added for those 
new APIs. In order to increase the readability and maintainability of the 
PyFlink document, we would like to rework it via this umbrella JIRA.

 

The detail can be found in 
[FLIP-133|https://cwiki.apache.org/confluence/display/FLINK/FLIP-133%3A+Rework+PyFlink+Documentation]

  was:
Since the release of Flink 1.11, the number of PyFlink users has continued to 
grow. According to the feedback we have received, the current Flink documentation 
is not very friendly to PyFlink users. There are two shortcomings:
 # Python related content is mixed in the Java/Scala documentation, which makes 
it difficult for users who only focus on PyFlink to read.
 # There is already a "Python Table API" section in the Table API document to 
store PyFlink documents, but the number of articles is small and the content is 
fragmented. It is difficult for beginners to learn from it.

In addition, 
[FLIP-130|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158866298]
 introduced the Python DataStream API. Many documents will be added for those 
new APIs. In order to increase the readability and maintainability of the 
PyFlink document, we would like to rework it via this umbrella JIRA.

 

The detail can be found in 
[FLIP-133|[https://cwiki.apache.org/confluence/display/FLINK/FLIP-133%3A+Rework+PyFlink+Documentation]]


> Rework PyFlink Documentation
> 
>
> Key: FLINK-18775
> URL: https://issues.apache.org/jira/browse/FLINK-18775
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python, Documentation
>Affects Versions: 1.11.0, 1.11.1
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Labels: beginner
> Fix For: 1.11.0, 1.12.0, 1.11.1, 1.11.2
>
>
> Since the release of Flink 1.11, the number of PyFlink users has continued to 
> grow. According to the feedback we have received, the current Flink 
> documentation is not very friendly to PyFlink users. There are two shortcomings:
>  # Python related content is mixed in the Java/Scala documentation, which 
> makes it difficult for users who only focus on PyFlink to read.
>  # There is already a "Python Table API" section in the Table API document to 
> store PyFlink documents, but the number of articles is small and the content 
> is fragmented. It is difficult for beginners to learn from it.
> In addition, 
> [FLIP-130|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158866298]
>  introduced the Python DataStream API. Many documents will be added for those 
> new APIs. In order to increase the readability and maintainability of the 
> PyFlink document, we would like to rework it via this umbrella JIRA.
>  
> The detail can be found in 
> [FLIP-133|https://cwiki.apache.org/confluence/display/FLINK/FLIP-133%3A+Rework+PyFlink+Documentation]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-18775) Rework PyFlink Documentation

2020-07-30 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng updated FLINK-18775:

Description: 
Since the release of Flink 1.11, the number of PyFlink users has continued to 
grow. According to the feedback we have received, the current Flink documentation 
is not very friendly to PyFlink users. There are two shortcomings:
 # Python related content is mixed in the Java/Scala documentation, which makes 
it difficult for users who only focus on PyFlink to read.
 # There is already a "Python Table API" section in the Table API document to 
store PyFlink documents, but the number of articles is small and the content is 
fragmented. It is difficult for beginners to learn from it.

In addition, 
[FLIP-130|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158866298]
 introduced the Python DataStream API. Many documents will be added for those 
new APIs. In order to increase the readability and maintainability of the 
PyFlink document, we would like to rework it via this umbrella JIRA.

 

The detail can be found in 
[FLIP-133|[https://cwiki.apache.org/confluence/display/FLINK/FLIP-133%3A+Rework+PyFlink+Documentation]]

  was:
Since the release of Flink 1.11, the number of PyFlink users has continued to 
grow. According to the feedback we have received, the current Flink documentation 
is not very friendly to PyFlink users. There are two shortcomings:
 # Python related content is mixed in the Java/Scala documentation, which makes 
it difficult for users who only focus on PyFlink to read.
 # There is already a "Python Table API" section in the Table API document to 
store PyFlink documents, but the number of articles is small and the content is 
fragmented. It is difficult for beginners to learn from it.

In addition, 
[FLIP-130|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158866298]
 introduced the Python DataStream API. Many documents will be added for those 
new APIs. In order to increase the readability and maintainability of the 
PyFlink document, we would like to rework it via this umbrella JIRA.


> Rework PyFlink Documentation
> 
>
> Key: FLINK-18775
> URL: https://issues.apache.org/jira/browse/FLINK-18775
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python, Documentation
>Affects Versions: 1.11.0, 1.11.1
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Labels: beginner
> Fix For: 1.11.0, 1.12.0, 1.11.1, 1.11.2
>
>
> Since the release of Flink 1.11, the number of PyFlink users has continued to 
> grow. According to the feedback we have received, the current Flink 
> documentation is not very friendly to PyFlink users. There are two shortcomings:
>  # Python related content is mixed in the Java/Scala documentation, which 
> makes it difficult for users who only focus on PyFlink to read.
>  # There is already a "Python Table API" section in the Table API document to 
> store PyFlink documents, but the number of articles is small and the content 
> is fragmented. It is difficult for beginners to learn from it.
> In addition, 
> [FLIP-130|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158866298]
>  introduced the Python DataStream API. Many documents will be added for those 
> new APIs. In order to increase the readability and maintainability of the 
> PyFlink document, we would like to rework it via this umbrella JIRA.
>  
> The detail can be found in 
> [FLIP-133|[https://cwiki.apache.org/confluence/display/FLINK/FLIP-133%3A+Rework+PyFlink+Documentation]]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (FLINK-18775) Rework PyFlink Documentation

2020-07-30 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng reassigned FLINK-18775:
---

Assignee: sunjincheng

> Rework PyFlink Documentation
> 
>
> Key: FLINK-18775
> URL: https://issues.apache.org/jira/browse/FLINK-18775
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python, Documentation
>Affects Versions: 1.11.0, 1.11.1
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Labels: beginner
> Fix For: 1.11.0, 1.12.0, 1.11.1, 1.11.2
>
>
> Since the release of Flink 1.11, the number of PyFlink users has continued to 
> grow. According to the feedback we have received, the current Flink 
> documentation is not very friendly to PyFlink users. There are two shortcomings:
>  # Python related content is mixed in the Java/Scala documentation, which 
> makes it difficult for users who only focus on PyFlink to read.
>  # There is already a "Python Table API" section in the Table API document to 
> store PyFlink documents, but the number of articles is small and the content 
> is fragmented. It is difficult for beginners to learn from it.
> In addition, 
> [FLIP-130|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158866298]
>  introduced the Python DataStream API. Many documents will be added for those 
> new APIs. In order to increase the readability and maintainability of the 
> PyFlink document, we would like to rework it via this umbrella JIRA.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-18775) Rework PyFlink Documentation

2020-07-30 Thread sunjincheng (Jira)
sunjincheng created FLINK-18775:
---

 Summary: Rework PyFlink Documentation
 Key: FLINK-18775
 URL: https://issues.apache.org/jira/browse/FLINK-18775
 Project: Flink
  Issue Type: Improvement
  Components: API / Python, Documentation
Affects Versions: 1.11.1, 1.11.0
Reporter: sunjincheng
 Fix For: 1.12.0, 1.11.2, 1.11.1, 1.11.0


Since the release of Flink 1.11, the number of PyFlink users has continued to 
grow. According to the feedback we have received, the current Flink documentation 
is not very friendly to PyFlink users. There are two shortcomings:
 # Python related content is mixed in the Java/Scala documentation, which makes 
it difficult for users who only focus on PyFlink to read.
 # There is already a "Python Table API" section in the Table API document to 
store PyFlink documents, but the number of articles is small and the content is 
fragmented. It is difficult for beginners to learn from it.

In addition, 
[FLIP-130|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158866298]
 introduced the Python DataStream API. Many documents will be added for those 
new APIs. In order to increase the readability and maintainability of the 
PyFlink document, we would like to rework it via this umbrella JIRA.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-18356) Exit code 137 returned from process when testing pyflink

2020-06-18 Thread sunjincheng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17139150#comment-17139150
 ] 

sunjincheng commented on FLINK-18356:
-

It seems that there is a shortage of resources; this may be caused by an OOM 
kill. I added Build System to the component/s.

[https://success.docker.com/article/what-causes-a-container-to-exit-with-code-137]

> Exit code 137 returned from process when testing pyflink
> 
>
> Key: FLINK-18356
> URL: https://issues.apache.org/jira/browse/FLINK-18356
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python, Build System / Azure Pipelines
>Affects Versions: 1.12.0
>Reporter: Piotr Nowojski
>Priority: Major
>  Labels: test-stability
> Fix For: 1.12.0
>
>
> {noformat}
> == test session starts ==
> platform linux -- Python 3.7.3, pytest-5.4.3, py-1.8.2, pluggy-0.13.1
> cachedir: .tox/py37-cython/.pytest_cache
> rootdir: /__w/3/s/flink-python
> collected 568 items
> pyflink/common/tests/test_configuration.py ..[  
> 1%]
> pyflink/common/tests/test_execution_config.py ...[  
> 5%]
> pyflink/dataset/tests/test_execution_environment.py .
> ##[error]Exit code 137 returned from process: file name '/bin/docker', 
> arguments 'exec -i -u 1002 
> 97fc4e22522d2ced1f4d23096b8929045d083dd0a99a4233a8b20d0489e9bddb 
> /__a/externals/node/bin/node /__w/_temp/containerHandlerInvoker.js'.
> Finishing: Test - python
> {noformat}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=3729=logs=9cada3cb-c1d3-5621-16da-0f718fb86602=8d78fe4f-d658-5c70-12f8-4921589024c3



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-18356) Exit code 137 returned from process when testing pyflink

2020-06-18 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng updated FLINK-18356:

Component/s: Build System / Azure Pipelines

> Exit code 137 returned from process when testing pyflink
> 
>
> Key: FLINK-18356
> URL: https://issues.apache.org/jira/browse/FLINK-18356
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python, Build System / Azure Pipelines
>Affects Versions: 1.12.0
>Reporter: Piotr Nowojski
>Priority: Major
>  Labels: test-stability
> Fix For: 1.12.0
>
>
> {noformat}
> == test session starts ==
> platform linux -- Python 3.7.3, pytest-5.4.3, py-1.8.2, pluggy-0.13.1
> cachedir: .tox/py37-cython/.pytest_cache
> rootdir: /__w/3/s/flink-python
> collected 568 items
> pyflink/common/tests/test_configuration.py ..[  
> 1%]
> pyflink/common/tests/test_execution_config.py ...[  
> 5%]
> pyflink/dataset/tests/test_execution_environment.py .
> ##[error]Exit code 137 returned from process: file name '/bin/docker', 
> arguments 'exec -i -u 1002 
> 97fc4e22522d2ced1f4d23096b8929045d083dd0a99a4233a8b20d0489e9bddb 
> /__a/externals/node/bin/node /__w/_temp/containerHandlerInvoker.js'.
> Finishing: Test - python
> {noformat}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=3729=logs=9cada3cb-c1d3-5621-16da-0f718fb86602=8d78fe4f-d658-5c70-12f8-4921589024c3



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (FLINK-18099) Release guard for PyFlink

2020-06-10 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng closed FLINK-18099.
---
Resolution: Fixed

> Release guard for PyFlink
> -
>
> Key: FLINK-18099
> URL: https://issues.apache.org/jira/browse/FLINK-18099
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Python
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.11.0
>
>
> We will add all of the check items (sub-JIRAs) to ensure a successful release 
> of PyFlink.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-16497) Improve default flush strategy for JDBC sink to make it work out-of-box

2020-06-05 Thread sunjincheng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126642#comment-17126642
 ] 

sunjincheng commented on FLINK-16497:
-

Thank you for the active discussion! My original intent was to find the most 
natural way to handle stream computing. For now, reducing the flush size to 100 
with a flush interval of 1s sounds good to me as a balanced choice.
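
For reference, a hedged sketch of a legacy JDBC sink DDL with those values made explicit (the connection details are placeholders, a `table_env` created elsewhere is assumed, and the flush option keys are the ones quoted in this issue):
{code:python}
# The last two options carry the values discussed above: flush after 100
# buffered rows, or after 1 second, whichever comes first.
table_env.execute_sql("""
    CREATE TABLE jdbc_sink (
        id BIGINT,
        name STRING
    ) WITH (
        'connector.type' = 'jdbc',
        'connector.url' = 'jdbc:mysql://localhost:3306/mydb',
        'connector.table' = 'my_sink_table',
        'connector.write.flush.max-rows' = '100',
        'connector.write.flush.interval' = '1s'
    )
""")
{code}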

> Improve default flush strategy for JDBC sink to make it work out-of-box
> ---
>
> Key: FLINK-16497
> URL: https://issues.apache.org/jira/browse/FLINK-16497
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / JDBC, Table SQL / Ecosystem
>Reporter: Jark Wu
>Priority: Critical
> Fix For: 1.11.0
>
>
> Currently, JDBC sink provides 2 flush options:
> {code}
> 'connector.write.flush.max-rows' = '5000', -- default is 5000
> 'connector.write.flush.interval' = '2s', -- no default value
> {code}
> That means that if the flush interval is not set, the buffered output rows may 
> not be flushed to the database for a long time. That is surprising behavior 
> because no results are output by default. 
> So I propose to have a default flush interval of '1s' for the JDBC sink, or a 
> default flush size of 1 row.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (FLINK-18105) Test submitting Java SQL job with Python UDF

2020-06-03 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng reassigned FLINK-18105:
---

Assignee: Wei Zhong

> Test submitting Java SQL job with Python UDF
> 
>
> Key: FLINK-18105
> URL: https://issues.apache.org/jira/browse/FLINK-18105
> Project: Flink
>  Issue Type: Test
>  Components: API / Python
>Reporter: sunjincheng
>Assignee: Wei Zhong
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.11.0
>
>
> Test submitting a Java SQL job with a Python UDF, and a Python job with a UDF, 
> via flink run (including YARN per-job, YARN session, and standalone session), 
> with Java dependency management (including pipeline.jars and 
> pipeline.classpaths) and Python dependency management (including pyfs, pyreq, 
> pyarch, pyexec, and PYFLINK_CLIENT_EXECUTABLE).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (FLINK-18104) Test pyflink on windows

2020-06-03 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng reassigned FLINK-18104:
---

Assignee: Wei Zhong

> Test pyflink on windows
> ---
>
> Key: FLINK-18104
> URL: https://issues.apache.org/jira/browse/FLINK-18104
> Project: Flink
>  Issue Type: Test
>  Components: API / Python
>Reporter: sunjincheng
>Assignee: Wei Zhong
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.11.0
>
>
> Test pyflink on windows as follows:
>  * maven test
>  * maven build
>  * python build
>  * python test
>  * python install
>  * python udf example with dependencies
> Feel free to add any other check items.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-18108) Wheel package consistency checks

2020-06-03 Thread sunjincheng (Jira)
sunjincheng created FLINK-18108:
---

 Summary: Wheel package consistency checks
 Key: FLINK-18108
 URL: https://issues.apache.org/jira/browse/FLINK-18108
 Project: Flink
  Issue Type: Test
  Components: API / Python
Reporter: sunjincheng
Assignee: Huang Xingbo
 Fix For: 1.11.0


Check that the wheel packages are consistent with those built from the source code.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-18107) Performance tests for PyFlink UDFs

2020-06-03 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng updated FLINK-18107:

Description: 
Check items as follows:
 * Performance tests for Python UDF
 * Performance tests for Pandas UDF

> Performance tests for PyFlink UDFs
> --
>
> Key: FLINK-18107
> URL: https://issues.apache.org/jira/browse/FLINK-18107
> Project: Flink
>  Issue Type: Test
>  Components: API / Python
>Reporter: sunjincheng
>Assignee: Huang Xingbo
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.11.0
>
>
> Check items as follows:
>  * Performance tests for Python UDF
>  * Performance tests for Pandas UDF



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-18107) Performance tests for PyFlink UDFs

2020-06-03 Thread sunjincheng (Jira)
sunjincheng created FLINK-18107:
---

 Summary: Performance tests for PyFlink UDFs
 Key: FLINK-18107
 URL: https://issues.apache.org/jira/browse/FLINK-18107
 Project: Flink
  Issue Type: Test
  Components: API / Python
Reporter: sunjincheng
Assignee: Huang Xingbo
 Fix For: 1.11.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-18107) Performance tests for PyFlink UDFs

2020-06-03 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng updated FLINK-18107:

Priority: Blocker  (was: Major)

> Performance tests for PyFlink UDFs
> --
>
> Key: FLINK-18107
> URL: https://issues.apache.org/jira/browse/FLINK-18107
> Project: Flink
>  Issue Type: Test
>  Components: API / Python
>Reporter: sunjincheng
>Assignee: Huang Xingbo
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-18106) Test Python UDTF support

2020-06-03 Thread sunjincheng (Jira)
sunjincheng created FLINK-18106:
---

 Summary: Test Python UDTF support
 Key: FLINK-18106
 URL: https://issues.apache.org/jira/browse/FLINK-18106
 Project: Flink
  Issue Type: Test
  Components: API / Python
Reporter: sunjincheng
Assignee: Huang Xingbo
 Fix For: 1.11.0


Check items as follows (a minimal UDTF sketch follows the list):
 * test Python UDTF (inner join/left join) in Blink Planner in batch mode
 * test Python UDTF (inner join/left join) in Blink Planner in streaming mode
 * test Python UDTF (inner join/left join) in Flink Planner in batch mode
 * test Python UDTF (inner join/left join) in Flink Planner in streaming mode
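
A hedged sketch of the kind of Python UDTF these checks would exercise (the function name and types are illustrative):
{code:python}
from pyflink.table import DataTypes
from pyflink.table.udf import udtf

# A table function that splits a comma-separated string; it would be used via
# join_lateral (inner join) or left_outer_join_lateral (left join) in the
# planner/mode combinations listed above.
@udtf(input_types=[DataTypes.STRING()], result_types=[DataTypes.STRING()])
def split(line):
    for word in line.split(","):
        yield word
{code}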



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-18105) Test submitting Java SQL job with Python UDF

2020-06-03 Thread sunjincheng (Jira)
sunjincheng created FLINK-18105:
---

 Summary: Test submitting Java SQL job with Python UDF
 Key: FLINK-18105
 URL: https://issues.apache.org/jira/browse/FLINK-18105
 Project: Flink
  Issue Type: Test
  Components: API / Python
Reporter: sunjincheng
 Fix For: 1.11.0


Test submitting a Java SQL job with a Python UDF, and a Python job with a UDF, 
via flink run (including YARN per-job, YARN session, and standalone session), 
with Java dependency management (including pipeline.jars and pipeline.classpaths) 
and Python dependency management (including pyfs, pyreq, pyarch, pyexec, and 
PYFLINK_CLIENT_EXECUTABLE).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-18104) Test pyflink on windows

2020-06-03 Thread sunjincheng (Jira)
sunjincheng created FLINK-18104:
---

 Summary: Test pyflink on windows
 Key: FLINK-18104
 URL: https://issues.apache.org/jira/browse/FLINK-18104
 Project: Flink
  Issue Type: Test
  Components: API / Python
Reporter: sunjincheng
 Fix For: 1.11.0


Test pyflink on windows as follows:
 * maven test
 * maven build
 * python build
 * python test
 * python install
 * python udf example with dependencies

Feel free to add any other check items.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-18103) Test Pandas DataFrame and Flink Table conversion

2020-06-03 Thread sunjincheng (Jira)
sunjincheng created FLINK-18103:
---

 Summary: Test Pandas DataFrame and Flink Table conversion
 Key: FLINK-18103
 URL: https://issues.apache.org/jira/browse/FLINK-18103
 Project: Flink
  Issue Type: Test
Reporter: sunjincheng
 Fix For: 1.11.0


1) Test converting a Pandas DataFrame to a Flink Table.
2) Test converting a Flink Table to a Pandas DataFrame.
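
A hedged sketch of both directions (the DataFrame contents are illustrative, and a `table_env` created elsewhere is assumed):
{code:python}
import pandas as pd

pdf = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})

# 1) Pandas DataFrame -> Flink Table
table = table_env.from_pandas(pdf)

# 2) Flink Table -> Pandas DataFrame
result_pdf = table.to_pandas()
{code}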



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-18102) Test Pandas UDF support

2020-06-03 Thread sunjincheng (Jira)
sunjincheng created FLINK-18102:
---

 Summary: Test Pandas UDF support
 Key: FLINK-18102
 URL: https://issues.apache.org/jira/browse/FLINK-18102
 Project: Flink
  Issue Type: Test
  Components: API / Python
Reporter: sunjincheng
 Fix For: 1.11.0


Test the Pandas UDF functionality.
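
A minimal sketch of such a vectorized UDF (hedged: the names are illustrative, and the udf_type="pandas" flag matches the 1.11-era API):
{code:python}
from pyflink.table import DataTypes
from pyflink.table.udf import udf

# Inputs arrive as pandas.Series and the result is a Series of the same
# length, which is what distinguishes a Pandas UDF from a general Python UDF.
@udf(input_types=[DataTypes.BIGINT(), DataTypes.BIGINT()],
     result_type=DataTypes.BIGINT(), udf_type="pandas")
def pandas_add(i, j):
    return i + j
{code}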



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-18101) Test Python Pipeline API including Transformer and Estimator

2020-06-03 Thread sunjincheng (Jira)
sunjincheng created FLINK-18101:
---

 Summary: Test Python Pipeline API including Transformer and 
Estimator
 Key: FLINK-18101
 URL: https://issues.apache.org/jira/browse/FLINK-18101
 Project: Flink
  Issue Type: Test
  Components: API / Python
Reporter: sunjincheng
 Fix For: 1.11.0


Test Python Pipeline API including Transformer and Estimator.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-18099) Release guard for PyFlink

2020-06-03 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng updated FLINK-18099:

Labels: release-testing  (was: )

> Release guard for PyFlink
> -
>
> Key: FLINK-18099
> URL: https://issues.apache.org/jira/browse/FLINK-18099
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Python
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.11.0
>
>
> We will add all of the check items (sub-JIRAs) to ensure a successful release 
> of PyFlink.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-18099) Release guard for PyFlink

2020-06-03 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng updated FLINK-18099:

Fix Version/s: 1.11.0

> Release guard for PyFlink
> -
>
> Key: FLINK-18099
> URL: https://issues.apache.org/jira/browse/FLINK-18099
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Python
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 1.11.0
>
>
> We will add all of the check items (sub-JIRAs) to ensure a successful release 
> of PyFlink.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-18099) Release guard for PyFlink

2020-06-03 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng updated FLINK-18099:

Priority: Blocker  (was: Major)

> Release guard for PyFlink
> -
>
> Key: FLINK-18099
> URL: https://issues.apache.org/jira/browse/FLINK-18099
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Python
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Blocker
> Fix For: 1.11.0
>
>
> We will add all of the check items (sub-JIRAs) to ensure a successful release 
> of PyFlink.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-18100) Test different user-defined metrics for Python UDF

2020-06-03 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng updated FLINK-18100:

Labels: release-testing  (was: )

> Test different user-defined metrics for Python UDF
> --
>
> Key: FLINK-18100
> URL: https://issues.apache.org/jira/browse/FLINK-18100
> Project: Flink
>  Issue Type: Test
>  Components: API / Python
>Reporter: sunjincheng
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.11.0
>
>
> Test different user-defined metrics for Python UDF.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-18100) Test different user-defined metrics for Python UDF

2020-06-03 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng updated FLINK-18100:

Priority: Blocker  (was: Major)

> Test different user-defined metrics for Python UDF
> --
>
> Key: FLINK-18100
> URL: https://issues.apache.org/jira/browse/FLINK-18100
> Project: Flink
>  Issue Type: Test
>  Components: API / Python
>Reporter: sunjincheng
>Priority: Blocker
> Fix For: 1.11.0
>
>
> Test different user-defined metrics for Python UDF.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-18100) Test different user-defined metrics for Python UDF

2020-06-03 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng updated FLINK-18100:

Fix Version/s: 1.11.0

> Test different user-defined metrics for Python UDF
> --
>
> Key: FLINK-18100
> URL: https://issues.apache.org/jira/browse/FLINK-18100
> Project: Flink
>  Issue Type: Test
>  Components: API / Python
>Reporter: sunjincheng
>Priority: Major
> Fix For: 1.11.0
>
>
> Test different user-defined metrics for Python UDF.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-18100) Test different user-defined metrics for Python UDF

2020-06-03 Thread sunjincheng (Jira)
sunjincheng created FLINK-18100:
---

 Summary: Test different user-defined metrics for Python UDF
 Key: FLINK-18100
 URL: https://issues.apache.org/jira/browse/FLINK-18100
 Project: Flink
  Issue Type: Test
  Components: API / Python
Reporter: sunjincheng


Test different user-defined metrics for Python UDF.
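
A hedged sketch of one such metric (a counter) registered from a UDF's open() method; the function and metric names are illustrative:
{code:python}
from pyflink.table import DataTypes
from pyflink.table.udf import ScalarFunction, udf

class CountingUpper(ScalarFunction):
    def open(self, function_context):
        # Counters are one metric type; meters, distributions and gauges
        # would be exercised the same way in this test.
        self.counter = function_context.get_metric_group().counter("invocations")

    def eval(self, s):
        self.counter.inc()
        return s.upper()

counting_upper = udf(CountingUpper(), input_types=[DataTypes.STRING()],
                     result_type=DataTypes.STRING())
{code}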



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-18099) Release guard for PyFlink

2020-06-03 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng updated FLINK-18099:

Description: We will add all of the check items (sub-JIRAs) to ensure a 
successful release of PyFlink.  (was: We would add all the check items 
(sub-JIRAs) to ensure a successful release of PyFlink.)

> Release guard for PyFlink
> -
>
> Key: FLINK-18099
> URL: https://issues.apache.org/jira/browse/FLINK-18099
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Python
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>
> We will add all of the check items (sub-JIRAs) to ensure a successful release 
> of PyFlink.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-18099) Release guard for PyFlink

2020-06-03 Thread sunjincheng (Jira)
sunjincheng created FLINK-18099:
---

 Summary: Release guard for PyFlink
 Key: FLINK-18099
 URL: https://issues.apache.org/jira/browse/FLINK-18099
 Project: Flink
  Issue Type: Sub-task
  Components: API / Python
Reporter: sunjincheng
Assignee: sunjincheng


We would add all the check items (sub-JIRAs) to ensure a successful release of 
PyFlink.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-16497) Improve default flush strategy for JDBC sink to make it work out-of-box

2020-06-01 Thread sunjincheng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120927#comment-17120927
 ] 

sunjincheng edited comment on FLINK-16497 at 6/1/20, 11:00 AM:
---

Hi [~libenchao], thanks for your reply!

I think it is difficult for us to set an optimal default value from a 
performance perspective, since the optimum depends on the specific business and 
storage. So at the design level, for stream computing scenarios, real-time 
insertion (1 row) is the better semantic expression, and I think about it in 
terms of semantics and real-time behavior. Even though 1s is very short, 
beginners will still perceive the computation as real-time but the write to 
storage as mini-batch (1s may cover multiple records). So, for now, I still 
prefer 1 row as the default value. :)

What do you think?


was (Author: sunjincheng121):
Hi [~libenchao], thanks for your reply!

I think it is difficult for us to set an optimal default value from a 
performance perspective, since the optimum depends on the specific business and 
storage. So at the design level, for flow computing scenarios, real-time 
insertion (1 row) is the better semantic expression, and I think about it in 
terms of semantics and real-time behavior. Even though 1s is very short, 
beginners will still perceive the computation as real-time but the write to 
storage as mini-batch (1s may cover multiple records). So, for now, I still 
prefer 1 row as the default value. :)

What do you think?

> Improve default flush strategy for JDBC sink to make it work out-of-box
> ---
>
> Key: FLINK-16497
> URL: https://issues.apache.org/jira/browse/FLINK-16497
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / JDBC, Table SQL / Ecosystem
>Reporter: Jark Wu
>Priority: Major
> Fix For: 1.11.0
>
>
> Currently, JDBC sink provides 2 flush options:
> {code}
> 'connector.write.flush.max-rows' = '5000', -- default is 5000
> 'connector.write.flush.interval' = '2s', -- no default value
> {code}
> That means that if the flush interval is not set, the buffered output rows 
> may not be flushed to the database for a long time. That is surprising 
> behavior, because no results are output by default. 
> So I propose a default flush interval of '1s' for the JDBC sink, or a default 
> flush size of 1 row. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-16497) Improve default flush strategy for JDBC sink to make it work out-of-box

2020-06-01 Thread sunjincheng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120927#comment-17120927
 ] 

sunjincheng commented on FLINK-16497:
-

Hi [~libenchao], thanks for your reply!

I think it is difficult for us to set an optimal default value from a 
performance perspective, since the optimum depends on the specific business and 
storage. So at the design level, for flow computing scenarios, real-time 
insertion (1 row) is the better semantic expression, and I think about it in 
terms of semantics and real-time behavior. Even though 1s is very short, 
beginners will still perceive the computation as real-time but the write to 
storage as mini-batch (1s may cover multiple records). So, for now, I still 
prefer 1 row as the default value. :)

What do you think?

> Improve default flush strategy for JDBC sink to make it work out-of-box
> ---
>
> Key: FLINK-16497
> URL: https://issues.apache.org/jira/browse/FLINK-16497
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / JDBC, Table SQL / Ecosystem
>Reporter: Jark Wu
>Priority: Major
> Fix For: 1.11.0
>
>
> Currently, JDBC sink provides 2 flush options:
> {code}
> 'connector.write.flush.max-rows' = '5000', -- default is 5000
> 'connector.write.flush.interval' = '2s', -- no default value
> {code}
> That means that if the flush interval is not set, the buffered output rows 
> may not be flushed to the database for a long time. That is surprising 
> behavior, because no results are output by default. 
> So I propose a default flush interval of '1s' for the JDBC sink, or a default 
> flush size of 1 row. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-16497) Improve default flush strategy for JDBC sink to make it work out-of-box

2020-05-31 Thread sunjincheng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120532#comment-17120532
 ] 

sunjincheng commented on FLINK-16497:
-

+1 for this, as I mentioned in FLINK-18041, and I prefer 1 row as the default, 
as it's pretty friendly for user testing.

> Improve default flush strategy for JDBC sink to make it work out-of-box
> ---
>
> Key: FLINK-16497
> URL: https://issues.apache.org/jira/browse/FLINK-16497
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / JDBC, Table SQL / Ecosystem
>Reporter: Jark Wu
>Priority: Major
> Fix For: 1.11.0
>
>
> Currently, JDBC sink provides 2 flush options:
> {code}
> 'connector.write.flush.max-rows' = '5000', -- default is 5000
> 'connector.write.flush.interval' = '2s', -- no default value
> {code}
> That means that if the flush interval is not set, the buffered output rows 
> may not be flushed to the database for a long time. That is surprising 
> behavior, because no results are output by default. 
> So I propose a default flush interval of '1s' for the JDBC sink, or a default 
> flush size of 1 row. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-18041) Make a little bit improvement for DEFAULT_FLUSH_MAX_SIZE and DEFAULT_FLUSH_INTERVAL_MILLS of AbstractJdbcOutputFormat

2020-05-30 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng updated FLINK-18041:

Affects Version/s: 1.11.0

> Make a little bit improvement for DEFAULT_FLUSH_MAX_SIZE and 
> DEFAULT_FLUSH_INTERVAL_MILLS of AbstractJdbcOutputFormat
> -
>
> Key: FLINK-18041
> URL: https://issues.apache.org/jira/browse/FLINK-18041
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / JDBC
>Affects Versions: 1.10.0, 1.10.1, 1.11.0
>Reporter: sunjincheng
>Priority: Major
>
> Recently, some users learning Flink have hit a big problem: Flink does not 
> appear to write to MySQL in real time.
> The root cause is that there is little data during testing, while the default 
> value of DEFAULT_FLUSH_MAX_SIZE is 5000 and DEFAULT_FLUSH_INTERVAL_MILLS is 0.
> [https://github.com/apache/flink/blob/release-1.10/flink-connectors/flink-jdbc/src/main/java/org/apache/flink/api/java/io/jdbc/AbstractJDBCOutputFormat.java]
> So I think it would be great to improve the default values a little, e.g. 
> DEFAULT_FLUSH_MAX_SIZE = 1. 
> What do you think? 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-18041) Make a little bit improvement for DEFAULT_FLUSH_MAX_SIZE and DEFAULT_FLUSH_INTERVAL_MILLS of AbstractJdbcOutputFormat

2020-05-30 Thread sunjincheng (Jira)
sunjincheng created FLINK-18041:
---

 Summary: Make a little bit improvement for DEFAULT_FLUSH_MAX_SIZE 
and DEFAULT_FLUSH_INTERVAL_MILLS of AbstractJdbcOutputFormat
 Key: FLINK-18041
 URL: https://issues.apache.org/jira/browse/FLINK-18041
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / JDBC
Affects Versions: 1.10.1, 1.10.0
Reporter: sunjincheng


Recently, some users learning Flink have hit a big problem: Flink does not 
appear to write to MySQL in real time.

The root cause is that there is little data during testing, while the default 
value of DEFAULT_FLUSH_MAX_SIZE is 5000 and DEFAULT_FLUSH_INTERVAL_MILLS is 0.

[https://github.com/apache/flink/blob/release-1.10/flink-connectors/flink-jdbc/src/main/java/org/apache/flink/api/java/io/jdbc/AbstractJDBCOutputFormat.java]

So I think it would be great to improve the default values a little, e.g.

DEFAULT_FLUSH_MAX_SIZE = 1.

What do you think?
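Until the defaults change, users can work around this by setting the flush 
options explicitly on the sink. A minimal PyFlink sketch using the legacy JDBC 
connector properties (the JDBC URL and table name are placeholders):

{code:python}
# t_env is an existing StreamTableEnvironment
sink_ddl = """
    CREATE TABLE mysql_sink (
        word VARCHAR,
        `count` BIGINT
    ) WITH (
        'connector.type' = 'jdbc',
        'connector.url' = 'jdbc:mysql://localhost:3306/flink',
        'connector.table' = 'word_count',
        'connector.write.flush.max-rows' = '1',
        'connector.write.flush.interval' = '1s'
    )
"""
t_env.sql_update(sink_ddl)
{code}

With 'connector.write.flush.max-rows' = '1', every row is written out 
immediately, which matches the 1-row default proposed above.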



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-17923) It will throw MemoryAllocationException if rocksdb statebackend and Python UDF are used in the same slot

2020-05-29 Thread sunjincheng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119453#comment-17119453
 ] 

sunjincheng commented on FLINK-17923:
-

+1 from my point of view. 

> It will throw MemoryAllocationException if rocksdb statebackend and Python 
> UDF are used in the same slot  
> --
>
> Key: FLINK-17923
> URL: https://issues.apache.org/jira/browse/FLINK-17923
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python, Runtime / State Backends
>Affects Versions: 1.10.0, 1.11.0
>Reporter: Dian Fu
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>
> For the following job:
> {code}
> import logging
> import os
> import shutil
> import sys
> import tempfile
> from pyflink.datastream import StreamExecutionEnvironment
> from pyflink.table import TableConfig, StreamTableEnvironment, DataTypes
> from pyflink.table.udf import udf
> def word_count():
> content = "line Licensed to the Apache Software Foundation ASF under one 
> " \
>   "line or more contributor license agreements See the NOTICE 
> file " \
>   "line distributed with this work for additional information " \
>   "line regarding copyright ownership The ASF licenses this file 
> " \
>   "to you under the Apache License Version the " \
>   "License you may not use this file except in compliance " \
>   "with the License"
> t_config = TableConfig()
> env = StreamExecutionEnvironment.get_execution_environment()
> t_env = StreamTableEnvironment.create(env, t_config)
> # register Results table in table environment
> tmp_dir = tempfile.gettempdir()
> result_path = tmp_dir + '/result'
> if os.path.exists(result_path):
> try:
> if os.path.isfile(result_path):
> os.remove(result_path)
> else:
> shutil.rmtree(result_path)
> except OSError as e:
> logging.error("Error removing directory: %s - %s.", e.filename, 
> e.strerror)
> logging.info("Results directory: %s", result_path)
> sink_ddl = """
> create table Results(
> word VARCHAR,
> `count` BIGINT
> ) with (
> 'connector' = 'blackhole'
> )
> """
> t_env.sql_update(sink_ddl)
> @udf(input_types=[DataTypes.BIGINT()], result_type=DataTypes.BIGINT())
> def inc(count):
> return count + 1
> t_env.register_function("inc", inc)
> elements = [(word, 1) for word in content.split(" ")]
> t_env.from_elements(elements, ["word", "count"]) \
>  .group_by("word") \
>  .select("word, count(1) as count") \
>  .select("word, inc(count) as count") \
>  .insert_into("Results")
> t_env.execute("word_count")
> if __name__ == '__main__':
> logging.basicConfig(stream=sys.stdout, level=logging.INFO, 
> format="%(message)s")
> word_count()
> {code}
> It will throw the following exception if rocksdb state backend is used:
> {code}
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed 
> state backend for KeyedProcessOperator_c27dcf7b54ef6bfd6cff02ca8870b681_(1/1) 
> from any of the 1 provided restore options.
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:317)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:144)
>   ... 9 more
> Caused by: java.io.IOException: Failed to acquire shared cache resource for 
> RocksDB
>   at 
> org.apache.flink.contrib.streaming.state.RocksDBOperationUtils.allocateSharedCachesIfConfigured(RocksDBOperationUtils.java:212)
>   at 
> org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createKeyedStateBackend(RocksDBStateBackend.java:516)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:301)
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:142)
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:121)
>   ... 11 more
> Caused by: org.apache.flink.runtime.memory.MemoryAllocationException: Could 
> not created the shared memory resource of size 536870920. Not enough memory 
> left to reserve from the slot's managed memory.
>

[jira] [Commented] (FLINK-17923) It will throw MemoryAllocationException if rocksdb statebackend and Python UDF are used in the same slot

2020-05-28 Thread sunjincheng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119209#comment-17119209
 ] 

sunjincheng commented on FLINK-17923:
-

We can have another way to manage the resources for docker and external mode, 
but for now the TM should manage the resources, as we only support running the 
Python worker in process mode. So we can fix this issue with #5. And I think 
you are right that the default value of managed memory should perhaps be set to 
false if we prefer to use docker or external mode for the Python worker in the 
future.
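For context, the knob discussed here can be toggled from PyFlink. A minimal 
sketch, assuming the configuration key python.fn-execution.memory.managed from 
the Flink configuration docs of this release line (treat the exact key and its 
default as assumptions):

{code:python}
from pyflink.table import TableConfig

t_config = TableConfig()
# Assumed key: controls whether the Python worker uses managed memory.
# Setting it to false sidesteps competing with RocksDB for the slot's
# managed memory, at the cost of unmanaged off-heap usage.
t_config.get_configuration().set_string(
    "python.fn-execution.memory.managed", "false")
{code}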

> It will throw MemoryAllocationException if rocksdb statebackend and Python 
> UDF are used in the same slot  
> --
>
> Key: FLINK-17923
> URL: https://issues.apache.org/jira/browse/FLINK-17923
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python, Runtime / State Backends
>Affects Versions: 1.10.0, 1.11.0
>Reporter: Dian Fu
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>
> For the following job:
> {code}
> import logging
> import os
> import shutil
> import sys
> import tempfile
> from pyflink.datastream import StreamExecutionEnvironment
> from pyflink.table import TableConfig, StreamTableEnvironment, DataTypes
> from pyflink.table.udf import udf
> def word_count():
> content = "line Licensed to the Apache Software Foundation ASF under one 
> " \
>   "line or more contributor license agreements See the NOTICE 
> file " \
>   "line distributed with this work for additional information " \
>   "line regarding copyright ownership The ASF licenses this file 
> " \
>   "to you under the Apache License Version the " \
>   "License you may not use this file except in compliance " \
>   "with the License"
> t_config = TableConfig()
> env = StreamExecutionEnvironment.get_execution_environment()
> t_env = StreamTableEnvironment.create(env, t_config)
> # register Results table in table environment
> tmp_dir = tempfile.gettempdir()
> result_path = tmp_dir + '/result'
> if os.path.exists(result_path):
> try:
> if os.path.isfile(result_path):
> os.remove(result_path)
> else:
> shutil.rmtree(result_path)
> except OSError as e:
> logging.error("Error removing directory: %s - %s.", e.filename, 
> e.strerror)
> logging.info("Results directory: %s", result_path)
> sink_ddl = """
> create table Results(
> word VARCHAR,
> `count` BIGINT
> ) with (
> 'connector' = 'blackhole'
> )
> """
> t_env.sql_update(sink_ddl)
> @udf(input_types=[DataTypes.BIGINT()], result_type=DataTypes.BIGINT())
> def inc(count):
> return count + 1
> t_env.register_function("inc", inc)
> elements = [(word, 1) for word in content.split(" ")]
> t_env.from_elements(elements, ["word", "count"]) \
>  .group_by("word") \
>  .select("word, count(1) as count") \
>  .select("word, inc(count) as count") \
>  .insert_into("Results")
> t_env.execute("word_count")
> if __name__ == '__main__':
> logging.basicConfig(stream=sys.stdout, level=logging.INFO, 
> format="%(message)s")
> word_count()
> {code}
> It will throw the following exception if rocksdb state backend is used:
> {code}
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed 
> state backend for KeyedProcessOperator_c27dcf7b54ef6bfd6cff02ca8870b681_(1/1) 
> from any of the 1 provided restore options.
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:317)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:144)
>   ... 9 more
> Caused by: java.io.IOException: Failed to acquire shared cache resource for 
> RocksDB
>   at 
> org.apache.flink.contrib.streaming.state.RocksDBOperationUtils.allocateSharedCachesIfConfigured(RocksDBOperationUtils.java:212)
>   at 
> org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createKeyedStateBackend(RocksDBStateBackend.java:516)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:301)
>   at 
> 

[jira] [Commented] (FLINK-17923) It will throw MemoryAllocationException if rocksdb statebackend and Python UDF are used in the same slot

2020-05-28 Thread sunjincheng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118341#comment-17118341
 ] 

sunjincheng commented on FLINK-17923:
-

Thanks for the PR, Dian Fu. Overall, it looks good. If others also agree to 
this solution for 1.11, we can merge it as soon as possible and also 
cherry-pick it to the 1.10 branch. What do you think, [~xintongsong] [~zhuzh] 
[~pnowojski] [~zjwang]?

> It will throw MemoryAllocationException if rocksdb statebackend and Python 
> UDF are used in the same slot  
> --
>
> Key: FLINK-17923
> URL: https://issues.apache.org/jira/browse/FLINK-17923
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python, Runtime / State Backends
>Affects Versions: 1.10.0, 1.11.0
>Reporter: Dian Fu
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>
> For the following job:
> {code}
> import logging
> import os
> import shutil
> import sys
> import tempfile
> from pyflink.datastream import StreamExecutionEnvironment
> from pyflink.table import TableConfig, StreamTableEnvironment, DataTypes
> from pyflink.table.udf import udf
> def word_count():
> content = "line Licensed to the Apache Software Foundation ASF under one 
> " \
>   "line or more contributor license agreements See the NOTICE 
> file " \
>   "line distributed with this work for additional information " \
>   "line regarding copyright ownership The ASF licenses this file 
> " \
>   "to you under the Apache License Version the " \
>   "License you may not use this file except in compliance " \
>   "with the License"
> t_config = TableConfig()
> env = StreamExecutionEnvironment.get_execution_environment()
> t_env = StreamTableEnvironment.create(env, t_config)
> # register Results table in table environment
> tmp_dir = tempfile.gettempdir()
> result_path = tmp_dir + '/result'
> if os.path.exists(result_path):
> try:
> if os.path.isfile(result_path):
> os.remove(result_path)
> else:
> shutil.rmtree(result_path)
> except OSError as e:
> logging.error("Error removing directory: %s - %s.", e.filename, 
> e.strerror)
> logging.info("Results directory: %s", result_path)
> sink_ddl = """
> create table Results(
> word VARCHAR,
> `count` BIGINT
> ) with (
> 'connector' = 'blackhole'
> )
> """
> t_env.sql_update(sink_ddl)
> @udf(input_types=[DataTypes.BIGINT()], result_type=DataTypes.BIGINT())
> def inc(count):
> return count + 1
> t_env.register_function("inc", inc)
> elements = [(word, 1) for word in content.split(" ")]
> t_env.from_elements(elements, ["word", "count"]) \
>  .group_by("word") \
>  .select("word, count(1) as count") \
>  .select("word, inc(count) as count") \
>  .insert_into("Results")
> t_env.execute("word_count")
> if __name__ == '__main__':
> logging.basicConfig(stream=sys.stdout, level=logging.INFO, 
> format="%(message)s")
> word_count()
> {code}
> It will throw the following exception if rocksdb state backend is used:
> {code}
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed 
> state backend for KeyedProcessOperator_c27dcf7b54ef6bfd6cff02ca8870b681_(1/1) 
> from any of the 1 provided restore options.
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:317)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:144)
>   ... 9 more
> Caused by: java.io.IOException: Failed to acquire shared cache resource for 
> RocksDB
>   at 
> org.apache.flink.contrib.streaming.state.RocksDBOperationUtils.allocateSharedCachesIfConfigured(RocksDBOperationUtils.java:212)
>   at 
> org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createKeyedStateBackend(RocksDBStateBackend.java:516)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:301)
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:142)
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:121)
>   ... 11 more
> 

[jira] [Commented] (FLINK-17923) It will throw MemoryAllocationException if rocksdb statebackend and Python UDF are used in the same slot

2020-05-27 Thread sunjincheng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117629#comment-17117629
 ] 

sunjincheng commented on FLINK-17923:
-

Our consensus is that, in the final solution, both Python and RocksDB should be 
managed by resource management (using managed memory). I think both #4 and #5 
work for PyFlink; I prefer #5 because it's more flexible, if we cannot have #3 
in the 1.11.0 release.

> It will throw MemoryAllocationException if rocksdb statebackend and Python 
> UDF are used in the same slot  
> --
>
> Key: FLINK-17923
> URL: https://issues.apache.org/jira/browse/FLINK-17923
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python, Runtime / State Backends
>Affects Versions: 1.10.0, 1.11.0
>Reporter: Dian Fu
>Priority: Blocker
> Fix For: 1.11.0
>
>
> For the following job:
> {code}
> import logging
> import os
> import shutil
> import sys
> import tempfile
> from pyflink.datastream import StreamExecutionEnvironment
> from pyflink.table import TableConfig, StreamTableEnvironment, DataTypes
> from pyflink.table.udf import udf
> def word_count():
> content = "line Licensed to the Apache Software Foundation ASF under one 
> " \
>   "line or more contributor license agreements See the NOTICE 
> file " \
>   "line distributed with this work for additional information " \
>   "line regarding copyright ownership The ASF licenses this file 
> " \
>   "to you under the Apache License Version the " \
>   "License you may not use this file except in compliance " \
>   "with the License"
> t_config = TableConfig()
> env = StreamExecutionEnvironment.get_execution_environment()
> t_env = StreamTableEnvironment.create(env, t_config)
> # register Results table in table environment
> tmp_dir = tempfile.gettempdir()
> result_path = tmp_dir + '/result'
> if os.path.exists(result_path):
> try:
> if os.path.isfile(result_path):
> os.remove(result_path)
> else:
> shutil.rmtree(result_path)
> except OSError as e:
> logging.error("Error removing directory: %s - %s.", e.filename, 
> e.strerror)
> logging.info("Results directory: %s", result_path)
> sink_ddl = """
> create table Results(
> word VARCHAR,
> `count` BIGINT
> ) with (
> 'connector' = 'blackhole'
> )
> """
> t_env.sql_update(sink_ddl)
> @udf(input_types=[DataTypes.BIGINT()], result_type=DataTypes.BIGINT())
> def inc(count):
> return count + 1
> t_env.register_function("inc", inc)
> elements = [(word, 1) for word in content.split(" ")]
> t_env.from_elements(elements, ["word", "count"]) \
>  .group_by("word") \
>  .select("word, count(1) as count") \
>  .select("word, inc(count) as count") \
>  .insert_into("Results")
> t_env.execute("word_count")
> if __name__ == '__main__':
> logging.basicConfig(stream=sys.stdout, level=logging.INFO, 
> format="%(message)s")
> word_count()
> {code}
> It will throw the following exception if rocksdb state backend is used:
> {code}
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed 
> state backend for KeyedProcessOperator_c27dcf7b54ef6bfd6cff02ca8870b681_(1/1) 
> from any of the 1 provided restore options.
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:317)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:144)
>   ... 9 more
> Caused by: java.io.IOException: Failed to acquire shared cache resource for 
> RocksDB
>   at 
> org.apache.flink.contrib.streaming.state.RocksDBOperationUtils.allocateSharedCachesIfConfigured(RocksDBOperationUtils.java:212)
>   at 
> org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createKeyedStateBackend(RocksDBStateBackend.java:516)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:301)
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:142)
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:121)
>   ... 11 more
> Caused by: 

[jira] [Comment Edited] (FLINK-17923) It will throw MemoryAllocationException if rocksdb statebackend and Python UDF are used in the same slot

2020-05-26 Thread sunjincheng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17116772#comment-17116772
 ] 

sunjincheng edited comment on FLINK-17923 at 5/26/20, 2:20 PM:
---

At present, users can't start jobs at all if they use RocksDB + Python UDF. The 
core scenario of Flink is stream computing, and in stream computing any 
analytical application needs to use AGG. In this case, for Python users the 
demand for Python UDF is a core feature of 1.10/1.11. At present, we have 
Chinese users waiting to use this feature.

We discussed the details of using option 3 today. Later [~zhuzh] will share the 
design document with you. We can discuss the design first and evaluate whether 
putting this fix into 1.11 is reasonable.


was (Author: sunjincheng121):
At present, users can't start jobs at all if they use RocksDB + Python UDF. The 
core scenario of Flink is stream computing, and in stream computing any 
analytical application needs to use AGG. In this case, for Python users the 
demand for Python UDF is a core feature of 1.10/1.11. At present, we have 
Chinese users waiting to use this feature.

We discussed the details of using scheme 3 today. Later [~zhuzh] will share the 
design document with you. We can discuss the design first and evaluate whether 
putting this fix into 1.11 is reasonable.

> It will throw MemoryAllocationException if rocksdb statebackend and Python 
> UDF are used in the same slot  
> --
>
> Key: FLINK-17923
> URL: https://issues.apache.org/jira/browse/FLINK-17923
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python, Runtime / State Backends
>Affects Versions: 1.10.0, 1.11.0
>Reporter: Dian Fu
>Priority: Blocker
> Fix For: 1.11.0
>
>
> For the following job:
> {code}
> import logging
> import os
> import shutil
> import sys
> import tempfile
> from pyflink.datastream import StreamExecutionEnvironment
> from pyflink.table import TableConfig, StreamTableEnvironment, DataTypes
> from pyflink.table.udf import udf
> def word_count():
> content = "line Licensed to the Apache Software Foundation ASF under one 
> " \
>   "line or more contributor license agreements See the NOTICE 
> file " \
>   "line distributed with this work for additional information " \
>   "line regarding copyright ownership The ASF licenses this file 
> " \
>   "to you under the Apache License Version the " \
>   "License you may not use this file except in compliance " \
>   "with the License"
> t_config = TableConfig()
> env = StreamExecutionEnvironment.get_execution_environment()
> t_env = StreamTableEnvironment.create(env, t_config)
> # register Results table in table environment
> tmp_dir = tempfile.gettempdir()
> result_path = tmp_dir + '/result'
> if os.path.exists(result_path):
> try:
> if os.path.isfile(result_path):
> os.remove(result_path)
> else:
> shutil.rmtree(result_path)
> except OSError as e:
> logging.error("Error removing directory: %s - %s.", e.filename, 
> e.strerror)
> logging.info("Results directory: %s", result_path)
> sink_ddl = """
> create table Results(
> word VARCHAR,
> `count` BIGINT
> ) with (
> 'connector' = 'blackhole'
> )
> """
> t_env.sql_update(sink_ddl)
> @udf(input_types=[DataTypes.BIGINT()], result_type=DataTypes.BIGINT())
> def inc(count):
> return count + 1
> t_env.register_function("inc", inc)
> elements = [(word, 1) for word in content.split(" ")]
> t_env.from_elements(elements, ["word", "count"]) \
>  .group_by("word") \
>  .select("word, count(1) as count") \
>  .select("word, inc(count) as count") \
>  .insert_into("Results")
> t_env.execute("word_count")
> if __name__ == '__main__':
> logging.basicConfig(stream=sys.stdout, level=logging.INFO, 
> format="%(message)s")
> word_count()
> {code}
> It will throw the following exception if rocksdb state backend is used:
> {code}
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed 
> state backend for KeyedProcessOperator_c27dcf7b54ef6bfd6cff02ca8870b681_(1/1) 
> from any of the 1 provided restore options.
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:317)
>   at 
> 

[jira] [Commented] (FLINK-17923) It will throw MemoryAllocationException if rocksdb statebackend and Python UDF are used in the same slot

2020-05-26 Thread sunjincheng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17116772#comment-17116772
 ] 

sunjincheng commented on FLINK-17923:
-

At present, users can't start jobs at all if they use RocksDB + Python UDF. The 
core scenario of Flink is stream computing, and in stream computing any 
analytical application needs to use AGG. In this case, for Python users the 
demand for Python UDF is a core feature of 1.10/1.11. At present, we have 
Chinese users waiting to use this feature.

We discussed the details of using scheme 3 today. Later [~zhuzh] will share the 
design document with you. We can discuss the design first and evaluate whether 
putting this fix into 1.11 is reasonable.

> It will throw MemoryAllocationException if rocksdb statebackend and Python 
> UDF are used in the same slot  
> --
>
> Key: FLINK-17923
> URL: https://issues.apache.org/jira/browse/FLINK-17923
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python, Runtime / State Backends
>Affects Versions: 1.10.0, 1.11.0
>Reporter: Dian Fu
>Priority: Blocker
> Fix For: 1.11.0
>
>
> For the following job:
> {code}
> import logging
> import os
> import shutil
> import sys
> import tempfile
> from pyflink.datastream import StreamExecutionEnvironment
> from pyflink.table import TableConfig, StreamTableEnvironment, DataTypes
> from pyflink.table.udf import udf
> def word_count():
> content = "line Licensed to the Apache Software Foundation ASF under one 
> " \
>   "line or more contributor license agreements See the NOTICE 
> file " \
>   "line distributed with this work for additional information " \
>   "line regarding copyright ownership The ASF licenses this file 
> " \
>   "to you under the Apache License Version the " \
>   "License you may not use this file except in compliance " \
>   "with the License"
> t_config = TableConfig()
> env = StreamExecutionEnvironment.get_execution_environment()
> t_env = StreamTableEnvironment.create(env, t_config)
> # register Results table in table environment
> tmp_dir = tempfile.gettempdir()
> result_path = tmp_dir + '/result'
> if os.path.exists(result_path):
> try:
> if os.path.isfile(result_path):
> os.remove(result_path)
> else:
> shutil.rmtree(result_path)
> except OSError as e:
> logging.error("Error removing directory: %s - %s.", e.filename, 
> e.strerror)
> logging.info("Results directory: %s", result_path)
> sink_ddl = """
> create table Results(
> word VARCHAR,
> `count` BIGINT
> ) with (
> 'connector' = 'blackhole'
> )
> """
> t_env.sql_update(sink_ddl)
> @udf(input_types=[DataTypes.BIGINT()], result_type=DataTypes.BIGINT())
> def inc(count):
> return count + 1
> t_env.register_function("inc", inc)
> elements = [(word, 1) for word in content.split(" ")]
> t_env.from_elements(elements, ["word", "count"]) \
>  .group_by("word") \
>  .select("word, count(1) as count") \
>  .select("word, inc(count) as count") \
>  .insert_into("Results")
> t_env.execute("word_count")
> if __name__ == '__main__':
> logging.basicConfig(stream=sys.stdout, level=logging.INFO, 
> format="%(message)s")
> word_count()
> {code}
> It will throw the following exception if rocksdb state backend is used:
> {code}
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed 
> state backend for KeyedProcessOperator_c27dcf7b54ef6bfd6cff02ca8870b681_(1/1) 
> from any of the 1 provided restore options.
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:317)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:144)
>   ... 9 more
> Caused by: java.io.IOException: Failed to acquire shared cache resource for 
> RocksDB
>   at 
> org.apache.flink.contrib.streaming.state.RocksDBOperationUtils.allocateSharedCachesIfConfigured(RocksDBOperationUtils.java:212)
>   at 
> org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createKeyedStateBackend(RocksDBStateBackend.java:516)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:301)

[jira] [Commented] (FLINK-17923) It will throw MemoryAllocationException if rocksdb statebackend and Python UDF are used in the same slot

2020-05-26 Thread sunjincheng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17116614#comment-17116614
 ] 

sunjincheng commented on FLINK-17923:
-

It's a pity that we did not find this issue earlier (we also need to improve 
the e2e tests for PyFlink after fixing this issue). This is a very critical 
problem for PyFlink, as it means that Python UDFs cannot be used in most 
streaming jobs (those with state). So I think we should address this problem in 
1.11.

We ([~zhuzh] [~xintongsong] [~yunta] [~dian.fu] and I) had a further discussion 
about this problem and will update the status later.

I would appreciate it if you could pay attention to this, [~pnowojski] and 
[~zjwang].

> It will throw MemoryAllocationException if rocksdb statebackend and Python 
> UDF are used in the same slot  
> --
>
> Key: FLINK-17923
> URL: https://issues.apache.org/jira/browse/FLINK-17923
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python, Runtime / State Backends
>Affects Versions: 1.10.0, 1.11.0
>Reporter: Dian Fu
>Priority: Blocker
> Fix For: 1.11.0
>
>
> For the following job:
> {code}
> import logging
> import os
> import shutil
> import sys
> import tempfile
> from pyflink.datastream import StreamExecutionEnvironment
> from pyflink.table import TableConfig, StreamTableEnvironment, DataTypes
> from pyflink.table.udf import udf
> def word_count():
> content = "line Licensed to the Apache Software Foundation ASF under one 
> " \
>   "line or more contributor license agreements See the NOTICE 
> file " \
>   "line distributed with this work for additional information " \
>   "line regarding copyright ownership The ASF licenses this file 
> " \
>   "to you under the Apache License Version the " \
>   "License you may not use this file except in compliance " \
>   "with the License"
> t_config = TableConfig()
> env = StreamExecutionEnvironment.get_execution_environment()
> t_env = StreamTableEnvironment.create(env, t_config)
> # register Results table in table environment
> tmp_dir = tempfile.gettempdir()
> result_path = tmp_dir + '/result'
> if os.path.exists(result_path):
> try:
> if os.path.isfile(result_path):
> os.remove(result_path)
> else:
> shutil.rmtree(result_path)
> except OSError as e:
> logging.error("Error removing directory: %s - %s.", e.filename, 
> e.strerror)
> logging.info("Results directory: %s", result_path)
> sink_ddl = """
> create table Results(
> word VARCHAR,
> `count` BIGINT
> ) with (
> 'connector' = 'blackhole'
> )
> """
> t_env.sql_update(sink_ddl)
> @udf(input_types=[DataTypes.BIGINT()], result_type=DataTypes.BIGINT())
> def inc(count):
> return count + 1
> t_env.register_function("inc", inc)
> elements = [(word, 1) for word in content.split(" ")]
> t_env.from_elements(elements, ["word", "count"]) \
>  .group_by("word") \
>  .select("word, count(1) as count") \
>  .select("word, inc(count) as count") \
>  .insert_into("Results")
> t_env.execute("word_count")
> if __name__ == '__main__':
> logging.basicConfig(stream=sys.stdout, level=logging.INFO, 
> format="%(message)s")
> word_count()
> {code}
> It will throw the following exception if rocksdb state backend is used:
> {code}
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed 
> state backend for KeyedProcessOperator_c27dcf7b54ef6bfd6cff02ca8870b681_(1/1) 
> from any of the 1 provided restore options.
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:317)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:144)
>   ... 9 more
> Caused by: java.io.IOException: Failed to acquire shared cache resource for 
> RocksDB
>   at 
> org.apache.flink.contrib.streaming.state.RocksDBOperationUtils.allocateSharedCachesIfConfigured(RocksDBOperationUtils.java:212)
>   at 
> org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createKeyedStateBackend(RocksDBStateBackend.java:516)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:301)
>   at 
> 

[jira] [Comment Edited] (FLINK-17923) It will throw MemoryAllocationException if rocksdb statebackend and Python UDF are used in the same slot

2020-05-26 Thread sunjincheng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17116614#comment-17116614
 ] 

sunjincheng edited comment on FLINK-17923 at 5/26/20, 10:40 AM:


It's a pity that we did not find this issue earlier (we also need to improve 
the e2e tests for PyFlink after fixing this issue). This is a very critical 
problem for PyFlink, as it means that Python UDFs cannot be used in most 
streaming jobs (those with state). So I think we should address this problem in 
1.11.

We ([~zhuzh] [~xintongsong] [~yunta] [~dian.fu] and I) had a further discussion 
about this problem and will update the status later.

I would appreciate it if you could pay attention to this, [~pnowojski] and 
[~zjwang].


was (Author: sunjincheng121):
It's a pity that we did not find this issue earlier (we also need to improve 
the e2e tests for PyFlink after fixing this issue). This is a very critical 
problem for PyFlink, as it means that Python UDFs cannot be used in most 
streaming jobs (those with state). So I think we should address this problem in 
1.11.

We ([~zhuzh] [~xintongsong] [~yunta] [~dian.fu] and I) had a further discussion 
about this problem and will update the status later.

I would appreciate it if you could pay attention to this, [~pnowojski] and 
[~zjwang].

> It will throw MemoryAllocationException if rocksdb statebackend and Python 
> UDF are used in the same slot  
> --
>
> Key: FLINK-17923
> URL: https://issues.apache.org/jira/browse/FLINK-17923
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python, Runtime / State Backends
>Affects Versions: 1.10.0, 1.11.0
>Reporter: Dian Fu
>Priority: Blocker
> Fix For: 1.11.0
>
>
> For the following job:
> {code}
> import logging
> import os
> import shutil
> import sys
> import tempfile
> from pyflink.datastream import StreamExecutionEnvironment
> from pyflink.table import TableConfig, StreamTableEnvironment, DataTypes
> from pyflink.table.udf import udf
> def word_count():
> content = "line Licensed to the Apache Software Foundation ASF under one 
> " \
>   "line or more contributor license agreements See the NOTICE 
> file " \
>   "line distributed with this work for additional information " \
>   "line regarding copyright ownership The ASF licenses this file 
> " \
>   "to you under the Apache License Version the " \
>   "License you may not use this file except in compliance " \
>   "with the License"
> t_config = TableConfig()
> env = StreamExecutionEnvironment.get_execution_environment()
> t_env = StreamTableEnvironment.create(env, t_config)
> # register Results table in table environment
> tmp_dir = tempfile.gettempdir()
> result_path = tmp_dir + '/result'
> if os.path.exists(result_path):
> try:
> if os.path.isfile(result_path):
> os.remove(result_path)
> else:
> shutil.rmtree(result_path)
> except OSError as e:
> logging.error("Error removing directory: %s - %s.", e.filename, 
> e.strerror)
> logging.info("Results directory: %s", result_path)
> sink_ddl = """
> create table Results(
> word VARCHAR,
> `count` BIGINT
> ) with (
> 'connector' = 'blackhole'
> )
> """
> t_env.sql_update(sink_ddl)
> @udf(input_types=[DataTypes.BIGINT()], result_type=DataTypes.BIGINT())
> def inc(count):
> return count + 1
> t_env.register_function("inc", inc)
> elements = [(word, 1) for word in content.split(" ")]
> t_env.from_elements(elements, ["word", "count"]) \
>  .group_by("word") \
>  .select("word, count(1) as count") \
>  .select("word, inc(count) as count") \
>  .insert_into("Results")
> t_env.execute("word_count")
> if __name__ == '__main__':
> logging.basicConfig(stream=sys.stdout, level=logging.INFO, 
> format="%(message)s")
> word_count()
> {code}
> It will throw the following exception if rocksdb state backend is used:
> {code}
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed 
> state backend for KeyedProcessOperator_c27dcf7b54ef6bfd6cff02ca8870b681_(1/1) 
> from any of the 1 provided restore options.
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:317)
>   at 
> 

[jira] [Updated] (FLINK-17856) Adds the feature of monitoring directory changes for FileSystem table source connector

2020-05-20 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng updated FLINK-17856:

Description: 
Great thanks to Violeta for reporting this issue.
 !image-2020-05-21-11-44-55-187.png|width=870,height=172!

I think it makes sense to add this feature to the filesystem table source 
connector. What do you think?

 

  was:
Great thanks to Violeta for reporting this issue.
!image-2020-05-21-11-44-55-187.png|width=870,height=172!

I think it makes sense to add this feature to the filesystem table source 
connector. What do you think?

 


> Adds the feature of monitoring directory changes for FileSystem table source 
> connector 
> ---
>
> Key: FLINK-17856
> URL: https://issues.apache.org/jira/browse/FLINK-17856
> Project: Flink
>  Issue Type: New Feature
>  Components: Connectors / FileSystem, Table SQL / API
>Reporter: sunjincheng
>Priority: Major
> Attachments: image-2020-05-21-11-42-46-947.png, 
> image-2020-05-21-11-44-55-187.png
>
>
> Great thanks to Violeta for reporting this issue.
>  !image-2020-05-21-11-44-55-187.png|width=870,height=172!
> I think it makes sense to add this feature to the filesystem table source 
> connector. What do you think?
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17856) Adds the feature of monitoring directory changes for FileSystem table source connector

2020-05-20 Thread sunjincheng (Jira)
sunjincheng created FLINK-17856:
---

 Summary: Adds the feature of monitoring directory changes for 
FileSystem table source connector 
 Key: FLINK-17856
 URL: https://issues.apache.org/jira/browse/FLINK-17856
 Project: Flink
  Issue Type: New Feature
  Components: Connectors / FileSystem, Table SQL / API
Reporter: sunjincheng
 Attachments: image-2020-05-21-11-42-46-947.png, 
image-2020-05-21-11-44-55-187.png

Great thanks to Violeta for reporting this issue.
!image-2020-05-21-11-44-55-187.png|width=870,height=172!

I think it makes sense to add this feature to the filesystem table source 
connector. What do you think?

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-17454) test_configuration.py ConfigurationTests::test_add_all failed on travis

2020-05-13 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng updated FLINK-17454:

Release Note:   (was: [FLINK-17454][python] Specify a port number for 
gateway callback server from python gateway)

> test_configuration.py ConfigurationTests::test_add_all failed on travis
> ---
>
> Key: FLINK-17454
> URL: https://issues.apache.org/jira/browse/FLINK-17454
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Reporter: Piotr Nowojski
>Assignee: shuiqiangchen
>Priority: Major
>  Labels: pull-request-available, test-stability
> Fix For: 1.11.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=383=logs=9cada3cb-c1d3-5621-16da-0f718fb86602=455fddbf-5921-5b71-25ac-92992ad80b28
> {code:java}
> === short test summary info 
> 
> FAILED 
> pyflink/common/tests/test_configuration.py::ConfigurationTests::test_add_all
> == 1 failed, 499 passed, 19 skipped, 97 warnings in 182.59s (0:03:02) 
> ==
> ERROR: InvocationError for command 
> /__w/1/s/flink-python/.tox/py37-cython/bin/pytest --durations=0 (exited with 
> code 1)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (FLINK-17454) test_configuration.py ConfigurationTests::test_add_all failed on travis

2020-05-11 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng closed FLINK-17454.
---
Release Note: [FLINK-17454][python] Specify a port number for gateway 
callback server from python gateway
  Resolution: Fixed

Merged into master: 5f744d3f81bcfb8f77164a5ec9caa4594851d4bf

> test_configuration.py ConfigurationTests::test_add_all failed on travis
> ---
>
> Key: FLINK-17454
> URL: https://issues.apache.org/jira/browse/FLINK-17454
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Reporter: Piotr Nowojski
>Assignee: shuiqiangchen
>Priority: Major
>  Labels: pull-request-available, test-stability
> Fix For: 1.11.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=383=logs=9cada3cb-c1d3-5621-16da-0f718fb86602=455fddbf-5921-5b71-25ac-92992ad80b28
> {code:java}
> === short test summary info 
> 
> FAILED 
> pyflink/common/tests/test_configuration.py::ConfigurationTests::test_add_all
> == 1 failed, 499 passed, 19 skipped, 97 warnings in 182.59s (0:03:02) 
> ==
> ERROR: InvocationError for command 
> /__w/1/s/flink-python/.tox/py37-cython/bin/pytest --durations=0 (exited with 
> code 1)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-17146) Support conversion between PyFlink Table and Pandas DataFrame

2020-05-10 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng updated FLINK-17146:

Priority: Blocker  (was: Major)

> Support conversion between PyFlink Table and Pandas DataFrame
> -
>
> Key: FLINK-17146
> URL: https://issues.apache.org/jira/browse/FLINK-17146
> Project: Flink
>  Issue Type: New Feature
>  Components: API / Python
>Reporter: Dian Fu
>Assignee: Dian Fu
>Priority: Blocker
> Fix For: 1.11.0
>
>
> Pandas dataframe is the de-facto standard to work with tabular data in Python 
> community. PyFlink table is Flink’s representation of the tabular data in 
> Python language. It would be nice to provide the ability to convert between 
> the PyFlink table and Pandas dataframe in PyFlink Table API which has the 
> following benefits:
>  * It provides users the ability to switch between PyFlink and Pandas 
> seamlessly when processing data in Python language. Users could process data 
> using one execution engine and switch to another seamlessly. For example, it 
> may happen that users have already got a Pandas dataframe at hand and want to 
> perform some expensive transformation of it. Then they could convert it to a 
> PyFlink table and leverage the power of Flink engine. Users could also 
> convert a PyFlink table to Pandas dataframe and perform transformation of it 
> with the rich functionalities provided by the Pandas ecosystem.
>  * No intermediate connectors are needed when converting between them.
> More details could be found in 
> [FLIP-120|https://cwiki.apache.org/confluence/display/FLINK/FLIP-120%3A+Support+conversion+between+PyFlink+Table+and+Pandas+DataFrame].
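For reference, the FLIP-120 conversion looks roughly like this (a minimal 
sketch; the DataFrame contents are arbitrary):

{code:python}
import numpy as np
import pandas as pd

from pyflink.datastream import StreamExecutionEnvironment
from pyflink.table import StreamTableEnvironment

env = StreamExecutionEnvironment.get_execution_environment()
t_env = StreamTableEnvironment.create(env)

# Pandas DataFrame -> PyFlink Table (no intermediate connector needed)
pdf = pd.DataFrame(np.random.rand(10, 2), columns=["a", "b"])
table = t_env.from_pandas(pdf)

# PyFlink Table -> Pandas DataFrame
result_pdf = table.filter("a > 0.5").to_pandas()
{code}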



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (FLINK-17255) Add HBase connector descriptor support in PyFlink

2020-04-30 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng reassigned FLINK-17255:
---

Assignee: shuiqiangchen

> Add HBase connector descriptor support in PyFlink
> -
>
> Key: FLINK-17255
> URL: https://issues.apache.org/jira/browse/FLINK-17255
> Project: Flink
>  Issue Type: New Feature
>  Components: API / Python
>Reporter: sunjincheng
>Assignee: shuiqiangchen
>Priority: Major
>  Labels: beginners, pull-request-available
> Fix For: 1.11.0
>
>
> The HBase connector descriptor has been supported on the Java side since 
> 1.10.0. We should also support it in PyFlink.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-17256) Support keyword arguments in the PyFlink Descriptor API

2020-04-25 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng updated FLINK-17256:

Labels: beginners  (was: )

> Support keyword arguments in the PyFlink Descriptor API
> 
>
> Key: FLINK-17256
> URL: https://issues.apache.org/jira/browse/FLINK-17256
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python
>Reporter: sunjincheng
>Priority: Major
>  Labels: beginners
> Fix For: 1.11.0
>
>
> Keyword arguments are a very commonly used feature in Python. We should 
> support them in the PyFlink Descriptor API to make the API more user-friendly 
> for Python users.
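To illustrate, a sketch of what the improvement could look like for the Kafka 
descriptor (the keyword-argument constructor is the hypothetical proposal; only 
the chained-setter form exists today):

{code:python}
from pyflink.table.descriptors import Kafka

# Today: chained setters
kafka = Kafka() \
    .version("universal") \
    .topic("user_behavior") \
    .start_from_earliest()

# Proposed (hypothetical): keyword arguments in the constructor
# kafka = Kafka(version="universal", topic="user_behavior",
#               start_from="earliest")
{code}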



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-17255) Add HBase connector descriptor support in PyFlink

2020-04-25 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng updated FLINK-17255:

Labels: beginners  (was: )

> Add HBase connector descriptor support in PyFlink
> -
>
> Key: FLINK-17255
> URL: https://issues.apache.org/jira/browse/FLINK-17255
> Project: Flink
>  Issue Type: New Feature
>  Components: API / Python
>Reporter: sunjincheng
>Priority: Major
>  Labels: beginners
> Fix For: 1.11.0
>
>
> The HBase connector descriptor has been supported on the Java side since 
> 1.10.0. We should also support it in PyFlink.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-17254) Improve the PyFlink documentation and examples to use SQL DDL for source/sink definition

2020-04-25 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng updated FLINK-17254:

Labels: beginners  (was: )

> Improve the PyFlink documentation and examples to use SQL DDL for source/sink 
> definition
> 
>
> Key: FLINK-17254
> URL: https://issues.apache.org/jira/browse/FLINK-17254
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python
>Reporter: sunjincheng
>Priority: Major
>  Labels: beginners
> Fix For: 1.11.0
>
>
> Currently there are two ways to register a table source/sink in the PyFlink 
> Table API:
> 1) TableEnvironment.connect
> 2) TableEnvironment.sql_update
> I think it's better to provide documentation and examples on how to use 2) in 
> PyFlink, as in the sketch below.
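A minimal sketch of approach 2), defining a source with SQL DDL directly from 
Python (the connector properties follow the legacy descriptor style of this 
release line, and the path is a placeholder):

{code:python}
# t_env is an existing TableEnvironment
source_ddl = """
    CREATE TABLE my_source (
        word STRING
    ) WITH (
        'connector.type' = 'filesystem',
        'connector.path' = '/tmp/input',
        'format.type' = 'csv'
    )
"""
t_env.sql_update(source_ddl)
{code}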



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-17308) ExecutionGraphCache cachedExecutionGraphs not cleanup cause OOM Bug

2020-04-23 Thread sunjincheng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091144#comment-17091144
 ] 

sunjincheng commented on FLINK-17308:
-

Reset the fix version as 1.9.3 was released.

> ExecutionGraphCache cachedExecutionGraphs not cleanup cause OOM Bug
> ---
>
> Key: FLINK-17308
> URL: https://issues.apache.org/jira/browse/FLINK-17308
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / REST
>Affects Versions: 1.9.0, 1.9.1, 1.9.2, 1.10.0
>Reporter: yujunyong
>Assignee: Till Rohrmann
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.10.1, 1.11.0, 1.9.4
>
>
> The class org.apache.flink.runtime.rest.handler.legacy.ExecutionGraphCache 
> caches job execution graphs in the field "cachedExecutionGraphs" when the 
> method "getExecutionGraph" is called, but Flink never calls its cleanup 
> method. This causes the JobManager to run out of memory when a lot of batch 
> jobs are submitted and their info is fetched, because these operations cache 
> all the job execution graphs and the "cleanup" method is never called.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-17308) ExecutionGraphCache cachedExecutionGraphs not cleanup cause OOM Bug

2020-04-23 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng updated FLINK-17308:

Fix Version/s: (was: 1.9.3)
   1.9.4

> ExecutionGraphCache cachedExecutionGraphs not cleanup cause OOM Bug
> ---
>
> Key: FLINK-17308
> URL: https://issues.apache.org/jira/browse/FLINK-17308
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / REST
>Affects Versions: 1.9.0, 1.9.1, 1.9.2, 1.10.0
>Reporter: yujunyong
>Assignee: Till Rohrmann
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.10.1, 1.11.0, 1.9.4
>
>
> The class org.apache.flink.runtime.rest.handler.legacy.ExecutionGraphCache 
> caches job execution graphs in the field 
> "cachedExecutionGraphs" whenever the method 
> "getExecutionGraph" is called, but its cleanup method is never invoked 
> anywhere in Flink. This causes a JobManager OutOfMemoryError when many batch 
> jobs are submitted and their info is fetched, because these operations cache 
> all of the job execution graphs and the "cleanup" method is never called.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-12717) Add Windows support for PyFlink

2020-04-20 Thread sunjincheng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-12717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17087401#comment-17087401
 ] 

sunjincheng commented on FLINK-12717:
-

Hi [~zhongwei], I have updated the description of this issue to cover full 
Windows support for PyFlink. :)

> Add Windows support for PyFlink
> ---
>
> Key: FLINK-12717
> URL: https://issues.apache.org/jira/browse/FLINK-12717
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Python
>Reporter: Dian Fu
>Assignee: Wei Zhong
>Priority: Major
>
> The aim of this JIRA is to allow Python users to develop PyFlink programs on 
> Windows. Users should be able to run a simple PyFlink program in the IDE (via 
> minicluster) for debugging purposes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-12717) Add Windows support for PyFlink

2020-04-20 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-12717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng updated FLINK-12717:

Summary: Add Windows support for PyFlink  (was: Add Windows support for the 
Python shell script)

> Add Windows support for PyFlink
> ---
>
> Key: FLINK-12717
> URL: https://issues.apache.org/jira/browse/FLINK-12717
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Python
>Reporter: Dian Fu
>Assignee: Wei Zhong
>Priority: Major
>
> The aim of this JIRA is to allow Python users to develop PyFlink programs on 
> Windows. Users should be able to run a simple PyFlink program in the IDE (via 
> minicluster) for debugging purposes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-12717) Add Windows support for the Python shell script

2020-04-20 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-12717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng updated FLINK-12717:

Description: The aim of this JIRA is to allow Python users to develop 
PyFlink programs on Windows. Users should be able to run a simple PyFlink 
program in the IDE (via minicluster) for debugging purposes.  (was: We should 
add a Windows shell script for pyflink-gateway-server.sh.)
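
For reference, "a simple PyFlink program" in this sense is something like the 
following sketch, which runs on the implicitly started local minicluster and 
can therefore be launched straight from an IDE (the connector properties and 
paths are illustrative only):

{code:python}
from pyflink.table import BatchTableEnvironment, EnvironmentSettings

# The local minicluster is started behind the scenes, so no cluster setup is
# needed -- this is the workflow that should also work on Windows.
settings = EnvironmentSettings.new_instance() \
    .in_batch_mode().use_blink_planner().build()
t_env = BatchTableEnvironment.create(environment_settings=settings)

t_env.sql_update("""
    CREATE TABLE sink (word VARCHAR, cnt BIGINT) WITH (
        'connector.type' = 'filesystem',
        'format.type' = 'csv',
        'connector.path' = '/tmp/output'
    )
""")

t = t_env.from_elements([('hello',), ('world',), ('hello',)], ['word'])
t.group_by('word').select('word, word.count').insert_into('sink')
t_env.execute('ide_debug_job')
{code}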

> Add Windows support for the Python shell script
> ---
>
> Key: FLINK-12717
> URL: https://issues.apache.org/jira/browse/FLINK-12717
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Python
>Reporter: Dian Fu
>Assignee: Wei Zhong
>Priority: Major
>
> The aim of this JIRA is to allow Python users to develop PyFlink programs on 
> Windows. Users should be able to run a simple PyFlink program in the IDE (via 
> minicluster) for debugging purposes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17256) Support keyword arguments in the PyFlink Descriptor API

2020-04-20 Thread sunjincheng (Jira)
sunjincheng created FLINK-17256:
---

 Summary: Support keyword arguments in the PyFlink Descriptor API
 Key: FLINK-17256
 URL: https://issues.apache.org/jira/browse/FLINK-17256
 Project: Flink
  Issue Type: Improvement
  Components: API / Python
Reporter: sunjincheng
 Fix For: 1.11.0


Keyword arguments are a very commonly used feature in Python. We should support 
them in the PyFlink Descriptor API to make the API more user-friendly for 
Python users.
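
For illustration, a minimal sketch of what this could look like, using the 
existing Kafka descriptor (the chained-setter form below is today's API; the 
keyword-argument constructor is the proposed addition and does not exist yet):

{code:python}
from pyflink.table.descriptors import Kafka

# Today: options are configured through chained setter calls.
kafka = Kafka() \
    .version("0.11") \
    .topic("user") \
    .property("bootstrap.servers", "localhost:9092")

# Proposed (hypothetical): the same descriptor built with keyword arguments.
kafka = Kafka(
    version="0.11",
    topic="user",
    properties={"bootstrap.servers": "localhost:9092"})
{code}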



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17255) Add HBase connector descriptor support in PyFlink

2020-04-20 Thread sunjincheng (Jira)
sunjincheng created FLINK-17255:
---

 Summary: Add HBase connector descriptor support in PyFlink
 Key: FLINK-17255
 URL: https://issues.apache.org/jira/browse/FLINK-17255
 Project: Flink
  Issue Type: New Feature
  Components: API / Python
Reporter: sunjincheng
 Fix For: 1.11.0


The HBase connector descriptor has been supported on the Java side since 
1.10.0. We should also support it in PyFlink.
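
A hedged sketch of what the Python wrapper might look like, mirroring the Java 
org.apache.flink.table.descriptors.HBase descriptor from 1.10.0 (the Python 
class and the snake_case method names below are the proposed additions, and 
t_env and schema are assumed to exist):

{code:python}
# Hypothetical PyFlink mirror of the Java HBase descriptor.
hbase = HBase() \
    .version("1.4.3") \
    .table_name("my_hbase_table") \
    .zookeeper_quorum("localhost:2181")

t_env.connect(hbase) \
    .with_schema(schema) \
    .in_append_mode() \
    .register_table_sink("MyHBaseSink")
{code}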



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17254) Improve the PyFlink documentation and examples to use SQL DDL for source/sink definition

2020-04-20 Thread sunjincheng (Jira)
sunjincheng created FLINK-17254:
---

 Summary: Improve the PyFlink documentation and examples to use SQL 
DDL for source/sink definition
 Key: FLINK-17254
 URL: https://issues.apache.org/jira/browse/FLINK-17254
 Project: Flink
  Issue Type: Improvement
  Components: API / Python
Reporter: sunjincheng
 Fix For: 1.11.0


Currently there are two ways to register a table source/sink in the PyFlink 
Table API:
1) TableEnvironment.connect
2) TableEnvironment.sql_update
I think it's better to provide documentation and examples on how to use 2) in 
PyFlink, as sketched below.
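
A minimal sketch of option 2), assuming the legacy filesystem/CSV connector 
properties (adapt them to whichever connector the documentation covers):

{code:python}
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.table import StreamTableEnvironment

env = StreamExecutionEnvironment.get_execution_environment()
t_env = StreamTableEnvironment.create(env)

# Define the source with SQL DDL instead of TableEnvironment.connect.
t_env.sql_update("""
    CREATE TABLE my_source (
        word VARCHAR
    ) WITH (
        'connector.type' = 'filesystem',
        'format.type' = 'csv',
        'connector.path' = '/tmp/input'
    )
""")

result = t_env.sql_query(
    "SELECT word, COUNT(1) AS cnt FROM my_source GROUP BY word")
{code}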



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-14071) [for Documents Improvement] Add explicit path where jar files should be put for PyFlink development

2020-04-19 Thread sunjincheng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-14071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17087346#comment-17087346
 ] 

sunjincheng commented on FLINK-14071:
-

Thanks for mentioning this issue, [~coldmoon777]. This issue is very 
important for users; we would like to add a usage note to the PyFlink docs. :) 
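
For instance, the usage note could include a small snippet like this to help 
beginners locate the directory (a sketch, assuming a standard pip installation 
of apache-flink):

{code:python}
import os
import pyflink

# The lib folder that PyFlink scans for connector/format jars.
lib_dir = os.path.join(
    os.path.dirname(os.path.abspath(pyflink.__file__)), "lib")
print(lib_dir)  # e.g. .../site-packages/pyflink/lib -- put your jar files here
{code}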


> [for Documents Improvement] Add explicit path where jar files 
> should be put for PyFlink development
> --
>
> Key: FLINK-14071
> URL: https://issues.apache.org/jira/browse/FLINK-14071
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python
>Affects Versions: 1.9.0
>Reporter: Xu Yang
>Priority: Minor
>  Labels: beginner, documentation
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> For documentation improvement:
> I propose adding an explicit statement of where jar files should be put for 
> PyFlink development.
> Some beginners have a hard time getting even a demo to run successfully, not 
> to mention production coding...
> When it comes to PyFlink development, I found the documentation lacks an 
> explicit description of where the jar files should be put.
> The recommended path is:
> *../site-packages/pyflink/lib/your_jar_files*



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (FLINK-13471) Add FlatAggregate support to stream Table API(blink planner)

2020-04-16 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-13471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng closed FLINK-13471.
---
Resolution: Fixed

> Add FlatAggregate support to stream Table API(blink planner)
> 
>
> Key: FLINK-13471
> URL: https://issues.apache.org/jira/browse/FLINK-13471
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / API
>Affects Versions: 1.10.0
>Reporter: sunjincheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.10.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Add FlatAggregate support to the stream Table API (blink planner), i.e. align 
> with the flink planner.
>  The API looks like:
> {code:java}
>  TableAggregateFunction tableAggFunc = new MyTableAggregateFunction();
>  tableEnv.registerFunction("tableAggFunc", tableAggFunc);
> tab.groupBy("key")
>.flatAggregate("tableAggFunc(a, b) as (x, y, z)")
>.select("key, x, y, z")
> {code}
> The detail can be found in 
> [Flip-29|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=97552739]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (FLINK-13471) Add FlatAggregate support to stream Table API(blink planner)

2020-04-16 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-13471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng reassigned FLINK-13471:
---

Assignee: sunjincheng

> Add FlatAggregate support to stream Table API(blink planner)
> 
>
> Key: FLINK-13471
> URL: https://issues.apache.org/jira/browse/FLINK-13471
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / API
>Affects Versions: 1.10.0
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.10.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Add FlatAggregate support to the stream Table API (blink planner), i.e. align 
> with the flink planner.
>  The API looks like:
> {code:java}
>  TableAggregateFunction tableAggFunc = new MyTableAggregateFunction();
>  tableEnv.registerFunction("tableAggFunc", tableAggFunc);
> tab.groupBy("key")
>.flatAggregate("tableAggFunc(a, b) as (x, y, z)")
>.select("key, x, y, z")
> {code}
> The detail can be found in 
> [Flip-29|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=97552739]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (FLINK-13471) Add FlatAggregate support to stream Table API(blink planner)

2020-04-16 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-13471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng reassigned FLINK-13471:
---

Assignee: (was: sunjincheng)

> Add FlatAggregate support to stream Table API(blink planner)
> 
>
> Key: FLINK-13471
> URL: https://issues.apache.org/jira/browse/FLINK-13471
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / API
>Affects Versions: 1.10.0
>Reporter: sunjincheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.10.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Add FlatAggregate support to the stream Table API (blink planner), i.e. align 
> with the flink planner.
>  The API looks like:
> {code:java}
>  TableAggregateFunction tableAggFunc = new MyTableAggregateFunction();
>  tableEnv.registerFunction("tableAggFunc", tableAggFunc);
> tab.groupBy("key")
>.flatAggregate("tableAggFunc(a, b) as (x, y, z)")
>.select("key, x, y, z")
> {code}
> The detail can be found in 
> [Flip-29|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=97552739]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (FLINK-13471) Add FlatAggregate support to stream Table API(blink planner)

2020-04-16 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-13471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng reopened FLINK-13471:
-

> Add FlatAggregate support to stream Table API(blink planner)
> 
>
> Key: FLINK-13471
> URL: https://issues.apache.org/jira/browse/FLINK-13471
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / API
>Affects Versions: 1.10.0
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.10.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Add FlatAggregate support to the stream Table API (blink planner), i.e. align 
> with the flink planner.
>  The API looks like:
> {code:java}
>  TableAggregateFunction tableAggFunc = new MyTableAggregateFunction();
>  tableEnv.registerFunction("tableAggFunc", tableAggFunc);
> tab.groupBy("key")
>.flatAggregate("tableAggFunc(a, b) as (x, y, z)")
>.select("key, x, y, z")
> {code}
> The detail can be found in 
> [Flip-29|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=97552739]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-16666) Support new Python dependency configuration options in flink-java, flink-streaming-java and flink-table

2020-04-03 Thread sunjincheng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17074446#comment-17074446
 ] 

sunjincheng commented on FLINK-16666:
-

I agree with [~aljoscha] that we should be careful when adding code to the 
core modules. It seems that the added code in the core module is just to 
eliminate code duplication which may be introduced in the future. I think it's 
unnecessary, at least for now. Maybe we can come up with a cleaner way in the 
future when we actually encounter this issue. What do you think?

> Support new Python dependency configuration options in flink-java, 
> flink-streaming-java and flink-table
> ---
>
> Key: FLINK-16666
> URL: https://issues.apache.org/jira/browse/FLINK-16666
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Python
>Reporter: Wei Zhong
>Assignee: Wei Zhong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-16943) Support adding jars in PyFlink

2020-04-02 Thread sunjincheng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17073559#comment-17073559
 ] 

sunjincheng edited comment on FLINK-16943 at 4/2/20, 9:48 AM:
--

Ideally Java would also support job submission with multiple jars, but in any 
case PyFlink needs this function, so I think it's reasonable to add an 
'add_jars' interface in PyFlink (it does not affect any other Java modules). 
The implementation of 'add_jars' is transparent to users, and we can align with 
Java at any time in the future if necessary, i.e. we want to ensure the 
compatibility of follow-up versions. The JIRA and discussion on Java supporting 
multiple jars can be found in [1] & [2].

[1] https://issues.apache.org/jira/browse/FLINK-14319
[2] 
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Register-user-jar-files-in-Stream-ExecutionEnvironment-td35801.html


was (Author: sunjincheng121):
Ideally Java would also support job submission with multiple jars, but in any 
case PyFlink needs this function, so I think it's reasonable to add an 
'add_jars' interface in PyFlink (it does not affect any other Java modules). 
The implementation of 'add_jars' is transparent to users, and we can align with 
Java at any time in the future if necessary, i.e. we want to ensure the 
compatibility of follow-up versions.

> Support adding jars in PyFlink
> --
>
> Key: FLINK-16943
> URL: https://issues.apache.org/jira/browse/FLINK-16943
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python
>Reporter: Wei Zhong
>Priority: Major
>
> Since flink-1.10.0 was released, many users have complained that PyFlink is 
> inconvenient when loading external jar packages. For local execution, users 
> need to copy the jar files to the lib folder under the installation directory 
> of PyFlink, which is hard to locate. For job submission, users need to merge 
> their jars into one, as `flink run` only accepts one jar file. That may be 
> easy for Java users but is difficult for Python users who haven't touched 
> Java.
> We intend to add an `add_jars` interface on the PyFlink TableEnvironment to 
> solve this problem. It will add the jars to the context classloader of the 
> Py4j gateway server and add them to `PipelineOptions.JARS` in the 
> configuration of the StreamExecutionEnvironment/ExecutionEnvironment.
> Via this interface, users can add jars in their Python job. The jars will be 
> loaded immediately, and users can use them even on the next line of the 
> Python code. Submitting a job with multiple external jars won't be a problem 
> anymore, because all the jars in `PipelineOptions.JARS` will be added to the 
> JobGraph and uploaded to the cluster.
> As this is not a big change, I'm not sure whether it is necessary to create a 
> FLIP to discuss it, so I created a JIRA first for flexibility. What do you 
> think, guys?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-16943) Support adding jars in PyFlink

2020-04-02 Thread sunjincheng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17073559#comment-17073559
 ] 

sunjincheng commented on FLINK-16943:
-

Ideally Java would also support job submission with multiple jars, but in any 
case PyFlink needs this function, so I think it's reasonable to add an 
'add_jars' interface in PyFlink (it does not affect any other Java modules). 
The implementation of 'add_jars' is transparent to users, and we can align with 
Java at any time in the future if necessary, i.e. we want to ensure the 
compatibility of follow-up versions.
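
A hedged sketch of how the proposed interface might be used ('add_jars' is the 
proposal under discussion, not an existing PyFlink API, and the jar paths are 
illustrative):

{code:python}
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.table import StreamTableEnvironment

env = StreamExecutionEnvironment.get_execution_environment()
t_env = StreamTableEnvironment.create(env)

# Proposed API: load external jars from the Python job itself. The jars would
# go onto the gateway classloader and into PipelineOptions.JARS.
t_env.add_jars("file:///path/to/flink-sql-connector-kafka.jar",
               "file:///path/to/my-udfs.jar")
{code}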

> Support adding jars in PyFlink
> --
>
> Key: FLINK-16943
> URL: https://issues.apache.org/jira/browse/FLINK-16943
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python
>Reporter: Wei Zhong
>Priority: Major
>
> Since flink-1.10.0 was released, many users have complained that PyFlink is 
> inconvenient when loading external jar packages. For local execution, users 
> need to copy the jar files to the lib folder under the installation directory 
> of PyFlink, which is hard to locate. For job submission, users need to merge 
> their jars into one, as `flink run` only accepts one jar file. That may be 
> easy for Java users but is difficult for Python users who haven't touched 
> Java.
> We intend to add an `add_jars` interface on the PyFlink TableEnvironment to 
> solve this problem. It will add the jars to the context classloader of the 
> Py4j gateway server and add them to `PipelineOptions.JARS` in the 
> configuration of the StreamExecutionEnvironment/ExecutionEnvironment.
> Via this interface, users can add jars in their Python job. The jars will be 
> loaded immediately, and users can use them even on the next line of the 
> Python code. Submitting a job with multiple external jars won't be a problem 
> anymore, because all the jars in `PipelineOptions.JARS` will be added to the 
> JobGraph and uploaded to the cluster.
> As this is not a big change, I'm not sure whether it is necessary to create a 
> FLIP to discuss it, so I created a JIRA first for flexibility. What do you 
> think, guys?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-16943) Support adding jars in PyFlink

2020-04-02 Thread sunjincheng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17073553#comment-17073553
 ] 

sunjincheng edited comment on FLINK-16943 at 4/2/20, 9:38 AM:
--

Thanks for reporting this issue. I think this is very important for Python 
users: Python users usually know very little about Java, and merging JARs is 
very difficult for them. I have also received feedback from many Chinese users, 
and I wrote a 
[blog|https://enjoyment.cool/2020/03/31/Apache-Flink-%E6%89%AB%E9%9B%B7%E7%B3%BB%E5%88%97-PyFlink%E5%A6%82%E4%BD%95%E8%A7%A3%E5%86%B3%E5%A4%9AJAR%E5%8C%85%E4%BE%9D%E8%B5%96%E9%97%AE%E9%A2%98/]
 for Python users on how to merge JARs. However, solving the issue the way the 
blog does is not as good as solving it at the API level, and support for 
multiple JARs has already been added on the CLI.



was (Author: sunjincheng121):
Thanks for reporting this issue. I think this is very important for Python 
users: Python users usually know very little about Java, and merging JARs is 
very difficult for them. I have also received feedback from many Chinese users, 
and I wrote a blog for Python users on how to merge JARs. However, solving the 
issue the way the 
[blog|https://enjoyment.cool/2020/03/31/Apache-Flink-%E6%89%AB%E9%9B%B7%E7%B3%BB%E5%88%97-PyFlink%E5%A6%82%E4%BD%95%E8%A7%A3%E5%86%B3%E5%A4%9AJAR%E5%8C%85%E4%BE%9D%E8%B5%96%E9%97%AE%E9%A2%98/]
 does is not as good as solving it at the API level, and support for multiple 
JARs has already been added on the CLI.


> Support adding jars in PyFlink
> --
>
> Key: FLINK-16943
> URL: https://issues.apache.org/jira/browse/FLINK-16943
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python
>Reporter: Wei Zhong
>Priority: Major
>
> Since flink-1.10.0 was released, many users have complained that PyFlink is 
> inconvenient when loading external jar packages. For local execution, users 
> need to copy the jar files to the lib folder under the installation directory 
> of PyFlink, which is hard to locate. For job submission, users need to merge 
> their jars into one, as `flink run` only accepts one jar file. That may be 
> easy for Java users but is difficult for Python users who haven't touched 
> Java.
> We intend to add an `add_jars` interface on the PyFlink TableEnvironment to 
> solve this problem. It will add the jars to the context classloader of the 
> Py4j gateway server and add them to `PipelineOptions.JARS` in the 
> configuration of the StreamExecutionEnvironment/ExecutionEnvironment.
> Via this interface, users can add jars in their Python job. The jars will be 
> loaded immediately, and users can use them even on the next line of the 
> Python code. Submitting a job with multiple external jars won't be a problem 
> anymore, because all the jars in `PipelineOptions.JARS` will be added to the 
> JobGraph and uploaded to the cluster.
> As this is not a big change, I'm not sure whether it is necessary to create a 
> FLIP to discuss it, so I created a JIRA first for flexibility. What do you 
> think, guys?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-16943) Support adding jars in PyFlink

2020-04-02 Thread sunjincheng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17073553#comment-17073553
 ] 

sunjincheng commented on FLINK-16943:
-

Thanks for reporting this issue. I think this is very important for Python 
users: Python users usually know very little about Java, and merging JARs is 
very difficult for them. I have also received feedback from many Chinese users, 
and I wrote a blog for Python users on how to merge JARs. However, solving the 
issue the way the 
[blog|https://enjoyment.cool/2020/03/31/Apache-Flink-%E6%89%AB%E9%9B%B7%E7%B3%BB%E5%88%97-PyFlink%E5%A6%82%E4%BD%95%E8%A7%A3%E5%86%B3%E5%A4%9AJAR%E5%8C%85%E4%BE%9D%E8%B5%96%E9%97%AE%E9%A2%98/]
 does is not as good as solving it at the API level, and support for multiple 
JARs has already been added on the CLI.


> Support adding jars in PyFlink
> --
>
> Key: FLINK-16943
> URL: https://issues.apache.org/jira/browse/FLINK-16943
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python
>Reporter: Wei Zhong
>Priority: Major
>
> Since flink-1.10.0 was released, many users have complained that PyFlink is 
> inconvenient when loading external jar packages. For local execution, users 
> need to copy the jar files to the lib folder under the installation directory 
> of PyFlink, which is hard to locate. For job submission, users need to merge 
> their jars into one, as `flink run` only accepts one jar file. That may be 
> easy for Java users but is difficult for Python users who haven't touched 
> Java.
> We intend to add an `add_jars` interface on the PyFlink TableEnvironment to 
> solve this problem. It will add the jars to the context classloader of the 
> Py4j gateway server and add them to `PipelineOptions.JARS` in the 
> configuration of the StreamExecutionEnvironment/ExecutionEnvironment.
> Via this interface, users can add jars in their Python job. The jars will be 
> loaded immediately, and users can use them even on the next line of the 
> Python code. Submitting a job with multiple external jars won't be a problem 
> anymore, because all the jars in `PipelineOptions.JARS` will be added to the 
> JobGraph and uploaded to the cluster.
> As this is not a big change, I'm not sure whether it is necessary to create a 
> FLIP to discuss it, so I created a JIRA first for flexibility. What do you 
> think, guys?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16762) Relocate the Beam dependency of PyFlink

2020-03-24 Thread sunjincheng (Jira)
sunjincheng created FLINK-16762:
---

 Summary: Relocate the Beam dependency of PyFlink
 Key: FLINK-16762
 URL: https://issues.apache.org/jira/browse/FLINK-16762
 Project: Flink
  Issue Type: Improvement
  Components: API / Python
Affects Versions: 1.10.0
Reporter: sunjincheng


Some users may already use Beam on their own cluster, which may cause 
conflicts between the Beam jars bundled with PyFlink and the Beam jars on the 
user's cluster. So, I would like to relocate the Beam dependency of PyFlink.





--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-16026) Travis failed due to python setup

2020-02-13 Thread sunjincheng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17036088#comment-17036088
 ] 

sunjincheng edited comment on FLINK-16026 at 2/13/20 9:54 AM:
--

Thanks for the reminder [~chesnay].

We have discussed offline initializing all current dependencies to fixed 
versions before this patch. The current patch is a quick fix for Travis. We 
will give feedback here once we have a final conclusion.


was (Author: sunjincheng121):
Thanks for the reminder [~chesnay].

We have discussed initializing all current dependencies to fixed versions 
before this patch. The current patch is a quick fix for Travis. We will give 
feedback here once we have a final conclusion.

> Travis failed due to python setup
> -
>
> Key: FLINK-16026
> URL: https://issues.apache.org/jira/browse/FLINK-16026
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Reporter: Jingsong Lee
>Assignee: Huang Xingbo
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.10.1, 1.11.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> [https://api.travis-ci.com/v3/job/286671652/log.txt]
> [https://api.travis-ci.org/v3/job/649754603/log.txt]
> [https://api.travis-ci.com/v3/job/286409130/log.txt]
> Collecting avro-python3<2.0.0,>=1.8.1; python_version >= "3.0" (from 
> apache-beam==2.19.0->apache-flink==1.11.dev0) Using cached 
> https://files.pythonhosted.org/packages/31/21/d98e2515e5ca0337d7e747e8065227ee77faf5c817bbb74391899613178a/avro-python3-1.9.2.tar.gz
>  Complete output from command python setup.py egg_info: Traceback (most 
> recent call last): File "", line 1, in  File 
> "/tmp/pip-install-d6uvsl_b/avro-python3/setup.py", line 41, in  
> import pycodestyle ModuleNotFoundError: No module named 'pycodestyle' 
>  Command "python setup.py egg_info" 
> failed with error code 1 in /tmp/pip-install-d6uvsl_b/avro-python3/ You are 
> using pip version 10.0.1, however version 20.0.2 is available. You should 
> consider upgrading via the 'pip install --upgrade pip' command.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-16026) Travis failed due to python setup

2020-02-13 Thread sunjincheng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17036088#comment-17036088
 ] 

sunjincheng commented on FLINK-16026:
-

Thanks for the reminder [~chesnay].

We have discussed initializing all current dependencies to fixed versions 
before this patch. The current patch is a quick fix for Travis. We will give 
feedback here once we have a final conclusion.
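
For reference, pinning would amount to replacing the open-ended version ranges 
in setup.py with exact versions, along the lines of this sketch (the package 
list and versions here are illustrative, not the actual flink-python setup.py):

{code:python}
from setuptools import setup

setup(
    name='apache-flink',
    # Exact pins instead of ranges such as 'avro-python3<2.0.0,>=1.8.1',
    # so a new upstream release cannot break CI unexpectedly.
    install_requires=[
        'apache-beam==2.19.0',
        'avro-python3==1.9.1',
    ],
)
{code}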

> Travis failed due to python setup
> -
>
> Key: FLINK-16026
> URL: https://issues.apache.org/jira/browse/FLINK-16026
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Reporter: Jingsong Lee
>Assignee: Huang Xingbo
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.10.1, 1.11.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> [https://api.travis-ci.com/v3/job/286671652/log.txt]
> [https://api.travis-ci.org/v3/job/649754603/log.txt]
> [https://api.travis-ci.com/v3/job/286409130/log.txt]
> Collecting avro-python3<2.0.0,>=1.8.1; python_version >= "3.0" (from 
> apache-beam==2.19.0->apache-flink==1.11.dev0) Using cached 
> https://files.pythonhosted.org/packages/31/21/d98e2515e5ca0337d7e747e8065227ee77faf5c817bbb74391899613178a/avro-python3-1.9.2.tar.gz
>  Complete output from command python setup.py egg_info: Traceback (most 
> recent call last): File "", line 1, in  File 
> "/tmp/pip-install-d6uvsl_b/avro-python3/setup.py", line 41, in  
> import pycodestyle ModuleNotFoundError: No module named 'pycodestyle' 
>  Command "python setup.py egg_info" 
> failed with error code 1 in /tmp/pip-install-d6uvsl_b/avro-python3/ You are 
> using pip version 10.0.1, however version 20.0.2 is available. You should 
> consider upgrading via the 'pip install --upgrade pip' command.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-15253) Accumulators are not checkpointed

2020-02-10 Thread sunjincheng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-15253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033426#comment-17033426
 ] 

sunjincheng commented on FLINK-15253:
-

Yes, [~mxm], from my point of view it is in any case a good idea to decouple 
the accumulators from the heartbeat. 

> Accumulators are not checkpointed
> -
>
> Key: FLINK-15253
> URL: https://issues.apache.org/jira/browse/FLINK-15253
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream, Runtime / Checkpointing
>Reporter: Maximilian Michels
>Priority: Major
>
> Accumulators are not checkpointed, which makes them relatively useless for 
> streaming applications. They are also tied to the heartbeat (FLINK-15252), 
> which makes them fragile.
> We could consider deactivating accumulators in streaming mode since they seem 
> to originally be a batch feature.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-15937) Correct the Development Status for PyFlink

2020-02-06 Thread sunjincheng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-15937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17032093#comment-17032093
 ] 

sunjincheng commented on FLINK-15937:
-

Sounds good! Thanks [~gjy]!

> Correct the Development Status for PyFlink
> --
>
> Key: FLINK-15937
> URL: https://issues.apache.org/jira/browse/FLINK-15937
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.10.0
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.10.1, 1.11.0
>
> Attachments: image-2020-02-06-19-17-28-672.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Correct the `Development Status` value: 
> from `'Development Status :: 1 - Planning'` to `'Development Status :: 5 - 
> Production/Stable'`.
> The `Planning` status in PyPI tells users that the package cannot be used in 
> production [1]. So, correcting the Development Status is very important for 
> users. 
> I would like to include this fix in the 1.10.0 release. 
>  
> [[1] 
> https://pypi.org/search/?c=Development+Status+%3A%3A+1+-+Planning|https://pypi.org/search/?c=Development+Status+%3A%3A+1+-+Planning]
> !image-2020-02-06-19-17-28-672.png|width=268,height=244!
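> 
> A hedged sketch of the corresponding setup.py change (the surrounding fields 
> are illustrative, not the full flink-python setup.py):
> {code:python}
> from setuptools import setup
>
> setup(
>     name='apache-flink',
>     classifiers=[
>         # was: 'Development Status :: 1 - Planning'
>         'Development Status :: 5 - Production/Stable',
>     ],
> )
> {code}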



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-15937) Correct the Development Status for PyFlink

2020-02-06 Thread sunjincheng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-15937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17031481#comment-17031481
 ] 

sunjincheng edited comment on FLINK-15937 at 2/6/20 11:20 AM:
--

Hi [~gary], thanks for paying attention to this issue. I found that you had 
reset the fix version from 1.10 to 1.10.1. Sorry, I did not add more info to 
the Jira when I opened it. From my point of view, the `Planning` status in PyPI 
means that the package cannot be used in production, and correcting the 
Development Status is very important for users. I would like to include this 
patch in the 1.10.0 release. 

 

What do you think?


was (Author: sunjincheng121):
Hi Gary Yao, thanks for paying attention to this issue. I found that you had 
reset the fix version from 1.10 to 1.10.1. From my point of view, the 
`Planning` status in PyPI means that the package cannot be used in production, 
and correcting the Development Status is very important for users. I would like 
to include this patch in the 1.10.0 release. 

What do you think?

> Correct the Development Status for PyFlink
> --
>
> Key: FLINK-15937
> URL: https://issues.apache.org/jira/browse/FLINK-15937
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.10.0
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.10.1, 1.11.0
>
> Attachments: image-2020-02-06-19-17-28-672.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Correct the `Development Status` value: 
> from `'Development Status :: 1 - Planning'` to `'Development Status :: 5 - 
> Production/Stable'`.
> The `Planning` status in PyPI tells users that the package cannot be used in 
> production [1]. So, correcting the Development Status is very important for 
> users. 
> I would like to include this fix in the 1.10.0 release. 
>  
> [[1] 
> https://pypi.org/search/?c=Development+Status+%3A%3A+1+-+Planning|https://pypi.org/search/?c=Development+Status+%3A%3A+1+-+Planning]
> !image-2020-02-06-19-17-28-672.png|width=268,height=244!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-15937) Correct the Development Status for PyFlink

2020-02-06 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-15937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng updated FLINK-15937:

Attachment: (was: image-2020-02-06-19-14-00-631.png)

> Correct the Development Status for PyFlink
> --
>
> Key: FLINK-15937
> URL: https://issues.apache.org/jira/browse/FLINK-15937
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.10.0
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.10.1, 1.11.0
>
> Attachments: image-2020-02-06-19-17-28-672.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Correct the `Development Status` value: 
> from `'Development Status :: 1 - Planning'` to `'Development Status :: 5 - 
> Production/Stable'`.
> The `Planning` status in PyPI tells users that the package cannot be used in 
> production [1]. So, correcting the Development Status is very important for 
> users. 
> I would like to include this fix in the 1.10.0 release. 
>  
> [[1] 
> https://pypi.org/search/?c=Development+Status+%3A%3A+1+-+Planning|https://pypi.org/search/?c=Development+Status+%3A%3A+1+-+Planning]
> !image-2020-02-06-19-17-28-672.png|width=268,height=244!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-15937) Correct the Development Status for PyFlink

2020-02-06 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-15937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng updated FLINK-15937:

Description: 
Correct the `Development Status` value:

From `'Development Status :: 1 - Planning'` to `'Development Status :: 5 - 
Production/Stable'`.

The `Planning` status in PyPI tells users that the package cannot be used in 
production [1]. So, correcting the Development Status is very important for 
users.

I would like to include this fix in the 1.10.0 release.

 

[[1] 
https://pypi.org/search/?c=Development+Status+%3A%3A+1+-+Planning|https://pypi.org/search/?c=Development+Status+%3A%3A+1+-+Planning]

!image-2020-02-06-19-17-28-672.png|width=268,height=244!

  was:
Correct the `Development Status` value.

From `'Development Status :: 1 - Planning'` to `'Development Status :: 5 - 
Production/Stable'`.

 

 

[[1] 
https://pypi.org/search/?c=Development+Status+%3A%3A+1+-+Planning|https://pypi.org/search/?c=Development+Status+%3A%3A+1+-+Planning]


> Correct the Development Status for PyFlink
> --
>
> Key: FLINK-15937
> URL: https://issues.apache.org/jira/browse/FLINK-15937
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.10.0
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.10.1, 1.11.0
>
> Attachments: image-2020-02-06-19-17-28-672.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Correct the `Development Status` value: 
> from `'Development Status :: 1 - Planning'` to `'Development Status :: 5 - 
> Production/Stable'`.
> The `Planning` status in PyPI tells users that the package cannot be used in 
> production [1]. So, correcting the Development Status is very important for 
> users. 
> I would like to include this fix in the 1.10.0 release. 
>  
> [[1] 
> https://pypi.org/search/?c=Development+Status+%3A%3A+1+-+Planning|https://pypi.org/search/?c=Development+Status+%3A%3A+1+-+Planning]
> !image-2020-02-06-19-17-28-672.png|width=268,height=244!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-15937) Correct the Development Status for PyFlink

2020-02-06 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-15937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng updated FLINK-15937:

Attachment: image-2020-02-06-19-17-28-672.png

> Correct the Development Status for PyFlink
> --
>
> Key: FLINK-15937
> URL: https://issues.apache.org/jira/browse/FLINK-15937
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.10.0
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.10.1, 1.11.0
>
> Attachments: image-2020-02-06-19-17-28-672.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Correct the `Development Status` value: 
> from `'Development Status :: 1 - Planning'` to `'Development Status :: 5 - 
> Production/Stable'`.
>  
>  
> [[1] 
> https://pypi.org/search/?c=Development+Status+%3A%3A+1+-+Planning|https://pypi.org/search/?c=Development+Status+%3A%3A+1+-+Planning]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

