[jira] [Comment Edited] (FLINK-18859) ExecutionGraphNotEnoughResourceTest.testRestartWithSlotSharingAndNotEnoughResources failed with "Condition was not met in given timeout."

2020-08-09 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17174061#comment-17174061
 ] 

Zhu Zhu edited comment on FLINK-18859 at 8/10/20, 5:56 AM:
---

I tried running the case 1000 times and failed to reproduce in my dev 
environment.
And the transfer.sh unfortunately failed and I cannot see the detailed 
execution log.
I guess this instability was caused by a temporary slowness of the CI 
environment, which resulted in the job not reaching the expected state (FAILED) 
within 2000 ms.
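
For context, a waitUntilCondition-style helper is essentially a poll loop with 
a deadline, so a slow environment can push the state change past the timeout. 
A minimal Python sketch of the idea (the real CommonTestUtils.waitUntilCondition 
is Java; the names below are illustrative):
{code:python}
import time

def wait_until_condition(condition, timeout_ms=2000, retry_ms=10):
    """Poll `condition` until it returns True or the deadline passes."""
    deadline = time.monotonic() + timeout_ms / 1000.0
    while time.monotonic() < deadline:
        if condition():
            return
        time.sleep(retry_ms / 1000.0)
    raise TimeoutError("Condition was not met in given timeout.")

# e.g.: wait_until_condition(lambda: job_status() == "FAILED")
{code}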


was (Author: zhuzh):
I tried running the case 1000 times and failed to reproduce in my dev 
environment.
And the transfer.sh unfortunately failed and I cannot see the detailed 
execution log.
I would guess this instability is caused by a temporary slowness of the CI 
environment, which resulted in the job not reaching the expected state (FAILED) 
within 2000 ms.

> ExecutionGraphNotEnoughResourceTest.testRestartWithSlotSharingAndNotEnoughResources
>  failed with "Condition was not met in given timeout."
> -
>
> Key: FLINK-18859
> URL: https://issues.apache.org/jira/browse/FLINK-18859
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination, Tests
>Affects Versions: 1.11.0
>Reporter: Dian Fu
>Priority: Major
>  Labels: test-stability
>
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=5300=logs=d89de3df-4600-5585-dadc-9bbc9a5e661c=66b5c59a-0094-561d-0e44-b149dfdd586d]
> {code}
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 5.673 
> s <<< FAILURE! - in 
> org.apache.flink.runtime.executiongraph.ExecutionGraphNotEnoughResourceTest
> [ERROR] 
> testRestartWithSlotSharingAndNotEnoughResources(org.apache.flink.runtime.executiongraph.ExecutionGraphNotEnoughResourceTest)
>   Time elapsed: 3.158 s  <<< ERROR!
> java.util.concurrent.TimeoutException: Condition was not met in given timeout.
>   at 
> org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:129)
>   at 
> org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:119)
>   at 
> org.apache.flink.runtime.executiongraph.ExecutionGraphNotEnoughResourceTest.testRestartWithSlotSharingAndNotEnoughResources(ExecutionGraphNotEnoughResourceTest.java:130)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-11779) CLI ignores -m parameter if high-availability is ZOOKEEPER

2020-08-09 Thread Yang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-11779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17174087#comment-17174087
 ] 

Yang Wang commented on FLINK-11779:
---

I second [~maguowei]'s comments. AFAIK, {{-m host:port}}, aka 
{{-Drest.address=host -Drest.port=port}}, should only take effect in standalone 
non-HA mode. For all other deployments, it will always be overridden in 
{{YarnClusterDescriptor}} or {{KubernetesClusterDescriptor}}, which retrieve the 
address from the Yarn ApplicationReport, ZooKeeper, or the Kubernetes service.

So maybe we just need to document this behavior.

> CLI ignores -m parameter if high-availability is ZOOKEEPER 
> ---
>
> Key: FLINK-11779
> URL: https://issues.apache.org/jira/browse/FLINK-11779
> Project: Flink
>  Issue Type: Bug
>  Components: Command Line Client
>Affects Versions: 1.7.2, 1.8.0
>Reporter: Gary Yao
>Assignee: leesf
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> *Description*
> The CLI ignores the host/port provided by the {{-m}} parameter if 
> {{high-availability: ZOOKEEPER}} is configured in {{flink-conf.yaml}}.
> *Expected behavior*
> * TBD: either document this behavior or give precedence to {{-m}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-18835) sql using group by, duplicated group fileld appears

2020-08-09 Thread YHF (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17174077#comment-17174077
 ] 

YHF edited comment on FLINK-18835 at 8/10/20, 5:45 AM:
---

[~jark] but why is part of the group result lost? The source data contains 
(A=2) within the time window, but the group result does not include A=2.

Do you mean that after the Flink SQL operation I need to group the data again, 
i.e., group twice?
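
For readers of this thread: toRetractStream pairs every row with a boolean 
flag, and an update to a group arrives as a retraction of the old row plus an 
addition of the new one, so the same group key can legitimately appear more 
than once. A plain-Python sketch of folding such a stream (the data values are 
made up):
{code:python}
# Each element is (is_add, row); an update = retract old row + add new row.
retract_stream = [
    (True,  ("A", 2, 1)),   # first result for group (A, 2): count 1
    (False, ("A", 2, 1)),   # retraction of the old result
    (True,  ("A", 2, 2)),   # updated result for the same group: count 2
]

latest = {}
for is_add, row in retract_stream:
    key = row[:2]
    if is_add:
        latest[key] = row      # add, or replace with the newest result
    elif latest.get(key) == row:
        del latest[key]        # drop the retracted old result

print(latest)  # {('A', 2): ('A', 2, 2)} -> one final row per group
{code}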


was (Author: linshi2020):
[~jark] but why is part of the group result lost? The source data contains 
(A=2) within the time window, but the group result does not include A=2.

> sql using group by, duplicated group fileld appears
> ---
>
> Key: FLINK-18835
> URL: https://issues.apache.org/jira/browse/FLINK-18835
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.11.1
>Reporter: YHF
>Priority: Critical
> Attachments: SumAnalysis.java
>
>
> The data source is Kafka. I then create a temporary view, group by (fieldA, 
> fieldB) using SQL, transform the result table to a DataStream using 
> toRetractStream, and print the result.
> I find duplicated (fieldA, fieldB) groups.
> See the attachment for the code.
> The query groups by (scanType, scanSite, cmtInf), but the result is below:
> (true,Otm\{, scanType=97, scanSite=14, cmtInf=24,jp=1.00, 
> db=0E-18, dbjp=1.00, pjWei=27.07, 
> dbWei=0E-18, mintime=2020-07-29 11:33:57.679, maxtime=2020-07-29 
> 11:33:57.679})
> 3> (true,Otm\{, scanType=97, scanSite=14, cmtInf=24,jp=1.00, 
> db=0E-18, dbjp=1.00, pjWei=27.07, 
> dbWei=0E-18, mintime=2020-07-29 11:33:57.679, maxtime=2020-07-29 
> 11:33:57.679})



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (FLINK-18866) Support filter() operation for Python DataStream API.

2020-08-09 Thread Hequn Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hequn Cheng reassigned FLINK-18866:
---

Assignee: Shuiqiang Chen

> Support filter() operation for Python DataStream API.
> -
>
> Key: FLINK-18866
> URL: https://issues.apache.org/jira/browse/FLINK-18866
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Python
>Reporter: Shuiqiang Chen
>Assignee: Shuiqiang Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>
> Support filter() interface for Python DataStream API.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-18862) Fix BinaryRawValueData cannot be cast to StringData in runtime

2020-08-09 Thread Jark Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jark Wu updated FLINK-18862:

Description: 
1. Env: Flink SQL, version 1.11.1, per-job mode
2. Error: org.apache.flink.table.data.binary.BinaryRawValueData cannot be cast 
to org.apache.flink.table.data.StringData

3. Job:

(1) Create a Kafka table:
{code:java}
CREATE TABLE kafka(
x String,
y String
)with(
   'connector' = 'kafka',
..
)
{code}


(2) Create a view:
{code:java}
   CREATE VIEW view1 AS
   SELECT 
   x, 
   y, 
   CAST(COUNT(1) AS VARCHAR) AS ct
   FROM kafka
   GROUP BY 
   x, y
{code}


(3) Aggregate on the view:
{code:java}
select 
 x, 
 LISTAGG(CONCAT_WS('=', y, ct), ',') AS lists
FROM view1
GROUP BY x
{code}

And then the exception is thrown: 
org.apache.flink.table.data.binary.BinaryRawValueData cannot be cast to 
org.apache.flink.table.data.StringData
The problem is that there is no RawValueData in the query; the result type of 
count(1) should be BIGINT, not RawValueData.

(4) If there is no second aggregation, the job runs successfully.
{code:java}
select 
x, 
CONCAT_WS('=', y, ct)
from view1
{code}
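
For reference, a hedged PyFlink sketch of the same three statements (the 
'datagen' connector stands in for the elided Kafka properties, and API details 
may differ slightly on 1.11):
{code:python}
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(
    EnvironmentSettings.new_instance().in_streaming_mode().build())

# 'datagen' replaces the Kafka source from the report for a self-contained run.
t_env.execute_sql("""
    CREATE TABLE kafka (
        x STRING,
        y STRING
    ) WITH ('connector' = 'datagen')
""")

t_env.execute_sql("""
    CREATE VIEW view1 AS
    SELECT x, y, CAST(COUNT(1) AS VARCHAR) AS ct
    FROM kafka
    GROUP BY x, y
""")

# The second aggregation is what triggered the ClassCastException.
t_env.execute_sql("""
    SELECT x, LISTAGG(CONCAT_WS('=', y, ct), ',') AS lists
    FROM view1
    GROUP BY x
""").print()
{code}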

The detailed exception:
{code:java}
java.lang.ClassCastException: 
org.apache.flink.table.data.binary.BinaryRawValueData cannot be cast to 
org.apache.flink.table.data.StringData
at 
org.apache.flink.table.data.GenericRowData.getString(GenericRowData.java:169) 
~[ad_features_auto-1.0-SNAPSHOT.jar:?]
at 
org.apache.flink.table.data.JoinedRowData.getString(JoinedRowData.java:139) 
~[flink-table-blink_2.11-1.11.1.jar:?]
at org.apache.flink.table.data.RowData.get(RowData.java:273) 
~[ad_features_auto-1.0-SNAPSHOT.jar:?]
at 
org.apache.flink.table.runtime.typeutils.RowDataSerializer.copyRowData(RowDataSerializer.java:156)
 ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
at 
org.apache.flink.table.runtime.typeutils.RowDataSerializer.copy(RowDataSerializer.java:123)
 ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
at 
org.apache.flink.table.runtime.typeutils.RowDataSerializer.copy(RowDataSerializer.java:50)
 ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
at 
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:715)
 ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
at 
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:692)
 ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
at 
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:672)
 ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
at 
org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:52)
 ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
at 
org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:30)
 ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
at 
org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:53)
 ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
at 
org.apache.flink.table.runtime.operators.aggregate.GroupAggFunction.processElement(GroupAggFunction.java:205)
 ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
at 
org.apache.flink.table.runtime.operators.aggregate.GroupAggFunction.processElement(GroupAggFunction.java:43)
 ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
at 
org.apache.flink.streaming.api.operators.KeyedProcessOperator.processElement(KeyedProcessOperator.java:85)
 ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
at 
org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:161)
 ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
at 
org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.processElement(StreamTaskNetworkInput.java:178)
 ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
at 
org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:153)
 ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
at 
org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:67)
 ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:345)
 ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxStep(MailboxProcessor.java:191)
 ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:181)
 ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:558)
 ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
at 

[GitHub] [flink] hequn8128 commented on a change in pull request #13094: [FLINK-18766][python] Support add_sink() for Python DataStream API.

2020-08-09 Thread GitBox


hequn8128 commented on a change in pull request #13094:
URL: https://github.com/apache/flink/pull/13094#discussion_r467693696



##
File path: flink-python/pyflink/datastream/tests/test_data_stream.py
##
@@ -149,6 +149,22 @@ def flat_map(value):
 expected.sort()
 self.assertEqual(expected, results)
 
+def test_add_sink_with_sink_func_class(self):

Review comment:
   The test is useless here, as it uses the DataStreamCollectUtil to 
collect and verify, so `add_sink` itself has not been tested. You can add a 
CustomSinkFunction which extends SinkFunction. The code looks like:
   ```python
   class TestSinkFunction(SinkFunction):
       def __init__(self, func):
           ...

       def get_results(self):
           ...
   ```

##
File path: flink-python/pyflink/datastream/functions.py
##
@@ -147,3 +150,34 @@ def _get_python_env():
     gateway = get_gateway()
     exec_type = gateway.jvm.org.apache.flink.table.functions.python.PythonEnv.ExecType.PROCESS
     return gateway.jvm.org.apache.flink.table.functions.python.PythonEnv(exec_type)
+
+
+class JavaFunctionWrapper(object):

Review comment:
   Add comments for this class.
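
   For example, something along these lines (the exact wording is of course up 
to the author; this is just a sketch):
   ```python
   class JavaFunctionWrapper(object):
       """
       A wrapper for a Java function object. get_java_function() exposes the
       wrapped object so that operations such as add_sink() can pass it on
       to the JVM.
       """

       def __init__(self, j_function):
           self._j_function = j_function

       def get_java_function(self):
           return self._j_function
   ```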

##
File path: flink-python/pyflink/datastream/functions.py
##
@@ -147,3 +150,34 @@ def _get_python_env():
     gateway = get_gateway()
     exec_type = gateway.jvm.org.apache.flink.table.functions.python.PythonEnv.ExecType.PROCESS
     return gateway.jvm.org.apache.flink.table.functions.python.PythonEnv(exec_type)
+
+
+class JavaFunctionWrapper(object):
+
+    def __init__(self, j_function):
+        self._j_function = j_function
+
+    def get_java_function(self):
+        return self._j_function
+
+
+class SinkFunction(JavaFunctionWrapper):
+    """
+    The base class for SinkFunctions.
+    """
+
+    def __init__(self, sink_func: Union[str, JavaObject], *args):
+        """
+        Constructor of SinkFunction.
+
+        :param sink_func: The java SinkFunction object.
+        """
+        if isinstance(sink_func, str):
+            j_source_func_class = get_gateway().jvm.__getattr__(sink_func)
+            if len(args) > 0:
+                j_sink_func = j_source_func_class(*args)
+            else:
+                j_sink_func = j_source_func_class()
+        else:
+            j_sink_func = sink_func
+        super(SinkFunction, self).__init__(j_sink_func)

Review comment:
   Move this logic into `JavaFunctionWrapper` so that it can be shared by 
other classes, e.g., SourceFunction.
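
   A sketch of what that could look like (assuming we keep the str-or-JavaObject
convention and, per the comment below, drop `*args`):
   ```python
   class JavaFunctionWrapper(object):

       def __init__(self, j_function: Union[str, JavaObject]):
           # Accept a fully qualified Java class name or an existing Java
           # object, so SinkFunction, SourceFunction, etc. share this logic.
           if isinstance(j_function, str):
               j_function = get_gateway().jvm.__getattr__(j_function)()
           self._j_function = j_function

       def get_java_function(self):
           return self._j_function


   class SinkFunction(JavaFunctionWrapper):
       """The base class for SinkFunctions."""
   ```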

##
File path: flink-python/pyflink/datastream/functions.py
##
@@ -147,3 +150,34 @@ def _get_python_env():
     gateway = get_gateway()
     exec_type = gateway.jvm.org.apache.flink.table.functions.python.PythonEnv.ExecType.PROCESS
     return gateway.jvm.org.apache.flink.table.functions.python.PythonEnv(exec_type)
+
+
+class JavaFunctionWrapper(object):
+
+    def __init__(self, j_function):
+        self._j_function = j_function
+
+    def get_java_function(self):
+        return self._j_function
+
+
+class SinkFunction(JavaFunctionWrapper):
+    """
+    The base class for SinkFunctions.
+    """
+
+    def __init__(self, sink_func: Union[str, JavaObject], *args):

Review comment:
   If you want to support `*args`, you need to take all types into 
consideration, i.e., convert the different Python types to Java types. I think 
we don't need to support `*args` here.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Assigned] (FLINK-18862) Fix BinaryRawValueData cannot be cast to StringData in runtime

2020-08-09 Thread Jark Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jark Wu reassigned FLINK-18862:
---

Assignee: Jark Wu

> Fix BinaryRawValueData cannot be cast to StringData in runtime
> --
>
> Key: FLINK-18862
> URL: https://issues.apache.org/jira/browse/FLINK-18862
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.11.1
>Reporter: YUJIANBO
>Assignee: Jark Wu
>Priority: Major
> Fix For: 1.12.0, 1.11.2
>
> Attachments: sql 中agg算子的 count(1) 的结果强转成String类型再经过一次agg的操作就不能成功.txt
>
>
> 1. Env: Flink SQL, version 1.11.1, per-job mode
> 2. Error: org.apache.flink.table.data.binary.BinaryRawValueData cannot be cast to 
> org.apache.flink.table.data.StringData
> 3. Background:
> (1) Create a table from Kafka:
> {code:java}
> CREATE TABLE kafka(
> x String,
> y String
> )with(
>    'connector' = 'kafka',
> ..
> )
> {code}
> (2) Create a view:
> {code:java}
>    CREATE VIEW view1 AS
>    SELECT 
>    x, 
>    y, 
>    CAST(COUNT(1) AS VARCHAR) AS ct
>    FROM kafka
>    GROUP BY 
>    x, y
> {code}
> (3) Then run another agg operation on top of this view:
> {code:java}
> select 
>  x, 
>  LISTAGG(CONCAT_WS('=', y, ct), ',') AS lists
> FROM view1
> GROUP BY x
> {code}
>    Then the error is thrown: org.apache.flink.table.data.binary.BinaryRawValueData cannot be 
> cast to org.apache.flink.table.data.StringData
>
> (4) But when I try it without the agg operation, there is no error, which shows 
> that the result of count(1) can be cast to String.
> {code:java}
> select 
>   x, 
>   CONCAT_WS('=', y, ct)
> from view1
> {code}
> *But why does it fail after one more agg operation?*
> 4. Question: the comparison above shows that count(1) can be cast to String, 
> so why does it fail after one more agg operation?
>  Is there a good way to solve this problem?
> 5. The detailed error:
> {code:java}
> java.lang.ClassCastException: 
> org.apache.flink.table.data.binary.BinaryRawValueData cannot be cast to 
> org.apache.flink.table.data.StringData
>   at 
> org.apache.flink.table.data.GenericRowData.getString(GenericRowData.java:169) 
> ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.table.data.JoinedRowData.getString(JoinedRowData.java:139) 
> ~[flink-table-blink_2.11-1.11.1.jar:?]
>   at org.apache.flink.table.data.RowData.get(RowData.java:273) 
> ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.table.runtime.typeutils.RowDataSerializer.copyRowData(RowDataSerializer.java:156)
>  ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
>   at 
> org.apache.flink.table.runtime.typeutils.RowDataSerializer.copy(RowDataSerializer.java:123)
>  ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
>   at 
> org.apache.flink.table.runtime.typeutils.RowDataSerializer.copy(RowDataSerializer.java:50)
>  ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:715)
>  ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:692)
>  ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:672)
>  ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:52)
>  ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:30)
>  ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:53)
>  ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.table.runtime.operators.aggregate.GroupAggFunction.processElement(GroupAggFunction.java:205)
>  ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
>   at 
> org.apache.flink.table.runtime.operators.aggregate.GroupAggFunction.processElement(GroupAggFunction.java:43)
>  ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
>   at 
> org.apache.flink.streaming.api.operators.KeyedProcessOperator.processElement(KeyedProcessOperator.java:85)
>  ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:161)
>  ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.processElement(StreamTaskNetworkInput.java:178)
>  ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:153)
>  ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> 

[jira] [Updated] (FLINK-18862) Fix BinaryRawValueData cannot be cast to StringData in runtime

2020-08-09 Thread Jark Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jark Wu updated FLINK-18862:

Summary: Fix BinaryRawValueData cannot be cast to StringData in runtime  
(was: Casting the result of count(1) in an agg operator to String fails after 
another agg operation)

> Fix BinaryRawValueData cannot be cast to StringData in runtime
> --
>
> Key: FLINK-18862
> URL: https://issues.apache.org/jira/browse/FLINK-18862
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.11.1
>Reporter: YUJIANBO
>Priority: Major
> Attachments: sql 中agg算子的 count(1) 的结果强转成String类型再经过一次agg的操作就不能成功.txt
>
>
> 1. Env: Flink SQL, version 1.11.1, per-job mode
> 2. Error: org.apache.flink.table.data.binary.BinaryRawValueData cannot be cast to 
> org.apache.flink.table.data.StringData
> 3. Background:
> (1) Create a table from Kafka:
> {code:java}
> CREATE TABLE kafka(
> x String,
> y String
> )with(
>    'connector' = 'kafka',
> ..
> )
> {code}
> (2) Create a view:
> {code:java}
>    CREATE VIEW view1 AS
>    SELECT 
>    x, 
>    y, 
>    CAST(COUNT(1) AS VARCHAR) AS ct
>    FROM kafka
>    GROUP BY 
>    x, y
> {code}
> (3) Then run another agg operation on top of this view:
> {code:java}
> select 
>  x, 
>  LISTAGG(CONCAT_WS('=', y, ct), ',') AS lists
> FROM view1
> GROUP BY x
> {code}
>    Then the error is thrown: org.apache.flink.table.data.binary.BinaryRawValueData cannot be 
> cast to org.apache.flink.table.data.StringData
>
> (4) But when I try it without the agg operation, there is no error, which shows 
> that the result of count(1) can be cast to String.
> {code:java}
> select 
>   x, 
>   CONCAT_WS('=', y, ct)
> from view1
> {code}
> *But why does it fail after one more agg operation?*
> 4. Question: the comparison above shows that count(1) can be cast to String, 
> so why does it fail after one more agg operation?
>  Is there a good way to solve this problem?
> 5. The detailed error:
> {code:java}
> java.lang.ClassCastException: 
> org.apache.flink.table.data.binary.BinaryRawValueData cannot be cast to 
> org.apache.flink.table.data.StringData
>   at 
> org.apache.flink.table.data.GenericRowData.getString(GenericRowData.java:169) 
> ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.table.data.JoinedRowData.getString(JoinedRowData.java:139) 
> ~[flink-table-blink_2.11-1.11.1.jar:?]
>   at org.apache.flink.table.data.RowData.get(RowData.java:273) 
> ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.table.runtime.typeutils.RowDataSerializer.copyRowData(RowDataSerializer.java:156)
>  ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
>   at 
> org.apache.flink.table.runtime.typeutils.RowDataSerializer.copy(RowDataSerializer.java:123)
>  ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
>   at 
> org.apache.flink.table.runtime.typeutils.RowDataSerializer.copy(RowDataSerializer.java:50)
>  ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:715)
>  ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:692)
>  ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:672)
>  ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:52)
>  ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:30)
>  ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:53)
>  ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.table.runtime.operators.aggregate.GroupAggFunction.processElement(GroupAggFunction.java:205)
>  ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
>   at 
> org.apache.flink.table.runtime.operators.aggregate.GroupAggFunction.processElement(GroupAggFunction.java:43)
>  ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
>   at 
> org.apache.flink.streaming.api.operators.KeyedProcessOperator.processElement(KeyedProcessOperator.java:85)
>  ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:161)
>  ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.processElement(StreamTaskNetworkInput.java:178)
>  ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:153)
>  ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> 

[jira] [Updated] (FLINK-18862) Fix BinaryRawValueData cannot be cast to StringData in runtime

2020-08-09 Thread Jark Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jark Wu updated FLINK-18862:

Fix Version/s: 1.11.2
   1.12.0

> Fix BinaryRawValueData cannot be cast to StringData in runtime
> --
>
> Key: FLINK-18862
> URL: https://issues.apache.org/jira/browse/FLINK-18862
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.11.1
>Reporter: YUJIANBO
>Priority: Major
> Fix For: 1.12.0, 1.11.2
>
> Attachments: sql 中agg算子的 count(1) 的结果强转成String类型再经过一次agg的操作就不能成功.txt
>
>
> 1. Env: Flink SQL, version 1.11.1, per-job mode
> 2. Error: org.apache.flink.table.data.binary.BinaryRawValueData cannot be cast to 
> org.apache.flink.table.data.StringData
> 3. Background:
> (1) Create a table from Kafka:
> {code:java}
> CREATE TABLE kafka(
> x String,
> y String
> )with(
>    'connector' = 'kafka',
> ..
> )
> {code}
> (2) Create a view:
> {code:java}
>    CREATE VIEW view1 AS
>    SELECT 
>    x, 
>    y, 
>    CAST(COUNT(1) AS VARCHAR) AS ct
>    FROM kafka
>    GROUP BY 
>    x, y
> {code}
> (3) Then run another agg operation on top of this view:
> {code:java}
> select 
>  x, 
>  LISTAGG(CONCAT_WS('=', y, ct), ',') AS lists
> FROM view1
> GROUP BY x
> {code}
>    Then the error is thrown: org.apache.flink.table.data.binary.BinaryRawValueData cannot be 
> cast to org.apache.flink.table.data.StringData
>
> (4) But when I try it without the agg operation, there is no error, which shows 
> that the result of count(1) can be cast to String.
> {code:java}
> select 
>   x, 
>   CONCAT_WS('=', y, ct)
> from view1
> {code}
> *But why does it fail after one more agg operation?*
> 4. Question: the comparison above shows that count(1) can be cast to String, 
> so why does it fail after one more agg operation?
>  Is there a good way to solve this problem?
> 5. The detailed error:
> {code:java}
> java.lang.ClassCastException: 
> org.apache.flink.table.data.binary.BinaryRawValueData cannot be cast to 
> org.apache.flink.table.data.StringData
>   at 
> org.apache.flink.table.data.GenericRowData.getString(GenericRowData.java:169) 
> ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.table.data.JoinedRowData.getString(JoinedRowData.java:139) 
> ~[flink-table-blink_2.11-1.11.1.jar:?]
>   at org.apache.flink.table.data.RowData.get(RowData.java:273) 
> ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.table.runtime.typeutils.RowDataSerializer.copyRowData(RowDataSerializer.java:156)
>  ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
>   at 
> org.apache.flink.table.runtime.typeutils.RowDataSerializer.copy(RowDataSerializer.java:123)
>  ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
>   at 
> org.apache.flink.table.runtime.typeutils.RowDataSerializer.copy(RowDataSerializer.java:50)
>  ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:715)
>  ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:692)
>  ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:672)
>  ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:52)
>  ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:30)
>  ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:53)
>  ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.table.runtime.operators.aggregate.GroupAggFunction.processElement(GroupAggFunction.java:205)
>  ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
>   at 
> org.apache.flink.table.runtime.operators.aggregate.GroupAggFunction.processElement(GroupAggFunction.java:43)
>  ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
>   at 
> org.apache.flink.streaming.api.operators.KeyedProcessOperator.processElement(KeyedProcessOperator.java:85)
>  ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:161)
>  ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.processElement(StreamTaskNetworkInput.java:178)
>  ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:153)
>  ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> 

[jira] [Commented] (FLINK-18862) Casting the result of count(1) in an agg operator to String fails after another agg operation

2020-08-09 Thread Jark Wu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17174082#comment-17174082
 ] 

Jark Wu commented on FLINK-18862:
-

Thanks for reporting this [~YUJIANBO], I have reproduced this problem locally. 
I will take this issue. 

> Casting the result of count(1) in an agg operator to String fails after another agg operation
> ---
>
> Key: FLINK-18862
> URL: https://issues.apache.org/jira/browse/FLINK-18862
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.11.1
>Reporter: YUJIANBO
>Priority: Major
> Attachments: sql 中agg算子的 count(1) 的结果强转成String类型再经过一次agg的操作就不能成功.txt
>
>
> 1. Env: Flink SQL, version 1.11.1, per-job mode
> 2. Error: org.apache.flink.table.data.binary.BinaryRawValueData cannot be cast to 
> org.apache.flink.table.data.StringData
> 3. Background:
> (1) Create a table from Kafka:
> {code:java}
> CREATE TABLE kafka(
> x String,
> y String
> )with(
>    'connector' = 'kafka',
> ..
> )
> {code}
> (2) Create a view:
> {code:java}
>    CREATE VIEW view1 AS
>    SELECT 
>    x, 
>    y, 
>    CAST(COUNT(1) AS VARCHAR) AS ct
>    FROM kafka
>    GROUP BY 
>    x, y
> {code}
> (3) Then run another agg operation on top of this view:
> {code:java}
> select 
>  x, 
>  LISTAGG(CONCAT_WS('=', y, ct), ',') AS lists
> FROM view1
> GROUP BY x
> {code}
>    Then the error is thrown: org.apache.flink.table.data.binary.BinaryRawValueData cannot be 
> cast to org.apache.flink.table.data.StringData
>
> (4) But when I try it without the agg operation, there is no error, which shows 
> that the result of count(1) can be cast to String.
> {code:java}
> select 
>   x, 
>   CONCAT_WS('=', y, ct)
> from view1
> {code}
> *But why does it fail after one more agg operation?*
> 4. Question: the comparison above shows that count(1) can be cast to String, 
> so why does it fail after one more agg operation?
>  Is there a good way to solve this problem?
> 5. The detailed error:
> {code:java}
> java.lang.ClassCastException: 
> org.apache.flink.table.data.binary.BinaryRawValueData cannot be cast to 
> org.apache.flink.table.data.StringData
>   at 
> org.apache.flink.table.data.GenericRowData.getString(GenericRowData.java:169) 
> ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.table.data.JoinedRowData.getString(JoinedRowData.java:139) 
> ~[flink-table-blink_2.11-1.11.1.jar:?]
>   at org.apache.flink.table.data.RowData.get(RowData.java:273) 
> ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.table.runtime.typeutils.RowDataSerializer.copyRowData(RowDataSerializer.java:156)
>  ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
>   at 
> org.apache.flink.table.runtime.typeutils.RowDataSerializer.copy(RowDataSerializer.java:123)
>  ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
>   at 
> org.apache.flink.table.runtime.typeutils.RowDataSerializer.copy(RowDataSerializer.java:50)
>  ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:715)
>  ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:692)
>  ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:672)
>  ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:52)
>  ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:30)
>  ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:53)
>  ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.table.runtime.operators.aggregate.GroupAggFunction.processElement(GroupAggFunction.java:205)
>  ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
>   at 
> org.apache.flink.table.runtime.operators.aggregate.GroupAggFunction.processElement(GroupAggFunction.java:43)
>  ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
>   at 
> org.apache.flink.streaming.api.operators.KeyedProcessOperator.processElement(KeyedProcessOperator.java:85)
>  ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:161)
>  ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.processElement(StreamTaskNetworkInput.java:178)
>  ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:153)
>  ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> 

[GitHub] [flink] flinkbot edited a comment on pull request #13098: [FLINK-18866][python] Support filter() operation for Python DataStrea…

2020-08-09 Thread GitBox


flinkbot edited a comment on pull request #13098:
URL: https://github.com/apache/flink/pull/13098#issuecomment-671163278


   
   ## CI report:
   
   * 8987176f205b330cf101f81aa126637c0846b298 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5332)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #13094: [FLINK-18766][python] Support add_sink() for Python DataStream API.

2020-08-09 Thread GitBox


flinkbot edited a comment on pull request #13094:
URL: https://github.com/apache/flink/pull/13094#issuecomment-671071159


   
   ## CI report:
   
   * b2d512352be69a3661708014f21f0dad16050c10 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5329)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-18835) sql using group by, duplicated group fileld appears

2020-08-09 Thread YHF (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17174077#comment-17174077
 ] 

YHF commented on FLINK-18835:
-

[~jark] but why is part of the group result lost? The source data contains 
(A=2) within the time window, but the group result does not include A=2.

> sql using group by, duplicated group fileld appears
> ---
>
> Key: FLINK-18835
> URL: https://issues.apache.org/jira/browse/FLINK-18835
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.11.1
>Reporter: YHF
>Priority: Critical
> Attachments: SumAnalysis.java
>
>
> The data source is Kafka. I then create a temporary view, group by (fieldA, 
> fieldB) using SQL, transform the result table to a DataStream using 
> toRetractStream, and print the result.
> I find duplicated (fieldA, fieldB) groups.
> See the attachment for the code.
> The query groups by (scanType, scanSite, cmtInf), but the result is below:
> (true,Otm\{, scanType=97, scanSite=14, cmtInf=24,jp=1.00, 
> db=0E-18, dbjp=1.00, pjWei=27.07, 
> dbWei=0E-18, mintime=2020-07-29 11:33:57.679, maxtime=2020-07-29 
> 11:33:57.679})
> 3> (true,Otm\{, scanType=97, scanSite=14, cmtInf=24,jp=1.00, 
> db=0E-18, dbjp=1.00, pjWei=27.07, 
> dbWei=0E-18, mintime=2020-07-29 11:33:57.679, maxtime=2020-07-29 
> 11:33:57.679})



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #13089: [FLINK-18813][docs-zh] Translate the 'Local Installation' page of 'Try Flink' into Chinese

2020-08-09 Thread GitBox


flinkbot edited a comment on pull request #13089:
URL: https://github.com/apache/flink/pull/13089#issuecomment-670624464


   
   ## CI report:
   
   * 5875f7f608fb37a9e6365bbe969c743a5e64bdc8 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5331)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot commented on pull request #13098: [FLINK-18866][python] Support filter() operation for Python DataStrea…

2020-08-09 Thread GitBox


flinkbot commented on pull request #13098:
URL: https://github.com/apache/flink/pull/13098#issuecomment-671163278


   
   ## CI report:
   
   * 8987176f205b330cf101f81aa126637c0846b298 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] shuiqiangchen commented on a change in pull request #13095: [FLINK-18861][python] Support add_source() to get a DataStream for Py…

2020-08-09 Thread GitBox


shuiqiangchen commented on a change in pull request #13095:
URL: https://github.com/apache/flink/pull/13095#discussion_r467686769



##
File path: flink-python/pyflink/datastream/connectors.py
##
@@ -0,0 +1,61 @@
+
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+
+from pyflink.java_gateway import get_gateway
+
+
+class SourceFunction(object):

Review comment:
   Thank you, I will make the same change in the following commit.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot commented on pull request #13098: [FLINK-18866][python] Support filter() operation for Python DataStrea…

2020-08-09 Thread GitBox


flinkbot commented on pull request #13098:
URL: https://github.com/apache/flink/pull/13098#issuecomment-671160441


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit 8987176f205b330cf101f81aa126637c0846b298 (Mon Aug 10 
04:50:20 UTC 2020)
   
   **Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
* **This pull request references an unassigned [Jira 
ticket](https://issues.apache.org/jira/browse/FLINK-18866).** According to the 
[code contribution 
guide](https://flink.apache.org/contributing/contribute-code.html), tickets 
need to be assigned before starting with the implementation work.
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into to Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.
Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #13089: [FLINK-18813][docs-zh] Translate the 'Local Installation' page of 'Try Flink' into Chinese

2020-08-09 Thread GitBox


flinkbot edited a comment on pull request #13089:
URL: https://github.com/apache/flink/pull/13089#issuecomment-670624464


   
   ## CI report:
   
   * cb30f04152e15e4ea20f7ad2abd8e48c0437314e Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5298)
 
   * 5875f7f608fb37a9e6365bbe969c743a5e64bdc8 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #13088: [FLINK-18814][docs-zh] Translate the 'Side Outputs' page of 'DataStream API' into Chinese

2020-08-09 Thread GitBox


flinkbot edited a comment on pull request #13088:
URL: https://github.com/apache/flink/pull/13088#issuecomment-670624360


   
   ## CI report:
   
   * e976b9e55145c4f52efd750ecb56067986a8b34c Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5330)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] shuiqiangchen commented on a change in pull request #13095: [FLINK-18861][python] Support add_source() to get a DataStream for Py…

2020-08-09 Thread GitBox


shuiqiangchen commented on a change in pull request #13095:
URL: https://github.com/apache/flink/pull/13095#discussion_r467686241



##
File path: flink-python/pyflink/datastream/stream_execution_environment.py
##
@@ -447,6 +447,23 @@ def get_execution_environment():
             .StreamExecutionEnvironment.getExecutionEnvironment()
         return StreamExecutionEnvironment(j_stream_exection_environment)
 
+    def add_source(self, source_func: SourceFunction, source_name: str = 'Custom Source',
+                   type_info: TypeInformation = None) -> 'DataStream':
+        """
+        Adds a data source to the streaming topology.
+
+        :param source_func: the user defined function.
+        :param source_name: name of the data source. Optional.
+        :param type_info: type of the returned stream. Optional.
+        :return: the data stream constructed.
+        """
+        j_type_info = type_info.get_java_type_info() if type_info is not None else None
+        j_data_stream = self._j_stream_execution_environment.addSource(source_func

Review comment:
   Maybe we can do it with a single call to `addSource(SourceFunction 
function, String sourceName, TypeInformation typeInfo)`, since 
`StreamExecutionEnvironment.addSource(SourceFunction function, String 
sourceName)` just delegates to 
`StreamExecutionEnvironment.addSource(SourceFunction function, String 
sourceName, TypeInformation typeInfo)` with typeInfo set to null.
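
   Roughly (a sketch only; `source_func.get_java_function()` stands for 
however the PR unwraps the Java source function):
   ```python
   j_type_info = type_info.get_java_type_info() if type_info is not None else None
   # One three-argument call covers both cases: the two-argument Java overload
   # delegates to this one with typeInfo = null anyway.
   j_data_stream = self._j_stream_execution_environment.addSource(
       source_func.get_java_function(), source_name, j_type_info)
   return DataStream(j_data_stream)
   ```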





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] shuiqiangchen commented on a change in pull request #13095: [FLINK-18861][python] Support add_source() to get a DataStream for Py…

2020-08-09 Thread GitBox


shuiqiangchen commented on a change in pull request #13095:
URL: https://github.com/apache/flink/pull/13095#discussion_r467686241



##
File path: flink-python/pyflink/datastream/stream_execution_environment.py
##
@@ -447,6 +447,23 @@ def get_execution_environment():
             .StreamExecutionEnvironment.getExecutionEnvironment()
         return StreamExecutionEnvironment(j_stream_exection_environment)
 
+    def add_source(self, source_func: SourceFunction, source_name: str = 'Custom Source',
+                   type_info: TypeInformation = None) -> 'DataStream':
+        """
+        Adds a data source to the streaming topology.
+
+        :param source_func: the user defined function.
+        :param source_name: name of the data source. Optional.
+        :param type_info: type of the returned stream. Optional.
+        :return: the data stream constructed.
+        """
+        j_type_info = type_info.get_java_type_info() if type_info is not None else None
+        j_data_stream = self._j_stream_execution_environment.addSource(source_func

Review comment:
   Maybe we can do it with a single call to `addSource(SourceFunction 
function, String sourceName, TypeInformation typeInfo)`, since 
`StreamExecutionEnvironment.addSource(SourceFunction function, String 
sourceName)` just delegates to 
`StreamExecutionEnvironment.addSource(SourceFunction function, String 
sourceName, TypeInformation typeInfo)` with typeInfo set to null.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-18866) Support filter() operation for Python DataStream API.

2020-08-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-18866:
---
Labels: pull-request-available  (was: )

> Support filter() operation for Python DataStream API.
> -
>
> Key: FLINK-18866
> URL: https://issues.apache.org/jira/browse/FLINK-18866
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Python
>Reporter: Shuiqiang Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>
> Support filter() interface for Python DataStream API.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] hequn8128 commented on a change in pull request #13095: [FLINK-18861][python] Support add_source() to get a DataStream for Py…

2020-08-09 Thread GitBox


hequn8128 commented on a change in pull request #13095:
URL: https://github.com/apache/flink/pull/13095#discussion_r467685223



##
File path: 
flink-python/src/test/java/org/apache/flink/python/util/MyCustomSourceFunction.java
##
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.python.util;
+
+import org.apache.flink.streaming.api.functions.source.SourceFunction;
+import org.apache.flink.types.Row;
+
+import java.util.Random;
+
+/**
+ * A custom source function for testing adding a custom source in the Python 
StreamExecutionEnvironment.
+ */
+public class MyCustomSourceFunction implements SourceFunction<Row> {
+
+   private static final String[] NAMES = {"Bob", "Marry", "Henry", "Mike", 
"Ted", "Jack"};
+
+   private int recordCount = 50;
+
+   public MyCustomSourceFunction() {
+   }
+
+   public MyCustomSourceFunction(int recordCount) {
+   this.recordCount = recordCount;
+   }
+
+   public void run(SourceContext<Row> sourceContext) {
+   Random random = new Random();
+   for (int i = 0; i < recordCount; i++) {
+   Row row = Row.of(random.nextInt(1000), 
NAMES[random.nextInt(NAMES.length)], random.nextDouble());
+   sourceContext.collect(row);
+   }
+

Review comment:
   Remove the empty line.

##
File path: flink-python/pyflink/datastream/connectors.py
##
@@ -0,0 +1,61 @@
+
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+
+from pyflink.java_gateway import get_gateway
+
+
+class SourceFunction(object):

Review comment:
   Same as the comments 
[here](https://github.com/apache/flink/pull/13094#discussion_r467659569), so 
that we don't need to introduce the `CustomSourceFunction`.

##
File path: flink-python/pyflink/datastream/stream_execution_environment.py
##
@@ -447,6 +447,23 @@ def get_execution_environment():
             .StreamExecutionEnvironment.getExecutionEnvironment()
         return StreamExecutionEnvironment(j_stream_exection_environment)
 
+    def add_source(self, source_func: SourceFunction, source_name: str = 'Custom Source',
+                   type_info: TypeInformation = None) -> 'DataStream':
+        """
+        Adds a data source to the streaming topology.
+
+        :param source_func: the user defined function.
+        :param source_name: name of the data source. Optional.
+        :param type_info: type of the returned stream. Optional.
+        :return: the data stream constructed.
+        """
+        j_type_info = type_info.get_java_type_info() if type_info is not None else None
+        j_data_stream = self._j_stream_execution_environment.addSource(source_func

Review comment:
   If the type_info is None, we should call 
`_j_stream_execution_environment.addSource(SourceFunction function, String 
sourceName)` instead.
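
   i.e. (a sketch; `j_source_func` stands for the unwrapped Java source 
function):
   ```python
   if j_type_info is None:
       j_data_stream = self._j_stream_execution_environment.addSource(
           j_source_func, source_name)
   else:
       j_data_stream = self._j_stream_execution_environment.addSource(
           j_source_func, source_name, j_type_info)
   ```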

##
File path: 
flink-python/src/test/java/org/apache/flink/python/util/MyCustomSourceFunction.java
##
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under 

[GitHub] [flink] shuiqiangchen opened a new pull request #13098: [FLINK-18866][python] Support filter() operation for Python DataStrea…

2020-08-09 Thread GitBox


shuiqiangchen opened a new pull request #13098:
URL: https://github.com/apache/flink/pull/13098


   
   
   ## What is the purpose of the change
   
   Support filter() operation for Python DataStream API.
   
   ## Brief change log
   
   - Add filter() interface for Python DataStream API
   - Add a new Function class named FilterFunction and a wrapper class for 
user-defined filter functions.
   
   ## Verifying this change
   
   This change is covered by the test cases test_filter_with_data_types and 
test_filter_without_data_types in test_data_stream.py.
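
   A minimal usage sketch of the new operation (written against the current 
PyFlink API; the details of this PR may differ slightly):
   ```python
   from pyflink.datastream import StreamExecutionEnvironment

   env = StreamExecutionEnvironment.get_execution_environment()
   ds = env.from_collection([(1, 'a'), (2, 'b'), (3, 'c')])
   # filter() keeps only the elements for which the predicate returns True.
   ds.filter(lambda value: value[0] % 2 == 1).print()
   env.execute("filter-demo")
   ```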
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): ( no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: ( no)
 - The serializers: (no)
 - The runtime per-record code paths (performance sensitive): (no)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (no)
 - The S3 file system connector: (no)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (no)
 - If yes, how is the feature documented? (not documented)
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-18866) Support filter() operation for Python DataStream API.

2020-08-09 Thread Shuiqiang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shuiqiang Chen updated FLINK-18866:
---
Summary: Support filter() operation for Python DataStream API.  (was: 
Support filter() interface for Python DataStream API.)

> Support filter() operation for Python DataStream API.
> -
>
> Key: FLINK-18866
> URL: https://issues.apache.org/jira/browse/FLINK-18866
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Python
>Reporter: Shuiqiang Chen
>Priority: Major
> Fix For: 1.12.0
>
>
> Support filter() interface for Python DataStream API.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] RocMarshal commented on pull request #13089: [FLINK-18813][docs-zh] Translate the 'Local Installation' page of 'Try Flink' into Chinese

2020-08-09 Thread GitBox


RocMarshal commented on pull request #13089:
URL: https://github.com/apache/flink/pull/13089#issuecomment-671157475


   Hi, @Thesharing .
   I made some changes according to your suggestions.
   It looks much better now.
   Please take a look.
   Thank you so much for your help and patience.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-18857) Invalid lambda deserialization when use ElasticsearchSink#Builder to build ElasticserachSink

2020-08-09 Thread Jark Wu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17174071#comment-17174071
 ] 

Jark Wu commented on FLINK-18857:
-

Hi [~fsk119], users shouldn't use the shaded jar 
{{flink-sql-connector-elasticsearch}} in a programming project; otherwise, a 
class mismatch will always happen. 
Instead, users should use {{flink-connector-elasticsearch}} as the project 
dependency.

> Invalid lambda deserialization when use ElasticsearchSink#Builder to build 
> ElasticserachSink
> 
>
> Key: FLINK-18857
> URL: https://issues.apache.org/jira/browse/FLINK-18857
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / ElasticSearch, Table SQL / API
>Affects Versions: 1.10.0, 1.11.0
>Reporter: Shengkai Fang
>Priority: Major
>
> Currently when we use code below
> {code:java}
> new ElasticSearchSink.Builder(...).build()
> {code}
> we will get an "Invalid lambda deserialization" error if the user doesn't 
> call setRestClientFactory explicitly. The reason behind this bug was figured 
> out in FLINK-18006: it is caused by Maven. However, we only fixed the 
> behaviour in ElasticSearchDynamicSink. Users who build the ES sink by 
> themselves will still get the error. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-18852) StreamScan should keep the same parallelism as the input

2020-08-09 Thread Jark Wu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17174070#comment-17174070
 ] 

Jark Wu commented on FLINK-18852:
-

I think this only happens in the legacy planner (maybe this line [1]). Did you 
try the Blink planner? 

[1]: 
https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/plan/nodes/datastream/StreamScan.scala#L91

> StreamScan should keep the same parallelism as the input
> 
>
> Key: FLINK-18852
> URL: https://issues.apache.org/jira/browse/FLINK-18852
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.11.1
>Reporter: liupengcheng
>Priority: Major
> Attachments: image-2020-08-07-21-22-57-843.png
>
>
> Currently, the parallelism for StreamTableSourceScan/DataStreamScan is not 
> inherited from  the upstream input, but retrieved from the config. I think 
> this is unexpected.
> I find this issue through UT, here is an example:
> {code:java}
> // env parallelism is set to 4
> val env = StreamExecutionEnvironment.getExecutionEnvironment
> val tEnv = StreamTableEnvironment.create(env)
> StreamITCase.testResults = new mutable.MutableList[String]
> env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
> env.setParallelism(4)
> // DataSource parallelism is set to 1
> val table1 = env.fromCollection(left)
>   .setParallelism(1)
>   .assignTimestampsAndWatermarks(new 
> TimestampAndWatermarkWithOffset[(Long, String)](0))
>   .toTable(tEnv, 'a, 'b)
> val table2 = env.fromCollection(right)
>   .setParallelism(1)
>   .assignTimestampsAndWatermarks(new 
> TimestampAndWatermarkWithOffset[(Long, String)](0))
>   .toTable(tEnv, 'a, 'b)
> {code}
> But when you start the execution, and visualize the execution plan, you can 
> find that the "from"(the StreamScan) operator's parallelism is 4.
>   !image-2020-08-07-21-22-57-843.png! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-18867) Generic table stored in Hive catalog is incompatible between 1.10 and 1.11

2020-08-09 Thread Jingsong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingsong Lee updated FLINK-18867:
-
Priority: Critical  (was: Major)

> Generic table stored in Hive catalog is incompatible between 1.10 and 1.11
> --
>
> Key: FLINK-18867
> URL: https://issues.apache.org/jira/browse/FLINK-18867
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hive
>Affects Versions: 1.11.1
>Reporter: Rui Li
>Priority: Critical
> Fix For: 1.11.2
>
>
> Generic table stored in 1.10 cannot be accessed in 1.11, because we changed 
> how table schema is stored.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-18867) Generic table stored in Hive catalog is incompatible between 1.10 and 1.11

2020-08-09 Thread Jingsong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingsong Lee updated FLINK-18867:
-
Fix Version/s: 1.11.2

> Generic table stored in Hive catalog is incompatible between 1.10 and 1.11
> --
>
> Key: FLINK-18867
> URL: https://issues.apache.org/jira/browse/FLINK-18867
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hive
>Affects Versions: 1.11.1
>Reporter: Rui Li
>Priority: Major
> Fix For: 1.11.2
>
>
> Generic table stored in 1.10 cannot be accessed in 1.11, because we changed 
> how table schema is stored.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-18867) Generic table stored in Hive catalog is incompatible between 1.10 and 1.11

2020-08-09 Thread Rui Li (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated FLINK-18867:
---
Description: Generic table stored in 1.10 cannot be accessed in 1.11, 
because we changed how table schema is stored.

> Generic table stored in Hive catalog is incompatible between 1.10 and 1.11
> --
>
> Key: FLINK-18867
> URL: https://issues.apache.org/jira/browse/FLINK-18867
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hive
>Affects Versions: 1.11.1
>Reporter: Rui Li
>Priority: Major
>
> Generic table stored in 1.10 cannot be accessed in 1.11, because we changed 
> how table schema is stored.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #13094: [FLINK-18766][python] Support add_sink() for Python DataStream API.

2020-08-09 Thread GitBox


flinkbot edited a comment on pull request #13094:
URL: https://github.com/apache/flink/pull/13094#issuecomment-671071159


   
   ## CI report:
   
   * a8517591f8e26a91f25c00e2532af495b8fd3c7b Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5318)
 
   * b2d512352be69a3661708014f21f0dad16050c10 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5329)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #13088: [FLINK-18814][docs-zh] Translate the 'Side Outputs' page of 'DataStream API' into Chinese

2020-08-09 Thread GitBox


flinkbot edited a comment on pull request #13088:
URL: https://github.com/apache/flink/pull/13088#issuecomment-670624360


   
   ## CI report:
   
   * c9e436eff90e8ba0b2dfc8578d4173227bf6cb6d Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5297)
 
   * e976b9e55145c4f52efd750ecb56067986a8b34c UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #13097: [FLINK-18864][python] Support key_by() operation for Python DataStrea…

2020-08-09 Thread GitBox


flinkbot edited a comment on pull request #13097:
URL: https://github.com/apache/flink/pull/13097#issuecomment-671146880


   
   ## CI report:
   
   * 52e8670eef5651d09ad0defeb1a8729aa7c589ba Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5324)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] Thesharing commented on a change in pull request #13089: [FLINK-18813][docs-zh] Translate the 'Local Installation' page of 'Try Flink' into Chinese

2020-08-09 Thread GitBox


Thesharing commented on a change in pull request #13089:
URL: https://github.com/apache/flink/pull/13089#discussion_r467681891



##
File path: docs/try-flink/local_installation.zh.md
##
@@ -80,11 +80,13 @@ $ tail log/flink-*-taskexecutor-*.out
   (be,2)
 {% endhighlight %}
 
-Additionally, you can check Flink's [Web UI](http://localhost:8080) to monitor 
the status of the Cluster and running Job.
+另外,你可以检查 Flink 的 [Web UI](http://localhost:8080) 来监视集群的状态和正在运行的作业。

Review comment:
   > How about '查看'?
   
   Great. I think this is better.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (FLINK-18867) Generic table stored in Hive catalog is incompatible between 1.10 and 1.11

2020-08-09 Thread Rui Li (Jira)
Rui Li created FLINK-18867:
--

 Summary: Generic table stored in Hive catalog is incompatible 
between 1.10 and 1.11
 Key: FLINK-18867
 URL: https://issues.apache.org/jira/browse/FLINK-18867
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Hive
Affects Versions: 1.11.1
Reporter: Rui Li






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] RocMarshal commented on a change in pull request #13089: [FLINK-18813][docs-zh] Translate the 'Local Installation' page of 'Try Flink' into Chinese

2020-08-09 Thread GitBox


RocMarshal commented on a change in pull request #13089:
URL: https://github.com/apache/flink/pull/13089#discussion_r467681592



##
File path: docs/try-flink/local_installation.zh.md
##
@@ -26,36 +26,35 @@ under the License.
 {% if site.version contains "SNAPSHOT" %}
 
   
-  NOTE: The Apache Flink community only publishes official builds for
-  released versions of Apache Flink.
+  注意:Apache Flink 社区只发布 Apache Flink 发布版本的正式版本。

Review comment:
   Yes, it's concise and precise obviously.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-18844) Support maxwell-json format to read Maxwell changelogs

2020-08-09 Thread Jark Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jark Wu updated FLINK-18844:

Summary: Support maxwell-json format to read Maxwell changelogs  (was: 
support maxwell-json for Maxwell binlog collector)

> Support maxwell-json format to read Maxwell changelogs
> --
>
> Key: FLINK-18844
> URL: https://issues.apache.org/jira/browse/FLINK-18844
> Project: Flink
>  Issue Type: Sub-task
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Reporter: jinxin
>Assignee: jinxin
>Priority: Major
> Fix For: 1.12.0
>
>
> Hi, I have finished the code. So, can you assign this issue to me?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] RocMarshal commented on a change in pull request #13089: [FLINK-18813][docs-zh] Translate the 'Local Installation' page of 'Try Flink' into Chinese

2020-08-09 Thread GitBox


RocMarshal commented on a change in pull request #13089:
URL: https://github.com/apache/flink/pull/13089#discussion_r467681069



##
File path: docs/try-flink/local_installation.zh.md
##
@@ -80,11 +80,13 @@ $ tail log/flink-*-taskexecutor-*.out
   (be,2)
 {% endhighlight %}
 
-Additionally, you can check Flink's [Web UI](http://localhost:8080) to monitor 
the status of the Cluster and running Job.
+另外,你可以检查 Flink 的 [Web UI](http://localhost:8080) 来监视集群的状态和正在运行的作业。

Review comment:
   How about '查看'?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-18858) Kinesis Flink SQL Connector

2020-08-09 Thread Jark Wu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17174066#comment-17174066
 ] 

Jark Wu commented on FLINK-18858:
-

+1 for this.

> Kinesis Flink SQL Connector
> ---
>
> Key: FLINK-18858
> URL: https://issues.apache.org/jira/browse/FLINK-18858
> Project: Flink
>  Issue Type: Improvement
>Reporter: Waldemar Hummer
>Priority: Major
>
> Hi all,
> as far as I can see in the [list of 
> connectors|https://github.com/apache/flink/tree/master/flink-connectors], we 
> have a 
> {{[flink-connector-kinesis|https://github.com/apache/flink/tree/master/flink-connectors/flink-connector-kinesis]}}
>  for *programmatic access* to Kinesis streams, but there does not yet seem to 
> exist a *Kinesis SQL connector* (something like 
> {{flink-sql-connector-kinesis}}, analogous to {{flink-sql-connector-kafka}}).
> Our use case would be to enable SQL queries with direct access to Kinesis 
> sources (and potentially sinks), to enable something like the following Flink 
> SQL queries:
> {code:java}
>  $ bin/sql-client.sh embedded
> ...
> Flink SQL> CREATE TABLE Orders(`user` string, amount int, rowtime TIME) WITH 
> ('connector' = 'kinesis', ...);
> ...
> Flink SQL> SELECT * FROM Orders ...;
> ...{code}
>  
> I was wondering if this is something that has been considered, or is already 
> actively being worked on? If one of you can provide some guidance, we may be 
> able to work on a PoC implementation to add this functionality.
>  
> (Wasn't able to find an existing issue in the backlog - if this is a 
> duplicate, then please let me know as well.)
> Thanks!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-18840) Support StatementSet with DataStream API

2020-08-09 Thread Jark Wu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17174065#comment-17174065
 ] 

Jark Wu commented on FLINK-18840:
-

I have similar thoughts to [~NicoK] and [~godfreyhe]. What users need is to 
use {{StatementSet}} and {{toAppendStream}} in a single job. Therefore, we can 
register the transformations after {{toAppendStream}} as a black box and wrap 
the transformations as a TableSink; for example, we can return a dummy 
DataStream for {{toAppendStream}} or a {{Consumer<DataStream<T>> callback}}.

> Support StatementSet with DataStream API
> 
>
> Key: FLINK-18840
> URL: https://issues.apache.org/jira/browse/FLINK-18840
> Project: Flink
>  Issue Type: New Feature
>  Components: Table SQL / API
>Reporter: Timo Walther
>Priority: Major
>
> Currently, users of the {{StreamTableEnvironment}} cannot translate a 
> {{StatementSet}} to DataStream API.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (FLINK-18860) Translate "Execution Plans" page of "Managing Execution" into Chinese

2020-08-09 Thread Jark Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jark Wu reassigned FLINK-18860:
---

Assignee: Huang Xiao

> Translate "Execution Plans" page of "Managing Execution" into Chinese
> -
>
> Key: FLINK-18860
> URL: https://issues.apache.org/jira/browse/FLINK-18860
> Project: Flink
>  Issue Type: Improvement
>  Components: chinese-translation, Documentation
>Affects Versions: 1.10.0, 1.11.0, 1.11.1
>Reporter: Huang Xiao
>Assignee: Huang Xiao
>Priority: Major
>  Labels: pull-request-available
>
> The page url is 
> [https://ci.apache.org/projects/flink/flink-docs-master/zh/dev/execution_plans.html]
> The markdown file is located in {{flink/docs/dev/execution_plans.zh.md}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-11779) CLI ignores -m parameter if high-availability is ZOOKEEPER

2020-08-09 Thread Guowei Ma (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-11779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17173995#comment-17173995
 ] 

Guowei Ma edited comment on FLINK-11779 at 8/10/20, 4:11 AM:
-

I am not very sure about the detailed scenarios, so please correct me if I miss 
something. 

Maybe we should document that -m (rest.address) is only respected by the 
StandaloneHaService. This is also consistent with the current implementation. 
For example, the YARN/K8s deployments do not respect the "rest.address" set by 
the user at all.

In the Generic CLI mode there is no -m option (it does not use the 
`AbstractCustomCommandLine`), so this is also consistent with the above 
statement.

I agree with [~aljoscha] that maybe we should not expose the 
`jobmanager.rpc.address` to the client any more. (But this might introduce some 
incompatible problem)

 

What do you think?


was (Author: maguowei):
# What I understand is that the rest server should publish its address to the 
HighAvailabilityService. The rest client should always retrieve the rest server 
address from a HighAvailabilityService. 
 # In the default mode the -m option means two things
 ## User wants to use `StandaloneHaServices`  
 ## Set the value for the `jobmanager.rpc.address`
 #  In the Generic CLI mode there is no -m option. It does not use the 
`AbstractCustomCommandLine`. So in this mode the client and server should always 
respect the `HighAvailabilityService` in the config. I agree with [~aljoscha] 
that maybe we should not expose the `jobmanager.rpc.address` to the client any 
more. But this might introduce some compatibility problems.

> CLI ignores -m parameter if high-availability is ZOOKEEPER 
> ---
>
> Key: FLINK-11779
> URL: https://issues.apache.org/jira/browse/FLINK-11779
> Project: Flink
>  Issue Type: Bug
>  Components: Command Line Client
>Affects Versions: 1.7.2, 1.8.0
>Reporter: Gary Yao
>Assignee: leesf
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> *Description*
> The CLI will ignores the host/port provided by the {{-m}} parameter if 
> {{high-availability: ZOOKEEPER}} is configured in {{flink-conf.yaml}}
> *Expected behavior*
> * TBD: either document this behavior or give precedence to {{-m}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #13094: [FLINK-18766][python] Support add_sink() for Python DataStream API.

2020-08-09 Thread GitBox


flinkbot edited a comment on pull request #13094:
URL: https://github.com/apache/flink/pull/13094#issuecomment-671071159


   
   ## CI report:
   
   * a8517591f8e26a91f25c00e2532af495b8fd3c7b Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5318)
 
   * b2d512352be69a3661708014f21f0dad16050c10 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12985: [FLINK-18682][orc][hive] Vector orc reader cannot read Hive 2.0.0 table

2020-08-09 Thread GitBox


flinkbot edited a comment on pull request #12985:
URL: https://github.com/apache/flink/pull/12985#issuecomment-663499515


   
   ## CI report:
   
   * 8c2efee79228f39d19301d15a4ff6d733db85cf4 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5250)
 
   * 17af5c4e28164dd0d04c7766c015d4b9e4894052 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5327)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12996: [FLINK-18688][table-planner-blink] Fix binary row writing with incorrect order in ProjectionCodeGenerator by removing for loop optimi

2020-08-09 Thread GitBox


flinkbot edited a comment on pull request #12996:
URL: https://github.com/apache/flink/pull/12996#issuecomment-664264470


   
   ## CI report:
   
   * b3ae6df40ce58ebbe4dea3e6f1b586ec70b6b91b Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5050)
 
   * 13a8ba700f0b3468f7a0da776bba1e5494ad6faf Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5328)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] RocMarshal commented on pull request #13088: [FLINK-18814][docs-zh] Translate the 'Side Outputs' page of 'DataStream API' into Chinese

2020-08-09 Thread GitBox


RocMarshal commented on pull request #13088:
URL: https://github.com/apache/flink/pull/13088#issuecomment-671153087


   @XBaith 
   Thank you for your review.
   I made some changes according to your suggestions. That looks better now.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-18859) ExecutionGraphNotEnoughResourceTest.testRestartWithSlotSharingAndNotEnoughResources failed with "Condition was not met in given timeout."

2020-08-09 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17174061#comment-17174061
 ] 

Zhu Zhu commented on FLINK-18859:
-

I tried running the case 1000 times and failed to reproduce it in my dev 
environment.
And the transfer.sh upload unfortunately failed, so I cannot see the detailed 
execution log.
I would guess this instability is caused by a temporary slowness of the CI 
environment, which resulted in the job not reaching the expected state (FAILED) 
within 2000 ms.

> ExecutionGraphNotEnoughResourceTest.testRestartWithSlotSharingAndNotEnoughResources
>  failed with "Condition was not met in given timeout."
> -
>
> Key: FLINK-18859
> URL: https://issues.apache.org/jira/browse/FLINK-18859
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination, Tests
>Affects Versions: 1.11.0
>Reporter: Dian Fu
>Priority: Major
>  Labels: test-stability
>
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=5300=logs=d89de3df-4600-5585-dadc-9bbc9a5e661c=66b5c59a-0094-561d-0e44-b149dfdd586d]
> {code}
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 5.673 
> s <<< FAILURE! - in 
> org.apache.flink.runtime.executiongraph.ExecutionGraphNotEnoughResourceTest
> [ERROR] 
> testRestartWithSlotSharingAndNotEnoughResources(org.apache.flink.runtime.executiongraph.ExecutionGraphNotEnoughResourceTest)
>   Time elapsed: 3.158 s  <<< ERROR!
> java.util.concurrent.TimeoutException: Condition was not met in given timeout.
>   at 
> org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:129)
>   at 
> org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:119)
>   at 
> org.apache.flink.runtime.executiongraph.ExecutionGraphNotEnoughResourceTest.testRestartWithSlotSharingAndNotEnoughResources(ExecutionGraphNotEnoughResourceTest.java:130)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-18840) Support StatementSet with DataStream API

2020-08-09 Thread godfrey he (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17174059#comment-17174059
 ] 

godfrey he commented on FLINK-18840:


How about adding some methods to {{StatementSet}}, just like:

{code:java}
public interface StatementSet {
   // convert the given table to a DataStream, do map/flatMap/... through 
   // `function`
   StatementSet addAppendStream(Table table, Class<T> clazz, 
       Function<DataStream<T>, DataSink> function);

   StatementSet addRetractStream(Table table, Class<T> clazz, 
       Function<DataStream<Tuple2<Boolean, T>>, DataSink> function);

   // or unify the above two methods into one?
   StatementSet addDataStream(Table table, 
       Function<DataStream<Row>, DataSink> function);
}
{code}

Pros: a unified user interface, just like {{addInsertSql}} and {{addInsert}}, 
and we can use {{StatementSet#execute}} to execute SQL/Table/DataStream as a 
whole job.
Cons: we need to add a dependency on {{flink-streaming-java}} for 
{{flink-table-api-java}}. 


> Support StatementSet with DataStream API
> 
>
> Key: FLINK-18840
> URL: https://issues.apache.org/jira/browse/FLINK-18840
> Project: Flink
>  Issue Type: New Feature
>  Components: Table SQL / API
>Reporter: Timo Walther
>Priority: Major
>
> Currently, users of the {{StreamTableEnvironment}} cannot translate a 
> {{StatementSet}} to DataStream API.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #13097: [FLINK-18864][python] Support key_by() operation for Python DataStrea…

2020-08-09 Thread GitBox


flinkbot edited a comment on pull request #13097:
URL: https://github.com/apache/flink/pull/13097#issuecomment-671146880


   
   ## CI report:
   
   * 52e8670eef5651d09ad0defeb1a8729aa7c589ba Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5324)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12996: [FLINK-18688][table-planner-blink] Fix binary row writing with incorrect order in ProjectionCodeGenerator by removing for loop optimi

2020-08-09 Thread GitBox


flinkbot edited a comment on pull request #12996:
URL: https://github.com/apache/flink/pull/12996#issuecomment-664264470


   
   ## CI report:
   
   * b3ae6df40ce58ebbe4dea3e6f1b586ec70b6b91b Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5050)
 
   * 13a8ba700f0b3468f7a0da776bba1e5494ad6faf UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12985: [FLINK-18682][orc][hive] Vector orc reader cannot read Hive 2.0.0 table

2020-08-09 Thread GitBox


flinkbot edited a comment on pull request #12985:
URL: https://github.com/apache/flink/pull/12985#issuecomment-663499515


   
   ## CI report:
   
   * 8c2efee79228f39d19301d15a4ff6d733db85cf4 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5250)
 
   * 17af5c4e28164dd0d04c7766c015d4b9e4894052 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-18678) Hive connector fails to create vector orc reader if user specifies incorrect hive version

2020-08-09 Thread Jingsong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingsong Lee updated FLINK-18678:
-
Issue Type: Task  (was: Improvement)

> Hive connector fails to create vector orc reader if user specifies incorrect 
> hive version
> -
>
> Key: FLINK-18678
> URL: https://issues.apache.org/jira/browse/FLINK-18678
> Project: Flink
>  Issue Type: Task
>  Components: Connectors / Hive, Formats (JSON, Avro, Parquet, ORC, 
> SequenceFile)
>Affects Versions: 1.11.1
>Reporter: Rui Li
>Assignee: Rui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.11.2
>
>
> Issue reported by user. User's Hive deployment is 2.1.1 and uses 
> {{flink-sql-connector-hive-2.2.0_2.11-1.11.0.jar}} in Flink lib. If user 
> specifies Hive version as 2.1.1, then creating vectorized orc reader fails 
> with exception:
> {noformat}
> java.lang.ClassCastException: org.apache.hadoop.hive.ql.io.orc.ReaderImpl 
> cannot be cast to org.apache.orc.Reader
>   at org.apache.flink.orc.shim.OrcShimV200.createReader(OrcShimV200.java:63) 
> ~[flink-sql-connector-hive-2.2.0_2.11-1.11.0.jar:1.11.0]
>   at 
> org.apache.flink.orc.shim.OrcShimV200.createRecordReader(OrcShimV200.java:98) 
> ~[flink-sql-connector-hive-2.2.0_2.11-1.11.0.jar:1.11.0]
>   at org.apache.flink.orc.OrcSplitReader.<init>(OrcSplitReader.java:73) 
> ~[flink-sql-connector-hive-2.2.0_2.11-1.11.0.jar:1.11.0]
>   at 
> org.apache.flink.orc.OrcColumnarRowSplitReader.<init>(OrcColumnarRowSplitReader.java:54)
>  ~[flink-sql-connector-hive-2.2.0_2.11-1.11.0.jar:1.11.0]
>   at 
> org.apache.flink.orc.OrcSplitReaderUtil.genPartColumnarRowReader(OrcSplitReaderUtil.java:91)
>  ~[flink-sql-connector-hive-2.2.0_2.11-1.11.0.jar:1.11.0]
> ..
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (FLINK-18678) Hive connector fails to create vector orc reader if user specifies incorrect hive version

2020-08-09 Thread Jingsong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingsong Lee closed FLINK-18678.

Resolution: Fixed

master: 75ddec657d101476f93a367dc94e749269ea87a1

release-1.11: c0aaa2d0e9f3035820ff9e4a5c565bde4a75b42c

> Hive connector fails to create vector orc reader if user specifies incorrect 
> hive version
> -
>
> Key: FLINK-18678
> URL: https://issues.apache.org/jira/browse/FLINK-18678
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Hive, Formats (JSON, Avro, Parquet, ORC, 
> SequenceFile)
>Affects Versions: 1.11.1
>Reporter: Rui Li
>Assignee: Rui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.11.2
>
>
> Issue reported by user. User's Hive deployment is 2.1.1 and uses 
> {{flink-sql-connector-hive-2.2.0_2.11-1.11.0.jar}} in Flink lib. If user 
> specifies Hive version as 2.1.1, then creating vectorized orc reader fails 
> with exception:
> {noformat}
> java.lang.ClassCastException: org.apache.hadoop.hive.ql.io.orc.ReaderImpl 
> cannot be cast to org.apache.orc.Reader
>   at org.apache.flink.orc.shim.OrcShimV200.createReader(OrcShimV200.java:63) 
> ~[flink-sql-connector-hive-2.2.0_2.11-1.11.0.jar:1.11.0]
>   at 
> org.apache.flink.orc.shim.OrcShimV200.createRecordReader(OrcShimV200.java:98) 
> ~[flink-sql-connector-hive-2.2.0_2.11-1.11.0.jar:1.11.0]
>   at org.apache.flink.orc.OrcSplitReader.<init>(OrcSplitReader.java:73) 
> ~[flink-sql-connector-hive-2.2.0_2.11-1.11.0.jar:1.11.0]
>   at 
> org.apache.flink.orc.OrcColumnarRowSplitReader.<init>(OrcColumnarRowSplitReader.java:54)
>  ~[flink-sql-connector-hive-2.2.0_2.11-1.11.0.jar:1.11.0]
>   at 
> org.apache.flink.orc.OrcSplitReaderUtil.genPartColumnarRowReader(OrcSplitReaderUtil.java:91)
>  ~[flink-sql-connector-hive-2.2.0_2.11-1.11.0.jar:1.11.0]
> ..
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] lirui-apache commented on pull request #12985: [FLINK-18682][orc][hive] Vector orc reader cannot read Hive 2.0.0 table

2020-08-09 Thread GitBox


lirui-apache commented on pull request #12985:
URL: https://github.com/apache/flink/pull/12985#issuecomment-671150008


   @JingsongLi I have refactored the code and added a util class to decide 
whether to use the new or the legacy timestamp vector. Please have another 
look, thanks.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] lirui-apache commented on pull request #13078: [FLINK-18659][hive][orc] Fix streaming write for Hive 1.x Orc table

2020-08-09 Thread GitBox


lirui-apache commented on pull request #13078:
URL: https://github.com/apache/flink/pull/13078#issuecomment-671149793


   @JingsongLi Please help review this PR, thanks!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] zhuzhurk commented on a change in pull request #11873: [FLINK-17295] Refactor the ExecutionAttemptID to consist of Execution…

2020-08-09 Thread GitBox


zhuzhurk commented on a change in pull request #11873:
URL: https://github.com/apache/flink/pull/11873#discussion_r467667602



##
File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/ExecutionAttemptID.java
##
@@ -18,33 +18,77 @@
 
 package org.apache.flink.runtime.executiongraph;
 
-import org.apache.flink.util.AbstractID;
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.runtime.scheduler.strategy.ExecutionVertexID;
 
 import org.apache.flink.shaded.netty4.io.netty.buffer.ByteBuf;
+import org.apache.flink.util.Preconditions;
 
 /**
  * Unique identifier for the attempt to execute a tasks. Multiple attempts 
happen
  * in cases of failures and recovery.
  */
-public class ExecutionAttemptID extends AbstractID {
+public class ExecutionAttemptID implements java.io.Serializable {
 
private static final long serialVersionUID = -1169683445778281344L;
 
+   private final ExecutionVertexID executionVertexID;
+   private final int attemptNumber;
+
+   /**
+* Get a random execution attempt id.
+*/
public ExecutionAttemptID() {
+   this(new ExecutionVertexID(), 0);
}
 
-   public ExecutionAttemptID(long lowerPart, long upperPart) {
-   super(lowerPart, upperPart);
+   public ExecutionAttemptID(ExecutionVertexID executionVertexID, int 
attemptNumber) {
+   Preconditions.checkState(attemptNumber >= 0);
+   this.executionVertexID = 
Preconditions.checkNotNull(executionVertexID);
+   this.attemptNumber = attemptNumber;
}
 
public void writeTo(ByteBuf buf) {
-   buf.writeLong(this.lowerPart);
-   buf.writeLong(this.upperPart);
+   executionVertexID.writeTo(buf);
+   buf.writeInt(this.attemptNumber);
}
 
public static ExecutionAttemptID fromByteBuf(ByteBuf buf) {
-   long lower = buf.readLong();
-   long upper = buf.readLong();
-   return new ExecutionAttemptID(lower, upper);
+   final ExecutionVertexID executionVertexID = 
ExecutionVertexID.fromByteBuf(buf);
+   final int attemptNumber = buf.readInt();
+   return new ExecutionAttemptID(executionVertexID, attemptNumber);
+   }
+
+   @VisibleForTesting
+   public int getAttemptNumber() {
+   return attemptNumber;
+   }
+
+   @VisibleForTesting
+   public ExecutionVertexID getExecutionVertexID() {

Review comment:
   NIT: `getExecutionVertexId` would be better than `getExecutionVertexID`

##
File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/ExecutionAttemptID.java
##
@@ -18,33 +18,77 @@
 
 package org.apache.flink.runtime.executiongraph;
 
-import org.apache.flink.util.AbstractID;
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.runtime.scheduler.strategy.ExecutionVertexID;
 
 import org.apache.flink.shaded.netty4.io.netty.buffer.ByteBuf;
+import org.apache.flink.util.Preconditions;
 
 /**
  * Unique identifier for the attempt to execute a tasks. Multiple attempts 
happen
  * in cases of failures and recovery.
  */
-public class ExecutionAttemptID extends AbstractID {
+public class ExecutionAttemptID implements java.io.Serializable {
 
private static final long serialVersionUID = -1169683445778281344L;
 
+   private final ExecutionVertexID executionVertexID;
+   private final int attemptNumber;
+
+   /**
+* Get a random execution attempt id.
+*/
public ExecutionAttemptID() {
+   this(new ExecutionVertexID(), 0);
}
 
-   public ExecutionAttemptID(long lowerPart, long upperPart) {
-   super(lowerPart, upperPart);
+   public ExecutionAttemptID(ExecutionVertexID executionVertexID, int 
attemptNumber) {
+   Preconditions.checkState(attemptNumber >= 0);
+   this.executionVertexID = 
Preconditions.checkNotNull(executionVertexID);
+   this.attemptNumber = attemptNumber;
}
 
public void writeTo(ByteBuf buf) {
-   buf.writeLong(this.lowerPart);
-   buf.writeLong(this.upperPart);
+   executionVertexID.writeTo(buf);
+   buf.writeInt(this.attemptNumber);
}
 
public static ExecutionAttemptID fromByteBuf(ByteBuf buf) {
-   long lower = buf.readLong();
-   long upper = buf.readLong();
-   return new ExecutionAttemptID(lower, upper);
+   final ExecutionVertexID executionVertexID = 
ExecutionVertexID.fromByteBuf(buf);
+   final int attemptNumber = buf.readInt();
+   return new ExecutionAttemptID(executionVertexID, attemptNumber);
+   }
+
+   @VisibleForTesting
+   public int getAttemptNumber() {
+   return attemptNumber;
+   }
+
+   

[jira] [Assigned] (FLINK-18678) Hive connector fails to create vector orc reader if user specifies incorrect hive version

2020-08-09 Thread Jingsong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingsong Lee reassigned FLINK-18678:


Assignee: Rui Li

> Hive connector fails to create vector orc reader if user specifies incorrect 
> hive version
> -
>
> Key: FLINK-18678
> URL: https://issues.apache.org/jira/browse/FLINK-18678
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Hive, Formats (JSON, Avro, Parquet, ORC, 
> SequenceFile)
>Affects Versions: 1.11.1
>Reporter: Rui Li
>Assignee: Rui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.11.2
>
>
> Issue reported by user. User's Hive deployment is 2.1.1 and uses 
> {{flink-sql-connector-hive-2.2.0_2.11-1.11.0.jar}} in Flink lib. If user 
> specifies Hive version as 2.1.1, then creating vectorized orc reader fails 
> with exception:
> {noformat}
> java.lang.ClassCastException: org.apache.hadoop.hive.ql.io.orc.ReaderImpl 
> cannot be cast to org.apache.orc.Reader
>   at org.apache.flink.orc.shim.OrcShimV200.createReader(OrcShimV200.java:63) 
> ~[flink-sql-connector-hive-2.2.0_2.11-1.11.0.jar:1.11.0]
>   at 
> org.apache.flink.orc.shim.OrcShimV200.createRecordReader(OrcShimV200.java:98) 
> ~[flink-sql-connector-hive-2.2.0_2.11-1.11.0.jar:1.11.0]
>   at org.apache.flink.orc.OrcSplitReader.<init>(OrcSplitReader.java:73) 
> ~[flink-sql-connector-hive-2.2.0_2.11-1.11.0.jar:1.11.0]
>   at 
> org.apache.flink.orc.OrcColumnarRowSplitReader.<init>(OrcColumnarRowSplitReader.java:54)
>  ~[flink-sql-connector-hive-2.2.0_2.11-1.11.0.jar:1.11.0]
>   at 
> org.apache.flink.orc.OrcSplitReaderUtil.genPartColumnarRowReader(OrcSplitReaderUtil.java:91)
>  ~[flink-sql-connector-hive-2.2.0_2.11-1.11.0.jar:1.11.0]
> ..
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-18678) Hive connector fails to create vector orc reader if user specifies incorrect hive version

2020-08-09 Thread Jingsong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingsong Lee updated FLINK-18678:
-
Fix Version/s: 1.11.2

> Hive connector fails to create vector orc reader if user specifies incorrect 
> hive version
> -
>
> Key: FLINK-18678
> URL: https://issues.apache.org/jira/browse/FLINK-18678
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Hive, Formats (JSON, Avro, Parquet, ORC, 
> SequenceFile)
>Affects Versions: 1.11.1
>Reporter: Rui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.11.2
>
>
> Issue reported by user. User's Hive deployment is 2.1.1 and uses 
> {{flink-sql-connector-hive-2.2.0_2.11-1.11.0.jar}} in Flink lib. If user 
> specifies Hive version as 2.1.1, then creating vectorized orc reader fails 
> with exception:
> {noformat}
> java.lang.ClassCastException: org.apache.hadoop.hive.ql.io.orc.ReaderImpl 
> cannot be cast to org.apache.orc.Reader
>   at org.apache.flink.orc.shim.OrcShimV200.createReader(OrcShimV200.java:63) 
> ~[flink-sql-connector-hive-2.2.0_2.11-1.11.0.jar:1.11.0]
>   at 
> org.apache.flink.orc.shim.OrcShimV200.createRecordReader(OrcShimV200.java:98) 
> ~[flink-sql-connector-hive-2.2.0_2.11-1.11.0.jar:1.11.0]
>   at org.apache.flink.orc.OrcSplitReader.<init>(OrcSplitReader.java:73) 
> ~[flink-sql-connector-hive-2.2.0_2.11-1.11.0.jar:1.11.0]
>   at 
> org.apache.flink.orc.OrcColumnarRowSplitReader.<init>(OrcColumnarRowSplitReader.java:54)
>  ~[flink-sql-connector-hive-2.2.0_2.11-1.11.0.jar:1.11.0]
>   at 
> org.apache.flink.orc.OrcSplitReaderUtil.genPartColumnarRowReader(OrcSplitReaderUtil.java:91)
>  ~[flink-sql-connector-hive-2.2.0_2.11-1.11.0.jar:1.11.0]
> ..
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] JingsongLi merged pull request #12988: [FLINK-18678][hive][doc] Update doc about setting hive version

2020-08-09 Thread GitBox


JingsongLi merged pull request #12988:
URL: https://github.com/apache/flink/pull/12988


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] lirui-apache commented on pull request #12988: [FLINK-18678][hive][doc] Update doc about setting hive version

2020-08-09 Thread GitBox


lirui-apache commented on pull request #12988:
URL: https://github.com/apache/flink/pull/12988#issuecomment-671148750


   @JingsongLi I have updated the `catalogs` page. Please have another look, 
thanks.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] hequn8128 commented on a change in pull request #13094: [FLINK-18766][python] Support add_sink() for Python DataStream API.

2020-08-09 Thread GitBox


hequn8128 commented on a change in pull request #13094:
URL: https://github.com/apache/flink/pull/13094#discussion_r467659675



##
File path: flink-python/pyflink/datastream/connectors.py
##
@@ -0,0 +1,37 @@
+
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+
+
+
+class SinkFunction(object):

Review comment:
   Could we also add tests for SinkFunction? For example, you can add a 
CustomSinkFunction which extends SinkFunction. The code could look like:
   ```
   class TestSinkFunction(SinkFunction):
  def __init__(self, func):
 ...
  def get_results():
...
   ```
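   
   To make the shape concrete, here is one possible way to fill in the elided 
   bodies. Purely illustrative: the buffering Java sink and its `getResults()` 
   accessor are hypothetical, and a real test would read the collected elements 
   back from the Java side:
   
   ```
   class TestSinkFunction(SinkFunction):
       """Hypothetical test sink wrapping a Java sink that buffers elements."""
   
       def __init__(self, j_sink_func):
           # j_sink_func: a (hypothetical) Java sink object that stores every
           # element it receives in an in-memory list
           super(TestSinkFunction, self).__init__(j_sink_func)
           self._j_sink_func = j_sink_func
   
       def get_results(self):
           # delegate to the assumed Java-side accessor, copying into Python
           return [str(e) for e in self._j_sink_func.getResults()]
   ```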





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-16069) Creation of TaskDeploymentDescriptor can block main thread for long time

2020-08-09 Thread Yumeng Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17174053#comment-17174053
 ] 

Yumeng Zhang commented on FLINK-16069:
--

[~zhuzh] Yes, we encountered the same problem when deploying a job with a large 
number of tasks. It took a few minutes to finish deploying these tasks.

> Creation of TaskDeploymentDescriptor can block main thread for long time
> 
>
> Key: FLINK-16069
> URL: https://issues.apache.org/jira/browse/FLINK-16069
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Reporter: huweihua
>Priority: Major
>
> The deployment of tasks can take a long time when we submit a 
> high-parallelism job. Execution#deploy runs in the main thread, so it blocks 
> the JobMaster from processing other Akka messages, such as heartbeats. The 
> creation of the TaskDeploymentDescriptor takes most of the time; we can move 
> the creation into a future.
> For example, for a job [source(8000)->sink(8000)], moving all 16000 tasks 
> from SCHEDULED to DEPLOYING took more than 1 minute. This caused the 
> TaskManager heartbeat to time out and the job never succeeded.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-18783) Load AkkaRpcService through separate class loader

2020-08-09 Thread Matt Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17174052#comment-17174052
 ] 

Matt Wang commented on FLINK-18783:
---

[~trohrmann] Have you started working on this? If not, I'd like to do it.

> Load AkkaRpcService through separate class loader
> -
>
> Key: FLINK-18783
> URL: https://issues.apache.org/jira/browse/FLINK-18783
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.10.1, 1.12.0, 1.11.1
>Reporter: Till Rohrmann
>Priority: Major
>
> In order to reduce the runtime dependency on Scala and also to hide the Akka 
> dependency I suggest to load the AkkaRpcService and its dependencies through 
> a separate class loader similar to what we do with Flink's plugins.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot commented on pull request #13097: [FLINK-18864][python] Support key_by() operation for Python DataStrea…

2020-08-09 Thread GitBox


flinkbot commented on pull request #13097:
URL: https://github.com/apache/flink/pull/13097#issuecomment-671146880


   
   ## CI report:
   
   * 52e8670eef5651d09ad0defeb1a8729aa7c589ba UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-18742) Some configuration args do not take effect at client

2020-08-09 Thread Matt Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17174050#comment-17174050
 ] 

Matt Wang commented on FLINK-18742:
---

[~fly_in_gis] Can you help me review the PR? Thanks.

> Some configuration args do not take effect at client
> 
>
> Key: FLINK-18742
> URL: https://issues.apache.org/jira/browse/FLINK-18742
> Project: Flink
>  Issue Type: Improvement
>  Components: Command Line Client
>Affects Versions: 1.11.1
>Reporter: Matt Wang
>Assignee: Matt Wang
>Priority: Major
>  Labels: pull-request-available
>
> Some configuration args from the command line will not take effect at the 
> client. For example, if the job sets _classloader.resolve-order_ to 
> _parent-first_, it works at the TaskManager, but not at the Client.
> The *FlinkUserCodeClassLoaders* is created before the method 
> _getEffectiveConfiguration()_ is called in 
> org.apache.flink.client.cli.CliFrontend, so the _Configuration_ used by 
> _PackagedProgram_ does not include the configuration args.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] hequn8128 commented on a change in pull request #13094: [FLINK-18766][python] Support add_sink() for Python DataStream API.

2020-08-09 Thread GitBox


hequn8128 commented on a change in pull request #13094:
URL: https://github.com/apache/flink/pull/13094#discussion_r467659569



##
File path: flink-python/pyflink/datastream/connectors.py
##
@@ -0,0 +1,37 @@
+
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+
+
+
+class SinkFunction(object):
+"""
+The base class for SinkFunctions.
+"""
+
+def __init__(self, j_sink_func):

Review comment:
   It makes no sense to let Python users init SinkFunction with a Java 
object. I think we can make the SinkFunction extends JavaFunctionWrapper and 
add a `get_java_function()` method in `JavaFunctionWrapper`.
   Meanwhile, the `JavaFunctionWrapper` should support initialization both with 
a java string or a java object.  
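   
   A rough sketch of that design — the class and method names come from the 
   comment above, while the gateway lookup and attribute names are assumptions:
   
   ```
   from pyflink.java_gateway import get_gateway
   
   
   class JavaFunctionWrapper(object):
       """Wraps a Java function, given a qualified class name or a Java object."""
   
       def __init__(self, j_function):
           if isinstance(j_function, str):
               # assumed: instantiate the Java class from its qualified name
               j_function = getattr(get_gateway().jvm, j_function)()
           self._j_function = j_function
   
       def get_java_function(self):
           return self._j_function
   
   
   class SinkFunction(JavaFunctionWrapper):
       """The base class for SinkFunctions."""
   ```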





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] hequn8128 commented on a change in pull request #13094: [FLINK-18766][python] Support add_sink() for Python DataStream API.

2020-08-09 Thread GitBox


hequn8128 commented on a change in pull request #13094:
URL: https://github.com/apache/flink/pull/13094#discussion_r467659569



##
File path: flink-python/pyflink/datastream/connectors.py
##
@@ -0,0 +1,37 @@
+
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+
+
+
+class SinkFunction(object):
+"""
+The base class for SinkFunctions.
+"""
+
+def __init__(self, j_sink_func):

Review comment:
   It makes no sense to let Python users init SinkFunction with a Java 
object. I think we can make the SinkFunction extends JavaFunctionWrapper and 
add a `get_java_function()` method in `FunctionWrapper`.
   Meanwhile, the SinkFunction should support initialization with a java string 
class.  
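   
   For instance, under that design the wrapper could be built from a 
   fully-qualified class name (`ds` below stands for an existing DataStream, 
   and the constructor form is the one proposed above):
   
   ```
   # hypothetical usage: build the wrapper from a Java class name
   print_sink = SinkFunction(
       "org.apache.flink.streaming.api.functions.sink.PrintSinkFunction")
   ds.add_sink(print_sink)
   ```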





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (FLINK-18866) Support filter() interface for Python DataStream API.

2020-08-09 Thread Shuiqiang Chen (Jira)
Shuiqiang Chen created FLINK-18866:
--

 Summary: Support filter() interface for Python DataStream API.
 Key: FLINK-18866
 URL: https://issues.apache.org/jira/browse/FLINK-18866
 Project: Flink
  Issue Type: Sub-task
  Components: API / Python
Reporter: Shuiqiang Chen
 Fix For: 1.12.0


Support filter() interface for Python DataStream API.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (FLINK-18861) Support add_source() to get a DataStream for Python StreamExecutionEnvironment

2020-08-09 Thread Hequn Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hequn Cheng reassigned FLINK-18861:
---

Assignee: Shuiqiang Chen

> Support add_source() to get a DataStream for Python StreamExecutionEnvironment
> --
>
> Key: FLINK-18861
> URL: https://issues.apache.org/jira/browse/FLINK-18861
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Python
>Reporter: Shuiqiang Chen
>Assignee: Shuiqiang Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>
> Support add_source() to get a DataStream for Python 
> StreamExecutionEnvironment. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] hequn8128 commented on a change in pull request #13094: [FLINK-18766][python] Support add_sink() for Python DataStream API.

2020-08-09 Thread GitBox


hequn8128 commented on a change in pull request #13094:
URL: https://github.com/apache/flink/pull/13094#discussion_r467670697



##
File path: flink-python/pyflink/datastream/connectors.py
##
@@ -0,0 +1,37 @@
+
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+
+
+
+class SinkFunction(object):

Review comment:
   It's better to add the SinkFunction in the functions.py

##
File path: flink-python/pyflink/datastream/connectors.py
##
@@ -0,0 +1,37 @@
+
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+
+
+
+class SinkFunction(object):
+"""
+The base class for SinkFunctions.
+"""
+
+def __init__(self, j_sink_func):

Review comment:
   It makes no sense to let Python users initialize a SinkFunction with a Java 
object. I think we can make the SinkFunction extend FunctionWrapper and add a 
`get_java_function()` method in `FunctionWrapper`.
   Meanwhile, the SinkFunction should support initialization with the string 
name of a Java class.

##
File path: flink-python/pyflink/datastream/connectors.py
##
@@ -0,0 +1,37 @@
+
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+
+
+
+class SinkFunction(object):

Review comment:
   Could we also add tests for SinkFunction? See the sketch below.
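   A minimal test could look like the following (a sketch; it assumes the 
`SinkFunction` and `get_java_function()` from the proposal above, and uses 
the existing Java DiscardingSink as a no-arg sink class):

```python
import unittest

# assumption: the module introduced by this PR
from pyflink.datastream.connectors import SinkFunction


class SinkFunctionTest(unittest.TestCase):

    def test_wraps_java_sink_function(self):
        # construct from a fully qualified Java class name
        sink = SinkFunction(
            "org.apache.flink.streaming.api.functions.sink.DiscardingSink")
        self.assertIsNotNone(sink.get_java_function())


if __name__ == '__main__':
    unittest.main()
```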





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] WeiZhong94 commented on a change in pull request #13084: [FLINK-18847][docs][python] Add documentation about data types in Python Table API

2020-08-09 Thread GitBox


WeiZhong94 commented on a change in pull request #13084:
URL: https://github.com/apache/flink/pull/13084#discussion_r466900408



##
File path: docs/dev/table/python/python_types.md
##
@@ -0,0 +1,74 @@
+---
+title: "Python Data Types"
+nav-parent_id: python_tableapi
+nav-pos: 15
+---
+
+
+This page describes the data types supported in PyFlink Table API.
+
+* This will be replaced by the TOC
+{:toc}
+
+Data Type
+-
+
+A *data type* describes the logical type of a value in the table ecosystem. It 
can be used to declare input and/or
+output types of Python user-defined functions. Users of the Python Table API 
work with instances of
+`pyflink.table.types.DataType` within the Python Table API or when defining 
user-defined functions.
+
+A `DataType` instance declares the **logical type** which does not imply a 
concrete physical representation for transmission
+or storage. All pre-defined data types are available in `pyflink.table.types` 
and can be instantiated with the utility methods
+defined in `pyflink.table.types.DataTypes`.
+
+A list of all pre-defined data types can be found [below]({{ site.baseurl 
}}/dev/table/types.html#list-of-data-types).
+
+Data Type and Python type mapping

Review comment:
   Capitalize the first letter?

##
File path: docs/dev/table/types.zh.md
##
@@ -397,6 +438,21 @@ DataTypes.DECIMAL(p, s)
 |`java.math.BigDecimal`| X | X  | *缺省* 
  |
 |`org.apache.flink.table.data.DecimalData` | X | X  | 内部数据结构。 |
 
+
+
+
+{% highlight python %}
+DataTypes.DECIMAL(p, s)
+{% endhighlight %}
+
+注意 当前,声明`DataTypes.DECIMAL(p, 
s)`中的所指定的精度 `p` 必须为38,尾数 `n` 必须为 `18`。

Review comment:
   38 -> \`38\`?

##
File path: docs/dev/table/types.zh.md
##
@@ -698,6 +797,21 @@ DataTypes.TIMESTAMP(p)
 |`java.sql.Timestamp`| X | X  |
   |
 |`org.apache.flink.table.data.TimestampData` | X | X  | 内部数据结构。
  |
 
+
+
+
+{% highlight python %}
+DataTypes.TIMESTAMP(p)
+{% endhighlight %}
+
+注意 
当前,声明`DataTypes.TIMESTAMP(p)`中的所指定的精度 `p` 必须为3。

Review comment:
   ditto

##
File path: docs/dev/table/types.zh.md
##
@@ -656,6 +749,19 @@ DataTypes.TIME(p)
 |`java.lang.Long`  | X | X  | 描述自当天以来的纳秒数。 |
 |`long`| X | (X)| 描述自当天以来的纳秒数。仅当类型不可为空时才输出。 |
 
+
+
+
+{% highlight python %}
+DataTypes.TIME(p)
+{% endhighlight %}
+
+注意 当前,声明`DataTypes.TIME(p)`中的所指定的精度 
`p` 必须为0。

Review comment:
   0 -> \`0\`?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] WeiZhong94 commented on a change in pull request #13084: [FLINK-18847][docs][python] Add documentation about data types in Python Table API

2020-08-09 Thread GitBox


WeiZhong94 commented on a change in pull request #13084:
URL: https://github.com/apache/flink/pull/13084#discussion_r467670215



##
File path: docs/dev/table/python/python_types.md
##
@@ -0,0 +1,74 @@
+---
+title: "Python Data Types"
+nav-parent_id: python_tableapi
+nav-pos: 15
+---
+
+
+This page describes the data types supported in PyFlink Table API.
+
+* This will be replaced by the TOC
+{:toc}
+
+Data Type
+-
+
+A *data type* describes the logical type of a value in the table ecosystem. It 
can be used to declare input and/or
+output types of Python user-defined functions. Users of the Python Table API 
work with instances of
+`pyflink.table.types.DataType` within the Python Table API or when defining 
user-defined functions.
+
+A `DataType` instance declares the **logical type** which does not imply a 
concrete physical representation for transmission
+or storage. All pre-defined data types are available in `pyflink.table.types` 
and can be instantiated with the utility methods
+defined in `pyflink.table.types.DataTypes`.
+
+A list of all pre-defined data types can be found [below]({{ site.baseurl 
}}/dev/table/types.html#list-of-data-types).
+
+Data Type and Python type mapping

Review comment:
   type mapping -> Type Mapping ?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot commented on pull request #13097: [FLINK-18864][python] Support key_by() operation for Python DataStrea…

2020-08-09 Thread GitBox


flinkbot commented on pull request #13097:
URL: https://github.com/apache/flink/pull/13097#issuecomment-671143340


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit 52e8670eef5651d09ad0defeb1a8729aa7c589ba (Mon Aug 10 
03:05:45 UTC 2020)
   
   **Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
* **This pull request references an unassigned [Jira 
ticket](https://issues.apache.org/jira/browse/FLINK-18864).** According to the 
[code contribution 
guide](https://flink.apache.org/contributing/contribute-code.html), tickets 
need to be assigned before starting with the implementation work.
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.

   ## Bot commands
   The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] Thesharing commented on a change in pull request #13089: [FLINK-18813][docs-zh] Translate the 'Local Installation' page of 'Try Flink' into Chinese

2020-08-09 Thread GitBox


Thesharing commented on a change in pull request #13089:
URL: https://github.com/apache/flink/pull/13089#discussion_r467664155



##
File path: docs/try-flink/local_installation.zh.md
##
@@ -26,36 +26,35 @@ under the License.
 {% if site.version contains "SNAPSHOT" %}
 
   
-  NOTE: The Apache Flink community only publishes official builds for
-  released versions of Apache Flink.
+  注意:Apache Flink 社区只发布 Apache Flink 发布版本的正式版本。

Review comment:
   I wonder whether it would be better to say "只发布 Apache Flink 的官方发行版"?

##
File path: docs/try-flink/local_installation.zh.md
##
@@ -26,36 +26,35 @@ under the License.
 {% if site.version contains "SNAPSHOT" %}
 
   
-  NOTE: The Apache Flink community only publishes official builds for
-  released versions of Apache Flink.
+  注意:Apache Flink 社区只发布 Apache Flink 发布版本的正式版本。
   
-  Since you are currently looking at the latest SNAPSHOT
-  version of the documentation, all version references below will not work.
-  Please switch the documentation to the latest released version via the 
release picker which you
-  find on the left side below the menu.
+  由于你当前正在查看文档的最新快照版本,因此以下所有版本引用都将不起作用。请通过菜单左侧的发布选择器将文档切换到最新发布的版本。
 
 {% else %}
-Follow these few steps to download the latest stable versions and get started.
+请按照以下几个步骤下载最新的稳定版本并开始使用。
 
-## Step 1: Download
+
 
-To be able to run Flink, the only requirement is to have a working __Java 8 or 
11__ installation.
-You can check the correct installation of Java by issuing the following 
command:
+## 步骤 1:下载
+
+为了能够运行 Flink,唯一的要求就是安装有效的 __Java 8 或者 Java 11__。你可以通过发出以下命令来检查 Java 的正确安装。
 
 {% highlight bash %}
 java -version
 {% endhighlight %}
 
-[Download](https://flink.apache.org/downloads.html) the {{ site.version }} 
release and un-tar it. 
+[下载](https://flink.apache.org/downloads.html) {{ site.version }} 发行版本并解压。
 
 {% highlight bash %}
 $ tar -xzf flink-{{ site.version }}-bin-scala{{ site.scala_version_suffix 
}}.tgz
 $ cd flink-{{ site.version }}-bin-scala{{ site.scala_version_suffix }}
 {% endhighlight %}
 
-## Step 2: Start a Cluster
+
+
+## 步骤 2:启动集群
 
-Flink ships with a single bash script to start a local cluster.
+Flink 附带一个 bash 脚本来启动本地集群。

Review comment:
   Would it be better if we translated the whole sentence as 
“Flink附带了一个bash脚本,可以用于启动本地集群。”?

##
File path: docs/try-flink/local_installation.zh.md
##
@@ -26,36 +26,35 @@ under the License.
 {% if site.version contains "SNAPSHOT" %}
 
   
-  NOTE: The Apache Flink community only publishes official builds for
-  released versions of Apache Flink.
+  注意:Apache Flink 社区只发布 Apache Flink 发布版本的正式版本。
   
-  Since you are currently looking at the latest SNAPSHOT
-  version of the documentation, all version references below will not work.
-  Please switch the documentation to the latest released version via the 
release picker which you
-  find on the left side below the menu.
+  由于你当前正在查看文档的最新快照版本,因此以下所有版本引用都将不起作用。请通过菜单左侧的发布选择器将文档切换到最新发布的版本。

Review comment:
   Would it be better to replace “引用” with "链接"? In addition, would it be 
better to translate the latter sentence as "请通过左侧菜单底部的“版本选择”将文档切换到最新发布的版本。"?

##
File path: docs/try-flink/local_installation.zh.md
##
@@ -64,10 +63,11 @@ Starting standalonesession daemon on host.
 Starting taskexecutor daemon on host.
 {% endhighlight %}
 
-## Step 3: Submit a Job
+
 
-Releases of Flink come with a number of example Jobs.
-You can quickly deploy one of these applications to the running cluster. 
+## 步骤 3:提交作业(Job)
+
+Flink 的发行版本附带了许多示例作业。你可以将这些应用程序之一快速部署到正在运行的集群。

Review comment:
   Would it be better if we remove "之一"?

##
File path: docs/try-flink/local_installation.zh.md
##
@@ -26,36 +26,35 @@ under the License.
 {% if site.version contains "SNAPSHOT" %}
 
   
-  NOTE: The Apache Flink community only publishes official builds for
-  released versions of Apache Flink.
+  注意:Apache Flink 社区只发布 Apache Flink 发布版本的正式版本。
   
-  Since you are currently looking at the latest SNAPSHOT
-  version of the documentation, all version references below will not work.
-  Please switch the documentation to the latest released version via the 
release picker which you
-  find on the left side below the menu.
+  由于你当前正在查看文档的最新快照版本,因此以下所有版本引用都将不起作用。请通过菜单左侧的发布选择器将文档切换到最新发布的版本。
 
 {% else %}
-Follow these few steps to download the latest stable versions and get started.
+请按照以下几个步骤下载最新的稳定版本并开始使用。
 
-## Step 1: Download
+
 
-To be able to run Flink, the only requirement is to have a working __Java 8 or 
11__ installation.
-You can check the correct installation of Java by issuing the following 
command:
+## 步骤 1:下载
+
+为了能够运行 Flink,唯一的要求就是安装有效的 __Java 8 或者 Java 11__。你可以通过发出以下命令来检查 Java 的正确安装。

Review comment:
   I think "发出以下命令" is better to be replaced with "运行以下命令".

##
File path: docs/try-flink/local_installation.zh.md
##
@@ -80,11 +80,13 @@ $ tail log/flink-*-taskexecutor-*.out
   (be,2)
 {% endhighlight %}
 
-Additionally, you 

[jira] [Updated] (FLINK-18864) Support key_by() operation for Python DataStream API

2020-08-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-18864:
---
Labels: pull-request-available  (was: )

> Support key_by() operation for Python DataStream API
> 
>
> Key: FLINK-18864
> URL: https://issues.apache.org/jira/browse/FLINK-18864
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Python
>Reporter: Shuiqiang Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>
> Support key_by() operation for Python DataStream API.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] shuiqiangchen opened a new pull request #13097: [FLINK-18864][python] Support key_by() operation for Python DataStrea…

2020-08-09 Thread GitBox


shuiqiangchen opened a new pull request #13097:
URL: https://github.com/apache/flink/pull/13097


   
   
   ## What is the purpose of the change
   
   Support key_by() operation for Python DataStream API.
   
   ## Brief change log
   
   - Add a new class named KeyedStream to represent a keyed DataStream.
   - Add a key_by() interface for DataStream.
   
   ## Verifying this change
   
   This change is covered by the test cases test_key_by() and test_key_by_map() in 
test_data_stream.py.
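   As a usage illustration, a minimal sketch (the lambda-style key selector 
is an assumption; the class and method names follow this PR):

```python
from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()
ds = env.from_collection([('a', 1), ('b', 2), ('a', 3)])

# key_by() returns a keyed stream partitioned by the first field;
# map() on the keyed stream then runs per key
keyed = ds.key_by(lambda value: value[0])
result = keyed.map(lambda value: (value[0], value[1] * 10))
```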
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): ( no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: ( no)
 - The serializers: (no)
 - The runtime per-record code paths (performance sensitive): (no)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (no)
 - The S3 file system connector: (no)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (no)
 - If yes, how is the feature documented? (not documented)
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-18695) Allow NettyBufferPool to allocate heap buffers

2020-08-09 Thread Yun Gao (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17174031#comment-17174031
 ] 

Yun Gao commented on FLINK-18695:
-

Sorry for the long delay; I was on vacation. I will start working on this 
issue today.

> Allow NettyBufferPool to allocate heap buffers
> --
>
> Key: FLINK-18695
> URL: https://issues.apache.org/jira/browse/FLINK-18695
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Network
>Reporter: Chesnay Schepler
>Assignee: Yun Gao
>Priority: Major
> Fix For: 1.12.0
>
>
> In 4.1.43, Netty changed its SslHandler to always use heap buffers for JDK 
> SSLEngine implementations, to avoid an additional memory copy.
> However, our {{NettyBufferPool}} forbids heap buffer allocations.
> We will either have to allow heap buffer allocations, or create a custom 
> SslHandler implementation that does not use heap buffers (although the 
> latter seems ill-advised).
> /cc [~sewen] [~uce] [~NicoK] [~zjwang] [~pnowojski]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (FLINK-18792) TaskManager Start Failure

2020-08-09 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song closed FLINK-18792.

Resolution: Duplicate

Hi [~cr0566], thanks for reporting this.
Usually we keep only one JIRA ticket per issue, in order to keep all the 
discussion in one place. Therefore, I'm closing this ticket; we can continue 
the discussion in FLINK-18438.

> TaskManager Start Failure
> -
>
> Key: FLINK-18792
> URL: https://issues.apache.org/jira/browse/FLINK-18792
> Project: Flink
>  Issue Type: Bug
>  Components: Client / Job Submission
>Affects Versions: 1.11.0
> Environment: OS: Windows 10/x64
> Terminal: Cygwin
> java version "1.8.0_261"
> Java(TM) SE Runtime Environment (build 1.8.0_261-b12)
> Java HotSpot(TM) 64-Bit Server VM (build 25.261-b12, mixed mode)
>  
> Flink-1.11.0
> Maven 3.63
> Scala: None
>Reporter: Chris R
>Priority: Major
>
> https://issues.apache.org/jira/browse/FLINK-18438?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel=17147168#comment-17147168
>  
> This is a very similar issue to the link above, I think. I'm getting an issue 
> that is this in the standalone out file:
> Error: Could not create the Java Virtual Machine.
> Error: A fatal exception has occurred. Program will exit.
> Improperly specified VM option 'MaxMetaspaceSize=268435456 '
>  
> TM_RESOURCE_PARAMS extraction logs:
> jvm_params: -Xmx536870902 -Xms536870902 -XX:MaxDirectMemorySize=268435458 
> -XX:MaxMetaspaceSize=268435456
> dynamic_configs: -D taskmanager.memory.framework.off-heap.size=134217728b -D 
> taskmanager.memory.network.max=134217730b -D 
> taskmanager.memory.network.min=134217730b -D 
> taskmanager.memory.framework.heap.size=134217728b -D 
> taskmanager.memory.managed.size=536870920b -D taskmanager.cpu.cores=1.0 -D 
> taskmanager.memory.task.heap.size=402653174b -D 
> taskmanager.memory.task.off-heap.size=0b
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] XBaith commented on a change in pull request #13088: [FLINK-18814][docs-zh] Translate the 'Side Outputs' page of 'DataStream API' into Chinese

2020-08-09 Thread GitBox


XBaith commented on a change in pull request #13088:
URL: https://github.com/apache/flink/pull/13088#discussion_r467664455



##
File path: docs/dev/stream/side_output.zh.md
##
@@ -110,17 +101,15 @@ val mainDataStream = input
   // emit data to regular output

Review comment:
   发送数据到主要的输出

##
File path: docs/dev/stream/side_output.zh.md
##
@@ -26,21 +26,15 @@ under the License.
 * This will be replaced by the TOC
 {:toc}
 
-In addition to the main stream that results from `DataStream` operations, you 
can also produce any
-number of additional side output result streams. The type of data in the 
result streams does not
-have to match the type of data in the main stream and the types of the 
different side outputs can
-also differ. This operation can be useful when you want to split a stream of 
data where you would
-normally have to replicate the stream and then filter out from each stream the 
data that you don't
-want to have.
+除了由 `DataStream` 
操作产生的主要流之外,你还可以产生任意数量的旁路输出结果流。结果流中的数据类型不必与主要流中的数据类型相匹配,并且不同旁路输出的类型也可以不同。当你想要拆分数据流时,此操作非常有用,通常你必须复制数据流,然后从每个数据流中过滤掉你不想保留的数据。
 
-When using side outputs, you first need to define an `OutputTag` that will be 
used to identify a
-side output stream:
+使用旁路输出时,首先需要定义用于标识旁路输出流的 `OutputTag`:
 
 
 
 
 {% highlight java %}
-// this needs to be an anonymous inner class, so that we can analyze the type
+// 这需要是一个匿名的内部类,这样我们才能分析类型

Review comment:
   ```suggestion
   // 这需要是一个匿名的内部类,以便我们分析类型
   ```





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Assigned] (FLINK-18824) Support serialization for canal-json format

2020-08-09 Thread Jark Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jark Wu reassigned FLINK-18824:
---

Assignee: CaoZhen

> Support serialization for canal-json format
> ---
>
> Key: FLINK-18824
> URL: https://issues.apache.org/jira/browse/FLINK-18824
> Project: Flink
>  Issue Type: Sub-task
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile), Table 
> SQL / Ecosystem
>Reporter: Jark Wu
>Assignee: CaoZhen
>Priority: Major
>
> Currently, the canal-json format only supports deserialization, not 
> serialization. This is inconvenient for users who want to write changelogs 
> to a message queue. 
> The serialization for canal-json can follow the JSON structure of Canal. 
> However, since Flink currently can't combine UPDATE_BEFORE and UPDATE_AFTER 
> into a single UPDATE message, we can encode UPDATE_BEFORE and UPDATE_AFTER 
> as DELETE and INSERT canal messages. 
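To illustrate the proposed encoding in Python literals (a sketch; only the 
commonly used canal envelope fields "data" and "type" are shown, and the 
exact layout is an assumption, not the final design):

```python
# hypothetical UPDATE on a table: name 'a' -> 'b' for id 1
update_before = {"id": 1, "name": "a"}
update_after = {"id": 1, "name": "b"}

# UPDATE_BEFORE/UPDATE_AFTER arrive as separate changelog records, so they
# are encoded as a DELETE message followed by an INSERT message
delete_msg = {"data": [update_before], "type": "DELETE"}
insert_msg = {"data": [update_after], "type": "INSERT"}
```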



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-18835) sql using group by, duplicated group field appears

2020-08-09 Thread Jark Wu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17174029#comment-17174029
 ] 

Jark Wu commented on FLINK-18835:
-

Usually, we insert the group-by query into an upsert sink (e.g. MySQL, HBase); 
the upsert sink keeps only the last record per key.
If you don't want to see the duplicate updates, you can use a window 
aggregation, which produces an append-only result, as sketched below.
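For example, a tumbling-window aggregation emits one final row per key and 
window, so downstream operators see an append-only stream. A minimal PyFlink 
sketch (table and field names are hypothetical, and `src` is assumed to be a 
registered table with a processing-time attribute `proctime`):

```python
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.table import StreamTableEnvironment

env = StreamExecutionEnvironment.get_execution_environment()
t_env = StreamTableEnvironment.create(env)

# one result row per (fieldA, fieldB) and one-minute window; no retractions
result = t_env.sql_query("""
    SELECT fieldA, fieldB, COUNT(1) AS cnt
    FROM src
    GROUP BY TUMBLE(proctime, INTERVAL '1' MINUTE), fieldA, fieldB
""")
```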

> sql using group by, duplicated group field appears
> ---
>
> Key: FLINK-18835
> URL: https://issues.apache.org/jira/browse/FLINK-18835
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.11.1
>Reporter: YHF
>Priority: Critical
> Attachments: SumAnalysis.java
>
>
> The data source is Kafka. I create a temporary view and group by 
> (fieldA, fieldB) using SQL,
> then transform the result table to a DataStream using toRetractStream and 
> print the result.
> I find duplicated (fieldA, fieldB) rows.
> See the attachment for the code.
> group by(scanType,scanSite,cmtInf),but result is below
> (true,Otm\{, scanType=97, scanSite=14, cmtInf=24,jp=1.00, 
> db=0E-18, dbjp=1.00, pjWei=27.07, 
> dbWei=0E-18, mintime=2020-07-29 11:33:57.679, maxtime=2020-07-29 
> 11:33:57.679})
> 3> (true,Otm\{, scanType=97, scanSite=14, cmtInf=24,jp=1.00, 
> db=0E-18, dbjp=1.00, pjWei=27.07, 
> dbWei=0E-18, mintime=2020-07-29 11:33:57.679, maxtime=2020-07-29 
> 11:33:57.679})



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (FLINK-18835) sql using group by, duplicated group field appears

2020-08-09 Thread Jark Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jark Wu closed FLINK-18835.
---
Resolution: Not A Bug

> sql using group by, duplicated group field appears
> ---
>
> Key: FLINK-18835
> URL: https://issues.apache.org/jira/browse/FLINK-18835
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.11.1
>Reporter: YHF
>Priority: Critical
> Attachments: SumAnalysis.java
>
>
> The data source is Kafka. I create a temporary view and group by 
> (fieldA, fieldB) using SQL,
> then transform the result table to a DataStream using toRetractStream and 
> print the result.
> I find duplicated (fieldA, fieldB) rows.
> See the attachment for the code.
> group by(scanType,scanSite,cmtInf),but result is below
> (true,Otm\{, scanType=97, scanSite=14, cmtInf=24,jp=1.00, 
> db=0E-18, dbjp=1.00, pjWei=27.07, 
> dbWei=0E-18, mintime=2020-07-29 11:33:57.679, maxtime=2020-07-29 
> 11:33:57.679})
> 3> (true,Otm\{, scanType=97, scanSite=14, cmtInf=24,jp=1.00, 
> db=0E-18, dbjp=1.00, pjWei=27.07, 
> dbWei=0E-18, mintime=2020-07-29 11:33:57.679, maxtime=2020-07-29 
> 11:33:57.679})



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] wuchong merged pull request #13087: [hotfix][docs] Fix the link-tags in the 'Side Outputs' page of 'DataStream API'

2020-08-09 Thread GitBox


wuchong merged pull request #13087:
URL: https://github.com/apache/flink/pull/13087


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-18848) table.to_pandas should handle retraction rows properly

2020-08-09 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu updated FLINK-18848:

Description: Currently, the retraction rows are not handled properly; they 
should be removed from the result.

> table.to_pandas should handle retraction rows properly
> --
>
> Key: FLINK-18848
> URL: https://issues.apache.org/jira/browse/FLINK-18848
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.11.0
>Reporter: Dian Fu
>Assignee: Dian Fu
>Priority: Major
> Fix For: 1.12.0, 1.11.2
>
>
> Currently, the retraction rows are not handled properly; they should be 
> removed from the result.
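Conceptually, handling this means dropping each retract message together 
with the accumulate row it cancels. A minimal sketch, independent of the 
actual implementation:

```python
def drop_retractions(changelog):
    """changelog: list of (is_accumulate, row) pairs in arrival order.
    Returns the rows that survive after retractions are applied."""
    rows = []
    for is_accumulate, row in changelog:
        if is_accumulate:
            rows.append(row)
        else:
            # a retraction cancels one previously accumulated matching row
            rows.remove(row)
    return rows


# example: +I(a), +I(b), -D(a) leaves only b
assert drop_retractions([(True, 'a'), (True, 'b'), (False, 'a')]) == ['b']
```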



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-18848) table.to_pandas should handle retraction rows properly

2020-08-09 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu updated FLINK-18848:

Summary: table.to_pandas should handle retraction rows properly  (was: 
table.to_pandas should throw exceptions if table is not append only )

> table.to_pandas should handle retraction rows properly
> --
>
> Key: FLINK-18848
> URL: https://issues.apache.org/jira/browse/FLINK-18848
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.11.0
>Reporter: Dian Fu
>Assignee: Dian Fu
>Priority: Major
> Fix For: 1.12.0, 1.11.2
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (FLINK-18846) Set a meaningful operator name for the filesystem sink

2020-08-09 Thread Jark Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jark Wu reassigned FLINK-18846:
---

Assignee: zhuqi

> Set a meaningful operator name for the filesystem sink
> --
>
> Key: FLINK-18846
> URL: https://issues.apache.org/jira/browse/FLINK-18846
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / FileSystem, Table SQL / Ecosystem
>Reporter: Jark Wu
>Assignee: zhuqi
>Priority: Major
>  Labels: starter
>
> A simple SQL statement like 'insert into hive_parquet_table select ... from 
> some_kafka_table' will generate an additional operator called 'Sink: 
> Unnamed' with parallelism 1. The operator name could be improved to 
> something meaningful. 
> This is reported in the mailing list: 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Unexpected-unnamed-sink-in-SQL-job-td37185.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-16768) HadoopS3RecoverableWriterITCase.testRecoverWithStateWithMultiPart hangs

2020-08-09 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17174024#comment-17174024
 ] 

Dian Fu commented on FLINK-16768:
-

[https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=5321=logs=3d12d40f-c62d-5ec4-6acc-0efe94cc3e89=e4f347ab-2a29-5d7c-3685-b0fcd2b6b051]

> HadoopS3RecoverableWriterITCase.testRecoverWithStateWithMultiPart hangs
> ---
>
> Key: FLINK-16768
> URL: https://issues.apache.org/jira/browse/FLINK-16768
> Project: Flink
>  Issue Type: Bug
>  Components: FileSystems, Tests
>Affects Versions: 1.10.0, 1.11.0, 1.12.0
>Reporter: Zhijiang
>Priority: Critical
>  Labels: test-stability
> Fix For: 1.12.0
>
>
> Logs: 
> [https://dev.azure.com/rmetzger/Flink/_build/results?buildId=6584=logs=d44f43ce-542c-597d-bf94-b0718c71e5e8=d26b3528-38b0-53d2-05f7-37557c2405e4]
> {code:java}
> 2020-03-24T15:52:18.9196862Z "main" #1 prio=5 os_prio=0 
> tid=0x7fd36c00b800 nid=0xc21 runnable [0x7fd3743ce000]
> 2020-03-24T15:52:18.9197235Zjava.lang.Thread.State: RUNNABLE
> 2020-03-24T15:52:18.9197536Z  at 
> java.net.SocketInputStream.socketRead0(Native Method)
> 2020-03-24T15:52:18.9197931Z  at 
> java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
> 2020-03-24T15:52:18.9198340Z  at 
> java.net.SocketInputStream.read(SocketInputStream.java:171)
> 2020-03-24T15:52:18.9198749Z  at 
> java.net.SocketInputStream.read(SocketInputStream.java:141)
> 2020-03-24T15:52:18.9199171Z  at 
> sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
> 2020-03-24T15:52:18.9199840Z  at 
> sun.security.ssl.InputRecord.readV3Record(InputRecord.java:593)
> 2020-03-24T15:52:18.9200265Z  at 
> sun.security.ssl.InputRecord.read(InputRecord.java:532)
> 2020-03-24T15:52:18.9200663Z  at 
> sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:975)
> 2020-03-24T15:52:18.9201213Z  - locked <0x927583d8> (a 
> java.lang.Object)
> 2020-03-24T15:52:18.9201589Z  at 
> sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:933)
> 2020-03-24T15:52:18.9202026Z  at 
> sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
> 2020-03-24T15:52:18.9202583Z  - locked <0x92758c00> (a 
> sun.security.ssl.AppInputStream)
> 2020-03-24T15:52:18.9203029Z  at 
> org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
> 2020-03-24T15:52:18.9203558Z  at 
> org.apache.http.impl.io.SessionInputBufferImpl.read(SessionInputBufferImpl.java:198)
> 2020-03-24T15:52:18.9204121Z  at 
> org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:176)
> 2020-03-24T15:52:18.9204626Z  at 
> org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:135)
> 2020-03-24T15:52:18.9205121Z  at 
> com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82)
> 2020-03-24T15:52:18.9205679Z  at 
> com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:180)
> 2020-03-24T15:52:18.9206164Z  at 
> com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82)
> 2020-03-24T15:52:18.9206786Z  at 
> com.amazonaws.services.s3.internal.S3AbortableInputStream.read(S3AbortableInputStream.java:125)
> 2020-03-24T15:52:18.9207361Z  at 
> com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82)
> 2020-03-24T15:52:18.9207839Z  at 
> com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82)
> 2020-03-24T15:52:18.9208327Z  at 
> com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82)
> 2020-03-24T15:52:18.9208809Z  at 
> com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:180)
> 2020-03-24T15:52:18.9209273Z  at 
> com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82)
> 2020-03-24T15:52:18.9210003Z  at 
> com.amazonaws.util.LengthCheckInputStream.read(LengthCheckInputStream.java:107)
> 2020-03-24T15:52:18.9210658Z  at 
> com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82)
> 2020-03-24T15:52:18.9211154Z  at 
> org.apache.hadoop.fs.s3a.S3AInputStream.lambda$read$3(S3AInputStream.java:445)
> 2020-03-24T15:52:18.9211631Z  at 
> org.apache.hadoop.fs.s3a.S3AInputStream$$Lambda$42/1936375962.execute(Unknown 
> Source)
> 2020-03-24T15:52:18.9212044Z  at 
> org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:109)
> 2020-03-24T15:52:18.9212553Z  at 
> org.apache.hadoop.fs.s3a.Invoker.lambda$retry$3(Invoker.java:260)
> 2020-03-24T15:52:18.9212972Z  at 
> org.apache.hadoop.fs.s3a.Invoker$$Lambda$23/1457226878.execute(Unknown Source)
> 2020-03-24T15:52:18.9213408Z  at 
> org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:317)
> 2020-03-24T15:52:18.9213866Z  at 
> 

[jira] [Commented] (FLINK-18845) Class not found exception when I use the SQL client to try MySQL as a datasource

2020-08-09 Thread Jark Wu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17174023#comment-17174023
 ] 

Jark Wu commented on FLINK-18845:
-

The JDBC connector is not bundled in the Flink distribution; you have to 
manually download the jar and put it under {{FLINK_HOME/lib}}, or use 
{{--library}} to link the jar. See the documentation: 
https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/connectors/jdbc.html#dependencies

> Class not found exception when I use the SQL client to try MySQL as a datasource
> ---
>
> Key: FLINK-18845
> URL: https://issues.apache.org/jira/browse/FLINK-18845
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Client
>Affects Versions: 1.11.0
> Environment: sql-client startup cmd as following:
> ./bin/sql-client.sh embedded
>Reporter: HuiyuZhou
>Priority: Blocker
> Attachments: 
> flink-root-standalonesession-2-jqdev-l-01897.jqdev.shanghaigm.com.log
>
>
> Create table as following:
> USE CATALOG default_catalog;
> USE default_database;
> DROP TABLE IF EXISTS CarsOfFactory;
> CREATE TABLE CarsOfFactory (
>  TS STRING,
>  MANUFACTURE_PLANT STRING,
>  STAGE STRING,
>  CAR_NO BIGINT,
>  UPDATE_TIME TIMESTAMP,
>  PRIMARY KEY (TS,MANUFACTURE_PLANT,STAGE) NOT ENFORCED
> ) WITH (
>  'connector' = 'jdbc',
>  'url' = 'jdbc:mysql://',
>  'table-name' = 'CarsOfFactory',
>  'username' = '',
>  'password' = 'x'
> );
>  
> the sql client startup log as following:
> [^flink-root-standalonesession-2-jqdev-l-01897.jqdev.shanghaigm.com.log]
>  
> I also used arthas to check the JdbcRowDataInputFormat class; it doesn't 
> exist.
> [arthas@125257]$ getstatic 
> org.apache.flink.connector.jdbc.table.JdbcRowDataInputFormat *
> No class found for: 
> org.apache.flink.connector.jdbc.table.JdbcRowDataInputFormat
> Affect(row-cnt:0) cost in 29 ms.
>  
>  
> the detail error message in Apache Flink Dashboard as following:
> 2020-08-07 10:54:15
> org.apache.flink.streaming.runtime.tasks.StreamTaskException: Cannot load 
> user class: org.apache.flink.connector.jdbc.table.JdbcRowDataInputFormat
> ClassLoader info: URL ClassLoader:
> file: 
> '/tmp/blobStore-20e8ab84-2215-4f5f-b5bb-e7672a43fb43/job_aad9c3d36483cff4d20cc2aba399b8c0/blob_p-0b002ebba0e49cbf5ac62789e6b4fb299b5ae235-8fe568bdcd98caf8fb09c58092083ef4'
>  (valid JAR)
> Class not resolvable through given classloader.
> at 
> org.apache.flink.streaming.api.graph.StreamConfig.getStreamOperatorFactory(StreamConfig.java:288)
> at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain.(OperatorChain.java:126)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.beforeInvoke(StreamTask.java:453)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:522)
> at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:721)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:546)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.flink.connector.jdbc.table.JdbcRowDataInputFormat
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at 
> org.apache.flink.util.FlinkUserCodeClassLoader.loadClassWithoutExceptionHandling(FlinkUserCodeClassLoader.java:61)
> at 
> org.apache.flink.util.ChildFirstClassLoader.loadClassWithoutExceptionHandling(ChildFirstClassLoader.java:65)
> at 
> org.apache.flink.util.FlinkUserCodeClassLoader.loadClass(FlinkUserCodeClassLoader.java:48)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:348)
> at 
> org.apache.flink.util.InstantiationUtil$ClassLoaderObjectInputStream.resolveClass(InstantiationUtil.java:78)
> at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1868)
> at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1751)
> at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2042)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
> at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
> at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
> at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
> at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
> at 

[jira] [Reopened] (FLINK-17825) HA end-to-end gets killed due to timeout

2020-08-09 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu reopened FLINK-17825:
-

> HA end-to-end gets killed due to timeout
> 
>
> Key: FLINK-17825
> URL: https://issues.apache.org/jira/browse/FLINK-17825
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination, Tests
>Affects Versions: 1.12.0
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>Priority: Critical
>  Labels: test-stability
>
> CI (normal profile): 
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=1867=logs=c88eea3b-64a0-564d-0031-9fdcd7b8abee=1e2bbe5b-4657-50be-1f07-d84bfce5b1f5
> {code}
> 2020-05-19T20:46:50.9034002Z Killed TM @ 104061
> 2020-05-19T20:47:05.8510180Z Killed TM @ 107775
> 2020-05-19T20:47:55.1181475Z Killed TM @ 108337
> 2020-05-19T20:48:16.7907005Z Test (pid: 89099) did not finish after 540 
> seconds.
> 2020-05-19T20:48:16.790Z Printing Flink logs and killing it:
> [...]
> 2020-05-19T20:48:19.1016912Z 
> /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/test_ha_datastream.sh:
>  line 125: 89099 Terminated  ( cmdpid=$BASHPID; ( sleep 
> $TEST_TIMEOUT_SECONDS; echo "Test (pid: $cmdpid) did not finish after 
> $TEST_TIMEOUT_SECONDS seconds."; echo "Printing Flink logs and killing it:"; 
> cat ${FLINK_DIR}/log/*; kill "$cmdpid" ) & watchdog_pid=$!; echo 
> $watchdog_pid > $TEST_DATA_DIR/job_watchdog.pid; run_ha_test 4 
> ${STATE_BACKEND_TYPE} ${STATE_BACKEND_FILE_ASYNC} 
> ${STATE_BACKEND_ROCKS_INCREMENTAL} ${ZOOKEEPER_VERSION} )
> 2020-05-19T20:48:19.1017985Z Stopping job timeout watchdog (with pid=89100)
> 2020-05-19T20:48:19.1018621Z 
> /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/test_ha_datastream.sh:
>  line 112: kill: (89100) - No such process
> 2020-05-19T20:48:19.1019000Z Killing JM watchdog @ 91127
> 2020-05-19T20:48:19.1019199Z Killing TM watchdog @ 91883
> 2020-05-19T20:48:19.1019424Z [FAIL] Test script contains errors.
> 2020-05-19T20:48:19.1019639Z Checking of logs skipped.
> 2020-05-19T20:48:19.1019785Z 
> 2020-05-19T20:48:19.1020329Z [FAIL] 'Running HA (rocks, non-incremental) 
> end-to-end test' failed after 9 minutes and 0 seconds! Test exited with exit 
> code 1
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-17825) HA end-to-end gets killed due to timeout

2020-08-09 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17174022#comment-17174022
 ] 

Dian Fu commented on FLINK-17825:
-

Another instance: 
[https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=5321=logs=6caf31d6-847a-526e-9624-468e053467d6=7d4f7375-52df-5ce0-457f-b2ffbb2289a4]

> HA end-to-end gets killed due to timeout
> 
>
> Key: FLINK-17825
> URL: https://issues.apache.org/jira/browse/FLINK-17825
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination, Tests
>Affects Versions: 1.12.0
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>Priority: Critical
>  Labels: test-stability
>
> CI (normal profile): 
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=1867=logs=c88eea3b-64a0-564d-0031-9fdcd7b8abee=1e2bbe5b-4657-50be-1f07-d84bfce5b1f5
> {code}
> 2020-05-19T20:46:50.9034002Z Killed TM @ 104061
> 2020-05-19T20:47:05.8510180Z Killed TM @ 107775
> 2020-05-19T20:47:55.1181475Z Killed TM @ 108337
> 2020-05-19T20:48:16.7907005Z Test (pid: 89099) did not finish after 540 
> seconds.
> 2020-05-19T20:48:16.790Z Printing Flink logs and killing it:
> [...]
> 2020-05-19T20:48:19.1016912Z 
> /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/test_ha_datastream.sh:
>  line 125: 89099 Terminated  ( cmdpid=$BASHPID; ( sleep 
> $TEST_TIMEOUT_SECONDS; echo "Test (pid: $cmdpid) did not finish after 
> $TEST_TIMEOUT_SECONDS seconds."; echo "Printing Flink logs and killing it:"; 
> cat ${FLINK_DIR}/log/*; kill "$cmdpid" ) & watchdog_pid=$!; echo 
> $watchdog_pid > $TEST_DATA_DIR/job_watchdog.pid; run_ha_test 4 
> ${STATE_BACKEND_TYPE} ${STATE_BACKEND_FILE_ASYNC} 
> ${STATE_BACKEND_ROCKS_INCREMENTAL} ${ZOOKEEPER_VERSION} )
> 2020-05-19T20:48:19.1017985Z Stopping job timeout watchdog (with pid=89100)
> 2020-05-19T20:48:19.1018621Z 
> /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/test_ha_datastream.sh:
>  line 112: kill: (89100) - No such process
> 2020-05-19T20:48:19.1019000Z Killing JM watchdog @ 91127
> 2020-05-19T20:48:19.1019199Z Killing TM watchdog @ 91883
> 2020-05-19T20:48:19.1019424Z [FAIL] Test script contains errors.
> 2020-05-19T20:48:19.1019639Z Checking of logs skipped.
> 2020-05-19T20:48:19.1019785Z 
> 2020-05-19T20:48:19.1020329Z [FAIL] 'Running HA (rocks, non-incremental) 
> end-to-end test' failed after 9 minutes and 0 seconds! Test exited with exit 
> code 1
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-17274) Maven: Premature end of Content-Length delimited message body

2020-08-09 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17174020#comment-17174020
 ] 

Dian Fu commented on FLINK-17274:
-

[https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=5320=logs=f0ac5c25-1168-55a5-07ff-0e88223afed9=0dbaca5d-7c38-52e6-f4fe-2fb69ccb3ada]

> Maven: Premature end of Content-Length delimited message body
> -
>
> Key: FLINK-17274
> URL: https://issues.apache.org/jira/browse/FLINK-17274
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / Azure Pipelines
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>Priority: Critical
>  Labels: test-stability
> Fix For: 1.12.0
>
>
> CI: 
> https://dev.azure.com/rmetzger/Flink/_build/results?buildId=7786=logs=52b61abe-a3cc-5bde-cc35-1bbe89bb7df5=54421a62-0c80-5aad-3319-094ff69180bb
> {code}
> [ERROR] Failed to execute goal on project 
> flink-connector-elasticsearch7_2.11: Could not resolve dependencies for 
> project 
> org.apache.flink:flink-connector-elasticsearch7_2.11:jar:1.11-SNAPSHOT: Could 
> not transfer artifact org.apache.lucene:lucene-sandbox:jar:8.3.0 from/to 
> alicloud-mvn-mirror 
> (http://mavenmirror.alicloud.dak8s.net:/repository/maven-central/): GET 
> request of: org/apache/lucene/lucene-sandbox/8.3.0/lucene-sandbox-8.3.0.jar 
> from alicloud-mvn-mirror failed: Premature end of Content-Length delimited 
> message body (expected: 289920; received: 239832 -> [Help 1]
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] xiaolong-sn closed pull request #13005: Feature/flink 18661 efo de registration

2020-08-09 Thread GitBox


xiaolong-sn closed pull request #13005:
URL: https://github.com/apache/flink/pull/13005


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-18862) Casting the result of count(1) in a SQL agg operator to String fails after another agg operation

2020-08-09 Thread Leonard Xu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17174015#comment-17174015
 ] 

Leonard Xu commented on FLINK-18862:


Hi [~YUJIANBO],

Thanks for creating this issue. Could you use English to describe it? Using 
English is recommended in the community.

 

> Casting the result of count(1) in a SQL agg operator to String fails after another agg operation
> ---
>
> Key: FLINK-18862
> URL: https://issues.apache.org/jira/browse/FLINK-18862
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.11.1
>Reporter: YUJIANBO
>Priority: Major
> Attachments: sql 中agg算子的 count(1) 的结果强转成String类型再经过一次agg的操作就不能成功.txt
>
>
> 1. Environment: Flink SQL, version 1.11.1, per-job mode
> 2. Error: org.apache.flink.table.data.binary.BinaryRawValueData cannot be cast to 
> org.apache.flink.table.data.StringData
> 3. Background:
> (1) Create a table from Kafka:
> {code:java}
> CREATE TABLE kafka(
> x String,
> y String
> )with(
>'connector' = 'kafka',
> ..
> )
> {code}
> (2) Create a view:
> {code:java}
>CREATE VIEW view1 AS
>SELECT 
>x, 
>y, 
>CAST(COUNT(1) AS VARCHAR) AS ct
>FROM kafka
>GROUP BY 
>x, y
> {code}
> (3) Then run another aggregation on top of this view:
> {code:java}
> select 
>  x, 
>  LISTAGG(CONCAT_WS('=', y, ct), ',') AS lists
> FROM view1
> GROUP BY x
> {code}
> This fails with: org.apache.flink.table.data.binary.BinaryRawValueData cannot be 
> cast to org.apache.flink.table.data.StringData
>
> (4) However, when I skip the second agg operation there is no error, which 
> shows that the result of count(1) can be cast to String:
> {code:java}
> select 
>   x, 
>   CONCAT_WS('=', y, ct)
> from view1
> {code}
> *But why does it fail after one more agg operation?*
> 4. Question: the comparison above shows that count(1) can be cast to String, 
> so why does it stop working after another agg operation? Is there a good way 
> to solve this problem?
> 5. A more detailed error:
> {code:java}
> java.lang.ClassCastException: 
> org.apache.flink.table.data.binary.BinaryRawValueData cannot be cast to 
> org.apache.flink.table.data.StringData
>   at 
> org.apache.flink.table.data.GenericRowData.getString(GenericRowData.java:169) 
> ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.table.data.JoinedRowData.getString(JoinedRowData.java:139) 
> ~[flink-table-blink_2.11-1.11.1.jar:?]
>   at org.apache.flink.table.data.RowData.get(RowData.java:273) 
> ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.table.runtime.typeutils.RowDataSerializer.copyRowData(RowDataSerializer.java:156)
>  ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
>   at 
> org.apache.flink.table.runtime.typeutils.RowDataSerializer.copy(RowDataSerializer.java:123)
>  ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
>   at 
> org.apache.flink.table.runtime.typeutils.RowDataSerializer.copy(RowDataSerializer.java:50)
>  ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:715)
>  ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:692)
>  ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:672)
>  ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:52)
>  ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:30)
>  ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:53)
>  ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.table.runtime.operators.aggregate.GroupAggFunction.processElement(GroupAggFunction.java:205)
>  ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
>   at 
> org.apache.flink.table.runtime.operators.aggregate.GroupAggFunction.processElement(GroupAggFunction.java:43)
>  ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
>   at 
> org.apache.flink.streaming.api.operators.KeyedProcessOperator.processElement(KeyedProcessOperator.java:85)
>  ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:161)
>  ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.processElement(StreamTaskNetworkInput.java:178)
>  ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:153)
>  ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
>   at 
> 

[jira] [Assigned] (FLINK-18766) Support add_sink() for Python DataStream API

2020-08-09 Thread Hequn Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hequn Cheng reassigned FLINK-18766:
---

Assignee: Shuiqiang Chen

> Support add_sink() for Python DataStream API
> 
>
> Key: FLINK-18766
> URL: https://issues.apache.org/jira/browse/FLINK-18766
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Python
>Reporter: Hequn Cheng
>Assignee: Shuiqiang Chen
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-16069) Creation of TaskDeploymentDescriptor can block main thread for long time

2020-08-09 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17173885#comment-17173885
 ] 

Zhu Zhu edited comment on FLINK-16069 at 8/10/20, 1:44 AM:
---

Hi Yumeng, I think we have not reached a consensus. I once did a PoC of the 
idea to "cache generated ShuffleDescriptors for ALL-to-ALL edges for reuse", 
but there was no further progress due to other prioritized work.
I'd like to understand whether this has become a blocking problem in your 
production use. If it has, we can resume this discussion and prioritize this 
improvement.


was (Author: zhuzh):
Hi Yumeng, I think we have not reached a consensus. I once did a PoC of the 
idea that "cache generated ShuffleDescriptors for ALL-to-ALL edges for reuse". 
But there were no further progress due to some other prioritized work.
I'd like to understand if this has become a serious and urgent problem for you. 
If it is, we can resume this discussion.

> Creation of TaskDeploymentDescriptor can block main thread for long time
> 
>
> Key: FLINK-16069
> URL: https://issues.apache.org/jira/browse/FLINK-16069
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Reporter: huweihua
>Priority: Major
>
> Deploying tasks can take a long time when we submit a high-parallelism job, 
> and Execution#deploy runs in the main thread, so it blocks the JobMaster 
> from processing other Akka messages, such as heartbeats. The creation of the 
> TaskDeploymentDescriptor takes most of the time. We can move the creation into a future.
> For example, for a job [source(8000)->sink(8000)], moving all 16000 tasks from 
> SCHEDULED to DEPLOYING took more than 1 minute. This caused the TaskManager 
> heartbeat to time out and the job never succeeded.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-18862) Casting the result of count(1) in a SQL agg operator to String fails after another agg operation

2020-08-09 Thread YUJIANBO (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

YUJIANBO updated FLINK-18862:
-
Description: 
1. Environment: Flink SQL, version 1.11.1, per-job mode
2. Error: org.apache.flink.table.data.binary.BinaryRawValueData cannot be cast to 
org.apache.flink.table.data.StringData

3. Background:
(1) Create a table from Kafka:
{code:java}
CREATE TABLE kafka(
x String,
y String
)with(
   'connector' = 'kafka',
..
)
{code}


(2) Create a view:
{code:java}
   CREATE VIEW view1 AS
   SELECT 
   x, 
   y, 
   CAST(COUNT(1) AS VARCHAR) AS ct
   FROM kafka
   GROUP BY 
   x, y
{code}


(3) Then run another aggregation on top of this view:
{code:java}
select 
 x, 
 LISTAGG(CONCAT_WS('=', y, ct), ',') AS lists
FROM view1
GROUP BY x
{code}
   This fails with: org.apache.flink.table.data.binary.BinaryRawValueData cannot be cast 
to org.apache.flink.table.data.StringData

   
(4) However, when I skip the second agg operation there is no error, which shows 
that the result of count(1) can be cast to String:
{code:java}
select 
x, 
CONCAT_WS('=', y, ct)
from view1
{code}
*But why does it fail after one more agg operation?*

4. Question: the comparison above shows that count(1) can be cast to String, 
so why does it stop working after another agg operation? Is there a good way 
to solve this problem?


5. A more detailed error:
{code:java}
java.lang.ClassCastException: 
org.apache.flink.table.data.binary.BinaryRawValueData cannot be cast to 
org.apache.flink.table.data.StringData
at 
org.apache.flink.table.data.GenericRowData.getString(GenericRowData.java:169) 
~[ad_features_auto-1.0-SNAPSHOT.jar:?]
at 
org.apache.flink.table.data.JoinedRowData.getString(JoinedRowData.java:139) 
~[flink-table-blink_2.11-1.11.1.jar:?]
at org.apache.flink.table.data.RowData.get(RowData.java:273) 
~[ad_features_auto-1.0-SNAPSHOT.jar:?]
at 
org.apache.flink.table.runtime.typeutils.RowDataSerializer.copyRowData(RowDataSerializer.java:156)
 ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
at 
org.apache.flink.table.runtime.typeutils.RowDataSerializer.copy(RowDataSerializer.java:123)
 ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
at 
org.apache.flink.table.runtime.typeutils.RowDataSerializer.copy(RowDataSerializer.java:50)
 ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
at 
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:715)
 ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
at 
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:692)
 ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
at 
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:672)
 ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
at 
org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:52)
 ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
at 
org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:30)
 ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
at 
org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:53)
 ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
at 
org.apache.flink.table.runtime.operators.aggregate.GroupAggFunction.processElement(GroupAggFunction.java:205)
 ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
at 
org.apache.flink.table.runtime.operators.aggregate.GroupAggFunction.processElement(GroupAggFunction.java:43)
 ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
at 
org.apache.flink.streaming.api.operators.KeyedProcessOperator.processElement(KeyedProcessOperator.java:85)
 ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
at 
org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:161)
 ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
at 
org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.processElement(StreamTaskNetworkInput.java:178)
 ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
at 
org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:153)
 ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
at 
org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:67)
 ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:345)
 ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxStep(MailboxProcessor.java:191)
 ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:181)
 ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:558)
 ~[ad_features_auto-1.0-SNAPSHOT.jar:?]
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:530) 

[jira] [Comment Edited] (FLINK-11779) CLI ignores -m parameter if high-availability is ZOOKEEPER

2020-08-09 Thread Guowei Ma (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-11779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17173995#comment-17173995
 ] 

Guowei Ma edited comment on FLINK-11779 at 8/10/20, 12:47 AM:
--

# What I understand is that the rest server should publish its address to the 
HighAvailabilityService, and the rest client should always retrieve the rest 
server address from the HighAvailabilityService. 
 # In the default mode, the -m option means two things:
 ## The user wants to use `StandaloneHaServices`.  
 ## It sets the value for `jobmanager.rpc.address`.
 # In the Generic CLI mode there is no -m option. It does not use the 
`AbstractCustomCommandLine`, so in this mode the client and server should always 
respect the `HighAvailabilityService` in the config. I agree with [~aljoscha] 
that maybe we should not expose `jobmanager.rpc.address` to the client any 
more, but this might introduce some compatibility problems. A sketch of the 
resulting precedence logic follows below.
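
A minimal sketch of that precedence logic (illustrative Java only; `HaServices`, 
`restEndpointLeaderAddress`, and `RestAddressResolver` are invented names for 
this example, not Flink's actual API):

{code:java}
// Illustrative only -- not Flink's code. Shows why -m is honored in
// standalone non-HA mode but ignored once ZooKeeper HA is configured.
interface HaServices {
    boolean isStandalone();
    String restEndpointLeaderAddress(); // address published by the rest server
}

class RestAddressResolver {
    String resolve(String mOption /* host:port from -m, may be null */,
                   HaServices haServices) {
        if (haServices.isStandalone() && mOption != null) {
            // StandaloneHaServices: -m supplies the address directly.
            return mOption;
        }
        // ZooKeeper (or other) HA: the address the leader published to the
        // HA service always takes precedence, so -m is silently ignored.
        return haServices.restEndpointLeaderAddress();
    }
}
{code}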


was (Author: maguowei):
# What I understand is that the rest server should publish its address to the 
HighAvailabilityService, and the rest client should always retrieve the rest 
server address from the HighAvailabilityService. 
 # In the default mode, the -m option means two things:
 ## The user wants to use `StandaloneHaServices`.  
 ## It sets the value for `jobmanager.rpc.address`.
 # In the Generic CLI mode there is no -m option. It does not use the 
`AbstractCustomCommandLine`, so in this mode the client and server should always 
respect the `HighAvailabilityService` in the config. I agree with [~aljoscha] 
that maybe we should not expose `jobmanager.rpc.address` to the client any 
more.  

> CLI ignores -m parameter if high-availability is ZOOKEEPER 
> ---
>
> Key: FLINK-11779
> URL: https://issues.apache.org/jira/browse/FLINK-11779
> Project: Flink
>  Issue Type: Bug
>  Components: Command Line Client
>Affects Versions: 1.7.2, 1.8.0
>Reporter: Gary Yao
>Assignee: leesf
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> *Description*
> The CLI ignores the host/port provided by the {{-m}} parameter if 
> {{high-availability: ZOOKEEPER}} is configured in {{flink-conf.yaml}}.
> *Expected behavior*
> * TBD: either document this behavior or give precedence to {{-m}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-11779) CLI ignores -m parameter if high-availability is ZOOKEEPER

2020-08-09 Thread Guowei Ma (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-11779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17173995#comment-17173995
 ] 

Guowei Ma edited comment on FLINK-11779 at 8/10/20, 12:45 AM:
--

# What I understand is that the rest server should publish its address to the 
HighAvailabilityService, and the rest client should always retrieve the rest 
server address from the HighAvailabilityService. 
 # In the default mode, the -m option means two things:
 ## The user wants to use `StandaloneHaServices`.  
 ## It sets the value for `jobmanager.rpc.address`.
 # In the Generic CLI mode there is no -m option. It does not use the 
`AbstractCustomCommandLine`, so in this mode the client and server should always 
respect the `HighAvailabilityService` in the config. I agree with [~aljoscha] 
that maybe we should not expose `jobmanager.rpc.address` to the client any 
more.  


was (Author: maguowei):
# What I understand is that the rest server should publish its address to the 
HighAvailabilityService, and the rest client should always retrieve the rest 
server address from the HighAvailabilityService. 
 # In the default mode, the -m option means two things
 # The user wants to use `StandaloneHaServices`  
 # Set the value for `jobmanager.rpc.address`


 # In the Generic CLI mode there is no -m option. It does not use the 
`AbstractCustomCommandLine`, so in this mode the client and server should always 
respect the `HighAvailabilityService` in the config. I agree with [~aljoscha] 
that maybe we should not expose `jobmanager.rpc.address` to the client any 
more.  

> CLI ignores -m parameter if high-availability is ZOOKEEPER 
> ---
>
> Key: FLINK-11779
> URL: https://issues.apache.org/jira/browse/FLINK-11779
> Project: Flink
>  Issue Type: Bug
>  Components: Command Line Client
>Affects Versions: 1.7.2, 1.8.0
>Reporter: Gary Yao
>Assignee: leesf
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> *Description*
> The CLI ignores the host/port provided by the {{-m}} parameter if 
> {{high-availability: ZOOKEEPER}} is configured in {{flink-conf.yaml}}.
> *Expected behavior*
> * TBD: either document this behavior or give precedence to {{-m}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-11779) CLI ignores -m parameter if high-availability is ZOOKEEPER

2020-08-09 Thread Guowei Ma (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-11779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17173995#comment-17173995
 ] 

Guowei Ma commented on FLINK-11779:
---

# What I understand is that the rest server should publish its address to the 
HighAvailabilityService, and the rest client should always retrieve the rest 
server address from the HighAvailabilityService. 
 # In the default mode, the -m option means two things
 # The user wants to use `StandaloneHaServices`  
 # Set the value for `jobmanager.rpc.address`


 # In the Generic CLI mode there is no -m option. It does not use the 
`AbstractCustomCommandLine`, so in this mode the client and server should always 
respect the `HighAvailabilityService` in the config. I agree with [~aljoscha] 
that maybe we should not expose `jobmanager.rpc.address` to the client any 
more.  

> CLI ignores -m parameter if high-availability is ZOOKEEPER 
> ---
>
> Key: FLINK-11779
> URL: https://issues.apache.org/jira/browse/FLINK-11779
> Project: Flink
>  Issue Type: Bug
>  Components: Command Line Client
>Affects Versions: 1.7.2, 1.8.0
>Reporter: Gary Yao
>Assignee: leesf
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> *Description*
> The CLI ignores the host/port provided by the {{-m}} parameter if 
> {{high-availability: ZOOKEEPER}} is configured in {{flink-conf.yaml}}.
> *Expected behavior*
> * TBD: either document this behavior or give precedence to {{-m}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

