[jira] [Updated] (FLINK-12371) Add support for converting (NOT) IN/ (NOT) EXISTS to SemiJoin, and generating optimized logical plan
[ https://issues.apache.org/jira/browse/FLINK-12371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

godfrey he updated FLINK-12371:
---

Description:
This issue aims to convert IN/EXISTS to semi-join, and NOT IN/NOT EXISTS to anti-join.

In Calcite, [SemiJoin|https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/core/SemiJoin.java] represents only semi-join (it cannot represent anti-join) and requires an equi-join condition. Queries like `select * from left where left.a1 in (select right.a2 from right where left.b1 > right.b2)` and `select * from left where not exists (select * from right)` cannot be converted to Calcite's SemiJoin operator.

To solve this, we need to copy the {{SemiJoin}} class into Flink and make the following changes:
1. Make {{SemiJoin}} extend {{Join}} instead of {{EquiJoin}} (to support non-equi join conditions).
2. Add an {{isAnti}} field to represent anti-join.

Currently there are no rules that convert (NOT) IN / (NOT) EXISTS to SemiJoin, so we need a new rule (named {{FlinkSubQueryRemoveRule}}) to meet this requirement.

was:
This issue aims to convert IN/EXISTS to semi-join, and NOT IN/NOT EXISTS to anti-join.

In Calcite, [SemiJoin|https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/core/SemiJoin.java] represents only semi-join (it cannot represent anti-join) and requires an equi-join condition. Queries like `select * from left where left.a1 in (select right.a2 from right where left.b1 > right.b2)` and `select * from left where not exists (select * from right)` cannot be converted to Calcite's SemiJoin operator.

To solve this, we need to copy the {{SemiJoin}} class into Flink and make the following changes:
1. Make {{SemiJoin}} extend {{Join}} instead of {{EquiJoin}} (to support non-equi join conditions).
2. Add an {{isAnti}} field to represent anti-join.

Currently there are no rules that convert (NOT) IN / (NOT) EXISTS to SemiJoin, so we need a whole new set of rules:
1. {{FlinkSubQueryRemoveRule}}, a planner rule that converts IN and EXISTS into semi-join, and NOT IN and NOT EXISTS into anti-join.
2. {{FlinkRewriteSubQueryRule}}, a planner rule that converts a filter condition like {{(select count(*) from T) > 0}} to {{exists(select * from T)}}.

> Add support for converting (NOT) IN/ (NOT) EXISTS to SemiJoin, and generating optimized logical plan
>
> Key: FLINK-12371
> URL: https://issues.apache.org/jira/browse/FLINK-12371
> Project: Flink
> Issue Type: New Feature
> Components: Table SQL / Planner
> Reporter: godfrey he
> Assignee: godfrey he
> Priority: Major

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
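For readers unfamiliar with the terms, the semi-join/anti-join semantics the issue targets can be sketched in a few lines of plain Python. The row data below is made up for illustration; the join condition mirrors the non-equi example query from the description (left.a1 = right.a2 AND left.b1 > right.b2):

```python
left = [{"a1": 1, "b1": 10}, {"a1": 2, "b1": 5}, {"a1": 3, "b1": 7}]
right = [{"a2": 1, "b2": 8}, {"a2": 2, "b2": 9}]


def matches(l, r):
    # Non-equi condition from the example query:
    # left.a1 = right.a2 AND left.b1 > right.b2
    return l["a1"] == r["a2"] and l["b1"] > r["b2"]


# Semi-join (IN / EXISTS): keep each left row with at least one match.
semi = [l for l in left if any(matches(l, r) for r in right)]

# Anti-join (NOT IN / NOT EXISTS): keep each left row with no match.
anti = [l for l in left if not any(matches(l, r) for r in right)]
```

This also shows why the output schema of a semi/anti-join is just the left side's schema, which is what distinguishes it from a regular join. (Note that full NOT IN semantics with NULLs are subtler than this sketch.)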
[jira] [Created] (FLINK-12371) Add support for converting (NOT) IN/ (NOT) EXISTS to SemiJoin, and generating optimized logical plan
godfrey he created FLINK-12371:
--

Summary: Add support for converting (NOT) IN/ (NOT) EXISTS to SemiJoin, and generating optimized logical plan
Key: FLINK-12371
URL: https://issues.apache.org/jira/browse/FLINK-12371
Project: Flink
Issue Type: New Feature
Components: Table SQL / Planner
Reporter: godfrey he
Assignee: godfrey he

This issue aims to convert IN/EXISTS to semi-join, and NOT IN/NOT EXISTS to anti-join.

In Calcite, [SemiJoin|https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/core/SemiJoin.java] represents only semi-join (it cannot represent anti-join) and requires an equi-join condition. Queries like `select * from left where left.a1 in (select right.a2 from right where left.b1 > right.b2)` and `select * from left where not exists (select * from right)` cannot be converted to Calcite's SemiJoin operator.

To solve this, we need to copy the {{SemiJoin}} class into Flink and make the following changes:
1. Make {{SemiJoin}} extend {{Join}} instead of {{EquiJoin}} (to support non-equi join conditions).
2. Add an {{isAnti}} field to represent anti-join.

Currently there are no rules that convert (NOT) IN / (NOT) EXISTS to SemiJoin, so we need a whole new set of rules:
1. {{FlinkSubQueryRemoveRule}}, a planner rule that converts IN and EXISTS into semi-join, and NOT IN and NOT EXISTS into anti-join.
2. {{FlinkRewriteSubQueryRule}}, a planner rule that converts a filter condition like {{(select count(*) from T) > 0}} to {{exists(select * from T)}}.
[GitHub] [flink] zjffdu commented on issue #8144: [FLINK-12159]. Enable YarnMiniCluster integration test under non-secure mode
zjffdu commented on issue #8144: [FLINK-12159]. Enable YarnMiniCluster integration test under non-secure mode
URL: https://github.com/apache/flink/pull/8144#issuecomment-487822515

@tillrohrmann I could do the other refactoring/improvements in a follow-up PR. Thanks for the review. Do you have any other suggestions for the current PR?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Assigned] (FLINK-12370) Integrated Travis for Python Table API
[ https://issues.apache.org/jira/browse/FLINK-12370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zhong reassigned FLINK-12370: - Assignee: Wei Zhong > Integrated Travis for Python Table API > -- > > Key: FLINK-12370 > URL: https://issues.apache.org/jira/browse/FLINK-12370 > Project: Flink > Issue Type: Sub-task > Components: API / Python >Affects Versions: 1.9.0 >Reporter: sunjincheng >Assignee: Wei Zhong >Priority: Major > > Integrated Travis for Python Table API
[GitHub] [flink] sunjincheng121 commented on issue #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support
sunjincheng121 commented on issue #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support URL: https://github.com/apache/flink/pull/8267#issuecomment-487818672 I have created the JIRA for Integrated Travis for Python Table API. https://issues.apache.org/jira/browse/FLINK-12370
[jira] [Created] (FLINK-12370) Integrated Travis for Python Table API
sunjincheng created FLINK-12370: --- Summary: Integrated Travis for Python Table API Key: FLINK-12370 URL: https://issues.apache.org/jira/browse/FLINK-12370 Project: Flink Issue Type: Sub-task Components: API / Python Affects Versions: 1.9.0 Reporter: sunjincheng Integrated Travis for Python Table API
[GitHub] [flink] sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support
sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support URL: https://github.com/apache/flink/pull/8267#discussion_r279607459 ## File path: flink-python/pyflink/table/table_environment.py ## @@ -0,0 +1,160 @@ + +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + +from abc import ABCMeta + +from pyflink.java_gateway import get_gateway +from pyflink.table import Table +from pyflink.util import type_utils, utils + +__all__ = [ +'BatchTableEnvironment', +'StreamTableEnvironment', +'TableEnvironment' +] + + +class TableEnvironment(object): +""" +The abstract base class for batch and stream TableEnvironments. +""" + +__metaclass__ = ABCMeta + +def __init__(self, j_tenv): +self._j_tenv = j_tenv + +def from_table_source(self, table_source): +""" +Creates a table from a table source. + +:param table_source: table source used as table +:return: result table +""" +return Table(self._j_tenv.fromTableSource(table_source._j_table_source)) + +def register_table(self, name, table): +""" +Registers a :class:`Table` under a unique name in the TableEnvironment's catalog. +Registered tables can be referenced in SQL queries. 
+ +:param name: The name under which the table will be registered. +:param table: The table to register. +""" +self._j_tenv.registerTable(name, table._java_table) + +def register_table_source(self, name, table_source): +""" +Registers an external :class:`TableSource` in this :class:`TableEnvironment`'s catalog. +Registered tables can be referenced in SQL queries. + +:param name: The name under which the :class:`TableSource` is registered. +:param table_source: The :class:`TableSource` to register. +""" +self._j_tenv.registerTableSource(name, table_source._j_table_source) + +def register_table_sink(self, name, field_names, field_types, table_sink): +""" +Registers an external :class:`TableSink` with given field names and types in this +:class:`TableEnvironment`\ 's catalog. Review comment: `TableEnvironment`\ 's catalog.` and `TableEnvironment`'s -> which pattern do you want?
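All of the methods in this excerpt follow one pattern: a thin Python wrapper that delegates to a Java-side object over the py4j gateway. A minimal sketch of that pattern (`FakeJavaTableEnv` is an illustrative stand-in for the real py4j proxy, not part of the PR):

```python
class Table:
    """Thin Python wrapper around a Java-side Table proxy."""
    def __init__(self, j_table):
        self._j_table = j_table


class TableEnvironment:
    """Each method delegates to the underlying Java TableEnvironment."""
    def __init__(self, j_tenv):
        self._j_tenv = j_tenv

    def register_table(self, name, table):
        # Unwrap the Python Table and hand the Java object across the gateway.
        self._j_tenv.registerTable(name, table._j_table)

    def scan(self, name):
        # Wrap the Java-side result back into a Python Table.
        return Table(self._j_tenv.scan(name))


class FakeJavaTableEnv:
    """Stand-in for the py4j proxy, used here only for demonstration."""
    def __init__(self):
        self.registered = {}

    def registerTable(self, name, j_table):
        self.registered[name] = j_table

    def scan(self, name):
        return self.registered[name]


t_env = TableEnvironment(FakeJavaTableEnv())
t_env.register_table("Orders", Table("j_orders"))
orders = t_env.scan("Orders")
```

Incidentally, this pattern is why the `table._java_table` in the excerpt's `register_table` stands out against the `_j_table` naming used elsewhere.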
[GitHub] [flink] sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support
sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support URL: https://github.com/apache/flink/pull/8267#discussion_r279584911 ## File path: docs/dev/table/tableApi.md ## @@ -297,6 +297,81 @@ val result = orders.where('b === "red") + + + + + + + Operators + Description + + + + + +Scan +Batch Streaming + + +Similar to the FROM clause in a SQL query. Performs a scan of a registered table. +{% highlight python %} +orders = table_env.scan("Orders"); +{% endhighlight %} + + + + +Select +Batch Streaming + + +Similar to a SQL SELECT statement. Performs a select operation. +{% highlight python %} +orders = tableEnv.scan("Orders"); Review comment: `tableEnv` -> `table_env`
[GitHub] [flink] sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support
sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support URL: https://github.com/apache/flink/pull/8267#discussion_r279600310 ## File path: flink-python/README.md ## @@ -0,0 +1,22 @@ +# Apache Flink Python API + +Apache Flink is an open source stream processing framework with powerful stream- and batch-processing capabilities. + +Learn more about Flink at [http://flink.apache.org/](http://flink.apache.org/) + +This packaging is currently a very initial version and will change in future versions. + +## Installation + +In order to use PyFlink, you need to install Flink on your device and set the value of the environment variable FLINK_HOME to the root directory of Flink. +Then enter the directory where this README.md file is located and execute `python setup.py install` to install PyFlink on your device. + +## Running Tests + +Currently you can perform an end-to-end test of PyFlink in the directory where this file is located with the following command: + +PYTHONPATH=./ python ./pyflink/table/tests/test_end_to_end.py Review comment: Keep the user's existing PYTHONPATH (prepend it) if it is already set, instead of overwriting it.
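The reviewer's suggestion amounts to prepending any pre-set PYTHONPATH rather than clobbering it. One way the README command could be written (a sketch, not the PR's actual wording; the test path is the one from the README):

```shell
# Keep any PYTHONPATH the user already exported, then append ./ ;
# ${PYTHONPATH:+...} expands to nothing when the variable is unset/empty.
export PYTHONPATH="${PYTHONPATH:+$PYTHONPATH:}./"
# python ./pyflink/table/tests/test_end_to_end.py
```

With PYTHONPATH unset this yields `./`; with `PYTHONPATH=/foo` it yields `/foo:./`.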
[GitHub] [flink] sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support
sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support URL: https://github.com/apache/flink/pull/8267#discussion_r279585800 ## File path: docs/dev/table/tableApi.zh.md ## @@ -297,6 +297,81 @@ val result = orders.where('b === "red") + + + + + + + Operators + Description + + + + + +Scan +Batch Streaming + + +Similar to the FROM clause in a SQL query. Performs a scan of a registered table. +{% highlight python %} +orders = table_env.scan("Orders"); +{% endhighlight %} + + + + +Select +Batch Streaming + + +Similar to a SQL SELECT statement. Performs a select operation. +{% highlight python %} +orders = tableEnv.scan("Orders"); Review comment: `tableEnv` -> `table_env`
[GitHub] [flink] sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support
sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support URL: https://github.com/apache/flink/pull/8267#discussion_r279590864 ## File path: flink-python/README.md ## @@ -0,0 +1,22 @@ +# Apache Flink Python API + +Apache Flink is an open source stream processing framework with powerful stream- and batch-processing capabilities. Review comment: `powerful ` -> `the powerful `
[GitHub] [flink] sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support
sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support URL: https://github.com/apache/flink/pull/8267#discussion_r279586605 ## File path: flink-dist/src/main/assemblies/opt.xml ## @@ -177,5 +177,20 @@ flink-streaming-python_${scala.binary.version}-${project.version}.jar 0644 + + Review comment: `Batch Python API, flink-libraries/flink-python will be replaced by flink-python in the future` `Streaming Python API, flink-libraries/flink-python, will be replaced by flink-python in the future` `Python Table API` -> `Python API`
[GitHub] [flink] sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support
sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support URL: https://github.com/apache/flink/pull/8267#discussion_r279603205 ## File path: flink-python/pyflink/table/table.py ## @@ -0,0 +1,117 @@ + +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + +from py4j.java_gateway import get_method + +__all__ = ['Table'] + + +class Table(object): + +""" +A :class:`Table` is the core component of the Table API. +Similar to how the batch and streaming APIs have DataSet and DataStream, +the Table API is built around :class:`Table`. + +Use the methods of :class:`Table` to transform data. + +Example: +:: +>>> t_config = TableConfig.Builder().as_streaming_execution().set_parallelism(1).build() +>>> t_env = TableEnvironment.get_table_environment(t_config) +>>> ... +>>> t = t_env.scan("source") +>>> t.select(...) +... +>>> t.insert_into("print") +>>> t_env.execute() + +Operations such as :func:`~pyflink.table.Table.join`, :func:`~pyflink.table.Table.select`, +:func:`~pyflink.table.Table.where` and :func:`~pyflink.table.Table.group_by` +take arguments in an expression string. Please refer to the documentation for +the expression syntax. 
+""" + +def __init__(self, j_table): +self._j_table = j_table + +def select(self, fields): +""" +Performs a selection operation. Similar to a SQL SELECT statement. The field expressions +can contain complex expressions. + +Example: +:: +>>> t = tab.select("key, value + 'hello'") Review comment: `t = tab.select("key, value + 'hello'")` -> `tab.select("key, value + 'hello'")` ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support
sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support URL: https://github.com/apache/flink/pull/8267#discussion_r279589827 ## File path: flink-python/pom.xml ## @@ -0,0 +1,201 @@ + +http://maven.apache.org/POM/4.0.0; xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance; + xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd;> + + 4.0.0 + + + flink-parent + org.apache.flink + 1.9-SNAPSHOT + .. + + + flink-python + flink-python + + jar + + + + net.sf.py4j + py4j + 0.10.8.1 + + + + + + + org.apache.maven.plugins + maven-enforcer-plugin + + + dependency-convergence + + enforce + + + true + + + + + + + + com.github.siom79.japicmp + japicmp-maven-plugin + + + + org.apache.maven.plugins + maven-surefire-plugin + + + false + + + + + + org.apache.maven.plugins + maven-eclipse-plugin + 2.8 + + true + + org.scala-ide.sdt.core.scalanature + org.eclipse.jdt.core.javanature + + + org.scala-ide.sdt.core.scalabuilder + + + org.scala-ide.sdt.launching.SCALA_CONTAINER + + org.eclipse.jdt.launching.JRE_CONTAINER + + + + org.scala-lang:scala-library + org.scala-lang:scala-compiler + + + **/*.scala Review comment: Do we need scala in this project?
[GitHub] [flink] sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support
sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support URL: https://github.com/apache/flink/pull/8267#discussion_r279590263 ## File path: docs/dev/table/tableApi.zh.md ## @@ -297,6 +297,81 @@ val result = orders.where('b === "red") + + + + + + + Operators + Description + + + + + +Scan +Batch Streaming + + +Similar to the FROM clause in a SQL query. Performs a scan of a registered table. Review comment: I think we can use Chinese for the zh version. What do you think?
[GitHub] [flink] sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support
sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support URL: https://github.com/apache/flink/pull/8267#discussion_r279601910 ## File path: flink-dist/src/main/flink-bin/bin/pyflink2.sh ## @@ -0,0 +1,48 @@ +#!/usr/bin/env bash + +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + +# = +bin=`dirname "$0"` Review comment: I think the responsibility of this script is to launch the PythonGatewayServer, so I suggest renaming it to `pyflink-gateway-server.sh`. What do you think?
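For context, the `bin=`dirname "$0"`` line where the diff excerpt ends is the usual way Flink's bin scripts locate their own directory. A minimal standalone sketch of the full pattern (the echo is only for demonstration; the real script goes on to do its launcher work):

```shell
#!/usr/bin/env bash
# Resolve the directory containing this script. dirname "$0" may return
# a relative path, so normalize it with cd + pwd.
bin=$(dirname "$0")
bin=$(cd "$bin" && pwd)
echo "$bin"
```

From there, sibling files such as config scripts can be referenced as `"$bin"/...` regardless of where the user invoked the script from.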
[GitHub] [flink] sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support
sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support URL: https://github.com/apache/flink/pull/8267#discussion_r279605826 ## File path: flink-python/pyflink/table/table_environment.py ## @@ -0,0 +1,160 @@ + +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + +from abc import ABCMeta + +from pyflink.java_gateway import get_gateway +from pyflink.table import Table +from pyflink.util import type_utils, utils + +__all__ = [ +'BatchTableEnvironment', +'StreamTableEnvironment', +'TableEnvironment' +] + + +class TableEnvironment(object): +""" +The abstract base class for batch and stream TableEnvironments. +""" + +__metaclass__ = ABCMeta + +def __init__(self, j_tenv): +self._j_tenv = j_tenv + +def from_table_source(self, table_source): +""" +Creates a table from a table source. + +:param table_source: table source used as table +:return: result table +""" Review comment: `:param table_source: table source used as table` -> `:param table_source: Table source used as table`, and `:return: result table` -> `:return: Result table`
[GitHub] [flink] sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support
sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support URL: https://github.com/apache/flink/pull/8267#discussion_r279589515 ## File path: flink-python/README.md ## @@ -0,0 +1,22 @@ +# Apache Flink Python API + +Apache Flink is an open source stream processing framework with powerful stream- and batch-processing capabilities. + +Learn more about Flink at [http://flink.apache.org/](http://flink.apache.org/) + +This packaging is currently a very initial version and will change in future versions. + +## Installation + +In order to use PyFlink, you need to install Flink on your device and set the value of the environment variable FLINK_HOME to the root directory of Flink. +Then enter the directory where this README.md file is located and execute `python setup.py install` to install PyFlink on your device. + +## Running Tests + +Currently you can perform an end-to-end test of PyFlink in the directory where this file is located with the following command: + +PYTHONPATH=./ python ./pyflink/table/tests/test_end_to_end.py + +## Python Requirements + +PyFlink currently depends on `Py4J 0.10.8.1`. Review comment: PyFlink depends on Py4J (currently version 0.10.8.1)
[GitHub] [flink] sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support
sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support URL: https://github.com/apache/flink/pull/8267#discussion_r279600480 ## File path: flink-python/pyflink/find_flink_home.py ## @@ -0,0 +1,44 @@ + +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import print_function +import os +import sys + + +def _find_flink_home(): +""" +Find the FLINK_HOME. Review comment: Please echo the value of FLINK_HOME. Does that make sense to you?
[GitHub] [flink] sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support
sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support URL: https://github.com/apache/flink/pull/8267#discussion_r279600010 ## File path: flink-python/README.md ## @@ -0,0 +1,22 @@ +# Apache Flink Python API + +Apache Flink is an open source stream processing framework with powerful stream- and batch-processing capabilities. + +Learn more about Flink at [http://flink.apache.org/](http://flink.apache.org/) + +This packaging is currently a very initial version and will change in future versions. + +## Installation + +In order to use PyFlink, you need to install Flink on your device and set the value of the environment variable FLINK_HOME to the root directory of Flink. Review comment: We need to add how to set FLINK_HOME.
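A hedged sketch of what such a README addition could look like (the install path is purely illustrative; point it at wherever your Flink distribution is unpacked):

```shell
# Set FLINK_HOME to the root of your Flink installation, and optionally
# put its bin directory on PATH.
export FLINK_HOME=/path/to/flink-1.9.0
export PATH="$FLINK_HOME/bin:$PATH"
```

Adding these lines to a shell profile (e.g. ~/.bashrc) makes the setting persistent across sessions.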
[GitHub] [flink] sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support
sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support URL: https://github.com/apache/flink/pull/8267#discussion_r279591670 ## File path: flink-python/README.md ## @@ -0,0 +1,22 @@ +# Apache Flink Python API + +Apache Flink is an open source stream processing framework with powerful stream- and batch-processing capabilities. Review comment: I think it's better to introduce the Python API here, not Flink itself. We can also link to the docs for the Python API; since the first version only supports the Table API, we can link to: https://ci.apache.org/projects/flink/flink-docs-master/dev/table/tableApi.html What do you think?
[GitHub] [flink] sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support
sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support URL: https://github.com/apache/flink/pull/8267#discussion_r279594338 ## File path: flink-python/pyflink/java_gateway.py ## @@ -0,0 +1,106 @@ + +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +import os +import shutil +import signal +import struct +import tempfile +import time +from subprocess import Popen, PIPE +from threading import RLock + +from py4j.java_gateway import java_import, JavaGateway, GatewayParameters +from pyflink.find_flink_home import _find_flink_home + + +_gateway = None +_lock = RLock() + + +def get_gateway(): +# type: () -> JavaGateway +global _gateway +global _lock +with _lock: +if _gateway is None: +# if Java Gateway is already running +if 'PYFLINK_GATEWAY_PORT' in os.environ: +gateway_port = int(os.environ['PYFLINK_GATEWAY_PORT']) +gateway_param = GatewayParameters(port=gateway_port, auto_convert=True) +_gateway = JavaGateway(gateway_parameters=gateway_param) +else: +_gateway = launch_gateway() +return _gateway + + +def launch_gateway(): +# type: () -> JavaGateway +""" +launch jvm gateway +""" + +FLINK_HOME = _find_flink_home() +# TODO windows support Review comment: Please check the system type and exit 1 with the error message. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support
sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support URL: https://github.com/apache/flink/pull/8267#discussion_r279603054 ## File path: flink-python/pyflink/table/table.py ## @@ -0,0 +1,117 @@ + +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + +from py4j.java_gateway import get_method + +__all__ = ['Table'] + + +class Table(object): + +""" +A :class:`Table` is the core component of the Table API. +Similar to how the batch and streaming APIs have DataSet and DataStream, +the Table API is built around :class:`Table`. + +Use the methods of :class:`Table` to transform data. + +Example: +:: +>>> t_config = TableConfig.Builder().as_streaming_execution().set_parallelism(1).build() +>>> t_env = TableEnvironment.get_table_environment(t_config) +>>> ... +>>> t = t_env.scan("source") +>>> t.select(...) +... +>>> t.insert_into("print") Review comment: It's better to add `t_env.register_table_sink("result", ...)` and `t.insert_into("result")`.
[GitHub] [flink] sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support
sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support URL: https://github.com/apache/flink/pull/8267#discussion_r279594143 ## File path: .gitignore ## @@ -14,7 +14,9 @@ tmp *.iml *.swp *.jar +*.zip *.log +*.pyc Review comment: We also need to add the following ignores: ``` flink-python/build/ flink-python/dist/ flink-python/pyflink.egg-info/ ``` What do you think?
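Taken together with the diff above, the relevant `.gitignore` section might end up looking like this (a sketch of the suggested state, not the merged result):

```
tmp
*.iml
*.swp
*.jar
*.zip
*.log
*.pyc
flink-python/build/
flink-python/dist/
flink-python/pyflink.egg-info/
```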
[GitHub] [flink] sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support
sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support URL: https://github.com/apache/flink/pull/8267#discussion_r279607459 ## File path: flink-python/pyflink/table/table_environment.py ## @@ -0,0 +1,160 @@ + +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + +from abc import ABCMeta + +from pyflink.java_gateway import get_gateway +from pyflink.table import Table +from pyflink.util import type_utils, utils + +__all__ = [ +'BatchTableEnvironment', +'StreamTableEnvironment', +'TableEnvironment' +] + + +class TableEnvironment(object): +""" +The abstract base class for batch and stream TableEnvironments. +""" + +__metaclass__ = ABCMeta + +def __init__(self, j_tenv): +self._j_tenv = j_tenv + +def from_table_source(self, table_source): +""" +Creates a table from a table source. + +:param table_source: table source used as table +:return: result table +""" +return Table(self._j_tenv.fromTableSource(table_source._j_table_source)) + +def register_table(self, name, table): +""" +Registers a :class:`Table` under a unique name in the TableEnvironment's catalog. +Registered tables can be referenced in SQL queries. 
+ +:param name: The name under which the table will be registered. +:param table: The table to register. +""" +self._j_tenv.registerTable(name, table._java_table) + +def register_table_source(self, name, table_source): +""" +Registers an external :class:`TableSource` in this :class:`TableEnvironment`'s catalog. +Registered tables can be referenced in SQL queries. + +:param name: The name under which the :class:`TableSource` is registered. +:param table_source: The :class:`TableSource` to register. +""" +self._j_tenv.registerTableSource(name, table_source._j_table_source) + +def register_table_sink(self, name, field_names, field_types, table_sink): +""" +Registers an external :class:`TableSink` with given field names and types in this +:class:`TableEnvironment`\ 's catalog. Review comment: `TableEnvironment`\ 's catalog. -> `TableEnvironment`\'s catalog. ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support
sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support URL: https://github.com/apache/flink/pull/8267#discussion_r279588033 ## File path: flink-dist/src/main/flink-bin/bin/pyflink2.sh ## @@ -0,0 +1,48 @@ +#!/usr/bin/env bash + +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + +# = +bin=`dirname "$0"` +bin=`cd "$bin"; pwd` + +. "$bin"/config.sh + +FLINK_CLASSPATH=`constructFlinkClassPath` + +ARGS=() + +while [[ $# -gt 0 ]] +do +key="$1" +case $key in +-c|--class) +DRIVER=$2 +shift +shift +;; +*) + ARGS+=("$1") + shift + ;; +esac +done + Review comment: I think it's better to add `log_setting`, similar to `flink` shell? https://github.com/apache/flink/blob/master/flink-dist/src/main/flink-bin/bin/flink This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support
sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support URL: https://github.com/apache/flink/pull/8267#discussion_r279592971 ## File path: flink-python/setup.py ## @@ -0,0 +1,63 @@ + +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import print_function + +import os +import sys +from setuptools import setup + +if sys.version_info < (2, 7): +print("Python versions prior to 2.7 are not supported for PyFlink.", + file=sys.stderr) +sys.exit(-1) + +this_directory = os.path.abspath(os.path.dirname(__file__)) +version_file = os.path.join(this_directory, 'pyflink/version.py') + +try: +exec(open(version_file).read()) +except IOError: +print("Failed to load PyFlink version file for packaging. " + + "'%s' not found!" % version_file, + file=sys.stderr) +sys.exit(-1) +VERSION = __version__ # noqa + +with open(os.path.join(this_directory, 'README.md'), encoding='utf-8') as f: Review comment: Does `encoding='utf-8'` work in Python 2.7?
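The reviewer's question is well founded: on Python 2.7 the built-in `open` has no `encoding` parameter, while `io.open` accepts one on both Python 2 and 3. A minimal sketch of a compatible read (the file path here is illustrative):

```python
import io
import os
import tempfile

# Write a small UTF-8 README, then read it back in a 2/3-compatible way.
# io.open takes `encoding` on Python 2.7+, unlike the py2 built-in open.
path = os.path.join(tempfile.mkdtemp(), "README.md")
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"# Apache Flink Python API\n")

with io.open(path, encoding="utf-8") as f:
    long_description = f.read()
```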
[GitHub] [flink] sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support
sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support URL: https://github.com/apache/flink/pull/8267#discussion_r279588708 ## File path: flink-dist/src/main/flink-bin/bin/pyflink2.sh ## @@ -0,0 +1,48 @@ +#!/usr/bin/env bash + +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + +# = +bin=`dirname "$0"` +bin=`cd "$bin"; pwd` + +. "$bin"/config.sh + +FLINK_CLASSPATH=`constructFlinkClassPath` + +ARGS=() + +while [[ $# -gt 0 ]] +do +key="$1" +case $key in +-c|--class) +DRIVER=$2 +shift +shift +;; +*) + ARGS+=("$1") + shift + ;; +esac +done + +PYTHON_JAR_PATH=`echo "$FLINK_ROOT_DIR"/opt/flink-python-*.jar` +TABLE_JAR_PATH=`echo "$FLINK_ROOT_DIR"/opt/flink-table*.jar` +exec $JAVA_RUN $JVM_ARGS -cp ${FLINK_CLASSPATH}:${TABLE_JAR_PATH}:${PYTHON_JAR_PATH} ${DRIVER} ${ARGS[@]} Review comment: Can you add a usage example, and echo to console? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support
sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support URL: https://github.com/apache/flink/pull/8267#discussion_r279602913 ## File path: flink-python/pyflink/table/table.py ## @@ -0,0 +1,117 @@ + +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + +from py4j.java_gateway import get_method + +__all__ = ['Table'] + + +class Table(object): + +""" +A :class:`Table` is the core component of the Table API. +Similar to how the batch and streaming APIs have DataSet and DataStream, +the Table API is built around :class:`Table`. + +Use the methods of :class:`Table` to transform data. + +Example: +:: +>>> t_config = TableConfig.Builder().as_streaming_execution().set_parallelism(1).build() +>>> t_env = TableEnvironment.get_table_environment(t_config) +>>> ... Review comment: I think it's better to add `t_env.register_table_source("source", ...)`
[GitHub] [flink] GJL closed pull request #7999: [hotfix][runtime] Remove unused local variable in ExecutionGraphBuilder
GJL closed pull request #7999: [hotfix][runtime] Remove unused local variable in ExecutionGraphBuilder URL: https://github.com/apache/flink/pull/7999
[GitHub] [flink] GJL edited a comment on issue #7999: [hotfix][runtime] Remove unused local variable in ExecutionGraphBuilder
GJL edited a comment on issue #7999: [hotfix][runtime] Remove unused local variable in ExecutionGraphBuilder URL: https://github.com/apache/flink/pull/7999#issuecomment-487817067 Already fixed in https://github.com/apache/flink/commit/306a8eb58e261115cad59a533640b56496d10999 Sorry for the late reply.
[GitHub] [flink] GJL commented on issue #7999: [hotfix][runtime] Remove unused local variable in ExecutionGraphBuilder
GJL commented on issue #7999: [hotfix][runtime] Remove unused local variable in ExecutionGraphBuilder URL: https://github.com/apache/flink/pull/7999#issuecomment-487817067 Already fixed in https://github.com/apache/flink/commit/306a8eb58e261115cad59a533640b56496d10999
[GitHub] [flink] bowenli86 commented on issue #8312: [FLINK-12366][table] Clean up Catalog APIs to make them more consiste…
bowenli86 commented on issue #8312: [FLINK-12366][table] Clean up Catalog APIs to make them more consiste… URL: https://github.com/apache/flink/pull/8312#issuecomment-487815417 @KurtYoung Would be good to have you take a look, and help to merge if there are no more concerns. Thanks!
[jira] [Updated] (FLINK-12369) Implement a region failover strategy regarding the next version FailoverStrategy interfaces
[ https://issues.apache.org/jira/browse/FLINK-12369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-12369: --- Labels: pull-request-available (was: ) > Implement a region failover strategy regarding the next version > FailoverStrategy interfaces > --- > > Key: FLINK-12369 > URL: https://issues.apache.org/jira/browse/FLINK-12369 > Project: Flink > Issue Type: Task > Components: Runtime / Coordination >Affects Versions: 1.9.0 >Reporter: Zhu Zhu >Assignee: Zhu Zhu >Priority: Major > Labels: pull-request-available > Fix For: 1.9.0 > > > It is a re-requisite of FLINK-12068. > The new strategy interface design can be found at FLINK-10429. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [flink] zhuzhurk opened a new pull request #8316: [FLINK-12369] [coordination] Implement a region failover strategy regarding the next version FailoverStrategy interfaces
zhuzhurk opened a new pull request #8316: [FLINK-12369] [coordination] Implement a region failover strategy regarding the next version FailoverStrategy interfaces URL: https://github.com/apache/flink/pull/8316 ## What is the purpose of the change *This pull request implements a region failover strategy regarding the next version FailoverStrategy interfaces. The new FailoverStrategy is only responsible for returning vertices that need to be restarted. This rewrite is necessary to support the backtracking of missing result partitions in region failover. * ## Brief change log - *Add new FailoverStrategy interface* - *Add a FailoverTopology and related component interfaces to provide necessary topology info to FailoverStrategy* - *Add a new version RestartPipelinedRegionStrategy, it is a re-write of existing rg.apache.flink.runtime.executiongraph.failover.RestartPipelinedRegionStrategy, including the faiover handling and region building.* - *Add region building tests for the new strategy, which is a re-write of existing region building tests.* ## Verifying this change This change added tests and can be verified as follows: - *region building tests are added for the new strategy. It is a re-write of existing region building tests* - *test cases are added to verify the getTasksNeedingRestart logic* - *region building perf test is conducted, for a 1x6000 scale job with All-to-All edges, it costs ~4s to build all the regions* ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no) - The serializers: (no) - The runtime per-record code paths (performance sensitive): (no) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes) - The S3 file system connector: (no) ## Documentation - Does this pull request introduce a new feature? 
(no) - If yes, how is the feature documented? (not documented)
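The region building the PR describes, grouping vertices into failover regions and restarting only the failed vertex's region, can be illustrated with a generic sketch (this is not Flink's implementation; the function names and the pipelined-edge model are simplifying assumptions):

```python
from collections import defaultdict


def build_regions(vertices, pipelined_edges):
    # A failover region is a connected component over pipelined
    # (non-blocking) edges; blocking edges separate regions.
    adj = defaultdict(set)
    for a, b in pipelined_edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, regions = set(), []
    for v in vertices:
        if v in seen:
            continue
        stack, region = [v], set()
        while stack:
            u = stack.pop()
            if u in seen:
                continue
            seen.add(u)
            region.add(u)
            stack.extend(adj[u] - seen)
        regions.append(region)
    return regions


def tasks_needing_restart(failed_vertex, regions):
    # The strategy only returns the vertices that must be restarted:
    # everything in the failed vertex's region.
    for region in regions:
        if failed_vertex in region:
            return region
    return {failed_vertex}
```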
[GitHub] [flink] flinkbot commented on issue #8316: [FLINK-12369] [coordination] Implement a region failover strategy regarding the next version FailoverStrategy interfaces
flinkbot commented on issue #8316: [FLINK-12369] [coordination] Implement a region failover strategy regarding the next version FailoverStrategy interfaces URL: https://github.com/apache/flink/pull/8316#issuecomment-487813728 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review. ## Review Progress * ❓ 1. The [description] looks good. * ❓ 2. There is [consensus] that the contribution should go into Flink. * ❓ 3. Needs [attention] from. * ❓ 4. The change fits into the overall [architecture]. * ❓ 5. Overall code [quality] is good. Please see the [Pull Request Review Guide](https://flink.apache.org/reviewing-prs.html) for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required Bot commands The @flinkbot bot supports the following commands: - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`) - `@flinkbot approve all` to approve all aspects - `@flinkbot approve-until architecture` to approve everything until `architecture` - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention - `@flinkbot disapprove architecture` to remove an approval you gave earlier
[jira] [Updated] (FLINK-12369) Implement a region failover strategy regarding the next version FailoverStrategy interfaces
[ https://issues.apache.org/jira/browse/FLINK-12369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu updated FLINK-12369: Description: It is a re-requisite of FLINK-12068. The new strategy interface design can be found at FLINK-10429. was: It is a re-requisite of FLINK-12068. The new strategy interface design can be found at FLINK-10429. > Implement a region failover strategy regarding the next version > FailoverStrategy interfaces > --- > > Key: FLINK-12369 > URL: https://issues.apache.org/jira/browse/FLINK-12369 > Project: Flink > Issue Type: Task > Components: Runtime / Coordination >Affects Versions: 1.9.0 >Reporter: Zhu Zhu >Assignee: Zhu Zhu >Priority: Major > Fix For: 1.9.0 > > > It is a re-requisite of FLINK-12068. > The new strategy interface design can be found at FLINK-10429. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-12369) Implement a region failover strategy regarding the next version FailoverStrategy interfaces
[ https://issues.apache.org/jira/browse/FLINK-12369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu updated FLINK-12369: Summary: Implement a region failover strategy regarding the next version FailoverStrategy interfaces (was: Introducing next version failover strategy interfaces and implement a region failover strategy with it) > Implement a region failover strategy regarding the next version > FailoverStrategy interfaces > --- > > Key: FLINK-12369 > URL: https://issues.apache.org/jira/browse/FLINK-12369 > Project: Flink > Issue Type: Task > Components: Runtime / Coordination >Affects Versions: 1.9.0 >Reporter: Zhu Zhu >Assignee: Zhu Zhu >Priority: Major > Fix For: 1.9.0 > > > It is a re-requisite of FLINK-12068. > The new strategy interface design can be found at FLINK-10429. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-12369) Introducing next version failover strategy interfaces and implement a region failover strategy with it
Zhu Zhu created FLINK-12369: --- Summary: Introducing next version failover strategy interfaces and implement a region failover strategy with it Key: FLINK-12369 URL: https://issues.apache.org/jira/browse/FLINK-12369 Project: Flink Issue Type: Task Components: Runtime / Coordination Affects Versions: 1.9.0 Reporter: Zhu Zhu Assignee: Zhu Zhu Fix For: 1.9.0 It is a pre-requisite of FLINK-12068. The new strategy interface design can be found at FLINK-10429. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [flink] tianchen92 closed pull request #8240: [FLINK-12293][docs, datastream]Fix some comment typos in flink-streaming-java.
tianchen92 closed pull request #8240: [FLINK-12293][docs, datastream]Fix some comment typos in flink-streaming-java. URL: https://github.com/apache/flink/pull/8240
[GitHub] [flink] tianchen92 commented on issue #8240: [FLINK-12293][docs, datastream]Fix some comment typos in flink-streaming-java.
tianchen92 commented on issue #8240: [FLINK-12293][docs, datastream]Fix some comment typos in flink-streaming-java. URL: https://github.com/apache/flink/pull/8240#issuecomment-487809375 @rmetzger Thanks for replying, I have closed this PR.
[GitHub] [flink] kisimple commented on issue #7757: [FLINK-11630] Triggers the termination of all running Tasks when shutting down TaskExecutor
kisimple commented on issue #7757: [FLINK-11630] Triggers the termination of all running Tasks when shutting down TaskExecutor URL: https://github.com/apache/flink/pull/7757#issuecomment-487809077 Hi @azagrebin many thanks for your review and very sorry for the late reply, I will address the comments in the next few days :)
[jira] [Comment Edited] (FLINK-10929) Add support for Apache Arrow
[ https://issues.apache.org/jira/browse/FLINK-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826677#comment-16826677 ] Liya Fan edited comment on FLINK-10929 at 4/30/19 3:01 AM: --- Hi [Stephan Ewen|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=StephanEwen] and [Kurt Young|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=ykt836], thanks a lot for your valuable comments. It seems there is not much debate for point 1). For point 2), I want to list a few advantages of using Arrow as internal data structures for Flink. These conclusions are based on our investigations of research papers, as well as our engineering experiences in recent development on Blink. # Arrow adopts columnar data format, which makes it possible to apply SIMD ([link|[http://prestodb.rocks/code/simd/])|http://prestodb.rocks/code/simd/%5d).]. SIMD means higher performance. Java starts to support SIMD in Java 8, and it is expected that high versions of Java will provide better support for SIMD. # Arrow provides more compact memory layout. To make it simple, encoding a batch of rows as Arrow vectors will require much less memory space, compared with encoding by BinaryRow. This, in turn, leads to two results: ## With the same amount of memory space, more data can be loaded into memory, and fewer spills are required. ## The cost of shuffling can be reduced significantly. For example, the amount of shuffled data for TPC-H Q4 reduces to almost the half. # Columnar format makes it much easier for data compression (see [paper|http://db.lcs.mit.edu/projects/cstore/abadisigmod06.pdf], for example). Our initial experiments have shown some positive results. There are more advantages for Arrow, but I just list a few, due to space constraints. The TPC-H performance in the attachment speaks for itself. We understand and respect that Blink is currently being merged, and some changes should be postponed. 
We would like to offer our devoted help in this process. However, the scale of the change is not a good reason to reject Arrow, if we want to make Flink a world-class computing engine. All these difficulties can be overcome, and we have a plan to carry out the changes incrementally. was (Author: fan_li_ya): Hi [Stephan Ewen|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=StephanEwen] and [Kurt Young|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=ykt836], thanks a lot for your valuable comments. It seems there is not much debate about point 1). For point 2), I want to list a few advantages of using Arrow as the internal data structure for Flink. These conclusions are based on our investigation of research papers, as well as our engineering experience in secondary development on Blink. # Arrow adopts a columnar data format, which makes it possible to apply SIMD ([link|http://prestodb.rocks/code/simd/]). SIMD means higher performance. Java started to support SIMD in Java 8, and later Java versions are expected to provide better support for it. # Arrow provides a more compact memory layout. Simply put, encoding a batch of rows as Arrow vectors requires much less memory than encoding them with BinaryRow. This, in turn, has two consequences: ## With the same amount of memory, more data can be loaded into memory and fewer spills are required. ## The cost of shuffling can be reduced significantly. For example, the amount of shuffled data for TPC-H Q4 is reduced by almost half. # The columnar format also makes data compression much easier (see this [paper|http://db.lcs.mit.edu/projects/cstore/abadisigmod06.pdf], for example). Our initial experiments have shown some positive results. Arrow has more advantages, but I only list a few due to space constraints. The TPC-H performance in the attachment speaks for itself. 
We understand and respect that Blink is currently being merged, and some changes should be postponed. We would like to offer our devoted help in this process. However, the scale of the change is not a good reason to reject Arrow, if we want to make Flink a world-class computing engine. All these difficulties can be overcome, and we have a plan to carry out the changes incrementally. > Add support for Apache Arrow > > > Key: FLINK-10929 > URL: https://issues.apache.org/jira/browse/FLINK-10929 > Project: Flink > Issue Type: Wish > Components: Runtime / State Backends >Reporter: Pedro Cardoso Silva >Priority: Minor > Attachments: image-2019-04-10-13-43-08-107.png > > > Investigate the possibility of adding support for Apache Arrow as a > standardized columnar, memory format for data. > Given the activity that
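The columnar-layout argument above can be illustrated with a minimal, self-contained sketch. This is not Arrow's actual API (Arrow's Java vectors live in `org.apache.arrow.vector`); it only shows why summing one field over a column-oriented batch gives the JIT a tight, contiguous loop it can auto-vectorize with SIMD, while a row-oriented layout forces a pointer chase per record. The class and field names here are illustrative.

```java
// Illustrative sketch (NOT Arrow's API): row layout vs. columnar layout.
public class ColumnarSketch {

    // Row-oriented: each record is a separate object; fields are interleaved.
    static final class RowRecord {
        final int a;
        final double b;
        RowRecord(int a, double b) { this.a = a; this.b = b; }
    }

    // Column-oriented: one contiguous primitive array per field.
    static final class ColumnBatch {
        final int[] a;
        final double[] b;
        ColumnBatch(int[] a, double[] b) { this.a = a; this.b = b; }
    }

    static long sumRows(RowRecord[] rows) {
        long sum = 0;
        for (RowRecord r : rows) {
            sum += r.a; // one pointer dereference per record
        }
        return sum;
    }

    static long sumColumn(ColumnBatch batch) {
        long sum = 0;
        for (int v : batch.a) {
            sum += v; // tight loop over a contiguous int[]: SIMD-friendly
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 1000;
        RowRecord[] rows = new RowRecord[n];
        int[] a = new int[n];
        double[] b = new double[n];
        for (int i = 0; i < n; i++) {
            rows[i] = new RowRecord(i, i * 0.5);
            a[i] = i;
            b[i] = i * 0.5;
        }
        // Same logical result from both layouts; only memory access differs.
        System.out.println(sumRows(rows) == sumColumn(new ColumnBatch(a, b)));
    }
}
```

The same contiguity is what makes the columnar batch cheaper to shuffle and easier to compress, as the comment argues.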
[jira] [Assigned] (FLINK-10446) Use the "guava beta checker" plugin to keep off of @Beta API
[ https://issues.apache.org/jira/browse/FLINK-10446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ji Liu reassigned FLINK-10446: -- Assignee: (was: Ji Liu) > Use the "guava beta checker" plugin to keep off of @Beta API > > > Key: FLINK-10446 > URL: https://issues.apache.org/jira/browse/FLINK-10446 > Project: Flink > Issue Type: Task > Components: Build System >Reporter: Ted Yu >Priority: Major > > The Guava people publish an Error Prone plugin to detect when stuff that's > annotated with @Beta gets used. Those things shouldn't be used because the > project gives no promises about deprecating before removal. > plugin: > https://github.com/google/guava-beta-checker -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (FLINK-12145) Host config in flink-web is incorrect
[ https://issues.apache.org/jira/browse/FLINK-12145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ji Liu reassigned FLINK-12145: -- Assignee: (was: Ji Liu) > Host config in flink-web is incorrect > - > > Key: FLINK-12145 > URL: https://issues.apache.org/jira/browse/FLINK-12145 > Project: Flink > Issue Type: Bug >Reporter: Ji Liu >Priority: Minor > > In [https://flink.apache.org/improve-website.html] > its description is "Open your browser at > [http://localhost:4000|http://localhost:4000/] to view the website", > but in flink-web project, the host config in _config.yml is 0.0.0.0. In this > case when we execute {{./build.sh -p}}, it will print info like "Server > address: http://0.0.0.0:4000/", which is actually invalid. I think the default > host should be replaced with "localhost". -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (FLINK-12349) invalid
[ https://issues.apache.org/jira/browse/FLINK-12349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ji Liu reassigned FLINK-12349: -- Assignee: (was: Ji Liu) > invalid > --- > > Key: FLINK-12349 > URL: https://issues.apache.org/jira/browse/FLINK-12349 > Project: Flink > Issue Type: Improvement >Reporter: Ji Liu >Priority: Trivial > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [flink] FaxianZhao commented on a change in pull request #8248: [FLINK-12309][JDBC] JDBCOoutputFormat and JDBCAppendTableSink float behavior is not align
FaxianZhao commented on a change in pull request #8248: [FLINK-12309][JDBC] JDBCOoutputFormat and JDBCAppendTableSink float behavior is not align URL: https://github.com/apache/flink/pull/8248#discussion_r279599610 ## File path: flink-connectors/flink-jdbc/src/main/java/org/apache/flink/api/java/io/jdbc/JDBCOutputFormat.java ## @@ -160,7 +160,11 @@ public void writeRecord(Row row) throws IOException { break; case java.sql.Types.FLOAT: case java.sql.Types.DOUBLE: - upload.setDouble(index + 1, (double) row.getField(index)); + if (row.getField(index).getClass() == Float.class) { Review comment: agree This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
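The diff fragment above shows only the first line of the guarded branch. A hedged, self-contained sketch of the idea under review (names here are hypothetical, not the PR's final code): when the declared SQL type is FLOAT or DOUBLE but the record field arrived as a `Float`, widen it explicitly instead of casting `Object` straight to `double`, which throws `ClassCastException` for boxed floats.

```java
// Hypothetical sketch of the fix discussed in the PR: explicit widening
// of Float fields before calling PreparedStatement.setDouble(...).
public final class FloatWidening {

    static double toDouble(Object field) {
        if (field instanceof Float) {
            // Object -> (double) fails for a boxed Float; widen explicitly.
            return ((Float) field).doubleValue();
        }
        // Field is already a Double.
        return (double) field;
    }

    public static void main(String[] args) {
        // Both a float-typed and a double-typed field can now feed a
        // DOUBLE column without a ClassCastException.
        System.out.println(toDouble(0.99f));
        System.out.println(toDouble(0.99d));
    }
}
```

Note that widening `float` to `double` can expose representation error (0.99f does not widen to exactly 0.99), which is a reason the PR also tests this path.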
[GitHub] [flink] FaxianZhao commented on a change in pull request #8248: [FLINK-12309][JDBC] JDBCOoutputFormat and JDBCAppendTableSink float behavior is not align
FaxianZhao commented on a change in pull request #8248: [FLINK-12309][JDBC] JDBCOoutputFormat and JDBCAppendTableSink float behavior is not align URL: https://github.com/apache/flink/pull/8248#discussion_r279599519 ## File path: flink-connectors/flink-jdbc/src/test/java/org/apache/flink/api/java/io/jdbc/JDBCOutputFormatTest.java ## @@ -133,6 +133,32 @@ public void testExceptionOnInvalidType() throws IOException { jdbcOutputFormat.writeRecord(row); } + @Test(expected = RuntimeException.class) + public void testCastFloatToDoubleType() throws IOException { + jdbcOutputFormat = JDBCOutputFormat.buildJDBCOutputFormat() + .setDrivername(DRIVER_CLASS) + .setDBUrl(DB_URL) + .setQuery(String.format(INSERT_TEMPLATE, INPUT_TABLE)) + .setSqlTypes(new int[] { + Types.INTEGER, + Types.VARCHAR, + Types.VARCHAR, + Types.FLOAT, + Types.VARCHAR}) + .finish(); + jdbcOutputFormat.open(0, 1); + + Row row = new Row(5); + row.setField(0, 4); + row.setField(1, "hello"); + row.setField(2, "world"); + row.setField(3, 0.99f); // jdbcOutputFormat will cast float to double Review comment: agree This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] FaxianZhao commented on a change in pull request #8248: [FLINK-12309][JDBC] JDBCOoutputFormat and JDBCAppendTableSink float behavior is not align
FaxianZhao commented on a change in pull request #8248: [FLINK-12309][JDBC] JDBCOoutputFormat and JDBCAppendTableSink float behavior is not align URL: https://github.com/apache/flink/pull/8248#discussion_r279599496 ## File path: flink-connectors/flink-jdbc/src/test/java/org/apache/flink/api/java/io/jdbc/JDBCOutputFormatTest.java ## @@ -133,6 +133,32 @@ public void testExceptionOnInvalidType() throws IOException { jdbcOutputFormat.writeRecord(row); } + @Test(expected = RuntimeException.class) Review comment: agree. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] hequn8128 commented on issue #8301: [FLINK-12357][api-java][hotfix] Remove useless code in TableConfig
hequn8128 commented on issue #8301: [FLINK-12357][api-java][hotfix] Remove useless code in TableConfig URL: https://github.com/apache/flink/pull/8301#issuecomment-487796670 @twalthr Thank you very much for the review. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279592781 ## File path: flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/ReadableCatalog.java ## @@ -224,4 +224,54 @@ CatalogPartition getPartition(ObjectPath tablePath, CatalogPartitionSpec partiti */ boolean functionExists(ObjectPath functionPath) throws CatalogException; + // -- Statistics -- + + /** +* Get the statistics of a table. +* +* @param tablePath path of the table +* @return the statistics of the given table +* +* @throws TableNotExistException if the table does not exist in the catalog +* @throws CatalogException in case of any runtime exception +*/ + CatalogTableStatistics getTableStatistics(ObjectPath tablePath) throws TableNotExistException, CatalogException; + + /** +* Get the column statistics of a table. +* +* @param tablePath path of the table +* @return the column statistics of the given table +* +* @throws TableNotExistException if the table does not exist in the catalog +* @throws CatalogException in case of any runtime exception +*/ + CatalogColumnStatistics getTableColumnStatistics(ObjectPath tablePath) throws TableNotExistException, CatalogException; Review comment: ok, no problem. Please also take the APIs in ReadableWritableCatalog into consideration This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] KurtYoung commented on a change in pull request #8302: [FLINK-12269][table-blink] Support Temporal Table Join in blink planner and runtime
KurtYoung commented on a change in pull request #8302: [FLINK-12269][table-blink] Support Temporal Table Join in blink planner and runtime URL: https://github.com/apache/flink/pull/8302#discussion_r279592398 ## File path: flink-table/flink-table-planner-blink/src/test/resources/org/apache/flink/table/plan/batch/sql/join/TemporalTableJoinTest.xml ## @@ -0,0 +1,335 @@ (diff of the new 335-line XML plan file; the tag content was stripped in this digest)
[GitHub] [flink] flinkbot edited a comment on issue #8238: [FLINK-11732] [docs] Add a language switch to the sidebar for Flink documents
flinkbot edited a comment on issue #8238: [FLINK-11732] [docs] Add a language switch to the sidebar for Flink documents URL: https://github.com/apache/flink/pull/8238#issuecomment-485405256 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review. ## Review Progress * ✅ 1. The [description] looks good. - Approved by @sunjincheng121 [committer] * ✅ 2. There is [consensus] that the contribution should go into Flink. - Approved by @sunjincheng121 [committer] * ❓ 3. Needs [attention] from. * ✅ 4. The change fits into the overall [architecture]. - Approved by @sunjincheng121 [committer] * ✅ 5. Overall code [quality] is good. - Approved by @sunjincheng121 [committer] Please see the [Pull Request Review Guide](https://flink.apache.org/reviewing-prs.html) for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required Bot commands The @flinkbot bot supports the following commands: - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`) - `@flinkbot approve all` to approve all aspects - `@flinkbot approve-until architecture` to approve everything until `architecture` - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention - `@flinkbot disapprove architecture` to remove an approval you gave earlier This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] sunjincheng121 commented on a change in pull request #8238: [FLINK-11732] [docs] Add a language switch to the sidebar for Flink documents
sunjincheng121 commented on a change in pull request #8238: [FLINK-11732] [docs] Add a language switch to the sidebar for Flink documents URL: https://github.com/apache/flink/pull/8238#discussion_r279583940 ## File path: docs/_layouts/plain.html ## @@ -64,3 +64,9 @@ {{ page.title }}{% if page.is_beta %} Beta{% endif %} {{ content }} + +{% if page.language == "zh" %} + +<a href="https://cwiki.apache.org/confluence/display/FLINK/Flink+Translation+Specifications" target="_blank">想参与贡献翻译?</a> Review comment: I like the idea of adding this link on this page. One suggestion: could we also add this link to the EN version, so that everyone can see this info in every language version? (just a suggestion) :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] flinkbot edited a comment on issue #8238: [FLINK-11732] [docs] Add a language switch to the sidebar for Flink documents
flinkbot edited a comment on issue #8238: [FLINK-11732] [docs] Add a language switch to the sidebar for Flink documents URL: https://github.com/apache/flink/pull/8238#issuecomment-485405256 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review. ## Review Progress * ✅ 1. The [description] looks good. - Approved by @sunjincheng121 [committer] * ✅ 2. There is [consensus] that the contribution should go into Flink. - Approved by @sunjincheng121 [committer] * ❓ 3. Needs [attention] from. * ✅ 4. The change fits into the overall [architecture]. - Approved by @sunjincheng121 [committer] * ❓ 5. Overall code [quality] is good. Please see the [Pull Request Review Guide](https://flink.apache.org/reviewing-prs.html) for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required Bot commands The @flinkbot bot supports the following commands: - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`) - `@flinkbot approve all` to approve all aspects - `@flinkbot approve-until architecture` to approve everything until `architecture` - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention - `@flinkbot disapprove architecture` to remove an approval you gave earlier This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] sunjincheng121 commented on issue #8238: [FLINK-11732] [docs] Add a language switch to the sidebar for Flink documents
sunjincheng121 commented on issue #8238: [FLINK-11732] [docs] Add a language switch to the sidebar for Flink documents URL: https://github.com/apache/flink/pull/8238#issuecomment-487785374 @flinkbot approve-until architecture This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] xuefuz commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
xuefuz commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279582178 ## File path: flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/ReadableCatalog.java ## @@ -224,4 +224,54 @@ CatalogPartition getPartition(ObjectPath tablePath, CatalogPartitionSpec partiti */ boolean functionExists(ObjectPath functionPath) throws CatalogException; + // -- Statistics -- + + /** +* Get the statistics of a table. +* +* @param tablePath path of the table +* @return the statistics of the given table +* +* @throws TableNotExistException if the table does not exist in the catalog +* @throws CatalogException in case of any runtime exception +*/ + CatalogTableStatistics getTableStatistics(ObjectPath tablePath) throws TableNotExistException, CatalogException; + + /** +* Get the column statistics of a table. +* +* @param tablePath path of the table +* @return the column statistics of the given table +* +* @throws TableNotExistException if the table does not exist in the catalog +* @throws CatalogException in case of any runtime exception +*/ + CatalogColumnStatistics getTableColumnStatistics(ObjectPath tablePath) throws TableNotExistException, CatalogException; + + /** +* Get the statistics of a partition. +* +* @param tablePath path of the table +* @param partitionSpec partition spec of the partition +* @return the statistics of the given partition +* +* @throws PartitionNotExistException if the partition is not partitioned +* @throws CatalogException in case of any runtime exception +*/ + CatalogTableStatistics getPartitionStatistics(ObjectPath tablePath, CatalogPartitionSpec partitionSpec) Review comment: partition not exist exception. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] xuefuz commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
xuefuz commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279582074 ## File path: flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/ReadableCatalog.java ## @@ -224,4 +224,54 @@ CatalogPartition getPartition(ObjectPath tablePath, CatalogPartitionSpec partiti */ boolean functionExists(ObjectPath functionPath) throws CatalogException; + // -- Statistics -- + + /** +* Get the statistics of a table. +* +* @param tablePath path of the table +* @return the statistics of the given table +* +* @throws TableNotExistException if the table does not exist in the catalog +* @throws CatalogException in case of any runtime exception +*/ + CatalogTableStatistics getTableStatistics(ObjectPath tablePath) throws TableNotExistException, CatalogException; + + /** +* Get the column statistics of a table. +* +* @param tablePath path of the table +* @return the column statistics of the given table +* +* @throws TableNotExistException if the table does not exist in the catalog +* @throws CatalogException in case of any runtime exception +*/ + CatalogColumnStatistics getTableColumnStatistics(ObjectPath tablePath) throws TableNotExistException, CatalogException; Review comment: Called on partitioned table is fine, though it may not exist or be accurate. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] xuefuz commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
xuefuz commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279581962 ## File path: flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/ReadableCatalog.java ## @@ -224,4 +224,54 @@ CatalogPartition getPartition(ObjectPath tablePath, CatalogPartitionSpec partiti */ boolean functionExists(ObjectPath functionPath) throws CatalogException; + // -- Statistics -- + + /** +* Get the statistics of a table. +* +* @param tablePath path of the table +* @return the statistics of the given table +* +* @throws TableNotExistException if the table does not exist in the catalog +* @throws CatalogException in case of any runtime exception +*/ + CatalogTableStatistics getTableStatistics(ObjectPath tablePath) throws TableNotExistException, CatalogException; + + /** +* Get the column statistics of a table. +* +* @param tablePath path of the table +* @return the column statistics of the given table +* +* @throws TableNotExistException if the table does not exist in the catalog +* @throws CatalogException in case of any runtime exception +*/ + CatalogColumnStatistics getTableColumnStatistics(ObjectPath tablePath) throws TableNotExistException, CatalogException; Review comment: good thinking. Let me rethink this. It seems to me that column stats of a table should be a list of stats for all columns. This probably requires some rework. Stay tuned. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] xuefuz commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
xuefuz commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279581568 ## File path: flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/ReadableCatalog.java ## @@ -224,4 +224,54 @@ CatalogPartition getPartition(ObjectPath tablePath, CatalogPartitionSpec partiti */ boolean functionExists(ObjectPath functionPath) throws CatalogException; + // -- Statistics -- + + /** +* Get the statistics of a table. +* +* @param tablePath path of the table +* @return the statistics of the given table +* +* @throws TableNotExistException if the table does not exist in the catalog +* @throws CatalogException in case of any runtime exception +*/ + CatalogTableStatistics getTableStatistics(ObjectPath tablePath) throws TableNotExistException, CatalogException; Review comment: Partitioned table has its own, table-level stats, though it may not be complete or accurate, as in Hive's case. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] xuefuz commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
xuefuz commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279581120 ## File path: flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/catalog/GenericInMemoryCatalog.java ## @@ -224,6 +235,8 @@ public void dropTable(ObjectPath tablePath, boolean ignoreIfNotExists) throws Ta tables.remove(tablePath); Review comment: Got it. Good catch! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] xuefuz commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
xuefuz commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279581079 ## File path: flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/catalog/GenericInMemoryCatalog.java ## @@ -246,6 +259,16 @@ public void renameTable(ObjectPath tablePath, String newTableName, boolean ignor if (partitions.containsKey(tablePath)) { partitions.put(newPath, partitions.remove(tablePath)); } + Review comment: Yeah, you're right. Good catch. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279573210 ## File path: flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/ReadableCatalog.java ## @@ -224,4 +224,54 @@ CatalogPartition getPartition(ObjectPath tablePath, CatalogPartitionSpec partiti */ boolean functionExists(ObjectPath functionPath) throws CatalogException; + // -- Statistics -- + + /** +* Get the statistics of a table. +* +* @param tablePath path of the table +* @return the statistics of the given table +* +* @throws TableNotExistException if the table does not exist in the catalog +* @throws CatalogException in case of any runtime exception +*/ + CatalogTableStatistics getTableStatistics(ObjectPath tablePath) throws TableNotExistException, CatalogException; + + /** +* Get the column statistics of a table. +* +* @param tablePath path of the table +* @return the column statistics of the given table +* +* @throws TableNotExistException if the table does not exist in the catalog +* @throws CatalogException in case of any runtime exception +*/ + CatalogColumnStatistics getTableColumnStatistics(ObjectPath tablePath) throws TableNotExistException, CatalogException; + + /** +* Get the statistics of a partition. +* +* @param tablePath path of the table +* @param partitionSpec partition spec of the partition +* @return the statistics of the given partition +* +* @throws PartitionNotExistException if the partition is not partitioned +* @throws CatalogException in case of any runtime exception +*/ + CatalogTableStatistics getPartitionStatistics(ObjectPath tablePath, CatalogPartitionSpec partitionSpec) + throws PartitionNotExistException, CatalogException; + + /** +* Get the column statistics of a partition. 
+* +* @param tablePath path of the table +* @param partitionSpec partition spec of the partition +* @return the column statistics of the given partition +* +* @throws PartitionNotExistException if the partition is not partitioned +* @throws CatalogException in case of any runtime exception +*/ + CatalogColumnStatistics getPartitionColumnStatistics(ObjectPath tablePath, CatalogPartitionSpec partitionSpec) Review comment: ditto. What does this API try to accomplish? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279573127 ## File path: flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/ReadableCatalog.java ## @@ -224,4 +224,54 @@ CatalogPartition getPartition(ObjectPath tablePath, CatalogPartitionSpec partiti */ boolean functionExists(ObjectPath functionPath) throws CatalogException; + // -- Statistics -- + + /** +* Get the statistics of a table. +* +* @param tablePath path of the table +* @return the statistics of the given table +* +* @throws TableNotExistException if the table does not exist in the catalog +* @throws CatalogException in case of any runtime exception +*/ + CatalogTableStatistics getTableStatistics(ObjectPath tablePath) throws TableNotExistException, CatalogException; + + /** +* Get the column statistics of a table. +* +* @param tablePath path of the table +* @return the column statistics of the given table +* +* @throws TableNotExistException if the table does not exist in the catalog +* @throws CatalogException in case of any runtime exception +*/ + CatalogColumnStatistics getTableColumnStatistics(ObjectPath tablePath) throws TableNotExistException, CatalogException; + + /** +* Get the statistics of a partition. +* +* @param tablePath path of the table +* @param partitionSpec partition spec of the partition +* @return the statistics of the given partition +* +* @throws PartitionNotExistException if the partition is not partitioned +* @throws CatalogException in case of any runtime exception +*/ + CatalogTableStatistics getPartitionStatistics(ObjectPath tablePath, CatalogPartitionSpec partitionSpec) Review comment: what happens if this API is called on a non-partitioned table? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279572332 ## File path: flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/ReadableCatalog.java ## @@ -224,4 +224,54 @@ CatalogPartition getPartition(ObjectPath tablePath, CatalogPartitionSpec partiti */ boolean functionExists(ObjectPath functionPath) throws CatalogException; + // -- Statistics -- + + /** +* Get the statistics of a table. +* +* @param tablePath path of the table +* @return the statistics of the given table +* +* @throws TableNotExistException if the table does not exist in the catalog +* @throws CatalogException in case of any runtime exception +*/ + CatalogTableStatistics getTableStatistics(ObjectPath tablePath) throws TableNotExistException, CatalogException; + + /** +* Get the column statistics of a table. +* +* @param tablePath path of the table +* @return the column statistics of the given table +* +* @throws TableNotExistException if the table does not exist in the catalog +* @throws CatalogException in case of any runtime exception +*/ + CatalogColumnStatistics getTableColumnStatistics(ObjectPath tablePath) throws TableNotExistException, CatalogException; + + /** +* Get the statistics of a partition. +* +* @param tablePath path of the table +* @param partitionSpec partition spec of the partition +* @return the statistics of the given partition +* +* @throws PartitionNotExistException if the partition is not partitioned Review comment: the wordage "if the partition is not partitioned" needs some rework This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279571858 ## File path: flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/ReadableCatalog.java ## @@ -224,4 +224,54 @@ CatalogPartition getPartition(ObjectPath tablePath, CatalogPartitionSpec partiti */ boolean functionExists(ObjectPath functionPath) throws CatalogException; + // -- Statistics -- + + /** +* Get the statistics of a table. +* +* @param tablePath path of the table +* @return the statistics of the given table +* +* @throws TableNotExistException if the table does not exist in the catalog +* @throws CatalogException in case of any runtime exception +*/ + CatalogTableStatistics getTableStatistics(ObjectPath tablePath) throws TableNotExistException, CatalogException; Review comment: What happens if this API is called on a partitioned table? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279572488 ## File path: flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/ReadableCatalog.java ## @@ -224,4 +224,54 @@ CatalogPartition getPartition(ObjectPath tablePath, CatalogPartitionSpec partiti */ boolean functionExists(ObjectPath functionPath) throws CatalogException; + // -- Statistics -- + + /** +* Get the statistics of a table. +* +* @param tablePath path of the table +* @return the statistics of the given table +* +* @throws TableNotExistException if the table does not exist in the catalog +* @throws CatalogException in case of any runtime exception +*/ + CatalogTableStatistics getTableStatistics(ObjectPath tablePath) throws TableNotExistException, CatalogException; + + /** +* Get the column statistics of a table. +* +* @param tablePath path of the table +* @return the column statistics of the given table +* +* @throws TableNotExistException if the table does not exist in the catalog +* @throws CatalogException in case of any runtime exception +*/ + CatalogColumnStatistics getTableColumnStatistics(ObjectPath tablePath) throws TableNotExistException, CatalogException; Review comment: What does this API try to accomplish? If this is for all columns of the table, should it return a list of `CatalogColumnStatistics`? If it's for a single column, should it take a column name? Besides, what happens if this API is called on a partitioned table? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279570916 ## File path: flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/catalog/GenericInMemoryCatalog.java ## @@ -224,6 +235,8 @@ public void dropTable(ObjectPath tablePath, boolean ignoreIfNotExists) throws Ta tables.remove(tablePath); Review comment: Don't lines 238/239 only drop partition stats and partition column stats?
[GitHub] [flink] bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279568940 ## File path: flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/catalog/GenericInMemoryCatalog.java ## @@ -246,6 +259,16 @@ public void renameTable(ObjectPath tablePath, String newTableName, boolean ignor if (partitions.containsKey(tablePath)) { partitions.put(newPath, partitions.remove(tablePath)); } + Review comment: Lines 263-271 only relocate table stats and table column stats; did I miss anything?
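The relocation being discussed here can be sketched with plain maps. This is a simplified illustration, not Flink's actual `GenericInMemoryCatalog` code: the map names and string stand-ins for the statistics objects are hypothetical, but the move-under-new-key pattern mirrors what the review comments describe.

```java
import java.util.HashMap;
import java.util.Map;

public class RenameStatsSketch {
    // Simplified stand-ins for the catalog's internal per-path bookkeeping maps.
    static final Map<String, String> tableStats = new HashMap<>();
    static final Map<String, String> tableColumnStats = new HashMap<>();

    // On rename, relocate any stats registered under the old path to the new path,
    // so the renamed table keeps its statistics.
    static void relocateStats(String oldPath, String newPath) {
        if (tableStats.containsKey(oldPath)) {
            tableStats.put(newPath, tableStats.remove(oldPath));
        }
        if (tableColumnStats.containsKey(oldPath)) {
            tableColumnStats.put(newPath, tableColumnStats.remove(oldPath));
        }
    }

    public static void main(String[] args) {
        tableStats.put("db1.t1", "rowCount=5");
        tableColumnStats.put("db1.t1", "ndv=20");
        relocateStats("db1.t1", "db1.t2");
        System.out.println(tableStats.containsKey("db1.t1"));  // false
        System.out.println(tableStats.get("db1.t2"));          // rowCount=5
    }
}
```

Bowen's question is whether partition-level stats need the same treatment; the sketch only covers the two table-level maps the original lines 263-271 touch.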
[GitHub] [flink] bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279569853 ## File path: flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/ReadableCatalog.java ## @@ -224,4 +224,54 @@ CatalogPartition getPartition(ObjectPath tablePath, CatalogPartitionSpec partiti */ boolean functionExists(ObjectPath functionPath) throws CatalogException; + // -- Statistics -- + + /** +* Get the statistics of a table. +* +* @param tablePath path of the table +* @return the statistics of the given table, null if its statistics don't exist or are unknown. Review comment: Yeah, it doesn't make sense to throw `stats not exist exception`. Just using the table API and an example to demonstrate the behavior difference.
[GitHub] [flink] flinkbot commented on issue #8315: [FLINK-12368] add subtask index to FlinkKafkaConsumerBase logging, wh…
flinkbot commented on issue #8315: [FLINK-12368] add subtask index to FlinkKafkaConsumerBase logging, wh… URL: https://github.com/apache/flink/pull/8315#issuecomment-48772 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review. ## Review Progress * ❓ 1. The [description] looks good. * ❓ 2. There is [consensus] that the contribution should go into Flink. * ❓ 3. Needs [attention] from. * ❓ 4. The change fits into the overall [architecture]. * ❓ 5. Overall code [quality] is good. Please see the [Pull Request Review Guide](https://flink.apache.org/reviewing-prs.html) for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required Bot commands The @flinkbot bot supports the following commands: - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`) - `@flinkbot approve all` to approve all aspects - `@flinkbot approve-until architecture` to approve everything until `architecture` - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention - `@flinkbot disapprove architecture` to remove an approval you gave earlier
[jira] [Updated] (FLINK-12368) add subtask index to FlinkKafkaConsumerBase logging, which can be very useful when debugging problem
[ https://issues.apache.org/jira/browse/FLINK-12368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-12368: --- Labels: pull-request-available (was: ) > add subtask index to FlinkKafkaConsumerBase logging, which can be very useful > when debugging problem > > > Key: FLINK-12368 > URL: https://issues.apache.org/jira/browse/FLINK-12368 > Project: Flink > Issue Type: Improvement > Components: Connectors / Kafka >Affects Versions: 1.8.0 >Reporter: Steven Zhen Wu >Assignee: Steven Zhen Wu >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [flink] stevenzwu opened a new pull request #8315: [FLINK-12368] add subtask index to FlinkKafkaConsumerBase logging, wh…
stevenzwu opened a new pull request #8315: [FLINK-12368] add subtask index to FlinkKafkaConsumerBase logging, wh… URL: https://github.com/apache/flink/pull/8315 …ich can be very useful when debugging problem
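The change this PR describes, prefixing `FlinkKafkaConsumerBase` log lines with the subtask index, would look roughly like the sketch below. This is an illustration rather than the PR's actual diff; `getRuntimeContext().getIndexOfThisSubtask()` is Flink's standard API for obtaining the index inside an operator, stubbed here as a plain parameter.

```java
public class SubtaskLoggingSketch {
    // Builds the prefix that would come from getRuntimeContext().getIndexOfThisSubtask()
    // inside the consumer, so that parallel subtasks are distinguishable in the logs.
    static String logPrefix(int subtaskIndex) {
        return String.format("Consumer subtask %d", subtaskIndex);
    }

    public static void main(String[] args) {
        // Before: LOG.info("Setting restore state in the FlinkKafkaConsumer.");
        // After:  LOG.info("{}: Setting restore state in the FlinkKafkaConsumer.", logPrefix(idx));
        String line = logPrefix(3) + ": Setting restore state in the FlinkKafkaConsumer.";
        System.out.println(line);
    }
}
```

With the prefix in place, interleaved log output from a parallelism-N source can be attributed to a specific subtask when debugging.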
[GitHub] [flink] piyushnarang commented on issue #8117: [FLINK-12115] [filesystems]: Add support for AzureFS
piyushnarang commented on issue #8117: [FLINK-12115] [filesystems]: Add support for AzureFS URL: https://github.com/apache/flink/pull/8117#issuecomment-487765637 Thanks for getting back and taking a look @tillrohrmann and @shuai-xu. To answer some of your top level comments / questions: 1) The flink-azure-fs-hadoop jars are written out to the opt/ directory in the flink-dist (based on comments in the original review). I've tested this in local flink jobs; I'm trying to sort out some things to test this out on our internal hadoop cluster. 2) I can add some E2E tests on the lines of `test_shaded_presto_s3`. Do we have an azure bucket at the project level that I should use? Or should I just add the tests similar to the IT test and the folks who run it can fill in their azure details? 3) ITCase with HTTP - Seems like they do support retrieving this information via their REST API (https://docs.microsoft.com/en-us/rest/api/storagerp/storageaccounts/getproperties). I can try and hook this up to the IT case to only run the HTTP tests if `supportsHttpsTrafficOnly` = false. 4) Dependency jars - If I understand correctly, some of these dependent jars (like hadoop-azure / azure-storage) should be part of the hadoop install, right? Or do I need to tweak things to package them?
[jira] [Created] (FLINK-12368) add subtask index to FlinkKafkaConsumerBase logging, which can be very useful when debugging problem
Steven Zhen Wu created FLINK-12368: -- Summary: add subtask index to FlinkKafkaConsumerBase logging, which can be very useful when debugging problem Key: FLINK-12368 URL: https://issues.apache.org/jira/browse/FLINK-12368 Project: Flink Issue Type: Improvement Components: Connectors / Kafka Affects Versions: 1.8.0 Reporter: Steven Zhen Wu Assignee: Steven Zhen Wu -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [flink] xuefuz commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
xuefuz commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279557567 ## File path: flink-table/flink-table-api-java/src/test/java/org/apache/flink/table/catalog/GenericInMemoryCatalogTest.java ## @@ -1104,6 +1104,43 @@ public void testDropFunction_FunctionNotExist_ignored() throws Exception { catalog.dropDatabase(db1, false); } + // -- statistics -- + + @Test + public void testStatistics() throws Exception { + // Table related + catalog.createDatabase(db1, createDb(), false); + GenericCatalogTable table = createTable(); + catalog.createTable(path1, table, false); + assertTrue(catalog.getTableStatistics(path1) == null); + assertTrue(catalog.getTableColumnStatistics(path1) == null); + CatalogTableStatistics tableStatistics = new CatalogTableStatistics(5, 2, 100, 575); + catalog.alterTableStatistics(path1, tableStatistics, false); + CatalogTestUtil.checkEquals(tableStatistics, catalog.getTableStatistics(path1)); + CatalogColumnStatistics columnStatistics = new CatalogColumnStatistics(20L, 2L, 40.5, 17, 45, 15); + catalog.alterTableColumnStatistics(path1, columnStatistics, false); + CatalogTestUtil.checkEquals(columnStatistics, catalog.getTableColumnStatistics(path1)); + + // Partition related + catalog.createDatabase(db2, createDb(), false); + GenericCatalogTable table2 = createPartitionedTable(); + catalog.createTable(path2, table2, false); + CatalogPartitionSpec partitionSpec = createPartitionSpec(); + catalog.createPartition(path2, partitionSpec, createPartition(), false); + assertTrue(catalog.getPartitionStatistics(path2, partitionSpec) == null); + assertTrue(catalog.getPartitionColumnStatistics(path2, partitionSpec) == null); + catalog.alterPartitionStatistics(path2, partitionSpec, tableStatistics, false); + CatalogTestUtil.checkEquals(tableStatistics, catalog.getPartitionStatistics(path2, partitionSpec)); + catalog.alterPartitionColumnStatistics(path2, partitionSpec, columnStatistics, false); + CatalogTestUtil.checkEquals(columnStatistics, catalog.getPartitionColumnStatistics(path2, partitionSpec)); + + // Clean up Review comment: Yeah. I guess they are harmless to be here and make this testcase complete on its own.
[GitHub] [flink] xuefuz commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
xuefuz commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279555978 ## File path: flink-table/flink-table-api-java/src/test/java/org/apache/flink/table/catalog/GenericInMemoryCatalogTest.java ## @@ -1104,6 +1104,43 @@ public void testDropFunction_FunctionNotExist_ignored() throws Exception { catalog.dropDatabase(db1, false); } + // -- statistics -- Review comment: Well, I think the added test is written differently from the others, which may give that impression. However, it covers all the added implementation methods, as you can see. What's missing, though, is some negative test cases, for which I will create a followup JIRA.
[GitHub] [flink] xuefuz commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
xuefuz commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279554736 ## File path: flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/ReadableCatalog.java ## @@ -224,4 +224,54 @@ CatalogPartition getPartition(ObjectPath tablePath, CatalogPartitionSpec partiti */ boolean functionExists(ObjectPath functionPath) throws CatalogException; + // -- Statistics -- + + /** +* Get the statistics of a table. +* +* @param tablePath path of the table +* @return the statistics of the given table, null if its statistics don't exist or are unknown. Review comment: Makes sense. I will leave out the "null" comment in those APIs and let implementations decide. It doesn't seem to make sense to throw a "stats not exist" exception, though.
[GitHub] [flink] xuefuz commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
xuefuz commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279551799 ## File path: flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/CatalogColumnStatistics.java ## @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.table.catalog; + +import java.util.HashMap; +import java.util.Map; + +/** + * Column statistics of a non-partitioned table or a partition of a partitioned table. + */ +public class CatalogColumnStatistics { + + /** +* number of distinct values. +*/ + private final Long ndv; + + /** +* number of nulls. +*/ + private final Long nullCount; + + /** +* average length of column values. +*/ + private final Double avgLen; + + /** +* max length of column values. +*/ + private final Integer maxLen; + + /** +* max value of column values. +*/ + private final Number max; Review comment: Good point! This is merely copied from Flink's column stats definition. If we think rework is needed, I think it's better to do it in a followup JIRA.
[GitHub] [flink] xuefuz commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
xuefuz commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279550562 ## File path: flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/catalog/GenericInMemoryCatalog.java ## @@ -246,6 +259,16 @@ public void renameTable(ObjectPath tablePath, String newTableName, boolean ignor if (partitions.containsKey(tablePath)) { partitions.put(newPath, partitions.remove(tablePath)); } + Review comment: Lines 263-271 do that.
[GitHub] [flink] piyushnarang commented on a change in pull request #8117: [FLINK-12115] [filesystems]: Add support for AzureFS
piyushnarang commented on a change in pull request #8117: [FLINK-12115] [filesystems]: Add support for AzureFS URL: https://github.com/apache/flink/pull/8117#discussion_r279550173 ## File path: flink-filesystems/flink-azure-fs-hadoop/src/main/java/org/apache/flink/fs/azurefs/AzureFileSystem.java ## @@ -0,0 +1,77 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.fs.azurefs; + +import org.apache.flink.configuration.Configuration; +import org.apache.flink.core.fs.FileSystemKind; +import org.apache.flink.runtime.fs.hdfs.HadoopFileSystem; +import org.apache.flink.runtime.util.HadoopUtils; + +import org.apache.hadoop.fs.azure.NativeAzureFileSystem; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.IOException; +import java.net.URI; + +/** + * Azure FileSystem connector for Flink. Based on Azure HDFS support in the + * <a href="https://hadoop.apache.org/docs/current/hadoop-azure/index.html">hadoop-azure</a> module. + */ +public class AzureFileSystem extends HadoopFileSystem { + private static final Logger LOG = LoggerFactory.getLogger(AzureFileSystem.class); + + private static final String[] CONFIG_PREFIXES = { "fs.azure.", "azure." }; + + public AzureFileSystem(URI fsUri, Configuration flinkConfig) throws IOException { + super(createInitializedAzureFS(fsUri, flinkConfig)); + } + + // uri is of the form: wasb(s)://yourcontai...@youraccount.blob.core.windows.net/testDir + private static org.apache.hadoop.fs.FileSystem createInitializedAzureFS(URI fsUri, Configuration flinkConfig) throws IOException { Review comment: sure, will update this. I'm thinking I might just do away with the AzureFileSystem class in that case. Have the factory create a HadoopFileSystem which is passed the NativeAzureFileSystem.
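The `CONFIG_PREFIXES` array in the snippet above suggests a forwarding step: any Flink configuration key starting with `fs.azure.` or `azure.` is copied into the Hadoop configuration before the filesystem is initialized. The sketch below illustrates that pattern with plain maps; it is a simplified stand-in, as the real code would go through Flink's `Configuration` and Hadoop's `org.apache.hadoop.conf.Configuration` objects.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfigForwardingSketch {
    static final String[] CONFIG_PREFIXES = { "fs.azure.", "azure." };

    // Copy every Flink config key that starts with one of the prefixes
    // into the (simulated) Hadoop configuration; everything else is ignored.
    static Map<String, String> forwardConfig(Map<String, String> flinkConfig) {
        Map<String, String> hadoopConfig = new HashMap<>();
        for (Map.Entry<String, String> e : flinkConfig.entrySet()) {
            for (String prefix : CONFIG_PREFIXES) {
                if (e.getKey().startsWith(prefix)) {
                    hadoopConfig.put(e.getKey(), e.getValue());
                    break;
                }
            }
        }
        return hadoopConfig;
    }

    public static void main(String[] args) {
        Map<String, String> flinkConfig = new HashMap<>();
        flinkConfig.put("fs.azure.account.key.myaccount.blob.core.windows.net", "secret");
        flinkConfig.put("taskmanager.numberOfTaskSlots", "4");
        Map<String, String> hadoop = forwardConfig(flinkConfig);
        System.out.println(hadoop.size()); // 1: only the azure-prefixed key is forwarded
    }
}
```

The account-key entry above is only an example shape; the exact keys a deployment needs come from the hadoop-azure documentation, not from this sketch.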
[GitHub] [flink] xuefuz commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
xuefuz commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279550088 ## File path: flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/catalog/GenericInMemoryCatalog.java ## @@ -224,6 +235,8 @@ public void dropTable(ObjectPath tablePath, boolean ignoreIfNotExists) throws Ta tables.remove(tablePath); Review comment: They are removed in line#238 and #239 below.
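The point of this exchange is that dropping a table should evict every flavor of statistics keyed by that table's path, not just one or two of them. A minimal sketch of that cleanup, using hypothetical simplified maps in place of `GenericInMemoryCatalog`'s actual bookkeeping:

```java
import java.util.HashMap;
import java.util.Map;

public class DropTableStatsSketch {
    // Hypothetical per-path maps mirroring the four kinds of stats under discussion.
    static final Map<String, String> tableStats = new HashMap<>();
    static final Map<String, String> tableColumnStats = new HashMap<>();
    static final Map<String, String> partitionStats = new HashMap<>();
    static final Map<String, String> partitionColumnStats = new HashMap<>();

    // Dropping a table clears all statistics recorded for it; otherwise stale
    // entries would leak if a table with the same path were recreated later.
    static void dropTable(String path) {
        tableStats.remove(path);
        tableColumnStats.remove(path);
        partitionStats.remove(path);
        partitionColumnStats.remove(path);
    }

    public static void main(String[] args) {
        tableStats.put("db1.t1", "rowCount=5");
        partitionStats.put("db1.t1", "rowCount=2");
        dropTable("db1.t1");
        System.out.println(tableStats.isEmpty());     // true
        System.out.println(partitionStats.isEmpty()); // true
    }
}
```

Bowen's follow-up question (whether lines 238/239 only drop the partition-level maps) is exactly about making sure all four removals happen.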
[GitHub] [flink] piyushnarang commented on a change in pull request #8117: [FLINK-12115] [filesystems]: Add support for AzureFS
piyushnarang commented on a change in pull request #8117: [FLINK-12115] [filesystems]: Add support for AzureFS URL: https://github.com/apache/flink/pull/8117#discussion_r279549575 ## File path: flink-filesystems/flink-azure-fs-hadoop/pom.xml ## @@ -0,0 +1,80 @@ + + +http://maven.apache.org/POM/4.0.0; xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance; + xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd;> + + 4.0.0 + + + org.apache.flink + flink-filesystems + 1.9-SNAPSHOT + .. + + + flink-azure-fs-hadoop + flink-azure-fs-hadoop + + jar + + + + 2.7.0 + + + + + + org.apache.flink + flink-core + ${project.version} + provided + + + + org.apache.flink + flink-hadoop-fs + ${project.version} + + + + org.apache.hadoop + hadoop-azure + ${fs.azure.version} + + + + + org.apache.hadoop + hadoop-hdfs + ${hadoop.version} + test + + + + + org.apache.flink + flink-core + ${project.version} + test + test-jar + + + Review comment: sure will add. How do I go about adding the various notice / license files? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] xuefuz edited a comment on issue #8312: [FLINK-12366][table] Clean up Catalog APIs to make them more consiste…
xuefuz edited a comment on issue #8312: [FLINK-12366][table] Clean up Catalog APIs to make them more consiste… URL: https://github.com/apache/flink/pull/8312#issuecomment-487736112 Thanks, @bowenli86! Updated the PR to address your review feedback above.
[GitHub] [flink] xuefuz commented on issue #8312: [FLINK-12366][table] Clean up Catalog APIs to make them more consiste…
xuefuz commented on issue #8312: [FLINK-12366][table] Clean up Catalog APIs to make them more consiste… URL: https://github.com/apache/flink/pull/8312#issuecomment-487736112 Thanks, Bowen! Updated the PR to address your review feedback above.
[GitHub] [flink] bowenli86 commented on issue #8205: [FLINK-12238] [hive] Support database related operations in GenericHiveMetastoreCatalog and setup flink-connector-hive module
bowenli86 commented on issue #8205: [FLINK-12238] [hive] Support database related operations in GenericHiveMetastoreCatalog and setup flink-connector-hive module URL: https://github.com/apache/flink/pull/8205#issuecomment-487735305 @flinkbot attention @KurtYoung
[jira] [Comment Edited] (FLINK-9172) Support external catalogs in SQL-Client
[ https://issues.apache.org/jira/browse/FLINK-9172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829690#comment-16829690 ] Bowen Li edited comment on FLINK-9172 at 4/29/19 8:36 PM: -- Hi [~eronwright] , sorry to have you wait so long. This is a really valuable contribution, and we definitely should get this PR in. [~twalthr] and us have discussed the yaml config formats [1], let me know if you have any comments. Are you interested in continue working on it and finish it in the next couple weeks? I don't want to give you too much pressure on the timeline :) Can you please rebase and adapt your PR accordingly? I will be available for any followup reviews. Cheers! [1] [https://docs.google.com/document/d/1ALxfiGZBaZ8KUNJtoT443hReoPoJEdt9Db2wiJ2LuWA/edit?usp=sharing] was (Author: phoenixjiangnan): Hi [~eronwright] , sorry to have you wait so long. This is a really valuable contribution, and we definitely should get this PR in. [~twalthr] and us have discussed the yaml config formats [1], let me know if you have any comments. Can you please rebase and adapt your PR accordingly? I will be available for any followup reviews. Cheers! [1] [https://docs.google.com/document/d/1ALxfiGZBaZ8KUNJtoT443hReoPoJEdt9Db2wiJ2LuWA/edit?usp=sharing] > Support external catalogs in SQL-Client > --- > > Key: FLINK-9172 > URL: https://issues.apache.org/jira/browse/FLINK-9172 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / Client >Reporter: Rong Rong >Assignee: Eron Wright >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > It doesn't seem that the configuration (YAML) file allows specifications of > external catalogs currently. The request here is to add support for external > catalog specifications in YAML file. User should also be able to specify one > catalog is the default. 
> It will be great to have SQL-Client to support some external catalogs > out-of-the-box for SQL users to configure and utilize easily. I am currently > think of having an external catalog factory that spins up both streaming and > batch external catalog table sources and sinks. This could greatly unify and > provide easy access for SQL users. > The catalog-related configurations then need to be processed and passed to > TableEnvironment accordingly by calling relevant APIs. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [flink] bowenli86 commented on issue #7393: [FLINK-9172][table][sql-client] - Support external catalogs in SQL-Client
bowenli86 commented on issue #7393: [FLINK-9172][table][sql-client] - Support external catalogs in SQL-Client URL: https://github.com/apache/flink/pull/7393#issuecomment-487733576 Thanks for your valuable contribution, @EronWright ! I left some comments in the JIRA ticket. Can you please rebase and adapt this PR? I can help review it as soon as you submit again. Thanks!
[jira] [Commented] (FLINK-9172) Support external catalogs in SQL-Client
[ https://issues.apache.org/jira/browse/FLINK-9172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829690#comment-16829690 ] Bowen Li commented on FLINK-9172: - Hi [~eronwright] , sorry to have you wait so long. This is a really valuable contribution, and we definitely should get this PR in. [~twalthr] and us have discussed the yaml config formats [1], let me know if you have any comments. Can you please rebase and adapt your PR accordingly? I will be available for any followup reviews. Cheers! [1] [https://docs.google.com/document/d/1ALxfiGZBaZ8KUNJtoT443hReoPoJEdt9Db2wiJ2LuWA/edit?usp=sharing] > Support external catalogs in SQL-Client > --- > > Key: FLINK-9172 > URL: https://issues.apache.org/jira/browse/FLINK-9172 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / Client >Reporter: Rong Rong >Assignee: Eron Wright >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > It doesn't seem that the configuration (YAML) file allows specifications of > external catalogs currently. The request here is to add support for external > catalog specifications in YAML file. User should also be able to specify one > catalog is the default. > It will be great to have SQL-Client to support some external catalogs > out-of-the-box for SQL users to configure and utilize easily. I am currently > think of having an external catalog factory that spins up both streaming and > batch external catalog table sources and sinks. This could greatly unify and > provide easy access for SQL users. > The catalog-related configurations then need to be processed and passed to > TableEnvironment accordingly by calling relevant APIs. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [flink] bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279500369 ## File path: flink-table/flink-table-api-java/src/test/java/org/apache/flink/table/catalog/GenericInMemoryCatalogTest.java ## @@ -1104,6 +1104,43 @@ public void testDropFunction_FunctionNotExist_ignored() throws Exception { catalog.dropDatabase(db1, false); } + // -- statistics -- + + @Test + public void testStatistics() throws Exception { + // Table related + catalog.createDatabase(db1, createDb(), false); + GenericCatalogTable table = createTable(); + catalog.createTable(path1, table, false); + assertTrue(catalog.getTableStatistics(path1) == null); Review comment: nit: to follow the format in this file, can we have an empty line between assertions and code to be tested?
[GitHub] [flink] bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279523176 ## File path: flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/ReadableCatalog.java ## @@ -224,4 +224,54 @@ CatalogPartition getPartition(ObjectPath tablePath, CatalogPartitionSpec partiti */ boolean functionExists(ObjectPath functionPath) throws CatalogException; + // -- Statistics -- + + /** +* Get the statistics of a table. +* +* @param tablePath path of the table +* @return the statistics of the given table, null if its statistics don't exist or are unknown. Review comment: Stats should probably always exist; that's how Hive handles it. Also, the behavior of "returns null when the stats don't exist" doesn't seem to match the behavior of the other metadata APIs and would confuse people; for example, `getTable()` throws `TableNotExistException` when the requested table doesn't exist. Instead, default "unknown" stats would carry default values; for example, rowCount, fileCount, and totalSize would all be 0. This would also better explain the signatures of these APIs. What do you think?
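The "stats always exist with default values" behavior proposed in the comment above could be sketched as follows. This is only an illustration of the idea, not the actual Flink catalog API; the class and field names are assumptions:

```java
// Sketch of the "stats always exist" idea: instead of returning null for a
// table whose statistics were never collected, the catalog returns a shared
// UNKNOWN instance whose counts default to 0. All names are illustrative.
public class TableStats {

    public static final TableStats UNKNOWN = new TableStats(0, 0, 0, 0);

    private final long rowCount;
    private final long fileCount;
    private final long totalSize;
    private final long rawDataSize;

    public TableStats(long rowCount, long fileCount, long totalSize, long rawDataSize) {
        this.rowCount = rowCount;
        this.fileCount = fileCount;
        this.totalSize = totalSize;
        this.rawDataSize = rawDataSize;
    }

    public long getRowCount() { return rowCount; }

    public long getFileCount() { return fileCount; }

    public static void main(String[] args) {
        // A catalog lookup that found no stored stats would return UNKNOWN
        // instead of null, so callers never need a null check.
        TableStats stats = TableStats.UNKNOWN;
        System.out.println(stats.getRowCount());
    }
}
```

This mirrors the `getTable()` convention mentioned in the comment: existence questions raise exceptions, while missing data degrades to a well-defined default.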
[GitHub] [flink] bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279506420 ## File path: flink-table/flink-table-api-java/src/test/java/org/apache/flink/table/catalog/GenericInMemoryCatalogTest.java ## @@ -1104,6 +1104,43 @@ public void testDropFunction_FunctionNotExist_ignored() throws Exception { catalog.dropDatabase(db1, false); } + // -- statistics -- Review comment: compared to the amount of unit tests for other APIs, the number of unit tests for stats seems too small. I'm ok with either adding more tests in this PR, or creating an immediate follow-up PR dedicated to adding unit tests for the stats related APIs.
[GitHub] [flink] bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279500776 ## File path: flink-table/flink-table-api-java/src/test/java/org/apache/flink/table/catalog/GenericInMemoryCatalogTest.java ## @@ -1104,6 +1104,43 @@ public void testDropFunction_FunctionNotExist_ignored() throws Exception { catalog.dropDatabase(db1, false); } + // -- statistics -- + + @Test + public void testStatistics() throws Exception { + // Table related + catalog.createDatabase(db1, createDb(), false); + GenericCatalogTable table = createTable(); + catalog.createTable(path1, table, false); + assertTrue(catalog.getTableStatistics(path1) == null); + assertTrue(catalog.getTableColumnStatistics(path1) == null); + CatalogTableStatistics tableStatistics = new CatalogTableStatistics(5, 2, 100, 575); + catalog.alterTableStatistics(path1, tableStatistics, false); + CatalogTestUtil.checkEquals(tableStatistics, catalog.getTableStatistics(path1)); + CatalogColumnStatistics columnStatistics = new CatalogColumnStatistics(20L, 2L, 40.5, 17, 45, 15); + catalog.alterTableColumnStatistics(path1, columnStatistics, false); + CatalogTestUtil.checkEquals(columnStatistics, catalog.getTableColumnStatistics(path1)); + + // Partition related + catalog.createDatabase(db2, createDb(), false); + GenericCatalogTable table2 = createPartitionedTable(); + catalog.createTable(path2, table2, false); + CatalogPartitionSpec partitionSpec = createPartitionSpec(); + catalog.createPartition(path2, partitionSpec, createPartition(), false); + assertTrue(catalog.getPartitionStatistics(path2, partitionSpec) == null); + assertTrue(catalog.getPartitionColumnStatistics(path2, partitionSpec) == null); + catalog.alterPartitionStatistics(path2, partitionSpec, tableStatistics, false); + CatalogTestUtil.checkEquals(tableStatistics, catalog.getPartitionStatistics(path2, partitionSpec)); + catalog.alterPartitionColumnStatistics(path2, 
partitionSpec, columnStatistics, false); + CatalogTestUtil.checkEquals(columnStatistics, catalog.getPartitionColumnStatistics(path2, partitionSpec)); + + // Clean up Review comment: nit: we can remove these cleanups as they are already in close()
[GitHub] [flink] bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279504357 ## File path: flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/CatalogColumnStatistics.java ## @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.table.catalog; + +import java.util.HashMap; +import java.util.Map; + +/** + * Column statistics a non-partitioned table or a partition of a partitioned table. + */ +public class CatalogColumnStatistics { + + /** +* number of distinct values. +*/ + private final Long ndv; + + /** +* number of nulls. +*/ + private final Long nullCount; + + /** +* average length of column values. +*/ + private final Double avgLen; + + /** +* max length of column values. +*/ + private final Integer maxLen; + + /** +* max value of column values. +*/ + private final Number max; Review comment: do we need to make min/max type `Object`? For string and date columns, the min/max would be String and Date, it's always been a bit tricky on stats definition. 
Hive handles stats for differently typed columns by having individual classes for each typed stats, e.g. DecimalColumnStatsData, StringColumnStatsData, and DateColumnStatsData. I'm ok to leave this as-is for now, but want to bring up the potential issues and the need for a solution soon.
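The Hive-style approach mentioned above — one stats class per column type instead of a single `Number` min/max — could be sketched like this. The class names below are illustrative, not Hive's or Flink's actual API:

```java
// Sketch: per-type column statistics, so min/max carry the natural type of
// the column instead of being forced into Number or Object.
// All names here are illustrative.
public abstract class ColumnStats {
    private final long nullCount;

    protected ColumnStats(long nullCount) { this.nullCount = nullCount; }

    public long getNullCount() { return nullCount; }
}

class LongColumnStats extends ColumnStats {
    private final long min;
    private final long max;

    LongColumnStats(long nullCount, long min, long max) {
        super(nullCount);
        this.min = min;
        this.max = max;
    }

    long getMin() { return min; }

    long getMax() { return max; }
}

class StringColumnStats extends ColumnStats {
    // For strings, a min/max value is rarely useful; Hive tracks lengths instead.
    private final double avgLength;
    private final long maxLength;

    StringColumnStats(long nullCount, double avgLength, long maxLength) {
        super(nullCount);
        this.avgLength = avgLength;
        this.maxLength = maxLength;
    }

    long getMaxLength() { return maxLength; }
}
```

The design trade-off is the one raised in the review: a single generic class keeps the API surface small, while per-type classes keep the fields well typed at the cost of more classes to maintain.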
[GitHub] [flink] bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279499373 ## File path: flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/catalog/GenericInMemoryCatalog.java ## @@ -246,6 +259,16 @@ public void renameTable(ObjectPath tablePath, String newTableName, boolean ignor if (partitions.containsKey(tablePath)) { partitions.put(newPath, partitions.remove(tablePath)); } + Review comment: do we need to relocate partition stats and partition column stats too?
[GitHub] [flink] bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279498588 ## File path: flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/catalog/GenericInMemoryCatalog.java ## @@ -224,6 +235,8 @@ public void dropTable(ObjectPath tablePath, boolean ignoreIfNotExists) throws Ta tables.remove(tablePath); Review comment: do we need to also remove table stats and column stats of the table?
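The bookkeeping the two review comments above ask for — removing stats on drop and relocating them on rename, alongside the table entry itself — could look roughly like this, using plain maps as a stand-in for GenericInMemoryCatalog's internal state (field and method names are assumptions, not the actual Flink code):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of in-memory catalog bookkeeping: the stats maps must be kept in
// sync with the table map on drop and rename. All names are illustrative.
public class InMemoryCatalogSketch {
    private final Map<String, String> tables = new HashMap<>();
    private final Map<String, long[]> tableStats = new HashMap<>();
    private final Map<String, long[]> columnStats = new HashMap<>();

    public void createTable(String path) {
        tables.put(path, "table");
        tableStats.put(path, new long[]{0, 0, 0, 0});
    }

    public void dropTable(String path) {
        tables.remove(path);
        // Without these two lines, stale stats would linger for a dropped table.
        tableStats.remove(path);
        columnStats.remove(path);
    }

    public void renameTable(String path, String newPath) {
        if (tables.containsKey(path)) {
            tables.put(newPath, tables.remove(path));
            // Relocate the stats under the new path as well.
            if (tableStats.containsKey(path)) {
                tableStats.put(newPath, tableStats.remove(path));
            }
            if (columnStats.containsKey(path)) {
                columnStats.put(newPath, columnStats.remove(path));
            }
        }
    }

    public boolean hasStats(String path) {
        return tableStats.containsKey(path);
    }
}
```

The same pattern would apply to the partition stats and partition column stats maps mentioned for `renameTable`.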
[GitHub] [flink] bowenli86 commented on a change in pull request #8312: [FLINK-12366][table] Clean up Catalog APIs to make them more consiste…
bowenli86 commented on a change in pull request #8312: [FLINK-12366][table] Clean up Catalog APIs to make them more consiste… URL: https://github.com/apache/flink/pull/8312#discussion_r279492143 ## File path: flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/catalog/GenericInMemoryCatalog.java ## @@ -416,19 +430,38 @@ public void alterPartition(ObjectPath tablePath, CatalogPartitionSpec partitionS } @Override - public List listPartitions(ObjectPath tablePath) - throws TableNotExistException, TableNotPartitionedException, CatalogException { + public List listAllPartitions(ObjectPath tablePath) Review comment: can we revert it to `listPartitions()`?
[GitHub] [flink] bowenli86 commented on a change in pull request #8312: [FLINK-12366][table] Clean up Catalog APIs to make them more consiste…
bowenli86 commented on a change in pull request #8312: [FLINK-12366][table] Clean up Catalog APIs to make them more consiste… URL: https://github.com/apache/flink/pull/8312#discussion_r279495627 ## File path: flink-table/flink-table-api-java/src/test/java/org/apache/flink/table/catalog/GenericInMemoryCatalogTest.java ## @@ -731,32 +727,36 @@ public void testDropPartition() throws Exception { catalog.createTable(path1, createPartitionedTable(), false); catalog.createPartition(path1, createPartitionSpec(), createPartition(), false); - assertEquals(Arrays.asList(createPartitionSpec()), catalog.listPartitions(path1)); + assertEquals(Arrays.asList(createPartitionSpec()), catalog.listAllPartitions(path1)); catalog.dropPartition(path1, createPartitionSpec(), false); - assertEquals(Arrays.asList(), catalog.listPartitions(path1)); + assertEquals(Arrays.asList(), catalog.listAllPartitions(path1)); } @Test public void testDropPartition_TableNotExistException() throws Exception { Review comment: We need to update the unit test names to reflect what case each test is actually about. E.g. this UT can be renamed to something like `testDropPartition_tableNotExist()` or `testDropPartition_PartitionNotExistException_tableNotExist()`. Same thing for the following changed unit tests in this file.
[GitHub] [flink] bowenli86 commented on a change in pull request #8312: [FLINK-12366][table] Clean up Catalog APIs to make them more consiste…
bowenli86 commented on a change in pull request #8312: [FLINK-12366][table] Clean up Catalog APIs to make them more consiste… URL: https://github.com/apache/flink/pull/8312#discussion_r279494274 ## File path: flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/exceptions/PartitionSpecInvalidException.java ## @@ -29,15 +29,14 @@ * when the size of PartitionSpec is 'n' but its keys don't match the first 'n' keys in partition key list. */ public class PartitionSpecInvalidException extends Exception { - private static final String MSG = "PartitionSpec %s does not match partition keys %s of table %s in catalog %s."; + private static final String MSG = "PartitionSpec %s does not match the partition keys of table %s in catalog %s."; Review comment: how about adding back the partition keys to give users adequate information about why the exception is thrown?
[jira] [Commented] (FLINK-12342) Yarn Resource Manager Acquires Too Many Containers
[ https://issues.apache.org/jira/browse/FLINK-12342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829614#comment-16829614 ] Zhenqiu Huang commented on FLINK-12342: --- !flink-1.4.png! !flink-1.6.png! [~till.rohrmann] This is the same job (it needs 256 containers). On Flink 1.4 it acquired only 257 containers in total, but on Flink 1.6 it acquires a much larger number of containers. The allocation of each container then becomes very slow, causing job deployment to take more than 10 minutes, which is unacceptable for streaming job use cases. > Yarn Resource Manager Acquires Too Many Containers > -- > > Key: FLINK-12342 > URL: https://issues.apache.org/jira/browse/FLINK-12342 > Project: Flink > Issue Type: Improvement > Components: Deployment / YARN >Affects Versions: 1.6.4, 1.7.2, 1.8.0 > Environment: We run the job on Flink release 1.6.3. >Reporter: Zhenqiu Huang >Assignee: Zhenqiu Huang >Priority: Major > Labels: pull-request-available > Attachments: Screen Shot 2019-04-29 at 12.06.23 AM.png, > container.log, flink-1.4.png, flink-1.6.png > > Time Spent: 10m > Remaining Estimate: 0h > > In the current implementation of YarnFlinkResourceManager, it starts to acquire > new containers one by one when it gets requests from the SlotManager. This mechanism > works when the job is small, say fewer than 32 containers. If the job needs 256 > containers, containers can't be allocated immediately and pending requests > in AMRMClient will not be removed accordingly. We observe that > AMRMClient asks for current pending requests + 1 (the new request from the slot > manager) containers. In this way, during the startup of such a job, it asked > for 4000+ containers. If an external dependency issue occurs, for > example slow HDFS access, then the whole job will be blocked without > getting enough resources and finally killed with a SlotManager request timeout.
> Thus, we should use the total number of containers asked for, rather than the pending > requests in AMRMClient, as the threshold for deciding whether we need to add > one more resource request.
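The threshold decision described in FLINK-12342 — comparing the total number of containers already asked for (pending plus allocated) against what the job needs, instead of looking only at AMRMClient's pending requests — can be illustrated with a small sketch. This is not the actual YarnFlinkResourceManager code; all names are assumptions:

```java
// Sketch of the proposed threshold check: a new container request is only
// issued while the TOTAL asked for (pending + allocated) is below the job's
// requirement, so repeated slot requests cannot over-allocate.
public class ContainerAllocator {
    private final int numRequired; // containers the job needs
    private long numPending;       // requests outstanding in AMRMClient
    private long numAllocated;     // containers already granted by YARN

    public ContainerAllocator(int numRequired) {
        this.numRequired = numRequired;
    }

    /** The proposed threshold: total asked so far, not just pending. */
    public boolean shouldRequestNewContainer() {
        return numPending + numAllocated < numRequired;
    }

    /** Called for each slot request from the SlotManager. */
    public void requestContainer() {
        if (shouldRequestNewContainer()) {
            numPending++; // would call amClient.addContainerRequest(...)
        }
    }

    /** Called when YARN grants a container. */
    public void onContainerAllocated() {
        numPending--;
        numAllocated++;
    }

    public long totalAsked() {
        return numPending + numAllocated;
    }
}
```

With this check, even if slot requests arrive faster than YARN can allocate (e.g. during slow HDFS access), the total requests stay capped at the job's actual need instead of growing into the thousands.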