[jira] [Updated] (FLINK-12371) Add support for converting (NOT) IN/ (NOT) EXISTS to SemiJoin, and generating optimized logical plan
[ https://issues.apache.org/jira/browse/FLINK-12371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

godfrey he updated FLINK-12371:
---

Description:
This issue aims to convert IN/EXISTS to semi-join, and NOT IN/NOT EXISTS to anti-join.

In Calcite, [SemiJoin|https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/core/SemiJoin.java] represents only semi-join (it cannot represent anti-join) and requires an equi-join condition. Queries like `select * from left where left.a1 in (select right.a2 from right where left.b1 > right.b2)` and `select * from left where not exists (select * from right)` cannot be converted to Calcite's SemiJoin operator.

To solve this, we need to copy the {{SemiJoin}} class into Flink and make the following changes:
1. Make {{SemiJoin}} extend {{Join}} instead of {{EquiJoin}} (to support non-equi join conditions).
2. Add an {{isAnti}} field to represent anti-join.

Currently there are no rules that convert (NOT) IN / (NOT) EXISTS to SemiJoin, so we need a new rule (named {{FlinkSubQueryRemoveRule}}) to meet this requirement.

was:
This issue aims to convert IN/EXISTS to semi-join, and NOT IN/NOT EXISTS to anti-join.

In Calcite, [SemiJoin|https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/core/SemiJoin.java] represents only semi-join (it cannot represent anti-join) and requires an equi-join condition. Queries like `select * from left where left.a1 in (select right.a2 from right where left.b1 > right.b2)` and `select * from left where not exists (select * from right)` cannot be converted to Calcite's SemiJoin operator.

To solve this, we need to copy the {{SemiJoin}} class into Flink and make the following changes:
1. Make {{SemiJoin}} extend {{Join}} instead of {{EquiJoin}} (to support non-equi join conditions).
2. Add an {{isAnti}} field to represent anti-join.

Currently there are no rules that convert (NOT) IN / (NOT) EXISTS to SemiJoin, so we need a whole new set of rules:
1. {{FlinkSubQueryRemoveRule}}, a planner rule that converts IN and EXISTS into semi-join, and NOT IN and NOT EXISTS into anti-join.
2. {{FlinkRewriteSubQueryRule}}, a planner rule that converts a filter condition like {{(select count(*) from T) > 0}} to {{exists(select * from T)}}.

> Add support for converting (NOT) IN/ (NOT) EXISTS to SemiJoin, and generating optimized logical plan
>
> Key: FLINK-12371
> URL: https://issues.apache.org/jira/browse/FLINK-12371
> Project: Flink
> Issue Type: New Feature
> Components: Table SQL / Planner
> Reporter: godfrey he
> Assignee: godfrey he
> Priority: Major

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
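For readers unfamiliar with the terms, the semi-join/anti-join semantics the issue targets can be sketched in a few lines of plain Python. The row data below is made up for illustration; the join condition mirrors the non-equi example query from the description (left.a1 = right.a2 AND left.b1 > right.b2):

```python
left = [{"a1": 1, "b1": 10}, {"a1": 2, "b1": 5}, {"a1": 3, "b1": 7}]
right = [{"a2": 1, "b2": 8}, {"a2": 2, "b2": 9}]


def matches(l, r):
    # Non-equi condition from the example query:
    # left.a1 = right.a2 AND left.b1 > right.b2
    return l["a1"] == r["a2"] and l["b1"] > r["b2"]


# Semi-join (IN / EXISTS): keep each left row with at least one match.
semi = [l for l in left if any(matches(l, r) for r in right)]

# Anti-join (NOT IN / NOT EXISTS): keep each left row with no match.
anti = [l for l in left if not any(matches(l, r) for r in right)]
```

This also shows why the output schema of a semi/anti-join is just the left side's schema, which is what distinguishes it from a regular join. (Note that full NOT IN semantics with NULLs are subtler than this sketch.)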
[jira] [Created] (FLINK-12371) Add support for converting (NOT) IN/ (NOT) EXISTS to SemiJoin, and generating optimized logical plan
godfrey he created FLINK-12371:
--

Summary: Add support for converting (NOT) IN/ (NOT) EXISTS to SemiJoin, and generating optimized logical plan
Key: FLINK-12371
URL: https://issues.apache.org/jira/browse/FLINK-12371
Project: Flink
Issue Type: New Feature
Components: Table SQL / Planner
Reporter: godfrey he
Assignee: godfrey he

This issue aims to convert IN/EXISTS to semi-join, and NOT IN/NOT EXISTS to anti-join.

In Calcite, [SemiJoin|https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/core/SemiJoin.java] represents only semi-join (it cannot represent anti-join) and requires an equi-join condition. Queries like `select * from left where left.a1 in (select right.a2 from right where left.b1 > right.b2)` and `select * from left where not exists (select * from right)` cannot be converted to Calcite's SemiJoin operator.

To solve this, we need to copy the {{SemiJoin}} class into Flink and make the following changes:
1. Make {{SemiJoin}} extend {{Join}} instead of {{EquiJoin}} (to support non-equi join conditions).
2. Add an {{isAnti}} field to represent anti-join.

Currently there are no rules that convert (NOT) IN / (NOT) EXISTS to SemiJoin, so we need a whole new set of rules:
1. {{FlinkSubQueryRemoveRule}}, a planner rule that converts IN and EXISTS into semi-join, and NOT IN and NOT EXISTS into anti-join.
2. {{FlinkRewriteSubQueryRule}}, a planner rule that converts a filter condition like {{(select count(*) from T) > 0}} to {{exists(select * from T)}}.
[GitHub] [flink] zjffdu commented on issue #8144: [FLINK-12159]. Enable YarnMiniCluster integration test under non-secure mode
zjffdu commented on issue #8144: [FLINK-12159]. Enable YarnMiniCluster integration test under non-secure mode
URL: https://github.com/apache/flink/pull/8144#issuecomment-487822515

@tillrohrmann I could do the other refactoring/improvements in a follow-up PR. Thanks for the review. Do you have any other suggestions for the current PR?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Assigned] (FLINK-12370) Integrated Travis for Python Table API
[ https://issues.apache.org/jira/browse/FLINK-12370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zhong reassigned FLINK-12370: - Assignee: Wei Zhong > Integrated Travis for Python Table API > -- > > Key: FLINK-12370 > URL: https://issues.apache.org/jira/browse/FLINK-12370 > Project: Flink > Issue Type: Sub-task > Components: API / Python >Affects Versions: 1.9.0 >Reporter: sunjincheng >Assignee: Wei Zhong >Priority: Major > > Integrated Travis for Python Table API
[GitHub] [flink] sunjincheng121 commented on issue #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support
sunjincheng121 commented on issue #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support URL: https://github.com/apache/flink/pull/8267#issuecomment-487818672 I have created the JIRA for Integrated Travis for Python Table API. https://issues.apache.org/jira/browse/FLINK-12370
[jira] [Created] (FLINK-12370) Integrated Travis for Python Table API
sunjincheng created FLINK-12370: --- Summary: Integrated Travis for Python Table API Key: FLINK-12370 URL: https://issues.apache.org/jira/browse/FLINK-12370 Project: Flink Issue Type: Sub-task Components: API / Python Affects Versions: 1.9.0 Reporter: sunjincheng Integrated Travis for Python Table API
[GitHub] [flink] sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support
sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support URL: https://github.com/apache/flink/pull/8267#discussion_r279607459 ## File path: flink-python/pyflink/table/table_environment.py ## @@ -0,0 +1,160 @@ + +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + +from abc import ABCMeta + +from pyflink.java_gateway import get_gateway +from pyflink.table import Table +from pyflink.util import type_utils, utils + +__all__ = [ +'BatchTableEnvironment', +'StreamTableEnvironment', +'TableEnvironment' +] + + +class TableEnvironment(object): +""" +The abstract base class for batch and stream TableEnvironments. +""" + +__metaclass__ = ABCMeta + +def __init__(self, j_tenv): +self._j_tenv = j_tenv + +def from_table_source(self, table_source): +""" +Creates a table from a table source. + +:param table_source: table source used as table +:return: result table +""" +return Table(self._j_tenv.fromTableSource(table_source._j_table_source)) + +def register_table(self, name, table): +""" +Registers a :class:`Table` under a unique name in the TableEnvironment's catalog. +Registered tables can be referenced in SQL queries. 
+ +:param name: The name under which the table will be registered. +:param table: The table to register. +""" +self._j_tenv.registerTable(name, table._java_table) + +def register_table_source(self, name, table_source): +""" +Registers an external :class:`TableSource` in this :class:`TableEnvironment`'s catalog. +Registered tables can be referenced in SQL queries. + +:param name: The name under which the :class:`TableSource` is registered. +:param table_source: The :class:`TableSource` to register. +""" +self._j_tenv.registerTableSource(name, table_source._j_table_source) + +def register_table_sink(self, name, field_names, field_types, table_sink): +""" +Registers an external :class:`TableSink` with given field names and types in this +:class:`TableEnvironment`\ 's catalog. Review comment: `TableEnvironment`\ 's catalog.` and `TableEnvironment`'s -> which pattern do you want?
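All of the methods in this excerpt follow one pattern: a thin Python wrapper that delegates to a Java-side object over the py4j gateway. A minimal sketch of that pattern (`FakeJavaTableEnv` is an illustrative stand-in for the real py4j proxy, not part of the PR):

```python
class Table:
    """Thin Python wrapper around a Java-side Table proxy."""
    def __init__(self, j_table):
        self._j_table = j_table


class TableEnvironment:
    """Each method delegates to the underlying Java TableEnvironment."""
    def __init__(self, j_tenv):
        self._j_tenv = j_tenv

    def register_table(self, name, table):
        # Unwrap the Python Table and hand the Java object across the gateway.
        self._j_tenv.registerTable(name, table._j_table)

    def scan(self, name):
        # Wrap the Java-side result back into a Python Table.
        return Table(self._j_tenv.scan(name))


class FakeJavaTableEnv:
    """Stand-in for the py4j proxy, used here only for demonstration."""
    def __init__(self):
        self.registered = {}

    def registerTable(self, name, j_table):
        self.registered[name] = j_table

    def scan(self, name):
        return self.registered[name]


t_env = TableEnvironment(FakeJavaTableEnv())
t_env.register_table("Orders", Table("j_orders"))
orders = t_env.scan("Orders")
```

Incidentally, this pattern is why the `table._java_table` in the excerpt's `register_table` stands out against the `_j_table` naming used elsewhere.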
[GitHub] [flink] sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support
sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support URL: https://github.com/apache/flink/pull/8267#discussion_r279584911 ## File path: docs/dev/table/tableApi.md ## @@ -297,6 +297,81 @@ val result = orders.where('b === "red") + + + + + + + Operators + Description + + + + + +Scan +Batch Streaming + + +Similar to the FROM clause in a SQL query. Performs a scan of a registered table. +{% highlight python %} +orders = table_env.scan("Orders"); +{% endhighlight %} + + + + +Select +Batch Streaming + + +Similar to a SQL SELECT statement. Performs a select operation. +{% highlight python %} +orders = tableEnv.scan("Orders"); Review comment: `tableEnv` -> `table_env`
[GitHub] [flink] sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support
sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support URL: https://github.com/apache/flink/pull/8267#discussion_r279600310 ## File path: flink-python/README.md ## @@ -0,0 +1,22 @@ +# Apache Flink Python API + +Apache Flink is an open source stream processing framework with powerful stream- and batch-processing capabilities. + +Learn more about Flink at [http://flink.apache.org/](http://flink.apache.org/) + +This packaging is currently a very initial version and will change in future versions. + +## Installation + +In order to use PyFlink, you need to install Flink on your device and set the value of the environment variable FLINK_HOME to the root directory of Flink. +Then enter the directory where this README.md file is located and execute `python setup.py install` to install PyFlink on your device. + +## Running Tests + +Currently you can perform an end-to-end test of PyFlink in the directory where this file is located with the following command: + +PYTHONPATH=./ python ./pyflink/table/tests/test_end_to_end.py Review comment: Keep the user's existing PYTHONPATH (prepend it) if it is already set, instead of overwriting it.
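The reviewer's suggestion amounts to prepending any pre-set PYTHONPATH rather than clobbering it. One way the README command could be written (a sketch, not the PR's actual wording; the test path is the one from the README):

```shell
# Keep any PYTHONPATH the user already exported, then append ./ ;
# ${PYTHONPATH:+...} expands to nothing when the variable is unset/empty.
export PYTHONPATH="${PYTHONPATH:+$PYTHONPATH:}./"
# python ./pyflink/table/tests/test_end_to_end.py
```

With PYTHONPATH unset this yields `./`; with `PYTHONPATH=/foo` it yields `/foo:./`.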
[GitHub] [flink] sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support
sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support URL: https://github.com/apache/flink/pull/8267#discussion_r279585800 ## File path: docs/dev/table/tableApi.zh.md ## @@ -297,6 +297,81 @@ val result = orders.where('b === "red") + + + + + + + Operators + Description + + + + + +Scan +Batch Streaming + + +Similar to the FROM clause in a SQL query. Performs a scan of a registered table. +{% highlight python %} +orders = table_env.scan("Orders"); +{% endhighlight %} + + + + +Select +Batch Streaming + + +Similar to a SQL SELECT statement. Performs a select operation. +{% highlight python %} +orders = tableEnv.scan("Orders"); Review comment: `tableEnv` -> `table_env`
[GitHub] [flink] sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support
sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support URL: https://github.com/apache/flink/pull/8267#discussion_r279590864 ## File path: flink-python/README.md ## @@ -0,0 +1,22 @@ +# Apache Flink Python API + +Apache Flink is an open source stream processing framework with powerful stream- and batch-processing capabilities. Review comment: `powerful ` -> `the powerful `
[GitHub] [flink] sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support
sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support URL: https://github.com/apache/flink/pull/8267#discussion_r279586605 ## File path: flink-dist/src/main/assemblies/opt.xml ## @@ -177,5 +177,20 @@ flink-streaming-python_${scala.binary.version}-${project.version}.jar 0644 + + Review comment: `Batch Python API, flink-libraries/flink-python will be replaced by flink-python in the future` `Streaming Python API, flink-libraries/flink-python, will be replaced by flink-python in the future` `Python Table API` -> `Python API`
[GitHub] [flink] sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support
sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support URL: https://github.com/apache/flink/pull/8267#discussion_r279603205 ## File path: flink-python/pyflink/table/table.py ## @@ -0,0 +1,117 @@ + +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + +from py4j.java_gateway import get_method + +__all__ = ['Table'] + + +class Table(object): + +""" +A :class:`Table` is the core component of the Table API. +Similar to how the batch and streaming APIs have DataSet and DataStream, +the Table API is built around :class:`Table`. + +Use the methods of :class:`Table` to transform data. + +Example: +:: +>>> t_config = TableConfig.Builder().as_streaming_execution().set_parallelism(1).build() +>>> t_env = TableEnvironment.get_table_environment(t_config) +>>> ... +>>> t = t_env.scan("source") +>>> t.select(...) +... +>>> t.insert_into("print") +>>> t_env.execute() + +Operations such as :func:`~pyflink.table.Table.join`, :func:`~pyflink.table.Table.select`, +:func:`~pyflink.table.Table.where` and :func:`~pyflink.table.Table.group_by` +take arguments in an expression string. Please refer to the documentation for +the expression syntax. 
+""" + +def __init__(self, j_table): +self._j_table = j_table + +def select(self, fields): +""" +Performs a selection operation. Similar to a SQL SELECT statement. The field expressions +can contain complex expressions. + +Example: +:: +>>> t = tab.select("key, value + 'hello'") Review comment: `t = tab.select("key, value + 'hello'")` -> `tab.select("key, value + 'hello'")` ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support
sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support URL: https://github.com/apache/flink/pull/8267#discussion_r279589827 ## File path: flink-python/pom.xml ## @@ -0,0 +1,201 @@ + +http://maven.apache.org/POM/4.0.0; xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance; + xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd;> + + 4.0.0 + + + flink-parent + org.apache.flink + 1.9-SNAPSHOT + .. + + + flink-python + flink-python + + jar + + + + net.sf.py4j + py4j + 0.10.8.1 + + + + + + + org.apache.maven.plugins + maven-enforcer-plugin + + + dependency-convergence + + enforce + + + true + + + + + + + + com.github.siom79.japicmp + japicmp-maven-plugin + + + + org.apache.maven.plugins + maven-surefire-plugin + + + false + + + + + + org.apache.maven.plugins + maven-eclipse-plugin + 2.8 + + true + + org.scala-ide.sdt.core.scalanature + org.eclipse.jdt.core.javanature + + + org.scala-ide.sdt.core.scalabuilder + + + org.scala-ide.sdt.launching.SCALA_CONTAINER + + org.eclipse.jdt.launching.JRE_CONTAINER + + + + org.scala-lang:scala-library + org.scala-lang:scala-compiler + + + **/*.scala Review comment: Do we need scala in this project?
[GitHub] [flink] sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support
sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support URL: https://github.com/apache/flink/pull/8267#discussion_r279590263 ## File path: docs/dev/table/tableApi.zh.md ## @@ -297,6 +297,81 @@ val result = orders.where('b === "red") + + + + + + + Operators + Description + + + + + +Scan +Batch Streaming + + +Similar to the FROM clause in a SQL query. Performs a scan of a registered table. Review comment: I think we can use Chinese for the zh version. What do you think?
[GitHub] [flink] sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support
sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support URL: https://github.com/apache/flink/pull/8267#discussion_r279601910 ## File path: flink-dist/src/main/flink-bin/bin/pyflink2.sh ## @@ -0,0 +1,48 @@ +#!/usr/bin/env bash + +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + +# = +bin=`dirname "$0"` Review comment: I think the responsibility of this script is to launch the PythonGatewayServer, so I suggest renaming it to `pyflink-gateway-server.sh`. What do you think?
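For context, the `bin=`dirname "$0"`` line where the diff excerpt ends is the usual way Flink's bin scripts locate their own directory. A minimal standalone sketch of the full pattern (the echo is only for demonstration; the real script goes on to do its launcher work):

```shell
#!/usr/bin/env bash
# Resolve the directory containing this script. dirname "$0" may return
# a relative path, so normalize it with cd + pwd.
bin=$(dirname "$0")
bin=$(cd "$bin" && pwd)
echo "$bin"
```

From there, sibling files such as config scripts can be referenced as `"$bin"/...` regardless of where the user invoked the script from.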
[GitHub] [flink] sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support
sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support URL: https://github.com/apache/flink/pull/8267#discussion_r279605826 ## File path: flink-python/pyflink/table/table_environment.py ## @@ -0,0 +1,160 @@ + +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + +from abc import ABCMeta + +from pyflink.java_gateway import get_gateway +from pyflink.table import Table +from pyflink.util import type_utils, utils + +__all__ = [ +'BatchTableEnvironment', +'StreamTableEnvironment', +'TableEnvironment' +] + + +class TableEnvironment(object): +""" +The abstract base class for batch and stream TableEnvironments. +""" + +__metaclass__ = ABCMeta + +def __init__(self, j_tenv): +self._j_tenv = j_tenv + +def from_table_source(self, table_source): +""" +Creates a table from a table source. + +:param table_source: table source used as table +:return: result table +""" Review comment: `:param table_source: table source used as table` -> `:param table_source: Table source used as table`, and `:return: result table` -> `:return: Result table`
[GitHub] [flink] sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support
sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support URL: https://github.com/apache/flink/pull/8267#discussion_r279589515 ## File path: flink-python/README.md ## @@ -0,0 +1,22 @@ +# Apache Flink Python API + +Apache Flink is an open source stream processing framework with powerful stream- and batch-processing capabilities. + +Learn more about Flink at [http://flink.apache.org/](http://flink.apache.org/) + +This packaging is currently a very initial version and will change in future versions. + +## Installation + +In order to use PyFlink, you need to install Flink on your device and set the value of the environment variable FLINK_HOME to the root directory of Flink. +Then enter the directory where this README.md file is located and execute `python setup.py install` to install PyFlink on your device. + +## Running Tests + +Currently you can perform an end-to-end test of PyFlink in the directory where this file is located with the following command: + +PYTHONPATH=./ python ./pyflink/table/tests/test_end_to_end.py + +## Python Requirements + +PyFlink currently depends on `Py4J 0.10.8.1`. Review comment: PyFlink depends on Py4J (currently version 0.10.8.1)
[GitHub] [flink] sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support
sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support URL: https://github.com/apache/flink/pull/8267#discussion_r279600480 ## File path: flink-python/pyflink/find_flink_home.py ## @@ -0,0 +1,44 @@ + +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import print_function +import os +import sys + + +def _find_flink_home(): +""" +Find the FLINK_HOME. Review comment: Please echo the value of FLINK_HOME. Does that make sense to you?
[GitHub] [flink] sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support
sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support URL: https://github.com/apache/flink/pull/8267#discussion_r279600010 ## File path: flink-python/README.md ## @@ -0,0 +1,22 @@ +# Apache Flink Python API + +Apache Flink is an open source stream processing framework with powerful stream- and batch-processing capabilities. + +Learn more about Flink at [http://flink.apache.org/](http://flink.apache.org/) + +This packaging is currently a very initial version and will change in future versions. + +## Installation + +In order to use PyFlink, you need to install Flink on your device and set the value of the environment variable FLINK_HOME to the root directory of Flink. Review comment: We need to add how to set FLINK_HOME.
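A hedged sketch of what such a README addition could look like (the install path is purely illustrative; point it at wherever your Flink distribution is unpacked):

```shell
# Set FLINK_HOME to the root of your Flink installation, and optionally
# put its bin directory on PATH.
export FLINK_HOME=/path/to/flink-1.9.0
export PATH="$FLINK_HOME/bin:$PATH"
```

Adding these lines to a shell profile (e.g. ~/.bashrc) makes the setting persistent across sessions.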
[GitHub] [flink] sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support
sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support URL: https://github.com/apache/flink/pull/8267#discussion_r279591670 ## File path: flink-python/README.md ## @@ -0,0 +1,22 @@ +# Apache Flink Python API + +Apache Flink is an open source stream processing framework with powerful stream- and batch-processing capabilities. Review comment: I think it's better to introduce the Python API here, not Flink itself. We can also link to the docs for the Python API; since the first version only supports the Table API, we can link to: https://ci.apache.org/projects/flink/flink-docs-master/dev/table/tableApi.html What do you think?
[GitHub] [flink] sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support
sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support URL: https://github.com/apache/flink/pull/8267#discussion_r279594338 ## File path: flink-python/pyflink/java_gateway.py ## @@ -0,0 +1,106 @@ + +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +import os +import shutil +import signal +import struct +import tempfile +import time +from subprocess import Popen, PIPE +from threading import RLock + +from py4j.java_gateway import java_import, JavaGateway, GatewayParameters +from pyflink.find_flink_home import _find_flink_home + + +_gateway = None +_lock = RLock() + + +def get_gateway(): +# type: () -> JavaGateway +global _gateway +global _lock +with _lock: +if _gateway is None: +# if Java Gateway is already running +if 'PYFLINK_GATEWAY_PORT' in os.environ: +gateway_port = int(os.environ['PYFLINK_GATEWAY_PORT']) +gateway_param = GatewayParameters(port=gateway_port, auto_convert=True) +_gateway = JavaGateway(gateway_parameters=gateway_param) +else: +_gateway = launch_gateway() +return _gateway + + +def launch_gateway(): +# type: () -> JavaGateway +""" +launch jvm gateway +""" + +FLINK_HOME = _find_flink_home() +# TODO windows support Review comment: Please check the system type and exit 1 with the error message. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support
sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support URL: https://github.com/apache/flink/pull/8267#discussion_r279603054 ## File path: flink-python/pyflink/table/table.py ## @@ -0,0 +1,117 @@ + +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + +from py4j.java_gateway import get_method + +__all__ = ['Table'] + + +class Table(object): + +""" +A :class:`Table` is the core component of the Table API. +Similar to how the batch and streaming APIs have DataSet and DataStream, +the Table API is built around :class:`Table`. + +Use the methods of :class:`Table` to transform data. + +Example: +:: +>>> t_config = TableConfig.Builder().as_streaming_execution().set_parallelism(1).build() +>>> t_env = TableEnvironment.get_table_environment(t_config) +>>> ... +>>> t = t_env.scan("source") +>>> t.select(...) +... +>>> t.insert_into("print") Review comment: It's better to add `t_env.register_table_sink("result", ...)` and `t.insert_into("result")`.
[GitHub] [flink] sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support
sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support URL: https://github.com/apache/flink/pull/8267#discussion_r279594143 ## File path: .gitignore ## @@ -14,7 +14,9 @@ tmp *.iml *.swp *.jar +*.zip *.log +*.pyc Review comment: We also need to add the following ignores: ``` flink-python/build/ flink-python/dist/ flink-python/pyflink.egg-info/ ``` What do you think?
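Taken together with the diff above, the relevant `.gitignore` section might end up looking like this (a sketch of the suggested state, not the merged result):

```
tmp
*.iml
*.swp
*.jar
*.zip
*.log
*.pyc
flink-python/build/
flink-python/dist/
flink-python/pyflink.egg-info/
```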
[GitHub] [flink] sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support
sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support URL: https://github.com/apache/flink/pull/8267#discussion_r279607459 ## File path: flink-python/pyflink/table/table_environment.py ## @@ -0,0 +1,160 @@ + +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + +from abc import ABCMeta + +from pyflink.java_gateway import get_gateway +from pyflink.table import Table +from pyflink.util import type_utils, utils + +__all__ = [ +'BatchTableEnvironment', +'StreamTableEnvironment', +'TableEnvironment' +] + + +class TableEnvironment(object): +""" +The abstract base class for batch and stream TableEnvironments. +""" + +__metaclass__ = ABCMeta + +def __init__(self, j_tenv): +self._j_tenv = j_tenv + +def from_table_source(self, table_source): +""" +Creates a table from a table source. + +:param table_source: table source used as table +:return: result table +""" +return Table(self._j_tenv.fromTableSource(table_source._j_table_source)) + +def register_table(self, name, table): +""" +Registers a :class:`Table` under a unique name in the TableEnvironment's catalog. +Registered tables can be referenced in SQL queries. 
+ +:param name: The name under which the table will be registered. +:param table: The table to register. +""" +self._j_tenv.registerTable(name, table._java_table) + +def register_table_source(self, name, table_source): +""" +Registers an external :class:`TableSource` in this :class:`TableEnvironment`'s catalog. +Registered tables can be referenced in SQL queries. + +:param name: The name under which the :class:`TableSource` is registered. +:param table_source: The :class:`TableSource` to register. +""" +self._j_tenv.registerTableSource(name, table_source._j_table_source) + +def register_table_sink(self, name, field_names, field_types, table_sink): +""" +Registers an external :class:`TableSink` with given field names and types in this +:class:`TableEnvironment`\ 's catalog. Review comment: `TableEnvironment`\ 's catalog. -> `TableEnvironment`\'s catalog. ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support
sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support URL: https://github.com/apache/flink/pull/8267#discussion_r279588033 ## File path: flink-dist/src/main/flink-bin/bin/pyflink2.sh ## @@ -0,0 +1,48 @@ +#!/usr/bin/env bash + +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + +# = +bin=`dirname "$0"` +bin=`cd "$bin"; pwd` + +. "$bin"/config.sh + +FLINK_CLASSPATH=`constructFlinkClassPath` + +ARGS=() + +while [[ $# -gt 0 ]] +do +key="$1" +case $key in +-c|--class) +DRIVER=$2 +shift +shift +;; +*) + ARGS+=("$1") + shift + ;; +esac +done + Review comment: I think it's better to add `log_setting`, similar to `flink` shell? https://github.com/apache/flink/blob/master/flink-dist/src/main/flink-bin/bin/flink This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support
sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support URL: https://github.com/apache/flink/pull/8267#discussion_r279592971 ## File path: flink-python/setup.py ## @@ -0,0 +1,63 @@ + +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import print_function + +import os +import sys +from setuptools import setup + +if sys.version_info < (2, 7): +print("Python versions prior to 2.7 are not supported for PyFlink.", + file=sys.stderr) +sys.exit(-1) + +this_directory = os.path.abspath(os.path.dirname(__file__)) +version_file = os.path.join(this_directory, 'pyflink/version.py') + +try: +exec(open(version_file).read()) +except IOError: +print("Failed to load PyFlink version file for packaging. " + + "'%s' not found!" % version_file, + file=sys.stderr) +sys.exit(-1) +VERSION = __version__ # noqa + +with open(os.path.join(this_directory, 'README.md'), encoding='utf-8') as f: Review comment: Does `encoding='utf-8'` work in Python 2.7?
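The reviewer's question is well founded: on Python 2.7 the built-in `open` has no `encoding` parameter, while `io.open` accepts one on both Python 2 and 3. A minimal sketch of a compatible read (the file path here is illustrative):

```python
import io
import os
import tempfile

# Write a small UTF-8 README, then read it back in a 2/3-compatible way.
# io.open takes `encoding` on Python 2.7+, unlike the py2 built-in open.
path = os.path.join(tempfile.mkdtemp(), "README.md")
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"# Apache Flink Python API\n")

with io.open(path, encoding="utf-8") as f:
    long_description = f.read()
```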
[GitHub] [flink] sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support
sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support URL: https://github.com/apache/flink/pull/8267#discussion_r279588708 ## File path: flink-dist/src/main/flink-bin/bin/pyflink2.sh ## @@ -0,0 +1,48 @@ +#!/usr/bin/env bash + +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + +# = +bin=`dirname "$0"` +bin=`cd "$bin"; pwd` + +. "$bin"/config.sh + +FLINK_CLASSPATH=`constructFlinkClassPath` + +ARGS=() + +while [[ $# -gt 0 ]] +do +key="$1" +case $key in +-c|--class) +DRIVER=$2 +shift +shift +;; +*) + ARGS+=("$1") + shift + ;; +esac +done + +PYTHON_JAR_PATH=`echo "$FLINK_ROOT_DIR"/opt/flink-python-*.jar` +TABLE_JAR_PATH=`echo "$FLINK_ROOT_DIR"/opt/flink-table*.jar` +exec $JAVA_RUN $JVM_ARGS -cp ${FLINK_CLASSPATH}:${TABLE_JAR_PATH}:${PYTHON_JAR_PATH} ${DRIVER} ${ARGS[@]} Review comment: Can you add a usage example, and echo to console? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support
sunjincheng121 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support URL: https://github.com/apache/flink/pull/8267#discussion_r279602913 ## File path: flink-python/pyflink/table/table.py ## @@ -0,0 +1,117 @@ + +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + +from py4j.java_gateway import get_method + +__all__ = ['Table'] + + +class Table(object): + +""" +A :class:`Table` is the core component of the Table API. +Similar to how the batch and streaming APIs have DataSet and DataStream, +the Table API is built around :class:`Table`. + +Use the methods of :class:`Table` to transform data. + +Example: +:: +>>> t_config = TableConfig.Builder().as_streaming_execution().set_parallelism(1).build() +>>> t_env = TableEnvironment.get_table_environment(t_config) +>>> ... Review comment: I think it's better to add `t_env.register_table_source("source", ...)`
[GitHub] [flink] GJL closed pull request #7999: [hotfix][runtime] Remove unused local variable in ExecutionGraphBuilder
GJL closed pull request #7999: [hotfix][runtime] Remove unused local variable in ExecutionGraphBuilder URL: https://github.com/apache/flink/pull/7999
[GitHub] [flink] GJL edited a comment on issue #7999: [hotfix][runtime] Remove unused local variable in ExecutionGraphBuilder
GJL edited a comment on issue #7999: [hotfix][runtime] Remove unused local variable in ExecutionGraphBuilder URL: https://github.com/apache/flink/pull/7999#issuecomment-487817067 Already fixed in https://github.com/apache/flink/commit/306a8eb58e261115cad59a533640b56496d10999 Sorry for the late reply.
[GitHub] [flink] GJL commented on issue #7999: [hotfix][runtime] Remove unused local variable in ExecutionGraphBuilder
GJL commented on issue #7999: [hotfix][runtime] Remove unused local variable in ExecutionGraphBuilder URL: https://github.com/apache/flink/pull/7999#issuecomment-487817067 Already fixed in https://github.com/apache/flink/commit/306a8eb58e261115cad59a533640b56496d10999
[GitHub] [flink] bowenli86 commented on issue #8312: [FLINK-12366][table] Clean up Catalog APIs to make them more consiste…
bowenli86 commented on issue #8312: [FLINK-12366][table] Clean up Catalog APIs to make them more consiste… URL: https://github.com/apache/flink/pull/8312#issuecomment-487815417 @KurtYoung Would be good to have you take a look, and help to merge if there are no more concerns. Thanks!
[jira] [Updated] (FLINK-12369) Implement a region failover strategy regarding the next version FailoverStrategy interfaces
[ https://issues.apache.org/jira/browse/FLINK-12369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-12369: --- Labels: pull-request-available (was: ) > Implement a region failover strategy regarding the next version > FailoverStrategy interfaces > --- > > Key: FLINK-12369 > URL: https://issues.apache.org/jira/browse/FLINK-12369 > Project: Flink > Issue Type: Task > Components: Runtime / Coordination >Affects Versions: 1.9.0 >Reporter: Zhu Zhu >Assignee: Zhu Zhu >Priority: Major > Labels: pull-request-available > Fix For: 1.9.0 > > > It is a re-requisite of FLINK-12068. > The new strategy interface design can be found at FLINK-10429. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [flink] zhuzhurk opened a new pull request #8316: [FLINK-12369] [coordination] Implement a region failover strategy regarding the next version FailoverStrategy interfaces
zhuzhurk opened a new pull request #8316: [FLINK-12369] [coordination] Implement a region failover strategy regarding the next version FailoverStrategy interfaces URL: https://github.com/apache/flink/pull/8316 ## What is the purpose of the change *This pull request implements a region failover strategy regarding the next version FailoverStrategy interfaces. The new FailoverStrategy is only responsible for returning vertices that need to be restarted. This rewrite is necessary to support the backtracking of missing result partitions in region failover. * ## Brief change log - *Add new FailoverStrategy interface* - *Add a FailoverTopology and related component interfaces to provide necessary topology info to FailoverStrategy* - *Add a new version RestartPipelinedRegionStrategy, it is a re-write of existing rg.apache.flink.runtime.executiongraph.failover.RestartPipelinedRegionStrategy, including the faiover handling and region building.* - *Add region building tests for the new strategy, which is a re-write of existing region building tests.* ## Verifying this change This change added tests and can be verified as follows: - *region building tests are added for the new strategy. It is a re-write of existing region building tests* - *test cases are added to verify the getTasksNeedingRestart logic* - *region building perf test is conducted, for a 1x6000 scale job with All-to-All edges, it costs ~4s to build all the regions* ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no) - The serializers: (no) - The runtime per-record code paths (performance sensitive): (no) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes) - The S3 file system connector: (no) ## Documentation - Does this pull request introduce a new feature? 
(no) - If yes, how is the feature documented? (not documented)
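The region building the PR describes, grouping vertices into failover regions and restarting only the failed vertex's region, can be illustrated with a generic sketch (this is not Flink's implementation; the function names and the pipelined-edge model are simplifying assumptions):

```python
from collections import defaultdict


def build_regions(vertices, pipelined_edges):
    # A failover region is a connected component over pipelined
    # (non-blocking) edges; blocking edges separate regions.
    adj = defaultdict(set)
    for a, b in pipelined_edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, regions = set(), []
    for v in vertices:
        if v in seen:
            continue
        stack, region = [v], set()
        while stack:
            u = stack.pop()
            if u in seen:
                continue
            seen.add(u)
            region.add(u)
            stack.extend(adj[u] - seen)
        regions.append(region)
    return regions


def tasks_needing_restart(failed_vertex, regions):
    # The strategy only returns the vertices that must be restarted:
    # everything in the failed vertex's region.
    for region in regions:
        if failed_vertex in region:
            return region
    return {failed_vertex}
```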
[GitHub] [flink] flinkbot commented on issue #8316: [FLINK-12369] [coordination] Implement a region failover strategy regarding the next version FailoverStrategy interfaces
flinkbot commented on issue #8316: [FLINK-12369] [coordination] Implement a region failover strategy regarding the next version FailoverStrategy interfaces URL: https://github.com/apache/flink/pull/8316#issuecomment-487813728 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review. ## Review Progress * ❓ 1. The [description] looks good. * ❓ 2. There is [consensus] that the contribution should go into Flink. * ❓ 3. Needs [attention] from. * ❓ 4. The change fits into the overall [architecture]. * ❓ 5. Overall code [quality] is good. Please see the [Pull Request Review Guide](https://flink.apache.org/reviewing-prs.html) for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required Bot commands The @flinkbot bot supports the following commands: - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`) - `@flinkbot approve all` to approve all aspects - `@flinkbot approve-until architecture` to approve everything until `architecture` - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention - `@flinkbot disapprove architecture` to remove an approval you gave earlier
[jira] [Updated] (FLINK-12369) Implement a region failover strategy regarding the next version FailoverStrategy interfaces
[ https://issues.apache.org/jira/browse/FLINK-12369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu updated FLINK-12369: Description: It is a re-requisite of FLINK-12068. The new strategy interface design can be found at FLINK-10429. was: It is a re-requisite of FLINK-12068. The new strategy interface design can be found at FLINK-10429. > Implement a region failover strategy regarding the next version > FailoverStrategy interfaces > --- > > Key: FLINK-12369 > URL: https://issues.apache.org/jira/browse/FLINK-12369 > Project: Flink > Issue Type: Task > Components: Runtime / Coordination >Affects Versions: 1.9.0 >Reporter: Zhu Zhu >Assignee: Zhu Zhu >Priority: Major > Fix For: 1.9.0 > > > It is a re-requisite of FLINK-12068. > The new strategy interface design can be found at FLINK-10429. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-12369) Implement a region failover strategy regarding the next version FailoverStrategy interfaces
[ https://issues.apache.org/jira/browse/FLINK-12369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu updated FLINK-12369: Summary: Implement a region failover strategy regarding the next version FailoverStrategy interfaces (was: Introducing next version failover strategy interfaces and implement a region failover strategy with it) > Implement a region failover strategy regarding the next version > FailoverStrategy interfaces > --- > > Key: FLINK-12369 > URL: https://issues.apache.org/jira/browse/FLINK-12369 > Project: Flink > Issue Type: Task > Components: Runtime / Coordination >Affects Versions: 1.9.0 >Reporter: Zhu Zhu >Assignee: Zhu Zhu >Priority: Major > Fix For: 1.9.0 > > > It is a re-requisite of FLINK-12068. > The new strategy interface design can be found at FLINK-10429. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-12369) Introducing next version failover strategy interfaces and implement a region failover strategy with it
Zhu Zhu created FLINK-12369: --- Summary: Introducing next version failover strategy interfaces and implement a region failover strategy with it Key: FLINK-12369 URL: https://issues.apache.org/jira/browse/FLINK-12369 Project: Flink Issue Type: Task Components: Runtime / Coordination Affects Versions: 1.9.0 Reporter: Zhu Zhu Assignee: Zhu Zhu Fix For: 1.9.0 It is a pre-requisite of FLINK-12068. The new strategy interface design can be found at FLINK-10429. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [flink] tianchen92 closed pull request #8240: [FLINK-12293][docs, datastream]Fix some comment typos in flink-streaming-java.
tianchen92 closed pull request #8240: [FLINK-12293][docs, datastream]Fix some comment typos in flink-streaming-java. URL: https://github.com/apache/flink/pull/8240
[GitHub] [flink] tianchen92 commented on issue #8240: [FLINK-12293][docs, datastream]Fix some comment typos in flink-streaming-java.
tianchen92 commented on issue #8240: [FLINK-12293][docs, datastream]Fix some comment typos in flink-streaming-java. URL: https://github.com/apache/flink/pull/8240#issuecomment-487809375 @rmetzger Thanks for replying, I have closed this PR.
[GitHub] [flink] kisimple commented on issue #7757: [FLINK-11630] Triggers the termination of all running Tasks when shutting down TaskExecutor
kisimple commented on issue #7757: [FLINK-11630] Triggers the termination of all running Tasks when shutting down TaskExecutor URL: https://github.com/apache/flink/pull/7757#issuecomment-487809077 Hi @azagrebin many thanks for your review and very sorry for the late reply, I will address the comments in the next few days :)
[jira] [Comment Edited] (FLINK-10929) Add support for Apache Arrow
[ https://issues.apache.org/jira/browse/FLINK-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826677#comment-16826677 ] Liya Fan edited comment on FLINK-10929 at 4/30/19 3:01 AM: --- Hi [Stephan Ewen|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=StephanEwen] and [Kurt Young|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=ykt836], thanks a lot for your valuable comments. It seems there is not much debate for point 1). For point 2), I want to list a few advantages of using Arrow as internal data structures for Flink. These conclusions are based on our investigations of research papers, as well as our engineering experiences in recent development on Blink. # Arrow adopts columnar data format, which makes it possible to apply SIMD ([link|[http://prestodb.rocks/code/simd/])|http://prestodb.rocks/code/simd/%5d).]. SIMD means higher performance. Java starts to support SIMD in Java 8, and it is expected that high versions of Java will provide better support for SIMD. # Arrow provides more compact memory layout. To make it simple, encoding a batch of rows as Arrow vectors will require much less memory space, compared with encoding by BinaryRow. This, in turn, leads to two results: ## With the same amount of memory space, more data can be loaded into memory, and fewer spills are required. ## The cost of shuffling can be reduced significantly. For example, the amount of shuffled data for TPC-H Q4 reduces to almost the half. # Columnar format makes it much easier for data compression (see [paper|http://db.lcs.mit.edu/projects/cstore/abadisigmod06.pdf], for example). Our initial experiments have shown some positive results. There are more advantages for Arrow, but I just list a few, due to space constraints. The TPC-H performance in the attachment speaks for itself. We understand and respect that Blink is currently being merged, and some changes should be postponed. 
We would like to offer our devoted help in this process. However, the scale of the change is not a good reason to reject Arrow, if we want to make Flink a world-class computing engine. All these difficulties can be overcome, and we have a plan to carry out the changes incrementally. was (Author: fan_li_ya): Hi [Stephan Ewen|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=StephanEwen] and [Kurt Young|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=ykt836], thanks a lot for your valuable comments. It seems there is not much debate about point 1). For point 2), I want to list a few advantages of using Arrow as the internal data structure for Flink. These conclusions are based on our investigation of research papers, as well as our engineering experience in secondary development on Blink. # Arrow adopts a columnar data format, which makes it possible to apply SIMD ([link|http://prestodb.rocks/code/simd/]). SIMD means higher performance. Java started to support SIMD in Java 8, and later Java versions are expected to provide better support for it. # Arrow provides a more compact memory layout. Simply put, encoding a batch of rows as Arrow vectors requires much less memory than encoding them with BinaryRow. This, in turn, has two consequences: ## With the same amount of memory, more data can be loaded into memory and fewer spills are required. ## The cost of shuffling can be reduced significantly. For example, the amount of shuffled data for TPC-H Q4 is reduced by almost half. # The columnar format also makes data compression much easier (see this [paper|http://db.lcs.mit.edu/projects/cstore/abadisigmod06.pdf], for example). Our initial experiments have shown some positive results. Arrow has more advantages, but I only list a few due to space constraints. The TPC-H performance in the attachment speaks for itself. 
We understand and respect that Blink is currently being merged, and some changes should be postponed. We would like to offer our devoted help in this process. However, the scale of the change is not a good reason to reject Arrow, if we want to make Flink a world-class computing engine. All these difficulties can be overcome, and we have a plan to carry out the changes incrementally. > Add support for Apache Arrow > > > Key: FLINK-10929 > URL: https://issues.apache.org/jira/browse/FLINK-10929 > Project: Flink > Issue Type: Wish > Components: Runtime / State Backends >Reporter: Pedro Cardoso Silva >Priority: Minor > Attachments: image-2019-04-10-13-43-08-107.png > > > Investigate the possibility of adding support for Apache Arrow as a > standardized columnar, memory format for data. > Given the activity that
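The columnar-layout argument above can be illustrated with a minimal, self-contained sketch. This is not Arrow's actual API (Arrow's Java vectors live in `org.apache.arrow.vector`); it only shows why summing one field over a column-oriented batch gives the JIT a tight, contiguous loop it can auto-vectorize with SIMD, while a row-oriented layout forces a pointer chase per record. The class and field names here are illustrative.

```java
// Illustrative sketch (NOT Arrow's API): row layout vs. columnar layout.
public class ColumnarSketch {

    // Row-oriented: each record is a separate object; fields are interleaved.
    static final class RowRecord {
        final int a;
        final double b;
        RowRecord(int a, double b) { this.a = a; this.b = b; }
    }

    // Column-oriented: one contiguous primitive array per field.
    static final class ColumnBatch {
        final int[] a;
        final double[] b;
        ColumnBatch(int[] a, double[] b) { this.a = a; this.b = b; }
    }

    static long sumRows(RowRecord[] rows) {
        long sum = 0;
        for (RowRecord r : rows) {
            sum += r.a; // one pointer dereference per record
        }
        return sum;
    }

    static long sumColumn(ColumnBatch batch) {
        long sum = 0;
        for (int v : batch.a) {
            sum += v; // tight loop over a contiguous int[]: SIMD-friendly
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 1000;
        RowRecord[] rows = new RowRecord[n];
        int[] a = new int[n];
        double[] b = new double[n];
        for (int i = 0; i < n; i++) {
            rows[i] = new RowRecord(i, i * 0.5);
            a[i] = i;
            b[i] = i * 0.5;
        }
        // Same logical result from both layouts; only memory access differs.
        System.out.println(sumRows(rows) == sumColumn(new ColumnBatch(a, b)));
    }
}
```

The same contiguity is what makes the columnar batch cheaper to shuffle and easier to compress, as the comment argues.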
[jira] [Assigned] (FLINK-10446) Use the "guava beta checker" plugin to keep off of @Beta API
[ https://issues.apache.org/jira/browse/FLINK-10446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ji Liu reassigned FLINK-10446: -- Assignee: (was: Ji Liu) > Use the "guava beta checker" plugin to keep off of @Beta API > > > Key: FLINK-10446 > URL: https://issues.apache.org/jira/browse/FLINK-10446 > Project: Flink > Issue Type: Task > Components: Build System >Reporter: Ted Yu >Priority: Major > > The Guava people publish an Error Prone plugin to detect when stuff that's > annotated with @Beta gets used. Those things shouldn't be used because the > project gives no promises about deprecating before removal. > plugin: > https://github.com/google/guava-beta-checker -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (FLINK-12145) Host config in flink-web is incorrect
[ https://issues.apache.org/jira/browse/FLINK-12145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ji Liu reassigned FLINK-12145: -- Assignee: (was: Ji Liu) > Host config in flink-web is incorrect > - > > Key: FLINK-12145 > URL: https://issues.apache.org/jira/browse/FLINK-12145 > Project: Flink > Issue Type: Bug >Reporter: Ji Liu >Priority: Minor > > In [https://flink.apache.org/improve-website.html] > its description is "Open your browser at > [http://localhost:4000|http://localhost:4000/] to view the website", > but in flink-web project, the host config in _config.yml is 0.0.0.0. In this > case when we execute {{./build.sh -p}}, it will print info like "Server > address: http://0.0.0.0:4000/", which is actually invalid. I think the default > host should be replaced with "localhost". -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (FLINK-12349) invalid
[ https://issues.apache.org/jira/browse/FLINK-12349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ji Liu reassigned FLINK-12349: -- Assignee: (was: Ji Liu) > invalid > --- > > Key: FLINK-12349 > URL: https://issues.apache.org/jira/browse/FLINK-12349 > Project: Flink > Issue Type: Improvement >Reporter: Ji Liu >Priority: Trivial > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [flink] FaxianZhao commented on a change in pull request #8248: [FLINK-12309][JDBC] JDBCOoutputFormat and JDBCAppendTableSink float behavior is not align
FaxianZhao commented on a change in pull request #8248: [FLINK-12309][JDBC] JDBCOoutputFormat and JDBCAppendTableSink float behavior is not align URL: https://github.com/apache/flink/pull/8248#discussion_r279599610 ## File path: flink-connectors/flink-jdbc/src/main/java/org/apache/flink/api/java/io/jdbc/JDBCOutputFormat.java ## @@ -160,7 +160,11 @@ public void writeRecord(Row row) throws IOException { break; case java.sql.Types.FLOAT: case java.sql.Types.DOUBLE: - upload.setDouble(index + 1, (double) row.getField(index)); + if (row.getField(index).getClass() == Float.class) { Review comment: agree This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
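The diff fragment above shows only the first line of the guarded branch. A hedged, self-contained sketch of the idea under review (names here are hypothetical, not the PR's final code): when the declared SQL type is FLOAT or DOUBLE but the record field arrived as a `Float`, widen it explicitly instead of casting `Object` straight to `double`, which throws `ClassCastException` for boxed floats.

```java
// Hypothetical sketch of the fix discussed in the PR: explicit widening
// of Float fields before calling PreparedStatement.setDouble(...).
public final class FloatWidening {

    static double toDouble(Object field) {
        if (field instanceof Float) {
            // Object -> (double) fails for a boxed Float; widen explicitly.
            return ((Float) field).doubleValue();
        }
        // Field is already a Double.
        return (double) field;
    }

    public static void main(String[] args) {
        // Both a float-typed and a double-typed field can now feed a
        // DOUBLE column without a ClassCastException.
        System.out.println(toDouble(0.99f));
        System.out.println(toDouble(0.99d));
    }
}
```

Note that widening `float` to `double` can expose representation error (0.99f does not widen to exactly 0.99), which is a reason the PR also tests this path.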
[GitHub] [flink] FaxianZhao commented on a change in pull request #8248: [FLINK-12309][JDBC] JDBCOoutputFormat and JDBCAppendTableSink float behavior is not align
FaxianZhao commented on a change in pull request #8248: [FLINK-12309][JDBC] JDBCOoutputFormat and JDBCAppendTableSink float behavior is not align URL: https://github.com/apache/flink/pull/8248#discussion_r279599519 ## File path: flink-connectors/flink-jdbc/src/test/java/org/apache/flink/api/java/io/jdbc/JDBCOutputFormatTest.java ## @@ -133,6 +133,32 @@ public void testExceptionOnInvalidType() throws IOException { jdbcOutputFormat.writeRecord(row); } + @Test(expected = RuntimeException.class) + public void testCastFloatToDoubleType() throws IOException { + jdbcOutputFormat = JDBCOutputFormat.buildJDBCOutputFormat() + .setDrivername(DRIVER_CLASS) + .setDBUrl(DB_URL) + .setQuery(String.format(INSERT_TEMPLATE, INPUT_TABLE)) + .setSqlTypes(new int[] { + Types.INTEGER, + Types.VARCHAR, + Types.VARCHAR, + Types.FLOAT, + Types.VARCHAR}) + .finish(); + jdbcOutputFormat.open(0, 1); + + Row row = new Row(5); + row.setField(0, 4); + row.setField(1, "hello"); + row.setField(2, "world"); + row.setField(3, 0.99f); // jdbcOutputFormat will cast float to double Review comment: agree This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] FaxianZhao commented on a change in pull request #8248: [FLINK-12309][JDBC] JDBCOoutputFormat and JDBCAppendTableSink float behavior is not align
FaxianZhao commented on a change in pull request #8248: [FLINK-12309][JDBC] JDBCOoutputFormat and JDBCAppendTableSink float behavior is not align URL: https://github.com/apache/flink/pull/8248#discussion_r279599496 ## File path: flink-connectors/flink-jdbc/src/test/java/org/apache/flink/api/java/io/jdbc/JDBCOutputFormatTest.java ## @@ -133,6 +133,32 @@ public void testExceptionOnInvalidType() throws IOException { jdbcOutputFormat.writeRecord(row); } + @Test(expected = RuntimeException.class) Review comment: agree. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] hequn8128 commented on issue #8301: [FLINK-12357][api-java][hotfix] Remove useless code in TableConfig
hequn8128 commented on issue #8301: [FLINK-12357][api-java][hotfix] Remove useless code in TableConfig URL: https://github.com/apache/flink/pull/8301#issuecomment-487796670 @twalthr Thank you very much for the review. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279592781 ## File path: flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/ReadableCatalog.java ## @@ -224,4 +224,54 @@ CatalogPartition getPartition(ObjectPath tablePath, CatalogPartitionSpec partiti */ boolean functionExists(ObjectPath functionPath) throws CatalogException; + // -- Statistics -- + + /** +* Get the statistics of a table. +* +* @param tablePath path of the table +* @return the statistics of the given table +* +* @throws TableNotExistException if the table does not exist in the catalog +* @throws CatalogException in case of any runtime exception +*/ + CatalogTableStatistics getTableStatistics(ObjectPath tablePath) throws TableNotExistException, CatalogException; + + /** +* Get the column statistics of a table. +* +* @param tablePath path of the table +* @return the column statistics of the given table +* +* @throws TableNotExistException if the table does not exist in the catalog +* @throws CatalogException in case of any runtime exception +*/ + CatalogColumnStatistics getTableColumnStatistics(ObjectPath tablePath) throws TableNotExistException, CatalogException; Review comment: ok, no problem. Please also take the APIs in ReadableWritableCatalog into consideration This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] KurtYoung commented on a change in pull request #8302: [FLINK-12269][table-blink] Support Temporal Table Join in blink planner and runtime
KurtYoung commented on a change in pull request #8302: [FLINK-12269][table-blink] Support Temporal Table Join in blink planner and runtime URL: https://github.com/apache/flink/pull/8302#discussion_r279592398 ## File path: flink-table/flink-table-planner-blink/src/test/resources/org/apache/flink/table/plan/batch/sql/join/TemporalTableJoinTest.xml ## @@ -0,0 +1,335 @@ (diff of the new 335-line XML plan file; the tag content was stripped in this digest)
[GitHub] [flink] flinkbot edited a comment on issue #8238: [FLINK-11732] [docs] Add a language switch to the sidebar for Flink documents
flinkbot edited a comment on issue #8238: [FLINK-11732] [docs] Add a language switch to the sidebar for Flink documents URL: https://github.com/apache/flink/pull/8238#issuecomment-485405256 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review. ## Review Progress * ✅ 1. The [description] looks good. - Approved by @sunjincheng121 [committer] * ✅ 2. There is [consensus] that the contribution should go into Flink. - Approved by @sunjincheng121 [committer] * ❓ 3. Needs [attention] from. * ✅ 4. The change fits into the overall [architecture]. - Approved by @sunjincheng121 [committer] * ✅ 5. Overall code [quality] is good. - Approved by @sunjincheng121 [committer] Please see the [Pull Request Review Guide](https://flink.apache.org/reviewing-prs.html) for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required Bot commands The @flinkbot bot supports the following commands: - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`) - `@flinkbot approve all` to approve all aspects - `@flinkbot approve-until architecture` to approve everything until `architecture` - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention - `@flinkbot disapprove architecture` to remove an approval you gave earlier This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] sunjincheng121 commented on a change in pull request #8238: [FLINK-11732] [docs] Add a language switch to the sidebar for Flink documents
sunjincheng121 commented on a change in pull request #8238: [FLINK-11732] [docs] Add a language switch to the sidebar for Flink documents URL: https://github.com/apache/flink/pull/8238#discussion_r279583940 ## File path: docs/_layouts/plain.html ## @@ -64,3 +64,9 @@ {{ page.title }}{% if page.is_beta %} Beta{% endif %} {{ content }} + +{% if page.language == "zh" %} + +<a href="https://cwiki.apache.org/confluence/display/FLINK/Flink+Translation+Specifications" target="_blank">想参与贡献翻译?</a> Review comment: I like the idea of adding this link on this page. One suggestion: could we also add this link to the EN version, so that everyone can see this info in every language version? (just a suggestion) :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] flinkbot edited a comment on issue #8238: [FLINK-11732] [docs] Add a language switch to the sidebar for Flink documents
flinkbot edited a comment on issue #8238: [FLINK-11732] [docs] Add a language switch to the sidebar for Flink documents URL: https://github.com/apache/flink/pull/8238#issuecomment-485405256 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review. ## Review Progress * ✅ 1. The [description] looks good. - Approved by @sunjincheng121 [committer] * ✅ 2. There is [consensus] that the contribution should go into Flink. - Approved by @sunjincheng121 [committer] * ❓ 3. Needs [attention] from. * ✅ 4. The change fits into the overall [architecture]. - Approved by @sunjincheng121 [committer] * ❓ 5. Overall code [quality] is good. Please see the [Pull Request Review Guide](https://flink.apache.org/reviewing-prs.html) for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required Bot commands The @flinkbot bot supports the following commands: - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`) - `@flinkbot approve all` to approve all aspects - `@flinkbot approve-until architecture` to approve everything until `architecture` - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention - `@flinkbot disapprove architecture` to remove an approval you gave earlier This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] sunjincheng121 commented on issue #8238: [FLINK-11732] [docs] Add a language switch to the sidebar for Flink documents
sunjincheng121 commented on issue #8238: [FLINK-11732] [docs] Add a language switch to the sidebar for Flink documents URL: https://github.com/apache/flink/pull/8238#issuecomment-487785374 @flinkbot approve-until architecture This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] xuefuz commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
xuefuz commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279582178 ## File path: flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/ReadableCatalog.java ## @@ -224,4 +224,54 @@ CatalogPartition getPartition(ObjectPath tablePath, CatalogPartitionSpec partiti */ boolean functionExists(ObjectPath functionPath) throws CatalogException; + // -- Statistics -- + + /** +* Get the statistics of a table. +* +* @param tablePath path of the table +* @return the statistics of the given table +* +* @throws TableNotExistException if the table does not exist in the catalog +* @throws CatalogException in case of any runtime exception +*/ + CatalogTableStatistics getTableStatistics(ObjectPath tablePath) throws TableNotExistException, CatalogException; + + /** +* Get the column statistics of a table. +* +* @param tablePath path of the table +* @return the column statistics of the given table +* +* @throws TableNotExistException if the table does not exist in the catalog +* @throws CatalogException in case of any runtime exception +*/ + CatalogColumnStatistics getTableColumnStatistics(ObjectPath tablePath) throws TableNotExistException, CatalogException; + + /** +* Get the statistics of a partition. +* +* @param tablePath path of the table +* @param partitionSpec partition spec of the partition +* @return the statistics of the given partition +* +* @throws PartitionNotExistException if the partition is not partitioned +* @throws CatalogException in case of any runtime exception +*/ + CatalogTableStatistics getPartitionStatistics(ObjectPath tablePath, CatalogPartitionSpec partitionSpec) Review comment: partition not exist exception. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] xuefuz commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
xuefuz commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279582074 ## File path: flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/ReadableCatalog.java ## @@ -224,4 +224,54 @@ CatalogPartition getPartition(ObjectPath tablePath, CatalogPartitionSpec partiti */ boolean functionExists(ObjectPath functionPath) throws CatalogException; + // -- Statistics -- + + /** +* Get the statistics of a table. +* +* @param tablePath path of the table +* @return the statistics of the given table +* +* @throws TableNotExistException if the table does not exist in the catalog +* @throws CatalogException in case of any runtime exception +*/ + CatalogTableStatistics getTableStatistics(ObjectPath tablePath) throws TableNotExistException, CatalogException; + + /** +* Get the column statistics of a table. +* +* @param tablePath path of the table +* @return the column statistics of the given table +* +* @throws TableNotExistException if the table does not exist in the catalog +* @throws CatalogException in case of any runtime exception +*/ + CatalogColumnStatistics getTableColumnStatistics(ObjectPath tablePath) throws TableNotExistException, CatalogException; Review comment: Called on partitioned table is fine, though it may not exist or be accurate. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] xuefuz commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
xuefuz commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279581962 ## File path: flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/ReadableCatalog.java ## @@ -224,4 +224,54 @@ CatalogPartition getPartition(ObjectPath tablePath, CatalogPartitionSpec partiti */ boolean functionExists(ObjectPath functionPath) throws CatalogException; + // -- Statistics -- + + /** +* Get the statistics of a table. +* +* @param tablePath path of the table +* @return the statistics of the given table +* +* @throws TableNotExistException if the table does not exist in the catalog +* @throws CatalogException in case of any runtime exception +*/ + CatalogTableStatistics getTableStatistics(ObjectPath tablePath) throws TableNotExistException, CatalogException; + + /** +* Get the column statistics of a table. +* +* @param tablePath path of the table +* @return the column statistics of the given table +* +* @throws TableNotExistException if the table does not exist in the catalog +* @throws CatalogException in case of any runtime exception +*/ + CatalogColumnStatistics getTableColumnStatistics(ObjectPath tablePath) throws TableNotExistException, CatalogException; Review comment: good thinking. Let me rethink this. It seems to me that column stats of a table should be a list of stats for all columns. This probably requires some rework. Stay tuned. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] xuefuz commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
xuefuz commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279581568 ## File path: flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/ReadableCatalog.java ## @@ -224,4 +224,54 @@ CatalogPartition getPartition(ObjectPath tablePath, CatalogPartitionSpec partiti */ boolean functionExists(ObjectPath functionPath) throws CatalogException; + // -- Statistics -- + + /** +* Get the statistics of a table. +* +* @param tablePath path of the table +* @return the statistics of the given table +* +* @throws TableNotExistException if the table does not exist in the catalog +* @throws CatalogException in case of any runtime exception +*/ + CatalogTableStatistics getTableStatistics(ObjectPath tablePath) throws TableNotExistException, CatalogException; Review comment: Partitioned table has its own, table-level stats, though it may not be complete or accurate, as in Hive's case. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] xuefuz commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
xuefuz commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279581120 ## File path: flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/catalog/GenericInMemoryCatalog.java ## @@ -224,6 +235,8 @@ public void dropTable(ObjectPath tablePath, boolean ignoreIfNotExists) throws Ta tables.remove(tablePath); Review comment: Got it. Good catch! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] xuefuz commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
xuefuz commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279581079 ## File path: flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/catalog/GenericInMemoryCatalog.java ## @@ -246,6 +259,16 @@ public void renameTable(ObjectPath tablePath, String newTableName, boolean ignor if (partitions.containsKey(tablePath)) { partitions.put(newPath, partitions.remove(tablePath)); } + Review comment: Yeah, you're right. Good catch. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279573210 ## File path: flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/ReadableCatalog.java ## @@ -224,4 +224,54 @@ CatalogPartition getPartition(ObjectPath tablePath, CatalogPartitionSpec partiti */ boolean functionExists(ObjectPath functionPath) throws CatalogException; + // -- Statistics -- + + /** +* Get the statistics of a table. +* +* @param tablePath path of the table +* @return the statistics of the given table +* +* @throws TableNotExistException if the table does not exist in the catalog +* @throws CatalogException in case of any runtime exception +*/ + CatalogTableStatistics getTableStatistics(ObjectPath tablePath) throws TableNotExistException, CatalogException; + + /** +* Get the column statistics of a table. +* +* @param tablePath path of the table +* @return the column statistics of the given table +* +* @throws TableNotExistException if the table does not exist in the catalog +* @throws CatalogException in case of any runtime exception +*/ + CatalogColumnStatistics getTableColumnStatistics(ObjectPath tablePath) throws TableNotExistException, CatalogException; + + /** +* Get the statistics of a partition. +* +* @param tablePath path of the table +* @param partitionSpec partition spec of the partition +* @return the statistics of the given partition +* +* @throws PartitionNotExistException if the partition is not partitioned +* @throws CatalogException in case of any runtime exception +*/ + CatalogTableStatistics getPartitionStatistics(ObjectPath tablePath, CatalogPartitionSpec partitionSpec) + throws PartitionNotExistException, CatalogException; + + /** +* Get the column statistics of a partition. 
+* +* @param tablePath path of the table +* @param partitionSpec partition spec of the partition +* @return the column statistics of the given partition +* +* @throws PartitionNotExistException if the partition is not partitioned +* @throws CatalogException in case of any runtime exception +*/ + CatalogColumnStatistics getPartitionColumnStatistics(ObjectPath tablePath, CatalogPartitionSpec partitionSpec) Review comment: ditto. What does this API try to accomplish? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279573127 ## File path: flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/ReadableCatalog.java ## @@ -224,4 +224,54 @@ CatalogPartition getPartition(ObjectPath tablePath, CatalogPartitionSpec partiti */ boolean functionExists(ObjectPath functionPath) throws CatalogException; + // -- Statistics -- + + /** +* Get the statistics of a table. +* +* @param tablePath path of the table +* @return the statistics of the given table +* +* @throws TableNotExistException if the table does not exist in the catalog +* @throws CatalogException in case of any runtime exception +*/ + CatalogTableStatistics getTableStatistics(ObjectPath tablePath) throws TableNotExistException, CatalogException; + + /** +* Get the column statistics of a table. +* +* @param tablePath path of the table +* @return the column statistics of the given table +* +* @throws TableNotExistException if the table does not exist in the catalog +* @throws CatalogException in case of any runtime exception +*/ + CatalogColumnStatistics getTableColumnStatistics(ObjectPath tablePath) throws TableNotExistException, CatalogException; + + /** +* Get the statistics of a partition. +* +* @param tablePath path of the table +* @param partitionSpec partition spec of the partition +* @return the statistics of the given partition +* +* @throws PartitionNotExistException if the partition is not partitioned +* @throws CatalogException in case of any runtime exception +*/ + CatalogTableStatistics getPartitionStatistics(ObjectPath tablePath, CatalogPartitionSpec partitionSpec) Review comment: what happens if this API is called on a non-partitioned table? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279572332 ## File path: flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/ReadableCatalog.java ## @@ -224,4 +224,54 @@ CatalogPartition getPartition(ObjectPath tablePath, CatalogPartitionSpec partiti */ boolean functionExists(ObjectPath functionPath) throws CatalogException; + // -- Statistics -- + + /** +* Get the statistics of a table. +* +* @param tablePath path of the table +* @return the statistics of the given table +* +* @throws TableNotExistException if the table does not exist in the catalog +* @throws CatalogException in case of any runtime exception +*/ + CatalogTableStatistics getTableStatistics(ObjectPath tablePath) throws TableNotExistException, CatalogException; + + /** +* Get the column statistics of a table. +* +* @param tablePath path of the table +* @return the column statistics of the given table +* +* @throws TableNotExistException if the table does not exist in the catalog +* @throws CatalogException in case of any runtime exception +*/ + CatalogColumnStatistics getTableColumnStatistics(ObjectPath tablePath) throws TableNotExistException, CatalogException; + + /** +* Get the statistics of a partition. +* +* @param tablePath path of the table +* @param partitionSpec partition spec of the partition +* @return the statistics of the given partition +* +* @throws PartitionNotExistException if the partition is not partitioned Review comment: the wordage "if the partition is not partitioned" needs some rework This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279571858 ## File path: flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/ReadableCatalog.java ## @@ -224,4 +224,54 @@ CatalogPartition getPartition(ObjectPath tablePath, CatalogPartitionSpec partiti */ boolean functionExists(ObjectPath functionPath) throws CatalogException; + // -- Statistics -- + + /** +* Get the statistics of a table. +* +* @param tablePath path of the table +* @return the statistics of the given table +* +* @throws TableNotExistException if the table does not exist in the catalog +* @throws CatalogException in case of any runtime exception +*/ + CatalogTableStatistics getTableStatistics(ObjectPath tablePath) throws TableNotExistException, CatalogException; Review comment: What happens if this API is called on a partitioned table? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279572488 ## File path: flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/ReadableCatalog.java ## @@ -224,4 +224,54 @@ CatalogPartition getPartition(ObjectPath tablePath, CatalogPartitionSpec partiti */ boolean functionExists(ObjectPath functionPath) throws CatalogException; + // -- Statistics -- + + /** +* Get the statistics of a table. +* +* @param tablePath path of the table +* @return the statistics of the given table +* +* @throws TableNotExistException if the table does not exist in the catalog +* @throws CatalogException in case of any runtime exception +*/ + CatalogTableStatistics getTableStatistics(ObjectPath tablePath) throws TableNotExistException, CatalogException; + + /** +* Get the column statistics of a table. +* +* @param tablePath path of the table +* @return the column statistics of the given table +* +* @throws TableNotExistException if the table does not exist in the catalog +* @throws CatalogException in case of any runtime exception +*/ + CatalogColumnStatistics getTableColumnStatistics(ObjectPath tablePath) throws TableNotExistException, CatalogException; Review comment: What does this API try to accomplish? If this is for all columns of the table, should it return a list of `CatalogColumnStatistics`? If it's for a single column, should it take a column name? Besides, what happens if this API is called on a partitioned table? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279570916 ## File path: flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/catalog/GenericInMemoryCatalog.java ## @@ -224,6 +235,8 @@ public void dropTable(ObjectPath tablePath, boolean ignoreIfNotExists) throws Ta tables.remove(tablePath); Review comment: Don't lines 238/239 only drop partition stats and partition column stats?
[GitHub] [flink] bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279568940 ## File path: flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/catalog/GenericInMemoryCatalog.java ## @@ -246,6 +259,16 @@ public void renameTable(ObjectPath tablePath, String newTableName, boolean ignor if (partitions.containsKey(tablePath)) { partitions.put(newPath, partitions.remove(tablePath)); } + Review comment: Lines 263-271 only relocate table stats and table column stats; did I miss anything?
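The relocation being discussed here can be sketched with plain maps. This is a simplified illustration, not Flink's actual `GenericInMemoryCatalog` code: the map names and string stand-ins for the statistics objects are hypothetical, but the move-under-new-key pattern mirrors what the review comments describe.

```java
import java.util.HashMap;
import java.util.Map;

public class RenameStatsSketch {
    // Simplified stand-ins for the catalog's internal per-path bookkeeping maps.
    static final Map<String, String> tableStats = new HashMap<>();
    static final Map<String, String> tableColumnStats = new HashMap<>();

    // On rename, relocate any stats registered under the old path to the new path,
    // so the renamed table keeps its statistics.
    static void relocateStats(String oldPath, String newPath) {
        if (tableStats.containsKey(oldPath)) {
            tableStats.put(newPath, tableStats.remove(oldPath));
        }
        if (tableColumnStats.containsKey(oldPath)) {
            tableColumnStats.put(newPath, tableColumnStats.remove(oldPath));
        }
    }

    public static void main(String[] args) {
        tableStats.put("db1.t1", "rowCount=5");
        tableColumnStats.put("db1.t1", "ndv=20");
        relocateStats("db1.t1", "db1.t2");
        System.out.println(tableStats.containsKey("db1.t1"));  // false
        System.out.println(tableStats.get("db1.t2"));          // rowCount=5
    }
}
```

Bowen's question is whether partition-level stats need the same treatment; the sketch only covers the two table-level maps the original lines 263-271 touch.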
[GitHub] [flink] bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279569853 ## File path: flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/ReadableCatalog.java ## @@ -224,4 +224,54 @@ CatalogPartition getPartition(ObjectPath tablePath, CatalogPartitionSpec partiti */ boolean functionExists(ObjectPath functionPath) throws CatalogException; + // -- Statistics -- + + /** +* Get the statistics of a table. +* +* @param tablePath path of the table +* @return the statistics of the given table, null if its statistics don't exist or are unknown. Review comment: Yeah, it doesn't make sense to throw `stats not exist exception`. Just using the table API and an example to demonstrate the behavior difference.
[GitHub] [flink] flinkbot commented on issue #8315: [FLINK-12368] add subtask index to FlinkKafkaConsumerBase logging, wh…
flinkbot commented on issue #8315: [FLINK-12368] add subtask index to FlinkKafkaConsumerBase logging, wh… URL: https://github.com/apache/flink/pull/8315#issuecomment-48772 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review. ## Review Progress * ❓ 1. The [description] looks good. * ❓ 2. There is [consensus] that the contribution should go into Flink. * ❓ 3. Needs [attention] from. * ❓ 4. The change fits into the overall [architecture]. * ❓ 5. Overall code [quality] is good. Please see the [Pull Request Review Guide](https://flink.apache.org/reviewing-prs.html) for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required Bot commands The @flinkbot bot supports the following commands: - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`) - `@flinkbot approve all` to approve all aspects - `@flinkbot approve-until architecture` to approve everything until `architecture` - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention - `@flinkbot disapprove architecture` to remove an approval you gave earlier
[jira] [Updated] (FLINK-12368) add subtask index to FlinkKafkaConsumerBase logging, which can be very useful when debugging problem
[ https://issues.apache.org/jira/browse/FLINK-12368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-12368: --- Labels: pull-request-available (was: ) > add subtask index to FlinkKafkaConsumerBase logging, which can be very useful > when debugging problem > > > Key: FLINK-12368 > URL: https://issues.apache.org/jira/browse/FLINK-12368 > Project: Flink > Issue Type: Improvement > Components: Connectors / Kafka >Affects Versions: 1.8.0 >Reporter: Steven Zhen Wu >Assignee: Steven Zhen Wu >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [flink] stevenzwu opened a new pull request #8315: [FLINK-12368] add subtask index to FlinkKafkaConsumerBase logging, wh…
stevenzwu opened a new pull request #8315: [FLINK-12368] add subtask index to FlinkKafkaConsumerBase logging, wh… URL: https://github.com/apache/flink/pull/8315 …ich can be very useful when debugging problem
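The change this PR describes, prefixing `FlinkKafkaConsumerBase` log lines with the subtask index, would look roughly like the sketch below. This is an illustration rather than the PR's actual diff; `getRuntimeContext().getIndexOfThisSubtask()` is Flink's standard API for obtaining the index inside an operator, stubbed here as a plain parameter.

```java
public class SubtaskLoggingSketch {
    // Builds the prefix that would come from getRuntimeContext().getIndexOfThisSubtask()
    // inside the consumer, so that parallel subtasks are distinguishable in the logs.
    static String logPrefix(int subtaskIndex) {
        return String.format("Consumer subtask %d", subtaskIndex);
    }

    public static void main(String[] args) {
        // Before: LOG.info("Setting restore state in the FlinkKafkaConsumer.");
        // After:  LOG.info("{}: Setting restore state in the FlinkKafkaConsumer.", logPrefix(idx));
        String line = logPrefix(3) + ": Setting restore state in the FlinkKafkaConsumer.";
        System.out.println(line);
    }
}
```

With the prefix in place, interleaved log output from a parallelism-N source can be attributed to a specific subtask when debugging.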
[GitHub] [flink] piyushnarang commented on issue #8117: [FLINK-12115] [filesystems]: Add support for AzureFS
piyushnarang commented on issue #8117: [FLINK-12115] [filesystems]: Add support for AzureFS URL: https://github.com/apache/flink/pull/8117#issuecomment-487765637 Thanks for getting back and taking a look @tillrohrmann and @shuai-xu. To answer some of your top level comments / questions: 1) The flink-azure-fs-hadoop jars are written out to the opt/ directory in the flink-dist (based on comments in the original review). I've tested this in local flink jobs; I'm trying to sort out some things to test this out on our internal hadoop cluster. 2) I can add some E2E tests on the lines of `test_shaded_presto_s3`. Do we have an azure bucket at the project level that I should use? Or should I just add the tests similar to the IT test and the folks who run it can fill in their azure details? 3) ITCase with HTTP - Seems like they do support retrieving this information via their REST API (https://docs.microsoft.com/en-us/rest/api/storagerp/storageaccounts/getproperties). I can try and hook this up to the IT case to only run the HTTP tests if `supportsHttpsTrafficOnly` = false. 4) Dependency jars - If I understand correctly, some of these dependent jars (like hadoop-azure / azure-storage) should be part of the hadoop install, right? Or do I need to tweak things to package them?
[jira] [Created] (FLINK-12368) add subtask index to FlinkKafkaConsumerBase logging, which can be very useful when debugging problem
Steven Zhen Wu created FLINK-12368: -- Summary: add subtask index to FlinkKafkaConsumerBase logging, which can be very useful when debugging problem Key: FLINK-12368 URL: https://issues.apache.org/jira/browse/FLINK-12368 Project: Flink Issue Type: Improvement Components: Connectors / Kafka Affects Versions: 1.8.0 Reporter: Steven Zhen Wu Assignee: Steven Zhen Wu -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [flink] xuefuz commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
xuefuz commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279557567 ## File path: flink-table/flink-table-api-java/src/test/java/org/apache/flink/table/catalog/GenericInMemoryCatalogTest.java ## @@ -1104,6 +1104,43 @@ public void testDropFunction_FunctionNotExist_ignored() throws Exception { catalog.dropDatabase(db1, false); } + // -- statistics -- + + @Test + public void testStatistics() throws Exception { + // Table related + catalog.createDatabase(db1, createDb(), false); + GenericCatalogTable table = createTable(); + catalog.createTable(path1, table, false); + assertTrue(catalog.getTableStatistics(path1) == null); + assertTrue(catalog.getTableColumnStatistics(path1) == null); + CatalogTableStatistics tableStatistics = new CatalogTableStatistics(5, 2, 100, 575); + catalog.alterTableStatistics(path1, tableStatistics, false); + CatalogTestUtil.checkEquals(tableStatistics, catalog.getTableStatistics(path1)); + CatalogColumnStatistics columnStatistics = new CatalogColumnStatistics(20L, 2L, 40.5, 17, 45, 15); + catalog.alterTableColumnStatistics(path1, columnStatistics, false); + CatalogTestUtil.checkEquals(columnStatistics, catalog.getTableColumnStatistics(path1)); + + // Partition related + catalog.createDatabase(db2, createDb(), false); + GenericCatalogTable table2 = createPartitionedTable(); + catalog.createTable(path2, table2, false); + CatalogPartitionSpec partitionSpec = createPartitionSpec(); + catalog.createPartition(path2, partitionSpec, createPartition(), false); + assertTrue(catalog.getPartitionStatistics(path2, partitionSpec) == null); + assertTrue(catalog.getPartitionColumnStatistics(path2, partitionSpec) == null); + catalog.alterPartitionStatistics(path2, partitionSpec, tableStatistics, false); + CatalogTestUtil.checkEquals(tableStatistics, catalog.getPartitionStatistics(path2, partitionSpec)); + catalog.alterPartitionColumnStatistics(path2, partitionSpec, columnStatistics, false); + CatalogTestUtil.checkEquals(columnStatistics, catalog.getPartitionColumnStatistics(path2, partitionSpec)); + + // Clean up Review comment: Yeah. I guess they are harmless to be here and make this testcase complete on its own.
[GitHub] [flink] xuefuz commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
xuefuz commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279555978 ## File path: flink-table/flink-table-api-java/src/test/java/org/apache/flink/table/catalog/GenericInMemoryCatalogTest.java ## @@ -1104,6 +1104,43 @@ public void testDropFunction_FunctionNotExist_ignored() throws Exception { catalog.dropDatabase(db1, false); } + // -- statistics -- Review comment: Well, I think the added test is written differently from the others, which may give that impression. However, it covers all the added implementation methods, as you can see. What's missing, though, is some negative test cases, for which I will create a followup JIRA.
[GitHub] [flink] xuefuz commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
xuefuz commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279554736 ## File path: flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/ReadableCatalog.java ## @@ -224,4 +224,54 @@ CatalogPartition getPartition(ObjectPath tablePath, CatalogPartitionSpec partiti */ boolean functionExists(ObjectPath functionPath) throws CatalogException; + // -- Statistics -- + + /** +* Get the statistics of a table. +* +* @param tablePath path of the table +* @return the statistics of the given table, null if its statistics don't exist or are unknown. Review comment: Makes sense. I will leave out the "null" comment in those APIs and let implementations decide. It doesn't seem to make sense to throw a "stats not exist" exception, though.
[GitHub] [flink] xuefuz commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
xuefuz commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279551799 ## File path: flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/CatalogColumnStatistics.java ## @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.table.catalog; + +import java.util.HashMap; +import java.util.Map; + +/** + * Column statistics of a non-partitioned table or a partition of a partitioned table. + */ +public class CatalogColumnStatistics { + + /** +* number of distinct values. +*/ + private final Long ndv; + + /** +* number of nulls. +*/ + private final Long nullCount; + + /** +* average length of column values. +*/ + private final Double avgLen; + + /** +* max length of column values. +*/ + private final Integer maxLen; + + /** +* max value of column values. +*/ + private final Number max; Review comment: Good point! This is merely copied from Flink's column stats definition. If we think rework is needed, I think it's better to do it in a followup JIRA.
[GitHub] [flink] xuefuz commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
xuefuz commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279550562 ## File path: flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/catalog/GenericInMemoryCatalog.java ## @@ -246,6 +259,16 @@ public void renameTable(ObjectPath tablePath, String newTableName, boolean ignor if (partitions.containsKey(tablePath)) { partitions.put(newPath, partitions.remove(tablePath)); } + Review comment: Lines 263-271 do that.
[GitHub] [flink] piyushnarang commented on a change in pull request #8117: [FLINK-12115] [filesystems]: Add support for AzureFS
piyushnarang commented on a change in pull request #8117: [FLINK-12115] [filesystems]: Add support for AzureFS URL: https://github.com/apache/flink/pull/8117#discussion_r279550173 ## File path: flink-filesystems/flink-azure-fs-hadoop/src/main/java/org/apache/flink/fs/azurefs/AzureFileSystem.java ## @@ -0,0 +1,77 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.fs.azurefs; + +import org.apache.flink.configuration.Configuration; +import org.apache.flink.core.fs.FileSystemKind; +import org.apache.flink.runtime.fs.hdfs.HadoopFileSystem; +import org.apache.flink.runtime.util.HadoopUtils; + +import org.apache.hadoop.fs.azure.NativeAzureFileSystem; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.IOException; +import java.net.URI; + +/** + * Azure FileSystem connector for Flink. Based on Azure HDFS support in the + * <a href="https://hadoop.apache.org/docs/current/hadoop-azure/index.html">hadoop-azure</a> module. + */ +public class AzureFileSystem extends HadoopFileSystem { + private static final Logger LOG = LoggerFactory.getLogger(AzureFileSystem.class); + + private static final String[] CONFIG_PREFIXES = { "fs.azure.", "azure." }; + + public AzureFileSystem(URI fsUri, Configuration flinkConfig) throws IOException { + super(createInitializedAzureFS(fsUri, flinkConfig)); + } + + // uri is of the form: wasb(s)://yourcontai...@youraccount.blob.core.windows.net/testDir + private static org.apache.hadoop.fs.FileSystem createInitializedAzureFS(URI fsUri, Configuration flinkConfig) throws IOException { Review comment: sure, will update this. I'm thinking I might just do away with the AzureFileSystem class in that case. Have the factory create a HadoopFileSystem which is passed the NativeAzureFileSystem.
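The `CONFIG_PREFIXES` array in the snippet above suggests a forwarding step: any Flink configuration key starting with `fs.azure.` or `azure.` is copied into the Hadoop configuration before the filesystem is initialized. The sketch below illustrates that pattern with plain maps; it is a simplified stand-in, as the real code would go through Flink's `Configuration` and Hadoop's `org.apache.hadoop.conf.Configuration` objects.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfigForwardingSketch {
    static final String[] CONFIG_PREFIXES = { "fs.azure.", "azure." };

    // Copy every Flink config key that starts with one of the prefixes
    // into the (simulated) Hadoop configuration; everything else is ignored.
    static Map<String, String> forwardConfig(Map<String, String> flinkConfig) {
        Map<String, String> hadoopConfig = new HashMap<>();
        for (Map.Entry<String, String> e : flinkConfig.entrySet()) {
            for (String prefix : CONFIG_PREFIXES) {
                if (e.getKey().startsWith(prefix)) {
                    hadoopConfig.put(e.getKey(), e.getValue());
                    break;
                }
            }
        }
        return hadoopConfig;
    }

    public static void main(String[] args) {
        Map<String, String> flinkConfig = new HashMap<>();
        flinkConfig.put("fs.azure.account.key.myaccount.blob.core.windows.net", "secret");
        flinkConfig.put("taskmanager.numberOfTaskSlots", "4");
        Map<String, String> hadoop = forwardConfig(flinkConfig);
        System.out.println(hadoop.size()); // 1: only the azure-prefixed key is forwarded
    }
}
```

The account-key entry above is only an example shape; the exact keys a deployment needs come from the hadoop-azure documentation, not from this sketch.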
[GitHub] [flink] xuefuz commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
xuefuz commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279550088 ## File path: flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/catalog/GenericInMemoryCatalog.java ## @@ -224,6 +235,8 @@ public void dropTable(ObjectPath tablePath, boolean ignoreIfNotExists) throws Ta tables.remove(tablePath); Review comment: They are removed in line#238 and #239 below.
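The point of this exchange is that dropping a table should evict every flavor of statistics keyed by that table's path, not just one or two of them. A minimal sketch of that cleanup, using hypothetical simplified maps in place of `GenericInMemoryCatalog`'s actual bookkeeping:

```java
import java.util.HashMap;
import java.util.Map;

public class DropTableStatsSketch {
    // Hypothetical per-path maps mirroring the four kinds of stats under discussion.
    static final Map<String, String> tableStats = new HashMap<>();
    static final Map<String, String> tableColumnStats = new HashMap<>();
    static final Map<String, String> partitionStats = new HashMap<>();
    static final Map<String, String> partitionColumnStats = new HashMap<>();

    // Dropping a table clears all statistics recorded for it; otherwise stale
    // entries would leak if a table with the same path were recreated later.
    static void dropTable(String path) {
        tableStats.remove(path);
        tableColumnStats.remove(path);
        partitionStats.remove(path);
        partitionColumnStats.remove(path);
    }

    public static void main(String[] args) {
        tableStats.put("db1.t1", "rowCount=5");
        partitionStats.put("db1.t1", "rowCount=2");
        dropTable("db1.t1");
        System.out.println(tableStats.isEmpty());     // true
        System.out.println(partitionStats.isEmpty()); // true
    }
}
```

Bowen's follow-up question (whether lines 238/239 only drop the partition-level maps) is exactly about making sure all four removals happen.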
[GitHub] [flink] piyushnarang commented on a change in pull request #8117: [FLINK-12115] [filesystems]: Add support for AzureFS
piyushnarang commented on a change in pull request #8117: [FLINK-12115] [filesystems]: Add support for AzureFS URL: https://github.com/apache/flink/pull/8117#discussion_r279549575 ## File path: flink-filesystems/flink-azure-fs-hadoop/pom.xml ## @@ -0,0 +1,80 @@ + + +http://maven.apache.org/POM/4.0.0; xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance; + xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd;> + + 4.0.0 + + + org.apache.flink + flink-filesystems + 1.9-SNAPSHOT + .. + + + flink-azure-fs-hadoop + flink-azure-fs-hadoop + + jar + + + + 2.7.0 + + + + + + org.apache.flink + flink-core + ${project.version} + provided + + + + org.apache.flink + flink-hadoop-fs + ${project.version} + + + + org.apache.hadoop + hadoop-azure + ${fs.azure.version} + + + + + org.apache.hadoop + hadoop-hdfs + ${hadoop.version} + test + + + + + org.apache.flink + flink-core + ${project.version} + test + test-jar + + + Review comment: sure will add. How do I go about adding the various notice / license files? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] xuefuz edited a comment on issue #8312: [FLINK-12366][table] Clean up Catalog APIs to make them more consiste…
xuefuz edited a comment on issue #8312: [FLINK-12366][table] Clean up Catalog APIs to make them more consiste… URL: https://github.com/apache/flink/pull/8312#issuecomment-487736112 Thanks, @bowenli86! Updated the PR to address your review feedback above.
[GitHub] [flink] xuefuz commented on issue #8312: [FLINK-12366][table] Clean up Catalog APIs to make them more consiste…
xuefuz commented on issue #8312: [FLINK-12366][table] Clean up Catalog APIs to make them more consiste… URL: https://github.com/apache/flink/pull/8312#issuecomment-487736112 Thanks, Bowen! Updated the PR to address your review feedback above.
[GitHub] [flink] bowenli86 commented on issue #8205: [FLINK-12238] [hive] Support database related operations in GenericHiveMetastoreCatalog and setup flink-connector-hive module
bowenli86 commented on issue #8205: [FLINK-12238] [hive] Support database related operations in GenericHiveMetastoreCatalog and setup flink-connector-hive module URL: https://github.com/apache/flink/pull/8205#issuecomment-487735305 @flinkbot attention @KurtYoung
[jira] [Comment Edited] (FLINK-9172) Support external catalogs in SQL-Client
[ https://issues.apache.org/jira/browse/FLINK-9172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829690#comment-16829690 ] Bowen Li edited comment on FLINK-9172 at 4/29/19 8:36 PM: -- Hi [~eronwright] , sorry to have you wait so long. This is a really valuable contribution, and we definitely should get this PR in. [~twalthr] and us have discussed the yaml config formats [1], let me know if you have any comments. Are you interested in continue working on it and finish it in the next couple weeks? I don't want to give you too much pressure on the timeline :) Can you please rebase and adapt your PR accordingly? I will be available for any followup reviews. Cheers! [1] [https://docs.google.com/document/d/1ALxfiGZBaZ8KUNJtoT443hReoPoJEdt9Db2wiJ2LuWA/edit?usp=sharing] was (Author: phoenixjiangnan): Hi [~eronwright] , sorry to have you wait so long. This is a really valuable contribution, and we definitely should get this PR in. [~twalthr] and us have discussed the yaml config formats [1], let me know if you have any comments. Can you please rebase and adapt your PR accordingly? I will be available for any followup reviews. Cheers! [1] [https://docs.google.com/document/d/1ALxfiGZBaZ8KUNJtoT443hReoPoJEdt9Db2wiJ2LuWA/edit?usp=sharing] > Support external catalogs in SQL-Client > --- > > Key: FLINK-9172 > URL: https://issues.apache.org/jira/browse/FLINK-9172 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / Client >Reporter: Rong Rong >Assignee: Eron Wright >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > It doesn't seem that the configuration (YAML) file allows specifications of > external catalogs currently. The request here is to add support for external > catalog specifications in YAML file. User should also be able to specify one > catalog is the default. 
> It will be great to have SQL-Client to support some external catalogs > out-of-the-box for SQL users to configure and utilize easily. I am currently > think of having an external catalog factory that spins up both streaming and > batch external catalog table sources and sinks. This could greatly unify and > provide easy access for SQL users. > The catalog-related configurations then need to be processed and passed to > TableEnvironment accordingly by calling relevant APIs. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [flink] bowenli86 commented on issue #7393: [FLINK-9172][table][sql-client] - Support external catalogs in SQL-Client
bowenli86 commented on issue #7393: [FLINK-9172][table][sql-client] - Support external catalogs in SQL-Client URL: https://github.com/apache/flink/pull/7393#issuecomment-487733576 Thanks for your valuable contribution, @EronWright ! I left some comments in the JIRA ticket. Can you please rebase and adapt this PR? I can help review it as soon as you submit again. Thanks!
[jira] [Commented] (FLINK-9172) Support external catalogs in SQL-Client
[ https://issues.apache.org/jira/browse/FLINK-9172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829690#comment-16829690 ] Bowen Li commented on FLINK-9172: - Hi [~eronwright] , sorry to have you wait so long. This is a really valuable contribution, and we definitely should get this PR in. [~twalthr] and us have discussed the yaml config formats [1], let me know if you have any comments. Can you please rebase and adapt your PR accordingly? I will be available for any followup reviews. Cheers! [1] [https://docs.google.com/document/d/1ALxfiGZBaZ8KUNJtoT443hReoPoJEdt9Db2wiJ2LuWA/edit?usp=sharing] > Support external catalogs in SQL-Client > --- > > Key: FLINK-9172 > URL: https://issues.apache.org/jira/browse/FLINK-9172 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / Client >Reporter: Rong Rong >Assignee: Eron Wright >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > It doesn't seem that the configuration (YAML) file allows specifications of > external catalogs currently. The request here is to add support for external > catalog specifications in YAML file. User should also be able to specify one > catalog is the default. > It will be great to have SQL-Client to support some external catalogs > out-of-the-box for SQL users to configure and utilize easily. I am currently > think of having an external catalog factory that spins up both streaming and > batch external catalog table sources and sinks. This could greatly unify and > provide easy access for SQL users. > The catalog-related configurations then need to be processed and passed to > TableEnvironment accordingly by calling relevant APIs. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [flink] bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279500369 ## File path: flink-table/flink-table-api-java/src/test/java/org/apache/flink/table/catalog/GenericInMemoryCatalogTest.java ## @@ -1104,6 +1104,43 @@ public void testDropFunction_FunctionNotExist_ignored() throws Exception { catalog.dropDatabase(db1, false); } + // -- statistics -- + + @Test + public void testStatistics() throws Exception { + // Table related + catalog.createDatabase(db1, createDb(), false); + GenericCatalogTable table = createTable(); + catalog.createTable(path1, table, false); + assertTrue(catalog.getTableStatistics(path1) == null); Review comment: nit: to follow the format in this file, can we have an empty line between assertions and code to be tested?
[GitHub] [flink] bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279523176 ## File path: flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/ReadableCatalog.java ## @@ -224,4 +224,54 @@ CatalogPartition getPartition(ObjectPath tablePath, CatalogPartitionSpec partiti */ boolean functionExists(ObjectPath functionPath) throws CatalogException; + // -- Statistics -- + + /** +* Get the statistics of a table. +* +* @param tablePath path of the table +* @return the statistics of the given table, null if its statistics don't exist or are unknown. Review comment: Stats should probably always exist; that's how Hive handles it. Also, the behavior of "returns null when the stats don't exist" doesn't seem to match the behavior of the other metadata APIs and would confuse people; for example, `getTable()` throws `TableNotExistException` when the requested table doesn't exist. Instead, default "unknown" stats would carry default values; for example, rowCount, fileCount, and totalSize would all be 0. This would also better explain the signatures of these APIs. What do you think?
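The "stats always exist with default values" behavior proposed in the comment above could be sketched as follows. This is only an illustration of the idea, not the actual Flink catalog API; the class and field names are assumptions:

```java
// Sketch of the "stats always exist" idea: instead of returning null for a
// table whose statistics were never collected, the catalog returns a shared
// UNKNOWN instance whose counts default to 0. All names are illustrative.
public class TableStats {

    public static final TableStats UNKNOWN = new TableStats(0, 0, 0, 0);

    private final long rowCount;
    private final long fileCount;
    private final long totalSize;
    private final long rawDataSize;

    public TableStats(long rowCount, long fileCount, long totalSize, long rawDataSize) {
        this.rowCount = rowCount;
        this.fileCount = fileCount;
        this.totalSize = totalSize;
        this.rawDataSize = rawDataSize;
    }

    public long getRowCount() { return rowCount; }

    public long getFileCount() { return fileCount; }

    public static void main(String[] args) {
        // A catalog lookup that found no stored stats would return UNKNOWN
        // instead of null, so callers never need a null check.
        TableStats stats = TableStats.UNKNOWN;
        System.out.println(stats.getRowCount());
    }
}
```

This mirrors the `getTable()` convention mentioned in the comment: existence questions raise exceptions, while missing data degrades to a well-defined default.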
[GitHub] [flink] bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279506420 ## File path: flink-table/flink-table-api-java/src/test/java/org/apache/flink/table/catalog/GenericInMemoryCatalogTest.java ## @@ -1104,6 +1104,43 @@ public void testDropFunction_FunctionNotExist_ignored() throws Exception { catalog.dropDatabase(db1, false); } + // -- statistics -- Review comment: compared to the amount of unit tests for other APIs, the number of unit tests for stats seems too small. I'm ok with either adding more tests in this PR, or creating an immediate follow-up PR dedicated to adding unit tests for the stats related APIs.
[GitHub] [flink] bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279500776 ## File path: flink-table/flink-table-api-java/src/test/java/org/apache/flink/table/catalog/GenericInMemoryCatalogTest.java ## @@ -1104,6 +1104,43 @@ public void testDropFunction_FunctionNotExist_ignored() throws Exception { catalog.dropDatabase(db1, false); } + // -- statistics -- + + @Test + public void testStatistics() throws Exception { + // Table related + catalog.createDatabase(db1, createDb(), false); + GenericCatalogTable table = createTable(); + catalog.createTable(path1, table, false); + assertTrue(catalog.getTableStatistics(path1) == null); + assertTrue(catalog.getTableColumnStatistics(path1) == null); + CatalogTableStatistics tableStatistics = new CatalogTableStatistics(5, 2, 100, 575); + catalog.alterTableStatistics(path1, tableStatistics, false); + CatalogTestUtil.checkEquals(tableStatistics, catalog.getTableStatistics(path1)); + CatalogColumnStatistics columnStatistics = new CatalogColumnStatistics(20L, 2L, 40.5, 17, 45, 15); + catalog.alterTableColumnStatistics(path1, columnStatistics, false); + CatalogTestUtil.checkEquals(columnStatistics, catalog.getTableColumnStatistics(path1)); + + // Partition related + catalog.createDatabase(db2, createDb(), false); + GenericCatalogTable table2 = createPartitionedTable(); + catalog.createTable(path2, table2, false); + CatalogPartitionSpec partitionSpec = createPartitionSpec(); + catalog.createPartition(path2, partitionSpec, createPartition(), false); + assertTrue(catalog.getPartitionStatistics(path2, partitionSpec) == null); + assertTrue(catalog.getPartitionColumnStatistics(path2, partitionSpec) == null); + catalog.alterPartitionStatistics(path2, partitionSpec, tableStatistics, false); + CatalogTestUtil.checkEquals(tableStatistics, catalog.getPartitionStatistics(path2, partitionSpec)); + catalog.alterPartitionColumnStatistics(path2, 
partitionSpec, columnStatistics, false); + CatalogTestUtil.checkEquals(columnStatistics, catalog.getPartitionColumnStatistics(path2, partitionSpec)); + + // Clean up Review comment: nit: we can remove these cleanups as they are already in close()
[GitHub] [flink] bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279504357 ## File path: flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/CatalogColumnStatistics.java ## @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.table.catalog; + +import java.util.HashMap; +import java.util.Map; + +/** + * Column statistics a non-partitioned table or a partition of a partitioned table. + */ +public class CatalogColumnStatistics { + + /** +* number of distinct values. +*/ + private final Long ndv; + + /** +* number of nulls. +*/ + private final Long nullCount; + + /** +* average length of column values. +*/ + private final Double avgLen; + + /** +* max length of column values. +*/ + private final Integer maxLen; + + /** +* max value of column values. +*/ + private final Number max; Review comment: do we need to make min/max type `Object`? For string and date columns, the min/max would be String and Date, it's always been a bit tricky on stats definition. 
Hive handles stats for differently typed columns by having individual classes for each typed stats, e.g. DecimalColumnStatsData, StringColumnStatsData, and DateColumnStatsData. I'm ok to leave this as-is for now, but want to bring up the potential issues and the need for a solution soon.
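The Hive-style approach mentioned above — one stats class per column type instead of a single `Number` min/max — could be sketched like this. The class names below are illustrative, not Hive's or Flink's actual API:

```java
// Sketch: per-type column statistics, so min/max carry the natural type of
// the column instead of being forced into Number or Object.
// All names here are illustrative.
public abstract class ColumnStats {
    private final long nullCount;

    protected ColumnStats(long nullCount) { this.nullCount = nullCount; }

    public long getNullCount() { return nullCount; }
}

class LongColumnStats extends ColumnStats {
    private final long min;
    private final long max;

    LongColumnStats(long nullCount, long min, long max) {
        super(nullCount);
        this.min = min;
        this.max = max;
    }

    long getMin() { return min; }

    long getMax() { return max; }
}

class StringColumnStats extends ColumnStats {
    // For strings, a min/max value is rarely useful; Hive tracks lengths instead.
    private final double avgLength;
    private final long maxLength;

    StringColumnStats(long nullCount, double avgLength, long maxLength) {
        super(nullCount);
        this.avgLength = avgLength;
        this.maxLength = maxLength;
    }

    long getMaxLength() { return maxLength; }
}
```

The design trade-off is the one raised in the review: a single generic class keeps the API surface small, while per-type classes keep the fields well typed at the cost of more classes to maintain.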
[GitHub] [flink] bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279499373 ## File path: flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/catalog/GenericInMemoryCatalog.java ## @@ -246,6 +259,16 @@ public void renameTable(ObjectPath tablePath, String newTableName, boolean ignor if (partitions.containsKey(tablePath)) { partitions.put(newPath, partitions.remove(tablePath)); } + Review comment: do we need to relocate partition stats and partition column stats too?
[GitHub] [flink] bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs
bowenli86 commented on a change in pull request #8314: [FLINK-12365][table] Add stats related catalog APIs URL: https://github.com/apache/flink/pull/8314#discussion_r279498588 ## File path: flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/catalog/GenericInMemoryCatalog.java ## @@ -224,6 +235,8 @@ public void dropTable(ObjectPath tablePath, boolean ignoreIfNotExists) throws Ta tables.remove(tablePath); Review comment: do we need to also remove table stats and column stats of the table?
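The bookkeeping the two review comments above ask for — removing stats on drop and relocating them on rename, alongside the table entry itself — could look roughly like this, using plain maps as a stand-in for GenericInMemoryCatalog's internal state (field and method names are assumptions, not the actual Flink code):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of in-memory catalog bookkeeping: the stats maps must be kept in
// sync with the table map on drop and rename. All names are illustrative.
public class InMemoryCatalogSketch {
    private final Map<String, String> tables = new HashMap<>();
    private final Map<String, long[]> tableStats = new HashMap<>();
    private final Map<String, long[]> columnStats = new HashMap<>();

    public void createTable(String path) {
        tables.put(path, "table");
        tableStats.put(path, new long[]{0, 0, 0, 0});
    }

    public void dropTable(String path) {
        tables.remove(path);
        // Without these two lines, stale stats would linger for a dropped table.
        tableStats.remove(path);
        columnStats.remove(path);
    }

    public void renameTable(String path, String newPath) {
        if (tables.containsKey(path)) {
            tables.put(newPath, tables.remove(path));
            // Relocate the stats under the new path as well.
            if (tableStats.containsKey(path)) {
                tableStats.put(newPath, tableStats.remove(path));
            }
            if (columnStats.containsKey(path)) {
                columnStats.put(newPath, columnStats.remove(path));
            }
        }
    }

    public boolean hasStats(String path) {
        return tableStats.containsKey(path);
    }
}
```

The same pattern would apply to the partition stats and partition column stats maps mentioned for `renameTable`.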
[GitHub] [flink] bowenli86 commented on a change in pull request #8312: [FLINK-12366][table] Clean up Catalog APIs to make them more consiste…
bowenli86 commented on a change in pull request #8312: [FLINK-12366][table] Clean up Catalog APIs to make them more consiste… URL: https://github.com/apache/flink/pull/8312#discussion_r279492143 ## File path: flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/catalog/GenericInMemoryCatalog.java ## @@ -416,19 +430,38 @@ public void alterPartition(ObjectPath tablePath, CatalogPartitionSpec partitionS } @Override - public List listPartitions(ObjectPath tablePath) - throws TableNotExistException, TableNotPartitionedException, CatalogException { + public List listAllPartitions(ObjectPath tablePath) Review comment: can we revert it to `listPartitions()`?
[GitHub] [flink] bowenli86 commented on a change in pull request #8312: [FLINK-12366][table] Clean up Catalog APIs to make them more consiste…
bowenli86 commented on a change in pull request #8312: [FLINK-12366][table] Clean up Catalog APIs to make them more consiste… URL: https://github.com/apache/flink/pull/8312#discussion_r279495627 ## File path: flink-table/flink-table-api-java/src/test/java/org/apache/flink/table/catalog/GenericInMemoryCatalogTest.java ## @@ -731,32 +727,36 @@ public void testDropPartition() throws Exception { catalog.createTable(path1, createPartitionedTable(), false); catalog.createPartition(path1, createPartitionSpec(), createPartition(), false); - assertEquals(Arrays.asList(createPartitionSpec()), catalog.listPartitions(path1)); + assertEquals(Arrays.asList(createPartitionSpec()), catalog.listAllPartitions(path1)); catalog.dropPartition(path1, createPartitionSpec(), false); - assertEquals(Arrays.asList(), catalog.listPartitions(path1)); + assertEquals(Arrays.asList(), catalog.listAllPartitions(path1)); } @Test public void testDropPartition_TableNotExistException() throws Exception { Review comment: We need to update the unit test names to reflect what case each test is actually about. E.g. this UT can be renamed to something like `testDropPartition_tableNotExist()` or `testDropPartition_PartitionNotExistException_tableNotExist()`. Same thing for the following changed unit tests in this file.
[GitHub] [flink] bowenli86 commented on a change in pull request #8312: [FLINK-12366][table] Clean up Catalog APIs to make them more consiste…
bowenli86 commented on a change in pull request #8312: [FLINK-12366][table] Clean up Catalog APIs to make them more consiste… URL: https://github.com/apache/flink/pull/8312#discussion_r279494274 ## File path: flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/exceptions/PartitionSpecInvalidException.java ## @@ -29,15 +29,14 @@ * when the size of PartitionSpec is 'n' but its keys don't match the first 'n' keys in partition key list. */ public class PartitionSpecInvalidException extends Exception { - private static final String MSG = "PartitionSpec %s does not match partition keys %s of table %s in catalog %s."; + private static final String MSG = "PartitionSpec %s does not match the partition keys of table %s in catalog %s."; Review comment: how about adding back the partition keys to give users adequate information about why the exception is thrown?
[jira] [Commented] (FLINK-12342) Yarn Resource Manager Acquires Too Many Containers
[ https://issues.apache.org/jira/browse/FLINK-12342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829614#comment-16829614 ] Zhenqiu Huang commented on FLINK-12342: --- !flink-1.4.png! !flink-1.6.png! [~till.rohrmann] This is the same job (it needs 256 containers). On Flink 1.4 it acquired only 257 containers in total, but on Flink 1.6 it acquires a much larger number of containers. The allocation of each container then becomes very slow, causing job deployment to take more than 10 minutes, which is unacceptable for streaming job use cases. > Yarn Resource Manager Acquires Too Many Containers > -- > > Key: FLINK-12342 > URL: https://issues.apache.org/jira/browse/FLINK-12342 > Project: Flink > Issue Type: Improvement > Components: Deployment / YARN >Affects Versions: 1.6.4, 1.7.2, 1.8.0 > Environment: We run the job on Flink release 1.6.3. >Reporter: Zhenqiu Huang >Assignee: Zhenqiu Huang >Priority: Major > Labels: pull-request-available > Attachments: Screen Shot 2019-04-29 at 12.06.23 AM.png, > container.log, flink-1.4.png, flink-1.6.png > > Time Spent: 10m > Remaining Estimate: 0h > > In the current implementation of YarnFlinkResourceManager, it starts to acquire > new containers one by one when it gets requests from the SlotManager. This mechanism > works when the job is small, say fewer than 32 containers. If the job needs 256 > containers, containers can't be allocated immediately and pending requests > in AMRMClient will not be removed accordingly. We observe that > AMRMClient asks for current pending requests + 1 (the new request from the slot > manager) containers. In this way, during the startup of such a job, it asked > for 4000+ containers. If an external dependency issue occurs, for > example slow HDFS access, then the whole job will be blocked without > getting enough resources and finally killed with a SlotManager request timeout.
> Thus, we should use the total number of containers asked for, rather than the pending > requests in AMRMClient, as the threshold for deciding whether we need to add > one more resource request.
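The threshold decision described in FLINK-12342 — comparing the total number of containers already asked for (pending plus allocated) against what the job needs, instead of looking only at AMRMClient's pending requests — can be illustrated with a small sketch. This is not the actual YarnFlinkResourceManager code; all names are assumptions:

```java
// Sketch of the proposed threshold check: a new container request is only
// issued while the TOTAL asked for (pending + allocated) is below the job's
// requirement, so repeated slot requests cannot over-allocate.
public class ContainerAllocator {
    private final int numRequired; // containers the job needs
    private long numPending;       // requests outstanding in AMRMClient
    private long numAllocated;     // containers already granted by YARN

    public ContainerAllocator(int numRequired) {
        this.numRequired = numRequired;
    }

    /** The proposed threshold: total asked so far, not just pending. */
    public boolean shouldRequestNewContainer() {
        return numPending + numAllocated < numRequired;
    }

    /** Called for each slot request from the SlotManager. */
    public void requestContainer() {
        if (shouldRequestNewContainer()) {
            numPending++; // would call amClient.addContainerRequest(...)
        }
    }

    /** Called when YARN grants a container. */
    public void onContainerAllocated() {
        numPending--;
        numAllocated++;
    }

    public long totalAsked() {
        return numPending + numAllocated;
    }
}
```

With this check, even if slot requests arrive faster than YARN can allocate (e.g. during slow HDFS access), the total requests stay capped at the job's actual need instead of growing into the thousands.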