[GitHub] [flink] flinkbot edited a comment on pull request #12420: [FLINK-16085][docs] Translate "Joins in Continuous Queries" page of "Streaming Concepts" into Chinese

2020-06-06 Thread GitBox


flinkbot edited a comment on pull request #12420:
URL: https://github.com/apache/flink/pull/12420#issuecomment-636504607


   
   ## CI report:
   
   * 6c369fc8eb4709738b70d9fe065c1ac088e181d5 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2495)
   * d0f0b15cc5289803cdbde65b26bc66f0542da5f1 UNKNOWN
   * 19b463b5eac22be3a724e9437fd0be0d9b3d5d3a UNKNOWN
   * 2b6feb97e452779487c38f13c260aeb0a6e3f5c7 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] zhijiangW commented on pull request #12460: [FLINK-18063][checkpointing] Fix the race condition for aborting current checkpoint in CheckpointBarrierUnaligner

2020-06-06 Thread GitBox


zhijiangW commented on pull request #12460:
URL: https://github.com/apache/flink/pull/12460#issuecomment-640161331


   @flinkbot run azure



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12420: [FLINK-16085][docs] Translate "Joins in Continuous Queries" page of "Streaming Concepts" into Chinese

2020-06-06 Thread GitBox


flinkbot edited a comment on pull request #12420:
URL: https://github.com/apache/flink/pull/12420#issuecomment-636504607


   
   ## CI report:
   
   * 6c369fc8eb4709738b70d9fe065c1ac088e181d5 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2495)
   * d0f0b15cc5289803cdbde65b26bc66f0542da5f1 UNKNOWN
   * 19b463b5eac22be3a724e9437fd0be0d9b3d5d3a UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-18145) Segment optimization does not work in blink ?

2020-06-06 Thread Benchao Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17127505#comment-17127505
 ] 

Benchao Li commented on FLINK-18145:


[~hehuiyuan] If you want to use subgraph optimization, you should use `TableEnvironment` instead of `StreamTableEnvironment`.
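
As a rough illustration (a minimal sketch; the table names and the prior registration of `src`, `sink_a`, and `sink_b` are assumptions, not taken from the report above): with a plain `TableEnvironment`, all `sqlUpdate()` statements are buffered and planned together on submission, which is what lets the planner share common subgraphs.

{code:java}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class SubgraphOptimizationSketch {
	public static void main(String[] args) throws Exception {
		EnvironmentSettings settings = EnvironmentSettings.newInstance()
				.useBlinkPlanner()
				.inStreamingMode()
				.build();
		// A TableEnvironment (rather than a StreamTableEnvironment) buffers
		// the statements below and optimizes them as one DAG, so the shared
		// source segment can be reused across both sinks.
		TableEnvironment tableEnv = TableEnvironment.create(settings);

		// Assumes tables `src`, `sink_a` and `sink_b` were registered beforehand.
		tableEnv.sqlUpdate("INSERT INTO sink_a SELECT key, COUNT(*) FROM src GROUP BY key");
		tableEnv.sqlUpdate("INSERT INTO sink_b SELECT key FROM src");
		tableEnv.execute("multi-sink job"); // both INSERTs planned together
	}
}
{code}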

> Segment optimization does not work in blink ?
> -
>
> Key: FLINK-18145
> URL: https://issues.apache.org/jira/browse/FLINK-18145
> Project: Flink
>  Issue Type: Wish
>  Components: Table SQL / Planner
>Reporter: hehuiyuan
>Priority: Minor
> Attachments: image-2020-06-05-14-56-01-710.png, 
> image-2020-06-05-14-56-48-625.png, image-2020-06-05-14-57-11-287.png
>
>
> DAG Segment Optimization:
>  
> !image-2020-06-05-14-56-01-710.png|width=762,height=264!
> Code:
> {code:java}
> StreamExecutionEnvironment env = EnvUtil.getEnv();
> env.setParallelism(1);
> env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime);
>
> EnvironmentSettings bsSettings = EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build();
> StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env, bsSettings);
>
> GeneratorTableSource tableSource = new GeneratorTableSource(2, 1, 70, 0);
> tableEnv.registerTableSource("myTble", tableSource);
> Table mytable = tableEnv.scan("myTble");
> mytable.printSchema();
>
> // (1) convert to a DataStream and sink
> tableEnv.toAppendStream(mytable, Row.class).addSink(new PrintSinkFunction<>()).setParallelism(2);
>
> // (2) 30-second tumbling window
> Table tableproc = tableEnv.sqlQuery("SELECT key, count(rowtime_string) as countkey, TUMBLE_START(proctime, INTERVAL '30' SECOND) as tumblestart FROM myTble group by TUMBLE(proctime, INTERVAL '30' SECOND), key");
> tableproc.printSchema();
> tableEnv.registerTable("t4", tableproc);
>
> // (3) 24-hour tumbling window
> Table table = tableEnv.sqlQuery("SELECT key, count(rowtime_string) as countkey, TUMBLE_START(proctime, INTERVAL '24' HOUR) as tumblestart FROM myTble group by TUMBLE(proctime, INTERVAL '24' HOUR), key");
> table.printSchema();
> tableEnv.registerTable("t3", table);
>
> String[] fields = new String[]{"key", "countkey", "tumblestart"};
> TypeInformation[] fieldsType = new TypeInformation[3];
> fieldsType[0] = Types.INT;
> fieldsType[1] = Types.LONG;
> fieldsType[2] = Types.SQL_TIMESTAMP;
> PrintTableUpsertSink printTableSink = new PrintTableUpsertSink(fields, fieldsType, true);
> tableEnv.registerTableSink("inserttable", printTableSink);
> tableEnv.sqlUpdate("insert into inserttable select key, countkey, tumblestart from t3");
>
> String[] fieldsproc = new String[]{"key", "countkey", "tumblestart"};
> TypeInformation[] fieldsTypeproc = new TypeInformation[3];
> fieldsTypeproc[0] = Types.INT;
> fieldsTypeproc[1] = Types.LONG;
> fieldsTypeproc[2] = Types.SQL_TIMESTAMP;
> PrintTableUpsertSink printTableSinkproc = new PrintTableUpsertSink(fieldsproc, fieldsTypeproc, true);
> tableEnv.registerTableSink("inserttableproc", printTableSinkproc);
> tableEnv.sqlUpdate("insert into inserttableproc select key, countkey, tumblestart from t4");
> {code}
> I have a custom table source, then:
>     (1) transform the datastream using the `toAppendStream` method, then sink
>     (2) use a tumble window, then sink
>     (3) use another tumble window, then sink
> but the segment optimization didn't work.
>  
> !image-2020-06-05-14-57-11-287.png|width=546,height=388!  
>  
> *The source is executed by 3 threads and generates duplicate data 3 times*
>  
> !image-2020-06-05-14-56-48-625.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #12420: [FLINK-16085][docs] Translate "Joins in Continuous Queries" page of "Streaming Concepts" into Chinese

2020-06-06 Thread GitBox


flinkbot edited a comment on pull request #12420:
URL: https://github.com/apache/flink/pull/12420#issuecomment-636504607


   
   ## CI report:
   
   * 6c369fc8eb4709738b70d9fe065c1ac088e181d5 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2495)
   * d0f0b15cc5289803cdbde65b26bc66f0542da5f1 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] authuir commented on a change in pull request #12420: [FLINK-16085][docs] Translate "Joins in Continuous Queries" page of "Streaming Concepts" into Chinese

2020-06-06 Thread GitBox


authuir commented on a change in pull request #12420:
URL: https://github.com/apache/flink/pull/12420#discussion_r436322260



##
File path: docs/dev/table/streaming/joins.zh.md
##
@@ -22,37 +22,35 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-Joins are a common and well-understood operation in batch data processing to 
connect the rows of two relations. However, the semantics of joins on [dynamic 
tables](dynamic_tables.html) are much less obvious or even confusing.
+Join 在批数据处理中是比较常见且广为人知的运算,一般用于连接两张关系表。然而在[动态表](dynamic_tables.html)中 Join 
的语义会更难以理解甚至让人困惑。
 
-Because of that, there are a couple of ways to actually perform a join using 
either Table API or SQL.
+因而,Flink 提供了几种基于 Table API 和 SQL 的 Join 方法。
 
-For more information regarding the syntax, please check the join sections in 
[Table API](../tableApi.html#joins) and [SQL]({{ site.baseurl 
}}/dev/table/sql/queries.html#joins).
+欲获取更多关于 Join 语法的细节,请参考 [Table API](../tableApi.html#joins) 和 [SQL]({{ 
site.baseurl }}/dev/table/sql/queries.html#joins) 中的 Join 章节。
 
 * This will be replaced by the TOC
 {:toc}
 
-Regular Joins
+常规 Join
 -
 
-Regular joins are the most generic type of join in which any new records or 
changes to either side of the join input are visible and are affecting the 
whole join result.
-For example, if there is a new record on the left side, it will be joined with 
all of the previous and future records on the right side.
+常规 Join 是最常用的 Join 用法。在常规 Join 中,任何新记录或对 Join 两侧的表的任何更改都是可见的,并会影响最终整个 Join 
的结果。例如,如果 Join 左侧插入了一条新的记录,那么它将会与 Join 右侧过去与将来的所有记录一起合并查询。

Review comment:
   Changed it to "所有记录进行 Join 运算" (i.e., "perform the Join operation with all records"); not sure whether this is appropriate.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] authuir commented on a change in pull request #12420: [FLINK-16085][docs] Translate "Joins in Continuous Queries" page of "Streaming Concepts" into Chinese

2020-06-06 Thread GitBox


authuir commented on a change in pull request #12420:
URL: https://github.com/apache/flink/pull/12420#discussion_r436322054



##
File path: docs/dev/table/streaming/joins.zh.md
##
@@ -189,50 +183,42 @@ val result = orders
 
 
 
-**Note**: State retention defined in a [query 
configuration](query_configuration.html) is not yet implemented for temporal 
joins.
-This means that the required state to compute the query result might grow 
infinitely depending on the number of distinct primary keys for the history 
table.
+**注意**: 临时 Join中的 State 保留(在 [查询配置](query_configuration.html) 
中定义)还未实现。这意味着计算的查询结果所需的状态可能会无限增长,具体数量取决于历史记录表的不重复主键个数。
 
-### Processing-time Temporal Joins
+### 基于 Processing-time 临时 Join
 
-With a processing-time time attribute, it is impossible to pass _past_ time 
attributes as an argument to the temporal table function.
-By definition, it is always the current timestamp. Thus, invocations of a 
processing-time temporal table function will always return the latest known 
versions of the underlying table
-and any updates in the underlying history table will also immediately 
overwrite the current values.
+如果将 processing-time 作为时间属性,将无法将 _past_ 时间属性作为参数传递给临时表函数。
+根据定义,processing-time 总会是当前时间戳。因此,基于 processing-time 
的临时表函数将始终返回基础表的最新已知版本,时态表函数的调用将始终返回基础表的最新已知版本,并且基础历史表中的任何更新也将立即覆盖当前值。
 
-Only the latest versions (with respect to the defined primary key) of the 
build side records are kept in the state.
-Updates of the build side will have no effect on previously emitted join 
results.
+只有最新版本的构建侧记录(是否最新由所定义的主键所决定)会被保存在 state 中。
+构建侧的更新不会对之前 Join 的结果产生影响。
 
-One can think about a processing-time temporal join as a simple `HashMap` that stores all of the records from the build side.
-When a new record from the build side has the same key as some previous 
record, the old value is just simply overwritten.
-Every record from the probe side is always evaluated against the most 
recent/current state of the `HashMap`.
+可以将 processing-time 的临时 Join 视作简单的哈希Map `HashMap `,HashMap 中存储来自构建侧的所有记录。
+当来自构建侧的新插入的记录与旧值具有相同的 Key 时,旧值仅会被覆盖。
+探针侧的每条记录将总会根据 `HashMap` 的最新/当前状态来计算。
 
-### Event-time Temporal Joins
+### 基于 Event-time 临时 Join
 
-With an event-time time attribute (i.e., a rowtime attribute), it is possible 
to pass _past_ time attributes to the temporal table function.
-This allows for joining the two tables at a common point in time.
+将 event-time 作为时间属性时,可将 _past_ 时间属性作为参数传递给临时表函数。
+这允许对两个表中在相同时间点的记录执行 Join 操作。
 
-Compared to processing-time temporal joins, the temporal table does not only 
keep the latest version (with respect to the defined primary key) of the build 
side records in the state
-but stores all versions (identified by time) since the last watermark.
+与基于 processing-time 的临时 Join 相比,临时表不仅将构建侧记录的最新版本(是否最新由所定义的主键所决定)保存在 state 
中,同时也会存储自上一个水印以来的所有版本(按时间区分)。
 
-For example, an incoming row with an event-time timestamp of `12:30:00` that 
is appended to the probe side table
-is joined with the version of the build side table at time `12:30:00` 
according to the [concept of temporal tables](temporal_tables.html).
-Thus, the incoming row is only joined with rows that have a timestamp lower or 
equal to `12:30:00` with
-applied updates according to the primary key until this point in time.
+例如,在探针侧表新插入一条 event-time 时间为 `12:30:00` 的记录,它将和构建侧表时间点为 `12:30:00` 
的版本根据[临时表的概念](temporal_tables.html)进行 Join 运算。
+因此,新插入的记录仅与时间戳小于等于 `12:30:00` 的记录进行 Join 计算(由主键决定哪些时间点的数据将参与计算)。
 
-By definition of event time, [watermarks]({{ site.baseurl 
}}/dev/event_time.html) allow the join operation to move
-forward in time and discard versions of the build table that are no longer 
necessary because no incoming row with
-lower or equal timestamp is expected.
+通过定义事件时间(event time),[watermarks]({{ site.baseurl }}/dev/event_time.html) 允许 
Join 运算不断向前滚动,丢弃不再需要的构建侧快照。因为不再需要时间戳更低或相等的记录。
 
-Join with a Temporal Table
+临时表 Join
 --
 
-A join with a temporal table joins an arbitrary table (left input/probe side) 
with a temporal table (right input/build side),
-i.e., an external dimension table that changes over time. Please check the 
corresponding page for more information about [temporal 
tables](temporal_tables.html#temporal-table).
+临时表 Join 意味着对任意表(左输入/探针侧)和一个临时表(右输入/构建侧)执行的 Join 
操作,即随时间变化的的扩展表。请参考相应的页面以获取更多有关[临时表](temporal_tables.html#temporal-table)的信息。
 
-Attention Users can not use arbitrary 
tables as a temporal table, but need to use a table backed by a 
`LookupableTableSource`. A `LookupableTableSource` can only be used for 
temporal join as a temporal table. See the page for more details about [how to 
define 
LookupableTableSource](../sourceSinks.html#defining-a-tablesource-with-lookupable).
+注意 不是任何表都能用作临时表,用户必须使用来自接口 
`LookupableTableSource` 的表。接口 `LookupableTableSource` 的实例只能作为临时表用于临时 Join 
。查看此页面获取更多关于[如何实现接口 
`LookupableTableSource`](../sourceSinks.html#defining-a-tablesource-with-lookupable)
 的详细内容。

Review comment:
   Fixed
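
   For context, the `HashMap` analogy in the hunk above can be sketched as a toy model (an illustration only, not Flink's actual join operator):

```java
import java.util.HashMap;
import java.util.Map;

// The build side is reduced to a key -> latest-value map; every probe
// record joins against the map's current state, and results that were
// already emitted are never revised.
public class ProcessingTimeTemporalJoinSketch {
    private final Map<String, String> buildSide = new HashMap<>();

    // A new build-side record with an existing key overwrites the old value.
    public void onBuildRecord(String key, String value) {
        buildSide.put(key, value);
    }

    // The probe side only ever sees the most recent build-side version.
    public String onProbeRecord(String key) {
        return buildSide.get(key);
    }
}
```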





[GitHub] [flink] authuir commented on a change in pull request #12420: [FLINK-16085][docs] Translate "Joins in Continuous Queries" page of "Streaming Concepts" into Chinese

2020-06-06 Thread GitBox


authuir commented on a change in pull request #12420:
URL: https://github.com/apache/flink/pull/12420#discussion_r436321416



##
File path: docs/dev/table/streaming/joins.zh.md
##
@@ -1,5 +1,5 @@
 ---
-title: "Joins in Continuous Queries"
+title: "流上的 Join"

Review comment:
   
I saw the title translated this way here, so I kept the same translation: https://github.com/apache/flink/blob/master/docs/dev/table/streaming/index.zh.md





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #10622: [FLINK-14364] [flink-formats] Add validation when format.allow-comments is used with format.ignore-parse-errors in CsvValidator

2020-06-06 Thread GitBox


flinkbot edited a comment on pull request #10622:
URL: https://github.com/apache/flink/pull/10622#issuecomment-567106606


   
   ## CI report:
   
   * d5427d4042db477fa0858e553928c506919778c9 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2624)
   * c0906aff135a2e1cc76eb0bf0c6bab7458d573c3 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2863)
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] authuir commented on a change in pull request #12420: [FLINK-16085][docs] Translate "Joins in Continuous Queries" page of "Streaming Concepts" into Chinese

2020-06-06 Thread GitBox


authuir commented on a change in pull request #12420:
URL: https://github.com/apache/flink/pull/12420#discussion_r436320979



##
File path: docs/dev/table/streaming/joins.zh.md
##
@@ -327,10 +313,10 @@ FROM table1 [AS ]
 ON table1.column-name1 = table2.column-name1
 {% endhighlight %}
 
-Currently, only support INNER JOIN and LEFT JOIN. The `FOR SYSTEM_TIME AS OF 
table1.proctime` should be followed after temporal table. `proctime` is a 
[processing time attribute](time_attributes.html#processing-time) of `table1`.
-This means that it takes a snapshot of the temporal table at processing time 
when joining every record from left table.
+目前只支持 INNER JOIN 和 LEFT JOIN,`FOR SYSTEM_TIME AS OF table1.proctime` 应位于临时表之后. 
`proctime` 是 `table1` 的 [processing time 
属性](time_attributes.html#processing-time).

Review comment:
   Fixed





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] authuir commented on a change in pull request #12420: [FLINK-16085][docs] Translate "Joins in Continuous Queries" page of "Streaming Concepts" into Chinese

2020-06-06 Thread GitBox


authuir commented on a change in pull request #12420:
URL: https://github.com/apache/flink/pull/12420#discussion_r436320945



##
File path: docs/dev/table/streaming/joins.zh.md
##
@@ -189,50 +183,42 @@ val result = orders
 
 
 
-**Note**: State retention defined in a [query 
configuration](query_configuration.html) is not yet implemented for temporal 
joins.
-This means that the required state to compute the query result might grow 
infinitely depending on the number of distinct primary keys for the history 
table.
+**注意**: 临时 Join中的 State 保留(在 [查询配置](query_configuration.html) 
中定义)还未实现。这意味着计算的查询结果所需的状态可能会无限增长,具体数量取决于历史记录表的不重复主键个数。
 
-### Processing-time Temporal Joins
+### 基于 Processing-time 临时 Join
 
-With a processing-time time attribute, it is impossible to pass _past_ time 
attributes as an argument to the temporal table function.
-By definition, it is always the current timestamp. Thus, invocations of a 
processing-time temporal table function will always return the latest known 
versions of the underlying table
-and any updates in the underlying history table will also immediately 
overwrite the current values.
+如果将 processing-time 作为时间属性,将无法将 _past_ 时间属性作为参数传递给临时表函数。
+根据定义,processing-time 总会是当前时间戳。因此,基于 processing-time 
的临时表函数将始终返回基础表的最新已知版本,时态表函数的调用将始终返回基础表的最新已知版本,并且基础历史表中的任何更新也将立即覆盖当前值。
 
-Only the latest versions (with respect to the defined primary key) of the 
build side records are kept in the state.
-Updates of the build side will have no effect on previously emitted join 
results.
+只有最新版本的构建侧记录(是否最新由所定义的主键所决定)会被保存在 state 中。
+构建侧的更新不会对之前 Join 的结果产生影响。
 
-One can think about a processing-time temporal join as a simple `HashMap` that stores all of the records from the build side.
-When a new record from the build side has the same key as some previous 
record, the old value is just simply overwritten.
-Every record from the probe side is always evaluated against the most 
recent/current state of the `HashMap`.
+可以将 processing-time 的临时 Join 视作简单的哈希Map `HashMap `,HashMap 中存储来自构建侧的所有记录。
+当来自构建侧的新插入的记录与旧值具有相同的 Key 时,旧值仅会被覆盖。
+探针侧的每条记录将总会根据 `HashMap` 的最新/当前状态来计算。
 
-### Event-time Temporal Joins
+### 基于 Event-time 临时 Join
 
-With an event-time time attribute (i.e., a rowtime attribute), it is possible 
to pass _past_ time attributes to the temporal table function.
-This allows for joining the two tables at a common point in time.
+将 event-time 作为时间属性时,可将 _past_ 时间属性作为参数传递给临时表函数。
+这允许对两个表中在相同时间点的记录执行 Join 操作。
 
-Compared to processing-time temporal joins, the temporal table does not only 
keep the latest version (with respect to the defined primary key) of the build 
side records in the state
-but stores all versions (identified by time) since the last watermark.
+与基于 processing-time 的临时 Join 相比,临时表不仅将构建侧记录的最新版本(是否最新由所定义的主键所决定)保存在 state 
中,同时也会存储自上一个水印以来的所有版本(按时间区分)。
 
-For example, an incoming row with an event-time timestamp of `12:30:00` that 
is appended to the probe side table
-is joined with the version of the build side table at time `12:30:00` 
according to the [concept of temporal tables](temporal_tables.html).
-Thus, the incoming row is only joined with rows that have a timestamp lower or 
equal to `12:30:00` with
-applied updates according to the primary key until this point in time.
+例如,在探针侧表新插入一条 event-time 时间为 `12:30:00` 的记录,它将和构建侧表时间点为 `12:30:00` 
的版本根据[临时表的概念](temporal_tables.html)进行 Join 运算。
+因此,新插入的记录仅与时间戳小于等于 `12:30:00` 的记录进行 Join 计算(由主键决定哪些时间点的数据将参与计算)。
 
-By definition of event time, [watermarks]({{ site.baseurl 
}}/dev/event_time.html) allow the join operation to move
-forward in time and discard versions of the build table that are no longer 
necessary because no incoming row with
-lower or equal timestamp is expected.
+通过定义事件时间(event time),[watermarks]({{ site.baseurl }}/dev/event_time.html) 允许 
Join 运算不断向前滚动,丢弃不再需要的构建侧快照。因为不再需要时间戳更低或相等的记录。

Review comment:
   Fixed.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] authuir commented on a change in pull request #12420: [FLINK-16085][docs] Translate "Joins in Continuous Queries" page of "Streaming Concepts" into Chinese

2020-06-06 Thread GitBox


authuir commented on a change in pull request #12420:
URL: https://github.com/apache/flink/pull/12420#discussion_r436320339



##
File path: docs/dev/table/streaming/joins.zh.md
##
@@ -189,50 +183,42 @@ val result = orders
 
 
 
-**Note**: State retention defined in a [query 
configuration](query_configuration.html) is not yet implemented for temporal 
joins.
-This means that the required state to compute the query result might grow 
infinitely depending on the number of distinct primary keys for the history 
table.
+**注意**: 临时 Join中的 State 保留(在 [查询配置](query_configuration.html) 
中定义)还未实现。这意味着计算的查询结果所需的状态可能会无限增长,具体数量取决于历史记录表的不重复主键个数。
 
-### Processing-time Temporal Joins
+### 基于 Processing-time 临时 Join
 
-With a processing-time time attribute, it is impossible to pass _past_ time 
attributes as an argument to the temporal table function.
-By definition, it is always the current timestamp. Thus, invocations of a 
processing-time temporal table function will always return the latest known 
versions of the underlying table
-and any updates in the underlying history table will also immediately 
overwrite the current values.
+如果将 processing-time 作为时间属性,将无法将 _past_ 时间属性作为参数传递给临时表函数。
+根据定义,processing-time 总会是当前时间戳。因此,基于 processing-time 
的临时表函数将始终返回基础表的最新已知版本,时态表函数的调用将始终返回基础表的最新已知版本,并且基础历史表中的任何更新也将立即覆盖当前值。
 
-Only the latest versions (with respect to the defined primary key) of the 
build side records are kept in the state.
-Updates of the build side will have no effect on previously emitted join 
results.
+只有最新版本的构建侧记录(是否最新由所定义的主键所决定)会被保存在 state 中。
+构建侧的更新不会对之前 Join 的结果产生影响。
 
-One can think about a processing-time temporal join as a simple `HashMap` that stores all of the records from the build side.
-When a new record from the build side has the same key as some previous 
record, the old value is just simply overwritten.
-Every record from the probe side is always evaluated against the most 
recent/current state of the `HashMap`.
+可以将 processing-time 的临时 Join 视作简单的哈希Map `HashMap `,HashMap 中存储来自构建侧的所有记录。
+当来自构建侧的新插入的记录与旧值具有相同的 Key 时,旧值仅会被覆盖。

Review comment:
   That does read more smoothly; fixed.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] authuir commented on a change in pull request #12420: [FLINK-16085][docs] Translate "Joins in Continuous Queries" page of "Streaming Concepts" into Chinese

2020-06-06 Thread GitBox


authuir commented on a change in pull request #12420:
URL: https://github.com/apache/flink/pull/12420#discussion_r436320356



##
File path: docs/dev/table/streaming/joins.zh.md
##
@@ -189,50 +183,42 @@ val result = orders
 
 
 
-**Note**: State retention defined in a [query 
configuration](query_configuration.html) is not yet implemented for temporal 
joins.
-This means that the required state to compute the query result might grow 
infinitely depending on the number of distinct primary keys for the history 
table.
+**注意**: 临时 Join中的 State 保留(在 [查询配置](query_configuration.html) 
中定义)还未实现。这意味着计算的查询结果所需的状态可能会无限增长,具体数量取决于历史记录表的不重复主键个数。
 
-### Processing-time Temporal Joins
+### 基于 Processing-time 临时 Join
 
-With a processing-time time attribute, it is impossible to pass _past_ time 
attributes as an argument to the temporal table function.
-By definition, it is always the current timestamp. Thus, invocations of a 
processing-time temporal table function will always return the latest known 
versions of the underlying table
-and any updates in the underlying history table will also immediately 
overwrite the current values.
+如果将 processing-time 作为时间属性,将无法将 _past_ 时间属性作为参数传递给临时表函数。
+根据定义,processing-time 总会是当前时间戳。因此,基于 processing-time 
的临时表函数将始终返回基础表的最新已知版本,时态表函数的调用将始终返回基础表的最新已知版本,并且基础历史表中的任何更新也将立即覆盖当前值。
 
-Only the latest versions (with respect to the defined primary key) of the 
build side records are kept in the state.
-Updates of the build side will have no effect on previously emitted join 
results.
+只有最新版本的构建侧记录(是否最新由所定义的主键所决定)会被保存在 state 中。
+构建侧的更新不会对之前 Join 的结果产生影响。
 
-One can think about a processing-time temporal join as a simple `HashMap` that stores all of the records from the build side.
-When a new record from the build side has the same key as some previous 
record, the old value is just simply overwritten.
-Every record from the probe side is always evaluated against the most 
recent/current state of the `HashMap`.
+可以将 processing-time 的临时 Join 视作简单的哈希Map `HashMap `,HashMap 中存储来自构建侧的所有记录。
+当来自构建侧的新插入的记录与旧值具有相同的 Key 时,旧值仅会被覆盖。
+探针侧的每条记录将总会根据 `HashMap` 的最新/当前状态来计算。
 
-### Event-time Temporal Joins
+### 基于 Event-time 临时 Join
 
-With an event-time time attribute (i.e., a rowtime attribute), it is possible 
to pass _past_ time attributes to the temporal table function.
-This allows for joining the two tables at a common point in time.
+将 event-time 作为时间属性时,可将 _past_ 时间属性作为参数传递给临时表函数。
+这允许对两个表中在相同时间点的记录执行 Join 操作。

Review comment:
   Fixed





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12444: [FLINK-18067][yarn] Change default value of yarnMinAllocationMB from zero to DEFAULT_RM_S…

2020-06-06 Thread GitBox


flinkbot edited a comment on pull request #12444:
URL: https://github.com/apache/flink/pull/12444#issuecomment-637596755


   
   ## CI report:
   
   * 16ec8b694dc07564b09b60040d20b0da9c0e80c4 UNKNOWN
   * 16ec8b694dc07564b09b60040d20b0da9c0e80c4 UNKNOWN
   * b2e066ffdb1333b0b14aacc5955e4d0841a57096 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2690)
   * 09eb5249403896c0892c3681f2d8325055a59c97 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2862)
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] authuir commented on a change in pull request #12420: [FLINK-16085][docs] Translate "Joins in Continuous Queries" page of "Streaming Concepts" into Chinese

2020-06-06 Thread GitBox


authuir commented on a change in pull request #12420:
URL: https://github.com/apache/flink/pull/12420#discussion_r436320317



##
File path: docs/dev/table/streaming/joins.zh.md
##
@@ -22,37 +22,35 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-Joins are a common and well-understood operation in batch data processing to 
connect the rows of two relations. However, the semantics of joins on [dynamic 
tables](dynamic_tables.html) are much less obvious or even confusing.
+Join 在批数据处理中是比较常见且广为人知的运算,一般用于连接两张关系表。然而在[动态表](dynamic_tables.html)中 Join 
的语义会更难以理解甚至让人困惑。
 
-Because of that, there are a couple of ways to actually perform a join using 
either Table API or SQL.
+因而,Flink 提供了几种基于 Table API 和 SQL 的 Join 方法。
 
-For more information regarding the syntax, please check the join sections in 
[Table API](../tableApi.html#joins) and [SQL]({{ site.baseurl 
}}/dev/table/sql/queries.html#joins).
+欲获取更多关于 Join 语法的细节,请参考 [Table API](../tableApi.html#joins) 和 [SQL]({{ 
site.baseurl }}/dev/table/sql/queries.html#joins) 中的 Join 章节。
 
 * This will be replaced by the TOC
 {:toc}
 
-Regular Joins
+常规 Join
 -
 
-Regular joins are the most generic type of join in which any new records or 
changes to either side of the join input are visible and are affecting the 
whole join result.
-For example, if there is a new record on the left side, it will be joined with 
all of the previous and future records on the right side.
+常规 Join 是最常用的 Join 用法。在常规 Join 中,任何新记录或对 Join 两侧的表的任何更改都是可见的,并会影响最终整个 Join 
的结果。例如,如果 Join 左侧插入了一条新的记录,那么它将会与 Join 右侧过去与将来的所有记录一起合并查询。
 
 {% highlight sql %}
 SELECT * FROM Orders
 INNER JOIN Product
 ON Orders.productId = Product.id
 {% endhighlight %}
 
-These semantics allow for any kind of updating (insert, update, delete) input 
tables.
+上述语意允许对输入表进行任意类型的更新操作(insert, update, delete)。
 
-However, this operation has an important implication: it requires to keep both 
sides of the join input in Flink's state forever.
-Thus, the resource usage will grow indefinitely as well, if one or both input 
tables are continuously growing.
+然而,常规 Join 隐含了一个重要的前提:即它需要在 Flink 的状态中永久保存 Join 两侧的数据。
+因而,如果 Join 操作中的一方或双方输入表持续增长的话,资源消耗也将会随之无限增长。

Review comment:
   Fixed.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] authuir commented on a change in pull request #12420: [FLINK-16085][docs] Translate "Joins in Continuous Queries" page of "Streaming Concepts" into Chinese

2020-06-06 Thread GitBox


authuir commented on a change in pull request #12420:
URL: https://github.com/apache/flink/pull/12420#discussion_r436320268



##
File path: docs/dev/table/streaming/joins.zh.md
##
@@ -22,37 +22,35 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-Joins are a common and well-understood operation in batch data processing to 
connect the rows of two relations. However, the semantics of joins on [dynamic 
tables](dynamic_tables.html) are much less obvious or even confusing.
+Join 在批数据处理中是比较常见且广为人知的运算,一般用于连接两张关系表。然而在[动态表](dynamic_tables.html)中 Join 
的语义会更难以理解甚至让人困惑。

Review comment:
   Fixed.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] authuir commented on a change in pull request #12420: [FLINK-16085][docs] Translate "Joins in Continuous Queries" page of "Streaming Concepts" into Chinese

2020-06-06 Thread GitBox


authuir commented on a change in pull request #12420:
URL: https://github.com/apache/flink/pull/12420#discussion_r436320273



##
File path: docs/dev/table/streaming/joins.zh.md
##
@@ -22,37 +22,35 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-Joins are a common and well-understood operation in batch data processing to 
connect the rows of two relations. However, the semantics of joins on [dynamic 
tables](dynamic_tables.html) are much less obvious or even confusing.
+Join 在批数据处理中是比较常见且广为人知的运算,一般用于连接两张关系表。然而在[动态表](dynamic_tables.html)中 Join 
的语义会更难以理解甚至让人困惑。
 
-Because of that, there are a couple of ways to actually perform a join using 
either Table API or SQL.
+因而,Flink 提供了几种基于 Table API 和 SQL 的 Join 方法。
 
-For more information regarding the syntax, please check the join sections in 
[Table API](../tableApi.html#joins) and [SQL]({{ site.baseurl 
}}/dev/table/sql/queries.html#joins).
+欲获取更多关于 Join 语法的细节,请参考 [Table API](../tableApi.html#joins) 和 [SQL]({{ 
site.baseurl }}/dev/table/sql/queries.html#joins) 中的 Join 章节。

Review comment:
   Fixed.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #10622: [FLINK-14364] [flink-formats] Add validation when format.allow-comments is used with format.ignore-parse-errors in CsvValidator

2020-06-06 Thread GitBox


flinkbot edited a comment on pull request #10622:
URL: https://github.com/apache/flink/pull/10622#issuecomment-567106606


   
   ## CI report:
   
   * d5427d4042db477fa0858e553928c506919778c9 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2624)
   * c0906aff135a2e1cc76eb0bf0c6bab7458d573c3 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Comment Edited] (FLINK-18164) null <> 'str' should be true

2020-06-06 Thread Benchao Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17127225#comment-17127225
 ] 

Benchao Li edited comment on FLINK-18164 at 6/7/20, 3:05 AM:
-

[~ykt836] I've checked the current implementation; we do indeed return null for 
this case. For example, `SELECT null <> null` returns `null`.

However, in a where clause we don't return this value to the user; we must 
evaluate it to `true` or `false`.
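
For context, the standard SQL three-valued logic at play can be sketched in plain Java (a toy illustration, not Flink's generated code):

{code:java}
public class ThreeValuedLogicSketch {

	// A NULL comparison yields UNKNOWN, modeled here as a null Boolean:
	// e.g. SELECT null <> 'str' evaluates to null rather than true/false.
	static Boolean notEqual(String a, String b) {
		if (a == null || b == null) {
			return null; // UNKNOWN
		}
		return !a.equals(b);
	}

	// A WHERE clause cannot surface UNKNOWN; it keeps a row only when the
	// predicate is TRUE, so UNKNOWN effectively behaves like FALSE there.
	static boolean whereKeepsRow(Boolean predicate) {
		return Boolean.TRUE.equals(predicate);
	}
}
{code}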


was (Author: libenchao):
[~ykt836] I've checked current implementation, we indeed returns null for this 
case. For example, `SELECT null <> null` returns `null`.

However, if where clause, we don't return this value to user, we must evaluate 
it to `true` or `false`.

> null <> 'str' should be true
> 
>
> Key: FLINK-18164
> URL: https://issues.apache.org/jira/browse/FLINK-18164
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / API
>Reporter: Benchao Li
>Priority: Major
>
> Currently, if we compare null with other literals, the result will always be 
> false.
> It's because the code gen always gives a default value (false) for the 
> result. And I think it's a bug if `null <> 'str'` is false.
> It's reported from user-zh: 
> http://apache-flink.147419.n8.nabble.com/flink-sql-null-false-td3640.html
> CC [~jark] [~ykt836]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #12444: [FLINK-18067][yarn] Change default value of yarnMinAllocationMB from zero to DEFAULT_RM_S…

2020-06-06 Thread GitBox


flinkbot edited a comment on pull request #12444:
URL: https://github.com/apache/flink/pull/12444#issuecomment-637596755


   
   ## CI report:
   
   * 16ec8b694dc07564b09b60040d20b0da9c0e80c4 UNKNOWN
   * b2e066ffdb1333b0b14aacc5955e4d0841a57096 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2690)
   * 09eb5249403896c0892c3681f2d8325055a59c97 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] xintongsong commented on pull request #12444: [FLINK-18067][yarn] Change default value of yarnMinAllocationMB from zero to DEFAULT_RM_S…

2020-06-06 Thread GitBox


xintongsong commented on pull request #12444:
URL: https://github.com/apache/flink/pull/12444#issuecomment-640148683


   @flinkbot run azure



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] xintongsong edited a comment on pull request #12444: [FLINK-18067][yarn] Change default value of yarnMinAllocationMB from zero to DEFAULT_RM_S…

2020-06-06 Thread GitBox


xintongsong edited a comment on pull request #12444:
URL: https://github.com/apache/flink/pull/12444#issuecomment-640148564


   @flinkbot run azure



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] xintongsong commented on pull request #12444: [FLINK-18067][yarn] Change default value of yarnMinAllocationMB from zero to DEFAULT_RM_S…

2020-06-06 Thread GitBox


xintongsong commented on pull request #12444:
URL: https://github.com/apache/flink/pull/12444#issuecomment-640148564


   @flinkbot run azur



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] Jiayi-Liao commented on pull request #12444: [FLINK-18067][runtime] Change default value of yarnMinAllocationMB from zero to DEFAULT_RM_S…

2020-06-06 Thread GitBox


Jiayi-Liao commented on pull request #12444:
URL: https://github.com/apache/flink/pull/12444#issuecomment-640147147


   @xintongsong Thanks for the review. I've addressed your comments and the PR 
should be okay now.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] xintongsong commented on a change in pull request #12444: [FLINK-18067]Change default value of yarnMinAllocationMB from zero to DEFAULT_RM_S…

2020-06-06 Thread GitBox


xintongsong commented on a change in pull request #12444:
URL: https://github.com/apache/flink/pull/12444#discussion_r436315438



##
File path: 
flink-yarn/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java
##
@@ -508,7 +508,11 @@ public void killCluster(ApplicationId applicationId) throws FlinkException {
			throw new YarnDeploymentException("Could not retrieve information about free cluster resources.", e);
		}
 
-		final int yarnMinAllocationMB = yarnConfiguration.getInt(YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB, 0);
+		final int yarnMinAllocationMB = yarnConfiguration.getInt(
+			YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB,
+			YarnConfiguration.DEFAULT_RM_SCHEDULER_MINIMUM_ALLOCATION_MB);
+		Preconditions.checkArgument(yarnMinAllocationMB > 0, "The minimum allocation memory "
+			+ "(" + yarnMinAllocationMB + " MB) configured via 'yarn.scheduler.minimum-allocation-mb' should be greater than 0.");

Review comment:
   The string literal can be replaced with 
`YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB`.

##
File path: 
flink-yarn/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java
##
@@ -508,7 +508,9 @@ public void killCluster(ApplicationId applicationId) throws FlinkException {
			throw new YarnDeploymentException("Could not retrieve information about free cluster resources.", e);
		}
 
-		final int yarnMinAllocationMB = yarnConfiguration.getInt(YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB, 0);
+		final int yarnMinAllocationMB = yarnConfiguration.getInt(

Review comment:
   I think it would be good to always try to throw an exception with the 
most descriptive type possible. IMO, this should have higher priority than 
keeping the code concise.
   
   In this case, I would suggest throwing either a `YarnDeploymentException` or 
an `IllegalConfigurationException`, rather than the `IllegalArgumentException` that 
`Preconditions.checkArgument` throws.
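
   A hedged sketch of the check with both suggestions applied (using the config-key constant and Flink's `org.apache.flink.configuration.IllegalConfigurationException`; this is a proposal, not the merged code):

```java
final int yarnMinAllocationMB = yarnConfiguration.getInt(
		YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB,
		YarnConfiguration.DEFAULT_RM_SCHEDULER_MINIMUM_ALLOCATION_MB);
if (yarnMinAllocationMB <= 0) {
	// Reference the config key via the constant and fail with a
	// configuration-specific exception instead of Preconditions.
	throw new IllegalConfigurationException(
		"The minimum allocation memory (" + yarnMinAllocationMB + " MB) configured via '"
			+ YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB
			+ "' should be greater than 0.");
}
```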





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-18067) Invalid default value for yarnMinAllocationMB in YarnClusterDescriptor

2020-06-06 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song updated FLINK-18067:
-
Fix Version/s: 1.12.0

> Invalid default value for yarnMinAllocationMB in YarnClusterDescriptor
> --
>
> Key: FLINK-18067
> URL: https://issues.apache.org/jira/browse/FLINK-18067
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Affects Versions: 1.10.1
>Reporter: Jiayi Liao
>Assignee: Jiayi Liao
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>
> Currently Flink sets {{yarnMinAllocationMB}} default value to 0, which will 
> crash the job in normalizing the allocation memory. There should be two minor 
> changes after discussion with [~fly_in_gis] in 
> https://github.com/apache/flink/pull/11176#discussion_r433629963:
> * Make the default value to 
> YarnConfiguration.DEFAULT_RM_SCHEDULER_MINIMUM_ALLOCATION_MB, not 0
> * Add pre-check for yarnMinAllocationMB, it should be greater than 0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-18067) Invalid default value for yarnMinAllocationMB in YarnClusterDescriptor

2020-06-06 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17127467#comment-17127467
 ] 

Xintong Song commented on FLINK-18067:
--

Thanks for working on this, [~wind_ljy].
I've assigned this ticket to you. Please move its status to "In-Progress".

> Invalid default value for yarnMinAllocationMB in YarnClusterDescriptor
> --
>
> Key: FLINK-18067
> URL: https://issues.apache.org/jira/browse/FLINK-18067
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Affects Versions: 1.10.1
>Reporter: Jiayi Liao
>Assignee: Jiayi Liao
>Priority: Minor
>  Labels: pull-request-available
>
> Currently Flink sets {{yarnMinAllocationMB}} default value to 0, which will 
> crash the job in normalizing the allocation memory. There should be two minor 
> changes after discussion with [~fly_in_gis] in 
> https://github.com/apache/flink/pull/11176#discussion_r433629963:
> * Make the default value to 
> YarnConfiguration.DEFAULT_RM_SCHEDULER_MINIMUM_ALLOCATION_MB, not 0
> * Add pre-check for yarnMinAllocationMB, it should be greater than 0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (FLINK-18067) Invalid default value for yarnMinAllocationMB in YarnClusterDescriptor

2020-06-06 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song reassigned FLINK-18067:


Assignee: Xintong Song

> Invalid default value for yarnMinAllocationMB in YarnClusterDescriptor
> --
>
> Key: FLINK-18067
> URL: https://issues.apache.org/jira/browse/FLINK-18067
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Affects Versions: 1.10.1
>Reporter: Jiayi Liao
>Assignee: Xintong Song
>Priority: Minor
>  Labels: pull-request-available
>
> Currently Flink sets {{yarnMinAllocationMB}} default value to 0, which will 
> crash the job in normalizing the allocation memory. There should be two minor 
> changes after discussion with [~fly_in_gis] in 
> https://github.com/apache/flink/pull/11176#discussion_r433629963:
> * Make the default value to 
> YarnConfiguration.DEFAULT_RM_SCHEDULER_MINIMUM_ALLOCATION_MB, not 0
> * Add pre-check for yarnMinAllocationMB, it should be greater than 0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (FLINK-18067) Invalid default value for yarnMinAllocationMB in YarnClusterDescriptor

2020-06-06 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song reassigned FLINK-18067:


Assignee: Jiayi Liao  (was: Xintong Song)

> Invalid default value for yarnMinAllocationMB in YarnClusterDescriptor
> --
>
> Key: FLINK-18067
> URL: https://issues.apache.org/jira/browse/FLINK-18067
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Affects Versions: 1.10.1
>Reporter: Jiayi Liao
>Assignee: Jiayi Liao
>Priority: Minor
>  Labels: pull-request-available
>
> Currently Flink sets {{yarnMinAllocationMB}} default value to 0, which will 
> crash the job in normalizing the allocation memory. There should be two minor 
> changes after discussion with [~fly_in_gis] in 
> https://github.com/apache/flink/pull/11176#discussion_r433629963:
> * Make the default value to 
> YarnConfiguration.DEFAULT_RM_SCHEDULER_MINIMUM_ALLOCATION_MB, not 0
> * Add pre-check for yarnMinAllocationMB, it should be greater than 0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-17560) No Slots available exception in Apache Flink Job Manager while Scheduling

2020-06-06 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17127464#comment-17127464
 ] 

Xintong Song commented on FLINK-17560:
--

bq. Has anybody reported this issue before?
Not that I'm aware of.

Have you tried an official Flink release? Can this problem be reproduced?

> No Slots available exception in Apache Flink Job Manager while Scheduling
> -
>
> Key: FLINK-17560
> URL: https://issues.apache.org/jira/browse/FLINK-17560
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.8.3
> Environment: Flink version 1.8.3
> Session cluster
>Reporter: josson paul kalapparambath
>Priority: Major
> Attachments: jobmgr.log, threaddump-tm.txt, tm.log
>
>
> Set up
> --
> Flink version 1.8.3
> Zookeeper HA cluster
> 1 ResourceManager/Dispatcher (same node)
> 1 TaskManager
> 4 pipelines running with various parallelisms
> Issue
> --
> Occasionally, when the Job Manager gets restarted, we noticed that none of the 
> pipelines get scheduled. The error that is reported by the Job 
> Manager is 'not enough slots are available'. This should not be the case 
> because the task manager was deployed with sufficient slots for the number of 
> pipelines/parallelisms we have.
> We further noticed that the slot report sent by the taskmanager contains slots 
> filled with old CANCELLED job IDs. I am not sure why the task manager still 
> holds the details of the old jobs. A thread dump on the task manager confirms 
> that the old pipelines are not running.
> I am aware of https://issues.apache.org/jira/browse/FLINK-12865, but this is 
> not the issue happening in this case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] wanglijie95 commented on a change in pull request #12427: [FLINK-16681][jdbc] Fix the bug that jdbc lost connection after a lon…

2020-06-06 Thread GitBox


wanglijie95 commented on a change in pull request #12427:
URL: https://github.com/apache/flink/pull/12427#discussion_r436310785



##
File path: 
flink-connectors/flink-connector-jdbc/src/main/java/org/apache/flink/connector/jdbc/internal/executor/InsertOrUpdateJdbcExecutor.java
##
@@ -83,6 +88,21 @@ public void open(Connection connection) throws SQLException {
updateStatement = connection.prepareStatement(updateSQL);
}
 
+   @Override
+   public void reopen(Connection connection) throws SQLException {
+   try {
+   existStatement.close();
+   insertStatement.close();
+   updateStatement.close();
+   } catch (SQLException e) {
+   LOG.info("PreparedStatement close failed.", e);
+   }
+
+   existStatement = connection.prepareStatement(existSQL);
+   insertStatement = connection.prepareStatement(insertSQL);
+   updateStatement = connection.prepareStatement(updateSQL);

Review comment:
   > Then I suggest to call them `prepareStatements(Connection)` and 
`closeStatements(Connection)`.
   
   OK, I will rename them.
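
   For illustration, the renamed helpers might look like this (a sketch following the suggestion above, with the parameter list of the close helper simplified; field and statement names come from the diff, not from the final committed code):

```java
private void closeStatements() {
	try {
		existStatement.close();
		insertStatement.close();
		updateStatement.close();
	} catch (SQLException e) {
		LOG.info("PreparedStatement close failed.", e);
	}
}

private void prepareStatements(Connection connection) throws SQLException {
	existStatement = connection.prepareStatement(existSQL);
	insertStatement = connection.prepareStatement(insertSQL);
	updateStatement = connection.prepareStatement(updateSQL);
}

@Override
public void reopen(Connection connection) throws SQLException {
	closeStatements();
	prepareStatements(connection);
}
```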





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-17560) No Slots available exception in Apache Flink Job Manager while Scheduling

2020-06-06 Thread josson paul kalapparambath (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17127426#comment-17127426
 ] 

josson paul kalapparambath commented on FLINK-17560:


[~xintongsong] [~chesnay]

We are using a customized Flink; I do have modifications in the JobManager 
scheduler code. I fixed the ConcurrentModificationException which was happening 
in the Job Manager code, but the original issue still happens.

If you see my message above (I have also attached the thread dump), you can see 
that Tasks are getting stuck forever in the JVM and the 'finally' method for 
those 'Tasks' is never called. Because of this, the slots will never go into 
the 'FREE' state.

Has anybody reported this issue before? This happens only if the number of 
threads in the Task manager is around 950.

 

> No Slots available exception in Apache Flink Job Manager while Scheduling
> -
>
> Key: FLINK-17560
> URL: https://issues.apache.org/jira/browse/FLINK-17560
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.8.3
> Environment: Flink version 1.8.3
> Session cluster
>Reporter: josson paul kalapparambath
>Priority: Major
> Attachments: jobmgr.log, threaddump-tm.txt, tm.log
>
>
> Set up
> --
> Flink version 1.8.3
> Zookeeper HA cluster
> 1 ResourceManager/Dispatcher (same node)
> 1 TaskManager
> 4 pipelines running with various parallelisms
> Issue
> --
> Occasionally, when the Job Manager gets restarted, we noticed that none of the 
> pipelines get scheduled. The error that is reported by the Job 
> Manager is 'not enough slots are available'. This should not be the case 
> because the task manager was deployed with sufficient slots for the number of 
> pipelines/parallelisms we have.
> We further noticed that the slot report sent by the taskmanager contains slots 
> filled with old CANCELLED job IDs. I am not sure why the task manager still 
> holds the details of the old jobs. A thread dump on the task manager confirms 
> that the old pipelines are not running.
> I am aware of https://issues.apache.org/jira/browse/FLINK-12865, but this is 
> not the issue happening in this case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-18165) When savingpoint is restored, select the checkpoint directory and stateBackend

2020-06-06 Thread Yu Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17127403#comment-17127403
 ] 

Yu Li commented on FLINK-18165:
---

ps. To be explicit: once the binary format of savepoints is unified, we could 
switch state backends easily through a savepoint.

> When savingpoint is restored, select the checkpoint directory and stateBackend
> --
>
> Key: FLINK-18165
> URL: https://issues.apache.org/jira/browse/FLINK-18165
> Project: Flink
>  Issue Type: Improvement
>Affects Versions: 1.9.0
> Environment: flink 1.9
>Reporter: Xinyuan Liu
>Priority: Major
>
> If the checkpoint file is used as the initial state for a savepoint-style 
> startup, it must be ensured that the state backend used before and after is of 
> the same type. But in actual production there will be more and more state, the 
> taskmanager memory becomes insufficient while the cluster cannot be expanded, 
> and the state backend then needs to be switched, with data consistency still 
> guaranteed. Unfortunately, Flink currently does not provide an elegant way to 
> switch the state backend; could the community consider this proposal?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-18165) When savingpoint is restored, select the checkpoint directory and stateBackend

2020-06-06 Thread Yu Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17127402#comment-17127402
 ] 

Yu Li commented on FLINK-18165:
---

There's already a proposal to unify the binary format of savepoints for 
different backends; for more details please refer to 
[FLIP-41|https://cwiki.apache.org/confluence/display/FLINK/FLIP-41%3A+Unify+Binary+format+for+Keyed+State]

Not sure but maybe we could revive the work of FLIP-41 in 1.12.0 release cycle? 
What do you think? [~klion26] [~tzulitai] Thanks.

> When savingpoint is restored, select the checkpoint directory and stateBackend
> --
>
> Key: FLINK-18165
> URL: https://issues.apache.org/jira/browse/FLINK-18165
> Project: Flink
>  Issue Type: Improvement
>Affects Versions: 1.9.0
> Environment: flink 1.9
>Reporter: Xinyuan Liu
>Priority: Major
>
> If the checkpoint file is used as the initial state for a savepoint-style 
> startup, it must be ensured that the state backend used before and after is of 
> the same type. But in actual production there will be more and more state, the 
> taskmanager memory becomes insufficient while the cluster cannot be expanded, 
> and the state backend then needs to be switched, with data consistency still 
> guaranteed. Unfortunately, Flink currently does not provide an elegant way to 
> switch the state backend; could the community consider this proposal?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink-web] t-eimizu opened a new pull request #346: Create Japanese Edition.

2020-06-06 Thread GitBox


t-eimizu opened a new pull request #346:
URL: https://github.com/apache/flink-web/pull/346


   A Japanese version of this page has been created. There are still a few more 
pages in English, but I will continue to translate them into Japanese.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (FLINK-18167) Flink Job hangs when one vertex has failed and another is cancelled.

2020-06-06 Thread Jeff Zhang (Jira)
Jeff Zhang created FLINK-18167:
--

 Summary: Flink Job hangs when one vertex has failed and 
another is cancelled. 
 Key: FLINK-18167
 URL: https://issues.apache.org/jira/browse/FLINK-18167
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Checkpointing
Affects Versions: 1.10.0
Reporter: Jeff Zhang
 Attachments: image-2020-06-06-15-39-35-441.png

After I call cancel with savepoint, the cancel operation fails. The following 
is what I see on the client side. 
{code:java}

WARN [2020-06-06 13:45:16,003] ({Thread-1241} JobManager.java[cancelJob]:137) - 
Fail to cancel job 7e5492f35c1a7f5dad7c805ba943ea52 that is associated with 
paragraph paragraph_1586733868269_783581378
java.util.concurrent.ExecutionException: 
java.util.concurrent.CompletionException: 
java.util.concurrent.CompletionException: 
org.apache.flink.runtime.checkpoint.CheckpointException: Checkpoint Coordinator 
is suspending.
at 
java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
at 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
at org.apache.zeppelin.flink.JobManager.cancelJob(JobManager.java:129)
at 
org.apache.zeppelin.flink.FlinkScalaInterpreter.cancel(FlinkScalaInterpreter.scala:648)
at 
org.apache.zeppelin.flink.FlinkInterpreter.cancel(FlinkInterpreter.java:101)
at 
org.apache.zeppelin.interpreter.LazyOpenInterpreter.cancel(LazyOpenInterpreter.java:119)
at 
org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.lambda$cancel$1(RemoteInterpreterServer.java:800)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.CompletionException: 
java.util.concurrent.CompletionException: 
org.apache.flink.runtime.checkpoint.CheckpointException: Checkpoint Coordinator 
is suspending.
at 
org.apache.flink.runtime.scheduler.SchedulerBase.lambda$stopWithSavepoint$9(SchedulerBase.java:873)
at 
java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836)
at 
java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811)
at 
java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456)
at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:397)
at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:190)
at 
org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152)
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
at akka.actor.ActorCell.invoke(ActorCell.scala:561)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
at akka.dispatch.Mailbox.run(Mailbox.scala:225)
at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.util.concurrent.CompletionException: 
org.apache.flink.runtime.checkpoint.CheckpointException: Checkpoint Coordinator 
is suspending.
at 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator.lambda$triggerSynchronousSavepoint$0(CheckpointCoordinator.java:428)
at 
java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836)
at 
java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811)
at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
at 
java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
at 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator.lambda$null$1(CheckpointCoordinator.java:457)
at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
at 

[jira] [Comment Edited] (FLINK-18091) Test Relocatable Savepoints

2020-06-06 Thread Congxian Qiu(klion26) (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126778#comment-17126778
 ] 

Congxian Qiu(klion26) edited comment on FLINK-18091 at 6/6/20, 5:13 AM:


[~sewen] [~liyu] Do you have any other concerns about this issue? If not, I'll 
close it. Thanks.


was (Author: klion26):
I'll close this issue on 6.10 (Wednesday) if that's OK with you. [~sewen] [~liyu] 

> Test Relocatable Savepoints
> ---
>
> Key: FLINK-18091
> URL: https://issues.apache.org/jira/browse/FLINK-18091
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: Stephan Ewen
>Assignee: Congxian Qiu(klion26)
>Priority: Major
>  Labels: release-testing
> Fix For: 1.11.0
>
>
> The test should do the following:
>  * take a savepoint; make sure the job has enough state that there is more 
> than just the "_metadata" file
>  * copy it to another directory
>  * start the job from that savepoint, both by addressing the metadata file 
> and by addressing the savepoint directory
> We should also test that an incremental checkpoint that gets moved fails with 
> a reasonable exception.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-18091) Test Relocatable Savepoints

2020-06-06 Thread Yu Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17127260#comment-17127260
 ] 

Yu Li commented on FLINK-18091:
---

The tests and results LGTM, thanks for the efforts [~klion26].

> Test Relocatable Savepoints
> ---
>
> Key: FLINK-18091
> URL: https://issues.apache.org/jira/browse/FLINK-18091
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: Stephan Ewen
>Assignee: Congxian Qiu(klion26)
>Priority: Major
>  Labels: release-testing
> Fix For: 1.11.0
>
>
> The test should do the following:
>  * take a savepoint; make sure the job has enough state that there is more 
> than just the "_metadata" file
>  * copy it to another directory
>  * start the job from that savepoint, both by addressing the metadata file 
> and by addressing the savepoint directory
> We should also test that an incremental checkpoint that gets moved fails with 
> a reasonable exception.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-18164) null <> 'str' should be true

2020-06-06 Thread Kurt Young (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17127200#comment-17127200
 ] 

Kurt Young commented on FLINK-18164:


It's also incorrect to return `true` in this case; I think we should return 
`null`.

> null <> 'str' should be true
> 
>
> Key: FLINK-18164
> URL: https://issues.apache.org/jira/browse/FLINK-18164
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / API
>Reporter: Benchao Li
>Priority: Major
>
> Currently, if we compare null with other literals, the result will always be 
> false. This is because the code generation always gives a default value 
> (false) for the result, and I think it's a bug if `null <> 'str'` is false.
> It was reported on user-zh: 
> http://apache-flink.147419.n8.nabble.com/flink-sql-null-false-td3640.html
> CC [~jark] [~ykt836]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-18044) Add the subtask index information to the SourceReaderContext.

2020-06-06 Thread Jiangjie Qin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiangjie Qin updated FLINK-18044:
-
Description: It is useful for the `SourceReader` to retrieve its subtask 
id. For example, Kafka readers can create a consumer with proper client id.  
(was: It is useful for the `SourceReader` to achieve its subtask id. For 
example, Kafka readers can create a consumer with proper client id.)
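
For illustration, a reader could then build a distinct client id per subtask 
roughly like this (a sketch only; {{getIndexOfSubtask()}} is the kind of 
accessor this ticket proposes, not an existing method on the context yet):

{code:java}
import java.util.Properties;

import org.apache.flink.api.connector.source.SourceReaderContext;

public class ReaderClientIdSketch {
    // Derive per-subtask Kafka consumer properties from the reader context.
    public static Properties consumerProperties(
            SourceReaderContext context, Properties base) {
        Properties props = new Properties();
        props.putAll(base);
        // Hypothetical accessor -- this is what FLINK-18044 asks for:
        props.setProperty(
                "client.id", "flink-source-reader-" + context.getIndexOfSubtask());
        return props;
    }
}
{code}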

> Add the subtask index information to the SourceReaderContext.
> -
>
> Key: FLINK-18044
> URL: https://issues.apache.org/jira/browse/FLINK-18044
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Common
>Reporter: Jiangjie Qin
>Priority: Major
>
> It is useful for the `SourceReader` to retrieve its subtask id. For example, 
> Kafka readers can create a consumer with proper client id.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (FLINK-18164) null <> 'str' should be true

2020-06-06 Thread Benchao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benchao Li closed FLINK-18164.
--
Resolution: Invalid

> null <> 'str' should be true
> 
>
> Key: FLINK-18164
> URL: https://issues.apache.org/jira/browse/FLINK-18164
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / API
>Reporter: Benchao Li
>Priority: Major
>
> Currently, if we compare null with other literals, the result will always be 
> false. This is because the code generation always gives a default value 
> (false) for the result, and I think it's a bug if `null <> 'str'` is false.
> It was reported on user-zh: 
> http://apache-flink.147419.n8.nabble.com/flink-sql-null-false-td3640.html
> CC [~jark] [~ykt836]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Issue Comment Deleted] (FLINK-16087) Translate "Detecting Patterns" page of "Streaming Concepts" into Chinese

2020-06-06 Thread Steven Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-16087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Fan updated FLINK-16087:
---
Comment: was deleted

(was: [~jark] I want to translate this page. Can you assign it to me? Thx)

> Translate "Detecting Patterns" page of "Streaming Concepts" into Chinese 
> -
>
> Key: FLINK-16087
> URL: https://issues.apache.org/jira/browse/FLINK-16087
> Project: Flink
>  Issue Type: Sub-task
>  Components: chinese-translation, Documentation
>Reporter: Jark Wu
>Priority: Major
>
> The page url is 
> https://ci.apache.org/projects/flink/flink-docs-master/zh/dev/table/streaming/match_recognize.html
> The markdown file is located in 
> {{flink/docs/dev/table/streaming/match_recognize.zh.md}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-18164) null <> 'str' should be true

2020-06-06 Thread Benchao Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17127225#comment-17127225
 ] 

Benchao Li commented on FLINK-18164:


[~ykt836] I've checked the current implementation; it indeed returns null for 
this case. For example, `SELECT null <> null` returns `null`.

However, in a WHERE clause we don't return this value to the user; we must 
evaluate it to `true` or `false`.
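
To make the semantics concrete, here is a small sketch of that three-valued 
logic (assuming a 1.11-style {{TableEnvironment}}; the inline VALUES table is 
standard SQL and only for illustration):

{code:java}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class NullComparisonSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inBatchMode().build());

        // NULL <> 'str' evaluates to UNKNOWN, surfaced as NULL in a SELECT:
        tEnv.executeSql("SELECT CAST(NULL AS STRING) <> 'str' AS cmp").print();

        // In a WHERE clause a row is kept only when the predicate is TRUE,
        // so the NULL row below is filtered out (UNKNOWN, not TRUE):
        tEnv.executeSql(
                "SELECT * FROM (VALUES ('a'), (CAST(NULL AS STRING))) AS t(c) "
                        + "WHERE c <> 'a'").print();
    }
}
{code}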

> null <> 'str' should be true
> 
>
> Key: FLINK-18164
> URL: https://issues.apache.org/jira/browse/FLINK-18164
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / API
>Reporter: Benchao Li
>Priority: Major
>
> Currently, if we compare null with other literals, the result will always be 
> false. This is because the code generation always gives a default value 
> (false) for the result, and I think it's a bug if `null <> 'str'` is false.
> It was reported on user-zh: 
> http://apache-flink.147419.n8.nabble.com/flink-sql-null-false-td3640.html
> CC [~jark] [~ykt836]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-17384) Support reading hbase conf dir from flink-conf.yaml

2020-06-06 Thread Yu Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17127257#comment-17127257
 ] 

Yu Li commented on FLINK-17384:
---

I see, thanks for the clarification [~jackylau]

> Support reading hbase conf dir from flink-conf.yaml
> ---
>
> Key: FLINK-17384
> URL: https://issues.apache.org/jira/browse/FLINK-17384
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / HBase, Deployment / Scripts
>Affects Versions: 1.10.0
>Reporter: jackylau
>Assignee: jackylau
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>
> Currently, when using the hbase connector with Flink SQL, the following 
> manual steps are required:
>  # export HBASE_CONF_DIR
>  # add the hbase libs to the flink lib directory (because the hbase 
> connector doesn't ship the client (and other) jars)
> We should improve this.
> For 1) we could support reading the hbase conf dir from flink-conf.yaml, 
> just like hadoop/yarn does.
> For 2) we should support HBASE_CLASSPATH in config.sh. In case of jar 
> conflicts, such as guava, we should also support flink-hbase-shaded, just 
> like hadoop does.
> In this JIRA we focus on implementing the 1st proposal.
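
If the 1st proposal follows the existing hadoop/yarn pattern, the setting 
could look like the line below in flink-conf.yaml (the key name is assumed by 
analogy with env.hadoop.conf.dir, not a confirmed option):

    env.hbase.conf.dir: /etc/hbase/conf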



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-11838) Create RecoverableWriter for GCS

2020-06-06 Thread Tomoyuki NAKAMURA (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-11838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17127324#comment-17127324
 ] 

Tomoyuki NAKAMURA commented on FLINK-11838:
---

Is there any progress on this ticket?

> Create RecoverableWriter for GCS
> 
>
> Key: FLINK-11838
> URL: https://issues.apache.org/jira/browse/FLINK-11838
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / FileSystem
>Affects Versions: 1.8.0
>Reporter: Fokko Driesprong
>Assignee: Fokko Driesprong
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> GCS supports resumable uploads, which we can use to create a 
> RecoverableWriter similar to the S3 implementation:
> https://cloud.google.com/storage/docs/json_api/v1/how-tos/resumable-upload
> After using the Hadoop compatible interface: 
> https://github.com/apache/flink/pull/7519
> we noticed that the current implementation relies heavily on renaming the 
> files on commit: 
> https://github.com/apache/flink/blob/master/flink-filesystems/flink-hadoop-fs/src/main/java/org/apache/flink/runtime/fs/hdfs/HadoopRecoverableFsDataOutputStream.java#L233-L259
> This is suboptimal on an object store such as GCS. Therefore we would like 
> to implement a more GCS-native RecoverableWriter.
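
For reference, the GCS client already exposes the primitive such a writer 
could build on: a resumable-upload channel whose state can be captured on 
persist and restored after a failure, so no rename-on-commit is needed. A 
minimal sketch (bucket and object names are placeholders; error handling 
omitted):

{code:java}
import java.nio.ByteBuffer;

import com.google.cloud.RestorableState;
import com.google.cloud.WriteChannel;
import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;

public class GcsResumableSketch {
    public static void main(String[] args) throws Exception {
        Storage storage = StorageOptions.getDefaultInstance().getService();
        WriteChannel channel = storage.writer(
                BlobInfo.newBuilder("my-bucket", "part-0001").build());
        channel.write(ByteBuffer.wrap(new byte[] {1, 2, 3}));

        // persist(): capture the upload state as the "recoverable" handle
        RestorableState<WriteChannel> recoverable = channel.capture();

        // after a failure: restore the channel and continue writing
        WriteChannel restored = recoverable.restore();
        restored.write(ByteBuffer.wrap(new byte[] {4, 5, 6}));
        restored.close();
    }
}
{code}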



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-18112) Single Task Failure Recovery Prototype

2020-06-06 Thread Congxian Qiu(klion26) (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17127206#comment-17127206
 ] 

Congxian Qiu(klion26) commented on FLINK-18112:
---

[~ym] Thanks for creating the ticket; I think this feature can be very helpful 
for some scenarios, and I'm interested in it. Is there any public documentation 
for this feature? Thanks

> Single Task Failure Recovery Prototype
> --
>
> Key: FLINK-18112
> URL: https://issues.apache.org/jira/browse/FLINK-18112
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Checkpointing, Runtime / Coordination, Runtime 
> / Network
>Affects Versions: 1.12.0
>Reporter: Yuan Mei
>Assignee: Yuan Mei
>Priority: Major
> Fix For: 1.12.0
>
>
> Build a prototype of single task failure recovery to address and answer the 
> following questions:
> *Step 1*: Scheduling part: restart a single node without restarting the 
> upstream or downstream nodes.
> *Step 2*: Checkpointing part: by my understanding of how regional failover 
> works, this part might not need modification.
> *Step 3*: Network part:
>   - how the recovered node is able to link to the upstream ResultPartitions 
> and continue getting data
>   - how the downstream node is able to link to the recovered node and 
> continue getting data
>   - how the different netty transmission modes affect the results
>   - what happens if the failed node's buffered data pool is full
> *Step 4*: Failover process verification



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-18166) JAVA_HOME is not read from .bashrc when starting Flink

2020-06-06 Thread appleyuchi (Jira)
appleyuchi created FLINK-18166:
--

 Summary: JAVA_HOME is not read from .bashrc when starting Flink
 Key: FLINK-18166
 URL: https://issues.apache.org/jira/browse/FLINK-18166
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.10.1
Reporter: appleyuchi


When I start Flink, I get the error:

Please specify JAVA_HOME. Either in Flink config ./conf/flink-conf.yaml or as 
system-wide JAVA_HOME.

But I have already written:

export JAVA_HOME=~/Java/jdk1.8.0_131

in .bashrc and opened a new terminal to refresh it.

Why can't Flink recognise the variable set in the environment?

Thanks for your help~!
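
A likely cause, assuming the standard scripts are used: bin/start-cluster.sh 
launches the daemons through non-interactive shells (e.g. over SSH), and those 
do not source ~/.bashrc, so the exported JAVA_HOME is never seen. Setting the 
path explicitly in conf/flink-conf.yaml sidesteps this, for example (use an 
absolute path, since ~ is not expanded there):

    env.java.home: /opt/Java/jdk1.8.0_131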



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-17560) No Slots available exception in Apache Flink Job Manager while Scheduling

2020-06-06 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17127203#comment-17127203
 ] 

Xintong Song commented on FLINK-17560:
--

[~josson],

Sorry for the late response. I've no idea how I overlooked the update 
notification on this ticket.

As [~chesnay] already mentioned, there's a {{ConcurrentModificationException}} 
on the JM side when it tries to accept the slots offered by the TM.

This is not a known issue that I'm aware of. The error stack does not seem to 
match the code base of Flink 1.8.3, so I have the same question: is this a 
customized Flink version?

> No Slots available exception in Apache Flink Job Manager while Scheduling
> -
>
> Key: FLINK-17560
> URL: https://issues.apache.org/jira/browse/FLINK-17560
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.8.3
> Environment: Flink version 1.8.3
> Session cluster
>Reporter: josson paul kalapparambath
>Priority: Major
> Attachments: jobmgr.log, threaddump-tm.txt, tm.log
>
>
> Set up
> --
> Flink version 1.8.3
> Zookeeper HA cluster
> 1 ResourceManager/Dispatcher (Same Node)
> 1 TaskManager
> 4 pipelines running with various parallelisms
> Issue
> --
> Occasionally, when the Job Manager gets restarted, we notice that none of 
> the pipelines get scheduled. The error reported by the Job Manager is 'not 
> enough slots are available'. This should not be the case because the task 
> manager was deployed with sufficient slots for the number of 
> pipelines/parallelisms we have.
> We further noticed that the slot report sent by the taskmanager contains 
> slots filled with old CANCELLED job IDs. I am not sure why the task manager 
> still holds the details of the old jobs. A thread dump on the task manager 
> confirms that the old pipelines are not running.
> I am aware of https://issues.apache.org/jira/browse/FLINK-12865. But this is 
> not the issue happening in this case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-18162) AddSplitEvents should serialize the splits into bytes.

2020-06-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-18162:
---
Labels: pull-request-available  (was: )

> AddSplitEvents should serialize the splits into bytes.
> --
>
> Key: FLINK-18162
> URL: https://issues.apache.org/jira/browse/FLINK-18162
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Common
>Reporter: Jiangjie Qin
>Assignee: Jiangjie Qin
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>
> {{AddSplitsEvent}} is Serializable at the moment and contains a list of 
> splits. However, {{SourceSplit}} is not a subclass of {{Serializable}}. 
> We need to serialize the splits in the {{AddSplitsEvent}} using the 
> SplitSerializer before sending it over the wire, and deserialize the splits 
> on the reader side.
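
Roughly, the event could wrap the serialized form instead of the splits 
themselves; a sketch only (class shape assumed, reusing the existing 
SimpleVersionedSerializer contract, not the actual fix):

{code:java}
import java.io.IOException;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

import org.apache.flink.core.io.SimpleVersionedSerializer;

public class AddSplitEventSketch<SplitT> implements Serializable {
    private final int serializerVersion;
    private final ArrayList<byte[]> serializedSplits;

    public AddSplitEventSketch(
            List<SplitT> splits,
            SimpleVersionedSerializer<SplitT> serializer) throws IOException {
        this.serializerVersion = serializer.getVersion();
        this.serializedSplits = new ArrayList<>(splits.size());
        for (SplitT split : splits) {
            // serialize on the sender side, before the event goes on the wire
            serializedSplits.add(serializer.serialize(split));
        }
    }

    public List<SplitT> splits(
            SimpleVersionedSerializer<SplitT> serializer) throws IOException {
        // deserialize on the reader side, with the version recorded above
        List<SplitT> result = new ArrayList<>(serializedSplits.size());
        for (byte[] bytes : serializedSplits) {
            result.add(serializer.deserialize(serializerVersion, bytes));
        }
        return result;
    }
}
{code}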



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-18165) When savepoint is restored, select the checkpoint directory and stateBackend

2020-06-06 Thread Xinyuan Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyuan Liu updated FLINK-18165:

Issue Type: Improvement  (was: New Feature)

> When savepoint is restored, select the checkpoint directory and stateBackend
> --
>
> Key: FLINK-18165
> URL: https://issues.apache.org/jira/browse/FLINK-18165
> Project: Flink
>  Issue Type: Improvement
>Affects Versions: 1.9.0
> Environment: flink 1.9
>Reporter: Xinyuan Liu
>Priority: Major
>
> If a checkpoint file is used as the initial state when restoring from a 
> savepoint, the state backend used before and after must be of the same type. 
> In actual production, however, state keeps growing, the taskmanager memory 
> becomes insufficient, and the cluster cannot be expanded, so the state 
> backend needs to be switched at some point while data consistency is still 
> guaranteed. Unfortunately, Flink currently does not provide an elegant way 
> to switch the state backend; could the community consider this proposal?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-18165) When savepoint is restored, select the checkpoint directory and stateBackend

2020-06-06 Thread Xinyuan Liu (Jira)
Xinyuan Liu created FLINK-18165:
---

 Summary: When savepoint is restored, select the checkpoint 
directory and stateBackend
 Key: FLINK-18165
 URL: https://issues.apache.org/jira/browse/FLINK-18165
 Project: Flink
  Issue Type: New Feature
Affects Versions: 1.9.0
 Environment: flink 1.9
Reporter: Xinyuan Liu


If a checkpoint file is used as the initial state when restoring from a 
savepoint, the state backend used before and after must be of the same type. 
In actual production, however, state keeps growing, the taskmanager memory 
becomes insufficient, and the cluster cannot be expanded, so the state backend 
needs to be switched at some point while data consistency is still guaranteed. 
Unfortunately, Flink currently does not provide an elegant way to switch the 
state backend; could the community consider this proposal?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] zhijiangW removed a comment on pull request #12460: [FLINK-18063][checkpointing] Fix the race condition for aborting current checkpoint in CheckpointBarrierUnaligner

2020-06-06 Thread GitBox


zhijiangW removed a comment on pull request #12460:
URL: https://github.com/apache/flink/pull/12460#issuecomment-638963043


   @flinkbot run azure



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] zhijiangW merged pull request #12506: [FLINK-18139][checkpointing] Fixing unaligned checkpoints checks wrong channels for inflight data.

2020-06-06 Thread GitBox


zhijiangW merged pull request #12506:
URL: https://github.com/apache/flink/pull/12506


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] zhijiangW removed a comment on pull request #12460: [FLINK-18063][checkpointing] Fix the race condition for aborting current checkpoint in CheckpointBarrierUnaligner

2020-06-06 Thread GitBox


zhijiangW removed a comment on pull request #12460:
URL: https://github.com/apache/flink/pull/12460#issuecomment-638773131


   @flinkbot run azure



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] zhijiangW commented on pull request #12460: [FLINK-18063][checkpointing] Fix the race condition for aborting current checkpoint in CheckpointBarrierUnaligner

2020-06-06 Thread GitBox


zhijiangW commented on pull request #12460:
URL: https://github.com/apache/flink/pull/12460#issuecomment-640070985


   @flinkbot run azure



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12256: [FLINK-17018][runtime] Allocates slots in bulks for pipelined region scheduling

2020-06-06 Thread GitBox


flinkbot edited a comment on pull request #12256:
URL: https://github.com/apache/flink/pull/12256#issuecomment-631025695


   
   ## CI report:
   
   * eba9c31264c2a0fe9f5bf955ba21de081bd1aa25 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2858)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12508: [hotfix][state] Add error message to precondition in KeyGroupPartitio…

2020-06-06 Thread GitBox


flinkbot edited a comment on pull request #12508:
URL: https://github.com/apache/flink/pull/12508#issuecomment-639985108


   
   ## CI report:
   
   * 8953e51d845b3a61608d6e08f9caf462e43775bc Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2856)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12256: [FLINK-17018][runtime] Allocates slots in bulks for pipelined region scheduling

2020-06-06 Thread GitBox


flinkbot edited a comment on pull request #12256:
URL: https://github.com/apache/flink/pull/12256#issuecomment-631025695


   
   ## CI report:
   
   * f181cf943442db6173eba4c817f40bbab49538da Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2857)
 
   * eba9c31264c2a0fe9f5bf955ba21de081bd1aa25 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2858)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot commented on pull request #12509: update]e

2020-06-06 Thread GitBox


flinkbot commented on pull request #12509:
URL: https://github.com/apache/flink/pull/12509#issuecomment-640004515


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit 5e1a5ae7e7c1142322b564b39b52508a795c25dd (Sat Jun 06 
07:34:53 UTC 2020)
   
   **Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
* **Invalid pull request title: No valid Jira ID provided**
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into to Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.
Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] TeresaTyne closed pull request #12509: update]e

2020-06-06 Thread GitBox


TeresaTyne closed pull request #12509:
URL: https://github.com/apache/flink/pull/12509


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] TeresaTyne opened a new pull request #12509: update]e

2020-06-06 Thread GitBox


TeresaTyne opened a new pull request #12509:
URL: https://github.com/apache/flink/pull/12509


   
   
   ## What is the purpose of the change
   
   *(For example: This pull request makes task deployment go through the blob 
server, rather than through RPC. That way we avoid re-transferring them on each 
deployment (during recovery).)*
   
   
   ## Brief change log
   
   *(for example:)*
 - *The TaskInfo is stored in the blob store on job creation time as a 
persistent artifact*
 - *Deployments RPC transmits only the blob storage reference*
 - *TaskManagers retrieve the TaskInfo from the blob cache*
   
   
   ## Verifying this change
   
   *(Please pick either of the following options)*
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This change is already covered by existing tests, such as *(please describe 
tests)*.
   
   *(or)*
   
   This change added tests and can be verified as follows:
   
   *(example:)*
 - *Added integration tests for end-to-end deployment with large payloads 
(100MB)*
 - *Extended integration test for recovery after master (JobManager) 
failure*
 - *Added test that validates that TaskInfo is transferred only once across 
recoveries*
 - *Manually verified the change by running a 4 node cluster with 2 
JobManagers and 4 TaskManagers, a stateful streaming program, and killing one 
JobManager and two TaskManagers during the execution, verifying that recovery 
happens correctly.*
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / no)
 - The serializers: (yes / no / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / no / 
don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (yes / no / don't 
know)
 - The S3 file system connector: (yes / no / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / no)
 - If yes, how is the feature documented? (not applicable / docs / JavaDocs 
/ not documented)
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12256: [FLINK-17018][runtime] Allocates slots in bulks for pipelined region scheduling

2020-06-06 Thread GitBox


flinkbot edited a comment on pull request #12256:
URL: https://github.com/apache/flink/pull/12256#issuecomment-631025695


   
   ## CI report:
   
   * ca05fc241f6fd0550a39612b4ddef86b63a304a6 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2625)
 
   * f181cf943442db6173eba4c817f40bbab49538da Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2857)
 
   * eba9c31264c2a0fe9f5bf955ba21de081bd1aa25 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2858)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org