[GitHub] [flink] wuchong commented on a change in pull request #14029: [FLINK-20084][planner] Fix NPE when generating watermark for record w…

2020-11-10 Thread GitBox


wuchong commented on a change in pull request #14029:
URL: https://github.com/apache/flink/pull/14029#discussion_r521176325



##
File path: 
flink-table/flink-table-planner-blink/src/main/java/org/apache/flink/table/planner/plan/rules/logical/PushWatermarkIntoTableSourceScanRuleBase.java
##
@@ -180,7 +180,10 @@ public DefaultWatermarkGenerator(
@Override
public void onEvent(RowData event, long eventTimestamp, WatermarkOutput output) {
try {
-   currentWatermark = innerWatermarkGenerator.currentWatermark(event);
+   Long watermark = innerWatermarkGenerator.currentWatermark(event);
+   if (watermark != null) {
+   currentWatermark = innerWatermarkGenerator.currentWatermark(event);

Review comment:
   Do not calculate it again. 
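A minimal sketch of the pattern the review is asking for, computing the watermark exactly once and guarding against null (the class and generator names below are illustrative stand-ins, not the planner's actual types):

```java
import java.util.function.Function;

// Sketch only: the Function stands in for the generated inner watermark
// generator, which may return null when the rowtime field is null.
class WatermarkState {
    private long currentWatermark = Long.MIN_VALUE;

    long onEvent(String event, Function<String, Long> innerGenerator) {
        // Evaluate the inner generator exactly once.
        Long watermark = innerGenerator.apply(event);
        // Guard against null before updating; avoids the NPE on unboxing.
        if (watermark != null) {
            currentWatermark = watermark;
        }
        return currentWatermark;
    }
}
```

The boxing matters here: when the rowtime field is null, `currentWatermark` simply keeps its previous value instead of throwing an NPE on the `Long`-to-`long` unboxing.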





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-19630) Sink data in [ORC] format into Hive By using Legacy Table API caused unexpected Exception

2020-11-10 Thread Rui Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17229818#comment-17229818
 ] 

Rui Li commented on FLINK-19630:


Yeah, I think documentation is all we can do at the moment.
And BTW, instead of a local patch, I think simply using the 
{{flink-sql-connector-hive-2.2.0}} uber jar as your dependency (the download 
link can be found 
[here|https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/hive/#dependencies])
 may also solve the problem.

> Sink data in [ORC] format into Hive By using Legacy Table API  caused 
> unexpected Exception
> --
>
> Key: FLINK-19630
> URL: https://issues.apache.org/jira/browse/FLINK-19630
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hive, Table SQL / Ecosystem
>Affects Versions: 1.11.2
>Reporter: Lsw_aka_laplace
>Priority: Critical
> Fix For: 1.12.0
>
> Attachments: image-2020-10-14-11-36-48-086.png, 
> image-2020-10-14-11-41-53-379.png, image-2020-10-14-11-42-57-353.png, 
> image-2020-10-14-11-48-51-310.png
>
>
> *ENV:*
> *Flink version: 1.11.2*
> *Hive exec version: 2.0.1*
> *Hive file storage type: ORC*
> *SQL or DataStream: SQL API*
> *Kafka connector: custom Kafka connector based on the legacy API 
> (TableSource/`org.apache.flink.types.Row`)*
> *Hive connector: totally follows the Flink Hive connector (we only made some 
> encapsulation upon it)*
> *Using StreamingFileCommitter: YES*
>  
>  
> *Description:*
>    try to execute the following SQL:
>     """
>       insert into hive_table (select * from kafka_table)
>     """
>    The Hive table DDL looks like:
>     """
> CREATE TABLE `hive_table`(
>  // some fields
> PARTITIONED BY (
>  `dt` string,
>  `hour` string)
> STORED AS orc
> TBLPROPERTIES (
>  'orc.compress'='SNAPPY',
> 'type'='HIVE',
> 'sink.partition-commit.trigger'='process-time',
> 'sink.partition-commit.delay' = '1 h',
> 'sink.partition-commit.policy.kind' = 'metastore,success-file'
> )   
>    """
> When this job starts to take a snapshot, the following weird exception appears:
> !image-2020-10-14-11-36-48-086.png|width=882,height=395!
> As we can see from the message, the owner thread should be the [Legacy Source 
> Thread], but instead the StreamTask thread, which represents the whole first 
> stage, is found. 
> So I checked the thread dump at once.
> !image-2020-10-14-11-41-53-379.png|width=801,height=244!
> The Legacy Source Thread
>  
> !image-2020-10-14-11-42-57-353.png|width=846,height=226!
> The StreamTask Thread
>  
>    According to the thread dump info and the exception message, I searched 
> for and read the relevant source code and then *{color:#ffab00}DID A TEST{color}*
>  
> {color:#172b4d}   Since the Kafka connector is custom, I tried to make the 
> KafkaSource a separate stage by changing the TaskChainStrategy to Never. The 
> task topology is as follows:{color}
> {color:#172b4d}!image-2020-10-14-11-48-51-310.png|width=753,height=208!{color}
>  
> {color:#505f79}*Fortunately, it did work! No exception is thrown and the 
> checkpoint could be taken successfully!*{color}
>  
>  
> So, from my perspective, there seems to be something wrong when the 
> HiveWritingTask and the LegacySourceTask are chained together. The legacy 
> source task runs in a separate thread, which may be the cause of the 
> exception mentioned above.
>  
>  
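The ownership violation described above can be reduced to a small toy model (plain Java, not Flink code; all names here are illustrative): a resource that records its creating thread and rejects access from any other thread fails in exactly this way when a chained operator runs in the StreamTask thread while the legacy source keeps its own thread.

```java
// Toy reproduction of an owner-thread check: the resource captures the
// thread that created it and only allows access from that same thread.
class OwnedResource {
    private final Thread owner = Thread.currentThread();

    // True only when called from the creating thread.
    boolean accessAllowed() {
        return Thread.currentThread() == owner;
    }

    // Runs the same check on a freshly spawned thread and reports its result.
    boolean accessAllowedFromOtherThread() {
        final boolean[] allowed = {true};
        Thread other = new Thread(() -> allowed[0] = accessAllowed());
        other.start();
        try {
            other.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return allowed[0];
    }
}
```

Disabling chaining moves the downstream operator out of the source's call path, which is why the workaround in the description makes the exception disappear.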



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #14029: [FLINK-20084][planner] Fix NPE when generating watermark for record w…

2020-11-10 Thread GitBox


flinkbot edited a comment on pull request #14029:
URL: https://github.com/apache/flink/pull/14029#issuecomment-725258730


   
   ## CI report:
   
   * a128055b2b0d97ce6673363c7885c97b17ba6e07 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9447)
 
   * 820895764bfe9e83648c470a53f19fe232ce6518 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-15906) physical memory exceeded causing being killed by yarn

2020-11-10 Thread yang gang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-15906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17229817#comment-17229817
 ] 

yang gang commented on FLINK-15906:
---

{code:java}
Closing TaskExecutor connection container_1597847003686_0079_01_000121. 
Because: Container 
[pid=4269,containerID=container_1597847003686_0079_01_000121] is running beyond 
physical memory limits. Current usage: 20.0 GB of 20 GB physical memory used; 
24.9 GB of 100 GB virtual memory used. Killing container.
Dump of the process-tree for container_1597847003686_0079_01_000121 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) 
VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 4298 4269 4269 4269 (java) 104835705 33430931 26634625024 5242644 
/usr/local/jdk1.8/bin/java -Xmx10871635848 -Xms10871635848 
-XX:MaxDirectMemorySize=1207959552 -XX:MaxMetaspaceSize=268435456 -server 
-XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly 
-XX:CMSInitiatingOccupancyFraction=75 -XX:ParallelGCThreads=4 
-XX:+AlwaysPreTouch -XX:+UseCMSCompactAtFullCollection 
-XX:CMSFullGCsBeforeCompaction=3 -DjobName=ck_local_growthline_new-10 
-Dlog.file=/data2/yarn/containers/application_1597847003686_0079/container_1597847003686_0079_01_000121/taskmanager.log
 -Dlog4j.configuration=file:./log4j.properties 
org.apache.flink.yarn.YarnTaskExecutorRunner -D 
taskmanager.memory.framework.off-heap.size=134217728b -D 
taskmanager.memory.network.max=1073741824b -D 
taskmanager.memory.network.min=1073741824b -D 
taskmanager.memory.framework.heap.size=134217728b -D 
taskmanager.memory.managed.size=8053063800b -D taskmanager.cpu.cores=10.0 -D 
taskmanager.memory.task.heap.size=10737418120b -D 
taskmanager.memory.task.off-heap.size=0b --configDir . 
-Djobmanager.rpc.address={address} -Dweb.port=0 
-Dweb.tmpdir=/tmp/flink-web-0874be2a-720d-443c-a069-0bb1fad69433 
-Djobmanager.rpc.port=36047 -Drest.address={address}
|- 4269 4267 4269 4269 (bash) 0 0 115904512 359 /bin/bash -c 
/usr/local/jdk1.8/bin/java -Xmx10871635848 -Xms10871635848 
-XX:MaxDirectMemorySize=1207959552 -XX:MaxMetaspaceSize=268435456 -server 
-XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly 
-XX:CMSInitiatingOccupancyFraction=75 -XX:ParallelGCThreads=4 
-XX:+AlwaysPreTouch -XX:+UseCMSCompactAtFullCollection 
-XX:CMSFullGCsBeforeCompaction=3 -DjobName=ck_local_growthline_new-10 
-Dlog.file=/data2/yarn/containers/application_1597847003686_0079/container_1597847003686_0079_01_000121/taskmanager.log
 -Dlog4j.configuration=file:./log4j.properties 
org.apache.flink.yarn.YarnTaskExecutorRunner -D 
taskmanager.memory.framework.off-heap.size=134217728b -D 
taskmanager.memory.network.max=1073741824b -D 
taskmanager.memory.network.min=1073741824b -D 
taskmanager.memory.framework.heap.size=134217728b -D 
taskmanager.memory.managed.size=8053063800b -D taskmanager.cpu.cores=10.0 -D 
taskmanager.memory.task.heap.size=10737418120b -D 
taskmanager.memory.task.off-heap.size=0b --configDir . 
-Djobmanager.rpc.address={address} -Dweb.port='0' 
-Dweb.tmpdir='/tmp/flink-web-0874be2a-720d-443c-a069-0bb1fad69433' 
-Djobmanager.rpc.port='36047' -Drest.address={address} 1> 
/data2/yarn/containers/application_1597847003686_0079/container_1597847003686_0079_01_000121/taskmanager.out
 2> 
/data2/yarn/containers/application_1597847003686_0079/container_1597847003686_0079_01_000121/taskmanager.err
{code}
[~xintongsong] I have also encountered this kind of problem, in a job that 
calculates DAU indicators. The exception does not happen frequently, and I 
have observed the memory metrics and logs of this task without finding any 
useful information, so I would like to ask how you would approach solving it.
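As a rough sanity check (not Flink's official memory accounting), the explicitly budgeted memory visible in the container's command line above sums to exactly 19 GiB, leaving only about 1 GiB of the 20 GB container for JVM overhead and untracked native allocations, which is the usual source of such kills:

```java
// Sum the memory budgets from the -D flags and JVM options in the log
// above; all values are in bytes, copied from the command line.
class MemoryBudget {
    static final long HEAP      = 10_871_635_848L; // -Xmx (framework.heap + task.heap)
    static final long MANAGED   =  8_053_063_800L; // taskmanager.memory.managed.size
    static final long DIRECT    =  1_207_959_552L; // -XX:MaxDirectMemorySize (network + off-heap)
    static final long METASPACE =    268_435_456L; // -XX:MaxMetaspaceSize

    static long total() {
        return HEAP + MANAGED + DIRECT + METASPACE; // = 19 GiB exactly
    }
}
```

Anything outside these budgets (RocksDB allocations beyond managed memory, glibc arenas, thread stacks) has to fit into the remaining 1 GiB before YARN kills the container.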

> physical memory exceeded causing being killed by yarn
> -
>
> Key: FLINK-15906
> URL: https://issues.apache.org/jira/browse/FLINK-15906
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / YARN
>Reporter: liupengcheng
>Priority: Major
>
> Recently, we encountered this issue when testing a TPC-DS query with 100g of 
> data. I first met this issue when I only set 
> `taskmanager.memory.total-process.size` to `4g` with the `-tm` option. Then I 
> tried to increase the JVM overhead size with the following arguments, but it 
> still failed.
> {code:java}
> taskmanager.memory.jvm-overhead.min: 640m
> taskmanager.memory.jvm-metaspace: 128m
> taskmanager.memory.task.heap.size: 1408m
> taskmanager.memory.framework.heap.size: 128m
> taskmanager.memory.framework.off-heap.size: 128m
> taskmanager.memory.managed.size: 1408m
> taskmanager.memory.shuffle.max: 256m
> {code}
> {code:java}
> java.lang.Exception: [2020-02-05 11:31:32.345]Container 
> [pid=101677,containerID=container_e08_1578903621081_4785_01_51] is 
> running 46342144B beyond the 'PHYSICAL' memory limit. Current usage: 4.04 GB 
> of 4 GB 

[GitHub] [flink] Shawn-Hx commented on pull request #13664: [FLINK-19673] Translate "Standalone Cluster" page into Chinese

2020-11-10 Thread GitBox


Shawn-Hx commented on pull request #13664:
URL: https://github.com/apache/flink/pull/13664#issuecomment-725266494


   @klion26 Thanks for your reminder. I have resolved the conflict. PTAL.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] wuchong commented on pull request #14027: [FLINK-20074][table-planner-blink] Fix can't generate plan when joining on changelog source without updates

2020-11-10 Thread GitBox


wuchong commented on pull request #14027:
URL: https://github.com/apache/flink/pull/14027#issuecomment-725265512


   The changes on `JoinTest.xml` are a fix of the UpdateKindTrait inference for 
the Join operator. The streaming join operator just forwards all the input 
records and can't drop UPDATE_BEFORE messages. Therefore, the plan has been 
wrong for a long time. 
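Why a forwarding operator must keep UPDATE_BEFORE messages can be seen with a toy changelog model (illustrative types, not the planner's): downstream state here is a multiset of rows, and dropping a `-U` leaves the stale row in state.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy changelog consumer: +I/+U add a row to a multiset, -U/-D retract one.
// Records are {kind, row} pairs.
class ChangelogState {
    static Map<String, Integer> apply(List<String[]> changelog) {
        Map<String, Integer> rows = new HashMap<>();
        for (String[] rec : changelog) {
            String kind = rec[0], row = rec[1];
            if (kind.startsWith("+")) {
                rows.merge(row, 1, Integer::sum);
            } else {
                rows.merge(row, -1, Integer::sum);
                rows.remove(row, 0); // drop fully retracted rows
            }
        }
        return rows;
    }
}
```

An update of row "a" to row "b" encoded as `-U a, +U b` leaves only "b" in state; with the `-U` dropped, the stale row "a" survives, which is exactly why an operator that merely forwards records cannot claim it drops UPDATE_BEFORE.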



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-19303) Disable WAL in RocksDB recovery

2020-11-10 Thread Juha Mynttinen (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17229815#comment-17229815
 ] 

Juha Mynttinen commented on FLINK-19303:


It's possible that a later RocksDB version (>5.17.2) has fixes related to this 
issue. Thus it makes sense to wait for the RocksDB bump before working on this 
issue. In FLINK-14482 there's an effort to bump RocksDB to 6.x.y.

> Disable WAL in RocksDB recovery
> ---
>
> Key: FLINK-19303
> URL: https://issues.apache.org/jira/browse/FLINK-19303
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / State Backends
>Reporter: Juha Mynttinen
>Assignee: Juha Mynttinen
>Priority: Major
>
> During recovery of {{RocksDBStateBackend}}, the recovery mechanism puts the 
> key-value pairs into local RocksDB instance(s). To speed up the process, the 
> recovery process uses the RocksDB write batch mechanism. The [RocksDB 
> WAL|https://github.com/facebook/rocksdb/wiki/Write-Ahead-Log] is enabled 
> during this process.
> During normal operations, i.e. when the state backend has been recovered and 
> the Flink application is running (on RocksDB state backend) WAL is disabled.
> The recovery process doesn't need WAL. In fact the recovery should be much 
> faster without WAL. Thus, WAL should be disabled in the recovery process.
> AFAIK the last thing that was done with WAL during recovery was an attempt to 
> remove it. That removal was later reverted because it caused stability issues 
> (https://issues.apache.org/jira/browse/FLINK-8922).
> Unfortunately, the root cause of why disabling WAL causes a segfault during 
> recovery is unknown. After all, WAL is not used during normal operations.
> A potential explanation is some kind of bug in the RocksDB write batch when 
> using WAL. It is possible that later RocksDB versions have fixes / 
> workarounds for the issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-14482) Bump up rocksdb version

2020-11-10 Thread Juha Mynttinen (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17229814#comment-17229814
 ] 

Juha Mynttinen commented on FLINK-14482:


Hey, I'm trying to understand this JIRA. The RocksDB state backend that uses 
managed memory does use the RocksDB write buffer manager (since Flink 1.10). The 
description of this JIRA is "Current rocksDB-5.17.2 does not support write 
buffer manager well". What's the actual issue in 5.17.2 with the write buffer 
manager? I can't find a more detailed explanation.

> Bump up rocksdb version
> ---
>
> Key: FLINK-14482
> URL: https://issues.apache.org/jira/browse/FLINK-14482
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / State Backends
>Reporter: Yun Tang
>Assignee: Yun Tang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.13.0
>
>
> Current rocksDB-5.17.2 does not support write buffer manager well, we need to 
> bump rocksdb version to support that feature.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (FLINK-20084) Fix NPE when generating watermark for record whose rowtime field is null after watermark push down

2020-11-10 Thread Jark Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jark Wu reassigned FLINK-20084:
---

Assignee: Shengkai Fang

> Fix NPE when generating watermark for record whose rowtime field is null 
> after watermark push down
> --
>
> Key: FLINK-20084
> URL: https://issues.apache.org/jira/browse/FLINK-20084
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.12.0
>Reporter: Shengkai Fang
>Assignee: Shengkai Fang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>
> The problem is from the class 
> {{PushWatermarkIntoTableSourceScanRuleBase$DefaultWatermarkGeneratorSupplier$DefaultWatermarkGenerator#onEvent}}.
>  It doesn't check whether the calculated watermark is null before setting it.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-20085) Remove RemoteFunctionStateMigrator code paths from StateFun

2020-11-10 Thread Tzu-Li (Gordon) Tai (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai updated FLINK-20085:

Issue Type: Task  (was: Bug)

> Remove RemoteFunctionStateMigrator code paths from StateFun 
> 
>
> Key: FLINK-20085
> URL: https://issues.apache.org/jira/browse/FLINK-20085
> Project: Flink
>  Issue Type: Task
>  Components: Stateful Functions
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Blocker
> Fix For: statefun-2.3.0
>
>
> The {{RemoteFunctionStateMigrator}} was added to allow savepoints with 
> versions <= 2.1.0 to have a migration path for upgrading to versions >= 
> 2.2.0. The binary format of remote function state was changed due to 
> demultiplexed remote state, introduced in 2.2.0.
> With 2.2.0 already released with the new formats, it is now safe to fully 
> remove this migration path.
> For users, what this means is that it would not be possible to directly 
> upgrade from 2.0.x / 2.1.x to 2.3.x+. They'd have to perform incremental 
> upgrades via 2.2.x, by restoring first with 2.2.x and then taking another 
> savepoint, before upgrading to 2.3.x.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-20085) Remove RemoteFunctionStateMigrator code paths from StateFun

2020-11-10 Thread Tzu-Li (Gordon) Tai (Jira)
Tzu-Li (Gordon) Tai created FLINK-20085:
---

 Summary: Remove RemoteFunctionStateMigrator code paths from 
StateFun 
 Key: FLINK-20085
 URL: https://issues.apache.org/jira/browse/FLINK-20085
 Project: Flink
  Issue Type: Bug
  Components: Stateful Functions
Reporter: Tzu-Li (Gordon) Tai
Assignee: Tzu-Li (Gordon) Tai
 Fix For: statefun-2.3.0


The {{RemoteFunctionStateMigrator}} was added to allow savepoints with versions 
<= 2.1.0 to have a migration path for upgrading to versions >= 2.2.0. The 
binary format of remote function state was changed due to demultiplexed remote 
state, introduced in 2.2.0.

With 2.2.0 already released with the new formats, it is now safe to fully 
remove this migration path.

For users, what this means is that it would not be possible to directly upgrade 
from 2.0.x / 2.1.x to 2.3.x+. They'd have to perform incremental upgrades via 
2.2.x, by restoring first with 2.2.x and then taking another savepoint, before 
upgrading to 2.3.x.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-19979) Sanity check after bash e2e tests for no leftover processes

2020-11-10 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17229811#comment-17229811
 ] 

Robert Metzger commented on FLINK-19979:


This is making good progress. I can probably open a PR in the next 24 hours.

> Sanity check after bash e2e tests for no leftover processes
> ---
>
> Key: FLINK-19979
> URL: https://issues.apache.org/jira/browse/FLINK-19979
> Project: Flink
>  Issue Type: New Feature
>  Components: Test Infrastructure
>Affects Versions: 1.12.0
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>Priority: Critical
> Fix For: 1.12.0
>
>
> As seen in FLINK-19974, if an e2e test is not cleaning up properly, other e2e 
> tests might fail with difficult to diagnose issues.
> I propose to check that no leftover processes (including docker containers) 
> are running after each bash e2e test.
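The proposed check could be sketched as follows (illustrative only, not the actual PR; a real version would also query `docker ps` for leftover containers):

```java
import java.util.List;
import java.util.stream.Collectors;

// After an e2e test finishes, scan the live process table for commands
// matching a list of suspect names and report any leftovers.
class LeftoverProcessCheck {
    static List<String> findLeftovers(List<String> suspectNames) {
        return ProcessHandle.allProcesses()
                .map(h -> h.info().command().orElse(""))
                .filter(cmd -> suspectNames.stream().anyMatch(cmd::contains))
                .collect(Collectors.toList());
    }
}
```

A CI hook would call this with names like the test's cluster entrypoints and fail the build when the returned list is non-empty.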



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] klion26 commented on pull request #13664: [FLINK-19673] Translate "Standalone Cluster" page into Chinese

2020-11-10 Thread GitBox


klion26 commented on pull request #13664:
URL: https://github.com/apache/flink/pull/13664#issuecomment-725259657


   @Shawn-Hx When I was about to merge this PR, I found that there is a 
conflict. Could you please resolve it so that I can merge? Thanks, and sorry 
for the inconvenience.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot commented on pull request #14029: [FLINK-20084][planner] Fix NPE when generating watermark for record w…

2020-11-10 Thread GitBox


flinkbot commented on pull request #14029:
URL: https://github.com/apache/flink/pull/14029#issuecomment-725258730


   
   ## CI report:
   
   * a128055b2b0d97ce6673363c7885c97b17ba6e07 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Assigned] (FLINK-16947) ArtifactResolutionException: Could not transfer artifact. Entry [...] has not been leased from this pool

2020-11-10 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-16947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger reassigned FLINK-16947:
--

Assignee: (was: Robert Metzger)

> ArtifactResolutionException: Could not transfer artifact.  Entry [...] has 
> not been leased from this pool
> -
>
> Key: FLINK-16947
> URL: https://issues.apache.org/jira/browse/FLINK-16947
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / Azure Pipelines
>Reporter: Piotr Nowojski
>Priority: Critical
>  Labels: test-stability
> Fix For: 1.12.0
>
>
> https://dev.azure.com/rmetzger/Flink/_build/results?buildId=6982=logs=c88eea3b-64a0-564d-0031-9fdcd7b8abee=1e2bbe5b-4657-50be-1f07-d84bfce5b1f5
> Build of flink-metrics-availability-test failed with:
> {noformat}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-surefire-plugin:2.22.1:test (end-to-end-tests) 
> on project flink-metrics-availability-test: Unable to generate classpath: 
> org.apache.maven.artifact.resolver.ArtifactResolutionException: Could not 
> transfer artifact org.apache.maven.surefire:surefire-grouper:jar:2.22.1 
> from/to google-maven-central 
> (https://maven-central-eu.storage-download.googleapis.com/maven2/): Entry 
> [id:13][route:{s}->https://maven-central-eu.storage-download.googleapis.com:443][state:null]
>  has not been leased from this pool
> [ERROR] org.apache.maven.surefire:surefire-grouper:jar:2.22.1
> [ERROR] 
> [ERROR] from the specified remote repositories:
> [ERROR] google-maven-central 
> (https://maven-central-eu.storage-download.googleapis.com/maven2/, 
> releases=true, snapshots=false),
> [ERROR] apache.snapshots (https://repository.apache.org/snapshots, 
> releases=false, snapshots=true)
> [ERROR] Path to dependency:
> [ERROR] 1) dummy:dummy:jar:1.0
> [ERROR] 2) org.apache.maven.surefire:surefire-junit47:jar:2.22.1
> [ERROR] 3) org.apache.maven.surefire:common-junit48:jar:2.22.1
> [ERROR] 4) org.apache.maven.surefire:surefire-grouper:jar:2.22.1
> [ERROR] -> [Help 1]
> [ERROR] 
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR] 
> [ERROR] For more information about the errors and possible solutions, please 
> read the following articles:
> [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
> [ERROR] 
> [ERROR] After correcting the problems, you can resume the build with the 
> command
> [ERROR]   mvn  -rf :flink-metrics-availability-test
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (FLINK-20035) BlockingShuffleITCase unstable with "Could not start rest endpoint on any port in port range 8081"

2020-11-10 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger closed FLINK-20035.
--
Resolution: Fixed

> BlockingShuffleITCase unstable with "Could not start rest endpoint on any 
> port in port range 8081"
> --
>
> Key: FLINK-20035
> URL: https://issues.apache.org/jira/browse/FLINK-20035
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Network
>Affects Versions: 1.12.0
>Reporter: Robert Metzger
>Assignee: Yingjie Cao
>Priority: Major
>  Labels: pull-request-available, test-stability
> Fix For: 1.12.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=9178=logs=5c8e7682-d68f-54d1-16a2-a09310218a49=f508e270-48d6-5f1e-3138-42a17e0714f0
> {code}
> 2020-11-06T13:52:56.6369221Z [ERROR] 
> testBoundedBlockingShuffle(org.apache.flink.test.runtime.BlockingShuffleITCase)
>   Time elapsed: 3.522 s  <<< ERROR!
> 2020-11-06T13:52:56.6370005Z org.apache.flink.util.FlinkException: Could not 
> create the DispatcherResourceManagerComponent.
> 2020-11-06T13:52:56.6370649Z  at 
> org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory.create(DefaultDispatcherResourceManagerComponentFactory.java:257)
> 2020-11-06T13:52:56.6371371Z  at 
> org.apache.flink.runtime.minicluster.MiniCluster.createDispatcherResourceManagerComponents(MiniCluster.java:412)
> 2020-11-06T13:52:56.6372258Z  at 
> org.apache.flink.runtime.minicluster.MiniCluster.setupDispatcherResourceManagerComponents(MiniCluster.java:378)
> 2020-11-06T13:52:56.6373276Z  at 
> org.apache.flink.runtime.minicluster.MiniCluster.start(MiniCluster.java:334)
> 2020-11-06T13:52:56.6374182Z  at 
> org.apache.flink.test.runtime.JobGraphRunningUtil.execute(JobGraphRunningUtil.java:50)
> 2020-11-06T13:52:56.6375055Z  at 
> org.apache.flink.test.runtime.BlockingShuffleITCase.testBoundedBlockingShuffle(BlockingShuffleITCase.java:53)
> 2020-11-06T13:52:56.6375787Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2020-11-06T13:52:56.6376546Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2020-11-06T13:52:56.6377514Z  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2020-11-06T13:52:56.6378008Z  at 
> java.lang.reflect.Method.invoke(Method.java:498)
> 2020-11-06T13:52:56.6378774Z  at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> 2020-11-06T13:52:56.6379350Z  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 2020-11-06T13:52:56.6458094Z  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> 2020-11-06T13:52:56.6459047Z  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 2020-11-06T13:52:56.6459678Z  at 
> org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> 2020-11-06T13:52:56.6460182Z  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> 2020-11-06T13:52:56.6460770Z  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> 2020-11-06T13:52:56.6461210Z  at 
> org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> 2020-11-06T13:52:56.6461649Z  at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> 2020-11-06T13:52:56.6462089Z  at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> 2020-11-06T13:52:56.6462736Z  at 
> org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> 2020-11-06T13:52:56.6463286Z  at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> 2020-11-06T13:52:56.6463728Z  at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> 2020-11-06T13:52:56.6464344Z  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
> 2020-11-06T13:52:56.6464918Z  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
> 2020-11-06T13:52:56.6465428Z  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
> 2020-11-06T13:52:56.6465915Z  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
> 2020-11-06T13:52:56.6466405Z  at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
> 2020-11-06T13:52:56.6467050Z  at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
> 2020-11-06T13:52:56.6468341Z  at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
> 2020-11-06T13:52:56.6468794Z  at 
> 

[jira] [Commented] (FLINK-20035) BlockingShuffleITCase unstable with "Could not start rest endpoint on any port in port range 8081"

2020-11-10 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17229808#comment-17229808
 ] 

Robert Metzger commented on FLINK-20035:


Fixed in 
https://github.com/apache/flink/commit/0c36c842c3cb0c3aa75e7e94c56dce217ee548de.

> BlockingShuffleITCase unstable with "Could not start rest endpoint on any 
> port in port range 8081"
> --
>
> Key: FLINK-20035
> URL: https://issues.apache.org/jira/browse/FLINK-20035
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Network
>Affects Versions: 1.12.0
>Reporter: Robert Metzger
>Assignee: Yingjie Cao
>Priority: Major
>  Labels: pull-request-available, test-stability
> Fix For: 1.12.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=9178=logs=5c8e7682-d68f-54d1-16a2-a09310218a49=f508e270-48d6-5f1e-3138-42a17e0714f0
> {code}
> 2020-11-06T13:52:56.6369221Z [ERROR] 
> testBoundedBlockingShuffle(org.apache.flink.test.runtime.BlockingShuffleITCase)
>   Time elapsed: 3.522 s  <<< ERROR!
> 2020-11-06T13:52:56.6370005Z org.apache.flink.util.FlinkException: Could not 
> create the DispatcherResourceManagerComponent.
> 2020-11-06T13:52:56.6370649Z  at 
> org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory.create(DefaultDispatcherResourceManagerComponentFactory.java:257)
> 2020-11-06T13:52:56.6371371Z  at 
> org.apache.flink.runtime.minicluster.MiniCluster.createDispatcherResourceManagerComponents(MiniCluster.java:412)
> 2020-11-06T13:52:56.6372258Z  at 
> org.apache.flink.runtime.minicluster.MiniCluster.setupDispatcherResourceManagerComponents(MiniCluster.java:378)
> 2020-11-06T13:52:56.6373276Z  at 
> org.apache.flink.runtime.minicluster.MiniCluster.start(MiniCluster.java:334)
> 2020-11-06T13:52:56.6374182Z  at 
> org.apache.flink.test.runtime.JobGraphRunningUtil.execute(JobGraphRunningUtil.java:50)
> 2020-11-06T13:52:56.6375055Z  at 
> org.apache.flink.test.runtime.BlockingShuffleITCase.testBoundedBlockingShuffle(BlockingShuffleITCase.java:53)
> 2020-11-06T13:52:56.6375787Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2020-11-06T13:52:56.6376546Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2020-11-06T13:52:56.6377514Z  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2020-11-06T13:52:56.6378008Z  at 
> java.lang.reflect.Method.invoke(Method.java:498)
> 2020-11-06T13:52:56.6378774Z  at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> 2020-11-06T13:52:56.6379350Z  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 2020-11-06T13:52:56.6458094Z  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> 2020-11-06T13:52:56.6459047Z  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 2020-11-06T13:52:56.6459678Z  at 
> org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> 2020-11-06T13:52:56.6460182Z  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> 2020-11-06T13:52:56.6460770Z  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> 2020-11-06T13:52:56.6461210Z  at 
> org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> 2020-11-06T13:52:56.6461649Z  at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> 2020-11-06T13:52:56.6462089Z  at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> 2020-11-06T13:52:56.6462736Z  at 
> org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> 2020-11-06T13:52:56.6463286Z  at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> 2020-11-06T13:52:56.6463728Z  at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> 2020-11-06T13:52:56.6464344Z  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
> 2020-11-06T13:52:56.6464918Z  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
> 2020-11-06T13:52:56.6465428Z  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
> 2020-11-06T13:52:56.6465915Z  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
> 2020-11-06T13:52:56.6466405Z  at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
> 2020-11-06T13:52:56.6467050Z  at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
> 2020-11-06T13:52:56.6468341Z  at 
> 

[GitHub] [flink] rmetzger merged pull request #13992: [FLINK-20035][tests] Use random port for rest endpoint in BlockingShuffleITCase and ShuffleCompressionITCase

2020-11-10 Thread GitBox


rmetzger merged pull request #13992:
URL: https://github.com/apache/flink/pull/13992


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] rmetzger commented on pull request #13992: [FLINK-20035][tests] Use random port for rest endpoint in BlockingShuffleITCase and ShuffleCompressionITCase

2020-11-10 Thread GitBox


rmetzger commented on pull request #13992:
URL: https://github.com/apache/flink/pull/13992#issuecomment-725258225


   Merging ...







[GitHub] [flink] godfreyhe commented on a change in pull request #14029: [FLINK-20084][planner] Fix NPE when generating watermark for record w…

2020-11-10 Thread GitBox


godfreyhe commented on a change in pull request #14029:
URL: https://github.com/apache/flink/pull/14029#discussion_r521164123



##
File path: 
flink-table/flink-table-planner-blink/src/test/scala/org/apache/flink/table/planner/runtime/stream/sql/SourceWatermarkITCase.scala
##
@@ -39,7 +39,7 @@ class SourceWatermarkITCase extends StreamingTestBase{
   def testWatermarkWithNestedRow(): Unit = {
 val data = Seq(
   row(1, 2L, row("i1", row("i2", 
LocalDateTime.parse("2020-11-21T19:00:05.23",
-  row(2, 3L, row("j1", row("j2", 
LocalDateTime.parse("2020-11-21T21:00:05.23"
+  row(2, 3L, row("j1", row("j2", null)))

Review comment:
   instead of replacing the existing row, I tend to add a new row









[jira] [Commented] (FLINK-16947) ArtifactResolutionException: Could not transfer artifact. Entry [...] has not been leased from this pool

2020-11-10 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17229806#comment-17229806
 ] 

Robert Metzger commented on FLINK-16947:


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=9437&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=ff888d9b-cd34-53cc-d90f-3e446d355529

> ArtifactResolutionException: Could not transfer artifact.  Entry [...] has 
> not been leased from this pool
> -
>
> Key: FLINK-16947
> URL: https://issues.apache.org/jira/browse/FLINK-16947
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / Azure Pipelines
>Reporter: Piotr Nowojski
>Assignee: Robert Metzger
>Priority: Critical
>  Labels: test-stability
> Fix For: 1.12.0
>
>
> https://dev.azure.com/rmetzger/Flink/_build/results?buildId=6982&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=1e2bbe5b-4657-50be-1f07-d84bfce5b1f5
> Build of flink-metrics-availability-test failed with:
> {noformat}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-surefire-plugin:2.22.1:test (end-to-end-tests) 
> on project flink-metrics-availability-test: Unable to generate classpath: 
> org.apache.maven.artifact.resolver.ArtifactResolutionException: Could not 
> transfer artifact org.apache.maven.surefire:surefire-grouper:jar:2.22.1 
> from/to google-maven-central 
> (https://maven-central-eu.storage-download.googleapis.com/maven2/): Entry 
> [id:13][route:{s}->https://maven-central-eu.storage-download.googleapis.com:443][state:null]
>  has not been leased from this pool
> [ERROR] org.apache.maven.surefire:surefire-grouper:jar:2.22.1
> [ERROR] 
> [ERROR] from the specified remote repositories:
> [ERROR] google-maven-central 
> (https://maven-central-eu.storage-download.googleapis.com/maven2/, 
> releases=true, snapshots=false),
> [ERROR] apache.snapshots (https://repository.apache.org/snapshots, 
> releases=false, snapshots=true)
> [ERROR] Path to dependency:
> [ERROR] 1) dummy:dummy:jar:1.0
> [ERROR] 2) org.apache.maven.surefire:surefire-junit47:jar:2.22.1
> [ERROR] 3) org.apache.maven.surefire:common-junit48:jar:2.22.1
> [ERROR] 4) org.apache.maven.surefire:surefire-grouper:jar:2.22.1
> [ERROR] -> [Help 1]
> [ERROR] 
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR] 
> [ERROR] For more information about the errors and possible solutions, please 
> read the following articles:
> [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
> [ERROR] 
> [ERROR] After correcting the problems, you can resume the build with the 
> command
> [ERROR]   mvn  -rf :flink-metrics-availability-test
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot commented on pull request #14029: [FLINK-20084][planner] Fix NPE when generating watermark for record w…

2020-11-10 Thread GitBox


flinkbot commented on pull request #14029:
URL: https://github.com/apache/flink/pull/14029#issuecomment-725254026


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit a128055b2b0d97ce6673363c7885c97b17ba6e07 (Wed Nov 11 
07:21:54 UTC 2020)
   
   **Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
* **This pull request references an unassigned [Jira 
ticket](https://issues.apache.org/jira/browse/FLINK-20084).** According to the 
[code contribution 
guide](https://flink.apache.org/contributing/contribute-code.html), tickets 
need to be assigned before starting with the implementation work.
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into to Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.
   ## Bot commands
   The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   







[jira] [Commented] (FLINK-19585) UnalignedCheckpointCompatibilityITCase.test:97->runAndTakeSavepoint: "Not all required tasks are currently running."

2020-11-10 Thread Arvid Heise (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17229805#comment-17229805
 ] 

Arvid Heise commented on FLINK-19585:
-

The last two reports of master are FLINK-20065 (AskTimeout). I guess the 
reopened one is more about a backport to 1.11. I'm waiting for the other ticket 
to be resolved though as it may be related to the fix of this ticket.

> UnalignedCheckpointCompatibilityITCase.test:97->runAndTakeSavepoint: "Not all 
> required tasks are currently running."
> 
>
> Key: FLINK-19585
> URL: https://issues.apache.org/jira/browse/FLINK-19585
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.12.0
>Reporter: Robert Metzger
>Assignee: Arvid Heise
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.12.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=7419&view=logs&j=5c8e7682-d68f-54d1-16a2-a09310218a49&t=f508e270-48d6-5f1e-3138-42a17e0714f0
> {code}
> 2020-10-12T10:27:51.7667213Z [ERROR] Tests run: 4, Failures: 0, Errors: 1, 
> Skipped: 0, Time elapsed: 13.146 s <<< FAILURE! - in 
> org.apache.flink.test.checkpointing.UnalignedCheckpointCompatibilityITCase
> 2020-10-12T10:27:51.7675454Z [ERROR] test[type: SAVEPOINT, startAligned: 
> false](org.apache.flink.test.checkpointing.UnalignedCheckpointCompatibilityITCase)
>   Time elapsed: 2.168 s  <<< ERROR!
> 2020-10-12T10:27:51.7676759Z java.util.concurrent.ExecutionException: 
> java.util.concurrent.CompletionException: 
> org.apache.flink.runtime.checkpoint.CheckpointException: Not all required 
> tasks are currently running.
> 2020-10-12T10:27:51.7686572Z  at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> 2020-10-12T10:27:51.7688239Z  at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
> 2020-10-12T10:27:51.7689543Z  at 
> org.apache.flink.test.checkpointing.UnalignedCheckpointCompatibilityITCase.runAndTakeSavepoint(UnalignedCheckpointCompatibilityITCase.java:113)
> 2020-10-12T10:27:51.7690681Z  at 
> org.apache.flink.test.checkpointing.UnalignedCheckpointCompatibilityITCase.test(UnalignedCheckpointCompatibilityITCase.java:97)
> 2020-10-12T10:27:51.7691513Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2020-10-12T10:27:51.7692182Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2020-10-12T10:27:51.7692964Z  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2020-10-12T10:27:51.7693655Z  at 
> java.lang.reflect.Method.invoke(Method.java:498)
> 2020-10-12T10:27:51.7694489Z  at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> 2020-10-12T10:27:51.7707103Z  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 2020-10-12T10:27:51.7729199Z  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> 2020-10-12T10:27:51.7730097Z  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 2020-10-12T10:27:51.7730833Z  at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
> 2020-10-12T10:27:51.7731500Z  at 
> org.junit.rules.RunRules.evaluate(RunRules.java:20)
> 2020-10-12T10:27:51.7732086Z  at 
> org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> 2020-10-12T10:27:51.7732781Z  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> 2020-10-12T10:27:51.7733563Z  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> 2020-10-12T10:27:51.7734735Z  at 
> org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> 2020-10-12T10:27:51.7735400Z  at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> 2020-10-12T10:27:51.7736075Z  at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> 2020-10-12T10:27:51.7736757Z  at 
> org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> 2020-10-12T10:27:51.7737432Z  at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> 2020-10-12T10:27:51.7738081Z  at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> 2020-10-12T10:27:51.7739008Z  at 
> org.junit.runners.Suite.runChild(Suite.java:128)
> 2020-10-12T10:27:51.7739583Z  at 
> org.junit.runners.Suite.runChild(Suite.java:27)
> 2020-10-12T10:27:51.7740173Z  at 
> org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> 2020-10-12T10:27:51.7740800Z  at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> 2020-10-12T10:27:51.7741470Z  at 
> 

[jira] [Updated] (FLINK-20084) Fix NPE when generating watermark for record whose rowtime field is null after watermark push down

2020-11-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-20084:
---
Labels: pull-request-available  (was: )

> Fix NPE when generating watermark for record whose rowtime field is null 
> after watermark push down
> --
>
> Key: FLINK-20084
> URL: https://issues.apache.org/jira/browse/FLINK-20084
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.12.0
>Reporter: Shengkai Fang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>
> The problem is from the class 
> {{PushWatermarkIntoTableSourceScanRuleBase$DefaultWatermarkGeneratorSupplier$DefaultWatermarkGenerator#onEvent}}.
>  It doesn't check whether the calculated watermark is null before setting it.  
>  



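The fix described above amounts to computing the watermark once and assigning it only when non-null. A minimal sketch of the pattern, using simplified, hypothetical names rather than the actual planner-generated classes:

```java
public class NullSafeWatermarkGenerator {
    // Hypothetical stand-in for the planner-generated watermark expression;
    // it may return null when the record's rowtime field is null.
    interface InnerGenerator {
        Long currentWatermark(Object event);
    }

    private final InnerGenerator inner;
    private long currentWatermark = Long.MIN_VALUE;

    NullSafeWatermarkGenerator(InnerGenerator inner) {
        this.inner = inner;
    }

    public long onEvent(Object event) {
        // Compute the watermark once and only assign when non-null, so a
        // record with a null rowtime field neither throws an NPE nor
        // regresses the current watermark.
        Long watermark = inner.currentWatermark(event);
        if (watermark != null) {
            currentWatermark = watermark;
        }
        return currentWatermark;
    }

    public static void main(String[] args) {
        NullSafeWatermarkGenerator gen =
            new NullSafeWatermarkGenerator(e -> e == null ? null : (Long) e);
        if (gen.onEvent(5L) != 5L) {
            throw new AssertionError("watermark should advance to 5");
        }
        if (gen.onEvent(null) != 5L) {
            throw new AssertionError("null rowtime must not change the watermark");
        }
        System.out.println("ok");
    }
}
```

Note the intermediate `watermark` variable: calling the generated expression a second time inside the null check would do the work twice per record on a performance-sensitive path.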


[GitHub] [flink] fsk119 opened a new pull request #14029: [FLINK-20084][planner] Fix NPE when generating watermark for record w…

2020-11-10 Thread GitBox


fsk119 opened a new pull request #14029:
URL: https://github.com/apache/flink/pull/14029


   …hose rowtime field is null after watermark push down
   
   
   
   ## What is the purpose of the change
   
   *Fix a bug.*
   
   ## Verifying this change
   
   This change added tests and can be verified as follows:
   
   - add IT Case.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / **no**)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / **no**)
 - The serializers: (yes / **no** / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / **no** 
/ don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (yes / **no** / 
don't know)
 - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / **no**)
   







[jira] [Commented] (FLINK-20068) KafkaSubscriberTest.testTopicPatternSubscriber failed with unexpected results

2020-11-10 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17229803#comment-17229803
 ] 

Robert Metzger commented on FLINK-20068:


Thanks a lot for the explanation and fix!

> KafkaSubscriberTest.testTopicPatternSubscriber failed with unexpected results
> -
>
> Key: FLINK-20068
> URL: https://issues.apache.org/jira/browse/FLINK-20068
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Affects Versions: 1.12.0
>Reporter: Dian Fu
>Assignee: Jiangjie Qin
>Priority: Blocker
>  Labels: pull-request-available, test-stability
> Fix For: 1.12.0, 1.11.3
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=9365&view=logs&j=c5f0071e-1851-543e-9a45-9ac140befc32&t=1fb1a56f-e8b5-5a82-00a0-a2db7757b4f5
> {code}
> 2020-11-10T00:14:22.7658242Z [ERROR] 
> testTopicPatternSubscriber(org.apache.flink.connector.kafka.source.enumerator.subscriber.KafkaSubscriberTest)
>   Time elapsed: 0.012 s  <<< FAILURE!
> 2020-11-10T00:14:22.7659838Z java.lang.AssertionError: 
> expected:<[pattern-topic-5, pattern-topic-4, pattern-topic-7, 
> pattern-topic-6, pattern-topic-9, pattern-topic-8, pattern-topic-1, 
> pattern-topic-0, pattern-topic-3]> but was:<[]>
> 2020-11-10T00:14:22.7660740Z  at org.junit.Assert.fail(Assert.java:88)
> 2020-11-10T00:14:22.7661245Z  at 
> org.junit.Assert.failNotEquals(Assert.java:834)
> 2020-11-10T00:14:22.7661788Z  at 
> org.junit.Assert.assertEquals(Assert.java:118)
> 2020-11-10T00:14:22.7662312Z  at 
> org.junit.Assert.assertEquals(Assert.java:144)
> 2020-11-10T00:14:22.7663051Z  at 
> org.apache.flink.connector.kafka.source.enumerator.subscriber.KafkaSubscriberTest.testTopicPatternSubscriber(KafkaSubscriberTest.java:94)
> {code}





[jira] [Assigned] (FLINK-20068) KafkaSubscriberTest.testTopicPatternSubscriber failed with unexpected results

2020-11-10 Thread Jiangjie Qin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiangjie Qin reassigned FLINK-20068:


Assignee: Jiangjie Qin

> KafkaSubscriberTest.testTopicPatternSubscriber failed with unexpected results
> -
>
> Key: FLINK-20068
> URL: https://issues.apache.org/jira/browse/FLINK-20068
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Affects Versions: 1.12.0
>Reporter: Dian Fu
>Assignee: Jiangjie Qin
>Priority: Blocker
>  Labels: pull-request-available, test-stability
> Fix For: 1.12.0, 1.11.3
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=9365&view=logs&j=c5f0071e-1851-543e-9a45-9ac140befc32&t=1fb1a56f-e8b5-5a82-00a0-a2db7757b4f5
> {code}
> 2020-11-10T00:14:22.7658242Z [ERROR] 
> testTopicPatternSubscriber(org.apache.flink.connector.kafka.source.enumerator.subscriber.KafkaSubscriberTest)
>   Time elapsed: 0.012 s  <<< FAILURE!
> 2020-11-10T00:14:22.7659838Z java.lang.AssertionError: 
> expected:<[pattern-topic-5, pattern-topic-4, pattern-topic-7, 
> pattern-topic-6, pattern-topic-9, pattern-topic-8, pattern-topic-1, 
> pattern-topic-0, pattern-topic-3]> but was:<[]>
> 2020-11-10T00:14:22.7660740Z  at org.junit.Assert.fail(Assert.java:88)
> 2020-11-10T00:14:22.7661245Z  at 
> org.junit.Assert.failNotEquals(Assert.java:834)
> 2020-11-10T00:14:22.7661788Z  at 
> org.junit.Assert.assertEquals(Assert.java:118)
> 2020-11-10T00:14:22.7662312Z  at 
> org.junit.Assert.assertEquals(Assert.java:144)
> 2020-11-10T00:14:22.7663051Z  at 
> org.apache.flink.connector.kafka.source.enumerator.subscriber.KafkaSubscriberTest.testTopicPatternSubscriber(KafkaSubscriberTest.java:94)
> {code}





[jira] [Resolved] (FLINK-20068) KafkaSubscriberTest.testTopicPatternSubscriber failed with unexpected results

2020-11-10 Thread Jiangjie Qin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiangjie Qin resolved FLINK-20068.
--
Fix Version/s: 1.11.3
   Resolution: Fixed

Merged to master.
cacb4c1fb1d6123d0aeb93d550ffcafa6ad4a8fb

Cherry-picked to 1.11:
18e4c1ba5b66c704b3ea7e5ba8fb7f3499207903

> KafkaSubscriberTest.testTopicPatternSubscriber failed with unexpected results
> -
>
> Key: FLINK-20068
> URL: https://issues.apache.org/jira/browse/FLINK-20068
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Affects Versions: 1.12.0
>Reporter: Dian Fu
>Priority: Blocker
>  Labels: pull-request-available, test-stability
> Fix For: 1.12.0, 1.11.3
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=9365&view=logs&j=c5f0071e-1851-543e-9a45-9ac140befc32&t=1fb1a56f-e8b5-5a82-00a0-a2db7757b4f5
> {code}
> 2020-11-10T00:14:22.7658242Z [ERROR] 
> testTopicPatternSubscriber(org.apache.flink.connector.kafka.source.enumerator.subscriber.KafkaSubscriberTest)
>   Time elapsed: 0.012 s  <<< FAILURE!
> 2020-11-10T00:14:22.7659838Z java.lang.AssertionError: 
> expected:<[pattern-topic-5, pattern-topic-4, pattern-topic-7, 
> pattern-topic-6, pattern-topic-9, pattern-topic-8, pattern-topic-1, 
> pattern-topic-0, pattern-topic-3]> but was:<[]>
> 2020-11-10T00:14:22.7660740Z  at org.junit.Assert.fail(Assert.java:88)
> 2020-11-10T00:14:22.7661245Z  at 
> org.junit.Assert.failNotEquals(Assert.java:834)
> 2020-11-10T00:14:22.7661788Z  at 
> org.junit.Assert.assertEquals(Assert.java:118)
> 2020-11-10T00:14:22.7662312Z  at 
> org.junit.Assert.assertEquals(Assert.java:144)
> 2020-11-10T00:14:22.7663051Z  at 
> org.apache.flink.connector.kafka.source.enumerator.subscriber.KafkaSubscriberTest.testTopicPatternSubscriber(KafkaSubscriberTest.java:94)
> {code}





[jira] [Commented] (FLINK-19964) Gelly ITCase stuck on Azure in HITSITCase.testPrintWithRMatGraph

2020-11-10 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17229801#comment-17229801
 ] 

Robert Metzger commented on FLINK-19964:


Thanks a lot for the fix!

> Gelly ITCase stuck on Azure in HITSITCase.testPrintWithRMatGraph
> 
>
> Key: FLINK-19964
> URL: https://issues.apache.org/jira/browse/FLINK-19964
> Project: Flink
>  Issue Type: Bug
>  Components: Library / Graph Processing (Gelly), Runtime / Network, 
> Tests
>Affects Versions: 1.12.0
>Reporter: Chesnay Schepler
>Assignee: Arvid Heise
>Priority: Blocker
>  Labels: pull-request-available, test-stability
> Fix For: 1.12.0
>
>
> The HITSITCase has gotten stuck on Azure. Chances are that something in the 
> scheduling or network has broken it.
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=8919&view=logs&j=c5f0071e-1851-543e-9a45-9ac140befc32&t=1fb1a56f-e8b5-5a82-00a0-a2db7757b4f5





[GitHub] [flink] becketqin closed pull request #14022: [FLINK-20068] Enhance the topic creation guarantee to ensure all the …

2020-11-10 Thread GitBox


becketqin closed pull request #14022:
URL: https://github.com/apache/flink/pull/14022


   







[GitHub] [flink] becketqin commented on pull request #14022: [FLINK-20068] Enhance the topic creation guarantee to ensure all the …

2020-11-10 Thread GitBox


becketqin commented on pull request #14022:
URL: https://github.com/apache/flink/pull/14022#issuecomment-725251041


   Merged to master.
   cacb4c1fb1d6123d0aeb93d550ffcafa6ad4a8fb
   Cherry-picked to 1.11:
   18e4c1ba5b66c704b3ea7e5ba8fb7f3499207903







[jira] [Assigned] (FLINK-19300) Timer loss after restoring from savepoint

2020-11-10 Thread Tzu-Li (Gordon) Tai (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai reassigned FLINK-19300:
---

Assignee: Xiang Gao

> Timer loss after restoring from savepoint
> -
>
> Key: FLINK-19300
> URL: https://issues.apache.org/jira/browse/FLINK-19300
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.8.0
>Reporter: Xiang Gao
>Assignee: Xiang Gao
>Priority: Blocker
> Fix For: 1.12.0, 1.11.3
>
>
> While using heap-based timers, we are seeing occasional timer loss after 
> restoring program from savepoint, especially when using a remote savepoint 
> storage (s3). 
> After some investigation, the issue seems to be related to [this line in 
> deserialization|https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/core/io/PostVersionedIOReadableWritable.java#L65].
>  When trying to check the VERSIONED_IDENTIFIER, the input stream may not 
> guarantee filling the byte array, causing timers to be dropped for the 
> affected key group.
> Should keep reading until expected number of bytes are actually read or if 
> end of the stream has been reached. 



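The fix suggested in the description is the classic read-fully loop: `InputStream.read(byte[])` may legitimately return fewer bytes than requested (especially on a remote stream such as S3), so the caller must loop until the buffer is filled or EOF is reached. A minimal sketch of the pattern, not the actual Flink patch to `PostVersionedIOReadableWritable`:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class ReadFully {
    /**
     * Reads exactly buffer.length bytes unless EOF is hit first.
     * Returns the number of bytes actually read, which equals
     * buffer.length unless the stream ended early.
     */
    static int readFully(InputStream in, byte[] buffer) throws IOException {
        int total = 0;
        while (total < buffer.length) {
            int n = in.read(buffer, total, buffer.length - total);
            if (n < 0) {
                break; // end of stream reached before the buffer was filled
            }
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {1, 2, 3, 4};
        byte[] buf = new byte[4];
        // A single read(buf) on a slow or remote stream may return early;
        // readFully keeps looping until 4 bytes arrive or EOF.
        int n = readFully(new ByteArrayInputStream(data), buf);
        if (n != 4 || !Arrays.equals(buf, data)) {
            throw new AssertionError("readFully did not fill the buffer");
        }
        System.out.println("ok");
    }
}
```

Without such a loop, a short read leaves the tail of the identifier buffer zeroed, the version check silently fails, and the timers of the affected key group are skipped instead of triggering an error.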


[jira] [Updated] (FLINK-19300) Timer loss after restoring from savepoint

2020-11-10 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-19300:
---
Affects Version/s: 1.8.0

> Timer loss after restoring from savepoint
> -
>
> Key: FLINK-19300
> URL: https://issues.apache.org/jira/browse/FLINK-19300
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.8.0
>Reporter: Xiang Gao
>Priority: Blocker
> Fix For: 1.12.0, 1.11.3
>
>
> While using heap-based timers, we are seeing occasional timer loss after 
> restoring program from savepoint, especially when using a remote savepoint 
> storage (s3). 
> After some investigation, the issue seems to be related to [this line in 
> deserialization|https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/core/io/PostVersionedIOReadableWritable.java#L65].
>  When trying to check the VERSIONED_IDENTIFIER, the input stream may not 
> guarantee filling the byte array, causing timers to be dropped for the 
> affected key group.
> Should keep reading until expected number of bytes are actually read or if 
> end of the stream has been reached. 





[jira] [Updated] (FLINK-19300) Timer loss after restoring from savepoint

2020-11-10 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-19300:
---
Fix Version/s: 1.11.3
   1.12.0

> Timer loss after restoring from savepoint
> -
>
> Key: FLINK-19300
> URL: https://issues.apache.org/jira/browse/FLINK-19300
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Reporter: Xiang Gao
>Priority: Critical
> Fix For: 1.12.0, 1.11.3
>
>
> While using heap-based timers, we are seeing occasional timer loss after 
> restoring program from savepoint, especially when using a remote savepoint 
> storage (s3). 
> After some investigation, the issue seems to be related to [this line in 
> deserialization|https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/core/io/PostVersionedIOReadableWritable.java#L65].
>  When trying to check the VERSIONED_IDENTIFIER, the input stream may not 
> guarantee filling the byte array, causing timers to be dropped for the 
> affected key group.
> Should keep reading until expected number of bytes are actually read or if 
> end of the stream has been reached. 





[jira] [Updated] (FLINK-19300) Timer loss after restoring from savepoint

2020-11-10 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-19300:
---
Priority: Blocker  (was: Critical)

> Timer loss after restoring from savepoint
> -
>
> Key: FLINK-19300
> URL: https://issues.apache.org/jira/browse/FLINK-19300
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Reporter: Xiang Gao
>Priority: Blocker
> Fix For: 1.12.0, 1.11.3
>
>
> While using heap-based timers, we are seeing occasional timer loss after 
> restoring program from savepoint, especially when using a remote savepoint 
> storage (s3). 
> After some investigation, the issue seems to be related to [this line in 
> deserialization|https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/core/io/PostVersionedIOReadableWritable.java#L65].
>  When trying to check the VERSIONED_IDENTIFIER, the input stream may not 
> guarantee filling the byte array, causing timers to be dropped for the 
> affected key group.
> Should keep reading until expected number of bytes are actually read or if 
> end of the stream has been reached. 





[jira] [Comment Edited] (FLINK-19300) Timer loss after restoring from savepoint

2020-11-10 Thread Tzu-Li (Gordon) Tai (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17229796#comment-17229796
 ] 

Tzu-Li (Gordon) Tai edited comment on FLINK-19300 at 11/11/20, 7:09 AM:


Just a comment on the severity of the issue:

It looks like timer loss is only possible if somehow, the key groups contain a 
{{0}} at the very beginning of the stream. This seems to be the only possible 
case that would lead to the {{InternalTimerServiceSerializationProxy}} silently 
skipping the rest of the reads, instead of failing the restore with some 
{{IOException}}.


was (Author: tzulitai):
Just a comment on the severity of the issue:

It looks like timer loss is only possible if somehow, the key groups contain a 
{{0}} at the very beginning of the stream. This seems to be the only possible 
case that would lead to the {{InternalTimerServiceSerializationProxy}} silently 
skipping the rest of the reads.

> Timer loss after restoring from savepoint
> -
>
> Key: FLINK-19300
> URL: https://issues.apache.org/jira/browse/FLINK-19300
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Reporter: Xiang Gao
>Priority: Critical
>
> While using heap-based timers, we are seeing occasional timer loss after 
> restoring program from savepoint, especially when using a remote savepoint 
> storage (s3). 
> After some investigation, the issue seems to be related to [this line in 
> deserialization|https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/core/io/PostVersionedIOReadableWritable.java#L65].
>  When trying to check the VERSIONED_IDENTIFIER, the input stream may not 
> guarantee filling the byte array, causing timers to be dropped for the 
> affected key group.
> Should keep reading until expected number of bytes are actually read or if 
> end of the stream has been reached. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-19300) Timer loss after restoring from savepoint

2020-11-10 Thread Tzu-Li (Gordon) Tai (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17229796#comment-17229796
 ] 

Tzu-Li (Gordon) Tai commented on FLINK-19300:
-

Just a comment on the severity of the issue:

It looks like timer loss is only possible if somehow the key groups contain a 
{{0}} at the very beginning of the stream. This seems to be the only possible 
case that would lead to the {{InternalTimerServiceSerializationProxy}} silently 
skipping the rest of the reads.

> Timer loss after restoring from savepoint
> -
>
> Key: FLINK-19300
> URL: https://issues.apache.org/jira/browse/FLINK-19300
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Reporter: Xiang Gao
>Priority: Critical
>
> While using heap-based timers, we are seeing occasional timer loss after 
> restoring program from savepoint, especially when using a remote savepoint 
> storage (s3). 
> After some investigation, the issue seems to be related to [this line in 
> deserialization|https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/core/io/PostVersionedIOReadableWritable.java#L65].
>  When trying to check the VERSIONED_IDENTIFIER, the input stream may not 
> guarantee filling the byte array, causing timers to be dropped for the 
> affected key group.
> Should keep reading until expected number of bytes are actually read or if 
> end of the stream has been reached. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-20084) Fix NPE when generating watermark for record whose rowtime field is null after watermark push down

2020-11-10 Thread Shengkai Fang (Jira)
Shengkai Fang created FLINK-20084:
-

 Summary: Fix NPE when generating watermark for record whose 
rowtime field is null after watermark push down
 Key: FLINK-20084
 URL: https://issues.apache.org/jira/browse/FLINK-20084
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.12.0
Reporter: Shengkai Fang
 Fix For: 1.12.0


The problem is in 
{{PushWatermarkIntoTableSourceScanRuleBase$DefaultWatermarkGeneratorSupplier$DefaultWatermarkGenerator#onEvent}}: 
it doesn't check whether the calculated watermark is null before setting it.
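The fix amounts to a null guard that also avoids recomputing the watermark. A minimal sketch, with illustrative names rather than Flink's actual planner classes:

```java
// Sketch of a null-guarded watermark update; InnerGenerator and the
// field names are illustrative stand-ins, not Flink's real API.
class NullSafeWatermarkGenerator {
    /** Stand-in for the code-generated inner watermark generator. */
    interface InnerGenerator {
        Long currentWatermark(Object event); // null when the rowtime field is null
    }

    private final InnerGenerator inner;
    private long currentWatermark = Long.MIN_VALUE;

    NullSafeWatermarkGenerator(InnerGenerator inner) {
        this.inner = inner;
    }

    /** Advances the watermark only when the inner generator produced one. */
    long onEvent(Object event) {
        Long watermark = inner.currentWatermark(event); // compute once, reuse below
        if (watermark != null) {
            currentWatermark = watermark;
        }
        return currentWatermark;
    }
}
```

A record with a null rowtime field then leaves the current watermark untouched instead of triggering an NPE when the null {{Long}} is unboxed.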

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-19300) Timer loss after restoring from savepoint

2020-11-10 Thread Tzu-Li (Gordon) Tai (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17229790#comment-17229790
 ] 

Tzu-Li (Gordon) Tai edited comment on FLINK-19300 at 11/11/20, 7:06 AM:


This looks like a real issue: the typical {{read}} vs. {{readFully}} mistake.

This read path should only occur when users combine the RocksDB backend with 
heap-based timers (the pure heap backend is fine and would not bump into this).

[~xianggao] to help me understand the full problem: in your scenarios, I'm 
assuming that the timer losses are caused by the 
{{InternalTimerServiceSerializationProxy}} somehow silently skipping the timer 
reads, rather than some {{IOException}} from the incorrect read attempts (which 
would eventually fail the restore instead of losing timers). Could you clarify 
if my assumption is correct?

As for the fix, I would suggest reusing the {{java.io.DataInput#readFully}} 
method instead of re-implementing it:
{code}
byte[] tmp = new byte[VERSIONED_IDENTIFIER.length];

DataInputView inputView = new DataInputViewStreamWrapper(inputStream);
inputView.readFully(tmp);
{code}

You can catch {{EOFException}} to determine if end of stream is reached / not 
enough bytes available.
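As a hedged sketch of that suggestion (the class and method below are hypothetical, not Flink's actual {{PostVersionedIOReadableWritable}} code): {{readFully}} loops over short reads until the buffer is filled, and an {{EOFException}} signals that the stream ended before enough bytes arrived.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Optional;

// Hypothetical helper illustrating the suggested fix: unlike a single
// InputStream#read(byte[]) call, which may legally return fewer bytes
// than requested (common on remote streams such as S3), readFully()
// keeps reading until the buffer is completely filled.
final class VersionedHeaderReader {
    private VersionedHeaderReader() {}

    /** Returns the header bytes, or empty if the stream ended too early. */
    static Optional<byte[]> readHeader(InputStream in, int length) throws IOException {
        byte[] buf = new byte[length];
        try {
            new DataInputStream(in).readFully(buf); // loops over short reads
            return Optional.of(buf);
        } catch (EOFException e) {
            // End of stream reached before 'length' bytes were available.
            return Optional.empty();
        }
    }
}
```

With this shape, a partially filled buffer can never be mistaken for a non-matching identifier: the caller either gets all the requested bytes or an explicit end-of-stream signal.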




was (Author: tzulitai):
This looks like a real issue: the typical {{read}} vs. {{readFully}} mistake.

This read path should only occur when users combine the RocksDB backend with 
heap-based timers (the pure heap backend is fine and would not bump into this).

[~xianggao] to help me understand the full problem: in your scenarios, I'm 
assuming that the timer losses are caused by the 
{{InternalTimerServiceSerializationProxy}} somehow silently skipping the timer 
reads, rather than some {{IOException}} from the incorrect read attempts (which 
would eventually fail the restore instead of losing timers). Could you clarify 
if my assumption is correct?

As for the fix, I would suggest reusing the {{java.io.DataInput#readFully}} 
method instead of re-implementing it:
{code}
byte[] tmp = new byte[VERSIONED_IDENTIFIER.length];

DataInputView inputView = new DataInputViewStreamWrapper(inputStream);
inputView.readFully(tmp);
{code}

You can catch {{EOFException}} to determine if end of stream is reached.



> Timer loss after restoring from savepoint
> -
>
> Key: FLINK-19300
> URL: https://issues.apache.org/jira/browse/FLINK-19300
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Reporter: Xiang Gao
>Priority: Critical
>
> While using heap-based timers, we are seeing occasional timer loss after 
> restoring program from savepoint, especially when using a remote savepoint 
> storage (s3). 
> After some investigation, the issue seems to be related to [this line in 
> deserialization|https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/core/io/PostVersionedIOReadableWritable.java#L65].
>  When trying to check the VERSIONED_IDENTIFIER, the input stream may not 
> guarantee filling the byte array, causing timers to be dropped for the 
> affected key group.
> Should keep reading until expected number of bytes are actually read or if 
> end of the stream has been reached. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-19300) Timer loss after restoring from savepoint

2020-11-10 Thread Tzu-Li (Gordon) Tai (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17229790#comment-17229790
 ] 

Tzu-Li (Gordon) Tai edited comment on FLINK-19300 at 11/11/20, 7:05 AM:


This looks like a real issue: the typical {{read}} vs. {{readFully}} mistake.

This read path should only occur when users combine the RocksDB backend with 
heap-based timers (the pure heap backend is fine and would not bump into this).

[~xianggao] to help me understand the full problem: in your scenarios, I'm 
assuming that the timer losses are caused by the 
{{InternalTimerServiceSerializationProxy}} somehow silently skipping the timer 
reads, rather than some {{IOException}} from the incorrect read attempts (which 
would eventually fail the restore instead of losing timers). Could you clarify 
if my assumption is correct?

As for the fix, I would suggest reusing the {{java.io.DataInput#readFully}} 
method instead of re-implementing it:
{code}
byte[] tmp = new byte[VERSIONED_IDENTIFIER.length];

DataInputView inputView = new DataInputViewStreamWrapper(inputStream);
inputView.readFully(tmp);
{code}

You can catch {{EOFException}} to determine if end of stream is reached.




was (Author: tzulitai):
This looks like a real issue: the typical {{read}} vs. {{readFully}} mistake.

This read path should only occur when users combine the RocksDB backend with 
heap-based timers (the pure heap backend is fine and would not bump into this).

[~xianggao] to help me understand the full problem: in your scenarios, I'm 
assuming that the timer losses are caused by the 
{{InternalTimerServiceSerializationProxy}} somehow silently skipping the timer 
reads, rather than some {{IOException}} from the incorrect read attempts (which 
would eventually fail the restore instead of losing timers). Could you clarify 
if my assumption is correct?

As for the fix, I would suggest reusing the 
`java.io.DataInput#readFully` method instead of re-implementing it:
{code}
byte[] tmp = new byte[VERSIONED_IDENTIFIER.length];

DataInputView inputView = new DataInputViewStreamWrapper(inputStream);
inputView.readFully(tmp);
{code}

You can catch {{EOFException}} to determine if end of stream is reached.



> Timer loss after restoring from savepoint
> -
>
> Key: FLINK-19300
> URL: https://issues.apache.org/jira/browse/FLINK-19300
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Reporter: Xiang Gao
>Priority: Critical
>
> While using heap-based timers, we are seeing occasional timer loss after 
> restoring program from savepoint, especially when using a remote savepoint 
> storage (s3). 
> After some investigation, the issue seems to be related to [this line in 
> deserialization|https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/core/io/PostVersionedIOReadableWritable.java#L65].
>  When trying to check the VERSIONED_IDENTIFIER, the input stream may not 
> guarantee filling the byte array, causing timers to be dropped for the 
> affected key group.
> Should keep reading until expected number of bytes are actually read or if 
> end of the stream has been reached. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-19300) Timer loss after restoring from savepoint

2020-11-10 Thread Tzu-Li (Gordon) Tai (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17229790#comment-17229790
 ] 

Tzu-Li (Gordon) Tai edited comment on FLINK-19300 at 11/11/20, 7:05 AM:


This looks like a real issue: the typical {{read}} vs. {{readFully}} mistake.

This read path should only occur when users combine the RocksDB backend with 
heap-based timers (the pure heap backend is fine and would not bump into this).

[~xianggao] to help me understand the full problem: in your scenarios, I'm 
assuming that the timer losses are caused by the 
{{InternalTimerServiceSerializationProxy}} somehow silently skipping the timer 
reads, rather than some {{IOException}} from the incorrect read attempts (which 
would eventually fail the restore instead of losing timers). Could you clarify 
if my assumption is correct?

As for the fix, I would suggest reusing the 
`java.io.DataInput#readFully` method instead of re-implementing it:
{code}
byte[] tmp = new byte[VERSIONED_IDENTIFIER.length];

DataInputView inputView = new DataInputViewStreamWrapper(inputStream);
inputView.readFully(tmp);
{code}

You can catch {{EOFException}} to determine if end of stream is reached.




was (Author: tzulitai):
This looks like a real issue: the typical {{read}} vs. {{readFully}} mistake.

This read path should only occur when users combine the RocksDB backend with 
heap-based timers (the pure heap backend is fine and would not bump into this).

[~xianggao] to help me understand the full problem: in your scenarios, I'm 
assuming that the timer losses are caused by the 
{{InternalTimerServiceSerializationProxy}} somehow silently skipping the timer 
reads, rather than some {{IOException}} from the incorrect read attempts (which 
would eventually fail the restore instead of losing timers). Could you clarify 
if my assumption is correct?

As for the fix, I would suggest reusing the 
`java.io.DataInput#readFully` method instead of re-implementing it:
{code}
byte[] tmp = new byte[VERSIONED_IDENTIFIER.length];

DataInputView inputView = new DataInputViewStreamWrapper(inputStream);
inputView.readFully(tmp);
{code}

You can catch {{EOFException}} to determine if end of stream is reached.



> Timer loss after restoring from savepoint
> -
>
> Key: FLINK-19300
> URL: https://issues.apache.org/jira/browse/FLINK-19300
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Reporter: Xiang Gao
>Priority: Critical
>
> While using heap-based timers, we are seeing occasional timer loss after 
> restoring program from savepoint, especially when using a remote savepoint 
> storage (s3). 
> After some investigation, the issue seems to be related to [this line in 
> deserialization|https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/core/io/PostVersionedIOReadableWritable.java#L65].
>  When trying to check the VERSIONED_IDENTIFIER, the input stream may not 
> guarantee filling the byte array, causing timers to be dropped for the 
> affected key group.
> Should keep reading until expected number of bytes are actually read or if 
> end of the stream has been reached. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-19300) Timer loss after restoring from savepoint

2020-11-10 Thread Tzu-Li (Gordon) Tai (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17229793#comment-17229793
 ] 

Tzu-Li (Gordon) Tai commented on FLINK-19300:
-

As for priority of this issue:

The bug looks like it has been there for quite a while already, across many 
versions, but I'd suggest listing it as a blocker for 1.12.0 and 1.11.3.

[~xianggao] Please do submit a PR for this, I'll try to review it as soon as 
possible.

> Timer loss after restoring from savepoint
> -
>
> Key: FLINK-19300
> URL: https://issues.apache.org/jira/browse/FLINK-19300
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Reporter: Xiang Gao
>Priority: Critical
>
> While using heap-based timers, we are seeing occasional timer loss after 
> restoring program from savepoint, especially when using a remote savepoint 
> storage (s3). 
> After some investigation, the issue seems to be related to [this line in 
> deserialization|https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/core/io/PostVersionedIOReadableWritable.java#L65].
>  When trying to check the VERSIONED_IDENTIFIER, the input stream may not 
> guarantee filling the byte array, causing timers to be dropped for the 
> affected key group.
> Should keep reading until expected number of bytes are actually read or if 
> end of the stream has been reached. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-19300) Timer loss after restoring from savepoint

2020-11-10 Thread Tzu-Li (Gordon) Tai (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17229790#comment-17229790
 ] 

Tzu-Li (Gordon) Tai commented on FLINK-19300:
-

This looks like a real issue: the typical {{read}} vs. {{readFully}} problem.

This read path should only occur when users combine the RocksDB backend with 
heap-based timers (the pure heap backend is fine and would not bump into this).

[~xianggao] to help me understand the full problem: in your scenarios, I'm 
assuming that the timer losses are caused by the 
{{InternalTimerServiceSerializationProxy}} somehow silently skipping the timer 
reads, rather than some {{IOException}} from the incorrect read attempts (which 
would eventually fail the restore instead of losing timers). Could you clarify 
if my assumption is correct?

As for the fix, I would suggest reusing the 
`java.io.DataInput#readFully` method instead of re-implementing it:
{code}
byte[] tmp = new byte[VERSIONED_IDENTIFIER.length];

DataInputView inputView = new DataInputViewStreamWrapper(inputStream);
inputView.readFully(tmp);
{code}

You can catch {{EOFException}} to determine if end of stream is reached.



> Timer loss after restoring from savepoint
> -
>
> Key: FLINK-19300
> URL: https://issues.apache.org/jira/browse/FLINK-19300
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Reporter: Xiang Gao
>Priority: Critical
>
> While using heap-based timers, we are seeing occasional timer loss after 
> restoring program from savepoint, especially when using a remote savepoint 
> storage (s3). 
> After some investigation, the issue seems to be related to [this line in 
> deserialization|https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/core/io/PostVersionedIOReadableWritable.java#L65].
>  When trying to check the VERSIONED_IDENTIFIER, the input stream may not 
> guarantee filling the byte array, causing timers to be dropped for the 
> affected key group.
> Should keep reading until expected number of bytes are actually read or if 
> end of the stream has been reached. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-19300) Timer loss after restoring from savepoint

2020-11-10 Thread Tzu-Li (Gordon) Tai (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17229790#comment-17229790
 ] 

Tzu-Li (Gordon) Tai edited comment on FLINK-19300 at 11/11/20, 7:03 AM:


This looks like a real issue: the typical {{read}} vs. {{readFully}} mistake.

This read path should only occur when users combine the RocksDB backend with 
heap-based timers (the pure heap backend is fine and would not bump into this).

[~xianggao] to help me understand the full problem: in your scenarios, I'm 
assuming that the timer losses are caused by the 
{{InternalTimerServiceSerializationProxy}} somehow silently skipping the timer 
reads, rather than some {{IOException}} from the incorrect read attempts (which 
would eventually fail the restore instead of losing timers). Could you clarify 
if my assumption is correct?

As for the fix, I would suggest reusing the 
`java.io.DataInput#readFully` method instead of re-implementing it:
{code}
byte[] tmp = new byte[VERSIONED_IDENTIFIER.length];

DataInputView inputView = new DataInputViewStreamWrapper(inputStream);
inputView.readFully(tmp);
{code}

You can catch {{EOFException}} to determine if end of stream is reached.




was (Author: tzulitai):
This looks like a real issue: the typical {{read}} vs. {{readFully}} problem.

This read path should only occur when users combine the RocksDB backend with 
heap-based timers (the pure heap backend is fine and would not bump into this).

[~xianggao] to help me understand the full problem: in your scenarios, I'm 
assuming that the timer losses are caused by the 
{{InternalTimerServiceSerializationProxy}} somehow silently skipping the timer 
reads, rather than some {{IOException}} from the incorrect read attempts (which 
would eventually fail the restore instead of losing timers). Could you clarify 
if my assumption is correct?

As for the fix, I would suggest reusing the 
`java.io.DataInput#readFully` method instead of re-implementing it:
{code}
byte[] tmp = new byte[VERSIONED_IDENTIFIER.length];

DataInputView inputView = new DataInputViewStreamWrapper(inputStream);
inputView.readFully(tmp);
{code}

You can catch {{EOFException}} to determine if end of stream is reached.



> Timer loss after restoring from savepoint
> -
>
> Key: FLINK-19300
> URL: https://issues.apache.org/jira/browse/FLINK-19300
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Reporter: Xiang Gao
>Priority: Critical
>
> While using heap-based timers, we are seeing occasional timer loss after 
> restoring program from savepoint, especially when using a remote savepoint 
> storage (s3). 
> After some investigation, the issue seems to be related to [this line in 
> deserialization|https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/core/io/PostVersionedIOReadableWritable.java#L65].
>  When trying to check the VERSIONED_IDENTIFIER, the input stream may not 
> guarantee filling the byte array, causing timers to be dropped for the 
> affected key group.
> Should keep reading until expected number of bytes are actually read or if 
> end of the stream has been reached. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Issue Comment Deleted] (FLINK-20004) UpperLimitExceptionParameter description is misleading

2020-11-10 Thread Nicholas Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Jiang updated FLINK-20004:
---
Comment: was deleted

(was: [~f.pompermaier], you could refer to MessageQueryParameter. In 
MessageQueryParameter, the maxExceptions value is comma-separated.
cc [~chesnay])

> UpperLimitExceptionParameter description is misleading
> --
>
> Key: FLINK-20004
> URL: https://issues.apache.org/jira/browse/FLINK-20004
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.11.2
>Reporter: Flavio Pompermaier
>Priority: Trivial
>
> The maxExceptions query parameter of the /jobs/:jobid/exceptions REST API is 
> an integer parameter, not a list of comma-separated values... this is 
> probably a cut-and-paste error



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-20083) OrcFsStreamingSinkITCase times out

2020-11-10 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-20083:
---
Fix Version/s: 1.12.0

> OrcFsStreamingSinkITCase times out
> --
>
> Key: FLINK-20083
> URL: https://issues.apache.org/jira/browse/FLINK-20083
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / FileSystem, Formats (JSON, Avro, Parquet, 
> ORC, SequenceFile)
>Affects Versions: 1.12.0
>Reporter: Robert Metzger
>Priority: Critical
>  Labels: test-stability
> Fix For: 1.12.0
>
>
> https://dev.azure.com/rmetzger/Flink/_build/results?buildId=8661=logs=66592496-52df-56bb-d03e-37509e1d9d0f=ae0269db-6796-5583-2e5f-d84757d711aa
> {code}
> [ERROR]   
> OrcFsStreamingSinkITCase>FsStreamingSinkITCaseBase.testPart:84->FsStreamingSinkITCaseBase.test:120->FsStreamingSinkITCaseBase.check:133
>  » TestTimedOut
> [ERROR] Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
> 26.871 s <<< FAILURE! - in org.apache.flink.orc.OrcFsStreamingSinkITCase
> [ERROR] testPart(org.apache.flink.orc.OrcFsStreamingSinkITCase)  Time 
> elapsed: 20.052 s  <<< ERROR!
> org.junit.runners.model.TestTimedOutException: test timed out after 20 seconds
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.flink.streaming.api.operators.collect.CollectResultFetcher.sleepBeforeRetry(CollectResultFetcher.java:231)
>   at 
> org.apache.flink.streaming.api.operators.collect.CollectResultFetcher.next(CollectResultFetcher.java:119)
>   at 
> org.apache.flink.streaming.api.operators.collect.CollectResultIterator.nextResultFromFetcher(CollectResultIterator.java:103)
>   at 
> org.apache.flink.streaming.api.operators.collect.CollectResultIterator.hasNext(CollectResultIterator.java:77)
>   at 
> org.apache.flink.table.planner.sinks.SelectTableSinkBase$RowIteratorWrapper.hasNext(SelectTableSinkBase.java:115)
>   at 
> org.apache.flink.table.api.internal.TableResultImpl$CloseableRowIteratorWrapper.hasNext(TableResultImpl.java:355)
>   at java.util.Iterator.forEachRemaining(Iterator.java:115)
>   at 
> org.apache.flink.util.CollectionUtil.iteratorToList(CollectionUtil.java:114)
>   at 
> org.apache.flink.table.planner.runtime.stream.FsStreamingSinkITCaseBase.check(FsStreamingSinkITCaseBase.scala:133)
>   at 
> org.apache.flink.table.planner.runtime.stream.FsStreamingSinkITCaseBase.test(FsStreamingSinkITCaseBase.scala:120)
>   at 
> org.apache.flink.table.planner.runtime.stream.FsStreamingSinkITCaseBase.testPart(FsStreamingSinkITCaseBase.scala:84)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (FLINK-20065) UnalignedCheckpointCompatibilityITCase.test failed with AskTimeoutException

2020-11-10 Thread Arvid Heise (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvid Heise reassigned FLINK-20065:
---

Assignee: Arvid Heise

> UnalignedCheckpointCompatibilityITCase.test failed with AskTimeoutException
> ---
>
> Key: FLINK-20065
> URL: https://issues.apache.org/jira/browse/FLINK-20065
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.12.0, 1.11.3
>Reporter: Dian Fu
>Assignee: Arvid Heise
>Priority: Blocker
>  Labels: test-stability
> Fix For: 1.12.0, 1.11.3
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=9362=logs=5c8e7682-d68f-54d1-16a2-a09310218a49=45cc9205-bdb7-5b54-63cd-89fdc0983323
> {code}
> 2020-11-09T22:19:47.2714024Z [ERROR] test[type: SAVEPOINT, startAligned: 
> true](org.apache.flink.test.checkpointing.UnalignedCheckpointCompatibilityITCase)
>   Time elapsed: 1.293 s  <<< ERROR!
> 2020-11-09T22:19:47.2715260Z java.util.concurrent.ExecutionException: 
> java.util.concurrent.TimeoutException: Invocation of public default 
> java.util.concurrent.CompletableFuture 
> org.apache.flink.runtime.webmonitor.RestfulGateway.stopWithSavepoint(org.apache.flink.api.common.JobID,java.lang.String,boolean,org.apache.flink.api.common.time.Time)
>  timed out.
> 2020-11-09T22:19:47.2716743Z  at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> 2020-11-09T22:19:47.2718213Z  at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
> 2020-11-09T22:19:47.2719166Z  at 
> org.apache.flink.test.checkpointing.UnalignedCheckpointCompatibilityITCase.runAndTakeSavepoint(UnalignedCheckpointCompatibilityITCase.java:113)
> 2020-11-09T22:19:47.2720278Z  at 
> org.apache.flink.test.checkpointing.UnalignedCheckpointCompatibilityITCase.test(UnalignedCheckpointCompatibilityITCase.java:97)
> 2020-11-09T22:19:47.2721126Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2020-11-09T22:19:47.2721771Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2020-11-09T22:19:47.2722773Z  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2020-11-09T22:19:47.2723479Z  at 
> java.lang.reflect.Method.invoke(Method.java:498)
> 2020-11-09T22:19:47.2724187Z  at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> 2020-11-09T22:19:47.2725026Z  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 2020-11-09T22:19:47.2725817Z  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> 2020-11-09T22:19:47.2726595Z  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 2020-11-09T22:19:47.2727515Z  at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
> 2020-11-09T22:19:47.2728192Z  at 
> org.junit.rules.RunRules.evaluate(RunRules.java:20)
> 2020-11-09T22:19:47.2744089Z  at 
> org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> 2020-11-09T22:19:47.2744907Z  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> 2020-11-09T22:19:47.2745573Z  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> 2020-11-09T22:19:47.2746037Z  at 
> org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> 2020-11-09T22:19:47.2746445Z  at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> 2020-11-09T22:19:47.2746868Z  at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> 2020-11-09T22:19:47.2747443Z  at 
> org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> 2020-11-09T22:19:47.2747876Z  at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> 2020-11-09T22:19:47.2748297Z  at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> 2020-11-09T22:19:47.2748694Z  at 
> org.junit.runners.Suite.runChild(Suite.java:128)
> 2020-11-09T22:19:47.2749054Z  at 
> org.junit.runners.Suite.runChild(Suite.java:27)
> 2020-11-09T22:19:47.2749414Z  at 
> org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> 2020-11-09T22:19:47.2749819Z  at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> 2020-11-09T22:19:47.2750373Z  at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> 2020-11-09T22:19:47.2750923Z  at 
> org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> 2020-11-09T22:19:47.2751555Z  at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> 2020-11-09T22:19:47.2752148Z  at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> 2020-11-09T22:19:47.2752938Z  at 
> 

[jira] [Created] (FLINK-20083) OrcFsStreamingSinkITCase times out

2020-11-10 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-20083:
--

 Summary: OrcFsStreamingSinkITCase times out
 Key: FLINK-20083
 URL: https://issues.apache.org/jira/browse/FLINK-20083
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / FileSystem, Formats (JSON, Avro, Parquet, 
ORC, SequenceFile)
Affects Versions: 1.12.0
Reporter: Robert Metzger


https://dev.azure.com/rmetzger/Flink/_build/results?buildId=8661=logs=66592496-52df-56bb-d03e-37509e1d9d0f=ae0269db-6796-5583-2e5f-d84757d711aa

{code}
[ERROR]   
OrcFsStreamingSinkITCase>FsStreamingSinkITCaseBase.testPart:84->FsStreamingSinkITCaseBase.test:120->FsStreamingSinkITCaseBase.check:133
 » TestTimedOut

[ERROR] Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 26.871 
s <<< FAILURE! - in org.apache.flink.orc.OrcFsStreamingSinkITCase
[ERROR] testPart(org.apache.flink.orc.OrcFsStreamingSinkITCase)  Time elapsed: 
20.052 s  <<< ERROR!
org.junit.runners.model.TestTimedOutException: test timed out after 20 seconds
at java.lang.Thread.sleep(Native Method)
at 
org.apache.flink.streaming.api.operators.collect.CollectResultFetcher.sleepBeforeRetry(CollectResultFetcher.java:231)
at 
org.apache.flink.streaming.api.operators.collect.CollectResultFetcher.next(CollectResultFetcher.java:119)
at 
org.apache.flink.streaming.api.operators.collect.CollectResultIterator.nextResultFromFetcher(CollectResultIterator.java:103)
at 
org.apache.flink.streaming.api.operators.collect.CollectResultIterator.hasNext(CollectResultIterator.java:77)
at 
org.apache.flink.table.planner.sinks.SelectTableSinkBase$RowIteratorWrapper.hasNext(SelectTableSinkBase.java:115)
at 
org.apache.flink.table.api.internal.TableResultImpl$CloseableRowIteratorWrapper.hasNext(TableResultImpl.java:355)
at java.util.Iterator.forEachRemaining(Iterator.java:115)
at 
org.apache.flink.util.CollectionUtil.iteratorToList(CollectionUtil.java:114)
at 
org.apache.flink.table.planner.runtime.stream.FsStreamingSinkITCaseBase.check(FsStreamingSinkITCaseBase.scala:133)
at 
org.apache.flink.table.planner.runtime.stream.FsStreamingSinkITCaseBase.test(FsStreamingSinkITCaseBase.scala:120)
at 
org.apache.flink.table.planner.runtime.stream.FsStreamingSinkITCaseBase.testPart(FsStreamingSinkITCaseBase.scala:84)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-20083) OrcFsStreamingSinkITCase times out

2020-11-10 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-20083:
---
Priority: Critical  (was: Major)

> OrcFsStreamingSinkITCase times out
> --
>
> Key: FLINK-20083
> URL: https://issues.apache.org/jira/browse/FLINK-20083
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / FileSystem, Formats (JSON, Avro, Parquet, 
> ORC, SequenceFile)
>Affects Versions: 1.12.0
>Reporter: Robert Metzger
>Priority: Critical
>  Labels: test-stability
>
> https://dev.azure.com/rmetzger/Flink/_build/results?buildId=8661&view=logs&j=66592496-52df-56bb-d03e-37509e1d9d0f&t=ae0269db-6796-5583-2e5f-d84757d711aa
> {code}
> [ERROR]   
> OrcFsStreamingSinkITCase>FsStreamingSinkITCaseBase.testPart:84->FsStreamingSinkITCaseBase.test:120->FsStreamingSinkITCaseBase.check:133
>  » TestTimedOut
> [ERROR] Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
> 26.871 s <<< FAILURE! - in org.apache.flink.orc.OrcFsStreamingSinkITCase
> [ERROR] testPart(org.apache.flink.orc.OrcFsStreamingSinkITCase)  Time 
> elapsed: 20.052 s  <<< ERROR!
> org.junit.runners.model.TestTimedOutException: test timed out after 20 seconds
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.flink.streaming.api.operators.collect.CollectResultFetcher.sleepBeforeRetry(CollectResultFetcher.java:231)
>   at 
> org.apache.flink.streaming.api.operators.collect.CollectResultFetcher.next(CollectResultFetcher.java:119)
>   at 
> org.apache.flink.streaming.api.operators.collect.CollectResultIterator.nextResultFromFetcher(CollectResultIterator.java:103)
>   at 
> org.apache.flink.streaming.api.operators.collect.CollectResultIterator.hasNext(CollectResultIterator.java:77)
>   at 
> org.apache.flink.table.planner.sinks.SelectTableSinkBase$RowIteratorWrapper.hasNext(SelectTableSinkBase.java:115)
>   at 
> org.apache.flink.table.api.internal.TableResultImpl$CloseableRowIteratorWrapper.hasNext(TableResultImpl.java:355)
>   at java.util.Iterator.forEachRemaining(Iterator.java:115)
>   at 
> org.apache.flink.util.CollectionUtil.iteratorToList(CollectionUtil.java:114)
>   at 
> org.apache.flink.table.planner.runtime.stream.FsStreamingSinkITCaseBase.check(FsStreamingSinkITCaseBase.scala:133)
>   at 
> org.apache.flink.table.planner.runtime.stream.FsStreamingSinkITCaseBase.test(FsStreamingSinkITCaseBase.scala:120)
>   at 
> org.apache.flink.table.planner.runtime.stream.FsStreamingSinkITCaseBase.testPart(FsStreamingSinkITCaseBase.scala:84)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> {code}





[jira] [Updated] (FLINK-20083) OrcFsStreamingSinkITCase times out

2020-11-10 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-20083:
---
Issue Type: Bug  (was: Improvement)

> OrcFsStreamingSinkITCase times out
> --
>
> Key: FLINK-20083
> URL: https://issues.apache.org/jira/browse/FLINK-20083
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / FileSystem, Formats (JSON, Avro, Parquet, 
> ORC, SequenceFile)
>Affects Versions: 1.12.0
>Reporter: Robert Metzger
>Priority: Major
>  Labels: test-stability
>
> https://dev.azure.com/rmetzger/Flink/_build/results?buildId=8661&view=logs&j=66592496-52df-56bb-d03e-37509e1d9d0f&t=ae0269db-6796-5583-2e5f-d84757d711aa
> {code}
> [ERROR]   
> OrcFsStreamingSinkITCase>FsStreamingSinkITCaseBase.testPart:84->FsStreamingSinkITCaseBase.test:120->FsStreamingSinkITCaseBase.check:133
>  » TestTimedOut
> [ERROR] Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
> 26.871 s <<< FAILURE! - in org.apache.flink.orc.OrcFsStreamingSinkITCase
> [ERROR] testPart(org.apache.flink.orc.OrcFsStreamingSinkITCase)  Time 
> elapsed: 20.052 s  <<< ERROR!
> org.junit.runners.model.TestTimedOutException: test timed out after 20 seconds
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.flink.streaming.api.operators.collect.CollectResultFetcher.sleepBeforeRetry(CollectResultFetcher.java:231)
>   at 
> org.apache.flink.streaming.api.operators.collect.CollectResultFetcher.next(CollectResultFetcher.java:119)
>   at 
> org.apache.flink.streaming.api.operators.collect.CollectResultIterator.nextResultFromFetcher(CollectResultIterator.java:103)
>   at 
> org.apache.flink.streaming.api.operators.collect.CollectResultIterator.hasNext(CollectResultIterator.java:77)
>   at 
> org.apache.flink.table.planner.sinks.SelectTableSinkBase$RowIteratorWrapper.hasNext(SelectTableSinkBase.java:115)
>   at 
> org.apache.flink.table.api.internal.TableResultImpl$CloseableRowIteratorWrapper.hasNext(TableResultImpl.java:355)
>   at java.util.Iterator.forEachRemaining(Iterator.java:115)
>   at 
> org.apache.flink.util.CollectionUtil.iteratorToList(CollectionUtil.java:114)
>   at 
> org.apache.flink.table.planner.runtime.stream.FsStreamingSinkITCaseBase.check(FsStreamingSinkITCaseBase.scala:133)
>   at 
> org.apache.flink.table.planner.runtime.stream.FsStreamingSinkITCaseBase.test(FsStreamingSinkITCaseBase.scala:120)
>   at 
> org.apache.flink.table.planner.runtime.stream.FsStreamingSinkITCaseBase.testPart(FsStreamingSinkITCaseBase.scala:84)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> {code}





[GitHub] [flink] AHeise commented on a change in pull request #14024: [FLINK-20079][task] Fix StreamTask initialization order

2020-11-10 Thread GitBox


AHeise commented on a change in pull request #14024:
URL: https://github.com/apache/flink/pull/14024#discussion_r521147841



##
File path: 
flink-tests/src/test/java/org/apache/flink/test/checkpointing/UnalignedCheckpointITCase.java
##
@@ -531,8 +534,10 @@ public void invoke(Long value, Context context) throws 
Exception {
state.numOutput++;
 
if (state.completedCheckpoints < minCheckpoints) {
-   // induce heavy backpressure until enough 
checkpoints have been written
-   Thread.sleep(0, 100_000);
+   // induce backpressure until enough checkpoints 
have been written
+   if (random.nextInt(1000) == 42) {

Review comment:
   Why do we need to reduce the backpressure here? I'm worried that any 
solution on top of it will only work with a certain minimal flow.
   
   Note that in 
https://github.com/apache/flink/pull/13845/commits/ef4f78d9b7bedb10a8c7be3e062032d2e7a0c77c#diff-c0d448d637f04cc203de3592d7fab0ba2412f2d75c59dcf6e44cc4a3c18e5851R568
 I added a modification to the test that avoids backpressuring during recovery 
(by applying backpressure only after the first successful checkpoint).
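   The alternative referenced in the comment, applying backpressure only after the first successful checkpoint instead of throttling every record, could be sketched roughly like this (class and field names are illustrative, not the actual test code):

   ```java
   import java.util.concurrent.ThreadLocalRandom;

   class BackpressureSketch {
       // In the real test this would be updated from notifyCheckpointComplete.
       volatile int completedCheckpoints;
       static final int MIN_CHECKPOINTS = 10;

       /** Called once per record; never throttles while recovering (no checkpoint yet). */
       void maybeBackpressure() {
           boolean recovering = completedCheckpoints == 0;
           if (!recovering && completedCheckpoints < MIN_CHECKPOINTS) {
               // Throttle only occasionally so throughput stays high enough
               // for checkpoints to keep making progress.
               if (ThreadLocalRandom.current().nextInt(1000) == 42) {
                   try {
                       Thread.sleep(1);
                   } catch (InterruptedException e) {
                       Thread.currentThread().interrupt();
                   }
               }
           }
       }
   }
   ```

   Whether the reduced throttling still produces enough backpressure for the test's purpose is exactly the question raised in the review comment.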





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-20046) StreamTableAggregateTests.test_map_view_iterate is instable

2020-11-10 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17229786#comment-17229786
 ] 

Robert Metzger commented on FLINK-20046:


https://dev.azure.com/rmetzger/Flink/_build/results?buildId=8661&view=logs&j=584fa981-f71a-5840-1c49-f800c954fe4b&t=532bf1f8-8c75-59c3-eaad-8c773769bc3a

> StreamTableAggregateTests.test_map_view_iterate is instable
> ---
>
> Key: FLINK-20046
> URL: https://issues.apache.org/jira/browse/FLINK-20046
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.12.0
>Reporter: Dian Fu
>Assignee: Wei Zhong
>Priority: Critical
>  Labels: test-stability
> Fix For: 1.12.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=9279&view=logs&j=821b528f-1eed-5598-a3b4-7f748b13f261&t=4fad9527-b9a5-5015-1b70-8356e5c91490
> {code}
> 2020-11-07T22:50:57.4180758Z ___ 
> StreamTableAggregateTests.test_map_view_iterate 
> 2020-11-07T22:50:57.4181301Z 
> 2020-11-07T22:50:57.4181965Z self =
> <pyflink.table.tests.test_aggregate.StreamTableAggregateTests testMethod=test_map_view_iterate>
> 2020-11-07T22:50:57.4182348Z 
> 2020-11-07T22:50:57.4182535Z def test_map_view_iterate(self):
> 2020-11-07T22:50:57.4182812Z test_iterate = 
> udaf(TestIterateAggregateFunction())
> 2020-11-07T22:50:57.4183320Z 
> self.t_env.get_config().set_idle_state_retention(datetime.timedelta(days=1))
> 2020-11-07T22:50:57.4183763Z 
> self.t_env.get_config().get_configuration().set_string(
> 2020-11-07T22:50:57.4297555Z "python.fn-execution.bundle.size", 
> "2")
> 2020-11-07T22:50:57.4297922Z # trigger the cache eviction in a bundle.
> 2020-11-07T22:50:57.4308028Z 
> self.t_env.get_config().get_configuration().set_string(
> 2020-11-07T22:50:57.4308653Z "python.state.cache-size", "2")
> 2020-11-07T22:50:57.4308945Z 
> self.t_env.get_config().get_configuration().set_string(
> 2020-11-07T22:50:57.4309382Z "python.map-state.read-cache-size", 
> "2")
> 2020-11-07T22:50:57.4309676Z 
> self.t_env.get_config().get_configuration().set_string(
> 2020-11-07T22:50:57.4310428Z "python.map-state.write-cache-size", 
> "2")
> 2020-11-07T22:50:57.4310701Z 
> self.t_env.get_config().get_configuration().set_string(
> 2020-11-07T22:50:57.4311130Z 
> "python.map-state.iterate-response-batch-size", "2")
> 2020-11-07T22:50:57.4311361Z t = self.t_env.from_elements(
> 2020-11-07T22:50:57.4311691Z [(1, 'Hi_', 'hi'),
> 2020-11-07T22:50:57.4312004Z  (1, 'Hi', 'hi'),
> 2020-11-07T22:50:57.4312316Z  (2, 'hello', 'hello'),
> 2020-11-07T22:50:57.4312639Z  (3, 'Hi_', 'hi'),
> 2020-11-07T22:50:57.4312975Z  (3, 'Hi', 'hi'),
> 2020-11-07T22:50:57.4313285Z  (4, 'hello', 'hello'),
> 2020-11-07T22:50:57.4313609Z  (5, 'Hi2_', 'hi'),
> 2020-11-07T22:50:57.4313908Z  (5, 'Hi2', 'hi'),
> 2020-11-07T22:50:57.4314238Z  (6, 'hello2', 'hello'),
> 2020-11-07T22:50:57.4314558Z  (7, 'Hi', 'hi'),
> 2020-11-07T22:50:57.4315053Z  (8, 'hello', 'hello'),
> 2020-11-07T22:50:57.4315396Z  (9, 'Hi2', 'hi'),
> 2020-11-07T22:50:57.4315773Z  (13, 'Hi3', 'hi')], ['a', 'b', 'c'])
> 2020-11-07T22:50:57.4316023Z 
> self.t_env.create_temporary_view("source", t)
> 2020-11-07T22:50:57.4316299Z table_with_retract_message = 
> self.t_env.sql_query(
> 2020-11-07T22:50:57.4316615Z "select LAST_VALUE(b) as b, 
> LAST_VALUE(c) as c from source group by a")
> 2020-11-07T22:50:57.4316919Z result = 
> table_with_retract_message.group_by(t.c) \
> 2020-11-07T22:50:57.4317197Z 
> .select(test_iterate(t.b).alias("a"), t.c) \
> 2020-11-07T22:50:57.4317619Z .select(col("a").get(0).alias("a"),
> 2020-11-07T22:50:57.4318111Z col("a").get(1).alias("b"),
> 2020-11-07T22:50:57.4318357Z col("a").get(2).alias("c"),
> 2020-11-07T22:50:57.4318586Z col("a").get(3).alias("d"),
> 2020-11-07T22:50:57.4318814Z t.c.alias("e"))
> 2020-11-07T22:50:57.4319023Z assert_frame_equal(
> 2020-11-07T22:50:57.4319208Z >   result.to_pandas(),
> 2020-11-07T22:50:57.4319408Z pd.DataFrame([
> 2020-11-07T22:50:57.4319872Z ["hello,hello2", "1,3", 
> 'hello:3,hello2:1', 2, "hello"],
> 2020-11-07T22:50:57.4320398Z ["Hi,Hi2,Hi3", "1,2,3", 
> "Hi:3,Hi2:2,Hi3:1", 3, "hi"]],
> 2020-11-07T22:50:57.4321047Z columns=['a', 'b', 'c', 'd', 
> 'e']))
> 2020-11-07T22:50:57.4321198Z 
> 2020-11-07T22:50:57.4321385Z pyflink/table/tests/test_aggregate.py:468: 
> 

[jira] [Commented] (FLINK-20004) UpperLimitExceptionParameter description is misleading

2020-11-10 Thread Nicholas Jiang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17229784#comment-17229784
 ] 

Nicholas Jiang commented on FLINK-20004:


[~f.pompermaier], you could refer to MessageQueryParameter. In 
MessageQueryParameter, the maxExceptions parameter is a comma-separated value.
cc [~chesnay]

> UpperLimitExceptionParameter description is misleading
> --
>
> Key: FLINK-20004
> URL: https://issues.apache.org/jira/browse/FLINK-20004
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.11.2
>Reporter: Flavio Pompermaier
>Priority: Trivial
>
> The maxExceptions query parameter of the /jobs/:jobid/exceptions REST API is an 
> integer parameter, not a list of comma-separated values; this is probably a 
> cut-and-paste error.





[jira] [Commented] (FLINK-19300) Timer loss after restoring from savepoint

2020-11-10 Thread david weinstein (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17229782#comment-17229782
 ] 

david weinstein commented on FLINK-19300:
-

We're using Flink version 1.8.

> Timer loss after restoring from savepoint
> -
>
> Key: FLINK-19300
> URL: https://issues.apache.org/jira/browse/FLINK-19300
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Reporter: Xiang Gao
>Priority: Critical
>
> While using heap-based timers, we are seeing occasional timer loss after 
> restoring program from savepoint, especially when using a remote savepoint 
> storage (s3). 
> After some investigation, the issue seems to be related to [this line in 
> deserialization|https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/core/io/PostVersionedIOReadableWritable.java#L65].
>  When trying to check the VERSIONED_IDENTIFIER, the input stream may not 
> guarantee filling the byte array, causing timers to be dropped for the 
> affected key group.
> Should keep reading until expected number of bytes are actually read or if 
> end of the stream has been reached. 
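The fix suggested in the description, keep reading until the expected number of bytes has arrived or the stream ends, boils down to a standard read loop. A minimal sketch follows (a hypothetical helper, not the actual Flink patch):

```java
import java.io.IOException;
import java.io.InputStream;

public final class StreamUtil {
    /**
     * Reads bytes until {@code buffer} is full or the end of the stream is
     * reached. Returns the number of bytes actually read, which is only less
     * than {@code buffer.length} if EOF was hit first.
     */
    public static int readFully(InputStream in, byte[] buffer) throws IOException {
        int pos = 0;
        while (pos < buffer.length) {
            // InputStream.read may return fewer bytes than requested,
            // which is exactly the failure mode described above.
            int n = in.read(buffer, pos, buffer.length - pos);
            if (n == -1) {
                break; // end of stream reached before the buffer was filled
            }
            pos += n;
        }
        return pos;
    }
}
```

The caller can then compare the returned count against the expected identifier length instead of trusting a single read call.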





[jira] [Commented] (FLINK-19300) Timer loss after restoring from savepoint

2020-11-10 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17229780#comment-17229780
 ] 

Robert Metzger commented on FLINK-19300:


Thanks a lot for reporting this. Which Flink version are you using?

> Timer loss after restoring from savepoint
> -
>
> Key: FLINK-19300
> URL: https://issues.apache.org/jira/browse/FLINK-19300
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Reporter: Xiang Gao
>Priority: Critical
>
> While using heap-based timers, we are seeing occasional timer loss after 
> restoring program from savepoint, especially when using a remote savepoint 
> storage (s3). 
> After some investigation, the issue seems to be related to [this line in 
> deserialization|https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/core/io/PostVersionedIOReadableWritable.java#L65].
>  When trying to check the VERSIONED_IDENTIFIER, the input stream may not 
> guarantee filling the byte array, causing timers to be dropped for the 
> affected key group.
> Should keep reading until expected number of bytes are actually read or if 
> end of the stream has been reached. 





[jira] [Commented] (FLINK-19964) Gelly ITCase stuck on Azure in HITSITCase.testPrintWithRMatGraph

2020-11-10 Thread Arvid Heise (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17229777#comment-17229777
 ] 

Arvid Heise commented on FLINK-19964:
-

Yes, sorry about that, but I realized that exactly that question was unanswered 
;).

Merged the fix into master as 18ffebb3dbecc21d2d33c436628176c4971cebbd. 
Backport not applicable.

> Gelly ITCase stuck on Azure in HITSITCase.testPrintWithRMatGraph
> 
>
> Key: FLINK-19964
> URL: https://issues.apache.org/jira/browse/FLINK-19964
> Project: Flink
>  Issue Type: Bug
>  Components: Library / Graph Processing (Gelly), Runtime / Network, 
> Tests
>Affects Versions: 1.12.0
>Reporter: Chesnay Schepler
>Assignee: Arvid Heise
>Priority: Blocker
>  Labels: pull-request-available, test-stability
> Fix For: 1.12.0
>
>
> The HITSITCase has gotten stuck on Azure. Chances are that something in the 
> scheduling or network has broken it.
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=8919&view=logs&j=c5f0071e-1851-543e-9a45-9ac140befc32&t=1fb1a56f-e8b5-5a82-00a0-a2db7757b4f5





[jira] [Resolved] (FLINK-19964) Gelly ITCase stuck on Azure in HITSITCase.testPrintWithRMatGraph

2020-11-10 Thread Arvid Heise (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvid Heise resolved FLINK-19964.
-
Resolution: Fixed

> Gelly ITCase stuck on Azure in HITSITCase.testPrintWithRMatGraph
> 
>
> Key: FLINK-19964
> URL: https://issues.apache.org/jira/browse/FLINK-19964
> Project: Flink
>  Issue Type: Bug
>  Components: Library / Graph Processing (Gelly), Runtime / Network, 
> Tests
>Affects Versions: 1.12.0
>Reporter: Chesnay Schepler
>Assignee: Arvid Heise
>Priority: Blocker
>  Labels: pull-request-available, test-stability
> Fix For: 1.12.0
>
>
> The HITSITCase has gotten stuck on Azure. Chances are that something in the 
> scheduling or network has broken it.
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=8919&view=logs&j=c5f0071e-1851-543e-9a45-9ac140befc32&t=1fb1a56f-e8b5-5a82-00a0-a2db7757b4f5





[GitHub] [flink] AHeise merged pull request #14023: [FLINK-19964][network] Avoid proactively pulling segments in destroyed LocalBufferPool.

2020-11-10 Thread GitBox


AHeise merged pull request #14023:
URL: https://github.com/apache/flink/pull/14023


   







[GitHub] [flink] flinkbot edited a comment on pull request #14027: [FLINK-20074][table-planner-blink] Fix can't generate plan when joining on changelog source without updates

2020-11-10 Thread GitBox


flinkbot edited a comment on pull request #14027:
URL: https://github.com/apache/flink/pull/14027#issuecomment-725222729


   
   ## CI report:
   
   * 6d29e26c756f853e68d2930a2fc9cdbf41174f2f Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9442)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[GitHub] [flink] flinkbot edited a comment on pull request #14028: [FLINK-20020][client] Make UnsuccessfulExecutionException part of the JobClient.getJobExecutionResult() contract

2020-11-10 Thread GitBox


flinkbot edited a comment on pull request #14028:
URL: https://github.com/apache/flink/pull/14028#issuecomment-725222918


   
   ## CI report:
   
   * 0e105679613efd2a1ccc80b678445bbcc23042cb Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9443)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[jira] [Created] (FLINK-20082) Cannot start Flink application due to "cannot assign instance of java.lang.invoke.SerializedLambda to type scala.Function1

2020-11-10 Thread 陳昌倬
ChangZhuo Chen (陳昌倬) created FLINK-20082:


 Summary: Cannot start Flink application due to "cannot assign 
instance of java.lang.invoke.SerializedLambda to type scala.Function1
 Key: FLINK-20082
 URL: https://issues.apache.org/jira/browse/FLINK-20082
 Project: Flink
  Issue Type: Bug
  Components: API / DataStream, API / Scala
Affects Versions: 1.11.2
 Environment: * Flink 1.11.2
 * Java 11
 * Scala 2.12.11
Reporter: ChangZhuo Chen (陳昌倬)
 Attachments: image-20201110-060934.png

Hi,
 * Our Flink application (Java 11 + Scala 2.12) fails with the following error 
when executed; it cannot be run in a Flink cluster.
 * The problem is similar to https://issues.apache.org/jira/browse/SPARK-25047, 
so maybe the same fix shall be implemented in Flink?

!image-20201110-060934.png!





[GitHub] [flink] curcur closed pull request #13677: Single task scheduler

2020-11-10 Thread GitBox


curcur closed pull request #13677:
URL: https://github.com/apache/flink/pull/13677


   







[jira] [Commented] (FLINK-19044) Web UI reports incorrect timeline for the FINISHED state of a subtask

2020-11-10 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17229766#comment-17229766
 ] 

Robert Metzger commented on FLINK-19044:


Is this a problem of the REST API, or is the UI mixing up the data?

> Web UI reports incorrect timeline for the FINISHED state of a subtask
> -
>
> Key: FLINK-19044
> URL: https://issues.apache.org/jira/browse/FLINK-19044
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Web Frontend
>Affects Versions: 1.12.0
>Reporter: Caizhi Weng
>Priority: Major
> Fix For: 1.12.0
>
> Attachments: timeline.jpg
>
>
> The timeline for the FINISHED state of a subtask is incorrect. The starting 
> time of the FINISHED state is larger than its ending time. See the image 
> below:
>  !timeline.jpg! 
> I discover this bug when running the TPCDS benchmark, but I think it can be 
> reproduced by arbitrary tasks.





[GitHub] [flink] becketqin commented on pull request #14022: [FLINK-20068] Enhance the topic creation guarantee to ensure all the …

2020-11-10 Thread GitBox


becketqin commented on pull request #14022:
URL: https://github.com/apache/flink/pull/14022#issuecomment-725224684


   @PatrickRen @Sxnan Thanks for helping verify the fix. I'll merge the PR now.







[GitHub] [flink] flinkbot commented on pull request #14028: [FLINK-20020][client] Make UnsuccessfulExecutionException part of the JobClient.getJobExecutionResult() contract

2020-11-10 Thread GitBox


flinkbot commented on pull request #14028:
URL: https://github.com/apache/flink/pull/14028#issuecomment-725222918


   
   ## CI report:
   
   * 0e105679613efd2a1ccc80b678445bbcc23042cb UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[jira] [Commented] (FLINK-20068) KafkaSubscriberTest.testTopicPatternSubscriber failed with unexpected results

2020-11-10 Thread Jiangjie Qin (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17229763#comment-17229763
 ] 

Jiangjie Qin commented on FLINK-20068:
--

This is a legacy issue caused by asynchronous topic creation in Kafka.

The problem is that we have 3 Kafka brokers. When a topic is created, the 
{{CreateTopicsRequest}} is sent to one of the brokers and blocks 
until that broker receives a metadata cache update. However, this does not 
guarantee that the other two brokers have also received the metadata cache 
update. When a subsequent TopicMetadataRequest goes to the other two brokers, 
it is still possible that the topic being created is not returned in the topic 
metadata.

The patch fixes this by letting the {{KafkaTestEnvironmentImpl}} look into 
each of the brokers and wait until all of them have the newly created 
topic in their metadata cache before {{createTestTopic()}} returns.

This should help fix a few other intermittent test failures.
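The shape of that fix, polling every broker until the topic shows up, can be sketched generically; the broker type parameter and the hasTopic predicate stand in for the real metadata-cache lookup inside KafkaTestEnvironmentImpl (both are assumptions for illustration, not the actual patch):

```java
import java.util.List;
import java.util.function.Predicate;

public final class MetadataWait {
    /**
     * Polls until {@code hasTopic} is true for every broker, or the deadline
     * expires. Mirrors waiting for a newly created topic to appear in every
     * broker's metadata cache before createTestTopic() returns.
     */
    public static <B> boolean awaitOnAllBrokers(
            List<B> brokers, Predicate<B> hasTopic, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        for (B broker : brokers) {
            while (!hasTopic.test(broker)) {
                if (System.currentTimeMillis() > deadline) {
                    return false; // metadata never propagated to this broker
                }
                Thread.sleep(50); // not yet in this broker's cache, retry
            }
        }
        return true;
    }
}
```

Only once this loop succeeds for all brokers is it safe for a test to assume that a TopicMetadataRequest hitting any broker will return the new topic.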

> KafkaSubscriberTest.testTopicPatternSubscriber failed with unexpected results
> -
>
> Key: FLINK-20068
> URL: https://issues.apache.org/jira/browse/FLINK-20068
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Affects Versions: 1.12.0
>Reporter: Dian Fu
>Priority: Blocker
>  Labels: pull-request-available, test-stability
> Fix For: 1.12.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=9365&view=logs&j=c5f0071e-1851-543e-9a45-9ac140befc32&t=1fb1a56f-e8b5-5a82-00a0-a2db7757b4f5
> {code}
> 2020-11-10T00:14:22.7658242Z [ERROR] 
> testTopicPatternSubscriber(org.apache.flink.connector.kafka.source.enumerator.subscriber.KafkaSubscriberTest)
>   Time elapsed: 0.012 s  <<< FAILURE!
> 2020-11-10T00:14:22.7659838Z java.lang.AssertionError: 
> expected:<[pattern-topic-5, pattern-topic-4, pattern-topic-7, 
> pattern-topic-6, pattern-topic-9, pattern-topic-8, pattern-topic-1, 
> pattern-topic-0, pattern-topic-3]> but was:<[]>
> 2020-11-10T00:14:22.7660740Z  at org.junit.Assert.fail(Assert.java:88)
> 2020-11-10T00:14:22.7661245Z  at 
> org.junit.Assert.failNotEquals(Assert.java:834)
> 2020-11-10T00:14:22.7661788Z  at 
> org.junit.Assert.assertEquals(Assert.java:118)
> 2020-11-10T00:14:22.7662312Z  at 
> org.junit.Assert.assertEquals(Assert.java:144)
> 2020-11-10T00:14:22.7663051Z  at 
> org.apache.flink.connector.kafka.source.enumerator.subscriber.KafkaSubscriberTest.testTopicPatternSubscriber(KafkaSubscriberTest.java:94)
> {code}





[GitHub] [flink] flinkbot commented on pull request #14027: [FLINK-20074][table-planner-blink] Fix can't generate plan when joining on changelog source without updates

2020-11-10 Thread GitBox


flinkbot commented on pull request #14027:
URL: https://github.com/apache/flink/pull/14027#issuecomment-725222729


   
   ## CI report:
   
   * 6d29e26c756f853e68d2930a2fc9cdbf41174f2f UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[jira] [Updated] (FLINK-20074) ChangelogSourceITCase.testRegularJoin fail

2020-11-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-20074:
---
Labels: pull-request-available test-stability  (was: test-stability)

> ChangelogSourceITCase.testRegularJoin fail
> --
>
> Key: FLINK-20074
> URL: https://issues.apache.org/jira/browse/FLINK-20074
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.12.0
>Reporter: Jingsong Lee
>Assignee: Jark Wu
>Priority: Major
>  Labels: pull-request-available, test-stability
> Fix For: 1.12.0
>
>
> INSTANCE: 
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=9379&view=logs&j=ae4f8708-9994-57d3-c2d7-b892156e7812&t=e25d5e7e-2a9c-5589-4940-0b638d75a414





[jira] [Resolved] (FLINK-13733) FlinkKafkaInternalProducerITCase.testHappyPath fails on Travis

2020-11-10 Thread Jiangjie Qin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-13733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiangjie Qin resolved FLINK-13733.
--
Fix Version/s: 1.11.3
   Resolution: Fixed

> FlinkKafkaInternalProducerITCase.testHappyPath fails on Travis
> --
>
> Key: FLINK-13733
> URL: https://issues.apache.org/jira/browse/FLINK-13733
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka, Tests
>Affects Versions: 1.9.0, 1.10.0, 1.11.0, 1.12.0
>Reporter: Till Rohrmann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.12.0, 1.11.3
>
> Attachments: 20200421.13.tar.gz
>
>
> The {{FlinkKafkaInternalProducerITCase.testHappyPath}} fails on Travis with 
> {code}
> Test 
> testHappyPath(org.apache.flink.streaming.connectors.kafka.FlinkKafkaInternalProducerITCase)
>  failed with:
> java.util.NoSuchElementException
>   at 
> org.apache.kafka.common.utils.AbstractIterator.next(AbstractIterator.java:52)
>   at 
> org.apache.flink.shaded.guava18.com.google.common.collect.Iterators.getOnlyElement(Iterators.java:302)
>   at 
> org.apache.flink.shaded.guava18.com.google.common.collect.Iterables.getOnlyElement(Iterables.java:289)
>   at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaInternalProducerITCase.assertRecord(FlinkKafkaInternalProducerITCase.java:169)
>   at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaInternalProducerITCase.testHappyPath(FlinkKafkaInternalProducerITCase.java:70)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> https://api.travis-ci.org/v3/job/571870358/log.txt





[GitHub] [flink] flinkbot commented on pull request #14028: [FLINK-20020][client] Make UnsuccessfulExecutionException part of the JobClient.getJobExecutionResult() contract

2020-11-10 Thread GitBox


flinkbot commented on pull request #14028:
URL: https://github.com/apache/flink/pull/14028#issuecomment-725219422


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit 0e105679613efd2a1ccc80b678445bbcc23042cb (Wed Nov 11 
06:01:14 UTC 2020)
   
   **Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into to Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.

   ## Bot commands
   The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-13733) FlinkKafkaInternalProducerITCase.testHappyPath fails on Travis

2020-11-10 Thread Jiangjie Qin (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-13733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17229761#comment-17229761
 ] 

Jiangjie Qin commented on FLINK-13733:
--

The patch has been merged to master.
cb2d137adc9ea1e46a67513c1e0f2469bb05bff4

 I am closing the ticket. Hopefully we don't see the instability again.

> FlinkKafkaInternalProducerITCase.testHappyPath fails on Travis
> --
>
> Key: FLINK-13733
> URL: https://issues.apache.org/jira/browse/FLINK-13733
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka, Tests
>Affects Versions: 1.9.0, 1.10.0, 1.11.0, 1.12.0
>Reporter: Till Rohrmann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.12.0
>
> Attachments: 20200421.13.tar.gz
>
>
> The {{FlinkKafkaInternalProducerITCase.testHappyPath}} fails on Travis with 
> {code}
> Test 
> testHappyPath(org.apache.flink.streaming.connectors.kafka.FlinkKafkaInternalProducerITCase)
>  failed with:
> java.util.NoSuchElementException
>   at 
> org.apache.kafka.common.utils.AbstractIterator.next(AbstractIterator.java:52)
>   at 
> org.apache.flink.shaded.guava18.com.google.common.collect.Iterators.getOnlyElement(Iterators.java:302)
>   at 
> org.apache.flink.shaded.guava18.com.google.common.collect.Iterables.getOnlyElement(Iterables.java:289)
>   at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaInternalProducerITCase.assertRecord(FlinkKafkaInternalProducerITCase.java:169)
>   at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaInternalProducerITCase.testHappyPath(FlinkKafkaInternalProducerITCase.java:70)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> https://api.travis-ci.org/v3/job/571870358/log.txt





[jira] [Updated] (FLINK-20020) Make UnsuccessfulExecutionException part of the JobClient.getJobExecutionResult() contract.

2020-11-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-20020:
---
Labels: pull-request-available  (was: )

> Make UnsuccessfulExecutionException part of the 
> JobClient.getJobExecutionResult() contract.
> ---
>
> Key: FLINK-20020
> URL: https://issues.apache.org/jira/browse/FLINK-20020
> Project: Flink
>  Issue Type: Improvement
>  Components: Client / Job Submission
>Affects Versions: 1.12.0
>Reporter: Kostas Kloudas
>Assignee: Nicholas Jiang
>Priority: Major
>  Labels: pull-request-available
>
> Currently, different implementations of the {{JobClient}} throw different 
> exceptions. The {{ClusterClientJobClientAdapter}} wraps the exception from 
> the {{JobResult.toJobExecutionResult()}} into a 
> {{ProgramInvocationException}}, the {{MiniClusterJobClient}} simply wraps it 
> in a {{CompletionException}} and the {{EmbeddedJobClient}} wraps it into an 
> {{UnsuccessfulExecutionException}}. 
> With this issue I would like to propose making the exception uniform and part 
> of the contract and as a candidate I would propose the behaviour of the 
> {{EmbeddedJobClient}} which throws an {{UnsuccessfulExecutionException}}. The 
> reason is that this exception also includes the status of the application.
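The contract proposed above — one exception type, carrying the job status, no matter which `JobClient` implementation produced the failure — can be sketched with plain JDK futures. The class and method names below are illustrative stand-ins (not Flink's actual API), assuming only that each backend reports failures through a `CompletableFuture`:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

public class UniformJobResultSketch {

    /** Hypothetical stand-in for UnsuccessfulExecutionException: carries the job status. */
    static class UnsuccessfulExecutionExceptionSketch extends RuntimeException {
        final String jobStatus;

        UnsuccessfulExecutionExceptionSketch(String jobStatus, Throwable cause) {
            super("Job failed in status " + jobStatus, cause);
            this.jobStatus = jobStatus;
        }
    }

    /** Normalize whatever the backend future failed with into the single contract exception. */
    static <T> CompletableFuture<T> normalize(CompletableFuture<T> result, String jobStatus) {
        return result.exceptionally(t -> {
            // CompletableFuture hands the failure to us possibly wrapped in a
            // CompletionException; unwrap it before re-wrapping uniformly.
            Throwable cause = (t instanceof CompletionException && t.getCause() != null)
                    ? t.getCause() : t;
            throw new UnsuccessfulExecutionExceptionSketch(jobStatus, cause);
        });
    }

    public static void main(String[] args) {
        CompletableFuture<String> failed = new CompletableFuture<>();
        failed.completeExceptionally(new IllegalStateException("job failed"));
        try {
            normalize(failed, "FAILED").join();
        } catch (CompletionException e) {
            // Callers now see the same contract exception from every backend.
            if (!(e.getCause() instanceof UnsuccessfulExecutionExceptionSketch)) {
                throw new AssertionError("expected the uniform exception");
            }
        }
    }
}
```

Whatever the underlying failure, callers only ever need to handle one exception type, and the status field lets them distinguish `FAILED` from e.g. `CANCELED` without re-parsing messages.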





[GitHub] [flink] SteNicholas opened a new pull request #14028: [FLINK-20020][client] Make UnsuccessfulExecutionException part of the JobClient.getJobExecutionResult() contract

2020-11-10 Thread GitBox


SteNicholas opened a new pull request #14028:
URL: https://github.com/apache/flink/pull/14028


   ## What is the purpose of the change
   
   *Currently, different implementations of the `JobClient` throw different 
exceptions. The `ClusterClientJobClientAdapter` wraps the exception from the 
`JobResult.toJobExecutionResult()` into a `ProgramInvocationException`, the 
`MiniClusterJobClient` simply wraps it in a `CompletionException` and the 
`EmbeddedJobClient` wraps it into an `UnsuccessfulExecutionException`. 
`JobResult.toJobExecutionResult()` should make the exception uniform and part 
of the contract through a `JobExecutionResultException` that includes the 
status of the application.*
   
   ## Brief change log
   
 - *Add `JobExecutionResultException` to make the exception uniform and 
part of the contract for `JobResult.toJobExecutionResult()`.*
 - *`ClusterClientJobClientAdapter`, `MiniClusterJobClient` and 
`EmbeddedJobClient` uniform the exception to `JobExecutionResultException`.*
   
   ## Verifying this change
   
 - *Modify `ApplicationDispatcherBootstrapTest` and 
`PerJobMiniClusterFactoryTest.java` to update the 
`UnsuccessfulExecutionException` check to the `JobExecutionResultException`.*
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / **no**)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / **no**)
 - The serializers: (yes / **no** / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / **no** 
/ don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (yes / **no** / 
don't know)
 - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / **no**)
 - If yes, how is the feature documented? (**not applicable** / docs / 
JavaDocs / not documented)







[GitHub] [flink] flinkbot commented on pull request #14027: [FLINK-20074][table-planner-blink] Fix can't generate plan when joining on changelog source without updates

2020-11-10 Thread GitBox


flinkbot commented on pull request #14027:
URL: https://github.com/apache/flink/pull/14027#issuecomment-725218189


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit 6d29e26c756f853e68d2930a2fc9cdbf41174f2f (Wed Nov 11 
05:57:45 UTC 2020)
   
   **Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into to Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.

   ## Bot commands
   The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   







[GitHub] [flink] becketqin closed pull request #14021: [FLINK-13733][connector/kafka] Make FlinkKafkaInternalProducerITCase more robust

2020-11-10 Thread GitBox


becketqin closed pull request #14021:
URL: https://github.com/apache/flink/pull/14021


   







[GitHub] [flink] becketqin commented on pull request #14021: [FLINK-13733][connector/kafka] Make FlinkKafkaInternalProducerITCase more robust

2020-11-10 Thread GitBox


becketqin commented on pull request #14021:
URL: https://github.com/apache/flink/pull/14021#issuecomment-725218181


   Merged to master.
   cb2d137adc9ea1e46a67513c1e0f2469bb05bff4







[GitHub] [flink] wuchong opened a new pull request #14027: [FLINK-20074][table-planner-blink] Fix can't generate plan when joini…

2020-11-10 Thread GitBox


wuchong opened a new pull request #14027:
URL: https://github.com/apache/flink/pull/14027


   …ng on changelog source without updates
   
   
   
   ## What is the purpose of the change
   
   This PR fixes the exception when get the plan of JOIN on changelog source 
which doesn't contain updates. 
   
   ```
   [ERROR] testRegularJoin[Source=NO_UPDATE, 
StateBackend=MiniBatch=OFF](org.apache.flink.table.planner.runtime.stream.sql.ChangelogSourceITCase)
  Time elapsed: 0.047 s  <<< ERROR!
   2020-11-10T06:39:13.0856712Z org.apache.flink.table.api.TableException: 
   2020-11-10T06:39:13.0857161Z UpdateKindTrait [ONLY_UPDATE_AFTER] conflicts 
with ModifyKindSetTrait [I,D]. This is a bug in planner, please file an issue. 
   2020-11-10T06:39:13.0858003Z Current node is Join(joinType=[InnerJoin], 
where=[(currency = currency0)], select=[amount, currency, currency0, rate], 
leftInputSpec=[NoUniqueKey], rightInputSpec=[NoUniqueKey]).
   ```
   
   ## Brief change log
   
   - Fix `FlinkChangelogModeInferenceProgram`.
   
   
   ## Verifying this change
   
   - Add `testJoinOnNoUpdateSource` test which can reproduce this problem.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / **no**)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / **no**)
 - The serializers: (yes / **no** / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / **no** 
/ don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know)
 - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / **no**)
 - If yes, how is the feature documented? (**not applicable** / docs / 
JavaDocs / not documented)
   







[GitHub] [flink] becketqin commented on pull request #14021: [FLINK-13733][connector/kafka] Make FlinkKafkaInternalProducerITCase more robust

2020-11-10 Thread GitBox


becketqin commented on pull request #14021:
URL: https://github.com/apache/flink/pull/14021#issuecomment-725216164


   @PatrickRen Thanks a lot for the thorough tests. The patch LGTM.







[jira] [Updated] (FLINK-20081) ExecutorNotifier should run handler in the main thread when receive an exception from the callable.

2020-11-10 Thread Jiangjie Qin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiangjie Qin updated FLINK-20081:
-
Fix Version/s: 1.12.0

> ExecutorNotifier should run handler in the main thread when receive an 
> exception from the callable.
> ---
>
> Key: FLINK-20081
> URL: https://issues.apache.org/jira/browse/FLINK-20081
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Common
>Reporter: Jiangjie Qin
>Priority: Major
> Fix For: 1.12.0
>
>
> Currently the {{ExecutorNotifier}} runs the {{handler}} in the worker thread 
> if there is an exception thrown from the {{callable}}. This breaks the 
> threading model and prevents an exception from bubbling up to fail the job.
> Another issue is that right now, when an exception bubbles up from the 
> {{SourceCoordinator}}, the UncaughtExceptionHandler will call 
> System.exit(-17) and kill the JM. This is too much. Instead, we should just 
> fail the job to trigger a failover.
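The threading-model fix described above — handing the callable's failure back to the coordinator's single "main" thread instead of running the handler on the worker — can be sketched with two JDK executors. All names here (`NotifierSketch`, `notifyReadyAsync`) are illustrative; this is not Flink's actual `ExecutorNotifier` implementation:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.BiConsumer;

/**
 * Sketch of the intended threading model: the handler always runs on the
 * coordinator's single "main" thread, even when the callable throws.
 */
public class NotifierSketch {
    private final ExecutorService mainExecutor = Executors.newSingleThreadExecutor();
    private final ExecutorService workerExecutor = Executors.newSingleThreadExecutor();

    public <T> void notifyReadyAsync(Callable<T> callable, BiConsumer<T, Throwable> handler) {
        workerExecutor.submit(() -> {
            T result = null;
            Throwable failure = null;
            try {
                result = callable.call();
            } catch (Throwable t) {
                failure = t;
            }
            final T r = result;
            final Throwable f = failure;
            // Hand both success and failure back to the main thread, so an
            // exception can bubble up through the coordinator's own thread
            // (and fail the job) instead of dying on the worker thread.
            mainExecutor.submit(() -> handler.accept(r, f));
        });
    }

    public void shutdown() {
        workerExecutor.shutdown();
        mainExecutor.shutdown();
    }
}
```

Because the handler runs on the main executor in both branches, the coordinator can inspect the `Throwable` argument and trigger a normal job failover rather than relying on an uncaught-exception handler on the worker thread.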





[jira] [Updated] (FLINK-20081) ExecutorNotifier should run handler in the main thread when receive an exception from the callable.

2020-11-10 Thread Jiangjie Qin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiangjie Qin updated FLINK-20081:
-
Affects Version/s: 1.11.2

> ExecutorNotifier should run handler in the main thread when receive an 
> exception from the callable.
> ---
>
> Key: FLINK-20081
> URL: https://issues.apache.org/jira/browse/FLINK-20081
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Common
>Affects Versions: 1.11.2
>Reporter: Jiangjie Qin
>Priority: Major
> Fix For: 1.12.0
>
>
> Currently the {{ExecutorNotifier}} runs the {{handler}} in the worker thread 
> if there is an exception thrown from the {{callable}}. This breaks the 
> threading model and prevents an exception from bubbling up to fail the job.
> Another issue is that right now, when an exception bubbles up from the 
> {{SourceCoordinator}}, the UncaughtExceptionHandler will call 
> System.exit(-17) and kill the JM. This is too much. Instead, we should just 
> fail the job to trigger a failover.





[jira] [Updated] (FLINK-20081) ExecutorNotifier should run handler in the main thread when receive an exception from the callable.

2020-11-10 Thread Jiangjie Qin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiangjie Qin updated FLINK-20081:
-
Fix Version/s: 1.11.3

> ExecutorNotifier should run handler in the main thread when receive an 
> exception from the callable.
> ---
>
> Key: FLINK-20081
> URL: https://issues.apache.org/jira/browse/FLINK-20081
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Common
>Affects Versions: 1.11.2
>Reporter: Jiangjie Qin
>Priority: Major
> Fix For: 1.12.0, 1.11.3
>
>
> Currently the {{ExecutorNotifier}} runs the {{handler}} in the worker thread 
> if there is an exception thrown from the {{callable}}. This breaks the 
> threading model and prevents an exception from bubbling up to fail the job.
> Another issue is that right now, when an exception bubbles up from the 
> {{SourceCoordinator}}, the UncaughtExceptionHandler will call 
> System.exit(-17) and kill the JM. This is too much. Instead, we should just 
> fail the job to trigger a failover.





[jira] [Created] (FLINK-20081) ExecutorNotifier should run handler in the main thread when receive an exception from the callable.

2020-11-10 Thread Jiangjie Qin (Jira)
Jiangjie Qin created FLINK-20081:


 Summary: ExecutorNotifier should run handler in the main thread 
when receive an exception from the callable.
 Key: FLINK-20081
 URL: https://issues.apache.org/jira/browse/FLINK-20081
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Common
Reporter: Jiangjie Qin


Currently the {{ExecutorNotifier}} runs the {{handler}} in the worker thread if 
there is an exception thrown from the {{callable}}. This breaks the threading 
model and prevents an exception from bubbling up to fail the job.

Another issue is that right now, when an exception bubbles up from the 
{{SourceCoordinator}}, the UncaughtExceptionHandler will call System.exit(-17) 
and kill the JM. This is too much. Instead, we should just fail the job to 
trigger a failover.





[jira] [Assigned] (FLINK-20081) ExecutorNotifier should run handler in the main thread when receive an exception from the callable.

2020-11-10 Thread Jiangjie Qin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiangjie Qin reassigned FLINK-20081:


Assignee: Jiangjie Qin

> ExecutorNotifier should run handler in the main thread when receive an 
> exception from the callable.
> ---
>
> Key: FLINK-20081
> URL: https://issues.apache.org/jira/browse/FLINK-20081
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Common
>Affects Versions: 1.11.2
>Reporter: Jiangjie Qin
>Assignee: Jiangjie Qin
>Priority: Major
> Fix For: 1.12.0, 1.11.3
>
>
> Currently the {{ExecutorNotifier}} runs the {{handler}} in the worker thread 
> if there is an exception thrown from the {{callable}}. This breaks the 
> threading model and prevents an exception from bubbling up to fail the job.
> Another issue is that right now, when an exception bubbles up from the 
> {{SourceCoordinator}}, the UncaughtExceptionHandler will call 
> System.exit(-17) and kill the JM. This is too much. Instead, we should just 
> fail the job to trigger a failover.





[jira] [Commented] (FLINK-20065) UnalignedCheckpointCompatibilityITCase.test failed with AskTimeoutException

2020-11-10 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17229747#comment-17229747
 ] 

Dian Fu commented on FLINK-20065:
-

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=9431=logs=5c8e7682-d68f-54d1-16a2-a09310218a49=45cc9205-bdb7-5b54-63cd-89fdc0983323

> UnalignedCheckpointCompatibilityITCase.test failed with AskTimeoutException
> ---
>
> Key: FLINK-20065
> URL: https://issues.apache.org/jira/browse/FLINK-20065
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.12.0, 1.11.3
>Reporter: Dian Fu
>Priority: Blocker
>  Labels: test-stability
> Fix For: 1.12.0, 1.11.3
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=9362=logs=5c8e7682-d68f-54d1-16a2-a09310218a49=45cc9205-bdb7-5b54-63cd-89fdc0983323
> {code}
> 2020-11-09T22:19:47.2714024Z [ERROR] test[type: SAVEPOINT, startAligned: 
> true](org.apache.flink.test.checkpointing.UnalignedCheckpointCompatibilityITCase)
>   Time elapsed: 1.293 s  <<< ERROR!
> 2020-11-09T22:19:47.2715260Z java.util.concurrent.ExecutionException: 
> java.util.concurrent.TimeoutException: Invocation of public default 
> java.util.concurrent.CompletableFuture 
> org.apache.flink.runtime.webmonitor.RestfulGateway.stopWithSavepoint(org.apache.flink.api.common.JobID,java.lang.String,boolean,org.apache.flink.api.common.time.Time)
>  timed out.
> 2020-11-09T22:19:47.2716743Z  at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> 2020-11-09T22:19:47.2718213Z  at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
> 2020-11-09T22:19:47.2719166Z  at 
> org.apache.flink.test.checkpointing.UnalignedCheckpointCompatibilityITCase.runAndTakeSavepoint(UnalignedCheckpointCompatibilityITCase.java:113)
> 2020-11-09T22:19:47.2720278Z  at 
> org.apache.flink.test.checkpointing.UnalignedCheckpointCompatibilityITCase.test(UnalignedCheckpointCompatibilityITCase.java:97)
> 2020-11-09T22:19:47.2721126Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2020-11-09T22:19:47.2721771Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2020-11-09T22:19:47.2722773Z  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2020-11-09T22:19:47.2723479Z  at 
> java.lang.reflect.Method.invoke(Method.java:498)
> 2020-11-09T22:19:47.2724187Z  at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> 2020-11-09T22:19:47.2725026Z  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 2020-11-09T22:19:47.2725817Z  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> 2020-11-09T22:19:47.2726595Z  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 2020-11-09T22:19:47.2727515Z  at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
> 2020-11-09T22:19:47.2728192Z  at 
> org.junit.rules.RunRules.evaluate(RunRules.java:20)
> 2020-11-09T22:19:47.2744089Z  at 
> org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> 2020-11-09T22:19:47.2744907Z  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> 2020-11-09T22:19:47.2745573Z  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> 2020-11-09T22:19:47.2746037Z  at 
> org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> 2020-11-09T22:19:47.2746445Z  at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> 2020-11-09T22:19:47.2746868Z  at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> 2020-11-09T22:19:47.2747443Z  at 
> org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> 2020-11-09T22:19:47.2747876Z  at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> 2020-11-09T22:19:47.2748297Z  at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> 2020-11-09T22:19:47.2748694Z  at 
> org.junit.runners.Suite.runChild(Suite.java:128)
> 2020-11-09T22:19:47.2749054Z  at 
> org.junit.runners.Suite.runChild(Suite.java:27)
> 2020-11-09T22:19:47.2749414Z  at 
> org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> 2020-11-09T22:19:47.2749819Z  at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> 2020-11-09T22:19:47.2750373Z  at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> 2020-11-09T22:19:47.2750923Z  at 
> org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> 2020-11-09T22:19:47.2751555Z  at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> 

[jira] [Commented] (FLINK-17096) Minibatch Group Agg support state ttl

2020-11-10 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17229745#comment-17229745
 ] 

Dian Fu commented on FLINK-17096:
-

[~jark] Thanks a lot~

> Minibatch Group Agg support state ttl
> -
>
> Key: FLINK-17096
> URL: https://issues.apache.org/jira/browse/FLINK-17096
> Project: Flink
>  Issue Type: New Feature
>  Components: Table SQL / Runtime
>Affects Versions: 1.9.0, 1.10.0, 1.11.0
>Reporter: dalongliu
>Assignee: dalongliu
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>
> At the moment, MiniBatch Group Agg (both Local and Global) doesn't support 
> state TTL. For a streaming job this can lead to OOM over a long run, so we 
> need to make state data expire after the TTL. The solution is to use the 
> incremental cleanup feature, referring to FLINK-16581.
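In Flink the fix is configured through `StateTtlConfig` with incremental cleanup. As a rough plain-Java analogue of the idea — check a bounded number of entries for expiry on every state access, rather than scanning all state at once — consider this sketch (class and method names are hypothetical, not Flink API):

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Plain-Java analogue of incremental state-TTL cleanup: every access also
 * checks a bounded number of other entries for expiration, so old state is
 * removed gradually without a full scan.
 */
public class IncrementalTtlMap<K, V> {
    private static class Entry<V> {
        final V value;
        final long expireAt;
        Entry(V value, long expireAt) { this.value = value; this.expireAt = expireAt; }
    }

    private final LinkedHashMap<K, Entry<V>> map = new LinkedHashMap<>();
    private final long ttlMillis;
    private final int cleanupSize; // entries checked per access, like cleanupIncrementally(size, ...)

    public IncrementalTtlMap(long ttlMillis, int cleanupSize) {
        this.ttlMillis = ttlMillis;
        this.cleanupSize = cleanupSize;
    }

    public void put(K key, V value, long now) {
        map.put(key, new Entry<>(value, now + ttlMillis));
        cleanupSome(now);
    }

    public V get(K key, long now) {
        cleanupSome(now);
        Entry<V> e = map.get(key);
        if (e == null || e.expireAt <= now) {
            map.remove(key);
            return null;
        }
        return e.value;
    }

    /** Check at most cleanupSize entries for expiry; cost is bounded per access. */
    private void cleanupSome(long now) {
        Iterator<Map.Entry<K, Entry<V>>> it = map.entrySet().iterator();
        for (int i = 0; i < cleanupSize && it.hasNext(); i++) {
            if (it.next().getValue().expireAt <= now) {
                it.remove();
            }
        }
    }

    public int size() { return map.size(); }
}
```

The bounded per-access cleanup is what keeps state from growing without bound while avoiding a stop-the-world scan, which is the same trade-off the incremental cleanup strategy makes for keyed state backends.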





[jira] [Commented] (FLINK-19630) Sink data in [ORC] format into Hive By using Legacy Table API caused unexpected Exception

2020-11-10 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17229744#comment-17229744
 ] 

Dian Fu commented on FLINK-19630:
-

[~neighborhood] Thanks a lot for the feedback.

[~lzljs3620320] Agree with you ~. Adding some documentation is a good idea.

> Sink data in [ORC] format into Hive By using Legacy Table API  caused 
> unexpected Exception
> --
>
> Key: FLINK-19630
> URL: https://issues.apache.org/jira/browse/FLINK-19630
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hive, Table SQL / Ecosystem
>Affects Versions: 1.11.2
>Reporter: Lsw_aka_laplace
>Priority: Critical
> Fix For: 1.12.0
>
> Attachments: image-2020-10-14-11-36-48-086.png, 
> image-2020-10-14-11-41-53-379.png, image-2020-10-14-11-42-57-353.png, 
> image-2020-10-14-11-48-51-310.png
>
>
> *ENV:*
> *Flink version 1.11.2*
> *Hive exec version: 2.0.1*
> *Hive file storing type :ORC*
> *SQL or Datastream: SQL API*
> *Kafka Connector :  custom Kafka connector which is based on Legacy API 
> (TableSource/`org.apache.flink.types.Row`)*
> *Hive Connector : totally follows the Flink-Hive-connector (we only made some 
> encapsulation upon it)*
> *Using StreamingFileCommitter:YES*
>  
>  
> *Description:*
>    try to execute the following SQL:
>     """
>       insert into hive_table (select * from kafka_table)
>     """
>    HIVE Table SQL seems like:
>     """
> CREATE TABLE `hive_table`(
>  // some fields
> PARTITIONED BY (
>  `dt` string,
>  `hour` string)
> STORED AS orc
> TBLPROPERTIES (
>  'orc.compress'='SNAPPY',
> 'type'='HIVE',
> 'sink.partition-commit.trigger'='process-time',
> 'sink.partition-commit.delay' = '1 h',
> 'sink.partition-commit.policy.kind' = 'metastore,success-file',
> )   
>    """
> When this job starts to process snapshot, here comes the weird exception:
> !image-2020-10-14-11-36-48-086.png|width=882,height=395!
> As we can see from the message:Owner thread shall be the [Legacy Source 
> Thread], but actually the streamTaskThread which represents the whole first 
> stage is found. 
> So I checked the Thread dump at once.
> !image-2020-10-14-11-41-53-379.png|width=801,height=244!
>                                                                      The 
> legacy Source Thread
>  
> !image-2020-10-14-11-42-57-353.png|width=846,height=226!
>                                                                The StreamTask 
> Thread
>  
>    According to the thread dump info and the Exception Message, I searched 
> and read certain source code and then *{color:#ffab00}DID A TEST{color}*
>  
> {color:#172b4d}   Since the Kafka connector is custom, I tried to make the 
> KafkaSource a separate stage by changing the TaskChainStrategy to Never. The 
> task topology is as follows:{color}
> {color:#172b4d}!image-2020-10-14-11-48-51-310.png|width=753,height=208!{color}
>  
> {color:#505f79}*Fortunately, it did work! No exception is thrown and the 
> checkpoint could be taken successfully!*{color}
>  
>  
> So, from my perspective, there must be something wrong when HiveWritingTask 
> and LegacySourceTask are chained together. The legacy source task is a 
> separate thread, which may be the cause of the exception mentioned above.
>  
>                                                                 
>  





[jira] [Assigned] (FLINK-20074) ChangelogSourceITCase.testRegularJoin fail

2020-11-10 Thread Jark Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jark Wu reassigned FLINK-20074:
---

Assignee: Jark Wu

> ChangelogSourceITCase.testRegularJoin fail
> --
>
> Key: FLINK-20074
> URL: https://issues.apache.org/jira/browse/FLINK-20074
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.12.0
>Reporter: Jingsong Lee
>Assignee: Jark Wu
>Priority: Major
>  Labels: test-stability
> Fix For: 1.12.0
>
>
> INSTANCE: 
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=9379=logs=ae4f8708-9994-57d3-c2d7-b892156e7812=e25d5e7e-2a9c-5589-4940-0b638d75a414





[jira] [Commented] (FLINK-18139) Unaligned checkpoints checks wrong channels for inflight data.

2020-11-10 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17229743#comment-17229743
 ] 

Dian Fu commented on FLINK-18139:
-

[~AHeise] Could we close this ticket as the PR is already merged?

> Unaligned checkpoints checks wrong channels for inflight data.
> --
>
> Key: FLINK-18139
> URL: https://issues.apache.org/jira/browse/FLINK-18139
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing
>Affects Versions: 1.11.0, 1.12.0
>Reporter: Arvid Heise
>Assignee: Arvid Heise
>Priority: Major
>  Labels: pull-request-available
>
> CheckpointBarrierUnaligner#hasInflightData is not called with input gate 
> contextual information, such that only the same first few channels are 
> checked during initial snapshotting of inflight data.



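The FLINK-18139 bug above is essentially a keying problem: checking inflight data by channel index alone conflates channels that belong to different input gates. A hypothetical stdlib-Python sketch (not Flink's actual code) of why the gate index must be part of the key:

```python
# Hypothetical sketch (not Flink's actual code): why inflight-data checks
# must be keyed per input gate, not by channel index alone.

def has_inflight_data_buggy(flags, channel_index):
    # Buggy variant: ignores which gate the channel belongs to.
    return flags.get(channel_index, False)

def has_inflight_data_fixed(flags, gate_index, channel_index):
    # Fixed variant: the gate index is part of the key.
    return flags.get((gate_index, channel_index), False)

# Index-only keying: channel 0 of two different gates collide on one key.
index_only = {}
index_only[0] = False   # gate 0, channel 0: no inflight data
index_only[0] = True    # gate 1, channel 0 silently overwrites it
print(has_inflight_data_buggy(index_only, 0))    # True for BOTH channels

# Per-gate keying keeps the two channels distinct.
per_gate = {(0, 0): False, (1, 0): True}
print(has_inflight_data_fixed(per_gate, 0, 0))   # False
print(has_inflight_data_fixed(per_gate, 1, 0))   # True
```

With index-only keying, the same first few channel indices are checked regardless of gate, which matches the symptom described in the ticket.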


[jira] [Closed] (FLINK-18456) CompressionFactoryITCase.testWriteCompressedFile fails with "expected:<1> but was:<2>"

2020-11-10 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu closed FLINK-18456.
---
Resolution: Cannot Reproduce

Closing this ticket for now as it has not occurred for almost half a year.

> CompressionFactoryITCase.testWriteCompressedFile fails with "expected:<1> but 
> was:<2>"
> --
>
> Key: FLINK-18456
> URL: https://issues.apache.org/jira/browse/FLINK-18456
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / FileSystem, Tests
>Affects Versions: 1.12.0
>Reporter: Dian Fu
>Priority: Major
>  Labels: test-stability
>
> Instance on the master: 
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=4123=logs=fc5181b0-e452-5c8f-68de-1097947f6483=62110053-334f-5295-a0ab-80dd7e2babbf]
> {code:java}
> 2020-06-30T07:59:56.3844746Z [INFO] Running 
> org.apache.flink.formats.compress.CompressionFactoryITCase
> 2020-06-30T08:00:00.3083101Z [ERROR] Tests run: 1, Failures: 1, Errors: 0, 
> Skipped: 0, Time elapsed: 3.921 s <<< FAILURE! - in 
> org.apache.flink.formats.compress.CompressionFactoryITCase
> 2020-06-30T08:00:00.3084778Z [ERROR] 
> testWriteCompressedFile(org.apache.flink.formats.compress.CompressionFactoryITCase)
>   Time elapsed: 1.191 s  <<< FAILURE!
> 2020-06-30T08:00:00.3085932Z java.lang.AssertionError: expected:<1> but 
> was:<2>
> 2020-06-30T08:00:00.3086694Z  at org.junit.Assert.fail(Assert.java:88)
> 2020-06-30T08:00:00.3087435Z  at 
> org.junit.Assert.failNotEquals(Assert.java:834)
> 2020-06-30T08:00:00.3088250Z  at 
> org.junit.Assert.assertEquals(Assert.java:645)
> 2020-06-30T08:00:00.3089022Z  at 
> org.junit.Assert.assertEquals(Assert.java:631)
> 2020-06-30T08:00:00.3090188Z  at 
> org.apache.flink.formats.compress.CompressionFactoryITCase.validateResults(CompressionFactoryITCase.java:106)
> 2020-06-30T08:00:00.3091575Z  at 
> org.apache.flink.formats.compress.CompressionFactoryITCase.testWriteCompressedFile(CompressionFactoryITCase.java:90)
> 2020-06-30T08:00:00.3092751Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2020-06-30T08:00:00.3093687Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2020-06-30T08:00:00.3094801Z  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2020-06-30T08:00:00.3095784Z  at 
> java.lang.reflect.Method.invoke(Method.java:498)
> 2020-06-30T08:00:00.3096737Z  at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> 2020-06-30T08:00:00.3097863Z  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 2020-06-30T08:00:00.3099212Z  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> 2020-06-30T08:00:00.3100380Z  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 2020-06-30T08:00:00.3101557Z  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
> 2020-06-30T08:00:00.3102957Z  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
> 2020-06-30T08:00:00.3104026Z  at 
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 2020-06-30T08:00:00.3104859Z  at java.lang.Thread.run(Thread.java:748)
> {code}





[jira] [Closed] (FLINK-18698) org.apache.flink.sql.parser.utils.ParserResource compile error

2020-11-10 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu closed FLINK-18698.
---
Resolution: Invalid

Closing this ticket as it does not seem to be a bug.

> org.apache.flink.sql.parser.utils.ParserResource compile error
> --
>
> Key: FLINK-18698
> URL: https://issues.apache.org/jira/browse/FLINK-18698
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / API
>Affects Versions: 1.12.0
>Reporter: 毛宗良
>Priority: Major
> Attachments: image-2020-07-24-11-42-09-880.png
>
>
> org.apache.flink.sql.parser.utils.ParserResource in flink-sql-parser imports 
> org.apache.flink.sql.parser.impl.ParseException, which cannot be found.





[jira] [Commented] (FLINK-17096) Minibatch Group Agg support state ttl

2020-11-10 Thread Jark Wu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17229738#comment-17229738
 ] 

Jark Wu commented on FLINK-17096:
-

Yes [~dian.fu]. I will review and merge this pull request in the coming days. 

> Minibatch Group Agg support state ttl
> -
>
> Key: FLINK-17096
> URL: https://issues.apache.org/jira/browse/FLINK-17096
> Project: Flink
>  Issue Type: New Feature
>  Components: Table SQL / Runtime
>Affects Versions: 1.9.0, 1.10.0, 1.11.0
>Reporter: dalongliu
>Assignee: dalongliu
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>
> At the moment, MiniBatch group aggregation (including local/global) doesn't 
> support state TTL. For a streaming job this will lead to OOM in long-running 
> jobs, so we need to make state data expire after the TTL. The solution is to 
> use the incremental cleanup feature; refer to FLINK-16581.



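The incremental cleanup approach referenced in FLINK-17096 above (via FLINK-16581) expires a bounded number of stale entries on each state access instead of scanning all state at once. A rough stdlib-Python sketch of that idea (illustrative only; Flink's actual mechanism is `StateTtlConfig` with incremental cleanup, not this class):

```python
import time
from collections import OrderedDict

class TtlState:
    """Keyed state with TTL; expires up to `cleanup_size` stale entries
    per access, so cleanup cost is amortized over normal processing."""

    def __init__(self, ttl_seconds, cleanup_size=2, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.cleanup_size = cleanup_size
        self.clock = clock
        self._entries = OrderedDict()  # key -> (value, last_update_ts)

    def put(self, key, value):
        self._entries[key] = (value, self.clock())
        self._entries.move_to_end(key)  # newest entries at the end
        self._incremental_cleanup()

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, ts = entry
        if self.clock() - ts >= self.ttl:
            del self._entries[key]      # expired, cleaned on read
            return None
        self._incremental_cleanup()
        return value

    def _incremental_cleanup(self):
        # Oldest entries sit at the front; drop at most cleanup_size of them.
        removed = 0
        for key in list(self._entries):
            if removed >= self.cleanup_size:
                break
            _, ts = self._entries[key]
            if self.clock() - ts < self.ttl:
                break  # everything after this is newer; stop early
            del self._entries[key]
            removed += 1

# Demo with a fake clock so expiry is deterministic:
now = [0.0]
state = TtlState(ttl_seconds=10, cleanup_size=2, clock=lambda: now[0])
state.put("a", 1)
state.put("b", 2)
now[0] = 11.0        # both entries are now stale
state.put("c", 3)    # this access incrementally cleans "a" and "b"
print(list(state._entries))  # ['c']
```

The key property, as in Flink's incremental cleanup, is that no single access pays for a full scan; long-running jobs shed expired state gradually instead of accumulating it until OOM.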


[jira] [Updated] (FLINK-19044) Web UI reports incorrect timeline for the FINISHED state of a subtask

2020-11-10 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu updated FLINK-19044:

Fix Version/s: 1.12.0

> Web UI reports incorrect timeline for the FINISHED state of a subtask
> -
>
> Key: FLINK-19044
> URL: https://issues.apache.org/jira/browse/FLINK-19044
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Web Frontend
>Affects Versions: 1.12.0
>Reporter: Caizhi Weng
>Priority: Major
> Fix For: 1.12.0
>
> Attachments: timeline.jpg
>
>
> The timeline for the FINISHED state of a subtask is incorrect. The starting 
> time of the FINISHED state is larger than its ending time. See the image 
> below:
>  !timeline.jpg! 
> I discover this bug when running the TPCDS benchmark, but I think it can be 
> reproduced by arbitrary tasks.





[jira] [Commented] (FLINK-19630) Sink data in [ORC] format into Hive By using Legacy Table API caused unexpected Exception

2020-11-10 Thread Jingsong Lee (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17229736#comment-17229736
 ] 

Jingsong Lee commented on FLINK-19630:
--

[~lirui] I think we can document this.

> Sink data in [ORC] format into Hive By using Legacy Table API  caused 
> unexpected Exception
> --
>
> Key: FLINK-19630
> URL: https://issues.apache.org/jira/browse/FLINK-19630
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hive, Table SQL / Ecosystem
>Affects Versions: 1.11.2
>Reporter: Lsw_aka_laplace
>Priority: Critical
> Fix For: 1.12.0
>
> Attachments: image-2020-10-14-11-36-48-086.png, 
> image-2020-10-14-11-41-53-379.png, image-2020-10-14-11-42-57-353.png, 
> image-2020-10-14-11-48-51-310.png
>
>
> *ENV:*
> *Flink version 1.11.2*
> *Hive exec version: 2.0.1*
> *Hive file storing type :ORC*
> *SQL or Datastream: SQL API*
> *Kafka Connector :  custom Kafka connector which is based on Legacy API 
> (TableSource/`org.apache.flink.types.Row`)*
> *Hive Connector : totally follows the Flink-Hive-connector (we only made some 
> encapsulation upon it)*
> *Using StreamingFileCommitter:YES*
>  
>  
> *Description:*
>    try to execute the following SQL:
>     """
>       insert into hive_table (select * from kafka_table)
>     """
>    HIVE Table SQL seems like:
>     """
> CREATE TABLE `hive_table`(
>  // some fields
> PARTITIONED BY (
>  `dt` string,
>  `hour` string)
> STORED AS orc
> TBLPROPERTIES (
>  'orc.compress'='SNAPPY',
> 'type'='HIVE',
> 'sink.partition-commit.trigger'='process-time',
> 'sink.partition-commit.delay' = '1 h',
> 'sink.partition-commit.policy.kind' = 'metastore,success-file',
> )   
>    """
> When this job starts to process snapshot, here comes the weird exception:
> !image-2020-10-14-11-36-48-086.png|width=882,height=395!
> As we can see from the message, the owner thread should be the [Legacy Source 
> Thread], but the StreamTask thread, which represents the whole first stage, 
> was found instead. 
> So I checked the thread dump at once.
> !image-2020-10-14-11-41-53-379.png|width=801,height=244!
>                                                                      The 
> legacy Source Thread
>  
> !image-2020-10-14-11-42-57-353.png|width=846,height=226!
>                                                                The StreamTask 
> Thread
>  
>    According to the thread dump info and the Exception Message, I searched 
> and read certain source code and then *{color:#ffab00}DID A TEST{color}*
>  
> {color:#172b4d}   Since the Kafka connector is custom, I tried to make the 
> KafkaSource a separate stage by changing the TaskChainStrategy to Never. The 
> task topology is as follows:{color}
> {color:#172b4d}!image-2020-10-14-11-48-51-310.png|width=753,height=208!{color}
>  
> {color:#505f79}*Fortunately, it did work! No exception is thrown and 
> the checkpoint could be snapshotted successfully!*{color}
>  
>  
> So, from my perspective, there shall be something wrong when HiveWritingTask 
> and LegacySourceTask are chained together. The legacy source task is a 
> separate thread, which may be the cause of the exception mentioned above.
>  
>                                                                 
>  



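The exception reported in FLINK-19630 above is an owner-thread check firing: a resource created on the legacy source thread is later touched by the StreamTask thread once the operators are chained into one task. The general pattern can be illustrated with a small stdlib-Python sketch (hypothetical, not the actual Hive/ORC writer code):

```python
import threading

class ThreadConfinedResource:
    """Raises if used from any thread other than the one that created it,
    mimicking the owner-thread assertion seen in the reported exception."""

    def __init__(self):
        self._owner = threading.current_thread()

    def write(self, record):
        current = threading.current_thread()
        if current is not self._owner:
            raise RuntimeError(
                f"Owner thread should be {self._owner.name}, "
                f"but was accessed from {current.name}")
        # ... actual writing would happen here ...
        return record

errors = []
holder = []

def source_thread_main(out):
    # The resource is created on the "legacy source" thread ...
    out.append(ThreadConfinedResource())

t = threading.Thread(target=source_thread_main, args=(holder,),
                     name="LegacySource")
t.start()
t.join()

try:
    # ... but used from a different thread (here: the main thread),
    # analogous to the StreamTask thread in the chained topology.
    holder[0].write("row")
except RuntimeError as e:
    errors.append(str(e))

print(len(errors))  # 1
```

Breaking the chain (as the reporter did) puts creation and use back on the same thread, which is why the exception disappeared.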


[jira] [Updated] (FLINK-19253) SourceReaderTestBase.testAddSplitToExistingFetcher hangs

2020-11-10 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu updated FLINK-19253:

Fix Version/s: 1.12.0

> SourceReaderTestBase.testAddSplitToExistingFetcher hangs
> 
>
> Key: FLINK-19253
> URL: https://issues.apache.org/jira/browse/FLINK-19253
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Common
>Affects Versions: 1.12.0
>Reporter: Dian Fu
>Assignee: Xuannan Su
>Priority: Major
>  Labels: pull-request-available, test-stability
> Fix For: 1.12.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=6521=logs=fc5181b0-e452-5c8f-68de-1097947f6483=62110053-334f-5295-a0ab-80dd7e2babbf
> {code}
> 2020-09-15T10:51:35.5236837Z "SourceFetcher" #39 prio=5 os_prio=0 
> tid=0x7f70d0a57000 nid=0x858 in Object.wait() [0x7f6fd81f]
> 2020-09-15T10:51:35.5237447Zjava.lang.Thread.State: WAITING (on object 
> monitor)
> 2020-09-15T10:51:35.5237962Z  at java.lang.Object.wait(Native Method)
> 2020-09-15T10:51:35.5238886Z  - waiting on <0xc27f5be8> (a 
> java.util.ArrayDeque)
> 2020-09-15T10:51:35.5239380Z  at java.lang.Object.wait(Object.java:502)
> 2020-09-15T10:51:35.5240401Z  at 
> org.apache.flink.connector.base.source.reader.mocks.TestingSplitReader.fetch(TestingSplitReader.java:52)
> 2020-09-15T10:51:35.5241471Z  - locked <0xc27f5be8> (a 
> java.util.ArrayDeque)
> 2020-09-15T10:51:35.5242180Z  at 
> org.apache.flink.connector.base.source.reader.fetcher.FetchTask.run(FetchTask.java:58)
> 2020-09-15T10:51:35.5243245Z  at 
> org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.runOnce(SplitFetcher.java:128)
> 2020-09-15T10:51:35.5244263Z  at 
> org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.run(SplitFetcher.java:95)
> 2020-09-15T10:51:35.5245128Z  at 
> org.apache.flink.util.ThrowableCatchingRunnable.run(ThrowableCatchingRunnable.java:42)
> 2020-09-15T10:51:35.5245973Z  at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> 2020-09-15T10:51:35.5247081Z  at 
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 2020-09-15T10:51:35.5247816Z  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 2020-09-15T10:51:35.5248809Z  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 2020-09-15T10:51:35.5249463Z  at java.lang.Thread.run(Thread.java:748)
> 2020-09-15T10:51:35.5249827Z 
> 2020-09-15T10:51:35.5250383Z "SourceFetcher" #37 prio=5 os_prio=0 
> tid=0x7f70d0a4b000 nid=0x856 in Object.wait() [0x7f6f80cfa000]
> 2020-09-15T10:51:35.5251124Zjava.lang.Thread.State: WAITING (on object 
> monitor)
> 2020-09-15T10:51:35.5251636Z  at java.lang.Object.wait(Native Method)
> 2020-09-15T10:51:35.5252767Z  - waiting on <0xc298d0b8> (a 
> java.util.ArrayDeque)
> 2020-09-15T10:51:35.5253336Z  at java.lang.Object.wait(Object.java:502)
> 2020-09-15T10:51:35.5254184Z  at 
> org.apache.flink.connector.base.source.reader.mocks.TestingSplitReader.fetch(TestingSplitReader.java:52)
> 2020-09-15T10:51:35.5255220Z  - locked <0xc298d0b8> (a 
> java.util.ArrayDeque)
> 2020-09-15T10:51:35.5255678Z  at 
> org.apache.flink.connector.base.source.reader.fetcher.FetchTask.run(FetchTask.java:58)
> 2020-09-15T10:51:35.5256235Z  at 
> org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.runOnce(SplitFetcher.java:128)
> 2020-09-15T10:51:35.5256803Z  at 
> org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.run(SplitFetcher.java:95)
> 2020-09-15T10:51:35.5257351Z  at 
> org.apache.flink.util.ThrowableCatchingRunnable.run(ThrowableCatchingRunnable.java:42)
> 2020-09-15T10:51:35.5257838Z  at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> 2020-09-15T10:51:35.5258284Z  at 
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 2020-09-15T10:51:35.5258856Z  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 2020-09-15T10:51:35.5259350Z  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 2020-09-15T10:51:35.5260011Z  at java.lang.Thread.run(Thread.java:748)
> 2020-09-15T10:51:35.5260211Z 
> 2020-09-15T10:51:35.5260574Z "process reaper" #24 daemon prio=10 os_prio=0 
> tid=0x7f6f70042000 nid=0x844 waiting on condition [0x7f6fd832a000]
> 2020-09-15T10:51:35.5261036Zjava.lang.Thread.State: TIMED_WAITING 
> (parking)
> 2020-09-15T10:51:35.5261342Z  at sun.misc.Unsafe.park(Native Method)
> 2020-09-15T10:51:35.5261972Z  - parking to wait for  <0x815d0810> (a 
> java.util.concurrent.SynchronousQueue$TransferStack)
> 2020-09-15T10:51:35.5262456Z  at 
> 

[jira] [Updated] (FLINK-19982) AggregateReduceGroupingITCase.testSingleAggOnTable_SortAgg fails with "RuntimeException: Job restarted"

2020-11-10 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu updated FLINK-19982:

Fix Version/s: 1.12.0

> AggregateReduceGroupingITCase.testSingleAggOnTable_SortAgg fails with 
> "RuntimeException: Job restarted"
> ---
>
> Key: FLINK-19982
> URL: https://issues.apache.org/jira/browse/FLINK-19982
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.12.0
>Reporter: Robert Metzger
>Priority: Major
>  Labels: test-stability
> Fix For: 1.12.0
>
>
> https://dev.azure.com/georgeryan1322/Flink/_build/results?buildId=336=logs=a1590513-d0ea-59c3-3c7b-aad756c48f25=5129dea2-618b-5c74-1b8f-9ec63a37a8a6
> {code}
> [ERROR] Tests run: 16, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
> 59.688 s <<< FAILURE! - in 
> org.apache.flink.table.planner.runtime.batch.sql.agg.AggregateReduceGroupingITCase
> [ERROR] 
> testSingleAggOnTable_SortAgg(org.apache.flink.table.planner.runtime.batch.sql.agg.AggregateReduceGroupingITCase)
>   Time elapsed: 2.789 s  <<< ERROR!
> java.lang.RuntimeException: Job restarted
>   at 
> org.apache.flink.streaming.api.operators.collect.UncheckpointedCollectResultBuffer.sinkRestarted(UncheckpointedCollectResultBuffer.java:41)
>   at 
> org.apache.flink.streaming.api.operators.collect.AbstractCollectResultBuffer.dealWithResponse(AbstractCollectResultBuffer.java:87)
>   at 
> org.apache.flink.streaming.api.operators.collect.CollectResultFetcher.next(CollectResultFetcher.java:127)
>   at 
> org.apache.flink.streaming.api.operators.collect.CollectResultIterator.nextResultFromFetcher(CollectResultIterator.java:103)
>   at 
> org.apache.flink.streaming.api.operators.collect.CollectResultIterator.hasNext(CollectResultIterator.java:77)
>   at 
> org.apache.flink.table.planner.sinks.SelectTableSinkBase$RowIteratorWrapper.hasNext(SelectTableSinkBase.java:115)
>   at 
> org.apache.flink.table.api.internal.TableResultImpl$CloseableRowIteratorWrapper.hasNext(TableResultImpl.java:355)
>   at java.util.Iterator.forEachRemaining(Iterator.java:115)
>   at 
> org.apache.flink.util.CollectionUtil.iteratorToList(CollectionUtil.java:114)
>   at 
> org.apache.flink.table.planner.runtime.utils.BatchTestBase.executeQuery(BatchTestBase.scala:298)
>   at 
> org.apache.flink.table.planner.runtime.utils.BatchTestBase.check(BatchTestBase.scala:138)
>   at 
> org.apache.flink.table.planner.runtime.utils.BatchTestBase.checkResult(BatchTestBase.scala:104)
>   at 
> org.apache.flink.table.planner.runtime.batch.sql.agg.AggregateReduceGroupingITCase.testSingleAggOnTable(AggregateReduceGroupingITCase.scala:153)
>   at 
> org.apache.flink.table.planner.runtime.batch.sql.agg.AggregateReduceGroupingITCase.testSingleAggOnTable_SortAgg(AggregateReduceGroupingITCase.scala:122)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> {code}
> In the logs, I find occurrences of this:
> {code}
> 16:37:49,262 [main] WARN  
> org.apache.flink.streaming.api.operators.collect.CollectResultFetcher [] - An 
> exception occurs when fetching query results
> java.util.concurrent.ExecutionException: 
> org.apache.flink.runtime.dispatcher.UnavailableDispatcherOperationException: 
> Unable to get JobMasterGateway for initializing job. The requested operation 
> is not available while the JobManager is initializing.
>   at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) 
> ~[?:1.8.0_242]
>   at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908) 
> ~[?:1.8.0_242]
>   at 
> org.apache.flink.streaming.api.operators.collect.CollectResultFetcher.sendRequest(CollectResultFetcher.java:163)
>  ~[flink-streaming-java_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.streaming.api.operators.collect.CollectResultFetcher.next(CollectResultFetcher.java:134)
>  [flink-streaming-java_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.streaming.api.operators.collect.CollectResultIterator.nextResultFromFetcher(CollectResultIterator.java:103)
>  [flink-streaming-java_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.streaming.api.operators.collect.CollectResultIterator.hasNext(CollectResultIterator.java:77)
>  [flink-streaming-java_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.planner.sinks.SelectTableSinkBase$RowIteratorWrapper.hasNext(SelectTableSinkBase.java:115)
>  [classes/:?]
>   at 
> org.apache.flink.table.api.internal.TableResultImpl$CloseableRowIteratorWrapper.hasNext(TableResultImpl.java:355)
>  [flink-table-api-java-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 

[jira] [Updated] (FLINK-19989) Add collect operation in Python DataStream API

2020-11-10 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu updated FLINK-19989:

Fix Version/s: 1.13.0

> Add collect operation in Python DataStream API
> --
>
> Key: FLINK-19989
> URL: https://issues.apache.org/jira/browse/FLINK-19989
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python
>Reporter: Dian Fu
>Assignee: Nicholas Jiang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.13.0
>
>
> DataStream.executeAndCollect() has already been supported in FLINK-19508. We 
> should also support it in the Python DataStream API.



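A collect operation like the one requested in FLINK-19989 is essentially a client-side iterator that keeps pulling result batches from the running job until it signals end-of-stream. A minimal stdlib-Python sketch of that iterator shape (illustrative only; the real `executeAndCollect` / PyFlink API differs):

```python
def collect(fetch_batch):
    """Yield records batch by batch until the fetcher returns an empty batch,
    which stands in for the job's end-of-stream signal."""
    while True:
        batch = fetch_batch()
        if not batch:          # empty batch == end of stream
            return
        for record in batch:
            yield record

# Fake fetcher producing two batches and then end-of-stream:
batches = iter([[1, 2], [3], []])
results = list(collect(lambda: next(batches)))
print(results)  # [1, 2, 3]
```

The real implementation also has to handle job restarts and fetch offsets (see the CollectResultFetcher stack traces elsewhere in this digest), but the consuming side reduces to this pull loop.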


[jira] [Updated] (FLINK-20030) Barrier announcement causes outdated RemoteInputChannel#numBuffersOvertaken

2020-11-10 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu updated FLINK-20030:

Fix Version/s: 1.12.0

> Barrier announcement causes outdated RemoteInputChannel#numBuffersOvertaken
> ---
>
> Key: FLINK-20030
> URL: https://issues.apache.org/jira/browse/FLINK-20030
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing, Runtime / Network
>Affects Versions: 1.12.0
>Reporter: Arvid Heise
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>
> {{numBuffersOvertaken}} is set when the announcement is enqueued, but it can 
> take quite a while until the checkpoint actually starts, with quite a few 
> non-priority buffers being drained in the meantime.
> We should move away from {{numBuffersOvertaken}} and use sequence numbers.



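Using sequence numbers, as FLINK-20030 above proposes, lets the channel compute how many buffers the barrier actually overtook at the moment the checkpoint starts, instead of freezing a count when the announcement is enqueued. A hypothetical Python sketch of the idea (not Flink's actual code):

```python
class Channel:
    """Toy in-order channel that tags every enqueued buffer with a
    monotonically increasing sequence number."""

    def __init__(self):
        self.seq = 0      # sequence number of the next buffer
        self.queue = []   # (seq, buffer) pairs not yet consumed

    def enqueue(self, buffer):
        self.queue.append((self.seq, buffer))
        self.seq += 1

    def consume_one(self):
        self.queue.pop(0)

    def buffers_overtaken(self, barrier_seq):
        # Buffers still queued that were enqueued before the barrier:
        # computed on demand, so draining before checkpoint start is fine.
        return sum(1 for s, _ in self.queue if s < barrier_seq)

ch = Channel()
ch.enqueue("a")
ch.enqueue("b")
barrier_seq = ch.seq      # barrier announced after "a" and "b"
ch.enqueue("barrier")
ch.consume_one()          # "a" is drained before the checkpoint starts
print(ch.buffers_overtaken(barrier_seq))  # 1: only "b" is still overtaken
```

A counter snapshotted at announcement time would still say 2 here, which is exactly the staleness the ticket describes.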


[jira] [Commented] (FLINK-19630) Sink data in [ORC] format into Hive By using Legacy Table API caused unexpected Exception

2020-11-10 Thread Lsw_aka_laplace (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17229732#comment-17229732
 ] 

Lsw_aka_laplace commented on FLINK-19630:
-

[~dian.fu]

Hi, actually I have solved this problem on my side. But I'm not sure whether 
Flink needs to do something about it, because locally patching hive-exec may 
not be a good solution for everyone. I'm +1 for closing this issue, but you'd 
better ask [~lzljs3620320] and [~lirui] about this~

Thx~

> Sink data in [ORC] format into Hive By using Legacy Table API  caused 
> unexpected Exception
> --
>
> Key: FLINK-19630
> URL: https://issues.apache.org/jira/browse/FLINK-19630
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hive, Table SQL / Ecosystem
>Affects Versions: 1.11.2
>Reporter: Lsw_aka_laplace
>Priority: Critical
> Fix For: 1.12.0
>
> Attachments: image-2020-10-14-11-36-48-086.png, 
> image-2020-10-14-11-41-53-379.png, image-2020-10-14-11-42-57-353.png, 
> image-2020-10-14-11-48-51-310.png
>
>
> *ENV:*
> *Flink version 1.11.2*
> *Hive exec version: 2.0.1*
> *Hive file storing type :ORC*
> *SQL or Datastream: SQL API*
> *Kafka Connector :  custom Kafka connector which is based on Legacy API 
> (TableSource/`org.apache.flink.types.Row`)*
> *Hive Connector : totally follows the Flink-Hive-connector (we only made some 
> encapsulation upon it)*
> *Using StreamingFileCommitter:YES*
>  
>  
> *Description:*
>    try to execute the following SQL:
>     """
>       insert into hive_table (select * from kafka_table)
>     """
>    HIVE Table SQL seems like:
>     """
> CREATE TABLE `hive_table`(
>  // some fields
> PARTITIONED BY (
>  `dt` string,
>  `hour` string)
> STORED AS orc
> TBLPROPERTIES (
>  'orc.compress'='SNAPPY',
> 'type'='HIVE',
> 'sink.partition-commit.trigger'='process-time',
> 'sink.partition-commit.delay' = '1 h',
> 'sink.partition-commit.policy.kind' = 'metastore,success-file',
> )   
>    """
> When this job starts to process snapshot, here comes the weird exception:
> !image-2020-10-14-11-36-48-086.png|width=882,height=395!
> As we can see from the message, the owner thread should be the [Legacy Source 
> Thread], but the StreamTask thread, which represents the whole first stage, 
> was found instead. 
> So I checked the thread dump at once.
> !image-2020-10-14-11-41-53-379.png|width=801,height=244!
>                                                                      The 
> legacy Source Thread
>  
> !image-2020-10-14-11-42-57-353.png|width=846,height=226!
>                                                                The StreamTask 
> Thread
>  
>    According to the thread dump info and the Exception Message, I searched 
> and read certain source code and then *{color:#ffab00}DID A TEST{color}*
>  
> {color:#172b4d}   Since the Kafka connector is custom, I tried to make the 
> KafkaSource a separate stage by changing the TaskChainStrategy to Never. The 
> task topology is as follows:{color}
> {color:#172b4d}!image-2020-10-14-11-48-51-310.png|width=753,height=208!{color}
>  
> {color:#505f79}*Fortunately, it did work! No exception is thrown and 
> the checkpoint could be snapshotted successfully!*{color}
>  
>  
> So, from my perspective, there shall be something wrong when HiveWritingTask 
> and LegacySourceTask are chained together. The legacy source task is a 
> separate thread, which may be the cause of the exception mentioned above.
>  
>                                                                 
>  





[jira] [Closed] (FLINK-20066) BatchPandasUDAFITTests.test_group_aggregate_with_aux_group unstable

2020-11-10 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu closed FLINK-20066.
---
Resolution: Fixed

This should have been addressed by FLINK-19842.

> BatchPandasUDAFITTests.test_group_aggregate_with_aux_group unstable
> ---
>
> Key: FLINK-20066
> URL: https://issues.apache.org/jira/browse/FLINK-20066
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.12.0
>Reporter: Dian Fu
>Priority: Critical
>  Labels: test-stability
> Fix For: 1.12.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=9361=logs=bdd9ea51-4de2-506a-d4d9-f3930e4d2355=98717c4f-b888-5636-bb1e-db7aca25755e
> {code}
> 2020-11-09T23:41:41.8853547Z === FAILURES 
> ===
> 2020-11-09T23:41:41.8854000Z __ 
> BatchPandasUDAFITTests.test_group_aggregate_with_aux_group __
> 2020-11-09T23:41:41.8854324Z 
> 2020-11-09T23:41:41.8854647Z self = 
>  testMethod=test_group_aggregate_with_aux_group>
> 2020-11-09T23:41:41.8854956Z 
> 2020-11-09T23:41:41.8855205Z def 
> test_group_aggregate_with_aux_group(self):
> 2020-11-09T23:41:41.8855521Z t = self.t_env.from_elements(
> 2020-11-09T23:41:41.8858372Z [(1, 2, 3), (3, 2, 3), (2, 1, 3), 
> (1, 5, 4), (1, 8, 6), (2, 3, 4)],
> 2020-11-09T23:41:41.8858807Z DataTypes.ROW(
> 2020-11-09T23:41:41.8859091Z [DataTypes.FIELD("a", 
> DataTypes.TINYINT()),
> 2020-11-09T23:41:41.8859453Z  DataTypes.FIELD("b", 
> DataTypes.SMALLINT()),
> 2020-11-09T23:41:41.8859783Z  DataTypes.FIELD("c", 
> DataTypes.INT())]))
> 2020-11-09T23:41:41.8860012Z 
> 2020-11-09T23:41:41.8860255Z table_sink = 
> source_sink_utils.TestAppendSink(
> 2020-11-09T23:41:41.8863051Z ['a', 'b', 'c', 'd'],
> 2020-11-09T23:41:41.8863523Z [DataTypes.TINYINT(), 
> DataTypes.INT(), DataTypes.FLOAT(), DataTypes.INT()])
> 2020-11-09T23:41:41.8864048Z 
> self.t_env.register_table_sink("Results", table_sink)
> 2020-11-09T23:41:41.8864715Z 
> self.t_env.get_config().get_configuration().set_string('python.metric.enabled',
>  'true')
> 2020-11-09T23:41:41.8865161Z self.t_env.register_function("max_add", 
> udaf(MaxAdd(),
> 2020-11-09T23:41:41.8865573Z  
> result_type=DataTypes.INT(),
> 2020-11-09T23:41:41.8865999Z  
> func_type="pandas"))
> 2020-11-09T23:41:41.8866426Z 
> self.t_env.create_temporary_system_function("mean_udaf", mean_udaf)
> 2020-11-09T23:41:41.8866759Z t.group_by("a") \
> 2020-11-09T23:41:41.8867052Z .select("a, a + 1 as b, a + 2 as c") 
> \
> 2020-11-09T23:41:41.8867352Z .group_by("a, b") \
> 2020-11-09T23:41:41.8867660Z .select("a, b, mean_udaf(b), 
> max_add(b, c, 1)") \
> 2020-11-09T23:41:41.8868026Z >   .execute_insert("Results") \
> 2020-11-09T23:41:41.8868293Z .wait()
> 2020-11-09T23:41:41.8868446Z 
> 2020-11-09T23:41:41.8868704Z pyflink/table/tests/test_pandas_udaf.py:95: 
> 2020-11-09T23:41:41.8869077Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> 2020-11-09T23:41:41.8869464Z pyflink/table/table_result.py:76: in wait
> 2020-11-09T23:41:41.8870150Z get_method(self._j_table_result, "await")()
> 2020-11-09T23:41:41.8870850Z 
> .tox/py37-cython/lib/python3.7/site-packages/py4j/java_gateway.py:1286: in 
> __call__
> 2020-11-09T23:41:41.8871415Z answer, self.gateway_client, self.target_id, 
> self.name)
> 2020-11-09T23:41:41.8871768Z pyflink/util/exceptions.py:147: in deco
> 2020-11-09T23:41:41.8872032Z return f(*a, **kw)
> 2020-11-09T23:41:41.8872378Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> 2020-11-09T23:41:41.8872629Z 
> 2020-11-09T23:41:41.8872978Z answer = 'xro24238'
> 2020-11-09T23:41:41.8873296Z gateway_client = 
> 
> 2020-11-09T23:41:41.8873792Z target_id = 'o24237', name = 'await'
> 2020-11-09T23:41:41.8874097Z 
> 2020-11-09T23:41:41.8874433Z def get_return_value(answer, gateway_client, 
> target_id=None, name=None):
> 2020-11-09T23:41:41.8874893Z """Converts an answer received from the 
> Java gateway into a Python object.
> 2020-11-09T23:41:41.8875212Z 
> 2020-11-09T23:41:41.8875515Z For example, string representation of 
> integers are converted to Python
> 2020-11-09T23:41:41.8875922Z integer, string representation of 
> objects are converted to JavaObject
> 2020-11-09T23:41:41.8876255Z instances, etc.
> 2020-11-09T23:41:41.8876584Z 
> 2020-11-09T23:41:41.8876873Z 

[GitHub] [flink] flinkbot edited a comment on pull request #14021: [FLINK-13733][connector/kafka] Make FlinkKafkaInternalProducerITCase more robust

2020-11-10 Thread GitBox


flinkbot edited a comment on pull request #14021:
URL: https://github.com/apache/flink/pull/14021#issuecomment-724820662


   
   ## CI report:
   
   * 685a334e4b6fa71ee9a95c74aaf3b7ba2557d05e Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9417)
 
   * 6b1819154e14ebc0c9525536fba5757b136080bc Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9439)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   






