[GitHub] [flink] flinkbot edited a comment on pull request #14434: [DOC]fix to update file_sink & streamfile_sink doc for error code

2020-12-18 Thread GitBox


flinkbot edited a comment on pull request #14434:
URL: https://github.com/apache/flink/pull/14434#issuecomment-748424188


   
   ## CI report:
   
   * c842f89ec6dc499aae7892af964db77f595d547f Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=11062)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #14434: [DOC]fix to update file_sink & streamfile_sink doc for error code

2020-12-18 Thread GitBox


flinkbot edited a comment on pull request #14434:
URL: https://github.com/apache/flink/pull/14434#issuecomment-748424188


   
   ## CI report:
   
   * c842f89ec6dc499aae7892af964db77f595d547f Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=11062)
 
   
   
   







[GitHub] [flink] flinkbot commented on pull request #14434: [DOC]fix to update file_sink & streamfile_sink doc for error code

2020-12-18 Thread GitBox


flinkbot commented on pull request #14434:
URL: https://github.com/apache/flink/pull/14434#issuecomment-748424188


   
   ## CI report:
   
   * c842f89ec6dc499aae7892af964db77f595d547f UNKNOWN
   
   
   







[GitHub] [flink] flinkbot edited a comment on pull request #14433: [typo] fix kubernetes ha typo

2020-12-18 Thread GitBox


flinkbot edited a comment on pull request #14433:
URL: https://github.com/apache/flink/pull/14433#issuecomment-748371062


   
   ## CI report:
   
   * 7511cbed092a30ac95caf96f2205bc5f299fb3b7 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=11058)
 
   
   
   







[GitHub] [flink] flinkbot commented on pull request #14434: [DOC]fix to update file_sink & streamfile_sink doc for error code

2020-12-18 Thread GitBox


flinkbot commented on pull request #14434:
URL: https://github.com/apache/flink/pull/14434#issuecomment-748423025


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit c842f89ec6dc499aae7892af964db77f595d547f (Sat Dec 19 
05:35:55 UTC 2020)
   
   **Warnings:**
* **Invalid pull request title: No valid Jira ID provided**
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process.

   The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

   Bot commands
   The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until `architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   







[GitHub] [flink] hehuiyuan opened a new pull request #14434: [DOC]fix to update file_sink & streamfile_sink doc for error code

2020-12-18 Thread GitBox


hehuiyuan opened a new pull request #14434:
URL: https://github.com/apache/flink/pull/14434


   
   ## What is the purpose of the change
   
   There are some errors in the documentation.
   
![image](https://user-images.githubusercontent.com/18002496/102681692-97d8d280-41fe-11eb-98c0-487cf6b4d765.png)
   
   
   
   
   







[jira] [Commented] (FLINK-20662) UnalignedCheckpointITCase.execute failed with IndexOutOfBoundsException

2020-12-18 Thread Huang Xingbo (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17252096#comment-17252096
 ] 

Huang Xingbo commented on FLINK-20662:
--

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=11057=logs=219e462f-e75e-506c-3671-5017d866ccf6=4c5dc768-5c82-5ab0-660d-086cb90b76a0

> UnalignedCheckpointITCase.execute failed with IndexOutOfBoundsException
> ---
>
> Key: FLINK-20662
> URL: https://issues.apache.org/jira/browse/FLINK-20662
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.12.0, 1.13.0
>Reporter: Huang Xingbo
>Priority: Major
>  Labels: test-stability
>
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=10988=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=a99e99c7-21cd-5a1f-7274-585e62b72f56]
> {code:java}
> 2020-12-18T01:01:13.7845549Z [ERROR] Tests run: 10, Failures: 0, Errors: 1, 
> Skipped: 0, Time elapsed: 143.951 s <<< FAILURE! - in 
> org.apache.flink.test.checkpointing.UnalignedCheckpointITCase
> 2020-12-18T01:01:13.7848530Z [ERROR] execute[Parallel cogroup, p = 
> 5](org.apache.flink.test.checkpointing.UnalignedCheckpointITCase)  Time 
> elapsed: 12.725 s  <<< ERROR!
> 2020-12-18T01:01:13.7849231Z 
> org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
> 2020-12-18T01:01:13.7849788Z  at 
> org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:147)
> 2020-12-18T01:01:13.7872152Z  at 
> org.apache.flink.runtime.minicluster.MiniClusterJobClient.lambda$getJobExecutionResult$2(MiniClusterJobClient.java:119)
> 2020-12-18T01:01:13.7873528Z  at 
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
> 2020-12-18T01:01:13.7875322Z  at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
> 2020-12-18T01:01:13.7875932Z  at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
> 2020-12-18T01:01:13.7876475Z  at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
> 2020-12-18T01:01:13.7877098Z  at 
> org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$0(AkkaInvocationHandler.java:229)
> 2020-12-18T01:01:13.7877732Z  at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
> 2020-12-18T01:01:13.7878307Z  at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
> 2020-12-18T01:01:13.7879078Z  at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
> 2020-12-18T01:01:13.7879795Z  at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
> 2020-12-18T01:01:13.7880333Z  at 
> org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:996)
> 2020-12-18T01:01:13.7880834Z  at 
> akka.dispatch.OnComplete.internal(Future.scala:264)
> 2020-12-18T01:01:13.7881266Z  at 
> akka.dispatch.OnComplete.internal(Future.scala:261)
> 2020-12-18T01:01:13.7881691Z  at 
> akka.dispatch.japi$CallbackBridge.apply(Future.scala:191)
> 2020-12-18T01:01:13.7882146Z  at 
> akka.dispatch.japi$CallbackBridge.apply(Future.scala:188)
> 2020-12-18T01:01:13.7882609Z  at 
> scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
> 2020-12-18T01:01:13.7883142Z  at 
> org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:74)
> 2020-12-18T01:01:13.7883726Z  at 
> scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
> 2020-12-18T01:01:13.7884258Z  at 
> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
> 2020-12-18T01:01:13.7884741Z  at 
> akka.pattern.PromiseActorRef.$bang(AskSupport.scala:572)
> 2020-12-18T01:01:13.7885279Z  at 
> akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:22)
> 2020-12-18T01:01:13.7885901Z  at 
> akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:21)
> 2020-12-18T01:01:13.7886455Z  at 
> scala.concurrent.Future$$anonfun$andThen$1.apply(Future.scala:436)
> 2020-12-18T01:01:13.7886954Z  at 
> scala.concurrent.Future$$anonfun$andThen$1.apply(Future.scala:435)
> 2020-12-18T01:01:13.7887442Z  at 
> scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
> 2020-12-18T01:01:13.7887955Z  at 
> akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
> 2020-12-18T01:01:13.7888577Z  at 
> akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91)
> 2020-12-18T01:01:13.7889215Z  at 
> akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
> 2020-12-18T01:01:13.7889816Z  at 
> 

[jira] [Commented] (FLINK-20476) New File Sink end-to-end test Failed

2020-12-18 Thread Yun Gao (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17252095#comment-17252095
 ] 

Yun Gao commented on FLINK-20476:
-

It also seems that one TaskManager keeps iterating through the following 
process:
{code:java}
2020-12-18 07:45:26,101 INFO  
org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Close 
JobManager connection for job eccf49f1ff5186c9f6b2e80a02e83d15.
2020-12-18 07:45:26,109 INFO  
org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Receive slot 
request 73b75d109b147c9e58af72cc55b8c373 for job 
eccf49f1ff5186c9f6b2e80a02e83d15 from resource manager with leader id 
.
2020-12-18 07:45:26,109 INFO  
org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Allocated 
slot for 73b75d109b147c9e58af72cc55b8c373.
2020-12-18 07:45:26,109 INFO  
org.apache.flink.runtime.taskexecutor.DefaultJobLeaderService [] - Add job 
eccf49f1ff5186c9f6b2e80a02e83d15 for job leader monitoring.
2020-12-18 07:45:26,109 INFO  
org.apache.flink.runtime.taskexecutor.DefaultJobLeaderService [] - Try to 
register at job manager 
akka.ssl.tcp://flink@localhost:6123/user/rpc/jobmanager_2 with leader id 
----.
2020-12-18 07:45:26,129 INFO  
org.apache.flink.runtime.taskexecutor.DefaultJobLeaderService [] - Resolved 
JobManager address, beginning registration
2020-12-18 07:45:26,162 INFO  
org.apache.flink.runtime.taskexecutor.DefaultJobLeaderService [] - Successful 
registration at job manager 
akka.ssl.tcp://flink@localhost:6123/user/rpc/jobmanager_2 for job 
eccf49f1ff5186c9f6b2e80a02e83d15.
2020-12-18 07:45:26,162 INFO  
org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Establish 
JobManager connection for job eccf49f1ff5186c9f6b2e80a02e83d15.
2020-12-18 07:45:26,162 INFO  
org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Offer 
reserved slots to the leader of job eccf49f1ff5186c9f6b2e80a02e83d15.
2020-12-18 07:45:26,174 INFO  
org.apache.flink.runtime.taskexecutor.slot.TaskSlotTableImpl [] - Free slot 
TaskSlot(index:0, state:ALLOCATED, resource profile: 
ResourceProfile{cpuCores=1., taskHeapMemory=384.000mb 
(402653174 bytes), taskOffHeapMemory=0 bytes, managedMemory=512.000mb 
(536870920 bytes), networkMemory=128.000mb (134217730 bytes)}, allocationId: 
73b75d109b147c9e58af72cc55b8c373, jobId: eccf49f1ff5186c9f6b2e80a02e83d15).
2020-12-18 07:45:26,184 INFO  
org.apache.flink.runtime.taskexecutor.DefaultJobLeaderService [] - Remove job 
eccf49f1ff5186c9f6b2e80a02e83d15 from job leader monitoring.
{code}
and the ResourceManager keeps printing
{code:java}
Received resource declaration for job eccf49f1ff5186c9f6b2e80a02e83d15: 
[ResourceRequirement{resourceProfile=ResourceProfile{UNKNOWN}, 
numberOfRequiredSlots=4}]
{code}
Also cc [~xintongsong] [~zhuzh]

 

> New File Sink end-to-end test Failed
> 
>
> Key: FLINK-20476
> URL: https://issues.apache.org/jira/browse/FLINK-20476
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / FileSystem, Tests
>Affects Versions: 1.13.0, 1.12.1
>Reporter: Huang Xingbo
>Priority: Blocker
>  Labels: test-stability
> Fix For: 1.13.0, 1.12.1
>
>
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=10502=logs=c88eea3b-64a0-564d-0031-9fdcd7b8abee=ff888d9b-cd34-53cc-d90f-3e446d355529]
> {code}
> 2020-12-03T23:22:43.8578352Z Dec 03 23:22:43 Starting taskexecutor daemon on 
> host fv-az586-109.
> 2020-12-03T23:22:43.8587276Z Dec 03 23:22:43 Waiting for restart to happen
> 2020-12-03T23:22:43.8587669Z Dec 03 23:22:43 Still waiting for restarts. 
> Expected: 1 Current: 0
> 2020-12-03T23:22:48.9939434Z Dec 03 23:22:48 Still waiting for restarts. 
> Expected: 1 Current: 0
> 2020-12-03T23:22:54.1236439Z Dec 03 23:22:54 Still waiting for restarts. 
> Expected: 1 Current: 0
> 2020-12-03T23:22:59.2469617Z Dec 03 23:22:59 Still waiting for restarts. 
> Expected: 1 Current: 0
> 2020-12-03T23:23:04.3730041Z Dec 03 23:23:04 Still waiting for restarts. 
> Expected: 1 Current: 0
> 2020-12-03T23:23:09.5227739Z Dec 03 23:23:09 Still waiting for restarts. 
> Expected: 1 Current: 0
> 2020-12-03T23:23:14.6572986Z Dec 03 23:23:14 Still waiting for restarts. 
> Expected: 1 Current: 0
> 2020-12-03T23:23:19.7762483Z Dec 03 23:23:19 Still waiting for restarts. 
> Expected: 1 Current: 0
> 2020-12-03T23:23:24.8973187Z Dec 03 23:23:24 Still waiting for restarts. 
> Expected: 1 Current: 0
> 2020-12-03T23:23:30.0272934Z Dec 03 23:23:30 Still waiting for restarts. 
> Expected: 1 Current: 0
> 2020-12-03T23:23:35.2332771Z Dec 03 23:23:35 Still waiting for restarts. 
> Expected: 1 Current: 0
> 2020-12-03T23:23:40.3766421Z Dec 03 23:23:40 Still waiting for restarts. 
> Expected: 1 Current: 0
> 

[jira] [Commented] (FLINK-20329) Elasticsearch7DynamicSinkITCase hangs

2020-12-18 Thread Huang Xingbo (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17252094#comment-17252094
 ] 

Huang Xingbo commented on FLINK-20329:
--

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=11055=logs=ba53eb01-1462-56a3-8e98-0dd97fbcaab5=bfbc6239-57a0-5db0-63f3-41551b4f7d51

> Elasticsearch7DynamicSinkITCase hangs
> -
>
> Key: FLINK-20329
> URL: https://issues.apache.org/jira/browse/FLINK-20329
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / ElasticSearch
>Affects Versions: 1.12.0
>Reporter: Dian Fu
>Priority: Critical
>  Labels: test-stability
> Fix For: 1.13.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=10052=logs=d44f43ce-542c-597d-bf94-b0718c71e5e8=03dca39c-73e8-5aaf-601d-328ae5c35f20
> {code}
> 2020-11-24T16:04:05.9260517Z [INFO] Running 
> org.apache.flink.streaming.connectors.elasticsearch.table.Elasticsearch7DynamicSinkITCase
> 2020-11-24T16:19:25.5481231Z 
> ==
> 2020-11-24T16:19:25.5483549Z Process produced no output for 900 seconds.
> 2020-11-24T16:19:25.5484064Z 
> ==
> 2020-11-24T16:19:25.5484498Z 
> ==
> 2020-11-24T16:19:25.5484882Z The following Java processes are running (JPS)
> 2020-11-24T16:19:25.5485475Z 
> ==
> 2020-11-24T16:19:25.5694497Z Picked up JAVA_TOOL_OPTIONS: 
> -XX:+HeapDumpOnOutOfMemoryError
> 2020-11-24T16:19:25.7263048Z 16192 surefirebooter5057948964630155904.jar
> 2020-11-24T16:19:25.7263515Z 18566 Jps
> 2020-11-24T16:19:25.7263709Z 959 Launcher
> 2020-11-24T16:19:25.7411148Z 
> ==
> 2020-11-24T16:19:25.7427013Z Printing stack trace of Java process 16192
> 2020-11-24T16:19:25.7427369Z 
> ==
> 2020-11-24T16:19:25.7484365Z Picked up JAVA_TOOL_OPTIONS: 
> -XX:+HeapDumpOnOutOfMemoryError
> 2020-11-24T16:19:26.0848776Z 2020-11-24 16:19:26
> 2020-11-24T16:19:26.0849578Z Full thread dump OpenJDK 64-Bit Server VM 
> (25.275-b01 mixed mode):
> 2020-11-24T16:19:26.0849831Z 
> 2020-11-24T16:19:26.0850185Z "Attach Listener" #32 daemon prio=9 os_prio=0 
> tid=0x7fc148001000 nid=0x48e7 waiting on condition [0x]
> 2020-11-24T16:19:26.0850595Z    java.lang.Thread.State: RUNNABLE
> 2020-11-24T16:19:26.0850814Z 
> 2020-11-24T16:19:26.0851375Z "testcontainers-ryuk" #31 daemon prio=5 
> os_prio=0 tid=0x7fc251232000 nid=0x3fb0 in Object.wait() 
> [0x7fc1012c4000]
> 2020-11-24T16:19:26.0854688Z    java.lang.Thread.State: TIMED_WAITING (on 
> object monitor)
> 2020-11-24T16:19:26.0855379Z  at java.lang.Object.wait(Native Method)
> 2020-11-24T16:19:26.0855844Z  at 
> org.testcontainers.utility.ResourceReaper.lambda$null$1(ResourceReaper.java:142)
> 2020-11-24T16:19:26.0857272Z  - locked <0x8e2bd2d0> (a 
> java.util.ArrayList)
> 2020-11-24T16:19:26.0857977Z  at 
> org.testcontainers.utility.ResourceReaper$$Lambda$93/1981729428.run(Unknown 
> Source)
> 2020-11-24T16:19:26.0858471Z  at 
> org.rnorth.ducttape.ratelimits.RateLimiter.doWhenReady(RateLimiter.java:27)
> 2020-11-24T16:19:26.0858961Z  at 
> org.testcontainers.utility.ResourceReaper.lambda$start$2(ResourceReaper.java:133)
> 2020-11-24T16:19:26.0859422Z  at 
> org.testcontainers.utility.ResourceReaper$$Lambda$92/40191541.run(Unknown 
> Source)
> 2020-11-24T16:19:26.0859788Z  at java.lang.Thread.run(Thread.java:748)
> 2020-11-24T16:19:26.0860030Z 
> 2020-11-24T16:19:26.0860371Z "process reaper" #24 daemon prio=10 os_prio=0 
> tid=0x7fc0f803b800 nid=0x3f92 waiting on condition [0x7fc10296e000]
> 2020-11-24T16:19:26.0860913Z    java.lang.Thread.State: TIMED_WAITING 
> (parking)
> 2020-11-24T16:19:26.0861387Z  at sun.misc.Unsafe.park(Native Method)
> 2020-11-24T16:19:26.0862495Z  - parking to wait for  <0x8814bf30> (a 
> java.util.concurrent.SynchronousQueue$TransferStack)
> 2020-11-24T16:19:26.0863253Z  at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> 2020-11-24T16:19:26.0863760Z  at 
> java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
> 2020-11-24T16:19:26.0864274Z  at 
> java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
> 2020-11-24T16:19:26.0864762Z  at 
> java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:941)
> 2020-11-24T16:19:26.0865299Z  at 
> 

[jira] [Created] (FLINK-20679) JobMasterTest.testSlotRequestTimeoutWhenNoSlotOffering test failed

2020-12-18 Thread Huang Xingbo (Jira)
Huang Xingbo created FLINK-20679:


 Summary: JobMasterTest.testSlotRequestTimeoutWhenNoSlotOffering 
test failed
 Key: FLINK-20679
 URL: https://issues.apache.org/jira/browse/FLINK-20679
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.12.0, 1.13.0
Reporter: Huang Xingbo


[https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=11055=logs=d89de3df-4600-5585-dadc-9bbc9a5e661c=19336553-69ec-5b03-471a-791a483cced6]
{code:java}
[ERROR] Failures: 
[ERROR]   JobMasterTest.testSlotRequestTimeoutWhenNoSlotOffering:972 
Expected: a collection with size <1>
 but: collection size was <0>
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-20476) New File Sink end-to-end test Failed

2020-12-18 Thread Yun Gao (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17252093#comment-17252093
 ] 

Yun Gao commented on FLINK-20476:
-

Hi [~hxbks2ks], thanks a lot for reporting the issue! The new failures appear 
to be different from the previous ones, though: this time the job does not 
acquire enough slots to be rescheduled after failover. Do you know whether the 
testing branch has any new changes in resource management or the scheduler?

> New File Sink end-to-end test Failed
> 
>
> Key: FLINK-20476
> URL: https://issues.apache.org/jira/browse/FLINK-20476
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / FileSystem, Tests
>Affects Versions: 1.13.0, 1.12.1
>Reporter: Huang Xingbo
>Priority: Blocker
>  Labels: test-stability
> Fix For: 1.13.0, 1.12.1
>
>
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=10502=logs=c88eea3b-64a0-564d-0031-9fdcd7b8abee=ff888d9b-cd34-53cc-d90f-3e446d355529]
> {code}
> 2020-12-03T23:22:43.8578352Z Dec 03 23:22:43 Starting taskexecutor daemon on 
> host fv-az586-109.
> 2020-12-03T23:22:43.8587276Z Dec 03 23:22:43 Waiting for restart to happen
> 2020-12-03T23:22:43.8587669Z Dec 03 23:22:43 Still waiting for restarts. 
> Expected: 1 Current: 0
> 2020-12-03T23:22:48.9939434Z Dec 03 23:22:48 Still waiting for restarts. 
> Expected: 1 Current: 0
> 2020-12-03T23:22:54.1236439Z Dec 03 23:22:54 Still waiting for restarts. 
> Expected: 1 Current: 0
> 2020-12-03T23:22:59.2469617Z Dec 03 23:22:59 Still waiting for restarts. 
> Expected: 1 Current: 0
> 2020-12-03T23:23:04.3730041Z Dec 03 23:23:04 Still waiting for restarts. 
> Expected: 1 Current: 0
> 2020-12-03T23:23:09.5227739Z Dec 03 23:23:09 Still waiting for restarts. 
> Expected: 1 Current: 0
> 2020-12-03T23:23:14.6572986Z Dec 03 23:23:14 Still waiting for restarts. 
> Expected: 1 Current: 0
> 2020-12-03T23:23:19.7762483Z Dec 03 23:23:19 Still waiting for restarts. 
> Expected: 1 Current: 0
> 2020-12-03T23:23:24.8973187Z Dec 03 23:23:24 Still waiting for restarts. 
> Expected: 1 Current: 0
> 2020-12-03T23:23:30.0272934Z Dec 03 23:23:30 Still waiting for restarts. 
> Expected: 1 Current: 0
> 2020-12-03T23:23:35.2332771Z Dec 03 23:23:35 Still waiting for restarts. 
> Expected: 1 Current: 0
> 2020-12-03T23:23:40.3766421Z Dec 03 23:23:40 Still waiting for restarts. 
> Expected: 1 Current: 0
> 2020-12-03T23:23:45.5103677Z Dec 03 23:23:45 Still waiting for restarts. 
> Expected: 1 Current: 0
> 2020-12-03T23:23:50.6382894Z Dec 03 23:23:50 Still waiting for restarts. 
> Expected: 1 Current: 0
> 2020-12-03T23:23:55.7908088Z Dec 03 23:23:55 Still waiting for restarts. 
> Expected: 1 Current: 0
> 2020-12-03T23:24:00.9276393Z Dec 03 23:24:00 Still waiting for restarts. 
> Expected: 1 Current: 0
> 2020-12-03T23:24:06.0966785Z Dec 03 23:24:06 Still waiting for restarts. 
> Expected: 1 Current: 0
> 2020-12-03T23:24:11.2497761Z Dec 03 23:24:11 Still waiting for restarts. 
> Expected: 1 Current: 0
> 2020-12-03T23:24:16.4118742Z Dec 03 23:24:16 Still waiting for restarts. 
> Expected: 1 Current: 0
> 2020-12-03T23:24:21.5640591Z Dec 03 23:24:21 Still waiting for restarts. 
> Expected: 1 Current: 0
> 2020-12-03T23:24:26.7080816Z Dec 03 23:24:26 Still waiting for restarts. 
> Expected: 1 Current: 0
> 2020-12-03T23:24:31.8460471Z Dec 03 23:24:31 Still waiting for restarts. 
> Expected: 1 Current: 0
> 2020-12-03T23:24:36.9640393Z Dec 03 23:24:36 Still waiting for restarts. 
> Expected: 1 Current: 0
> 2020-12-03T23:24:42.1055030Z Dec 03 23:24:42 Still waiting for restarts. 
> Expected: 1 Current: 0
> 2020-12-03T23:24:47.2399707Z Dec 03 23:24:47 Still waiting for restarts. 
> Expected: 1 Current: 0
> 2020-12-03T23:24:52.3555612Z Dec 03 23:24:52 Still waiting for restarts. 
> Expected: 1 Current: 0
> 2020-12-03T23:24:57.4903920Z Dec 03 23:24:57 Still waiting for restarts. 
> Expected: 1 Current: 0
> 2020-12-03T23:25:02.6275471Z Dec 03 23:25:02 Still waiting for restarts. 
> Expected: 1 Current: 0
> 2020-12-03T23:25:07.7481675Z Dec 03 23:25:07 Still waiting for restarts. 
> Expected: 1 Current: 0
> 2020-12-03T23:25:12.9002926Z Dec 03 23:25:12 Still waiting for restarts. 
> Expected: 1 Current: 0
> 2020-12-03T23:25:18.0361767Z Dec 03 23:25:18 Still waiting for restarts. 
> Expected: 1 Current: 0
> 2020-12-03T23:25:23.1776688Z Dec 03 23:25:23 Still waiting for restarts. 
> Expected: 1 Current: 0
> 2020-12-03T23:25:28.3029352Z Dec 03 23:25:28 Still waiting for restarts. 
> Expected: 1 Current: 0
> 2020-12-03T23:25:33.4175706Z Dec 03 23:25:33 Still waiting for restarts. 
> Expected: 1 Current: 0
> 2020-12-03T23:25:38.5444337Z Dec 03 23:25:38 Still waiting for restarts. 
> Expected: 1 Current: 0
> 2020-12-03T23:25:43.6699914Z Dec 03 23:25:43 Still waiting for restarts. 
> Expected: 1 Current: 0
> 

[jira] [Commented] (FLINK-20662) UnalignedCheckpointITCase.execute failed with IndexOutOfBoundsException

2020-12-18 Thread Huang Xingbo (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17252090#comment-17252090
 ] 

Huang Xingbo commented on FLINK-20662:
--

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=11055=logs=34f41360-6c0d-54d3-11a1-0292a2def1d9=2d56e022-1ace-542f-bf1a-b37dd63243f2


[jira] [Closed] (FLINK-20666) Fix the deserialized Row losing the field_name information in PyFlink

2020-12-18 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu closed FLINK-20666.
---
Fix Version/s: 1.12.1
   1.11.4
   1.13.0
   Resolution: Fixed

Fixed in
- master via 2061b720f6995c0701df390dce767d1e2a9645b2
- release-1.12 via cd4d3f1bed19c5d3223f493a200922734e7f87d3
- release-1.11 via 3c0253936f2e6cf8d4fcb3d5b4c6050de2626d9f

> Fix the deserialized Row losing the field_name information in PyFlink
> -
>
> Key: FLINK-20666
> URL: https://issues.apache.org/jira/browse/FLINK-20666
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.12.0, 1.11.2
>Reporter: Huang Xingbo
>Assignee: Huang Xingbo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.13.0, 1.11.4, 1.12.1
>
>
> Now, the deserialized Row loses the field_name information.
> {code:java}
> @udf(result_type=DataTypes.STRING())
> def get_string_element(my_list):
>     my_string = 'xxx'
>     for element in my_list:
>         if element.integer_element == 2:  # element lost the field_name information
>             my_string = element.string_element
>     return my_string
> t = t_env.from_elements(
>     [("1", [Row(3, "flink")]), ("3", [Row(2, "pyflink")]), ("2", [Row(2, "python")])],
>     DataTypes.ROW(
>         [DataTypes.FIELD("Key", DataTypes.STRING()),
>          DataTypes.FIELD("List_element",
>                          DataTypes.ARRAY(DataTypes.ROW(
>                              [DataTypes.FIELD("integer_element", DataTypes.INT()),
>                               DataTypes.FIELD("string_element", DataTypes.STRING())])))]))
> print(t.select(get_string_element(t.List_element)).to_pandas())
> {code}
> element lost the field_name information
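
The failure mode in the quoted report can be illustrated without PyFlink. The sketch below is a hypothetical stand-in, not PyFlink's `Row` or its actual serializer: `SimpleRow`, `serialize`, `deserialize_values_only`, and `deserialize_with_names` are names invented for this example. It assumes the field names live only on the Python object, so a round trip that ships only the values produces a row on which named access such as `element.integer_element` fails.

```python
class SimpleRow:
    """Hypothetical stand-in for a row type; this is not PyFlink's Row."""

    def __init__(self, *values, field_names=None):
        self._values = list(values)
        self._field_names = list(field_names) if field_names else None

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        field_names = self._field_names
        if field_names is None or name not in field_names:
            raise AttributeError(name)
        return self._values[field_names.index(name)]


def serialize(row):
    # Assumed wire format for this sketch: values only, no field names.
    return tuple(row._values)


def deserialize_values_only(payload):
    # Rebuilds the row from values alone, so the names are gone.
    return SimpleRow(*payload)


def deserialize_with_names(payload, field_names):
    # Re-attaches names taken from the declared schema.
    return SimpleRow(*payload, field_names=field_names)


names = ["integer_element", "string_element"]
original = SimpleRow(2, "pyflink", field_names=names)
payload = serialize(original)

lossy = deserialize_values_only(payload)
restored = deserialize_with_names(payload, names)

print(hasattr(lossy, "integer_element"))  # named access is unavailable
print(restored.string_element)            # named access works again
```

Re-attaching names from the declared schema is only one possible remedy; the quoted messages do not say how the actual fix works.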



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] dianfu closed pull request #14422: [FLINK-20666][python] Fix the deserialized Row losing the field_name information in PyFlink

2020-12-18 Thread GitBox


dianfu closed pull request #14422:
URL: https://github.com/apache/flink/pull/14422


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Assigned] (FLINK-20666) Fix the deserialized Row losing the field_name information in PyFlink

2020-12-18 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu reassigned FLINK-20666:
---

Assignee: Huang Xingbo






[GitHub] [flink] dianfu merged pull request #14426: [FLINK-20666][python] Fix the deserialized Row losing the field_name information in PyFlink

2020-12-18 Thread GitBox


dianfu merged pull request #14426:
URL: https://github.com/apache/flink/pull/14426


   







[jira] [Commented] (FLINK-20476) New File Sink end-to-end test Failed

2020-12-18 Thread Huang Xingbo (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17252083#comment-17252083
 ] 

Huang Xingbo commented on FLINK-20476:
--

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=11044=logs=c88eea3b-64a0-564d-0031-9fdcd7b8abee=ff888d9b-cd34-53cc-d90f-3e446d355529

> New File Sink end-to-end test Failed
> 
>
> Key: FLINK-20476
> URL: https://issues.apache.org/jira/browse/FLINK-20476
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / FileSystem, Tests
>Affects Versions: 1.13.0, 1.12.1
>Reporter: Huang Xingbo
>Priority: Blocker
>  Labels: test-stability
> Fix For: 1.13.0, 1.12.1
>
>
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=10502=logs=c88eea3b-64a0-564d-0031-9fdcd7b8abee=ff888d9b-cd34-53cc-d90f-3e446d355529]
> {code}
> 2020-12-03T23:22:43.8578352Z Dec 03 23:22:43 Starting taskexecutor daemon on host fv-az586-109.
> 2020-12-03T23:22:43.8587276Z Dec 03 23:22:43 Waiting for restart to happen
> 2020-12-03T23:22:43.8587669Z Dec 03 23:22:43 Still waiting for restarts. Expected: 1 Current: 0
> 2020-12-03T23:22:48.9939434Z Dec 03 23:22:48 Still waiting for restarts. Expected: 1 Current: 0
> 2020-12-03T23:22:54.1236439Z Dec 03 23:22:54 Still waiting for restarts. Expected: 1 Current: 0
> 2020-12-03T23:22:59.2469617Z Dec 03 23:22:59 Still waiting for restarts. Expected: 1 Current: 0
> 2020-12-03T23:23:04.3730041Z Dec 03 23:23:04 Still waiting for restarts. Expected: 1 Current: 0
> 2020-12-03T23:23:09.5227739Z Dec 03 23:23:09 Still waiting for restarts. Expected: 1 Current: 0
> 2020-12-03T23:23:14.6572986Z Dec 03 23:23:14 Still waiting for restarts. Expected: 1 Current: 0
> 2020-12-03T23:23:19.7762483Z Dec 03 23:23:19 Still waiting for restarts. Expected: 1 Current: 0
> 2020-12-03T23:23:24.8973187Z Dec 03 23:23:24 Still waiting for restarts. Expected: 1 Current: 0
> 2020-12-03T23:23:30.0272934Z Dec 03 23:23:30 Still waiting for restarts. Expected: 1 Current: 0
> 2020-12-03T23:23:35.2332771Z Dec 03 23:23:35 Still waiting for restarts. Expected: 1 Current: 0
> 2020-12-03T23:23:40.3766421Z Dec 03 23:23:40 Still waiting for restarts. Expected: 1 Current: 0
> 2020-12-03T23:23:45.5103677Z Dec 03 23:23:45 Still waiting for restarts. Expected: 1 Current: 0
> 2020-12-03T23:23:50.6382894Z Dec 03 23:23:50 Still waiting for restarts. Expected: 1 Current: 0
> 2020-12-03T23:23:55.7908088Z Dec 03 23:23:55 Still waiting for restarts. Expected: 1 Current: 0
> 2020-12-03T23:24:00.9276393Z Dec 03 23:24:00 Still waiting for restarts. Expected: 1 Current: 0
> 2020-12-03T23:24:06.0966785Z Dec 03 23:24:06 Still waiting for restarts. Expected: 1 Current: 0
> 2020-12-03T23:24:11.2497761Z Dec 03 23:24:11 Still waiting for restarts. Expected: 1 Current: 0
> 2020-12-03T23:24:16.4118742Z Dec 03 23:24:16 Still waiting for restarts. Expected: 1 Current: 0
> 2020-12-03T23:24:21.5640591Z Dec 03 23:24:21 Still waiting for restarts. Expected: 1 Current: 0
> 2020-12-03T23:24:26.7080816Z Dec 03 23:24:26 Still waiting for restarts. Expected: 1 Current: 0
> 2020-12-03T23:24:31.8460471Z Dec 03 23:24:31 Still waiting for restarts. Expected: 1 Current: 0
> 2020-12-03T23:24:36.9640393Z Dec 03 23:24:36 Still waiting for restarts. Expected: 1 Current: 0
> 2020-12-03T23:24:42.1055030Z Dec 03 23:24:42 Still waiting for restarts. Expected: 1 Current: 0
> 2020-12-03T23:24:47.2399707Z Dec 03 23:24:47 Still waiting for restarts. Expected: 1 Current: 0
> 2020-12-03T23:24:52.3555612Z Dec 03 23:24:52 Still waiting for restarts. Expected: 1 Current: 0
> 2020-12-03T23:24:57.4903920Z Dec 03 23:24:57 Still waiting for restarts. Expected: 1 Current: 0
> 2020-12-03T23:25:02.6275471Z Dec 03 23:25:02 Still waiting for restarts. Expected: 1 Current: 0
> 2020-12-03T23:25:07.7481675Z Dec 03 23:25:07 Still waiting for restarts. Expected: 1 Current: 0
> 2020-12-03T23:25:12.9002926Z Dec 03 23:25:12 Still waiting for restarts. Expected: 1 Current: 0
> 2020-12-03T23:25:18.0361767Z Dec 03 23:25:18 Still waiting for restarts. Expected: 1 Current: 0
> 2020-12-03T23:25:23.1776688Z Dec 03 23:25:23 Still waiting for restarts. Expected: 1 Current: 0
> 2020-12-03T23:25:28.3029352Z Dec 03 23:25:28 Still waiting for restarts. Expected: 1 Current: 0
> 2020-12-03T23:25:33.4175706Z Dec 03 23:25:33 Still waiting for restarts. Expected: 1 Current: 0
> 2020-12-03T23:25:38.5444337Z Dec 03 23:25:38 Still waiting for restarts. Expected: 1 Current: 0
> 2020-12-03T23:25:43.6699914Z Dec 03 23:25:43 Still waiting for restarts. Expected: 1 Current: 0
> 2020-12-03T23:25:48.8064066Z Dec 03 23:25:48 Still waiting for restarts. Expected: 1 Current: 0
> 2020-12-03T23:25:53.9376640Z Dec 

[jira] [Commented] (FLINK-20662) UnalignedCheckpointITCase.execute failed with IndexOutOfBoundsException

2020-12-18 Thread Huang Xingbo (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17252082#comment-17252082
 ] 

Huang Xingbo commented on FLINK-20662:
--

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=11044=logs=34f41360-6c0d-54d3-11a1-0292a2def1d9=2d56e022-1ace-542f-bf1a-b37dd63243f2

> UnalignedCheckpointITCase.execute failed with IndexOutOfBoundsException
> ---
>
> Key: FLINK-20662
> URL: https://issues.apache.org/jira/browse/FLINK-20662
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.12.0, 1.13.0
>Reporter: Huang Xingbo
>Priority: Major
>  Labels: test-stability
>
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=10988=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=a99e99c7-21cd-5a1f-7274-585e62b72f56]
> {code:java}
> 2020-12-18T01:01:13.7845549Z [ERROR] Tests run: 10, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 143.951 s <<< FAILURE! - in org.apache.flink.test.checkpointing.UnalignedCheckpointITCase
> 2020-12-18T01:01:13.7848530Z [ERROR] execute[Parallel cogroup, p = 5](org.apache.flink.test.checkpointing.UnalignedCheckpointITCase)  Time elapsed: 12.725 s  <<< ERROR!
> 2020-12-18T01:01:13.7849231Z org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
> 2020-12-18T01:01:13.7849788Z  at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:147)
> 2020-12-18T01:01:13.7872152Z  at org.apache.flink.runtime.minicluster.MiniClusterJobClient.lambda$getJobExecutionResult$2(MiniClusterJobClient.java:119)
> 2020-12-18T01:01:13.7873528Z  at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
> 2020-12-18T01:01:13.7875322Z  at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
> 2020-12-18T01:01:13.7875932Z  at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
> 2020-12-18T01:01:13.7876475Z  at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
> 2020-12-18T01:01:13.7877098Z  at org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$0(AkkaInvocationHandler.java:229)
> 2020-12-18T01:01:13.7877732Z  at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
> 2020-12-18T01:01:13.7878307Z  at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
> 2020-12-18T01:01:13.7879078Z  at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
> 2020-12-18T01:01:13.7879795Z  at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
> 2020-12-18T01:01:13.7880333Z  at org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:996)
> 2020-12-18T01:01:13.7880834Z  at akka.dispatch.OnComplete.internal(Future.scala:264)
> 2020-12-18T01:01:13.7881266Z  at akka.dispatch.OnComplete.internal(Future.scala:261)
> 2020-12-18T01:01:13.7881691Z  at akka.dispatch.japi$CallbackBridge.apply(Future.scala:191)
> 2020-12-18T01:01:13.7882146Z  at akka.dispatch.japi$CallbackBridge.apply(Future.scala:188)
> 2020-12-18T01:01:13.7882609Z  at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
> 2020-12-18T01:01:13.7883142Z  at org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:74)
> 2020-12-18T01:01:13.7883726Z  at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
> 2020-12-18T01:01:13.7884258Z  at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
> 2020-12-18T01:01:13.7884741Z  at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:572)
> 2020-12-18T01:01:13.7885279Z  at akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:22)
> 2020-12-18T01:01:13.7885901Z  at akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:21)
> 2020-12-18T01:01:13.7886455Z  at scala.concurrent.Future$$anonfun$andThen$1.apply(Future.scala:436)
> 2020-12-18T01:01:13.7886954Z  at scala.concurrent.Future$$anonfun$andThen$1.apply(Future.scala:435)
> 2020-12-18T01:01:13.7887442Z  at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
> 2020-12-18T01:01:13.7887955Z  at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
> 2020-12-18T01:01:13.7888577Z  at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91)
> 2020-12-18T01:01:13.7889215Z  at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
> 2020-12-18T01:01:13.7889816Z  at 
> 

[jira] [Commented] (FLINK-20662) UnalignedCheckpointITCase.execute failed with IndexOutOfBoundsException

2020-12-18 Thread Huang Xingbo (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17252081#comment-17252081
 ] 

Huang Xingbo commented on FLINK-20662:
--

Hi [~AHeise], this issue seems to occur quite frequently. Could you help take a look? Thanks.


[jira] [Commented] (FLINK-20662) UnalignedCheckpointITCase.execute failed with IndexOutOfBoundsException

2020-12-18 Thread Huang Xingbo (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17252080#comment-17252080
 ] 

Huang Xingbo commented on FLINK-20662:
--

[https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=11033=logs=5c8e7682-d68f-54d1-16a2-a09310218a49=f508e270-48d6-5f1e-3138-42a17e0714f0]

This build also failed with an "IndexOutOfBoundsException".


[jira] [Commented] (FLINK-20662) UnalignedCheckpointITCase.execute failed with IndexOutOfBoundsException

2020-12-18 Thread Huang Xingbo (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17252078#comment-17252078
 ] 

Huang Xingbo commented on FLINK-20662:
--

[https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=11034=logs=34f41360-6c0d-54d3-11a1-0292a2def1d9=2d56e022-1ace-542f-bf1a-b37dd63243f2]

This is the same failure as the case Jark reported.


[jira] [Commented] (FLINK-20254) HiveTableSourceITCase.testStreamPartitionReadByCreateTime times out

2020-12-18 Thread Huang Xingbo (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17252076#comment-17252076
 ] 

Huang Xingbo commented on FLINK-20254:
--

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=11034=logs=fc5181b0-e452-5c8f-68de-1097947f6483=62110053-334f-5295-a0ab-80dd7e2babbf

> HiveTableSourceITCase.testStreamPartitionReadByCreateTime times out
> ---
>
> Key: FLINK-20254
> URL: https://issues.apache.org/jira/browse/FLINK-20254
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hive
>Affects Versions: 1.12.0
>Reporter: Robert Metzger
>Assignee: Leonard Xu
>Priority: Critical
>  Labels: test-stability
> Fix For: 1.13.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=9808=logs=fc5181b0-e452-5c8f-68de-1097947f6483=62110053-334f-5295-a0ab-80dd7e2babbf
> {code}
> 2020-11-19T10:34:23.5591765Z [ERROR] Tests run: 18, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 192.243 s <<< FAILURE! - in org.apache.flink.connectors.hive.HiveTableSourceITCase
> 2020-11-19T10:34:23.5593193Z [ERROR] testStreamPartitionReadByCreateTime(org.apache.flink.connectors.hive.HiveTableSourceITCase)  Time elapsed: 120.075 s  <<< ERROR!
> 2020-11-19T10:34:23.5593929Z org.junit.runners.model.TestTimedOutException: test timed out after 120000 milliseconds
> 2020-11-19T10:34:23.5594321Z  at java.lang.Thread.sleep(Native Method)
> 2020-11-19T10:34:23.5594777Z  at org.apache.flink.streaming.api.operators.collect.CollectResultFetcher.sleepBeforeRetry(CollectResultFetcher.java:231)
> 2020-11-19T10:34:23.5595378Z  at org.apache.flink.streaming.api.operators.collect.CollectResultFetcher.next(CollectResultFetcher.java:119)
> 2020-11-19T10:34:23.5596001Z  at org.apache.flink.streaming.api.operators.collect.CollectResultIterator.nextResultFromFetcher(CollectResultIterator.java:103)
> 2020-11-19T10:34:23.5596610Z  at org.apache.flink.streaming.api.operators.collect.CollectResultIterator.hasNext(CollectResultIterator.java:77)
> 2020-11-19T10:34:23.5597218Z  at org.apache.flink.table.planner.sinks.SelectTableSinkBase$RowIteratorWrapper.hasNext(SelectTableSinkBase.java:115)
> 2020-11-19T10:34:23.5597811Z  at org.apache.flink.table.api.internal.TableResultImpl$CloseableRowIteratorWrapper.hasNext(TableResultImpl.java:355)
> 2020-11-19T10:34:23.5598555Z  at org.apache.flink.connectors.hive.HiveTableSourceITCase.fetchRows(HiveTableSourceITCase.java:653)
> 2020-11-19T10:34:23.5599407Z  at org.apache.flink.connectors.hive.HiveTableSourceITCase.testStreamPartitionReadByCreateTime(HiveTableSourceITCase.java:594)
> 2020-11-19T10:34:23.5599982Z  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2020-11-19T10:34:23.5600393Z  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2020-11-19T10:34:23.5600865Z  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2020-11-19T10:34:23.5601300Z  at java.lang.reflect.Method.invoke(Method.java:498)
> 2020-11-19T10:34:23.5601713Z  at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> 2020-11-19T10:34:23.5602211Z  at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 2020-11-19T10:34:23.5602688Z  at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> 2020-11-19T10:34:23.5603181Z  at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 2020-11-19T10:34:23.5603753Z  at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
> 2020-11-19T10:34:23.5604308Z  at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
> 2020-11-19T10:34:23.5604780Z  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 2020-11-19T10:34:23.5605114Z  at java.lang.Thread.run(Thread.java:748)
> 2020-11-19T10:34:23.5605299Z 
> 2020-11-19T10:34:24.4180149Z [INFO] Running org.apache.flink.connectors.hive.TableEnvHiveConnectorITCase
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-20389) UnalignedCheckpointITCase failure caused by NullPointerException

2020-12-18 Thread Huang Xingbo (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17252075#comment-17252075
 ] 

Huang Xingbo commented on FLINK-20389:
--

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=11028=logs=5c8e7682-d68f-54d1-16a2-a09310218a49=f508e270-48d6-5f1e-3138-42a17e0714f0

> UnalignedCheckpointITCase failure caused by NullPointerException
> 
>
> Key: FLINK-20389
> URL: https://issues.apache.org/jira/browse/FLINK-20389
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.12.0, 1.13.0
>Reporter: Matthias
>Assignee: Arvid Heise
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.12.0
>
> Attachments: FLINK-20389-failure.log
>
>
> [Build|https://dev.azure.com/mapohl/flink/_build/results?buildId=118=results]
>  failed due to {{UnalignedCheckpointITCase}} caused by a 
> {{NullPointerException}}:
> {code:java}
> Test execute[Parallel cogroup, p = 10](org.apache.flink.test.checkpointing.UnalignedCheckpointITCase) failed with:
> org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
>   at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:147)
>   at org.apache.flink.runtime.minicluster.MiniClusterJobClient.lambda$getJobExecutionResult$2(MiniClusterJobClient.java:119)
>   at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
>   at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
>   at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
>   at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
>   at org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$0(AkkaInvocationHandler.java:229)
>   at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
>   at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
>   at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
>   at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
>   at org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:996)
>   at akka.dispatch.OnComplete.internal(Future.scala:264)
>   at akka.dispatch.OnComplete.internal(Future.scala:261)
>   at akka.dispatch.japi$CallbackBridge.apply(Future.scala:191)
>   at akka.dispatch.japi$CallbackBridge.apply(Future.scala:188)
>   at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
>   at org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:74)
>   at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
>   at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
>   at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:572)
>   at akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:22)
>   at akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:21)
>   at scala.concurrent.Future$$anonfun$andThen$1.apply(Future.scala:436)
>   at scala.concurrent.Future$$anonfun$andThen$1.apply(Future.scala:435)
>   at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
>   at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
>   at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91)
>   at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
>   at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
>   at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
>   at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:90)
>   at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
>   at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
>   at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>   at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: org.apache.flink.runtime.JobException: Recovery is suppressed by 
> 

[GitHub] [flink] flinkbot edited a comment on pull request #14433: [typo] fix kubernetes ha typo

2020-12-18 Thread GitBox


flinkbot edited a comment on pull request #14433:
URL: https://github.com/apache/flink/pull/14433#issuecomment-748371062


   
   ## CI report:
   
   * 7511cbed092a30ac95caf96f2205bc5f299fb3b7 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=11058)
 
   
   




[GitHub] [flink] flinkbot commented on pull request #14433: [typo] fix kubernetes ha typo

2020-12-18 Thread GitBox


flinkbot commented on pull request #14433:
URL: https://github.com/apache/flink/pull/14433#issuecomment-748371062


   
   ## CI report:
   
   * 7511cbed092a30ac95caf96f2205bc5f299fb3b7 UNKNOWN
   
   




[GitHub] [flink] flinkbot commented on pull request #14433: [typo] fix kubernetes ha typo

2020-12-18 Thread GitBox


flinkbot commented on pull request #14433:
URL: https://github.com/apache/flink/pull/14433#issuecomment-748368610


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit 7511cbed092a30ac95caf96f2205bc5f299fb3b7 (Fri Dec 18 
23:17:51 UTC 2020)
   
   **Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
* **Invalid pull request title: No valid Jira ID provided**
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.

   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   







[GitHub] [flink] casidiablo opened a new pull request #14433: [typo] fix kubernetes ha typo

2020-12-18 Thread GitBox


casidiablo opened a new pull request #14433:
URL: https://github.com/apache/flink/pull/14433


   







[GitHub] [flink-docker] tianon commented on pull request #49: [FLINK-20650] Rename "native-k8s" command to a more general name in docker-entrypoint.sh

2020-12-18 Thread GitBox


tianon commented on pull request #49:
URL: https://github.com/apache/flink-docker/pull/49#issuecomment-748346905


   `docker run flink run bash` just to get a shell that has useful environment 
variables is a bit convoluted/awkward -- are these variables harmful for normal 
commands?  Given that a shell still has to be used in order to actually use the 
variables, why not just use `bash`, or better yet, just set the variables 
unilaterally for all non-"commands" so that `docker run flink bash -c '...'` 
will just DTRT?







[GitHub] [flink] flinkbot edited a comment on pull request #14432: [FLINK-20678] Let JobManagerRunnerImpl fail if the JobMasterService fails unexpectedly

2020-12-18 Thread GitBox


flinkbot edited a comment on pull request #14432:
URL: https://github.com/apache/flink/pull/14432#issuecomment-748196526


   
   ## CI report:
   
   * d2192e5c0d882ef4d475b2f55b0623673026fb7d Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=11051)
 
   
   




[GitHub] [flink] flinkbot edited a comment on pull request #14431: [FLINK-11719] Do not reuse JobMaster instances across leader sessions

2020-12-18 Thread GitBox


flinkbot edited a comment on pull request #14431:
URL: https://github.com/apache/flink/pull/14431#issuecomment-748196435


   
   ## CI report:
   
   * aec1ac97effc6607d3fb4f94fe323c86c20759a8 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=11050)
 
   
   




[GitHub] [flink] flinkbot edited a comment on pull request #14377: [FLINK-19905][Connector][jdbc] The Jdbc-connector's 'lookup.max-retries' option initial value is 1 in JdbcLookupFunction

2020-12-18 Thread GitBox


flinkbot edited a comment on pull request #14377:
URL: https://github.com/apache/flink/pull/14377#issuecomment-744303277


   
   ## CI report:
   
   * ba3f9a13b98b5ddaee1dcf142ee3cf00322fe5c0 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=11052)
 
   
   




[GitHub] [flink] flinkbot edited a comment on pull request #14430: [FLINK-20677] Introduce proper JobManagerRunnerResult

2020-12-18 Thread GitBox


flinkbot edited a comment on pull request #14430:
URL: https://github.com/apache/flink/pull/14430#issuecomment-748196338


   
   ## CI report:
   
   * 2996e3ffa70a4caa7437c9699e1cbc93ebfe488a Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=11049)
 
   
   




[GitHub] [flink] flinkbot edited a comment on pull request #14423: [FLINK-20673][table-planner-blink] ExecNode#getOutputType method should return LogicalType instead of RowType

2020-12-18 Thread GitBox


flinkbot edited a comment on pull request #14423:
URL: https://github.com/apache/flink/pull/14423#issuecomment-748013167


   
   ## CI report:
   
   * feca8bda1976f237787ea3219b567bff442e4568 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=11043)
 
   
   




[jira] [Commented] (FLINK-20433) UnalignedCheckpointTestBase.execute failed with "TestTimedOutException: test timed out after 300 seconds"

2020-12-18 Thread Arvid Heise (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251965#comment-17251965
 ] 

Arvid Heise commented on FLINK-20433:
-

Stabilized the test in master as d07140c7a2dfe33a067454a7a802e25e81c0d168 and in 1.12 
as a4fc365af1a08be74b2ae12b65f97568d6a8. Note that it reveals the rare 
FLINK-20654.

> UnalignedCheckpointTestBase.execute failed with "TestTimedOutException: test 
> timed out after 300 seconds"
> -
>
> Key: FLINK-20433
> URL: https://issues.apache.org/jira/browse/FLINK-20433
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.12.0, 1.13.0
>Reporter: Dian Fu
>Assignee: Arvid Heise
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.13.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=10353=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=a99e99c7-21cd-5a1f-7274-585e62b72f56
> {code}
> 2020-12-01T01:33:37.8672135Z [ERROR] execute[Parallel cogroup, p = 10](org.apache.flink.test.checkpointing.UnalignedCheckpointITCase)  Time elapsed: 300.308 s  <<< ERROR!
> 2020-12-01T01:33:37.8672736Z org.junit.runners.model.TestTimedOutException: test timed out after 300 seconds
> 2020-12-01T01:33:37.8673110Z  at sun.misc.Unsafe.park(Native Method)
> 2020-12-01T01:33:37.8673463Z  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> 2020-12-01T01:33:37.8673951Z  at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
> 2020-12-01T01:33:37.8674429Z  at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
> 2020-12-01T01:33:37.8686627Z  at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
> 2020-12-01T01:33:37.8687167Z  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
> 2020-12-01T01:33:37.8687859Z  at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1842)
> 2020-12-01T01:33:37.8696554Z  at org.apache.flink.streaming.api.environment.LocalStreamEnvironment.execute(LocalStreamEnvironment.java:70)
> 2020-12-01T01:33:37.8697226Z  at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1822)
> 2020-12-01T01:33:37.8697885Z  at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1804)
> 2020-12-01T01:33:37.8698514Z  at org.apache.flink.test.checkpointing.UnalignedCheckpointTestBase.execute(UnalignedCheckpointTestBase.java:122)
> 2020-12-01T01:33:37.8699131Z  at org.apache.flink.test.checkpointing.UnalignedCheckpointITCase.execute(UnalignedCheckpointITCase.java:159)
> {code}





[GitHub] [flink] AHeise merged pull request #14421: [FLINK-20433][tests] Stabilizing UnalignedCheckpointITCase. [1.12]

2020-12-18 Thread GitBox


AHeise merged pull request #14421:
URL: https://github.com/apache/flink/pull/14421


   







[GitHub] [flink] flinkbot edited a comment on pull request #14429: [FLINK-20623][table-planner-blink] Introduce StreamPhysicalWatermarkAssigner, and make StreamExecWatermarkAssigner only extended from

2020-12-18 Thread GitBox


flinkbot edited a comment on pull request #14429:
URL: https://github.com/apache/flink/pull/14429#issuecomment-748082234


   
   ## CI report:
   
   * 2480c4158f646cef362efb050fe60356c80bcd0c Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=11041)
 
   
   




[GitHub] [flink] flinkbot edited a comment on pull request #14312: [FLINK-20491] Support Broadcast State in BATCH execution mode

2020-12-18 Thread GitBox


flinkbot edited a comment on pull request #14312:
URL: https://github.com/apache/flink/pull/14312#issuecomment-738876739


   
   ## CI report:
   
   * 091ba28c2182acfbb79b8cf0c0d6b0b2d40784ed Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=11042)
 
   
   




[GitHub] [flink] flinkbot edited a comment on pull request #14427: [FLINK-20608][table-planner-blink] Separate the implementation of BatchExecLegacyTableSourceScan and StreamExecLegacyTableSourceScan

2020-12-18 Thread GitBox


flinkbot edited a comment on pull request #14427:
URL: https://github.com/apache/flink/pull/14427#issuecomment-748081998


   
   ## CI report:
   
   * ac114d186758867e905eb1cd891a9a52279ed375 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=11039)
 
   
   




[jira] [Commented] (FLINK-20672) CheckpointAborted RPC failure can fail JM

2020-12-18 Thread Roman Khachatryan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251928#comment-17251928
 ] 

Roman Khachatryan commented on FLINK-20672:
---

Thanks for the confirmation [~yunta].

> CheckpointAborted RPC failure can fail JM
> -
>
> Key: FLINK-20672
> URL: https://issues.apache.org/jira/browse/FLINK-20672
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.12.0, 1.11.3
>Reporter: Roman Khachatryan
>Priority: Major
>
> Introduced in FLINK-8871, aborted RPC notifications are done asynchronously:
>  
> {code}
> private void sendAbortedMessages(long checkpointId, long timeStamp) {
>     // send notification of aborted checkpoints asynchronously.
>     executor.execute(() -> {
>         // send the "abort checkpoint" messages to necessary vertices.
>         // ..
>     });
> }
> {code}
> However, the executor that eventually executes this request is created as 
> follows:
> {code}
> final ScheduledExecutorService futureExecutor = Executors.newScheduledThreadPool(
>     Hardware.getNumberCPUCores(),
>     new ExecutorThreadFactory("jobmanager-future"));
> {code}
> ExecutorThreadFactory uses an UncaughtExceptionHandler that exits the JVM on error.
> cc: [~yunta]
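
The failure mode described in the issue above can be sketched in plain Java. This is a minimal illustration and not Flink code: `factoryWithHandler` and the recording handler are hypothetical stand-ins for ExecutorThreadFactory and its fatal handler. As a further assumption, the sketch uses a plain fixed thread pool rather than the scheduled pool quoted above, because a scheduled pool wraps submitted tasks in futures and the place where an exception surfaces can differ. The handler here only records the error at the point where a fatal handler would exit the JVM.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

class UncaughtHandlerSketch {

    /** Hypothetical stand-in for a thread factory that installs a handler on every thread. */
    static ThreadFactory factoryWithHandler(Thread.UncaughtExceptionHandler handler) {
        return runnable -> {
            Thread t = new Thread(runnable, "jobmanager-future-sketch");
            t.setDaemon(true);
            t.setUncaughtExceptionHandler(handler);
            return t;
        };
    }

    /** Runs a failing task via execute() and returns what the handler observed (or null). */
    static Throwable runFailingTask() {
        AtomicReference<Throwable> seen = new AtomicReference<>();
        CountDownLatch handled = new CountDownLatch(1);
        // A fatal handler would terminate the process here; this one only records the error.
        Thread.UncaughtExceptionHandler handler = (thread, error) -> {
            seen.set(error);
            handled.countDown();
        };
        ExecutorService executor = Executors.newFixedThreadPool(1, factoryWithHandler(handler));
        try {
            // Stands in for an asynchronous "abort checkpoint" notification that fails.
            executor.execute(() -> {
                throw new IllegalStateException("abort RPC failed");
            });
            handled.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            executor.shutdownNow();
        }
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println("handler saw: " + runFailingTask());
    }
}
```

Because the exception escapes the task instead of being caught inside the lambda, it reaches the thread's handler. Catching and logging it inside the submitted task would keep it away from the handler, which is one possible direction for a fix and not necessarily the one chosen for this issue.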





[jira] [Updated] (FLINK-20672) CheckpointAborted RPC failure can fail JM

2020-12-18 Thread Roman Khachatryan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Khachatryan updated FLINK-20672:
--
Fix Version/s: 1.13.0

> CheckpointAborted RPC failure can fail JM
> -
>
> Key: FLINK-20672
> URL: https://issues.apache.org/jira/browse/FLINK-20672
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.12.0, 1.11.3
>Reporter: Roman Khachatryan
>Priority: Major
> Fix For: 1.13.0
>
>
> Introduced in FLINK-8871, aborted RPC notifications are done asynchronously:
>  
> {code}
> private void sendAbortedMessages(long checkpointId, long timeStamp) {
>     // send notification of aborted checkpoints asynchronously.
>     executor.execute(() -> {
>         // send the "abort checkpoint" messages to necessary vertices.
>         // ..
>     });
> }
> {code}
> However, the executor that eventually executes this request is created as 
> follows:
> {code}
> final ScheduledExecutorService futureExecutor = Executors.newScheduledThreadPool(
>     Hardware.getNumberCPUCores(),
>     new ExecutorThreadFactory("jobmanager-future"));
> {code}
> ExecutorThreadFactory uses an UncaughtExceptionHandler that exits the JVM on error.
> cc: [~yunta]





[GitHub] [flink] flinkbot edited a comment on pull request #14402: [FLINK-20622][table-planner-blink] Introduce StreamPhysicalChangelogNormalize, and make StreamExecChangelogNormalize only extended fr

2020-12-18 Thread GitBox


flinkbot edited a comment on pull request #14402:
URL: https://github.com/apache/flink/pull/14402#issuecomment-74600


   
   ## CI report:
   
   * 64e05a309f50760e1f821402892b36b3c8e58133 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=11037)
 
   
   




[GitHub] [flink] flinkbot edited a comment on pull request #14400: [FLINK-20610][table-planner-blink] Separate the implementation of BatchExecCalc and StreamExecCalc

2020-12-18 Thread GitBox


flinkbot edited a comment on pull request #14400:
URL: https://github.com/apache/flink/pull/14400#issuecomment-745936223


   
   ## CI report:
   
   * d37a896391cd2b590044d3ec1074798cb01bdbed Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=11036)
 
   
   




[GitHub] [flink] flinkbot edited a comment on pull request #14426: [FLINK-20666][python] Fix the deserialized Row losing the field_name information in PyFlink

2020-12-18 Thread GitBox


flinkbot edited a comment on pull request #14426:
URL: https://github.com/apache/flink/pull/14426#issuecomment-748081897


   
   ## CI report:
   
   * acccd3653ec2ae872671b5289ca4082e2ab9107b Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=11038)
 
   
   




[GitHub] [flink] flinkbot edited a comment on pull request #14398: [FLINK-20516][table-planner-blink] Separate the implementation of BatchExecTableSourceScan and StreamExecTableSourceScan

2020-12-18 Thread GitBox


flinkbot edited a comment on pull request #14398:
URL: https://github.com/apache/flink/pull/14398#issuecomment-745762402


   
   ## CI report:
   
   * ebca7dda6bb09c98710b8f34a4e5b3cd2b156f0a Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=11035)
 
   
   




[GitHub] [flink] flinkbot edited a comment on pull request #14387: [FLINK-19691][Connector][jdbc] Expose `CONNECTION_CHECK_TIMEOUT_SECONDS` as a configurable option in Jdbc connector

2020-12-18 Thread GitBox


flinkbot edited a comment on pull request #14387:
URL: https://github.com/apache/flink/pull/14387#issuecomment-745165624


   
   ## CI report:
   
   * 4333cc2789d8eccf1ddc58a3c79ab0880169fab6 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=11053)
 
   
   




[jira] [Commented] (FLINK-20552) JdbcDynamicTableSink doesn't sink buffered data on checkpoint

2020-12-18 Thread mei jie (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251912#comment-17251912
 ] 

mei jie commented on FLINK-20552:
-

Hi [~jark],
I have been too busy these days; I will fix this this weekend.

> JdbcDynamicTableSink doesn't sink buffered data on checkpoint
> -
>
> Key: FLINK-20552
> URL: https://issues.apache.org/jira/browse/FLINK-20552
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / JDBC, Table SQL / Ecosystem
>Reporter: mei jie
>Assignee: mei jie
>Priority: Major
>  Labels: starter
> Fix For: 1.13.0
>
>
> JdbcBatchingOutputFormat is wrapped in OutputFormatSinkFunction when 
> createSinkTransformation is called in the CommonPhysicalSink class, but 
> OutputFormatSinkFunction doesn't implement the CheckpointedFunction interface, 
> so the flush method of JdbcBatchingOutputFormat can't be called on checkpoint.
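
The gap described in the issue above can be illustrated without Flink on the classpath. In this minimal sketch every type is a hypothetical stand-in: `BufferingOutput` plays the role of a batching output format, `SnapshotListener` the role of a checkpoint hook in the spirit of CheckpointedFunction, and `PlainSink` and `FlushingSink` the wrapping sink function. It is not the fix applied in Flink; it only shows why buffered rows stay unwritten at checkpoint time unless the wrapper forwards the checkpoint to `flush()`.

```java
import java.util.ArrayList;
import java.util.List;

class FlushOnCheckpointSketch {

    /** Hypothetical stand-in for a batching output format: buffers rows until flush(). */
    static class BufferingOutput {
        private final List<String> buffer = new ArrayList<>();
        final List<String> written = new ArrayList<>();
        private final int batchSize;

        BufferingOutput(int batchSize) {
            this.batchSize = batchSize;
        }

        void writeRecord(String row) {
            buffer.add(row);
            if (buffer.size() >= batchSize) {
                flush();
            }
        }

        void flush() {
            written.addAll(buffer);
            buffer.clear();
        }
    }

    /** Hypothetical checkpoint hook; not the real Flink interface. */
    interface SnapshotListener {
        void onSnapshot();
    }

    /** Wrapper that is unaware of checkpoints: buffered rows stay in memory. */
    static class PlainSink {
        final BufferingOutput out;

        PlainSink(BufferingOutput out) {
            this.out = out;
        }

        void invoke(String row) {
            out.writeRecord(row);
        }
    }

    /** Wrapper that forwards every snapshot to flush(). */
    static class FlushingSink extends PlainSink implements SnapshotListener {
        FlushingSink(BufferingOutput out) {
            super(out);
        }

        @Override
        public void onSnapshot() {
            out.flush();
        }
    }

    /** Writes two rows with batch size 3, simulates a checkpoint, returns rows written out. */
    static int visibleAfterCheckpoint(boolean flushOnSnapshot) {
        BufferingOutput out = new BufferingOutput(3);
        PlainSink sink = flushOnSnapshot ? new FlushingSink(out) : new PlainSink(out);
        sink.invoke("a");
        sink.invoke("b");
        // The "runtime" only notifies sinks that expose the hook.
        if (sink instanceof SnapshotListener) {
            ((SnapshotListener) sink).onSnapshot();
        }
        return out.written.size();
    }

    public static void main(String[] args) {
        System.out.println("without hook: " + visibleAfterCheckpoint(false));
        System.out.println("with hook: " + visibleAfterCheckpoint(true));
    }
}
```

With a batch size of 3 and only two rows written, the plain wrapper leaves both rows in the buffer at the simulated checkpoint, while the flushing wrapper writes them out.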





[GitHub] [flink] flinkbot edited a comment on pull request #14425: [FLINK-19773][runtime] Implement ExponentialDelayStrategy

2020-12-18 Thread GitBox


flinkbot edited a comment on pull request #14425:
URL: https://github.com/apache/flink/pull/14425#issuecomment-748063383


   
   ## CI report:
   
   * 0b24ca071bed9d436d66a2362c5a93efdb049e11 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=11032)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #14387: [FLINK-19691][Connector][jdbc] Expose `CONNECTION_CHECK_TIMEOUT_SECONDS` as a configurable option in Jdbc connector

2020-12-18 Thread GitBox


flinkbot edited a comment on pull request #14387:
URL: https://github.com/apache/flink/pull/14387#issuecomment-745165624


   
   ## CI report:
   
   * 3556926bfe3d10e73d4700f295c7343e7b42509e Azure: 
[CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=11048)
 
   * 4333cc2789d8eccf1ddc58a3c79ab0880169fab6 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=11053)
 
   
   




[GitHub] [flink] flinkbot edited a comment on pull request #14377: [FLINK-19905][Connector][jdbc] The Jdbc-connector's 'lookup.max-retries' option initial value is 1 in JdbcLookupFunction

2020-12-18 Thread GitBox


flinkbot edited a comment on pull request #14377:
URL: https://github.com/apache/flink/pull/14377#issuecomment-744303277


   
   ## CI report:
   
   * ffcb3c699eb099caccb20aa38f372557e8a59306 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=11003)
 
   * ba3f9a13b98b5ddaee1dcf142ee3cf00322fe5c0 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=11052)
 
   
   




[GitHub] [flink] flinkbot edited a comment on pull request #14423: [FLINK-20673][table-planner-blink] ExecNode#getOutputType method should return LogicalType instead of RowType

2020-12-18 Thread GitBox


flinkbot edited a comment on pull request #14423:
URL: https://github.com/apache/flink/pull/14423#issuecomment-748013167


   
   ## CI report:
   
   * c1507f5fd4747da2a8d3434b2cdbc883a1d332f2 Azure: 
[CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=11031)
 
   * feca8bda1976f237787ea3219b567bff442e4568 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=11043)
 
   
   




[GitHub] [flink] flinkbot edited a comment on pull request #14432: [FLINK-20678] Let JobManagerRunnerImpl fail if the JobMasterService fails unexpectedly

2020-12-18 Thread GitBox


flinkbot edited a comment on pull request #14432:
URL: https://github.com/apache/flink/pull/14432#issuecomment-748196526


   
   ## CI report:
   
   * d2192e5c0d882ef4d475b2f55b0623673026fb7d Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=11051)
 
   
   




[GitHub] [flink] flinkbot edited a comment on pull request #14431: [FLINK-11719] Do not reuse JobMaster instances across leader sessions

2020-12-18 Thread GitBox


flinkbot edited a comment on pull request #14431:
URL: https://github.com/apache/flink/pull/14431#issuecomment-748196435


   
   ## CI report:
   
   * aec1ac97effc6607d3fb4f94fe323c86c20759a8 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=11050)
 
   
   




[GitHub] [flink] flinkbot edited a comment on pull request #14430: [FLINK-20677] Introduce proper JobManagerRunnerResult

2020-12-18 Thread GitBox


flinkbot edited a comment on pull request #14430:
URL: https://github.com/apache/flink/pull/14430#issuecomment-748196338


   
   ## CI report:
   
   * 2996e3ffa70a4caa7437c9699e1cbc93ebfe488a Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=11049)
 
   
   




[GitHub] [flink] flinkbot edited a comment on pull request #14422: [FLINK-20666][python] Fix the deserialized Row losing the field_name information in PyFlink

2020-12-18 Thread GitBox


flinkbot edited a comment on pull request #14422:
URL: https://github.com/apache/flink/pull/14422#issuecomment-747979046


   
   ## CI report:
   
   * 6496e62d787b56d01e9053fa4f59e2ec5ac9077e Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=11030)
 
   
   




[GitHub] [flink] flinkbot edited a comment on pull request #14377: [FLINK-19905][Connector][jdbc] The Jdbc-connector's 'lookup.max-retries' option initial value is 1 in JdbcLookupFunction

2020-12-18 Thread GitBox


flinkbot edited a comment on pull request #14377:
URL: https://github.com/apache/flink/pull/14377#issuecomment-744303277


   
   ## CI report:
   
   * ffcb3c699eb099caccb20aa38f372557e8a59306 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=11003)
 
   * ba3f9a13b98b5ddaee1dcf142ee3cf00322fe5c0 UNKNOWN
   
   




[jira] [Commented] (FLINK-20654) Unaligned checkpoint recovery may lead to corrupted data stream

2020-12-18 Thread Arvid Heise (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251864#comment-17251864
 ] 

Arvid Heise commented on FLINK-20654:
-

For my investigation, I added a bunch of info statements to track which buffers 
are written.
https://github.com/AHeise/flink/tree/FLINK-20654

For [Parallel union, p = 5], I noticed that the issue arises only when 
multiple buffers of the same channel are recovered.


{noformat}
19749 [channel-state-unspilling-thread-1] INFO  
org.apache.flink.runtime.io.network.partition.consumer.RecoveredInputChannel [] 
- Flat Map (4/5)#5 
(de5cfe0d40797af545b28a5c2994ca79)/InputChannelInfo{gateIdx=0, 
inputChannelIdx=3} recovered 8 bytes
19749 [channel-state-unspilling-thread-1] INFO  
org.apache.flink.runtime.io.network.partition.consumer.RecoveredInputChannel [] 
- Flat Map (4/5)#5 
(de5cfe0d40797af545b28a5c2994ca79)/InputChannelInfo{gateIdx=1, 
inputChannelIdx=3} recovered 9 bytes
19750 [channel-state-unspilling-thread-1] INFO  
org.apache.flink.runtime.io.network.partition.consumer.RecoveredInputChannel [] 
- Flat Map (4/5)#5 
(de5cfe0d40797af545b28a5c2994ca79)/InputChannelInfo{gateIdx=2, 
inputChannelIdx=3} recovered 1 bytes
19750 [channel-state-unspilling-thread-1] INFO  
org.apache.flink.runtime.io.network.partition.consumer.RecoveredInputChannel [] 
- Flat Map (4/5)#5 
(de5cfe0d40797af545b28a5c2994ca79)/InputChannelInfo{gateIdx=3, 
inputChannelIdx=3} recovered 14 bytes
19750 [channel-state-unspilling-thread-1] INFO  
org.apache.flink.runtime.io.network.partition.consumer.RecoveredInputChannel [] 
- Flat Map (4/5)#5 
(de5cfe0d40797af545b28a5c2994ca79)/InputChannelInfo{gateIdx=0, 
inputChannelIdx=3} recovered 4096 bytes
19750 [channel-state-unspilling-thread-1] INFO  
org.apache.flink.runtime.io.network.partition.consumer.RecoveredInputChannel [] 
- Flat Map (4/5)#5 
(de5cfe0d40797af545b28a5c2994ca79)/InputChannelInfo{gateIdx=0, 
inputChannelIdx=3} recovered 4096 bytes
19750 [channel-state-unspilling-thread-1] INFO  
org.apache.flink.runtime.io.network.partition.consumer.RecoveredInputChannel [] 
- Flat Map (4/5)#5 
(de5cfe0d40797af545b28a5c2994ca79)/InputChannelInfo{gateIdx=0, 
inputChannelIdx=3} recovered 4096 bytes
19750 [channel-state-unspilling-thread-1] INFO  
org.apache.flink.runtime.io.network.partition.consumer.RecoveredInputChannel [] 
- Flat Map (4/5)#5 
(de5cfe0d40797af545b28a5c2994ca79)/InputChannelInfo{gateIdx=0, 
inputChannelIdx=3} recovered 4096 bytes
19750 [channel-state-unspilling-thread-1] INFO  
org.apache.flink.runtime.io.network.partition.consumer.RecoveredInputChannel [] 
- Flat Map (4/5)#5 
(de5cfe0d40797af545b28a5c2994ca79)/InputChannelInfo{gateIdx=0, 
inputChannelIdx=3} recovered 4096 bytes
19750 [channel-state-unspilling-thread-1] INFO  
org.apache.flink.runtime.io.network.partition.consumer.RecoveredInputChannel [] 
- Flat Map (4/5)#5 
(de5cfe0d40797af545b28a5c2994ca79)/InputChannelInfo{gateIdx=0, 
inputChannelIdx=3} recovered 4096 bytes
19750 [channel-state-unspilling-thread-1] INFO  
org.apache.flink.runtime.io.network.partition.consumer.RecoveredInputChannel [] 
- Flat Map (4/5)#5 
(de5cfe0d40797af545b28a5c2994ca79)/InputChannelInfo{gateIdx=0, 
inputChannelIdx=3} recovered 4096 bytes
19750 [channel-state-unspilling-thread-1] INFO  
org.apache.flink.runtime.io.network.partition.consumer.RecoveredInputChannel [] 
- Flat Map (4/5)#5 
(de5cfe0d40797af545b28a5c2994ca79)/InputChannelInfo{gateIdx=0, 
inputChannelIdx=3} recovered 4096 bytes
19750 [channel-state-unspilling-thread-1] INFO  
org.apache.flink.runtime.io.network.partition.consumer.RecoveredInputChannel [] 
- Flat Map (4/5)#5 
(de5cfe0d40797af545b28a5c2994ca79)/InputChannelInfo{gateIdx=0, 
inputChannelIdx=3} recovered 4096 bytes
19750 [channel-state-unspilling-thread-1] INFO  
org.apache.flink.runtime.io.network.partition.consumer.RecoveredInputChannel [] 
- Flat Map (4/5)#5 
(de5cfe0d40797af545b28a5c2994ca79)/InputChannelInfo{gateIdx=0, 
inputChannelIdx=3} recovered 4 bytes
19750 [Flat Map (4/5)#5] INFO  
org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput [] - 
InputChannelInfo{gateIdx=1, inputChannelIdx=3} prepareSnapshot 9 bytes
19750 [Flat Map (4/5)#5] INFO  
org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput [] - 
InputChannelInfo{gateIdx=2, inputChannelIdx=3} prepareSnapshot 1 bytes
19750 [Flat Map (4/5)#5] INFO  
org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput [] - 
InputChannelInfo{gateIdx=3, inputChannelIdx=3} prepareSnapshot 14 bytes
19750 [Flat Map (1/5)#5] INFO  
org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput [] - 
InputChannelInfo{gateIdx=0, inputChannelIdx=0} prepareSnapshot 17 bytes
19750 [Flat Map (1/5)#5] INFO  
org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput [] - 
InputChannelInfo{gateIdx=1, inputChannelIdx=0} prepareSnapshot 13 bytes
19750 [Flat Map (1/5)#5] INFO  

[jira] [Commented] (FLINK-20664) Support setting service account for TaskManager pod

2020-12-18 Thread Boris Lublinsky (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251863#comment-17251863
 ] 

Boris Lublinsky commented on FLINK-20664:
-

I think so. This gives me flexibility: I can decide to make it the same as 
the JobManager's service account or a different one.

> Support setting service account for TaskManager pod
> ---
>
> Key: FLINK-20664
> URL: https://issues.apache.org/jira/browse/FLINK-20664
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes
>Affects Versions: 1.12.0
>Reporter: Yang Wang
>Assignee: Yang Wang
>Priority: Critical
> Fix For: 1.13.0, 1.12.1
>
>
> Currently, we only set the service account for the JobManager. The TaskManager 
> uses the default service account. Before the KubernetesHAService was 
> introduced, this worked because the TaskManager did not need to access K8s 
> resources (e.g. ConfigMap) directly. But now the TaskManager needs to watch the 
> ConfigMap and retrieve the leader address. So if the default service account does 
> not have enough permissions, users cannot specify a valid service account 
> for the TaskManager.
>  
> We should introduce a new config option for TaskManager service account. 
> {{kubernetes.taskmanager.service-account}}
>  
>  





[jira] [Updated] (FLINK-20654) Unaligned checkpoint recovery may lead to corrupted data stream

2020-12-18 Thread Arvid Heise (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvid Heise updated FLINK-20654:

Description: 
Fix of FLINK-20433 shows potential corruption after recovery for all variations 
of UnalignedCheckpointITCase.

To reproduce, run UCITCase a couple hundred times. The issue showed up for me in:
- execute [Parallel union, p = 5]
- execute [Parallel union, p = 10]
- execute [Parallel cogroup, p = 5]
- execute [parallel pipeline with remote channels, p = 5]
with decreasing frequency.

The issue manifests as one of the following:
- stream corrupted exception
- EOF exception
- assertion failure in NUM_LOST or NUM_OUT_OF_ORDER
- (for union) ArithmeticException overflow (because the number that should be 
[0;10] has been mis-deserialized)

  was:Fix of FLINK-20433 shows potential corruption after recovery for all 
variations of UnalignedCheckpointITCase.


> Unaligned checkpoint recovery may lead to corrupted data stream
> ---
>
> Key: FLINK-20654
> URL: https://issues.apache.org/jira/browse/FLINK-20654
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.12.0
>Reporter: Arvid Heise
>Priority: Major
> Fix For: 1.13.0, 1.12.1
>
>
> Fix of FLINK-20433 shows potential corruption after recovery for all 
> variations of UnalignedCheckpointITCase.
> To reproduce, run UCITCase a couple hundred times. The issue showed up for me 
> in:
> - execute [Parallel union, p = 5]
> - execute [Parallel union, p = 10]
> - execute [Parallel cogroup, p = 5]
> - execute [parallel pipeline with remote channels, p = 5]
> with decreasing frequency.
> The issue manifests as one of the following:
> - stream corrupted exception
> - EOF exception
> - assertion failure in NUM_LOST or NUM_OUT_OF_ORDER
> - (for union) ArithmeticException overflow (because the number that should be 
> [0;10] has been mis-deserialized)
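
To illustrate only the last symptom (a small number turning into an overflow), here is a hedged sketch. It makes no claim about the actual root cause of this issue. It assumes values are encoded as 8-byte big-endian longs, and the class `MisalignedReadSketch` with its helpers `encode` and `readLongAt` is hypothetical. The point is that a reader that starts at the wrong byte offset decodes a value far outside [0;10], and an exact arithmetic operation on it then throws an ArithmeticException.

```java
import java.nio.ByteBuffer;

final class MisalignedReadSketch {
    // Encodes two longs back to back, big-endian (ByteBuffer's default byte order).
    static byte[] encode(long first, long second) {
        return ByteBuffer.allocate(2 * Long.BYTES).putLong(first).putLong(second).array();
    }

    // Reads 8 bytes as a long starting at the given absolute byte offset.
    static long readLongAt(byte[] bytes, int offset) {
        return ByteBuffer.wrap(bytes).getLong(offset);
    }

    public static void main(String[] args) {
        byte[] bytes = encode(7L, 3L);

        // Aligned read: the original value comes back.
        System.out.println("aligned:    " + readLongAt(bytes, 0));

        // Misaligned read: the low byte of the first value becomes the high byte.
        long corrupted = readLongAt(bytes, 7);
        System.out.println("misaligned: " + corrupted);

        try {
            // Exact conversion fails because the value no longer fits into an int.
            Math.toIntExact(corrupted);
        } catch (ArithmeticException e) {
            System.out.println("ArithmeticException: " + e.getMessage());
        }
    }
}
```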





[GitHub] [flink] flinkbot commented on pull request #14432: [FLINK-20678] Let JobManagerRunnerImpl fail if the JobMasterService fails unexpectedly

2020-12-18 Thread GitBox


flinkbot commented on pull request #14432:
URL: https://github.com/apache/flink/pull/14432#issuecomment-748196526


   
   ## CI report:
   
   * d2192e5c0d882ef4d475b2f55b0623673026fb7d UNKNOWN
   
   




[GitHub] [flink] flinkbot commented on pull request #14431: [FLINK-11719] Do not reuse JobMaster instances across leader sessions

2020-12-18 Thread GitBox


flinkbot commented on pull request #14431:
URL: https://github.com/apache/flink/pull/14431#issuecomment-748196435


   
   ## CI report:
   
   * aec1ac97effc6607d3fb4f94fe323c86c20759a8 UNKNOWN
   
   




[GitHub] [flink] flinkbot commented on pull request #14430: [FLINK-20677] Introduce proper JobManagerRunnerResult

2020-12-18 Thread GitBox


flinkbot commented on pull request #14430:
URL: https://github.com/apache/flink/pull/14430#issuecomment-748196338


   
   ## CI report:
   
   * 2996e3ffa70a4caa7437c9699e1cbc93ebfe488a UNKNOWN
   
   




[jira] [Issue Comment Deleted] (FLINK-20648) Unable to restore job from savepoint when using Kubernetes based HA services

2020-12-18 Thread Yang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Wang updated FLINK-20648:
--
Comment: was deleted

(was: Hmm. Maybe we could build the rpc endpoint id for JobMaster outside and 
pass it to the constructor of {{JobMaster}}. I will have a try.)

> Unable to restore job from savepoint when using Kubernetes based HA services
> 
>
> Key: FLINK-20648
> URL: https://issues.apache.org/jira/browse/FLINK-20648
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes
>Affects Versions: 1.12.0
>Reporter: David Morávek
>Assignee: Yang Wang
>Priority: Critical
> Fix For: 1.13.0, 1.12.1
>
>
> When restoring a job from a savepoint, we always end up with the following error:
> {code}
> Caused by: org.apache.flink.runtime.client.JobInitializationException: Could 
> not instantiate JobManager.
>   at 
> org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJobManagerRunner$5(Dispatcher.java:463)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>   at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1764)
>  ~[?:?]
>   ... 3 more
> Caused by: java.util.concurrent.ExecutionException: 
> org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Stopped 
> retrying the operation because the error is not retryable.
>   at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395) 
> ~[?:?]
>   at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2063) ~[?:?]
>   at 
> org.apache.flink.kubernetes.highavailability.KubernetesStateHandleStore.addAndLock(KubernetesStateHandleStore.java:150)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>   at 
> org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore.addCheckpoint(DefaultCompletedCheckpointStore.java:211)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>   at 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.restoreSavepoint(CheckpointCoordinator.java:1479)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>   at 
> org.apache.flink.runtime.scheduler.SchedulerBase.tryRestoreExecutionGraphFromSavepoint(SchedulerBase.java:325)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>   at 
> org.apache.flink.runtime.scheduler.SchedulerBase.createAndRestoreExecutionGraph(SchedulerBase.java:266)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>   at 
> org.apache.flink.runtime.scheduler.SchedulerBase.<init>(SchedulerBase.java:238)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>   at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.<init>(DefaultScheduler.java:134)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>   at 
> org.apache.flink.runtime.scheduler.DefaultSchedulerFactory.createInstance(DefaultSchedulerFactory.java:108)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>   at 
> org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:323)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>   at 
> org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:310) 
> ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>   at 
> org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.createJobMasterService(DefaultJobMasterServiceFactory.java:96)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>   at 
> org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.createJobMasterService(DefaultJobMasterServiceFactory.java:41)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>   at 
> org.apache.flink.runtime.jobmaster.JobManagerRunnerImpl.<init>(JobManagerRunnerImpl.java:141)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>   at 
> org.apache.flink.runtime.dispatcher.DefaultJobManagerRunnerFactory.createJobManagerRunner(DefaultJobManagerRunnerFactory.java:80)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>   at 
> org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJobManagerRunner$5(Dispatcher.java:450)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>   at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1764)
>  ~[?:?]
>   ... 3 more
> Caused by: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: 
> Stopped retrying the operation because the error is not retryable.
>   at 
> org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperation$1(FutureUtils.java:166)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>   at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
>  ~[?:?]
>   at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
>  ~[?:?]
>   at 
> java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)
>  ~[?:?]
>   ... 3 more
> Caused by: java.util.concurrent.CompletionException: 
> 

[GitHub] [flink] kl0u commented on a change in pull request #14312: [FLINK-20491] Support Broadcast State in BATCH execution mode

2020-12-18 Thread GitBox


kl0u commented on a change in pull request #14312:
URL: https://github.com/apache/flink/pull/14312#discussion_r545953329



##
File path: 
flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/io/StreamMultipleInputProcessorFactory.java
##
@@ -143,7 +150,20 @@ else if (configuredInput instanceof 
StreamConfig.SourceInputConfig) {
jobConfig
);
 
-   inputs = selectableSortingInputs.getSortingInputs();
+   StreamTaskInput[] sortedInputs = 
selectableSortingInputs.getSortedInputs();
+   StreamTaskInput[] passThroughInputs = 
selectableSortingInputs.getPassThroughInputs();
+   int sortedIndex = 0;
+   int passThroughIndex = 0;
+   for (int i = 0; i < inputs.length; i++) {
+   if (requiresSorting(inputConfigs[i])) {
+   inputs[i] = sortedInputs[sortedIndex];
+   sortedIndex++;
+   } else {
+   inputs[i] = 
passThroughInputs[passThroughIndex];
+   passThroughIndex++;
+   }
+   }
+   inputs = selectableSortingInputs.getSortedInputs();

Review comment:
   This line seems to be undoing what you did before, right?
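
A small self-contained sketch of the concern raised in this comment. It assumes plain strings in place of `StreamTaskInput` and a boolean array in place of `requiresSorting(inputConfigs[i])`; the class `InputInterleaveSketch` and its `interleave` helper are hypothetical. It shows that the loop places sorted and pass-through inputs at their original positions, and that a trailing reassignment to the sorted inputs alone would discard that result.

```java
import java.util.Arrays;

final class InputInterleaveSketch {
    // Puts each sorted or pass-through input back at the position of its original input.
    static String[] interleave(boolean[] requiresSorting, String[] sorted, String[] passThrough) {
        String[] inputs = new String[requiresSorting.length];
        int sortedIndex = 0;
        int passThroughIndex = 0;
        for (int i = 0; i < inputs.length; i++) {
            if (requiresSorting[i]) {
                inputs[i] = sorted[sortedIndex];
                sortedIndex++;
            } else {
                inputs[i] = passThrough[passThroughIndex];
                passThroughIndex++;
            }
        }
        return inputs;
    }

    public static void main(String[] args) {
        boolean[] requiresSorting = {true, false, true};
        String[] sorted = {"sorted-0", "sorted-1"};
        String[] passThrough = {"broadcast-0"};

        String[] inputs = interleave(requiresSorting, sorted, passThrough);
        System.out.println(Arrays.toString(inputs)); // [sorted-0, broadcast-0, sorted-1]

        // Equivalent of the trailing assignment in the diff: the merged array is replaced.
        inputs = sorted;
        System.out.println(Arrays.toString(inputs)); // [sorted-0, sorted-1]
    }
}
```

After the reassignment the pass-through input is gone and the array is shorter than the number of configured inputs, which is why the last added line looks like it undoes the loop.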









[GitHub] [flink] flinkbot edited a comment on pull request #14387: [FLINK-19691][Connector][jdbc] Expose `CONNECTION_CHECK_TIMEOUT_SECONDS` as a configurable option in Jdbc connector

2020-12-18 Thread GitBox


flinkbot edited a comment on pull request #14387:
URL: https://github.com/apache/flink/pull/14387#issuecomment-745165624


   
   ## CI report:
   
   * 3556926bfe3d10e73d4700f295c7343e7b42509e Azure: 
[CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=11048)
 
   * 4333cc2789d8eccf1ddc58a3c79ab0880169fab6 UNKNOWN
   
   




[jira] [Commented] (FLINK-20648) Unable to restore job from savepoint when using Kubernetes based HA services

2020-12-18 Thread Yang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251850#comment-17251850
 ] 

Yang Wang commented on FLINK-20648:
---

Yes. We are just using the {{leaderContenderDescription}} (aka leader 
address) for logging in {{ZooKeeperLeaderElectionDriver}}.


[GitHub] [flink] flinkbot commented on pull request #14432: [FLINK-20678] Let JobManagerRunnerImpl fail if the JobMasterService fails unexpectedly

2020-12-18 Thread GitBox


flinkbot commented on pull request #14432:
URL: https://github.com/apache/flink/pull/14432#issuecomment-748187284


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit d2192e5c0d882ef4d475b2f55b0623673026fb7d (Fri Dec 18 
16:24:58 UTC 2020)
   
   **Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.

Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   







[jira] [Assigned] (FLINK-19369) BlobClientTest.testGetFailsDuringStreamingForJobPermanentBlob hangs

2020-12-18 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann reassigned FLINK-19369:
-

Assignee: Till Rohrmann

> BlobClientTest.testGetFailsDuringStreamingForJobPermanentBlob hangs
> ---
>
> Key: FLINK-19369
> URL: https://issues.apache.org/jira/browse/FLINK-19369
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination, Tests
>Affects Versions: 1.11.0, 1.12.0
>Reporter: Dian Fu
>Assignee: Till Rohrmann
>Priority: Major
>  Labels: pull-request-available, test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=6803=logs=f0ac5c25-1168-55a5-07ff-0e88223afed9=39a61cac-5c62-532f-d2c1-dea450a66708
> {code}
> 2020-09-22T21:40:57.5304615Z "main" #1 prio=5 os_prio=0 cpu=18407.84ms 
> elapsed=1969.42s tid=0x7f0730015800 nid=0x79bd waiting for monitor entry  
> [0x7f07389fb000]
> 2020-09-22T21:40:57.5305080Z    java.lang.Thread.State: BLOCKED (on object 
> monitor)
> 2020-09-22T21:40:57.5305487Z  at 
> sun.security.ssl.SSLSocketImpl.duplexCloseOutput(java.base@11.0.7/SSLSocketImpl.java:541)
> 2020-09-22T21:40:57.5306159Z  - waiting to lock <0x8661a560> (a 
> sun.security.ssl.SSLSocketOutputRecord)
> 2020-09-22T21:40:57.5306545Z  at 
> sun.security.ssl.SSLSocketImpl.close(java.base@11.0.7/SSLSocketImpl.java:472)
> 2020-09-22T21:40:57.5307045Z  at 
> org.apache.flink.runtime.blob.BlobUtils.closeSilently(BlobUtils.java:367)
> 2020-09-22T21:40:57.5307605Z  at 
> org.apache.flink.runtime.blob.BlobServerConnection.close(BlobServerConnection.java:141)
> 2020-09-22T21:40:57.5308337Z  at 
> org.apache.flink.runtime.blob.BlobClientTest.testGetFailsDuringStreaming(BlobClientTest.java:443)
> 2020-09-22T21:40:57.5308904Z  at 
> org.apache.flink.runtime.blob.BlobClientTest.testGetFailsDuringStreamingForJobPermanentBlob(BlobClientTest.java:408)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-20678) JobManagerRunnerImpl does not notice unexpected JobMasterService termination

2020-12-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-20678:
---
Labels: pull-request-available  (was: )

> JobManagerRunnerImpl does not notice unexpected JobMasterService termination
> 
>
> Key: FLINK-20678
> URL: https://issues.apache.org/jira/browse/FLINK-20678
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.13.0
>Reporter: Till Rohrmann
>Assignee: Till Rohrmann
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.13.0
>
>
> The {{JobManagerRunnerImpl}} does not notice unexpected terminations of the 
> {{JobMasterService}}. This puts the system stability at risk and I propose to 
> monitor the liveness of the {{JobMasterService}} and then to fail if it 
> should terminate unexpectedly.





[jira] [Commented] (FLINK-20664) Support setting service account for TaskManager pod

2020-12-18 Thread Yang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251848#comment-17251848
 ] 

Yang Wang commented on FLINK-20664:
---

[~lublinsky] Do you think adding a new config option 
{{kubernetes.taskmanager.service-account}} makes sense?

> Support setting service account for TaskManager pod
> ---
>
> Key: FLINK-20664
> URL: https://issues.apache.org/jira/browse/FLINK-20664
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes
>Affects Versions: 1.12.0
>Reporter: Yang Wang
>Assignee: Yang Wang
>Priority: Critical
> Fix For: 1.13.0, 1.12.1
>
>
> Currently, we only set the service account for the JobManager; the TaskManager 
> uses the default service account. Before the KubernetesHAService was 
> introduced, this worked because the TaskManager did not need to access K8s 
> resources (e.g. ConfigMap) directly. But now the TaskManager needs to watch 
> the ConfigMap and retrieve the leader address. So if the default service 
> account does not have enough permissions, users cannot specify a valid 
> service account for the TaskManager.
>  
> We should introduce a new config option for TaskManager service account. 
> {{kubernetes.taskmanager.service-account}}
>  
>  





[GitHub] [flink] tillrohrmann opened a new pull request #14432: [FLINK-20678] Let JobManagerRunnerImpl fail if the JobMasterService fails unexpectedly

2020-12-18 Thread GitBox


tillrohrmann opened a new pull request #14432:
URL: https://github.com/apache/flink/pull/14432


   ## What is the purpose of the change
   
   This is a safety net which ensures that the JobManagerRunnerImpl will fail 
if the
   JobMasterService fails unexpectedly.
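
The safety net described above could look roughly like the following. This is only a minimal sketch, not the code in this PR: `MonitoringRunnerSketch`, `TerminableService`, and the `shuttingDown` flag are hypothetical names, and it assumes the service exposes its termination as a `CompletableFuture`.

```java
import java.util.concurrent.CompletableFuture;

/** Hypothetical runner that fails its result when the wrapped service dies unexpectedly. */
class MonitoringRunnerSketch {

    /** Hypothetical stand-in for a service that exposes a termination future. */
    interface TerminableService {
        CompletableFuture<Void> getTerminationFuture();
    }

    private final CompletableFuture<String> resultFuture = new CompletableFuture<>();
    private volatile boolean shuttingDown = false;

    MonitoringRunnerSketch(TerminableService service) {
        // Any termination that was not requested through close() is treated as a failure.
        service.getTerminationFuture().whenComplete((ignored, failure) -> {
            if (!shuttingDown) {
                resultFuture.completeExceptionally(
                        new IllegalStateException("Service terminated unexpectedly", failure));
            }
        });
    }

    /** Marks the shutdown as intentional so a later termination is not reported as a failure. */
    void close() {
        shuttingDown = true;
    }

    CompletableFuture<String> getResultFuture() {
        return resultFuture;
    }
}
```

The point of the flag is that only terminations the runner did not ask for complete the result exceptionally; an orderly `close()` leaves the result untouched.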
   
   ## Verifying this change
   
   I added 
`JobManagerRunnerImplTest.testJobMasterServiceTerminatesUnexpectedlyTriggersFailure`
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / **no**)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / **no**)
 - The serializers: (yes / **no** / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / **no** 
/ don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (**yes** / no / 
don't know)
 - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / **no**)
 - If yes, how is the feature documented? (**not applicable** / docs / 
JavaDocs / not documented)
   







[GitHub] [flink] flinkbot edited a comment on pull request #14387: [FLINK-19691][Connector][jdbc] Expose `CONNECTION_CHECK_TIMEOUT_SECONDS` as a configurable option in Jdbc connector

2020-12-18 Thread GitBox


flinkbot edited a comment on pull request #14387:
URL: https://github.com/apache/flink/pull/14387#issuecomment-745165624


   
   ## CI report:
   
   * d1afe2b80aee567e824eee4c8654798e7163a01b Azure: 
[CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=11046)
 
   * 3556926bfe3d10e73d4700f295c7343e7b42509e Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=11048)
 
   * 4333cc2789d8eccf1ddc58a3c79ab0880169fab6 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[GitHub] [flink] flinkbot commented on pull request #14431: [FLINK-11719] Do not reuse JobMaster instances across leader sessions

2020-12-18 Thread GitBox


flinkbot commented on pull request #14431:
URL: https://github.com/apache/flink/pull/14431#issuecomment-748184472


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit aec1ac97effc6607d3fb4f94fe323c86c20759a8 (Fri Dec 18 
16:19:47 UTC 2020)
   
   **Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.

Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   







[jira] [Created] (FLINK-20678) JobManagerRunnerImpl does not notice unexpected JobMasterService termination

2020-12-18 Thread Till Rohrmann (Jira)
Till Rohrmann created FLINK-20678:
-

 Summary: JobManagerRunnerImpl does not notice unexpected 
JobMasterService termination
 Key: FLINK-20678
 URL: https://issues.apache.org/jira/browse/FLINK-20678
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.13.0
Reporter: Till Rohrmann
Assignee: Till Rohrmann
 Fix For: 1.13.0


The {{JobManagerRunnerImpl}} does not notice unexpected terminations of the 
{{JobMasterService}}. This puts the system stability at risk and I propose to 
monitor the liveness of the {{JobMasterService}} and then to fail if it 
should terminate unexpectedly.





[jira] [Updated] (FLINK-11719) Remove JobMaster#start(JobMasterId) and #suspend

2020-12-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-11719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-11719:
---
Labels: pull-request-available  (was: )

> Remove JobMaster#start(JobMasterId) and #suspend
> 
>
> Key: FLINK-11719
> URL: https://issues.apache.org/jira/browse/FLINK-11719
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.8.0
>Reporter: Till Rohrmann
>Assignee: Till Rohrmann
>Priority: Major
>  Labels: pull-request-available
>
> Currently, the {{JobMaster}} contains a lot of mutable state which is 
> necessary because it is used across different leadership sessions by the 
> {{JobManagerRunner}}. For this purpose, we have the methods 
> {{JobMaster#start(JobMasterId)}} and {{#suspend}}. The mutable state 
> management makes things on the {{JobMaster}} side more complicated than they 
> need to be. In order to improve the {{JobMaster}}'s maintainability I suggest 
> removing this logic and instead terminating the {{JobMaster}} if the 
> {{JobManagerRunner}} loses leadership. This entails that for every leadership 
> session we will create a new {{JobMaster}} instance.





[GitHub] [flink] tillrohrmann opened a new pull request #14431: [FLINK-11719] Do not reuse JobMaster instances across leader sessions

2020-12-18 Thread GitBox


tillrohrmann opened a new pull request #14431:
URL: https://github.com/apache/flink/pull/14431


   ## What is the purpose of the change
   
   This PR changes how the `JobMaster` is used by the `JobManagerRunnerImpl`. 
Instead of reusing the `JobMaster` across different leader sessions, the 
`JobManagerRunnerImpl` will create a new instance for every leader session. 
This makes the state management in the `JobMaster` easier because we don't have 
to make sure that the components are always in a cleaned up state when starting 
a new leader session. Moreover, it simplifies the state management because 
there are fewer mutable components in the `JobMaster`.
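
To illustrate the model described above, here is a minimal sketch of a runner that creates a fresh service for every leader session instead of resetting a shared one. `PerSessionRunnerSketch` and `SessionService` are hypothetical names chosen for this example and are not classes from this PR.

```java
import java.util.UUID;

/** Hypothetical runner that never reuses a service across leader sessions. */
class PerSessionRunnerSketch {

    /** Hypothetical per-session service; its fencing token is fixed at construction. */
    static final class SessionService {
        final UUID fencingToken;
        private boolean running = true;

        SessionService(UUID fencingToken) {
            this.fencingToken = fencingToken;
        }

        void stop() {
            running = false;
        }

        boolean isRunning() {
            return running;
        }
    }

    private SessionService current;

    /** Starts a brand-new service for the granted session; nothing is reset or reused. */
    void grantLeadership(UUID leaderSessionId) {
        revokeLeadership();
        current = new SessionService(leaderSessionId);
    }

    /** Terminates the service of the lost session, if any. */
    void revokeLeadership() {
        if (current != null) {
            current.stop();
            current = null;
        }
    }

    SessionService currentService() {
        return current;
    }
}
```

Because the token is a `final` field, there is no code path that has to clean it up or swap it when leadership changes, which is the simplification the description refers to.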
   
   This PR is based on #14430.
   
   ## Brief change log
   
   - 8faba3b: This commit changes how the JobManagerRunnerImpl uses 
JobMasterServices.
   Now we use a JobMasterService per leader session.
   
   - 436aa53: This commit changes the JobMaster to have a permanent fencing 
token.
   
   - 1ab27e0: Since the JobMaster is now a PermanentlyFencedRpcEndpoint we no 
longer
   need to make the scheduler resettable.
   
   - 7b74054: Since we are no longer reusing the JobMaster across different 
leader sessions,
   we can make the heartbeat managers final.
   
   - aec1ac9: Make starting and stopping of JobMaster services symmetric
   
   ## Verifying this change
   
   I adjusted several tests to the new model. The test class `JobMasterTest` was 
mainly affected.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / **no**)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / **no**)
 - The serializers: (yes / **no** / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / **no** 
/ don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (**yes** / no / 
don't know)
 - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / **no**)
 - If yes, how is the feature documented? (**not applicable** / docs / 
JavaDocs / not documented)
   







[GitHub] [flink] flinkbot commented on pull request #14430: [FLINK-20677] Introduce proper JobManagerRunnerResult

2020-12-18 Thread GitBox


flinkbot commented on pull request #14430:
URL: https://github.com/apache/flink/pull/14430#issuecomment-748182634


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit 2996e3ffa70a4caa7437c9699e1cbc93ebfe488a (Fri Dec 18 
16:16:23 UTC 2020)
   
   **Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.

Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   







[jira] [Updated] (FLINK-20677) Let JobManagerRunner return JobManagerRunnerResult to better distinguish result cases

2020-12-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-20677:
---
Labels: pull-request-available  (was: )

> Let JobManagerRunner return JobManagerRunnerResult to better distinguish 
> result cases
> -
>
> Key: FLINK-20677
> URL: https://issues.apache.org/jira/browse/FLINK-20677
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.13.0
>Reporter: Till Rohrmann
>Assignee: Till Rohrmann
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.13.0
>
>
> In order to better distinguish the result cases of the {{JobManagerRunner}} I 
> propose to introduce a {{JobManagerRunnerResult}} which can have the values
> 1) the job was finished successfully 
> 2) the job was not finished
> 3) the job initialization failed (this is required in order to handle job 
> initialization failures properly)





[GitHub] [flink] tillrohrmann opened a new pull request #14430: [FLINK-20677] Introduce proper JobManagerRunnerResult

2020-12-18 Thread GitBox


tillrohrmann opened a new pull request #14430:
URL: https://github.com/apache/flink/pull/14430


   ## What is the purpose of the change
   
   The JobManagerRunnerResult makes it possible to express different termination
   conditions for the JobManagerRunner:
   
   1) Job finished successfully
   2) Job did not finish
   3) Job initialization failed
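
One way to model the three conditions listed above is a small immutable result type. The sketch below is illustrative only: `RunnerResultSketch`, its `Kind` enum, and the factory method names are hypothetical and not necessarily the shape of the class added in this PR.

```java
import java.util.Objects;
import java.util.Optional;

/** Hypothetical result type covering the three termination conditions. */
final class RunnerResultSketch {

    enum Kind { SUCCESS, JOB_NOT_FINISHED, INITIALIZATION_FAILED }

    private final Kind kind;
    private final Throwable failure; // only set for INITIALIZATION_FAILED

    private RunnerResultSketch(Kind kind, Throwable failure) {
        this.kind = kind;
        this.failure = failure;
    }

    static RunnerResultSketch success() {
        return new RunnerResultSketch(Kind.SUCCESS, null);
    }

    static RunnerResultSketch jobNotFinished() {
        return new RunnerResultSketch(Kind.JOB_NOT_FINISHED, null);
    }

    static RunnerResultSketch initializationFailed(Throwable cause) {
        return new RunnerResultSketch(
                Kind.INITIALIZATION_FAILED, Objects.requireNonNull(cause));
    }

    Kind kind() {
        return kind;
    }

    /** Present only when the job initialization failed. */
    Optional<Throwable> failure() {
        return Optional.ofNullable(failure);
    }
}
```

A caller can then switch on `kind()` instead of inferring the outcome from an exception type or a missing value.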
   
   ## Verifying this change
   
   - Added `JobManagerRunnerResultTest`
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / **no**)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / **no**)
 - The serializers: (yes / **no** / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / **no** 
/ don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (yes / **no** / 
don't know)
 - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / **no**)
 - If yes, how is the feature documented? (**not applicable** / docs / 
JavaDocs / not documented)
   







[jira] [Created] (FLINK-20677) Let JobManagerRunner return JobManagerRunnerResult to better distinguish result cases

2020-12-18 Thread Till Rohrmann (Jira)
Till Rohrmann created FLINK-20677:
-

 Summary: Let JobManagerRunner return JobManagerRunnerResult to 
better distinguish result cases
 Key: FLINK-20677
 URL: https://issues.apache.org/jira/browse/FLINK-20677
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Coordination
Affects Versions: 1.13.0
Reporter: Till Rohrmann
Assignee: Till Rohrmann
 Fix For: 1.13.0


In order to better distinguish the result cases of the {{JobManagerRunner}} I 
propose to introduce a {{JobManagerRunnerResult}} which can have the values

1) the job was finished successfully 
2) the job was not finished
3) the job initialization failed (this is required in order to handle job 
initialization failures properly)





[GitHub] [flink-docker] wangyang0918 edited a comment on pull request #49: [FLINK-20650] Rename "native-k8s" command to a more general name in docker-entrypoint.sh

2020-12-18 Thread GitBox


wangyang0918 edited a comment on pull request #49:
URL: https://github.com/apache/flink-docker/pull/49#issuecomment-748171203


   @tillrohrmann I have updated this PR. Once this PR is merged, we could ping 
the docker guys for merging the docker publishing PR for 1.12.0.
   
   I also created a new ticket FLINK-20676 for removing the deprecated command 
`native-k8s` in 1.13.







[jira] [Created] (FLINK-20676) Remove deprecated command "native-k8s" in docker-entrypoint.sh

2020-12-18 Thread Yang Wang (Jira)
Yang Wang created FLINK-20676:
-

 Summary: Remove deprecated command "native-k8s" in 
docker-entrypoint.sh
 Key: FLINK-20676
 URL: https://issues.apache.org/jira/browse/FLINK-20676
 Project: Flink
  Issue Type: Improvement
  Components: flink-docker
Reporter: Yang Wang
 Fix For: 1.13.0


In FLINK-20650, we introduced a new general command "run" and marked 
"native-k8s" as deprecated in docker-entrypoint.sh. The deprecated command 
"native-k8s" should be removed in the next major release (1.13).





[jira] [Updated] (FLINK-20650) Add new general command "run" and mark "native-k8s" as deprecated in docker-entrypoint.sh

2020-12-18 Thread Yang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Wang updated FLINK-20650:
--
Summary: Add new general command "run" and mark "native-k8s" as deprecated 
in docker-entrypoint.sh  (was: [FLINK-20650] Add new general command "run" and 
mark "native-k8s" as deprecated in docker-entrypoint.sh)

> Add new general command "run" and mark "native-k8s" as deprecated in 
> docker-entrypoint.sh
> -
>
> Key: FLINK-20650
> URL: https://issues.apache.org/jira/browse/FLINK-20650
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / Kubernetes, flink-docker
>Affects Versions: 1.12.0
>Reporter: Yang Wang
>Assignee: Yang Wang
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.13.0, 1.12.1
>
>
> When we were publishing the 1.12 image to Docker Hub, some Docker maintainers 
> raised an issue with the {{docker-entrypoint.sh}}. They want the images to 
> meet a certain standard, because they are the official ones. However, the 
> proposed {{native-k8s}} command is more like an internal bridge. It is only 
> used for native Kubernetes integration.
>  
> Another suggestion is to remove the "bash -c" wrapper and generate it in the 
> Flink code. Refer to [1] for more information.
>  
> Note: when we rename {{native-k8s}} to {{generic}} in the flink-docker 
> project, the Flink Kubernetes code should be adjusted accordingly.
>  
> [1]. https://github.com/docker-library/official-images/pull/9249





[jira] [Updated] (FLINK-20650) [FLINK-20650] Add new general command "run" and mark "native-k8s" as deprecated in docker-entrypoint.sh

2020-12-18 Thread Yang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Wang updated FLINK-20650:
--
Summary: [FLINK-20650] Add new general command "run" and mark "native-k8s" 
as deprecated in docker-entrypoint.sh  (was: Rename "native-k8s" command to a 
general name in docker-entrypoint.sh)

> [FLINK-20650] Add new general command "run" and mark "native-k8s" as 
> deprecated in docker-entrypoint.sh
> ---
>
> Key: FLINK-20650
> URL: https://issues.apache.org/jira/browse/FLINK-20650
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / Kubernetes, flink-docker
>Affects Versions: 1.12.0
>Reporter: Yang Wang
>Assignee: Yang Wang
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.13.0, 1.12.1
>
>
> When we were publishing the 1.12 image to Docker Hub, some Docker maintainers 
> raised an issue with the {{docker-entrypoint.sh}}. They want the images to 
> meet a certain standard, because they are the official ones. However, the 
> proposed {{native-k8s}} command is more like an internal bridge. It is only 
> used for native Kubernetes integration.
>  
> Another suggestion is to remove the "bash -c" wrapper and generate it in the 
> Flink code. Refer to [1] for more information.
>  
> Note: when we rename {{native-k8s}} to {{generic}} in the flink-docker 
> project, the Flink Kubernetes code should be adjusted accordingly.
>  
> [1]. https://github.com/docker-library/official-images/pull/9249





[GitHub] [flink] flinkbot edited a comment on pull request #14424: [FLINK-19369][tests] Disable BlobClientSslTest.testGetFailsDuringStreaming*

2020-12-18 Thread GitBox


flinkbot edited a comment on pull request #14424:
URL: https://github.com/apache/flink/pull/14424#issuecomment-748039662


   
   ## CI report:
   
   * a281ac1645846dce585e8b377c4f48a7e03e8cf6 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=11025)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[GitHub] [flink] flinkbot edited a comment on pull request #14387: [FLINK-19691][Connector][jdbc] Expose `CONNECTION_CHECK_TIMEOUT_SECONDS` as a configurable option in Jdbc connector

2020-12-18 Thread GitBox


flinkbot edited a comment on pull request #14387:
URL: https://github.com/apache/flink/pull/14387#issuecomment-745165624


   
   ## CI report:
   
   * 47dcb9670fdea6129692d27b33dc13b620248600 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=11021)
 
   * d1afe2b80aee567e824eee4c8654798e7163a01b Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=11046)
 
   * 3556926bfe3d10e73d4700f295c7343e7b42509e Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=11048)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[GitHub] [flink] flinkbot edited a comment on pull request #14361: [FLINK-19435][connectors/jdbc] Fix deadlock when loading different driver classes concurrently using Class.forName

2020-12-18 Thread GitBox


flinkbot edited a comment on pull request #14361:
URL: https://github.com/apache/flink/pull/14361#issuecomment-742899629


   
   ## CI report:
   
   * 0aa90c60d5e038daeb11b9ed90ef1226a4410613 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=11024)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[jira] [Commented] (FLINK-17598) Implement FileSystemHAServices for native K8s setups

2020-12-18 Thread Yang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251835#comment-17251835
 ] 

Yang Wang commented on FLINK-17598:
---

cc [~boris.lublin...@cna.com], do you think we still need to implement the 
{{FileSystemHAServices}}? I notice that you are trying the 
{{KubernetesHAService}}.

> Implement FileSystemHAServices for native K8s setups
> 
>
> Key: FLINK-17598
> URL: https://issues.apache.org/jira/browse/FLINK-17598
> Project: Flink
>  Issue Type: Sub-task
>  Components: Deployment / Kubernetes, Runtime / Coordination
>Reporter: Canbin Zheng
>Priority: Major
>
> At the moment we use ZooKeeper as a distributed coordinator for implementing 
> JobManager high availability services. But in cloud-native environments, 
> more and more users prefer *Kubernetes* as the underlying scheduler backend 
> and *object storage* as the storage medium; neither of these requires a 
> ZooKeeper deployment.
> As a result, in K8s setups, people have to deploy and maintain their own 
> ZooKeeper clusters to solve the JobManager SPOF. This ticket proposes to 
> provide a simplified FileSystem HA implementation with leader election 
> removed, which saves the effort of deploying ZooKeeper.
> To achieve this, we plan to 
> # Introduce a {{FileSystemHaServices}} which implements the 
> {{HighAvailabilityServices}}.
> # Replace the Deployment with a StatefulSet to ensure *at most one* semantics, 
> preventing potential concurrent access to the underlying FileSystem.





[GitHub] [flink-docker] wangyang0918 commented on pull request #49: [FLINK-20650] Rename "native-k8s" command to a more general name in docker-entrypoint.sh

2020-12-18 Thread GitBox


wangyang0918 commented on pull request #49:
URL: https://github.com/apache/flink-docker/pull/49#issuecomment-748171203


   @tillrohrmann I have updated this PR. Once this PR is merged, we could ping 
the docker guys for merging the docker publishing PR for 1.12.0.







[jira] [Updated] (FLINK-20675) Asynchronous checkpoint failure would not fail the job anymore

2020-12-18 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang updated FLINK-20675:
-
Description: 
After FLINK-12364, no matter how many times the asynchronous part of a 
checkpoint fails on a task, the job itself does not fail by default:
||Default behavior||Flink-1.5 -> Flink-1.8||Flink-1.9 -> Flink-1.12||
|Synchronous part of checkpoint at task failed|Job failed|Job failed|
|Asynchronous part of checkpoint at task failed|Job failed|Job would not fail|

This happens because {{StreamTask}} uses a plain {{Exception}} instead of a 
{{CheckpointException}} as the decline message [when the async part 
fails|https://github.com/apache/flink/blob/5125b1123dfcfff73b5070401dfccb162959080c/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java#L1118].
As a result, the checkpoint coordinator calls 
{{failPendingCheckpointDueToTaskFailure(pendingCheckpoint, 
CheckpointFailureReason.JOB_FAILURE, cause, executionAttemptID)}} to [process 
the declined 
checkpoint|https://github.com/apache/flink/blob/release-1.9/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinator.java#L1316-L1323]:
{code:java}
if (cause == null) {
    failPendingCheckpointDueToTaskFailure(pendingCheckpoint,
        CheckpointFailureReason.CHECKPOINT_DECLINED, executionAttemptID);
} else if (cause instanceof CheckpointException) {
    CheckpointException exception = (CheckpointException) cause;
    failPendingCheckpointDueToTaskFailure(pendingCheckpoint,
        exception.getCheckpointFailureReason(), cause, executionAttemptID);
} else {
    failPendingCheckpointDueToTaskFailure(pendingCheckpoint,
        CheckpointFailureReason.JOB_FAILURE, cause, executionAttemptID);
}
{code}
However, {{CheckpointFailureManager}} [ignores the JOB_FAILURE 
reason|https://github.com/apache/flink/blob/5125b1123dfcfff73b5070401dfccb162959080c/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointFailureManager.java#L108]
 and does not count this failed checkpoint, so an asynchronous checkpoint 
failure no longer fails the job.

 

FLINK-16753 corrects the misleading message of JOB_FAILURE but the asynchronous 
checkpoint failure still cannot fail the job.
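
The routing described in this ticket can be illustrated with a minimal, self-contained sketch. It does not use Flink's classes: {{Reason}}, {{SketchCheckpointException}}, {{classify}}, {{FailureCounter}} and the {{TASK_CHECKPOINT_FAILURE}} constant are hypothetical stand-ins, and the counter is assumed to ignore only {{JOB_FAILURE}}, as the description states.

```java
public class DeclineRoutingSketch {

    /** Hypothetical stand-in for the failure reasons discussed in the ticket. */
    enum Reason { CHECKPOINT_DECLINED, TASK_CHECKPOINT_FAILURE, JOB_FAILURE }

    /** Hypothetical stand-in for an exception that carries its own failure reason. */
    static class SketchCheckpointException extends Exception {
        final Reason reason;

        SketchCheckpointException(Reason reason, String message) {
            super(message);
            this.reason = reason;
        }
    }

    /** Mirrors the three-way branch quoted from the coordinator. */
    static Reason classify(Throwable cause) {
        if (cause == null) {
            return Reason.CHECKPOINT_DECLINED;
        } else if (cause instanceof SketchCheckpointException) {
            return ((SketchCheckpointException) cause).reason;
        } else {
            return Reason.JOB_FAILURE;
        }
    }

    /** Assumed, simplified counter: JOB_FAILURE is ignored, every other reason is counted. */
    static class FailureCounter {
        private final int tolerableFailures;
        private int counted;

        FailureCounter(int tolerableFailures) {
            this.tolerableFailures = tolerableFailures;
        }

        /** Returns true if the job should fail after recording this reason. */
        boolean record(Reason reason) {
            if (reason == Reason.JOB_FAILURE) {
                return false; // not counted, so it can never exceed the threshold
            }
            counted++;
            return counted > tolerableFailures;
        }
    }

    public static void main(String[] args) {
        FailureCounter counter = new FailureCounter(0);
        // Plain Exception as decline cause: classified as JOB_FAILURE and ignored.
        System.out.println(counter.record(classify(new Exception("async part failed"))));
        // Typed exception: the carried reason is counted and exceeds the threshold.
        System.out.println(counter.record(classify(
                new SketchCheckpointException(Reason.TASK_CHECKPOINT_FAILURE, "async part failed"))));
    }
}
```

In this sketch the same failure is ignored when wrapped in a plain {{Exception}} and fails the job when it carries an explicit reason, which is the asymmetry the ticket reports.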

 

As this bug has existed for a long time, I have set it to critical rather than 
blocker priority. 

 

  was:
After FLINK-12364, no matter how many times the asynchronous part of a 
checkpoint fails on a task, the job itself does not fail by default:
|| ||Flink-1.5 -> Flink-1.8||Flink-1.9 -> Flink-1.12||
|Synchronous part of checkpoint at task failed|Job failed|Job failed|
|Asynchronous part of checkpoint at task failed|Job failed|Job would not fail|

This happens because {{StreamTask}} uses a plain {{Exception}} instead of a 
{{CheckpointException}} as the decline message [when the async part 
fails|https://github.com/apache/flink/blob/5125b1123dfcfff73b5070401dfccb162959080c/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java#L1118].
As a result, the checkpoint coordinator calls 
{{failPendingCheckpointDueToTaskFailure(pendingCheckpoint, 
CheckpointFailureReason.JOB_FAILURE, cause, executionAttemptID)}} to [process 
the declined 
checkpoint|https://github.com/apache/flink/blob/release-1.9/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinator.java#L1316-L1323]:
{code:java}
if (cause == null) {
    failPendingCheckpointDueToTaskFailure(pendingCheckpoint,
        CheckpointFailureReason.CHECKPOINT_DECLINED, executionAttemptID);
} else if (cause instanceof CheckpointException) {
    CheckpointException exception = (CheckpointException) cause;
    failPendingCheckpointDueToTaskFailure(pendingCheckpoint,
        exception.getCheckpointFailureReason(), cause, executionAttemptID);
} else {
    failPendingCheckpointDueToTaskFailure(pendingCheckpoint,
        CheckpointFailureReason.JOB_FAILURE, cause, executionAttemptID);
}
{code}
However, {{CheckpointFailureManager}} [ignores the JOB_FAILURE 
reason|https://github.com/apache/flink/blob/5125b1123dfcfff73b5070401dfccb162959080c/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointFailureManager.java#L108]
 and does not count this failed checkpoint, so an asynchronous checkpoint 
failure no longer fails the job.

 

FLINK-16753 corrects the misleading message of JOB_FAILURE but the asynchronous 
checkpoint failure still cannot fail the job.

 

As this bug has existed for a long time, I have set it to critical rather than 
blocker priority. 

 


> Asynchronous checkpoint failure would not fail the job anymore
> --
>
> Key: FLINK-20675
> URL: https://issues.apache.org/jira/browse/FLINK-20675
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.9.3, 1.10.2, 1.12.0, 1.11.3
>  

[jira] [Commented] (FLINK-20675) Asynchronous checkpoint failure would not fail the job anymore

2020-12-18 Thread Yun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251834#comment-17251834
 ] 

Yun Tang commented on FLINK-20675:
--

cc [~pnowojski]

> Asynchronous checkpoint failure would not fail the job anymore
> --
>
> Key: FLINK-20675
> URL: https://issues.apache.org/jira/browse/FLINK-20675
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.9.3, 1.10.2, 1.12.0, 1.11.3
>Reporter: Yun Tang
>Assignee: Yun Tang
>Priority: Critical
> Fix For: 1.13.0, 1.11.4, 1.12.1
>
>
> After FLINK-12364, no matter how many times the asynchronous part of a 
> checkpoint fails on a task, the job itself does not fail by default:
> || ||Flink-1.5 -> Flink-1.8||Flink-1.9 -> Flink-1.12||
> |Synchronous part of checkpoint at task failed|Job failed|Job failed|
> |Asynchronous part of checkpoint at task failed|Job failed|Job would not fail|
> This happens because {{StreamTask}} uses a plain {{Exception}} instead of a 
> {{CheckpointException}} as the decline message [when the async part 
> fails|https://github.com/apache/flink/blob/5125b1123dfcfff73b5070401dfccb162959080c/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java#L1118].
> As a result, the checkpoint coordinator calls 
> {{failPendingCheckpointDueToTaskFailure(pendingCheckpoint, 
> CheckpointFailureReason.JOB_FAILURE, cause, executionAttemptID)}} to [process 
> the declined 
> checkpoint|https://github.com/apache/flink/blob/release-1.9/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinator.java#L1316-L1323]:
> {code:java}
> if (cause == null) {
>     failPendingCheckpointDueToTaskFailure(pendingCheckpoint,
>         CheckpointFailureReason.CHECKPOINT_DECLINED, executionAttemptID);
> } else if (cause instanceof CheckpointException) {
>     CheckpointException exception = (CheckpointException) cause;
>     failPendingCheckpointDueToTaskFailure(pendingCheckpoint,
>         exception.getCheckpointFailureReason(), cause, executionAttemptID);
> } else {
>     failPendingCheckpointDueToTaskFailure(pendingCheckpoint,
>         CheckpointFailureReason.JOB_FAILURE, cause, executionAttemptID);
> }
> {code}
> However, {{CheckpointFailureManager}} [ignores the JOB_FAILURE 
> reason|https://github.com/apache/flink/blob/5125b1123dfcfff73b5070401dfccb162959080c/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointFailureManager.java#L108]
>  and does not count this failed checkpoint, so an asynchronous checkpoint 
> failure no longer fails the job.
>  
> FLINK-16753 corrects the misleading message of JOB_FAILURE but the 
> asynchronous checkpoint failure still cannot fail the job.
>  
> As this bug has existed for a long time, I have set it to critical rather 
> than blocker priority. 
>  





[jira] [Created] (FLINK-20675) Asynchronous checkpoint failure would not fail the job anymore

2020-12-18 Thread Yun Tang (Jira)
Yun Tang created FLINK-20675:


 Summary: Asynchronous checkpoint failure would not fail the job 
anymore
 Key: FLINK-20675
 URL: https://issues.apache.org/jira/browse/FLINK-20675
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Checkpointing
Affects Versions: 1.11.3, 1.12.0, 1.10.2, 1.9.3
Reporter: Yun Tang
Assignee: Yun Tang
 Fix For: 1.13.0, 1.11.4, 1.12.1


After FLINK-12364, no matter how many times the asynchronous part of a 
checkpoint fails on a task, the job itself does not fail by default:
|| ||Flink-1.5 -> Flink-1.8||Flink-1.9 -> Flink-1.12||
|Synchronous part of checkpoint at task failed|Job failed|Job failed|
|Asynchronous part of checkpoint at task failed|Job failed|Job would not fail|

This happens because {{StreamTask}} uses a plain {{Exception}} instead of a 
{{CheckpointException}} as the decline message [when the async part 
fails|https://github.com/apache/flink/blob/5125b1123dfcfff73b5070401dfccb162959080c/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java#L1118].
As a result, the checkpoint coordinator calls 
{{failPendingCheckpointDueToTaskFailure(pendingCheckpoint, 
CheckpointFailureReason.JOB_FAILURE, cause, executionAttemptID)}} to [process 
the declined 
checkpoint|https://github.com/apache/flink/blob/release-1.9/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinator.java#L1316-L1323]:
{code:java}
if (cause == null) {
    failPendingCheckpointDueToTaskFailure(pendingCheckpoint,
        CheckpointFailureReason.CHECKPOINT_DECLINED, executionAttemptID);
} else if (cause instanceof CheckpointException) {
    CheckpointException exception = (CheckpointException) cause;
    failPendingCheckpointDueToTaskFailure(pendingCheckpoint,
        exception.getCheckpointFailureReason(), cause, executionAttemptID);
} else {
    failPendingCheckpointDueToTaskFailure(pendingCheckpoint,
        CheckpointFailureReason.JOB_FAILURE, cause, executionAttemptID);
}
{code}
However, {{CheckpointFailureManager}} [ignores the JOB_FAILURE 
reason|https://github.com/apache/flink/blob/5125b1123dfcfff73b5070401dfccb162959080c/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointFailureManager.java#L108]
 and does not count this failed checkpoint, so an asynchronous checkpoint 
failure no longer fails the job.

 

FLINK-16753 corrects the misleading message of JOB_FAILURE but the asynchronous 
checkpoint failure still cannot fail the job.

 

As this bug has existed for a long time, I have set it to critical rather than 
blocker priority. 

 





[GitHub] [flink] flinkbot edited a comment on pull request #14312: [FLINK-20491] Support Broadcast State in BATCH execution mode

2020-12-18 Thread GitBox


flinkbot edited a comment on pull request #14312:
URL: https://github.com/apache/flink/pull/14312#issuecomment-738876739


   
   ## CI report:
   
   * 1e4d45934795e74502b7871d1cbad7380727ad22 Azure: 
[CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=11023)
 
   * 091ba28c2182acfbb79b8cf0c0d6b0b2d40784ed Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=11042)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[jira] [Commented] (FLINK-20674) Wrong send/received stats with UNION ALL

2020-12-18 Thread Piotr Nowojski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251825#comment-17251825
 ] 

Piotr Nowojski commented on FLINK-20674:


Ideally we would have both bytes/records produced and bytes/records sent, but 
if I remember correctly, the "sent" variant wouldn't be that trivial because of 
the threading model (netty threads?). Either way, it would be nice to have 
both, although the extra stats might also confuse users.

But IMO just renaming this would already help a bit. If we also had tooltips 
in the UI explaining the stats, that would solve the problem.

> Wrong send/received stats with UNION ALL
> 
>
> Key: FLINK-20674
> URL: https://issues.apache.org/jira/browse/FLINK-20674
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.12.0, 1.11.3
>Reporter: Nico Kruber
>Priority: Major
>
> When using {{UNION ALL}} to union the same table twice, the number of 
> records and bytes sent is just half of what the next task receives.
> Reproducible with this:
> {code}
> CREATE TEMPORARY TABLE test (
>   `number` SMALLINT
> )
> WITH (
>   'connector' = 'datagen',
>   'rows-per-second' = '1'
> );
> SELECT * FROM (
> (SELECT * FROM test)
> UNION ALL
> (SELECT * FROM test)
> )
> {code}
> Arguably, the use case is not too useful but other combinations may be 
> affected, too.





[GitHub] [flink] flinkbot edited a comment on pull request #14428: [FLINK-20609][table-planner-blink] Separate the implementation of BatchExecDataStreamScan and StreamExecDataStreamScan

2020-12-18 Thread GitBox


flinkbot edited a comment on pull request #14428:
URL: https://github.com/apache/flink/pull/14428#issuecomment-748082099


   
   ## CI report:
   
   * c8d5bac964a6dd973db3e643118f4af50c82f7a5 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=11040)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[GitHub] [flink] flinkbot edited a comment on pull request #14387: [FLINK-19691][Connector][jdbc] Expose `CONNECTION_CHECK_TIMEOUT_SECONDS` as a configurable option in Jdbc connector

2020-12-18 Thread GitBox


flinkbot edited a comment on pull request #14387:
URL: https://github.com/apache/flink/pull/14387#issuecomment-745165624


   
   ## CI report:
   
   * 47dcb9670fdea6129692d27b33dc13b620248600 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=11021)
 
   * d1afe2b80aee567e824eee4c8654798e7163a01b Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=11046)
 
   * 3556926bfe3d10e73d4700f295c7343e7b42509e UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[GitHub] [flink-docker] tillrohrmann commented on pull request #49: [FLINK-20650] Rename "native-k8s" command to a more general name in docker-entrypoint.sh

2020-12-18 Thread GitBox


tillrohrmann commented on pull request #49:
URL: https://github.com/apache/flink-docker/pull/49#issuecomment-748153256


   Yes, this would be my suggestion in order to not break things within the 
`1.12.x` release line.







[jira] [Commented] (FLINK-20648) Unable to restore job from savepoint when using Kubernetes based HA services

2020-12-18 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251821#comment-17251821
 ] 

Till Rohrmann commented on FLINK-20648:
---

The address of the contender does not need to be known when starting the leader 
election service. Only when confirming the leader session one needs to specify 
the leader address.
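
The contract described in this comment can be sketched as follows. This is a self-contained illustration, not Flink's leader election API: {{Contender}}, {{ElectionService}}, {{elect}} and the address string are hypothetical names chosen for the example.

```java
import java.util.UUID;

public class LeaderElectionSketch {

    /** Hypothetical contender: receives a callback when it is granted leadership. */
    interface Contender {
        void grantLeadership(UUID sessionId);
    }

    /** Hypothetical election service for a single contender. */
    static class ElectionService {
        private Contender contender;
        private UUID confirmedSession;
        private String confirmedAddress;

        /** Starting the service needs only the contender; no address is passed here. */
        void start(Contender contender) {
            this.contender = contender;
        }

        /** Simulates the backend electing the registered contender. */
        UUID elect() {
            UUID sessionId = UUID.randomUUID();
            contender.grantLeadership(sessionId);
            return sessionId;
        }

        /** The address is supplied only when the leader session is confirmed. */
        void confirmLeadership(UUID sessionId, String leaderAddress) {
            this.confirmedSession = sessionId;
            this.confirmedAddress = leaderAddress;
        }

        UUID confirmedSession() {
            return confirmedSession;
        }

        String confirmedAddress() {
            return confirmedAddress;
        }
    }

    public static void main(String[] args) {
        ElectionService service = new ElectionService();
        service.start(sessionId -> System.out.println("granted " + sessionId));
        UUID session = service.elect();
        // Placeholder address, known only at confirmation time.
        service.confirmLeadership(session, "hypothetical-host:6123");
        System.out.println(service.confirmedAddress());
    }
}
```

The point of the split is that {{start}} can run before the contender has bound its endpoint, because the address first appears in {{confirmLeadership}}.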

> Unable to restore job from savepoint when using Kubernetes based HA services
> 
>
> Key: FLINK-20648
> URL: https://issues.apache.org/jira/browse/FLINK-20648
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes
>Affects Versions: 1.12.0
>Reporter: David Morávek
>Assignee: Yang Wang
>Priority: Critical
> Fix For: 1.13.0, 1.12.1
>
>
> When restoring a job from a savepoint, we always end up with the following error:
> {code}
> Caused by: org.apache.flink.runtime.client.JobInitializationException: Could 
> not instantiate JobManager.
>   at 
> org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJobManagerRunner$5(Dispatcher.java:463)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>   at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1764)
>  ~[?:?]
>   ... 3 more
> Caused by: java.util.concurrent.ExecutionException: 
> org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Stopped 
> retrying the operation because the error is not retryable.
>   at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395) 
> ~[?:?]
>   at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2063) ~[?:?]
>   at 
> org.apache.flink.kubernetes.highavailability.KubernetesStateHandleStore.addAndLock(KubernetesStateHandleStore.java:150)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>   at 
> org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore.addCheckpoint(DefaultCompletedCheckpointStore.java:211)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>   at 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.restoreSavepoint(CheckpointCoordinator.java:1479)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>   at 
> org.apache.flink.runtime.scheduler.SchedulerBase.tryRestoreExecutionGraphFromSavepoint(SchedulerBase.java:325)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>   at 
> org.apache.flink.runtime.scheduler.SchedulerBase.createAndRestoreExecutionGraph(SchedulerBase.java:266)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>   at 
> org.apache.flink.runtime.scheduler.SchedulerBase.<init>(SchedulerBase.java:238)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>   at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.<init>(DefaultScheduler.java:134)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>   at 
> org.apache.flink.runtime.scheduler.DefaultSchedulerFactory.createInstance(DefaultSchedulerFactory.java:108)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>   at 
> org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:323)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>   at 
> org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:310) 
> ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>   at 
> org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.createJobMasterService(DefaultJobMasterServiceFactory.java:96)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>   at 
> org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.createJobMasterService(DefaultJobMasterServiceFactory.java:41)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>   at 
> org.apache.flink.runtime.jobmaster.JobManagerRunnerImpl.<init>(JobManagerRunnerImpl.java:141)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>   at 
> org.apache.flink.runtime.dispatcher.DefaultJobManagerRunnerFactory.createJobManagerRunner(DefaultJobManagerRunnerFactory.java:80)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>   at 
> org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJobManagerRunner$5(Dispatcher.java:450)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>   at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1764)
>  ~[?:?]
>   ... 3 more
> Caused by: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: 
> Stopped retrying the operation because the error is not retryable.
>   at 
> org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperation$1(FutureUtils.java:166)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>   at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
>  ~[?:?]
>   at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
>  ~[?:?]
>   at 
> java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)
>  ~[?:?]
>   ... 3 more
> Caused by: 

[GitHub] [flink] xiaoHoly commented on a change in pull request #14387: [FLINK-19691][Connector][jdbc] Expose `CONNECTION_CHECK_TIMEOUT_SECONDS` as a configurable option in Jdbc connector

2020-12-18 Thread GitBox


xiaoHoly commented on a change in pull request #14387:
URL: https://github.com/apache/flink/pull/14387#discussion_r545900554



##
File path: 
flink-connectors/flink-connector-jdbc/src/main/java/org/apache/flink/connector/jdbc/internal/options/JdbcOptions.java
##
@@ -150,7 +168,7 @@ public JdbcOptions build() {
});
}
 
-   return new JdbcOptions(dbURL, tableName, driverName, 
username, password, dialect);
+   return new JdbcOptions(dbURL, tableName, driverName, 
username, password, dialect,connectionCheckTimeoutSeconds);

Review comment:
   Thanks for reviewing my code. I have applied your suggestion. However, I 
don't know how to get connectionCheckTimeoutSeconds in 
JdbcBatchingOutputFormat.java, so I removed it.
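
One possible way to make such a value reachable downstream is to carry it on the options object and read it through a getter. The sketch below is self-contained and hypothetical: `SketchJdbcOptions`, `SketchOutputFormat` and the default of 60 seconds are assumptions for illustration, not the connector's actual classes or defaults. Only the field name `connectionCheckTimeoutSeconds` comes from the diff above.

```java
public class JdbcTimeoutSketch {

    /** Hypothetical, trimmed-down options holder with a builder. */
    static final class SketchJdbcOptions {
        private final String dbURL;
        private final int connectionCheckTimeoutSeconds;

        private SketchJdbcOptions(String dbURL, int connectionCheckTimeoutSeconds) {
            this.dbURL = dbURL;
            this.connectionCheckTimeoutSeconds = connectionCheckTimeoutSeconds;
        }

        String getDbURL() {
            return dbURL;
        }

        int getConnectionCheckTimeoutSeconds() {
            return connectionCheckTimeoutSeconds;
        }

        static Builder builder() {
            return new Builder();
        }

        static final class Builder {
            private String dbURL;
            private int connectionCheckTimeoutSeconds = 60; // assumed default, for illustration only

            Builder setDBUrl(String dbURL) {
                this.dbURL = dbURL;
                return this;
            }

            Builder setConnectionCheckTimeoutSeconds(int seconds) {
                this.connectionCheckTimeoutSeconds = seconds;
                return this;
            }

            SketchJdbcOptions build() {
                if (dbURL == null) {
                    throw new IllegalStateException("dbURL must be set");
                }
                if (connectionCheckTimeoutSeconds <= 0) {
                    throw new IllegalArgumentException("timeout must be positive");
                }
                return new SketchJdbcOptions(dbURL, connectionCheckTimeoutSeconds);
            }
        }
    }

    /** Hypothetical consumer: receives the options object and reads the timeout from it. */
    static final class SketchOutputFormat {
        private final SketchJdbcOptions options;

        SketchOutputFormat(SketchJdbcOptions options) {
            this.options = options;
        }

        int validationTimeoutSeconds() {
            return options.getConnectionCheckTimeoutSeconds();
        }
    }

    public static void main(String[] args) {
        SketchJdbcOptions options = SketchJdbcOptions.builder()
                .setDBUrl("jdbc:hypothetical://localhost/db")
                .setConnectionCheckTimeoutSeconds(30)
                .build();
        System.out.println(new SketchOutputFormat(options).validationTimeoutSeconds());
    }
}
```

Whether the real output format has access to the options object at that point is not shown in this thread, so this is only a sketch of the general pattern.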









[jira] [Closed] (FLINK-19880) Fix ignore-parse-errors not work for the legacy JSON format

2020-12-18 Thread Jark Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jark Wu closed FLINK-19880.
---
Resolution: Fixed

Fixed in
 - master: fc660274a5c7b4cdea00335cac87bdf5d421db92
 - release-1.12: 3bfb2ab611acff9f0db28f95878032c80f3ae815

> Fix ignore-parse-errors not work for the legacy JSON format
> ---
>
> Key: FLINK-19880
> URL: https://issues.apache.org/jira/browse/FLINK-19880
> Project: Flink
>  Issue Type: Bug
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile), Table 
> SQL / Ecosystem
>Affects Versions: 1.11.2
>Reporter: MingZhangOk
>Assignee: Xiao Cai
>Priority: Major
>  Labels: pull-request-available, starter
> Fix For: 1.13.0, 1.12.1
>
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> When I call
>  new Json().ignoreParseErrors(true)
>  to ignore parse errors, the following exception always occurs:
>  org.apache.flink.formats.json.JsonRowFormatFactory
>  Unsupported property keys:
>  format.ignore-parse-errors
>  Modifying the code as follows solves my problem:
>  add a line after line 52 of the class 
> org.apache.flink.formats.json.JsonRowFormatFactory:
> properties.add(JsonValidator.FORMAT_IGNORE_PARSE_ERRORS);
>  This is my first time submitting code to Flink and I do not know the 
> rules, so I submitted it directly. Sorry.
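
The failure mode reported here, a requested key that is missing from a factory's supported-properties list, can be sketched in isolation. The class below is hypothetical and does not reproduce {{JsonRowFormatFactory}}: the two baseline keys are illustrative placeholders, and only the key {{format.ignore-parse-errors}} and the "Unsupported property keys" message come from the report.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SupportedPropertiesSketch {

    static final String FORMAT_IGNORE_PARSE_ERRORS = "format.ignore-parse-errors";

    /** Builds a supported-keys list, optionally with the key the report says was missing. */
    static List<String> supportedProperties(boolean includeIgnoreParseErrors) {
        // Placeholder baseline keys, for illustration only.
        List<String> properties = new ArrayList<>(
                Arrays.asList("format.json-schema", "format.fail-on-missing-field"));
        if (includeIgnoreParseErrors) {
            properties.add(FORMAT_IGNORE_PARSE_ERRORS); // the one-line fix proposed in the report
        }
        return properties;
    }

    /** Rejects any requested key that is not in the supported list. */
    static void validate(Map<String, String> requested, List<String> supported) {
        List<String> unsupported = new ArrayList<>();
        for (String key : requested.keySet()) {
            if (!supported.contains(key)) {
                unsupported.add(key);
            }
        }
        if (!unsupported.isEmpty()) {
            throw new IllegalArgumentException("Unsupported property keys: " + unsupported);
        }
    }

    public static void main(String[] args) {
        Map<String, String> requested = new HashMap<>();
        requested.put(FORMAT_IGNORE_PARSE_ERRORS, "true");
        try {
            validate(requested, supportedProperties(false));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
        validate(requested, supportedProperties(true)); // passes once the key is listed
        System.out.println("validated");
    }
}
```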





[GitHub] [flink] flinkbot edited a comment on pull request #14423: [FLINK-20673][table-planner-blink] ExecNode#getOutputType method should return LogicalType instead of RowType

2020-12-18 Thread GitBox


flinkbot edited a comment on pull request #14423:
URL: https://github.com/apache/flink/pull/14423#issuecomment-748013167


   
   ## CI report:
   
   * 0cd0435254c0858f52ad5bd5f7d998ce2b9797e1 Azure: 
[CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=11020)
 
   * c1507f5fd4747da2a8d3434b2cdbc883a1d332f2 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=11031)
 
   * feca8bda1976f237787ea3219b567bff442e4568 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=11043)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   






