[jira] [Commented] (FLINK-18143) Fix Python meter metric not correct problem

2020-06-04 Thread Hequn Cheng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126409#comment-17126409
 ] 

Hequn Cheng commented on FLINK-18143:
-

Fixed 
in 1.12.0 via dcd764a8a0df788846886f2c3aa2b38828ae0e21
in 1.11.0 via 37f436ec96ffa27fec6650a3ef4726a2a93cada4

> Fix Python meter metric not correct problem
> ---
>
> Key: FLINK-18143
> URL: https://issues.apache.org/jira/browse/FLINK-18143
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.11.0
>Reporter: Hequn Cheng
>Assignee: Hequn Cheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>
> We should report the diff value instead of the total ones. Code 
> `meter.markEvent(update);` in `FlinkMetricContainer` should be changed to 
> `meter.markEvent(update - meter.getCount());`
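The quoted fix can be illustrated with a minimal sketch. Note the `SimpleMeter` class below is a hypothetical stand-in for Flink's `Meter`, not the real API: the point is that the Python side reports cumulative totals while `markEvent` accumulates, so passing the raw total double-counts, and reporting the diff does not.

```java
// Hypothetical stand-in for Flink's Meter: markEvent() accumulates a total.
class SimpleMeter {
    private long count;

    void markEvent(long n) {
        count += n; // accumulates into the running total
    }

    long getCount() {
        return count;
    }
}

public class MeterDiffDemo {
    public static void main(String[] args) {
        // Updates from the Python side are cumulative totals,
        // e.g. 10 events seen so far, then 25 events seen so far.
        long[] cumulativeUpdates = {10, 25};

        SimpleMeter buggy = new SimpleMeter();
        SimpleMeter fixed = new SimpleMeter();
        for (long update : cumulativeUpdates) {
            buggy.markEvent(update);                     // 10 + 25 = 35: double-counted
            fixed.markEvent(update - fixed.getCount());  // 10 + 15 = 25: correct
        }
        System.out.println(buggy.getCount()); // 35
        System.out.println(fixed.getCount()); // 25
    }
}
```

Passing the delta keeps the meter's total equal to the latest cumulative value reported from the Python side.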



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (FLINK-18143) Fix Python meter metric not correct problem

2020-06-04 Thread Hequn Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hequn Cheng closed FLINK-18143.
---
Resolution: Fixed

> Fix Python meter metric not correct problem
> ---
>
> Key: FLINK-18143
> URL: https://issues.apache.org/jira/browse/FLINK-18143
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.11.0
>Reporter: Hequn Cheng
>Assignee: Hequn Cheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>
> We should report the diff value instead of the total ones. Code 
> `meter.markEvent(update);` in `FlinkMetricContainer` should be changed to 
> `meter.markEvent(update - meter.getCount());`





[GitHub] [flink] hequn8128 merged pull request #12498: [FLINK-18143][python] Fix Python meter metric incorrect value problem

2020-06-04 Thread GitBox


hequn8128 merged pull request #12498:
URL: https://github.com/apache/flink/pull/12498


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] hequn8128 commented on pull request #12498: [FLINK-18143][python] Fix Python meter metric incorrect value problem

2020-06-04 Thread GitBox


hequn8128 commented on pull request #12498:
URL: https://github.com/apache/flink/pull/12498#issuecomment-639269071


   @dianfu Thanks. Merging...







[jira] [Commented] (FLINK-18100) Test different user-defined metrics for Python UDF

2020-06-04 Thread Hequn Cheng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126406#comment-17126406
 ] 

Hequn Cheng commented on FLINK-18100:
-

Checked different kinds of metrics. Found one bug for the meter, fixed in 
FLINK-18143.

> Test different user-defined metrics for Python UDF
> --
>
> Key: FLINK-18100
> URL: https://issues.apache.org/jira/browse/FLINK-18100
> Project: Flink
>  Issue Type: Test
>  Components: API / Python
>Reporter: sunjincheng
>Assignee: Hequn Cheng
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.11.0
>
>
> Test different user-defined metrics for Python UDF.





[jira] [Commented] (FLINK-18101) Test Python Pipeline API including Transformer and Estimator

2020-06-04 Thread Hequn Cheng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126405#comment-17126405
 ] 

Hequn Cheng commented on FLINK-18101:
-

Done.
Test Python Transformer and Estimator.
Test Wrapper logic.
Test key-value parameter configuration.

> Test Python Pipeline API including Transformer and Estimator
> 
>
> Key: FLINK-18101
> URL: https://issues.apache.org/jira/browse/FLINK-18101
> Project: Flink
>  Issue Type: Test
>  Components: API / Python
>Reporter: sunjincheng
>Assignee: Hequn Cheng
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.11.0
>
>
> Test Python Pipeline API including Transformer and Estimator.





[jira] [Closed] (FLINK-18101) Test Python Pipeline API including Transformer and Estimator

2020-06-04 Thread Hequn Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hequn Cheng closed FLINK-18101.
---
Resolution: Resolved

> Test Python Pipeline API including Transformer and Estimator
> 
>
> Key: FLINK-18101
> URL: https://issues.apache.org/jira/browse/FLINK-18101
> Project: Flink
>  Issue Type: Test
>  Components: API / Python
>Reporter: sunjincheng
>Assignee: Hequn Cheng
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.11.0
>
>
> Test Python Pipeline API including Transformer and Estimator.





[GitHub] [flink] dianfu commented on pull request #12498: [FLINK-18143][python] Fix Python meter metric incorrect value problem

2020-06-04 Thread GitBox


dianfu commented on pull request #12498:
URL: https://github.com/apache/flink/pull/12498#issuecomment-639266564


   @hequn8128 Thanks for the PR. LGTM.







[GitHub] [flink] flinkbot commented on pull request #12498: [FLINK-18143][python] Fix Python meter metric incorrect value problem

2020-06-04 Thread GitBox


flinkbot commented on pull request #12498:
URL: https://github.com/apache/flink/pull/12498#issuecomment-639266322


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit bfdf2a8676a55dd280a7ae91be419d4076bbe5b7 (Fri Jun 05 
05:23:54 UTC 2020)
   
   **Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.
   ## Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   







[GitHub] [flink] hequn8128 opened a new pull request #12498: [FLINK-18143][python] Fix Python meter metric incorrect value problem

2020-06-04 Thread GitBox


hequn8128 opened a new pull request #12498:
URL: https://github.com/apache/flink/pull/12498


   
   
   ## What is the purpose of the change
   
   This pull request fixes Python meter metric incorrect value problem.
   
   
   ## Brief change log
   
 - Report the diff value instead of the total ones for meter.
   
   
   ## Verifying this change
   
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
 - The serializers: (no)
 - The runtime per-record code paths (performance sensitive): (no)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (no)
 - The S3 file system connector: (no)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (no)
   







[jira] [Updated] (FLINK-18143) Fix Python meter metric not correct problem

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-18143:
---
Labels: pull-request-available  (was: )

> Fix Python meter metric not correct problem
> ---
>
> Key: FLINK-18143
> URL: https://issues.apache.org/jira/browse/FLINK-18143
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.11.0
>Reporter: Hequn Cheng
>Assignee: Hequn Cheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>
> We should report the diff value instead of the total ones. Code 
> `meter.markEvent(update);` in `FlinkMetricContainer` should be changed to 
> `meter.markEvent(update - meter.getCount());`





[GitHub] [flink] flinkbot edited a comment on pull request #12456: [FLINK-17113][sql-cli] Refactor view support in SQL Client

2020-06-04 Thread GitBox


flinkbot edited a comment on pull request #12456:
URL: https://github.com/apache/flink/pull/12456#issuecomment-638032908


   
   ## CI report:
   
   * fefdae96b41549bc9d2c75e126c3a06da6863cfb Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2764)
 
   * 41aaad57212cd8ed14c3db89c3e2b46fdc9136fc Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2778)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[GitHub] [flink] flinkbot edited a comment on pull request #12497: [FLINK-18142][hive] Wrong state names in HiveContinuousMonitoringFunction

2020-06-04 Thread GitBox


flinkbot edited a comment on pull request #12497:
URL: https://github.com/apache/flink/pull/12497#issuecomment-639258405


   
   ## CI report:
   
   * 4d9e9f41e60e46d39ea712f6dbe7b16c879908ad Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2779)
 
   
   
   







[GitHub] [flink] flinkbot edited a comment on pull request #12494: [FLINK-18038] [statebackend] Logging user-provided state backends after they are configured

2020-06-04 Thread GitBox


flinkbot edited a comment on pull request #12494:
URL: https://github.com/apache/flink/pull/12494#issuecomment-639209175


   
   ## CI report:
   
   * 89ea45f96705fb828844bcd279443d29d5c622c2 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2762)
 
   
   
   







[jira] [Updated] (FLINK-18143) Fix Python meter metric not correct problem

2020-06-04 Thread Hequn Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hequn Cheng updated FLINK-18143:

Description: We should report the diff value instead of the total ones. 
Code `meter.markEvent(update);` in `FlinkMetricContainer` should be changed to 
`meter.markEvent(update - meter.getCount());`

> Fix Python meter metric not correct problem
> ---
>
> Key: FLINK-18143
> URL: https://issues.apache.org/jira/browse/FLINK-18143
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.11.0
>Reporter: Hequn Cheng
>Assignee: Hequn Cheng
>Priority: Major
> Fix For: 1.11.0
>
>
> We should report the diff value instead of the total ones. Code 
> `meter.markEvent(update);` in `FlinkMetricContainer` should be changed to 
> `meter.markEvent(update - meter.getCount());`





[jira] [Assigned] (FLINK-18143) Fix Python meter metric not correct problem

2020-06-04 Thread Hequn Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hequn Cheng reassigned FLINK-18143:
---

Assignee: Hequn Cheng

> Fix Python meter metric not correct problem
> ---
>
> Key: FLINK-18143
> URL: https://issues.apache.org/jira/browse/FLINK-18143
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.11.0
>Reporter: Hequn Cheng
>Assignee: Hequn Cheng
>Priority: Major
> Fix For: 1.11.0
>
>






[jira] [Created] (FLINK-18143) Fix Python meter metric not correct problem

2020-06-04 Thread Hequn Cheng (Jira)
Hequn Cheng created FLINK-18143:
---

 Summary: Fix Python meter metric not correct problem
 Key: FLINK-18143
 URL: https://issues.apache.org/jira/browse/FLINK-18143
 Project: Flink
  Issue Type: Bug
  Components: API / Python
Affects Versions: 1.11.0
Reporter: Hequn Cheng
 Fix For: 1.11.0








[jira] [Commented] (FLINK-17949) KafkaShuffleITCase.testSerDeIngestionTime:156->testRecordSerDe:388 expected:<310> but was:<0>

2020-06-04 Thread Jiangjie Qin (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126396#comment-17126396
 ] 

Jiangjie Qin commented on FLINK-17949:
--

The LEADER_NOT_AVAILABLE case basically means the partition is down...

> KafkaShuffleITCase.testSerDeIngestionTime:156->testRecordSerDe:388 
> expected:<310> but was:<0>
> -
>
> Key: FLINK-17949
> URL: https://issues.apache.org/jira/browse/FLINK-17949
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka, Tests
>Affects Versions: 1.11.0, 1.12.0
>Reporter: Robert Metzger
>Priority: Critical
>  Labels: test-stability
> Attachments: logs-ci-kafkagelly-1590500380.zip, 
> logs-ci-kafkagelly-1590524911.zip
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=2209=logs=c5f0071e-1851-543e-9a45-9ac140befc32=684b1416-4c17-504e-d5ab-97ee44e08a20
> {code}
> 2020-05-26T13:35:19.4022562Z [ERROR] 
> testSerDeIngestionTime(org.apache.flink.streaming.connectors.kafka.shuffle.KafkaShuffleITCase)
>   Time elapsed: 5.786 s  <<< FAILURE!
> 2020-05-26T13:35:19.4023185Z java.lang.AssertionError: expected:<310> but 
> was:<0>
> 2020-05-26T13:35:19.4023498Z  at org.junit.Assert.fail(Assert.java:88)
> 2020-05-26T13:35:19.4023825Z  at 
> org.junit.Assert.failNotEquals(Assert.java:834)
> 2020-05-26T13:35:19.4024461Z  at 
> org.junit.Assert.assertEquals(Assert.java:645)
> 2020-05-26T13:35:19.4024900Z  at 
> org.junit.Assert.assertEquals(Assert.java:631)
> 2020-05-26T13:35:19.4028546Z  at 
> org.apache.flink.streaming.connectors.kafka.shuffle.KafkaShuffleITCase.testRecordSerDe(KafkaShuffleITCase.java:388)
> 2020-05-26T13:35:19.4029629Z  at 
> org.apache.flink.streaming.connectors.kafka.shuffle.KafkaShuffleITCase.testSerDeIngestionTime(KafkaShuffleITCase.java:156)
> 2020-05-26T13:35:19.4030253Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2020-05-26T13:35:19.4030673Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2020-05-26T13:35:19.4031332Z  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2020-05-26T13:35:19.4031763Z  at 
> java.lang.reflect.Method.invoke(Method.java:498)
> 2020-05-26T13:35:19.4032155Z  at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> 2020-05-26T13:35:19.4032630Z  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 2020-05-26T13:35:19.4033188Z  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> 2020-05-26T13:35:19.4033638Z  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 2020-05-26T13:35:19.4034103Z  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> 2020-05-26T13:35:19.4034593Z  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
> 2020-05-26T13:35:19.4035118Z  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
> 2020-05-26T13:35:19.4035570Z  at 
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 2020-05-26T13:35:19.4035888Z  at java.lang.Thread.run(Thread.java:748)
> {code}





[jira] [Commented] (FLINK-16497) Improve default flush strategy for JDBC sink to make it work out-of-box

2020-06-04 Thread Benchao Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126395#comment-17126395
 ] 

Benchao Li commented on FLINK-16497:


+1 to Danny's proposal.

> Improve default flush strategy for JDBC sink to make it work out-of-box
> ---
>
> Key: FLINK-16497
> URL: https://issues.apache.org/jira/browse/FLINK-16497
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / JDBC, Table SQL / Ecosystem
>Reporter: Jark Wu
>Priority: Critical
> Fix For: 1.11.0
>
>
> Currently, JDBC sink provides 2 flush options:
> {code}
> 'connector.write.flush.max-rows' = '5000', -- default is 5000
> 'connector.write.flush.interval' = '2s', -- no default value
> {code}
> That means if the flush interval is not set, buffered output rows may not be 
> flushed to the database for a long time. That is surprising behavior because no 
> results are output by default. 
> So I propose a default flush interval of '1s' for the JDBC sink, or a default 
> flush size of 1 row. 
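Until such a default lands, the surprise can be avoided by setting the flush interval explicitly. A sketch using the legacy option keys quoted above; the table name, schema, and JDBC URL are made up for illustration:

```sql
-- Hypothetical sink table; only the flush options mirror the issue's quoted keys.
CREATE TABLE jdbc_sink (
  id BIGINT,
  name STRING
) WITH (
  'connector.type' = 'jdbc',
  'connector.url' = 'jdbc:mysql://localhost:3306/mydb',
  'connector.table' = 'my_table',
  'connector.write.flush.max-rows' = '5000',
  -- setting the interval explicitly bounds how long rows sit in the buffer
  'connector.write.flush.interval' = '1s'
);
```

With both options set, a flush is triggered by whichever threshold is hit first.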





[GitHub] [flink] Myasuka commented on a change in pull request #12470: [FLINK-18074][checkpoint] Ensure task could fail when exception thrown out on notified of checkpoint completed/aborted

2020-06-04 Thread GitBox


Myasuka commented on a change in pull request #12470:
URL: https://github.com/apache/flink/pull/12470#discussion_r435692788



##
File path: 
flink-streaming-java/src/test/java/org/apache/flink/streaming/runtime/tasks/StreamTaskTest.java
##
@@ -969,6 +970,34 @@ public void testNotifyCheckpointOnClosedOperator() throws 
Throwable {
assertEquals(true, operator.closed.get());
}
 
+   @Test
+   public void testFailToConfirmCheckpointCompleted() throws Exception {
+   testFailToConfirmCheckpointMessage(streamTask -> 
streamTask.notifyCheckpointCompleteAsync(1L));
+   }
+
+   @Test
+   public void testFailToConfirmCheckpointAborted() throws Exception {
+   testFailToConfirmCheckpointMessage(streamTask -> 
streamTask.notifyCheckpointAbortAsync(1L));
+   }
+
+   private void testFailToConfirmCheckpointMessage(Consumer<StreamTask<?, ?>> consumer) throws Exception {
+   FailOnNotifyCheckpointOperator<Integer> operator = new FailOnNotifyCheckpointOperator<>();
+   MultipleInputStreamTaskTestHarnessBuilder<Integer> builder =
+   new MultipleInputStreamTaskTestHarnessBuilder<>(OneInputStreamTask::new, BasicTypeInfo.INT_TYPE_INFO)
+   .addInput(BasicTypeInfo.INT_TYPE_INFO);
+   StreamTaskMailboxTestHarness<Integer> harness = builder
+   .setupOutputForSingletonOperatorChain(operator)
+   .build();
+
+   try {
+   consumer.accept(harness.streamTask);
+   harness.streamTask.runMailboxStep();
+   fail();
+   } catch (ExpectedTestException expected) {
+   // expected exception

Review comment:
   Sure, I run the test locally without my fix on `StreamTask` and the test 
failed as expected.









[jira] [Closed] (FLINK-18074) Confirm checkpoint completed on task side would not fail the task if exception thrown out

2020-06-04 Thread Piotr Nowojski (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Piotr Nowojski closed FLINK-18074.
--
Resolution: Fixed

merged commit e27c517 into apache:master and 29f9918046 to release-1.11

> Confirm checkpoint completed on task side would not fail the task if 
> exception thrown out
> -
>
> Key: FLINK-18074
> URL: https://issues.apache.org/jira/browse/FLINK-18074
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing, Runtime / Task
>Affects Versions: 1.11.0
>Reporter: Yun Tang
>Assignee: Yun Tang
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>
> FLINK-17350 let the task fail immediately once sync phase of checkpoint 
> failed. However, the included commit ['Simplify checkpoint exception 
> handling'|https://github.com/apache/flink/pull/12101/commits/a2cd3daceca16ae841119d94a24328b4af37dcd8]
>  actually would not fail the task if the runnable of {{() -> 
> notifyCheckpointComplete}} throwing exception out.
> In a nutshell, this actually changes previous checkpoint exception handling.
> Moreover, that part of code also affect the implemented code of 
> {{notifyCheckpointAbortAsync}} when I introduce {{notifyCheckpointAborted}} 
> on task side. 





[GitHub] [flink] pnowojski merged pull request #12470: [FLINK-18074][checkpoint] Ensure task could fail when exception thrown out on notified of checkpoint completed/aborted

2020-06-04 Thread GitBox


pnowojski merged pull request #12470:
URL: https://github.com/apache/flink/pull/12470


   







[GitHub] [flink] pnowojski commented on a change in pull request #12470: [FLINK-18074][checkpoint] Ensure task could fail when exception thrown out on notified of checkpoint completed/aborted

2020-06-04 Thread GitBox


pnowojski commented on a change in pull request #12470:
URL: https://github.com/apache/flink/pull/12470#discussion_r435691849



##
File path: 
flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java
##
@@ -919,9 +920,33 @@ public final ExecutorService 
getAsyncOperationsThreadPool() {
 
@Override
public Future<Void> notifyCheckpointCompleteAsync(long checkpointId) {
-   return 
mailboxProcessor.getMailboxExecutor(TaskMailbox.MAX_PRIORITY).submit(
-   () -> notifyCheckpointComplete(checkpointId),
-   "checkpoint %d complete", checkpointId);
+   return notifyCheckpointOperation(
+   () -> notifyCheckpointComplete(checkpointId),
+   String.format("checkpoint %d complete", checkpointId));
+   }
+
+   @Override
+   public Future<Void> notifyCheckpointAbortAsync(long checkpointId) {
+   return notifyCheckpointOperation(
+   () -> 
subtaskCheckpointCoordinator.notifyCheckpointAborted(checkpointId, 
operatorChain, this::isRunning),
+   String.format("checkpoint %d aborted", checkpointId));
+   }
+
+   private Future<Void> notifyCheckpointOperation(RunnableWithException runnable, String description) {

Review comment:
   Thanks for pointing it out. In that case I think there are a bit too 
many small differences and maybe it's better to keep it as it is.









[GitHub] [flink] pnowojski commented on pull request #12470: [FLINK-18074][checkpoint] Ensure task could fail when exception thrown out on notified of checkpoint completed/aborted

2020-06-04 Thread GitBox


pnowojski commented on pull request #12470:
URL: https://github.com/apache/flink/pull/12470#issuecomment-639260996


   Merged, thanks for fixing the issue :)







[GitHub] [flink] Myasuka commented on a change in pull request #12470: [FLINK-18074][checkpoint] Ensure task could fail when exception thrown out on notified of checkpoint completed/aborted

2020-06-04 Thread GitBox


Myasuka commented on a change in pull request #12470:
URL: https://github.com/apache/flink/pull/12470#discussion_r435691165



##
File path: 
flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java
##
@@ -919,9 +920,33 @@ public final ExecutorService 
getAsyncOperationsThreadPool() {
 
@Override
public Future<Void> notifyCheckpointCompleteAsync(long checkpointId) {
-   return 
mailboxProcessor.getMailboxExecutor(TaskMailbox.MAX_PRIORITY).submit(
-   () -> notifyCheckpointComplete(checkpointId),
-   "checkpoint %d complete", checkpointId);
+   return notifyCheckpointOperation(
+   () -> notifyCheckpointComplete(checkpointId),
+   String.format("checkpoint %d complete", checkpointId));
+   }
+
+   @Override
+   public Future<Void> notifyCheckpointAbortAsync(long checkpointId) {
+   return notifyCheckpointOperation(
+   () -> 
subtaskCheckpointCoordinator.notifyCheckpointAborted(checkpointId, 
operatorChain, this::isRunning),
+   String.format("checkpoint %d aborted", checkpointId));
+   }
+
+   private Future<Void> notifyCheckpointOperation(RunnableWithException runnable, String description) {

Review comment:
   `triggerCheckpointAsync` needs to return `Future<Boolean>` while those 
operations just return `Future<Void>`. If we decide to change the interfaces in 
`AbstractInvokable` to both return `Future<Void>`, I think this method could 
be re-used.









[GitHub] [flink] flinkbot commented on pull request #12497: [FLINK-18142][hive] Wrong state names in HiveContinuousMonitoringFunction

2020-06-04 Thread GitBox


flinkbot commented on pull request #12497:
URL: https://github.com/apache/flink/pull/12497#issuecomment-639258405


   
   ## CI report:
   
   * 4d9e9f41e60e46d39ea712f6dbe7b16c879908ad UNKNOWN
   
   
   







[GitHub] [flink] flinkbot edited a comment on pull request #12496: [FLINK-18110][fs-connector] StreamingFileSink notifies for buckets detected to be inactive on restoring

2020-06-04 Thread GitBox


flinkbot edited a comment on pull request #12496:
URL: https://github.com/apache/flink/pull/12496#issuecomment-639245821


   
   ## CI report:
   
   * 18db5ae4db2c5284bf0530c60e05f02be3a40d19 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2776)
 
   
   
   







[GitHub] [flink] flinkbot edited a comment on pull request #12335: [FLINK-17753] [table-planner-blink] Fix watermark defined in ddl does not work in Table api

2020-06-04 Thread GitBox


flinkbot edited a comment on pull request #12335:
URL: https://github.com/apache/flink/pull/12335#issuecomment-633921246


   
   ## CI report:
   
   * 1630336016f87ea92bb1186b0522bdcf0dca4fb3 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2777)
 
   
   
   







[GitHub] [flink] flinkbot edited a comment on pull request #12456: [FLINK-17113][sql-cli] Refactor view support in SQL Client

2020-06-04 Thread GitBox


flinkbot edited a comment on pull request #12456:
URL: https://github.com/apache/flink/pull/12456#issuecomment-638032908


   
   ## CI report:
   
   * fefdae96b41549bc9d2c75e126c3a06da6863cfb Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2764)
 
   * 41aaad57212cd8ed14c3db89c3e2b46fdc9136fc UNKNOWN
   
   
   







[jira] [Commented] (FLINK-16198) FileUtilsTest fails on Mac OS

2020-06-04 Thread Zhe Yu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126385#comment-17126385
 ] 

Zhe Yu commented on FLINK-16198:


PR: [https://github.com/apache/flink/pull/12413]

> FileUtilsTest fails on Mac OS
> -
>
> Key: FLINK-16198
> URL: https://issues.apache.org/jira/browse/FLINK-16198
> Project: Flink
>  Issue Type: Bug
>  Components: FileSystems, Tests
>Reporter: Andrey Zagrebin
>Assignee: Zhe Yu
>Priority: Major
>  Labels: pull-request-available, starter
>
> The following tests fail if run on Mac OS (IDE/maven).
>  
> FileUtilsTest.testCompressionOnRelativePath: 
> {code:java}
> java.nio.file.NoSuchFileException: 
> ../../../../../var/folders/67/v4yp_42d21j6_n8k1h556h0cgn/T/junit6496651678375117676/compressDir/rootDirjava.nio.file.NoSuchFileException:
>  
> ../../../../../var/folders/67/v4yp_42d21j6_n8k1h556h0cgn/T/junit6496651678375117676/compressDir/rootDir
>  at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86) at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) at 
> sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:384)
>  at java.nio.file.Files.createDirectory(Files.java:674) at 
> org.apache.flink.util.FileUtilsTest.verifyDirectoryCompression(FileUtilsTest.java:440)
>  at 
> org.apache.flink.util.FileUtilsTest.testCompressionOnRelativePath(FileUtilsTest.java:261)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48) at 
> org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) at 
> org.junit.rules.RunRules.evaluate(RunRules.java:20) at 
> org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>  at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) at 
> org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:363) at 
> org.junit.runner.JUnitCore.run(JUnitCore.java:137) at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>  at 
> com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
>  at 
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
>  at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> {code}
>  
> FileUtilsTest.testDeleteDirectoryConcurrently: 
> {code:java}
> java.nio.file.FileSystemException: 
> /var/folders/67/v4yp_42d21j6_n8k1h556h0cgn/T/junit7558825557740784886/junit3566161583262218465/ab1fa0bde8b22cad58b717508c7a7300/121fdf5f7b057183843ed2e1298f9b66/6598025f390d3084d69c98b36e542fe2/8db7cd9c063396a19a86f5b63ce53f66:
>  Invalid argument   at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>   at 
> sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:244)
>   at 
> sun.nio.fs.AbstractFileSystemProvider.deleteIfExists(AbstractFileSystemProvider.java:108)
>   at java.nio.file.Files.deleteIfExists(Files.java:1165)
>   at 
> org.apache.flink.util.FileUtils.deleteFileOrDirectoryInternal(FileUtils.java:324)
>   at org.apache.flink.util.FileUtils.guardIfWindows(FileUtils.java:391)
>   at 
> org.apache.flink.util.FileUtils.deleteFileOrDirectory(FileUtils.java:258)
>   at 
> org.apache.flink.util.FileUtils.cleanDirectoryInternal(FileUtils.java:376)
>   at 
> org.apache.flink.util.FileUtils.deleteDirectoryInternal(FileUtils.java:335)
>   at 
> 

[GitHub] [flink] godfreyhe commented on a change in pull request #12471: [FLINK-18073][FLINK-18029][avro] Fix AvroRowDataSerializationSchema is not serializable and add IT cases

2020-06-04 Thread GitBox


godfreyhe commented on a change in pull request #12471:
URL: https://github.com/apache/flink/pull/12471#discussion_r435687446



##
File path: 
flink-formats/flink-avro/src/main/java/org/apache/flink/formats/avro/typeutils/AvroSchemaConverter.java
##
@@ -254,14 +254,6 @@ public static Schema convertToSchema(LogicalType 
logicalType, int rowTypeCounter
.array()

.items(convertToSchema(arrayType.getElementType(), rowTypeCounter));
case RAW:

Review comment:
   ditto

##
File path: 
flink-formats/flink-avro/src/main/java/org/apache/flink/formats/avro/AvroRowSerializationSchema.java
##
@@ -242,14 +245,20 @@ private Object convertFlinkType(Schema schema, Object 
object) {
// check for logical types
if (object instanceof Date) {
return convertFromDate(schema, (Date) 
object);
+   } else if (object instanceof LocalDate) {
+   return convertFromDate(schema, 
Date.valueOf((LocalDate) object));
} else if (object instanceof Time) {
return convertFromTime(schema, (Time) 
object);
+   } else if (object instanceof LocalTime) {

Review comment:
   Is there any new test to check the fix?









[GitHub] [flink] pnowojski commented on a change in pull request #12457: [FLINK-18050][task][checkpointing] Fix double buffer recycling

2020-06-04 Thread GitBox


pnowojski commented on a change in pull request #12457:
URL: https://github.com/apache/flink/pull/12457#discussion_r435687298



##
File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/channel/ChannelStateWriteRequest.java
##
@@ -52,6 +53,10 @@ static ChannelStateWriteRequest write(long checkpointId, 
InputChannelInfo info,
return buildWriteRequest(checkpointId, "writeInput", iterator, 
(writer, buffer) -> writer.writeInput(info, buffer));
}
 
+   static ChannelStateWriteRequest write(long checkpointId, 
ResultSubpartitionInfo info, Buffer... buffers) {
+   return buildWriteRequest(checkpointId, "writeOutput", 
ofElements(Buffer::recycleBuffer, buffers), (writer, buffer) -> 
writer.writeOutput(info, buffer));
+   }
+

Review comment:
   What's the difference from the previous version, especially in the behaviour? It looks like a subtle change that I'm missing.

##
File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/channel/ChannelStateWriterImpl.java
##
@@ -129,7 +129,7 @@ public void addOutputData(long checkpointId, 
ResultSubpartitionInfo info, int st
info,
startSeqNum,
data == null ? 0 : data.length);
-   enqueue(write(checkpointId, info, checkBufferType(data)), 
false);
+   enqueue(write(checkpointId, info, data), false);

Review comment:
   Why have you dropped the `checkBufferType` call? It was actually quite helpful a couple of times when debugging. Can we not try to keep it?

##
File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/channel/ChannelStateWriteRequest.java
##
@@ -32,6 +32,7 @@
 import static 
org.apache.flink.runtime.checkpoint.channel.CheckpointInProgressRequestState.EXECUTING;
 import static 
org.apache.flink.runtime.checkpoint.channel.CheckpointInProgressRequestState.FAILED;
 import static 
org.apache.flink.runtime.checkpoint.channel.CheckpointInProgressRequestState.NEW;
+import static org.apache.flink.util.CloseableIterator.ofElements;

Review comment:
   Can you copy the JIRA description into this commit description? By looking at it, it's quite hard to understand what it is fixing and how.









[jira] [Comment Edited] (FLINK-17824) "Resuming Savepoint" e2e stalls indefinitely

2020-06-04 Thread Piotr Nowojski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126379#comment-17126379
 ] 

Piotr Nowojski edited comment on FLINK-17824 at 6/5/20, 4:39 AM:
-

As it was first reported around the time we were merging quite a bit of unaligned checkpoints code, this issue might be related to something we did.

It might also be fixed accidentally by one of the in-progress fixes, like FLINK-18136.

What do you think [~roman_khachatryan]?


was (Author: pnowojski):
As it was first reported around feature when we were merging quite a bit of 
unaligned checkpoints code, this issue might be related to something we did.

It also might be fixed accidentally via one of the in-progress bugs like 
FLINK-18136 

> "Resuming Savepoint" e2e stalls indefinitely 
> -
>
> Key: FLINK-17824
> URL: https://issues.apache.org/jira/browse/FLINK-17824
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing, Tests
>Affects Versions: 1.12.0
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.12.0
>
>
> CI; 
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=1887=logs=91bf6583-3fb2-592f-e4d4-d79d79c3230a=94459a52-42b6-5bfc-5d74-690b5d3c6de8
> {code}
> 2020-05-19T21:05:52.9696236Z 
> ==
> 2020-05-19T21:05:52.9696860Z Running 'Resuming Savepoint (file, async, scale 
> down) end-to-end test'
> 2020-05-19T21:05:52.9697243Z 
> ==
> 2020-05-19T21:05:52.9713094Z TEST_DATA_DIR: 
> /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-52970362751
> 2020-05-19T21:05:53.1194478Z Flink dist directory: 
> /home/vsts/work/1/s/flink-dist/target/flink-1.12-SNAPSHOT-bin/flink-1.12-SNAPSHOT
> 2020-05-19T21:05:53.2180375Z Starting cluster.
> 2020-05-19T21:05:53.9986167Z Starting standalonesession daemon on host 
> fv-az558.
> 2020-05-19T21:05:55.5997224Z Starting taskexecutor daemon on host fv-az558.
> 2020-05-19T21:05:55.6223837Z Waiting for Dispatcher REST endpoint to come 
> up...
> 2020-05-19T21:05:57.0552482Z Waiting for Dispatcher REST endpoint to come 
> up...
> 2020-05-19T21:05:57.9446865Z Waiting for Dispatcher REST endpoint to come 
> up...
> 2020-05-19T21:05:59.0098434Z Waiting for Dispatcher REST endpoint to come 
> up...
> 2020-05-19T21:06:00.0569710Z Dispatcher REST endpoint is up.
> 2020-05-19T21:06:07.7099937Z Job (a92a74de8446a80403798bb4806b73f3) is 
> running.
> 2020-05-19T21:06:07.7855906Z Waiting for job to process up to 200 records, 
> current progress: 114 records ...
> 2020-05-19T21:06:55.5755111Z 
> 2020-05-19T21:06:55.5756550Z 
> 
> 2020-05-19T21:06:55.5757225Z  The program finished with the following 
> exception:
> 2020-05-19T21:06:55.5757566Z 
> 2020-05-19T21:06:55.5765453Z org.apache.flink.util.FlinkException: Could not 
> stop with a savepoint job "a92a74de8446a80403798bb4806b73f3".
> 2020-05-19T21:06:55.5766873Z  at 
> org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:485)
> 2020-05-19T21:06:55.5767980Z  at 
> org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:854)
> 2020-05-19T21:06:55.5769014Z  at 
> org.apache.flink.client.cli.CliFrontend.stop(CliFrontend.java:477)
> 2020-05-19T21:06:55.5770052Z  at 
> org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:921)
> 2020-05-19T21:06:55.5771107Z  at 
> org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:982)
> 2020-05-19T21:06:55.5772223Z  at 
> org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
> 2020-05-19T21:06:55.5773325Z  at 
> org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:982)
> 2020-05-19T21:06:55.5774871Z Caused by: 
> java.util.concurrent.ExecutionException: 
> java.util.concurrent.CompletionException: 
> java.util.concurrent.CompletionException: 
> org.apache.flink.runtime.checkpoint.CheckpointException: Checkpoint 
> Coordinator is suspending.
> 2020-05-19T21:06:55.5777183Z  at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> 2020-05-19T21:06:55.5778884Z  at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928)
> 2020-05-19T21:06:55.5779920Z  at 
> org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:483)
> 2020-05-19T21:06:55.5781175Z  ... 6 more
> 2020-05-19T21:06:55.5782391Z Caused by: 
> java.util.concurrent.CompletionException: 
> java.util.concurrent.CompletionException: 
> 

[jira] [Commented] (FLINK-17824) "Resuming Savepoint" e2e stalls indefinitely

2020-06-04 Thread Piotr Nowojski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126379#comment-17126379
 ] 

Piotr Nowojski commented on FLINK-17824:


As it was first reported around the time we were merging quite a bit of unaligned checkpoints code, this issue might be related to something we did.

It might also be fixed accidentally by one of the in-progress fixes, like FLINK-18136.

> "Resuming Savepoint" e2e stalls indefinitely 
> -
>
> Key: FLINK-17824
> URL: https://issues.apache.org/jira/browse/FLINK-17824
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing, Tests
>Affects Versions: 1.12.0
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.12.0
>
>
> CI; 
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=1887=logs=91bf6583-3fb2-592f-e4d4-d79d79c3230a=94459a52-42b6-5bfc-5d74-690b5d3c6de8
> {code}
> 2020-05-19T21:05:52.9696236Z 
> ==
> 2020-05-19T21:05:52.9696860Z Running 'Resuming Savepoint (file, async, scale 
> down) end-to-end test'
> 2020-05-19T21:05:52.9697243Z 
> ==
> 2020-05-19T21:05:52.9713094Z TEST_DATA_DIR: 
> /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-52970362751
> 2020-05-19T21:05:53.1194478Z Flink dist directory: 
> /home/vsts/work/1/s/flink-dist/target/flink-1.12-SNAPSHOT-bin/flink-1.12-SNAPSHOT
> 2020-05-19T21:05:53.2180375Z Starting cluster.
> 2020-05-19T21:05:53.9986167Z Starting standalonesession daemon on host 
> fv-az558.
> 2020-05-19T21:05:55.5997224Z Starting taskexecutor daemon on host fv-az558.
> 2020-05-19T21:05:55.6223837Z Waiting for Dispatcher REST endpoint to come 
> up...
> 2020-05-19T21:05:57.0552482Z Waiting for Dispatcher REST endpoint to come 
> up...
> 2020-05-19T21:05:57.9446865Z Waiting for Dispatcher REST endpoint to come 
> up...
> 2020-05-19T21:05:59.0098434Z Waiting for Dispatcher REST endpoint to come 
> up...
> 2020-05-19T21:06:00.0569710Z Dispatcher REST endpoint is up.
> 2020-05-19T21:06:07.7099937Z Job (a92a74de8446a80403798bb4806b73f3) is 
> running.
> 2020-05-19T21:06:07.7855906Z Waiting for job to process up to 200 records, 
> current progress: 114 records ...
> 2020-05-19T21:06:55.5755111Z 
> 2020-05-19T21:06:55.5756550Z 
> 
> 2020-05-19T21:06:55.5757225Z  The program finished with the following 
> exception:
> 2020-05-19T21:06:55.5757566Z 
> 2020-05-19T21:06:55.5765453Z org.apache.flink.util.FlinkException: Could not 
> stop with a savepoint job "a92a74de8446a80403798bb4806b73f3".
> 2020-05-19T21:06:55.5766873Z  at 
> org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:485)
> 2020-05-19T21:06:55.5767980Z  at 
> org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:854)
> 2020-05-19T21:06:55.5769014Z  at 
> org.apache.flink.client.cli.CliFrontend.stop(CliFrontend.java:477)
> 2020-05-19T21:06:55.5770052Z  at 
> org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:921)
> 2020-05-19T21:06:55.5771107Z  at 
> org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:982)
> 2020-05-19T21:06:55.5772223Z  at 
> org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
> 2020-05-19T21:06:55.5773325Z  at 
> org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:982)
> 2020-05-19T21:06:55.5774871Z Caused by: 
> java.util.concurrent.ExecutionException: 
> java.util.concurrent.CompletionException: 
> java.util.concurrent.CompletionException: 
> org.apache.flink.runtime.checkpoint.CheckpointException: Checkpoint 
> Coordinator is suspending.
> 2020-05-19T21:06:55.5777183Z  at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> 2020-05-19T21:06:55.5778884Z  at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928)
> 2020-05-19T21:06:55.5779920Z  at 
> org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:483)
> 2020-05-19T21:06:55.5781175Z  ... 6 more
> 2020-05-19T21:06:55.5782391Z Caused by: 
> java.util.concurrent.CompletionException: 
> java.util.concurrent.CompletionException: 
> org.apache.flink.runtime.checkpoint.CheckpointException: Checkpoint 
> Coordinator is suspending.
> 2020-05-19T21:06:55.5783885Z  at 
> org.apache.flink.runtime.scheduler.SchedulerBase.lambda$stopWithSavepoint$9(SchedulerBase.java:890)
> 2020-05-19T21:06:55.5784992Z  at 
> java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836)
> 

[GitHub] [flink] JingsongLi commented on pull request #12408: [FLINK-15547][hive] Add test for avro table

2020-06-04 Thread GitBox


JingsongLi commented on pull request #12408:
URL: https://github.com/apache/flink/pull/12408#issuecomment-639253780


   @lirui-apache Please rebase and update JIRA.







[jira] [Commented] (FLINK-16497) Improve default flush strategy for JDBC sink to make it work out-of-box

2020-06-04 Thread Jingsong Lee (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126378#comment-17126378
 ] 

Jingsong Lee commented on FLINK-16497:
--

+1 to a default flush size (100) and flush interval (1s).

I think we should find a trade-off between the initial experience and production use.

I think 1s is good enough for an out-of-the-box user experience, and these default values can also work for production.

IIUC, Elasticsearch also has good default flush values for both the out-of-the-box experience and production.
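
As a rough illustration of the trade-off being discussed, a sink that flushes on either a row-count threshold or a time interval (whichever is hit first) can be sketched as below. This is a hypothetical demo class, not Flink's actual JDBC sink, and the thresholds are the values proposed in this thread:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only (not Flink's JDBC sink): buffer rows and flush when
// either the row threshold or the flush interval is reached, whichever comes first.
public class FlushBufferDemo {
    private final List<String> buffer = new ArrayList<>();
    private final int maxRows;         // e.g. the proposed default of 100 rows
    private final long intervalMillis; // e.g. the proposed default of 1s
    private long lastFlushMillis = 0L;
    int flushCount = 0;

    FlushBufferDemo(int maxRows, long intervalMillis) {
        this.maxRows = maxRows;
        this.intervalMillis = intervalMillis;
    }

    // 'now' is passed in explicitly to keep the example deterministic.
    void add(String row, long now) {
        buffer.add(row);
        if (buffer.size() >= maxRows || now - lastFlushMillis >= intervalMillis) {
            flush(now);
        }
    }

    void flush(long now) {
        // A real sink would execute the batched JDBC statements here.
        buffer.clear();
        lastFlushMillis = now;
        flushCount++;
    }

    public static void main(String[] args) {
        FlushBufferDemo sink = new FlushBufferDemo(3, 1000);
        sink.add("r1", 0);    // buffered
        sink.add("r2", 10);   // buffered
        sink.add("r3", 20);   // row threshold reached -> flush
        sink.add("r4", 1500); // interval elapsed -> flush even with one row
        System.out.println("flushes=" + sink.flushCount);
    }
}
```

With a 1s interval, even a slow trickle of rows is flushed within a second, which is the out-of-the-box behavior being argued for.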

> Improve default flush strategy for JDBC sink to make it work out-of-box
> ---
>
> Key: FLINK-16497
> URL: https://issues.apache.org/jira/browse/FLINK-16497
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / JDBC, Table SQL / Ecosystem
>Reporter: Jark Wu
>Priority: Critical
> Fix For: 1.11.0
>
>
> Currently, JDBC sink provides 2 flush options:
> {code}
> 'connector.write.flush.max-rows' = '5000', -- default is 5000
> 'connector.write.flush.interval' = '2s', -- no default value
> {code}
> That means that if the flush interval is not set, the buffered output rows may not be
> flushed to the database for a long time. That is surprising behavior, because no
> results are output by default.
> So I propose a default flush interval of '1s' for the JDBC sink, or a default
> flush size of 1 row.





[GitHub] [flink] pnowojski commented on pull request #12457: [FLINK-18050][task][checkpointing] Fix double buffer recycling

2020-06-04 Thread GitBox


pnowojski commented on pull request #12457:
URL: https://github.com/apache/flink/pull/12457#issuecomment-639252997


   This issue, FLINK-17824, could be caused by some of our changes; we should take a look at it.







[GitHub] [flink] TsReaper commented on a change in pull request #12453: [FLINK-18061] [table] TableResult#collect method should return a closeable iterator to avoid resource leak

2020-06-04 Thread GitBox


TsReaper commented on a change in pull request #12453:
URL: https://github.com/apache/flink/pull/12453#discussion_r435684399



##
File path: 
flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/api/TableResult.java
##
@@ -49,9 +49,33 @@
ResultKind getResultKind();
 
/**
-* Get the result contents as a row iterator.
+* Get the result contents as a closeable row iterator.
+*
+* NOTE:If this result corresponds to a flink job,
+* the job will not be finished unless all result data has been 
collected.
+* So we should actively close the job to avoid resource leak.
+*
+* There are two approaches to close a job:
+* 1. close the job through JobClient, for example:
+* {@code
+*  TableResult result = tEnv.execute("select ...");
+*  CloseableIterator it = result.collect();
+*  it... // collect same data
+*  result.getJobClient().get().cancel();
+* }
+*
+* 2. close the job through CloseableIterator
+* (calling CloseableIterator#close method will trigger 
JobClient#cancel method),
+* for example:

Review comment:
   For an insert job, this method cannot cancel the job.

##
File path: 
flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/api/TableResult.java
##
@@ -49,9 +49,33 @@
ResultKind getResultKind();
 
/**
-* Get the result contents as a row iterator.
+* Get the result contents as a closeable row iterator.
+*
+* NOTE:If this result corresponds to a flink job,
+* the job will not be finished unless all result data has been 
collected.
+* So we should actively close the job to avoid resource leak.
+*
+* There are two approaches to close a job:
+* 1. close the job through JobClient, for example:

Review comment:
   This is actually not recommended. If the user cancels the job but the iterator does not know it, the next call to the iterator might throw an exception. We should only recommend that users close the iterator if they don't need more data.
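
The iterator-closing approach under discussion fits naturally with Java's try-with-resources. The sketch below uses a hypothetical stand-in for Flink's `CloseableIterator` (its `close()` playing the role of `JobClient#cancel`), so the names and behavior are illustrative only:

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for Flink's CloseableIterator: an Iterator that is also AutoCloseable.
interface CloseableIterator<T> extends Iterator<T>, AutoCloseable {}

public class CollectCloseDemo {
    static boolean jobCancelled = false; // records that close() (i.e. cancellation) ran

    // Stands in for TableResult#collect: iterating drains the results,
    // and close() would trigger JobClient#cancel on the backing job.
    static CloseableIterator<String> collect(List<String> rows) {
        Iterator<String> it = rows.iterator();
        return new CloseableIterator<String>() {
            public boolean hasNext() { return it.hasNext(); }
            public String next() { return it.next(); }
            public void close() { jobCancelled = true; }
        };
    }

    public static void main(String[] args) throws Exception {
        // try-with-resources guarantees close() runs, even if iteration stops early.
        try (CloseableIterator<String> it = collect(List.of("row-1", "row-2"))) {
            while (it.hasNext()) {
                System.out.println(it.next());
            }
        }
        System.out.println("jobCancelled=" + jobCancelled);
    }
}
```

Because `close()` is the single cleanup path, the iterator always knows the job's state, avoiding the stale-iterator problem mentioned above.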









[GitHub] [flink] flinkbot commented on pull request #12497: [FLINK-18142][hive] Wrong state names in HiveContinuousMonitoringFunction

2020-06-04 Thread GitBox


flinkbot commented on pull request #12497:
URL: https://github.com/apache/flink/pull/12497#issuecomment-639252070


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit 4d9e9f41e60e46d39ea712f6dbe7b16c879908ad (Fri Jun 05 
04:31:38 UTC 2020)
   
   **Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into to Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   







[GitHub] [flink] TsReaper commented on a change in pull request #12473: [FLINK-18061] [table] TableResult#collect method should return a closeable iterator to avoid resource leak

2020-06-04 Thread GitBox


TsReaper commented on a change in pull request #12473:
URL: https://github.com/apache/flink/pull/12473#discussion_r435684144



##
File path: 
flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/api/TableResult.java
##
@@ -49,9 +49,33 @@
ResultKind getResultKind();
 
/**
-* Get the result contents as a row iterator.
+* Get the result contents as a closeable row iterator.
+*
+* NOTE:If this result corresponds to a flink job,
+* the job will not be finished unless all result data has been 
collected.
+* So we should actively close the job to avoid resource leak.
+*
+* There are two approaches to close a job:
+* 1. close the job through JobClient, for example:
+* {@code

Review comment:
   This is actually not recommended. If the user cancels the job but the iterator does not know it, the next call to the iterator might throw an exception. We should only recommend that users close the iterator if they don't need more data.

##
File path: 
flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/api/TableResult.java
##
@@ -49,9 +49,33 @@
ResultKind getResultKind();
 
/**
-* Get the result contents as a row iterator.
+* Get the result contents as a closeable row iterator.
+*
+* NOTE:If this result corresponds to a flink job,
+* the job will not be finished unless all result data has been 
collected.
+* So we should actively close the job to avoid resource leak.
+*
+* There are two approaches to close a job:
+* 1. close the job through JobClient, for example:
+* {@code
+*  TableResult result = tEnv.execute("select ...");
+*  CloseableIterator it = result.collect();
+*  it... // collect same data
+*  result.getJobClient().get().cancel();
+* }
+*
+* 2. close the job through CloseableIterator
+* (calling CloseableIterator#close method will trigger 
JobClient#cancel method),
+* for example:

Review comment:
   For an insert job, this method cannot cancel the job.









[GitHub] [flink] danny0405 commented on pull request #12456: [FLINK-17113][sql-cli] Refactor view support in SQL Client

2020-06-04 Thread GitBox


danny0405 commented on pull request #12456:
URL: https://github.com/apache/flink/pull/12456#issuecomment-639252001


   > Thanks for the fix @danny0405, I find there are many unrelated changes, such as `Sink`, `TableSourceTable`, `FlinkTypeFactory`.
   
   I found that the legacy planner does not support nullability inference well, so I reverted this part of the change.







[GitHub] [flink] JingsongLi opened a new pull request #12497: [FLINK-18142][hive] Wrong state names in HiveContinuousMonitoringFunction

2020-06-04 Thread GitBox


JingsongLi opened a new pull request #12497:
URL: https://github.com/apache/flink/pull/12497


   
   ## What is the purpose of the change
   
   `currReadTimeState` and `distinctPartsState` should have different state names.
   Otherwise, checkpointing will fail.
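   
   The failure mode can be illustrated with a minimal name-keyed registry. This is not Flink's actual state backend, and the state names below are hypothetical; the point is only that two logical states registered under one name collide, while distinct names coexist:
   
   ```java
   import java.util.HashMap;
   import java.util.Map;
   
   // Minimal illustration (not Flink's state backend): state handles are keyed by
   // their descriptor name, so two logical states need two distinct names.
   public class StateNameDemo {
       static final Map<String, Object> registry = new HashMap<>();
   
       static void register(String name, Object state) {
           if (registry.putIfAbsent(name, state) != null) {
               throw new IllegalStateException("Duplicate state name: " + name);
           }
       }
   
       public static void main(String[] args) {
           register("current-read-time", new Object());   // hypothetical name
           register("distinct-partitions", new Object()); // distinct name: fine
           try {
               // Reusing the first name, as the bug effectively did, fails.
               register("current-read-time", new Object());
           } catch (IllegalStateException e) {
               System.out.println(e.getMessage());
           }
       }
   }
   ```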
   
   ## Verifying this change
   
   `HiveTableSourceITCase.testStreamPartitionRead`
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): no
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
 - The serializers: no
 - The runtime per-record code paths (performance sensitive): no
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: no
 - The S3 file system connector: no
   
   ## Documentation
   
 - Does this pull request introduce a new feature? no







[jira] [Updated] (FLINK-18142) Wrong state names in HiveContinuousMonitoringFunction

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-18142:
---
Labels: pull-request-available  (was: )

> Wrong state names in HiveContinuousMonitoringFunction
> -
>
> Key: FLINK-18142
> URL: https://issues.apache.org/jira/browse/FLINK-18142
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hive
>Reporter: Jingsong Lee
>Assignee: Jingsong Lee
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>
> {{currReadTimeState}} and {{distinctPartsState}} should have different state names.





[GitHub] [flink] flinkbot edited a comment on pull request #12496: [FLINK-18110][fs-connector] StreamingFileSink notifies for buckets detected to be inactive on restoring

2020-06-04 Thread GitBox


flinkbot edited a comment on pull request #12496:
URL: https://github.com/apache/flink/pull/12496#issuecomment-639245821


   
   ## CI report:
   
   * 18db5ae4db2c5284bf0530c60e05f02be3a40d19 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2776)
 
   
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12495: [FLINK-18048][Backport 1.10] Fix --host option for standalone job cluster

2020-06-04 Thread GitBox


flinkbot edited a comment on pull request #12495:
URL: https://github.com/apache/flink/pull/12495#issuecomment-639245767


   
   ## CI report:
   
   * 6589b314827707fe2b20b00c4f14936b4adb4474 Travis: 
[PENDING](https://travis-ci.com/github/flink-ci/flink/builds/169752033) 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12335: [FLINK-17753] [table-planner-blink] Fix watermark defined in ddl does not work in Table api

2020-06-04 Thread GitBox


flinkbot edited a comment on pull request #12335:
URL: https://github.com/apache/flink/pull/12335#issuecomment-633921246


   
   ## CI report:
   
   * 8f86268e43fdfdfd32c7ad4381e243eae33a123f Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2196)
 
   * 1630336016f87ea92bb1186b0522bdcf0dca4fb3 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12485: [FLINK-18130][hive][fs-connector] File name conflict for different jobs in filesystem/hive sink

2020-06-04 Thread GitBox


flinkbot edited a comment on pull request #12485:
URL: https://github.com/apache/flink/pull/12485#issuecomment-638848123


   
   ## CI report:
   
   * 7fe7332d4767bbfb996e86f1b59d99c8c4713192 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2737)
 
   * fa51d430e64af7aa9d18cc5f463218aa91bf0d96 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2775)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12028: [FLINK-17553][table]fix plan error when constant exists in group window key

2020-06-04 Thread GitBox


flinkbot edited a comment on pull request #12028:
URL: https://github.com/apache/flink/pull/12028#issuecomment-625656822


   
   ## CI report:
   
   * 86e04bf10b62c7dd52655fc8a20340c3c0ba4d38 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2172)
 
   * e8a455a8cc508ab9f8b9d28bfe4e4548709dd93a Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2774)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-18142) Wrong state names in HiveContinuousMonitoringFunction

2020-06-04 Thread Jingsong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingsong Lee updated FLINK-18142:
-
Priority: Critical  (was: Major)

> Wrong state names in HiveContinuousMonitoringFunction
> -
>
> Key: FLINK-18142
> URL: https://issues.apache.org/jira/browse/FLINK-18142
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hive
>Reporter: Jingsong Lee
>Assignee: Jingsong Lee
>Priority: Critical
> Fix For: 1.11.0
>
>
> {{currReadTimeState}} and {{distinctPartsState}} should have different state 
> names.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-18142) Wrong state names in HiveContinuousMonitoringFunction

2020-06-04 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-18142:


 Summary: Wrong state names in HiveContinuousMonitoringFunction
 Key: FLINK-18142
 URL: https://issues.apache.org/jira/browse/FLINK-18142
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Hive
Reporter: Jingsong Lee
Assignee: Jingsong Lee
 Fix For: 1.11.0


{{currReadTimeState}} and {{distinctPartsState}} should have different state 
names.
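The bug can be sketched with a plain-Java stand-in (class and method names below are illustrative, not Flink's actual state API): operator state is resolved by descriptor name, so two descriptors that accidentally share a name silently alias the same underlying state.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of why the two states in
// HiveContinuousMonitoringFunction need distinct descriptor names.
public class StateNameCollision {
    // Simplified stand-in for a state backend that registers state by name.
    private final Map<String, Object> registry = new HashMap<>();

    // Returns the existing state for this name, or registers the fresh one.
    @SuppressWarnings("unchecked")
    public <T> T getState(String name, T freshState) {
        return (T) registry.computeIfAbsent(name, k -> freshState);
    }

    public static void main(String[] args) {
        StateNameCollision backend = new StateNameCollision();
        // Both descriptors accidentally use the same name ...
        Object currReadTimeState = backend.getState("monitor-state", new Object());
        Object distinctPartsState = backend.getState("monitor-state", new Object());
        // ... so they alias the same state and overwrite each other's data.
        System.out.println(currReadTimeState == distinctPartsState); // true
    }
}
```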



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-16497) Improve default flush strategy for JDBC sink to make it work out-of-box

2020-06-04 Thread Danny Chen (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126373#comment-17126373
 ] 

Danny Chen commented on FLINK-16497:


I think that, as a popular streaming engine, Flink should treat good throughput 
and performance as first class. Most client tools have a default flush strategy 
(either a buffer size or an interval) [1][2], and we should follow suit.

I would suggest a default flush size of 100 and a flush interval of 1s. This 
performs well in production, and in local tests 1s is also an acceptable 
latency.

[1] https://kafka.apache.org/22/documentation.html#producerconfigs
[2] https://github.com/searchbox-io/Jest
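The proposed defaults can be sketched as a tiny size-or-interval flush trigger. This is an illustrative stand-in, not the actual JDBC sink code; all names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch of a flush strategy: emit a batch when the buffer reaches maxRows
// OR when the flush interval has elapsed (the proposal: 100 rows / 1s).
public class BufferedFlusher<T> {
    private final int maxRows;          // proposed default: 100
    private final long intervalMillis;  // proposed default: 1000 (1s)
    private final Consumer<List<T>> sink;
    private final List<T> buffer = new ArrayList<>();
    private long lastFlush = System.currentTimeMillis();

    public BufferedFlusher(int maxRows, long intervalMillis, Consumer<List<T>> sink) {
        this.maxRows = maxRows;
        this.intervalMillis = intervalMillis;
        this.sink = sink;
    }

    public void add(T record) {
        buffer.add(record);
        maybeFlush(System.currentTimeMillis());
    }

    // Called on every record; a real sink would also call this from a timer
    // so the interval fires even when no new records arrive.
    void maybeFlush(long now) {
        if (buffer.size() >= maxRows || now - lastFlush >= intervalMillis) {
            if (!buffer.isEmpty()) {
                sink.accept(new ArrayList<>(buffer));
                buffer.clear();
            }
            lastFlush = now;
        }
    }

    public static void main(String[] args) {
        List<List<String>> batches = new ArrayList<>();
        // Interval set very high here so only the size threshold triggers.
        BufferedFlusher<String> flusher = new BufferedFlusher<>(100, 60_000, batches::add);
        for (int i = 0; i < 250; i++) {
            flusher.add("row-" + i);
        }
        System.out.println(batches.size()); // 2 full batches; 50 rows still buffered
    }
}
```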

> Improve default flush strategy for JDBC sink to make it work out-of-box
> ---
>
> Key: FLINK-16497
> URL: https://issues.apache.org/jira/browse/FLINK-16497
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / JDBC, Table SQL / Ecosystem
>Reporter: Jark Wu
>Priority: Critical
> Fix For: 1.11.0
>
>
> Currently, JDBC sink provides 2 flush options:
> {code}
> 'connector.write.flush.max-rows' = '5000', -- default is 5000
> 'connector.write.flush.interval' = '2s', -- no default value
> {code}
> That means that if the flush interval is not set, the buffered output rows may 
> not be flushed to the database for a long time. That is surprising behavior, 
> because no results are output by default. 
> So I propose a default flush interval of '1s' for the JDBC sink, or a default 
> flush size of 1 row. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-17949) KafkaShuffleITCase.testSerDeIngestionTime:156->testRecordSerDe:388 expected:<310> but was:<0>

2020-06-04 Thread Yuan Mei (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126353#comment-17126353
 ] 

Yuan Mei edited comment on FLINK-17949 at 6/5/20, 4:09 AM:
---

These comments are quoted from [~becket_qin]

"Theoretically speaking, using KafkaAdminClient to create a topic does not 100% 
guarantee that a producer will not see the "does not host this topic-partition" 
error. This is because when the AdminClient can only guarantee the topic 
metadata information has existed in the broker to which it sent the 
CreateTopicRequest. When a producer comes at a later point, it might send 
TopicMetdataRequest to a different broker and that broker may have not received 
the updated topic metadata yet. But this is much unlikely to happen given the 
broker usually receives the metadata update at the same time. Having retries 
configured on the producer side should be sufficient to handle such cases. We 
can also do that for 0.10 and 0.11 producers. But given that we have the 
producer properties scattered over the places (which is something we probably 
should avoid, to begin with), it would be simpler to just make sure the topic 
has been created successfully before we start the tests."

 

"If we do not want this happen, we should check the metadata cache of each 
broker to make sure it has the topic metadata".

 

I think this is one case that could possibly cause the situation, but I am not 
sure whether it is the same case encountered here (LEADER_NOT_AVAILABLE):
{code:java}
13:34:40,854 [kafka-producer-network-thread | producer-7] WARN 
org.apache.kafka.clients.NetworkClient [] - [Producer clientId=producer-7] 
Error while fetching metadata with correlation id 12 : 
{test_serde_IngestionTime=LEADER_NOT_AVAILABLE}{code}
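The "make sure the topic has been created successfully before we start the tests" idea quoted above amounts to polling a readiness check until a deadline. A minimal sketch follows; the actual metadata lookup (e.g. via a Kafka AdminClient) is stubbed as a `BooleanSupplier`, and all names are illustrative.

```java
import java.util.function.BooleanSupplier;

// Sketch: poll a readiness check (e.g. "does every broker's metadata cache
// contain the topic?") until it succeeds or a timeout elapses.
public class AwaitReady {
    public static boolean await(BooleanSupplier check, long timeoutMillis, long pollMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (check.getAsBoolean()) {
                return true;  // metadata became visible; safe to start the test
            }
            Thread.sleep(pollMillis);
        }
        return false;         // topic never became visible; fail test setup early
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated check that succeeds on the third poll.
        int[] attempts = {0};
        boolean ready = await(() -> ++attempts[0] >= 3, 5_000, 10);
        System.out.println(ready); // true
    }
}
```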
 


was (Author: ym):
This comment is quoted from [~becket_qin]

"Theoretically speaking, using KafkaAdminClient to create a topic does not 100% 
guarantee that a producer will not see the "does not host this topic-partition" 
error. This is because when the AdminClient can only guarantee the topic 
metadata information has existed in the broker to which it sent the 
CreateTopicRequest. When a producer comes at a later point, it might send 
TopicMetdataRequest to a different broker and that broker may have not received 
the updated topic metadata yet. But this is much unlikely to happen given the 
broker usually receives the metadata update at the same time. Having retries 
configured on the producer side should be sufficient to handle such cases. We 
can also do that for 0.10 and 0.11 producers. But given that we have the 
producer properties scattered over the places (which is something we probably 
should avoid, to begin with), it would be simpler to just make sure the topic 
has been created successfully before we start the tests."

 

"If we do not want this happen, we should check the metadata cache of each 
broker to make sure it has the topic metadata".

 

I think this is one case that could possibly cause the situation, but I am not 
sure whether it is the same case encountered here (LEADER_NOT_AVAILABLE):
{code:java}
13:34:40,854 [kafka-producer-network-thread | producer-7] WARN 
org.apache.kafka.clients.NetworkClient [] - [Producer clientId=producer-7] 
Error while fetching metadata with correlation id 12 : 
{test_serde_IngestionTime=LEADER_NOT_AVAILABLE}{code}

  

> KafkaShuffleITCase.testSerDeIngestionTime:156->testRecordSerDe:388 
> expected:<310> but was:<0>
> -
>
> Key: FLINK-17949
> URL: https://issues.apache.org/jira/browse/FLINK-17949
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka, Tests
>Affects Versions: 1.11.0, 1.12.0
>Reporter: Robert Metzger
>Priority: Critical
>  Labels: test-stability
> Attachments: logs-ci-kafkagelly-1590500380.zip, 
> logs-ci-kafkagelly-1590524911.zip
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=2209=logs=c5f0071e-1851-543e-9a45-9ac140befc32=684b1416-4c17-504e-d5ab-97ee44e08a20
> {code}
> 2020-05-26T13:35:19.4022562Z [ERROR] 
> testSerDeIngestionTime(org.apache.flink.streaming.connectors.kafka.shuffle.KafkaShuffleITCase)
>   Time elapsed: 5.786 s  <<< FAILURE!
> 2020-05-26T13:35:19.4023185Z java.lang.AssertionError: expected:<310> but 
> was:<0>
> 2020-05-26T13:35:19.4023498Z  at org.junit.Assert.fail(Assert.java:88)
> 2020-05-26T13:35:19.4023825Z  at 
> org.junit.Assert.failNotEquals(Assert.java:834)
> 2020-05-26T13:35:19.4024461Z  at 
> org.junit.Assert.assertEquals(Assert.java:645)
> 2020-05-26T13:35:19.4024900Z  at 
> org.junit.Assert.assertEquals(Assert.java:631)
> 2020-05-26T13:35:19.4028546Z  at 
> 

[GitHub] [flink] flinkbot commented on pull request #12496: [FLINK-18110][fs-connector] StreamingFileSink notifies for buckets detected to be inactive on restoring

2020-06-04 Thread GitBox


flinkbot commented on pull request #12496:
URL: https://github.com/apache/flink/pull/12496#issuecomment-639245821


   
   ## CI report:
   
   * 18db5ae4db2c5284bf0530c60e05f02be3a40d19 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] zhijiangW commented on a change in pull request #12493: [FLINK-18139][checkpointing] Fixing unaligned checkpoints checks wrong channels for inflight data.

2020-06-04 Thread GitBox


zhijiangW commented on a change in pull request #12493:
URL: https://github.com/apache/flink/pull/12493#discussion_r435678966



##
File path: 
flink-streaming-java/src/test/java/org/apache/flink/streaming/runtime/io/StreamTaskNetworkInputTest.java
##
@@ -110,6 +118,69 @@ public void testNoDataProcessedAfterCheckpointBarrier() 
throws Exception {
assertEquals(0, output.getNumberOfEmittedRecords());
}
 
+   @Test
+   public void testSnapshotWithTwoInputGates() throws Exception {
+   CheckpointBarrierUnaligner unaligner = new 
CheckpointBarrierUnaligner(
+   new int[]{ 1, 1 },
+   ChannelStateWriter.NO_OP,
+   "test",
+   new DummyCheckpointInvokable());
+
+   SingleInputGate inputGate1 = new 
SingleInputGateBuilder().setSingleInputGateIndex(0).build();
+   RemoteInputChannel channel1 = 
InputChannelBuilder.newBuilder().buildRemoteChannel(inputGate1);
+   inputGate1.setInputChannels(channel1);
+   
inputGate1.registerBufferReceivedListener(unaligner.getBufferReceivedListener().get());
+   StreamTaskNetworkInput input1 = createInput(unaligner, 
inputGate1);
+
+   SingleInputGate inputGate2 = new 
SingleInputGateBuilder().setSingleInputGateIndex(1).build();
+   RemoteInputChannel channel2 = 
InputChannelBuilder.newBuilder().buildRemoteChannel(inputGate2);
+   inputGate2.setInputChannels(channel2);
+   
inputGate2.registerBufferReceivedListener(unaligner.getBufferReceivedListener().get());
+   StreamTaskNetworkInput input2 = createInput(unaligner, 
inputGate2);
+
+   CheckpointBarrier barrier = new CheckpointBarrier(0, 0L, 
CheckpointOptions.forCheckpointWithDefaultLocation());
+   channel1.onBuffer(EventSerializer.toBuffer(barrier), 0, 0);
+   channel1.onBuffer(BufferBuilderTestUtils.buildSomeBuffer(1), 1, 
0);
+
+   // all records on inputGate2 are now in-flight
+   channel2.onBuffer(BufferBuilderTestUtils.buildSomeBuffer(2), 0, 
0);
+   channel2.onBuffer(BufferBuilderTestUtils.buildSomeBuffer(3), 1, 
0);
+
+   // now snapshot all inflight buffers
+   RecordingChannelStateWriter channelStateWriter = new 
RecordingChannelStateWriter();
+   channelStateWriter.start(0, 
CheckpointOptions.forCheckpointWithDefaultLocation());
+   CompletableFuture completableFuture1 = 
input1.prepareSnapshot(channelStateWriter, 0);
+   CompletableFuture completableFuture2 = 
input2.prepareSnapshot(channelStateWriter, 0);
+
+   // finish unaligned checkpoint on input side
+   channel2.onBuffer(EventSerializer.toBuffer(barrier), 2, 0);
+
+   // futures should be completed
+   completableFuture1.join();
+   completableFuture2.join();
+
+   
assertEquals(channelStateWriter.getAddedInput().get(channel1.getChannelInfo()), 
Collections.emptyList());
+   assertEquals(Arrays.asList(2, 3),

Review comment:
   nit: maybe format the indentation like this:
   
   ```
   assertEquals(
Arrays.asList(2, 3),

channelStateWriter.getAddedInput().get(channel2.getChannelInfo())
.stream()
.map(Buffer::getSize)
.collect(Collectors.toList()));
   ```





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot commented on pull request #12495: [FLINK-18048][Backport 1.10] Fix --host option for standalone job cluster

2020-06-04 Thread GitBox


flinkbot commented on pull request #12495:
URL: https://github.com/apache/flink/pull/12495#issuecomment-639245767


   
   ## CI report:
   
   * 6589b314827707fe2b20b00c4f14936b4adb4474 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12485: [FLINK-18130][hive][fs-connector] File name conflict for different jobs in filesystem/hive sink

2020-06-04 Thread GitBox


flinkbot edited a comment on pull request #12485:
URL: https://github.com/apache/flink/pull/12485#issuecomment-638848123


   
   ## CI report:
   
   * 7fe7332d4767bbfb996e86f1b59d99c8c4713192 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2737)
 
   * fa51d430e64af7aa9d18cc5f463218aa91bf0d96 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Closed] (FLINK-17033) Upgrade OpenJDK docker image for Kubernetes

2020-06-04 Thread Yang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Wang closed FLINK-17033.
-
Resolution: Fixed

> Upgrade OpenJDK docker image for Kubernetes
> ---
>
> Key: FLINK-17033
> URL: https://issues.apache.org/jira/browse/FLINK-17033
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / Kubernetes
>Reporter: Canbin Zheng
>Priority: Major
> Fix For: 1.11.0, 1.12.0
>
>
> The current docker image {{openjdk:8-jre-alpine}} used by Kubernetes has many 
> problems; here are some of them:
>  # There is no official support for this image since the commit 
> [https://github.com/docker-library/openjdk/commit/3eb0351b208d739fac35345c85e3c6237c2114ec#diff-f95ffa3d134732c33f7b8368e099]
>  # 
> [DNS-lookup-5s-delay|https://k8s.imroc.io/troubleshooting/cases/dns-lookup-5s-delay]
>  # [DNS resolver does not read 
> /etc/hosts|https://github.com/golang/go/issues/22846]
>  # It uses musl libc instead of glibc. This could cause a problem if we want 
> to run a native library like Intel MKL for native BLAS support.
> Therefore, this ticket proposes to investigate an alternative official JDK 
> docker image; I think it's a good choice to use {{openjdk:8-jre-slim}}(184MB) 
> instead, the reasons are as follows:
>  # It has official support from openjdk: 
> [https://github.com/docker-library/docs/blob/master/openjdk/README.md#supported-tags-and-respective-dockerfile-links]
>  # It is based on debian and does not have many problems such as DNS lookup 
> delay.
>  # It's much smaller in size than other official docker images.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-17033) Upgrade OpenJDK docker image for Kubernetes

2020-06-04 Thread Yang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Wang updated FLINK-17033:
--
Fix Version/s: 1.12.0

> Upgrade OpenJDK docker image for Kubernetes
> ---
>
> Key: FLINK-17033
> URL: https://issues.apache.org/jira/browse/FLINK-17033
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / Kubernetes
>Reporter: Canbin Zheng
>Priority: Major
> Fix For: 1.11.0, 1.12.0
>
>
> The current docker image {{openjdk:8-jre-alpine}} used by Kubernetes has many 
> problems; here are some of them:
>  # There is no official support for this image since the commit 
> [https://github.com/docker-library/openjdk/commit/3eb0351b208d739fac35345c85e3c6237c2114ec#diff-f95ffa3d134732c33f7b8368e099]
>  # 
> [DNS-lookup-5s-delay|https://k8s.imroc.io/troubleshooting/cases/dns-lookup-5s-delay]
>  # [DNS resolver does not read 
> /etc/hosts|https://github.com/golang/go/issues/22846]
>  # It uses musl libc instead of glibc. This could cause a problem if we want 
> to run a native library like Intel MKL for native BLAS support.
> Therefore, this ticket proposes to investigate an alternative official JDK 
> docker image; I think it's a good choice to use {{openjdk:8-jre-slim}}(184MB) 
> instead, the reasons are as follows:
>  # It has official support from openjdk: 
> [https://github.com/docker-library/docs/blob/master/openjdk/README.md#supported-tags-and-respective-dockerfile-links]
>  # It is based on debian and does not have many problems such as DNS lookup 
> delay.
>  # It's much smaller in size than other official docker images.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-17033) Upgrade OpenJDK docker image for Kubernetes

2020-06-04 Thread Yang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126362#comment-17126362
 ] 

Yang Wang commented on FLINK-17033:
---

From release-1.11 (done via FLINK-17160), the base image has been replaced with 
{{openjdk:8-jre}}. Refer to [1] for more information.

 

[1]. 
[https://github.com/apache/flink-docker/blob/dev-master/Dockerfile-debian.template]

> Upgrade OpenJDK docker image for Kubernetes
> ---
>
> Key: FLINK-17033
> URL: https://issues.apache.org/jira/browse/FLINK-17033
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / Kubernetes
>Reporter: Canbin Zheng
>Priority: Major
> Fix For: 1.11.0
>
>
> The current docker image {{openjdk:8-jre-alpine}} used by Kubernetes has many 
> problems; here are some of them:
>  # There is no official support for this image since the commit 
> [https://github.com/docker-library/openjdk/commit/3eb0351b208d739fac35345c85e3c6237c2114ec#diff-f95ffa3d134732c33f7b8368e099]
>  # 
> [DNS-lookup-5s-delay|https://k8s.imroc.io/troubleshooting/cases/dns-lookup-5s-delay]
>  # [DNS resolver does not read 
> /etc/hosts|https://github.com/golang/go/issues/22846]
>  # It uses musl libc instead of glibc. This could cause a problem if we want 
> to run a native library like Intel MKL for native BLAS support.
> Therefore, this ticket proposes to investigate an alternative official JDK 
> docker image; I think it's a good choice to use {{openjdk:8-jre-slim}}(184MB) 
> instead, the reasons are as follows:
>  # It has official support from openjdk: 
> [https://github.com/docker-library/docs/blob/master/openjdk/README.md#supported-tags-and-respective-dockerfile-links]
>  # It is based on debian and does not have many problems such as DNS lookup 
> delay.
>  # It's much smaller in size than other official docker images.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #12028: [FLINK-17553][table]fix plan error when constant exists in group window key

2020-06-04 Thread GitBox


flinkbot edited a comment on pull request #12028:
URL: https://github.com/apache/flink/pull/12028#issuecomment-625656822


   
   ## CI report:
   
   * 86e04bf10b62c7dd52655fc8a20340c3c0ba4d38 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2172)
 
   * e8a455a8cc508ab9f8b9d28bfe4e4548709dd93a UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Closed] (FLINK-16508) Name the ports exposed by the main Container in Pod

2020-06-04 Thread Yang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-16508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Wang closed FLINK-16508.
-
Fix Version/s: 1.12.0
   Resolution: Fixed

Master and release-1.11 via 3d119079288f3e9da7a19a32d68446ae5228b626.

> Name the ports exposed by the main Container in Pod
> ---
>
> Key: FLINK-16508
> URL: https://issues.apache.org/jira/browse/FLINK-16508
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / Kubernetes
>Affects Versions: 1.10.0
>Reporter: Canbin Zheng
>Assignee: Canbin Zheng
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.11.0, 1.12.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, we expose some ports via the main Container of the JobManager and 
> the TaskManager, but we forgot to name those ports, so people can be confused 
> because there is no description of the port usage. This ticket proposes to 
> explicitly name the ports in the Container to help people understand their 
> usage.
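For illustration only (this is not the actual patch), naming container ports in a Kubernetes pod spec looks like the fragment below. The port numbers are Flink's documented defaults; the names are free-form labels chosen here as an assumption.

```yaml
# Named ports make `kubectl describe pod` self-explanatory:
# each port carries a label describing its purpose.
ports:
  - name: rest          # Flink REST / web UI (default 8081)
    containerPort: 8081
  - name: rpc           # JobManager RPC (default 6123)
    containerPort: 6123
  - name: blob-server   # blob server (default 6124)
    containerPort: 6124
```

Kubernetes restricts port names to lowercase alphanumerics and dashes, at most 15 characters, which the labels above satisfy.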



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot commented on pull request #12496: [FLINK-18110][fs-connector] StreamingFileSink notifies for buckets detected to be inactive on restoring

2020-06-04 Thread GitBox


flinkbot commented on pull request #12496:
URL: https://github.com/apache/flink/pull/12496#issuecomment-639243709


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit 18db5ae4db2c5284bf0530c60e05f02be3a40d19 (Fri Jun 05 
03:56:10 UTC 2020)
   
   **Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into to Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.

Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Comment Edited] (FLINK-17949) KafkaShuffleITCase.testSerDeIngestionTime:156->testRecordSerDe:388 expected:<310> but was:<0>

2020-06-04 Thread Yuan Mei (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126348#comment-17126348
 ] 

Yuan Mei edited comment on FLINK-17949 at 6/5/20, 3:55 AM:
---

Thanks, [~rmetzger], I got a bit more information this time:

It does sound like a metadata fetching problem.
{code:java}
13:34:40,706 [kafka-producer-network-thread | producer-7] WARN 
org.apache.kafka.clients.producer.internals.Sender [] - [Producer 
clientId=producer-7] Got error produce response with correlation id 9 on 
topic-partition test_serde_IngestionTime-0, retrying (2147483646 attempts 
left). Error: UNKNOWN_TOPIC_OR_PARTITION
13:34:40,706 [kafka-producer-network-thread | producer-7] WARN 
org.apache.kafka.clients.producer.internals.Sender [] - [Producer 
clientId=producer-7] Received unknown topic or partition error in produce 
request on partition test_serde_IngestionTime-0. The topic-partition may not 
exist or the user may not have Describe access to it
13:34:40,854 [kafka-producer-network-thread | producer-7] WARN 
org.apache.kafka.clients.NetworkClient [] - [Producer clientId=producer-7] 
Error while fetching metadata with correlation id 12 : 
{test_serde_IngestionTime=LEADER_NOT_AVAILABLE}{code}
 

I was wondering whether similar cases have happened before while using Kafka, 
and how this situation could be avoided.

cc [~aljoscha], [~AHeise], [~becket_qin]

 


was (Author: ym):
Thanks, [~rmetzger], I got a bit more information this time:

It does sound like a metadata fetching problem.
{code:java}
13:34:40,706 [kafka-producer-network-thread | producer-7] WARN 
org.apache.kafka.clients.producer.internals.Sender [] - [Producer 
clientId=producer-7] Got error produce response with correlation id 9 on 
topic-partition test_serde_IngestionTime-0, retrying (2147483646 attempts 
left). Error: UNKNOWN_TOPIC_OR_PARTITION
13:34:40,706 [kafka-producer-network-thread | producer-7] WARN 
org.apache.kafka.clients.producer.internals.Sender [] - [Producer 
clientId=producer-7] Received unknown topic or partition error in produce 
request on partition test_serde_IngestionTime-0. The topic-partition may not 
exist or the user may not have Describe access to it
13:34:40,854 [kafka-producer-network-thread | producer-7] WARN 
org.apache.kafka.clients.NetworkClient [] - [Producer clientId=producer-7] 
Error while fetching metadata with correlation id 12 : 
{test_serde_IngestionTime=LEADER_NOT_AVAILABLE}{code}
 

I was wondering whether similar cases have happened before while using Kafka, 
and how this situation could be avoided.

cc [~aljoscha]

 

> KafkaShuffleITCase.testSerDeIngestionTime:156->testRecordSerDe:388 
> expected:<310> but was:<0>
> -
>
> Key: FLINK-17949
> URL: https://issues.apache.org/jira/browse/FLINK-17949
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka, Tests
>Affects Versions: 1.11.0, 1.12.0
>Reporter: Robert Metzger
>Priority: Critical
>  Labels: test-stability
> Attachments: logs-ci-kafkagelly-1590500380.zip, 
> logs-ci-kafkagelly-1590524911.zip
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=2209=logs=c5f0071e-1851-543e-9a45-9ac140befc32=684b1416-4c17-504e-d5ab-97ee44e08a20
> {code}
> 2020-05-26T13:35:19.4022562Z [ERROR] 
> testSerDeIngestionTime(org.apache.flink.streaming.connectors.kafka.shuffle.KafkaShuffleITCase)
>   Time elapsed: 5.786 s  <<< FAILURE!
> 2020-05-26T13:35:19.4023185Z java.lang.AssertionError: expected:<310> but 
> was:<0>
> 2020-05-26T13:35:19.4023498Z  at org.junit.Assert.fail(Assert.java:88)
> 2020-05-26T13:35:19.4023825Z  at 
> org.junit.Assert.failNotEquals(Assert.java:834)
> 2020-05-26T13:35:19.4024461Z  at 
> org.junit.Assert.assertEquals(Assert.java:645)
> 2020-05-26T13:35:19.4024900Z  at 
> org.junit.Assert.assertEquals(Assert.java:631)
> 2020-05-26T13:35:19.4028546Z  at 
> org.apache.flink.streaming.connectors.kafka.shuffle.KafkaShuffleITCase.testRecordSerDe(KafkaShuffleITCase.java:388)
> 2020-05-26T13:35:19.4029629Z  at 
> org.apache.flink.streaming.connectors.kafka.shuffle.KafkaShuffleITCase.testSerDeIngestionTime(KafkaShuffleITCase.java:156)
> 2020-05-26T13:35:19.4030253Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2020-05-26T13:35:19.4030673Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2020-05-26T13:35:19.4031332Z  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2020-05-26T13:35:19.4031763Z  at 
> java.lang.reflect.Method.invoke(Method.java:498)
> 2020-05-26T13:35:19.4032155Z  at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> 2020-05-26T13:35:19.4032630Z  at 
> 

[GitHub] [flink] wanglijie95 commented on a change in pull request #12427: [FLINK-16681][jdbc] Fix the bug that jdbc lost connection after a lon…

2020-06-04 Thread GitBox


wanglijie95 commented on a change in pull request #12427:
URL: https://github.com/apache/flink/pull/12427#discussion_r435676536



##
File path: 
flink-connectors/flink-connector-jdbc/src/main/java/org/apache/flink/connector/jdbc/internal/executor/InsertOrUpdateJdbcExecutor.java
##
@@ -83,6 +88,21 @@ public void open(Connection connection) throws SQLException {
updateStatement = connection.prepareStatement(updateSQL);
}
 
+   @Override
+   public void reopen(Connection connection) throws SQLException {
+   try {
+   existStatement.close();
+   insertStatement.close();
+   updateStatement.close();
+   } catch (SQLException e) {
+   LOG.info("PreparedStatement close failed.", e);
+   }
+
+   existStatement = connection.prepareStatement(existSQL);
+   insertStatement = connection.prepareStatement(insertSQL);
+   updateStatement = connection.prepareStatement(updateSQL);

Review comment:
   Hi @wuchong , I have checked the implementation of `open` and `close`. 
`reopen` is not a simple combination of `close` and `open`: the difference is 
that `close` also clears the `batch` map, which is not what we want. 
   
   So we need to introduce a new `reopen` interface. 
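The distinction can be sketched with a toy model (class and method names here are illustrative, not the actual Flink executor): `close` discards the buffered batch, while `reopen` re-prepares the statements and deliberately keeps the batch alive across the reconnect.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the real InsertOrUpdateJdbcExecutor: a String
// stands in for a PreparedStatement so the example runs without a database.
class BufferingExecutor {
    private final Map<String, String> batch = new HashMap<>();
    private String statement; // stands in for a prepared statement

    void open(String connection) {
        statement = connection + ":stmt"; // "prepare" on the connection
    }

    void addToBatch(String key, String row) {
        batch.put(key, row);
    }

    void close() {
        statement = null;
        batch.clear(); // close() discards the pending rows
    }

    void reopen(String connection) {
        // drop the stale statement and re-prepare on the new connection;
        // the batch map is intentionally left untouched
        statement = connection + ":stmt";
    }

    int pendingRows() {
        return batch.size();
    }
}

public class ReopenDemo {
    public static void main(String[] args) {
        BufferingExecutor e = new BufferingExecutor();
        e.open("conn1");
        e.addToBatch("k1", "row1");
        e.reopen("conn2");                   // reconnect keeps the buffered row
        System.out.println(e.pendingRows()); // prints 1
        e.close();
        System.out.println(e.pendingRows()); // prints 0
    }
}
```

This is why `reopen` cannot be expressed as `close()` followed by `open()`: the pending batch must survive the reconnect.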





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-17949) KafkaShuffleITCase.testSerDeIngestionTime:156->testRecordSerDe:388 expected:<310> but was:<0>

2020-06-04 Thread Yuan Mei (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126353#comment-17126353
 ] 

Yuan Mei commented on FLINK-17949:
--

This comment is quoted from [~becket_qin]

"Theoretically speaking, using KafkaAdminClient to create a topic does not 100% 
guarantee that a producer will not see the "does not host this topic-partition" 
error. This is because the AdminClient can only guarantee that the topic 
metadata exists on the broker to which it sent the CreateTopicRequest. When a 
producer comes along at a later point, it might send a TopicMetadataRequest to 
a different broker, and that broker may not have received the updated topic 
metadata yet. But this is very unlikely to happen, given that the brokers 
usually receive the metadata update at about the same time. Having retries 
configured on the producer side should be sufficient to handle such cases. We 
can also do that for the 0.10 and 0.11 producers. But given that we have the 
producer properties scattered all over the place (which is something we 
probably should avoid, to begin with), it would be simpler to just make sure 
the topic has been created successfully before we start the tests."

 

"If we do not want this to happen, we should check the metadata cache of each 
broker to make sure it has the topic metadata."

 

I think this is one case that could cause this situation, but I am not sure 
whether it is the same case encountered here (LEADER_NOT_AVAILABLE):
13:34:40,854 [kafka-producer-network-thread | producer-7] WARN 
org.apache.kafka.clients.NetworkClient [] - [Producer clientId=producer-7] 
Error while fetching metadata with correlation id 12 : 
\{test_serde_IngestionTime=LEADER_NOT_AVAILABLE}
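The "make sure the topic has been created before the test starts" suggestion amounts to polling until the metadata is visible. A generic polling helper of that kind is sketched below; the condition here is simulated, whereas a real test would query each broker (e.g. via AdminClient describeTopics):

```java
import java.util.function.BooleanSupplier;

// Hedged sketch: wait for a condition with a timeout, polling periodically.
// In a Kafka test the BooleanSupplier would check that every broker's
// metadata cache already contains the freshly created topic.
public class WaitUntil {
    static boolean waitUntil(BooleanSupplier condition, long timeoutMs, long pollMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) {
                return true;
            }
            Thread.sleep(pollMs); // back off before re-checking
        }
        return condition.getAsBoolean(); // one last check at the deadline
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulate metadata that becomes visible after ~50 ms.
        long visibleAt = System.currentTimeMillis() + 50;
        boolean ok = waitUntil(() -> System.currentTimeMillis() >= visibleAt, 1000, 10);
        System.out.println(ok); // prints true
    }
}
```

Starting producers only after such a wait succeeds would avoid the UNKNOWN_TOPIC_OR_PARTITION / LEADER_NOT_AVAILABLE warnings seen above, at the cost of a small startup delay.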
 

> KafkaShuffleITCase.testSerDeIngestionTime:156->testRecordSerDe:388 
> expected:<310> but was:<0>
> -
>
> Key: FLINK-17949
> URL: https://issues.apache.org/jira/browse/FLINK-17949
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka, Tests
>Affects Versions: 1.11.0, 1.12.0
>Reporter: Robert Metzger
>Priority: Critical
>  Labels: test-stability
> Attachments: logs-ci-kafkagelly-1590500380.zip, 
> logs-ci-kafkagelly-1590524911.zip
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=2209=logs=c5f0071e-1851-543e-9a45-9ac140befc32=684b1416-4c17-504e-d5ab-97ee44e08a20
> {code}
> 2020-05-26T13:35:19.4022562Z [ERROR] 
> testSerDeIngestionTime(org.apache.flink.streaming.connectors.kafka.shuffle.KafkaShuffleITCase)
>   Time elapsed: 5.786 s  <<< FAILURE!
> 2020-05-26T13:35:19.4023185Z java.lang.AssertionError: expected:<310> but 
> was:<0>
> 2020-05-26T13:35:19.4023498Z  at org.junit.Assert.fail(Assert.java:88)
> 2020-05-26T13:35:19.4023825Z  at 
> org.junit.Assert.failNotEquals(Assert.java:834)
> 2020-05-26T13:35:19.4024461Z  at 
> org.junit.Assert.assertEquals(Assert.java:645)
> 2020-05-26T13:35:19.4024900Z  at 
> org.junit.Assert.assertEquals(Assert.java:631)
> 2020-05-26T13:35:19.4028546Z  at 
> org.apache.flink.streaming.connectors.kafka.shuffle.KafkaShuffleITCase.testRecordSerDe(KafkaShuffleITCase.java:388)
> 2020-05-26T13:35:19.4029629Z  at 
> org.apache.flink.streaming.connectors.kafka.shuffle.KafkaShuffleITCase.testSerDeIngestionTime(KafkaShuffleITCase.java:156)
> 2020-05-26T13:35:19.4030253Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2020-05-26T13:35:19.4030673Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2020-05-26T13:35:19.4031332Z  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2020-05-26T13:35:19.4031763Z  at 
> java.lang.reflect.Method.invoke(Method.java:498)
> 2020-05-26T13:35:19.4032155Z  at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> 2020-05-26T13:35:19.4032630Z  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 2020-05-26T13:35:19.4033188Z  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> 2020-05-26T13:35:19.4033638Z  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 2020-05-26T13:35:19.4034103Z  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> 2020-05-26T13:35:19.4034593Z  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
> 2020-05-26T13:35:19.4035118Z  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
> 2020-05-26T13:35:19.4035570Z  at 
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 2020-05-26T13:35:19.4035888Z  at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Comment Edited] (FLINK-17949) KafkaShuffleITCase.testSerDeIngestionTime:156->testRecordSerDe:388 expected:<310> but was:<0>

2020-06-04 Thread Yuan Mei (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126353#comment-17126353
 ] 

Yuan Mei edited comment on FLINK-17949 at 6/5/20, 3:54 AM:
---

This comment is quoted from [~becket_qin]

"Theoretically speaking, using KafkaAdminClient to create a topic does not 100% 
guarantee that a producer will not see the "does not host this topic-partition" 
error. This is because the AdminClient can only guarantee that the topic 
metadata exists on the broker to which it sent the CreateTopicRequest. When a 
producer comes along at a later point, it might send a TopicMetadataRequest to 
a different broker, and that broker may not have received the updated topic 
metadata yet. But this is very unlikely to happen, given that the brokers 
usually receive the metadata update at about the same time. Having retries 
configured on the producer side should be sufficient to handle such cases. We 
can also do that for the 0.10 and 0.11 producers. But given that we have the 
producer properties scattered all over the place (which is something we 
probably should avoid, to begin with), it would be simpler to just make sure 
the topic has been created successfully before we start the tests."

 

"If we do not want this to happen, we should check the metadata cache of each 
broker to make sure it has the topic metadata."

 

I think this is one case that could cause this situation, but I am not sure 
whether it is the same case encountered here (LEADER_NOT_AVAILABLE):
{code:java}
13:34:40,854 [kafka-producer-network-thread | producer-7] WARN 
org.apache.kafka.clients.NetworkClient [] - [Producer clientId=producer-7] 
Error while fetching metadata with correlation id 12 : 
{test_serde_IngestionTime=LEADER_NOT_AVAILABLE}{code}

  


was (Author: ym):
This comment is quoted from [~becket_qin]

"Theoretically speaking, using KafkaAdminClient to create a topic does not 100% 
guarantee that a producer will not see the "does not host this topic-partition" 
error. This is because when the AdminClient can only guarantee the topic 
metadata information has existed in the broker to which it sent the 
CreateTopicRequest. When a producer comes at a later point, it might send 
TopicMetdataRequest to a different broker and that broker may have not received 
the updated topic metadata yet. But this is much unlikely to happen given the 
broker usually receives the metadata update at the same time. Having retries 
configured on the producer side should be sufficient to handle such cases. We 
can also do that for 0.10 and 0.11 producers. But given that we have the 
producer properties scattered over the places (which is something we probably 
should avoid, to begin with), it would be simpler to just make sure the topic 
has been created successfully before we start the tests."

 

"If we do not want this happen, we should check the metadata cache of each 
broker to make sure it has the topic metadata".

 

I think this is one case possibly to cause the situation. But not sure whether 
it is the same case encountered here (LEADER_NOT_AVAIBLE)
13:34:40,854 [kafka-producer-network-thread | producer-7] WARN 
org.apache.kafka.clients.NetworkClient [] - [Producer clientId=producer-7] 
Error while fetching metadata with correlation id 12 : 
\{test_serde_IngestionTime=LEADER_NOT_AVAILABLE}
 

> KafkaShuffleITCase.testSerDeIngestionTime:156->testRecordSerDe:388 
> expected:<310> but was:<0>
> -
>
> Key: FLINK-17949
> URL: https://issues.apache.org/jira/browse/FLINK-17949
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka, Tests
>Affects Versions: 1.11.0, 1.12.0
>Reporter: Robert Metzger
>Priority: Critical
>  Labels: test-stability
> Attachments: logs-ci-kafkagelly-1590500380.zip, 
> logs-ci-kafkagelly-1590524911.zip
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=2209=logs=c5f0071e-1851-543e-9a45-9ac140befc32=684b1416-4c17-504e-d5ab-97ee44e08a20
> {code}
> 2020-05-26T13:35:19.4022562Z [ERROR] 
> testSerDeIngestionTime(org.apache.flink.streaming.connectors.kafka.shuffle.KafkaShuffleITCase)
>   Time elapsed: 5.786 s  <<< FAILURE!
> 2020-05-26T13:35:19.4023185Z java.lang.AssertionError: expected:<310> but 
> was:<0>
> 2020-05-26T13:35:19.4023498Z  at org.junit.Assert.fail(Assert.java:88)
> 2020-05-26T13:35:19.4023825Z  at 
> org.junit.Assert.failNotEquals(Assert.java:834)
> 2020-05-26T13:35:19.4024461Z  at 
> org.junit.Assert.assertEquals(Assert.java:645)
> 2020-05-26T13:35:19.4024900Z  at 
> org.junit.Assert.assertEquals(Assert.java:631)
> 2020-05-26T13:35:19.4028546Z  at 
> org.apache.flink.streaming.connectors.kafka.shuffle.KafkaShuffleITCase.testRecordSerDe(KafkaShuffleITCase.java:388)
> 

[jira] [Updated] (FLINK-18110) Bucket Listener in StreamingFileSink should notify for buckets detected to be inactive at recovery

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-18110:
---
Labels: pull-request-available  (was: )

> Bucket Listener in StreamingFileSink should notify for buckets detected to be 
> inactive at recovery
> --
>
> Key: FLINK-18110
> URL: https://issues.apache.org/jira/browse/FLINK-18110
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / FileSystem, Connectors / Hive
>Affects Versions: 1.11.0
>Reporter: Yun Gao
>Assignee: Yun Gao
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>
> The current streaming file sink uses a bucket lifecycle listener to support 
> the global commit of the Hive sink, and the listener sends notifications when 
> buckets are created and become inactive. However, after a failover, a bucket 
> may be detected to be inactive and discarded right away; in this case the 
> listener should also send the inactive notification.





[GitHub] [flink] gaoyunhaii opened a new pull request #12496: [FLINK-18110][fs-connector] StreamingFileSink notifies for buckets detected to be inactive on restoring

2020-06-04 Thread GitBox


gaoyunhaii opened a new pull request #12496:
URL: https://github.com/apache/flink/pull/12496


   ## What is the purpose of the change
   
   This PR fixes the issue that the `BucketLifeCycleListener` in 
`StreamingFileSink` misses the notification for buckets detected to be 
inactive on restoring. On restoring, buckets that contain only pending files 
have all of their pending files committed and are no longer tracked. At this 
point, `StreamingFileSink` should notify that these buckets have become 
inactive.
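A minimal sketch of the restore-time behavior described above (the names below are illustrative, not the actual `StreamingFileSink` API): buckets that hold only pending files are committed and reported inactive, which is the previously missed notification.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

// Hedged toy model: restore() commits each bucket's pending files and, for
// buckets that are not active any more, fires the inactive callback so a
// downstream committer (e.g. the Hive global commit) learns about them.
public class RestoreDemo {
    static List<String> restore(Map<String, Integer> pendingPerBucket,
                                Set<String> activeBuckets,
                                Consumer<String> onInactive) {
        List<String> committed = new ArrayList<>();
        for (Map.Entry<String, Integer> e : pendingPerBucket.entrySet()) {
            committed.add(e.getKey() + ":" + e.getValue()); // commit pending files
            if (!activeBuckets.contains(e.getKey())) {
                onInactive.accept(e.getKey()); // the previously missed notification
            }
        }
        return committed;
    }

    public static void main(String[] args) {
        List<String> inactive = new ArrayList<>();
        Map<String, Integer> pending = new LinkedHashMap<>();
        pending.put("bucket-a", 2); // only pending files -> inactive after restore
        restore(pending, Collections.emptySet(), inactive::add);
        System.out.println(inactive); // prints [bucket-a]
    }
}
```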
   
   ## Brief change log
 - 18db5ae4db2c5284bf0530c60e05f02be3a40d19 adds the missing notification.
   
   ## Verifying this change
   
   This change added tests and can be verified as follows:
 - Added a test that validates that the notification is sent on restore when 
buckets become inactive.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): **no**
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: **no**
 - The serializers: **no**
 - The runtime per-record code paths (performance sensitive): **no**
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: **no**
 - The S3 file system connector: **no**
   
   ## Documentation
   
 - Does this pull request introduce a new feature? **no**
 - If yes, how is the feature documented? **not applicable**







[jira] [Updated] (FLINK-17102) Add -Dkubernetes.container.image= for the start-flink-session section

2020-06-04 Thread Yang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Wang updated FLINK-17102:
--
Affects Version/s: 1.11.0
   1.10.1

> Add -Dkubernetes.container.image= for the start-flink-session 
> section
> 
>
> Key: FLINK-17102
> URL: https://issues.apache.org/jira/browse/FLINK-17102
> Project: Flink
>  Issue Type: Sub-task
>  Components: Deployment / Kubernetes
>Affects Versions: 1.10.1, 1.11.0
>Reporter: Canbin Zheng
>Priority: Minor
> Fix For: 1.11.0, 1.12.0
>
>
> Add {{-Dkubernetes.container.image=}} as a guide for new users in 
> the existing command:
> {quote}{{}}
>  
> {{./bin/kubernetes-session.sh \}}
> {{-Dkubernetes.cluster-id= 
> \-Dtaskmanager.memory.process.size=4096m \-Dkubernetes.taskmanager.cpu=2 
> \-Dtaskmanager.numberOfTaskSlots=4 
> \-Dresourcemanager.taskmanager-timeout=360}}{{}}
> {quote}
> Details could refer to 
> [https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/deployment/native_kubernetes.html#start-flink-session]





[jira] [Closed] (FLINK-17102) Add -Dkubernetes.container.image= for the start-flink-session section

2020-06-04 Thread Yang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Wang closed FLINK-17102.
-
Resolution: Fixed

> Add -Dkubernetes.container.image= for the start-flink-session 
> section
> 
>
> Key: FLINK-17102
> URL: https://issues.apache.org/jira/browse/FLINK-17102
> Project: Flink
>  Issue Type: Sub-task
>  Components: Deployment / Kubernetes
>Affects Versions: 1.10.1, 1.11.0
>Reporter: Canbin Zheng
>Priority: Minor
> Fix For: 1.11.0, 1.12.0
>
>
> Add {{-Dkubernetes.container.image=}} as a guide for new users in 
> the existing command:
> {quote}{{}}
>  
> {{./bin/kubernetes-session.sh \}}
> {{-Dkubernetes.cluster-id= 
> \-Dtaskmanager.memory.process.size=4096m \-Dkubernetes.taskmanager.cpu=2 
> \-Dtaskmanager.numberOfTaskSlots=4 
> \-Dresourcemanager.taskmanager-timeout=360}}{{}}
> {quote}
> Details could refer to 
> [https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/deployment/native_kubernetes.html#start-flink-session]





[jira] [Commented] (FLINK-17102) Add -Dkubernetes.container.image= for the start-flink-session section

2020-06-04 Thread Yang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126351#comment-17126351
 ] 

Yang Wang commented on FLINK-17102:
---

The image option has been added in the custom-flink-docker-image section [1].

 

[1]. 
[https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/native_kubernetes.html#custom-flink-docker-image]

> Add -Dkubernetes.container.image= for the start-flink-session 
> section
> 
>
> Key: FLINK-17102
> URL: https://issues.apache.org/jira/browse/FLINK-17102
> Project: Flink
>  Issue Type: Sub-task
>  Components: Deployment / Kubernetes
>Affects Versions: 1.10.1, 1.11.0
>Reporter: Canbin Zheng
>Priority: Minor
> Fix For: 1.11.0, 1.12.0
>
>
> Add {{-Dkubernetes.container.image=}} as a guide for new users in 
> the existing command:
> {quote}{{}}
>  
> {{./bin/kubernetes-session.sh \}}
> {{-Dkubernetes.cluster-id= 
> \-Dtaskmanager.memory.process.size=4096m \-Dkubernetes.taskmanager.cpu=2 
> \-Dtaskmanager.numberOfTaskSlots=4 
> \-Dresourcemanager.taskmanager-timeout=360}}{{}}
> {quote}
> Details could refer to 
> [https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/deployment/native_kubernetes.html#start-flink-session]





[jira] [Updated] (FLINK-17102) Add -Dkubernetes.container.image= for the start-flink-session section

2020-06-04 Thread Yang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Wang updated FLINK-17102:
--
Fix Version/s: 1.12.0
   1.11.0

> Add -Dkubernetes.container.image= for the start-flink-session 
> section
> 
>
> Key: FLINK-17102
> URL: https://issues.apache.org/jira/browse/FLINK-17102
> Project: Flink
>  Issue Type: Sub-task
>  Components: Deployment / Kubernetes
>Reporter: Canbin Zheng
>Priority: Minor
> Fix For: 1.11.0, 1.12.0
>
>
> Add {{-Dkubernetes.container.image=}} as a guide for new users in 
> the existing command:
> {quote}{{}}
>  
> {{./bin/kubernetes-session.sh \}}
> {{-Dkubernetes.cluster-id= 
> \-Dtaskmanager.memory.process.size=4096m \-Dkubernetes.taskmanager.cpu=2 
> \-Dtaskmanager.numberOfTaskSlots=4 
> \-Dresourcemanager.taskmanager-timeout=360}}{{}}
> {quote}
> Details could refer to 
> [https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/deployment/native_kubernetes.html#start-flink-session]





[GitHub] [flink] zhuzhurk commented on a change in pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

2020-06-04 Thread GitBox


zhuzhurk commented on a change in pull request #12375:
URL: https://github.com/apache/flink/pull/12375#discussion_r435674952



##
File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/BulkSlotProviderImpl.java
##
@@ -0,0 +1,271 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.clusterframework.types.AllocationID;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.clusterframework.types.SlotProfile;
+import org.apache.flink.runtime.concurrent.ComponentMainThreadExecutor;
+import org.apache.flink.runtime.concurrent.FutureUtils;
+import org.apache.flink.runtime.jobmaster.SlotInfo;
+import org.apache.flink.runtime.jobmaster.SlotRequestId;
+import org.apache.flink.util.clock.Clock;
+import org.apache.flink.util.clock.SystemClock;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Optional;
+import java.util.Set;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.flink.util.Preconditions.checkNotNull;
+
+/**
+ * Default implementation of {@link BulkSlotProvider}.
+ */
+class BulkSlotProviderImpl implements BulkSlotProvider {
+
+   private static final Logger LOG = 
LoggerFactory.getLogger(BulkSlotProviderImpl.class);
+
+   private ComponentMainThreadExecutor componentMainThreadExecutor;
+
+   private final SlotSelectionStrategy slotSelectionStrategy;
+
+   private final SlotPool slotPool;
+
+   private final Clock clock;
+
+   private final PhysicalSlotRequestBulkTracker slotRequestBulkTracker;
+
+   BulkSlotProviderImpl(final SlotSelectionStrategy slotSelectionStrategy, 
final SlotPool slotPool) {
+   this(slotSelectionStrategy, slotPool, 
SystemClock.getInstance());
+   }
+
+   @VisibleForTesting
+   BulkSlotProviderImpl(
+   final SlotSelectionStrategy slotSelectionStrategy,
+   final SlotPool slotPool,
+   final Clock clock) {
+
+   this.slotSelectionStrategy = 
checkNotNull(slotSelectionStrategy);
+   this.slotPool = checkNotNull(slotPool);
+   this.clock = checkNotNull(clock);
+
+   this.slotRequestBulkTracker = new 
PhysicalSlotRequestBulkTracker(clock);
+
+   this.componentMainThreadExecutor = new 
ComponentMainThreadExecutor.DummyComponentMainThreadExecutor(
+   "Scheduler is not initialized with proper main thread 
executor. " +
+   "Call to BulkSlotProvider.start(...) 
required.");
+   }
+
+   @Override
+   public void start(final ComponentMainThreadExecutor mainThreadExecutor) 
{
+   this.componentMainThreadExecutor = mainThreadExecutor;
+   }
+
+   @Override
+   public CompletableFuture> 
allocatePhysicalSlots(
+   final Collection 
physicalSlotRequests,
+   final Time timeout) {
+
+   componentMainThreadExecutor.assertRunningInMainThread();
+
+   LOG.debug("Received {} slot requests.", 
physicalSlotRequests.size());
+
+   final PhysicalSlotRequestBulk slotRequestBulk = new 
PhysicalSlotRequestBulk(physicalSlotRequests);
+
+   final List> 
resultFutures = new ArrayList<>(physicalSlotRequests.size());
+   for (PhysicalSlotRequest request : physicalSlotRequests) {
+   final CompletableFuture 
resultFuture =
+   allocatePhysicalSlot(request, 
timeout).thenApply(result -> {
+   slotRequestBulk.markRequestFulfilled(
+   result.getSlotRequestId(),
+   

[jira] [Updated] (FLINK-16496) Improve default flush strategy for HBase sink to make it work out-of-box

2020-06-04 Thread Jark Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-16496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jark Wu updated FLINK-16496:

Priority: Critical  (was: Major)

> Improve default flush strategy for HBase sink to make it work out-of-box  
> -
>
> Key: FLINK-16496
> URL: https://issues.apache.org/jira/browse/FLINK-16496
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / HBase, Table SQL / Ecosystem
>Reporter: Jark Wu
>Priority: Critical
> Fix For: 1.11.0
>
>
> Currently, HBase sink provides 3 flush options:
> {code}
> 'connector.write.buffer-flush.max-size' = '2mb' -- default 2mb
> 'connector.write.buffer-flush.max-rows' = '1000' -- no default value
> 'connector.write.buffer-flush.interval' = '2s' -- no default value
> {code}
> That means that if the flush interval is not set, the buffered output rows 
> may not be flushed to the database for a long time. That is surprising 
> behavior because no results are output by default. 
> So I propose a default flush interval of '1s' for the HBase sink, or a 
> default flush size of 1 row. 
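The proposed policy can be sketched as a buffer that flushes when either threshold trips (a toy model under the stated proposal, not the actual HBase sink code): with a default interval, a low-traffic stream still produces output within about a second.

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch of a buffered sink with two flush triggers: a max-rows
// threshold and an interval. Time is passed in explicitly so the example
// is deterministic and runnable without a real clock or database.
class BufferedSink {
    private final List<String> buffer = new ArrayList<>();
    private final int maxRows;
    private final long intervalMs;
    private long lastFlush;
    int flushedRows = 0;

    BufferedSink(int maxRows, long intervalMs, long now) {
        this.maxRows = maxRows;
        this.intervalMs = intervalMs;
        this.lastFlush = now;
    }

    void write(String row, long now) {
        buffer.add(row);
        // flush when either the row-count or the time threshold is reached
        if (buffer.size() >= maxRows || now - lastFlush >= intervalMs) {
            flush(now);
        }
    }

    void flush(long now) {
        flushedRows += buffer.size(); // stand-in for the database write
        buffer.clear();
        lastFlush = now;
    }
}

public class FlushDemo {
    public static void main(String[] args) {
        BufferedSink sink = new BufferedSink(1000, 1000, 0);
        sink.write("a", 10);                  // below both thresholds: buffered
        System.out.println(sink.flushedRows); // prints 0
        sink.write("b", 1500);                // interval elapsed: flushes both rows
        System.out.println(sink.flushedRows); // prints 2
    }
}
```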





[jira] [Comment Edited] (FLINK-17949) KafkaShuffleITCase.testSerDeIngestionTime:156->testRecordSerDe:388 expected:<310> but was:<0>

2020-06-04 Thread Yuan Mei (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126348#comment-17126348
 ] 

Yuan Mei edited comment on FLINK-17949 at 6/5/20, 3:46 AM:
---

Thanks, [~rmetzger] , I got a bit more msg this time:

It does sound like a metadata fetching problem.
{code:java}
13:34:40,706 [kafka-producer-network-thread | producer-7] WARN 
org.apache.kafka.clients.producer.internals.Sender [] - [Producer 
clientId=producer-7] Got error produce response with correlation id 9 on 
topic-partition test_serde_IngestionTime-0, retrying (2147483646 attempts 
left). Error: UNKNOWN_TOPIC_OR_PARTITION
13:34:40,706 [kafka-producer-network-thread | producer-7] WARN 
org.apache.kafka.clients.producer.internals.Sender [] - [Producer 
clientId=producer-7] Received unknown topic or partition error in produce 
request on partition test_serde_IngestionTime-0. The topic-partition may not 
exist or the user may not have Describe access to it
13:34:40,854 [kafka-producer-network-thread | producer-7] WARN 
org.apache.kafka.clients.NetworkClient [] - [Producer clientId=producer-7] 
Error while fetching metadata with correlation id 12 : 
{test_serde_IngestionTime=LEADER_NOT_AVAILABLE}{code}
 

I was wondering whether similar cases have happened before when using Kafka, and 
how this situation could be avoided?

[~aljoscha]

 


was (Author: ym):
Thanks, Robert, I got a bit more msg this time:

 
{code:java}
13:34:40,706 [kafka-producer-network-thread | producer-7] WARN 
org.apache.kafka.clients.producer.internals.Sender [] - [Producer 
clientId=producer-7] Got error produce response with correlation id 9 on 
topic-partition test_serde_IngestionTime-0, retrying (2147483646 attempts 
left). Error: UNKNOWN_TOPIC_OR_PARTITION
13:34:40,706 [kafka-producer-network-thread | producer-7] WARN 
org.apache.kafka.clients.producer.internals.Sender [] - [Producer 
clientId=producer-7] Received unknown topic or partition error in produce 
request on partition test_serde_IngestionTime-0. The topic-partition may not 
exist or the user may not have Describe access to it
13:34:40,854 [kafka-producer-network-thread | producer-7] WARN 
org.apache.kafka.clients.NetworkClient [] - [Producer clientId=producer-7] 
Error while fetching metadata with correlation id 12 : 
{test_serde_IngestionTime=LEADER_NOT_AVAILABLE}{code}
 

 

> KafkaShuffleITCase.testSerDeIngestionTime:156->testRecordSerDe:388 
> expected:<310> but was:<0>
> -
>
> Key: FLINK-17949
> URL: https://issues.apache.org/jira/browse/FLINK-17949
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka, Tests
>Affects Versions: 1.11.0, 1.12.0
>Reporter: Robert Metzger
>Priority: Critical
>  Labels: test-stability
> Attachments: logs-ci-kafkagelly-1590500380.zip, 
> logs-ci-kafkagelly-1590524911.zip
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=2209=logs=c5f0071e-1851-543e-9a45-9ac140befc32=684b1416-4c17-504e-d5ab-97ee44e08a20
> {code}
> 2020-05-26T13:35:19.4022562Z [ERROR] 
> testSerDeIngestionTime(org.apache.flink.streaming.connectors.kafka.shuffle.KafkaShuffleITCase)
>   Time elapsed: 5.786 s  <<< FAILURE!
> 2020-05-26T13:35:19.4023185Z java.lang.AssertionError: expected:<310> but 
> was:<0>
> 2020-05-26T13:35:19.4023498Z  at org.junit.Assert.fail(Assert.java:88)
> 2020-05-26T13:35:19.4023825Z  at 
> org.junit.Assert.failNotEquals(Assert.java:834)
> 2020-05-26T13:35:19.4024461Z  at 
> org.junit.Assert.assertEquals(Assert.java:645)
> 2020-05-26T13:35:19.4024900Z  at 
> org.junit.Assert.assertEquals(Assert.java:631)
> 2020-05-26T13:35:19.4028546Z  at 
> org.apache.flink.streaming.connectors.kafka.shuffle.KafkaShuffleITCase.testRecordSerDe(KafkaShuffleITCase.java:388)
> 2020-05-26T13:35:19.4029629Z  at 
> org.apache.flink.streaming.connectors.kafka.shuffle.KafkaShuffleITCase.testSerDeIngestionTime(KafkaShuffleITCase.java:156)
> 2020-05-26T13:35:19.4030253Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2020-05-26T13:35:19.4030673Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2020-05-26T13:35:19.4031332Z  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2020-05-26T13:35:19.4031763Z  at 
> java.lang.reflect.Method.invoke(Method.java:498)
> 2020-05-26T13:35:19.4032155Z  at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> 2020-05-26T13:35:19.4032630Z  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 2020-05-26T13:35:19.4033188Z  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> 2020-05-26T13:35:19.4033638Z  at 
> 

[jira] [Comment Edited] (FLINK-17949) KafkaShuffleITCase.testSerDeIngestionTime:156->testRecordSerDe:388 expected:<310> but was:<0>

2020-06-04 Thread Yuan Mei (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126348#comment-17126348
 ] 

Yuan Mei edited comment on FLINK-17949 at 6/5/20, 3:46 AM:
---

Thanks, [~rmetzger] , I got a bit more msg this time:

It does sound like a metadata fetching problem.
{code:java}
13:34:40,706 [kafka-producer-network-thread | producer-7] WARN 
org.apache.kafka.clients.producer.internals.Sender [] - [Producer 
clientId=producer-7] Got error produce response with correlation id 9 on 
topic-partition test_serde_IngestionTime-0, retrying (2147483646 attempts 
left). Error: UNKNOWN_TOPIC_OR_PARTITION
13:34:40,706 [kafka-producer-network-thread | producer-7] WARN 
org.apache.kafka.clients.producer.internals.Sender [] - [Producer 
clientId=producer-7] Received unknown topic or partition error in produce 
request on partition test_serde_IngestionTime-0. The topic-partition may not 
exist or the user may not have Describe access to it
13:34:40,854 [kafka-producer-network-thread | producer-7] WARN 
org.apache.kafka.clients.NetworkClient [] - [Producer clientId=producer-7] 
Error while fetching metadata with correlation id 12 : 
{test_serde_IngestionTime=LEADER_NOT_AVAILABLE}{code}
 

I was wondering whether similar cases have happened before when using Kafka, and 
how this situation could be avoided?
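One common mitigation (a sketch, not the test's actual code) is to poll for topic readiness with a bounded retry before producing, so that a freshly created topic's transient UNKNOWN_TOPIC_OR_PARTITION / LEADER_NOT_AVAILABLE responses don't surface as failures. The helper below is self-contained; in a real test the check would call Kafka's AdminClient#describeTopics and verify that every partition has an elected leader (that wiring is an assumption, not code from KafkaShuffleITCase):

```java
import java.util.concurrent.Callable;

/**
 * Sketch: bounded retry with backoff for "is the topic ready?" checks.
 * In a real test, the Callable would ask Kafka's AdminClient whether
 * every partition of the topic has an elected leader.
 */
public final class WaitUntilReady {

    static boolean await(Callable<Boolean> check, int maxAttempts, long backoffMs)
            throws Exception {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (check.call()) {
                return true;
            }
            Thread.sleep(backoffMs);
        }
        return false; // caller decides whether to fail the test
    }

    public static void main(String[] args) throws Exception {
        // Simulated check that succeeds on the third poll.
        int[] polls = {0};
        boolean ready = await(() -> ++polls[0] >= 3, 10, 1L);
        System.out.println(ready ? "ready after " + polls[0] + " polls" : "timed out");
    }
}
```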

cc [~aljoscha]

 


was (Author: ym):
Thanks, [~rmetzger] , I got a bit more msg this time:

It does sound like a metadata fetching problem.
{code:java}
13:34:40,706 [kafka-producer-network-thread | producer-7] WARN 
org.apache.kafka.clients.producer.internals.Sender [] - [Producer 
clientId=producer-7] Got error produce response with correlation id 9 on 
topic-partition test_serde_IngestionTime-0, retrying (2147483646 attempts 
left). Error: UNKNOWN_TOPIC_OR_PARTITION
13:34:40,706 [kafka-producer-network-thread | producer-7] WARN 
org.apache.kafka.clients.producer.internals.Sender [] - [Producer 
clientId=producer-7] Received unknown topic or partition error in produce 
request on partition test_serde_IngestionTime-0. The topic-partition may not 
exist or the user may not have Describe access to it
13:34:40,854 [kafka-producer-network-thread | producer-7] WARN 
org.apache.kafka.clients.NetworkClient [] - [Producer clientId=producer-7] 
Error while fetching metadata with correlation id 12 : 
{test_serde_IngestionTime=LEADER_NOT_AVAILABLE}{code}
 

I was wondering whether similar cases happen before while using Kafka, and how 
this situation could be avoided?

[~aljoscha]

 

> KafkaShuffleITCase.testSerDeIngestionTime:156->testRecordSerDe:388 
> expected:<310> but was:<0>
> -
>
> Key: FLINK-17949
> URL: https://issues.apache.org/jira/browse/FLINK-17949
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka, Tests
>Affects Versions: 1.11.0, 1.12.0
>Reporter: Robert Metzger
>Priority: Critical
>  Labels: test-stability
> Attachments: logs-ci-kafkagelly-1590500380.zip, 
> logs-ci-kafkagelly-1590524911.zip
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=2209=logs=c5f0071e-1851-543e-9a45-9ac140befc32=684b1416-4c17-504e-d5ab-97ee44e08a20
> {code}
> 2020-05-26T13:35:19.4022562Z [ERROR] 
> testSerDeIngestionTime(org.apache.flink.streaming.connectors.kafka.shuffle.KafkaShuffleITCase)
>   Time elapsed: 5.786 s  <<< FAILURE!
> 2020-05-26T13:35:19.4023185Z java.lang.AssertionError: expected:<310> but 
> was:<0>
> 2020-05-26T13:35:19.4023498Z  at org.junit.Assert.fail(Assert.java:88)
> 2020-05-26T13:35:19.4023825Z  at 
> org.junit.Assert.failNotEquals(Assert.java:834)
> 2020-05-26T13:35:19.4024461Z  at 
> org.junit.Assert.assertEquals(Assert.java:645)
> 2020-05-26T13:35:19.4024900Z  at 
> org.junit.Assert.assertEquals(Assert.java:631)
> 2020-05-26T13:35:19.4028546Z  at 
> org.apache.flink.streaming.connectors.kafka.shuffle.KafkaShuffleITCase.testRecordSerDe(KafkaShuffleITCase.java:388)
> 2020-05-26T13:35:19.4029629Z  at 
> org.apache.flink.streaming.connectors.kafka.shuffle.KafkaShuffleITCase.testSerDeIngestionTime(KafkaShuffleITCase.java:156)
> 2020-05-26T13:35:19.4030253Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2020-05-26T13:35:19.4030673Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2020-05-26T13:35:19.4031332Z  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2020-05-26T13:35:19.4031763Z  at 
> java.lang.reflect.Method.invoke(Method.java:498)
> 2020-05-26T13:35:19.4032155Z  at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> 2020-05-26T13:35:19.4032630Z  at 
> 

[jira] [Updated] (FLINK-16495) Improve default flush strategy for Elasticsearch sink to make it work out-of-box

2020-06-04 Thread Jark Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-16495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jark Wu updated FLINK-16495:

Fix Version/s: (was: 1.12.0)
   1.11.0

> Improve default flush strategy for Elasticsearch sink to make it work 
> out-of-box
> 
>
> Key: FLINK-16495
> URL: https://issues.apache.org/jira/browse/FLINK-16495
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / ElasticSearch, Table SQL / Ecosystem
>Reporter: Jark Wu
>Priority: Major
>  Labels: usability
> Fix For: 1.11.0
>
>
> Currently, Elasticsearch sink provides 3 flush options: 
> {code:java}
> 'connector.bulk-flush.max-actions' = '42'
> 'connector.bulk-flush.max-size' = '42 mb'
> 'connector.bulk-flush.interval' = '6'
> {code}
> All of them are optional and have no default value on the Flink side [1]. But 
> flush actions and flush size have default values of {{1000}} and {{5mb}} in the 
> Elasticsearch client [2]. This results in the surprising behavior that no 
> results are output by default, see user report [3], because the sink has to 
> wait for 1000 records while there are not that many records in the test. 
> This will also be a potential "problem" in production: in a low-throughput 
> job, some data may take a very long time to become visible in Elasticsearch. 
> In this issue, I propose to have Flink's default values for these 3 options. 
> {code:java}
> 'connector.bulk-flush.max-actions' = '1000'   -- same to the ES client 
> default value
> 'connector.bulk-flush.max-size' = '5mb'  -- same to the ES client default 
> value
> 'connector.bulk-flush.interval' = '5s'  -- avoid no output result
> {code}
> [1]: 
> https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-elasticsearch-base/src/main/java/org/apache/flink/streaming/connectors/elasticsearch/ElasticsearchSinkBase.java#L357-L356
> [2]: 
> https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/java-docs-bulk-processor.html
> [3]: 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Should-I-use-a-Sink-or-Connector-Or-Both-td33352.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] zhijiangW commented on a change in pull request #12493: [FLINK-18139][checkpointing] Fixing unaligned checkpoints checks wrong channels for inflight data.

2020-06-04 Thread GitBox


zhijiangW commented on a change in pull request #12493:
URL: https://github.com/apache/flink/pull/12493#discussion_r435674425



##
File path: 
flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/io/CheckpointBarrierUnaligner.java
##
@@ -331,6 +331,7 @@ public synchronized void 
notifyBarrierReceived(CheckpointBarrier barrier, InputC
 
@Override
public synchronized void notifyBufferReceived(Buffer buffer, 
InputChannelInfo channelInfo) {
+   LOG.error("{}: notifyBufferReceived @ {}: {}.", 
handler.taskName, channelInfo, 
storeNewBuffers[handler.getFlattenedChannelIndex(channelInfo)]);

Review comment:
   ditto





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-16495) Improve default flush strategy for Elasticsearch sink to make it work out-of-box

2020-06-04 Thread Jark Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-16495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jark Wu updated FLINK-16495:

Priority: Critical  (was: Major)

> Improve default flush strategy for Elasticsearch sink to make it work 
> out-of-box
> 
>
> Key: FLINK-16495
> URL: https://issues.apache.org/jira/browse/FLINK-16495
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / ElasticSearch, Table SQL / Ecosystem
>Reporter: Jark Wu
>Priority: Critical
>  Labels: usability
> Fix For: 1.11.0
>
>
> Currently, Elasticsearch sink provides 3 flush options: 
> {code:java}
> 'connector.bulk-flush.max-actions' = '42'
> 'connector.bulk-flush.max-size' = '42 mb'
> 'connector.bulk-flush.interval' = '6'
> {code}
> All of them are optional and have no default value on the Flink side [1]. But 
> flush actions and flush size have default values of {{1000}} and {{5mb}} in the 
> Elasticsearch client [2]. This results in the surprising behavior that no 
> results are output by default, see user report [3], because the sink has to 
> wait for 1000 records while there are not that many records in the test. 
> This will also be a potential "problem" in production: in a low-throughput 
> job, some data may take a very long time to become visible in Elasticsearch. 
> In this issue, I propose to have Flink's default values for these 3 options. 
> {code:java}
> 'connector.bulk-flush.max-actions' = '1000'   -- same to the ES client 
> default value
> 'connector.bulk-flush.max-size' = '5mb'  -- same to the ES client default 
> value
> 'connector.bulk-flush.interval' = '5s'  -- avoid no output result
> {code}
> [1]: 
> https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-elasticsearch-base/src/main/java/org/apache/flink/streaming/connectors/elasticsearch/ElasticsearchSinkBase.java#L357-L356
> [2]: 
> https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/java-docs-bulk-processor.html
> [3]: 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Should-I-use-a-Sink-or-Connector-Or-Both-td33352.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] zhijiangW commented on a change in pull request #12493: [FLINK-18139][checkpointing] Fixing unaligned checkpoints checks wrong channels for inflight data.

2020-06-04 Thread GitBox


zhijiangW commented on a change in pull request #12493:
URL: https://github.com/apache/flink/pull/12493#discussion_r435674253



##
File path: 
flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/io/CheckpointBarrierUnaligner.java
##
@@ -310,16 +310,16 @@ private void notifyCheckpoint(CheckpointBarrier barrier) 
throws IOException {
public synchronized void 
notifyBarrierReceived(CheckpointBarrier barrier, InputChannelInfo channelInfo) 
throws IOException {
long barrierId = barrier.getId();
 
+   int channelIndex = 
handler.getFlattenedChannelIndex(channelInfo);

Review comment:
   irrelevant changes should be cleaned up?

##
File path: 
flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/io/CheckpointBarrierUnaligner.java
##
@@ -310,16 +310,16 @@ private void notifyCheckpoint(CheckpointBarrier barrier) 
throws IOException {
public synchronized void 
notifyBarrierReceived(CheckpointBarrier barrier, InputChannelInfo channelInfo) 
throws IOException {
long barrierId = barrier.getId();
 
+   int channelIndex = 
handler.getFlattenedChannelIndex(channelInfo);
+   LOG.error("{}: Received barrier from channel {} ({}) @ 
{}.", handler.taskName, channelIndex, storeNewBuffers[channelIndex], barrierId);
if (currentReceivedCheckpointId < barrierId) {
handleNewCheckpoint(barrier);
handler.executeInTaskThread(() -> 
handler.notifyCheckpoint(barrier), "notifyCheckpoint");
}
 
-   int channelIndex = 
handler.getFlattenedChannelIndex(channelInfo);
if (barrierId == currentReceivedCheckpointId && 
storeNewBuffers[channelIndex]) {
-   if (LOG.isDebugEnabled()) {
-   LOG.debug("{}: Received barrier from 
channel {} @ {}.", handler.taskName, channelIndex, barrierId);
-   }
+// if (LOG.isDebugEnabled()) {

Review comment:
   ditto





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] JingsongLi commented on pull request #12485: [FLINK-18130][hive][fs-connector] File name conflict for different jobs in filesystem/hive sink

2020-06-04 Thread GitBox


JingsongLi commented on pull request #12485:
URL: https://github.com/apache/flink/pull/12485#issuecomment-639241037


   CC: @gaoyunhaii @lirui-apache 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-17949) KafkaShuffleITCase.testSerDeIngestionTime:156->testRecordSerDe:388 expected:<310> but was:<0>

2020-06-04 Thread Yuan Mei (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126348#comment-17126348
 ] 

Yuan Mei commented on FLINK-17949:
--

Thanks, Robert, I got a bit more msg this time:

 
{code:java}
13:34:40,706 [kafka-producer-network-thread | producer-7] WARN 
org.apache.kafka.clients.producer.internals.Sender [] - [Producer 
clientId=producer-7] Got error produce response with correlation id 9 on 
topic-partition test_serde_IngestionTime-0, retrying (2147483646 attempts 
left). Error: UNKNOWN_TOPIC_OR_PARTITION
13:34:40,706 [kafka-producer-network-thread | producer-7] WARN 
org.apache.kafka.clients.producer.internals.Sender [] - [Producer 
clientId=producer-7] Received unknown topic or partition error in produce 
request on partition test_serde_IngestionTime-0. The topic-partition may not 
exist or the user may not have Describe access to it
13:34:40,854 [kafka-producer-network-thread | producer-7] WARN 
org.apache.kafka.clients.NetworkClient [] - [Producer clientId=producer-7] 
Error while fetching metadata with correlation id 12 : 
{test_serde_IngestionTime=LEADER_NOT_AVAILABLE}{code}
 

 

> KafkaShuffleITCase.testSerDeIngestionTime:156->testRecordSerDe:388 
> expected:<310> but was:<0>
> -
>
> Key: FLINK-17949
> URL: https://issues.apache.org/jira/browse/FLINK-17949
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka, Tests
>Affects Versions: 1.11.0, 1.12.0
>Reporter: Robert Metzger
>Priority: Critical
>  Labels: test-stability
> Attachments: logs-ci-kafkagelly-1590500380.zip, 
> logs-ci-kafkagelly-1590524911.zip
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=2209=logs=c5f0071e-1851-543e-9a45-9ac140befc32=684b1416-4c17-504e-d5ab-97ee44e08a20
> {code}
> 2020-05-26T13:35:19.4022562Z [ERROR] 
> testSerDeIngestionTime(org.apache.flink.streaming.connectors.kafka.shuffle.KafkaShuffleITCase)
>   Time elapsed: 5.786 s  <<< FAILURE!
> 2020-05-26T13:35:19.4023185Z java.lang.AssertionError: expected:<310> but 
> was:<0>
> 2020-05-26T13:35:19.4023498Z  at org.junit.Assert.fail(Assert.java:88)
> 2020-05-26T13:35:19.4023825Z  at 
> org.junit.Assert.failNotEquals(Assert.java:834)
> 2020-05-26T13:35:19.4024461Z  at 
> org.junit.Assert.assertEquals(Assert.java:645)
> 2020-05-26T13:35:19.4024900Z  at 
> org.junit.Assert.assertEquals(Assert.java:631)
> 2020-05-26T13:35:19.4028546Z  at 
> org.apache.flink.streaming.connectors.kafka.shuffle.KafkaShuffleITCase.testRecordSerDe(KafkaShuffleITCase.java:388)
> 2020-05-26T13:35:19.4029629Z  at 
> org.apache.flink.streaming.connectors.kafka.shuffle.KafkaShuffleITCase.testSerDeIngestionTime(KafkaShuffleITCase.java:156)
> 2020-05-26T13:35:19.4030253Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2020-05-26T13:35:19.4030673Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2020-05-26T13:35:19.4031332Z  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2020-05-26T13:35:19.4031763Z  at 
> java.lang.reflect.Method.invoke(Method.java:498)
> 2020-05-26T13:35:19.4032155Z  at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> 2020-05-26T13:35:19.4032630Z  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 2020-05-26T13:35:19.4033188Z  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> 2020-05-26T13:35:19.4033638Z  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 2020-05-26T13:35:19.4034103Z  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> 2020-05-26T13:35:19.4034593Z  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
> 2020-05-26T13:35:19.4035118Z  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
> 2020-05-26T13:35:19.4035570Z  at 
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 2020-05-26T13:35:19.4035888Z  at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-17416) Flink-kubernetes doesn't work on java 8 8u252

2020-06-04 Thread Yang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126347#comment-17126347
 ] 

Yang Wang commented on FLINK-17416:
---

Fixed in FLINK-17565.

> Flink-kubernetes doesn't work on java 8 8u252
> -
>
> Key: FLINK-17416
> URL: https://issues.apache.org/jira/browse/FLINK-17416
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes
>Affects Versions: 1.10.0, 1.11.0
>Reporter: wangxiyuan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.11.0, 1.12.0
>
> Attachments: log.k8s.session.8u252
>
>
> When using Java 8u252, the Flink container end-to-end test failed. The 
> test `Running 'Run kubernetes session test'` fails with a `Broken pipe` 
> error.
> See:
> [https://logs.openlabtesting.org/logs/periodic-20-flink-mail/github.com/apache/flink/master/flink-end-to-end-test-arm64-container/fcfdd47/job-output.txt.gz]
>  
> Flink Azure CI doesn't hit this problem because it runs under jdk-8-8u242
>  
> The reason is that the okhttp library that Flink uses doesn't work on 
> Java 8u252:
> [https://github.com/square/okhttp/issues/5970]
>  
> The problem has been fixed with the PR:
> [https://github.com/square/okhttp/pull/5977]
>  
> Maybe we can wait for a new 3.12.x release and bump the okhttp version in 
> Flink later.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-17416) Flink-kubernetes doesn't work on java 8 8u252

2020-06-04 Thread Yang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Wang updated FLINK-17416:
--
Release Note:   (was: Fixed in FLINK-17565.)

> Flink-kubernetes doesn't work on java 8 8u252
> -
>
> Key: FLINK-17416
> URL: https://issues.apache.org/jira/browse/FLINK-17416
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes
>Affects Versions: 1.10.0, 1.11.0
>Reporter: wangxiyuan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.11.0, 1.12.0
>
> Attachments: log.k8s.session.8u252
>
>
> When using Java 8u252, the Flink container end-to-end test failed. The 
> test `Running 'Run kubernetes session test'` fails with a `Broken pipe` 
> error.
> See:
> [https://logs.openlabtesting.org/logs/periodic-20-flink-mail/github.com/apache/flink/master/flink-end-to-end-test-arm64-container/fcfdd47/job-output.txt.gz]
>  
> Flink Azure CI doesn't hit this problem because it runs under jdk-8-8u242
>  
> The reason is that the okhttp library that Flink uses doesn't work on 
> Java 8u252:
> [https://github.com/square/okhttp/issues/5970]
>  
> The problem has been fixed with the PR:
> [https://github.com/square/okhttp/pull/5977]
>  
> Maybe we can wait for a new 3.12.x release and bump the okhttp version in 
> Flink later.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (FLINK-17416) Flink-kubernetes doesn't work on java 8 8u252

2020-06-04 Thread Yang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Wang closed FLINK-17416.
-
Release Note: Fixed in FLINK-17565.
  Resolution: Fixed

> Flink-kubernetes doesn't work on java 8 8u252
> -
>
> Key: FLINK-17416
> URL: https://issues.apache.org/jira/browse/FLINK-17416
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes
>Affects Versions: 1.10.0, 1.11.0
>Reporter: wangxiyuan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.11.0, 1.12.0
>
> Attachments: log.k8s.session.8u252
>
>
> When using Java 8u252, the Flink container end-to-end test failed. The 
> test `Running 'Run kubernetes session test'` fails with a `Broken pipe` 
> error.
> See:
> [https://logs.openlabtesting.org/logs/periodic-20-flink-mail/github.com/apache/flink/master/flink-end-to-end-test-arm64-container/fcfdd47/job-output.txt.gz]
>  
> Flink Azure CI doesn't hit this problem because it runs under jdk-8-8u242
>  
> The reason is that the okhttp library that Flink uses doesn't work on 
> Java 8u252:
> [https://github.com/square/okhttp/issues/5970]
>  
> The problem has been fixed with the PR:
> [https://github.com/square/okhttp/pull/5977]
>  
> Maybe we can wait for a new 3.12.x release and bump the okhttp version in 
> Flink later.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (FLINK-17566) Fix potential K8s resources leak after JobManager finishes in Applicaion mode

2020-06-04 Thread Yang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Wang closed FLINK-17566.
-
Resolution: Fixed

> Fix potential K8s resources leak after JobManager finishes in Applicaion mode
> -
>
> Key: FLINK-17566
> URL: https://issues.apache.org/jira/browse/FLINK-17566
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes
>Reporter: Canbin Zheng
>Priority: Major
>
> FLINK-10934 introduces application mode support in the native K8s setup, but 
> as discussed in 
> [https://github.com/apache/flink/pull/12003], 
> there's a high probability that all the K8s resources leak after the 
> JobManager finishes, except that the replica count of the Deployment is 
> scaled down to 0. We need to find out the root cause and fix it.
> This may be related to the way fabric8 SDK deletes a Deployment. It splits 
> the procedure into three steps as follows:
>  # Scale the replica count down to 0
>  # Wait until the scale-down succeeds
>  # Delete the ReplicaSet
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-17566) Fix potential K8s resources leak after JobManager finishes in Applicaion mode

2020-06-04 Thread Yang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126345#comment-17126345
 ] 

Yang Wang commented on FLINK-17566:
---

Fixed in FLINK-17565.

> Fix potential K8s resources leak after JobManager finishes in Applicaion mode
> -
>
> Key: FLINK-17566
> URL: https://issues.apache.org/jira/browse/FLINK-17566
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes
>Reporter: Canbin Zheng
>Priority: Major
>
> FLINK-10934 introduces application mode support in the native K8s setup, but 
> as discussed in 
> [https://github.com/apache/flink/pull/12003], 
> there's a high probability that all the K8s resources leak after the 
> JobManager finishes, except that the replica count of the Deployment is 
> scaled down to 0. We need to find out the root cause and fix it.
> This may be related to the way fabric8 SDK deletes a Deployment. It splits 
> the procedure into three steps as follows:
>  # Scale the replica count down to 0
>  # Wait until the scale-down succeeds
>  # Delete the ReplicaSet
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-17416) Flink-kubernetes doesn't work on java 8 8u252

2020-06-04 Thread Yang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Wang updated FLINK-17416:
--
Fix Version/s: 1.12.0

> Flink-kubernetes doesn't work on java 8 8u252
> -
>
> Key: FLINK-17416
> URL: https://issues.apache.org/jira/browse/FLINK-17416
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes
>Affects Versions: 1.10.0, 1.11.0
>Reporter: wangxiyuan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.11.0, 1.12.0
>
> Attachments: log.k8s.session.8u252
>
>
> When using Java 8u252, the Flink container end-to-end test failed. The 
> test `Running 'Run kubernetes session test'` fails with a `Broken pipe` 
> error.
> See:
> [https://logs.openlabtesting.org/logs/periodic-20-flink-mail/github.com/apache/flink/master/flink-end-to-end-test-arm64-container/fcfdd47/job-output.txt.gz]
>  
> Flink Azure CI doesn't hit this problem because it runs under jdk-8-8u242
>  
> The reason is that the okhttp library that Flink uses doesn't work on 
> Java 8u252:
> [https://github.com/square/okhttp/issues/5970]
>  
> The problem has been fixed with the PR:
> [https://github.com/square/okhttp/pull/5977]
>  
> Maybe we can wait for a new 3.12.x release and bump the okhttp version in 
> Flink later.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (FLINK-17883) Unable to configure write mode for FileSystem() connector in PyFlink

2020-06-04 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu closed FLINK-17883.
---
Resolution: Won't Fix

[~rmetzger] Thanks for creating this ticket. I'm closing it as it should 
already be available in 1.11. Feel free to reopen this ticket if you feel that 
the behavior is not as expected.

> Unable to configure write mode for FileSystem() connector in PyFlink
> 
>
> Key: FLINK-17883
> URL: https://issues.apache.org/jira/browse/FLINK-17883
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python
>Affects Versions: 1.10.1
>Reporter: Robert Metzger
>Assignee: Nicholas Jiang
>Priority: Major
>
> As a user of PyFlink, I'm getting the following exception:
> {code}
> File or directory /tmp/output already exists. Existing files and directories 
> are not overwritten in NO_OVERWRITE mode. Use OVERWRITE mode to overwrite 
> existing files and directories.
> {code}
> I would like to be able to configure writeMode = OVERWRITE for the FileSystem 
> connector.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-18122) Kubernetes test fails with "error: timed out waiting for the condition on jobs/flink-job-cluster"

2020-06-04 Thread Yang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126337#comment-17126337
 ] 

Yang Wang commented on FLINK-18122:
---

I agree that we should {{exit}} when build the image failed.

> Kubernetes test fails with "error: timed out waiting for the condition on 
> jobs/flink-job-cluster"
> -
>
> Key: FLINK-18122
> URL: https://issues.apache.org/jira/browse/FLINK-18122
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes, Tests
>Affects Versions: 1.11.0
>Reporter: Robert Metzger
>Priority: Major
>  Labels: test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=2697=logs=c88eea3b-64a0-564d-0031-9fdcd7b8abee=1e2bbe5b-4657-50be-1f07-d84bfce5b1f5
> {code}
> 2020-06-04T09:25:40.7205843Z service/flink-job-cluster created
> 2020-06-04T09:25:40.9661515Z job.batch/flink-job-cluster created
> 2020-06-04T09:25:41.2189123Z deployment.apps/flink-task-manager created
> 2020-06-04T10:32:32.6402983Z error: timed out waiting for the condition on 
> jobs/flink-job-cluster
> 2020-06-04T10:32:33.8057757Z error: unable to upgrade connection: container 
> not found ("flink-task-manager")
> 2020-06-04T10:32:33.8111302Z sort: cannot read: 
> '/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-56335570120/out/kubernetes_wc_out*':
>  No such file or directory
> 2020-06-04T10:32:33.8124455Z FAIL WordCount: Output hash mismatch.  Got 
> d41d8cd98f00b204e9800998ecf8427e, expected e682ec6622b5e83f2eb614617d5ab2cf.
> 2020-06-04T10:32:33.8125379Z head hexdump of actual:
> 2020-06-04T10:32:33.8136133Z head: cannot open 
> '/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-56335570120/out/kubernetes_wc_out*'
>  for reading: No such file or directory
> 2020-06-04T10:32:33.8344715Z Debugging failed Kubernetes test:
> 2020-06-04T10:32:33.8345469Z Currently existing Kubernetes resources
> 2020-06-04T10:32:36.4977853Z I0604 10:32:36.497383   13191 request.go:621] 
> Throttling request took 1.198606989s, request: 
> GET:https://10.1.0.4:8443/apis/rbac.authorization.k8s.io/v1?timeout=32s
> 2020-06-04T10:32:46.6975735Z I0604 10:32:46.697234   13191 request.go:621] 
> Throttling request took 4.398107353s, request: 
> GET:https://10.1.0.4:8443/apis/authorization.k8s.io/v1?timeout=32s
> 2020-06-04T10:32:57.4978637Z I0604 10:32:57.497209   13191 request.go:621] 
> Throttling request took 1.198449167s, request: 
> GET:https://10.1.0.4:8443/apis/apps/v1?timeout=32s
> 2020-06-04T10:33:07.4980104Z I0604 10:33:07.497320   13191 request.go:621] 
> Throttling request took 4.198274438s, request: 
> GET:https://10.1.0.4:8443/apis/apiextensions.k8s.io/v1?timeout=32s
> 2020-06-04T10:33:18.4976060Z I0604 10:33:18.497258   13191 request.go:621] 
> Throttling request took 1.19871495s, request: 
> GET:https://10.1.0.4:8443/apis/apps/v1?timeout=32s
> 2020-06-04T10:33:28.4979129Z I0604 10:33:28.497276   13191 request.go:621] 
> Throttling request took 4.198369672s, request: 
> GET:https://10.1.0.4:8443/apis/rbac.authorization.k8s.io/v1?timeout=32s
> 2020-06-04T10:33:30.9182069Z NAME READY   
> STATUS  RESTARTS   AGE
> 2020-06-04T10:33:30.9184099Z pod/flink-job-cluster-dtb67  0/1 
> ErrImageNeverPull   0  67m
> 2020-06-04T10:33:30.9184869Z pod/flink-task-manager-74ccc9bd9-psqwm   0/1 
> ErrImageNeverPull   0  67m
> 2020-06-04T10:33:30.9185226Z 
> 2020-06-04T10:33:30.9185926Z NAMETYPE
> CLUSTER-IP  EXTERNAL-IP   PORT(S) 
>   AGE
> 2020-06-04T10:33:30.9186832Z service/flink-job-cluster   NodePort
> 10.111.92.199   
> 6123:32501/TCP,6124:31360/TCP,6125:30025/TCP,8081:30081/TCP   67m
> 2020-06-04T10:33:30.9187545Z service/kubernetes  ClusterIP   
> 10.96.0.1   443/TCP 
>   68m
> 2020-06-04T10:33:30.9187976Z 
> 2020-06-04T10:33:30.9188472Z NAME READY   
> UP-TO-DATE   AVAILABLE   AGE
> 2020-06-04T10:33:30.9189179Z deployment.apps/flink-task-manager   0/1 1   
>  0   67m
> 2020-06-04T10:33:30.9189508Z 
> 2020-06-04T10:33:30.9189815Z NAME   
> DESIRED   CURRENT   READY   AGE
> 2020-06-04T10:33:30.9190418Z replicaset.apps/flink-task-manager-74ccc9bd9   1 
> 1 0   67m
> 2020-06-04T10:33:30.9190662Z 
> 2020-06-04T10:33:30.9190891Z NAME  COMPLETIONS   
> DURATION   AGE
> 2020-06-04T10:33:30.9191423Z job.batch/flink-job-cluster   0/1   67m  
>   67m
> 2020-06-04T10:33:33.7840921Z 

[jira] [Comment Edited] (FLINK-18122) Kubernetes test fails with "error: timed out waiting for the condition on jobs/flink-job-cluster"

2020-06-04 Thread Yang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126337#comment-17126337
 ] 

Yang Wang edited comment on FLINK-18122 at 6/5/20, 3:17 AM:


I agree that we should {{exit}} when building the image fails.


was (Author: fly_in_gis):
I agree that we should {{exit}} when build the image failed.
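The agreed fail-fast behavior can be sketched in plain Java (hypothetical names, not the actual e2e script, which is bash; `buildImage()` stands in for the real `docker build` step): abort immediately with a non-zero exit code when the image build fails, instead of continuing and timing out an hour later with `ErrImageNeverPull`.

```java
// Hypothetical sketch: exit as soon as the image build step fails.
public class BuildImageSketch {

    static boolean buildImage() {
        // Stand-in for invoking "docker build"; here we simulate a failed build.
        return false;
    }

    static int run() {
        if (!buildImage()) {
            // Surface the failure and signal the caller to stop the test run.
            System.err.println("Building the Flink image failed; exiting.");
            return 1;
        }
        return 0;
    }

    public static void main(String[] args) {
        System.exit(run());
    }
}
```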

> Kubernetes test fails with "error: timed out waiting for the condition on 
> jobs/flink-job-cluster"
> -
>
> Key: FLINK-18122
> URL: https://issues.apache.org/jira/browse/FLINK-18122
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes, Tests
>Affects Versions: 1.11.0
>Reporter: Robert Metzger
>Priority: Major
>  Labels: test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=2697=logs=c88eea3b-64a0-564d-0031-9fdcd7b8abee=1e2bbe5b-4657-50be-1f07-d84bfce5b1f5
> {code}
> 2020-06-04T09:25:40.7205843Z service/flink-job-cluster created
> 2020-06-04T09:25:40.9661515Z job.batch/flink-job-cluster created
> 2020-06-04T09:25:41.2189123Z deployment.apps/flink-task-manager created
> 2020-06-04T10:32:32.6402983Z error: timed out waiting for the condition on 
> jobs/flink-job-cluster
> 2020-06-04T10:32:33.8057757Z error: unable to upgrade connection: container 
> not found ("flink-task-manager")
> 2020-06-04T10:32:33.8111302Z sort: cannot read: 
> '/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-56335570120/out/kubernetes_wc_out*':
>  No such file or directory
> 2020-06-04T10:32:33.8124455Z FAIL WordCount: Output hash mismatch.  Got 
> d41d8cd98f00b204e9800998ecf8427e, expected e682ec6622b5e83f2eb614617d5ab2cf.
> 2020-06-04T10:32:33.8125379Z head hexdump of actual:
> 2020-06-04T10:32:33.8136133Z head: cannot open 
> '/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-56335570120/out/kubernetes_wc_out*'
>  for reading: No such file or directory
> 2020-06-04T10:32:33.8344715Z Debugging failed Kubernetes test:
> 2020-06-04T10:32:33.8345469Z Currently existing Kubernetes resources
> 2020-06-04T10:32:36.4977853Z I0604 10:32:36.497383   13191 request.go:621] 
> Throttling request took 1.198606989s, request: 
> GET:https://10.1.0.4:8443/apis/rbac.authorization.k8s.io/v1?timeout=32s
> 2020-06-04T10:32:46.6975735Z I0604 10:32:46.697234   13191 request.go:621] 
> Throttling request took 4.398107353s, request: 
> GET:https://10.1.0.4:8443/apis/authorization.k8s.io/v1?timeout=32s
> 2020-06-04T10:32:57.4978637Z I0604 10:32:57.497209   13191 request.go:621] 
> Throttling request took 1.198449167s, request: 
> GET:https://10.1.0.4:8443/apis/apps/v1?timeout=32s
> 2020-06-04T10:33:07.4980104Z I0604 10:33:07.497320   13191 request.go:621] 
> Throttling request took 4.198274438s, request: 
> GET:https://10.1.0.4:8443/apis/apiextensions.k8s.io/v1?timeout=32s
> 2020-06-04T10:33:18.4976060Z I0604 10:33:18.497258   13191 request.go:621] 
> Throttling request took 1.19871495s, request: 
> GET:https://10.1.0.4:8443/apis/apps/v1?timeout=32s
> 2020-06-04T10:33:28.4979129Z I0604 10:33:28.497276   13191 request.go:621] 
> Throttling request took 4.198369672s, request: 
> GET:https://10.1.0.4:8443/apis/rbac.authorization.k8s.io/v1?timeout=32s
> 2020-06-04T10:33:30.9182069Z NAME READY   
> STATUS  RESTARTS   AGE
> 2020-06-04T10:33:30.9184099Z pod/flink-job-cluster-dtb67  0/1 
> ErrImageNeverPull   0  67m
> 2020-06-04T10:33:30.9184869Z pod/flink-task-manager-74ccc9bd9-psqwm   0/1 
> ErrImageNeverPull   0  67m
> 2020-06-04T10:33:30.9185226Z 
> 2020-06-04T10:33:30.9185926Z NAMETYPE
> CLUSTER-IP  EXTERNAL-IP   PORT(S) 
>   AGE
> 2020-06-04T10:33:30.9186832Z service/flink-job-cluster   NodePort
> 10.111.92.199   
> 6123:32501/TCP,6124:31360/TCP,6125:30025/TCP,8081:30081/TCP   67m
> 2020-06-04T10:33:30.9187545Z service/kubernetes  ClusterIP   
> 10.96.0.1   443/TCP 
>   68m
> 2020-06-04T10:33:30.9187976Z 
> 2020-06-04T10:33:30.9188472Z NAME READY   
> UP-TO-DATE   AVAILABLE   AGE
> 2020-06-04T10:33:30.9189179Z deployment.apps/flink-task-manager   0/1 1   
>  0   67m
> 2020-06-04T10:33:30.9189508Z 
> 2020-06-04T10:33:30.9189815Z NAME   
> DESIRED   CURRENT   READY   AGE
> 2020-06-04T10:33:30.9190418Z replicaset.apps/flink-task-manager-74ccc9bd9   1 
> 1 0   67m
> 2020-06-04T10:33:30.9190662Z 
> 2020-06-04T10:33:30.9190891Z NAME  COMPLETIONS   

[GitHub] [flink] wuchong commented on a change in pull request #12303: [FLINK-17625] [table] Fix ArrayIndexOutOfBoundsException in AppendOnlyTopNFunction

2020-06-04 Thread GitBox


wuchong commented on a change in pull request #12303:
URL: https://github.com/apache/flink/pull/12303#discussion_r435667861



##
File path: 
flink-table/flink-table-runtime-blink/src/main/java/org/apache/flink/table/runtime/operators/rank/AppendOnlyTopNFunction.java
##
@@ -210,14 +210,19 @@ private void processElementWithoutRowNumber(RowData input, Collector<RowData> out
        RowData lastKey = lastEntry.getKey();
        List<RowData> lastList = (List<RowData>) lastEntry.getValue();
        // remove last one
-       RowData lastElement = lastList.remove(lastList.size() - 1);
-       if (lastList.isEmpty()) {
+       int size = lastList.size();
+       RowData lastElement = null;
+       if (size > 0) {
+           lastElement = lastList.get(size - 1);

Review comment:
   Yes. I think we should improve `TopNBuffer#removeLast`; we can call 
`remove(index)` and `get(index)` when it is a `List`.
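The suggested `removeLast` improvement can be sketched as follows (a hypothetical standalone helper, not the actual `TopNBuffer` code): `List.remove(index)` both returns and deletes the element in one call, and an empty list yields `null` instead of throwing the `ArrayIndexOutOfBoundsException` this PR fixes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RemoveLastSketch {

    // Remove and return the last element, or null if the list is empty.
    static <T> T removeLast(List<T> list) {
        if (list == null || list.isEmpty()) {
            return null;
        }
        // remove(index) returns the removed element, so no separate get() call
        return list.remove(list.size() - 1);
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>(Arrays.asList("a", "b", "c"));
        System.out.println(removeLast(rows)); // prints "c"
        System.out.println(rows);             // prints "[a, b]"
        System.out.println(removeLast(new ArrayList<String>())); // prints "null"
    }
}
```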





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-17883) Unable to configure write mode for FileSystem() connector in PyFlink

2020-06-04 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126336#comment-17126336
 ] 

Dian Fu commented on FLINK-17883:
-

I have verified the following interface defined in StatementSet:
{code}
StatementSet addInsert(String targetPath, Table table, boolean overwrite);
{code}

Note: currently only FileSystemTableSink and HiveTableSink support `overwrite`.
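How the verified three-argument overload relates to the existing two-argument form can be illustrated with a small mock (plain Java stand-ins, not the real Flink classes): the short form simply defaults `overwrite` to `false`, which matches the NO_OVERWRITE behavior in the error message above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mock of the StatementSet addInsert overloads.
public class StatementSetSketch {

    static final class Insert {
        final String targetPath;
        final boolean overwrite;
        Insert(String targetPath, boolean overwrite) {
            this.targetPath = targetPath;
            this.overwrite = overwrite;
        }
    }

    final List<Insert> inserts = new ArrayList<>();

    // Three-argument form: caller controls the overwrite flag.
    StatementSetSketch addInsert(String targetPath, Object table, boolean overwrite) {
        inserts.add(new Insert(targetPath, overwrite));
        return this; // fluent style, as in the real StatementSet
    }

    // Two-argument form: defaults to overwrite = false (NO_OVERWRITE).
    StatementSetSketch addInsert(String targetPath, Object table) {
        return addInsert(targetPath, table, false);
    }

    public static void main(String[] args) {
        StatementSetSketch set = new StatementSetSketch()
                .addInsert("/tmp/output", new Object(), true) // overwrite existing files
                .addInsert("/tmp/other", new Object());       // default: do not overwrite
        System.out.println(set.inserts.get(0).overwrite); // prints "true"
        System.out.println(set.inserts.get(1).overwrite); // prints "false"
    }
}
```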

> Unable to configure write mode for FileSystem() connector in PyFlink
> 
>
> Key: FLINK-17883
> URL: https://issues.apache.org/jira/browse/FLINK-17883
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python
>Affects Versions: 1.10.1
>Reporter: Robert Metzger
>Assignee: Nicholas Jiang
>Priority: Major
>
> As a user of PyFlink, I'm getting the following exception:
> {code}
> File or directory /tmp/output already exists. Existing files and directories 
> are not overwritten in NO_OVERWRITE mode. Use OVERWRITE mode to overwrite 
> existing files and directories.
> {code}
> I would like to be able to configure writeMode = OVERWRITE for the FileSystem 
> connector.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot commented on pull request #12495: [FLINK-18048][Backport 1.10] Fix --host option for standalone job cluster

2020-06-04 Thread GitBox


flinkbot commented on pull request #12495:
URL: https://github.com/apache/flink/pull/12495#issuecomment-639233022


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit 6589b314827707fe2b20b00c4f14936b4adb4474 (Fri Jun 05 
03:15:26 UTC 2020)
   
   **Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into to Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.
Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   







[GitHub] [flink] wangyang0918 opened a new pull request #12495: [FLINK-18048] Fix --host option for standalone job cluster

2020-06-04 Thread GitBox


wangyang0918 opened a new pull request #12495:
URL: https://github.com/apache/flink/pull/12495


   Backport #12426 to release-1.10 branch.







[GitHub] [flink] flinkbot edited a comment on pull request #12488: [FLINK-18126][python] Correct the exception handling of the Python CompletableFuture

2020-06-04 Thread GitBox


flinkbot edited a comment on pull request #12488:
URL: https://github.com/apache/flink/pull/12488#issuecomment-638911342


   
   ## CI report:
   
   * b0ec0676015c04ee58c9aa38b9733f42a99e1997 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2765)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[GitHub] [flink] flinkbot edited a comment on pull request #12435: [FLINK-18059] [sql-client] Fix create/drop catalog statement can not be executed in sql client

2020-06-04 Thread GitBox


flinkbot edited a comment on pull request #12435:
URL: https://github.com/apache/flink/pull/12435#issuecomment-637372360


   
   ## CI report:
   
   * 84524977ee6597667093943fce7c1730ab934c35 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2732)
 
   * 2014fd80a4ecd8d3a43c30e64ed7506e571e0b19 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2772)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[GitHub] [flink] flinkbot edited a comment on pull request #12355: [FLINK-17893] [sql-client] SQL CLI should print the root cause if the statement is invalid

2020-06-04 Thread GitBox


flinkbot edited a comment on pull request #12355:
URL: https://github.com/apache/flink/pull/12355#issuecomment-634548726


   
   ## CI report:
   
   * e733198f517f74de94d68622a3a039d3f888acaa Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2631)
 
   * 35be5d6088100c8e99655061e13161b7de9b5173 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2771)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[GitHub] [flink] flinkbot edited a comment on pull request #12456: [FLINK-17113][sql-cli] Refactor view support in SQL Client

2020-06-04 Thread GitBox


flinkbot edited a comment on pull request #12456:
URL: https://github.com/apache/flink/pull/12456#issuecomment-638032908


   
   ## CI report:
   
   * fefdae96b41549bc9d2c75e126c3a06da6863cfb Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2764)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[jira] [Closed] (FLINK-15126) migrate "show functions" from sql cli to sql parser

2020-06-04 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-15126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen closed FLINK-15126.
--
Resolution: Fixed

> migrate "show functions" from sql cli to sql parser
> ---
>
> Key: FLINK-15126
> URL: https://issues.apache.org/jira/browse/FLINK-15126
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / Client, Table SQL / Planner
>Reporter: Bowen Li
>Assignee: Zhenqiu Huang
>Priority: Major
> Fix For: 1.11.0
>
>





