[GitHub] [flink] flinkbot edited a comment on pull request #12702: [FLINK-18325] [table] fix potential NPE when calling SqlDataTypeSpec#getNullable.

2020-06-25 Thread GitBox


flinkbot edited a comment on pull request #12702:
URL: https://github.com/apache/flink/pull/12702#issuecomment-645869341


   
   ## CI report:
   
   * d7e9bf3b59e41d5b8f70b8700733c32f851f75a3 Azure: 
[CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=3858)
 
   * 135f96e93369341df6be18a1e2df23c20c7434f3 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4042)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12702: [FLINK-18325] [table] fix potential NPE when calling SqlDataTypeSpec#getNullable.

2020-06-25 Thread GitBox


flinkbot edited a comment on pull request #12702:
URL: https://github.com/apache/flink/pull/12702#issuecomment-645869341


   
   ## CI report:
   
   * d7e9bf3b59e41d5b8f70b8700733c32f851f75a3 Azure: 
[CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=3858)
 
   * 135f96e93369341df6be18a1e2df23c20c7434f3 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] Aaaaaaron commented on pull request #12702: [FLINK-18325] [table] fix potential NPE when calling SqlDataTypeSpec#getNullable.

2020-06-25 Thread GitBox


Aaaaaaron commented on pull request #12702:
URL: https://github.com/apache/flink/pull/12702#issuecomment-649957192


   Hi, can someone review this PR?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-17949) KafkaShuffleITCase.testSerDeIngestionTime:156->testRecordSerDe:388 expected:<310> but was:<0>

2020-06-25 Thread Yuan Mei (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17145991#comment-17145991
 ] 

Yuan Mei commented on FLINK-17949:
--

Based on the failures reported in this Jira, the failure occurs in three 
tests: `testSerDeIngestionTime`, `testSerDeEventTime` and 
`testSerDeProcessingTime`; they all share the same code path through `testRecordSerDe`.

So I extracted these three tests (based on master 0c20f25fd5a4351) and re-ran 
them on my own Azure pipeline.

*I've run more than 3000 tests for about 11 hours in total without a single 
test failure on Azure.*

Here is a bit more detail:
 * 200 runs; with 3 tests mentioned above; finished in 3 hours; total 600 tests
[https://dev.azure.com/mymeiyuan/Flink/_build/results?buildId=39&view=results]
 * 270 runs; with 3 tests mentioned above; canceled after 4 hours by azure; 
total 810 tests
[https://dev.azure.com/mymeiyuan/Flink/_build/results?buildId=40&view=results]
 * 149 runs; with all the original tests in KafkaShuffleITCase; canceled after 
4 hours by azure; total 1639 tests
[https://dev.azure.com/mymeiyuan/Flink/_build/results?buildId=41&view=results]

Besides, this ticket seems not to have reported any new failure cases after 
6/10, so it is possible that the fix for [FLINK-17327] (Fix Kafka Producer 
Resource Leaks, #12589, merged around 6/10) also fixed this? Kafka versions 
were also bumped in [FLINK-17327].

I think this provides enough confidence to say that the KafkaShuffle test 
cases are stable now.

If the ticket does not report any new failures, I will close it.
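
For reference, here is a minimal sketch of the kind of repeated-run harness 
used for such stability checks (hypothetical code; the real test bodies live 
in {{KafkaShuffleITCase}}):

{code}
import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class RepeatedRunSketch {

    // Stand-in for one extracted test body such as testRecordSerDe.
    private void extractedTestBody() {
        assertEquals(310, 310);
    }

    // Repeats the suspected-flaky body many times within a single build,
    // mirroring the 200/270/149-run batches described above.
    @Test
    public void runManyTimes() {
        for (int run = 0; run < 200; run++) {
            extractedTestBody();
        }
    }
}
{code}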

> KafkaShuffleITCase.testSerDeIngestionTime:156->testRecordSerDe:388 
> expected:<310> but was:<0>
> -
>
> Key: FLINK-17949
> URL: https://issues.apache.org/jira/browse/FLINK-17949
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka, Tests
>Affects Versions: 1.11.0, 1.12.0
>Reporter: Robert Metzger
>Priority: Critical
>  Labels: test-stability
> Attachments: logs-ci-kafkagelly-1590500380.zip, 
> logs-ci-kafkagelly-1590524911.zip
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=2209&view=logs&j=c5f0071e-1851-543e-9a45-9ac140befc32&t=684b1416-4c17-504e-d5ab-97ee44e08a20
> {code}
> 2020-05-26T13:35:19.4022562Z [ERROR] 
> testSerDeIngestionTime(org.apache.flink.streaming.connectors.kafka.shuffle.KafkaShuffleITCase)
>   Time elapsed: 5.786 s  <<< FAILURE!
> 2020-05-26T13:35:19.4023185Z java.lang.AssertionError: expected:<310> but 
> was:<0>
> 2020-05-26T13:35:19.4023498Z  at org.junit.Assert.fail(Assert.java:88)
> 2020-05-26T13:35:19.4023825Z  at 
> org.junit.Assert.failNotEquals(Assert.java:834)
> 2020-05-26T13:35:19.4024461Z  at 
> org.junit.Assert.assertEquals(Assert.java:645)
> 2020-05-26T13:35:19.4024900Z  at 
> org.junit.Assert.assertEquals(Assert.java:631)
> 2020-05-26T13:35:19.4028546Z  at 
> org.apache.flink.streaming.connectors.kafka.shuffle.KafkaShuffleITCase.testRecordSerDe(KafkaShuffleITCase.java:388)
> 2020-05-26T13:35:19.4029629Z  at 
> org.apache.flink.streaming.connectors.kafka.shuffle.KafkaShuffleITCase.testSerDeIngestionTime(KafkaShuffleITCase.java:156)
> 2020-05-26T13:35:19.4030253Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2020-05-26T13:35:19.4030673Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2020-05-26T13:35:19.4031332Z  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2020-05-26T13:35:19.4031763Z  at 
> java.lang.reflect.Method.invoke(Method.java:498)
> 2020-05-26T13:35:19.4032155Z  at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> 2020-05-26T13:35:19.4032630Z  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 2020-05-26T13:35:19.4033188Z  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> 2020-05-26T13:35:19.4033638Z  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 2020-05-26T13:35:19.4034103Z  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> 2020-05-26T13:35:19.4034593Z  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
> 2020-05-26T13:35:19.4035118Z  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
> 2020-05-26T13:35:19.4035570Z  at 
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 2020-05-26T13:35:19.4035888Z  at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink-web] tweise commented on a change in pull request #352: Add 1.11 Release announcement.

2020-06-25 Thread GitBox


tweise commented on a change in pull request #352:
URL: https://github.com/apache/flink-web/pull/352#discussion_r445853688



##
File path: _posts/2020-06-29-release-1.11.0.md
##
@@ -0,0 +1,299 @@
+---
+layout: post 
+title:  "Apache Flink 1.11.0 Release Announcement" 
+date: 2020-06-29T08:00:00.000Z
+categories: news
+authors:
+- morsapaes:
+  name: "Marta Paes"
+  twitter: "morsapaes"
+
+excerpt: The Apache Flink community is proud to announce the release of Flink 
1.11.0! This release marks the first milestone in realizing a new vision for 
fault tolerance in Flink, and adds a handful of new features that simplify (and 
unify) Flink handling across the API stack. In particular for users of the 
Table API/SQL, this release introduces significant improvements to usability 
and opens up completely new use cases, including the much-anticipated support 
for Change Data Capture (CDC)! A great deal of effort has also gone into 
optimizing PyFlink and ensuring that its functionality is available to a 
broader set of Flink users.
+---
+
+The Apache Flink community is proud to announce the release of Flink 1.11.0! 
This release marks the first milestone in realizing a new vision for fault 
tolerance in Flink, and adds a handful of new features that simplify (and 
unify) Flink handling across the API stack. In particular for users of the 
Table API/SQL, this release introduces significant improvements to usability 
and opens up completely new use cases, including the much-anticipated support 
for Change Data Capture (CDC)! A great deal of effort has also gone into 
optimizing PyFlink and ensuring that its functionality is available to a 
broader set of Flink users.
+
+This blog post describes all major new features and improvements, important 
changes to be aware of and what to expect moving forward.
+
+{% toc %}
+
+The binary distribution and source artifacts are now available on the updated 
[Downloads page]({{ site.baseurl }}/downloads.html) of the Flink website, and 
the most recent distribution of PyFlink is available on 
[PyPI](https://pypi.org/project/apache-flink/). For more details, check the 
complete [release 
changelog](https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12346364&styleName=Html&projectId=12315522)
 and the [updated documentation]({{ site.DOCS_BASE_URL 
}}flink-docs-release-1.11/). 
+
+We encourage you to download the release and share your feedback with the 
community through the [Flink mailing 
lists](https://flink.apache.org/community.html#mailing-lists) or 
[JIRA](https://issues.apache.org/jira/projects/FLINK/summary).
+
+## New Features and Improvements
+
+### Unaligned Checkpoints (Beta)
+
+Triggering a checkpoint in Flink will cause a [checkpoint barrier]({{ 
site.DOCS_BASE_URL 
}}flink-docs-release-1.11/internals/stream_checkpointing.html#barriers) to flow 
from the sources of your topology all the way towards the sinks. For operators 
that receive more than one input stream, the barriers flowing through each 
channel need to be aligned before the operator can snapshot its state and 
forward the checkpoint barrier — typically, this alignment will take just a few 
milliseconds to complete, but it can become a bottleneck in backpressured 
pipelines as:
+
+ * Checkpoint barriers will flow much slower through backpressured channels, 
effectively blocking the remaining channels and their upstream operators during 
checkpointing;
+
+ * Slow checkpoint barrier propagation leads to longer checkpointing times and 
can, worst case, result in little to no progress in the application.
+
+To improve the performance of checkpointing under backpressure scenarios, the 
community is rolling out the first iteration of unaligned checkpoints 
([FLIP-76](https://cwiki.apache.org/confluence/display/FLINK/FLIP-76%3A+Unaligned+Checkpoints))
 with Flink 1.11. Compared to the original checkpointing mechanism (Fig. 1), 
this approach doesn’t wait for barrier alignment across input channels, instead 
allowing barriers to overtake in-flight records and forwarding them downstream 
before the synchronous part of the checkpoint takes place (Fig. 2).
+
+[Figure] Fig. 1: Aligned Checkpoints
+
+[Figure] Fig. 2: Unaligned Checkpoints
+
+Because in-flight records have to be persisted as part of the snapshot, 
unaligned checkpoints will lead to increased checkpoint sizes. On the upside, 
**checkpointing times are heavily reduced**, so users will see more progress 
(even in unstable environments) as more up-to-date checkpoints will lighten the 
recovery process. You can learn more about the current limitations of unaligned 
checkpoints in the [documentation]({{ site.DOCS_BASE_URL 
}}flink-docs-release-1.11/concepts/stateful-stream-processing.html#unaligned-checkpointing),
 and track the improvement 
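
Unaligned checkpoints are opt-in in 1.11. A minimal sketch of enabling them 
through the DataStream API's `CheckpointConfig` (assuming the 1.11 
configuration API):

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class UnalignedCheckpointsSketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Checkpoint every 60 seconds; exactly-once is the default mode,
        // which unaligned checkpoints require.
        env.enableCheckpointing(60_000);

        // Opt in to the beta unaligned checkpoints from FLIP-76.
        env.getCheckpointConfig().enableUnalignedCheckpoints();
    }
}
```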

[GitHub] [flink] flinkbot edited a comment on pull request #12770: [FLINK-18200][python] Replace the deprecated interfaces with the new interfaces in the tests and examples

2020-06-25 Thread GitBox


flinkbot edited a comment on pull request #12770:
URL: https://github.com/apache/flink/pull/12770#issuecomment-649764859


   
   ## CI report:
   
   * b2f1ab0d8197d7515678e8f74572d6ca0cddf5a4 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4038)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12770: [FLINK-18200][python] Replace the deprecated interfaces with the new interfaces in the tests and examples

2020-06-25 Thread GitBox


flinkbot edited a comment on pull request #12770:
URL: https://github.com/apache/flink/pull/12770#issuecomment-649764859


   
   ## CI report:
   
   * b2f1ab0d8197d7515678e8f74572d6ca0cddf5a4 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4038)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot commented on pull request #12770: [FLINK-18200][python] Replace the deprecated interfaces with the new interfaces in the tests and examples

2020-06-25 Thread GitBox


flinkbot commented on pull request #12770:
URL: https://github.com/apache/flink/pull/12770#issuecomment-649764859


   
   ## CI report:
   
   * b2f1ab0d8197d7515678e8f74572d6ca0cddf5a4 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12748: [FLINK-18324][docs-zh] Translate updated data type into Chinese

2020-06-25 Thread GitBox


flinkbot edited a comment on pull request #12748:
URL: https://github.com/apache/flink/pull/12748#issuecomment-647889022


   
   ## CI report:
   
   * 98a273694702b8878c6a6a337f4eb93c1ac2e5f4 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4035)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot commented on pull request #12770: [FLINK-18200][python] Replace the deprecated interfaces with the new interfaces in the tests and examples

2020-06-25 Thread GitBox


flinkbot commented on pull request #12770:
URL: https://github.com/apache/flink/pull/12770#issuecomment-649743424


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit b2f1ab0d8197d7515678e8f74572d6ca0cddf5a4 (Thu Jun 25 
18:22:35 UTC 2020)
   
   **Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-18200) Replace the deprecated interfaces with the new interfaces in the tests and examples

2020-06-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-18200:
---
Labels: pull-request-available  (was: )

> Replace the deprecated interfaces with the new interfaces in the tests and 
> examples
> ---
>
> Key: FLINK-18200
> URL: https://issues.apache.org/jira/browse/FLINK-18200
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python
>Reporter: Dian Fu
>Assignee: Nicholas Jiang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>
> Currently, there are a few deprecated interfaces which are still heavily used 
> in the tests and examples, e.g. register_function, etc. We should improve the 
> tests and examples to use the new interfaces, such as 
> create_temporary_system_function instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] SteNicholas opened a new pull request #12770: [FLINK-18200][python] Replace the deprecated interfaces with the new interfaces in the tests and examples

2020-06-25 Thread GitBox


SteNicholas opened a new pull request #12770:
URL: https://github.com/apache/flink/pull/12770


   ## What is the purpose of the change
   
   *Currently, a few deprecated interfaces, e.g. register_function, are still 
heavily used in the tests and examples. These should be replaced so that the 
tests and examples use the new interfaces, such as 
create_temporary_system_function.*
   
   ## Brief change log
   
 - *Modify tests and examples that call the methods `insert_into`, 
`scan`, `sql_update`, `register_function`, and `register_java_function` of 
`TableEnvironment`.*
 - *Modify tests and examples that call the method `insert_into` of 
`Table`.*
 - *Deprecate the methods `from_table_source`, `_from_elements`, and `connect` of 
`TableEnvironment`.* (A sketch of the old-vs-new registration API follows this list.)
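
   As a sketch of the migration, the Java Table API counterparts of these 
Python methods look as follows (the `ToUpper` function is illustrative only):

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.functions.ScalarFunction;

public class ApiMigrationSketch {

    // Illustrative UDF, not part of the PR.
    public static class ToUpper extends ScalarFunction {
        public String eval(String s) {
            return s == null ? null : s.toUpperCase();
        }
    }

    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.newInstance().build());

        // Deprecated registration (what the tests and examples move away from):
        // tEnv.registerFunction("to_upper", new ToUpper());

        // Replacement registration used going forward:
        tEnv.createTemporarySystemFunction("to_upper", ToUpper.class);
    }
}
```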
   
   ## Verifying this change
   
 - *Modify unit tests that call the methods `insert_into`, `scan`, 
`sql_update`, `register_function`, and `register_java_function` of 
`TableEnvironment`.*
 - *Modify unit tests that call the method `insert_into` of `Table`.*
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / **no**)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / **no**)
 - The serializers: (yes / **no** / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / **no** 
/ don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (yes / **no** / 
don't know)
 - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / **no**)
 - If yes, how is the feature documented? (**not applicable** / docs / 
JavaDocs / not documented)
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Comment Edited] (FLINK-18433) From the end-to-end performance test results, 1.11 has a regression

2020-06-25 Thread Stephan Ewen (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17145692#comment-17145692
 ] 

Stephan Ewen edited comment on FLINK-18433 at 6/25/20, 5:51 PM:


I would not dismiss the runtime code, yet.

The fact that it is both in MemStateBackend and RocksDBStateBackend would be 
quite well explained with more frequent stalls, for example during buffer 
allocation.

If there are fewer buffers by default, or a different buffer 
allocation/acquisition code path now, I'd consider that a strong suspect.


was (Author: stephanewen):
I would not dismiss the runtime code so far.

The fact that it is both in MemStateBackend and RocksDBStateBackend would be 
quite well explained with more frequent stalls, for example during buffer 
allocation.

If there are fewer buffers by default, or a different buffer 
allocation/acquisition code path now, I'd consider that a strong suspect.

> From the end-to-end performance test results, 1.11 has a regression
> ---
>
> Key: FLINK-18433
> URL: https://issues.apache.org/jira/browse/FLINK-18433
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core, API / DataStream
>Affects Versions: 1.11.0
> Environment: 3 machines
> [|https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations_1.11/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
>Reporter: Aihua Li
>Priority: Major
>
>  
> I ran end-to-end performance tests between the Release-1.10 and Release-1.11. 
> the results were as follows:
> |scenarioName|release-1.10|release-1.11| |
> |OneInput_Broadcast_LazyFromSource_ExactlyOnce_10_rocksdb|46.175|43.8133|-5.11%|
> |OneInput_Rescale_LazyFromSource_ExactlyOnce_100_heap|211.835|200.355|-5.42%|
> |OneInput_Rebalance_LazyFromSource_ExactlyOnce_1024_rocksdb|1721.041667|1618.32|-5.97%|
> |OneInput_KeyBy_LazyFromSource_ExactlyOnce_10_heap|46|43.615|-5.18%|
> |OneInput_Broadcast_Eager_ExactlyOnce_100_rocksdb|212.105|199.688|-5.85%|
> |OneInput_Rescale_Eager_ExactlyOnce_1024_heap|1754.64|1600.12|-8.81%|
> |OneInput_Rebalance_Eager_ExactlyOnce_10_rocksdb|45.9167|43.0983|-6.14%|
> |OneInput_KeyBy_Eager_ExactlyOnce_100_heap|212.0816667|200.727|-5.35%|
> |OneInput_Broadcast_LazyFromSource_AtLeastOnce_1024_rocksdb|1718.245|1614.381667|-6.04%|
> |OneInput_Rescale_LazyFromSource_AtLeastOnce_10_heap|46.12|43.5517|-5.57%|
> |OneInput_Rebalance_LazyFromSource_AtLeastOnce_100_rocksdb|212.038|200.388|-5.49%|
> |OneInput_KeyBy_LazyFromSource_AtLeastOnce_1024_heap|1762.048333|1606.408333|-8.83%|
> |OneInput_Broadcast_Eager_AtLeastOnce_10_rocksdb|46.0583|43.4967|-5.56%|
> |OneInput_Rescale_Eager_AtLeastOnce_100_heap|212.233|201.188|-5.20%|
> |OneInput_Rebalance_Eager_AtLeastOnce_1024_rocksdb|1720.66|1616.85|-6.03%|
> |OneInput_KeyBy_Eager_AtLeastOnce_10_heap|46.14|43.6233|-5.45%|
> |TwoInputs_Broadcast_LazyFromSource_ExactlyOnce_100_rocksdb|156.918|152.957|-2.52%|
> |TwoInputs_Rescale_LazyFromSource_ExactlyOnce_1024_heap|1415.511667|1300.1|-8.15%|
> |TwoInputs_Rebalance_LazyFromSource_ExactlyOnce_10_rocksdb|34.2967|34.1667|-0.38%|
> |TwoInputs_KeyBy_LazyFromSource_ExactlyOnce_100_heap|158.353|151.848|-4.11%|
> |TwoInputs_Broadcast_Eager_ExactlyOnce_1024_rocksdb|1373.406667|1300.056667|-5.34%|
> |TwoInputs_Rescale_Eager_ExactlyOnce_10_heap|34.5717|32.0967|-7.16%|
> |TwoInputs_Rebalance_Eager_ExactlyOnce_100_rocksdb|158.655|147.44|-7.07%|
> |TwoInputs_KeyBy_Eager_ExactlyOnce_1024_heap|1356.611667|1292.386667|-4.73%|
> |TwoInputs_Broadcast_LazyFromSource_AtLeastOnce_10_rocksdb|34.01|33.205|-2.37%|
> |TwoInputs_Rescale_LazyFromSource_AtLeastOnce_100_heap|149.588|145.997|-2.40%|
> |TwoInputs_Rebalance_LazyFromSource_AtLeastOnce_1024_rocksdb|1359.74|1299.156667|-4.46%|
> |TwoInputs_KeyBy_LazyFromSource_AtLeastOnce_10_heap|34.025|29.6833|-12.76%|
> |TwoInputs_Broadcast_Eager_AtLeastOnce_100_rocksdb|157.303|151.4616667|-3.71%|
> |TwoInputs_Rescale_Eager_AtLeastOnce_1024_heap|1368.74|1293.238333|-5.52%|
> |TwoInputs_Rebalance_Eager_AtLeastOnce_10_rocksdb|34.325|33.285|-3.03%|
> |TwoInputs_KeyBy_Eager_AtLeastOnce_100_heap|162.5116667|134.375|-17.31%|
> It can be seen that the performance of 1.11 has a regression of roughly 5% in 
> most scenarios, with a maximum regression of 17%. This needs to be checked.
> The test code:
> flink-1.10.0: 
> [https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
> flink-1.11.0: 
> 

[jira] [Comment Edited] (FLINK-18433) From the end-to-end performance test results, 1.11 has a regression

2020-06-25 Thread Stephan Ewen (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17145692#comment-17145692
 ] 

Stephan Ewen edited comment on FLINK-18433 at 6/25/20, 5:51 PM:


I would not dismiss the runtime code, yet.

The fact that it is both in MemStateBackend and RocksDBStateBackend would be 
quite well explained with more frequent stalls, for example during buffer 
allocation.

If there are fewer buffers by default, or a different buffer 
allocation/acquisition code path now, that would be a plausible cause.


was (Author: stephanewen):
I would not dismiss the runtime code, yet.

The fact that it is both in MemStateBackend and RocksDBStateBackend would be 
quite well explained with more frequent stalls, for example during buffer 
allocation.

If there are fewer buffers by default, or a different buffer 
allocation/acquisition code path now, I'd consider that a strong suspect.

> From the end-to-end performance test results, 1.11 has a regression
> ---
>
> Key: FLINK-18433
> URL: https://issues.apache.org/jira/browse/FLINK-18433
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core, API / DataStream
>Affects Versions: 1.11.0
> Environment: 3 machines
> [|https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations_1.11/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
>Reporter: Aihua Li
>Priority: Major
>
>  
> I ran end-to-end performance tests between the Release-1.10 and Release-1.11. 
> the results were as follows:
> |scenarioName|release-1.10|release-1.11| |
> |OneInput_Broadcast_LazyFromSource_ExactlyOnce_10_rocksdb|46.175|43.8133|-5.11%|
> |OneInput_Rescale_LazyFromSource_ExactlyOnce_100_heap|211.835|200.355|-5.42%|
> |OneInput_Rebalance_LazyFromSource_ExactlyOnce_1024_rocksdb|1721.041667|1618.32|-5.97%|
> |OneInput_KeyBy_LazyFromSource_ExactlyOnce_10_heap|46|43.615|-5.18%|
> |OneInput_Broadcast_Eager_ExactlyOnce_100_rocksdb|212.105|199.688|-5.85%|
> |OneInput_Rescale_Eager_ExactlyOnce_1024_heap|1754.64|1600.12|-8.81%|
> |OneInput_Rebalance_Eager_ExactlyOnce_10_rocksdb|45.9167|43.0983|-6.14%|
> |OneInput_KeyBy_Eager_ExactlyOnce_100_heap|212.0816667|200.727|-5.35%|
> |OneInput_Broadcast_LazyFromSource_AtLeastOnce_1024_rocksdb|1718.245|1614.381667|-6.04%|
> |OneInput_Rescale_LazyFromSource_AtLeastOnce_10_heap|46.12|43.5517|-5.57%|
> |OneInput_Rebalance_LazyFromSource_AtLeastOnce_100_rocksdb|212.038|200.388|-5.49%|
> |OneInput_KeyBy_LazyFromSource_AtLeastOnce_1024_heap|1762.048333|1606.408333|-8.83%|
> |OneInput_Broadcast_Eager_AtLeastOnce_10_rocksdb|46.0583|43.4967|-5.56%|
> |OneInput_Rescale_Eager_AtLeastOnce_100_heap|212.233|201.188|-5.20%|
> |OneInput_Rebalance_Eager_AtLeastOnce_1024_rocksdb|1720.66|1616.85|-6.03%|
> |OneInput_KeyBy_Eager_AtLeastOnce_10_heap|46.14|43.6233|-5.45%|
> |TwoInputs_Broadcast_LazyFromSource_ExactlyOnce_100_rocksdb|156.918|152.957|-2.52%|
> |TwoInputs_Rescale_LazyFromSource_ExactlyOnce_1024_heap|1415.511667|1300.1|-8.15%|
> |TwoInputs_Rebalance_LazyFromSource_ExactlyOnce_10_rocksdb|34.2967|34.1667|-0.38%|
> |TwoInputs_KeyBy_LazyFromSource_ExactlyOnce_100_heap|158.353|151.848|-4.11%|
> |TwoInputs_Broadcast_Eager_ExactlyOnce_1024_rocksdb|1373.406667|1300.056667|-5.34%|
> |TwoInputs_Rescale_Eager_ExactlyOnce_10_heap|34.5717|32.0967|-7.16%|
> |TwoInputs_Rebalance_Eager_ExactlyOnce_100_rocksdb|158.655|147.44|-7.07%|
> |TwoInputs_KeyBy_Eager_ExactlyOnce_1024_heap|1356.611667|1292.386667|-4.73%|
> |TwoInputs_Broadcast_LazyFromSource_AtLeastOnce_10_rocksdb|34.01|33.205|-2.37%|
> |TwoInputs_Rescale_LazyFromSource_AtLeastOnce_100_heap|149.588|145.997|-2.40%|
> |TwoInputs_Rebalance_LazyFromSource_AtLeastOnce_1024_rocksdb|1359.74|1299.156667|-4.46%|
> |TwoInputs_KeyBy_LazyFromSource_AtLeastOnce_10_heap|34.025|29.6833|-12.76%|
> |TwoInputs_Broadcast_Eager_AtLeastOnce_100_rocksdb|157.303|151.4616667|-3.71%|
> |TwoInputs_Rescale_Eager_AtLeastOnce_1024_heap|1368.74|1293.238333|-5.52%|
> |TwoInputs_Rebalance_Eager_AtLeastOnce_10_rocksdb|34.325|33.285|-3.03%|
> |TwoInputs_KeyBy_Eager_AtLeastOnce_100_heap|162.5116667|134.375|-17.31%|
> It can be seen that the performance of 1.11 has a regression of roughly 5% in 
> most scenarios, with a maximum regression of 17%. This needs to be checked.
> The test code:
> flink-1.10.0: 
> [https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
> flink-1.11.0: 
> 

[jira] [Commented] (FLINK-18433) From the end-to-end performance test results, 1.11 has a regression

2020-06-25 Thread Stephan Ewen (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17145698#comment-17145698
 ] 

Stephan Ewen commented on FLINK-18433:
--

Other sources of stalls (like GC) could of course contribute as well.
I don't recall us changing anything in the memory configuration.
If we have more object allocation on the per-record (possibly per-buffer) code 
paths, that would also be an explanation.
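
As an illustration of that last point (a generic sketch, not Flink code): an 
allocation on the per-record path produces garbage proportional to the record 
rate, which raises GC frequency and causes exactly this kind of stall.

{code}
public final class PerRecordAllocationSketch {

    static final class Wrapper {
        final long value;
        Wrapper(long value) { this.value = value; }
    }

    // One allocation per record: garbage grows with the record rate.
    // (The JIT may eliminate a trivial allocation like this via escape
    // analysis; real per-record allocations in a pipeline are less removable.)
    static long sumWrapped(long[] records) {
        long sum = 0;
        for (long r : records) {
            sum += new Wrapper(r).value;
        }
        return sum;
    }

    // Allocation-free per-record path.
    static long sumPrimitive(long[] records) {
        long sum = 0;
        for (long r : records) {
            sum += r;
        }
        return sum;
    }
}
{code}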

> From the end-to-end performance test results, 1.11 has a regression
> ---
>
> Key: FLINK-18433
> URL: https://issues.apache.org/jira/browse/FLINK-18433
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core, API / DataStream
>Affects Versions: 1.11.0
> Environment: 3 machines
> [|https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations_1.11/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
>Reporter: Aihua Li
>Priority: Major
>
>  
> I ran end-to-end performance tests between the Release-1.10 and Release-1.11. 
> the results were as follows:
> |scenarioName|release-1.10|release-1.11| |
> |OneInput_Broadcast_LazyFromSource_ExactlyOnce_10_rocksdb|46.175|43.8133|-5.11%|
> |OneInput_Rescale_LazyFromSource_ExactlyOnce_100_heap|211.835|200.355|-5.42%|
> |OneInput_Rebalance_LazyFromSource_ExactlyOnce_1024_rocksdb|1721.041667|1618.32|-5.97%|
> |OneInput_KeyBy_LazyFromSource_ExactlyOnce_10_heap|46|43.615|-5.18%|
> |OneInput_Broadcast_Eager_ExactlyOnce_100_rocksdb|212.105|199.688|-5.85%|
> |OneInput_Rescale_Eager_ExactlyOnce_1024_heap|1754.64|1600.12|-8.81%|
> |OneInput_Rebalance_Eager_ExactlyOnce_10_rocksdb|45.9167|43.0983|-6.14%|
> |OneInput_KeyBy_Eager_ExactlyOnce_100_heap|212.0816667|200.727|-5.35%|
> |OneInput_Broadcast_LazyFromSource_AtLeastOnce_1024_rocksdb|1718.245|1614.381667|-6.04%|
> |OneInput_Rescale_LazyFromSource_AtLeastOnce_10_heap|46.12|43.5517|-5.57%|
> |OneInput_Rebalance_LazyFromSource_AtLeastOnce_100_rocksdb|212.038|200.388|-5.49%|
> |OneInput_KeyBy_LazyFromSource_AtLeastOnce_1024_heap|1762.048333|1606.408333|-8.83%|
> |OneInput_Broadcast_Eager_AtLeastOnce_10_rocksdb|46.0583|43.4967|-5.56%|
> |OneInput_Rescale_Eager_AtLeastOnce_100_heap|212.233|201.188|-5.20%|
> |OneInput_Rebalance_Eager_AtLeastOnce_1024_rocksdb|1720.66|1616.85|-6.03%|
> |OneInput_KeyBy_Eager_AtLeastOnce_10_heap|46.14|43.6233|-5.45%|
> |TwoInputs_Broadcast_LazyFromSource_ExactlyOnce_100_rocksdb|156.918|152.957|-2.52%|
> |TwoInputs_Rescale_LazyFromSource_ExactlyOnce_1024_heap|1415.511667|1300.1|-8.15%|
> |TwoInputs_Rebalance_LazyFromSource_ExactlyOnce_10_rocksdb|34.2967|34.1667|-0.38%|
> |TwoInputs_KeyBy_LazyFromSource_ExactlyOnce_100_heap|158.353|151.848|-4.11%|
> |TwoInputs_Broadcast_Eager_ExactlyOnce_1024_rocksdb|1373.406667|1300.056667|-5.34%|
> |TwoInputs_Rescale_Eager_ExactlyOnce_10_heap|34.5717|32.0967|-7.16%|
> |TwoInputs_Rebalance_Eager_ExactlyOnce_100_rocksdb|158.655|147.44|-7.07%|
> |TwoInputs_KeyBy_Eager_ExactlyOnce_1024_heap|1356.611667|1292.386667|-4.73%|
> |TwoInputs_Broadcast_LazyFromSource_AtLeastOnce_10_rocksdb|34.01|33.205|-2.37%|
> |TwoInputs_Rescale_LazyFromSource_AtLeastOnce_100_heap|149.588|145.997|-2.40%|
> |TwoInputs_Rebalance_LazyFromSource_AtLeastOnce_1024_rocksdb|1359.74|1299.156667|-4.46%|
> |TwoInputs_KeyBy_LazyFromSource_AtLeastOnce_10_heap|34.025|29.6833|-12.76%|
> |TwoInputs_Broadcast_Eager_AtLeastOnce_100_rocksdb|157.303|151.4616667|-3.71%|
> |TwoInputs_Rescale_Eager_AtLeastOnce_1024_heap|1368.74|1293.238333|-5.52%|
> |TwoInputs_Rebalance_Eager_AtLeastOnce_10_rocksdb|34.325|33.285|-3.03%|
> |TwoInputs_KeyBy_Eager_AtLeastOnce_100_heap|162.5116667|134.375|-17.31%|
> It can be seen that the performance of 1.11 has a regression of roughly 5% in 
> most scenarios, with a maximum regression of 17%. This needs to be checked.
> The test code:
> flink-1.10.0: 
> [https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
> flink-1.11.0: 
> [https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations_1.11/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
> submit cmd like this:
> bin/flink run -d -m 192.168.39.246:8081 -c 
> org.apache.flink.basic.operations.PerformanceTestJob 
> /home/admin/flink-basic-operations_2.11-1.10-SNAPSHOT.jar --topologyName 
> OneInput --LogicalAttributesofEdges Broadcast --ScheduleMode LazyFromSource 
> --CheckpointMode ExactlyOnce --recordSize 10 --stateBackend rocksdb
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-18433) From the end-to-end performance test results, 1.11 has a regression

2020-06-25 Thread Stephan Ewen (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17145692#comment-17145692
 ] 

Stephan Ewen commented on FLINK-18433:
--

I would not dismiss the runtime code so far.

The fact that it is both in MemStateBackend and RocksDBStateBackend would be 
quite well explained with more frequent stalls, for example during buffer 
allocation.

If there are fewer buffers by default, or a different buffer 
allocation/acquisition code path now, I'd consider that a strong suspect.

> From the end-to-end performance test results, 1.11 has a regression
> ---
>
> Key: FLINK-18433
> URL: https://issues.apache.org/jira/browse/FLINK-18433
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core, API / DataStream
>Affects Versions: 1.11.0
> Environment: 3 machines
> [|https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations_1.11/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
>Reporter: Aihua Li
>Priority: Major
>
>  
> I ran end-to-end performance tests between the Release-1.10 and Release-1.11. 
> the results were as follows:
> |scenarioName|release-1.10|release-1.11| |
> |OneInput_Broadcast_LazyFromSource_ExactlyOnce_10_rocksdb|46.175|43.8133|-5.11%|
> |OneInput_Rescale_LazyFromSource_ExactlyOnce_100_heap|211.835|200.355|-5.42%|
> |OneInput_Rebalance_LazyFromSource_ExactlyOnce_1024_rocksdb|1721.041667|1618.32|-5.97%|
> |OneInput_KeyBy_LazyFromSource_ExactlyOnce_10_heap|46|43.615|-5.18%|
> |OneInput_Broadcast_Eager_ExactlyOnce_100_rocksdb|212.105|199.688|-5.85%|
> |OneInput_Rescale_Eager_ExactlyOnce_1024_heap|1754.64|1600.12|-8.81%|
> |OneInput_Rebalance_Eager_ExactlyOnce_10_rocksdb|45.9167|43.0983|-6.14%|
> |OneInput_KeyBy_Eager_ExactlyOnce_100_heap|212.0816667|200.727|-5.35%|
> |OneInput_Broadcast_LazyFromSource_AtLeastOnce_1024_rocksdb|1718.245|1614.381667|-6.04%|
> |OneInput_Rescale_LazyFromSource_AtLeastOnce_10_heap|46.12|43.5517|-5.57%|
> |OneInput_Rebalance_LazyFromSource_AtLeastOnce_100_rocksdb|212.038|200.388|-5.49%|
> |OneInput_KeyBy_LazyFromSource_AtLeastOnce_1024_heap|1762.048333|1606.408333|-8.83%|
> |OneInput_Broadcast_Eager_AtLeastOnce_10_rocksdb|46.0583|43.4967|-5.56%|
> |OneInput_Rescale_Eager_AtLeastOnce_100_heap|212.233|201.188|-5.20%|
> |OneInput_Rebalance_Eager_AtLeastOnce_1024_rocksdb|1720.66|1616.85|-6.03%|
> |OneInput_KeyBy_Eager_AtLeastOnce_10_heap|46.14|43.6233|-5.45%|
> |TwoInputs_Broadcast_LazyFromSource_ExactlyOnce_100_rocksdb|156.918|152.957|-2.52%|
> |TwoInputs_Rescale_LazyFromSource_ExactlyOnce_1024_heap|1415.511667|1300.1|-8.15%|
> |TwoInputs_Rebalance_LazyFromSource_ExactlyOnce_10_rocksdb|34.2967|34.1667|-0.38%|
> |TwoInputs_KeyBy_LazyFromSource_ExactlyOnce_100_heap|158.353|151.848|-4.11%|
> |TwoInputs_Broadcast_Eager_ExactlyOnce_1024_rocksdb|1373.406667|1300.056667|-5.34%|
> |TwoInputs_Rescale_Eager_ExactlyOnce_10_heap|34.5717|32.0967|-7.16%|
> |TwoInputs_Rebalance_Eager_ExactlyOnce_100_rocksdb|158.655|147.44|-7.07%|
> |TwoInputs_KeyBy_Eager_ExactlyOnce_1024_heap|1356.611667|1292.386667|-4.73%|
> |TwoInputs_Broadcast_LazyFromSource_AtLeastOnce_10_rocksdb|34.01|33.205|-2.37%|
> |TwoInputs_Rescale_LazyFromSource_AtLeastOnce_100_heap|149.588|145.997|-2.40%|
> |TwoInputs_Rebalance_LazyFromSource_AtLeastOnce_1024_rocksdb|1359.74|1299.156667|-4.46%|
> |TwoInputs_KeyBy_LazyFromSource_AtLeastOnce_10_heap|34.025|29.6833|-12.76%|
> |TwoInputs_Broadcast_Eager_AtLeastOnce_100_rocksdb|157.303|151.4616667|-3.71%|
> |TwoInputs_Rescale_Eager_AtLeastOnce_1024_heap|1368.74|1293.238333|-5.52%|
> |TwoInputs_Rebalance_Eager_AtLeastOnce_10_rocksdb|34.325|33.285|-3.03%|
> |TwoInputs_KeyBy_Eager_AtLeastOnce_100_heap|162.5116667|134.375|-17.31%|
> It can be seen that the performance of 1.11 has a regression of roughly 5% in 
> most scenarios, with a maximum regression of 17%. This needs to be checked.
> The test code:
> flink-1.10.0: 
> [https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
> flink-1.11.0: 
> [https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations_1.11/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
> submit cmd like this:
> bin/flink run -d -m 192.168.39.246:8081 -c 
> org.apache.flink.basic.operations.PerformanceTestJob 
> /home/admin/flink-basic-operations_2.11-1.10-SNAPSHOT.jar --topologyName 
> OneInput --LogicalAttributesofEdges Broadcast --ScheduleMode LazyFromSource 
> --CheckpointMode ExactlyOnce 

[GitHub] [flink] flinkbot edited a comment on pull request #11245: [FLINK-15794][Kubernetes] Generate the Kubernetes default image version

2020-06-25 Thread GitBox


flinkbot edited a comment on pull request #11245:
URL: https://github.com/apache/flink/pull/11245#issuecomment-592002512


   
   ## CI report:
   
   * f1d4f9163a2cd60daa02fe6b7af5a459cee9660c UNKNOWN
   * 4d11da053e608d9d24e70c001fbb6b1469a392d7 UNKNOWN
   * 265d6eb7970325c88d2b3a9c77fa308be89641a9 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4037)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-18422) Update Prefer tag in documentation 'Fault Tolerance training lesson'

2020-06-25 Thread David Anderson (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17145131#comment-17145131
 ] 

David Anderson commented on FLINK-18422:


[~RocMarshal] Nevermind my previous reply; I was mistaken. Assigning this to 
you.

> Update Prefer tag in documentation 'Fault Tolerance training lesson'
> 
>
> Key: FLINK-18422
> URL: https://issues.apache.org/jira/browse/FLINK-18422
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation, Documentation / Training
>Affects Versions: 1.10.0, 1.10.1
>Reporter: Roc Marshal
>Priority: Minor
>  Labels: document, easyfix
> Attachments: current_prefer_mode.png
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Update Prefer tag in documentation 'Fault Tolerance training lesson' 
> according to 
> [Prefer|http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Reminder-Prefer-link-tag-in-documentation-td42362.html].
>   
> The location is: docs/learn-flink/fault_tolerance.md



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (FLINK-18422) Update Prefer tag in documentation 'Fault Tolerance training lesson'

2020-06-25 Thread David Anderson (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Anderson reassigned FLINK-18422:
--

Assignee: Roc Marshal

> Update Prefer tag in documentation 'Fault Tolerance training lesson'
> 
>
> Key: FLINK-18422
> URL: https://issues.apache.org/jira/browse/FLINK-18422
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation, Documentation / Training
>Affects Versions: 1.10.0, 1.10.1
>Reporter: Roc Marshal
>Assignee: Roc Marshal
>Priority: Minor
>  Labels: document, easyfix
> Attachments: current_prefer_mode.png
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Update Prefer tag in documentation 'Fault Tolerance training lesson' 
> according to 
> [Prefer|http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Reminder-Prefer-link-tag-in-documentation-td42362.html].
>   
> The location is: docs/learn-flink/fault_tolerance.md



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Issue Comment Deleted] (FLINK-18422) Update Prefer tag in documentation 'Fault Tolerance training lesson'

2020-06-25 Thread David Anderson (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Anderson updated FLINK-18422:
---
Comment: was deleted

(was: [~RocMarshal] The link tag does not apply to the cases you have 
highlighted. It's only used for creating links, not for embedded images.)

> Update Prefer tag in documentation 'Fault Tolerance training lesson'
> 
>
> Key: FLINK-18422
> URL: https://issues.apache.org/jira/browse/FLINK-18422
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation, Documentation / Training
>Affects Versions: 1.10.0, 1.10.1
>Reporter: Roc Marshal
>Priority: Minor
>  Labels: document, easyfix
> Attachments: current_prefer_mode.png
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Update Prefer tag in documentation 'Fault Tolerance training lesson' 
> according to 
> [Prefer|http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Reminder-Prefer-link-tag-in-documentation-td42362.html].
>   
> The location is: docs/learn-flink/fault_tolerance.md



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-18422) Update Prefer tag in documentation 'Fault Tolerance training lesson'

2020-06-25 Thread David Anderson (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17145127#comment-17145127
 ] 

David Anderson commented on FLINK-18422:


[~RocMarshal] The link tag does not apply to the cases you have highlighted. 
It's only used for creating links, not for embedded images.

> Update Prefer tag in documentation 'Fault Tolerance training lesson'
> 
>
> Key: FLINK-18422
> URL: https://issues.apache.org/jira/browse/FLINK-18422
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation, Documentation / Training
>Affects Versions: 1.10.0, 1.10.1
>Reporter: Roc Marshal
>Priority: Minor
>  Labels: document, easyfix
> Attachments: current_prefer_mode.png
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Update Prefer tag in documentation 'Fault Tolerance training lesson' 
> according to 
> [Prefer|http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Reminder-Prefer-link-tag-in-documentation-td42362.html].
>   
> The location is: docs/learn-flink/fault_tolerance.md



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #12748: [FLINK-18324][docs-zh] Translate updated data type into Chinese

2020-06-25 Thread GitBox


flinkbot edited a comment on pull request #12748:
URL: https://github.com/apache/flink/pull/12748#issuecomment-647889022


   
   ## CI report:
   
   * 719a106bdb3e4aeced30673f43d7a2a0f90b2535 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4034)
 
   * 98a273694702b8878c6a6a337f4eb93c1ac2e5f4 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4035)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #11245: [FLINK-15794][Kubernetes] Generate the Kubernetes default image version

2020-06-25 Thread GitBox


flinkbot edited a comment on pull request #11245:
URL: https://github.com/apache/flink/pull/11245#issuecomment-592002512


   
   ## CI report:
   
   * f1d4f9163a2cd60daa02fe6b7af5a459cee9660c UNKNOWN
   * 4d11da053e608d9d24e70c001fbb6b1469a392d7 UNKNOWN
   * a4c5e8c756f9380734727742390ce88d7bc47372 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4036)
 
   * 265d6eb7970325c88d2b3a9c77fa308be89641a9 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4037)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-17075) Add task status reconciliation between TM and JM

2020-06-25 Thread Chesnay Schepler (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17145062#comment-17145062
 ] 

Chesnay Schepler commented on FLINK-17075:
--

[~trohrmann] yes, this was just a mistake while copying; 
{{ExecutionIdsProvider#getExecutions}} accepts a ResourceID, as does 
{{ExecutionDeploymentTracker#startTracking}}.
I've amended my comment accordingly.

> Add task status reconciliation between TM and JM
> 
>
> Key: FLINK-17075
> URL: https://issues.apache.org/jira/browse/FLINK-17075
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.10.0, 1.11.0
>Reporter: Till Rohrmann
>Assignee: Chesnay Schepler
>Priority: Critical
> Fix For: 1.10.2, 1.12.0, 1.11.1
>
>
> In order to harden the TM and JM communication I suggest to let the 
> {{TaskExecutor}} send the task statuses back to the {{JobMaster}} as part of 
> the heartbeat payload (similar to FLINK-11059). This would allow to reconcile 
> the states of both components in case that a status update message was lost 
> as described by a user on the ML.
> https://lists.apache.org/thread.html/ra9ed70866381f0ef0f4779633346722ccab3dc0d6dbacce04080b74e%40%3Cuser.flink.apache.org%3E



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-17075) Add task status reconciliation between TM and JM

2020-06-25 Thread Chesnay Schepler (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17143818#comment-17143818
 ] 

Chesnay Schepler edited comment on FLINK-17075 at 6/25/20, 4:14 PM:


h3. Problem description:

The {{JobMaster}} keeps track of the state of all {{Executions}} belonging to a 
job. After deployment, this tracking relies on updates from the 
{{TaskExecutors}}, transmitted via dedicated RPC messages.
If one such message is lost then the tracked state may no longer match the 
actual one. In the worst case this prevents an {{Execution}} from ever reaching 
a terminal state, which in turn prevents the job from terminating.

h3. Proposed solution:

To prevent the worst case from happening, we propose that the {{TaskExecutor}} 
also submits a report of all currently deployed {{Tasks}} (identified by the 
{{ExecutionAttemptID}}) with each heartbeat. This allows us to detect 
discrepancies between the set of executions of the {{JobManager}} and 
{{TaskExecutor}}, and act accordingly.

In the end this boils down to a comparison of two {{Set<ExecutionAttemptID>}} instances.

If an execution exists only in the {{JobMaster}} set, then the execution was 
dropped by the {{TaskExecutor}}.
This could imply a loss of a terminal state transition.
We cannot determine which terminal state the task has reached, since all 
information was already cleaned up.
In this case we will fail the execution in the {{ExecutionGraph}}, typically 
resulting in a restart.

If an execution exists only in the {{TaskExecutor}} set, then some leftover 
task from a previous attempt is still running on the {{TaskExecutor}}.
In this case we will cancel the task on the {{TaskExecutor}}. Running jobs are 
unaffected.

If an execution exists in both sets, then we don't do anything.
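
Expressed as code, the comparison is just two set differences (a sketch with 
hypothetical names, not the actual implementation):

{code}
import java.util.HashSet;
import java.util.Set;

final class ReconciliationSketch {
    static <ID> void reconcile(Set<ID> knownToJobMaster, Set<ID> reportedByTaskExecutor) {
        Set<ID> onlyOnJobMaster = new HashSet<>(knownToJobMaster);
        onlyOnJobMaster.removeAll(reportedByTaskExecutor);
        // -> fail these executions in the ExecutionGraph (terminal transition was lost)

        Set<ID> onlyOnTaskExecutor = new HashSet<>(reportedByTaskExecutor);
        onlyOnTaskExecutor.removeAll(knownToJobMaster);
        // -> cancel these tasks on the TaskExecutor (leftovers from a previous attempt)

        // Executions present in both sets need no action.
    }
}
{code}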

h4. Required changes:

{{TaskExecutor}}
--

The existing {{TaskSlotTable}} supports iterating over all {{Tasks}} for a 
given {{JobID}}, allowing us to extract the {{ExecutionAttemptID}}.
From this we generate a {{Set<ExecutionAttemptID>}}, and submit it via 
heartbeats.


{{JobMaster}}
--

Here we need to be able to:
a) (un)track actually deployed {{Executions}}
b) cancel tasks on the {{TaskExecutor}}
c) fail tasks in the {{ExecutionGraph}}

These capabilities are split across 2 new components:
1) ExecutionDeploymentTracker
2) ExecutionDeploymentReconciler

1) The tracker lives in the Scheduler, with the following interface:

{code}
public interface ExecutionDeploymentTracker {
    void startTrackingDeployment(ExecutionAttemptID deployment, ResourceID host);

    void stopTrackingDeployment(ExecutionAttemptID deployment);

    Set<ExecutionAttemptID> getExecutions(ResourceID host);
}
{code}

It's basically a {{Set}} of deployed executions per host.

The tracker is notified by the {{ExecutionGraph}} of deployed/finished 
executions through 2 new listeners:

{code}
public interface ExecutionDeploymentListener {
    void onCompletedDeployment(ExecutionAttemptID execution);
}

public interface ExecutionStateUpdateListener {
    void onStateUpdate(ExecutionAttemptID execution, ExecutionState newState);
}
{code}

{{onCompletedDeployment}} is called in {{Execution#deploy}} when the deployment 
future completes; an implementation will initiate the tracking.
{{onStateUpdate}} is called in {{Execution#transitionState}} on any successful 
state transition; an implementation will stop the tracking if the new state is 
a terminal one.

Note: The deployment listener is required since there is no dedicated state for 
a deployed task;
executions are switched to DEPLOYING, submitted to the {{TaskExecutor}},
and switched to running after an update from the {{TaskExecutor}}.
Since this update can be lost we cannot rely on it.
A dedicated DEPLOYED state would be preferable, but this would require too many 
changes to the {{ExecutionGraph}} at this time.
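
A sketch of how an implementation could react to the state updates (assuming 
an {{ExecutionDeploymentTracker}} named {{tracker}} is in scope):

{code}
ExecutionStateUpdateListener stopOnTerminal =
        (execution, newState) -> {
            if (newState.isTerminal()) {
                tracker.stopTrackingDeployment(execution);
            }
        };
{code}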

2) The reconciler lives in the {{JobMaster}} and uses the IDs provided by the 
tracker and {{TaskExecutor}} heartbeats to detect mismatches, and fire events 
accordingly.
By defining a {{ReconciliationHandler}} the {{JobMaster}} can decide how each 
case should be handled:

{code}
public interface ExecutionDeploymentReconciler {

    // conceptual factory interface
    interface Factory {
        ExecutionDeploymentStateReconciler get(ReconciliationHandler trigger);
    }

    void reconcileExecutionStates(ResourceID origin, DeploymentReport deploymentReport, Set<ExecutionAttemptID> knownExecutionAttemptsIds);

    interface ExecutionIdsProvider {
        Set<ExecutionAttemptID> getExecutions(ResourceID host);
    }

    interface ReconciliationHandler {
        // fail the execution in the ExecutionGraph
        void onMissingDeployment(ExecutionAttemptID deployment);

        // cancel the task on the TaskExecutor
        void onUnknownDeployment(ExecutionAttemptID deployment, ResourceID 
[GitHub] [flink] flinkbot edited a comment on pull request #12769: [FLINK-15414] catch the right KafkaException for binding errors

2020-06-25 Thread GitBox


flinkbot edited a comment on pull request #12769:
URL: https://github.com/apache/flink/pull/12769#issuecomment-649496339


   
   ## CI report:
   
   * e4671572632d30c334c40d726409ea56152e2c1a Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4033)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #11245: [FLINK-15794][Kubernetes] Generate the Kubernetes default image version

2020-06-25 Thread GitBox


flinkbot edited a comment on pull request #11245:
URL: https://github.com/apache/flink/pull/11245#issuecomment-592002512


   
   ## CI report:
   
   * f1d4f9163a2cd60daa02fe6b7af5a459cee9660c UNKNOWN
   * 4d11da053e608d9d24e70c001fbb6b1469a392d7 UNKNOWN
   * e180236be81cbad2b936db2f57e764d1243e980c Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=3976)
 
   * a4c5e8c756f9380734727742390ce88d7bc47372 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4036)
 
   * 265d6eb7970325c88d2b3a9c77fa308be89641a9 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-18434) Can not select fields with JdbcCatalog

2020-06-25 Thread Jark Wu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17145003#comment-17145003
 ] 

Jark Wu commented on FLINK-18434:
-

Thanks [~dwysakowicz]. I think this is a design flaw of the {{Catalog}} interface.
Maybe this is the time to have a separate {{ReadOnlyCatalog}}.

As a temporary fix, I think {{FunctionCatalog}} should call
{{Catalog#functionExists(..)}} first, and {{JdbcCatalog#functionExists(..)}}
should always return false. WDYT? A rough sketch of that lookup guard is below.
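
For illustration, a minimal sketch of the guarded lookup; the helper class and
method are hypothetical, only {{Catalog#functionExists}} and
{{Catalog#getFunction}} are real API:

{code}
import java.util.Optional;

import org.apache.flink.table.catalog.Catalog;
import org.apache.flink.table.catalog.CatalogFunction;
import org.apache.flink.table.catalog.ObjectPath;
import org.apache.flink.table.catalog.exceptions.FunctionNotExistException;

// Sketch: FunctionCatalog would only call getFunction after functionExists
// returned true; a JDBC catalog would simply report that it has no functions.
public final class GuardedFunctionLookupSketch {

    static Optional<CatalogFunction> lookup(Catalog catalog, ObjectPath path) {
        if (!catalog.functionExists(path)) {
            // JdbcCatalog#functionExists(..) would always return false here,
            // so we never reach its unsupported getFunction(..).
            return Optional.empty();
        }
        try {
            return Optional.of(catalog.getFunction(path));
        } catch (FunctionNotExistException e) {
            return Optional.empty();
        }
    }
}
{code}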

> Can not select fields with JdbcCatalog
> --
>
> Key: FLINK-18434
> URL: https://issues.apache.org/jira/browse/FLINK-18434
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.11.0
>Reporter: Dawid Wysakowicz
>Priority: Blocker
> Fix For: 1.12.0, 1.11.1
>
>
> A query which selects fields from a table will fail if we set the 
> PostgresCatalog as default.
> Steps to reproduce:
>  # Create postgres catalog and set it as default
>  # Create any table (in any catalog)
>  # Query that table with {{SELECT field FROM t}} (Important it must be a 
> field name not '{{*}}'
>  #  The query will fail
> Stack trace:
> {code}
> org.apache.flink.table.client.gateway.SqlExecutionException: Invalidate SQL 
> statement.
>   at 
> org.apache.flink.table.client.cli.SqlCommandParser.parseBySqlParser(SqlCommandParser.java:100)
>  ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.client.cli.SqlCommandParser.parse(SqlCommandParser.java:91)
>  ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.client.cli.CliClient.parseCommand(CliClient.java:257) 
> [flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at org.apache.flink.table.client.cli.CliClient.open(CliClient.java:211) 
> [flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at org.apache.flink.table.client.SqlClient.openCli(SqlClient.java:142) 
> [flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at org.apache.flink.table.client.SqlClient.start(SqlClient.java:114) 
> [flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at org.apache.flink.table.client.SqlClient.main(SqlClient.java:201) 
> [flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> Caused by: org.apache.flink.table.api.ValidationException: SQL validation 
> failed. null
>   at 
> org.apache.flink.table.planner.calcite.FlinkPlannerImpl.org$apache$flink$table$planner$calcite$FlinkPlannerImpl$$validate(FlinkPlannerImpl.scala:146)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.planner.calcite.FlinkPlannerImpl.validate(FlinkPlannerImpl.scala:108)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.planner.operations.SqlToOperationConverter.convert(SqlToOperationConverter.java:187)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.planner.delegation.ParserImpl.parse(ParserImpl.java:78)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.client.gateway.local.LocalExecutor$1.lambda$parse$0(LocalExecutor.java:430)
>  ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.client.gateway.local.ExecutionContext.wrapClassLoader(ExecutionContext.java:255)
>  ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.client.gateway.local.LocalExecutor$1.parse(LocalExecutor.java:430)
>  ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.client.cli.SqlCommandParser.parseBySqlParser(SqlCommandParser.java:98)
>  ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   ... 6 more
> Caused by: java.lang.UnsupportedOperationException
>   at 
> org.apache.flink.connector.jdbc.catalog.AbstractJdbcCatalog.getFunction(AbstractJdbcCatalog.java:261)
>  ~[?:?]
>   at 
> org.apache.flink.table.catalog.FunctionCatalog.resolvePreciseFunctionReference(FunctionCatalog.java:570)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.catalog.FunctionCatalog.lambda$resolveAmbiguousFunctionReference$2(FunctionCatalog.java:617)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at java.util.Optional.orElseGet(Optional.java:267) ~[?:1.8.0_252]
>   at 
> org.apache.flink.table.catalog.FunctionCatalog.resolveAmbiguousFunctionReference(FunctionCatalog.java:617)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.catalog.FunctionCatalog.lookupFunction(FunctionCatalog.java:370)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> 

[GitHub] [flink] nielsbasjes commented on pull request #11245: [FLINK-15794][Kubernetes] Generate the Kubernetes default image version

2020-06-25 Thread GitBox


nielsbasjes commented on pull request #11245:
URL: https://github.com/apache/flink/pull/11245#issuecomment-649607408


   Note that this should be squashed into a single commit before merging.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-18427) Job failed under java 11

2020-06-25 Thread Zhijiang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17144996#comment-17144996
 ] 

Zhijiang commented on FLINK-18427:
--

Fully agree with [~sewen]'s analysis above.

In addition, we also plan to backport the improvement from
https://issues.apache.org/jira/browse/FLINK-15962 to release-1.10 later, which
should also help to reduce the netty direct memory overhead.
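
If this analysis holds, a user-side stop-gap is to grant the JVM a larger
direct-memory budget through Flink's memory options; a minimal sketch (the
option choice and the 256m value are assumptions for illustration, not a
recommendation from this thread):

{code}
import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.MemorySize;
import org.apache.flink.configuration.TaskManagerOptions;

public class DirectMemoryConfigSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Framework off-heap memory contributes to the JVM's direct memory
        // limit, which the shaded netty stack allocates from.
        conf.set(TaskManagerOptions.FRAMEWORK_OFF_HEAP_MEMORY, MemorySize.parse("256m"));
        System.out.println(conf.get(TaskManagerOptions.FRAMEWORK_OFF_HEAP_MEMORY));
    }
}
{code}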

> Job failed under java 11
> 
>
> Key: FLINK-18427
> URL: https://issues.apache.org/jira/browse/FLINK-18427
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Configuration, Runtime / Network
>Affects Versions: 1.10.0
>Reporter: Zhang Hao
>Priority: Critical
>
> flink version:1.10.0
> deployment mode:cluster
> os:linux redhat7.5
> Job parallelism:greater than 1
> My job runs normally under Java 8 but fails under Java 11. Exception info 
> below: netty failed to send a message. In addition, I found the job would fail 
> when tasks were distributed across multiple nodes; if I set the job's parallelism 
> to 1, the job runs normally under Java 11 too.
>  
> 2020-06-24 09:52:16 
> org.apache.flink.runtime.io.network.netty.exception.LocalTransportException:
>  Sending the partition request to '/170.0.50.19:33320' failed. at 
> org.apache.flink.runtime.io.network.netty.NettyPartitionRequestClient$1.operationComplete(NettyPartitionRequestClient.java:124)
>  at 
> org.apache.flink.runtime.io.network.netty.NettyPartitionRequestClient$1.operationComplete(NettyPartitionRequestClient.java:115)
>  at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:500)
>  at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:474)
>  at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:413)
>  at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:538)
>  at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:531)
>  at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:111)
>  at 
> org.apache.flink.shaded.netty4.io.netty.util.internal.PromiseNotificationUtil.tryFailure(PromiseNotificationUtil.java:64)
>  at 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.notifyOutboundHandlerException(AbstractChannelHandlerContext.java:818)
>  at 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:718)
>  at 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:708)
>  at 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.access$1700(AbstractChannelHandlerContext.java:56)
>  at 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:1102)
>  at 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:1149)
>  at 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:1073)
>  at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
>  at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:416)
>  at 
> org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:515)
>  at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918)
>  at 
> org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>  at java.base/java.lang.Thread.run(Thread.java:834)Caused by: 
> java.io.IOException: Error while serializing message: 
> PartitionRequest(8059a0b47f7ba0ff814ea52427c584e7@6750c1170c861176ad3ceefe9b02f36e:0:2)
>  at 
> org.apache.flink.runtime.io.network.netty.NettyMessage$NettyMessageEncoder.write(NettyMessage.java:177)
>  at 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:716)
>  ... 11 moreCaused by: java.io.IOException: java.lang.OutOfMemoryError: 
> Direct buffer memory at 
> org.apache.flink.runtime.io.network.netty.NettyMessage$PartitionRequest.write(NettyMessage.java:497)
>  at 
> org.apache.flink.runtime.io.network.netty.NettyMessage$NettyMessageEncoder.write(NettyMessage.java:174)
>  ... 12 moreCaused 

[GitHub] [flink] flinkbot edited a comment on pull request #12748: [FLINK-18324][docs-zh] Translate updated data type into Chinese

2020-06-25 Thread GitBox


flinkbot edited a comment on pull request #12748:
URL: https://github.com/apache/flink/pull/12748#issuecomment-647889022


   
   ## CI report:
   
   * 2a146962687424a20b253ed3fcc42700e416375d Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=3935)
 
   * 719a106bdb3e4aeced30673f43d7a2a0f90b2535 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4034)
 
   * 98a273694702b8878c6a6a337f4eb93c1ac2e5f4 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4035)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] azagrebin edited a comment on pull request #11245: [FLINK-15794][Kubernetes] Generate the Kubernetes default image version

2020-06-25 Thread GitBox


azagrebin edited a comment on pull request #11245:
URL: https://github.com/apache/flink/pull/11245#issuecomment-649594733







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] azagrebin commented on pull request #11245: [FLINK-15794][Kubernetes] Generate the Kubernetes default image version

2020-06-25 Thread GitBox


azagrebin commented on pull request #11245:
URL: https://github.com/apache/flink/pull/11245#issuecomment-649594733


   @nielsbasjes
   The last suggestion looks good to me. Let's just avoid uppercase: `NOT` -> 
`not`.
I agree, snapshot versions would make things cleaner, hopefully community 
will get to it soon.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink-web] morsapaes opened a new pull request #352: Add 1.11 Release announcement.

2020-06-25 Thread GitBox


morsapaes opened a new pull request #352:
URL: https://github.com/apache/flink-web/pull/352


   Adding a blogpost for the upcoming 1.11 release announcement.
   
   Feel free to comment and leave any feedback!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-18433) From the end-to-end performance test results, 1.11 has a regression

2020-06-25 Thread Arvid Heise (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17144952#comment-17144952
 ] 

Arvid Heise commented on FLINK-18433:
-

Doing some bisecting on a specific test would definitely help. Could you drive 
that, [~Aihua]?

Interestingly, AT_LEAST_ONCE suffered the largest regression. 

Additional suspicion: it could be related to 
https://github.com/apache/flink/pull/12697 . Do we have a high DOP?

How long are the jobs running btw?

> From the end-to-end performance test results, 1.11 has a regression
> ---
>
> Key: FLINK-18433
> URL: https://issues.apache.org/jira/browse/FLINK-18433
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core, API / DataStream
>Affects Versions: 1.11.0
> Environment: 3 machines
> [|https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations_1.11/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
>Reporter: Aihua Li
>Priority: Major
>
>  
> I ran end-to-end performance tests between the Release-1.10 and Release-1.11. 
> the results were as follows:
> |scenarioName|release-1.10|release-1.11|change|
> |OneInput_Broadcast_LazyFromSource_ExactlyOnce_10_rocksdb|46.175|43.8133|-5.11%|
> |OneInput_Rescale_LazyFromSource_ExactlyOnce_100_heap|211.835|200.355|-5.42%|
> |OneInput_Rebalance_LazyFromSource_ExactlyOnce_1024_rocksdb|1721.041667|1618.32|-5.97%|
> |OneInput_KeyBy_LazyFromSource_ExactlyOnce_10_heap|46|43.615|-5.18%|
> |OneInput_Broadcast_Eager_ExactlyOnce_100_rocksdb|212.105|199.688|-5.85%|
> |OneInput_Rescale_Eager_ExactlyOnce_1024_heap|1754.64|1600.12|-8.81%|
> |OneInput_Rebalance_Eager_ExactlyOnce_10_rocksdb|45.9167|43.0983|-6.14%|
> |OneInput_KeyBy_Eager_ExactlyOnce_100_heap|212.0816667|200.727|-5.35%|
> |OneInput_Broadcast_LazyFromSource_AtLeastOnce_1024_rocksdb|1718.245|1614.381667|-6.04%|
> |OneInput_Rescale_LazyFromSource_AtLeastOnce_10_heap|46.12|43.5517|-5.57%|
> |OneInput_Rebalance_LazyFromSource_AtLeastOnce_100_rocksdb|212.038|200.388|-5.49%|
> |OneInput_KeyBy_LazyFromSource_AtLeastOnce_1024_heap|1762.048333|1606.408333|-8.83%|
> |OneInput_Broadcast_Eager_AtLeastOnce_10_rocksdb|46.0583|43.4967|-5.56%|
> |OneInput_Rescale_Eager_AtLeastOnce_100_heap|212.233|201.188|-5.20%|
> |OneInput_Rebalance_Eager_AtLeastOnce_1024_rocksdb|1720.66|1616.85|-6.03%|
> |OneInput_KeyBy_Eager_AtLeastOnce_10_heap|46.14|43.6233|-5.45%|
> |TwoInputs_Broadcast_LazyFromSource_ExactlyOnce_100_rocksdb|156.918|152.957|-2.52%|
> |TwoInputs_Rescale_LazyFromSource_ExactlyOnce_1024_heap|1415.511667|1300.1|-8.15%|
> |TwoInputs_Rebalance_LazyFromSource_ExactlyOnce_10_rocksdb|34.2967|34.1667|-0.38%|
> |TwoInputs_KeyBy_LazyFromSource_ExactlyOnce_100_heap|158.353|151.848|-4.11%|
> |TwoInputs_Broadcast_Eager_ExactlyOnce_1024_rocksdb|1373.406667|1300.056667|-5.34%|
> |TwoInputs_Rescale_Eager_ExactlyOnce_10_heap|34.5717|32.0967|-7.16%|
> |TwoInputs_Rebalance_Eager_ExactlyOnce_100_rocksdb|158.655|147.44|-7.07%|
> |TwoInputs_KeyBy_Eager_ExactlyOnce_1024_heap|1356.611667|1292.386667|-4.73%|
> |TwoInputs_Broadcast_LazyFromSource_AtLeastOnce_10_rocksdb|34.01|33.205|-2.37%|
> |TwoInputs_Rescale_LazyFromSource_AtLeastOnce_100_heap|149.588|145.997|-2.40%|
> |TwoInputs_Rebalance_LazyFromSource_AtLeastOnce_1024_rocksdb|1359.74|1299.156667|-4.46%|
> |TwoInputs_KeyBy_LazyFromSource_AtLeastOnce_10_heap|34.025|29.6833|-12.76%|
> |TwoInputs_Broadcast_Eager_AtLeastOnce_100_rocksdb|157.303|151.4616667|-3.71%|
> |TwoInputs_Rescale_Eager_AtLeastOnce_1024_heap|1368.74|1293.238333|-5.52%|
> |TwoInputs_Rebalance_Eager_AtLeastOnce_10_rocksdb|34.325|33.285|-3.03%|
> |TwoInputs_KeyBy_Eager_AtLeastOnce_100_heap|162.5116667|134.375|-17.31%|
> It can be seen that the performance of 1.11 has a regression, basically 
> around 5%, and the maximum regression is 17%. This needs to be checked.
> the test code:
> flink-1.10.0: 
> [https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
> flink-1.11.0: 
> [https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations_1.11/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
> submit cmd like this:
> bin/flink run -d -m 192.168.39.246:8081 -c 
> org.apache.flink.basic.operations.PerformanceTestJob 
> /home/admin/flink-basic-operations_2.11-1.10-SNAPSHOT.jar --topologyName 
> OneInput --LogicalAttributesofEdges Broadcast --ScheduleMode LazyFromSource 
> --CheckpointMode ExactlyOnce --recordSize 10 --stateBackend rocksdb
>  




[GitHub] [flink] flinkbot edited a comment on pull request #12748: [FLINK-18324][docs-zh] Translate updated data type into Chinese

2020-06-25 Thread GitBox


flinkbot edited a comment on pull request #12748:
URL: https://github.com/apache/flink/pull/12748#issuecomment-647889022


   
   ## CI report:
   
   * 2a146962687424a20b253ed3fcc42700e416375d Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=3935)
 
   * 719a106bdb3e4aeced30673f43d7a2a0f90b2535 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4034)
 
   * 98a273694702b8878c6a6a337f4eb93c1ac2e5f4 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #11245: [FLINK-15794][Kubernetes] Generate the Kubernetes default image version

2020-06-25 Thread GitBox


flinkbot edited a comment on pull request #11245:
URL: https://github.com/apache/flink/pull/11245#issuecomment-592002512


   
   ## CI report:
   
   * f1d4f9163a2cd60daa02fe6b7af5a459cee9660c UNKNOWN
   * 4d11da053e608d9d24e70c001fbb6b1469a392d7 UNKNOWN
   * e180236be81cbad2b936db2f57e764d1243e980c Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=3976)
 
   * a4c5e8c756f9380734727742390ce88d7bc47372 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] nielsbasjes commented on pull request #11245: [FLINK-15794][Kubernetes] Generate the Kubernetes default image version

2020-06-25 Thread GitBox


nielsbasjes commented on pull request #11245:
URL: https://github.com/apache/flink/pull/11245#issuecomment-649561412


   @azagrebin (Weird, cannot respond to your feedback regarding the comment 
with 'latest' directly...)
   
   I put the "over time" mention in because your application may work today, but 
as soon as a new version is released or Scala 2.13 is added, it will break in 
unexpected ways.
   
   How about this? ```# The 'latest' tag contains the latest released version 
of Flink for a specific Scala version. Do NOT use this in production as it will 
break your setup automatically when a new version is released.```
   
   I still strongly believe we should simply NOT have the 'latest' tag at all, 
and we should have docker images with SNAPSHOT versions that are pushed at the 
same time a SNAPSHOT build is pushed to maven. That would make this entire 
situation go away cleanly.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] fsk119 commented on pull request #12758: [FLINK-18386][docs-zh] Translate "Print SQL Connector" page into Chinese

2020-06-25 Thread GitBox


fsk119 commented on pull request #12758:
URL: https://github.com/apache/flink/pull/12758#issuecomment-649547883


   Thanks for your contribution, @houmaozheng. Looks good to me.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] liyubin117 commented on pull request #12748: [FLINK-18324][docs-zh] Translate updated data type into Chinese

2020-06-25 Thread GitBox


liyubin117 commented on pull request #12748:
URL: https://github.com/apache/flink/pull/12748#issuecomment-649547234


   @libenchao Thanks for your review; I have finished it.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12748: [FLINK-18324][docs-zh] Translate updated data type into Chinese

2020-06-25 Thread GitBox


flinkbot edited a comment on pull request #12748:
URL: https://github.com/apache/flink/pull/12748#issuecomment-647889022


   
   ## CI report:
   
   * 2a146962687424a20b253ed3fcc42700e416375d Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=3935)
 
   * 719a106bdb3e4aeced30673f43d7a2a0f90b2535 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4034)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12748: [FLINK-18324][docs-zh] Translate updated data type into Chinese

2020-06-25 Thread GitBox


flinkbot edited a comment on pull request #12748:
URL: https://github.com/apache/flink/pull/12748#issuecomment-647889022


   
   ## CI report:
   
   * 2a146962687424a20b253ed3fcc42700e416375d Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=3935)
 
   * 719a106bdb3e4aeced30673f43d7a2a0f90b2535 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink-web] sjwiesman closed pull request #350: [hotfix] Documentation Style Guide: Sync + Correction

2020-06-25 Thread GitBox


sjwiesman closed pull request #350:
URL: https://github.com/apache/flink-web/pull/350


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink-web] sjwiesman commented on pull request #350: [hotfix] Documentation Style Guide: Sync + Correction

2020-06-25 Thread GitBox


sjwiesman commented on pull request #350:
URL: https://github.com/apache/flink-web/pull/350#issuecomment-649526151


   Whoops, yes



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-18434) Can not select fields with JdbcCatalog

2020-06-25 Thread Dawid Wysakowicz (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Wysakowicz updated FLINK-18434:
-
Priority: Blocker  (was: Critical)

> Can not select fields with JdbcCatalog
> --
>
> Key: FLINK-18434
> URL: https://issues.apache.org/jira/browse/FLINK-18434
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.11.0
>Reporter: Dawid Wysakowicz
>Priority: Blocker
> Fix For: 1.12.0, 1.11.1
>
>
> A query which selects fields from a table will fail if we set the 
> PostgresCatalog as default.
> Steps to reproduce:
>  # Create postgres catalog and set it as default
>  # Create any table (in any catalog)
>  # Query that table with {{SELECT field FROM t}} (important: it must be a 
> field name, not '{{*}}')
>  #  The query will fail
> Stack trace:
> {code}
> org.apache.flink.table.client.gateway.SqlExecutionException: Invalidate SQL 
> statement.
>   at 
> org.apache.flink.table.client.cli.SqlCommandParser.parseBySqlParser(SqlCommandParser.java:100)
>  ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.client.cli.SqlCommandParser.parse(SqlCommandParser.java:91)
>  ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.client.cli.CliClient.parseCommand(CliClient.java:257) 
> [flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at org.apache.flink.table.client.cli.CliClient.open(CliClient.java:211) 
> [flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at org.apache.flink.table.client.SqlClient.openCli(SqlClient.java:142) 
> [flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at org.apache.flink.table.client.SqlClient.start(SqlClient.java:114) 
> [flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at org.apache.flink.table.client.SqlClient.main(SqlClient.java:201) 
> [flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> Caused by: org.apache.flink.table.api.ValidationException: SQL validation 
> failed. null
>   at 
> org.apache.flink.table.planner.calcite.FlinkPlannerImpl.org$apache$flink$table$planner$calcite$FlinkPlannerImpl$$validate(FlinkPlannerImpl.scala:146)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.planner.calcite.FlinkPlannerImpl.validate(FlinkPlannerImpl.scala:108)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.planner.operations.SqlToOperationConverter.convert(SqlToOperationConverter.java:187)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.planner.delegation.ParserImpl.parse(ParserImpl.java:78)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.client.gateway.local.LocalExecutor$1.lambda$parse$0(LocalExecutor.java:430)
>  ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.client.gateway.local.ExecutionContext.wrapClassLoader(ExecutionContext.java:255)
>  ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.client.gateway.local.LocalExecutor$1.parse(LocalExecutor.java:430)
>  ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.client.cli.SqlCommandParser.parseBySqlParser(SqlCommandParser.java:98)
>  ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   ... 6 more
> Caused by: java.lang.UnsupportedOperationException
>   at 
> org.apache.flink.connector.jdbc.catalog.AbstractJdbcCatalog.getFunction(AbstractJdbcCatalog.java:261)
>  ~[?:?]
>   at 
> org.apache.flink.table.catalog.FunctionCatalog.resolvePreciseFunctionReference(FunctionCatalog.java:570)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.catalog.FunctionCatalog.lambda$resolveAmbiguousFunctionReference$2(FunctionCatalog.java:617)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at java.util.Optional.orElseGet(Optional.java:267) ~[?:1.8.0_252]
>   at 
> org.apache.flink.table.catalog.FunctionCatalog.resolveAmbiguousFunctionReference(FunctionCatalog.java:617)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.catalog.FunctionCatalog.lookupFunction(FunctionCatalog.java:370)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.planner.catalog.FunctionCatalogOperatorTable.lookupOperatorOverloads(FunctionCatalogOperatorTable.java:99)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.calcite.sql.util.ChainedSqlOperatorTable.lookupOperatorOverloads(ChainedSqlOperatorTable.java:73)
>  

[jira] [Updated] (FLINK-18434) Can not select fields with JdbcCatalog

2020-06-25 Thread Dawid Wysakowicz (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Wysakowicz updated FLINK-18434:
-
Fix Version/s: 1.11.1

> Can not select fields with JdbcCatalog
> --
>
> Key: FLINK-18434
> URL: https://issues.apache.org/jira/browse/FLINK-18434
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.11.0
>Reporter: Dawid Wysakowicz
>Priority: Critical
> Fix For: 1.11.1
>
>
> A query which selects fields from a table will fail if we set the 
> PostgresCatalog as default.
> Steps to reproduce:
>  # Create postgres catalog and set it as default
>  # Create any table (in any catalog)
>  # Query that table with {{SELECT field FROM t}} (important: it must be a 
> field name, not '{{*}}')
>  #  The query will fail
> Stack trace:
> {code}
> org.apache.flink.table.client.gateway.SqlExecutionException: Invalidate SQL 
> statement.
>   at 
> org.apache.flink.table.client.cli.SqlCommandParser.parseBySqlParser(SqlCommandParser.java:100)
>  ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.client.cli.SqlCommandParser.parse(SqlCommandParser.java:91)
>  ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.client.cli.CliClient.parseCommand(CliClient.java:257) 
> [flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at org.apache.flink.table.client.cli.CliClient.open(CliClient.java:211) 
> [flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at org.apache.flink.table.client.SqlClient.openCli(SqlClient.java:142) 
> [flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at org.apache.flink.table.client.SqlClient.start(SqlClient.java:114) 
> [flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at org.apache.flink.table.client.SqlClient.main(SqlClient.java:201) 
> [flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> Caused by: org.apache.flink.table.api.ValidationException: SQL validation 
> failed. null
>   at 
> org.apache.flink.table.planner.calcite.FlinkPlannerImpl.org$apache$flink$table$planner$calcite$FlinkPlannerImpl$$validate(FlinkPlannerImpl.scala:146)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.planner.calcite.FlinkPlannerImpl.validate(FlinkPlannerImpl.scala:108)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.planner.operations.SqlToOperationConverter.convert(SqlToOperationConverter.java:187)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.planner.delegation.ParserImpl.parse(ParserImpl.java:78)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.client.gateway.local.LocalExecutor$1.lambda$parse$0(LocalExecutor.java:430)
>  ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.client.gateway.local.ExecutionContext.wrapClassLoader(ExecutionContext.java:255)
>  ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.client.gateway.local.LocalExecutor$1.parse(LocalExecutor.java:430)
>  ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.client.cli.SqlCommandParser.parseBySqlParser(SqlCommandParser.java:98)
>  ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   ... 6 more
> Caused by: java.lang.UnsupportedOperationException
>   at 
> org.apache.flink.connector.jdbc.catalog.AbstractJdbcCatalog.getFunction(AbstractJdbcCatalog.java:261)
>  ~[?:?]
>   at 
> org.apache.flink.table.catalog.FunctionCatalog.resolvePreciseFunctionReference(FunctionCatalog.java:570)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.catalog.FunctionCatalog.lambda$resolveAmbiguousFunctionReference$2(FunctionCatalog.java:617)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at java.util.Optional.orElseGet(Optional.java:267) ~[?:1.8.0_252]
>   at 
> org.apache.flink.table.catalog.FunctionCatalog.resolveAmbiguousFunctionReference(FunctionCatalog.java:617)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.catalog.FunctionCatalog.lookupFunction(FunctionCatalog.java:370)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.planner.catalog.FunctionCatalogOperatorTable.lookupOperatorOverloads(FunctionCatalogOperatorTable.java:99)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.calcite.sql.util.ChainedSqlOperatorTable.lookupOperatorOverloads(ChainedSqlOperatorTable.java:73)
>  

[jira] [Updated] (FLINK-18434) Can not select fields with JdbcCatalog

2020-06-25 Thread Dawid Wysakowicz (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Wysakowicz updated FLINK-18434:
-
Fix Version/s: 1.12.0

> Can not select fields with JdbcCatalog
> --
>
> Key: FLINK-18434
> URL: https://issues.apache.org/jira/browse/FLINK-18434
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.11.0
>Reporter: Dawid Wysakowicz
>Priority: Critical
> Fix For: 1.12.0, 1.11.1
>
>
> A query which selects fields from a table will fail if we set the 
> PostgresCatalog as default.
> Steps to reproduce:
>  # Create postgres catalog and set it as default
>  # Create any table (in any catalog)
>  # Query that table with {{SELECT field FROM t}} (important: it must be a 
> field name, not '{{*}}')
>  #  The query will fail
> Stack trace:
> {code}
> org.apache.flink.table.client.gateway.SqlExecutionException: Invalidate SQL 
> statement.
>   at 
> org.apache.flink.table.client.cli.SqlCommandParser.parseBySqlParser(SqlCommandParser.java:100)
>  ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.client.cli.SqlCommandParser.parse(SqlCommandParser.java:91)
>  ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.client.cli.CliClient.parseCommand(CliClient.java:257) 
> [flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at org.apache.flink.table.client.cli.CliClient.open(CliClient.java:211) 
> [flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at org.apache.flink.table.client.SqlClient.openCli(SqlClient.java:142) 
> [flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at org.apache.flink.table.client.SqlClient.start(SqlClient.java:114) 
> [flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at org.apache.flink.table.client.SqlClient.main(SqlClient.java:201) 
> [flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> Caused by: org.apache.flink.table.api.ValidationException: SQL validation 
> failed. null
>   at 
> org.apache.flink.table.planner.calcite.FlinkPlannerImpl.org$apache$flink$table$planner$calcite$FlinkPlannerImpl$$validate(FlinkPlannerImpl.scala:146)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.planner.calcite.FlinkPlannerImpl.validate(FlinkPlannerImpl.scala:108)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.planner.operations.SqlToOperationConverter.convert(SqlToOperationConverter.java:187)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.planner.delegation.ParserImpl.parse(ParserImpl.java:78)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.client.gateway.local.LocalExecutor$1.lambda$parse$0(LocalExecutor.java:430)
>  ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.client.gateway.local.ExecutionContext.wrapClassLoader(ExecutionContext.java:255)
>  ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.client.gateway.local.LocalExecutor$1.parse(LocalExecutor.java:430)
>  ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.client.cli.SqlCommandParser.parseBySqlParser(SqlCommandParser.java:98)
>  ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   ... 6 more
> Caused by: java.lang.UnsupportedOperationException
>   at 
> org.apache.flink.connector.jdbc.catalog.AbstractJdbcCatalog.getFunction(AbstractJdbcCatalog.java:261)
>  ~[?:?]
>   at 
> org.apache.flink.table.catalog.FunctionCatalog.resolvePreciseFunctionReference(FunctionCatalog.java:570)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.catalog.FunctionCatalog.lambda$resolveAmbiguousFunctionReference$2(FunctionCatalog.java:617)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at java.util.Optional.orElseGet(Optional.java:267) ~[?:1.8.0_252]
>   at 
> org.apache.flink.table.catalog.FunctionCatalog.resolveAmbiguousFunctionReference(FunctionCatalog.java:617)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.catalog.FunctionCatalog.lookupFunction(FunctionCatalog.java:370)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.planner.catalog.FunctionCatalogOperatorTable.lookupOperatorOverloads(FunctionCatalogOperatorTable.java:99)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.calcite.sql.util.ChainedSqlOperatorTable.lookupOperatorOverloads(ChainedSqlOperatorTable.java:73)
>  

[jira] [Updated] (FLINK-18434) Can not select fields with JdbcCatalog

2020-06-25 Thread Dawid Wysakowicz (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Wysakowicz updated FLINK-18434:
-
Priority: Critical  (was: Major)

> Can not select fields with JdbcCatalog
> --
>
> Key: FLINK-18434
> URL: https://issues.apache.org/jira/browse/FLINK-18434
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.11.0
>Reporter: Dawid Wysakowicz
>Priority: Critical
>
> A query which selects fields from a table will fail if we set the 
> PostgresCatalog as default.
> Steps to reproduce:
>  # Create postgres catalog and set it as default
>  # Create any table (in any catalog)
>  # Query that table with {{SELECT field FROM t}} (important: it must be a 
> field name, not '{{*}}')
>  #  The query will fail
> Stack trace:
> {code}
> org.apache.flink.table.client.gateway.SqlExecutionException: Invalidate SQL 
> statement.
>   at 
> org.apache.flink.table.client.cli.SqlCommandParser.parseBySqlParser(SqlCommandParser.java:100)
>  ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.client.cli.SqlCommandParser.parse(SqlCommandParser.java:91)
>  ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.client.cli.CliClient.parseCommand(CliClient.java:257) 
> [flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at org.apache.flink.table.client.cli.CliClient.open(CliClient.java:211) 
> [flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at org.apache.flink.table.client.SqlClient.openCli(SqlClient.java:142) 
> [flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at org.apache.flink.table.client.SqlClient.start(SqlClient.java:114) 
> [flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at org.apache.flink.table.client.SqlClient.main(SqlClient.java:201) 
> [flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> Caused by: org.apache.flink.table.api.ValidationException: SQL validation 
> failed. null
>   at 
> org.apache.flink.table.planner.calcite.FlinkPlannerImpl.org$apache$flink$table$planner$calcite$FlinkPlannerImpl$$validate(FlinkPlannerImpl.scala:146)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.planner.calcite.FlinkPlannerImpl.validate(FlinkPlannerImpl.scala:108)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.planner.operations.SqlToOperationConverter.convert(SqlToOperationConverter.java:187)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.planner.delegation.ParserImpl.parse(ParserImpl.java:78)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.client.gateway.local.LocalExecutor$1.lambda$parse$0(LocalExecutor.java:430)
>  ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.client.gateway.local.ExecutionContext.wrapClassLoader(ExecutionContext.java:255)
>  ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.client.gateway.local.LocalExecutor$1.parse(LocalExecutor.java:430)
>  ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.client.cli.SqlCommandParser.parseBySqlParser(SqlCommandParser.java:98)
>  ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   ... 6 more
> Caused by: java.lang.UnsupportedOperationException
>   at 
> org.apache.flink.connector.jdbc.catalog.AbstractJdbcCatalog.getFunction(AbstractJdbcCatalog.java:261)
>  ~[?:?]
>   at 
> org.apache.flink.table.catalog.FunctionCatalog.resolvePreciseFunctionReference(FunctionCatalog.java:570)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.catalog.FunctionCatalog.lambda$resolveAmbiguousFunctionReference$2(FunctionCatalog.java:617)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at java.util.Optional.orElseGet(Optional.java:267) ~[?:1.8.0_252]
>   at 
> org.apache.flink.table.catalog.FunctionCatalog.resolveAmbiguousFunctionReference(FunctionCatalog.java:617)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.catalog.FunctionCatalog.lookupFunction(FunctionCatalog.java:370)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.planner.catalog.FunctionCatalogOperatorTable.lookupOperatorOverloads(FunctionCatalogOperatorTable.java:99)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.calcite.sql.util.ChainedSqlOperatorTable.lookupOperatorOverloads(ChainedSqlOperatorTable.java:73)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]

[jira] [Created] (FLINK-18434) Can not select fields with JdbcCatalog

2020-06-25 Thread Dawid Wysakowicz (Jira)
Dawid Wysakowicz created FLINK-18434:


 Summary: Can not select fields with JdbcCatalog
 Key: FLINK-18434
 URL: https://issues.apache.org/jira/browse/FLINK-18434
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.11.0
Reporter: Dawid Wysakowicz


A query which selects fields from a table will fail if we set the 
PostgresCatalog as default.

Steps to reproduce:
 # Create postgres catalog and set it as default
 # Create any table (in any catalog)
 # Query that table with {{SELECT field FROM t}} (important: it must be a field 
name, not '{{*}}')
 #  The query will fail
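
For illustration, a rough Java repro under assumed connection parameters (the
catalog name, credentials, and URL are placeholders; only the API calls are
real):

{code}
import org.apache.flink.connector.jdbc.catalog.JdbcCatalog;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class JdbcCatalogRepro {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().useBlinkPlanner().inBatchMode().build());

        // 1. create the postgres catalog and make it the default catalog
        JdbcCatalog catalog = new JdbcCatalog(
                "mypg", "postgres", "user", "secret", "jdbc:postgresql://localhost:5432");
        tEnv.registerCatalog("mypg", catalog);
        tEnv.useCatalog("mypg");

        // 2./3. selecting a concrete field triggers the function lookup
        // and fails with the stack trace below
        tEnv.executeSql("SELECT field FROM t");
    }
}
{code}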

Stack trace:
{code}
org.apache.flink.table.client.gateway.SqlExecutionException: Invalidate SQL 
statement.
at 
org.apache.flink.table.client.cli.SqlCommandParser.parseBySqlParser(SqlCommandParser.java:100)
 ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
at 
org.apache.flink.table.client.cli.SqlCommandParser.parse(SqlCommandParser.java:91)
 ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
at 
org.apache.flink.table.client.cli.CliClient.parseCommand(CliClient.java:257) 
[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
at org.apache.flink.table.client.cli.CliClient.open(CliClient.java:211) 
[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
at org.apache.flink.table.client.SqlClient.openCli(SqlClient.java:142) 
[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
at org.apache.flink.table.client.SqlClient.start(SqlClient.java:114) 
[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
at org.apache.flink.table.client.SqlClient.main(SqlClient.java:201) 
[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
Caused by: org.apache.flink.table.api.ValidationException: SQL validation 
failed. null
at 
org.apache.flink.table.planner.calcite.FlinkPlannerImpl.org$apache$flink$table$planner$calcite$FlinkPlannerImpl$$validate(FlinkPlannerImpl.scala:146)
 ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
at 
org.apache.flink.table.planner.calcite.FlinkPlannerImpl.validate(FlinkPlannerImpl.scala:108)
 ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
at 
org.apache.flink.table.planner.operations.SqlToOperationConverter.convert(SqlToOperationConverter.java:187)
 ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
at 
org.apache.flink.table.planner.delegation.ParserImpl.parse(ParserImpl.java:78) 
~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
at 
org.apache.flink.table.client.gateway.local.LocalExecutor$1.lambda$parse$0(LocalExecutor.java:430)
 ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
at 
org.apache.flink.table.client.gateway.local.ExecutionContext.wrapClassLoader(ExecutionContext.java:255)
 ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
at 
org.apache.flink.table.client.gateway.local.LocalExecutor$1.parse(LocalExecutor.java:430)
 ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
at 
org.apache.flink.table.client.cli.SqlCommandParser.parseBySqlParser(SqlCommandParser.java:98)
 ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
... 6 more
Caused by: java.lang.UnsupportedOperationException
at 
org.apache.flink.connector.jdbc.catalog.AbstractJdbcCatalog.getFunction(AbstractJdbcCatalog.java:261)
 ~[?:?]
at 
org.apache.flink.table.catalog.FunctionCatalog.resolvePreciseFunctionReference(FunctionCatalog.java:570)
 ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
at 
org.apache.flink.table.catalog.FunctionCatalog.lambda$resolveAmbiguousFunctionReference$2(FunctionCatalog.java:617)
 ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
at java.util.Optional.orElseGet(Optional.java:267) ~[?:1.8.0_252]
at 
org.apache.flink.table.catalog.FunctionCatalog.resolveAmbiguousFunctionReference(FunctionCatalog.java:617)
 ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
at 
org.apache.flink.table.catalog.FunctionCatalog.lookupFunction(FunctionCatalog.java:370)
 ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
at 
org.apache.flink.table.planner.catalog.FunctionCatalogOperatorTable.lookupOperatorOverloads(FunctionCatalogOperatorTable.java:99)
 ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
at 
org.apache.calcite.sql.util.ChainedSqlOperatorTable.lookupOperatorOverloads(ChainedSqlOperatorTable.java:73)
 ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.makeNullaryCall(SqlValidatorImpl.java:1754)
 ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
at 
org.apache.calcite.sql.validate.SqlValidatorImpl$Expander.visit(SqlValidatorImpl.java:5987)
 ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
   

[GitHub] [flink] flinkbot edited a comment on pull request #12769: [FLINK-15414] catch the right KafkaException for binding errors

2020-06-25 Thread GitBox


flinkbot edited a comment on pull request #12769:
URL: https://github.com/apache/flink/pull/12769#issuecomment-649496339


   
   ## CI report:
   
   * e4671572632d30c334c40d726409ea56152e2c1a Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4033)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot commented on pull request #12769: [FLINK-15414] catch the right KafkaException for binding errors

2020-06-25 Thread GitBox


flinkbot commented on pull request #12769:
URL: https://github.com/apache/flink/pull/12769#issuecomment-649496339


   
   ## CI report:
   
   * e4671572632d30c334c40d726409ea56152e2c1a UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Closed] (FLINK-18349) Add Flink 1.11 release notes to documentation

2020-06-25 Thread Piotr Nowojski (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Piotr Nowojski closed FLINK-18349.
--
Resolution: Fixed

Merged to master as e63345354e..6227fffbe6 and to release-1.11 as 6ecf2d3096

> Add Flink 1.11 release notes to documentation
> -
>
> Key: FLINK-18349
> URL: https://issues.apache.org/jira/browse/FLINK-18349
> Project: Flink
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 1.11.0
>Reporter: Piotr Nowojski
>Assignee: Piotr Nowojski
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>
> Gather, edit, and add Flink 1.11 release notes to the documentation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] pnowojski merged pull request #12699: [FLINK-18349][docs] Add release notes for Flink 1.11

2020-06-25 Thread GitBox


pnowojski merged pull request #12699:
URL: https://github.com/apache/flink/pull/12699


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] pnowojski commented on a change in pull request #12699: [FLINK-18349][docs] Add release notes for Flink 1.11

2020-06-25 Thread GitBox


pnowojski commented on a change in pull request #12699:
URL: https://github.com/apache/flink/pull/12699#discussion_r445498558



##
File path: docs/release-notes/flink-1.11.md
##
@@ -0,0 +1,291 @@
+---
+title: "Release Notes - Flink 1.11"
+---
+
+
+
+These release notes discuss important aspects, such as configuration, behavior,
+or dependencies, that changed between Flink 1.10 and Flink 1.11. Please read
+these notes carefully if you are planning to upgrade your Flink version to 
1.11.
+
+* This will be replaced by the TOC
+{:toc}
+
+### Clusters & Deployment
+#### Support for Hadoop 3.0.0 and higher ([FLINK-11086](https://issues.apache.org/jira/browse/FLINK-11086))
+The Flink project does not provide any updated "flink-shaded-hadoop-*" jars.
+Users need to provide Hadoop dependencies through the HADOOP_CLASSPATH 
environment variable (recommended) or via the `lib/` folder.
+Also, the `include-hadoop` Maven profile has been removed.
+
+#### Removal of `LegacyScheduler` ([FLINK-15629](https://issues.apache.org/jira/browse/FLINK-15629))
+Flink no longer supports the legacy scheduler. 
+Hence, setting `jobmanager.scheduler: legacy` will no longer work and will fail 
with an `IllegalArgumentException`. 
+The only valid option for `jobmanager.scheduler` is the default value `ng`.
+
+#### Bind user code class loader to lifetime of a slot ([FLINK-16408](https://issues.apache.org/jira/browse/FLINK-16408))
+The user code class loader is being reused by the `TaskExecutor` as long as 
there is at least a single slot allocated for the respective job. 
+This changes Flink's recovery behaviour slightly so that it will not reload 
static fields.
+The benefit is that this change drastically reduces pressure on the JVM's 
metaspace.
+
+#### Replaced `slave` file name with `workers` ([FLINK-18307](https://issues.apache.org/jira/browse/FLINK-18307))
+For Standalone Setups, the file with the worker nodes is no longer called 
`slaves` but `workers`.
+Previous setups that use the `start-cluster.sh` and `stop-cluster.sh` scripts 
need to rename that file.
+
+#### Flink Docker Integration Improvements
+The examples of `Dockerfiles` and docker image `build.sh` scripts have been 
removed from [the Flink Github repository](https://github.com/apache/flink). 
The examples will no longer be maintained by community in the Flink Github 
repository, including the examples of integration with Bluemix. Therefore, the 
following modules have been deleted from the Flink Github repository:
+- `flink-contrib/docker-flink`
+- `flink-container/docker`
+- `flink-container/kubernetes`
+
+Check the updated user documentation for [Flink Docker 
integration](https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/docker.html)
 instead. It now describes in detail how to 
[use](https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/docker.html#how-to-run-a-flink-image)
 and 
[customize](https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/docker.html#customize-flink-image)
 [the Flink official docker 
image](https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/docker.html#docker-hub-flink-images):
 configuration options, logging, plugins, adding more dependencies and 
installing software. The documentation also includes examples for Session and 
Job cluster deployments with:
+- [docker 
run](https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/docker.html#how-to-run-flink-image)
+- [docker 
compose](https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/docker.html#flink-with-docker-compose)
+- [docker 
swarm](https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/docker.html#flink-with-docker-swarm)
+- [standalone 
Kubernetes](https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/kubernetes.html)
+
+### Memory Management
+ New Flink Master Memory Model
+# Overview
+With 
[FLIP-116](https://cwiki.apache.org/confluence/display/FLINK/FLIP-116%3A+Unified+Memory+Configuration+for+Job+Managers),
 a new memory model has been introduced for the Flink Master. New configuration 
options have been introduced to control the memory consumption of the Flink 
Master process. This affects all types of deployments: standalone, YARN, Mesos, 
and the new active Kubernetes integration.
+
+Please check the user documentation for [more 
details](https://ci.apache.org/projects/flink/flink-docs-master/ops/memory/mem_setup_master.html).
+
+If you try to reuse your previous Flink configuration without any adjustments, 
the new memory model can result in differently computed memory parameters for 
the JVM and, thus, performance changes or even failures. See also [the 
migration 
guide](https://ci.apache.org/projects/flink/flink-docs-master/ops/memory/mem_migration.html#migrate-job-manager-memory-configuration).
+
+# Deprecation and breaking changes
+The following options are deprecated:
+ * `jobmanager.heap.size`
+ * 

[GitHub] [flink] pnowojski commented on a change in pull request #12699: [FLINK-18349][docs] Add release notes for Flink 1.11

2020-06-25 Thread GitBox


pnowojski commented on a change in pull request #12699:
URL: https://github.com/apache/flink/pull/12699#discussion_r445496733



##
File path: docs/release-notes/flink-1.11.md
##
@@ -0,0 +1,291 @@
+---
+title: "Release Notes - Flink 1.11"
+---
+
+
+
+These release notes discuss important aspects, such as configuration, behavior,
+or dependencies, that changed between Flink 1.10 and Flink 1.11. Please read
+these notes carefully if you are planning to upgrade your Flink version to 
1.11.
+
+* This will be replaced by the TOC
+{:toc}
+
+### Clusters & Deployment
+ Support for Hadoop 3.0.0 and higher 
([FLINK-11086](https://issues.apache.org/jira/browse/FLINK-11086))
+The Flink project does not provide any updated "flink-shaded-hadoop-*" jars.
+Users need to provide Hadoop dependencies through the HADOOP_CLASSPATH 
environment variable (recommended) or via `lib/` folder.
+Also, the `include-hadoop` Maven profile has been removed.
+
+ Removal of `LegacyScheduler` 
([FLINK-15629](https://issues.apache.org/jira/browse/FLINK-15629))
+Flink no longer supports the legacy scheduler. 
+Hence, setting `jobmanager.scheduler: legacy` will no longer work and fail 
with an `IllegalArgumentException`. 
+The only valid option for `jobmanager.scheduler` is the default value `ng`.
+
+ Bind user code class loader to lifetime of a slot 
([FLINK-16408](https://issues.apache.org/jira/browse/FLINK-16408))
+The user code class loader is being reused by the `TaskExecutor` as long as 
there is at least a single slot allocated for the respective job. 
+This changes Flink's recovery behaviour slightly so that it will not reload 
static fields.
+The benefit is that this change drastically reduces pressure on the JVM's 
metaspace.
+
+ Replaced `slave` file name with `workers` 
([FLINK-18307](https://issues.apache.org/jira/browse/FLINK-18307))
+For Standalone Setups, the file with the worker nodes is no longer called 
`slaves` but `workers`.
+Previous setups that use the `start-cluster.sh` and `stop-cluster.sh` scripts 
need to rename that file.
+
+ Flink Docker Integration Improvements
+The examples of `Dockerfiles` and docker image `build.sh` scripts have been 
removed from [the Flink Github repository](https://github.com/apache/flink). 
The examples will no longer be maintained by the community in the Flink Github 
repository, including the examples of integration with Bluemix. Therefore, the 
following modules have been deleted from the Flink Github repository:
+- `flink-contrib/docker-flink`
+- `flink-container/docker`
+- `flink-container/kubernetes`
+
+Check the updated user documentation for [Flink Docker 
integration](https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/docker.html)
 instead. It now describes in detail how to 
[use](https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/docker.html#how-to-run-a-flink-image)
 and 
[customize](https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/docker.html#customize-flink-image)
 [the Flink official docker 
image](https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/docker.html#docker-hub-flink-images):
 configuration options, logging, plugins, adding more dependencies and 
installing software. The documentation also includes examples for Session and 
Job cluster deployments with:
+- [docker 
run](https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/docker.html#how-to-run-flink-image)
+- [docker 
compose](https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/docker.html#flink-with-docker-compose)
+- [docker 
swarm](https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/docker.html#flink-with-docker-swarm)
+- [standalone 
Kubernetes](https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/kubernetes.html)
+
+### Memory Management
+ New Flink Master Memory Model
+# Overview
+With 
[FLIP-116](https://cwiki.apache.org/confluence/display/FLINK/FLIP-116%3A+Unified+Memory+Configuration+for+Job+Managers),
 a new memory model has been introduced for the Flink Master. New configuration 
options have been introduced to control the memory consumption of the Flink 
Master process. This affects all types of deployments: standalone, YARN, Mesos, 
and the new active Kubernetes integration.
+
+Please check the user documentation for [more 
details](https://ci.apache.org/projects/flink/flink-docs-master/ops/memory/mem_setup_master.html).
+
+If you try to reuse your previous Flink configuration without any adjustments, 
the new memory model can result in differently computed memory parameters for 
the JVM and, thus, performance changes or even failures. See also [the 
migration 
guide](https://ci.apache.org/projects/flink/flink-docs-master/ops/memory/mem_migration.html#migrate-job-manager-memory-configuration).
+
+# Deprecation and breaking changes
+The following options are deprecated:
+ * `jobmanager.heap.size`
+ * 

[GitHub] [flink] flinkbot commented on pull request #12769: [FLINK-15414] catch the right KafkaException for binding errors

2020-06-25 Thread GitBox


flinkbot commented on pull request #12769:
URL: https://github.com/apache/flink/pull/12769#issuecomment-649482720


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit e4671572632d30c334c40d726409ea56152e2c1a (Thu Jun 25 
11:29:10 UTC 2020)
   
   **Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
* **This pull request references an unassigned [Jira 
ticket](https://issues.apache.org/jira/browse/FLINK-15414).** According to the 
[code contribution 
guide](https://flink.apache.org/contributing/contribute-code.html), tickets 
need to be assigned before starting with the implementation work.
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into to Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.

   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] curcur opened a new pull request #12769: [FLINK-15414] catch the right KafkaException for binding errors

2020-06-25 Thread GitBox


curcur opened a new pull request #12769:
URL: https://github.com/apache/flink/pull/12769


   The right exception should be `org.apache.kafka.common.KafkaException` 
instead of `kafka.common.KafkaException`
   
   At least the "numTries" exception should be thrown instead of the binding 
exception.
   
   ## What is the purpose of the change
   
   Currently, the binding exception is never caught in `KafkaTestEnvironmentImpl`, 
hence retries with different port allocations never happen.
   
   Use the right KafkaException in this fix.
   
   
   ## Brief change log
   catch `org.apache.kafka.common.KafkaException` instead of 
`kafka.common.KafkaException` when binding fails to enable retry logic.
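
   A minimal sketch of the retry pattern under discussion (the method name
`startWithRetries` and the supplier are illustrative, not Flink's actual code):
the broker start-up surfaces a bind failure as
`org.apache.kafka.common.KafkaException`, so catching
`kafka.common.KafkaException` leaves the retry loop dead.

   ```java
   import org.apache.kafka.common.KafkaException;
   import java.util.function.Supplier;

   public class BrokerStartRetry {
       // Retry broker start-up on a bind failure; Kafka reports it as
       // org.apache.kafka.common.KafkaException ("Socket server failed to bind").
       public static <T> T startWithRetries(int numTries, Supplier<T> startBroker) {
           KafkaException lastFailure = null;
           for (int attempt = 1; attempt <= numTries; attempt++) {
               try {
                   return startBroker.get();
               } catch (KafkaException e) { // the org.apache.kafka.common variant
                   lastFailure = e;         // pick a new port and try again
               }
           }
           // surface the "numTries" failure instead of the raw binding exception
           throw new IllegalStateException(
                   "Broker did not start after " + numTries + " tries", lastFailure);
       }
   }
   ```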
   
   
   ## Verifying this change
   
   TBD
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
 - The serializers: (no)
 - The runtime per-record code paths (performance sensitive): (no)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (no)
 - The S3 file system connector: (no)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (no)
 - If yes, how is the feature documented? (not applicable)
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-15414) KafkaITCase#prepare failed in travis

2020-06-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-15414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-15414:
---
Labels: pull-request-available test-stability  (was: test-stability)

> KafkaITCase#prepare failed in travis
> 
>
> Key: FLINK-15414
> URL: https://issues.apache.org/jira/browse/FLINK-15414
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka, Tests
>Affects Versions: 1.12.0
>Reporter: Dian Fu
>Priority: Major
>  Labels: pull-request-available, test-stability
>
> The travis for release-1.9 failed with the following error:
> {code}
> org.apache.kafka.common.KafkaException: Socket server failed to bind to 
> 0.0.0.0:44867: Address already in use.
>   at 
> org.apache.flink.streaming.connectors.kafka.KafkaITCase.prepare(KafkaITCase.java:58)
> Caused by: java.net.BindException: Address already in use
>   at 
> org.apache.flink.streaming.connectors.kafka.KafkaITCase.prepare(KafkaITCase.java:58)
> {code}
> instance: [https://api.travis-ci.org/v3/job/629636116/log.txt]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] azagrebin commented on a change in pull request #11245: [FLINK-15794][Kubernetes] Generate the Kubernetes default image version

2020-06-25 Thread GitBox


azagrebin commented on a change in pull request #11245:
URL: https://github.com/apache/flink/pull/11245#discussion_r445467634



##
File path: docs/ops/deployment/kubernetes.md
##
@@ -262,7 +262,7 @@ spec:
 spec:
   containers:
   - name: jobmanager
-image: flink:{% if site.is_stable 
%}{{site.version}}-scala{{site.scala_version_suffix}}{% else %}latest{% endif %}
+image: flink:{% if site.is_stable 
%}{{site.version}}-scala{{site.scala_version_suffix}}{% else %}latest # The 
'latest' tag contains the latest released version of Flink for a specific Scala 
version which will mismatch with your application over time.{% endif %}

Review comment:
   We added a 
[note](https://github.com/apache/flink/pull/11245/files#diff-a530b60989203f9e1f03c64c57deb56cR49)
 for this, which can be promoted to a warning.
   I think the idea here is to keep the comment in this particular line (and 
others like this) brief, as it is a code example.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] zentol commented on a change in pull request #11245: [FLINK-15794][Kubernetes] Generate the Kubernetes default image version

2020-06-25 Thread GitBox


zentol commented on a change in pull request #11245:
URL: https://github.com/apache/flink/pull/11245#discussion_r445462983



##
File path: docs/ops/deployment/kubernetes.md
##
@@ -262,7 +262,7 @@ spec:
 spec:
   containers:
   - name: jobmanager
-image: flink:{% if site.is_stable 
%}{{site.version}}-scala{{site.scala_version_suffix}}{% else %}latest{% endif %}
+image: flink:{% if site.is_stable 
%}{{site.version}}-scala{{site.scala_version_suffix}}{% else %}latest # The 
'latest' tag contains the latest released version of Flink for a specific Scala 
version which will mismatch with your application over time.{% endif %}

Review comment:
   I think we should be way more explicit about this; Scala users should 
_never_ use `latest`; it is simply too easy to run into issues. If we release a 
newer version with Scala 2.13 support, we suddenly break everything.
   I would explicitly phrase it like that, with a big warning sign: use it only if 
you are only using Java; otherwise select a specific Flink version with a 
matching Scala version.
   
   We should also look into providing a `latest_` tag to make 
this easier for scala users.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] zentol commented on a change in pull request #11245: [FLINK-15794][Kubernetes] Generate the Kubernetes default image version

2020-06-25 Thread GitBox


zentol commented on a change in pull request #11245:
URL: https://github.com/apache/flink/pull/11245#discussion_r445462983



##
File path: docs/ops/deployment/kubernetes.md
##
@@ -262,7 +262,7 @@ spec:
 spec:
   containers:
   - name: jobmanager
-image: flink:{% if site.is_stable 
%}{{site.version}}-scala{{site.scala_version_suffix}}{% else %}latest{% endif %}
+image: flink:{% if site.is_stable 
%}{{site.version}}-scala{{site.scala_version_suffix}}{% else %}latest # The 
'latest' tag contains the latest released version of Flink for a specific Scala 
version which will mismatch with your application over time.{% endif %}

Review comment:
   I think we should be way more explicit about this; Scala users should 
_never_ use `latest`; it is simply too easy to run into issues. If we release a 
newer version with Scala 2.13 support, we suddenly break everything.
   I would explicitly phrase it like that; use it only if you are only using 
Java; otherwise select a specific Flink version with a matching Scala version.
   
   We should also look into providing a `latest_` tag to make 
this easier for scala users.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Closed] (FLINK-18432) add open and close methods for ElasticsearchSinkFunction interface

2020-06-25 Thread rinkako (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rinkako closed FLINK-18432.
---
Fix Version/s: 1.12.0
   Resolution: Duplicate

> add open and close methods for ElasticsearchSinkFunction interface
> --
>
> Key: FLINK-18432
> URL: https://issues.apache.org/jira/browse/FLINK-18432
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / ElasticSearch
>Reporter: rinkako
>Priority: Major
>  Labels: usability
> Fix For: 1.12.0
>
>
> Here comes an example story: we want to sink data to ES with a day-rolling 
> index name, by using `DateTimeFormatter` and `ZonedDateTime` for generating a 
> day-rolling postfix to add to the index pattern of `return 
> Requests.indexRequest().index("mydoc_"+postfix)` in a custom 
> `ElasticsearchSinkFunction`. Here `DateTimeFormatter` is not a serializable 
> class and must be a transient field of this `ElasticsearchSinkFunction`, and 
> it must be checked for null every time we call `process` (since the field is 
> transient, it may be null on a distributed task manager), which could instead 
> be done in an `open` method that runs only once.
> So it seems that adding `open` and `close` methods to the 
> `ElasticsearchSinkFunction` interface can handle this well: users can control 
> their sink function life-cycle more flexibly. And, for compatibility, these 
> two methods may have an empty default implementation in the 
> `ElasticsearchSinkFunction` interface.
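
For illustration, a minimal sketch of the pattern described above, assuming the 
no-arg `open()` default method that FLINK-17623 added to the interface (the 
class name, index pattern, and exact signature are assumptions):

{code:java}
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

import org.apache.flink.api.common.functions.RuntimeContext;
import org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkFunction;
import org.apache.flink.streaming.connectors.elasticsearch.RequestIndexer;
import org.elasticsearch.client.Requests;

public class DayRollingSinkFunction implements ElasticsearchSinkFunction<String> {

    // DateTimeFormatter is not Serializable, so the field must be transient.
    private transient DateTimeFormatter dayFormatter;

    @Override
    public void open() {
        // Runs once per task, replacing the per-record null check in process().
        dayFormatter = DateTimeFormatter.ofPattern("yyyyMMdd");
    }

    @Override
    public void process(String element, RuntimeContext ctx, RequestIndexer indexer) {
        String postfix = ZonedDateTime.now().format(dayFormatter);
        indexer.add(Requests.indexRequest()
                .index("mydoc_" + postfix)
                .source("data", element));
    }
}
{code}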



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-18432) add open and close methods for ElasticsearchSinkFunction interface

2020-06-25 Thread rinkako (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17144821#comment-17144821
 ] 

rinkako commented on FLINK-18432:
-

Thanks [~jark], this is actually what I meant, and it is already fixed in that 
issue. I'll close my issue. :D

> add open and close methods for ElasticsearchSinkFunction interface
> --
>
> Key: FLINK-18432
> URL: https://issues.apache.org/jira/browse/FLINK-18432
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / ElasticSearch
>Reporter: rinkako
>Priority: Major
>  Labels: usability
>
> Here comes an example story: we want to sink data to ES with a day-rolling 
> index name, by using `DateTimeFormatter` and `ZonedDateTime` for generating a 
> day-rolling postfix to add to the index pattern of `return 
> Requests.indexRequest().index("mydoc_"+postfix)` in a custom 
> `ElasticsearchSinkFunction`. Here `DateTimeFormatter` is not a serializable 
> class and must be a transient field of this `ElasticsearchSinkFunction`, and 
> it must be checked for null every time we call `process` (since the field is 
> transient, it may be null on a distributed task manager), which could instead 
> be done in an `open` method that runs only once.
> So it seems that adding `open` and `close` methods to the 
> `ElasticsearchSinkFunction` interface can handle this well: users can control 
> their sink function life-cycle more flexibly. And, for compatibility, these 
> two methods may have an empty default implementation in the 
> `ElasticsearchSinkFunction` interface.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-18427) Job failed under java 11

2020-06-25 Thread Andrey Zagrebin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Zagrebin updated FLINK-18427:

Component/s: (was: API / DataStream)
 (was: API / Core)
 Runtime / Network
 Runtime / Configuration

> Job failed under java 11
> 
>
> Key: FLINK-18427
> URL: https://issues.apache.org/jira/browse/FLINK-18427
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Configuration, Runtime / Network
>Reporter: Zhang Hao
>Priority: Critical
>
> flink version: 1.10.0
> deployment mode: cluster
> os: linux redhat 7.5
> Job parallelism: greater than 1
> My job runs normally under Java 8, but fails under Java 11. Exception info is 
> below: Netty fails to send a message. In addition, I found the job would fail 
> when tasks were distributed across multiple nodes; if I set the job's 
> parallelism to 1, the job runs normally under Java 11 too.
>  
> 2020-06-24 09:52:162020-06-24 
> 09:52:16org.apache.flink.runtime.io.network.netty.exception.LocalTransportException:
>  Sending the partition request to '/170.0.50.19:33320' failed. at 
> org.apache.flink.runtime.io.network.netty.NettyPartitionRequestClient$1.operationComplete(NettyPartitionRequestClient.java:124)
>  at 
> org.apache.flink.runtime.io.network.netty.NettyPartitionRequestClient$1.operationComplete(NettyPartitionRequestClient.java:115)
>  at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:500)
>  at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:474)
>  at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:413)
>  at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:538)
>  at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:531)
>  at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:111)
>  at 
> org.apache.flink.shaded.netty4.io.netty.util.internal.PromiseNotificationUtil.tryFailure(PromiseNotificationUtil.java:64)
>  at 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.notifyOutboundHandlerException(AbstractChannelHandlerContext.java:818)
>  at 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:718)
>  at 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:708)
>  at 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.access$1700(AbstractChannelHandlerContext.java:56)
>  at 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:1102)
>  at 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:1149)
>  at 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:1073)
>  at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
>  at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:416)
>  at 
> org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:515)
>  at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918)
>  at 
> org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>  at java.base/java.lang.Thread.run(Thread.java:834)Caused by: 
> java.io.IOException: Error while serializing message: 
> PartitionRequest(8059a0b47f7ba0ff814ea52427c584e7@6750c1170c861176ad3ceefe9b02f36e:0:2)
>  at 
> org.apache.flink.runtime.io.network.netty.NettyMessage$NettyMessageEncoder.write(NettyMessage.java:177)
>  at 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:716)
>  ... 11 moreCaused by: java.io.IOException: java.lang.OutOfMemoryError: 
> Direct buffer memory at 
> org.apache.flink.runtime.io.network.netty.NettyMessage$PartitionRequest.write(NettyMessage.java:497)
>  at 
> org.apache.flink.runtime.io.network.netty.NettyMessage$NettyMessageEncoder.write(NettyMessage.java:174)
>  ... 12 moreCaused by: java.lang.OutOfMemoryError: Direct buffer memory at 
> java.base/java.nio.Bits.reserveMemory(Bits.java:175) at 
> 

[jira] [Updated] (FLINK-18427) Job failed under java 11

2020-06-25 Thread Andrey Zagrebin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Zagrebin updated FLINK-18427:

Affects Version/s: 1.10.0

> Job failed under java 11
> 
>
> Key: FLINK-18427
> URL: https://issues.apache.org/jira/browse/FLINK-18427
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Configuration, Runtime / Network
>Affects Versions: 1.10.0
>Reporter: Zhang Hao
>Priority: Critical
>
> flink version: 1.10.0
> deployment mode: cluster
> os: linux redhat 7.5
> Job parallelism: greater than 1
> My job runs normally under Java 8, but fails under Java 11. Exception info is 
> below: Netty fails to send a message. In addition, I found the job would fail 
> when tasks were distributed across multiple nodes; if I set the job's 
> parallelism to 1, the job runs normally under Java 11 too.
>  
> 2020-06-24 09:52:162020-06-24 
> 09:52:16org.apache.flink.runtime.io.network.netty.exception.LocalTransportException:
>  Sending the partition request to '/170.0.50.19:33320' failed. at 
> org.apache.flink.runtime.io.network.netty.NettyPartitionRequestClient$1.operationComplete(NettyPartitionRequestClient.java:124)
>  at 
> org.apache.flink.runtime.io.network.netty.NettyPartitionRequestClient$1.operationComplete(NettyPartitionRequestClient.java:115)
>  at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:500)
>  at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:474)
>  at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:413)
>  at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:538)
>  at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:531)
>  at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:111)
>  at 
> org.apache.flink.shaded.netty4.io.netty.util.internal.PromiseNotificationUtil.tryFailure(PromiseNotificationUtil.java:64)
>  at 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.notifyOutboundHandlerException(AbstractChannelHandlerContext.java:818)
>  at 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:718)
>  at 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:708)
>  at 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.access$1700(AbstractChannelHandlerContext.java:56)
>  at 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:1102)
>  at 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:1149)
>  at 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:1073)
>  at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
>  at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:416)
>  at 
> org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:515)
>  at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918)
>  at 
> org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>  at java.base/java.lang.Thread.run(Thread.java:834)Caused by: 
> java.io.IOException: Error while serializing message: 
> PartitionRequest(8059a0b47f7ba0ff814ea52427c584e7@6750c1170c861176ad3ceefe9b02f36e:0:2)
>  at 
> org.apache.flink.runtime.io.network.netty.NettyMessage$NettyMessageEncoder.write(NettyMessage.java:177)
>  at 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:716)
>  ... 11 moreCaused by: java.io.IOException: java.lang.OutOfMemoryError: 
> Direct buffer memory at 
> org.apache.flink.runtime.io.network.netty.NettyMessage$PartitionRequest.write(NettyMessage.java:497)
>  at 
> org.apache.flink.runtime.io.network.netty.NettyMessage$NettyMessageEncoder.write(NettyMessage.java:174)
>  ... 12 moreCaused by: java.lang.OutOfMemoryError: Direct buffer memory at 
> java.base/java.nio.Bits.reserveMemory(Bits.java:175) at 
> java.base/java.nio.DirectByteBuffer.(DirectByteBuffer.java:118) at 
> 

[GitHub] [flink] azagrebin commented on a change in pull request #11245: [FLINK-15794][Kubernetes] Generate the Kubernetes default image version

2020-06-25 Thread GitBox


azagrebin commented on a change in pull request #11245:
URL: https://github.com/apache/flink/pull/11245#discussion_r445404435



##
File path: 
flink-kubernetes/src/main/java/org/apache/flink/kubernetes/configuration/KubernetesConfigOptions.java
##
@@ -137,11 +139,17 @@
.withDescription("The cluster-id, which should be no more than 
45 characters, is used for identifying " +
"a unique Flink cluster. If not set, the client will 
automatically generate it with a random ID.");
 
+   // The default container image that ties to the exact needed versions 
of both Flink and Scala.
+   public static final String DEFAULT_CONTAINER_IMAGE = "flink:" + 
EnvironmentInformation.getVersion() + "-scala_" + 
EnvironmentInformation.getScalaVersion();
+
+   @Documentation.OverrideDefault("The default value depends on the 
actually running version. In general it looks like 
\"flink_-scala_\"")

Review comment:
   ```suggestion
@Documentation.OverrideDefault("The default value depends on the 
actually running version. In general it looks like 
\"flink:-scala_\"")
   ```

##
File path: docs/ops/deployment/kubernetes.md
##
@@ -262,7 +262,7 @@ spec:
 spec:
   containers:
   - name: jobmanager
-image: flink:{% if site.is_stable 
%}{{site.version}}-scala{{site.scala_version_suffix}}{% else %}latest{% endif %}
+image: flink:{% if site.is_stable 
%}{{site.version}}-scala{{site.scala_version_suffix}}{% else %}latest # The 
'latest' tag contains the latest released version of Flink for a specific Scala 
version which will mismatch with your application over time.{% endif %}

Review comment:
   How about this:
   ```
   flink:latest # The 'latest' tag contains the latest released version of 
Flink for a specific Scala version which can conflict with the versions, used 
by your application.
   ```
   Somehow this sounds confusing:
   `which will mismatch with your application over time`, in particular `over 
time`.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-18433) From the end-to-end performance test results, 1.11 has a regression

2020-06-25 Thread Yu Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17144814#comment-17144814
 ] 

Yu Li commented on FLINK-18433:
---

In parallel with the manual analysis, is it worth using binary search on the 
benchmark with the most noticeable regression, i.e. 
{{TwoInputs_KeyBy_Eager_AtLeastOnce_100_heap}}, to locate the problematic commit?

> From the end-to-end performance test results, 1.11 has a regression
> ---
>
> Key: FLINK-18433
> URL: https://issues.apache.org/jira/browse/FLINK-18433
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core, API / DataStream
>Affects Versions: 1.11.0
> Environment: 3 machines
> [https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations_1.11/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
>Reporter: Aihua Li
>Priority: Major
>
>  
> I ran end-to-end performance tests between Release-1.10 and Release-1.11. 
> The results were as follows:
> |scenarioName|release-1.10|release-1.11|regression|
> |OneInput_Broadcast_LazyFromSource_ExactlyOnce_10_rocksdb|46.175|43.8133|-5.11%|
> |OneInput_Rescale_LazyFromSource_ExactlyOnce_100_heap|211.835|200.355|-5.42%|
> |OneInput_Rebalance_LazyFromSource_ExactlyOnce_1024_rocksdb|1721.041667|1618.32|-5.97%|
> |OneInput_KeyBy_LazyFromSource_ExactlyOnce_10_heap|46|43.615|-5.18%|
> |OneInput_Broadcast_Eager_ExactlyOnce_100_rocksdb|212.105|199.688|-5.85%|
> |OneInput_Rescale_Eager_ExactlyOnce_1024_heap|1754.64|1600.12|-8.81%|
> |OneInput_Rebalance_Eager_ExactlyOnce_10_rocksdb|45.9167|43.0983|-6.14%|
> |OneInput_KeyBy_Eager_ExactlyOnce_100_heap|212.0816667|200.727|-5.35%|
> |OneInput_Broadcast_LazyFromSource_AtLeastOnce_1024_rocksdb|1718.245|1614.381667|-6.04%|
> |OneInput_Rescale_LazyFromSource_AtLeastOnce_10_heap|46.12|43.5517|-5.57%|
> |OneInput_Rebalance_LazyFromSource_AtLeastOnce_100_rocksdb|212.038|200.388|-5.49%|
> |OneInput_KeyBy_LazyFromSource_AtLeastOnce_1024_heap|1762.048333|1606.408333|-8.83%|
> |OneInput_Broadcast_Eager_AtLeastOnce_10_rocksdb|46.0583|43.4967|-5.56%|
> |OneInput_Rescale_Eager_AtLeastOnce_100_heap|212.233|201.188|-5.20%|
> |OneInput_Rebalance_Eager_AtLeastOnce_1024_rocksdb|1720.66|1616.85|-6.03%|
> |OneInput_KeyBy_Eager_AtLeastOnce_10_heap|46.14|43.6233|-5.45%|
> |TwoInputs_Broadcast_LazyFromSource_ExactlyOnce_100_rocksdb|156.918|152.957|-2.52%|
> |TwoInputs_Rescale_LazyFromSource_ExactlyOnce_1024_heap|1415.511667|1300.1|-8.15%|
> |TwoInputs_Rebalance_LazyFromSource_ExactlyOnce_10_rocksdb|34.2967|34.1667|-0.38%|
> |TwoInputs_KeyBy_LazyFromSource_ExactlyOnce_100_heap|158.353|151.848|-4.11%|
> |TwoInputs_Broadcast_Eager_ExactlyOnce_1024_rocksdb|1373.406667|1300.056667|-5.34%|
> |TwoInputs_Rescale_Eager_ExactlyOnce_10_heap|34.5717|32.0967|-7.16%|
> |TwoInputs_Rebalance_Eager_ExactlyOnce_100_rocksdb|158.655|147.44|-7.07%|
> |TwoInputs_KeyBy_Eager_ExactlyOnce_1024_heap|1356.611667|1292.386667|-4.73%|
> |TwoInputs_Broadcast_LazyFromSource_AtLeastOnce_10_rocksdb|34.01|33.205|-2.37%|
> |TwoInputs_Rescale_LazyFromSource_AtLeastOnce_100_heap|149.588|145.997|-2.40%|
> |TwoInputs_Rebalance_LazyFromSource_AtLeastOnce_1024_rocksdb|1359.74|1299.156667|-4.46%|
> |TwoInputs_KeyBy_LazyFromSource_AtLeastOnce_10_heap|34.025|29.6833|-12.76%|
> |TwoInputs_Broadcast_Eager_AtLeastOnce_100_rocksdb|157.303|151.4616667|-3.71%|
> |TwoInputs_Rescale_Eager_AtLeastOnce_1024_heap|1368.74|1293.238333|-5.52%|
> |TwoInputs_Rebalance_Eager_AtLeastOnce_10_rocksdb|34.325|33.285|-3.03%|
> |TwoInputs_KeyBy_Eager_AtLeastOnce_100_heap|162.5116667|134.375|-17.31%|
> It can be seen that the performance of 1.11 has a regression, basically 
> around 5%, and the maximum regression is 17%. This needs to be checked.
> the test code:
> flink-1.10.0: 
> [https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
> flink-1.11.0: 
> [https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations_1.11/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
> submit cmd like this:
> bin/flink run -d -m 192.168.39.246:8081 -c 
> org.apache.flink.basic.operations.PerformanceTestJob 
> /home/admin/flink-basic-operations_2.11-1.10-SNAPSHOT.jar --topologyName 
> OneInput --LogicalAttributesofEdges Broadcast --ScheduleMode LazyFromSource 
> --CheckpointMode ExactlyOnce --recordSize 10 --stateBackend rocksdb
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-18433) From the end-to-end performance test results, 1.11 has a regression

2020-06-25 Thread Yu Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17144812#comment-17144812
 ] 

Yu Li commented on FLINK-18433:
---

And IMHO the most noticeable regressions seem to be barrier alignment related:

|TwoInputs_KeyBy_LazyFromSource_ExactlyOnce_100_heap|158.353|151.848|-4.11%|
|TwoInputs_KeyBy_LazyFromSource_AtLeastOnce_10_heap|34.025|29.6833|-12.76%|
|TwoInputs_KeyBy_Eager_ExactlyOnce_1024_heap|1356.611667|1292.386667|-4.73%|
|TwoInputs_KeyBy_Eager_AtLeastOnce_100_heap|162.5116667|134.375|-17.31%|

> From the end-to-end performance test results, 1.11 has a regression
> ---
>
> Key: FLINK-18433
> URL: https://issues.apache.org/jira/browse/FLINK-18433
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core, API / DataStream
>Affects Versions: 1.11.0
> Environment: 3 machines
> [https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations_1.11/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
>Reporter: Aihua Li
>Priority: Major
>
>  
> I ran end-to-end performance tests between Release-1.10 and Release-1.11. 
> The results were as follows:
> |scenarioName|release-1.10|release-1.11|regression|
> |OneInput_Broadcast_LazyFromSource_ExactlyOnce_10_rocksdb|46.175|43.8133|-5.11%|
> |OneInput_Rescale_LazyFromSource_ExactlyOnce_100_heap|211.835|200.355|-5.42%|
> |OneInput_Rebalance_LazyFromSource_ExactlyOnce_1024_rocksdb|1721.041667|1618.32|-5.97%|
> |OneInput_KeyBy_LazyFromSource_ExactlyOnce_10_heap|46|43.615|-5.18%|
> |OneInput_Broadcast_Eager_ExactlyOnce_100_rocksdb|212.105|199.688|-5.85%|
> |OneInput_Rescale_Eager_ExactlyOnce_1024_heap|1754.64|1600.12|-8.81%|
> |OneInput_Rebalance_Eager_ExactlyOnce_10_rocksdb|45.9167|43.0983|-6.14%|
> |OneInput_KeyBy_Eager_ExactlyOnce_100_heap|212.0816667|200.727|-5.35%|
> |OneInput_Broadcast_LazyFromSource_AtLeastOnce_1024_rocksdb|1718.245|1614.381667|-6.04%|
> |OneInput_Rescale_LazyFromSource_AtLeastOnce_10_heap|46.12|43.5517|-5.57%|
> |OneInput_Rebalance_LazyFromSource_AtLeastOnce_100_rocksdb|212.038|200.388|-5.49%|
> |OneInput_KeyBy_LazyFromSource_AtLeastOnce_1024_heap|1762.048333|1606.408333|-8.83%|
> |OneInput_Broadcast_Eager_AtLeastOnce_10_rocksdb|46.0583|43.4967|-5.56%|
> |OneInput_Rescale_Eager_AtLeastOnce_100_heap|212.233|201.188|-5.20%|
> |OneInput_Rebalance_Eager_AtLeastOnce_1024_rocksdb|1720.66|1616.85|-6.03%|
> |OneInput_KeyBy_Eager_AtLeastOnce_10_heap|46.14|43.6233|-5.45%|
> |TwoInputs_Broadcast_LazyFromSource_ExactlyOnce_100_rocksdb|156.918|152.957|-2.52%|
> |TwoInputs_Rescale_LazyFromSource_ExactlyOnce_1024_heap|1415.511667|1300.1|-8.15%|
> |TwoInputs_Rebalance_LazyFromSource_ExactlyOnce_10_rocksdb|34.2967|34.1667|-0.38%|
> |TwoInputs_KeyBy_LazyFromSource_ExactlyOnce_100_heap|158.353|151.848|-4.11%|
> |TwoInputs_Broadcast_Eager_ExactlyOnce_1024_rocksdb|1373.406667|1300.056667|-5.34%|
> |TwoInputs_Rescale_Eager_ExactlyOnce_10_heap|34.5717|32.0967|-7.16%|
> |TwoInputs_Rebalance_Eager_ExactlyOnce_100_rocksdb|158.655|147.44|-7.07%|
> |TwoInputs_KeyBy_Eager_ExactlyOnce_1024_heap|1356.611667|1292.386667|-4.73%|
> |TwoInputs_Broadcast_LazyFromSource_AtLeastOnce_10_rocksdb|34.01|33.205|-2.37%|
> |TwoInputs_Rescale_LazyFromSource_AtLeastOnce_100_heap|149.588|145.997|-2.40%|
> |TwoInputs_Rebalance_LazyFromSource_AtLeastOnce_1024_rocksdb|1359.74|1299.156667|-4.46%|
> |TwoInputs_KeyBy_LazyFromSource_AtLeastOnce_10_heap|34.025|29.6833|-12.76%|
> |TwoInputs_Broadcast_Eager_AtLeastOnce_100_rocksdb|157.303|151.4616667|-3.71%|
> |TwoInputs_Rescale_Eager_AtLeastOnce_1024_heap|1368.74|1293.238333|-5.52%|
> |TwoInputs_Rebalance_Eager_AtLeastOnce_10_rocksdb|34.325|33.285|-3.03%|
> |TwoInputs_KeyBy_Eager_AtLeastOnce_100_heap|162.5116667|134.375|-17.31%|
> It can be seen that the performance of 1.11 has a regression, basically 
> around 5%, and the maximum regression is 17%. This needs to be checked.
> the test code:
> flink-1.10.0: 
> [https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
> flink-1.11.0: 
> [https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations_1.11/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
> submit cmd like this:
> bin/flink run -d -m 192.168.39.246:8081 -c 
> org.apache.flink.basic.operations.PerformanceTestJob 
> /home/admin/flink-basic-operations_2.11-1.10-SNAPSHOT.jar --topologyName 
> OneInput --LogicalAttributesofEdges Broadcast --ScheduleMode LazyFromSource 

[jira] [Commented] (FLINK-18430) Upgrade stability to @Public for CheckpointedFunction and CheckpointListener

2020-06-25 Thread Chesnay Schepler (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17144807#comment-17144807
 ] 

Chesnay Schepler commented on FLINK-18430:
--

I suppose the proper way would be to duplicate the interface in some API 
module, have the runtime version extend this new interface, and deprecate the 
runtime version.
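
A minimal sketch of that idea (module and package names are assumptions, not 
the actual Flink layout; in reality both interfaces would keep the same simple 
name in different packages, shown here with different names so the sketch 
compiles as one file):

{code:java}
// 1) New copy of the interface in an API module, e.g. under
//    org.apache.flink.api.common.state in flink-core:
public interface ApiCheckpointListener {
    void notifyCheckpointComplete(long checkpointId) throws Exception;
}

// 2) The old interface in org.apache.flink.runtime.state stays source- and
//    binary-compatible, but is deprecated and only extends the API version:
@Deprecated
interface RuntimeCheckpointListener extends ApiCheckpointListener {}
{code}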

> Upgrade stability to @Public for CheckpointedFunction and CheckpointListener
> 
>
> Key: FLINK-18430
> URL: https://issues.apache.org/jira/browse/FLINK-18430
> Project: Flink
>  Issue Type: Improvement
>  Components: API / DataStream
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
>Priority: Critical
> Fix For: 1.11.0
>
>
> The two interfaces {{CheckpointedFunction}} and {{CheckpointListener}} are 
> used by many users, but are still (for years now) marked as 
> {{@PublicEvolving}}.
> I think this is not correct. They are very core to the DataStream API and are 
> used widely and should be treated as {{@Public}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-18433) From the end-to-end performance test results, 1.11 has a regression

2020-06-25 Thread Yu Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17144797#comment-17144797
 ] 

Yu Li commented on FLINK-18433:
---

First of all, are you using the latest release-1.11 branch, some RC tag, or 
some older release-1.11? I'd mainly like to know whether the changes of 
FLINK-17800 are in your test branch [~Aihua], thanks.

Checking all the [changes|https://s.apache.org/b7xiw] in (latest) 1.11.0 but 
not in 1.10.0, I think the only suspect is FLINK-17865 (with a low chance, 
though). Maybe it's worth trying to set `state.backend.fs.memory-threshold` to 
`1 kb` and checking the result [~Aihua].

btw, we've also watched the micro-benchmarks and didn't see any regression 
before, which indicates it's worth getting FLINK-14917 in to guard our release.

> From the end-to-end performance test results, 1.11 has a regression
> ---
>
> Key: FLINK-18433
> URL: https://issues.apache.org/jira/browse/FLINK-18433
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core, API / DataStream
>Affects Versions: 1.11.0
> Environment: 3 machines
> [https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations_1.11/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
>Reporter: Aihua Li
>Priority: Major
>
>  
> I ran end-to-end performance tests between Release-1.10 and Release-1.11. 
> The results were as follows:
> |scenarioName|release-1.10|release-1.11|regression|
> |OneInput_Broadcast_LazyFromSource_ExactlyOnce_10_rocksdb|46.175|43.8133|-5.11%|
> |OneInput_Rescale_LazyFromSource_ExactlyOnce_100_heap|211.835|200.355|-5.42%|
> |OneInput_Rebalance_LazyFromSource_ExactlyOnce_1024_rocksdb|1721.041667|1618.32|-5.97%|
> |OneInput_KeyBy_LazyFromSource_ExactlyOnce_10_heap|46|43.615|-5.18%|
> |OneInput_Broadcast_Eager_ExactlyOnce_100_rocksdb|212.105|199.688|-5.85%|
> |OneInput_Rescale_Eager_ExactlyOnce_1024_heap|1754.64|1600.12|-8.81%|
> |OneInput_Rebalance_Eager_ExactlyOnce_10_rocksdb|45.9167|43.0983|-6.14%|
> |OneInput_KeyBy_Eager_ExactlyOnce_100_heap|212.0816667|200.727|-5.35%|
> |OneInput_Broadcast_LazyFromSource_AtLeastOnce_1024_rocksdb|1718.245|1614.381667|-6.04%|
> |OneInput_Rescale_LazyFromSource_AtLeastOnce_10_heap|46.12|43.5517|-5.57%|
> |OneInput_Rebalance_LazyFromSource_AtLeastOnce_100_rocksdb|212.038|200.388|-5.49%|
> |OneInput_KeyBy_LazyFromSource_AtLeastOnce_1024_heap|1762.048333|1606.408333|-8.83%|
> |OneInput_Broadcast_Eager_AtLeastOnce_10_rocksdb|46.0583|43.4967|-5.56%|
> |OneInput_Rescale_Eager_AtLeastOnce_100_heap|212.233|201.188|-5.20%|
> |OneInput_Rebalance_Eager_AtLeastOnce_1024_rocksdb|1720.66|1616.85|-6.03%|
> |OneInput_KeyBy_Eager_AtLeastOnce_10_heap|46.14|43.6233|-5.45%|
> |TwoInputs_Broadcast_LazyFromSource_ExactlyOnce_100_rocksdb|156.918|152.957|-2.52%|
> |TwoInputs_Rescale_LazyFromSource_ExactlyOnce_1024_heap|1415.511667|1300.1|-8.15%|
> |TwoInputs_Rebalance_LazyFromSource_ExactlyOnce_10_rocksdb|34.2967|34.1667|-0.38%|
> |TwoInputs_KeyBy_LazyFromSource_ExactlyOnce_100_heap|158.353|151.848|-4.11%|
> |TwoInputs_Broadcast_Eager_ExactlyOnce_1024_rocksdb|1373.406667|1300.056667|-5.34%|
> |TwoInputs_Rescale_Eager_ExactlyOnce_10_heap|34.5717|32.0967|-7.16%|
> |TwoInputs_Rebalance_Eager_ExactlyOnce_100_rocksdb|158.655|147.44|-7.07%|
> |TwoInputs_KeyBy_Eager_ExactlyOnce_1024_heap|1356.611667|1292.386667|-4.73%|
> |TwoInputs_Broadcast_LazyFromSource_AtLeastOnce_10_rocksdb|34.01|33.205|-2.37%|
> |TwoInputs_Rescale_LazyFromSource_AtLeastOnce_100_heap|149.588|145.997|-2.40%|
> |TwoInputs_Rebalance_LazyFromSource_AtLeastOnce_1024_rocksdb|1359.74|1299.156667|-4.46%|
> |TwoInputs_KeyBy_LazyFromSource_AtLeastOnce_10_heap|34.025|29.6833|-12.76%|
> |TwoInputs_Broadcast_Eager_AtLeastOnce_100_rocksdb|157.303|151.4616667|-3.71%|
> |TwoInputs_Rescale_Eager_AtLeastOnce_1024_heap|1368.74|1293.238333|-5.52%|
> |TwoInputs_Rebalance_Eager_AtLeastOnce_10_rocksdb|34.325|33.285|-3.03%|
> |TwoInputs_KeyBy_Eager_AtLeastOnce_100_heap|162.5116667|134.375|-17.31%|
> It can be seen that the performance of 1.11 has a regression, basically 
> around 5%, and the maximum regression is 17%. This needs to be checked.
> the test code:
> flink-1.10.0: 
> [https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
> flink-1.11.0: 
> [https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations_1.11/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
> submit cmd like this:
> bin/flink run -d -m 

[jira] [Commented] (FLINK-18432) add open and close methods for ElasticsearchSinkFunction interface

2020-06-25 Thread Jark Wu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17144796#comment-17144796
 ] 

Jark Wu commented on FLINK-18432:
-

I think this has already been done by FLINK-17623?

> add open and close methods for ElasticsearchSinkFunction interface
> --
>
> Key: FLINK-18432
> URL: https://issues.apache.org/jira/browse/FLINK-18432
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / ElasticSearch
>Reporter: rinkako
>Priority: Major
>  Labels: usability
>
> Here comes an example story: we want to sink data to ES with a day-rolling 
> index name, by using `DateTimeFormatter` and `ZonedDateTime` for generating a 
> day-rolling postfix to add to the index pattern of `return 
> Requests.indexRequest().index("mydoc_"+postfix)` in a custom 
> `ElasticsearchSinkFunction`. Here `DateTimeFormatter` is not a serializable 
> class and must be a transient field of this `ElasticsearchSinkFunction`, and 
> it must be checked for null every time we call `process` (since the field is 
> transient, it may be null on a distributed task manager), which could instead 
> be done in an `open` method that runs only once.
> So it seems that adding `open` and `close` methods to the 
> `ElasticsearchSinkFunction` interface can handle this well: users can control 
> their sink function life-cycle more flexibly. And, for compatibility, these 
> two methods may have an empty default implementation in the 
> `ElasticsearchSinkFunction` interface.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #12727: [FLINK-17292][docs] Translate Fault Tolerance training lesson to Chinese

2020-06-25 Thread GitBox


flinkbot edited a comment on pull request #12727:
URL: https://github.com/apache/flink/pull/12727#issuecomment-647010602


   
   ## CI report:
   
   * 5a76928eaf6355d22fb655a821cb5f922f560fe2 UNKNOWN
   * ed389accd5133d94808984cbbe573cc89517beb1 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4026)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-18150) A single failing Kafka broker may cause jobs to fail indefinitely with TimeoutException: Timeout expired while fetching topic metadata

2020-06-25 Thread Aljoscha Krettek (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17144776#comment-17144776
 ] 

Aljoscha Krettek commented on FLINK-18150:
--

Does this also happen when you don't have a producer in the job? I want to rule 
out that you're running into FLINK-17327 before I look deeper into this.

> A single failing Kafka broker may cause jobs to fail indefinitely with 
> TimeoutException: Timeout expired while fetching topic metadata
> --
>
> Key: FLINK-18150
> URL: https://issues.apache.org/jira/browse/FLINK-18150
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Affects Versions: 1.10.1
> Environment: It is a bit unclear to me under what circumstances this 
> can be reproduced. I created a "minimum" non-working example at 
> https://github.com/jcaesar/flink-kafka-ha-failure. Note that this deploys the 
> minimum number of Kafka brokers, but it works just as well with replication 
> factor 3 and 8 brokers, e.g.
> I run this with
> {code:bash}
> docker-compose kill; and docker-compose rm -vf; and docker-compose up 
> --abort-on-container-exit --build
> {code}
> The exception should appear on the webui after 5~6 minutes.
> To make sure that this isn't dependent on my machine, I've also checked 
> reproducibility on a m5a.2xlarge EC2 instance.
> You can verify that the Kafka cluster is running "normally", e.g. with:
> {code:bash}
> kafkacat -b localhost,localhost:9093 -L
> {code}
> So far, I only know that
> * {{flink.partition-discovery.interval-millis}} must be set.
> * The broker that failed must be part of the {{bootstrap.servers}}
> * There needs to be a certain number of topics or producers, but I'm unsure 
> which is crucial
> * Changing the values of {{metadata.request.timeout.ms}} or 
> {{flink.partition-discovery.interval-millis}} does not seem to have any 
> effect.
>Reporter: Julius Michaelis
>Priority: Major
>
> When a Kafka broker fails that is listed among the bootstrap servers and 
> partition discovery is active, the Flink job reading from that Kafka may 
> enter a failing loop.
> At first, the job seems to react normally without failure with only a short 
> latency spike when switching Kafka leaders.
> Then, it fails with a
> {code:none}
> org.apache.flink.streaming.connectors.kafka.internal.Handover$ClosedException
> at 
> org.apache.flink.streaming.connectors.kafka.internal.Handover.close(Handover.java:182)
> at 
> org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.cancel(KafkaFetcher.java:175)
> at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.cancel(FlinkKafkaConsumerBase.java:821)
> at 
> org.apache.flink.streaming.api.operators.StreamSource.cancel(StreamSource.java:147)
> at 
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask.cancelTask(SourceStreamTask.java:136)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.cancel(StreamTask.java:602)
> at 
> org.apache.flink.runtime.taskmanager.Task$TaskCanceler.run(Task.java:1355)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> It recovers, but processes fewer than the expected amount of records.
> Finally,  the job fails with
> {code:none}
> 2020-06-05 13:59:37
> org.apache.kafka.common.errors.TimeoutException: Timeout expired while 
> fetching topic metadata
> {code}
> and repeats doing so while not processing any records. (The exception comes 
> without any backtrace or otherwise interesting information)
> I have also observed this behavior with partition-discovery turned off, but 
> only when the Flink job failed (after a broker failure) and had to run 
> checkpoint recovery for some other reason.
> Please see the [Environment] description for information on how to reproduce 
> the issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] aljoscha commented on pull request #12754: [FLINK-5717][streaming] Fix NPE on SessionWindows with ContinuousProc…

2020-06-25 Thread GitBox


aljoscha commented on pull request #12754:
URL: https://github.com/apache/flink/pull/12754#issuecomment-649407553


   Feel free to fix `ContinuousEventTimeTrigger` and ping me in the PR. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-18430) Upgrade stability to @Public for CheckpointedFunction and CheckpointListener

2020-06-25 Thread Aljoscha Krettek (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17144770#comment-17144770
 ] 

Aljoscha Krettek commented on FLINK-18430:
--

And yes, I see that moving {{CheckpointListener}} will break some user code but 
I think it's the better long-term solution, especially if we want to decouple 
our API/SDK from the runtime.

> Upgrade stability to @Public for CheckpointedFunction and CheckpointListener
> 
>
> Key: FLINK-18430
> URL: https://issues.apache.org/jira/browse/FLINK-18430
> Project: Flink
>  Issue Type: Improvement
>  Components: API / DataStream
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
>Priority: Critical
> Fix For: 1.11.0
>
>
> The two interfaces {{CheckpointedFunction}} and {{CheckpointListener}} are 
> used by many users, but are still (for years now) marked as 
> {{@PublicEvolving}}.
> I think this is not correct. They are very core to the DataStream API and are 
> used widely and should be treated as {{@Public}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-18430) Upgrade stability to @Public for CheckpointedFunction and CheckpointListener

2020-06-25 Thread Aljoscha Krettek (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17144769#comment-17144769
 ] 

Aljoscha Krettek commented on FLINK-18430:
--

I agree that we should make more interfaces public to benefit from automated 
checking; however, {{CheckpointListener}} is in package 
{{org.apache.flink.runtime.state}} in the {{flink-runtime}} module. If we want 
to make it {{@Public}}, we should 1) move it to an API module and 2) move it to 
an API package. Moving it next to {{CheckpointedFunction}} would be good, I 
think.

Side note: {{flink-runtime}} has only three {{@Public}} classes. One of them is 
{{SerializableFunction}} which we just introduced for 1.11. We should probably 
also move that to an API package/module.

> Upgrade stability to @Public for CheckpointedFunction and CheckpointListener
> 
>
> Key: FLINK-18430
> URL: https://issues.apache.org/jira/browse/FLINK-18430
> Project: Flink
>  Issue Type: Improvement
>  Components: API / DataStream
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
>Priority: Critical
> Fix For: 1.11.0
>
>
> The two interfaces {{CheckpointedFunction}} and {{CheckpointListener}} are 
> used by many users, but are still (for years now) marked as 
> {{@PublicEvolving}}.
> I think this is not correct. They are very core to the DataStream API and are 
> used widely and should be treated as {{@Public}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-18150) A single failing Kafka broker may cause jobs to fail indefinitely with TimeoutException: Timeout expired while fetching topic metadata

2020-06-25 Thread Julius Michaelis (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17144768#comment-17144768
 ] 

Julius Michaelis commented on FLINK-18150:
--

My current guess is that this relates to the way that the Kafka client [tries 
to pick a 
broker|https://github.com/apache/kafka/blob/2.2.0/clients/src/main/java/org/apache/kafka/clients/NetworkClient.java#L643]
 for retrieving the metadata from:
{{request.timeout.ms}} defaults to 30000 but {{default.api.timeout.ms}} and 
{{max.block.ms}} default to 60000. Hence, only two attempts are made. However, 
if retrieving metadata from one broker failed, that broker may be retried, 
leading to at least a few subtasks attempting to get metadata from a dead 
broker twice, failing the task.

A log of the situation, for reference:
{code:none}
[Producer clientId=producer-202] Removing node kafka1:9091 (id: -1 rack: null) 
from least loaded node selection: is-blacked-out: false, in-flight-requests: 0
[Producer clientId=producer-202] Found least loaded node kafka2:9091 (id: -2 
rack: null)
[Producer clientId=producer-480] Removing node kafka1:9091 (id: -1 rack: null) 
from least loaded node selection: is-blacked-out: false, in-flight-requests: 0
[Producer clientId=producer-480] Found least loaded node kafka2:9091 (id: -2 
rack: null)
[Producer clientId=producer-158] Removing node kafka1:9091 (id: -1 rack: null) 
from least loaded node selection: is-blacked-out: false, in-flight-requests: 0
[Producer clientId=producer-158] Found least loaded node kafka2:9091 (id: -2 
rack: null)
[Producer clientId=producer-285] Removing node kafka2:9091 (id: -2 rack: null) 
from least loaded node selection: is-blacked-out: false, in-flight-requests: 0
[Producer clientId=producer-285] Found least loaded node kafka1:9091 (id: -1 
rack: null)
[Producer clientId=producer-321] Removing node kafka2:9091 (id: -2 rack: null) 
from least loaded node selection: is-blacked-out: false, in-flight-requests: 0
[Producer clientId=producer-321] Found least loaded node kafka1:9091 (id: -1 
rack: null)
[Producer clientId=producer-477] Removing node kafka1:9091 (id: -1 rack: null) 
from least loaded node selection: is-blacked-out: false, in-flight-requests: 0
[Producer clientId=producer-477] Found least loaded node kafka2:9091 (id: -2 
rack: null)
{code}

I've played around a bit, and setting
{code:yaml}
session.timeout.ms: 5000
request.timeout.ms: 10000
default.api.timeout.ms: 180000
{code}
on the consumer and
{code:yaml}
default.api.timeout.ms: 180000
max.block.ms: 181000
{code}
on the producer seem to make the problem go away. But I'm puzzled as to why.
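
For reference, a minimal sketch of how these settings can be passed to the 
consumer (topic and bootstrap servers are placeholders):

{code:java}
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class ConsumerTimeoutsExample {
    public static FlinkKafkaConsumer<String> create() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "kafka1:9091,kafka2:9091");
        // Timeouts as above: fail individual requests reasonably fast, but
        // keep retrying within a much more generous overall deadline.
        props.setProperty("session.timeout.ms", "5000");
        props.setProperty("request.timeout.ms", "10000");
        props.setProperty("default.api.timeout.ms", "180000");
        return new FlinkKafkaConsumer<>("my-topic", new SimpleStringSchema(), props);
    }
}
{code}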




> A single failing Kafka broker may cause jobs to fail indefinitely with 
> TimeoutException: Timeout expired while fetching topic metadata
> --
>
> Key: FLINK-18150
> URL: https://issues.apache.org/jira/browse/FLINK-18150
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Affects Versions: 1.10.1
> Environment: It is a bit unclear to me under what circumstances this 
> can be reproduced. I created a "minimum" non-working example at 
> https://github.com/jcaesar/flink-kafka-ha-failure. Note that this deploys the 
> minimum number of Kafka brokers, but it works just as well with, e.g., 
> replication factor 3 and 8 brokers.
> I run this with
> {code:bash}
> docker-compose kill; and docker-compose rm -vf; and docker-compose up 
> --abort-on-container-exit --build
> {code}
> The exception should appear on the webui after 5~6 minutes.
> To make sure that this isn't dependent on my machine, I've also checked 
> reproducibility on a m5a.2xlarge EC2 instance.
> You verify that the Kafka cluster is running "normally" e.g. with:
> {code:bash}
> kafkacat -b localhost,localhost:9093 -L
> {code}
> So far, I only know that
> * {{flink.partition-discovery.interval-millis}} must be set.
> * The broker that failed must be part of the {{bootstrap.servers}}
> * There needs to be a certain number of topics or producers, but I'm unsure 
> which is crucial
> * Changing the values of {{metadata.request.timeout.ms}} or 
> {{flink.partition-discovery.interval-millis}} does not seem to have any 
> effect.
>Reporter: Julius Michaelis
>Priority: Major
>
> When a Kafka broker fails that is listed among the bootstrap servers and 
> partition discovery is active, the Flink job reading from that Kafka may 
> enter a failing loop.
> At first, the job seems to react normally without failure with only a short 
> latency spike when switching Kafka leaders.
> Then, it fails with a
> {code:none}
> org.apache.flink.streaming.connectors.kafka.internal.Handover$ClosedException
> at 
> 

[GitHub] [flink] azagrebin commented on a change in pull request #12690: [FLINK-18186][doc] Various updates on standalone kubernetes document

2020-06-25 Thread GitBox


azagrebin commented on a change in pull request #12690:
URL: https://github.com/apache/flink/pull/12690#discussion_r445413795



##
File path: docs/ops/deployment/kubernetes.md
##
@@ -264,6 +264,24 @@ spec:
 component: jobmanager
 {% endhighlight %}
 
+`taskmanager-query-state-service.yaml`. Optional service, that exposes the 
taskmanager `query-state` port as public Kubernetes node's port.

Review comment:
   ```suggestion
   `taskmanager-query-state-service.yaml`. Optional service that exposes the 
taskmanager port to access the queryable state as a public Kubernetes node 
port.
   ```





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] azagrebin commented on a change in pull request #12690: [FLINK-18186][doc] Various updates on standalone kubernetes document

2020-06-25 Thread GitBox


azagrebin commented on a change in pull request #12690:
URL: https://github.com/apache/flink/pull/12690#discussion_r445413795



##
File path: docs/ops/deployment/kubernetes.md
##
@@ -264,6 +264,24 @@ spec:
 component: jobmanager
 {% endhighlight %}
 
+`taskmanager-query-state-service.yaml`. Optional service, that exposes the 
taskmanager `query-state` port as public Kubernetes node's port.

Review comment:
   ```suggestion
   `taskmanager-query-state-service.yaml`. Optional service that exposes the 
TaskManager port to access the queryable state as a public Kubernetes node 
port.
   ```





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] azagrebin commented on a change in pull request #12690: [FLINK-18186][doc] Various updates on standalone kubernetes document

2020-06-25 Thread GitBox


azagrebin commented on a change in pull request #12690:
URL: https://github.com/apache/flink/pull/12690#discussion_r445412890



##
File path: docs/ops/deployment/kubernetes.md
##
@@ -264,6 +264,24 @@ spec:
 component: jobmanager
 {% endhighlight %}
 
+`taskmanager-query-state-service.yaml`. Optional service, that exposes the 
taskmanager `query-state` port as public Kubernetes node's port.

Review comment:
   should we mention this service in the `## Deploy Flink cluster on 
Kubernetes` chapter, like we describe `jobmanager-rest-service.yaml`?
   - after the description of accessing the REST API, we could mention that 
the `query-state` port can be accessed in a similar way.
   - add the optional `kubectl create/delete -f 
taskmanager-query-state-service.yaml` commands, as sketched below
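
   A quick sketch of those optional commands (the file name is taken from the PR):
   ```bash
   # expose the queryable state port via the optional service
   kubectl create -f taskmanager-query-state-service.yaml
   # tear it down again
   kubectl delete -f taskmanager-query-state-service.yaml
   ```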





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-18433) From the end-to-end performance test results, 1.11 has a regression

2020-06-25 Thread Piotr Nowojski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17144762#comment-17144762
 ] 

Piotr Nowojski commented on FLINK-18433:


{quote}
Could this have anything to do with the n-ary input operator? 
{quote}
Almost definitely not - n-ary input operator/task are independent beings, not 
sharing the same code path.

For the same reasons as [~AHeise] I would rather suspect something that wasn't 
covered by our manual cluster tests or JMH: some more elaborate state backend 
access or cluster setup (like memory). Also, a similar performance drop across 
the board, including both heap and rocksdb, rather suggests to me that the 
regression is not in the runtime code. Performance regressions of ~10% in the 
runtime code when using the heap state backend shouldn't be visible when using 
RocksDB. That also points at things like memory settings.

I'm OoO so I won't be able to investigate this. 

> From the end-to-end performance test results, 1.11 has a regression
> ---
>
> Key: FLINK-18433
> URL: https://issues.apache.org/jira/browse/FLINK-18433
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core, API / DataStream
>Affects Versions: 1.11.0
> Environment: 3 machines
> [|https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations_1.11/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
>Reporter: Aihua Li
>Priority: Major
>
>  
> I ran end-to-end performance tests between the Release-1.10 and Release-1.11. 
> the results were as follows:
> |scenarioName|release-1.10|release-1.11| |
> |OneInput_Broadcast_LazyFromSource_ExactlyOnce_10_rocksdb|46.175|43.8133|-5.11%|
> |OneInput_Rescale_LazyFromSource_ExactlyOnce_100_heap|211.835|200.355|-5.42%|
> |OneInput_Rebalance_LazyFromSource_ExactlyOnce_1024_rocksdb|1721.041667|1618.32|-5.97%|
> |OneInput_KeyBy_LazyFromSource_ExactlyOnce_10_heap|46|43.615|-5.18%|
> |OneInput_Broadcast_Eager_ExactlyOnce_100_rocksdb|212.105|199.688|-5.85%|
> |OneInput_Rescale_Eager_ExactlyOnce_1024_heap|1754.64|1600.12|-8.81%|
> |OneInput_Rebalance_Eager_ExactlyOnce_10_rocksdb|45.9167|43.0983|-6.14%|
> |OneInput_KeyBy_Eager_ExactlyOnce_100_heap|212.0816667|200.727|-5.35%|
> |OneInput_Broadcast_LazyFromSource_AtLeastOnce_1024_rocksdb|1718.245|1614.381667|-6.04%|
> |OneInput_Rescale_LazyFromSource_AtLeastOnce_10_heap|46.12|43.5517|-5.57%|
> |OneInput_Rebalance_LazyFromSource_AtLeastOnce_100_rocksdb|212.038|200.388|-5.49%|
> |OneInput_KeyBy_LazyFromSource_AtLeastOnce_1024_heap|1762.048333|1606.408333|-8.83%|
> |OneInput_Broadcast_Eager_AtLeastOnce_10_rocksdb|46.0583|43.4967|-5.56%|
> |OneInput_Rescale_Eager_AtLeastOnce_100_heap|212.233|201.188|-5.20%|
> |OneInput_Rebalance_Eager_AtLeastOnce_1024_rocksdb|1720.66|1616.85|-6.03%|
> |OneInput_KeyBy_Eager_AtLeastOnce_10_heap|46.14|43.6233|-5.45%|
> |TwoInputs_Broadcast_LazyFromSource_ExactlyOnce_100_rocksdb|156.918|152.957|-2.52%|
> |TwoInputs_Rescale_LazyFromSource_ExactlyOnce_1024_heap|1415.511667|1300.1|-8.15%|
> |TwoInputs_Rebalance_LazyFromSource_ExactlyOnce_10_rocksdb|34.2967|34.1667|-0.38%|
> |TwoInputs_KeyBy_LazyFromSource_ExactlyOnce_100_heap|158.353|151.848|-4.11%|
> |TwoInputs_Broadcast_Eager_ExactlyOnce_1024_rocksdb|1373.406667|1300.056667|-5.34%|
> |TwoInputs_Rescale_Eager_ExactlyOnce_10_heap|34.5717|32.0967|-7.16%|
> |TwoInputs_Rebalance_Eager_ExactlyOnce_100_rocksdb|158.655|147.44|-7.07%|
> |TwoInputs_KeyBy_Eager_ExactlyOnce_1024_heap|1356.611667|1292.386667|-4.73%|
> |TwoInputs_Broadcast_LazyFromSource_AtLeastOnce_10_rocksdb|34.01|33.205|-2.37%|
> |TwoInputs_Rescale_LazyFromSource_AtLeastOnce_100_heap|149.588|145.997|-2.40%|
> |TwoInputs_Rebalance_LazyFromSource_AtLeastOnce_1024_rocksdb|1359.74|1299.156667|-4.46%|
> |TwoInputs_KeyBy_LazyFromSource_AtLeastOnce_10_heap|34.025|29.6833|-12.76%|
> |TwoInputs_Broadcast_Eager_AtLeastOnce_100_rocksdb|157.303|151.4616667|-3.71%|
> |TwoInputs_Rescale_Eager_AtLeastOnce_1024_heap|1368.74|1293.238333|-5.52%|
> |TwoInputs_Rebalance_Eager_AtLeastOnce_10_rocksdb|34.325|33.285|-3.03%|
> |TwoInputs_KeyBy_Eager_AtLeastOnce_100_heap|162.5116667|134.375|-17.31%|
> It can be seen that the performance of 1.11 has a regression, basically 
> around 5%, and the maximum regression is 17%. This needs to be checked.
> the test code:
> flink-1.10.0: 
> [https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
> flink-1.11.0: 
> 

[jira] [Closed] (FLINK-18417) Support List as a conversion class for ARRAY

2020-06-25 Thread Timo Walther (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timo Walther closed FLINK-18417.

Fix Version/s: 1.12.0
   Resolution: Fixed

Fixed in 1.12.0: 6834ed181aa9edda6b9c7fb61696de93170a88d1
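
For reference, a minimal sketch (not the committed change itself) of what this 
fix enables through the {{bridgedTo}} conversion-class API:

{code:java}
import java.util.List;

import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.types.DataType;

public class ListConversionExample {
    public static void main(String[] args) {
        // ARRAY<STRING> whose Java conversion class is java.util.List
        // instead of the default String[].
        DataType listArray = DataTypes.ARRAY(DataTypes.STRING())
                .bridgedTo(List.class);
        System.out.println(listArray);
    }
}
{code}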

> Support List as a conversion class for ARRAY
> 
>
> Key: FLINK-18417
> URL: https://issues.apache.org/jira/browse/FLINK-18417
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / API
>Reporter: Timo Walther
>Assignee: Timo Walther
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>
> Currently we don't support {{List}} as a conversion class in the new type 
> system. However, there are multiple reasons why we should support this 
> conversion:
> 1) Hive uses {{List}} as the default for representing arrays e.g. in 
> functions.
> 2) The new Expression DSL supports converting lists to array literals already.
> 3) The list interface is essential for Java users and part of many structured 
> types.
> 4) We need to represent lists in {{ListView}} for aggregate functions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] twalthr closed pull request #12765: [FLINK-18417][table] Support List as a conversion class for ARRAY

2020-06-25 Thread GitBox


twalthr closed pull request #12765:
URL: https://github.com/apache/flink/pull/12765


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-17075) Add task status reconciliation between TM and JM

2020-06-25 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17144759#comment-17144759
 ] 

Till Rohrmann commented on FLINK-17075:
---

Thanks for creating this proposal [~chesnay]. It sounds good to me. I think we 
need the {{ExecutionIdsProvider}} to return the {{Executions}} which are 
deployed to a given {{ResourceID}}.
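
A purely hypothetical sketch of such a payload, just to make the proposal 
concrete (names invented, and plain strings instead of the runtime's 
ExecutionAttemptID/ExecutionState types):

{code:java}
import java.io.Serializable;
import java.util.Map;

// Hypothetical: the TaskExecutor reports the state of every Execution it
// currently runs, keyed by execution attempt, so the JobMaster can
// reconcile status updates that were lost.
public class TaskStatusesPayload implements Serializable {

    private final Map<String, String> statesByExecutionAttempt;

    public TaskStatusesPayload(Map<String, String> statesByExecutionAttempt) {
        this.statesByExecutionAttempt = statesByExecutionAttempt;
    }

    public Map<String, String> getStatesByExecutionAttempt() {
        return statesByExecutionAttempt;
    }
}
{code}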

> Add task status reconciliation between TM and JM
> 
>
> Key: FLINK-17075
> URL: https://issues.apache.org/jira/browse/FLINK-17075
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.10.0, 1.11.0
>Reporter: Till Rohrmann
>Assignee: Chesnay Schepler
>Priority: Critical
> Fix For: 1.10.2, 1.12.0, 1.11.1
>
>
> In order to harden the TM and JM communication I suggest letting the 
> {{TaskExecutor}} send the task statuses back to the {{JobMaster}} as part of 
> the heartbeat payload (similar to FLINK-11059). This would allow reconciling 
> the states of both components in case a status update message was lost, as 
> described by a user on the ML.
> https://lists.apache.org/thread.html/ra9ed70866381f0ef0f4779633346722ccab3dc0d6dbacce04080b74e%40%3Cuser.flink.apache.org%3E



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-18433) From the end-to-end performance test results, 1.11 has a regression

2020-06-25 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17144757#comment-17144757
 ] 

Till Rohrmann commented on FLINK-18433:
---

Pulling in [~liyu] who might know more about possible state backend changes.

> From the end-to-end performance test results, 1.11 has a regression
> ---
>
> Key: FLINK-18433
> URL: https://issues.apache.org/jira/browse/FLINK-18433
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core, API / DataStream
>Affects Versions: 1.11.0
> Environment: 3 machines
> [|https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations_1.11/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
>Reporter: Aihua Li
>Priority: Major
>
>  
> I ran end-to-end performance tests between the Release-1.10 and Release-1.11. 
> the results were as follows:
> |scenarioName|release-1.10|release-1.11| |
> |OneInput_Broadcast_LazyFromSource_ExactlyOnce_10_rocksdb|46.175|43.8133|-5.11%|
> |OneInput_Rescale_LazyFromSource_ExactlyOnce_100_heap|211.835|200.355|-5.42%|
> |OneInput_Rebalance_LazyFromSource_ExactlyOnce_1024_rocksdb|1721.041667|1618.32|-5.97%|
> |OneInput_KeyBy_LazyFromSource_ExactlyOnce_10_heap|46|43.615|-5.18%|
> |OneInput_Broadcast_Eager_ExactlyOnce_100_rocksdb|212.105|199.688|-5.85%|
> |OneInput_Rescale_Eager_ExactlyOnce_1024_heap|1754.64|1600.12|-8.81%|
> |OneInput_Rebalance_Eager_ExactlyOnce_10_rocksdb|45.9167|43.0983|-6.14%|
> |OneInput_KeyBy_Eager_ExactlyOnce_100_heap|212.0816667|200.727|-5.35%|
> |OneInput_Broadcast_LazyFromSource_AtLeastOnce_1024_rocksdb|1718.245|1614.381667|-6.04%|
> |OneInput_Rescale_LazyFromSource_AtLeastOnce_10_heap|46.12|43.5517|-5.57%|
> |OneInput_Rebalance_LazyFromSource_AtLeastOnce_100_rocksdb|212.038|200.388|-5.49%|
> |OneInput_KeyBy_LazyFromSource_AtLeastOnce_1024_heap|1762.048333|1606.408333|-8.83%|
> |OneInput_Broadcast_Eager_AtLeastOnce_10_rocksdb|46.0583|43.4967|-5.56%|
> |OneInput_Rescale_Eager_AtLeastOnce_100_heap|212.233|201.188|-5.20%|
> |OneInput_Rebalance_Eager_AtLeastOnce_1024_rocksdb|1720.66|1616.85|-6.03%|
> |OneInput_KeyBy_Eager_AtLeastOnce_10_heap|46.14|43.6233|-5.45%|
> |TwoInputs_Broadcast_LazyFromSource_ExactlyOnce_100_rocksdb|156.918|152.957|-2.52%|
> |TwoInputs_Rescale_LazyFromSource_ExactlyOnce_1024_heap|1415.511667|1300.1|-8.15%|
> |TwoInputs_Rebalance_LazyFromSource_ExactlyOnce_10_rocksdb|34.2967|34.1667|-0.38%|
> |TwoInputs_KeyBy_LazyFromSource_ExactlyOnce_100_heap|158.353|151.848|-4.11%|
> |TwoInputs_Broadcast_Eager_ExactlyOnce_1024_rocksdb|1373.406667|1300.056667|-5.34%|
> |TwoInputs_Rescale_Eager_ExactlyOnce_10_heap|34.5717|32.0967|-7.16%|
> |TwoInputs_Rebalance_Eager_ExactlyOnce_100_rocksdb|158.655|147.44|-7.07%|
> |TwoInputs_KeyBy_Eager_ExactlyOnce_1024_heap|1356.611667|1292.386667|-4.73%|
> |TwoInputs_Broadcast_LazyFromSource_AtLeastOnce_10_rocksdb|34.01|33.205|-2.37%|
> |TwoInputs_Rescale_LazyFromSource_AtLeastOnce_100_heap|149.588|145.997|-2.40%|
> |TwoInputs_Rebalance_LazyFromSource_AtLeastOnce_1024_rocksdb|1359.74|1299.156667|-4.46%|
> |TwoInputs_KeyBy_LazyFromSource_AtLeastOnce_10_heap|34.025|29.6833|-12.76%|
> |TwoInputs_Broadcast_Eager_AtLeastOnce_100_rocksdb|157.303|151.4616667|-3.71%|
> |TwoInputs_Rescale_Eager_AtLeastOnce_1024_heap|1368.74|1293.238333|-5.52%|
> |TwoInputs_Rebalance_Eager_AtLeastOnce_10_rocksdb|34.325|33.285|-3.03%|
> |TwoInputs_KeyBy_Eager_AtLeastOnce_100_heap|162.5116667|134.375|-17.31%|
> It can be seen that the performance of 1.11 has a regression, basically 
> around 5%, and the maximum regression is 17%. This needs to be checked.
> the test code:
> flink-1.10.0: 
> [https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
> flink-1.11.0: 
> [https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations_1.11/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
> submit command like this:
> bin/flink run -d -m 192.168.39.246:8081 -c 
> org.apache.flink.basic.operations.PerformanceTestJob 
> /home/admin/flink-basic-operations_2.11-1.10-SNAPSHOT.jar --topologyName 
> OneInput --LogicalAttributesofEdges Broadcast --ScheduleMode LazyFromSource 
> --CheckpointMode ExactlyOnce --recordSize 10 --stateBackend rocksdb
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-15416) Add Retry Mechanism for PartitionRequestClientFactory.ConnectingChannel

2020-06-25 Thread Roman Khachatryan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-15416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Khachatryan updated FLINK-15416:
--
Release Note: new configuration parameter: taskmanager.network.retries 
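
A minimal sketch of using the new option in flink-conf.yaml (the value 2 
matches the reasoning in the description below):

{code:yaml}
# allow two additional attempts to establish a partition request channel
taskmanager.network.retries: 2
{code}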

> Add Retry Mechanism for PartitionRequestClientFactory.ConnectingChannel
> ---
>
> Key: FLINK-15416
> URL: https://issues.apache.org/jira/browse/FLINK-15416
> Project: Flink
>  Issue Type: Wish
>  Components: Runtime / Network
>Affects Versions: 1.10.0
>Reporter: Zhenqiu Huang
>Assignee: Zhenqiu Huang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We run a Flink job with 256 TMs in production. The job internally has keyBy 
> logic, so it builds 256 * 256 communication channels. An outage happened 
> when a chip-internal link on one of the network switches connecting these 
> machines broke. During the outage, the Flink job couldn't restart 
> successfully, as there was always an exception like "Connecting the channel 
> failed: Connecting to remote task manager + '/10.14.139.6:41300' has 
> failed. This might indicate that the remote task manager has been lost." 
> After a deep investigation with the network infrastructure team, we found 
> there are 6 switches connecting these machines. Each switch has 32 physical 
> links, and every socket is round-robin assigned to one of the links for load 
> balancing. Thus, on average 256 * 256 / (6 * 32 * 2) ≈ 170 channels will be 
> assigned to the broken link. The issue lasted for 4 hours until we found the 
> broken link and restarted the problematic switch. 
> Given this, we found that retrying channel creation helps to resolve this 
> issue. For our networking topology, we can set the retry count to 2. As 170 / 
> (132 * 132) < 1, after retrying twice, on average no channel out of the 170 
> will be assigned to the broken link.
> I think it is a valuable fix for this kind of partial network partition.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-17073) Slow checkpoint cleanup causing OOMs

2020-06-25 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17144726#comment-17144726
 ] 

Etienne Chauchot edited comment on FLINK-17073 at 6/25/20, 8:04 AM:


[~trohrmann], I wrote [this|https://s.apache.org/checkpoint-backpressure] 
FLIP style design document for checkpoint backpressure. Can you tell me what 
you think? Also I don't have the rights to create FLIP design documents in 
flink confluence workspace so I did the FLIP in a google doc. Can you give me 
the rights?


was (Author: echauchot):
[~trohrmann], I wrote [this|https://s.apache.org/checkpoint-backpressure] 
FLIP style design document for checkpoint backpressure. Can you tell me what 
you think? Also I don't have the rights to create FLIP design documents in 
flink confluence workspace so I did the FLIP in a google doc. Can you give me 
the rights?

> Slow checkpoint cleanup causing OOMs
> 
>
> Key: FLINK-17073
> URL: https://issues.apache.org/jira/browse/FLINK-17073
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.7.3, 1.8.0, 1.9.0, 1.10.0, 1.11.0
>Reporter: Till Rohrmann
>Priority: Major
> Fix For: 1.12.0
>
>
> A user reported that he sees a decline in checkpoint cleanup speed when 
> upgrading from Flink 1.7.2 to 1.10.0. The result is that a lot of cleanup 
> tasks are waiting in the execution queue occupying memory. Ultimately, the JM 
> process dies with an OOM.
> Compared to Flink 1.7.2, we introduced a dedicated {{ioExecutor}} which is 
> used by the {{HighAvailabilityServices}} (FLINK-11851). Before, we use the 
> {{AkkaRpcService}} thread pool which was a {{ForkJoinPool}} with a max 
> parallelism of 64. Now it is a {{FixedThreadPool}} with as many threads as 
> CPU cores. This change might have caused the decline in completed checkpoint 
> discard throughput. This suspicion needs to be validated before trying to fix 
> it!
> [1] 
> https://lists.apache.org/thread.html/r390e5d775878918edca0b6c9f18de96f828c266a888e34ed30ce8494%40%3Cuser.flink.apache.org%3E
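
A short sketch of the suspected change, using plain JDK executors with the 
pool types and sizes named above:

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ForkJoinPool;

public class CleanupExecutorComparison {
    public static void main(String[] args) {
        // ~Flink 1.7.x: cleanup ran on Akka's pool, up to 64 threads
        ExecutorService before = new ForkJoinPool(64);
        // Flink 1.10: dedicated ioExecutor sized to the CPU core count
        ExecutorService after = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        // On a small JM, far fewer threads discard checkpoint files
        // concurrently, so cleanup tasks can queue up and retain memory.
        before.shutdown();
        after.shutdown();
    }
}
{code}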



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-17073) Slow checkpoint cleanup causing OOMs

2020-06-25 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17144733#comment-17144733
 ] 

Etienne Chauchot commented on FLINK-17073:
--

I'm always glad to help! No problem about syncing with [~pnowojski] and 
[~SleePy]. Not a trivial problem indeed, but I figured it could be a good way 
to learn quite a lot about Flink internals :)

> Slow checkpoint cleanup causing OOMs
> 
>
> Key: FLINK-17073
> URL: https://issues.apache.org/jira/browse/FLINK-17073
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.7.3, 1.8.0, 1.9.0, 1.10.0, 1.11.0
>Reporter: Till Rohrmann
>Priority: Major
> Fix For: 1.12.0
>
>
> A user reported that he sees a decline in checkpoint cleanup speed when 
> upgrading from Flink 1.7.2 to 1.10.0. The result is that a lot of cleanup 
> tasks are waiting in the execution queue occupying memory. Ultimately, the JM 
> process dies with an OOM.
> Compared to Flink 1.7.2, we introduced a dedicated {{ioExecutor}} which is 
> used by the {{HighAvailabilityServices}} (FLINK-11851). Before, we use the 
> {{AkkaRpcService}} thread pool which was a {{ForkJoinPool}} with a max 
> parallelism of 64. Now it is a {{FixedThreadPool}} with as many threads as 
> CPU cores. This change might have caused the decline in completed checkpoint 
> discard throughput. This suspicion needs to be validated before trying to fix 
> it!
> [1] 
> https://lists.apache.org/thread.html/r390e5d775878918edca0b6c9f18de96f828c266a888e34ed30ce8494%40%3Cuser.flink.apache.org%3E



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink-web] morsapaes commented on pull request #350: [hotfix] Documentation Style Guide: Sync + Correction

2020-06-25 Thread GitBox


morsapaes commented on pull request #350:
URL: https://github.com/apache/flink-web/pull/350#issuecomment-649336113


   @sjwiesman , can this PR be closed? The changes seem to have been merged.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] tillrohrmann commented on a change in pull request #12699: [FLINK-18349][docs] Add release notes for Flink 1.11

2020-06-25 Thread GitBox


tillrohrmann commented on a change in pull request #12699:
URL: https://github.com/apache/flink/pull/12699#discussion_r445373126



##
File path: docs/release-notes/flink-1.11.md
##
@@ -0,0 +1,220 @@
+---
+title: "Release Notes - Flink 1.11"
+---
+
+
+
+These release notes discuss important aspects, such as configuration, behavior,
+or dependencies, that changed between Flink 1.10 and Flink 1.11. Please read
+these notes carefully if you are planning to upgrade your Flink version to 
1.11.
+
+* This will be replaced by the TOC
+{:toc}
+
+### Clusters & Deployment
+ Removal of `LegacyScheduler` 
([FLINK-15629](https://issues.apache.org/jira/browse/FLINK-15629))
+Flink no longer supports the legacy scheduler. 
+Hence, setting `jobmanager.scheduler: legacy` will no longer work and will 
fail with an `IllegalArgumentException`. 
+The only valid option for `jobmanager.scheduler` is the default value `ng`.
+
+ Bind user code class loader to lifetime of a slot 
([FLINK-16408](https://issues.apache.org/jira/browse/FLINK-16408))
+The user code class loader is being reused by the `TaskExecutor` as long as 
there is at least a single slot allocated for the respective job. 
+This changes Flink's recovery behaviour slightly so that it will not reload 
static fields.
+The benefit is that this change drastically reduces pressure on the JVM's 
metaspace.
+
+### Memory Management
+ Removal of deprecated mesos.resourcemanager.tasks.mem 
([FLINK-15198](https://issues.apache.org/jira/browse/FLINK-15198))
+
+The `mesos.resourcemanager.tasks.mem` option, deprecated in 1.10 in favour of 
`taskmanager.memory.process.size`, has been completely removed and will have no 
effect anymore in 1.11+.
+
+### Table API & SQL
+ Changed packages of `TableEnvironment` 
([FLINK-15947](https://issues.apache.org/jira/browse/FLINK-15947))
+   
+Due to various issues with the packages `org.apache.flink.table.api.scala/java`, 
all classes from those packages were relocated. 
+Moreover, the Scala expressions were moved to `org.apache.flink.table.api` as 
announced in Flink 1.9.
+
+If you used one of:
+* `org.apache.flink.table.api.java.StreamTableEnvironment`
+* `org.apache.flink.table.api.scala.StreamTableEnvironment`
+* `org.apache.flink.table.api.java.BatchTableEnvironment`
+* `org.apache.flink.table.api.scala.BatchTableEnvironment` 
+
+and do not convert to/from DataStream, switch to:
+* `org.apache.flink.table.api.TableEnvironment` 
+
+If you do convert to/from DataStream/DataSet change your imports to one of:
+* `org.apache.flink.table.api.bridge.java.StreamTableEnvironment`
+* `org.apache.flink.table.api.bridge.scala.StreamTableEnvironment`
+* `org.apache.flink.table.api.bridge.java.BatchTableEnvironment`
+* `org.apache.flink.table.api.bridge.scala.BatchTableEnvironment` 
+
+For the Scala expressions use the import:
+* `org.apache.flink.table.api._` instead of `org.apache.flink.table.api.scala._` 
+
+Additionally, if you use Scala's implicit conversions to/from 
DataStream/DataSet, import `org.apache.flink.table.api.bridge.scala._` instead 
of `org.apache.flink.table.api.scala._`.
+
+ Removal of deprecated `StreamTableSink` 
([FLINK-16362](https://issues.apache.org/jira/browse/FLINK-16362))
+The existing `StreamTableSink` implementations should remove the 
`emitDataStream` method.
+
+ Removal of `BatchTableSink#emitDataSet` 
([FLINK-16535](https://issues.apache.org/jira/browse/FLINK-16535))
+The existing `BatchTableSink` implementations should rename `emitDataSet` to 
`consumeDataSet` and return `DataSink`.
+  
+ Corrected execution behavior of TableEnvironment.execute() and 
StreamTableEnvironment.execute() 
([FLINK-16363](https://issues.apache.org/jira/browse/FLINK-16363))
+
+In previous versions, `TableEnvironment.execute()` and 
`StreamExecutionEnvironment.execute()` could both trigger table and DataStream 
programs.
+Since Flink 1.11.0, table programs can only be triggered by 
`TableEnvironment.execute()`. 
+Once a table program is converted into a DataStream program (through the 
`toAppendStream()` or `toRetractStream()` method), it can only be triggered by 
`StreamExecutionEnvironment.execute()`.
+
+ Corrected execution behavior of ExecutionEnvironment.execute() and 
BatchTableEnvironment.execute() 
([FLINK-17126](https://issues.apache.org/jira/browse/FLINK-17126))
+
+In previous versions, `BatchTableEnvironment.execute()` and 
`ExecutionEnvironment.execute()` could both trigger table and DataSet programs 
for the legacy batch planner.
+Since Flink 1.11.0, batch table programs can only be triggered by 
`BatchTableEnvironment.execute()`.
+Once a table program is converted into a DataSet program (through the 
`toDataSet()` method), it can only be triggered by 
`ExecutionEnvironment.execute()`.
+
+### Configuration
+
+ Renamed log4j-yarn-session.properties and logback-yarn.xml properties 
files 

[GitHub] [flink-web] morsapaes opened a new pull request #350: [hotfix] Documentation Style Guide: Sync + Correction

2020-06-25 Thread GitBox


morsapaes opened a new pull request #350:
URL: https://github.com/apache/flink-web/pull/350


   Syncing the ZH version of the Documentation Style Guide with a recent change 
to the English version and correcting the naming guidelines under "Repository".



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink-web] morsapaes closed pull request #350: [hotfix] Documentation Style Guide: Sync + Correction

2020-06-25 Thread GitBox


morsapaes closed pull request #350:
URL: https://github.com/apache/flink-web/pull/350


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-17073) Slow checkpoint cleanup causing OOMs

2020-06-25 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17144726#comment-17144726
 ] 

Etienne Chauchot commented on FLINK-17073:
--

[~trohrmann], I wrote [this|https://s.apache.org/checkpoint-backpressure] 
FLIP style design document for checkpoint backpressure. Can you tell me what 
you think? Also I don't have the rights to create FLIP design documents in 
flink confluence workspace so I did the FLIP in a google doc. Can you give me 
the rights?

> Slow checkpoint cleanup causing OOMs
> 
>
> Key: FLINK-17073
> URL: https://issues.apache.org/jira/browse/FLINK-17073
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.7.3, 1.8.0, 1.9.0, 1.10.0, 1.11.0
>Reporter: Till Rohrmann
>Priority: Major
> Fix For: 1.12.0
>
>
> A user reported that he sees a decline in checkpoint cleanup speed when 
> upgrading from Flink 1.7.2 to 1.10.0. The result is that a lot of cleanup 
> tasks are waiting in the execution queue occupying memory. Ultimately, the JM 
> process dies with an OOM.
> Compared to Flink 1.7.2, we introduced a dedicated {{ioExecutor}} which is 
> used by the {{HighAvailabilityServices}} (FLINK-11851). Before, we use the 
> {{AkkaRpcService}} thread pool which was a {{ForkJoinPool}} with a max 
> parallelism of 64. Now it is a {{FixedThreadPool}} with as many threads as 
> CPU cores. This change might have caused the decline in completed checkpoint 
> discard throughput. This suspicion needs to be validated before trying to fix 
> it!
> [1] 
> https://lists.apache.org/thread.html/r390e5d775878918edca0b6c9f18de96f828c266a888e34ed30ce8494%40%3Cuser.flink.apache.org%3E



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-18433) From the end-to-end performance test results, 1.11 has a regression

2020-06-25 Thread Arvid Heise (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17144723#comment-17144723
 ] 

Arvid Heise edited comment on FLINK-18433 at 6/25/20, 7:47 AM:
---

Some possible causes from the network stack:
- We added a few additional statements in InputGate/ResultPartition to support 
Unaligned Checkpoints (UC). https://issues.apache.org/jira/browse/FLINK-14551 
They will be executed even when UC is disabled. However, it would be odd if the JIT 
wouldn't optimize that out.
- (We now use fewer exclusive network buffers by default). (doesn't seem to be 
fully merged yet, but there might be some related things merged already 
https://issues.apache.org/jira/browse/FLINK-16428)
- I also thought non-blocking output might affect it, but afaik everything was 
already in 1.10.0. https://issues.apache.org/jira/browse/FLINK-14553

btw we see no regression in our micro benchmarks, and our cluster 
[benchmarks|https://docs.google.com/spreadsheets/d/18GO15zO-WI2EzK0fTkWucMmAGicmi0mGjxHlHgzitHQ/edit#gid=0]
 with the 
[Throughput|https://github.com/dataArtisans/performance/blob/master/flink-jobs/src/main/java/com/github/projectflink/streaming/Throughput.java]
 job rather suggest that we gained performance. So before diving deeper, I'd 
rather rule out state backend related changes first.


was (Author: aheise):
Some possible causes from the network stack:
- We added a few additional statements in InputGate/ResultPartition to support 
Unaligned Checkpoints (UC). https://issues.apache.org/jira/browse/FLINK-14551 
They will be executed even when UC is disabled. However, it would be odd if the JIT 
wouldn't optimize that out.
- (We now use fewer exclusive network buffers by default). (doesn't seem to be 
fully merged yet, but there might be some related things merged already 
https://issues.apache.org/jira/browse/FLINK-16428)
- I also thought non-blocking output might affect it, but afaik everything was 
already in 1.10.0. https://issues.apache.org/jira/browse/FLINK-14553

> From the end-to-end performance test results, 1.11 has a regression
> ---
>
> Key: FLINK-18433
> URL: https://issues.apache.org/jira/browse/FLINK-18433
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core, API / DataStream
>Affects Versions: 1.11.0
> Environment: 3 machines
> [|https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations_1.11/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
>Reporter: Aihua Li
>Priority: Major
>
>  
> I ran end-to-end performance tests between the Release-1.10 and Release-1.11. 
> the results were as follows:
> |scenarioName|release-1.10|release-1.11| |
> |OneInput_Broadcast_LazyFromSource_ExactlyOnce_10_rocksdb|46.175|43.8133|-5.11%|
> |OneInput_Rescale_LazyFromSource_ExactlyOnce_100_heap|211.835|200.355|-5.42%|
> |OneInput_Rebalance_LazyFromSource_ExactlyOnce_1024_rocksdb|1721.041667|1618.32|-5.97%|
> |OneInput_KeyBy_LazyFromSource_ExactlyOnce_10_heap|46|43.615|-5.18%|
> |OneInput_Broadcast_Eager_ExactlyOnce_100_rocksdb|212.105|199.688|-5.85%|
> |OneInput_Rescale_Eager_ExactlyOnce_1024_heap|1754.64|1600.12|-8.81%|
> |OneInput_Rebalance_Eager_ExactlyOnce_10_rocksdb|45.9167|43.0983|-6.14%|
> |OneInput_KeyBy_Eager_ExactlyOnce_100_heap|212.0816667|200.727|-5.35%|
> |OneInput_Broadcast_LazyFromSource_AtLeastOnce_1024_rocksdb|1718.245|1614.381667|-6.04%|
> |OneInput_Rescale_LazyFromSource_AtLeastOnce_10_heap|46.12|43.5517|-5.57%|
> |OneInput_Rebalance_LazyFromSource_AtLeastOnce_100_rocksdb|212.038|200.388|-5.49%|
> |OneInput_KeyBy_LazyFromSource_AtLeastOnce_1024_heap|1762.048333|1606.408333|-8.83%|
> |OneInput_Broadcast_Eager_AtLeastOnce_10_rocksdb|46.0583|43.4967|-5.56%|
> |OneInput_Rescale_Eager_AtLeastOnce_100_heap|212.233|201.188|-5.20%|
> |OneInput_Rebalance_Eager_AtLeastOnce_1024_rocksdb|1720.66|1616.85|-6.03%|
> |OneInput_KeyBy_Eager_AtLeastOnce_10_heap|46.14|43.6233|-5.45%|
> |TwoInputs_Broadcast_LazyFromSource_ExactlyOnce_100_rocksdb|156.918|152.957|-2.52%|
> |TwoInputs_Rescale_LazyFromSource_ExactlyOnce_1024_heap|1415.511667|1300.1|-8.15%|
> |TwoInputs_Rebalance_LazyFromSource_ExactlyOnce_10_rocksdb|34.2967|34.1667|-0.38%|
> |TwoInputs_KeyBy_LazyFromSource_ExactlyOnce_100_heap|158.353|151.848|-4.11%|
> |TwoInputs_Broadcast_Eager_ExactlyOnce_1024_rocksdb|1373.406667|1300.056667|-5.34%|
> |TwoInputs_Rescale_Eager_ExactlyOnce_10_heap|34.5717|32.0967|-7.16%|
> |TwoInputs_Rebalance_Eager_ExactlyOnce_100_rocksdb|158.655|147.44|-7.07%|
> |TwoInputs_KeyBy_Eager_ExactlyOnce_1024_heap|1356.611667|1292.386667|-4.73%|
> 

[jira] [Commented] (FLINK-18433) From the end-to-end performance test results, 1.11 has a regression

2020-06-25 Thread Arvid Heise (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17144723#comment-17144723
 ] 

Arvid Heise commented on FLINK-18433:
-

Some possible causes from the network stack:
- We added a few additional statements in InputGate/ResultPartition to support 
Unaligned Checkpoints (UC). https://issues.apache.org/jira/browse/FLINK-14551 
They will be executed even when UC is disabled. However, it would be odd if the JIT 
wouldn't optimize that out.
- (We now use fewer exclusive network buffers by default). (doesn't seem to be 
fully merged yet, but there might be some related things merged already 
https://issues.apache.org/jira/browse/FLINK-16428)
- I also thought non-blocking output might affect it, but afaik everything was 
already in 1.10.0. https://issues.apache.org/jira/browse/FLINK-14553

> From the end-to-end performance test results, 1.11 has a regression
> ---
>
> Key: FLINK-18433
> URL: https://issues.apache.org/jira/browse/FLINK-18433
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core, API / DataStream
>Affects Versions: 1.11.0
> Environment: 3 machines
> [|https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations_1.11/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
>Reporter: Aihua Li
>Priority: Major
>
>  
> I ran end-to-end performance tests between the Release-1.10 and Release-1.11. 
> the results were as follows:
> |scenarioName|release-1.10|release-1.11| |
> |OneInput_Broadcast_LazyFromSource_ExactlyOnce_10_rocksdb|46.175|43.8133|-5.11%|
> |OneInput_Rescale_LazyFromSource_ExactlyOnce_100_heap|211.835|200.355|-5.42%|
> |OneInput_Rebalance_LazyFromSource_ExactlyOnce_1024_rocksdb|1721.041667|1618.32|-5.97%|
> |OneInput_KeyBy_LazyFromSource_ExactlyOnce_10_heap|46|43.615|-5.18%|
> |OneInput_Broadcast_Eager_ExactlyOnce_100_rocksdb|212.105|199.688|-5.85%|
> |OneInput_Rescale_Eager_ExactlyOnce_1024_heap|1754.64|1600.12|-8.81%|
> |OneInput_Rebalance_Eager_ExactlyOnce_10_rocksdb|45.9167|43.0983|-6.14%|
> |OneInput_KeyBy_Eager_ExactlyOnce_100_heap|212.0816667|200.727|-5.35%|
> |OneInput_Broadcast_LazyFromSource_AtLeastOnce_1024_rocksdb|1718.245|1614.381667|-6.04%|
> |OneInput_Rescale_LazyFromSource_AtLeastOnce_10_heap|46.12|43.5517|-5.57%|
> |OneInput_Rebalance_LazyFromSource_AtLeastOnce_100_rocksdb|212.038|200.388|-5.49%|
> |OneInput_KeyBy_LazyFromSource_AtLeastOnce_1024_heap|1762.048333|1606.408333|-8.83%|
> |OneInput_Broadcast_Eager_AtLeastOnce_10_rocksdb|46.0583|43.4967|-5.56%|
> |OneInput_Rescale_Eager_AtLeastOnce_100_heap|212.233|201.188|-5.20%|
> |OneInput_Rebalance_Eager_AtLeastOnce_1024_rocksdb|1720.66|1616.85|-6.03%|
> |OneInput_KeyBy_Eager_AtLeastOnce_10_heap|46.14|43.6233|-5.45%|
> |TwoInputs_Broadcast_LazyFromSource_ExactlyOnce_100_rocksdb|156.918|152.957|-2.52%|
> |TwoInputs_Rescale_LazyFromSource_ExactlyOnce_1024_heap|1415.511667|1300.1|-8.15%|
> |TwoInputs_Rebalance_LazyFromSource_ExactlyOnce_10_rocksdb|34.2967|34.1667|-0.38%|
> |TwoInputs_KeyBy_LazyFromSource_ExactlyOnce_100_heap|158.353|151.848|-4.11%|
> |TwoInputs_Broadcast_Eager_ExactlyOnce_1024_rocksdb|1373.406667|1300.056667|-5.34%|
> |TwoInputs_Rescale_Eager_ExactlyOnce_10_heap|34.5717|32.0967|-7.16%|
> |TwoInputs_Rebalance_Eager_ExactlyOnce_100_rocksdb|158.655|147.44|-7.07%|
> |TwoInputs_KeyBy_Eager_ExactlyOnce_1024_heap|1356.611667|1292.386667|-4.73%|
> |TwoInputs_Broadcast_LazyFromSource_AtLeastOnce_10_rocksdb|34.01|33.205|-2.37%|
> |TwoInputs_Rescale_LazyFromSource_AtLeastOnce_100_heap|149.588|145.997|-2.40%|
> |TwoInputs_Rebalance_LazyFromSource_AtLeastOnce_1024_rocksdb|1359.74|1299.156667|-4.46%|
> |TwoInputs_KeyBy_LazyFromSource_AtLeastOnce_10_heap|34.025|29.6833|-12.76%|
> |TwoInputs_Broadcast_Eager_AtLeastOnce_100_rocksdb|157.303|151.4616667|-3.71%|
> |TwoInputs_Rescale_Eager_AtLeastOnce_1024_heap|1368.74|1293.238333|-5.52%|
> |TwoInputs_Rebalance_Eager_AtLeastOnce_10_rocksdb|34.325|33.285|-3.03%|
> |TwoInputs_KeyBy_Eager_AtLeastOnce_100_heap|162.5116667|134.375|-17.31%|
> It can be seen that the performance of 1.11 has a regression, basically 
> around 5%, and the maximum regression is 17%. This needs to be checked.
> the test code:
> flink-1.10.0: 
> [https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
> flink-1.11.0: 
> 

[GitHub] [flink] tillrohrmann closed pull request #11852: [FLINK-17300] Log the lineage information between ExecutionAttemptID …

2020-06-25 Thread GitBox


tillrohrmann closed pull request #11852:
URL: https://github.com/apache/flink/pull/11852


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Closed] (FLINK-17300) Log the lineage information between ExecutionAttemptID and AllocationID

2020-06-25 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann closed FLINK-17300.
-
Fix Version/s: (was: 1.11.0)
   1.12.0
   Resolution: Fixed

Fixed via 7e48549b822bc54f8dc0a5e1f9cbb5f3156fda06

> Log the lineage information between ExecutionAttemptID and AllocationID
> ---
>
> Key: FLINK-17300
> URL: https://issues.apache.org/jira/browse/FLINK-17300
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Yangze Guo
>Assignee: Yangze Guo
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-15693) Stop receiving incoming RPC messages when RpcEndpoint is closing

2020-06-25 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-15693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17144719#comment-17144719
 ] 

Till Rohrmann commented on FLINK-15693:
---

How exactly would this work [~Anglenet]? I think a {{RpcEndpoint}} uses the 
{{AkkaInvocationHandler}} to send itself messages (e.g. {{runAsync}} or when 
using the {{MainThreadExecutor}}).

> Stop receiving incoming RPC messages when RpcEndpoint is closing
> 
>
> Key: FLINK-15693
> URL: https://issues.apache.org/jira/browse/FLINK-15693
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.9.1, 1.10.0
>Reporter: Till Rohrmann
>Priority: Major
> Fix For: 1.11.0
>
>
> When calling {{RpcEndpoint#closeAsync()}}, the system triggers 
> {{RpcEndpoint#onStop}} and transitions the endpoint into the 
> {{TerminatingState}}. In order to allow asynchronous clean up operations, the 
> main thread executor is not shut down immediately. As a side effect, the 
> {{RpcEndpoint}} still accepts incoming RPC messages from other components. 
> I think it would be cleaner to no longer accept incoming RPC messages once we 
> are in the {{TerminatingState}}. That way we would not worry about the 
> internal state of the {{RpcEndpoint}} when processing RPC messages (similar 
> to 
> [here|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskExecutor.java#L952]).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-17073) Slow checkpoint cleanup causing OOMs

2020-06-25 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17144715#comment-17144715
 ] 

Till Rohrmann commented on FLINK-17073:
---

Yes, your help is highly appreciated. Given that this is not a trivial problem, 
I would suggest syncing with [~pnowojski] and [~SleePy] about the next steps.

> Slow checkpoint cleanup causing OOMs
> 
>
> Key: FLINK-17073
> URL: https://issues.apache.org/jira/browse/FLINK-17073
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.7.3, 1.8.0, 1.9.0, 1.10.0, 1.11.0
>Reporter: Till Rohrmann
>Priority: Major
> Fix For: 1.12.0
>
>
> A user reported that he sees a decline in checkpoint cleanup speed when 
> upgrading from Flink 1.7.2 to 1.10.0. The result is that a lot of cleanup 
> tasks are waiting in the execution queue occupying memory. Ultimately, the JM 
> process dies with an OOM.
> Compared to Flink 1.7.2, we introduced a dedicated {{ioExecutor}} which is 
> used by the {{HighAvailabilityServices}} (FLINK-11851). Before, we use the 
> {{AkkaRpcService}} thread pool which was a {{ForkJoinPool}} with a max 
> parallelism of 64. Now it is a {{FixedThreadPool}} with as many threads as 
> CPU cores. This change might have caused the decline in completed checkpoint 
> discard throughput. This suspicion needs to be validated before trying to fix 
> it!
> [1] 
> https://lists.apache.org/thread.html/r390e5d775878918edca0b6c9f18de96f828c266a888e34ed30ce8494%40%3Cuser.flink.apache.org%3E



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (FLINK-18168) Error results when use UDAF with Object Array return type

2020-06-25 Thread Timo Walther (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timo Walther closed FLINK-18168.

Fix Version/s: 1.11.1
   1.12.0
   1.10.2
   Resolution: Fixed

Fixed in 1.12.0: 581fabe7df12961d20753dff8947dec07cbb2c56
Fixed in 1.11.1: f5ac8c352b7fb5aff5a78cffa348f72bd8492509
Fixed in 1.10.2: 23ad7e33117b7c02b28fda77596b12668b5117c1

> Error results when use UDAF with Object Array return type
> -
>
> Key: FLINK-18168
> URL: https://issues.apache.org/jira/browse/FLINK-18168
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Reporter: Zou
>Assignee: Zou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.10.2, 1.12.0, 1.11.1
>
>
> I get wrong results when I use a UDAF with an Object array return type (e.g. 
> Row[]). I found that the problem is that we reuse 'reuseArray' as the return 
> value of ObjectArrayConverter.toBinaryArray(). This leads to 'prevAggValue' 
> and 'newAggValue' in GroupAggFunction.processElement() containing exactly the 
> same BinaryArray, so 'equaliser.equalsWithoutHeader(prevAggValue, 
> newAggValue)' is always true.
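
An illustrative sketch of the aliasing bug (simplified, not the actual 
converter code): reusing one array for both the previous and the new aggregate 
value makes the equality check trivially true.

{code:java}
import java.util.Arrays;

public class ReuseBugSketch {
    // shared buffer, analogous to 'reuseArray' in ObjectArrayConverter
    static final Object[] reuseArray = new Object[1];

    static Object[] toArray(Object value) {
        reuseArray[0] = value; // overwrites the shared buffer
        return reuseArray;     // "old" and "new" results alias each other
    }

    public static void main(String[] args) {
        Object[] prevAggValue = toArray("old");
        Object[] newAggValue = toArray("new"); // also mutates prevAggValue
        System.out.println(Arrays.equals(prevAggValue, newAggValue)); // true
    }
}
{code}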



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


  1   2   >