[GitHub] [flink] flinkbot edited a comment on pull request #12702: [FLINK-18325] [table] fix potential NPE when calling SqlDataTypeSpec#getNullable.
flinkbot edited a comment on pull request #12702: URL: https://github.com/apache/flink/pull/12702#issuecomment-645869341 ## CI report: * d7e9bf3b59e41d5b8f70b8700733c32f851f75a3 Azure: [CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=3858) * 135f96e93369341df6be18a1e2df23c20c7434f3 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4042) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #12702: [FLINK-18325] [table] fix potential NPE when calling SqlDataTypeSpec#getNullable.
flinkbot edited a comment on pull request #12702: URL: https://github.com/apache/flink/pull/12702#issuecomment-645869341 ## CI report: * d7e9bf3b59e41d5b8f70b8700733c32f851f75a3 Azure: [CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=3858) * 135f96e93369341df6be18a1e2df23c20c7434f3 UNKNOWN
[GitHub] [flink] Aaaaaaron commented on pull request #12702: [FLINK-18325] [table] fix potential NPE when calling SqlDataTypeSpec#getNullable.
Aaaaaaron commented on pull request #12702: URL: https://github.com/apache/flink/pull/12702#issuecomment-649957192 Hi, can someone review this PR?
[jira] [Commented] (FLINK-17949) KafkaShuffleITCase.testSerDeIngestionTime:156->testRecordSerDe:388 expected:<310> but was:<0>
[ https://issues.apache.org/jira/browse/FLINK-17949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17145991#comment-17145991 ] Yuan Mei commented on FLINK-17949: -- Based on the failures reported in this Jira, the failure occurs in three tests: `testSerDeIngestionTime`, `testSerDeEventTime` and `testSerDeProcessingTime`; they share the same code path, `testRecordSerDe`. So I extracted these three tests (based on master 0c20f25fd5a4351) and re-ran them on my own Azure account. *I’ve run more than 3000 tests over about 11 hours in total without a single test failure on Azure.* Here is a bit more detail: * 200 runs of the 3 tests mentioned above; finished in 3 hours; 600 tests in total [https://dev.azure.com/mymeiyuan/Flink/_build/results?buildId=39=results] * 270 runs of the 3 tests mentioned above; canceled by Azure after 4 hours; 810 tests in total [https://dev.azure.com/mymeiyuan/Flink/_build/results?buildId=40=results] * 149 runs with all the original tests in KafkaShuffleITCase; canceled by Azure after 4 hours; 1639 tests in total [https://dev.azure.com/mymeiyuan/Flink/_build/results?buildId=41=results] Besides, this ticket has not received any new failure reports since 6/10, so it is possible that the fix in [FLINK-17327] Fix Kafka Producer Resource Leaks #12589 (merged around 6/10) also fixed this; the Kafka versions were also bumped in [FLINK-17327]. I think this provides enough confidence to say that the KafkaShuffle test cases are stable now. If no new failures are reported, I will close this ticket. 
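As a side note, the strength of "3000 green runs" can be quantified with the statistical "rule of three": after N consecutive passes, a ~95% upper confidence bound on the per-run failure probability is roughly 3/N. A small illustrative sketch (not from the original comment):

```python
# Rule of three: if a test passes n times in a row, then with ~95%
# confidence the true per-run failure probability p is below ~3/n.
# Derived from (1 - p)^n = 0.05  =>  p ≈ -ln(0.05)/n ≈ 3/n.
import math

def failure_rate_upper_bound(n_passes, confidence=0.95):
    """Upper bound on the per-run failure probability after n_passes green runs."""
    return -math.log(1.0 - confidence) / n_passes

# ~3000 green runs, as reported above: the flake rate is below ~0.1% per run.
print(f"p < {failure_rate_upper_bound(3000):.5f}")
```

So even a 1-in-1000 flake would very likely have shown up at least once in this experiment.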
> KafkaShuffleITCase.testSerDeIngestionTime:156->testRecordSerDe:388 > expected:<310> but was:<0> > - > > Key: FLINK-17949 > URL: https://issues.apache.org/jira/browse/FLINK-17949 > Project: Flink > Issue Type: Bug > Components: Connectors / Kafka, Tests >Affects Versions: 1.11.0, 1.12.0 >Reporter: Robert Metzger >Priority: Critical > Labels: test-stability > Attachments: logs-ci-kafkagelly-1590500380.zip, > logs-ci-kafkagelly-1590524911.zip > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=2209=logs=c5f0071e-1851-543e-9a45-9ac140befc32=684b1416-4c17-504e-d5ab-97ee44e08a20 > {code} > 2020-05-26T13:35:19.4022562Z [ERROR] > testSerDeIngestionTime(org.apache.flink.streaming.connectors.kafka.shuffle.KafkaShuffleITCase) > Time elapsed: 5.786 s <<< FAILURE! > 2020-05-26T13:35:19.4023185Z java.lang.AssertionError: expected:<310> but > was:<0> > 2020-05-26T13:35:19.4023498Z at org.junit.Assert.fail(Assert.java:88) > 2020-05-26T13:35:19.4023825Z at > org.junit.Assert.failNotEquals(Assert.java:834) > 2020-05-26T13:35:19.4024461Z at > org.junit.Assert.assertEquals(Assert.java:645) > 2020-05-26T13:35:19.4024900Z at > org.junit.Assert.assertEquals(Assert.java:631) > 2020-05-26T13:35:19.4028546Z at > org.apache.flink.streaming.connectors.kafka.shuffle.KafkaShuffleITCase.testRecordSerDe(KafkaShuffleITCase.java:388) > 2020-05-26T13:35:19.4029629Z at > org.apache.flink.streaming.connectors.kafka.shuffle.KafkaShuffleITCase.testSerDeIngestionTime(KafkaShuffleITCase.java:156) > 2020-05-26T13:35:19.4030253Z at > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > 2020-05-26T13:35:19.4030673Z at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > 2020-05-26T13:35:19.4031332Z at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 2020-05-26T13:35:19.4031763Z at > java.lang.reflect.Method.invoke(Method.java:498) > 2020-05-26T13:35:19.4032155Z at > 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > 2020-05-26T13:35:19.4032630Z at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > 2020-05-26T13:35:19.4033188Z at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > 2020-05-26T13:35:19.4033638Z at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > 2020-05-26T13:35:19.4034103Z at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > 2020-05-26T13:35:19.4034593Z at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > 2020-05-26T13:35:19.4035118Z at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > 2020-05-26T13:35:19.4035570Z at > java.util.concurrent.FutureTask.run(FutureTask.java:266) > 2020-05-26T13:35:19.4035888Z at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink-web] tweise commented on a change in pull request #352: Add 1.11 Release announcement.
tweise commented on a change in pull request #352: URL: https://github.com/apache/flink-web/pull/352#discussion_r445853688 ## File path: _posts/2020-06-29-release-1.11.0.md ## @@ -0,0 +1,299 @@ +--- +layout: post +title: "Apache Flink 1.11.0 Release Announcement" +date: 2020-06-29T08:00:00.000Z +categories: news +authors: +- morsapaes: + name: "Marta Paes" + twitter: "morsapaes" + +excerpt: The Apache Flink community is proud to announce the release of Flink 1.11.0! This release marks the first milestone in realizing a new vision for fault tolerance in Flink, and adds a handful of new features that simplify (and unify) Flink handling across the API stack. In particular for users of the Table API/SQL, this release introduces significant improvements to usability and opens up completely new use cases, including the much-anticipated support for Change Data Capture (CDC)! A great deal of effort has also gone into optimizing PyFlink and ensuring that its functionality is available to a broader set of Flink users. +--- + +The Apache Flink community is proud to announce the release of Flink 1.11.0! This release marks the first milestone in realizing a new vision for fault tolerance in Flink, and adds a handful of new features that simplify (and unify) Flink handling across the API stack. In particular for users of the Table API/SQL, this release introduces significant improvements to usability and opens up completely new use cases, including the much-anticipated support for Change Data Capture (CDC)! A great deal of effort has also gone into optimizing PyFlink and ensuring that its functionality is available to a broader set of Flink users. + +This blog post describes all major new features and improvements, important changes to be aware of and what to expect moving forward. 
+ +{% toc %} + +The binary distribution and source artifacts are now available on the updated [Downloads page]({{ site.baseurl }}/downloads.html) of the Flink website, and the most recent distribution of PyFlink is available on [PyPI](https://pypi.org/project/apache-flink/). For more details, check the complete [release changelog](https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12346364=Html=12315522) and the [updated documentation]({{ site.DOCS_BASE_URL }}flink-docs-release-1.11/flink-docs-release-1.11/). + +We encourage you to download the release and share your feedback with the community through the [Flink mailing lists](https://flink.apache.org/community.html#mailing-lists) or [JIRA](https://issues.apache.org/jira/projects/FLINK/summary). + +## New Features and Improvements + +### Unaligned Checkpoints (Beta) + +Triggering a checkpoint in Flink will cause a [checkpoint barrier]({{ site.DOCS_BASE_URL }}flink-docs-release-1.11/internals/stream_checkpointing.html#barriers) to flow from the sources of your topology all the way towards the sinks. For operators that receive more than one input stream, the barriers flowing through each channel need to be aligned before the operator can snapshot its state and forward the checkpoint barrier — typically, this alignment will take just a few milliseconds to complete, but it can become a bottleneck in backpressured pipelines as: + + * Checkpoint barriers will flow much slower through backpressured channels, effectively blocking the remaining channels and their upstream operators during checkpointing; + + * Slow checkpoint barrier propagation leads to longer checkpointing times and can, worst case, result in little to no progress in the application. + +To improve the performance of checkpointing under backpressure scenarios, the community is rolling out the first iteration of unaligned checkpoints ([FLIP-76](https://cwiki.apache.org/confluence/display/FLINK/FLIP-76%3A+Unaligned+Checkpoints)) with Flink 1.11. 
Compared to the original checkpointing mechanism (Fig. 1), this approach doesn’t wait for barrier alignment across input channels, instead allowing barriers to overtake in-flight records and forwarding them downstream before the synchronous part of the checkpoint takes place (Fig. 2). + +Fig.1: Aligned Checkpoints + +Fig.2: Unaligned Checkpoints + +Because in-flight records have to be persisted as part of the snapshot, unaligned checkpoints will lead to increased checkpoint sizes. On the upside, **checkpointing times are heavily reduced**, so users will see more progress (even in unstable environments) as more up-to-date checkpoints will lighten the recovery process. You can learn more about the current limitations of unaligned checkpoints in the [documentation]({{ site.DOCS_BASE_URL }}flink-docs-release-1.11/concepts/stateful-stream-processing.html#unaligned-checkpointing), and track the improvement
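The alignment bottleneck described above can be illustrated with a toy model (my own sketch, not part of the post): with aligned checkpoints an operator can only snapshot once barriers from *all* input channels have arrived, so fast channels sit blocked until the slowest (backpressured) one catches up; an unaligned barrier overtakes in-flight records and is forwarded immediately.

```python
# Toy model of checkpoint barrier alignment (illustrative only).
# Each value is the time (ms) at which the checkpoint barrier arrives
# on one input channel of a multi-input operator.
barrier_arrivals_ms = [5, 5, 480]  # one backpressured channel is slow

def aligned_wait_ms(arrivals):
    # Aligned: snapshot only after barriers from all channels arrived,
    # so fast channels are blocked until the slowest one catches up.
    return max(arrivals) - min(arrivals)

def unaligned_wait_ms(arrivals):
    # Unaligned: the first barrier overtakes in-flight records and is
    # forwarded downstream immediately; no cross-channel waiting.
    return 0

print(aligned_wait_ms(barrier_arrivals_ms))    # 475
print(unaligned_wait_ms(barrier_arrivals_ms))  # 0
```

The price, as the post notes, is that the overtaken in-flight records must be persisted with the snapshot, which grows checkpoint size.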
[GitHub] [flink] flinkbot edited a comment on pull request #12770: [FLINK-18200][python] Replace the deprecated interfaces with the new interfaces in the tests and examples
flinkbot edited a comment on pull request #12770: URL: https://github.com/apache/flink/pull/12770#issuecomment-649764859 ## CI report: * b2f1ab0d8197d7515678e8f74572d6ca0cddf5a4 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4038)
[GitHub] [flink] flinkbot edited a comment on pull request #12770: [FLINK-18200][python] Replace the deprecated interfaces with the new interfaces in the tests and examples
flinkbot edited a comment on pull request #12770: URL: https://github.com/apache/flink/pull/12770#issuecomment-649764859 ## CI report: * b2f1ab0d8197d7515678e8f74572d6ca0cddf5a4 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4038)
[GitHub] [flink] flinkbot commented on pull request #12770: [FLINK-18200][python] Replace the deprecated interfaces with the new interfaces in the tests and examples
flinkbot commented on pull request #12770: URL: https://github.com/apache/flink/pull/12770#issuecomment-649764859 ## CI report: * b2f1ab0d8197d7515678e8f74572d6ca0cddf5a4 UNKNOWN
[GitHub] [flink] flinkbot edited a comment on pull request #12748: [FLINK-18324][docs-zh] Translate updated data type into Chinese
flinkbot edited a comment on pull request #12748: URL: https://github.com/apache/flink/pull/12748#issuecomment-647889022 ## CI report: * 98a273694702b8878c6a6a337f4eb93c1ac2e5f4 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4035)
[GitHub] [flink] flinkbot commented on pull request #12770: [FLINK-18200][python] Replace the deprecated interfaces with the new interfaces in the tests and examples
flinkbot commented on pull request #12770: URL: https://github.com/apache/flink/pull/12770#issuecomment-649743424 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review. ## Automated Checks Last check on commit b2f1ab0d8197d7515678e8f74572d6ca0cddf5a4 (Thu Jun 25 18:22:35 UTC 2020) **Warnings:** * No documentation files were touched! Remember to keep the Flink docs up to date! Mention the bot in a comment to re-run the automated checks. ## Review Progress * ❓ 1. The [description] looks good. * ❓ 2. There is [consensus] that the contribution should go into Flink. * ❓ 3. Needs [attention] from. * ❓ 4. The change fits into the overall [architecture]. * ❓ 5. Overall code [quality] is good. Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required Bot commands The @flinkbot bot supports the following commands: - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`) - `@flinkbot approve all` to approve all aspects - `@flinkbot approve-until architecture` to approve everything until `architecture` - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention - `@flinkbot disapprove architecture` to remove an approval you gave earlier
[jira] [Updated] (FLINK-18200) Replace the deprecated interfaces with the new interfaces in the tests and examples
[ https://issues.apache.org/jira/browse/FLINK-18200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-18200: --- Labels: pull-request-available (was: ) > Replace the deprecated interfaces with the new interfaces in the tests and > examples > --- > > Key: FLINK-18200 > URL: https://issues.apache.org/jira/browse/FLINK-18200 > Project: Flink > Issue Type: Improvement > Components: API / Python >Reporter: Dian Fu >Assignee: Nicholas Jiang >Priority: Major > Labels: pull-request-available > Fix For: 1.12.0 > > > Currently, there are a few deprecated interfaces which are still heavily used > in the tests and examples, e.g. register_function, etc. We should improve the > tests and examples to use the new interfaces, such as > create_temporary_system_function instead.
[GitHub] [flink] SteNicholas opened a new pull request #12770: [FLINK-18200][python] Replace the deprecated interfaces with the new interfaces in the tests and examples
SteNicholas opened a new pull request #12770: URL: https://github.com/apache/flink/pull/12770 ## What is the purpose of the change *Currently, there are a few deprecated interfaces which are still heavily used in the tests and examples, e.g. register_function, etc. Deprecated interfaces should be replaced so that the tests and examples use the new interfaces, such as create_temporary_system_function instead.* ## Brief change log - *Modify tests and examples that call the methods `insert_into`, `scan`, `sql_update`, `register_function`, `register_java_function` of `TableEnvironment`.* - *Modify tests and examples that call the method `insert_into` of `Table`.* - *Deprecate the methods `from_table_source`, `_from_elements`, `connect` of `TableEnvironment`.* ## Verifying this change - *Modify unit tests that call the methods `insert_into`, `scan`, `sql_update`, `register_function`, `register_java_function` of `TableEnvironment`.* - *Modify unit tests that call the method `insert_into` of `Table`.* ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (yes / **no**) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**) - The serializers: (yes / **no** / don't know) - The runtime per-record code paths (performance sensitive): (yes / **no** / don't know) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (yes / **no** / don't know) - The S3 file system connector: (yes / **no** / don't know) ## Documentation - Does this pull request introduce a new feature? (yes / **no**) - If yes, how is the feature documented? (**not applicable** / docs / JavaDocs / not documented)
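For reference, the deprecated-to-new mapping this PR applies can be summarized as below. This is my reading of the PR description together with the Flink 1.11 Python Table API; treat the exact replacement names as assumptions to verify against the docs, and the helper itself is purely hypothetical.

```python
# Hypothetical summary of the deprecated PyFlink calls touched by PR #12770
# and their suggested replacements (assumed from the PR description and the
# Flink 1.11 Python Table API; not an authoritative list).
DEPRECATED_TO_NEW = {
    "TableEnvironment.register_function": "TableEnvironment.create_temporary_system_function",
    "TableEnvironment.register_java_function": "TableEnvironment.create_java_temporary_system_function",
    "TableEnvironment.sql_update": "TableEnvironment.execute_sql",
    "TableEnvironment.scan": "TableEnvironment.from_path",
    "Table.insert_into": "Table.execute_insert",
}

def replacement_for(deprecated_name):
    """Return the suggested replacement, or None if the call is not in the map."""
    return DEPRECATED_TO_NEW.get(deprecated_name)

print(replacement_for("TableEnvironment.register_function"))
```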
[jira] [Comment Edited] (FLINK-18433) From the end-to-end performance test results, 1.11 has a regression
[ https://issues.apache.org/jira/browse/FLINK-18433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17145692#comment-17145692 ] Stephan Ewen edited comment on FLINK-18433 at 6/25/20, 5:51 PM: I would not dismiss the runtime code, yet. The fact that it is both in MemStateBackend and RocksDBStateBackend would be quite well explained with more frequent stalls, for example during buffer allocation. If there are fewer buffers by default, or a different buffer allocation/acquisition code path now, I'd consider that a strong suspect. was (Author: stephanewen): I would not dismiss the runtime code so far. The fact that it is both in MemStateBackend and RocksDBStateBackend would be quite well explained with more frequent stalls, for example during buffer allocation. If there are fewer buffers by default, or a different buffer allocation/acquisition code path now, I'd consider that a strong suspect. > From the end-to-end performance test results, 1.11 has a regression > --- > > Key: FLINK-18433 > URL: https://issues.apache.org/jira/browse/FLINK-18433 > Project: Flink > Issue Type: Bug > Components: API / Core, API / DataStream >Affects Versions: 1.11.0 > Environment: 3 machines > [|https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations_1.11/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java] >Reporter: Aihua Li >Priority: Major > > > I ran end-to-end performance tests between the Release-1.10 and Release-1.11. 
> the results were as follows: > |scenarioName|release-1.10|release-1.11| | > |OneInput_Broadcast_LazyFromSource_ExactlyOnce_10_rocksdb|46.175|43.8133|-5.11%| > |OneInput_Rescale_LazyFromSource_ExactlyOnce_100_heap|211.835|200.355|-5.42%| > |OneInput_Rebalance_LazyFromSource_ExactlyOnce_1024_rocksdb|1721.041667|1618.32|-5.97%| > |OneInput_KeyBy_LazyFromSource_ExactlyOnce_10_heap|46|43.615|-5.18%| > |OneInput_Broadcast_Eager_ExactlyOnce_100_rocksdb|212.105|199.688|-5.85%| > |OneInput_Rescale_Eager_ExactlyOnce_1024_heap|1754.64|1600.12|-8.81%| > |OneInput_Rebalance_Eager_ExactlyOnce_10_rocksdb|45.9167|43.0983|-6.14%| > |OneInput_KeyBy_Eager_ExactlyOnce_100_heap|212.0816667|200.727|-5.35%| > |OneInput_Broadcast_LazyFromSource_AtLeastOnce_1024_rocksdb|1718.245|1614.381667|-6.04%| > |OneInput_Rescale_LazyFromSource_AtLeastOnce_10_heap|46.12|43.5517|-5.57%| > |OneInput_Rebalance_LazyFromSource_AtLeastOnce_100_rocksdb|212.038|200.388|-5.49%| > |OneInput_KeyBy_LazyFromSource_AtLeastOnce_1024_heap|1762.048333|1606.408333|-8.83%| > |OneInput_Broadcast_Eager_AtLeastOnce_10_rocksdb|46.0583|43.4967|-5.56%| > |OneInput_Rescale_Eager_AtLeastOnce_100_heap|212.233|201.188|-5.20%| > |OneInput_Rebalance_Eager_AtLeastOnce_1024_rocksdb|1720.66|1616.85|-6.03%| > |OneInput_KeyBy_Eager_AtLeastOnce_10_heap|46.14|43.6233|-5.45%| > |TwoInputs_Broadcast_LazyFromSource_ExactlyOnce_100_rocksdb|156.918|152.957|-2.52%| > |TwoInputs_Rescale_LazyFromSource_ExactlyOnce_1024_heap|1415.511667|1300.1|-8.15%| > |TwoInputs_Rebalance_LazyFromSource_ExactlyOnce_10_rocksdb|34.2967|34.1667|-0.38%| > |TwoInputs_KeyBy_LazyFromSource_ExactlyOnce_100_heap|158.353|151.848|-4.11%| > |TwoInputs_Broadcast_Eager_ExactlyOnce_1024_rocksdb|1373.406667|1300.056667|-5.34%| > |TwoInputs_Rescale_Eager_ExactlyOnce_10_heap|34.5717|32.0967|-7.16%| > |TwoInputs_Rebalance_Eager_ExactlyOnce_100_rocksdb|158.655|147.44|-7.07%| > |TwoInputs_KeyBy_Eager_ExactlyOnce_1024_heap|1356.611667|1292.386667|-4.73%| > 
|TwoInputs_Broadcast_LazyFromSource_AtLeastOnce_10_rocksdb|34.01|33.205|-2.37%| > |TwoInputs_Rescale_LazyFromSource_AtLeastOnce_100_heap|149.588|145.997|-2.40%| > |TwoInputs_Rebalance_LazyFromSource_AtLeastOnce_1024_rocksdb|1359.74|1299.156667|-4.46%| > |TwoInputs_KeyBy_LazyFromSource_AtLeastOnce_10_heap|34.025|29.6833|-12.76%| > |TwoInputs_Broadcast_Eager_AtLeastOnce_100_rocksdb|157.303|151.4616667|-3.71%| > |TwoInputs_Rescale_Eager_AtLeastOnce_1024_heap|1368.74|1293.238333|-5.52%| > |TwoInputs_Rebalance_Eager_AtLeastOnce_10_rocksdb|34.325|33.285|-3.03%| > |TwoInputs_KeyBy_Eager_AtLeastOnce_100_heap|162.5116667|134.375|-17.31%| > It can be seen that the performance of 1.11 has a regression, basically > around 5%, and the maximum regression is 17%. This needs to be checked. > the test code: > flink-1.10.0: > [https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java] > flink-1.11.0: >
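The delta column in the table above is simply the relative change between the two releases; a quick sketch to reproduce it (my own illustration, using two rows from the table):

```python
# Reproduce the regression percentages in the table above:
# delta = (release_1_11 - release_1_10) / release_1_10 * 100.
def regression_pct(v_110, v_111):
    """Relative change from release 1.10 to 1.11, in percent (negative = regression)."""
    return (v_111 - v_110) / v_110 * 100

# Two rows taken from the table above:
print(round(regression_pct(46.175, 43.8133), 2))       # -5.11 (first row)
print(round(regression_pct(162.5116667, 134.375), 2))  # -17.31 (worst case)
```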
[jira] [Comment Edited] (FLINK-18433) From the end-to-end performance test results, 1.11 has a regression
[ https://issues.apache.org/jira/browse/FLINK-18433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17145692#comment-17145692 ] Stephan Ewen edited comment on FLINK-18433 at 6/25/20, 5:51 PM: I would not dismiss the runtime code, yet. The fact that it is both in MemStateBackend and RocksDBStateBackend would be quite well explained with more frequent stalls, for example during buffer allocation. If there are fewer buffers by default, or a different buffer allocation/acquisition code path now, that would be a plausible cause. was (Author: stephanewen): I would not dismiss the runtime code, yet. The fact that it is both in MemStateBackend and RocksDBStateBackend would be quite well explained with more frequent stalls, for example during buffer allocation. If there are fewer buffers by default, or a different buffer allocation/acquisition code path now, I'd consider that a strong suspect.
[jira] [Commented] (FLINK-18433) From the end-to-end performance test results, 1.11 has a regression
[ https://issues.apache.org/jira/browse/FLINK-18433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17145698#comment-17145698 ] Stephan Ewen commented on FLINK-18433: -- Other sources of stalls (like GC) of course as well. I don't recall us changing anything on the memory configuration. If we have more object allocation on the per-record (possibly per buffer) code paths, it would also be an explanation. > From the end-to-end performance test results, 1.11 has a regression > --- > > Key: FLINK-18433 > URL: https://issues.apache.org/jira/browse/FLINK-18433 > Project: Flink > Issue Type: Bug > Components: API / Core, API / DataStream >Affects Versions: 1.11.0 > Environment: 3 machines > [|https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations_1.11/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java] >Reporter: Aihua Li >Priority: Major > > > I ran end-to-end performance tests between the Release-1.10 and Release-1.11. 
> the results were as follows: > |scenarioName|release-1.10|release-1.11| | > |OneInput_Broadcast_LazyFromSource_ExactlyOnce_10_rocksdb|46.175|43.8133|-5.11%| > |OneInput_Rescale_LazyFromSource_ExactlyOnce_100_heap|211.835|200.355|-5.42%| > |OneInput_Rebalance_LazyFromSource_ExactlyOnce_1024_rocksdb|1721.041667|1618.32|-5.97%| > |OneInput_KeyBy_LazyFromSource_ExactlyOnce_10_heap|46|43.615|-5.18%| > |OneInput_Broadcast_Eager_ExactlyOnce_100_rocksdb|212.105|199.688|-5.85%| > |OneInput_Rescale_Eager_ExactlyOnce_1024_heap|1754.64|1600.12|-8.81%| > |OneInput_Rebalance_Eager_ExactlyOnce_10_rocksdb|45.9167|43.0983|-6.14%| > |OneInput_KeyBy_Eager_ExactlyOnce_100_heap|212.0816667|200.727|-5.35%| > |OneInput_Broadcast_LazyFromSource_AtLeastOnce_1024_rocksdb|1718.245|1614.381667|-6.04%| > |OneInput_Rescale_LazyFromSource_AtLeastOnce_10_heap|46.12|43.5517|-5.57%| > |OneInput_Rebalance_LazyFromSource_AtLeastOnce_100_rocksdb|212.038|200.388|-5.49%| > |OneInput_KeyBy_LazyFromSource_AtLeastOnce_1024_heap|1762.048333|1606.408333|-8.83%| > |OneInput_Broadcast_Eager_AtLeastOnce_10_rocksdb|46.0583|43.4967|-5.56%| > |OneInput_Rescale_Eager_AtLeastOnce_100_heap|212.233|201.188|-5.20%| > |OneInput_Rebalance_Eager_AtLeastOnce_1024_rocksdb|1720.66|1616.85|-6.03%| > |OneInput_KeyBy_Eager_AtLeastOnce_10_heap|46.14|43.6233|-5.45%| > |TwoInputs_Broadcast_LazyFromSource_ExactlyOnce_100_rocksdb|156.918|152.957|-2.52%| > |TwoInputs_Rescale_LazyFromSource_ExactlyOnce_1024_heap|1415.511667|1300.1|-8.15%| > |TwoInputs_Rebalance_LazyFromSource_ExactlyOnce_10_rocksdb|34.2967|34.1667|-0.38%| > |TwoInputs_KeyBy_LazyFromSource_ExactlyOnce_100_heap|158.353|151.848|-4.11%| > |TwoInputs_Broadcast_Eager_ExactlyOnce_1024_rocksdb|1373.406667|1300.056667|-5.34%| > |TwoInputs_Rescale_Eager_ExactlyOnce_10_heap|34.5717|32.0967|-7.16%| > |TwoInputs_Rebalance_Eager_ExactlyOnce_100_rocksdb|158.655|147.44|-7.07%| > |TwoInputs_KeyBy_Eager_ExactlyOnce_1024_heap|1356.611667|1292.386667|-4.73%| > 
> |TwoInputs_Broadcast_LazyFromSource_AtLeastOnce_10_rocksdb|34.01|33.205|-2.37%|
> |TwoInputs_Rescale_LazyFromSource_AtLeastOnce_100_heap|149.588|145.997|-2.40%|
> |TwoInputs_Rebalance_LazyFromSource_AtLeastOnce_1024_rocksdb|1359.74|1299.156667|-4.46%|
> |TwoInputs_KeyBy_LazyFromSource_AtLeastOnce_10_heap|34.025|29.6833|-12.76%|
> |TwoInputs_Broadcast_Eager_AtLeastOnce_100_rocksdb|157.303|151.4616667|-3.71%|
> |TwoInputs_Rescale_Eager_AtLeastOnce_1024_heap|1368.74|1293.238333|-5.52%|
> |TwoInputs_Rebalance_Eager_AtLeastOnce_10_rocksdb|34.325|33.285|-3.03%|
> |TwoInputs_KeyBy_Eager_AtLeastOnce_100_heap|162.5116667|134.375|-17.31%|
> It can be seen that the performance of 1.11 has a regression, basically around 5%, with a maximum regression of 17%. This needs to be checked.
> the test code:
> flink-1.10.0: [https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
> flink-1.11.0: [https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations_1.11/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
> submit cmd like this:
> bin/flink run -d -m 192.168.39.246:8081 -c org.apache.flink.basic.operations.PerformanceTestJob /home/admin/flink-basic-operations_2.11-1.10-SNAPSHOT.jar --topologyName OneInput --LogicalAttributesofEdges Broadcast --ScheduleMode LazyFromSource --CheckpointMode ExactlyOnce --recordSize 10 --stateBackend rocksdb
-- This message was sent by Atlassian Jira (v8.3.4#803005)
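The fourth column in the table above is the relative change between the two releases; a quick sketch of the arithmetic (hypothetical helper, not taken from the test code):

```java
// Hypothetical sketch of how the regression column in the table is derived.
public class Regression {
    /** Relative change from baseline to candidate, in percent (negative = regression). */
    static double regressionPercent(double baseline, double candidate) {
        return (candidate - baseline) / baseline * 100.0;
    }
}
```

For example, 46.175 in release-1.10 versus 43.8133 in release-1.11 gives roughly -5.11%.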
[jira] [Commented] (FLINK-18433) From the end-to-end performance test results, 1.11 has a regression
[ https://issues.apache.org/jira/browse/FLINK-18433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17145692#comment-17145692 ] Stephan Ewen commented on FLINK-18433: -- I would not dismiss the runtime code so far. The fact that it is both in MemStateBackend and RocksDBStateBackend would be quite well explained with more frequent stalls, for example during buffer allocation. If there are fewer buffers by default, or a different buffer allocation/acquisition code path now, I'd consider that a strong suspect.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] flinkbot edited a comment on pull request #11245: [FLINK-15794][Kubernetes] Generate the Kubernetes default image version
flinkbot edited a comment on pull request #11245: URL: https://github.com/apache/flink/pull/11245#issuecomment-592002512 ## CI report: * f1d4f9163a2cd60daa02fe6b7af5a459cee9660c UNKNOWN * 4d11da053e608d9d24e70c001fbb6b1469a392d7 UNKNOWN * 265d6eb7970325c88d2b3a9c77fa308be89641a9 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4037) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-18422) Update Prefer tag in documentation 'Fault Tolerance training lesson'
[ https://issues.apache.org/jira/browse/FLINK-18422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17145131#comment-17145131 ] David Anderson commented on FLINK-18422: [~RocMarshal] Nevermind my previous reply; I was mistaken. Assigning this to you. > Update Prefer tag in documentation 'Fault Tolerance training lesson' > > > Key: FLINK-18422 > URL: https://issues.apache.org/jira/browse/FLINK-18422 > Project: Flink > Issue Type: Improvement > Components: Documentation, Documentation / Training >Affects Versions: 1.10.0, 1.10.1 >Reporter: Roc Marshal >Priority: Minor > Labels: document, easyfix > Attachments: current_prefer_mode.png > > Original Estimate: 48h > Remaining Estimate: 48h > > Update Prefer tag in documentation 'Fault Tolerance training lesson' > according to > [Prefer|http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Reminder-Prefer-link-tag-in-documentation-td42362.html]. > > The location is: docs/learn-flink/fault_tolerance.md -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (FLINK-18422) Update Prefer tag in documentation 'Fault Tolerance training lesson'
[ https://issues.apache.org/jira/browse/FLINK-18422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Anderson reassigned FLINK-18422: -- Assignee: Roc Marshal
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Issue Comment Deleted] (FLINK-18422) Update Prefer tag in documentation 'Fault Tolerance training lesson'
[ https://issues.apache.org/jira/browse/FLINK-18422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Anderson updated FLINK-18422: --- Comment: was deleted (was: [~RocMarshal] The link tag does not apply to the cases you have highlighted. It's only used for creating links, not for embedded images.)
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-18422) Update Prefer tag in documentation 'Fault Tolerance training lesson'
[ https://issues.apache.org/jira/browse/FLINK-18422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17145127#comment-17145127 ] David Anderson commented on FLINK-18422: [~RocMarshal] The link tag does not apply to the cases you have highlighted. It's only used for creating links, not for embedded images.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
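For context, the "Prefer link tag" convention referenced in this thread concerns Jekyll's `link` tag in the Flink docs; a hypothetical before/after sketch (the paths are assumptions, not taken from the thread):

```
<!-- plain markdown link: path is not checked at build time -->
[Fault Tolerance]({{ site.baseurl }}/learn-flink/fault_tolerance.html)

<!-- preferred: Jekyll link tag, which fails the build if the target file is missing -->
[Fault Tolerance]({% link learn-flink/fault_tolerance.md %})
```

As David notes above, this only applies to links, not to embedded images.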
[GitHub] [flink] flinkbot edited a comment on pull request #12748: [FLINK-18324][docs-zh] Translate updated data type into Chinese
flinkbot edited a comment on pull request #12748: URL: https://github.com/apache/flink/pull/12748#issuecomment-647889022 ## CI report: * 719a106bdb3e4aeced30673f43d7a2a0f90b2535 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4034) * 98a273694702b8878c6a6a337f4eb93c1ac2e5f4 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4035)
[GitHub] [flink] flinkbot edited a comment on pull request #11245: [FLINK-15794][Kubernetes] Generate the Kubernetes default image version
flinkbot edited a comment on pull request #11245: URL: https://github.com/apache/flink/pull/11245#issuecomment-592002512 ## CI report: * f1d4f9163a2cd60daa02fe6b7af5a459cee9660c UNKNOWN * 4d11da053e608d9d24e70c001fbb6b1469a392d7 UNKNOWN * a4c5e8c756f9380734727742390ce88d7bc47372 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4036) * 265d6eb7970325c88d2b3a9c77fa308be89641a9 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4037)
[jira] [Commented] (FLINK-17075) Add task status reconciliation between TM and JM
[ https://issues.apache.org/jira/browse/FLINK-17075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17145062#comment-17145062 ] Chesnay Schepler commented on FLINK-17075: -- [~trohrmann] yes, this was just a mistake while copying; {{ExecutionIdsProvider#getExecutions}} accepts a ResourceID, as does {{ExecutionDeploymentTracker#startTracking}}. I've amended my comment accordingly. > Add task status reconciliation between TM and JM > > > Key: FLINK-17075 > URL: https://issues.apache.org/jira/browse/FLINK-17075 > Project: Flink > Issue Type: Improvement > Components: Runtime / Coordination >Affects Versions: 1.10.0, 1.11.0 >Reporter: Till Rohrmann >Assignee: Chesnay Schepler >Priority: Critical > Fix For: 1.10.2, 1.12.0, 1.11.1 > > > In order to harden the TM and JM communication I suggest to let the > {{TaskExecutor}} send the task statuses back to the {{JobMaster}} as part of > the heartbeat payload (similar to FLINK-11059). This would allow to reconcile > the states of both components in case that a status update message was lost > as described by a user on the ML. > https://lists.apache.org/thread.html/ra9ed70866381f0ef0f4779633346722ccab3dc0d6dbacce04080b74e%40%3Cuser.flink.apache.org%3E -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (FLINK-17075) Add task status reconciliation between TM and JM
[ https://issues.apache.org/jira/browse/FLINK-17075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17143818#comment-17143818 ] Chesnay Schepler edited comment on FLINK-17075 at 6/25/20, 4:14 PM: h3. Problem description: The {{JobMaster}} keeps track of the state of all {{Executions}} belonging to a job. After deployment, this tracking relies on updates from the {{TaskExecutors}}, transmitted via dedicated RPC messages. If one such message is lost then the tracked state may no longer match the actual one. In the worst case this prevents an {{Execution}} from ever reaching a terminal state, which in turn prevents the job from terminating. h3. Proposed solution: To prevent the worst case from happening, we propose that the {{TaskExecutor}} also submits a report of all currently deployed {{Tasks}} (identified by the {{ExecutionAttemptID}}) with each heartbeat. This allows us to detect discrepancies between the set of executions of the {{JobManager}} and {{TaskExecutor}}, and act accordingly. This in the end boils down to a comparison of 2 {{Set}}. If an execution exists only in the {{JobMaster}} set, then the execution was dropped by the {{TaskExecutor}}. This could imply a loss of a terminal state transition. We cannot determine which terminal state the task has reached, since all information was already cleaned up. In this case we will fail the execution in the {{ExecutionGraph}}, typically resulting in a restart. If an execution exists only in the {{TaskExecutor}} set, then some leftover task from a previous attempt is still running on the {{TaskExecutor}}. In this case we will cancel the task on the {{TaskExecutor}}. Running jobs are unaffected. If an execution exists in both sets, then we don't do anything. h4. Required changes: {{TaskExecutor}} -- The existing {{TaskSlotTable}} supports iterating over all {{Tasks}} for a given {{JobID}}, allowing us to extract the {{ExecutionAttemptID}}. 
From this we generate a {{Set<ExecutionAttemptID>}}, and submit it via heartbeats. {{JobMaster}} -- Here we need to be able to: a) (un)track actually deployed {{Executions}} b) cancel tasks on the {{TaskExecutor}} c) fail tasks in the {{ExecutionGraph}} These capabilities are split across 2 new components: 1) ExecutionDeploymentTracker 2) ExecutionDeploymentReconciler 1) The tracker lives in the Scheduler, with the following interface: {code} public interface ExecutionDeploymentTracker { void startTrackingDeployment(ExecutionAttemptID deployment, ResourceID host); void stopTrackingDeployment(ExecutionAttemptID deployment); Set<ExecutionAttemptID> getExecutions(ResourceID host); } {code} It's basically a {{Set}}. The tracker is notified by the {{ExecutionGraph}} of deployed/finished executions through 2 new listeners: {code} public interface ExecutionDeploymentListener { void onCompletedDeployment(ExecutionAttemptID execution); } public interface ExecutionStateUpdateListener { void onStateUpdate(ExecutionAttemptID execution, ExecutionState newState); } {code} {{onCompletedDeployment}} is called in {{Execution#deploy}} when the deployment future completes; an implementation will initiate the tracking. {{onStateUpdate}} is called in {{Execution#transitionState}} on any successful state transition; an implementation will stop the tracking if the new state is a terminal one. Note: The deployment listener is required since there is no dedicated state for a deployed task; executions are switched to DEPLOYING, submitted to the {{TaskExecutor}}, and switched to running after an update from the {{TaskExecutor}}. Since this update can be lost we cannot rely on it. A dedicated DEPLOYED state would be preferable, but this would require too many changes to the {{ExecutionGraph}} at this time. 2) The reconciler lives in the {{JobMaster}} and uses the IDs provided by the tracker and {{TaskExecutor}} heartbeats to detect mismatches, and fire events accordingly. 
By defining a {{ReconciliationHandler}} the {{JobMaster}} can decide how each case should be handled: {code} public interface ExecutionDeploymentReconciler { // conceptual factory interface interface Factory { ExecutionDeploymentReconciler get(ReconciliationHandler trigger); } void reconcileExecutionStates(ResourceID origin, DeploymentReport deploymentReport, Set<ExecutionAttemptID> knownExecutionAttemptsIds); interface ExecutionIdsProvider { Set<ExecutionAttemptID> getExecutions(ResourceID host); } interface ReconciliationHandler { // fail the execution in the ExecutionGraph void onMissingDeployment(ExecutionAttemptID deployment); // cancel the task on the TaskExecutor void onUnknownDeployment(ExecutionAttemptID deployment, ResourceID
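The reconciliation described above boils down to two set differences between the JobMaster's tracked executions and the TaskExecutor's heartbeat report. A minimal sketch of that comparison (hypothetical names, with String IDs standing in for {{ExecutionAttemptID}}; not the actual Flink classes):

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the set comparison performed during reconciliation.
public class ReconcilerSketch {

    /** Tracked by the JobMaster but absent from the TaskExecutor report:
     *  a terminal-state transition was likely lost -> fail in the ExecutionGraph. */
    public static Set<String> missingDeployments(Set<String> tracked, Set<String> reported) {
        Set<String> missing = new HashSet<>(tracked);
        missing.removeAll(reported);
        return missing;
    }

    /** Reported by the TaskExecutor but unknown to the JobMaster:
     *  a leftover task from a previous attempt -> cancel it on the TaskExecutor. */
    public static Set<String> unknownDeployments(Set<String> tracked, Set<String> reported) {
        Set<String> unknown = new HashSet<>(reported);
        unknown.removeAll(tracked);
        return unknown;
    }
}
```

Executions present in both sets need no action, matching the three cases listed in the comment.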
[GitHub] [flink] flinkbot edited a comment on pull request #12769: [FLINK-15414] catch the right KafkaException for binding errors
flinkbot edited a comment on pull request #12769: URL: https://github.com/apache/flink/pull/12769#issuecomment-649496339 ## CI report: * e4671572632d30c334c40d726409ea56152e2c1a Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4033)
[GitHub] [flink] flinkbot edited a comment on pull request #11245: [FLINK-15794][Kubernetes] Generate the Kubernetes default image version
flinkbot edited a comment on pull request #11245: URL: https://github.com/apache/flink/pull/11245#issuecomment-592002512 ## CI report: * f1d4f9163a2cd60daa02fe6b7af5a459cee9660c UNKNOWN * 4d11da053e608d9d24e70c001fbb6b1469a392d7 UNKNOWN * e180236be81cbad2b936db2f57e764d1243e980c Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=3976) * a4c5e8c756f9380734727742390ce88d7bc47372 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4036) * 265d6eb7970325c88d2b3a9c77fa308be89641a9 UNKNOWN
[jira] [Commented] (FLINK-18434) Can not select fields with JdbcCatalog
[ https://issues.apache.org/jira/browse/FLINK-18434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17145003#comment-17145003 ] Jark Wu commented on FLINK-18434: - Thanks [~dwysakowicz]. I think this is a design flaw of {{Catalog}} interface. Maybe this is the time to have separate {{ReadOnlyCatalog}}. For temporary fix, I think the {{FunctionCatalog}} should call {{Catalog#functionExists(..)}} first, and {{JdbcCatalog#functionExists(..)}} should always return false. WDYT? > Can not select fields with JdbcCatalog > -- > > Key: FLINK-18434 > URL: https://issues.apache.org/jira/browse/FLINK-18434 > Project: Flink > Issue Type: Bug > Components: Table SQL / Planner >Affects Versions: 1.11.0 >Reporter: Dawid Wysakowicz >Priority: Blocker > Fix For: 1.12.0, 1.11.1 > > > A query which selects fields from a table will fail if we set the > PostgresCatalog as default. > Steps to reproduce: > # Create postgres catalog and set it as default > # Create any table (in any catalog) > # Query that table with {{SELECT field FROM t}} (Important it must be a > field name not '{{*}}' > # The query will fail > Stack trace: > {code} > org.apache.flink.table.client.gateway.SqlExecutionException: Invalidate SQL > statement. 
> at > org.apache.flink.table.client.cli.SqlCommandParser.parseBySqlParser(SqlCommandParser.java:100) > ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT] > at > org.apache.flink.table.client.cli.SqlCommandParser.parse(SqlCommandParser.java:91) > ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT] > at > org.apache.flink.table.client.cli.CliClient.parseCommand(CliClient.java:257) > [flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT] > at org.apache.flink.table.client.cli.CliClient.open(CliClient.java:211) > [flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT] > at org.apache.flink.table.client.SqlClient.openCli(SqlClient.java:142) > [flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT] > at org.apache.flink.table.client.SqlClient.start(SqlClient.java:114) > [flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT] > at org.apache.flink.table.client.SqlClient.main(SqlClient.java:201) > [flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT] > Caused by: org.apache.flink.table.api.ValidationException: SQL validation > failed. 
null > at > org.apache.flink.table.planner.calcite.FlinkPlannerImpl.org$apache$flink$table$planner$calcite$FlinkPlannerImpl$$validate(FlinkPlannerImpl.scala:146) > ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT] > at > org.apache.flink.table.planner.calcite.FlinkPlannerImpl.validate(FlinkPlannerImpl.scala:108) > ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT] > at > org.apache.flink.table.planner.operations.SqlToOperationConverter.convert(SqlToOperationConverter.java:187) > ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT] > at > org.apache.flink.table.planner.delegation.ParserImpl.parse(ParserImpl.java:78) > ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT] > at > org.apache.flink.table.client.gateway.local.LocalExecutor$1.lambda$parse$0(LocalExecutor.java:430) > ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT] > at > org.apache.flink.table.client.gateway.local.ExecutionContext.wrapClassLoader(ExecutionContext.java:255) > ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT] > at > org.apache.flink.table.client.gateway.local.LocalExecutor$1.parse(LocalExecutor.java:430) > ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT] > at > org.apache.flink.table.client.cli.SqlCommandParser.parseBySqlParser(SqlCommandParser.java:98) > ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT] > ... 6 more > Caused by: java.lang.UnsupportedOperationException > at > org.apache.flink.connector.jdbc.catalog.AbstractJdbcCatalog.getFunction(AbstractJdbcCatalog.java:261) > ~[?:?] 
> at > org.apache.flink.table.catalog.FunctionCatalog.resolvePreciseFunctionReference(FunctionCatalog.java:570) > ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT] > at > org.apache.flink.table.catalog.FunctionCatalog.lambda$resolveAmbiguousFunctionReference$2(FunctionCatalog.java:617) > ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT] > at java.util.Optional.orElseGet(Optional.java:267) ~[?:1.8.0_252] > at > org.apache.flink.table.catalog.FunctionCatalog.resolveAmbiguousFunctionReference(FunctionCatalog.java:617) > ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT] > at > org.apache.flink.table.catalog.FunctionCatalog.lookupFunction(FunctionCatalog.java:370) > ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT] > at >
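A condensed sketch of the temporary fix Jark suggests above: consult {{functionExists(..)}} before calling {{getFunction(..)}}, so a read-only catalog such as {{JdbcCatalog}} can simply answer false instead of throwing {{UnsupportedOperationException}}. The interfaces here are simplified stand-ins, not the real {{FunctionCatalog}} API:

```java
import java.util.Optional;

// Simplified, hypothetical stand-in for the catalog lookup path.
interface SimpleCatalog {
    boolean functionExists(String path);
    Object getFunction(String path); // may throw UnsupportedOperationException
}

class FunctionLookup {
    // Check existence first; a JdbcCatalog-like catalog always returns false,
    // so resolution falls through to ambiguous/built-in lookup instead of failing.
    static Optional<Object> resolve(SimpleCatalog catalog, String path) {
        if (!catalog.functionExists(path)) {
            return Optional.empty();
        }
        return Optional.of(catalog.getFunction(path));
    }
}
```

With this ordering, `SELECT field FROM t` never reaches the throwing `getFunction(..)` call when the default catalog is read-only.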
[GitHub] [flink] nielsbasjes commented on pull request #11245: [FLINK-15794][Kubernetes] Generate the Kubernetes default image version
nielsbasjes commented on pull request #11245: URL: https://github.com/apache/flink/pull/11245#issuecomment-649607408 Note that this should be squashed into a single commit before merging.
[jira] [Commented] (FLINK-18427) Job failed under java 11
[ https://issues.apache.org/jira/browse/FLINK-18427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17144996#comment-17144996 ] Zhijiang commented on FLINK-18427: -- Fully agree with [~sewen]'s above analysis. In addition, we also planned to backport this improvement https://issues.apache.org/jira/browse/FLINK-15962 to release-1.10 later, which can also help to reduce the netty direct memory overhead somewhat. > Job failed under java 11 > > > Key: FLINK-18427 > URL: https://issues.apache.org/jira/browse/FLINK-18427 > Project: Flink > Issue Type: Bug > Components: Runtime / Configuration, Runtime / Network >Affects Versions: 1.10.0 >Reporter: Zhang Hao >Priority: Critical > > flink version: 1.10.0 > deployment mode: cluster > os: linux redhat7.5 > Job parallelism: greater than 1 > My job runs normally under Java 8, but fails under Java 11. Exception info is below; netty failed to send a message. In addition, I found the job failed when tasks were distributed across multiple nodes; if I set the job's parallelism = 1, the job ran normally under Java 11 too. > > 2020-06-24 09:52:16 org.apache.flink.runtime.io.network.netty.exception.LocalTransportException: > Sending the partition request to '/170.0.50.19:33320' failed. 
at > org.apache.flink.runtime.io.network.netty.NettyPartitionRequestClient$1.operationComplete(NettyPartitionRequestClient.java:124) > at > org.apache.flink.runtime.io.network.netty.NettyPartitionRequestClient$1.operationComplete(NettyPartitionRequestClient.java:115) > at > org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:500) > at > org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:474) > at > org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:413) > at > org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:538) > at > org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:531) > at > org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:111) > at > org.apache.flink.shaded.netty4.io.netty.util.internal.PromiseNotificationUtil.tryFailure(PromiseNotificationUtil.java:64) > at > org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.notifyOutboundHandlerException(AbstractChannelHandlerContext.java:818) > at > org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:718) > at > org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:708) > at > org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.access$1700(AbstractChannelHandlerContext.java:56) > at > org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:1102) > at > org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:1149) > at > 
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:1073) > at > org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163) > at > org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:416) > at > org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:515) > at > org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918) > at > org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) > at java.base/java.lang.Thread.run(Thread.java:834)Caused by: > java.io.IOException: Error while serializing message: > PartitionRequest(8059a0b47f7ba0ff814ea52427c584e7@6750c1170c861176ad3ceefe9b02f36e:0:2) > at > org.apache.flink.runtime.io.network.netty.NettyMessage$NettyMessageEncoder.write(NettyMessage.java:177) > at > org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:716) > ... 11 moreCaused by: java.io.IOException: java.lang.OutOfMemoryError: > Direct buffer memory at > org.apache.flink.runtime.io.network.netty.NettyMessage$PartitionRequest.write(NettyMessage.java:497) > at > org.apache.flink.runtime.io.network.netty.NettyMessage$NettyMessageEncoder.write(NettyMessage.java:174) > ... 12 moreCaused
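The `java.lang.OutOfMemoryError: Direct buffer memory` in the trace above is the netty direct-memory overhead discussed in the comments. A common mitigation is to give the task managers more off-heap headroom in `flink-conf.yaml`; the values below are assumptions to illustrate the knobs, not a confirmed fix for this job:

```yaml
# Hypothetical values - tune to the job. Flink 1.10 memory options:
taskmanager.memory.framework.off-heap.size: 256m   # default 128m; covers netty's direct buffers
taskmanager.memory.task.off-heap.size: 128m        # default 0; extra direct-memory headroom for user code
```

These options feed into the `-XX:MaxDirectMemorySize` that Flink sets for the task manager JVM, which is why exhausting them surfaces as a direct buffer OOM.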
[GitHub] [flink] flinkbot edited a comment on pull request #12748: [FLINK-18324][docs-zh] Translate updated data type into Chinese
flinkbot edited a comment on pull request #12748: URL: https://github.com/apache/flink/pull/12748#issuecomment-647889022 ## CI report: * 2a146962687424a20b253ed3fcc42700e416375d Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=3935) * 719a106bdb3e4aeced30673f43d7a2a0f90b2535 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4034) * 98a273694702b8878c6a6a337f4eb93c1ac2e5f4 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4035) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] azagrebin edited a comment on pull request #11245: [FLINK-15794][Kubernetes] Generate the Kubernetes default image version
azagrebin edited a comment on pull request #11245: URL: https://github.com/apache/flink/pull/11245#issuecomment-649594733
[GitHub] [flink] azagrebin commented on pull request #11245: [FLINK-15794][Kubernetes] Generate the Kubernetes default image version
azagrebin commented on pull request #11245: URL: https://github.com/apache/flink/pull/11245#issuecomment-649594733 @nielsbasjes The last suggestion looks good to me. Let's just avoid uppercase: `NOT` -> `not`. I agree, snapshot versions would make things cleaner; hopefully the community will get to it soon.
[GitHub] [flink-web] morsapaes opened a new pull request #352: Add 1.11 Release announcement.
morsapaes opened a new pull request #352: URL: https://github.com/apache/flink-web/pull/352 Adding a blogpost for the upcoming 1.11 release announcement. Feel free to comment and leave any feedback!
[jira] [Commented] (FLINK-18433) From the end-to-end performance test results, 1.11 has a regression
[ https://issues.apache.org/jira/browse/FLINK-18433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17144952#comment-17144952 ] Arvid Heise commented on FLINK-18433: Doing some bisect on a specific test would definitely help. Could you drive that [~Aihua]? Interestingly, AT_LEAST_ONCE suffered the most regression. Additional suspicion: it could be related to https://github.com/apache/flink/pull/12697 . Do we have a high DOP? How long are the jobs running, btw?
> From the end-to-end performance test results, 1.11 has a regression
> ---
> Key: FLINK-18433
> URL: https://issues.apache.org/jira/browse/FLINK-18433
> Project: Flink
> Issue Type: Bug
> Components: API / Core, API / DataStream
> Affects Versions: 1.11.0
> Environment: 3 machines
> https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations_1.11/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java
> Reporter: Aihua Li
> Priority: Major
>
> I ran end-to-end performance tests between Release-1.10 and Release-1.11.
> the results were as follows:
> |scenarioName|release-1.10|release-1.11|change|
> |OneInput_Broadcast_LazyFromSource_ExactlyOnce_10_rocksdb|46.175|43.8133|-5.11%|
> |OneInput_Rescale_LazyFromSource_ExactlyOnce_100_heap|211.835|200.355|-5.42%|
> |OneInput_Rebalance_LazyFromSource_ExactlyOnce_1024_rocksdb|1721.041667|1618.32|-5.97%|
> |OneInput_KeyBy_LazyFromSource_ExactlyOnce_10_heap|46|43.615|-5.18%|
> |OneInput_Broadcast_Eager_ExactlyOnce_100_rocksdb|212.105|199.688|-5.85%|
> |OneInput_Rescale_Eager_ExactlyOnce_1024_heap|1754.64|1600.12|-8.81%|
> |OneInput_Rebalance_Eager_ExactlyOnce_10_rocksdb|45.9167|43.0983|-6.14%|
> |OneInput_KeyBy_Eager_ExactlyOnce_100_heap|212.0816667|200.727|-5.35%|
> |OneInput_Broadcast_LazyFromSource_AtLeastOnce_1024_rocksdb|1718.245|1614.381667|-6.04%|
> |OneInput_Rescale_LazyFromSource_AtLeastOnce_10_heap|46.12|43.5517|-5.57%|
> |OneInput_Rebalance_LazyFromSource_AtLeastOnce_100_rocksdb|212.038|200.388|-5.49%|
> |OneInput_KeyBy_LazyFromSource_AtLeastOnce_1024_heap|1762.048333|1606.408333|-8.83%|
> |OneInput_Broadcast_Eager_AtLeastOnce_10_rocksdb|46.0583|43.4967|-5.56%|
> |OneInput_Rescale_Eager_AtLeastOnce_100_heap|212.233|201.188|-5.20%|
> |OneInput_Rebalance_Eager_AtLeastOnce_1024_rocksdb|1720.66|1616.85|-6.03%|
> |OneInput_KeyBy_Eager_AtLeastOnce_10_heap|46.14|43.6233|-5.45%|
> |TwoInputs_Broadcast_LazyFromSource_ExactlyOnce_100_rocksdb|156.918|152.957|-2.52%|
> |TwoInputs_Rescale_LazyFromSource_ExactlyOnce_1024_heap|1415.511667|1300.1|-8.15%|
> |TwoInputs_Rebalance_LazyFromSource_ExactlyOnce_10_rocksdb|34.2967|34.1667|-0.38%|
> |TwoInputs_KeyBy_LazyFromSource_ExactlyOnce_100_heap|158.353|151.848|-4.11%|
> |TwoInputs_Broadcast_Eager_ExactlyOnce_1024_rocksdb|1373.406667|1300.056667|-5.34%|
> |TwoInputs_Rescale_Eager_ExactlyOnce_10_heap|34.5717|32.0967|-7.16%|
> |TwoInputs_Rebalance_Eager_ExactlyOnce_100_rocksdb|158.655|147.44|-7.07%|
> |TwoInputs_KeyBy_Eager_ExactlyOnce_1024_heap|1356.611667|1292.386667|-4.73%|
> |TwoInputs_Broadcast_LazyFromSource_AtLeastOnce_10_rocksdb|34.01|33.205|-2.37%|
> |TwoInputs_Rescale_LazyFromSource_AtLeastOnce_100_heap|149.588|145.997|-2.40%|
> |TwoInputs_Rebalance_LazyFromSource_AtLeastOnce_1024_rocksdb|1359.74|1299.156667|-4.46%|
> |TwoInputs_KeyBy_LazyFromSource_AtLeastOnce_10_heap|34.025|29.6833|-12.76%|
> |TwoInputs_Broadcast_Eager_AtLeastOnce_100_rocksdb|157.303|151.4616667|-3.71%|
> |TwoInputs_Rescale_Eager_AtLeastOnce_1024_heap|1368.74|1293.238333|-5.52%|
> |TwoInputs_Rebalance_Eager_AtLeastOnce_10_rocksdb|34.325|33.285|-3.03%|
> |TwoInputs_KeyBy_Eager_AtLeastOnce_100_heap|162.5116667|134.375|-17.31%|
> It can be seen that the performance of 1.11 has a regression, basically around 5%, and the maximum regression is 17%. This needs to be checked.
> the test code:
> flink-1.10.0: https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java
> flink-1.11.0: https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations_1.11/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java
> submit cmd like this:
> bin/flink run -d -m 192.168.39.246:8081 -c org.apache.flink.basic.operations.PerformanceTestJob /home/admin/flink-basic-operations_2.11-1.10-SNAPSHOT.jar --topologyName OneInput --LogicalAttributesofEdges Broadcast --ScheduleMode LazyFromSource --CheckpointMode ExactlyOnce --recordSize 10 --stateBackend rocksdb
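For reference, the percentage column in the report is the plain relative change of the 1.11 mean versus the 1.10 mean. A minimal Python check (the helper name is ours; the two inputs are copied from the first rows of the table):

```python
def regression_pct(old_mean: float, new_mean: float) -> float:
    """Relative change of the release-1.11 mean vs. the release-1.10 mean, in percent."""
    return (new_mean - old_mean) / old_mean * 100.0

# First two rows of the report:
print(round(regression_pct(46.175, 43.8133), 2))   # -5.11 (OneInput_Broadcast_..._10_rocksdb)
print(round(regression_pct(211.835, 200.355), 2))  # -5.42 (OneInput_Rescale_..._100_heap)
```

Negative values mean 1.11 was slower, which matches every row of the table.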
[GitHub] [flink] flinkbot edited a comment on pull request #12748: [FLINK-18324][docs-zh] Translate updated data type into Chinese
flinkbot edited a comment on pull request #12748: URL: https://github.com/apache/flink/pull/12748#issuecomment-647889022 ## CI report: * 2a146962687424a20b253ed3fcc42700e416375d Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=3935) * 719a106bdb3e4aeced30673f43d7a2a0f90b2535 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4034) * 98a273694702b8878c6a6a337f4eb93c1ac2e5f4 UNKNOWN
[GitHub] [flink] flinkbot edited a comment on pull request #11245: [FLINK-15794][Kubernetes] Generate the Kubernetes default image version
flinkbot edited a comment on pull request #11245: URL: https://github.com/apache/flink/pull/11245#issuecomment-592002512 ## CI report: * f1d4f9163a2cd60daa02fe6b7af5a459cee9660c UNKNOWN * 4d11da053e608d9d24e70c001fbb6b1469a392d7 UNKNOWN * e180236be81cbad2b936db2f57e764d1243e980c Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=3976) * a4c5e8c756f9380734727742390ce88d7bc47372 UNKNOWN
[GitHub] [flink] nielsbasjes commented on pull request #11245: [FLINK-15794][Kubernetes] Generate the Kubernetes default image version
nielsbasjes commented on pull request #11245: URL: https://github.com/apache/flink/pull/11245#issuecomment-649561412 @azagrebin (Weird, cannot respond to your feedback regarding the comment with 'latest' directly...) I put the "over time" mention in because your application may work today but as soon as a new version is released or Scala 2.13 is added it will break in unexpected ways. How about this? ```# The 'latest' tag contains the latest released version of Flink for a specific Scala version. Do NOT use this in production as it will break your setup automatically when a new version is released.``` I still strongly believe we should simply NOT have the 'latest' tag at all and we should have docker images with SNAPSHOT versions which are pushed at the same time a SNAPSHOT build is pushed to maven. That would make this entire situation go away cleanly.
[GitHub] [flink] fsk119 commented on pull request #12758: [FLINK-18386][docs-zh] Translate "Print SQL Connector" page into Chinese
fsk119 commented on pull request #12758: URL: https://github.com/apache/flink/pull/12758#issuecomment-649547883 Thanks for your contribution @houmaozheng . Looks good to me.
[GitHub] [flink] liyubin117 commented on pull request #12748: [FLINK-18324][docs-zh] Translate updated data type into Chinese
liyubin117 commented on pull request #12748: URL: https://github.com/apache/flink/pull/12748#issuecomment-649547234 @libenchao Thanks for your review, I have finished it
[GitHub] [flink] flinkbot edited a comment on pull request #12748: [FLINK-18324][docs-zh] Translate updated data type into Chinese
flinkbot edited a comment on pull request #12748: URL: https://github.com/apache/flink/pull/12748#issuecomment-647889022 ## CI report: * 2a146962687424a20b253ed3fcc42700e416375d Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=3935) * 719a106bdb3e4aeced30673f43d7a2a0f90b2535 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4034)
[GitHub] [flink] flinkbot edited a comment on pull request #12748: [FLINK-18324][docs-zh] Translate updated data type into Chinese
flinkbot edited a comment on pull request #12748: URL: https://github.com/apache/flink/pull/12748#issuecomment-647889022 ## CI report: * 2a146962687424a20b253ed3fcc42700e416375d Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=3935) * 719a106bdb3e4aeced30673f43d7a2a0f90b2535 UNKNOWN
[GitHub] [flink-web] sjwiesman closed pull request #350: [hotfix] Documentation Style Guide: Sync + Correction
sjwiesman closed pull request #350: URL: https://github.com/apache/flink-web/pull/350
[GitHub] [flink-web] sjwiesman commented on pull request #350: [hotfix] Documentation Style Guide: Sync + Correction
sjwiesman commented on pull request #350: URL: https://github.com/apache/flink-web/pull/350#issuecomment-649526151 Whoops, yes
[jira] [Updated] (FLINK-18434) Can not select fields with JdbcCatalog
[ https://issues.apache.org/jira/browse/FLINK-18434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Wysakowicz updated FLINK-18434: Priority: Blocker (was: Critical)
> Can not select fields with JdbcCatalog
> --
> Key: FLINK-18434
> URL: https://issues.apache.org/jira/browse/FLINK-18434
> Project: Flink
> Issue Type: Bug
> Components: Table SQL / Planner
> Affects Versions: 1.11.0
> Reporter: Dawid Wysakowicz
> Priority: Blocker
> Fix For: 1.12.0, 1.11.1
>
> A query which selects fields from a table will fail if we set the PostgresCatalog as default.
> Steps to reproduce:
> # Create postgres catalog and set it as default
> # Create any table (in any catalog)
> # Query that table with {{SELECT field FROM t}} (Important: it must be a field name, not {{*}})
> # The query will fail
> Stack trace:
> {code}
> org.apache.flink.table.client.gateway.SqlExecutionException: Invalidate SQL statement.
> at org.apache.flink.table.client.cli.SqlCommandParser.parseBySqlParser(SqlCommandParser.java:100) ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at org.apache.flink.table.client.cli.SqlCommandParser.parse(SqlCommandParser.java:91) ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at org.apache.flink.table.client.cli.CliClient.parseCommand(CliClient.java:257) [flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at org.apache.flink.table.client.cli.CliClient.open(CliClient.java:211) [flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at org.apache.flink.table.client.SqlClient.openCli(SqlClient.java:142) [flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at org.apache.flink.table.client.SqlClient.start(SqlClient.java:114) [flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at org.apache.flink.table.client.SqlClient.main(SqlClient.java:201) [flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> Caused by: org.apache.flink.table.api.ValidationException: SQL validation failed. null
> at org.apache.flink.table.planner.calcite.FlinkPlannerImpl.org$apache$flink$table$planner$calcite$FlinkPlannerImpl$$validate(FlinkPlannerImpl.scala:146) ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at org.apache.flink.table.planner.calcite.FlinkPlannerImpl.validate(FlinkPlannerImpl.scala:108) ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at org.apache.flink.table.planner.operations.SqlToOperationConverter.convert(SqlToOperationConverter.java:187) ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at org.apache.flink.table.planner.delegation.ParserImpl.parse(ParserImpl.java:78) ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at org.apache.flink.table.client.gateway.local.LocalExecutor$1.lambda$parse$0(LocalExecutor.java:430) ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at org.apache.flink.table.client.gateway.local.ExecutionContext.wrapClassLoader(ExecutionContext.java:255) ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at org.apache.flink.table.client.gateway.local.LocalExecutor$1.parse(LocalExecutor.java:430) ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at org.apache.flink.table.client.cli.SqlCommandParser.parseBySqlParser(SqlCommandParser.java:98) ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> ... 6 more
> Caused by: java.lang.UnsupportedOperationException
> at org.apache.flink.connector.jdbc.catalog.AbstractJdbcCatalog.getFunction(AbstractJdbcCatalog.java:261) ~[?:?]
> at org.apache.flink.table.catalog.FunctionCatalog.resolvePreciseFunctionReference(FunctionCatalog.java:570) ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at org.apache.flink.table.catalog.FunctionCatalog.lambda$resolveAmbiguousFunctionReference$2(FunctionCatalog.java:617) ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at java.util.Optional.orElseGet(Optional.java:267) ~[?:1.8.0_252]
> at org.apache.flink.table.catalog.FunctionCatalog.resolveAmbiguousFunctionReference(FunctionCatalog.java:617) ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at org.apache.flink.table.catalog.FunctionCatalog.lookupFunction(FunctionCatalog.java:370) ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at org.apache.flink.table.planner.catalog.FunctionCatalogOperatorTable.lookupOperatorOverloads(FunctionCatalogOperatorTable.java:99) ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at org.apache.calcite.sql.util.ChainedSqlOperatorTable.lookupOperatorOverloads(ChainedSqlOperatorTable.java:73)
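The decisive frame in the trace is `AbstractJdbcCatalog.getFunction` throwing `UnsupportedOperationException` while `FunctionCatalog` resolves a function reference (note the `Optional.orElseGet` frame: the ambiguous-reference path consults the current catalog before falling back). A small Python model of that control flow, with invented names rather than Flink code, shows why a catalog that throws "unsupported" instead of signaling "function not found" aborts resolution before any fallback is reached:

```python
class FunctionNotFound(Exception):
    """Models Flink's FunctionNotExistException: 'no such function, keep looking'."""

def broken_catalog_lookup(name):
    # Models AbstractJdbcCatalog.getFunction: unconditionally unsupported.
    raise NotImplementedError("getFunction is not supported")

def fixed_catalog_lookup(name):
    # The expected contract: report 'not found' so resolution can fall back.
    raise FunctionNotFound(name)

def resolve(name, catalog_lookup, builtins):
    """Models the precise-then-fallback lookup in FunctionCatalog."""
    try:
        return catalog_lookup(name)
    except FunctionNotFound:
        return builtins[name]  # fall back to built-in functions

builtins = {"upper": "BUILT-IN UPPER"}
print(resolve("upper", fixed_catalog_lookup, builtins))  # falls back cleanly
# resolve("upper", broken_catalog_lookup, builtins) raises NotImplementedError,
# which is what surfaces above wrapped in the ValidationException.
```

This is only a sketch of the failure pattern; the actual fix belongs in the Flink catalog implementation.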
[jira] [Updated] (FLINK-18434) Can not select fields with JdbcCatalog
[ https://issues.apache.org/jira/browse/FLINK-18434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Wysakowicz updated FLINK-18434: Fix Version/s: 1.11.1
[jira] [Updated] (FLINK-18434) Can not select fields with JdbcCatalog
[ https://issues.apache.org/jira/browse/FLINK-18434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Wysakowicz updated FLINK-18434: Fix Version/s: 1.12.0
[jira] [Updated] (FLINK-18434) Can not select fields with JdbcCatalog
[ https://issues.apache.org/jira/browse/FLINK-18434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Wysakowicz updated FLINK-18434: Priority: Critical (was: Major)
[jira] [Created] (FLINK-18434) Can not select fields with JdbcCatalog
Dawid Wysakowicz created FLINK-18434: Summary: Can not select fields with JdbcCatalog Key: FLINK-18434 URL: https://issues.apache.org/jira/browse/FLINK-18434 Project: Flink Issue Type: Bug Components: Table SQL / Planner Affects Versions: 1.11.0 Reporter: Dawid Wysakowicz A query which selects fields from a table will fail if we set the PostgresCatalog as default. Steps to reproduce: # Create postgres catalog and set it as default # Create any table (in any catalog) # Query that table with {{SELECT field FROM t}} (Important it must be a field name not '{{*}}' # The query will fail Stack trace: {code} org.apache.flink.table.client.gateway.SqlExecutionException: Invalidate SQL statement. at org.apache.flink.table.client.cli.SqlCommandParser.parseBySqlParser(SqlCommandParser.java:100) ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT] at org.apache.flink.table.client.cli.SqlCommandParser.parse(SqlCommandParser.java:91) ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT] at org.apache.flink.table.client.cli.CliClient.parseCommand(CliClient.java:257) [flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT] at org.apache.flink.table.client.cli.CliClient.open(CliClient.java:211) [flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT] at org.apache.flink.table.client.SqlClient.openCli(SqlClient.java:142) [flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT] at org.apache.flink.table.client.SqlClient.start(SqlClient.java:114) [flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT] at org.apache.flink.table.client.SqlClient.main(SqlClient.java:201) [flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT] Caused by: org.apache.flink.table.api.ValidationException: SQL validation failed. 
null at org.apache.flink.table.planner.calcite.FlinkPlannerImpl.org$apache$flink$table$planner$calcite$FlinkPlannerImpl$$validate(FlinkPlannerImpl.scala:146) ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT] at org.apache.flink.table.planner.calcite.FlinkPlannerImpl.validate(FlinkPlannerImpl.scala:108) ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT] at org.apache.flink.table.planner.operations.SqlToOperationConverter.convert(SqlToOperationConverter.java:187) ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT] at org.apache.flink.table.planner.delegation.ParserImpl.parse(ParserImpl.java:78) ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT] at org.apache.flink.table.client.gateway.local.LocalExecutor$1.lambda$parse$0(LocalExecutor.java:430) ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT] at org.apache.flink.table.client.gateway.local.ExecutionContext.wrapClassLoader(ExecutionContext.java:255) ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT] at org.apache.flink.table.client.gateway.local.LocalExecutor$1.parse(LocalExecutor.java:430) ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT] at org.apache.flink.table.client.cli.SqlCommandParser.parseBySqlParser(SqlCommandParser.java:98) ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT] ... 6 more Caused by: java.lang.UnsupportedOperationException at org.apache.flink.connector.jdbc.catalog.AbstractJdbcCatalog.getFunction(AbstractJdbcCatalog.java:261) ~[?:?] 
at org.apache.flink.table.catalog.FunctionCatalog.resolvePreciseFunctionReference(FunctionCatalog.java:570) ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT] at org.apache.flink.table.catalog.FunctionCatalog.lambda$resolveAmbiguousFunctionReference$2(FunctionCatalog.java:617) ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT] at java.util.Optional.orElseGet(Optional.java:267) ~[?:1.8.0_252] at org.apache.flink.table.catalog.FunctionCatalog.resolveAmbiguousFunctionReference(FunctionCatalog.java:617) ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT] at org.apache.flink.table.catalog.FunctionCatalog.lookupFunction(FunctionCatalog.java:370) ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT] at org.apache.flink.table.planner.catalog.FunctionCatalogOperatorTable.lookupOperatorOverloads(FunctionCatalogOperatorTable.java:99) ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT] at org.apache.calcite.sql.util.ChainedSqlOperatorTable.lookupOperatorOverloads(ChainedSqlOperatorTable.java:73) ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT] at org.apache.calcite.sql.validate.SqlValidatorImpl.makeNullaryCall(SqlValidatorImpl.java:1754) ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT] at org.apache.calcite.sql.validate.SqlValidatorImpl$Expander.visit(SqlValidatorImpl.java:5987) ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
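The root cause in the trace above is that `AbstractJdbcCatalog.getFunction` throws `UnsupportedOperationException`, which the function-resolution path does not expect, so an ordinary column reference fails validation. A minimal, self-contained sketch of the failure mode and one defensive lookup strategy — the class and method names below are simplified stand-ins for illustration, not the real `org.apache.flink.table.catalog` API, and the actual fix in Flink may differ:

```java
import java.util.Optional;

// Hypothetical stand-ins for the catalog classes involved in the trace.
interface Catalog {
    // Mirrors AbstractJdbcCatalog.getFunction, which throws instead of
    // signalling "function not found".
    String getFunction(String name);
}

class JdbcLikeCatalog implements Catalog {
    @Override
    public String getFunction(String name) {
        // JDBC catalogs do not support functions at all.
        throw new UnsupportedOperationException();
    }
}

public class FunctionLookup {
    // Defensive lookup: treat "catalog does not support functions" the same
    // as "function not found", so plain field references still validate.
    static Optional<String> resolve(Catalog catalog, String name) {
        try {
            return Optional.ofNullable(catalog.getFunction(name));
        } catch (UnsupportedOperationException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        Optional<String> result = resolve(new JdbcLikeCatalog(), "field");
        System.out.println(result.isPresent()); // prints "false"
    }
}
```

With this shape of lookup, `SELECT field FROM t` would fall through to normal column resolution instead of aborting validation.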
[GitHub] [flink] flinkbot edited a comment on pull request #12769: [FLINK-15414] catch the right KafkaException for binding errors
flinkbot edited a comment on pull request #12769: URL: https://github.com/apache/flink/pull/12769#issuecomment-649496339 ## CI report: * e4671572632d30c334c40d726409ea56152e2c1a Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4033) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot commented on pull request #12769: [FLINK-15414] catch the right KafkaException for binding errors
flinkbot commented on pull request #12769: URL: https://github.com/apache/flink/pull/12769#issuecomment-649496339 ## CI report: * e4671572632d30c334c40d726409ea56152e2c1a UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Closed] (FLINK-18349) Add Flink 1.11 release notes to documentation
[ https://issues.apache.org/jira/browse/FLINK-18349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Piotr Nowojski closed FLINK-18349. -- Resolution: Fixed Merged to master as e63345354e..6227fffbe6 release as 6ecf2d3096 > Add Flink 1.11 release notes to documentation > - > > Key: FLINK-18349 > URL: https://issues.apache.org/jira/browse/FLINK-18349 > Project: Flink > Issue Type: Task > Components: Documentation >Affects Versions: 1.11.0 >Reporter: Piotr Nowojski >Assignee: Piotr Nowojski >Priority: Critical > Labels: pull-request-available > Fix For: 1.11.0 > > > Gather, edit, and add Flink 1.11 release notes to documentation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] pnowojski merged pull request #12699: [FLINK-18349][docs] Add release notes for Flink 1.11
pnowojski merged pull request #12699: URL: https://github.com/apache/flink/pull/12699 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] pnowojski commented on a change in pull request #12699: [FLINK-18349][docs] Add release notes for Flink 1.11
pnowojski commented on a change in pull request #12699: URL: https://github.com/apache/flink/pull/12699#discussion_r445498558 ## File path: docs/release-notes/flink-1.11.md ## @@ -0,0 +1,291 @@ +--- +title: "Release Notes - Flink 1.11" +--- + + + +These release notes discuss important aspects, such as configuration, behavior, +or dependencies, that changed between Flink 1.10 and Flink 1.11. Please read +these notes carefully if you are planning to upgrade your Flink version to 1.11. + +* This will be replaced by the TOC +{:toc} + +### Clusters & Deployment + Support for Hadoop 3.0.0 and higher ([FLINK-11086](https://issues.apache.org/jira/browse/FLINK-11086)) +Flink project does not provide any updated "flink-shaded-hadoop-*" jars. +Users need to provide Hadoop dependencies through the HADOOP_CLASSPATH environment variable (recommended) or via `lib/` folder. +Also, the `include-hadoop` Maven profile has been removed. + + Removal of `LegacyScheduler` ([FLINK-15629](https://issues.apache.org/jira/browse/FLINK-15629)) +Flink no longer supports the legacy scheduler. +Hence, setting `jobmanager.scheduler: legacy` will no longer work and fail with an `IllegalArgumentException`. +The only valid option for `jobmanager.scheduler` is the default value `ng`. + + Bind user code class loader to lifetime of a slot ([FLINK-16408](https://issues.apache.org/jira/browse/FLINK-16408)) +The user code class loader is being reused by the `TaskExecutor` as long as there is at least a single slot allocated for the respective job. +This changes Flink's recovery behaviour slightly so that it will not reload static fields. +The benefit is that this change drastically reduces pressure on the JVM's metaspace. + + Replaced `slave` file name with `workers` ([FLINK-18307](https://issues.apache.org/jira/browse/FLINK-18307)) +For Standalone Setups, the file with the worker nodes is no longer called `slaves` but `workers`. 
+Previous setups that use the `start-cluster.sh` and `stop-cluster.sh` scripts need to rename that file. + + Flink Docker Integration Improvements +The examples of `Dockerfiles` and docker image `build.sh` scripts have been removed from [the Flink Github repository](https://github.com/apache/flink). The examples will no longer be maintained by community in the Flink Github repository, including the examples of integration with Bluemix. Therefore, the following modules have been deleted from the Flink Github repository: +- `flink-contrib/docker-flink` +- `flink-container/docker` +- `flink-container/kubernetes` + +Check the updated user documentation for [Flink Docker integration](https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/docker.html) instead. It now describes in detail how to [use](https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/docker.html#how-to-run-a-flink-image) and [customize](https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/docker.html#customize-flink-image) [the Flink official docker image](https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/docker.html#docker-hub-flink-images): configuration options, logging, plugins, adding more dependencies and installing software. 
The documentation also includes examples for Session and Job cluster deployments with: +- [docker run](https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/docker.html#how-to-run-flink-image) +- [docker compose](https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/docker.html#flink-with-docker-compose) +- [docker swarm](https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/docker.html#flink-with-docker-swarm) +- [standalone Kubernetes](https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/kubernetes.html) + +### Memory Management + New Flink Master Memory Model +# Overview +With [FLIP-116](https://cwiki.apache.org/confluence/display/FLINK/FLIP-116%3A+Unified+Memory+Configuration+for+Job+Managers), a new memory model has been introduced for the Flink Master. New configuration options have been introduced to control the memory consumption of the Flink Master process. This affects all types of deployments: standalone, YARN, Mesos, and the new active Kubernetes integration. + +Please, check the user documentation for [more details](https://ci.apache.org/projects/flink/flink-docs-master/ops/memory/mem_setup_master.html). + +If you try to reuse your previous Flink configuration without any adjustments, the new memory model can result in differently computed memory parameters for the JVM and, thus, performance changes or even failures. See also [the migration guide](https://ci.apache.org/projects/flink/flink-docs-master/ops/memory/mem_migration.html#migrate-job-manager-memory-configuration). + +# Deprecation and breaking changes +The following options are deprecated: + * `jobmanager.heap.size` + *
[GitHub] [flink] pnowojski commented on a change in pull request #12699: [FLINK-18349][docs] Add release notes for Flink 1.11
pnowojski commented on a change in pull request #12699: URL: https://github.com/apache/flink/pull/12699#discussion_r445496733 ## File path: docs/release-notes/flink-1.11.md ## @@ -0,0 +1,291 @@ +--- +title: "Release Notes - Flink 1.11" +--- + + + +These release notes discuss important aspects, such as configuration, behavior, +or dependencies, that changed between Flink 1.10 and Flink 1.11. Please read +these notes carefully if you are planning to upgrade your Flink version to 1.11. + +* This will be replaced by the TOC +{:toc} + +### Clusters & Deployment + Support for Hadoop 3.0.0 and higher ([FLINK-11086](https://issues.apache.org/jira/browse/FLINK-11086)) +Flink project does not provide any updated "flink-shaded-hadoop-*" jars. +Users need to provide Hadoop dependencies through the HADOOP_CLASSPATH environment variable (recommended) or via `lib/` folder. +Also, the `include-hadoop` Maven profile has been removed. + + Removal of `LegacyScheduler` ([FLINK-15629](https://issues.apache.org/jira/browse/FLINK-15629)) +Flink no longer supports the legacy scheduler. +Hence, setting `jobmanager.scheduler: legacy` will no longer work and fail with an `IllegalArgumentException`. +The only valid option for `jobmanager.scheduler` is the default value `ng`. + + Bind user code class loader to lifetime of a slot ([FLINK-16408](https://issues.apache.org/jira/browse/FLINK-16408)) +The user code class loader is being reused by the `TaskExecutor` as long as there is at least a single slot allocated for the respective job. +This changes Flink's recovery behaviour slightly so that it will not reload static fields. +The benefit is that this change drastically reduces pressure on the JVM's metaspace. + + Replaced `slave` file name with `workers` ([FLINK-18307](https://issues.apache.org/jira/browse/FLINK-18307)) +For Standalone Setups, the file with the worker nodes is no longer called `slaves` but `workers`. 
+Previous setups that use the `start-cluster.sh` and `stop-cluster.sh` scripts need to rename that file. + + Flink Docker Integration Improvements +The examples of `Dockerfiles` and docker image `build.sh` scripts have been removed from [the Flink Github repository](https://github.com/apache/flink). The examples will no longer be maintained by community in the Flink Github repository, including the examples of integration with Bluemix. Therefore, the following modules have been deleted from the Flink Github repository: +- `flink-contrib/docker-flink` +- `flink-container/docker` +- `flink-container/kubernetes` + +Check the updated user documentation for [Flink Docker integration](https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/docker.html) instead. It now describes in detail how to [use](https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/docker.html#how-to-run-a-flink-image) and [customize](https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/docker.html#customize-flink-image) [the Flink official docker image](https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/docker.html#docker-hub-flink-images): configuration options, logging, plugins, adding more dependencies and installing software. 
The documentation also includes examples for Session and Job cluster deployments with: +- [docker run](https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/docker.html#how-to-run-flink-image) +- [docker compose](https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/docker.html#flink-with-docker-compose) +- [docker swarm](https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/docker.html#flink-with-docker-swarm) +- [standalone Kubernetes](https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/kubernetes.html) + +### Memory Management + New Flink Master Memory Model +# Overview +With [FLIP-116](https://cwiki.apache.org/confluence/display/FLINK/FLIP-116%3A+Unified+Memory+Configuration+for+Job+Managers), a new memory model has been introduced for the Flink Master. New configuration options have been introduced to control the memory consumption of the Flink Master process. This affects all types of deployments: standalone, YARN, Mesos, and the new active Kubernetes integration. + +Please, check the user documentation for [more details](https://ci.apache.org/projects/flink/flink-docs-master/ops/memory/mem_setup_master.html). + +If you try to reuse your previous Flink configuration without any adjustments, the new memory model can result in differently computed memory parameters for the JVM and, thus, performance changes or even failures. See also [the migration guide](https://ci.apache.org/projects/flink/flink-docs-master/ops/memory/mem_migration.html#migrate-job-manager-memory-configuration). + +# Deprecation and breaking changes +The following options are deprecated: + * `jobmanager.heap.size` + *
[GitHub] [flink] flinkbot commented on pull request #12769: [FLINK-15414] catch the right KafkaException for binding errors
flinkbot commented on pull request #12769: URL: https://github.com/apache/flink/pull/12769#issuecomment-649482720 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review. ## Automated Checks Last check on commit e4671572632d30c334c40d726409ea56152e2c1a (Thu Jun 25 11:29:10 UTC 2020) **Warnings:** * No documentation files were touched! Remember to keep the Flink docs up to date! * **This pull request references an unassigned [Jira ticket](https://issues.apache.org/jira/browse/FLINK-15414).** According to the [code contribution guide](https://flink.apache.org/contributing/contribute-code.html), tickets need to be assigned before starting with the implementation work. Mention the bot in a comment to re-run the automated checks. ## Review Progress * ❓ 1. The [description] looks good. * ❓ 2. There is [consensus] that the contribution should go into to Flink. * ❓ 3. Needs [attention] from. * ❓ 4. The change fits into the overall [architecture]. * ❓ 5. Overall code [quality] is good. Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. 
For consensus, approval by a Flink committer or PMC member is required Bot commands The @flinkbot bot supports the following commands: - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`) - `@flinkbot approve all` to approve all aspects - `@flinkbot approve-until architecture` to approve everything until `architecture` - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention - `@flinkbot disapprove architecture` to remove an approval you gave earlier This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] curcur opened a new pull request #12769: [FLINK-15414] catch the right KafkaException for binding errors
curcur opened a new pull request #12769: URL: https://github.com/apache/flink/pull/12769 The right exception to catch is `org.apache.kafka.common.KafkaException`, not `kafka.common.KafkaException`. At least the "numTries" exception should be thrown instead of the binding exception. ## What is the purpose of the change Currently, the binding exception is never caught in `KafkaTestEnvironmentImpl`, hence retries with different port allocations never happen. This fix uses the right KafkaException. ## Brief change log Catch `org.apache.kafka.common.KafkaException` instead of `kafka.common.KafkaException` when binding fails, to enable the retry logic. ## Verifying this change TBD ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no) - The serializers: (no) - The runtime per-record code paths (performance sensitive): (no) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (no) - The S3 file system connector: (no) ## Documentation - Does this pull request introduce a new feature? (no) - If yes, how is the feature documented? (not applicable) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
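The retry logic the PR re-enables can be sketched with a self-contained stdlib analogue: broker startup is modelled with a plain `ServerSocket`, and `BindException` stands in for the Kafka exception type. In the real fix the catch clause must name `org.apache.kafka.common.KafkaException` (whose cause is the bind error) rather than the old `kafka.common.KafkaException` — catching the wrong class means a loop like the one below is never entered.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;

public class BindRetry {
    // Try successive candidate ports until one binds or numTries is exhausted,
    // mirroring the "numTries" behaviour the test environment is supposed to have.
    static ServerSocket bindWithRetries(int startPort, int numTries) throws IOException {
        IOException last = null;
        for (int i = 0; i < numTries; i++) {
            try {
                return new ServerSocket(startPort + i); // attempt the next candidate port
            } catch (BindException e) { // stand-in for the Kafka exception type
                last = e;               // remember the failure and retry
            }
        }
        throw new IOException("Could not bind after " + numTries + " tries", last);
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket occupied = new ServerSocket(0)) {
            // The first candidate port is already taken, so the helper
            // should succeed on a later attempt with a different port.
            ServerSocket s = bindWithRetries(occupied.getLocalPort(), 5);
            System.out.println(s.getLocalPort() != occupied.getLocalPort());
            s.close();
        }
    }
}
```

If the catch clause named a type that the bind failure does not actually extend, the first iteration would propagate the exception and no alternative port would ever be tried — which is exactly the bug described above.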
[jira] [Updated] (FLINK-15414) KafkaITCase#prepare failed in travis
[ https://issues.apache.org/jira/browse/FLINK-15414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-15414: --- Labels: pull-request-available test-stability (was: test-stability) > KafkaITCase#prepare failed in travis > > > Key: FLINK-15414 > URL: https://issues.apache.org/jira/browse/FLINK-15414 > Project: Flink > Issue Type: Bug > Components: Connectors / Kafka, Tests >Affects Versions: 1.12.0 >Reporter: Dian Fu >Priority: Major > Labels: pull-request-available, test-stability > > The travis for release-1.9 failed with the following error: > {code} > org.apache.kafka.common.KafkaException: Socket server failed to bind to > 0.0.0.0:44867: Address already in use. > at > org.apache.flink.streaming.connectors.kafka.KafkaITCase.prepare(KafkaITCase.java:58) > Caused by: java.net.BindException: Address already in use > at > org.apache.flink.streaming.connectors.kafka.KafkaITCase.prepare(KafkaITCase.java:58) > {code} > instance: [https://api.travis-ci.org/v3/job/629636116/log.txt] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] azagrebin commented on a change in pull request #11245: [FLINK-15794][Kubernetes] Generate the Kubernetes default image version
azagrebin commented on a change in pull request #11245: URL: https://github.com/apache/flink/pull/11245#discussion_r445467634 ## File path: docs/ops/deployment/kubernetes.md ## @@ -262,7 +262,7 @@ spec: spec: containers: - name: jobmanager -image: flink:{% if site.is_stable %}{{site.version}}-scala{{site.scala_version_suffix}}{% else %}latest{% endif %} +image: flink:{% if site.is_stable %}{{site.version}}-scala{{site.scala_version_suffix}}{% else %}latest # The 'latest' tag contains the latest released version of Flink for a specific Scala version which will mismatch with your application over time.{% endif %} Review comment: We added a [note](https://github.com/apache/flink/pull/11245/files#diff-a530b60989203f9e1f03c64c57deb56cR49) for this which can be promoted to warning. I think the idea here is to keep the comment in this particular line (and others like this) brief enough as it is a code example. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] zentol commented on a change in pull request #11245: [FLINK-15794][Kubernetes] Generate the Kubernetes default image version
zentol commented on a change in pull request #11245: URL: https://github.com/apache/flink/pull/11245#discussion_r445462983 ## File path: docs/ops/deployment/kubernetes.md ## @@ -262,7 +262,7 @@ spec: spec: containers: - name: jobmanager -image: flink:{% if site.is_stable %}{{site.version}}-scala{{site.scala_version_suffix}}{% else %}latest{% endif %} +image: flink:{% if site.is_stable %}{{site.version}}-scala{{site.scala_version_suffix}}{% else %}latest # The 'latest' tag contains the latest released version of Flink for a specific Scala version which will mismatch with your application over time.{% endif %} Review comment: I think we should be way more explicit about this; scala users should _never_ use `latest`; it is simply to easy to run into issues. If we release a newer release, with scala 2.13 support we suddenly break everything. I would explicitly phrase it like that, with a big warn sign: use it only if you are only using java, otherwise select a specific Flink version with a matching scala version. We should also look into providing a `latest_` tag to make this easier for scala users. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Closed] (FLINK-18432) add open and close methods for ElasticsearchSinkFunction interface
[ https://issues.apache.org/jira/browse/FLINK-18432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] rinkako closed FLINK-18432. --- Fix Version/s: 1.12.0 Resolution: Duplicate > add open and close methods for ElasticsearchSinkFunction interface > -- > > Key: FLINK-18432 > URL: https://issues.apache.org/jira/browse/FLINK-18432 > Project: Flink > Issue Type: Improvement > Components: Connectors / ElasticSearch >Reporter: rinkako >Priority: Major > Labels: usability > Fix For: 1.12.0 > > > Here comes an example story: we want to sink data to ES with a day-rolling > index name, by using `DateTimeFormatter` and `ZonedDateTime` to generate a > day-rolling postfix to add to the index pattern of `return > Requests.indexRequest().index("mydoc_"+postfix)` in a custom > `ElasticsearchSinkFunction`. Here `DateTimeFormatter` is not a serializable > class and must be a transient field of this `ElasticsearchSinkFunction`, and > it must be checked for null every time we call `process` (since the field is > transient, it may be null on a distributed task manager), which could be done > in an `open` method run only once. > So it seems that adding `open` and `close` methods to the > `ElasticsearchSinkFunction` interface can handle this well; users can control > their sink function life-cycle more flexibly. And, for compatibility, these > two methods may have an empty default implementation in the > `ElasticsearchSinkFunction` interface. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-18432) add open and close methods for ElasticsearchSinkFunction interface
[ https://issues.apache.org/jira/browse/FLINK-18432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17144821#comment-17144821 ] rinkako commented on FLINK-18432: - Thanks [~jark] , this is actually what I mean, and it is already fixed in that issue. I'll close my issue. :D > add open and close methods for ElasticsearchSinkFunction interface > -- > > Key: FLINK-18432 > URL: https://issues.apache.org/jira/browse/FLINK-18432 > Project: Flink > Issue Type: Improvement > Components: Connectors / ElasticSearch >Reporter: rinkako >Priority: Major > Labels: usability > > Here comes an example story: we want to sink data to ES with a day-rolling > index name, by using `DateTimeFormatter` and `ZonedDateTime` to generate a > day-rolling postfix to add to the index pattern of `return > Requests.indexRequest().index("mydoc_"+postfix)` in a custom > `ElasticsearchSinkFunction`. Here `DateTimeFormatter` is not a serializable > class and must be a transient field of this `ElasticsearchSinkFunction`, and > it must be checked for null every time we call `process` (since the field is > transient, it may be null on a distributed task manager), which could be done > in an `open` method run only once. > So it seems that adding `open` and `close` methods to the > `ElasticsearchSinkFunction` interface can handle this well; users can control > their sink function life-cycle more flexibly. And, for compatibility, these > two methods may have an empty default implementation in the > `ElasticsearchSinkFunction` interface. -- This message was sent by Atlassian Jira (v8.3.4#803005)
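The pattern described in the ticket can be sketched self-containedly: a serializable sink function derives a day-rolling index name, but `DateTimeFormatter` is not serializable, so it is kept transient and lazily (re)built. The null check in `indexFor()` is exactly the boilerplate that dedicated `open()`/`close()` hooks would remove. Class and method names here are illustrative, not the real `ElasticsearchSinkFunction` interface:

```java
import java.io.Serializable;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

public class DayRollingIndexer implements Serializable {
    private static final long serialVersionUID = 1L;

    // Not serializable, so it is null again after the function is
    // deserialized on a remote task manager.
    private transient DateTimeFormatter formatter;

    // This lazy initialization would live in open() if the interface offered one.
    private void ensureOpen() {
        if (formatter == null) {
            formatter = DateTimeFormatter.ofPattern("yyyyMMdd");
        }
    }

    public String indexFor(ZonedDateTime timestamp) {
        ensureOpen(); // must be re-checked on every element without open()
        return "mydoc_" + formatter.format(timestamp);
    }

    public static void main(String[] args) {
        DayRollingIndexer indexer = new DayRollingIndexer();
        ZonedDateTime t = ZonedDateTime.of(2020, 6, 25, 12, 0, 0, 0, ZoneOffset.UTC);
        System.out.println(indexer.indexFor(t)); // prints "mydoc_20200625"
    }
}
```

The duplicate issue referenced above resolves this by giving the sink function explicit lifecycle methods, so the formatter could be built once per task instead of being guarded on every record.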
[jira] [Updated] (FLINK-18427) Job failed under java 11
[ https://issues.apache.org/jira/browse/FLINK-18427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Zagrebin updated FLINK-18427: Component/s: (was: API / DataStream) (was: API / Core) Runtime / Network Runtime / Configuration > Job failed under java 11 > > > Key: FLINK-18427 > URL: https://issues.apache.org/jira/browse/FLINK-18427 > Project: Flink > Issue Type: Bug > Components: Runtime / Configuration, Runtime / Network >Reporter: Zhang Hao >Priority: Critical > > flink version:1.10.0 > deployment mode:cluster > os:linux redhat7.5 > Job parallelism:greater than 1 > My job runs normally under Java 8 but fails under Java 11. Exception info is > below: netty failed to send a message. In addition, I found the job would fail > when tasks were distributed across multiple nodes; if I set the job's > parallelism = 1, the job runs normally under Java 11 too. > > 2020-06-24 09:52:16 > org.apache.flink.runtime.io.network.netty.exception.LocalTransportException: > Sending the partition request to '/170.0.50.19:33320' failed. 
> at org.apache.flink.runtime.io.network.netty.NettyPartitionRequestClient$1.operationComplete(NettyPartitionRequestClient.java:124)
> at org.apache.flink.runtime.io.network.netty.NettyPartitionRequestClient$1.operationComplete(NettyPartitionRequestClient.java:115)
> at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:500)
> at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:474)
> at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:413)
> at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:538)
> at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:531)
> at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:111)
> at org.apache.flink.shaded.netty4.io.netty.util.internal.PromiseNotificationUtil.tryFailure(PromiseNotificationUtil.java:64)
> at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.notifyOutboundHandlerException(AbstractChannelHandlerContext.java:818)
> at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:718)
> at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:708)
> at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.access$1700(AbstractChannelHandlerContext.java:56)
> at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:1102)
> at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:1149)
> at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:1073)
> at org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
> at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:416)
> at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:515)
> at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918)
> at org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.io.IOException: Error while serializing message: PartitionRequest(8059a0b47f7ba0ff814ea52427c584e7@6750c1170c861176ad3ceefe9b02f36e:0:2)
> at org.apache.flink.runtime.io.network.netty.NettyMessage$NettyMessageEncoder.write(NettyMessage.java:177)
> at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:716)
> ... 11 more
> Caused by: java.io.IOException: java.lang.OutOfMemoryError: Direct buffer memory
> at org.apache.flink.runtime.io.network.netty.NettyMessage$PartitionRequest.write(NettyMessage.java:497)
> at org.apache.flink.runtime.io.network.netty.NettyMessage$NettyMessageEncoder.write(NettyMessage.java:174)
> ... 12 more
> Caused by: java.lang.OutOfMemoryError: Direct buffer memory
> at java.base/java.nio.Bits.reserveMemory(Bits.java:175) at >
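The root cause in the trace above is `java.lang.OutOfMemoryError: Direct buffer memory` while serializing a partition request. One thing worth trying (a sketch, not a confirmed fix for this issue) is enlarging Flink's direct-memory budget; in Flink 1.10's memory model the off-heap options below contribute to the JVM's `-XX:MaxDirectMemorySize`. The values are illustrative only.

```yaml
# flink-conf.yaml (illustrative values, not a recommendation):
# enlarge the off-heap budget that bounds direct buffer allocations.
taskmanager.memory.framework.off-heap.size: 256mb
taskmanager.memory.task.off-heap.size: 128mb
```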
[jira] [Updated] (FLINK-18427) Job failed under java 11
[ https://issues.apache.org/jira/browse/FLINK-18427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Zagrebin updated FLINK-18427: Affects Version/s: 1.10.0
[GitHub] [flink] azagrebin commented on a change in pull request #11245: [FLINK-15794][Kubernetes] Generate the Kubernetes default image version
azagrebin commented on a change in pull request #11245: URL: https://github.com/apache/flink/pull/11245#discussion_r445404435 ## File path: flink-kubernetes/src/main/java/org/apache/flink/kubernetes/configuration/KubernetesConfigOptions.java ## @@ -137,11 +139,17 @@ .withDescription("The cluster-id, which should be no more than 45 characters, is used for identifying " + "a unique Flink cluster. If not set, the client will automatically generate it with a random ID."); + // The default container image that ties to the exact needed versions of both Flink and Scala. + public static final String DEFAULT_CONTAINER_IMAGE = "flink:" + EnvironmentInformation.getVersion() + "-scala_" + EnvironmentInformation.getScalaVersion(); + + @Documentation.OverrideDefault("The default value depends on the actually running version. In general it looks like \"flink_-scala_\"") Review comment: ```suggestion @Documentation.OverrideDefault("The default value depends on the actually running version. In general it looks like \"flink:-scala_\"") ``` ## File path: docs/ops/deployment/kubernetes.md ## @@ -262,7 +262,7 @@ spec: spec: containers: - name: jobmanager -image: flink:{% if site.is_stable %}{{site.version}}-scala{{site.scala_version_suffix}}{% else %}latest{% endif %} +image: flink:{% if site.is_stable %}{{site.version}}-scala{{site.scala_version_suffix}}{% else %}latest # The 'latest' tag contains the latest released version of Flink for a specific Scala version which will mismatch with your application over time.{% endif %} Review comment: How about this: ``` flink:latest # The 'latest' tag contains the latest released version of Flink for a specific Scala version which can conflict with the versions, used by your application. ``` Somehow this sounds confusing: `which will mismatch with your application over time`, in particular `over time`. This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-18433) From the end-to-end performance test results, 1.11 has a regression
[ https://issues.apache.org/jira/browse/FLINK-18433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17144814#comment-17144814 ] Yu Li commented on FLINK-18433: --- In parallel with the manual analysis, is it worth using binary search on the benchmark with the most noticeable regression, i.e. {{TwoInputs_KeyBy_Eager_AtLeastOnce_100_heap}}, to locate the problematic commit? > From the end-to-end performance test results, 1.11 has a regression > --- > > Key: FLINK-18433 > URL: https://issues.apache.org/jira/browse/FLINK-18433 > Project: Flink > Issue Type: Bug > Components: API / Core, API / DataStream >Affects Versions: 1.11.0 > Environment: 3 machines > [|https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations_1.11/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java] >Reporter: Aihua Li >Priority: Major > > > I ran end-to-end performance tests between the Release-1.10 and Release-1.11. 
> the results were as follows:
> |scenarioName|release-1.10|release-1.11|change|
> |OneInput_Broadcast_LazyFromSource_ExactlyOnce_10_rocksdb|46.175|43.8133|-5.11%|
> |OneInput_Rescale_LazyFromSource_ExactlyOnce_100_heap|211.835|200.355|-5.42%|
> |OneInput_Rebalance_LazyFromSource_ExactlyOnce_1024_rocksdb|1721.041667|1618.32|-5.97%|
> |OneInput_KeyBy_LazyFromSource_ExactlyOnce_10_heap|46|43.615|-5.18%|
> |OneInput_Broadcast_Eager_ExactlyOnce_100_rocksdb|212.105|199.688|-5.85%|
> |OneInput_Rescale_Eager_ExactlyOnce_1024_heap|1754.64|1600.12|-8.81%|
> |OneInput_Rebalance_Eager_ExactlyOnce_10_rocksdb|45.9167|43.0983|-6.14%|
> |OneInput_KeyBy_Eager_ExactlyOnce_100_heap|212.0816667|200.727|-5.35%|
> |OneInput_Broadcast_LazyFromSource_AtLeastOnce_1024_rocksdb|1718.245|1614.381667|-6.04%|
> |OneInput_Rescale_LazyFromSource_AtLeastOnce_10_heap|46.12|43.5517|-5.57%|
> |OneInput_Rebalance_LazyFromSource_AtLeastOnce_100_rocksdb|212.038|200.388|-5.49%|
> |OneInput_KeyBy_LazyFromSource_AtLeastOnce_1024_heap|1762.048333|1606.408333|-8.83%|
> |OneInput_Broadcast_Eager_AtLeastOnce_10_rocksdb|46.0583|43.4967|-5.56%|
> |OneInput_Rescale_Eager_AtLeastOnce_100_heap|212.233|201.188|-5.20%|
> |OneInput_Rebalance_Eager_AtLeastOnce_1024_rocksdb|1720.66|1616.85|-6.03%|
> |OneInput_KeyBy_Eager_AtLeastOnce_10_heap|46.14|43.6233|-5.45%|
> |TwoInputs_Broadcast_LazyFromSource_ExactlyOnce_100_rocksdb|156.918|152.957|-2.52%|
> |TwoInputs_Rescale_LazyFromSource_ExactlyOnce_1024_heap|1415.511667|1300.1|-8.15%|
> |TwoInputs_Rebalance_LazyFromSource_ExactlyOnce_10_rocksdb|34.2967|34.1667|-0.38%|
> |TwoInputs_KeyBy_LazyFromSource_ExactlyOnce_100_heap|158.353|151.848|-4.11%|
> |TwoInputs_Broadcast_Eager_ExactlyOnce_1024_rocksdb|1373.406667|1300.056667|-5.34%|
> |TwoInputs_Rescale_Eager_ExactlyOnce_10_heap|34.5717|32.0967|-7.16%|
> |TwoInputs_Rebalance_Eager_ExactlyOnce_100_rocksdb|158.655|147.44|-7.07%|
> |TwoInputs_KeyBy_Eager_ExactlyOnce_1024_heap|1356.611667|1292.386667|-4.73%|
> |TwoInputs_Broadcast_LazyFromSource_AtLeastOnce_10_rocksdb|34.01|33.205|-2.37%|
> |TwoInputs_Rescale_LazyFromSource_AtLeastOnce_100_heap|149.588|145.997|-2.40%|
> |TwoInputs_Rebalance_LazyFromSource_AtLeastOnce_1024_rocksdb|1359.74|1299.156667|-4.46%|
> |TwoInputs_KeyBy_LazyFromSource_AtLeastOnce_10_heap|34.025|29.6833|-12.76%|
> |TwoInputs_Broadcast_Eager_AtLeastOnce_100_rocksdb|157.303|151.4616667|-3.71%|
> |TwoInputs_Rescale_Eager_AtLeastOnce_1024_heap|1368.74|1293.238333|-5.52%|
> |TwoInputs_Rebalance_Eager_AtLeastOnce_10_rocksdb|34.325|33.285|-3.03%|
> |TwoInputs_KeyBy_Eager_AtLeastOnce_100_heap|162.5116667|134.375|-17.31%|
> It can be seen that the performance of 1.11 has a regression, basically around 5%, and the maximum regression is 17%. This needs to be checked.
> the test code:
> flink-1.10.0: [https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
> flink-1.11.0: [https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations_1.11/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
> the submit cmd was like this:
> bin/flink run -d -m 192.168.39.246:8081 -c org.apache.flink.basic.operations.PerformanceTestJob /home/admin/flink-basic-operations_2.11-1.10-SNAPSHOT.jar --topologyName OneInput --LogicalAttributesofEdges Broadcast --ScheduleMode LazyFromSource --CheckpointMode ExactlyOnce --recordSize 10 --stateBackend rocksdb
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-18433) From the end-to-end performance test results, 1.11 has a regression
[ https://issues.apache.org/jira/browse/FLINK-18433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17144812#comment-17144812 ] Yu Li commented on FLINK-18433: --- And IMHO the most noticeable regressions seem to be barrier alignment related:
|TwoInputs_KeyBy_LazyFromSource_ExactlyOnce_100_heap|158.353|151.848|-4.11%|
|TwoInputs_KeyBy_LazyFromSource_AtLeastOnce_10_heap|34.025|29.6833|-12.76%|
|TwoInputs_KeyBy_Eager_ExactlyOnce_1024_heap|1356.611667|1292.386667|-4.73%|
|TwoInputs_KeyBy_Eager_AtLeastOnce_100_heap|162.5116667|134.375|-17.31%|
[jira] [Commented] (FLINK-18430) Upgrade stability to @Public for CheckpointedFunction and CheckpointListener
[ https://issues.apache.org/jira/browse/FLINK-18430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17144807#comment-17144807 ] Chesnay Schepler commented on FLINK-18430: -- I suppose the proper way would be to duplicate the interface in some API module, have the runtime version extend this new interface, and deprecate the runtime version. > Upgrade stability to @Public for CheckpointedFunction and CheckpointListener > > > Key: FLINK-18430 > URL: https://issues.apache.org/jira/browse/FLINK-18430 > Project: Flink > Issue Type: Improvement > Components: API / DataStream >Reporter: Stephan Ewen >Assignee: Stephan Ewen >Priority: Critical > Fix For: 1.11.0 > > > The two interfaces {{CheckpointedFunction}} and {{CheckpointListener}} are > used by many users, but are still (for years now) marked as > {{@PublicEvolving}}. > I think this is not correct. They are very core to the DataStream API and are > used widely and should be treated as {{@Public}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-18433) From the end-to-end performance test results, 1.11 has a regression
[ https://issues.apache.org/jira/browse/FLINK-18433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17144797#comment-17144797 ] Yu Li commented on FLINK-18433: --- First of all, are you using the latest release-1.11 branch, some RC tag, or some older release-1.11? I'd mainly like to know whether the changes of FLINK-17800 are in your test branch [~Aihua], thanks. Checking all the [changes|https://s.apache.org/b7xiw] in (latest) 1.11.0 but not in 1.10.0, I think the only suspect is FLINK-17865 (with a low chance, though). It may be worth trying to set `state.backend.fs.memory-threshold` to `1 kb` and checking the result [~Aihua]. btw, we've also watched the micro-benchmarks and didn't see any regression before, which indicates it is worth getting FLINK-14917 in to guard our release.
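Yu Li's suggestion above amounts to a one-line configuration change: reverting `state.backend.fs.memory-threshold` to its pre-1.11 value to see whether FLINK-17865 (which changed the default) explains part of the regression. A minimal sketch:

```yaml
# flink-conf.yaml: set the threshold back to 1 kb (the 1.10-era value)
# to test whether FLINK-17865 is the cause of the regression.
state.backend.fs.memory-threshold: 1kb
```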
[jira] [Commented] (FLINK-18432) add open and close methods for ElasticsearchSinkFunction interface
[ https://issues.apache.org/jira/browse/FLINK-18432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17144796#comment-17144796 ] Jark Wu commented on FLINK-18432: - I think this has already been done by FLINK-17623? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] flinkbot edited a comment on pull request #12727: [FLINK-17292][docs] Translate Fault Tolerance training lesson to Chinese
flinkbot edited a comment on pull request #12727: URL: https://github.com/apache/flink/pull/12727#issuecomment-647010602 ## CI report: * 5a76928eaf6355d22fb655a821cb5f922f560fe2 UNKNOWN * ed389accd5133d94808984cbbe573cc89517beb1 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4026) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-18150) A single failing Kafka broker may cause jobs to fail indefinitely with TimeoutException: Timeout expired while fetching topic metadata
[ https://issues.apache.org/jira/browse/FLINK-18150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17144776#comment-17144776 ] Aljoscha Krettek commented on FLINK-18150: -- Does this also happen when you don't have a producer in the job? I want to rule out that you're running into FLINK-17327 before I look deeper into this. > A single failing Kafka broker may cause jobs to fail indefinitely with > TimeoutException: Timeout expired while fetching topic metadata > -- > > Key: FLINK-18150 > URL: https://issues.apache.org/jira/browse/FLINK-18150 > Project: Flink > Issue Type: Bug > Components: Connectors / Kafka >Affects Versions: 1.10.1 > Environment: It is a bit unclear to me under what circumstances this > can be reproduced. I created a "minimum" non-working example at > https://github.com/jcaesar/flink-kafka-ha-failure. Note that this deploys the > minimum number of Kafka brokers, but it works just as well with replication > factor 3 and 8 brokers, e.g. > I run this with > {code:bash} > docker-compose kill; and docker-compose rm -vf; and docker-compose up > --abort-on-container-exit --build > {code} > The exception should appear on the webui after 5~6 minutes. > To make sure that this isn't dependent on my machine, I've also checked > reproducibility on a m5a.2xlarge EC2 instance. > You can verify that the Kafka cluster is running "normally" e.g. with: > {code:bash} > kafkacat -b localhost,localhost:9093 -L > {code} > So far, I only know that > * {{flink.partition-discovery.interval-millis}} must be set. > * The broker that failed must be part of the {{bootstrap.servers}} > * There needs to be a certain amount of topics or producers, but I'm unsure > which is crucial > * Changing the values of {{metadata.request.timeout.ms}} or > {{flink.partition-discovery.interval-millis}} does not seem to have any > effect. 
>Reporter: Julius Michaelis >Priority: Major > > When a Kafka broker fails that is listed among the bootstrap servers and > partition discovery is active, the Flink job reading from that Kafka may > enter a failing loop. > At first, the job seems to react normally without failure with only a short > latency spike when switching Kafka leaders. > Then, it fails with a > {code:none} > org.apache.flink.streaming.connectors.kafka.internal.Handover$ClosedException > at > org.apache.flink.streaming.connectors.kafka.internal.Handover.close(Handover.java:182) > at > org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.cancel(KafkaFetcher.java:175) > at > org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.cancel(FlinkKafkaConsumerBase.java:821) > at > org.apache.flink.streaming.api.operators.StreamSource.cancel(StreamSource.java:147) > at > org.apache.flink.streaming.runtime.tasks.SourceStreamTask.cancelTask(SourceStreamTask.java:136) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.cancel(StreamTask.java:602) > at > org.apache.flink.runtime.taskmanager.Task$TaskCanceler.run(Task.java:1355) > at java.lang.Thread.run(Thread.java:748) > {code} > It recovers, but processes fewer than the expected amount of records. > Finally, the job fails with > {code:none} > 2020-06-05 13:59:37 > org.apache.kafka.common.errors.TimeoutException: Timeout expired while > fetching topic metadata > {code} > and repeats doing so while not processing any records. (The exception comes > without any backtrace or otherwise interesting information) > I have also observed this behavior with partition-discovery turned off, but > only when the Flink job failed (after a broker failure) and had to run > checkpoint recovery for some other reason. > Please see the [Environment] description for information on how to reproduce > the issue. -- This message was sent by Atlassian Jira (v8.3.4#803005)
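The reproduction conditions listed in the issue can be sketched as a set of consumer properties. The property keys are taken from the issue text; the broker addresses and values below are illustrative only, and in a real job these properties would be handed to the Kafka consumer.

```java
import java.util.Properties;

// Illustrative consumer configuration for the reported failure mode: the
// failed broker must be listed in bootstrap.servers, and Flink's periodic
// partition discovery must be enabled.
class KafkaConsumerProps {
    static Properties build() {
        Properties props = new Properties();
        // All brokers, including the one that may fail (hypothetical hosts).
        props.setProperty("bootstrap.servers", "broker1:9092,broker2:9093");
        // Flink-specific key enabling periodic partition discovery.
        props.setProperty("flink.partition-discovery.interval-millis", "10000");
        // Reported in the issue as having no effect on the failure.
        props.setProperty("metadata.request.timeout.ms", "60000");
        return props;
    }
}
```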
[GitHub] [flink] aljoscha commented on pull request #12754: [FLINK-5717][streaming] Fix NPE on SessionWindows with ContinuousProc…
aljoscha commented on pull request #12754: URL: https://github.com/apache/flink/pull/12754#issuecomment-649407553 Feel free to fix `ContinuousEventTimeTrigger` and ping me in the PR. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-18430) Upgrade stability to @Public for CheckpointedFunction and CheckpointListener
[ https://issues.apache.org/jira/browse/FLINK-18430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17144770#comment-17144770 ] Aljoscha Krettek commented on FLINK-18430: -- And yes, I see that moving {{CheckpointListener}} will break some user code but I think it's the better long-term solution, especially if we want to decouple our API/SDK from the runtime. > Upgrade stability to @Public for CheckpointedFunction and CheckpointListener > > > Key: FLINK-18430 > URL: https://issues.apache.org/jira/browse/FLINK-18430 > Project: Flink > Issue Type: Improvement > Components: API / DataStream >Reporter: Stephan Ewen >Assignee: Stephan Ewen >Priority: Critical > Fix For: 1.11.0 > > > The two interfaces {{CheckpointedFunction}} and {{CheckpointListener}} are > used by many users, but are still (for years now) marked as > {{@PublicEvolving}}. > I think this is not correct. They are very core to the DataStream API and are > used widely and should be treated as {{@Public}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-18430) Upgrade stability to @Public for CheckpointedFunction and CheckpointListener
[ https://issues.apache.org/jira/browse/FLINK-18430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17144769#comment-17144769 ] Aljoscha Krettek commented on FLINK-18430: -- I agree that we should make more interfaces public to benefit from automated checking; however, {{CheckpointListener}} is in {{package org.apache.flink.runtime.state}} in the {{flink-runtime}} module. If we want to make it {{@Public}} we should 1) move it to an API module and 2) move it to an API package. Moving it next to {{CheckpointedFunction}} would be good, I think. Side note: {{flink-runtime}} has only three {{@Public}} classes. One of them is {{SerializableFunction}} which we just introduced for 1.11. We should probably also move that to an API package/module. > Upgrade stability to @Public for CheckpointedFunction and CheckpointListener > -- > > Key: FLINK-18430 > URL: https://issues.apache.org/jira/browse/FLINK-18430 -- This message was sent by Atlassian Jira (v8.3.4#803005)
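Flink expresses these stability guarantees as annotations on API classes, which tooling can then check automatically. A self-contained sketch of the idea with stand-in annotations — the real ones live in `org.apache.flink.annotation`, and the interface below is a simplified stand-in, not Flink's actual definition:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class StabilityDemo {
    // Stand-ins for org.apache.flink.annotation.Public / PublicEvolving.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    public @interface Public {}

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    public @interface PublicEvolving {}

    // After the proposed upgrade, CheckpointListener would carry @Public,
    // committing to compatibility across minor releases.
    @Public
    public interface CheckpointListener {
        void notifyCheckpointComplete(long checkpointId) throws Exception;
    }

    public static void main(String[] args) {
        // Compatibility checkers (e.g. japicmp) discover the guarantee reflectively.
        boolean stable = CheckpointListener.class.isAnnotationPresent(Public.class);
        System.out.println("CheckpointListener is @Public: " + stable);
    }
}
```

This is also why the module placement matters: the automated checks only protect classes that sit in API modules where the annotation scan runs.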
[jira] [Commented] (FLINK-18150) A single failing Kafka broker may cause jobs to fail indefinitely with TimeoutException: Timeout expired while fetching topic metadata
[ https://issues.apache.org/jira/browse/FLINK-18150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17144768#comment-17144768 ] Julius Michaelis commented on FLINK-18150: -- My current guess is that this relates to the way that the Kafka client [tries to pick a broker|https://github.com/apache/kafka/blob/2.2.0/clients/src/main/java/org/apache/kafka/clients/NetworkClient.java#L643] for retrieving the metadata from: {{request.timeout.ms}} defaults to 30000 but {{default.api.timeout.ms}} and {{max.block.ms}} default to 60000. Hence, only two attempts are made. However, if retrieving metadata from one broker failed, that broker may be retried, leading to at least a few subtasks attempting to get metadata from a dead broker twice, failing the task. A log of the situation, for reference: {code:none} [Producer clientId=producer-202] Removing node kafka1:9091 (id: -1 rack: null) from least loaded node selection: is-blacked-out: false, in-flight-requests: 0 [Producer clientId=producer-202] Found least loaded node kafka2:9091 (id: -2 rack: null) [Producer clientId=producer-480] Removing node kafka1:9091 (id: -1 rack: null) from least loaded node selection: is-blacked-out: false, in-flight-requests: 0 [Producer clientId=producer-480] Found least loaded node kafka2:9091 (id: -2 rack: null) [Producer clientId=producer-158] Removing node kafka1:9091 (id: -1 rack: null) from least loaded node selection: is-blacked-out: false, in-flight-requests: 0 [Producer clientId=producer-158] Found least loaded node kafka2:9091 (id: -2 rack: null) [Producer clientId=producer-285] Removing node kafka2:9091 (id: -2 rack: null) from least loaded node selection: is-blacked-out: false, in-flight-requests: 0 [Producer clientId=producer-285] Found least loaded node kafka1:9091 (id: -1 rack: null) [Producer clientId=producer-321] Removing node kafka2:9091 (id: -2 rack: null) from least loaded node selection: is-blacked-out: false, in-flight-requests: 0 [Producer clientId=producer-321] 
Found least loaded node kafka1:9091 (id: -1 rack: null) [Producer clientId=producer-477] Removing node kafka1:9091 (id: -1 rack: null) from least loaded node selection: is-blacked-out: false, in-flight-requests: 0 [Producer clientId=producer-477] Found least loaded node kafka2:9091 (id: -2 rack: null) {code} I've played around a bit, and setting {code:yaml} session.timeout.ms: 5000 request.timeout.ms: 10000 default.api.timeout.ms: 180000 {code} on the consumer and {code:yaml} default.api.timeout.ms: 180000 max.block.ms: 181000 {code} on the producer seems to make the problem go away. But I'm puzzled as to why. > A single failing Kafka broker may cause jobs to fail indefinitely with > TimeoutException: Timeout expired while fetching topic metadata > -- > > Key: FLINK-18150 > URL: https://issues.apache.org/jira/browse/FLINK-18150 -- This message was sent by Atlassian Jira (v8.3.4#803005)
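The timeout properties discussed in this comment are plain Kafka client settings, so the workaround boils down to making the per-request timeout small relative to the overall API timeout budget, allowing many metadata attempts. A minimal sketch with illustrative values — the broker addresses and exact numbers are assumptions, not the reporter's exact configuration:

```java
import java.util.Properties;

public class KafkaTimeoutProps {
    // Per-request timeout kept small relative to the overall budget, so
    // metadata fetches get many attempts before the whole operation times out.
    public static Properties consumerProps() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "kafka1:9091,kafka2:9091"); // hypothetical brokers
        props.setProperty("session.timeout.ms", "5000");
        props.setProperty("request.timeout.ms", "10000");      // one metadata attempt
        props.setProperty("default.api.timeout.ms", "180000"); // overall budget
        return props;
    }

    public static void main(String[] args) {
        Properties p = consumerProps();
        int attempts = Integer.parseInt(p.getProperty("default.api.timeout.ms"))
                / Integer.parseInt(p.getProperty("request.timeout.ms"));
        System.out.println("metadata attempts within budget: up to " + attempts); // up to 18
    }
}
```

Such a `Properties` object is what gets handed to the Kafka consumer (Flink's `FlinkKafkaConsumer` constructor accepts a `Properties` argument), so nothing beyond configuration changes is needed.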
[GitHub] [flink] azagrebin commented on a change in pull request #12690: [FLINK-18186][doc] Various updates on standalone kubernetes document
azagrebin commented on a change in pull request #12690: URL: https://github.com/apache/flink/pull/12690#discussion_r445413795 ## File path: docs/ops/deployment/kubernetes.md ## @@ -264,6 +264,24 @@ spec: component: jobmanager {% endhighlight %} +`taskmanager-query-state-service.yaml`. Optional service, that exposes the taskmanager `query-state` port as public Kubernetes node's port. Review comment: ```suggestion `taskmanager-query-state-service.yaml`. Optional service, that exposes the taskmanager port to access the queryable state as a public Kubernetes node's port. ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] azagrebin commented on a change in pull request #12690: [FLINK-18186][doc] Various updates on standalone kubernetes document
azagrebin commented on a change in pull request #12690: URL: https://github.com/apache/flink/pull/12690#discussion_r445413795 ## File path: docs/ops/deployment/kubernetes.md ## @@ -264,6 +264,24 @@ spec: component: jobmanager {% endhighlight %} +`taskmanager-query-state-service.yaml`. Optional service, that exposes the taskmanager `query-state` port as public Kubernetes node's port. Review comment: ```suggestion `taskmanager-query-state-service.yaml`. Optional service, that exposes the TaskManager port to access the queryable state as a public Kubernetes node's port. ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] azagrebin commented on a change in pull request #12690: [FLINK-18186][doc] Various updates on standalone kubernetes document
azagrebin commented on a change in pull request #12690: URL: https://github.com/apache/flink/pull/12690#discussion_r445412890 ## File path: docs/ops/deployment/kubernetes.md ## @@ -264,6 +264,24 @@ spec: component: jobmanager {% endhighlight %} +`taskmanager-query-state-service.yaml`. Optional service, that exposes the taskmanager `query-state` port as public Kubernetes node's port. Review comment: should we mention this service in `## Deploy Flink cluster on Kubernetes` chapter? like we describe `jobmanager-rest-service.yaml`: - after description of access REST API, we could mention that the `query-state` port can be accessed in a similar way. - add the optional `kubectl create/delete -f taskmanager-query-state-service.yaml` command This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-18433) From the end-to-end performance test results, 1.11 has a regression
[ https://issues.apache.org/jira/browse/FLINK-18433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17144762#comment-17144762 ] Piotr Nowojski commented on FLINK-18433: {quote} Could this have anything to do with the n-ary input operator? {quote} Almost definitely not - the n-ary input operator/task are independent beings, not sharing the same code path. For the same reasons as [~AHeise], I would rather suspect something that wasn't covered by our manual cluster tests or JMH: some more elaborate state backend access or cluster setups (like memory). Also, a similar performance drop across the board, including both heap and rocksdb, would rather suggest to me that the regression is not in the runtime code. A ~10% performance regression in the runtime code when using the heap state backend shouldn't be visible when using RocksDB. That also suggests things like memory settings. I'm OoO so I won't be able to investigate this. > From the end-to-end performance test results, 1.11 has a regression > --- > > Key: FLINK-18433 > URL: https://issues.apache.org/jira/browse/FLINK-18433 > Project: Flink > Issue Type: Bug > Components: API / Core, API / DataStream >Affects Versions: 1.11.0 > Environment: 3 machines > [|https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations_1.11/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java] >Reporter: Aihua Li >Priority: Major > > > I ran end-to-end performance tests between the Release-1.10 and Release-1.11. 
> the results were as follows: > |scenarioName|release-1.10|release-1.11| | > |OneInput_Broadcast_LazyFromSource_ExactlyOnce_10_rocksdb|46.175|43.8133|-5.11%| > |OneInput_Rescale_LazyFromSource_ExactlyOnce_100_heap|211.835|200.355|-5.42%| > |OneInput_Rebalance_LazyFromSource_ExactlyOnce_1024_rocksdb|1721.041667|1618.32|-5.97%| > |OneInput_KeyBy_LazyFromSource_ExactlyOnce_10_heap|46|43.615|-5.18%| > |OneInput_Broadcast_Eager_ExactlyOnce_100_rocksdb|212.105|199.688|-5.85%| > |OneInput_Rescale_Eager_ExactlyOnce_1024_heap|1754.64|1600.12|-8.81%| > |OneInput_Rebalance_Eager_ExactlyOnce_10_rocksdb|45.9167|43.0983|-6.14%| > |OneInput_KeyBy_Eager_ExactlyOnce_100_heap|212.0816667|200.727|-5.35%| > |OneInput_Broadcast_LazyFromSource_AtLeastOnce_1024_rocksdb|1718.245|1614.381667|-6.04%| > |OneInput_Rescale_LazyFromSource_AtLeastOnce_10_heap|46.12|43.5517|-5.57%| > |OneInput_Rebalance_LazyFromSource_AtLeastOnce_100_rocksdb|212.038|200.388|-5.49%| > |OneInput_KeyBy_LazyFromSource_AtLeastOnce_1024_heap|1762.048333|1606.408333|-8.83%| > |OneInput_Broadcast_Eager_AtLeastOnce_10_rocksdb|46.0583|43.4967|-5.56%| > |OneInput_Rescale_Eager_AtLeastOnce_100_heap|212.233|201.188|-5.20%| > |OneInput_Rebalance_Eager_AtLeastOnce_1024_rocksdb|1720.66|1616.85|-6.03%| > |OneInput_KeyBy_Eager_AtLeastOnce_10_heap|46.14|43.6233|-5.45%| > |TwoInputs_Broadcast_LazyFromSource_ExactlyOnce_100_rocksdb|156.918|152.957|-2.52%| > |TwoInputs_Rescale_LazyFromSource_ExactlyOnce_1024_heap|1415.511667|1300.1|-8.15%| > |TwoInputs_Rebalance_LazyFromSource_ExactlyOnce_10_rocksdb|34.2967|34.1667|-0.38%| > |TwoInputs_KeyBy_LazyFromSource_ExactlyOnce_100_heap|158.353|151.848|-4.11%| > |TwoInputs_Broadcast_Eager_ExactlyOnce_1024_rocksdb|1373.406667|1300.056667|-5.34%| > |TwoInputs_Rescale_Eager_ExactlyOnce_10_heap|34.5717|32.0967|-7.16%| > |TwoInputs_Rebalance_Eager_ExactlyOnce_100_rocksdb|158.655|147.44|-7.07%| > |TwoInputs_KeyBy_Eager_ExactlyOnce_1024_heap|1356.611667|1292.386667|-4.73%| > 
|TwoInputs_Broadcast_LazyFromSource_AtLeastOnce_10_rocksdb|34.01|33.205|-2.37%| > |TwoInputs_Rescale_LazyFromSource_AtLeastOnce_100_heap|149.588|145.997|-2.40%| > |TwoInputs_Rebalance_LazyFromSource_AtLeastOnce_1024_rocksdb|1359.74|1299.156667|-4.46%| > |TwoInputs_KeyBy_LazyFromSource_AtLeastOnce_10_heap|34.025|29.6833|-12.76%| > |TwoInputs_Broadcast_Eager_AtLeastOnce_100_rocksdb|157.303|151.4616667|-3.71%| > |TwoInputs_Rescale_Eager_AtLeastOnce_1024_heap|1368.74|1293.238333|-5.52%| > |TwoInputs_Rebalance_Eager_AtLeastOnce_10_rocksdb|34.325|33.285|-3.03%| > |TwoInputs_KeyBy_Eager_AtLeastOnce_100_heap|162.5116667|134.375|-17.31%| > It can be seen that the performance of 1.11 has a regression, basically > around 5%, and the maximum regression is 17%. This needs to be checked. > the test code: > flink-1.10.0: > [https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java] > flink-1.11.0: >
[jira] [Closed] (FLINK-18417) Support List as a conversion class for ARRAY
[ https://issues.apache.org/jira/browse/FLINK-18417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timo Walther closed FLINK-18417. Fix Version/s: 1.12.0 Resolution: Fixed Fixed in 1.12.0: 6834ed181aa9edda6b9c7fb61696de93170a88d1 > Support List as a conversion class for ARRAY > > > Key: FLINK-18417 > URL: https://issues.apache.org/jira/browse/FLINK-18417 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / API >Reporter: Timo Walther >Assignee: Timo Walther >Priority: Major > Labels: pull-request-available > Fix For: 1.12.0 > > > Currently we don't support {{List}} as a conversion class in the new type > system. However, there are multiple reasons why we should support this > conversion: > 1) Hive uses {{List}} as the default for representing arrays e.g. in > functions. > 2) The new Expression DSL supports converting lists to array literals already. > 3) The list interface is essential for Java users and part of many structured > types. > 4) We need to represent lists in {{ListView}} for aggregate functions. -- This message was sent by Atlassian Jira (v8.3.4#803005)
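To illustrate what a conversion class means here: the same logical `ARRAY<INT>` value can cross the API boundary either as `Integer[]` (the previous default) or as a `java.util.List<Integer>`. Flink selects the external representation via `DataType#bridgedTo(List.class)`; the stdlib-only helpers below only sketch the two representations and are not Flink API:

```java
import java.util.Arrays;
import java.util.List;

public class ArrayConversionDemo {
    // The default conversion class for ARRAY<INT> is Integer[]; this change
    // makes a List<Integer> view an equally valid external representation.
    public static List<Integer> toListRepresentation(Integer[] arrayRepr) {
        return Arrays.asList(arrayRepr);
    }

    public static Integer[] toArrayRepresentation(List<Integer> listRepr) {
        return listRepr.toArray(new Integer[0]);
    }

    public static void main(String[] args) {
        Integer[] asArray = {1, 2, 3};
        List<Integer> asList = toListRepresentation(asArray);
        System.out.println(asList);                                          // [1, 2, 3]
        System.out.println(Arrays.toString(toArrayRepresentation(asList))); // [1, 2, 3]
    }
}
```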
[GitHub] [flink] twalthr closed pull request #12765: [FLINK-18417][table] Support List as a conversion class for ARRAY
twalthr closed pull request #12765: URL: https://github.com/apache/flink/pull/12765 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-17075) Add task status reconciliation between TM and JM
[ https://issues.apache.org/jira/browse/FLINK-17075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17144759#comment-17144759 ] Till Rohrmann commented on FLINK-17075: --- Thanks for creating this proposal [~chesnay]. It sounds good to me. I think we need the {{ExecutionIdsProvider}} to return the {{Executions}} which are deployed to a given {{ResourceID}}. > Add task status reconciliation between TM and JM > > > Key: FLINK-17075 > URL: https://issues.apache.org/jira/browse/FLINK-17075 > Project: Flink > Issue Type: Improvement > Components: Runtime / Coordination >Affects Versions: 1.10.0, 1.11.0 >Reporter: Till Rohrmann >Assignee: Chesnay Schepler >Priority: Critical > Fix For: 1.10.2, 1.12.0, 1.11.1 > > > In order to harden the TM and JM communication I suggest to let the > {{TaskExecutor}} send the task statuses back to the {{JobMaster}} as part of > the heartbeat payload (similar to FLINK-11059). This would allow to reconcile > the states of both components in case that a status update message was lost > as described by a user on the ML. > https://lists.apache.org/thread.html/ra9ed70866381f0ef0f4779633346722ccab3dc0d6dbacce04080b74e%40%3Cuser.flink.apache.org%3E -- This message was sent by Atlassian Jira (v8.3.4#803005)
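The reconciliation itself reduces to comparing the task statuses carried in the heartbeat payload against the JobMaster's view and re-applying whatever diverged. A sketch of that comparison with hypothetical types — string IDs and a plain enum stand in for Flink's internal `ExecutionAttemptID` and `ExecutionState`:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class TaskStatusReconciler {
    public enum TaskStatus { DEPLOYING, RUNNING, FINISHED, FAILED }

    // Returns the execution IDs whose TM-reported status (heartbeat payload)
    // differs from the JM's current view, i.e. updates that were lost and
    // need to be re-applied on the JM side.
    public static Set<String> diverged(Map<String, TaskStatus> jmView,
                                       Map<String, TaskStatus> tmReported) {
        Set<String> out = new HashSet<>();
        for (Map.Entry<String, TaskStatus> e : tmReported.entrySet()) {
            if (e.getValue() != jmView.get(e.getKey())) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, TaskStatus> jm = new HashMap<>();
        jm.put("exec-1", TaskStatus.RUNNING);
        jm.put("exec-2", TaskStatus.RUNNING); // JM missed the FAILED update
        Map<String, TaskStatus> tm = new HashMap<>();
        tm.put("exec-1", TaskStatus.RUNNING);
        tm.put("exec-2", TaskStatus.FAILED);
        System.out.println(diverged(jm, tm)); // [exec-2]
    }
}
```

This is also where the provider mentioned in the comment comes in: the JM needs to know which executions are deployed to a given TM so it can scope the comparison to that TM's heartbeat.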
[jira] [Commented] (FLINK-18433) From the end-to-end performance test results, 1.11 has a regression
[ https://issues.apache.org/jira/browse/FLINK-18433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17144757#comment-17144757 ] Till Rohrmann commented on FLINK-18433: --- Pulling in [~liyu] who might know more about possible state backend changes. > From the end-to-end performance test results, 1.11 has a regression > --- > > Key: FLINK-18433 > URL: https://issues.apache.org/jira/browse/FLINK-18433 > the test code: > flink-1.10.0: > [https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java] > flink-1.11.0: > [https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations_1.11/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java] > commit cmd like this: > bin/flink run -d -m 192.168.39.246:8081 -c > org.apache.flink.basic.operations.PerformanceTestJob > /home/admin/flink-basic-operations_2.11-1.10-SNAPSHOT.jar --topologyName > OneInput --LogicalAttributesofEdges Broadcast --ScheduleMode LazyFromSource > --CheckpointMode ExactlyOnce --recordSize 10 --stateBackend rocksdb > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-15416) Add Retry Mechanism for PartitionRequestClientFactory.ConnectingChannel
[ https://issues.apache.org/jira/browse/FLINK-15416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roman Khachatryan updated FLINK-15416: -- Release Note: new configuration parameter: taskmanager.network.retries > Add Retry Mechanism for PartitionRequestClientFactory.ConnectingChannel > --- > > Key: FLINK-15416 > URL: https://issues.apache.org/jira/browse/FLINK-15416 > Project: Flink > Issue Type: Wish > Components: Runtime / Network >Affects Versions: 1.10.0 >Reporter: Zhenqiu Huang >Assignee: Zhenqiu Huang >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > We run a Flink job with 256 TMs in production. The job internally has keyby > logic and thus builds 256 * 256 communication channels. An outage happened > when a chip-internal link broke in one of the network switches > connecting these machines. During the outage, the Flink job couldn't restart > successfully, as there was always an exception like "Connecting the channel > failed: Connecting to remote task manager + '/10.14.139.6:41300' has > failed. This might indicate that the remote task manager has been lost." > After a deep investigation with the network infrastructure team, we found there > are 6 switches connecting these machines. Each switch has 32 physical > links. Every socket is round-robin assigned to one of the links for load > balancing. Thus, on average 256 * 256 / (6 * 32 * 2) ≈ 170 > channels will be assigned to the broken link. The issue lasted for 4 hours > until we found the broken link and restarted the problematic switch. > Given this, we found that retrying channel creation helps to resolve > this issue. For our networking topology, we can set the retry count to 2. As 170 / (132 > * 132) < 1, after retrying twice, on average none of the 170 channels will be > assigned to the broken link. > I think it is a valuable fix for this kind of partial network partition. 
-- This message was sent by Atlassian Jira (v8.3.4#803005)
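The release-noted `taskmanager.network.retries` boils down to a bounded retry around channel creation: with round-robin link assignment, each retry gives the connection a fresh chance to land on a healthy physical link. A self-contained sketch of the pattern — the `Supplier`-based connector is a stand-in, not Flink's actual `PartitionRequestClientFactory` API:

```java
import java.io.IOException;
import java.util.function.Supplier;

public class ConnectWithRetry {
    // Attempts to open a connection up to (1 + retries) times before
    // giving up, mirroring the new taskmanager.network.retries setting.
    public static <T> T connect(Supplier<T> connector, int retries) throws IOException {
        RuntimeException last = null;
        for (int attempt = 0; attempt <= retries; attempt++) {
            try {
                return connector.get();
            } catch (RuntimeException e) {
                last = e; // e.g. the socket was round-robin assigned to the broken link
            }
        }
        throw new IOException("Connecting failed after " + (retries + 1) + " attempts", last);
    }

    public static void main(String[] args) throws IOException {
        int[] calls = {0};
        // Stand-in connector: the first attempt hits the broken link, the retry succeeds.
        String channel = connect(() -> {
            if (calls[0]++ == 0) throw new RuntimeException("broken link");
            return "channel";
        }, 2); // taskmanager.network.retries: 2
        System.out.println(channel + " established after " + calls[0] + " attempts");
    }
}
```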
[jira] [Comment Edited] (FLINK-17073) Slow checkpoint cleanup causing OOMs
[ https://issues.apache.org/jira/browse/FLINK-17073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17144726#comment-17144726 ] Etienne Chauchot edited comment on FLINK-17073 at 6/25/20, 8:04 AM: [~trohrmann], I wrote [this|https://s.apache.org/checkpoint-backpressure] FLIP-style design document for checkpoint backpressure. Can you tell me what you think? Also, I don't have the rights to create FLIP design documents in the Flink Confluence workspace, so I did the FLIP in a Google doc. Can you give me the rights? > Slow checkpoint cleanup causing OOMs > > > Key: FLINK-17073 > URL: https://issues.apache.org/jira/browse/FLINK-17073 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing >Affects Versions: 1.7.3, 1.8.0, 1.9.0, 1.10.0, 1.11.0 >Reporter: Till Rohrmann >Priority: Major > Fix For: 1.12.0 > > > A user reported that he sees a decline in checkpoint cleanup speed when > upgrading from Flink 1.7.2 to 1.10.0. The result is that a lot of cleanup > tasks are waiting in the execution queue occupying memory. Ultimately, the JM > process dies with an OOM. > Compared to Flink 1.7.2, we introduced a dedicated {{ioExecutor}} which is > used by the {{HighAvailabilityServices}} (FLINK-11851). Before, we used the > {{AkkaRpcService}} thread pool which was a {{ForkJoinPool}} with a max > parallelism of 64. Now it is a {{FixedThreadPool}} with as many threads as > CPU cores. This change might have caused the decline in completed checkpoint > discard throughput. This suspicion needs to be validated before trying to fix > it! 
> [1] > https://lists.apache.org/thread.html/r390e5d775878918edca0b6c9f18de96f828c266a888e34ed30ce8494%40%3Cuser.flink.apache.org%3E -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-17073) Slow checkpoint cleanup causing OOMs
[ https://issues.apache.org/jira/browse/FLINK-17073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17144733#comment-17144733 ] Etienne Chauchot commented on FLINK-17073: -- I'm always glad to help! No problem about syncing with [~pnowojski] and [~SleePy]. Not a trivial problem indeed, but I figured out that it could be a good way to learn quite a lot about Flink internals :) > Slow checkpoint cleanup causing OOMs > -- > > Key: FLINK-17073 > URL: https://issues.apache.org/jira/browse/FLINK-17073 -- This message was sent by Atlassian Jira (v8.3.4#803005)
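The {{ioExecutor}} suspicion from the issue description can be illustrated with stdlib executors alone: `Executors.newFixedThreadPool` is backed by an unbounded `LinkedBlockingQueue`, so when cleanup tasks arrive faster than the few worker threads can discard checkpoints, the backlog (and the memory it retains) grows without bound. A minimal, self-contained sketch of that buildup — the sleeping task is a stand-in for discarding a completed checkpoint:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CleanupBacklogDemo {
    public static void main(String[] args) throws InterruptedException {
        // Like the dedicated ioExecutor from FLINK-11851: as many threads as
        // CPU cores (2 here), in front of an unbounded work queue.
        ThreadPoolExecutor ioExecutor = (ThreadPoolExecutor) Executors.newFixedThreadPool(2);

        // Submit "checkpoint cleanup" tasks faster than they complete.
        for (int i = 0; i < 100; i++) {
            ioExecutor.execute(() -> {
                try { Thread.sleep(50); } catch (InterruptedException ignored) { }
            });
        }

        // Nearly all tasks are still queued; in the reported scenario this
        // backlog is what retained memory until the JM ran out of heap.
        System.out.println("queued cleanup tasks: " + ioExecutor.getQueue().size());
        ioExecutor.shutdownNow();
        ioExecutor.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

A pool with far more threads (such as the pre-1.10 `ForkJoinPool` with parallelism 64) drains the same submission rate much faster, which is one way the change described above could slow down cleanup.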
[GitHub] [flink-web] morsapaes commented on pull request #350: [hotfix] Documentation Style Guide: Sync + Correction
morsapaes commented on pull request #350: URL: https://github.com/apache/flink-web/pull/350#issuecomment-649336113 @sjwiesman , can this PR be closed? The changes seem to have been merged. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] tillrohrmann commented on a change in pull request #12699: [FLINK-18349][docs] Add release notes for Flink 1.11
tillrohrmann commented on a change in pull request #12699: URL: https://github.com/apache/flink/pull/12699#discussion_r445373126 ## File path: docs/release-notes/flink-1.11.md ## @@ -0,0 +1,220 @@ +--- +title: "Release Notes - Flink 1.11" +--- + + + +These release notes discuss important aspects, such as configuration, behavior, +or dependencies, that changed between Flink 1.10 and Flink 1.11. Please read +these notes carefully if you are planning to upgrade your Flink version to 1.11. + +* This will be replaced by the TOC +{:toc} + +### Clusters & Deployment + Removal of `LegacyScheduler` ([FLINK-15629](https://issues.apache.org/jira/browse/FLINK-15629)) +Flink no longer supports the legacy scheduler. +Hence, setting `jobmanager.scheduler: legacy` will no longer work and fail with an `IllegalArgumentException`. +The only valid option for `jobmanager.scheduler` is the default value `ng`. + + Bind user code class loader to lifetime of a slot ([FLINK-16408](https://issues.apache.org/jira/browse/FLINK-16408)) +The user code class loader is being reused by the `TaskExecutor` as long as there is at least a single slot allocated for the respective job. +This changes Flink's recovery behaviour slightly so that it will not reload static fields. +The benefit is that this change drastically reduces pressure on the JVM's metaspace. + +### Memory Management + Removal of deprecated mesos.resourcemanager.tasks.mem ([FLINK-15198](https://issues.apache.org/jira/browse/FLINK-15198)) + +The `mesos.resourcemanager.tasks.mem` option, deprecated in 1.10 in favour of `taskmanager.memory.process.size`, has been completely removed and will have no effect anymore in 1.11+. 
+ +### Table API & SQL + Changed packages of `TableEnvironment` ([FLINK-15947](https://issues.apache.org/jira/browse/FLINK-15947)) + +Due to various issues with the packages `org.apache.flink.table.api.scala/java`, all classes from those packages were relocated. +Moreover, the Scala expressions were moved to `org.apache.flink.table.api` as announced in Flink 1.9. + +If you used one of: +* `org.apache.flink.table.api.java.StreamTableEnvironment` +* `org.apache.flink.table.api.scala.StreamTableEnvironment` +* `org.apache.flink.table.api.java.BatchTableEnvironment` +* `org.apache.flink.table.api.scala.BatchTableEnvironment` + +and you do not convert to/from DataStream, switch to: +* `org.apache.flink.table.api.TableEnvironment` + +If you do convert to/from DataStream/DataSet, change your imports to one of: +* `org.apache.flink.table.api.bridge.java.StreamTableEnvironment` +* `org.apache.flink.table.api.bridge.scala.StreamTableEnvironment` +* `org.apache.flink.table.api.bridge.java.BatchTableEnvironment` +* `org.apache.flink.table.api.bridge.scala.BatchTableEnvironment` + +For the Scala expressions use the import: +* `org.apache.flink.table.api._` instead of `org.apache.flink.table.api.scala._` + +Additionally, if you use Scala's implicit conversions to/from DataStream/DataSet, import `org.apache.flink.table.api.bridge.scala._` instead of `org.apache.flink.table.api.scala._` + + Removal of deprecated `StreamTableSink` ([FLINK-16362](https://issues.apache.org/jira/browse/FLINK-16362)) +Existing `StreamTableSink` implementations should remove the `emitDataStream` method. + + Removal of `BatchTableSink#emitDataSet` ([FLINK-16535](https://issues.apache.org/jira/browse/FLINK-16535)) +Existing `BatchTableSink` implementations should rename `emitDataSet` to `consumeDataSet` and return `DataSink`. 
#### Corrected execution behavior of `TableEnvironment.execute()` and `StreamExecutionEnvironment.execute()` ([FLINK-16363](https://issues.apache.org/jira/browse/FLINK-16363))

In previous versions, `TableEnvironment.execute()` and `StreamExecutionEnvironment.execute()` could both trigger table and DataStream programs. Since Flink 1.11.0, table programs can only be triggered by `TableEnvironment.execute()`. Once a table program is converted into a DataStream program (through the `toAppendStream()` or `toRetractStream()` method), it can only be triggered by `StreamExecutionEnvironment.execute()`.

#### Corrected execution behavior of `ExecutionEnvironment.execute()` and `BatchTableEnvironment.execute()` ([FLINK-17126](https://issues.apache.org/jira/browse/FLINK-17126))

In previous versions, `BatchTableEnvironment.execute()` and `ExecutionEnvironment.execute()` could both trigger table and DataSet programs for the legacy batch planner. Since Flink 1.11.0, batch table programs can only be triggered by `BatchTableEnvironment.execute()`. Once a table program is converted into a DataSet program (through the `toDataSet()` method), it can only be triggered by `ExecutionEnvironment.execute()`.

### Configuration

#### Renamed log4j-yarn-session.properties and logback-yarn.xml properties files
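The corrected streaming contract can be sketched as follows. This is an illustrative fragment, not a runnable job: the query and table name are made up, and it assumes Flink 1.11 dependencies on the classpath.

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

class ExecuteContractSketch {
    void sketch() {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        Table result = tEnv.sqlQuery("SELECT * FROM source_table"); // hypothetical source

        // A pure table program must be triggered via the table environment:
        //   tEnv.execute("pure-table-job");

        // Once converted to a DataStream, only the stream environment may trigger it:
        DataStream<Row> stream = tEnv.toAppendStream(result, Row.class);
        stream.print();
        //   env.execute("converted-job");
    }
}
```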
[GitHub] [flink-web] morsapaes opened a new pull request #350: [hotfix] Documentation Style Guide: Sync + Correction
morsapaes opened a new pull request #350: URL: https://github.com/apache/flink-web/pull/350 Syncing the ZH version of the Documentation Style Guide with a recent change to the English version, and correcting the naming guidelines under "Repository". This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-web] morsapaes closed pull request #350: [hotfix] Documentation Style Guide: Sync + Correction
morsapaes closed pull request #350: URL: https://github.com/apache/flink-web/pull/350
[jira] [Commented] (FLINK-17073) Slow checkpoint cleanup causing OOMs
[ https://issues.apache.org/jira/browse/FLINK-17073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17144726#comment-17144726 ]

Etienne Chauchot commented on FLINK-17073:
--

[~trohrmann], I wrote [this|https://s.apache.org/checkpoint-backpressure] FLIP-style design document for checkpoint backpressure. Can you tell me what you think? Also, I don't have the rights to create FLIP design documents in the Flink Confluence workspace, so I wrote the FLIP in a Google doc. Can you give me the rights?

> Slow checkpoint cleanup causing OOMs
> ---
>
> Key: FLINK-17073
> URL: https://issues.apache.org/jira/browse/FLINK-17073
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Checkpointing
> Affects Versions: 1.7.3, 1.8.0, 1.9.0, 1.10.0, 1.11.0
> Reporter: Till Rohrmann
> Priority: Major
> Fix For: 1.12.0
>
> A user reported that he sees a decline in checkpoint cleanup speed when upgrading from Flink 1.7.2 to 1.10.0. The result is that a lot of cleanup tasks are waiting in the execution queue, occupying memory. Ultimately, the JM process dies with an OOM.
> Compared to Flink 1.7.2, we introduced a dedicated {{ioExecutor}} which is used by the {{HighAvailabilityServices}} (FLINK-11851). Before, we used the {{AkkaRpcService}} thread pool, which was a {{ForkJoinPool}} with a max parallelism of 64. Now it is a {{FixedThreadPool}} with as many threads as CPU cores. This change might have caused the decline in completed checkpoint discard throughput. This suspicion needs to be validated before trying to fix it!
> [1] https://lists.apache.org/thread.html/r390e5d775878918edca0b6c9f18de96f828c266a888e34ed30ce8494%40%3Cuser.flink.apache.org%3E

-- This message was sent by Atlassian Jira (v8.3.4#803005)
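The suspicion quoted above — a CPU-core-sized `FixedThreadPool` draining cleanup tasks more slowly than a `ForkJoinPool` with parallelism 64 — comes down to plain executor mechanics. The JDK-only sketch below is not Flink code; pool sizes and task counts are hypothetical, and it only demonstrates how an undersized fixed pool lets its work queue (and therefore memory) grow while all workers are busy:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Illustrative only: a fixed pool whose workers are all busy accumulates
// pending tasks in its unbounded queue. This mirrors the ioExecutor
// suspicion above, not Flink's actual implementation.
class CleanupBacklogDemo {

    // Submits 'tasks' jobs whose workers block until released, then reports
    // how many tasks are still sitting in the queue.
    static int queuedAfterSubmit(ThreadPoolExecutor pool, int tasks) {
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < tasks; i++) {
            pool.submit(() -> {
                try {
                    release.await(); // simulate a slow cleanup task
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        int queued = pool.getQueue().size();
        release.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return queued;
    }

    public static void main(String[] args) {
        // 1.10-style ioExecutor: one thread per CPU core.
        int cores = Runtime.getRuntime().availableProcessors();
        ThreadPoolExecutor coreSized =
                (ThreadPoolExecutor) Executors.newFixedThreadPool(cores);
        // Rough stand-in for the earlier behaviour (the real code used a
        // ForkJoinPool with max parallelism 64).
        ThreadPoolExecutor wide =
                (ThreadPoolExecutor) Executors.newFixedThreadPool(64);

        int tasks = 1000;
        System.out.println("core-sized pool backlog: " + queuedAfterSubmit(coreSized, tasks));
        System.out.println("wide pool backlog: " + queuedAfterSubmit(wide, tasks));
    }
}
```

With all workers blocked, the queue holds everything beyond the pool size, so the core-sized pool shows the larger backlog on typical machines — the same shape as the reported OOM.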
[jira] [Comment Edited] (FLINK-18433) From the end-to-end performance test results, 1.11 has a regression
[ https://issues.apache.org/jira/browse/FLINK-18433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17144723#comment-17144723 ]

Arvid Heise edited comment on FLINK-18433 at 6/25/20, 7:47 AM:
---

Some possible causes from the network stack:
- We added a few additional statements in InputGate/ResultPartition to support Unaligned Checkpoints (UC): https://issues.apache.org/jira/browse/FLINK-14551. They will even be executed when UC is disabled. However, it would be odd if the JIT did not optimize that out.
- We now use fewer exclusive network buffers by default. (This doesn't seem to be fully merged yet, but some related things may already be merged: https://issues.apache.org/jira/browse/FLINK-16428)
- I also thought non-blocking output might affect it, but afaik everything was already in 1.10.0: https://issues.apache.org/jira/browse/FLINK-14553

Btw, we see no regression in our micro benchmarks, and our cluster [benchmarks|https://docs.google.com/spreadsheets/d/18GO15zO-WI2EzK0fTkWucMmAGicmi0mGjxHlHgzitHQ/edit#gid=0] with the [Throughput|https://github.com/dataArtisans/performance/blob/master/flink-jobs/src/main/java/com/github/projectflink/streaming/Throughput.java] job rather suggest that we gained performance. So before diving deeper, I'd like to rule out state-backend-related changes first.

was (Author: aheise):
Some possible causes from the network stack:
- We added a few additional statements in InputGate/ResultPartition to support Unaligned Checkpoints (UC): https://issues.apache.org/jira/browse/FLINK-14551. They will even be executed when UC is disabled. However, it would be odd if the JIT did not optimize that out.
- We now use fewer exclusive network buffers by default. (This doesn't seem to be fully merged yet, but some related things may already be merged: https://issues.apache.org/jira/browse/FLINK-16428)
- I also thought non-blocking output might affect it, but afaik everything was already in 1.10.0:
https://issues.apache.org/jira/browse/FLINK-14553

> From the end-to-end performance test results, 1.11 has a regression
> ---
>
> Key: FLINK-18433
> URL: https://issues.apache.org/jira/browse/FLINK-18433
> Project: Flink
> Issue Type: Bug
> Components: API / Core, API / DataStream
> Affects Versions: 1.11.0
> Environment: 3 machines
> [|https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations_1.11/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
> Reporter: Aihua Li
> Priority: Major
>
> I ran end-to-end performance tests between Release-1.10 and Release-1.11. The results were as follows:
>
> |scenarioName|release-1.10|release-1.11|diff|
> |OneInput_Broadcast_LazyFromSource_ExactlyOnce_10_rocksdb|46.175|43.8133|-5.11%|
> |OneInput_Rescale_LazyFromSource_ExactlyOnce_100_heap|211.835|200.355|-5.42%|
> |OneInput_Rebalance_LazyFromSource_ExactlyOnce_1024_rocksdb|1721.041667|1618.32|-5.97%|
> |OneInput_KeyBy_LazyFromSource_ExactlyOnce_10_heap|46|43.615|-5.18%|
> |OneInput_Broadcast_Eager_ExactlyOnce_100_rocksdb|212.105|199.688|-5.85%|
> |OneInput_Rescale_Eager_ExactlyOnce_1024_heap|1754.64|1600.12|-8.81%|
> |OneInput_Rebalance_Eager_ExactlyOnce_10_rocksdb|45.9167|43.0983|-6.14%|
> |OneInput_KeyBy_Eager_ExactlyOnce_100_heap|212.0816667|200.727|-5.35%|
> |OneInput_Broadcast_LazyFromSource_AtLeastOnce_1024_rocksdb|1718.245|1614.381667|-6.04%|
> |OneInput_Rescale_LazyFromSource_AtLeastOnce_10_heap|46.12|43.5517|-5.57%|
> |OneInput_Rebalance_LazyFromSource_AtLeastOnce_100_rocksdb|212.038|200.388|-5.49%|
> |OneInput_KeyBy_LazyFromSource_AtLeastOnce_1024_heap|1762.048333|1606.408333|-8.83%|
> |OneInput_Broadcast_Eager_AtLeastOnce_10_rocksdb|46.0583|43.4967|-5.56%|
> |OneInput_Rescale_Eager_AtLeastOnce_100_heap|212.233|201.188|-5.20%|
> |OneInput_Rebalance_Eager_AtLeastOnce_1024_rocksdb|1720.66|1616.85|-6.03%|
> |OneInput_KeyBy_Eager_AtLeastOnce_10_heap|46.14|43.6233|-5.45%|
> |TwoInputs_Broadcast_LazyFromSource_ExactlyOnce_100_rocksdb|156.918|152.957|-2.52%|
> |TwoInputs_Rescale_LazyFromSource_ExactlyOnce_1024_heap|1415.511667|1300.1|-8.15%|
> |TwoInputs_Rebalance_LazyFromSource_ExactlyOnce_10_rocksdb|34.2967|34.1667|-0.38%|
> |TwoInputs_KeyBy_LazyFromSource_ExactlyOnce_100_heap|158.353|151.848|-4.11%|
> |TwoInputs_Broadcast_Eager_ExactlyOnce_1024_rocksdb|1373.406667|1300.056667|-5.34%|
> |TwoInputs_Rescale_Eager_ExactlyOnce_10_heap|34.5717|32.0967|-7.16%|
> |TwoInputs_Rebalance_Eager_ExactlyOnce_100_rocksdb|158.655|147.44|-7.07%|
> |TwoInputs_KeyBy_Eager_ExactlyOnce_1024_heap|1356.611667|1292.386667|-4.73%|
>
[jira] [Commented] (FLINK-18433) From the end-to-end performance test results, 1.11 has a regression
[ https://issues.apache.org/jira/browse/FLINK-18433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17144723#comment-17144723 ]

Arvid Heise commented on FLINK-18433:
---

Some possible causes from the network stack:
- We added a few additional statements in InputGate/ResultPartition to support Unaligned Checkpoints (UC): https://issues.apache.org/jira/browse/FLINK-14551. They will even be executed when UC is disabled. However, it would be odd if the JIT did not optimize that out.
- We now use fewer exclusive network buffers by default. (This doesn't seem to be fully merged yet, but some related things may already be merged: https://issues.apache.org/jira/browse/FLINK-16428)
- I also thought non-blocking output might affect it, but afaik everything was already in 1.10.0: https://issues.apache.org/jira/browse/FLINK-14553

> From the end-to-end performance test results, 1.11 has a regression
> ---
>
> Key: FLINK-18433
> URL: https://issues.apache.org/jira/browse/FLINK-18433
> Project: Flink
> Issue Type: Bug
> Components: API / Core, API / DataStream
> Affects Versions: 1.11.0
> Environment: 3 machines
> [|https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations_1.11/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
> Reporter: Aihua Li
> Priority: Major
>
> I ran end-to-end performance tests between Release-1.10 and Release-1.11.
> The results were as follows:
>
> |scenarioName|release-1.10|release-1.11|diff|
> |OneInput_Broadcast_LazyFromSource_ExactlyOnce_10_rocksdb|46.175|43.8133|-5.11%|
> |OneInput_Rescale_LazyFromSource_ExactlyOnce_100_heap|211.835|200.355|-5.42%|
> |OneInput_Rebalance_LazyFromSource_ExactlyOnce_1024_rocksdb|1721.041667|1618.32|-5.97%|
> |OneInput_KeyBy_LazyFromSource_ExactlyOnce_10_heap|46|43.615|-5.18%|
> |OneInput_Broadcast_Eager_ExactlyOnce_100_rocksdb|212.105|199.688|-5.85%|
> |OneInput_Rescale_Eager_ExactlyOnce_1024_heap|1754.64|1600.12|-8.81%|
> |OneInput_Rebalance_Eager_ExactlyOnce_10_rocksdb|45.9167|43.0983|-6.14%|
> |OneInput_KeyBy_Eager_ExactlyOnce_100_heap|212.0816667|200.727|-5.35%|
> |OneInput_Broadcast_LazyFromSource_AtLeastOnce_1024_rocksdb|1718.245|1614.381667|-6.04%|
> |OneInput_Rescale_LazyFromSource_AtLeastOnce_10_heap|46.12|43.5517|-5.57%|
> |OneInput_Rebalance_LazyFromSource_AtLeastOnce_100_rocksdb|212.038|200.388|-5.49%|
> |OneInput_KeyBy_LazyFromSource_AtLeastOnce_1024_heap|1762.048333|1606.408333|-8.83%|
> |OneInput_Broadcast_Eager_AtLeastOnce_10_rocksdb|46.0583|43.4967|-5.56%|
> |OneInput_Rescale_Eager_AtLeastOnce_100_heap|212.233|201.188|-5.20%|
> |OneInput_Rebalance_Eager_AtLeastOnce_1024_rocksdb|1720.66|1616.85|-6.03%|
> |OneInput_KeyBy_Eager_AtLeastOnce_10_heap|46.14|43.6233|-5.45%|
> |TwoInputs_Broadcast_LazyFromSource_ExactlyOnce_100_rocksdb|156.918|152.957|-2.52%|
> |TwoInputs_Rescale_LazyFromSource_ExactlyOnce_1024_heap|1415.511667|1300.1|-8.15%|
> |TwoInputs_Rebalance_LazyFromSource_ExactlyOnce_10_rocksdb|34.2967|34.1667|-0.38%|
> |TwoInputs_KeyBy_LazyFromSource_ExactlyOnce_100_heap|158.353|151.848|-4.11%|
> |TwoInputs_Broadcast_Eager_ExactlyOnce_1024_rocksdb|1373.406667|1300.056667|-5.34%|
> |TwoInputs_Rescale_Eager_ExactlyOnce_10_heap|34.5717|32.0967|-7.16%|
> |TwoInputs_Rebalance_Eager_ExactlyOnce_100_rocksdb|158.655|147.44|-7.07%|
> |TwoInputs_KeyBy_Eager_ExactlyOnce_1024_heap|1356.611667|1292.386667|-4.73%|
> |TwoInputs_Broadcast_LazyFromSource_AtLeastOnce_10_rocksdb|34.01|33.205|-2.37%|
> |TwoInputs_Rescale_LazyFromSource_AtLeastOnce_100_heap|149.588|145.997|-2.40%|
> |TwoInputs_Rebalance_LazyFromSource_AtLeastOnce_1024_rocksdb|1359.74|1299.156667|-4.46%|
> |TwoInputs_KeyBy_LazyFromSource_AtLeastOnce_10_heap|34.025|29.6833|-12.76%|
> |TwoInputs_Broadcast_Eager_AtLeastOnce_100_rocksdb|157.303|151.4616667|-3.71%|
> |TwoInputs_Rescale_Eager_AtLeastOnce_1024_heap|1368.74|1293.238333|-5.52%|
> |TwoInputs_Rebalance_Eager_AtLeastOnce_10_rocksdb|34.325|33.285|-3.03%|
> |TwoInputs_KeyBy_Eager_AtLeastOnce_100_heap|162.5116667|134.375|-17.31%|
>
> It can be seen that the performance of 1.11 has a regression, basically around 5%, and the maximum regression is 17%. This needs to be checked.
> The test code:
> flink-1.10.0: [https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
> flink-1.11.0:
>
[GitHub] [flink] tillrohrmann closed pull request #11852: [FLINK-17300] Log the lineage information between ExecutionAttemptID …
tillrohrmann closed pull request #11852: URL: https://github.com/apache/flink/pull/11852
[jira] [Closed] (FLINK-17300) Log the lineage information between ExecutionAttemptID and AllocationID
[ https://issues.apache.org/jira/browse/FLINK-17300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Rohrmann closed FLINK-17300.
---
Fix Version/s: 1.12.0 (was: 1.11.0)
Resolution: Fixed

Fixed via 7e48549b822bc54f8dc0a5e1f9cbb5f3156fda06

> Log the lineage information between ExecutionAttemptID and AllocationID
> ---
>
> Key: FLINK-17300
> URL: https://issues.apache.org/jira/browse/FLINK-17300
> Project: Flink
> Issue Type: Sub-task
> Components: Runtime / Coordination
> Reporter: Yangze Guo
> Assignee: Yangze Guo
> Priority: Minor
> Labels: pull-request-available
> Fix For: 1.12.0
[jira] [Commented] (FLINK-15693) Stop receiving incoming RPC messages when RpcEndpoint is closing
[ https://issues.apache.org/jira/browse/FLINK-15693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17144719#comment-17144719 ]

Till Rohrmann commented on FLINK-15693:
---

How exactly would this work, [~Anglenet]? I think an {{RpcEndpoint}} uses the {{AkkaInvocationHandler}} to send itself messages (e.g. {{runAsync}} or when using the {{MainThreadExecutor}}).

> Stop receiving incoming RPC messages when RpcEndpoint is closing
> ---
>
> Key: FLINK-15693
> URL: https://issues.apache.org/jira/browse/FLINK-15693
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Coordination
> Affects Versions: 1.9.1, 1.10.0
> Reporter: Till Rohrmann
> Priority: Major
> Fix For: 1.11.0
>
> When calling {{RpcEndpoint#closeAsync()}}, the system triggers {{RpcEndpoint#onStop}} and transitions the endpoint into the {{TerminatingState}}. In order to allow asynchronous clean-up operations, the main thread executor is not shut down immediately. As a side effect, the {{RpcEndpoint}} still accepts incoming RPC messages from other components.
> I think it would be cleaner to no longer accept incoming RPC messages once we are in the {{TerminatingState}}. That way we would not have to worry about the internal state of the {{RpcEndpoint}} when processing RPC messages (similar to [here|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskExecutor.java#L952]).
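The behaviour proposed in the quoted issue can be sketched with a minimal state machine. This is not Flink's actual `AkkaRpcActor` code; the class, method, and message names below are made up purely to illustrate "reject messages once terminating":

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch (not Flink's implementation) of the FLINK-15693 proposal:
// once the endpoint starts terminating, incoming RPC messages are rejected
// instead of being applied to endpoint state.
class TerminatingEndpoint {
    enum State { RUNNING, TERMINATING, TERMINATED }

    private State state = State.RUNNING;
    private final List<String> processed = new ArrayList<>();

    // Returns true if the message was accepted and processed.
    synchronized boolean handleRpc(String message) {
        if (state != State.RUNNING) {
            // In TERMINATING/TERMINATED we no longer touch endpoint state,
            // so handlers need not guard against a half-closed endpoint.
            return false;
        }
        processed.add(message);
        return true;
    }

    synchronized void closeAsync() {
        if (state == State.RUNNING) {
            state = State.TERMINATING; // asynchronous clean-up would continue here
        }
    }

    synchronized List<String> processedMessages() {
        return new ArrayList<>(processed);
    }
}
```

Till's question about self-messages remains the tricky part: if `runAsync`-style self-sends go through the same rejection gate, the endpoint's own clean-up actions would be dropped too, so a real implementation would need to distinguish self-messages from external ones.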
[jira] [Commented] (FLINK-17073) Slow checkpoint cleanup causing OOMs
[ https://issues.apache.org/jira/browse/FLINK-17073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17144715#comment-17144715 ]

Till Rohrmann commented on FLINK-17073:
---

Yes, your help is highly appreciated. Given that this is not a trivial problem, I would suggest syncing with [~pnowojski] and [~SleePy] about the next steps.

> Slow checkpoint cleanup causing OOMs
> ---
>
> Key: FLINK-17073
> URL: https://issues.apache.org/jira/browse/FLINK-17073
[jira] [Closed] (FLINK-18168) Error results when use UDAF with Object Array return type
[ https://issues.apache.org/jira/browse/FLINK-18168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timo Walther closed FLINK-18168.
---
Fix Version/s: 1.10.2, 1.11.1, 1.12.0
Resolution: Fixed

Fixed in 1.12.0: 581fabe7df12961d20753dff8947dec07cbb2c56
Fixed in 1.11.1: f5ac8c352b7fb5aff5a78cffa348f72bd8492509
Fixed in 1.10.2: 23ad7e33117b7c02b28fda77596b12668b5117c1

> Error results when use UDAF with Object Array return type
> ---
>
> Key: FLINK-18168
> URL: https://issues.apache.org/jira/browse/FLINK-18168
> Project: Flink
> Issue Type: Bug
> Components: Table SQL / Runtime
> Reporter: Zou
> Assignee: Zou
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.10.2, 1.12.0, 1.11.1
>
> I get incorrect results when I use a UDAF with an Object Array return type (e.g. Row[]). The problem is that we reuse 'reuseArray' as the return value of ObjectArrayConverter.toBinaryArray(). This leads to 'prevAggValue' and 'newAggValue' in GroupAggFunction.processElement() containing exactly the same BinaryArray, so 'equaliser.equalsWithoutHeader(prevAggValue, newAggValue)' is always true.
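The bug described above is the classic object-reuse pitfall. The sketch below is not Flink's `ObjectArrayConverter` — it is a hypothetical converter with a shared output array, using plain `int[]` instead of `BinaryArray` — but it shows why "previous" and "new" aggregate values end up comparing equal every time:

```java
import java.util.Arrays;

// Hypothetical converter illustrating the FLINK-18168 pitfall: reusing one
// output buffer means every conversion returns the same object, so the
// "did the aggregate change?" comparison always answers no.
class ReusingConverter {
    private final int[] reuseArray;

    ReusingConverter(int size) {
        this.reuseArray = new int[size];
    }

    // Buggy: writes into and returns the shared array.
    int[] toBinaryArray(int[] source) {
        System.arraycopy(source, 0, reuseArray, 0, source.length);
        return reuseArray;
    }

    // Fixed: each call produces a fresh array.
    int[] toBinaryArrayCopy(int[] source) {
        return Arrays.copyOf(source, source.length);
    }

    public static void main(String[] args) {
        ReusingConverter c = new ReusingConverter(2);
        int[] prev = c.toBinaryArray(new int[] {1, 2});
        int[] next = c.toBinaryArray(new int[] {3, 4});
        // prev and next alias the same array: the second conversion
        // overwrote the first result, so equality is trivially true.
        System.out.println(prev == next);               // true: same object
        System.out.println(Arrays.equals(prev, next));  // true, wrongly

        int[] prevFixed = c.toBinaryArrayCopy(new int[] {1, 2});
        int[] nextFixed = c.toBinaryArrayCopy(new int[] {3, 4});
        System.out.println(Arrays.equals(prevFixed, nextFixed)); // false, as expected
    }
}
```

The fix referenced in the commits above follows the same idea as `toBinaryArrayCopy`: make sure the compared values are independent objects rather than views of one shared buffer.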