[jira] [Created] (FLINK-21505) Enforce common savepoint format at the operator level
Aljoscha Krettek created FLINK-21505:

Summary: Enforce common savepoint format at the operator level
Key: FLINK-21505
URL: https://issues.apache.org/jira/browse/FLINK-21505
Project: Flink
Issue Type: Sub-task
Components: Runtime / Checkpointing
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek

Currently, we are relying on the fact that all keyed backends use the same strategy for savepoints. We should enforce this at the API level to ensure that all existing and future state backends will create savepoints in the same format.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-21151) Extract common full-snapshot writer from RocksDB full-snapshot strategy
Aljoscha Krettek created FLINK-21151:

Summary: Extract common full-snapshot writer from RocksDB full-snapshot strategy
Key: FLINK-21151
URL: https://issues.apache.org/jira/browse/FLINK-21151
Project: Flink
Issue Type: Sub-task
Components: Runtime / State Backends
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek

As described in FLIP-41, the RocksDB full-snapshot format will serve as the common, unified savepoint format. We need to extract the common parts and make them reusable by other state backends.
[jira] [Created] (FLINK-20843) UnalignedCheckpointITCase is unstable
Aljoscha Krettek created FLINK-20843:

Summary: UnalignedCheckpointITCase is unstable
Key: FLINK-20843
URL: https://issues.apache.org/jira/browse/FLINK-20843
Project: Flink
Issue Type: Bug
Components: Runtime / Network
Reporter: Aljoscha Krettek

https://dev.azure.com/aljoschakrettek/Flink/_build/results?buildId=493=logs=6e55a443-5252-5db5-c632-109baf464772=9df6efca-61d0-513a-97ad-edb76d85786a=9432
[jira] [Created] (FLINK-20651) Use Spotless/google-java-format for code formatting/enforcement
Aljoscha Krettek created FLINK-20651:

Summary: Use Spotless/google-java-format for code formatting/enforcement
Key: FLINK-20651
URL: https://issues.apache.org/jira/browse/FLINK-20651
Project: Flink
Issue Type: Improvement
Components: Build System
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek

Please see the ML discussion for background: https://lists.apache.org/thread.html/rfb079ec4cfd35bcb93df9c2163aaa121e392282f0f3d9710c8ade811%40%3Cdev.flink.apache.org%3E. There was broad consensus in the discussion.
[jira] [Created] (FLINK-20491) Support Broadcast State in BATCH execution mode
Aljoscha Krettek created FLINK-20491:

Summary: Support Broadcast State in BATCH execution mode
Key: FLINK-20491
URL: https://issues.apache.org/jira/browse/FLINK-20491
Project: Flink
Issue Type: Improvement
Components: API / DataStream
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek

Right now, we don't support {{DataStream.connect(BroadcastStream)}} in {{BATCH}} execution mode. I believe we can add support for this without too much work. The key insight is that we can process the broadcast side before the non-broadcast side. Initially, we shied away from this because of concerns about {{ctx.applyToKeyedState()}}, which allows the broadcast side of the user function to access/iterate over state from the keyed side; we thought we couldn't support this. However, since we know that the broadcast side is processed first, the keyed side will always be empty at that point. We can thus make this "keyed iteration" call a no-op, instead of throwing an exception as we do now.
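The insight above can be sketched in plain Java. This is a hypothetical toy model, not Flink's actual operator classes: because the broadcast side is fully drained before the first keyed element arrives in BATCH mode, iterating keyed state from the broadcast side is guaranteed to see empty state and can safely be a no-op.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Toy sketch of the BATCH-mode processing order; names are illustrative.
public class BroadcastFirstSketch {
    private final Map<String, Integer> keyedState = new HashMap<>();
    private final List<String> broadcastRules = new ArrayList<>();

    // Broadcast elements are processed first in BATCH mode.
    public void processBroadcastElement(String rule) {
        broadcastRules.add(rule);
        // Corresponds to ctx.applyToKeyedState(...): keyed state is still
        // empty here, so the iteration visits nothing instead of throwing.
        applyToKeyedState((k, v) -> { throw new AssertionError("keyed state must be empty"); });
    }

    // Keyed elements are processed only after the broadcast side is exhausted.
    public void processElement(String key) {
        keyedState.merge(key, 1, Integer::sum);
    }

    public void applyToKeyedState(BiConsumer<String, Integer> action) {
        keyedState.forEach(action);
    }

    public int countFor(String key) {
        return keyedState.getOrDefault(key, 0);
    }

    public int ruleCount() {
        return broadcastRules.size();
    }
}
```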
[jira] [Created] (FLINK-20302) Suggest DataStream API with BATCH execution mode in DataSet docs
Aljoscha Krettek created FLINK-20302:

Summary: Suggest DataStream API with BATCH execution mode in DataSet docs
Key: FLINK-20302
URL: https://issues.apache.org/jira/browse/FLINK-20302
Project: Flink
Issue Type: Sub-task
Components: API / DataSet, API / DataStream, Documentation
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek
[jira] [Created] (FLINK-20153) Add documentation for BATCH execution mode
Aljoscha Krettek created FLINK-20153:

Summary: Add documentation for BATCH execution mode
Key: FLINK-20153
URL: https://issues.apache.org/jira/browse/FLINK-20153
Project: Flink
Issue Type: Sub-task
Reporter: Aljoscha Krettek
[jira] [Created] (FLINK-20098) Don't add flink-connector-files to flink-dist
Aljoscha Krettek created FLINK-20098:

Summary: Don't add flink-connector-files to flink-dist
Key: FLINK-20098
URL: https://issues.apache.org/jira/browse/FLINK-20098
Project: Flink
Issue Type: Bug
Components: API / DataStream
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek
Fix For: 1.12.0

We currently add both {{flink-connector-files}} and {{flink-connector-base}} to {{flink-dist}}. This implies that users should use the dependency like this:

{code}
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-files</artifactId>
    <version>${project.version}</version>
    <scope>provided</scope>
</dependency>
{code}

which differs from other connectors, where users don't need to specify {{provided}}. Also, {{flink-connector-files}} has {{flink-connector-base}} as a provided dependency, which means that examples that use this dependency will not run out of the box in IntelliJ because transitive provided dependencies will not be considered.

I propose to just remove the dependencies from {{flink-dist}} and let users use the File Connector like any other connector. I believe the initial motivation for "providing" the File Connector in {{flink-dist}} was to allow us to use the File Connector under the hood in methods such as {{StreamExecutionEnvironment.readFile(...)}}. We could decide to deprecate and remove those methods or re-add the File Connector as an explicit (non-provided) dependency again in the future.
[jira] [Created] (FLINK-20001) Don't use setAllVerticesInSameSlotSharingGroupByDefault in StreamGraphGenerator
Aljoscha Krettek created FLINK-20001:

Summary: Don't use setAllVerticesInSameSlotSharingGroupByDefault in StreamGraphGenerator
Key: FLINK-20001
URL: https://issues.apache.org/jira/browse/FLINK-20001
Project: Flink
Issue Type: Sub-task
Components: API / DataStream
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek

I think the default of having all vertices in the same slot sharing group should be good for both {{BATCH}} and {{STREAMING}} right now. We can reconsider actually setting this flag in the future.
[jira] [Created] (FLINK-19932) Add integration test for BATCH execution on DataStream API
Aljoscha Krettek created FLINK-19932:

Summary: Add integration test for BATCH execution on DataStream API
Key: FLINK-19932
URL: https://issues.apache.org/jira/browse/FLINK-19932
Project: Flink
Issue Type: Sub-task
Reporter: Aljoscha Krettek
[jira] [Created] (FLINK-19837) Don't emit intermediate watermarks in watermark operators in BATCH execution mode
Aljoscha Krettek created FLINK-19837:

Summary: Don't emit intermediate watermarks in watermark operators in BATCH execution mode
Key: FLINK-19837
URL: https://issues.apache.org/jira/browse/FLINK-19837
Project: Flink
Issue Type: Sub-task
Reporter: Aljoscha Krettek

Currently, both sources and watermark/timestamp operators can emit watermarks that we don't really need. We only need a final watermark in BATCH execution mode.
[jira] [Created] (FLINK-19835) Don't emit intermediate watermarks in BATCH execution mode
Aljoscha Krettek created FLINK-19835:

Summary: Don't emit intermediate watermarks in BATCH execution mode
Key: FLINK-19835
URL: https://issues.apache.org/jira/browse/FLINK-19835
Project: Flink
Issue Type: Sub-task
Reporter: Aljoscha Krettek

Currently, both sources and watermark/timestamp operators can emit watermarks that we don't really need. We only need a final watermark in BATCH execution mode.
[jira] [Created] (FLINK-19833) Rename Sink API Writer interface to SinkWriter
Aljoscha Krettek created FLINK-19833:

Summary: Rename Sink API Writer interface to SinkWriter
Key: FLINK-19833
URL: https://issues.apache.org/jira/browse/FLINK-19833
Project: Flink
Issue Type: Sub-task
Components: API / DataStream
Reporter: Aljoscha Krettek

This makes it more consistent with {{SourceReader}}.
[jira] [Created] (FLINK-19671) Update EditorConfig file to be useful
Aljoscha Krettek created FLINK-19671:

Summary: Update EditorConfig file to be useful
Key: FLINK-19671
URL: https://issues.apache.org/jira/browse/FLINK-19671
Project: Flink
Issue Type: Improvement
Reporter: Aljoscha Krettek

We should update our {{.editorconfig}} file to format Java code according to a style that passes our checkstyle rules and also applies our import ordering. This will greatly simplify development because developers can just "re-format code" in IntelliJ and be done with it. See the ML discussion for more background: https://lists.apache.org/thread.html/r481bb622410718cd454d251b194c7c1ad358e5d861fdacc9a4be3b5b%40%3Cdev.flink.apache.org%3E
[jira] [Created] (FLINK-19521) Support dynamic properties on DefaultCLI
Aljoscha Krettek created FLINK-19521:

Summary: Support dynamic properties on DefaultCLI
Key: FLINK-19521
URL: https://issues.apache.org/jira/browse/FLINK-19521
Project: Flink
Issue Type: Improvement
Components: Command Line Client
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek

Currently, only the {{GenericCLI}} and the YARN CLIs allow specifying arbitrary configuration options using the "-Dfoo=bar" syntax. We should also add this for {{DefaultCLI}} to make the experience more consistent.
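The "-Dfoo=bar" syntax boils down to collecting key/value pairs from the argument list into the configuration. A minimal sketch of that parsing, with hypothetical names (this is not Flink's actual {{DefaultCLI}} code):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of parsing "-Dkey=value" dynamic properties from
// command-line arguments; class and method names are hypothetical.
public class DynamicPropertiesSketch {
    public static Map<String, String> parse(String[] args) {
        Map<String, String> props = new HashMap<>();
        for (String arg : args) {
            if (arg.startsWith("-D")) {
                String kv = arg.substring(2);
                int eq = kv.indexOf('=');
                if (eq > 0) { // require a non-empty key before '='
                    props.put(kv.substring(0, eq), kv.substring(eq + 1));
                }
            }
        }
        return props;
    }
}
```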
[jira] [Created] (FLINK-19508) Add collect() operation on DataStream
Aljoscha Krettek created FLINK-19508:

Summary: Add collect() operation on DataStream
Key: FLINK-19508
URL: https://issues.apache.org/jira/browse/FLINK-19508
Project: Flink
Issue Type: Improvement
Components: API / DataStream
Reporter: Aljoscha Krettek

With the recent changes/additions to {{DataStreamUtils.collect()}}, which make it more robust by using the regular REST client to fetch results from operators, it might make sense to add a {{collect()}} operation right on {{DataStream}}. This operation is still not meant for big data volumes, but I think it can be useful for debugging and fetching small amounts of data to the client.

When we do this, we can also think about changing {{print()}} to print on the client instead of to the {{TaskManager}} stdout. I think the current behaviour of this operation is mostly confusing for users.
[jira] [Created] (FLINK-19493) In CliFrontend, make flow of Configuration more obvious
Aljoscha Krettek created FLINK-19493:

Summary: In CliFrontend, make flow of Configuration more obvious
Key: FLINK-19493
URL: https://issues.apache.org/jira/browse/FLINK-19493
Project: Flink
Issue Type: Improvement
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek

It's very important to ensure that the {{Configuration}} the {{CliFrontend}} loads ends up in the {{*ContextEnvironment}} and that its settings are reflected there. Currently, there are no tests for this behaviour and it is hard to convince yourself that the code is actually doing the right thing. We should simplify the flow of the {{Configuration}} from loading to the environment and add tests that verify this behaviour.

Currently, the flow is roughly this:
- the {{main()}} method loads the {{Configuration}} (from {{flink-conf.yaml}})
- the {{Configuration}} is passed to the {{CustomCommandLines}} in {{loadCustomCommandLines()}}
- {{main()}} passes both the {{Configuration}} and the {{CustomCommandLines}} to the constructor of {{CliFrontend}}
- when we need a {{Configuration}} for execution, {{getEffectiveConfiguration()}} is called. This doesn't take the {{Configuration}} of the {{CliFrontend}} as a basis but instead calls {{CustomCommandLine.applyCommandLineOptionsToConfiguration()}}

It's up to the {{CustomCommandLine.applyCommandLineOptionsToConfiguration()}} implementation to either forward the {{Configuration}} that was given to it by the {{CliFrontend.main()}} method or return some other {{Configuration}}. Only if the correct {{Configuration}} is returned can we ensure that user settings make it all the way through.

I'm proposing to change {{CustomCommandLine.applyCommandLineOptionsToConfiguration()}} to instead apply its settings to a {{Configuration}} that is passed in as a parameter. Then {{getEffectiveConfiguration()}} can pass in the {{Configuration}} it got from the {{main()}} method as a basis, and the flow is easy to verify.
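The proposed change can be illustrated with a toy model, where a plain {{Map}} stands in for Flink's {{Configuration}} and the method names are only illustrative: by applying command-line options onto the configuration that is passed in, settings loaded from {{flink-conf.yaml}} provably survive unless explicitly overridden.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the proposed Configuration flow; a Map stands in for Flink's
// Configuration and the names are illustrative, not the actual API.
public class ConfigFlowSketch {
    // Proposed shape: apply options onto the passed-in configuration
    // instead of possibly returning a fresh, unrelated one.
    public static void applyCommandLineOptions(Map<String, String> config, Map<String, String> cliOptions) {
        config.putAll(cliOptions); // CLI options override; base settings survive
    }

    // getEffectiveConfiguration() uses the loaded configuration as the basis.
    public static Map<String, String> effectiveConfiguration(Map<String, String> loaded, Map<String, String> cliOptions) {
        Map<String, String> effective = new HashMap<>(loaded);
        applyCommandLineOptions(effective, cliOptions);
        return effective;
    }
}
```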
[jira] [Created] (FLINK-19479) Allow explicitly configuring time behaviour on KeyedStream.intervalJoin()
Aljoscha Krettek created FLINK-19479:

Summary: Allow explicitly configuring time behaviour on KeyedStream.intervalJoin()
Key: FLINK-19479
URL: https://issues.apache.org/jira/browse/FLINK-19479
Project: Flink
Issue Type: Sub-task
Components: API / DataStream
Reporter: Aljoscha Krettek

With the deprecation of {{StreamExecutionEnvironment.setStreamTimeCharacteristic()}} in FLINK-19319 we need a way of explicitly configuring the time behaviour of these join operations. Currently, all join operations use the characteristic to configure themselves. Alternatively, we might consider removing/deprecating these join operations.
[jira] [Created] (FLINK-19377) Task can swallow test exceptions which hides test failures
Aljoscha Krettek created FLINK-19377:

Summary: Task can swallow test exceptions which hides test failures
Key: FLINK-19377
URL: https://issues.apache.org/jira/browse/FLINK-19377
Project: Flink
Issue Type: Bug
Components: Runtime / State Backends, Runtime / Task, Tests
Reporter: Aljoscha Krettek

I noticed this while debugging {{EventTimeWindowCheckpointingITCase}}. If you change {{testSlidingWindows()}} to use a tumbling window instead of the sliding window, the test should fail. Interestingly, the test only fails when using the RocksDB backend but not for the heap-based backends.

You can follow the flow by setting a breakpoint on the {{AssertionError}} being thrown in {{ValidatingSink}}. Somewhere in the Task innards it will be suppressed/swallowed and the test will succeed.
[jira] [Created] (FLINK-19326) Allow explicitly configuring time behaviour on CEP PatternStream
Aljoscha Krettek created FLINK-19326:

Summary: Allow explicitly configuring time behaviour on CEP PatternStream
Key: FLINK-19326
URL: https://issues.apache.org/jira/browse/FLINK-19326
Project: Flink
Issue Type: Sub-task
Components: Library / CEP
Reporter: Aljoscha Krettek

With the deprecation of {{StreamExecutionEnvironment.setStreamTimeCharacteristic()}} in FLINK-19319 we need a way of explicitly configuring the time behaviour of CEP operations. Currently, all CEP operations use the characteristic to configure themselves.
[jira] [Created] (FLINK-19319) Deprecate StreamExecutionEnvironment.setStreamTimeCharacteristic()
Aljoscha Krettek created FLINK-19319:

Summary: Deprecate StreamExecutionEnvironment.setStreamTimeCharacteristic()
Key: FLINK-19319
URL: https://issues.apache.org/jira/browse/FLINK-19319
Project: Flink
Issue Type: Sub-task
Components: API / DataStream
Reporter: Aljoscha Krettek

After FLINK-19317 and FLINK-19318 we don't need this setting anymore. Using (explicit) processing-time windows and processing-time timers works fine in a program that has {{EventTime}} set as the time characteristic, and once we deprecate {{timeWindow()}} there are no other operations that change behaviour depending on the time characteristic, so there's no need to ever change from the new default of event-time. Similarly, the {{IngestionTime}} setting can be achieved in the future by providing an ingestion-time {{WatermarkStrategy}}.
[jira] [Created] (FLINK-19318) Deprecate timeWindow() operations in DataStream API
Aljoscha Krettek created FLINK-19318:

Summary: Deprecate timeWindow() operations in DataStream API
Key: FLINK-19318
URL: https://issues.apache.org/jira/browse/FLINK-19318
Project: Flink
Issue Type: Sub-task
Components: API / DataStream
Reporter: Aljoscha Krettek

Excerpt from FLIP-134:
{quote}
As described above, we think timeWindow() is not a useful operation and therefore propose to deprecate and eventually remove it. The operation can have surprising behaviour and users should use explicit process-time or event-time operations.
{quote}
[jira] [Created] (FLINK-19317) Make EventTime the default StreamTimeCharacteristic
Aljoscha Krettek created FLINK-19317:

Summary: Make EventTime the default StreamTimeCharacteristic
Key: FLINK-19317
URL: https://issues.apache.org/jira/browse/FLINK-19317
Project: Flink
Issue Type: Sub-task
Components: API / DataStream
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek

Excerpt from FLIP-134:
{quote}
As described above, event time is the only sensible time characteristic for batch. We therefore propose to change the default value of the StreamTimeCharacteristic from ProcessingTime to EventTime. This means the DataStream API programs that were using event time before now just work without manually changing this setting. Processing-time programs will also still work, because using processing-time timers is not dependent on the StreamTimeCharacteristic. DataStream programs that don't set a TimestampAssigner or WatermarkStrategy will also still work if they don't use operations that rely on (event-time) timestamps. This is true for both BATCH and STREAMING execution mode.

The only real user-visible change of this is that programs that used the KeyedStream.timeWindow()/DataStream.timeWindow() operation, which is dependent on the StreamTimeCharacteristic, will now use event time by default. We don't think this operation is useful because the behaviour can be surprising. We recommend users always use an explicit processing-time window or event-time window.
{quote}
[jira] [Created] (FLINK-19316) FLIP-134: Batch execution for the DataStream API
Aljoscha Krettek created FLINK-19316:

Summary: FLIP-134: Batch execution for the DataStream API
Key: FLINK-19316
URL: https://issues.apache.org/jira/browse/FLINK-19316
Project: Flink
Issue Type: New Feature
Components: API / DataStream
Reporter: Aljoscha Krettek

Umbrella issue for [FLIP-134|https://cwiki.apache.org/confluence/x/4i94CQ]
[jira] [Created] (FLINK-19264) MiniCluster is flaky with concurrent job execution
Aljoscha Krettek created FLINK-19264:

Summary: MiniCluster is flaky with concurrent job execution
Key: FLINK-19264
URL: https://issues.apache.org/jira/browse/FLINK-19264
Project: Flink
Issue Type: Bug
Reporter: Aljoscha Krettek

While working on FLINK-19123 I noticed that jobs often fail on {{MiniCluster}} when multiple jobs are running at the same time. I created a reproducer here: https://github.com/aljoscha/flink/tree/flink-19123-fix-test-stream-env-alternative. You can run {{MiniClusterConcurrencyITCase}} to see the problem in action. Sometimes the test will succeed, sometimes it will fail with

{code}
java.util.concurrent.ExecutionException: org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
	at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
	at org.apache.flink.test.example.MiniClusterConcurrencyITCase.submitConcurrently(MiniClusterConcurrencyITCase.java:60)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
	at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:69)
	at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33)
	at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:220)
	at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:53)
Caused by: org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
	at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:147)
	at org.apache.flink.runtime.minicluster.MiniClusterJobClient.lambda$getJobExecutionResult$2(MiniClusterJobClient.java:107)
	at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
	at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
	at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
	at org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$0(AkkaInvocationHandler.java:229)
	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
	at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
	at org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:926)
	at akka.dispatch.OnComplete.internal(Future.scala:264)
	at akka.dispatch.OnComplete.internal(Future.scala:261)
	at akka.dispatch.japi$CallbackBridge.apply(Future.scala:191)
	at akka.dispatch.japi$CallbackBridge.apply(Future.scala:188)
	at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
	at org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:74)
	at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
	at
{code}
[jira] [Created] (FLINK-19247) Update Chinese documentation after removal of Kafka 0.10 and 0.11
Aljoscha Krettek created FLINK-19247:

Summary: Update Chinese documentation after removal of Kafka 0.10 and 0.11
Key: FLINK-19247
URL: https://issues.apache.org/jira/browse/FLINK-19247
Project: Flink
Issue Type: Improvement
Components: Connectors / Kafka, Documentation
Reporter: Aljoscha Krettek

In FLINK-19152 I removed Kafka 0.10 and 0.11. Most of the Chinese documentation was updated for this, but some parts were too hard to fix for me as a non-Chinese speaker. Take a look at the PR for FLINK-19152 to see what changes still need to be applied to the Chinese documentation.
[jira] [Created] (FLINK-19193) Upgrade migration guidelines to use stop-with-savepoint
Aljoscha Krettek created FLINK-19193:

Summary: Upgrade migration guidelines to use stop-with-savepoint
Key: FLINK-19193
URL: https://issues.apache.org/jira/browse/FLINK-19193
Project: Flink
Issue Type: Improvement
Components: Documentation
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek

This is about step one in the documentation here: https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/upgrading.html#step-1-take-a-savepoint-in-the-old-flink-version

We currently advise users to take a savepoint, without telling them to stop or cancel the job afterwards. We should update this to suggest stopping the job with a savepoint in step one.
[jira] [Created] (FLINK-19155) ResultPartitionTest is unstable
Aljoscha Krettek created FLINK-19155:

Summary: ResultPartitionTest is unstable
Key: FLINK-19155
URL: https://issues.apache.org/jira/browse/FLINK-19155
Project: Flink
Issue Type: New Feature
Components: Runtime / Network
Reporter: Aljoscha Krettek

I saw a failure in {{testInitializeMoreStateThanBuffer()}}: https://dev.azure.com/aljoschakrettek/Flink/_build/results?buildId=274=logs=6e58d712-c5cc-52fb-0895-6ff7bd56c46b=f30a8e80-b2cf-535c-9952-7f521a4ae374=5995
[jira] [Created] (FLINK-19153) FLIP-131: Consolidate the user-facing Dataflow SDKs/APIs (and deprecate the DataSet API)
Aljoscha Krettek created FLINK-19153:

Summary: FLIP-131: Consolidate the user-facing Dataflow SDKs/APIs (and deprecate the DataSet API)
Key: FLINK-19153
URL: https://issues.apache.org/jira/browse/FLINK-19153
Project: Flink
Issue Type: New Feature
Components: API / DataSet, API / DataStream
Reporter: Aljoscha Krettek

Umbrella issue for FLIP-131: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158866741
[jira] [Created] (FLINK-19152) Remove Kafka 0.10.x and 0.11.x connectors
Aljoscha Krettek created FLINK-19152:

Summary: Remove Kafka 0.10.x and 0.11.x connectors
Key: FLINK-19152
URL: https://issues.apache.org/jira/browse/FLINK-19152
Project: Flink
Issue Type: Improvement
Components: Connectors / Kafka
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek

Please see the ML thread for details: https://lists.apache.org/thread.html/r6d5c44e88d2d0fa187df6093694d6ad40c08561492c55b826b7231e7%40%3Cdev.flink.apache.org%3E

In short, 0.10.x and 0.11.x are very old and you can use the "modern" Kafka connector to connect to older brokers/clusters.
[jira] [Created] (FLINK-19135) (Stream)ExecutionEnvironment.execute() should not throw ExecutionException
Aljoscha Krettek created FLINK-19135:

Summary: (Stream)ExecutionEnvironment.execute() should not throw ExecutionException
Key: FLINK-19135
URL: https://issues.apache.org/jira/browse/FLINK-19135
Project: Flink
Issue Type: Improvement
Components: API / DataSet, API / DataStream
Reporter: Aljoscha Krettek

In FLINK-14850 we changed the {{execute()}} method to be basically

{code}
final JobClient jobClient = executeAsync(...);
return jobClient.getJobExecutionResult(userClassloader).get();
{code}

Unfortunately, this means that {{execute()}} now throws an {{ExecutionException}} instead of a {{ProgramInvocationException}} or {{JobExecutionException}} as before. The {{ExecutionException}} is wrapping the other exceptions that we were throwing before.

We didn't notice this in tests because most tests use {{Test(Stream)Environment}}, which overrides the {{execute()}} method and so doesn't go through the {{PipelineExecutor}} logic or the normal code path of delegating to {{executeAsync()}}. We should fix this to go back to the old behaviour.
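The wrapping behaviour described above is simply how {{Future.get()}} works, and can be reproduced without Flink at all. A minimal sketch (the unwrapping helper is hypothetical, not Flink's actual fix) showing that the original exception is recoverable from the wrapper's cause:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

// Flink-free illustration: Future.get() wraps any failure in
// ExecutionException, so an execute() built on executeAsync(...).get()
// surfaces a different exception type than before. Unwrapping the cause
// (sketched here) recovers the original exception.
public class UnwrapSketch {
    public static Throwable failureOf(CompletableFuture<?> result) {
        try {
            result.get();
            return null; // the future completed normally
        } catch (ExecutionException e) {
            return e.getCause(); // the original cause, not the wrapper
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return e;
        }
    }
}
```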
[jira] [Created] (FLINK-19123) TestStreamEnvironment does not use shared MiniCluster for executeAsync()
Aljoscha Krettek created FLINK-19123:

Summary: TestStreamEnvironment does not use shared MiniCluster for executeAsync()
Key: FLINK-19123
URL: https://issues.apache.org/jira/browse/FLINK-19123
Project: Flink
Issue Type: Bug
Components: API / DataStream, Runtime / Coordination, Tests
Reporter: Aljoscha Krettek

TestStreamEnvironment does override {{execute()}} but not {{executeAsync()}}. Now, {{execute()}} goes against the {{MiniCluster}} session that was started by a {{MiniClusterWithClientResource}} or some other method that uses {{TestStreamEnvironment}}. However, {{executeAsync()}} will go through the normal {{StreamExecutionEnvironment}} logic, which tries to find an executor and does not know that it is a testing environment.

Up until recently, you would have gotten an exception telling you that no executor is configured, and then we would have found out that we need to override {{executeAsync()}} in {{TestStreamEnvironment}}. However, we currently configure a local executor in the constructor: https://github.com/apache/flink/blob/2160c3294ef87143ab9a4e8138cb618651499792/flink-test-utils-parent/flink-test-utils/src/main/java/org/apache/flink/streaming/util/TestStreamEnvironment.java#L59. With this, you basically get the "local environment" behaviour when you call {{executeAsync()}}, which starts a cluster for the job and shuts it down when the job finishes. This basically makes the {{TestStreamEnvironment}} cluster sharing useless.
[jira] [Created] (FLINK-18693) AvroSerializationSchema does not work with types generated by avrohugger
Aljoscha Krettek created FLINK-18693:

Summary: AvroSerializationSchema does not work with types generated by avrohugger
Key: FLINK-18693
URL: https://issues.apache.org/jira/browse/FLINK-18693
Project: Flink
Issue Type: Bug
Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek
Fix For: 1.10.2, 1.12.0, 1.11.1

The main problem is that the code in {{SpecificData.createSchema()}} tries to reflectively read the {{SCHEMA$}} field that is normally present in Avro-generated classes. However, avrohugger generates this field in a companion object, which the reflective Java code will therefore not find.

This is also described in these ML threads:
* https://lists.apache.org/thread.html/5db58c7d15e4e9aaa515f935be3b342fe036e97d32e1fb0f0d1797ee@%3Cuser.flink.apache.org%3E
* https://lists.apache.org/thread.html/cf1c5b8fa7f095739438807de9f2497e04ffe55237c5dea83355112d@%3Cuser.flink.apache.org%3E
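The failure mode can be reproduced without Avro at all. In this sketch, a separate holder class stands in for the Scala companion object's class (all names are illustrative; {{lookupSchema}} only approximates what {{SpecificData.createSchema()}} does reflectively):

```java
// Avro-free illustration of the bug: the reflective SCHEMA$ lookup works for
// a Java-style generated class, but for an avrohugger-generated Scala case
// class the field lives on the companion object (modeled here by a separate
// holder class), so getField on the value class itself fails.
public class SchemaLookupSketch {
    public static class JavaStyleRecord {
        public static final String SCHEMA$ = "{\"type\":\"record\",\"name\":\"JavaStyleRecord\"}";
    }

    public static class ScalaStyleRecord {
        // no SCHEMA$ here -- avrohugger puts it on the companion object
    }

    public static class ScalaStyleRecord$ { // stand-in for the companion object's class
        public static final String SCHEMA$ = "{\"type\":\"record\",\"name\":\"ScalaStyleRecord\"}";
    }

    // Roughly the reflective lookup that SpecificData.createSchema() attempts.
    public static String lookupSchema(Class<?> clazz) {
        try {
            return (String) clazz.getField("SCHEMA$").get(null);
        } catch (ReflectiveOperationException e) {
            return null; // the branch avrohugger-generated types end up in
        }
    }
}
```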
[jira] [Created] (FLINK-18692) AvroSerializationSchema does not work with types generated by avrohugger
Aljoscha Krettek created FLINK-18692:

Summary: AvroSerializationSchema does not work with types generated by avrohugger
Key: FLINK-18692
URL: https://issues.apache.org/jira/browse/FLINK-18692
Project: Flink
Issue Type: Bug
Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek
Fix For: 1.10.2, 1.12.0, 1.11.1

The main problem is that the code in {{SpecificData.createSchema()}} tries to reflectively read the {{SCHEMA$}} field that is normally present in Avro-generated classes. However, avrohugger generates this field in a companion object, which the reflective Java code will therefore not find.

This is also described in these ML threads:
* https://lists.apache.org/thread.html/5db58c7d15e4e9aaa515f935be3b342fe036e97d32e1fb0f0d1797ee@%3Cuser.flink.apache.org%3E
* https://lists.apache.org/thread.html/cf1c5b8fa7f095739438807de9f2497e04ffe55237c5dea83355112d@%3Cuser.flink.apache.org%3E
[jira] [Created] (FLINK-18569) Add Table.limit() which is the equivalent of SQL LIMIT
Aljoscha Krettek created FLINK-18569: Summary: Add Table.limit() which is the equivalent of SQL LIMIT Key: FLINK-18569 URL: https://issues.apache.org/jira/browse/FLINK-18569 Project: Flink Issue Type: Bug Components: Table SQL / API, Table SQL / Planner, Table SQL / Runtime Reporter: Aljoscha Krettek Currently, you can run a SQL query like {{SELECT * FROM (VALUES 'Hello', 'CIAO', 'foo', 'bar') LIMIT 2;}} but you cannot run the equivalent in the Table API. -- This message was sent by Atlassian Jira (v8.3.4#803005)
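As a rough illustration of the requested API shape, here is a minimal mock (this is not Flink's {{Table}} class; the rows, method names, and {{collect()}} helper are invented) showing what a {{limit()}} mirroring SQL's {{LIMIT}} could do:

```java
import java.util.Arrays;
import java.util.List;

public class TableLimitSketch {
    // Toy stand-in for a Table: just a list of rows.
    static class Table {
        private final List<String> rows;
        Table(List<String> rows) { this.rows = rows; }

        // Keep at most n rows, like SQL's LIMIT n.
        Table limit(int n) {
            return new Table(rows.subList(0, Math.min(n, rows.size())));
        }

        List<String> collect() { return rows; }
    }

    public static void main(String[] args) {
        Table t = new Table(Arrays.asList("Hello", "CIAO", "foo", "bar"));
        System.out.println(t.limit(2).collect()); // [Hello, CIAO]
    }
}
```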
[jira] [Created] (FLINK-18478) AvroDeserializationSchema does not work with types generated by avrohugger
Aljoscha Krettek created FLINK-18478: Summary: AvroDeserializationSchema does not work with types generated by avrohugger Key: FLINK-18478 URL: https://issues.apache.org/jira/browse/FLINK-18478 Project: Flink Issue Type: Bug Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile) Reporter: Aljoscha Krettek Assignee: Aljoscha Krettek Fix For: 1.9.1, 1.10.0 The main problem is that the code in {{SpecificData.createSchema()}} tries to reflectively read the {{SCHEMA$}} field, which is normally present in Avro-generated classes. However, avrohugger generates this field in a companion object, which the reflective Java code will therefore not find. This is also described in these ML threads: * [https://lists.apache.org/thread.html/5db58c7d15e4e9aaa515f935be3b342fe036e97d32e1fb0f0d1797ee@%3Cuser.flink.apache.org%3E] * [https://lists.apache.org/thread.html/cf1c5b8fa7f095739438807de9f2497e04ffe55237c5dea83355112d@%3Cuser.flink.apache.org%3E] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-18381) Update Jekyll to 4.0.1
Aljoscha Krettek created FLINK-18381: Summary: Update Jekyll to 4.0.1 Key: FLINK-18381 URL: https://issues.apache.org/jira/browse/FLINK-18381 Project: Flink Issue Type: Bug Components: Documentation Reporter: Aljoscha Krettek Assignee: Aljoscha Krettek Building the docs with Ruby 2.7 and Jekyll 4.0.0 spits out a lot of warnings, see [https://github.com/jekyll/jekyll/issues/7947]. Updating to 4.0.1 fixes this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-18377) Rename "Flink Master" back to JobManager in documentation
Aljoscha Krettek created FLINK-18377: Summary: Rename "Flink Master" back to JobManager in documentation Key: FLINK-18377 URL: https://issues.apache.org/jira/browse/FLINK-18377 Project: Flink Issue Type: Task Components: Documentation Reporter: Aljoscha Krettek Assignee: Aljoscha Krettek From the ML thread (https://lists.apache.org/thread.html/red9d18f7173d53d016f6826410841b2726b9293ceefaea2de0cdeafb%40%3Cdev.flink.apache.org%3E): This came to my mind because of the master/slave discussion in FLINK-18209 and the larger discussions about inequality/civil rights happening right now in the world. I think for this reason alone we should use a name that does not include "master". We could rename it back to JobManager, which was the name mostly used before 2019. Since the beginning of Flink, TaskManager was the term used for the worker component/node and JobManager was the term used for the orchestrating component/node. Currently, our [glossary|http://example.com] defines these terms (paraphrased by me): - "Flink Master": the orchestrating component that consists of resource manager, dispatcher, and JobManager - JobManager: the component that manages a single job and runs as part of a "Flink Master" - TaskManager: the worker process Prior to the introduction of the glossary, the definition of JobManager would have been: - the orchestrating component that manages execution of jobs and schedules work on TaskManagers. Quite a few parts of the code and documentation/configuration options still use that older meaning of JobManager. Newer parts of the documentation use "Flink Master" instead. I'm proposing to go back to calling the orchestrating component JobManager, which would mean that we have to touch up the documentation to remove mentions of "Flink Master". I'm also proposing not to mention the internal components such as resource manager and dispatcher in the glossary because they are transparent to users.
I'm proposing to go back to JobManager instead of an alternative name also because switching to yet another name would mean many more changes to code/documentation/people's minds. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-18120) Don't expand documentation sections by default
Aljoscha Krettek created FLINK-18120: Summary: Don't expand documentation sections by default Key: FLINK-18120 URL: https://issues.apache.org/jira/browse/FLINK-18120 Project: Flink Issue Type: Task Components: Documentation Reporter: Aljoscha Krettek Assignee: Aljoscha Krettek This is basically a revert of FLINK-16041. The idea seemed good at the time, but with more sections being added it just looks too unwieldy when you first open the docs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-18036) Chinese documentation build is broken
Aljoscha Krettek created FLINK-18036: Summary: Chinese documentation build is broken Key: FLINK-18036 URL: https://issues.apache.org/jira/browse/FLINK-18036 Project: Flink Issue Type: Task Components: chinese-translation, Documentation Affects Versions: 1.11.0 Reporter: Aljoscha Krettek Fix For: 1.11.0 Log from one of the builders: https://ci.apache.org/builders/flink-docs-master/builds/1848/steps/Build%20docs/logs/stdio The problem is that the chinese doc uses {{{% link %}}} tags that refer to documents from the english documentation. It should be as easy as adding {{.zh}} in these links. It seems this change introduced the problem: https://github.com/apache/flink/commit/d40abbf0309f414a6acf8a090c448ba397a08d9c -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-18032) Remove outdated sections in migration guide
Aljoscha Krettek created FLINK-18032: Summary: Remove outdated sections in migration guide Key: FLINK-18032 URL: https://issues.apache.org/jira/browse/FLINK-18032 Project: Flink Issue Type: Task Components: Documentation Reporter: Aljoscha Krettek There is still a section in {{docs/dev/migration.md}} about migrating from 1.2 to 1.3, and we talk about the serializable list interfaces that will be removed as part of FLINK-17376. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-18011) Make WatermarkStrategy/WatermarkStrategies more ergonomic
Aljoscha Krettek created FLINK-18011: Summary: Make WatermarkStrategy/WatermarkStrategies more ergonomic Key: FLINK-18011 URL: https://issues.apache.org/jira/browse/FLINK-18011 Project: Flink Issue Type: Sub-task Components: API / Core, API / DataStream Reporter: Aljoscha Krettek Assignee: Aljoscha Krettek Fix For: 1.11.0 Currently, we have an interface {{WatermarkStrategy}}, which is a {{TimestampAssignerSupplier}} and {{WatermarkGeneratorSupplier}}. The very first design (which is also currently implemented) also added {{WatermarkStrategies}} as a convenience builder for a {{WatermarkStrategy}}. However, I don't think users will ever implement a {{WatermarkStrategy}} directly but will always go through the builder. I also think that {{WatermarkStrategy}} itself is already that builder, so we currently have two levels of builders, which also makes them harder to use in the DataStream API because of type-checking issues. I'm proposing to remove {{WatermarkStrategies}}, to put the static methods directly into {{WatermarkStrategy}}, and to remove the {{build()}} method. Instead of a {{build()}} method, API methods on {{WatermarkStrategy}} just keep "piling" features on top of a base {{WatermarkStrategy}} via wrapping. Example to show what I mean for the API (current): {code} DataStream input = ...; input.assignTimestampsAndWatermarks( WatermarkStrategies.forMonotonousTimestamps().build()); {code} with the proposed change: {code} DataStream input = ...; input.assignTimestampsAndWatermarks( WatermarkStrategy.forMonotonousTimestamps()); {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
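The "piling features via wrapping instead of a build() step" pattern can be sketched with a small self-contained mock (these are not Flink's actual classes; interface and method names are invented to mirror the proposal): the strategy interface itself carries the static factories, and each configuration method returns a new strategy that wraps the previous one, so no terminal {{build()}} is needed.

```java
import java.time.Duration;

// Illustrative mock of the proposed API shape.
interface MyWatermarkStrategy<T> {
    String describe();

    // Static factories live on the interface itself, no separate builder class.
    static <T> MyWatermarkStrategy<T> forMonotonousTimestamps() {
        return () -> "monotonous";
    }

    static <T> MyWatermarkStrategy<T> forBoundedOutOfOrderness(Duration d) {
        return () -> "bounded(" + d.toMillis() + "ms)";
    }

    // Each API method "piles" a feature on top by wrapping this strategy.
    default MyWatermarkStrategy<T> withIdleness(Duration idle) {
        MyWatermarkStrategy<T> base = this;
        return () -> base.describe() + "+idleness(" + idle.toMillis() + "ms)";
    }
}

public class WatermarkStrategySketch {
    public static void main(String[] args) {
        MyWatermarkStrategy<String> s =
                MyWatermarkStrategy.<String>forMonotonousTimestamps()
                        .withIdleness(Duration.ofSeconds(5));
        System.out.println(s.describe()); // monotonous+idleness(5000ms)
    }
}
```

Because every method returns a ready-to-use strategy, the result can be passed straight to an API method with no dangling builder state, which is the ergonomic win the issue is after.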
[jira] [Created] (FLINK-17956) Add Flink 1.11 MigrationVersion
Aljoscha Krettek created FLINK-17956: Summary: Add Flink 1.11 MigrationVersion Key: FLINK-17956 URL: https://issues.apache.org/jira/browse/FLINK-17956 Project: Flink Issue Type: Task Components: API / Core Reporter: Aljoscha Krettek Assignee: Aljoscha Krettek Fix For: 1.11.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-17886) Update documentation for new WatermarkGenerator/WatermarkStrategies
Aljoscha Krettek created FLINK-17886: Summary: Update documentation for new WatermarkGenerator/WatermarkStrategies Key: FLINK-17886 URL: https://issues.apache.org/jira/browse/FLINK-17886 Project: Flink Issue Type: Bug Components: chinese-translation, Documentation Reporter: Aljoscha Krettek Fix For: 1.11.0 We need to update the Chinese documentation according to FLINK-17773. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-17815) Change KafkaConnector to give per-partition metric group to WatermarkGenerator
Aljoscha Krettek created FLINK-17815: Summary: Change KafkaConnector to give per-partition metric group to WatermarkGenerator Key: FLINK-17815 URL: https://issues.apache.org/jira/browse/FLINK-17815 Project: Flink Issue Type: Sub-task Components: Connectors / Kafka Reporter: Aljoscha Krettek We currently give a reference to the general consumer {{MetricGroup}}, which means that all {{WatermarkGenerators}} would write to the same metric group. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-17773) Update documentation for new WatermarkGenerator/WatermarkStrategies
Aljoscha Krettek created FLINK-17773: Summary: Update documentation for new WatermarkGenerator/WatermarkStrategies Key: FLINK-17773 URL: https://issues.apache.org/jira/browse/FLINK-17773 Project: Flink Issue Type: Sub-task Components: API / DataStream Reporter: Aljoscha Krettek Assignee: Aljoscha Krettek -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-17766) Use checkpoint lock instead of fine-grained locking in Kafka AbstractFetcher
Aljoscha Krettek created FLINK-17766: Summary: Use checkpoint lock instead of fine-grained locking in Kafka AbstractFetcher Key: FLINK-17766 URL: https://issues.apache.org/jira/browse/FLINK-17766 Project: Flink Issue Type: Sub-task Components: Connectors / Kafka Reporter: Aljoscha Krettek Assignee: Aljoscha Krettek Fix For: 1.11.0 In {{emitRecordsWithTimestamps()}}, we are currently locking on the partition state object itself to prevent concurrent access (and to make sure that changes are visible across threads). However, after recent changes (FLINK-17307) we hold the checkpoint lock for emitting the whole "bundle" of records from Kafka. We can now also just use the checkpoint lock in the periodic emitter callback and then don't need the fine-grained locking on the state for record emission. -- This message was sent by Atlassian Jira (v8.3.4#803005)
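The locking change above can be illustrated with a self-contained sketch (names invented, not the actual fetcher code): since record emission already happens while holding the checkpoint lock for the whole bundle, the periodic callback can synchronize on that same lock instead of on each partition's state object.

```java
public class CoarseLockSketch {
    private final Object checkpointLock = new Object();
    private long partitionTimestamp; // guarded by checkpointLock

    // Emits a whole "bundle" of records under the checkpoint lock,
    // mirroring the behavior introduced by FLINK-17307.
    void emitRecordsWithTimestamps(long[] timestamps) {
        synchronized (checkpointLock) {
            for (long ts : timestamps) {
                partitionTimestamp = Math.max(partitionTimestamp, ts);
            }
        }
    }

    // Periodic watermark callback: the same coarse lock gives visibility
    // and mutual exclusion, so no fine-grained per-partition lock is needed.
    long onPeriodicEmit() {
        synchronized (checkpointLock) {
            return partitionTimestamp;
        }
    }

    public static void main(String[] args) {
        CoarseLockSketch s = new CoarseLockSketch();
        s.emitRecordsWithTimestamps(new long[] {3, 1, 2});
        System.out.println(s.onPeriodicEmit()); // 3
    }
}
```

The trade-off is the usual one for lock coarsening: fewer locks to reason about and no lock-ordering hazards, at the cost of holding the one lock slightly longer.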
[jira] [Created] (FLINK-17669) Use new WatermarkStrategy/WatermarkGenerator in Kafka connector
Aljoscha Krettek created FLINK-17669: Summary: Use new WatermarkStrategy/WatermarkGenerator in Kafka connector Key: FLINK-17669 URL: https://issues.apache.org/jira/browse/FLINK-17669 Project: Flink Issue Type: Sub-task Components: Connectors / Kafka Reporter: Aljoscha Krettek Assignee: Aljoscha Krettek Fix For: 1.11.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-17661) Add APIs for using new WatermarkStrategy/WatermarkGenerator
Aljoscha Krettek created FLINK-17661: Summary: Add APIs for using new WatermarkStrategy/WatermarkGenerator Key: FLINK-17661 URL: https://issues.apache.org/jira/browse/FLINK-17661 Project: Flink Issue Type: Sub-task Components: API / DataStream Reporter: Aljoscha Krettek Assignee: Aljoscha Krettek Fix For: 1.11.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-17659) Add common watermark strategies and WatermarkStrategies helper
Aljoscha Krettek created FLINK-17659: Summary: Add common watermark strategies and WatermarkStrategies helper Key: FLINK-17659 URL: https://issues.apache.org/jira/browse/FLINK-17659 Project: Flink Issue Type: Sub-task Components: API / Core Reporter: Aljoscha Krettek Assignee: Aljoscha Krettek Fix For: 1.11.0 {{WatermarkStrategies}} is a builder-style helper for constructing common {{WatermarkStrategy}} subclasses, along with timestamp assigners and idleness configuration. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-17658) Add new TimestampAssigner and WatermarkGenerator interfaces
Aljoscha Krettek created FLINK-17658: Summary: Add new TimestampAssigner and WatermarkGenerator interfaces Key: FLINK-17658 URL: https://issues.apache.org/jira/browse/FLINK-17658 Project: Flink Issue Type: Sub-task Components: API / Core Reporter: Aljoscha Krettek Assignee: Aljoscha Krettek Fix For: 1.11.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-17655) Remove old and long deprecated TimestampExtractor
Aljoscha Krettek created FLINK-17655: Summary: Remove old and long deprecated TimestampExtractor Key: FLINK-17655 URL: https://issues.apache.org/jira/browse/FLINK-17655 Project: Flink Issue Type: Sub-task Components: API / DataStream Reporter: Aljoscha Krettek Assignee: Aljoscha Krettek Fix For: 1.11.0 This was deprecated about 4 years ago in FLINK-3379, for Flink 1.0.0. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-17654) Move Clock classes to flink-core to make them usable outside runtime
Aljoscha Krettek created FLINK-17654: Summary: Move Clock classes to flink-core to make them usable outside runtime Key: FLINK-17654 URL: https://issues.apache.org/jira/browse/FLINK-17654 Project: Flink Issue Type: Sub-task Components: API / Core Reporter: Aljoscha Krettek Assignee: Aljoscha Krettek Fix For: 1.11.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-17653) FLIP-126: Unify (and separate) Watermark Assigners
Aljoscha Krettek created FLINK-17653: Summary: FLIP-126: Unify (and separate) Watermark Assigners Key: FLINK-17653 URL: https://issues.apache.org/jira/browse/FLINK-17653 Project: Flink Issue Type: Bug Components: API / Core, API / DataStream Reporter: Aljoscha Krettek Assignee: Aljoscha Krettek Fix For: 1.11.0 https://cwiki.apache.org/confluence/display/FLINK/FLIP-126%3A+Unify+%28and+separate%29+Watermark+Assigners -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-17514) TaskCancelerWatchdog does not kill TaskManager
Aljoscha Krettek created FLINK-17514: Summary: TaskCancelerWatchdog does not kill TaskManager Key: FLINK-17514 URL: https://issues.apache.org/jira/browse/FLINK-17514 Project: Flink Issue Type: Bug Components: Runtime / Task Affects Versions: 1.10.1, 1.11.0 Reporter: Aljoscha Krettek Fix For: 1.10.1, 1.11.0 The watchdog reports a fatal error using {{taskManager.notifyFatalError(msg, null)}}. This should normally lead to the TaskManager being terminated. The code introduced in FLINK-16255 tries to look at the passed exception and will eventually fail with a {{NullPointerException}}, which prevents the TaskManager from being terminated. Stacktrace: {code} 2020-05-05 09:43:01,588 ERROR org.apache.flink.runtime.taskmanager.Task - Task did not exit gracefully within 180 + seconds. 2020-05-05 09:43:01,588 ERROR org.apache.flink.runtime.taskexecutor.TaskExecutor- Task did not exit gracefully within 180 + seconds. 2020-05-05 09:43:01,588 ERROR org.apache.flink.runtime.taskmanager.Task - Error in Task Cancellation Watch Dog java.lang.NullPointerException at org.apache.flink.util.ExceptionUtils.isOutOfMemoryErrorWithMessageStartingWith(ExceptionUtils.java:186) at org.apache.flink.util.ExceptionUtils.isMetaspaceOutOfMemoryError(ExceptionUtils.java:170) at org.apache.flink.util.ExceptionUtils.enrichTaskManagerOutOfMemoryError(ExceptionUtils.java:144) at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.onFatalError(TaskManagerRunner.java:249) at org.apache.flink.runtime.taskexecutor.TaskExecutor$TaskManagerActionsImpl.notifyFatalError(TaskExecutor.java:1751) at org.apache.flink.runtime.taskmanager.Task$TaskCancelerWatchDog.run(Task.java:1514) at java.lang.Thread.run(Thread.java:748) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
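The failure mode above (the watchdog passes {{null}} as the exception, and the enrichment code dereferences it) can be reduced to a small sketch. This is not Flink's actual code; the helper names and message prefix are invented to show the shape of the bug and of a defensive null check.

```java
public class FatalErrorSketch {
    // Buggy shape: inspects the throwable's message without a null check,
    // so a caller passing null (like notifyFatalError(msg, null)) triggers
    // a NullPointerException instead of a clean shutdown.
    static boolean isMetaspaceOomBuggy(Throwable t) {
        return t.getClass() == OutOfMemoryError.class
                && t.getMessage() != null
                && t.getMessage().startsWith("Metaspace");
    }

    // Defensive shape: instanceof is false for null, so null throwables
    // simply fall through and the fatal-error path can proceed.
    static boolean isMetaspaceOomFixed(Throwable t) {
        return t instanceof OutOfMemoryError
                && t.getMessage() != null
                && t.getMessage().startsWith("Metaspace");
    }

    public static void main(String[] args) {
        try {
            isMetaspaceOomBuggy(null);
        } catch (NullPointerException e) {
            System.out.println("buggy helper throws NPE on null");
        }
        System.out.println(isMetaspaceOomFixed(null)); // false, no exception
    }
}
```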
[jira] [Created] (FLINK-17415) Fold API-agnostic documentation into DataStream documentation (chinese)
Aljoscha Krettek created FLINK-17415: Summary: Fold API-agnostic documentation into DataStream documentation (chinese) Key: FLINK-17415 URL: https://issues.apache.org/jira/browse/FLINK-17415 Project: Flink Issue Type: Sub-task Components: API / DataStream, Documentation Reporter: Aljoscha Krettek Assignee: Aljoscha Krettek As per [FLIP-42|https://cwiki.apache.org/confluence/display/FLINK/FLIP-42%3A+Rework+Flink+Documentation], we want to move most cross-API documentation to the DataStream section and deprecate the DataSet API in the future. We want to go from - Project Build Setup - Basic API Concepts - Streaming (DataStream API) - Batch (DataSet API) - Table API & SQL - Data Types & Serialization - Managing Execution - Libraries - Best Practices - API Migration Guides To - DataStream API - Table API / SQL - DataSet API -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-17349) Reduce runtime of LocalExecutorITCase
Aljoscha Krettek created FLINK-17349: Summary: Reduce runtime of LocalExecutorITCase Key: FLINK-17349 URL: https://issues.apache.org/jira/browse/FLINK-17349 Project: Flink Issue Type: Improvement Components: Table SQL / Client, Table SQL / Legacy Planner, Table SQL / Planner, Table SQL / Runtime Reporter: Aljoscha Krettek Running the whole ITCase takes ~3 minutes on my machine, which is not acceptable for developer productivity and is also quite a burden on our CI systems and PR iteration time. The issue is mostly that this does many costly operations, such as compiling SQL queries. Some tests are inefficient in that they do a lot more repetitions or test things that are not needed here. Also, {{LocalExecutor}} itself is a bit wasteful: every time a session property is changed, a session is opened, or certain other operations run, we trigger reloading/re-parsing of the environment, which means re-creating all the defined catalogs, sources/sinks, and views. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-17217) Download links for central.maven.org in doc don't work
Aljoscha Krettek created FLINK-17217: Summary: Download links for central.maven.org in doc don't work Key: FLINK-17217 URL: https://issues.apache.org/jira/browse/FLINK-17217 Project: Flink Issue Type: Bug Components: Documentation Affects Versions: 1.9.3, 1.10.1 Reporter: Aljoscha Krettek central.maven.org is not available anymore; I don't think it was ever a URL that was meant to be used for this. We can instead use [https://repo1.maven.org/]. This mostly means that the SQL connector download links don't work. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-17136) Rename toplevel DataSet/DataStream section titles
Aljoscha Krettek created FLINK-17136: Summary: Rename toplevel DataSet/DataStream section titles Key: FLINK-17136 URL: https://issues.apache.org/jira/browse/FLINK-17136 Project: Flink Issue Type: Sub-task Components: Documentation Reporter: Aljoscha Krettek According to FLIP-42: - Streaming (DataStream API) -> DataStream API - Batch (DataSet API) -> DataSet API -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-17074) Deprecate DataStream.keyBy() that use tuple/expression keys
Aljoscha Krettek created FLINK-17074: Summary: Deprecate DataStream.keyBy() that use tuple/expression keys Key: FLINK-17074 URL: https://issues.apache.org/jira/browse/FLINK-17074 Project: Flink Issue Type: Improvement Components: API / DataStream Environment: Currently, you can specify either a {{KeySelector}}, tuple positions, or expression keys. I think {{KeySelectors}} are strictly superior and, with lambdas (or method references), quite easy to use. Tuple/expression keys use reflection underneath to do the field accesses, so performance is strictly worse. Also, when using a {{KeySelector}} you will have a meaningful key type {{KEY}} in your operations, while for tuple/expression keys the key type is simply {{Tuple}}. Tuple/expression keys were introduced before Java got support for lambdas in Java 8 and before we added the Table API. Nowadays, using a lambda is hardly more typing than using an expression key but is (possibly) faster and more type safe. The Table API should be used for these more expression-based/relational use cases. Reporter: Aljoscha Krettek -- This message was sent by Atlassian Jira (v8.3.4#803005)
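The contrast the issue draws can be shown in a self-contained sketch. The {{KeySelector}} interface below is a stand-in for Flink's, and the {{Event}} POJO is invented: the lambda path is statically typed ({{String}} key), while the expression-key path has to resolve the field reflectively at runtime, where the key type degrades to {{Object}}.

```java
import java.lang.reflect.Field;

public class KeyBySketch {
    // Stand-in for Flink's KeySelector, for illustration only.
    interface KeySelector<IN, KEY> { KEY getKey(IN value); }

    public static class Event {
        public final String user;
        public final long count;
        public Event(String user, long count) { this.user = user; this.count = count; }
    }

    public static void main(String[] args) throws Exception {
        Event e = new Event("alice", 7);

        // KeySelector: typed, no reflection, key type is String.
        KeySelector<Event, String> byUser = value -> value.user;
        System.out.println(byUser.getKey(e)); // alice

        // Expression key "user": resolved reflectively at runtime; each
        // access pays reflection cost and the static key type is Object.
        Field f = Event.class.getField("user");
        Object key = f.get(e);
        System.out.println(key); // alice
    }
}
```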
[jira] [Created] (FLINK-17009) Fold API-agnostic documentation into DataStream documentation
Aljoscha Krettek created FLINK-17009: Summary: Fold API-agnostic documentation into DataStream documentation Key: FLINK-17009 URL: https://issues.apache.org/jira/browse/FLINK-17009 Project: Flink Issue Type: Sub-task Reporter: Aljoscha Krettek Assignee: Aljoscha Krettek As per [FLIP-42|https://cwiki.apache.org/confluence/display/FLINK/FLIP-42%3A+Rework+Flink+Documentation], we want to move most cross-API documentation to the DataStream section and deprecate the DataSet API in the future. We want to go from - Project Build Setup - Basic API Concepts - Streaming (DataStream API) - Batch (DataSet API) - Table API & SQL - Data Types & Serialization - Managing Execution - Libraries - Best Practices - API Migration Guides To - DataStream API - Table API / SQL - DataSet API -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-16976) Update chinese documentation for ListCheckpointed deprecation
Aljoscha Krettek created FLINK-16976: Summary: Update chinese documentation for ListCheckpointed deprecation Key: FLINK-16976 URL: https://issues.apache.org/jira/browse/FLINK-16976 Project: Flink Issue Type: Bug Components: chinese-translation, Documentation Reporter: Aljoscha Krettek Fix For: 1.11.0 The change for the english documentation is in https://github.com/apache/flink/commit/10aadfc6906a1629f7e60eacf087e351ba40d517 The original Jira issue is FLINK-6258. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-16572) CheckPubSubEmulatorTest is flaky on Azure
Aljoscha Krettek created FLINK-16572: Summary: CheckPubSubEmulatorTest is flaky on Azure Key: FLINK-16572 URL: https://issues.apache.org/jira/browse/FLINK-16572 Project: Flink Issue Type: Bug Components: Build System / Azure Pipelines Reporter: Aljoscha Krettek Log: https://dev.azure.com/aljoschakrettek/Flink/_build/results?buildId=56=logs=1f3ed471-1849-5d3c-a34c-19792af4ad16=ce095137-3e3b-5f73-4b79-c42d3d5f8283=7842 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-16216) Describe end-to-end exactly once programs in stateful stream processing concepts documentation
Aljoscha Krettek created FLINK-16216: Summary: Describe end-to-end exactly once programs in stateful stream processing concepts documentation Key: FLINK-16216 URL: https://issues.apache.org/jira/browse/FLINK-16216 Project: Flink Issue Type: Sub-task Components: Documentation Reporter: Aljoscha Krettek This should go into {{stateful-stream-processing.md}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-16214) Describe how state is different for stream/batch programs in concepts documentation
Aljoscha Krettek created FLINK-16214: Summary: Describe how state is different for stream/batch programs in concepts documentation Key: FLINK-16214 URL: https://issues.apache.org/jira/browse/FLINK-16214 Project: Flink Issue Type: Sub-task Components: Documentation Reporter: Aljoscha Krettek This should go into {{stateful-stream-processing.md}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-16213) Add "What Is State" section in concepts documentation
Aljoscha Krettek created FLINK-16213: Summary: Add "What Is State" section in concepts documentation Key: FLINK-16213 URL: https://issues.apache.org/jira/browse/FLINK-16213 Project: Flink Issue Type: Sub-task Reporter: Aljoscha Krettek This should go into {{stateful-stream-processing.md}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-16212) Describe how Flink is a unified batch/stream processing system in concepts documentation
Aljoscha Krettek created FLINK-16212: Summary: Describe how Flink is a unified batch/stream processing system in concepts documentation Key: FLINK-16212 URL: https://issues.apache.org/jira/browse/FLINK-16212 Project: Flink Issue Type: Sub-task Components: Documentation Reporter: Aljoscha Krettek -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-16211) Add introduction to stream processing concepts documentation
Aljoscha Krettek created FLINK-16211: Summary: Add introduction to stream processing concepts documentation Key: FLINK-16211 URL: https://issues.apache.org/jira/browse/FLINK-16211 Project: Flink Issue Type: Sub-task Components: Documentation Reporter: Aljoscha Krettek -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-16210) Add section about applications and clusters/session in concepts documentation
Aljoscha Krettek created FLINK-16210: Summary: Add section about applications and clusters/session in concepts documentation Key: FLINK-16210 URL: https://issues.apache.org/jira/browse/FLINK-16210 Project: Flink Issue Type: Sub-task Reporter: Aljoscha Krettek This can either go into the existing _Flink Architecture_ ({{flink-architecture.md}}) documentation or be a new section. We can possibly remove the old _Flink Architecture_ section then. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-16209) Add Latency and Completeness section in timely stream processing concepts
Aljoscha Krettek created FLINK-16209: Summary: Add Latency and Completeness section in timely stream processing concepts Key: FLINK-16209 URL: https://issues.apache.org/jira/browse/FLINK-16209 Project: Flink Issue Type: Sub-task Components: Documentation Reporter: Aljoscha Krettek -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-16208) Add introduction to timely stream processing concepts documentation
Aljoscha Krettek created FLINK-16208: Summary: Add introduction to timely stream processing concepts documentation Key: FLINK-16208 URL: https://issues.apache.org/jira/browse/FLINK-16208 Project: Flink Issue Type: Sub-task Components: Documentation Reporter: Aljoscha Krettek -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-16207) In stream processing concepts section, rework distribution patterns description
Aljoscha Krettek created FLINK-16207: Summary: In stream processing concepts section, rework distribution patterns description Key: FLINK-16207 URL: https://issues.apache.org/jira/browse/FLINK-16207 Project: Flink Issue Type: Sub-task Components: Documentation Reporter: Aljoscha Krettek Currently, we only distinguish between one-to-one and redistribution. We should instead describe it as - Forward - Broadcast - Random - Keyed -- This message was sent by Atlassian Jira (v8.3.4#803005)
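The four proposed categories can be made concrete with a toy sketch of how a record picks its target channel(s); the method names and channel model here are invented for illustration, not Flink's runtime API.

```java
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class DistributionPatterns {
    // Forward: one-to-one, the record stays at the same subtask index.
    static int forward(int producerSubtask) {
        return producerSubtask;
    }

    // Broadcast: every downstream channel receives the record.
    static List<Integer> broadcast(int numChannels) {
        return IntStream.range(0, numChannels).boxed().collect(Collectors.toList());
    }

    // Random: any channel may receive the record (e.g. rebalance/shuffle).
    static int random(int numChannels, Random rnd) {
        return rnd.nextInt(numChannels);
    }

    // Keyed: all records with the same key go to the same channel.
    static int keyed(Object key, int numChannels) {
        return Math.floorMod(key.hashCode(), numChannels);
    }

    public static void main(String[] args) {
        System.out.println(forward(2));                               // 2
        System.out.println(broadcast(3));                             // [0, 1, 2]
        System.out.println(keyed("user-1", 4) == keyed("user-1", 4)); // true
    }
}
```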
[jira] [Created] (FLINK-16144) Add client.timeout setting and use that for CLI operations
Aljoscha Krettek created FLINK-16144: Summary: Add client.timeout setting and use that for CLI operations Key: FLINK-16144 URL: https://issues.apache.org/jira/browse/FLINK-16144 Project: Flink Issue Type: Improvement Components: Command Line Client Reporter: Aljoscha Krettek Fix For: 1.11.0 Currently, the CLI uses the {{akka.client.timeout}} setting. This is for historical reasons and can be very confusing for users. We should introduce a new setting {{client.timeout}} that is used for the client, with a fallback to the previous {{akka.client.timeout}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
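The proposed lookup order is a plain key fallback. The sketch below uses the two key names from the issue, but the helper, the {{Properties}}-based config, and the ISO-8601 duration format are invented for illustration (Flink's actual configuration classes and duration syntax differ).

```java
import java.time.Duration;
import java.util.Properties;

public class ClientTimeoutSketch {
    // Prefer the new client.timeout key; fall back to the deprecated
    // akka.client.timeout; otherwise use the supplied default.
    static Duration resolveClientTimeout(Properties conf, Duration dflt) {
        String v = conf.getProperty("client.timeout");
        if (v == null) {
            v = conf.getProperty("akka.client.timeout"); // deprecated fallback
        }
        return v == null ? dflt : Duration.parse(v);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("akka.client.timeout", "PT60S");
        // Only the old key is set, so the fallback kicks in.
        System.out.println(resolveClientTimeout(conf, Duration.ofSeconds(30)));
    }
}
```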
[jira] [Created] (FLINK-16049) Remove outdated "Best Practices" section from Application Development Section
Aljoscha Krettek created FLINK-16049: Summary: Remove outdated "Best Practices" section from Application Development Section Key: FLINK-16049 URL: https://issues.apache.org/jira/browse/FLINK-16049 Project: Flink Issue Type: Bug Components: Documentation Reporter: Aljoscha Krettek Assignee: Aljoscha Krettek -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-16045) Extract connectors documentation to a top-level section
Aljoscha Krettek created FLINK-16045: Summary: Extract connectors documentation to a top-level section Key: FLINK-16045 URL: https://issues.apache.org/jira/browse/FLINK-16045 Project: Flink Issue Type: Sub-task Components: Documentation Reporter: Aljoscha Krettek Assignee: Aljoscha Krettek -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-16044) Extract libraries documentation to a top-level section
Aljoscha Krettek created FLINK-16044: Summary: Extract libraries documentation to a top-level section Key: FLINK-16044 URL: https://issues.apache.org/jira/browse/FLINK-16044 Project: Flink Issue Type: Sub-task Components: Documentation Reporter: Aljoscha Krettek Assignee: Aljoscha Krettek -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-16041) Expand "popular" documentation sections by default
Aljoscha Krettek created FLINK-16041: Summary: Expand "popular" documentation sections by default Key: FLINK-16041 URL: https://issues.apache.org/jira/browse/FLINK-16041 Project: Flink Issue Type: Sub-task Components: Documentation Reporter: Aljoscha Krettek Assignee: Aljoscha Krettek Attachments: current-state.png, proposed-state.png Currently, when the documentation page is loaded all sections are collapsed, which means that some prominent subsections are not easily discoverable. I think we should expand the "Getting Started", "Concepts", and "API" sections by default. (See also https://cwiki.apache.org/confluence/display/FLINK/FLIP-42%3A+Rework+Flink+Documentation, because "API" doesn't exist yet in the current documentation.) I attached screenshots to show what I mean. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-16000) Move "Project Build Setup" to "Getting Started" in documentation
Aljoscha Krettek created FLINK-16000: Summary: Move "Project Build Setup" to "Getting Started" in documentation Key: FLINK-16000 URL: https://issues.apache.org/jira/browse/FLINK-16000 Project: Flink Issue Type: Sub-task Components: Documentation Reporter: Aljoscha Krettek Assignee: Aljoscha Krettek -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-15999) Extract “Concepts” material from API/Library sections and start proper concepts section
Aljoscha Krettek created FLINK-15999: Summary: Extract “Concepts” material from API/Library sections and start proper concepts section Key: FLINK-15999 URL: https://issues.apache.org/jira/browse/FLINK-15999 Project: Flink Issue Type: Sub-task Components: Documentation Reporter: Aljoscha Krettek Assignee: Aljoscha Krettek -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-15998) Revert rename of "Job Cluster" to "Application Cluster" in documentation
Aljoscha Krettek created FLINK-15998: Summary: Revert rename of "Job Cluster" to "Application Cluster" in documentation Key: FLINK-15998 URL: https://issues.apache.org/jira/browse/FLINK-15998 Project: Flink Issue Type: Sub-task Components: Documentation Affects Versions: 1.10.0, 1.11.0 Reporter: Aljoscha Krettek Assignee: Aljoscha Krettek Fix For: 1.10.1, 1.11.0 [~kkl0u] and I are working on an upcoming FLIP that will propose a "real" application mode. The gist of it will be that an application cluster is roughly responsible for all the jobs that are executed in a user {{main()}} method. Each individual invocation of {{execute()}} in said {{main()}} function would launch a job on the aforementioned application cluster. The {{main()}} method would run inside the cluster (or not, that's up for discussion). FLINK-12625 introduced a glossary that describes what was previously known as a "Job Cluster" as an "Application Cluster". We think that in the future users will have both: per-job clusters and application clusters. We therefore think that we should describe our current per-job clusters as Per-job clusters (or job clusters) in the glossary and documentation and reserve the name application cluster for the "real" forthcoming application clusters. This would also render FLINK-12885 outdated. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-15997) Make documentation 404 page look like a documentation page
Aljoscha Krettek created FLINK-15997: Summary: Make documentation 404 page look like a documentation page Key: FLINK-15997 URL: https://issues.apache.org/jira/browse/FLINK-15997 Project: Flink Issue Type: Sub-task Reporter: Aljoscha Krettek Assignee: Aljoscha Krettek -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-15993) Add timeout to 404 documentation redirect, add explanation
Aljoscha Krettek created FLINK-15993: Summary: Add timeout to 404 documentation redirect, add explanation Key: FLINK-15993 URL: https://issues.apache.org/jira/browse/FLINK-15993 Project: Flink Issue Type: Sub-task Components: Documentation Reporter: Aljoscha Krettek Assignee: Aljoscha Krettek -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-15904) Make Kafka Consumer work with activated "disableGenericTypes()"
Aljoscha Krettek created FLINK-15904: Summary: Make Kafka Consumer work with activated "disableGenericTypes()" Key: FLINK-15904 URL: https://issues.apache.org/jira/browse/FLINK-15904 Project: Flink Issue Type: Bug Components: Connectors / Kafka Reporter: Aljoscha Krettek A user reported a problem that the Kafka Consumer doesn't work in that case: https://lists.apache.org/thread.html/r462a854e8a0ab3512e2906b40411624f3164ea3af7cba61ee94cd760%40%3Cuser.flink.apache.org%3E. We should use a different constructor for {{ListStateDescriptor}} that takes {{TypeSerializer}} here: https://github.com/apache/flink/blob/68cc21e4af71505efa142110e35a1f8b1c25fe6e/flink-connectors/flink-connector-kafka-base/src/main/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaConsumerBase.java#L860. This will circumvent the check. My full analysis from the email thread: {quote} Unfortunately, the fact that the Kafka Sources use Kryo for state serialization is a very early design misstep that we cannot get rid of for now. We will get rid of that when the new source interface lands ([1]) and when we have a new Kafka Source based on that. As a workaround, we should change the Kafka Consumer to go through a different constructor of ListStateDescriptor which directly takes a TypeSerializer instead of a TypeInformation here: [2]. This should sidestep the "no generic types" check. [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface [2] https://github.com/apache/flink/blob/68cc21e4af71505efa142110e35a1f8b1c25fe6e/flink-connectors/flink-connector-kafka-base/src/main/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaConsumerBase.java#L860 {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-15735) Too many warnings when running bin/start-cluster.sh
Aljoscha Krettek created FLINK-15735: Summary: Too many warnings when running bin/start-cluster.sh Key: FLINK-15735 URL: https://issues.apache.org/jira/browse/FLINK-15735 Project: Flink Issue Type: Bug Components: Command Line Client Reporter: Aljoscha Krettek Fix For: 1.10.0 Running {{bin/start-cluster.sh}} now prints:
{code}
$ bin/start-cluster.sh
Starting cluster.
Starting standalonesession daemon on host Aljoschas-MacBook-Pro-Work.local.
log4j:WARN No appenders could be found for logger (org.apache.flink.configuration.GlobalConfiguration).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
log4j:WARN No appenders could be found for logger (org.apache.flink.configuration.GlobalConfiguration).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Starting taskexecutor daemon on host Aljoschas-MacBook-Pro-Work.local.
{code}
I think we were not printing that before and we probably shouldn't do so now. -- This message was sent by Atlassian Jira (v8.3.4#803005)
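For context, the "No appenders could be found" warning means log4j 1.x found no configuration on the classpath of the processes launched by the startup scripts. A minimal sketch of a log4j 1.2 properties file that would silence it (the assumption being that the scripts put such a file on the classpath; the pattern layout here is illustrative, not Flink's actual one):

```properties
# Minimal log4j 1.2 configuration: send everything at INFO and above
# to the console so log4j finds at least one appender.
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{HH:mm:ss,SSS} %-5p %c - %m%n
```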
[jira] [Created] (FLINK-15518) Don't hide web frontend side pane automatically
Aljoscha Krettek created FLINK-15518: Summary: Don't hide web frontend side pane automatically Key: FLINK-15518 URL: https://issues.apache.org/jira/browse/FLINK-15518 Project: Flink Issue Type: Bug Components: Runtime / Web Frontend Affects Versions: 1.9.0, 1.10.0 Reporter: Aljoscha Krettek Fix For: 1.10.0 As mentioned in FLINK-13386 the side pane hides automatically in some cases but not all cases. When I was debugging or trying the web frontend I found this behaviour a bit disconcerting. Could we disable the hiding by default? The user can still manually hide if they want to. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-15474) In TypeSerializerUpgradeTestBase, create serializer snapshots "on demand"
Aljoscha Krettek created FLINK-15474: Summary: In TypeSerializerUpgradeTestBase, create serializer snapshots "on demand" Key: FLINK-15474 URL: https://issues.apache.org/jira/browse/FLINK-15474 Project: Flink Issue Type: Bug Components: API / Type Serialization System, Tests Affects Versions: 1.9.0, 1.10.0 Reporter: Aljoscha Krettek Assignee: Aljoscha Krettek Fix For: 1.10.0 Currently, we store binary snapshots in the repository for all the different serializer upgrade test configurations (see linked POC for which snapshots are there for just the POJO serializer). This is hard to maintain, because someone has to go back, generate snapshots from previous Flink versions, and add them to the repo when updating the tests for a new Flink version. It's also problematic from a repository perspective because we keep piling up binary snapshots. Instead, we can create a snapshot "on demand" from a previous Flink version by using a classloader that has the previous Flink jar. I created a POC which demonstrates the approach: [https://github.com/aljoscha/flink/tree/jit-serializer-test-base]. The advantage is that we don't need binary snapshots in the repo anymore; updating the tests to a newer Flink version should be as easy as adding a new migration version and Flink download URL. The downside is that the test now downloads Flink releases (in the PoC this is done using the caching infra introduced for e2e tests), which is costly, and also re-generates the snapshots for every test, which is also costly. The test time (minus downloading) goes up from about 300 ms to roughly 6 seconds. That's not something I would call a unit test. We could call these "integration tests" (or even e2e tests) and only run them nightly. Side note: we currently have no test coverage for serializer upgrades from 1.8 and 1.9, so even running these nightly would be an improvement. -- This message was sent by Atlassian Jira (v8.3.4#803005)
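The core trick of the PoC, loading the snapshot-writing code from a previous Flink version in isolation, comes down to a URL classloader whose parent is null, so the current Flink classes on the test classpath stay invisible. A generic sketch (the jar path is a placeholder, not an actual artifact name):

```java
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;

public class IsolatedLoader {

    /**
     * Creates a classloader that resolves classes from the given jar only,
     * plus the JDK bootstrap classes. With a null parent, the application
     * classloader (and thus the current Flink version) is skipped.
     */
    static URLClassLoader forJar(File jar) throws Exception {
        return new URLClassLoader(new URL[] {jar.toURI().toURL()}, null);
    }

    public static void main(String[] args) throws Exception {
        // "flink-dist-1.8.3.jar" is a placeholder for a downloaded old release.
        try (URLClassLoader loader = forJar(new File("flink-dist-1.8.3.jar"))) {
            // JDK classes still resolve through the bootstrap loader.
            System.out.println(loader.loadClass("java.lang.String").getName());
        }
    }
}
```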
[jira] [Created] (FLINK-15265) Remove "-executor" suffix from executor names
Aljoscha Krettek created FLINK-15265: Summary: Remove "-executor" suffix from executor names Key: FLINK-15265 URL: https://issues.apache.org/jira/browse/FLINK-15265 Project: Flink Issue Type: Bug Components: API / Core Reporter: Aljoscha Krettek Assignee: Aljoscha Krettek The executor names always have "-executor" as a suffix, which is redundant. Currently, the executor name is also used to retrieve a {{ClusterClient}}, where it is unfortunate that the name carries the "-executor" suffix. In the future we might provide something like a {{FlinkClient}} that offers a programmatic API for the functionality of {{bin/flink}}; there we would also use the same names. In reality, the "executor names" are not names of executors but deployment targets, which is why the current naming seems a bit unnatural. This is a simple search-and-replace job, no new functionality. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-15251) Fabric8FlinkKubeClient doesn't work if ingress has hostname but no IP
Aljoscha Krettek created FLINK-15251: Summary: Fabric8FlinkKubeClient doesn't work if ingress has hostname but no IP Key: FLINK-15251 URL: https://issues.apache.org/jira/browse/FLINK-15251 Project: Flink Issue Type: Bug Components: Deployment / Kubernetes Affects Versions: 1.10.0 Environment: Kubernetes for Docker on MacOS Reporter: Aljoscha Krettek Fix For: 1.10.0 In my setup the ingress has a hostname but no IP here: https://github.com/apache/flink/blob/f49e632bb290ded45b320f5d00ceaa1543a6bb1c/flink-kubernetes/src/main/java/org/apache/flink/kubernetes/kubeclient/Fabric8FlinkKubeClient.java#L199 This means that when I try to use the Kubernetes Executor I will get
{code}
Exception in thread "main" java.lang.NullPointerException: Address should not be null.
	at org.apache.flink.util.Preconditions.checkNotNull(Preconditions.java:75)
	at org.apache.flink.kubernetes.kubeclient.Endpoint.<init>(Endpoint.java:33)
	at org.apache.flink.kubernetes.kubeclient.Fabric8FlinkKubeClient.getRestEndpoint(Fabric8FlinkKubeClient.java:209)
	at org.apache.flink.kubernetes.KubernetesClusterDescriptor.lambda$createClusterClientProvider$0(KubernetesClusterDescriptor.java:82)
	at org.apache.flink.kubernetes.KubernetesClusterDescriptor.retrieve(KubernetesClusterDescriptor.java:106)
	at org.apache.flink.kubernetes.KubernetesClusterDescriptor.retrieve(KubernetesClusterDescriptor.java:53)
	at org.apache.flink.client.deployment.AbstractSessionClusterExecutor.execute(AbstractSessionClusterExecutor.java:60)
	at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1741)
	at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1712)
	at org.apache.flink.streaming.examples.statemachine.StateMachineExample.main(StateMachineExample.java:169)
{code}
I think we can just check if a hostname is set and use that if there is no IP. -- This message was sent by Atlassian Jira (v8.3.4#803005)
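The suggested fix can be sketched independently of the Fabric8 client: prefer the ingress IP and fall back to the hostname instead of failing with an NPE. The names here ({{IngressAddress}}, {{resolveAddress}}) are made up for the sketch, not Flink or Fabric8 API:

```java
// Sketch of the proposed IP-or-hostname fallback; all names are
// hypothetical, not part of the Flink or Fabric8 APIs.
public class IngressAddressFallback {

    /** Minimal stand-in for the ingress status returned by Kubernetes. */
    static final class IngressAddress {
        final String ip;       // may be null, e.g. on Kubernetes for Docker on MacOS
        final String hostname; // may be null as well

        IngressAddress(String ip, String hostname) {
            this.ip = ip;
            this.hostname = hostname;
        }
    }

    /** Prefer the IP; fall back to the hostname; only fail if neither is set. */
    static String resolveAddress(IngressAddress address) {
        if (address.ip != null) {
            return address.ip;
        }
        if (address.hostname != null) {
            return address.hostname;
        }
        throw new IllegalStateException("Ingress has neither IP nor hostname.");
    }

    public static void main(String[] args) {
        // Docker-for-Mac style ingress: hostname only, no IP.
        System.out.println(resolveAddress(new IngressAddress(null, "localhost")));
    }
}
```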
[jira] [Created] (FLINK-15157) Make ScalaShell ensureYarnConfig() and fetchConnectionInfo() public
Aljoscha Krettek created FLINK-15157: Summary: Make ScalaShell ensureYarnConfig() and fetchConnectionInfo() public Key: FLINK-15157 URL: https://issues.apache.org/jira/browse/FLINK-15157 Project: Flink Issue Type: Bug Components: Scala Shell Reporter: Aljoscha Krettek This allows users of the Scala Shell, such as Zeppelin, to work better with the shell. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-15129) Return JobClient instead of JobClient Future from executeAsync()
Aljoscha Krettek created FLINK-15129: Summary: Return JobClient instead of JobClient Future from executeAsync() Key: FLINK-15129 URL: https://issues.apache.org/jira/browse/FLINK-15129 Project: Flink Issue Type: Sub-task Components: API / DataSet, API / DataStream Reporter: Aljoscha Krettek Currently, users have to write this when they want to use the {{JobClient}}:
{code}
CompletableFuture<JobClient> jobClientFuture = env.executeAsync();
JobClient jobClient = jobClientFuture.get(); // or use thenApply/thenCompose etc.
{code}
Instead, we could always return a {{JobClient}} right away and therefore remove one step for the user. I don't know if it's always the right choice, but currently we always return an already completed future that contains the {{JobClient}}. In the future we might want to return a future that actually completes at some later point; we would not be able to do this if we directly return a {{JobClient}} and would have to block in {{executeAsync()}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
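The situation described above, a future that is already completed at return time, can be sketched with plain {{CompletableFuture}}. The {{JobClient}} here is a one-method stand-in and the job id is made up; neither is the real Flink API:

```java
import java.util.concurrent.CompletableFuture;

public class ExecuteAsyncSketch {

    /** Stand-in for Flink's JobClient; only the part needed for the sketch. */
    interface JobClient {
        String getJobID();
    }

    // Today: executeAsync() hands back a future that is already complete ...
    static CompletableFuture<JobClient> executeAsyncToday() {
        JobClient client = () -> "8d9a46f8"; // made-up job id
        return CompletableFuture.completedFuture(client);
    }

    // ... so the proposal amounts to unwrapping it eagerly and returning
    // the JobClient directly, sparing users the get()/thenApply() step.
    static JobClient executeAsyncProposed() throws Exception {
        return executeAsyncToday().get(); // never blocks today
    }

    public static void main(String[] args) throws Exception {
        System.out.println(executeAsyncProposed().getJobID());
    }
}
```

The trade-off noted in the issue is visible in the sketch: once callers hold a plain {{JobClient}}, a later switch to a genuinely asynchronous future would force {{executeAsync()}} to block.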
[jira] [Created] (FLINK-15121) Add public constructor/getter for execution environments that takes Configuration
Aljoscha Krettek created FLINK-15121: Summary: Add public constructor/getter for execution environments that takes Configuration Key: FLINK-15121 URL: https://issues.apache.org/jira/browse/FLINK-15121 Project: Flink Issue Type: Sub-task Components: API / DataSet, API / DataStream Reporter: Aljoscha Krettek Assignee: Aljoscha Krettek Fix For: 1.10.0 Currently, you cannot create an {{ExecutionEnvironment}} from a {{Configuration}}, which you would need to configure an arbitrary Executor, such as one of the YARN executors. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-15116) Make JobClient stateless, remove AutoCloseable
Aljoscha Krettek created FLINK-15116: Summary: Make JobClient stateless, remove AutoCloseable Key: FLINK-15116 URL: https://issues.apache.org/jira/browse/FLINK-15116 Project: Flink Issue Type: Sub-task Components: API / DataSet, API / DataStream Reporter: Aljoscha Krettek Currently, {{JobClient}} is {{AutoCloseable}} and we require users to close the {{JobClient}} that they get as a result from {{executeAsync()}}. This is problematic because users can simply ignore the result of {{executeAsync()}} and then we will leak the resources that the client has. We should change the {{JobClient}} so that it acquires the required {{ClusterClient}} for each method call and closes it again. This means that the users no longer have the burden of managing the JobClient lifecycle, i.e. they can freely ignore the result of executeAsync(). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-15013) Flink (on YARN) sometimes needs too many slots
Aljoscha Krettek created FLINK-15013: Summary: Flink (on YARN) sometimes needs too many slots Key: FLINK-15013 URL: https://issues.apache.org/jira/browse/FLINK-15013 Project: Flink Issue Type: Bug Components: Runtime / Coordination Affects Versions: 1.10.0 Reporter: Aljoscha Krettek Fix For: 1.10.0 Attachments: DualInputWordCount.jar *THIS IS DIFFERENT FROM FLINK-15007, even though the text looks almost the same.* This was discovered while debugging FLINK-14834. In some cases a Flink job needs more slots to execute than expected. You can see this in some of the logs attached to FLINK-14834. You can reproduce this using [https://github.com/aljoscha/docker-hadoop-cluster] to bring up a YARN cluster and then running a compiled Flink in there. When you run
{code:java}
bin/flink run -m yarn-cluster -p 3 -yjm 1224 -ytm 1224 /root/DualInputWordCount.jar --input hdfs:///wc-in-1 --output hdfs:///wc-out && hdfs dfs -rm -r /wc-out
{code}
and check the logs afterwards you will sometimes see three "Requesting new slot..." statements and sometimes you will see four. This is the {{git bisect}} log that identifies the first faulty commit (https://github.com/apache/flink/commit/2ab8b61f2f22f1a1ce7f92cd6b8dd32d2c0c227d):
{code:java}
git bisect start
# good: [09f2f43a1d73c76bf4d3f4a1205269eb860deb14] [FLINK-14154][ml] Add the class for multivariate Gaussian Distribution.
git bisect good 09f2f43a1d73c76bf4d3f4a1205269eb860deb14
# bad: [01d6972ab267807b8afccb09a45c454fa76d6c4b] [hotfix] Refactor out slots creation from the TaskSlotTable constructor
git bisect bad 01d6972ab267807b8afccb09a45c454fa76d6c4b
# bad: [7a61c582c7213f123e10de4fd11a13d96425fd77] [hotfix] Fix wrong Java doc comment of BroadcastStateBootstrapFunction.Context
git bisect bad 7a61c582c7213f123e10de4fd11a13d96425fd77
# good: [edeec8d7420185d1c49b2739827bd921d2c2d485] [hotfix][runtime] Replace all occurrences of letter to mail to unify wording of variables and documentation.
git bisect good edeec8d7420185d1c49b2739827bd921d2c2d485
# bad: [1b4ebce86b71d56f44185f1cb83d9a3b51de13df] [FLINK-14262][table-planner-blink] support referencing function with fully/partially qualified names in blink
git bisect bad 1b4ebce86b71d56f44185f1cb83d9a3b51de13df
# good: [25a3d9138cd5e39fc786315682586b75d8ac86ea] [hotfix] Move TaskManagerSlot to o.a.f.runtime.resourcemanager.slotmanager
git bisect good 25a3d9138cd5e39fc786315682586b75d8ac86ea
# good: [362d7670593adc2e4b20650c8854398727d8102b] [FLINK-12122] Calculate TaskExecutorUtilization when listing available slots
git bisect good 362d7670593adc2e4b20650c8854398727d8102b
# bad: [7e8218515baf630e668348a68ff051dfa49c90c3] [FLINK-13969][Checkpointing] Do not allow trigger new checkpoitn after stop the coordinator
git bisect bad 7e8218515baf630e668348a68ff051dfa49c90c3
# bad: [269e7f007e855c2bdedf8bad64ef13f516a608a6] [FLINK-12122] Choose SlotSelectionStrategy based on ClusterOptions#EVENLY_SPREAD_OUT_SLOTS_STRATEGY
git bisect bad 269e7f007e855c2bdedf8bad64ef13f516a608a6
# bad: [2ab8b61f2f22f1a1ce7f92cd6b8dd32d2c0c227d] [FLINK-12122] Add EvenlySpreadOutLocationPreferenceSlotSelectionStrategy
git bisect bad 2ab8b61f2f22f1a1ce7f92cd6b8dd32d2c0c227d
# first bad commit: [2ab8b61f2f22f1a1ce7f92cd6b8dd32d2c0c227d] [FLINK-12122] Add EvenlySpreadOutLocationPreferenceSlotSelectionStrategy
{code}
I'm using the streaming WordCount example that I modified to have two "inputs", similar to how the WordCount example is used in the YARN/kerberos/Docker test. Instead of using the input once we use it like this:
{code}
text = env.readTextFile(params.get("input")).union(env.readTextFile(params.get("input")));
{code}
to create two inputs from the same path. A jar is attached. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-15007) Flink on YARN does not request required TaskExecutors in some cases
Aljoscha Krettek created FLINK-15007: Summary: Flink on YARN does not request required TaskExecutors in some cases Key: FLINK-15007 URL: https://issues.apache.org/jira/browse/FLINK-15007 Project: Flink Issue Type: Bug Components: Runtime / Coordination, Runtime / Task Affects Versions: 1.10.0 Reporter: Aljoscha Krettek Fix For: 1.10.0 This was discovered while debugging FLINK-14834. In some cases Flink does not request new {{TaskExecutors}} even though new slots are requested. You can see this in some of the logs attached to FLINK-14834. You can reproduce this using [https://github.com/aljoscha/docker-hadoop-cluster] to bring up a YARN cluster and then running a compiled Flink in there. When you run
{code}
bin/flink run -m yarn-cluster -p 3 -yjm 1200 -ytm 1200 /root/DualInputWordCount.jar --input hdfs:///wc-in-1 --input hdfs:///wc-in-2 --output hdfs:///wc-out && hdfs dfs -rm -r /wc-out
{code}
the job waits and eventually fails because it does not have enough slots. (You can see in the log that 3 new slots are requested but only 2 {{TaskExecutors}} are requested.) When you run
{code}
bin/flink run -m yarn-cluster -p 3 -yjm 1224 -ytm 1224 /root/DualInputWordCount.jar --input hdfs:///wc-in-1 --input hdfs:///wc-in-2 --output hdfs:///wc-out && hdfs dfs -rm -r /wc-out
{code}
the job runs successfully. This is the {{git bisect}} log that identifies the first faulty commit (https://github.com/apache/flink/commit/9fed0ddc5bc015f98246a2d8d9adbe5fb2b91ba4):
{code}
git bisect start
# good: [09f2f43a1d73c76bf4d3f4a1205269eb860deb14] [FLINK-14154][ml] Add the class for multivariate Gaussian Distribution.
git bisect good 09f2f43a1d73c76bf4d3f4a1205269eb860deb14
# bad: [85905f80e9711967711c2992612dccdd2cc211ac] [FLINK-14834][tests] Disable flaky yarn_kerberos_docker (default input) test
git bisect bad 85905f80e9711967711c2992612dccdd2cc211ac
# good: [c9c8a29b1b2e4f2886fba1524432f9788b564e61] [FLINK-14759][coordination] Remove unused class TaskManagerCliOptions
git bisect good c9c8a29b1b2e4f2886fba1524432f9788b564e61
# good: [c9c8a29b1b2e4f2886fba1524432f9788b564e61] [FLINK-14759][coordination] Remove unused class TaskManagerCliOptions
git bisect good c9c8a29b1b2e4f2886fba1524432f9788b564e61
# bad: [ae539c97c858b94e0e2504b54a8517ac1383482a] [hotfix][runtime] Check managed memory fraction range when setting it into StreamConfig
git bisect bad ae539c97c858b94e0e2504b54a8517ac1383482a
# good: [01d6972ab267807b8afccb09a45c454fa76d6c4b] [hotfix] Refactor out slots creation from the TaskSlotTable constructor
git bisect good 01d6972ab267807b8afccb09a45c454fa76d6c4b
# bad: [d32e1d00854e24bc4bb3aad6d866c2d709acd993] [FLINK-14594][core] Change Resource to use BigDecimal as its value
git bisect bad d32e1d00854e24bc4bb3aad6d866c2d709acd993
# bad: [25f87ec208a642283e995811d809632129ca289a] [FLINK-11935][table-planner] Fix cast timestamp/date to string to avoid Gregorian cutover
git bisect bad 25f87ec208a642283e995811d809632129ca289a
# bad: [21c6b85a6f5991aabcbcd41fedc860d662d478fb] [FLINK-14842][table] add logging for loaded modules and functions
git bisect bad 21c6b85a6f5991aabcbcd41fedc860d662d478fb
# bad: [48986aa0b89de731d1b9136b59d409933cc15408] [hotfix] Remove unnecessary comments about memory size calculation before network init
git bisect bad 48986aa0b89de731d1b9136b59d409933cc15408
# bad: [4c4652efa43ed8ab456f5f63c89b57d8c4a621f8] [hotfix] Remove unused number of slots in MemoryManager
git bisect bad 4c4652efa43ed8ab456f5f63c89b57d8c4a621f8
# bad: [9fed0ddc5bc015f98246a2d8d9adbe5fb2b91ba4] [FLINK-14400] Shrink the scope of MemoryManager from TaskExecutor to slot
git bisect bad 9fed0ddc5bc015f98246a2d8d9adbe5fb2b91ba4
# first bad commit: [9fed0ddc5bc015f98246a2d8d9adbe5fb2b91ba4] [FLINK-14400] Shrink the scope of MemoryManager from TaskExecutor to slot
{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-14992) Add job submission listener to execution environments
Aljoscha Krettek created FLINK-14992: Summary: Add job submission listener to execution environments Key: FLINK-14992 URL: https://issues.apache.org/jira/browse/FLINK-14992 Project: Flink Issue Type: Sub-task Components: API / DataSet, API / DataStream Reporter: Aljoscha Krettek Fix For: 1.10.0 We should add a way of registering listeners that are notified when a job is submitted for execution using an environment. This is useful for cases where a framework, for example the Zeppelin Notebook, creates an environment for the user; the user can submit jobs, but the framework needs a handle to the job in order to manage it. This can be as simple as
{code}
interface JobSubmissionListener {
    void notify(JobClient jobClient);
}
{code}
with a method {{registerJobSubmissionListener(JobSubmissionListener)}} on the environments. -- This message was sent by Atlassian Jira (v8.3.4#803005)
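The proposal above can be sketched end to end in plain Java. Everything here besides the {{JobSubmissionListener}} shape from the issue is made up for the sketch ({{Environment}}, the one-method {{JobClient}} stand-in, the job id):

```java
import java.util.ArrayList;
import java.util.List;

public class ListenerSketch {

    /** Stand-in for Flink's JobClient; only the part needed here. */
    interface JobClient {
        String getJobID();
    }

    /** The listener interface proposed in the issue. */
    interface JobSubmissionListener {
        void notify(JobClient jobClient);
    }

    /** Toy environment showing how the registration method could behave. */
    static class Environment {
        private final List<JobSubmissionListener> listeners = new ArrayList<>();

        void registerJobSubmissionListener(JobSubmissionListener listener) {
            listeners.add(listener);
        }

        JobClient executeAsync() {
            JobClient client = () -> "job-1"; // made-up job id
            // Every registered listener receives the handle on submission.
            listeners.forEach(l -> l.notify(client));
            return client;
        }
    }

    public static void main(String[] args) {
        Environment env = new Environment();
        List<String> seen = new ArrayList<>();
        // A framework (e.g. a notebook) registers to capture job handles.
        env.registerJobSubmissionListener(client -> seen.add(client.getJobID()));
        env.executeAsync();
        System.out.println(seen);
    }
}
```

This keeps the user-facing API untouched: the user still just calls {{executeAsync()}}, while the framework that created the environment gets the {{JobClient}} as a side channel.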
[jira] [Created] (FLINK-14968) Kerberized YARN on Docker test (custom fs plugin) fails on Travis
Aljoscha Krettek created FLINK-14968: Summary: Kerberized YARN on Docker test (custom fs plugin) fails on Travis Key: FLINK-14968 URL: https://issues.apache.org/jira/browse/FLINK-14968 Project: Flink Issue Type: Bug Components: Deployment / YARN, Tests Affects Versions: 1.10.0 Reporter: Gary Yao Assignee: Aljoscha Krettek Fix For: 1.10.0 https://api.travis-ci.org/v3/job/612782888/log.txt -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-14854) Add executeAsync() method to environments
Aljoscha Krettek created FLINK-14854: Summary: Add executeAsync() method to environments Key: FLINK-14854 URL: https://issues.apache.org/jira/browse/FLINK-14854 Project: Flink Issue Type: Sub-task Components: API / DataSet, API / DataStream Reporter: Aljoscha Krettek The new {{executeAsync()}} method should return a {{JobClient}}. This exposes the new executor/job client work on the user API. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-14840) Use new Executor interface in SQL cli
Aljoscha Krettek created FLINK-14840: Summary: Use new Executor interface in SQL cli Key: FLINK-14840 URL: https://issues.apache.org/jira/browse/FLINK-14840 Project: Flink Issue Type: Sub-task Components: Table SQL / Client Reporter: Aljoscha Krettek Assignee: Aljoscha Krettek Currently, the SQL cli has custom code for job deployment in {{ProgramDeployer}}. We should replace this by using the newly introduced {{Executor}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-14412) Rename ML Pipeline to MLPipeline
Aljoscha Krettek created FLINK-14412: Summary: Rename ML Pipeline to MLPipeline Key: FLINK-14412 URL: https://issues.apache.org/jira/browse/FLINK-14412 Project: Flink Issue Type: Bug Components: Library / Machine Learning Reporter: Aljoscha Krettek In FLINK-14290 we introduced a {{Pipeline}} interface in {{flink-core}} as the common interface of Flink Jobs/Pipelines. Unfortunately, this name clashes with {{Pipeline}} in the ML package. My suggestion is to rename {{Pipeline}} in the ML package to {{MLPipeline}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-14391) Remove FlinkPlan Interface
Aljoscha Krettek created FLINK-14391: Summary: Remove FlinkPlan Interface Key: FLINK-14391 URL: https://issues.apache.org/jira/browse/FLINK-14391 Project: Flink Issue Type: Sub-task Components: API / DataSet, API / DataStream Reporter: Aljoscha Krettek In the process of implementing FLINK-14290 [~tison] noticed that we can remove {{FlinkPlan}} after introducing {{Pipeline}} and changing the translation logic. It was only introduced as a somewhat hacky workaround for the fact that {{OptimizedPlan}} and {{StreamGraph}} don't have a common base class/interface. -- This message was sent by Atlassian Jira (v8.3.4#803005)