[GitHub] [flink] flinkbot commented on issue #8083: [FLINK-12050][Tests] BlockingShutdownTest fails on Java 9
flinkbot commented on issue #8083: [FLINK-12050][Tests] BlockingShutdownTest fails on Java 9 URL: https://github.com/apache/flink/pull/8083#issuecomment-478208078 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review. ## Review Progress * ❓ 1. The [description] looks good. * ❓ 2. There is [consensus] that the contribution should go into Flink. * ❓ 3. Needs [attention] from. * ❓ 4. The change fits into the overall [architecture]. * ❓ 5. Overall code [quality] is good. Please see the [Pull Request Review Guide](https://flink.apache.org/reviewing-prs.html) for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required. Bot commands The @flinkbot bot supports the following commands: - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`) - `@flinkbot approve all` to approve all aspects - `@flinkbot approve-until architecture` to approve everything until `architecture` - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention - `@flinkbot disapprove architecture` to remove an approval you gave earlier This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] leesf opened a new pull request #8083: [FLINK-12050][Tests] BlockingShutdownTest fails on Java 9
leesf opened a new pull request #8083: [FLINK-12050][Tests] BlockingShutdownTest fails on Java 9 URL: https://github.com/apache/flink/pull/8083 ## What is the purpose of the change Fix an error in BlockingShutdownTest on Java 9. ## Brief change log Add a clazz.getName().equals("java.lang.ProcessImpl") check to the if condition. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no) - The serializers: (no) - The runtime per-record code paths (performance sensitive): (no) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no) - The S3 file system connector: (no) ## Documentation - Does this pull request introduce a new feature? (no) - If yes, how is the feature documented? (not documented) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
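For context, the failing test locates the forked JVM's process ID reflectively. On Java 8 the spawned process object is a `java.lang.UNIXProcess`, but Java 9 merged that implementation into `java.lang.ProcessImpl`, so a class-name check that only accepts the former yields the "Cannot determine process ID" assertion. The sketch below illustrates the pattern with hypothetical names (it is not the actual Flink utility; note that deep reflection into `java.base` may additionally require `--add-opens` on newer JDKs, and `Process.pid()` is the modern replacement):

```java
import java.lang.reflect.Field;

public class PidProbe {

    // True for the process implementation classes seen on Java 8
    // (UNIXProcess) and Java 9+ (ProcessImpl); the reported fix adds
    // the second clause to the check.
    static boolean isKnownProcessClass(String className) {
        return className.equals("java.lang.UNIXProcess")
                || className.equals("java.lang.ProcessImpl");
    }

    // Reflectively read the private "pid" field; returns -1 when the class
    // is unknown or reflection is denied (the "Cannot determine process ID"
    // case in the test).
    static long getProcessId(Process process) {
        Class<?> clazz = process.getClass();
        if (!isKnownProcessClass(clazz.getName())) {
            return -1;
        }
        try {
            Field pidField = clazz.getDeclaredField("pid");
            pidField.setAccessible(true);
            return pidField.getInt(process);
        } catch (Exception e) {
            // includes InaccessibleObjectException on strongly encapsulated JDKs
            return -1;
        }
    }
}
```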
[jira] [Updated] (FLINK-12050) BlockingShutdownTest fails on Java 9
[ https://issues.apache.org/jira/browse/FLINK-12050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-12050: --- Labels: pull-request-available (was: ) > BlockingShutdownTest fails on Java 9 > > > Key: FLINK-12050 > URL: https://issues.apache.org/jira/browse/FLINK-12050 > Project: Flink > Issue Type: Bug > Components: Tests >Affects Versions: 1.9.0 >Reporter: Chesnay Schepler >Assignee: leesf >Priority: Major > Labels: pull-request-available > > {code} > 21:21:28.689 [ERROR] > testProcessShutdownBlocking(org.apache.flink.runtime.util.BlockingShutdownTest) > Time elapsed: 0.961 s <<< FAILURE! > java.lang.AssertionError: Cannot determine process ID > at > org.apache.flink.runtime.util.BlockingShutdownTest.testProcessShutdownBlocking(BlockingShutdownTest.java:57) > 21:21:28.689 [ERROR] > testProcessExitsDespiteBlockingShutdownHook(org.apache.flink.runtime.util.BlockingShutdownTest) > Time elapsed: 0.325 s <<< FAILURE! > java.lang.AssertionError: Cannot determine process ID > at > org.apache.flink.runtime.util.BlockingShutdownTest.testProcessExitsDespiteBlockingShutdownHook(BlockingShutdownTest.java:98) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [flink] leesf closed pull request #8082: [FLINK-12050][Tests] BlockingShutdownTest fails on Java 9
leesf closed pull request #8082: [FLINK-12050][Tests] BlockingShutdownTest fails on Java 9 URL: https://github.com/apache/flink/pull/8082 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] flinkbot commented on issue #8082: [FLINk-12050][Tests] BlockingShutdownTest fails on Java 9
flinkbot commented on issue #8082: [FLINk-12050][Tests] BlockingShutdownTest fails on Java 9 URL: https://github.com/apache/flink/pull/8082#issuecomment-478204878 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review. ## Review Progress * ❓ 1. The [description] looks good. * ❓ 2. There is [consensus] that the contribution should go into Flink. * ❓ 3. Needs [attention] from. * ❓ 4. The change fits into the overall [architecture]. * ❓ 5. Overall code [quality] is good. Please see the [Pull Request Review Guide](https://flink.apache.org/reviewing-prs.html) for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required. Bot commands The @flinkbot bot supports the following commands: - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`) - `@flinkbot approve all` to approve all aspects - `@flinkbot approve-until architecture` to approve everything until `architecture` - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention - `@flinkbot disapprove architecture` to remove an approval you gave earlier This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] leesf opened a new pull request #8082: [FLINk-12050][Tests] BlockingShutdownTest fails on Java 9
leesf opened a new pull request #8082: [FLINk-12050][Tests] BlockingShutdownTest fails on Java 9 URL: https://github.com/apache/flink/pull/8082 ## What is the purpose of the change Fix an error in BlockingShutdownTest on Java 9. ## Brief change log Add a clazz.getName().equals("java.lang.ProcessImpl") check to the if condition. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no) - The serializers: (no) - The runtime per-record code paths (performance sensitive): (no) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no) - The S3 file system connector: (no) ## Documentation - Does this pull request introduce a new feature? (no) - If yes, how is the feature documented? (not documented) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Comment Edited] (FLINK-12048) ZooKeeperHADispatcherTest failed on Travis
[ https://issues.apache.org/jira/browse/FLINK-12048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804019#comment-16804019 ] TisonKun edited comment on FLINK-12048 at 3/30/19 2:34 AM: --- -I think this is caused because after FLINK-11718 we start a dispatcher, and when {{#onStart}} is called in the main thread, the {{SubmittedJobGraphStore}} has only just started.- -For this test we need to find an approach to explicitly wait for the dispatcher to get started.- -For production code we need to move {{pathCache.getListenable().addListener(new SubmittedJobGraphsPathCacheListener());}} to {{ZooKeeperSubmittedJobGraphStore#start}} so that {{#onAddedJobGraph}} isn't called before the {{ZooKeeperSubmittedJobGraphStore}} is started.- -Further, and personally, we'd better nudge FLINK-10333 to remove {{SubmittedJobGraphListener}}.- -cc [~till.rohrmann]- {{pathCache.start();}} is called in {{ZooKeeperSubmittedJobGraphStore#start()}}. My bad. was (Author: tison): I think this is caused because after FLINK-11718 we start a dispatcher, and when {{#onStart}} is called in the main thread, the {{SubmittedJobGraphStore}} has only just started. For this test we need to find an approach to explicitly wait for the dispatcher to get started. For production code we need to move {{pathCache.getListenable().addListener(new SubmittedJobGraphsPathCacheListener());}} to {{ZooKeeperSubmittedJobGraphStore#start}} so that {{#onAddedJobGraph}} isn't called before the {{ZooKeeperSubmittedJobGraphStore}} is started. 
Further and personally, we'd better nudge FLINK-10333 to remove {{SubmittedJobGraphListener}} cc [~till.rohrmann] > ZooKeeperHADispatcherTest failed on Travis > -- > > Key: FLINK-12048 > URL: https://issues.apache.org/jira/browse/FLINK-12048 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination, Tests >Affects Versions: 1.8.0, 1.9.0 >Reporter: Chesnay Schepler >Priority: Critical > Labels: test-stability > > https://travis-ci.org/apache/flink/builds/512077301 > {code} > 01:14:56.351 [ERROR] Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time > elapsed: 9.671 s <<< FAILURE! - in > org.apache.flink.runtime.dispatcher.ZooKeeperHADispatcherTest > 01:14:56.364 [ERROR] > testStandbyDispatcherJobExecution(org.apache.flink.runtime.dispatcher.ZooKeeperHADispatcherTest) > Time elapsed: 1.209 s <<< ERROR! > org.apache.flink.runtime.util.TestingFatalErrorHandler$TestingException: > org.apache.flink.runtime.dispatcher.DispatcherException: Could not start the > added job d51eeb908f360e44c0a2004e00a6afd2 > at > org.apache.flink.runtime.dispatcher.ZooKeeperHADispatcherTest.teardown(ZooKeeperHADispatcherTest.java:117) > Caused by: org.apache.flink.runtime.dispatcher.DispatcherException: Could not > start the added job d51eeb908f360e44c0a2004e00a6afd2 > Caused by: java.lang.IllegalStateException: Not running. Forgot to call > start()? > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
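The ordering hazard discussed above can be reduced to a small sketch with hypothetical names (not Flink's actual classes): if a listener can fire before the store's `start()` has run, the callback trips the "Not running" guard seen in the stack trace; registering the listener only inside `start()` rules that out.

```java
// Minimal sketch of the start-before-callback invariant (hypothetical names).
public class JobGraphStoreSketch {

    private volatile boolean running = false;

    // Listener registration belongs here, after the store is marked running,
    // so no callback can observe a not-yet-started store.
    public void start() {
        running = true;
    }

    // Callback fired when a job graph path appears in ZooKeeper; mirrors the
    // guard behind "Not running. Forgot to call start()?" in the failure above.
    public void onAddedJobGraph(String jobId) {
        if (!running) {
            throw new IllegalStateException("Not running. Forgot to call start()?");
        }
        // ... recover and submit the job graph ...
    }
}
```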
[jira] [Commented] (FLINK-11828) ZooKeeperHADispatcherTest.testStandbyDispatcherJobRecovery is unstable
[ https://issues.apache.org/jira/browse/FLINK-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805325#comment-16805325 ] TisonKun commented on FLINK-11828: -- Although this is the earlier issue, further discussion happened in FLINK-12048. Thus close this as a duplication of FLINK-12048. > ZooKeeperHADispatcherTest.testStandbyDispatcherJobRecovery is unstable > -- > > Key: FLINK-11828 > URL: https://issues.apache.org/jira/browse/FLINK-11828 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.8.0 >Reporter: Andrey Zagrebin >Priority: Critical > Labels: test-stability > > I observed locally on Mac that > ZooKeeperHADispatcherTest.testStandbyDispatcherJobRecovery sometimes > sporadically fails when I run the whole test package > org.apache.flink.runtime.dispatcher in IntelliJ Idea: > {code:java} > org.apache.flink.runtime.util.TestingFatalErrorHandler$TestingException: > org.apache.flink.runtime.dispatcher.DispatcherException: Could not start the > added job 3cdec37e27b590a6f87b6c52151aa17d > at > org.apache.flink.runtime.util.TestingFatalErrorHandler.rethrowError(TestingFatalErrorHandler.java:51) > at > org.apache.flink.runtime.dispatcher.ZooKeeperHADispatcherTest.teardown(ZooKeeperHADispatcherTest.java:117) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33) > at 
org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) > at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48) > at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at org.junit.runners.Suite.runChild(Suite.java:128) > at org.junit.runners.Suite.runChild(Suite.java:27) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at org.junit.runner.JUnitCore.run(JUnitCore.java:137) > at > com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) > at > com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47) > at > 
com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242) > at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) > Caused by: org.apache.flink.runtime.dispatcher.DispatcherException: Could not > start the added job 3cdec37e27b590a6f87b6c52151aa17d > at > org.apache.flink.runtime.dispatcher.Dispatcher.lambda$null$43(Dispatcher.java:1005) > at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) > at > java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) > at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) > at java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:561) > at >
[jira] [Updated] (FLINK-11828) ZooKeeperHADispatcherTest.testStandbyDispatcherJobRecovery is unstable
[ https://issues.apache.org/jira/browse/FLINK-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] TisonKun updated FLINK-11828: - Release Note: (was: Although this is the earlier issue, further discussion happened in FLINK-12048. Thus close this as a duplication of FLINK-12048.) > ZooKeeperHADispatcherTest.testStandbyDispatcherJobRecovery is unstable > -- > > Key: FLINK-11828 > URL: https://issues.apache.org/jira/browse/FLINK-11828 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.8.0 >Reporter: Andrey Zagrebin >Priority: Critical > Labels: test-stability
[jira] [Updated] (FLINK-12048) ZooKeeperHADispatcherTest failed on Travis
[ https://issues.apache.org/jira/browse/FLINK-12048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] TisonKun updated FLINK-12048: - Affects Version/s: 1.8.0 > ZooKeeperHADispatcherTest failed on Travis > -- > > Key: FLINK-12048 > URL: https://issues.apache.org/jira/browse/FLINK-12048 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination, Tests >Affects Versions: 1.8.0, 1.9.0 >Reporter: Chesnay Schepler >Priority: Critical > Labels: test-stability > > https://travis-ci.org/apache/flink/builds/512077301 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (FLINK-11828) ZooKeeperHADispatcherTest.testStandbyDispatcherJobRecovery is unstable
[ https://issues.apache.org/jira/browse/FLINK-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] TisonKun closed FLINK-11828. Resolution: Duplicate Release Note: Although this is the earlier issue, further discussion happened in FLINK-12048. Thus close this as a duplication of FLINK-12048. > ZooKeeperHADispatcherTest.testStandbyDispatcherJobRecovery is unstable > -- > > Key: FLINK-11828 > URL: https://issues.apache.org/jira/browse/FLINK-11828 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.8.0 >Reporter: Andrey Zagrebin >Priority: Critical > Labels: test-stability
[jira] [Commented] (FLINK-11935) Remove DateTimeUtils pull-in and fix datetime casting problem
[ https://issues.apache.org/jira/browse/FLINK-11935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805307#comment-16805307 ] Rong Rong commented on FLINK-11935: --- Thanks for checking in [~yanghua]. I think we can skip the Calcite upgrade to 1.19, because I believe the file did not change between 1.18 and 1.19. I think further investigation is needed to answer: "can we remove the DateTimeUtils file?". If removing DateTimeUtils and fixing the cast problem is possible, we can just go with that path (as you described). If not, we will need to # backport DateTimeUtils to the latest version Flink uses (which is 1.18), # make changes on top as needed, # create a Calcite/Avatica JIRA ticket and mark it in the DateTimeUtils file to properly log why we still need the file. Once we upgrade to 1.19 in the master ticket FLINK-11921, we will revisit and see if removing the file is possible during the main upgrade; if not, a follow-up ticket will be created. Do you think this can be the right approach? > Remove DateTimeUtils pull-in and fix datetime casting problem > - > > Key: FLINK-11935 > URL: https://issues.apache.org/jira/browse/FLINK-11935 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / API >Reporter: Rong Rong >Assignee: vinoyang >Priority: Major > > This {{DateTimeUtils}} was pulled in in FLINK-7235. > Originally the time operation was not correctly done via the {{ymdToJulian}} > function before the date {{1970-01-01}}, thus we needed the fix, similar to > addressing this problem: > {code:java} > Optimized :1017-12-05 22:58:58.998 > Expected :1017-11-29 22:58:58.998 > Actual :1017-12-05 22:58:58.998 > {code} > > However, after pulling in Avatica 1.13, I found out that the optimized plans > of the time operations are actually correct. 
it is in fact the casting part > that creates problem: > For example, the following: > *{{(plus(-12000.months, cast('2017-11-29 22:58:58.998', TIMESTAMP))}}* > result in a StringTestExpression of: > *{{CAST(1017-11-29 22:58:58.998):VARCHAR(65536) CHARACTER SET "UTF-16LE" > COLLATE "ISO-8859-1$en_US$primary" NOT NULL}}* > but the testing results are: > {code:java} > Optimized :1017-11-29 22:58:58.998 > Expected :1017-11-29 22:58:58.998 > Actual :1017-11-23 22:58:58.998 > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
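The pre-1970 behavior discussed above hinges on Gregorian-to-Julian-day conversion; Calcite's `DateTimeUtils.ymdToJulian` performs this kind of arithmetic. The sketch below shows the standard Fliegel-Van Flandern formula (a textbook version, not necessarily Calcite's exact code), which relies on Java's truncating integer division.

```java
public class JulianSketch {

    // Fliegel-Van Flandern conversion from a Gregorian calendar date to a
    // Julian day number. For January/February, (month - 14) / 12 truncates
    // to -1, shifting the date into the previous "computational" year.
    public static int ymdToJulian(int year, int month, int day) {
        int a = (month - 14) / 12;
        return day - 32075
                + 1461 * (year + 4800 + a) / 4
                + 367 * (month - 2 - a * 12) / 12
                - 3 * ((year + 4900 + a) / 100) / 4;
    }

    public static void main(String[] args) {
        // The Unix epoch, 1970-01-01, is Julian day 2440588.
        System.out.println(ymdToJulian(1970, 1, 1));
    }
}
```

Note that a naive port to a language with floor division (e.g. Python's `//`) would change `(month - 14) / 12` for January and February, which is exactly the kind of subtle off-by-days behavior the ticket describes.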
[GitHub] [flink] tzulitai commented on issue #8076: [FLINK-12064] [core, State Backends] RocksDBKeyedStateBackend uses incorrect key serializer if reconfigure happens during restore
tzulitai commented on issue #8076: [FLINK-12064] [core, State Backends] RocksDBKeyedStateBackend uses incorrect key serializer if reconfigure happens during restore URL: https://github.com/apache/flink/pull/8076#issuecomment-478061022 Just to clarify: the problem is that the state backend uses the correct key serializer for state access, but the wrong one is snapshotted in checkpoints, is that correct? Otherwise, the title / description implies that it is using the wrong key serializer for runtime state access. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] flinkbot commented on issue #8081: [FLINK-11501] [kafka] Add ratelimiting to Kafka consumer
flinkbot commented on issue #8081: [FLINK-11501] [kafka] Add ratelimiting to Kafka consumer URL: https://github.com/apache/flink/pull/8081#issuecomment-478060987 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review. ## Review Progress * ❓ 1. The [description] looks good. * ❓ 2. There is [consensus] that the contribution should go into Flink. * ❓ 3. Needs [attention] from. * ❓ 4. The change fits into the overall [architecture]. * ❓ 5. Overall code [quality] is good. Please see the [Pull Request Review Guide](https://flink.apache.org/reviewing-prs.html) for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required. Bot commands The @flinkbot bot supports the following commands: - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`) - `@flinkbot approve all` to approve all aspects - `@flinkbot approve-until architecture` to approve everything until `architecture` - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention - `@flinkbot disapprove architecture` to remove an approval you gave earlier This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] tweise commented on issue #8081: [FLINK-11501] [kafka] Add ratelimiting to Kafka consumer
tweise commented on issue #8081: [FLINK-11501] [kafka] Add ratelimiting to Kafka consumer URL: https://github.com/apache/flink/pull/8081#issuecomment-478060762 CC: @glaksh100
[GitHub] [flink] tweise opened a new pull request #8081: [FLINK-11501] [kafka] Add ratelimiting to Kafka consumer
tweise opened a new pull request #8081: [FLINK-11501] [kafka] Add ratelimiting to Kafka consumer URL: https://github.com/apache/flink/pull/8081 Backport to 1.8.x
[jira] [Commented] (FLINK-12064) RocksDBKeyedStateBackend uses incorrect key serializer if reconfigure happens during restore
[ https://issues.apache.org/jira/browse/FLINK-12064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805168#comment-16805168 ] Tzu-Li (Gordon) Tai commented on FLINK-12064: - Just to clarify: the problem is that the state backend uses the correct key serializer for state access, but the wrong one is snapshotted in checkpoints, is that correct? Otherwise, the title implies that it is using the wrong key serializer for runtime state access. > RocksDBKeyedStateBackend uses incorrect key serializer if reconfigure happens > during restore > > > Key: FLINK-12064 > URL: https://issues.apache.org/jira/browse/FLINK-12064 > Project: Flink > Issue Type: Bug >Reporter: Yu Li >Assignee: Yu Li >Priority: Blocker > Labels: pull-request-available > Fix For: 1.8.0 > > Time Spent: 10m > Remaining Estimate: 0h > > As titled, in current {{RocksDBKeyedStateBackend}} we use {{keySerializer}} > rather than {{keySerializerProvider.currentSchemaSerializer()}}, which is > incorrect. The issue is not revealed in existing UT since current cases > didn't check snapshot after state schema migration. > This is a regression issue caused by the FLINK-10043 refactoring work. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
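The regression described above — caching the initial key serializer instead of asking the provider for the serializer of the current schema — can be illustrated with a self-contained sketch. The class and method names below only mirror the wording of the JIRA (`currentSchemaSerializer()`); they are stand-ins, not Flink's actual internals.

```java
// Hypothetical stand-ins modeling the serializer-provider pattern from FLINK-12064.
class Serializer {
    final String schemaVersion;
    Serializer(String v) { this.schemaVersion = v; }
}

class SerializerProvider {
    private Serializer current;
    SerializerProvider(Serializer initial) { this.current = initial; }
    // Called when restore detects the serializer needs reconfiguration.
    void reconfigure(Serializer reconfigured) { this.current = reconfigured; }
    Serializer currentSchemaSerializer() { return current; }
}

public class SerializerProviderDemo {
    public static void main(String[] args) {
        Serializer initial = new Serializer("v1");
        SerializerProvider provider = new SerializerProvider(initial);

        // Buggy pattern: the backend holds on to the serializer it saw first...
        Serializer cached = initial;

        // ...then a reconfiguration happens during restore.
        provider.reconfigure(new Serializer("v2"));

        // The cached reference is now stale; snapshotting it writes the wrong schema.
        System.out.println("cached:  " + cached.schemaVersion);
        System.out.println("current: " + provider.currentSchemaSerializer().schemaVersion);
    }
}
```

The fix described in the ticket amounts to always going through `currentSchemaSerializer()` at the point of use instead of reading a field captured before the reconfiguration.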
[jira] [Created] (FLINK-12070) Make blocking result partitions consumable multiple times
Till Rohrmann created FLINK-12070: - Summary: Make blocking result partitions consumable multiple times Key: FLINK-12070 URL: https://issues.apache.org/jira/browse/FLINK-12070 Project: Flink Issue Type: Improvement Components: Runtime / Network Reporter: Till Rohrmann In order to avoid writing produced results multiple times for multiple consumers and in order to speed up batch recoveries, we should make blocking result partitions consumable multiple times. At the moment a blocking result partition is released once its consumer has processed all data. Instead, the result partition should be released once the next blocking result has been produced and all consumers of the blocking result partition have terminated. Moreover, blocking results should not hold onto slot resources like network buffers or memory, as is currently the case with {{SpillableSubpartitions}}.
[jira] [Created] (FLINK-12069) Add proper lifecycle management for intermediate result partitions
Till Rohrmann created FLINK-12069: - Summary: Add proper lifecycle management for intermediate result partitions Key: FLINK-12069 URL: https://issues.apache.org/jira/browse/FLINK-12069 Project: Flink Issue Type: Improvement Components: Runtime / Coordination, Runtime / Network Affects Versions: 1.8.0, 1.9.0 Reporter: Till Rohrmann In order to properly execute batch jobs, we should make the lifecycle management of intermediate result partitions the responsibility of the {{JobMaster}}/{{Scheduler}} component. The {{Scheduler}} knows best when an intermediate result partition is no longer needed and, thus, can be freed. For example, a blocking intermediate result should only be released after all subsequent blocking intermediate results have been completed, in order to speed up potential failovers. Moreover, having explicit control over intermediate result partitions could also enable use cases like result partition sharing between jobs and even across clusters (by simply not releasing the result partitions).
[GitHub] [flink] flinkbot commented on issue #8080: [hotfix][runtime] Improve exception message in SingleInputGate
flinkbot commented on issue #8080: [hotfix][runtime] Improve exception message in SingleInputGate URL: https://github.com/apache/flink/pull/8080#issuecomment-478056398 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review.
[GitHub] [flink] leesf opened a new pull request #8080: [hotfix][runtime] Improve exception message in SingleInputGate
leesf opened a new pull request #8080: [hotfix][runtime] Improve exception message in SingleInputGate URL: https://github.com/apache/flink/pull/8080 Improve exception message in SingleInputGate.
[jira] [Created] (FLINK-12068) Backtrack fail over regions if intermediate results are unavailable
Till Rohrmann created FLINK-12068: - Summary: Backtrack fail over regions if intermediate results are unavailable Key: FLINK-12068 URL: https://issues.apache.org/jira/browse/FLINK-12068 Project: Flink Issue Type: Sub-task Components: Runtime / Coordination Reporter: Till Rohrmann The batch failover strategy needs to be able to backtrack failover regions if an intermediate result is unavailable, either by explicitly checking whether the intermediate result partition is available or via a special exception indicating that a result partition is no longer available.
[GitHub] [flink] zentol commented on a change in pull request #8036: [FLINK-11431][runtime] Upgrade akka to 2.5
zentol commented on a change in pull request #8036: [FLINK-11431][runtime] Upgrade akka to 2.5 URL: https://github.com/apache/flink/pull/8036#discussion_r270454915 ## File path: flink-dist/src/main/resources/META-INF/NOTICE ## @@ -41,7 +41,7 @@ The following dependencies all share the same BSD license which you find under l - org.scala-lang:scala-library:2.11.12 - org.scala-lang:scala-reflect:2.11.12 - org.scala-lang.modules:scala-java8-compat_2.11:0.7.0 -- org.scala-lang.modules:scala-parser-combinators_2.11:1.0.4 +- org.scala-lang.modules:scala-parser-combinators_2.11:1.1.1 Review comment: right, they do, but without declaring the dependency; given that it's in a rather central place, I'd again assume that the tests cover this sufficiently.
[GitHub] [flink] zentol commented on issue #7927: [FLINK-11603][metrics] Port the MetricQueryService to the new RpcEndpoint
zentol commented on issue #7927: [FLINK-11603][metrics] Port the MetricQueryService to the new RpcEndpoint URL: https://github.com/apache/flink/pull/7927#issuecomment-478034908 All done; a big part of the PR is now renaming variables/methods from *path to *address, which is rather unfortunate...
[jira] [Commented] (FLINK-6227) Introduce the DataConsumptionException for downstream task failure
[ https://issues.apache.org/jira/browse/FLINK-6227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805081#comment-16805081 ] zhijiang commented on FLINK-6227: - Thanks for the concern [~till.rohrmann]. I could work on it if needed after the whole fine-grained recovery is confirmed. :) > Introduce the DataConsumptionException for downstream task failure > -- > > Key: FLINK-6227 > URL: https://issues.apache.org/jira/browse/FLINK-6227 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Network >Reporter: zhijiang >Priority: Minor > > It is part of FLIP-1. > We define a new special exception to indicate the downstream task failure in > consuming upstream data. > The {{JobManager}} will receive and consider this special exception to > calculate the minimum connected sub-graph which is called {{FailoverRegion}}. > So the {{DataConsumptionException}} should contain {{ResultPartitionID}} > information for {{JobMaster}} tracking upstream executions.
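The quoted description calls for an exception type that carries the {{ResultPartitionID}} so the JobMaster can locate the upstream execution. Since the ticket predates an implementation, the following is only a minimal sketch of that shape; a plain `String` stands in for Flink's `ResultPartitionID`, and the constructor/getter are assumptions.

```java
// Hypothetical sketch of the exception proposed in FLINK-6227; the real
// class, if implemented, may differ.
public class DataConsumptionException extends Exception {

    private final String resultPartitionId;

    public DataConsumptionException(String resultPartitionId, Throwable cause) {
        super("Failed to consume result partition " + resultPartitionId, cause);
        this.resultPartitionId = resultPartitionId;
    }

    // The JobManager would read this to find the producing execution and
    // compute the FailoverRegion to restart.
    public String getResultPartitionId() {
        return resultPartitionId;
    }

    public static void main(String[] args) {
        DataConsumptionException e = new DataConsumptionException(
            "partition-42", new java.io.IOException("connection reset"));
        System.out.println(e.getMessage());
    }
}
```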
[jira] [Commented] (FLINK-12048) ZooKeeperHADispatcherTest failed on Travis
[ https://issues.apache.org/jira/browse/FLINK-12048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805079#comment-16805079 ] Till Rohrmann commented on FLINK-12048: --- The problem is actually a callback to the {{#onAddedJobGraph}} of the second {{Dispatcher}} which is so delayed that it is executed after the second {{Dispatcher}} has been shut down. Here is a commit with which one can reproduce the interleaving locally: https://github.com/tillrohrmann/flink/commit/f361cdec484e707061e0cbbd727f417fbe60e8b7. As part of FLINK-11843 I want to rework that a {{Dispatcher}} is only running if it has the leadership and not if it is on stand by. This could fix the problem. Moreover, we should make sure that no concurrent operations are ongoing when we terminate the {{Dispatcher}}. > ZooKeeperHADispatcherTest failed on Travis > -- > > Key: FLINK-12048 > URL: https://issues.apache.org/jira/browse/FLINK-12048 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination, Tests >Affects Versions: 1.9.0 >Reporter: Chesnay Schepler >Priority: Critical > Labels: test-stability > > https://travis-ci.org/apache/flink/builds/512077301 > {code} > 01:14:56.351 [ERROR] Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time > elapsed: 9.671 s <<< FAILURE! - in > org.apache.flink.runtime.dispatcher.ZooKeeperHADispatcherTest > 01:14:56.364 [ERROR] > testStandbyDispatcherJobExecution(org.apache.flink.runtime.dispatcher.ZooKeeperHADispatcherTest) > Time elapsed: 1.209 s <<< ERROR! 
> org.apache.flink.runtime.util.TestingFatalErrorHandler$TestingException: > org.apache.flink.runtime.dispatcher.DispatcherException: Could not start the > added job d51eeb908f360e44c0a2004e00a6afd2 > at > org.apache.flink.runtime.dispatcher.ZooKeeperHADispatcherTest.teardown(ZooKeeperHADispatcherTest.java:117) > Caused by: org.apache.flink.runtime.dispatcher.DispatcherException: Could not > start the added job d51eeb908f360e44c0a2004e00a6afd2 > Caused by: java.lang.IllegalStateException: Not running. Forgot to call > start()? > {code}
[GitHub] [flink] zentol commented on a change in pull request #7927: [FLINK-11603][metrics] Port the MetricQueryService to the new RpcEndpoint
zentol commented on a change in pull request #7927: [FLINK-11603][metrics] Port the MetricQueryService to the new RpcEndpoint URL: https://github.com/apache/flink/pull/7927#discussion_r270430314 ## File path: flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskExecutor.java ## @@ -851,8 +852,8 @@ public void heartbeatFromResourceManager(ResourceID resourceID) { } @Override - public CompletableFuture> requestMetricQueryServiceAddress(Time timeout) { - return CompletableFuture.completedFuture(SerializableOptional.ofNullable(metricQueryServicePath)); + public CompletableFuture> requestMetricQueryServiceGateway(Time timeout) { Review comment: yes it is very similar to the previous behavior.
[GitHub] [flink] flinkbot edited a comment on issue #8079: [FLINK-12054][Tests] HBaseConnectorITCase fails on Java 9
flinkbot edited a comment on issue #8079: [FLINK-12054][Tests] HBaseConnectorITCase fails on Java 9 URL: https://github.com/apache/flink/pull/8079#issuecomment-478011172 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review. ## Review Progress * ✅ 1. The [description] looks good. - Approved by @zentol [PMC] * ❓ 2. There is [consensus] that the contribution should go into Flink. * ❓ 3. Needs [attention] from. * ❓ 4. The change fits into the overall [architecture]. * ❓ 5. Overall code [quality] is good.
[GitHub] [flink] zentol commented on a change in pull request #8079: [FLINK-12054][Tests] HBaseConnectorITCase fails on Java 9
zentol commented on a change in pull request #8079: [FLINK-12054][Tests] HBaseConnectorITCase fails on Java 9 URL: https://github.com/apache/flink/pull/8079#discussion_r270426021 ## File path: flink-connectors/flink-hbase/src/test/java/org/apache/flink/addons/hbase/HBaseTestingClusterAutostarter.java ## @@ -211,8 +211,7 @@ private static void createHBaseSiteXml(File hbaseSiteXmlDirectory, String zookee private static void addDirectoryToClassPath(File directory) { try { - // Get the classloader actually used by HBaseConfiguration - ClassLoader classLoader = HBaseConfiguration.create().getClassLoader(); Review comment: The point of this method is to mutate the default classloader so that the hbase-site.xml is on the classpath later on. By creating a new classloader that is immediately discarded once the method exits and never used to actually load anything, this method becomes a no-op, which isn't acceptable.
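zentol's point — constructing a `URLClassLoader` and then discarding it does not change what the default classloader can see — can be demonstrated in isolation. The temp-directory setup below is illustrative only (it is not HBase's or Flink's actual test code); it writes a dummy `hbase-site.xml` into a scratch directory and compares the two loaders.

```java
import java.io.File;
import java.io.FileWriter;
import java.net.URL;
import java.net.URLClassLoader;

public class DiscardedLoaderDemo {

    // Returns {foundByFreshLoader, foundBySystemLoader}.
    static boolean[] check() throws Exception {
        // Create a directory containing a resource, mimicking hbase-site.xml.
        File dir = new File(System.getProperty("java.io.tmpdir"), "clpath-demo");
        dir.mkdirs();
        try (FileWriter w = new FileWriter(new File(dir, "hbase-site.xml"))) {
            w.write("<configuration/>");
        }

        // A fresh URLClassLoader over that directory can see the resource...
        try (URLClassLoader fresh = new URLClassLoader(new URL[]{dir.toURI().toURL()})) {
            boolean foundByFresh = fresh.getResource("hbase-site.xml") != null;
            // ...but unless that loader is actually used for later lookups,
            // the default (system) classloader is unaffected: a no-op.
            boolean foundBySystem =
                ClassLoader.getSystemClassLoader().getResource("hbase-site.xml") != null;
            return new boolean[]{foundByFresh, foundBySystem};
        }
    }

    public static void main(String[] args) throws Exception {
        boolean[] r = check();
        System.out.println("fresh loader:  " + r[0]);
        System.out.println("system loader: " + r[1]);
    }
}
```

This is why the review asks for the `hbase-site.xml` location to be made visible to the classloader that later loads the configuration, rather than to a throwaway one.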
[jira] [Updated] (FLINK-9363) Bump up the Jackson version
[ https://issues.apache.org/jira/browse/FLINK-9363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chesnay Schepler updated FLINK-9363: Affects Version/s: shaded-6.0 Fix Version/s: shaded-7.0 Component/s: (was: Runtime / Coordination) BuildSystem / Shaded > Bump up the Jackson version > --- > > Key: FLINK-9363 > URL: https://issues.apache.org/jira/browse/FLINK-9363 > Project: Flink > Issue Type: Improvement > Components: BuildSystem / Shaded >Affects Versions: shaded-6.0 >Reporter: Ted Yu >Assignee: aloyszhang >Priority: Major > Labels: pull-request-available, security > Fix For: shaded-7.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > CVE's for Jackson : > CVE-2017-17485 > CVE-2018-5968 > CVE-2018-7489 > We can upgrade to 2.9.5
[jira] [Created] (FLINK-12067) Refactor the constructor of NetworkEnvironment
zhijiang created FLINK-12067: Summary: Refactor the constructor of NetworkEnvironment Key: FLINK-12067 URL: https://issues.apache.org/jira/browse/FLINK-12067 Project: Flink Issue Type: Sub-task Components: Runtime / Network Reporter: zhijiang Assignee: zhijiang The constructor of {{NetworkEnvironment}} could be refactored to only take a {{NetworkEnvironmentConfiguration}}; the other related components such as {{TaskEventDispatcher}}, {{ResultPartitionManager}} and {{NetworkBufferPool}} could be created internally. We also refactor the process of generating {{NetworkEnvironmentConfiguration}} in {{TaskManagerServiceConfiguration}} to add {{numNetworkBuffers}} instead of the previous {{networkBufFraction}}, {{networkBufMin}} and {{networkBufMax}}. Furthermore, we introduce {{NetworkEnvironmentConfigurationBuilder}} for creating {{NetworkEnvironmentConfiguration}} easily, especially for tests.
[GitHub] [flink] flinkbot commented on issue #8079: [FLINK-12054][Tests] HBaseConnectorITCase fails on Java 9
flinkbot commented on issue #8079: [FLINK-12054][Tests] HBaseConnectorITCase fails on Java 9 URL: https://github.com/apache/flink/pull/8079#issuecomment-478011172 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review.
[jira] [Updated] (FLINK-12054) HBaseConnectorITCase fails on Java 9
[ https://issues.apache.org/jira/browse/FLINK-12054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-12054: --- Labels: pull-request-available (was: ) > HBaseConnectorITCase fails on Java 9 > > > Key: FLINK-12054 > URL: https://issues.apache.org/jira/browse/FLINK-12054 > Project: Flink > Issue Type: Sub-task > Components: Build System, Connectors / HBase >Reporter: Chesnay Schepler >Assignee: leesf >Priority: Major > Labels: pull-request-available > > An issue in hbase. > {code} > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 21.83 sec <<< > FAILURE! - in org.apache.flink.addons.hbase.HBaseConnectorITCase > org.apache.flink.addons.hbase.HBaseConnectorITCase Time elapsed: 21.829 sec > <<< FAILURE! > java.lang.AssertionError: We should get a URLClassLoader > at > org.apache.flink.addons.hbase.HBaseConnectorITCase.activateHBaseCluster(HBaseConnectorITCase.java:81) > {code}
[GitHub] [flink] leesf opened a new pull request #8079: [FLINK-12054] HBaseConnectorITCase fails on Java 9
leesf opened a new pull request #8079: [FLINK-12054] HBaseConnectorITCase fails on Java 9 URL: https://github.com/apache/flink/pull/8079 ## What is the purpose of the change Fix the error in HBaseConnectorITCase on Java 9. ## Brief change log Create a new URLClassLoader with directory. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no) - The serializers: (no) - The runtime per-record code paths (performance sensitive): (no) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no) - The S3 file system connector: (no) ## Documentation - Does this pull request introduce a new feature? (no) - If yes, how is the feature documented? (not documented)
[jira] [Updated] (FLINK-9679) Implement ConfluentRegistryAvroSerializationSchema
[ https://issues.apache.org/jira/browse/FLINK-9679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dominik Wosiński updated FLINK-9679: Summary: Implement ConfluentRegistryAvroSerializationSchema (was: Implement AvroSerializationSchema) > Implement ConfluentRegistryAvroSerializationSchema > -- > > Key: FLINK-9679 > URL: https://issues.apache.org/jira/browse/FLINK-9679 > Project: Flink > Issue Type: Improvement > Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile) >Affects Versions: 1.6.0 >Reporter: Yazdan Shirvany >Assignee: Yazdan Shirvany >Priority: Major > Labels: pull-request-available > > Implement AvroSerializationSchema using Confluent Schema Registry
[jira] [Assigned] (FLINK-12050) BlockingShutdownTest fails on Java 9
[ https://issues.apache.org/jira/browse/FLINK-12050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf reassigned FLINK-12050: - Assignee: leesf > BlockingShutdownTest fails on Java 9 > > > Key: FLINK-12050 > URL: https://issues.apache.org/jira/browse/FLINK-12050 > Project: Flink > Issue Type: Bug > Components: Tests >Affects Versions: 1.9.0 >Reporter: Chesnay Schepler >Assignee: leesf >Priority: Major > > {code} > 21:21:28.689 [ERROR] > testProcessShutdownBlocking(org.apache.flink.runtime.util.BlockingShutdownTest) > Time elapsed: 0.961 s <<< FAILURE! > java.lang.AssertionError: Cannot determine process ID > at > org.apache.flink.runtime.util.BlockingShutdownTest.testProcessShutdownBlocking(BlockingShutdownTest.java:57) > 21:21:28.689 [ERROR] > testProcessExitsDespiteBlockingShutdownHook(org.apache.flink.runtime.util.BlockingShutdownTest) > Time elapsed: 0.325 s <<< FAILURE! > java.lang.AssertionError: Cannot determine process ID > at > org.apache.flink.runtime.util.BlockingShutdownTest.testProcessExitsDespiteBlockingShutdownHook(BlockingShutdownTest.java:98) > {code}
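The "Cannot determine process ID" failure stems from the pre-Java-9 trick of reflectively inspecting the concrete `Process` subclass (the PR for this ticket adds a check for `java.lang.ProcessImpl`, the class name used on Java 9+). For context, Java 9 also introduced the `ProcessHandle` API, which reports PIDs directly without reflection. The sketch below illustrates that API, not the actual fix merged here; spawning `sleep` assumes a Unix-like environment.

```java
// Illustration of the Java 9+ ProcessHandle / Process#pid() API, which
// removes the need for reflective checks against "java.lang.UNIXProcess"
// or "java.lang.ProcessImpl".
public class PidDemo {
    public static void main(String[] args) throws Exception {
        // PID of the current JVM.
        long ownPid = ProcessHandle.current().pid();
        System.out.println("own pid: " + ownPid);

        // PID of a spawned child process ("sleep" assumes a Unix-like OS).
        Process child = new ProcessBuilder("sleep", "1").start();
        System.out.println("child pid: " + child.pid());  // Process#pid() since Java 9
        child.waitFor();
    }
}
```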
[GitHub] [flink] sorahn commented on issue #8016: [FLINK-10705][web]: Rework Flink Web Dashboard
sorahn commented on issue #8016: [FLINK-10705][web]: Rework Flink Web Dashboard URL: https://github.com/apache/flink/pull/8016#issuecomment-477990235 @vthinkxie I don't really see anything else, it all seems to work as advertised. I think the next step is getting this merged and getting some real-world testing.
[GitHub] [flink] tillrohrmann commented on a change in pull request #8002: [FLINK-11923][metrics] MetricRegistryConfiguration provides MetricReporters Suppliers
tillrohrmann commented on a change in pull request #8002: [FLINK-11923][metrics] MetricRegistryConfiguration provides MetricReporters Suppliers URL: https://github.com/apache/flink/pull/8002#discussion_r270397980 ## File path: flink-runtime/src/main/java/org/apache/flink/runtime/metrics/MetricRegistryImpl.java ## @@ -144,11 +135,9 @@ public MetricRegistryImpl(MetricRegistryConfiguration config) { } } - Class reporterClass = Class.forName(className); - MetricReporter reporterInstance = (MetricReporter) reporterClass.newInstance(); + final MetricReporter reporterInstance = reporterSetup.getSupplier().get(); Review comment: Can't this happen outside of the `MetricRegistryImpl` when we create the `MetricReporter`, or is there a contract that certain operations can only be performed when we create the `MetricRegistryImpl`? If the `MetricReporter` depends on the `MetricRegistryImpl`, then I understand why we need a factory. But if not, then I think it would be easier to directly start the `MetricReporters`.
[jira] [Resolved] (FLINK-11898) Support code generation for all Blink built-in functions and operators
[ https://issues.apache.org/jira/browse/FLINK-11898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jark Wu resolved FLINK-11898. - Resolution: Resolved Resolved in 1.9: 2d66d4ae42576426d9df67d54a99ed57034f90b7 > Support code generation for all Blink built-in functions and operators > -- > > Key: FLINK-11898 > URL: https://issues.apache.org/jira/browse/FLINK-11898 > Project: Flink > Issue Type: New Feature > Components: Table SQL / Planner >Reporter: Jark Wu >Assignee: Jark Wu >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Support code generation for built-in functions and operators. > FLINK-11788 has supported some of the operators. This issue aims to > complement the functions and operators supported in Flink SQL. > This should include: CONCAT, LIKE, SUBSTRING, UPPER, LOWER, and so on.
[GitHub] [flink] asfgit closed pull request #8029: [FLINK-11898] [table-planner-blink] Support code generation for all Blink built-in functions and operators
asfgit closed pull request #8029: [FLINK-11898] [table-planner-blink] Support code generation for all Blink built-in functions and operators URL: https://github.com/apache/flink/pull/8029
[jira] [Commented] (FLINK-6227) Introduce the DataConsumptionException for downstream task failure
[ https://issues.apache.org/jira/browse/FLINK-6227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804926#comment-16804926 ] Till Rohrmann commented on FLINK-6227: -- This issue might still be relevant depending on how we plan to solve the backtracking problem. It could now be easier with the introduction of FLINK-10289. > Introduce the DataConsumptionException for downstream task failure > -- > > Key: FLINK-6227 > URL: https://issues.apache.org/jira/browse/FLINK-6227 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Network >Reporter: zhijiang >Priority: Minor > > It is part of FLIP-1. > We define a new special exception to indicate the downstream task failure in > consuming upstream data. > The {{JobManager}} will receive and consider this special exception to > calculate the minimum connected sub-graph which is called {{FailoverRegion}}. > So the {{DataConsumptionException}} should contain {{ResultPartitionID}} > information for {{JobMaster}} tracking upstream executions.
[jira] [Commented] (FLINK-10288) Failover Strategy improvement
[ https://issues.apache.org/jira/browse/FLINK-10288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804923#comment-16804923 ] Till Rohrmann commented on FLINK-10288: --- Point 4. might already be covered by the design of FLINK-4256 but has not been implemented yet. > Failover Strategy improvement > - > > Key: FLINK-10288 > URL: https://issues.apache.org/jira/browse/FLINK-10288 > Project: Flink > Issue Type: Improvement > Components: Runtime / Coordination >Reporter: JIN SUN >Assignee: ryantaocer >Priority: Major > > Flink pays significant effort to make Streaming Jobs fault tolerant. The > checkpoint mechanism and exactly-once semantics make Flink different from > other systems. However, there are still some cases that have not been handled very > well. Those cases apply to both Streaming and Batch scenarios, and are > orthogonal to the current fault tolerance mechanism. Here is a summary of those > cases: > # Some failures are non-recoverable, such as a user error: > DivideByZeroException. We shouldn't try to restart the task, as it will never > succeed. The DivideByZeroException is just a simple case; those errors > are sometimes not easy to reproduce or predict, as they might only be triggered > by specific input data, so we shouldn't retry for all user code exceptions. > # There is no limit on task retries today; unless a SuppressRestartException > is encountered, a task will keep on retrying until it succeeds. As mentioned > above, we shouldn't retry at all for some cases, and for the exceptions we > can retry, such as a network exception, should we have a retry limit? We need > to retry for any transient issue, but we also need to set a limit to avoid > infinite retries and resource wasting. For Batch and Streaming workloads, we > might need different strategies. > # There are some exceptions due to hardware issues, such as disk/network > malfunction. 
when a task/TaskManager fail on this, we’d better detect and > avoid to schedule to that machine next time. > # If a task read from a blocking result partition, when its input is not > available, we can ‘revoke’ the produce task, set the task fail and rerun the > upstream task to regenerate data. the revoke can propagate up through the > chain. In Spark, revoke is naturally support by lineage. > To make fault tolerance easier, we need to keep deterministic behavior as > much as possible. For user code, it’s not easy to control. However, for > system related code, we can fix it. For example, we should at least make sure > the different attempt of a same task to have the same inputs (we have a bug > in current codebase (DataSourceTask) that cannot guarantee this). Note that > this is track by [Flink-10205] > Details see this proposal: > [https://docs.google.com/document/d/1FdZdcA63tPUEewcCimTFy9Iz2jlVlMRANZkO4RngIuk/edit?usp=sharing] > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
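The retry classification from points 1 and 2 above can be illustrated with a minimal sketch. This is not Flink's failover API; the class name, the retry limit, and the mapping from exception types to decisions are all illustrative assumptions.

```java
import java.io.IOException;

// Illustrative sketch (not Flink code) of the failover idea described above:
// deterministic user errors are never retried, transient failures are retried
// up to a bounded number of attempts.
public class RestartDecision {

    static final int MAX_RETRIES = 3; // assumed limit for transient failures

    /** Returns true if the task should be restarted after this failure. */
    public static boolean shouldRestart(Throwable failure, int attemptsSoFar) {
        if (failure instanceof ArithmeticException) {
            // Deterministic user error (e.g. divide by zero): retrying cannot succeed.
            return false;
        }
        if (failure instanceof IOException) {
            // Possibly transient (network/disk): retry, but bound the attempts.
            return attemptsSoFar < MAX_RETRIES;
        }
        // Unknown user-code exception: conservatively do not retry.
        return false;
    }
}
```

A real strategy would also need the blacklist behavior of point 3 (tracking failing machines) and could differ between batch and streaming workloads, as the issue notes.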
[GitHub] [flink] wuchong commented on issue #8029: [FLINK-11898] [table-planner-blink] Support code generation for all Blink built-in functions and operators
wuchong commented on issue #8029: [FLINK-11898] [table-planner-blink] Support code generation for all Blink built-in functions and operators URL: https://github.com/apache/flink/pull/8029#issuecomment-477986803 Merging This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Assigned] (FLINK-12020) Add documentation for mesos-appmaster-job.sh
[ https://issues.apache.org/jira/browse/FLINK-12020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Yin reassigned FLINK-12020: - Assignee: Jacky Yin > Add documentation for mesos-appmaster-job.sh > > > Key: FLINK-12020 > URL: https://issues.apache.org/jira/browse/FLINK-12020 > Project: Flink > Issue Type: Improvement > Components: Deployment / Mesos, Documentation >Affects Versions: 1.7.2, 1.8.0 >Reporter: Till Rohrmann >Assignee: Jacky Yin >Priority: Minor > > The Flink documentation is currently lacking information about the > {{mesos-appmaster-job.sh}} and how to use it. It would be helpful for our > users to add documentation and examples how to use it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [flink] flinkbot edited a comment on issue #7940: [hotfix][docs] fix error in functions example
flinkbot edited a comment on issue #7940: [hotfix][docs] fix error in functions example URL: https://github.com/apache/flink/pull/7940#issuecomment-470805294 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review. ## Review Progress * ❓ 1. The [description] looks good. * ❓ 2. There is [consensus] that the contribution should go into Flink. * ❗ 3. Needs [attention] from. - Needs attention by @dawidwys [committer], @twalthr [PMC], @zentol [PMC] * ❓ 4. The change fits into the overall [architecture]. * ❓ 5. Overall code [quality] is good. Please see the [Pull Request Review Guide](https://flink.apache.org/reviewing-prs.html) for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required. Bot commands The @flinkbot bot supports the following commands: - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`) - `@flinkbot approve all` to approve all aspects - `@flinkbot approve-until architecture` to approve everything until `architecture` - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention - `@flinkbot disapprove architecture` to remove an approval you gave earlier
[GitHub] [flink] leesf commented on issue #7940: [hotfix][docs] fix error in functions example
leesf commented on issue #7940: [hotfix][docs] fix error in functions example URL: https://github.com/apache/flink/pull/7940#issuecomment-477981097 @flinkbot attention @dawidwys would you review this PR in your free time?
[jira] [Commented] (FLINK-9307) Running flink program doesn't print infos in console any more
[ https://issues.apache.org/jira/browse/FLINK-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804907#comment-16804907 ] Chesnay Schepler commented on FLINK-9307: - Previously the client was notified of job status updates, this is no longer the case since we switched over to REST. > Running flink program doesn't print infos in console any more > - > > Key: FLINK-9307 > URL: https://issues.apache.org/jira/browse/FLINK-9307 > Project: Flink > Issue Type: Improvement > Components: Command Line Client >Affects Versions: 1.6.0 > Environment: Mac OS X 12.10.6 > Flink 1.6-SNAPSHOT commit c8fa8d025684c222582 . >Reporter: TisonKun >Priority: Critical > > As shown in [this > page|https://ci.apache.org/projects/flink/flink-docs-master/quickstart/setup_quickstart.html#run-the-example], > when flink run program, it should have printed infos like > {code:java} > Cluster configuration: Standalone cluster with JobManager at /127.0.0.1:6123 > Using address 127.0.0.1:6123 to connect to JobManager. > JobManager web interface address http://127.0.0.1:8081 > Starting execution of program > Submitting job with JobID: 574a10c8debda3dccd0c78a3bde55e1b. Waiting for > job completion. > Connected to JobManager at > Actor[akka.tcp://flink@127.0.0.1:6123/user/jobmanager#297388688] > 11/04/2016 14:04:50 Job execution switched to status RUNNING. 
> 11/04/2016 14:04:50 Source: Socket Stream -> Flat Map(1/1) switched to > SCHEDULED > 11/04/2016 14:04:50 Source: Socket Stream -> Flat Map(1/1) switched to > DEPLOYING > 11/04/2016 14:04:50 Fast TumblingProcessingTimeWindows(5000) of > WindowedStream.main(SocketWindowWordCount.java:79) -> Sink: Unnamed(1/1) > switched to SCHEDULED > 11/04/2016 14:04:51 Fast TumblingProcessingTimeWindows(5000) of > WindowedStream.main(SocketWindowWordCount.java:79) -> Sink: Unnamed(1/1) > switched to DEPLOYING > 11/04/2016 14:04:51 Fast TumblingProcessingTimeWindows(5000) of > WindowedStream.main(SocketWindowWordCount.java:79) -> Sink: Unnamed(1/1) > switched to RUNNING > 11/04/2016 14:04:51 Source: Socket Stream -> Flat Map(1/1) switched to > RUNNING > {code} > but when I follow the same command, it just printed > {code:java} > Starting execution of program > {code} > Those message are printed properly if I use flink 1.4, and also logged at > `log/flink--taskexecutor-0-localhost.log` when running on > 1.6-SNAPSHOT. > Now it breaks [this pull request|https://github.com/apache/flink/pull/5957] > and [~Zentol] suspects it's a regression. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [flink] carp84 commented on issue #8078: [FLINK-12066] [State Backends] Remove StateSerializerProvider field in keyed state backends
carp84 commented on issue #8078: [FLINK-12066] [State Backends] Remove StateSerializerProvider field in keyed state backends URL: https://github.com/apache/flink/pull/8078#issuecomment-477978217 The current PR also includes a fix for FLINK-12066 (blocker) and will do a rebase after that goes in.
[GitHub] [flink] TisonKun commented on a change in pull request #7927: [FLINK-11603][metrics] Port the MetricQueryService to the new RpcEndpoint
TisonKun commented on a change in pull request #7927: [FLINK-11603][metrics] Port the MetricQueryService to the new RpcEndpoint URL: https://github.com/apache/flink/pull/7927#discussion_r270385597 ## File path: flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskExecutor.java ## @@ -851,8 +852,8 @@ public void heartbeatFromResourceManager(ResourceID resourceID) { } @Override - public CompletableFuture> requestMetricQueryServiceAddress(Time timeout) { - return CompletableFuture.completedFuture(SerializableOptional.ofNullable(metricQueryServicePath)); + public CompletableFuture> requestMetricQueryServiceGateway(Time timeout) { Review comment: Then the behavior should be like what we previously did with `actorSelection`. Thanks for your help; I would also like to learn how to pass an address and connect to the RpcEndpoint with it.
[jira] [Updated] (FLINK-12066) Remove StateSerializerProvider field in keyed state backend
[ https://issues.apache.org/jira/browse/FLINK-12066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-12066: --- Labels: pull-request-available (was: ) > Remove StateSerializerProvider field in keyed state backend > --- > > Key: FLINK-12066 > URL: https://issues.apache.org/jira/browse/FLINK-12066 > Project: Flink > Issue Type: Improvement > Components: Runtime / State Backends >Reporter: Yu Li >Assignee: Yu Li >Priority: Major > Labels: pull-request-available > Fix For: 1.9.0 > > > As mentioned in [PR review of > FLINK-10043|https://github.com/apache/flink/pull/7674#discussion_r257630962] > with Stefan and offline discussion with Gordon, after the refactoring work the > serializer passed to the {{RocksDBKeyedStateBackend}} constructor is a final one, > thus the {{StateSerializerProvider}} field is no longer needed. > For {{HeapKeyedStateBackend}}, the only thing that stops us from passing a final > serializer is the circular dependency between the backend and > {{HeapRestoreOperation}}, and we aim to decouple them by introducing a new > {{HeapInternalKeyContext}} as the bridge. For more details, please refer to the > coming PR.
[GitHub] [flink] flinkbot commented on issue #8078: [FLINK-12066] [State Backends] Remove StateSerializerProvider field in keyed state backends
flinkbot commented on issue #8078: [FLINK-12066] [State Backends] Remove StateSerializerProvider field in keyed state backends URL: https://github.com/apache/flink/pull/8078#issuecomment-477977949 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review.
[GitHub] [flink] carp84 opened a new pull request #8078: [FLINK-12066] [State Backends] Remove StateSerializerProvider field in keyed state backends
carp84 opened a new pull request #8078: [FLINK-12066] [State Backends] Remove StateSerializerProvider field in keyed state backends URL: https://github.com/apache/flink/pull/8078 ## What is the purpose of the change This PR removes the {{StateSerializerProvider}} field in the RocksDB and heap keyed backends, as [discussed](https://github.com/apache/flink/pull/7674#discussion_r257630962) and planned in the previous review of FLINK-10043. ## Brief change log * Removed the {{StateSerializerProvider}} field from keyed backends * Introduced a new {{HeapInternalKeyContext}} which allows restore to happen before backend construction in the heap backend builder. ## Verifying this change This change is already covered by existing tests, such as: * All tests under the `org.apache.flink.runtime.state` package * All tests under the `org.apache.flink.runtime.state.heap` package ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no) - The serializers: (no) - The runtime per-record code paths (performance sensitive): (no) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no) - The S3 file system connector: (no) ## Documentation - Does this pull request introduce a new feature? (no) - If yes, how is the feature documented? (not applicable)
[jira] [Commented] (FLINK-9307) Running flink program doesn't print infos in console any more
[ https://issues.apache.org/jira/browse/FLINK-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804894#comment-16804894 ] TisonKun commented on FLINK-9307: - It is still our case that "running flink program doesn't print infos in console". However, with FLINK-9282 we updated our documentation and now our document says it is the proper behavior. It depends on whether we want a patch to print infos in console as in flink 1.4. > Running flink program doesn't print infos in console any more > - > > Key: FLINK-9307 > URL: https://issues.apache.org/jira/browse/FLINK-9307 > Project: Flink > Issue Type: Improvement > Components: Command Line Client >Affects Versions: 1.6.0 > Environment: Mac OS X 12.10.6 > Flink 1.6-SNAPSHOT commit c8fa8d025684c222582 . >Reporter: TisonKun >Priority: Critical > > As shown in [this > page|https://ci.apache.org/projects/flink/flink-docs-master/quickstart/setup_quickstart.html#run-the-example], > when flink run program, it should have printed infos like > {code:java} > Cluster configuration: Standalone cluster with JobManager at /127.0.0.1:6123 > Using address 127.0.0.1:6123 to connect to JobManager. > JobManager web interface address http://127.0.0.1:8081 > Starting execution of program > Submitting job with JobID: 574a10c8debda3dccd0c78a3bde55e1b. Waiting for > job completion. > Connected to JobManager at > Actor[akka.tcp://flink@127.0.0.1:6123/user/jobmanager#297388688] > 11/04/2016 14:04:50 Job execution switched to status RUNNING. 
> 11/04/2016 14:04:50 Source: Socket Stream -> Flat Map(1/1) switched to > SCHEDULED > 11/04/2016 14:04:50 Source: Socket Stream -> Flat Map(1/1) switched to > DEPLOYING > 11/04/2016 14:04:50 Fast TumblingProcessingTimeWindows(5000) of > WindowedStream.main(SocketWindowWordCount.java:79) -> Sink: Unnamed(1/1) > switched to SCHEDULED > 11/04/2016 14:04:51 Fast TumblingProcessingTimeWindows(5000) of > WindowedStream.main(SocketWindowWordCount.java:79) -> Sink: Unnamed(1/1) > switched to DEPLOYING > 11/04/2016 14:04:51 Fast TumblingProcessingTimeWindows(5000) of > WindowedStream.main(SocketWindowWordCount.java:79) -> Sink: Unnamed(1/1) > switched to RUNNING > 11/04/2016 14:04:51 Source: Socket Stream -> Flat Map(1/1) switched to > RUNNING > {code} > but when I follow the same command, it just printed > {code:java} > Starting execution of program > {code} > Those message are printed properly if I use flink 1.4, and also logged at > `log/flink--taskexecutor-0-localhost.log` when running on > 1.6-SNAPSHOT. > Now it breaks [this pull request|https://github.com/apache/flink/pull/5957] > and [~Zentol] suspects it's a regression. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [flink] zentol commented on a change in pull request #7927: [FLINK-11603][metrics] Port the MetricQueryService to the new RpcEndpoint
zentol commented on a change in pull request #7927: [FLINK-11603][metrics] Port the MetricQueryService to the new RpcEndpoint URL: https://github.com/apache/flink/pull/7927#discussion_r270384272 ## File path: flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskExecutor.java ## @@ -851,8 +852,8 @@ public void heartbeatFromResourceManager(ResourceID resourceID) { } @Override - public CompletableFuture> requestMetricQueryServiceAddress(Time timeout) { - return CompletableFuture.completedFuture(SerializableOptional.ofNullable(metricQueryServicePath)); + public CompletableFuture> requestMetricQueryServiceGateway(Time timeout) { Review comment: You can't just add `Serializable` to a class and expect it to magically work. The `MQS` extends `RpcEndpoint`, which is not serializable and contains loads of fields that are neither. From what I can tell we have to return the RpcService address instead, but again, let me investigate first before making any changes.
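The serialization point made above can be demonstrated in isolation: marking a class `Serializable` does not help when it holds non-serializable fields, which is why returning a plain address string is the workable alternative. This is a standalone sketch, not Flink code; the address format in the comment is invented.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Demonstrates why "just add Serializable" fails for an endpoint-like class:
// Java serialization walks the object graph, and any non-serializable field
// triggers a NotSerializableException. Its address, a plain String, is fine.
public class SerializationPitfall {

    static class NotSerializableResource {} // stands in for threads, sockets, RPC stubs

    static class Endpoint implements Serializable {
        final NotSerializableResource resource = new NotSerializableResource();
        final String address = "akka://flink/user/MetricQueryService"; // assumed format
    }

    /** Returns true if Java serialization accepts the object. */
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (IOException e) {
            return false; // NotSerializableException is an IOException
        }
    }
}
```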
[jira] [Created] (FLINK-12066) Remove StateSerializerProvider field in keyed state backend
Yu Li created FLINK-12066: - Summary: Remove StateSerializerProvider field in keyed state backend Key: FLINK-12066 URL: https://issues.apache.org/jira/browse/FLINK-12066 Project: Flink Issue Type: Improvement Components: Runtime / State Backends Reporter: Yu Li Assignee: Yu Li Fix For: 1.9.0 As mentioned in [PR review of FLINK-10043|https://github.com/apache/flink/pull/7674#discussion_r257630962] with Stefan and offline discussion with Gordon, after the refactoring work the serializer passed to the {{RocksDBKeyedStateBackend}} constructor is a final one, thus the {{StateSerializerProvider}} field is no longer needed. For {{HeapKeyedStateBackend}}, the only thing that stops us from passing a final serializer is the circular dependency between the backend and {{HeapRestoreOperation}}, and we aim to decouple them by introducing a new {{HeapInternalKeyContext}} as the bridge. For more details, please refer to the coming PR.
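The decoupling described above — the restore step depending on a small bridge interface instead of the full backend, so restore can run before the backend is constructed — is a general pattern. A minimal sketch with illustrative names (these are not Flink's actual classes, and `HeapInternalKeyContext` is only analogous to the `KeyContext` below):

```java
// Pattern sketch: breaking a constructor-time cycle by extracting a small
// bridge interface that the restore operation depends on, so restore can
// complete before the backend itself is built with final state.
interface KeyContext {                  // the bridge
    void setCurrentKey(String key);
    String getCurrentKey();
}

class SimpleKeyContext implements KeyContext {
    private String currentKey;
    public void setCurrentKey(String key) { this.currentKey = key; }
    public String getCurrentKey() { return currentKey; }
}

class RestoreOperation {
    private final KeyContext keyContext; // depends only on the bridge, not the backend
    RestoreOperation(KeyContext keyContext) { this.keyContext = keyContext; }
    void restore() { keyContext.setCurrentKey("restored-key"); }
}

class Backend {
    private final KeyContext keyContext;
    // By the time the backend is constructed, restore has already populated
    // the shared key context, so no backfilling into the backend is needed.
    Backend(KeyContext keyContext) { this.keyContext = keyContext; }
    String currentKey() { return keyContext.getCurrentKey(); }
}
```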
[GitHub] [flink] TisonKun commented on issue #7927: [FLINK-11603][metrics] Port the MetricQueryService to the new RpcEndpoint
TisonKun commented on issue #7927: [FLINK-11603][metrics] Port the MetricQueryService to the new RpcEndpoint URL: https://github.com/apache/flink/pull/7927#issuecomment-477975190 Do a rebase on FLINK-12057 and try to address comments.
[GitHub] [flink] TisonKun commented on a change in pull request #7927: [FLINK-11603][metrics] Port the MetricQueryService to the new RpcEndpoint
TisonKun commented on a change in pull request #7927: [FLINK-11603][metrics] Port the MetricQueryService to the new RpcEndpoint URL: https://github.com/apache/flink/pull/7927#discussion_r270382585 ## File path: flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskExecutor.java ## @@ -851,8 +852,8 @@ public void heartbeatFromResourceManager(ResourceID resourceID) { } @Override - public CompletableFuture> requestMetricQueryServiceAddress(Time timeout) { - return CompletableFuture.completedFuture(SerializableOptional.ofNullable(metricQueryServicePath)); + public CompletableFuture> requestMetricQueryServiceGateway(Time timeout) { Review comment: Is using `SerializableOptional` reasonable? It forces us to implement `Serializable` for `MetricQueryServiceGateway` and thus for `MetricQueryService`.
[jira] [Updated] (FLINK-9904) Allow users to control MaxDirectMemorySize
[ https://issues.apache.org/jira/browse/FLINK-9904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-9904: - Affects Version/s: 1.9.0 1.8.0 1.7.2 > Allow users to control MaxDirectMemorySize > -- > > Key: FLINK-9904 > URL: https://issues.apache.org/jira/browse/FLINK-9904 > Project: Flink > Issue Type: Improvement > Components: Deployment / Scripts >Affects Versions: 1.4.2, 1.5.1, 1.7.2, 1.8.0, 1.9.0 >Reporter: Himanshu Roongta >Priority: Minor > > For people who use the Docker image and run Flink in pods, there is currently no > way to update {{MaxDirectMemorySize}} > (well, one can create a custom version of > [taskmanager.sh|https://github.com/apache/flink/blob/master/flink-dist/src/main/flink-bin/bin/taskmanager.sh]). > As a result, it starts with a value of 8388607T. If the parameter > {{taskmanager.memory.preallocate}} is set to false (the default), cleanup > will only occur when the MaxDirectMemorySize limit is hit and a full GC cycle > kicks in. However, pods, especially in Kubernetes, will get killed > because they do not run with such a high value. (In our case we run 8GB per pod.) > The fix would be to allow it to be configurable via {{flink-conf}}. We can still > have a default of 8388607T to avoid a breaking change.
[jira] [Updated] (FLINK-9904) Allow users to control MaxDirectMemorySize
[ https://issues.apache.org/jira/browse/FLINK-9904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-9904: - Component/s: (was: Runtime / Coordination) Deployment / Scripts > Allow users to control MaxDirectMemorySize > -- > > Key: FLINK-9904 > URL: https://issues.apache.org/jira/browse/FLINK-9904 > Project: Flink > Issue Type: Improvement > Components: Deployment / Scripts >Affects Versions: 1.4.2, 1.5.1 >Reporter: Himanshu Roongta >Priority: Minor > > For people who use docker image and run flink in pods, currently, there is no > way to update > {{MaxDirectMemorySize}} > (Well one can create a custom version of > [taskmanager.sh|https://github.com/apache/flink/blob/master/flink-dist/src/main/flink-bin/bin/taskmanager.sh]) > > As a result, it starts with a value of 8388607T . If the param > {{taskmanager.memory.preallocate}} is set to false (default) the clean up > will only occur when the MaxDirectMemorySize limit is hit and a gc full cycle > kicks in. However with pods especially in kuberenete they will get killed > because pods do not run at such a high value. (In our case we run 8GB per pod) > > The fix would be to allow it be configurable via {{flink-conf}}. We can still > have a default of 8388607T to avoid a breaking change. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-9904) Allow users to control MaxDirectMemorySize
[ https://issues.apache.org/jira/browse/FLINK-9904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804879#comment-16804879 ] Till Rohrmann commented on FLINK-9904: -- Yes, please provide a fix for this problem [~hroongta]. > Allow users to control MaxDirectMemorySize > -- > > Key: FLINK-9904 > URL: https://issues.apache.org/jira/browse/FLINK-9904 > Project: Flink > Issue Type: Improvement > Components: Runtime / Coordination >Affects Versions: 1.4.2, 1.5.1 >Reporter: Himanshu Roongta >Priority: Minor > > For people who use docker image and run flink in pods, currently, there is no > way to update > {{MaxDirectMemorySize}} > (Well one can create a custom version of > [taskmanager.sh|https://github.com/apache/flink/blob/master/flink-dist/src/main/flink-bin/bin/taskmanager.sh]) > > As a result, it starts with a value of 8388607T . If the param > {{taskmanager.memory.preallocate}} is set to false (default) the clean up > will only occur when the MaxDirectMemorySize limit is hit and a gc full cycle > kicks in. However with pods especially in kuberenete they will get killed > because pods do not run at such a high value. (In our case we run 8GB per pod) > > The fix would be to allow it be configurable via {{flink-conf}}. We can still > have a default of 8388607T to avoid a breaking change. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-9904) Allow users to control MaxDirectMemorySize
[ https://issues.apache.org/jira/browse/FLINK-9904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-9904: - Priority: Minor (was: Critical) > Allow users to control MaxDirectMemorySize > -- > > Key: FLINK-9904 > URL: https://issues.apache.org/jira/browse/FLINK-9904 > Project: Flink > Issue Type: Improvement > Components: Runtime / Coordination >Affects Versions: 1.4.2, 1.5.1 >Reporter: Himanshu Roongta >Priority: Minor > > For people who use docker image and run flink in pods, currently, there is no > way to update > {{MaxDirectMemorySize}} > (Well one can create a custom version of > [taskmanager.sh|https://github.com/apache/flink/blob/master/flink-dist/src/main/flink-bin/bin/taskmanager.sh]) > > As a result, it starts with a value of 8388607T . If the param > {{taskmanager.memory.preallocate}} is set to false (default) the clean up > will only occur when the MaxDirectMemorySize limit is hit and a gc full cycle > kicks in. However with pods especially in kuberenete they will get killed > because pods do not run at such a high value. (In our case we run 8GB per pod) > > The fix would be to allow it be configurable via {{flink-conf}}. We can still > have a default of 8388607T to avoid a breaking change. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
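The proposed fix amounts to deriving the `-XX:MaxDirectMemorySize` JVM flag from a configuration entry instead of hard-coding it in the startup scripts. A minimal sketch, assuming a hypothetical config key (`taskmanager.memory.direct.size` is an invented name, not an actual Flink option):

```java
import java.util.Map;

// Sketch of making the direct-memory limit configurable: read a (hypothetical)
// flink-conf entry and fall back to the current hard-coded default.
public class DirectMemoryFlag {

    static final String DEFAULT = "8388607T"; // the default the issue describes

    /** Builds the JVM flag from a flink-conf-style key/value map. */
    public static String buildJvmFlag(Map<String, String> conf) {
        String size = conf.getOrDefault("taskmanager.memory.direct.size", DEFAULT);
        return "-XX:MaxDirectMemorySize=" + size;
    }
}
```

Keeping the old value as the fallback preserves existing behavior for deployments that never set the key, which is the "no breaking change" property the issue asks for.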
[jira] [Commented] (FLINK-6455) Web UI should not show link to upload new jobs when using job mode cluster
[ https://issues.apache.org/jira/browse/FLINK-6455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804877#comment-16804877 ] Till Rohrmann commented on FLINK-6455: -- Update: With Flip-6 it is no longer possible to submit jobs to a job mode cluster. However, we still show the web UI link, which should not be the case. > Web UI should not show link to upload new jobs when using job mode cluster > -- > > Key: FLINK-6455 > URL: https://issues.apache.org/jira/browse/FLINK-6455 > Project: Flink > Issue Type: Improvement > Components: Runtime / Web Frontend >Affects Versions: 1.2.1, 1.7.2, 1.8.0 >Reporter: Stephan Ewen >Priority: Minor > > When running a job in job mode, the web UI must not show the 'upload new job' > option.
[jira] [Updated] (FLINK-6455) Web UI should not show link to upload new jobs when using job mode cluster
[ https://issues.apache.org/jira/browse/FLINK-6455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-6455: - Summary: Web UI should not show link to upload new jobs when using job mode cluster (was: Yarn single-job JobManagers should not allow uploads of new jobs.) > Web UI should not show link to upload new jobs when using job mode cluster > -- > > Key: FLINK-6455 > URL: https://issues.apache.org/jira/browse/FLINK-6455 > Project: Flink > Issue Type: Improvement > Components: Deployment / YARN, Runtime / Web Frontend >Affects Versions: 1.2.1, 1.7.2, 1.8.0 >Reporter: Stephan Ewen >Priority: Minor > > When running a job in job mode, the web UI must not show the 'upload new job' > option. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-6455) Web UI should not show link to upload new jobs when using job mode cluster
[ https://issues.apache.org/jira/browse/FLINK-6455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-6455: - Component/s: (was: Deployment / YARN) > Web UI should not show link to upload new jobs when using job mode cluster > -- > > Key: FLINK-6455 > URL: https://issues.apache.org/jira/browse/FLINK-6455 > Project: Flink > Issue Type: Improvement > Components: Runtime / Web Frontend >Affects Versions: 1.2.1, 1.7.2, 1.8.0 >Reporter: Stephan Ewen >Priority: Minor > > When running a job in job mode, the web UI must not show the 'upload new job' > option. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-12049) ClassLoaderUtilsTest fails on Java 9
[ https://issues.apache.org/jira/browse/FLINK-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-12049: --- Labels: pull-request-available (was: ) > ClassLoaderUtilsTest fails on Java 9 > > > Key: FLINK-12049 > URL: https://issues.apache.org/jira/browse/FLINK-12049 > Project: Flink > Issue Type: Sub-task > Components: Tests >Affects Versions: 1.9.0 >Reporter: Chesnay Schepler >Assignee: leesf >Priority: Major > Labels: pull-request-available > > {code} > 21:21:24.547 [ERROR] Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time > elapsed: 0.214 s <<< FAILURE! - in > org.apache.flink.runtime.util.ClassLoaderUtilsTest > 21:21:24.547 [ERROR] > testWithAppClassLoader(org.apache.flink.runtime.util.ClassLoaderUtilsTest) > Time elapsed: 0.021 s <<< FAILURE! > java.lang.AssertionError > at > org.apache.flink.runtime.util.ClassLoaderUtilsTest.testWithAppClassLoader(ClassLoaderUtilsTest.java:140) > {code} > {code} > public void testWithAppClassLoader() { > String result = > ClassLoaderUtil.getUserCodeClassLoaderInfo(ClassLoader.getSystemClassLoader()); > assertTrue(result.toLowerCase().contains("system classloader")); > {code} > {{ClassLoader.getSystemClassLoader()}} no longer returns an URLClassLoader on > Java 9, but {{ClassLoaderUtil.getUserCodeClassLoaderInfo}} relies on this to > extract information about the ClassLoader. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
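A defensive variant of such inspection code guards the cast instead of assuming the system class loader is a `URLClassLoader`, which stopped being true on Java 9. This is a sketch of the general approach, not the actual `ClassLoaderUtil` fix:

```java
import java.net.URLClassLoader;

// Sketch: describe a ClassLoader without assuming it is a URLClassLoader.
// On Java 9+, ClassLoader.getSystemClassLoader() returns an internal
// AppClassLoader, so an unconditional cast to URLClassLoader would fail.
public class ClassLoaderInfo {

    public static String describe(ClassLoader cl) {
        if (cl == ClassLoader.getSystemClassLoader()) {
            return "system classloader";
        }
        if (cl instanceof URLClassLoader) {
            // Only here is the URL-based introspection safe.
            return "url classloader with " + ((URLClassLoader) cl).getURLs().length + " urls";
        }
        return cl == null ? "bootstrap classloader" : cl.getClass().getName();
    }
}
```

Checking for the system class loader first keeps the "system classloader" answer stable across Java versions, which is what the failing assertion above expects.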
[jira] [Updated] (FLINK-6455) Yarn single-job JobManagers should not allow uploads of new jobs.
[ https://issues.apache.org/jira/browse/FLINK-6455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-6455: - Description: When running a job in job mode, the web UI must not show the 'upload new job' option. (was: When running a job on Yarn in the 'single job cluster' mode, the web UI must not show the 'upload new job' option. If users actually submit another job via the web UI, the JobManager starts sharing the yarn cluster between these two jobs and there will be competition for TaskManagers between the jobs.) > Yarn single-job JobManagers should not allow uploads of new jobs. > - > > Key: FLINK-6455 > URL: https://issues.apache.org/jira/browse/FLINK-6455 > Project: Flink > Issue Type: Improvement > Components: Deployment / YARN, Runtime / Web Frontend >Affects Versions: 1.2.1, 1.7.2, 1.8.0 >Reporter: Stephan Ewen >Priority: Minor > > When running a job in job mode, the web UI must not show the 'upload new job' > option. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [flink] leesf opened a new pull request #8077: [FLINK-12049][Tests] ClassLoaderUtilsTest fails on Java 9
leesf opened a new pull request #8077: [FLINK-12049][Tests] ClassLoaderUtilsTest fails on Java 9 URL: https://github.com/apache/flink/pull/8077 ## What is the purpose of the change Fix error in ClassLoaderUtilsTest#testWithAppClassLoader on Java 9. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no) - The serializers: (no) - The runtime per-record code paths (performance sensitive): (no) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no) - The S3 file system connector: (no) ## Documentation - Does this pull request introduce a new feature? (no) - If yes, how is the feature documented? (not documented) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Updated] (FLINK-6455) Yarn single-job JobManagers should not allow uploads of new jobs.
[ https://issues.apache.org/jira/browse/FLINK-6455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-6455: - Issue Type: Improvement (was: Bug) > Yarn single-job JobManagers should not allow uploads of new jobs. > - > > Key: FLINK-6455 > URL: https://issues.apache.org/jira/browse/FLINK-6455 > Project: Flink > Issue Type: Improvement > Components: Deployment / YARN, Runtime / Web Frontend >Affects Versions: 1.2.1 >Reporter: Stephan Ewen >Priority: Critical > > When running a job on Yarn in the 'single job cluster' mode, the web UI must > not show the 'upload new job' option. > If users actually submit another job via the web UI, the JobManager starts > sharing the yarn cluster between these two jobs and there will be competition > for TaskManagers between the jobs. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-12064) RocksDBKeyedStateBackend uses incorrect key serializer if reconfigure happens during restore
[ https://issues.apache.org/jira/browse/FLINK-12064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804874#comment-16804874 ] Yu Li commented on FLINK-12064: --- [~tzulitai] [~srichter] [~aljoscha] FYI. > RocksDBKeyedStateBackend uses incorrect key serializer if reconfigure happens > during restore > > > Key: FLINK-12064 > URL: https://issues.apache.org/jira/browse/FLINK-12064 > Project: Flink > Issue Type: Bug >Reporter: Yu Li >Assignee: Yu Li >Priority: Blocker > Labels: pull-request-available > Fix For: 1.8.0 > > Time Spent: 10m > Remaining Estimate: 0h > > As titled, in current {{RocksDBKeyedStateBackend}} we use {{keySerializer}} > rather than {{keySerializerProvider.currentSchemaSerializer()}}, which is > incorrect. The issue is not revealed in existing UT since current cases > didn't check snapshot after state schema migration. > This is a regression issue caused by the FLINK-10043 refactoring work. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-6455) Yarn single-job JobManagers should not allow uploads of new jobs.
[ https://issues.apache.org/jira/browse/FLINK-6455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-6455: - Priority: Minor (was: Critical) > Yarn single-job JobManagers should not allow uploads of new jobs. > - > > Key: FLINK-6455 > URL: https://issues.apache.org/jira/browse/FLINK-6455 > Project: Flink > Issue Type: Improvement > Components: Deployment / YARN, Runtime / Web Frontend >Affects Versions: 1.2.1 >Reporter: Stephan Ewen >Priority: Minor > > When running a job on Yarn in the 'single job cluster' mode, the web UI must > not show the 'upload new job' option. > If users actually submit another job via the web UI, the JobManager starts > sharing the yarn cluster between these two jobs and there will be competition > for TaskManagers between the jobs. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [flink] flinkbot commented on issue #8077: [FLINK-12049][Tests] ClassLoaderUtilsTest fails on Java 9
flinkbot commented on issue #8077: [FLINK-12049][Tests] ClassLoaderUtilsTest fails on Java 9
URL: https://github.com/apache/flink/pull/8077#issuecomment-477972236

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review.

## Review Progress

* ❓ 1. The [description] looks good.
* ❓ 2. There is [consensus] that the contribution should go into Flink.
* ❓ 3. Needs [attention] from.
* ❓ 4. The change fits into the overall [architecture].
* ❓ 5. Overall code [quality] is good.

Please see the [Pull Request Review Guide](https://flink.apache.org/reviewing-prs.html) for a full explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

## Bot commands

The @flinkbot bot supports the following commands:

- `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until `architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Updated] (FLINK-6455) Yarn single-job JobManagers should not allow uploads of new jobs.
[ https://issues.apache.org/jira/browse/FLINK-6455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-6455: - Affects Version/s: 1.8.0 1.7.2 > Yarn single-job JobManagers should not allow uploads of new jobs. > - > > Key: FLINK-6455 > URL: https://issues.apache.org/jira/browse/FLINK-6455 > Project: Flink > Issue Type: Improvement > Components: Deployment / YARN, Runtime / Web Frontend >Affects Versions: 1.2.1, 1.7.2, 1.8.0 >Reporter: Stephan Ewen >Priority: Minor > > When running a job on Yarn in the 'single job cluster' mode, the web UI must > not show the 'upload new job' option. > If users actually submit another job via the web UI, the JobManager starts > sharing the yarn cluster between these two jobs and there will be competition > for TaskManagers between the jobs. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8035) Unable to submit job when HA is enabled
[ https://issues.apache.org/jira/browse/FLINK-8035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804866#comment-16804866 ] Till Rohrmann commented on FLINK-8035: -- Ping [~longtimer]. > Unable to submit job when HA is enabled > --- > > Key: FLINK-8035 > URL: https://issues.apache.org/jira/browse/FLINK-8035 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.4.0 > Environment: Mac OS X >Reporter: Robert Metzger >Priority: Critical > > Steps to reproduce: > - Get Flink 1.4 (f5a0b4bdfb) > - Get ZK (3.3.6 in this case) > - Put the following flink-conf.yaml: > {code} > high-availability: zookeeper > high-availability.storageDir: file:///tmp/flink-ha > high-availability.zookeeper.quorum: localhost:2181 > high-availability.zookeeper.path.cluster-id: /my-namespace > {code} > - Start Flink, submit a job (any streaming example will do) > The job submission will time out. On the JobManager, it seems that the job > submission gets stuck when trying to submit something to Zookeeper. > In the JM UI, the job will sit there in status "CREATED" -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10484) New latency tracking metrics format causes metrics cardinality explosion
[ https://issues.apache.org/jira/browse/FLINK-10484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804865#comment-16804865 ] Till Rohrmann commented on FLINK-10484: --- Can this issue be closed since FLINK-10242 has been backported [~jgrier]? > New latency tracking metrics format causes metrics cardinality explosion > > > Key: FLINK-10484 > URL: https://issues.apache.org/jira/browse/FLINK-10484 > Project: Flink > Issue Type: Bug > Components: Runtime / Metrics >Affects Versions: 1.5.4, 1.6.0, 1.6.1 >Reporter: Jamie Grier >Assignee: Jamie Grier >Priority: Critical > > The new metrics format for latency tracking causes huge metrics cardinality > explosion due to the format and the fact that there is a metric reported for > a every combination of source subtask index and operator subtask index. > Yikes! > This format is actually responsible for basically taking down our metrics > system due to DDOSing our metrics servers (at Lyft). > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-9930) Flink slot memory leak after restarting job with static variables
[ https://issues.apache.org/jira/browse/FLINK-9930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804864#comment-16804864 ] Till Rohrmann commented on FLINK-9930: -- The best solution is to not use static variables in your user code. Otherwise they will only be cleaned up after the class has been unloaded. > Flink slot memory leak after restarting job with static variables > - > > Key: FLINK-9930 > URL: https://issues.apache.org/jira/browse/FLINK-9930 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.4.2 >Reporter: Martin Shang >Priority: Critical > > I found the Flink TaskManager exited due to an OOM, and I dumped the heap file. > > The result is that the static variables are not released when the job is canceled, and a new object is generated after I restarted the job. > The static variable exists multiple times because of the slot memory isolation. > How can I make Flink collect these variables? > [multiple static variable|https://i.stack.imgur.com/KJcIN.jpg] > [path to gc|https://i.stack.imgur.com/bvrDF.jpg] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
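Till's point can be illustrated with a minimal, hypothetical sketch (not Flink code): static state in a user-code class survives "job restarts" for as long as the class stays loaded in the same classloader, so nothing a job does on cancel will free it.

```java
import java.util.ArrayList;
import java.util.List;

public class StaticLeakDemo {
    // User-code class with a static cache: initialized once per classloader,
    // and retained until the class itself is unloaded.
    static final List<byte[]> CACHE = new ArrayList<>();

    // Stand-in for one run of a user job that populates static state.
    static void runJob() {
        CACHE.add(new byte[1024]); // never cleared when the job is canceled
    }

    public static void main(String[] args) {
        runJob();
        runJob(); // simulated restart inside the same JVM and classloader
        // Both runs' entries are still reachable via the static field.
        System.out.println("cached entries after 2 runs: " + CACHE.size());
    }
}
```

In a real TaskManager each slot's user-code classloader holds its own copy of such statics, which is why the heap dump shows the variable multiple times.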
[jira] [Closed] (FLINK-9930) Flink slot memory leak after restarting job with static variables
[ https://issues.apache.org/jira/browse/FLINK-9930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann closed FLINK-9930. Resolution: Not A Problem > Flink slot memory leak after restarting job with static variables > - > > Key: FLINK-9930 > URL: https://issues.apache.org/jira/browse/FLINK-9930 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.4.2 >Reporter: Martin Shang >Priority: Critical > > I find flink taskmanager exit caused by OOM ,and I dumped the head file. > > The result is that the static variables are not released when canceled the > job, and a new object is generated after I restarted the job. > The static variable is multiple for the slot memory isolation. > How could I make the variables collected by the Flink? > [multiple static variable|https://i.stack.imgur.com/KJcIN.jpg] > [path to gc|https://i.stack.imgur.com/bvrDF.jpg] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [flink] zentol commented on a change in pull request #7927: [FLINK-11603][metrics] Port the MetricQueryService to the new RpcEndpoint
zentol commented on a change in pull request #7927: [FLINK-11603][metrics] Port the MetricQueryService to the new RpcEndpoint URL: https://github.com/apache/flink/pull/7927#discussion_r270377183 ## File path: flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskExecutor.java ## @@ -851,8 +852,8 @@ public void heartbeatFromResourceManager(ResourceID resourceID) { } @Override - public CompletableFuture> requestMetricQueryServiceAddress(Time timeout) { - return CompletableFuture.completedFuture(SerializableOptional.ofNullable(metricQueryServicePath)); + public CompletableFuture> requestMetricQueryServiceGateway(Time timeout) { Review comment: This doesn't work at runtime since `Optional` is not serializable. Investigating alternatives. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
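The JDK behavior zentol refers to can be checked in isolation; this sketch uses only plain JDK classes (the payload string is arbitrary):

```java
import java.io.ByteArrayOutputStream;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.util.Optional;

public class OptionalSerializationDemo {
    public static void main(String[] args) throws Exception {
        // java.util.Optional does not implement Serializable, so sending it
        // through Java serialization (as an RPC payload would be) fails at runtime
        // even though the code compiles fine.
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(Optional.of("metric-query-service"));
            System.out.println("serialized (unexpected)");
        } catch (NotSerializableException e) {
            System.out.println("NotSerializableException: " + e.getMessage());
        }
    }
}
```

This is why Flink carries a `SerializableOptional` wrapper for such RPC return values, as seen in the code being reviewed.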
[jira] [Commented] (FLINK-9848) When Parallelism more than available task slots Flink stays idle
[ https://issues.apache.org/jira/browse/FLINK-9848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804860#comment-16804860 ] Till Rohrmann commented on FLINK-9848: -- Is this still an issue [~yazdanjs]? If not, then please close this JIRA issue. > When Parallelism more than available task slots Flink stays idle > > > Key: FLINK-9848 > URL: https://issues.apache.org/jira/browse/FLINK-9848 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.5.0, 1.5.1 >Reporter: Yazdan Shirvany >Assignee: Yazdan Shirvany >Priority: Critical > > For version 1.4.x, when selecting Parallelism > available task slots, Flink throws the below error right away > {{NoResourceAvailableException: Not enough free slots available to run the > job}} > > but for version 1.5.x there are two different behaviors: Sometimes the job runs > successfully and sometimes it throws the below error after 5 minutes > > {{org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: > Could not allocate all requires slots within timeout of 30 ms. Slots > required: 5, slots allocated: 2}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (FLINK-9710) Make ClusterClient be used as multiple instances in a single jvm process
[ https://issues.apache.org/jira/browse/FLINK-9710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann closed FLINK-9710. Resolution: Not A Problem The new {{RestClusterClient}} should be usable multiple times. Therefore, I will close this issue for the moment. Please re-open this issue if the new cluster client has the same problems. > Make ClusterClient be used as multiple instances in a single jvm process > > > Key: FLINK-9710 > URL: https://issues.apache.org/jira/browse/FLINK-9710 > Project: Flink > Issue Type: Improvement > Components: Command Line Client >Affects Versions: 1.4.2 >Reporter: Chuanlei Ni >Priority: Critical > > We can use `ClusterClient` to submit job, but it is designed for command. So > we cannot use this class in a long running jvm process which will create > multiple cluster client concurrently. > This Jira aims to make `ClusterClient` be used as multiple instances in a > single jvm process. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (FLINK-9132) Cluster runs out of task slots when a job falls into restart loop
[ https://issues.apache.org/jira/browse/FLINK-9132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann resolved FLINK-9132. -- Resolution: Won't Fix This problem should be fixed with a newer Flink version (>= 1.5). Please try this version and report back if it is not working. The community unfortunately no longer supports Flink 1.4.2. > Cluster runs out of task slots when a job falls into restart loop > - > > Key: FLINK-9132 > URL: https://issues.apache.org/jira/browse/FLINK-9132 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.4.2 > Environment: env.java.opts in flink-conf.yaml file: > > env.java.opts: -Xloggc:/home/user/flink/log/flinkServer-gc.log -verbose:gc > -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps > -XX:+UseG1GC -XX:MaxGCPauseMillis=150 -XX:InitiatingHeapOccupancyPercent=55 > -XX:+ParallelRefProcEnabled -XX:ParallelGCThreads=2 -XX:-ResizePLAB > -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=100M >Reporter: Alex Smirnov >Priority: Critical > Attachments: FailedJob.java, jconsole-classes.png > > > If there's a job which is restarting in a loop, then the Task Manager hosting it > goes down after some time. The Job Manager automatically assigns the job to > another Task Manager and the new Task Manager goes down as well. After some > time, all Task Managers are gone. The cluster becomes paralyzed. > I've attached to the TaskManager's java process using jconsole and noticed that the > number of loaded classes increases dramatically if a job is in a restarting > loop and restores from checkpoint. > See the attachment for the graph with G1GC enabled for the node. Standard GC > performs even worse - the task manager shuts down within 20 minutes after the > restart loop starts. > I've also attached a minimal program to reproduce the problem > > please let me know if additional information is required from me. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [flink] zentol commented on issue #7927: [FLINK-11603][metrics] Port the MetricQueryService to the new RpcEndpoint
zentol commented on issue #7927: [FLINK-11603][metrics] Port the MetricQueryService to the new RpcEndpoint URL: https://github.com/apache/flink/pull/7927#issuecomment-477968927 So this doesn't actually work in production since `MetricQueryService#start()` was never called in the `MetricRegistryImpl`. I will fix this and add another test for the MR to ensure this doesn't happen again. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Closed] (FLINK-8803) Mini Cluster Shutdown with HA unstable, causing test failures
[ https://issues.apache.org/jira/browse/FLINK-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann closed FLINK-8803. Resolution: Not A Problem Fix Version/s: (was: 1.6.5) Not a problem anymore, because the {{FlinkMiniCluster}} has been removed with FLINK-10540. > Mini Cluster Shutdown with HA unstable, causing test failures > - > > Key: FLINK-8803 > URL: https://issues.apache.org/jira/browse/FLINK-8803 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination, Tests >Reporter: Stephan Ewen >Priority: Critical > > When the {{FlinkMiniCluster}} is created for HA tests with ZooKeeper, the > shutdown is unstable. > It looks like ZooKeeper may be shut down before the JobManager is shut down, > causing the shutdown procedure of the JobManager (specifically > {{ZooKeeperSubmittedJobGraphStore.removeJobGraph}}) to block until tests time > out. > Full log: https://api.travis-ci.org/v3/job/346853707/log.txt > Note that no ZK threads are alive any more, seems ZK is shut down already. 
> Relevant Stack Traces: > {code} > "main" #1 prio=5 os_prio=0 tid=0x7f973800a800 nid=0x43b4 waiting on > condition [0x7f973eb0b000] >java.lang.Thread.State: TIMED_WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x8966cf18> (a > scala.concurrent.impl.Promise$CompletionLatch) > at > java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328) > at > scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:212) > at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:222) > at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:157) > at scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:169) > at scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:169) > at > scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) > at scala.concurrent.Await$.ready(package.scala:169) > at > org.apache.flink.runtime.minicluster.FlinkMiniCluster.startInternalShutdown(FlinkMiniCluster.scala:469) > at > org.apache.flink.runtime.minicluster.FlinkMiniCluster.stop(FlinkMiniCluster.scala:435) > at > org.apache.flink.runtime.minicluster.FlinkMiniCluster.closeAsync(FlinkMiniCluster.scala:719) > at > org.apache.flink.test.util.MiniClusterResource.after(MiniClusterResource.java:104) > at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:50) > ... 
> {code} > {code} > "flink-akka.actor.default-dispatcher-2" #1012 prio=5 os_prio=0 > tid=0x7f97394fa800 nid=0x3328 waiting on condition [0x7f971db29000] >java.lang.Thread.State: TIMED_WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x87f82a70> (a > java.util.concurrent.CountDownLatch$Sync) > at > java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328) > at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277) > at > org.apache.flink.shaded.curator.org.apache.curator.CuratorZookeeperClient.internalBlockUntilConnectedOrTimedOut(CuratorZookeeperClient.java:336) > at > org.apache.flink.shaded.curator.org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107) > at > org.apache.flink.shaded.curator.org.apache.curator.framework.imps.DeleteBuilderImpl.pathInForeground(DeleteBuilderImpl.java:241) > at > org.apache.flink.shaded.curator.org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:225) > at > org.apache.flink.shaded.curator.org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:35) > at > org.apache.flink.runtime.zookeeper.ZooKeeperStateHandleStore.release(ZooKeeperStateHandleStore.java:478) > at > org.apache.flink.runtime.zookeeper.ZooKeeperStateHandleStore.releaseAndTryRemove(ZooKeeperStateHandleStore.java:435) > at > org.apache.flink.runtime.zookeeper.ZooKeeperStateHandleStore.releaseAndTryRemove(ZooKeeperStateHandleStore.java:405) > at > org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore.removeJobGraph(ZooKeeperSubmittedJobGraphStore.java:266) > - locked
[jira] [Closed] (FLINK-10735) flink on yarn close container exception
[ https://issues.apache.org/jira/browse/FLINK-10735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann closed FLINK-10735. - Resolution: Duplicate Should be fixed with FLINK-10848. > flink on yarn close container exception > --- > > Key: FLINK-10735 > URL: https://issues.apache.org/jira/browse/FLINK-10735 > Project: Flink > Issue Type: Bug > Components: Deployment / YARN, Runtime / Coordination >Affects Versions: 1.6.2 > Environment: Hadoop 2.7 > flink 1.6.2 >Reporter: Fei Feng >Priority: Critical > Labels: yarn > > Flink on YARN in detached mode: when cancelling a Flink job, the YARN resources are released very slowly! > If the job fails and continuously restarts, it will acquire more and more containers > until the resources are used up. > Log: > 18/10/31 19:43:59 INFO executiongraph.ExecutionGraph: Job > 32F01A0FC50EFE8F4794AD0C45678EC4: xxx switched from state RUNNING to > CANCELLING. > 18/10/31 19:43:59 INFO executiongraph.ExecutionGraph: Source: > Kafka09TableSource(SS_SOLD_DATE_SK, SS_SOLD_TIME_SK, SS_ITEM_SK, > SS_CUSTOMER_SK, SS_CDEMO_SK, SS_HDEMO_SK, SS_ADDR_SK, SS_STORE_SK, > SS_PROMO_SK, SS_TICKET_NUMBER, SS_QUANTITY, SS_WHOLESALE_COST, SS_LIST_PRICE, > SS_SALES_PRICE, SS_EXT_DISCOUNT_AMT, SS_EXT_SALES_PRICE, > SS_EXT_WHOLESALE_COST, SS_EXT_LIST_PRICE, SS_EXT_TAX, SS_COUPON_AMT, > SS_NET_PAID, SS_NET_PAID_INC_TAX, SS_NET_PROFIT, ROWTIME) -> from: > (SS_WHOLESALE_COST, SS_SALES_PRICE, SS_COUPON_AMT, SS_NET_PAID, > SS_NET_PROFIT, ROWTIME) -> Timestamps/Watermarks -> where: (>(SS_COUPON_AMT, > 0)), select: (ROWTIME, SS_WHOLESALE_COST, SS_SALES_PRICE, SS_NET_PAID, > SS_NET_PROFIT) -> time attribute: (ROWTIME) (1/3) > (0807b5f291f897ac4545dbfdb8ec3448) switched from RUNNING to CANCELING. 
> 18/10/31 19:43:59 INFO executiongraph.ExecutionGraph: Source: > Kafka09TableSource(SS_SOLD_DATE_SK, SS_SOLD_TIME_SK, SS_ITEM_SK, > SS_CUSTOMER_SK, SS_CDEMO_SK, SS_HDEMO_SK, SS_ADDR_SK, SS_STORE_SK, > SS_PROMO_SK, SS_TICKET_NUMBER, SS_QUANTITY, SS_WHOLESALE_COST, SS_LIST_PRICE, > SS_SALES_PRICE, SS_EXT_DISCOUNT_AMT, SS_EXT_SALES_PRICE, > SS_EXT_WHOLESALE_COST, SS_EXT_LIST_PRICE, SS_EXT_TAX, SS_COUPON_AMT, > SS_NET_PAID, SS_NET_PAID_INC_TAX, SS_NET_PROFIT, ROWTIME) -> from: > (SS_WHOLESALE_COST, SS_SALES_PRICE, SS_COUPON_AMT, SS_NET_PAID, > SS_NET_PROFIT, ROWTIME) -> Timestamps/Watermarks -> where: (>(SS_COUPON_AMT, > 0)), select: (ROWTIME, SS_WHOLESALE_COST, SS_SALES_PRICE, SS_NET_PAID, > SS_NET_PROFIT) -> time attribute: (ROWTIME) (2/3) > (a56a70eb6807dacf18fbf272ee6160e2) switched from RUNNING to CANCELING. > 18/10/31 19:43:59 INFO executiongraph.ExecutionGraph: Source: > Kafka09TableSource(SS_SOLD_DATE_SK, SS_SOLD_TIME_SK, SS_ITEM_SK, > SS_CUSTOMER_SK, SS_CDEMO_SK, SS_HDEMO_SK, SS_ADDR_SK, SS_STORE_SK, > SS_PROMO_SK, SS_TICKET_NUMBER, SS_QUANTITY, SS_WHOLESALE_COST, SS_LIST_PRICE, > SS_SALES_PRICE, SS_EXT_DISCOUNT_AMT, SS_EXT_SALES_PRICE, > SS_EXT_WHOLESALE_COST, SS_EXT_LIST_PRICE, SS_EXT_TAX, SS_COUPON_AMT, > SS_NET_PAID, SS_NET_PAID_INC_TAX, SS_NET_PROFIT, ROWTIME) -> from: > (SS_WHOLESALE_COST, SS_SALES_PRICE, SS_COUPON_AMT, SS_NET_PAID, > SS_NET_PROFIT, ROWTIME) -> Timestamps/Watermarks -> where: (>(SS_COUPON_AMT, > 0)), select: (ROWTIME, SS_WHOLESALE_COST, SS_SALES_PRICE, SS_NET_PAID, > SS_NET_PROFIT) -> time attribute: (ROWTIME) (3/3) > (a2eca3dc06087dfdaf3fd0200e545cc4) switched from RUNNING to CANCELING. 
> 18/10/31 19:43:59 INFO executiongraph.ExecutionGraph: window: > (TumblingGroupWindow('w$, 'ROWTIME, 360.millis)), select: > (SUM(SS_WHOLESALE_COST) AS EXPR$1, SUM(SS_SALES_PRICE) AS EXPR$2, > SUM(SS_NET_PAID) AS EXPR$3, SUM(SS_NET_PROFIT) AS EXPR$4, start('w$) AS > w$start, end('w$) AS w$end, rowtime('w$) AS w$rowtime, proctime('w$) AS > w$proctime) -> where: (>(EXPR$4, 1000)), select: (CAST(w$start) AS WSTART, > EXPR$1, EXPR$2, EXPR$3, EXPR$4) -> to: Row (1/1) > (9fa3592c74eda97124a033b6afea6c87) switched from RUNNING to CANCELING. > 18/10/31 19:43:59 INFO executiongraph.ExecutionGraph: Sink: > JDBCAppendTableSink(RS_DAY, RS_WHOLESALE_COST, RS_SALES_PRICE, RS_NET_PAID, > RS_NET_PROFIT) (1/3) (6d4f413ace1206714566b610e5bf47b6) switched from RUNNING > to CANCELING. > 18/10/31 19:43:59 INFO executiongraph.ExecutionGraph: Sink: > JDBCAppendTableSink(RS_DAY, RS_WHOLESALE_COST, RS_SALES_PRICE, RS_NET_PAID, > RS_NET_PROFIT) (2/3) (8871efc7c19abae7f833e02aa1d2107d) switched from RUNNING > to CANCELING. > 18/10/31 19:43:59 INFO executiongraph.ExecutionGraph: Sink: > JDBCAppendTableSink(RS_DAY, RS_WHOLESALE_COST, RS_SALES_PRICE, RS_NET_PAID, > RS_NET_PROFIT) (3/3) (fcaf0b9247d60a31e3381d70871d929f) switched from RUNNING > to CANCELING. > 18/10/31 19:43:59 INFO executiongraph.ExecutionGraph: Source: > Kafka09TableSource(SS_SOLD_DATE_SK,
[jira] [Commented] (FLINK-9307) Running flink program doesn't print infos in console any more
[ https://issues.apache.org/jira/browse/FLINK-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804850#comment-16804850 ] Till Rohrmann commented on FLINK-9307: -- Is this issue still valid [~Tison]? > Running flink program doesn't print infos in console any more > - > > Key: FLINK-9307 > URL: https://issues.apache.org/jira/browse/FLINK-9307 > Project: Flink > Issue Type: Improvement > Components: Command Line Client >Affects Versions: 1.6.0 > Environment: Mac OS X 12.10.6 > Flink 1.6-SNAPSHOT commit c8fa8d025684c222582 . >Reporter: TisonKun >Priority: Critical > > As shown in [this > page|https://ci.apache.org/projects/flink/flink-docs-master/quickstart/setup_quickstart.html#run-the-example], > when flink run program, it should have printed infos like > {code:java} > Cluster configuration: Standalone cluster with JobManager at /127.0.0.1:6123 > Using address 127.0.0.1:6123 to connect to JobManager. > JobManager web interface address http://127.0.0.1:8081 > Starting execution of program > Submitting job with JobID: 574a10c8debda3dccd0c78a3bde55e1b. Waiting for > job completion. > Connected to JobManager at > Actor[akka.tcp://flink@127.0.0.1:6123/user/jobmanager#297388688] > 11/04/2016 14:04:50 Job execution switched to status RUNNING. 
> 11/04/2016 14:04:50 Source: Socket Stream -> Flat Map(1/1) switched to > SCHEDULED > 11/04/2016 14:04:50 Source: Socket Stream -> Flat Map(1/1) switched to > DEPLOYING > 11/04/2016 14:04:50 Fast TumblingProcessingTimeWindows(5000) of > WindowedStream.main(SocketWindowWordCount.java:79) -> Sink: Unnamed(1/1) > switched to SCHEDULED > 11/04/2016 14:04:51 Fast TumblingProcessingTimeWindows(5000) of > WindowedStream.main(SocketWindowWordCount.java:79) -> Sink: Unnamed(1/1) > switched to DEPLOYING > 11/04/2016 14:04:51 Fast TumblingProcessingTimeWindows(5000) of > WindowedStream.main(SocketWindowWordCount.java:79) -> Sink: Unnamed(1/1) > switched to RUNNING > 11/04/2016 14:04:51 Source: Socket Stream -> Flat Map(1/1) switched to > RUNNING > {code} > but when I follow the same command, it just printed > {code:java} > Starting execution of program > {code} > Those message are printed properly if I use flink 1.4, and also logged at > `log/flink--taskexecutor-0-localhost.log` when running on > 1.6-SNAPSHOT. > Now it breaks [this pull request|https://github.com/apache/flink/pull/5957] > and [~Zentol] suspects it's a regression. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-6949) Add ability to ship custom resource files to YARN cluster
[ https://issues.apache.org/jira/browse/FLINK-6949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804849#comment-16804849 ] Mikhail Pryakhin commented on FLINK-6949: - Hi [~till.rohrmann], Yes, unfortunately it is still an issue. Sorry for being inactive for a long time, I hope to contribute to a solution soon > Add ability to ship custom resource files to YARN cluster > - > > Key: FLINK-6949 > URL: https://issues.apache.org/jira/browse/FLINK-6949 > Project: Flink > Issue Type: Improvement > Components: Command Line Client, Deployment / YARN >Affects Versions: 1.3.0 >Reporter: Mikhail Pryakhin >Assignee: Mikhail Pryakhin >Priority: Major > > *The problem:* > When deploying a flink job on YARN it is not possible to specify custom > resource files to be shipped to YARN cluster. > > *The use case description:* > When running a flink job on multiple environments it becomes necessary to > pass environment-related configuration files to the job's runtime. It can be > accomplished by packaging configuration files within the job's jar. But > having tens of different environments one can easily end up packaging as many > jars as there are environments. It would be great to have an ability to > separate configuration files from the job artifacts. > > *The possible solution:* > add the --yarnship-files option to flink cli to specify files that should be > shipped to the YARN cluster. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-7990) Strange behavior when configuring Logback for logging
[ https://issues.apache.org/jira/browse/FLINK-7990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804845#comment-16804845 ] Till Rohrmann commented on FLINK-7990: -- Is this issue still valid [~fhueske]? If not, then please close it. > Strange behavior when configuring Logback for logging > - > > Key: FLINK-7990 > URL: https://issues.apache.org/jira/browse/FLINK-7990 > Project: Flink > Issue Type: Bug > Components: Runtime / Configuration >Affects Versions: 1.3.2 >Reporter: Fabian Hueske >Priority: Major > > The following issue was reported on the [user > mailinglist|https://lists.apache.org/thread.html/c06a9f0b1189bf21d946d3d9728631295c88bfc57043cdbe18409d52@%3Cuser.flink.apache.org%3E] > {quote} > I have a single node Flink instance which has the required jars for logback > in the lib folder (logback-classic.jar, logback-core.jar, > log4j-over-slf4j.jar). I have removed the jars for log4j from the lib folder > (log4j-1.2.17.jar, slf4j-log4j12-1.7.7.jar). 'logback.xml' is also correctly > updated in 'conf' folder. I have also included 'logback.xml' in the > classpath, although this does not seem to be considered while the job is run. > Flink refers to logback.xml inside the conf folder only. I have updated > pom.xml as per Flink's documentation in order to exclude log4j. I have some > log entries set inside a few map and flatmap functions and some log entries > outside those functions (eg: "program execution started"). > When I run the job, Flink writes only those logs that are coded outside the > transformations. Those logs that are coded inside the transformations (map, > flatmap etc) are not getting written to the log file. If this was happening > always, I could have assumed that the Task Manager is not writing the logs. > But Flink displays a strange behavior regarding this. 
Whenever I update the > logback jars inside the lib folder (due to version changes), during the > next job run, all logs (even those inside map and flatmap) are written > correctly into the log file. But the logs don't get written in any of the > runs after that. This means that my 'logback.xml' file is correct and the > settings are also correct. But I don't understand why the same settings don't > work when the same job is run again. > {quote}
[jira] [Updated] (FLINK-6206) Log task state transitions as warn/error for FAILURE scenarios
[ https://issues.apache.org/jira/browse/FLINK-6206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-6206: - Priority: Minor (was: Critical) > Log task state transitions as warn/error for FAILURE scenarios > -- > > Key: FLINK-6206 > URL: https://issues.apache.org/jira/browse/FLINK-6206 > Project: Flink > Issue Type: Improvement > Components: Runtime / Coordination > Affects Versions: 1.2.0 > Reporter: Dan Bress > Priority: Minor > Labels: pull-request-available > > If a task fails due to an exception, I would like that to be logged at a WARN or an ERROR level; currently it is logged at INFO.
> {code}
> private boolean transitionState(ExecutionState currentState, ExecutionState newState, Throwable cause) {
>     if (STATE_UPDATER.compareAndSet(this, currentState, newState)) {
>         if (cause == null) {
>             LOG.info("{} ({}) switched from {} to {}.", taskNameWithSubtask, executionId, currentState, newState);
>         } else {
>             LOG.info("{} ({}) switched from {} to {}.", taskNameWithSubtask, executionId, currentState, newState, cause);
>         }
>         return true;
>     } else {
>         return false;
>     }
> }
> {code}
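The change requested in the issue above could be sketched as a small level-selection helper, choosing the log level from the presence of a failure cause. The enum and helper names are illustrative, not Flink's actual Task code:

```java
// Sketch of the FLINK-6206 proposal: transitions that carry a failure
// cause are logged at ERROR, all other transitions stay at INFO.
public class TransitionLogLevel {

    enum Level { INFO, ERROR }

    // Decide the level for a state-transition log line.
    static Level levelFor(Throwable cause) {
        return cause == null ? Level.INFO : Level.ERROR;
    }

    public static void main(String[] args) {
        System.out.println(levelFor(null));                         // INFO
        System.out.println(levelFor(new RuntimeException("boom"))); // ERROR
    }
}
```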
[jira] [Updated] (FLINK-7990) Strange behavior when configuring Logback for logging
[ https://issues.apache.org/jira/browse/FLINK-7990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-7990: - Priority: Major (was: Critical) > Strange behavior when configuring Logback for logging > - > > Key: FLINK-7990 > URL: https://issues.apache.org/jira/browse/FLINK-7990 > Project: Flink > Issue Type: Bug > Components: Runtime / Configuration > Affects Versions: 1.3.2 > Reporter: Fabian Hueske > Priority: Major
[GitHub] [flink] zentol commented on a change in pull request #8002: [FLINK-11923][metrics] MetricRegistryConfiguration provides MetricReporters Suppliers
zentol commented on a change in pull request #8002: [FLINK-11923][metrics] MetricRegistryConfiguration provides MetricReporters Suppliers URL: https://github.com/apache/flink/pull/8002#discussion_r270372947 ## File path: flink-runtime/src/main/java/org/apache/flink/runtime/metrics/MetricRegistryImpl.java ## @@ -144,11 +135,9 @@ public MetricRegistryImpl(MetricRegistryConfiguration config) { } } - Class reporterClass = Class.forName(className); - MetricReporter reporterInstance = (MetricReporter) reporterClass.newInstance(); + final MetricReporter reporterInstance = reporterSetup.getSupplier().get(); Review comment: see https://github.com/zentol/flink/commit/b5feb554a965913255c3a7f3ab94430bb37d6c52#diff-65246650592087a2e5a816275f23dcc3 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] zentol commented on a change in pull request #8002: [FLINK-11923][metrics] MetricRegistryConfiguration provides MetricReporters Suppliers
zentol commented on a change in pull request #8002: [FLINK-11923][metrics] MetricRegistryConfiguration provides MetricReporters Suppliers URL: https://github.com/apache/flink/pull/8002#discussion_r270372880 ## File path: flink-runtime/src/main/java/org/apache/flink/runtime/metrics/MetricRegistryImpl.java ## @@ -144,11 +135,9 @@ public MetricRegistryImpl(MetricRegistryConfiguration config) { } } - Class reporterClass = Class.forName(className); - MetricReporter reporterInstance = (MetricReporter) reporterClass.newInstance(); + final MetricReporter reporterInstance = reporterSetup.getSupplier().get(); Review comment: In FLINK-11922 I want to add support for reporter factories, with which supported reporters would allocate resources in their constructor instead of `open`. I don't want that to occur during configuration creation, hence the supplier.
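The supplier-based setup discussed in this thread can be sketched as follows; the `MetricReporter` shape is reduced to a single method and all names are illustrative rather than Flink's actual `ReporterSetup` API:

```java
import java.util.function.Supplier;

// Sketch of deferring reporter instantiation behind a Supplier, so that
// reporters which allocate resources in their constructor are created
// only by the registry, not while the configuration is parsed.
public class SupplierSetupDemo {

    interface MetricReporter { String reporterName(); }

    // What the registry would do instead of reflectively calling
    // Class.forName(className).newInstance().
    static MetricReporter instantiate(Supplier<MetricReporter> supplier) {
        return supplier.get();
    }

    public static void main(String[] args) {
        // The configuration only captures *how* to build the reporter.
        Supplier<MetricReporter> setup = () -> () -> "jmx";
        System.out.println(instantiate(setup).reporterName()); // prints "jmx"
    }
}
```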
[jira] [Resolved] (FLINK-6125) Commons httpclient is not shaded anymore in Flink 1.2
[ https://issues.apache.org/jira/browse/FLINK-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann resolved FLINK-6125. -- Resolution: Later Closed because of inactivity. If this is still a problem, please re-open this issue. > Commons httpclient is not shaded anymore in Flink 1.2 > - > > Key: FLINK-6125 > URL: https://issues.apache.org/jira/browse/FLINK-6125 > Project: Flink > Issue Type: Bug > Components: Build System, Connectors / Kinesis > Reporter: Robert Metzger > Priority: Critical > > This has been reported by a user: > http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Return-of-Flink-shading-problems-in-1-2-0-td12257.html > The Kinesis connector requires Flink to not expose any httpclient > dependencies. Since Flink 1.2 it seems that we are exposing that dependency > again
[jira] [Commented] (FLINK-6949) Add ability to ship custom resource files to YARN cluster
[ https://issues.apache.org/jira/browse/FLINK-6949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804842#comment-16804842 ] Till Rohrmann commented on FLINK-6949: -- Hi [~m.pryahin], is this still an issue or has this been fixed by now? > Add ability to ship custom resource files to YARN cluster > - > > Key: FLINK-6949 > URL: https://issues.apache.org/jira/browse/FLINK-6949 > Project: Flink > Issue Type: Improvement > Components: Command Line Client, Deployment / YARN > Affects Versions: 1.3.0 > Reporter: Mikhail Pryakhin > Assignee: Mikhail Pryakhin > Priority: Major
[jira] [Closed] (FLINK-7351) test instability in JobClientActorRecoveryITCase#testJobClientRecovery
[ https://issues.apache.org/jira/browse/FLINK-7351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann closed FLINK-7351. Resolution: Not A Problem Fix Version/s: (was: 1.6.5) Not a problem anymore with Flip-6 > test instability in JobClientActorRecoveryITCase#testJobClientRecovery > -- > > Key: FLINK-7351 > URL: https://issues.apache.org/jira/browse/FLINK-7351 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination, Tests >Affects Versions: 1.3.2, 1.4.0 >Reporter: Nico Kruber >Priority: Critical > Labels: test-stability > > On a 16-core VM, the following test failed during {{mvn clean verify}} > {code} > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 22.814 sec > <<< FAILURE! - in org.apache.flink.runtime.client.JobClientActorRecoveryITCase > testJobClientRecovery(org.apache.flink.runtime.client.JobClientActorRecoveryITCase) > Time elapsed: 21.299 sec <<< ERROR! > org.apache.flink.runtime.client.JobExecutionException: Job execution failed. 
> at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply$mcV$sp(JobManager.scala:933)
> at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:876)
> at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:876)
> at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
> at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
> at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
> at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not enough free slots available to run the job. You can decrease the operator parallelism or increase the number of slots per TaskManager in the configuration. Resources available to scheduler: Number of instances=0, total number of slots=0, available slots=0
> at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:334)
> at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.allocateSlot(Scheduler.java:139)
> at org.apache.flink.runtime.executiongraph.Execution.allocateSlotForExecution(Execution.java:368)
> at org.apache.flink.runtime.executiongraph.Execution.scheduleForExecution(Execution.java:309)
> at org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForExecution(ExecutionVertex.java:596)
> at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.scheduleAll(ExecutionJobVertex.java:450)
> at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleLazy(ExecutionGraph.java:834)
> at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:814)
> at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1425)
> at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372)
> at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372)
> at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
> at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
> at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
> at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> {code}
[jira] [Updated] (FLINK-6949) Add ability to ship custom resource files to YARN cluster
[ https://issues.apache.org/jira/browse/FLINK-6949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-6949: - Priority: Major (was: Critical) > Add ability to ship custom resource files to YARN cluster > - > > Key: FLINK-6949 > URL: https://issues.apache.org/jira/browse/FLINK-6949 > Project: Flink > Issue Type: Improvement > Components: Command Line Client, Deployment / YARN > Affects Versions: 1.3.0 > Reporter: Mikhail Pryakhin > Assignee: Mikhail Pryakhin > Priority: Major
[jira] [Closed] (FLINK-7610) xerces lib conflict
[ https://issues.apache.org/jira/browse/FLINK-7610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann closed FLINK-7610. Resolution: Resolved Should be resolved with child-first classloading. > xerces lib conflict > --- > > Key: FLINK-7610 > URL: https://issues.apache.org/jira/browse/FLINK-7610 > Project: Flink > Issue Type: Bug > Components: Build System >Affects Versions: 1.3.2 >Reporter: Domenico Campagnolo >Priority: Critical > > Flink job fails when I try to use the method validate((new > StAXSource(reader))) from the class _javax.xml.validation.Validator_ because > StAXSource is [supported since xerces > 2.10|https://stackoverflow.com/questions/20493359/staxsource-not-accepted-by-validator-in-jboss-eap-6-1] > and the library _flink-shaded-hadoop2-uber-1.3.2_ contains the 2.9.1 version > in it. > I also included the dependency to the latest version of xerces 2.11.0 into my > job-jar but it still does not work. > I also added that library into _%FLINK_HOME%/lib/_ folder hoping it would be > loaded on the classpath at the boot, overriding the original one, but it does > not work either. > See the cause: > _Caused by: java.lang.IllegalArgumentException: *Source parameter of type > 'javax.xml.transform.stax.StAXSource' is not accepted by this validator*._ > {code:java} > org.apache.flink.client.program.ProgramInvocationException: The main method > caused an error. 
> at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:545)
> at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:419)
> at org.apache.flink.client.program.OptimizerPlanEnvironment.getOptimizedPlan(OptimizerPlanEnvironment.java:80)
> at org.apache.flink.client.program.ClusterClient.getOptimizedPlan(ClusterClient.java:318)
> at org.apache.flink.runtime.webmonitor.handlers.JarActionHandler.getJobGraphAndClassLoader(JarActionHandler.java:72)
> at org.apache.flink.runtime.webmonitor.handlers.JarRunHandler.handleJsonRequest(JarRunHandler.java:61)
> at org.apache.flink.runtime.webmonitor.handlers.AbstractJsonRequestHandler.handleRequest(AbstractJsonRequestHandler.java:41)
> at org.apache.flink.runtime.webmonitor.RuntimeMonitorHandler.respondAsLeader(RuntimeMonitorHandler.java:109)
> at org.apache.flink.runtime.webmonitor.RuntimeMonitorHandlerBase.channelRead0(RuntimeMonitorHandlerBase.java:97)
> at org.apache.flink.runtime.webmonitor.RuntimeMonitorHandlerBase.channelRead0(RuntimeMonitorHandlerBase.java:44)
> at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
> at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
> at io.netty.handler.codec.http.router.Handler.routed(Handler.java:62)
> at io.netty.handler.codec.http.router.DualAbstractHandler.channelRead0(DualAbstractHandler.java:57)
> at io.netty.handler.codec.http.router.DualAbstractHandler.channelRead0(DualAbstractHandler.java:20)
> at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
> at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
> at org.apache.flink.runtime.webmonitor.HttpRequestHandler.channelRead0(HttpRequestHandler.java:159)
> at org.apache.flink.runtime.webmonitor.HttpRequestHandler.channelRead0(HttpRequestHandler.java:65)
> at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
> at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
> at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
> at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
> at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)
> at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:147)
> at >
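The failing call from FLINK-7610 can be reproduced in isolation with the standard JAXP API: on a JDK whose bundled Xerces supports `StAXSource` (2.10+) the validation succeeds, while the shaded Xerces 2.9.1 throws the `IllegalArgumentException` quoted above. The inline schema and document below are illustrative, not from the reporter's job:

```java
import java.io.StringReader;

import javax.xml.XMLConstants;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

public class StaxValidation {

    public static void main(String[] args) throws Exception {
        // Minimal illustrative schema: a single string element.
        String xsd = "<?xml version=\"1.0\"?>"
            + "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
            + "<xs:element name=\"greeting\" type=\"xs:string\"/>"
            + "</xs:schema>";
        String xml = "<greeting>hello</greeting>";

        Schema schema = SchemaFactory
            .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
            .newSchema(new StreamSource(new StringReader(xsd)));
        Validator validator = schema.newValidator();

        // The call that fails with Xerces 2.9.1:
        // "Source parameter of type 'javax.xml.transform.stax.StAXSource'
        //  is not accepted by this validator."
        XMLStreamReader reader = XMLInputFactory.newFactory()
            .createXMLStreamReader(new StringReader(xml));
        validator.validate(new StAXSource(reader));
        System.out.println("validated");
    }
}
```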
[jira] [Updated] (FLINK-7040) Flip-6 client-cluster communication
[ https://issues.apache.org/jira/browse/FLINK-7040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-7040: - Priority: Major (was: Critical) > Flip-6 client-cluster communication > --- > > Key: FLINK-7040 > URL: https://issues.apache.org/jira/browse/FLINK-7040 > Project: Flink > Issue Type: New Feature > Components: Deployment / Mesos, Runtime / Coordination >Reporter: Till Rohrmann >Assignee: Chesnay Schepler >Priority: Major > Labels: flip-6 > > With the new Flip-6 architecture, the client will communicate with the > cluster in a RESTful manner. > The cluster shall support the following REST calls: > * List jobs (GET): Get list of all running jobs on the cluster > * Submit job (POST): Submit a job to the cluster (only supported in session > mode) > * Lookup job leader (GET): Gets the JM leader for the given job > * Get job status (GET): Get the status of an executed job (and maybe the > JobExecutionResult) > * Cancel job (PUT): Cancel the given job > * Stop job (PUT): Stops the given job > * Take savepoint (POST): Take savepoint for given job (How to return the > savepoint under which the savepoint was stored? Maybe always having to > specify a path) > * Get KV state (GET): Gets the KV state for the given job and key (Queryable > state) > * Poll/subscribe to notifications for job (GET, WebSocket): Polls new > notifications from the execution of the given job/Opens WebSocket to receive > notifications > The first four REST calls will be served by the REST endpoint running in the > application master/cluster entrypoint. The other calls will be served by a > REST endpoint running along side to the JobManager. > Detailed information about different implementations and their pros and cons > can be found in this document: > https://docs.google.com/document/d/1eIX6FS9stwraRdSUgRSuLXC1sL7NAmxtuqIXe_jSi-k/edit?usp=sharing > The implementation will most likely be Netty based. 
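The REST calls listed above map onto a verb/path table roughly like the following sketch; the concrete paths are hypothetical, since the issue does not fix any URLs:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative verb/path mapping for the Flip-6 REST calls listed in
// FLINK-7040. The paths are sketches, not Flink's final REST API.
public class Flip6RestCalls {

    static Map<String, String> calls() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("GET /jobs", "list all running jobs");
        m.put("POST /jobs", "submit a job (session mode only)");
        m.put("GET /jobs/:jobid/leader", "look up the JM leader for a job");
        m.put("GET /jobs/:jobid/status", "get status of an executed job");
        m.put("PUT /jobs/:jobid/cancel", "cancel the job");
        m.put("PUT /jobs/:jobid/stop", "stop the job");
        m.put("POST /jobs/:jobid/savepoints", "trigger a savepoint");
        m.put("GET /jobs/:jobid/state/:key", "query KV state");
        return m;
    }

    public static void main(String[] args) {
        calls().forEach((call, what) -> System.out.println(call + " -> " + what));
    }
}
```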
[jira] [Updated] (FLINK-8641) Move BootstrapTools#getTaskManagerShellCommand to flink-yarn
[ https://issues.apache.org/jira/browse/FLINK-8641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-8641: - Priority: Major (was: Critical) > Move BootstrapTools#getTaskManagerShellCommand to flink-yarn > > > Key: FLINK-8641 > URL: https://issues.apache.org/jira/browse/FLINK-8641 > Project: Flink > Issue Type: Improvement > Components: Deployment / YARN, Runtime / Configuration > Affects Versions: 1.5.0 > Reporter: Chesnay Schepler > Priority: Major > > I would like to move {{getTaskManagerShellCommand()}} and > {{getStartCommand()}} from > {{org.apache.flink.runtime.clusterframework.BootstrapTools}} in flink-runtime > to flink-yarn. > Yarn is the sole user of these methods, and both methods are directly related > to the {{YARN_CONTAINER_START_COMMAND_TEMPLATE}} {{ConfigConstants}} > We can't move this constant to {{YarnOptions}} at this point since the > {{YarnOptions}} are in {{flink-yarn}}, but the above methods require the > option to be accessible from {{flink-runtime}}. > [~till.rohrmann] Do you see any problems that this move could cause?
[jira] [Resolved] (FLINK-5536) Config option: HA
[ https://issues.apache.org/jira/browse/FLINK-5536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann resolved FLINK-5536. -- Resolution: Incomplete Parent issue closed. Please re-open if you want to pick it up again. > Config option: HA > - > > Key: FLINK-5536 > URL: https://issues.apache.org/jira/browse/FLINK-5536 > Project: Flink > Issue Type: Sub-task > Components: Deployment / Mesos > Reporter: Eron Wright > Assignee: Stavros Kontopoulos > Priority: Major > > Configure Flink HA through package options plus good defaults. The main > components are ZK configuration and state backend configuration. > - The ZK information can be defaulted to `master.mesos` as with other packages > - Evaluate whether ZK can be fully configured by default, even if a state > backend isn't configured. > - Use DCOS HDFS as the filesystem for the state backend. Evaluate whether to > assume that DCOS HDFS is installed by default, or whether to make it explicit. > - To use DCOS HDFS, the init script should download the core-site.xml and > hdfs-site.xml from the HDFS 'connection' endpoint. Supply a default value > for the endpoint address; see > [https://docs.mesosphere.com/service-docs/hdfs/connecting-clients/].
[jira] [Resolved] (FLINK-5534) Config option: 'flink-options'
[ https://issues.apache.org/jira/browse/FLINK-5534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann resolved FLINK-5534. -- Resolution: Incomplete Parent issue closed. Please re-open if you want to pick it up again. > Config option: 'flink-options' > -- > > Key: FLINK-5534 > URL: https://issues.apache.org/jira/browse/FLINK-5534 > Project: Flink > Issue Type: Sub-task > Components: Deployment / Mesos > Reporter: Eron Wright > Priority: Critical
[jira] [Resolved] (FLINK-5540) CLI: savepoint
[ https://issues.apache.org/jira/browse/FLINK-5540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann resolved FLINK-5540. -- Resolution: Incomplete Parent issue closed. Please re-open if you want to pick it up again. > CLI: savepoint > -- > > Key: FLINK-5540 > URL: https://issues.apache.org/jira/browse/FLINK-5540 > Project: Flink > Issue Type: Sub-task > Components: Deployment / Mesos > Reporter: Eron Wright > Priority: Major > > Implement CLI support for savepoints, in both the 'run' and 'cancel' > operations.