[GitHub] [flink] flinkbot commented on issue #8083: [FLINK-12050][Tests] BlockingShutdownTest fails on Java 9

2019-03-29 Thread GitBox
flinkbot commented on issue #8083: [FLINK-12050][Tests] BlockingShutdownTest 
fails on Java 9
URL: https://github.com/apache/flink/pull/8083#issuecomment-478208078
 
 
   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from a committer.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/reviewing-prs.html) for a full explanation of 
the review process.
The bot is tracking the review progress through labels, which are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.

## Bot commands

The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] leesf opened a new pull request #8083: [FLINK-12050][Tests] BlockingShutdownTest fails on Java 9

2019-03-29 Thread GitBox
leesf opened a new pull request #8083: [FLINK-12050][Tests] 
BlockingShutdownTest fails on Java 9
URL: https://github.com/apache/flink/pull/8083
 
 
   
   ## What is the purpose of the change
   
   Fix error in BlockingShutdownTest on Java 9.
   
   
   ## Brief change log
   
   Add a `clazz.getName().equals("java.lang.ProcessImpl")` check to the if condition.
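   For context, the reflective PID lookup that this check extends can be sketched as follows. This is a hypothetical reconstruction, not the actual Flink test code: on Java 8 the Unix process class is `java.lang.UNIXProcess`, while on Java 9 it was merged into `java.lang.ProcessImpl`, so the class-name check must accept both.

```java
import java.lang.reflect.Field;

// Hypothetical sketch of the reflective PID lookup the fix extends.
// On Java 8 the Unix implementation class is java.lang.UNIXProcess;
// on Java 9+ it is java.lang.ProcessImpl, so both names are accepted.
public class PidSketch {

    public static long getProcessId(Process process) {
        Class<?> clazz = process.getClass();
        if (clazz.getName().equals("java.lang.UNIXProcess")
                || clazz.getName().equals("java.lang.ProcessImpl")) {
            try {
                Field pidField = clazz.getDeclaredField("pid");
                pidField.setAccessible(true);
                return pidField.getInt(process);
            } catch (Exception e) {
                // e.g. the module system blocks the reflective access,
                // or the field does not exist (Windows implementation)
                return -1;
            }
        }
        return -1; // cannot determine process ID
    }

    public static void main(String[] args) throws Exception {
        Process p = new ProcessBuilder("sleep", "1").start();
        System.out.println("pid = " + getProcessId(p));
        p.waitFor();
    }
}
```

   Note that on Java 9 and later, new code would not need this reflection hack at all: `Process#pid()`, backed by `ProcessHandle`, returns the ID directly.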
   
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
 - The serializers: (no)
 - The runtime per-record code paths (performance sensitive): (no)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
 - The S3 file system connector: (no)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (no)
 - If yes, how is the feature documented? (not documented)
   




[jira] [Updated] (FLINK-12050) BlockingShutdownTest fails on Java 9

2019-03-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-12050:
---
Labels: pull-request-available  (was: )

> BlockingShutdownTest fails on Java 9
> 
>
> Key: FLINK-12050
> URL: https://issues.apache.org/jira/browse/FLINK-12050
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.9.0
>Reporter: Chesnay Schepler
>Assignee: leesf
>Priority: Major
>  Labels: pull-request-available
>
> {code}
> 21:21:28.689 [ERROR] 
> testProcessShutdownBlocking(org.apache.flink.runtime.util.BlockingShutdownTest)
>   Time elapsed: 0.961 s  <<< FAILURE!
> java.lang.AssertionError: Cannot determine process ID
>   at 
> org.apache.flink.runtime.util.BlockingShutdownTest.testProcessShutdownBlocking(BlockingShutdownTest.java:57)
> 21:21:28.689 [ERROR] 
> testProcessExitsDespiteBlockingShutdownHook(org.apache.flink.runtime.util.BlockingShutdownTest)
>   Time elapsed: 0.325 s  <<< FAILURE!
> java.lang.AssertionError: Cannot determine process ID
>   at 
> org.apache.flink.runtime.util.BlockingShutdownTest.testProcessExitsDespiteBlockingShutdownHook(BlockingShutdownTest.java:98)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] [flink] leesf closed pull request #8082: [FLINK-12050][Tests] BlockingShutdownTest fails on Java 9

2019-03-29 Thread GitBox
leesf closed pull request #8082: [FLINK-12050][Tests] BlockingShutdownTest 
fails on Java 9
URL: https://github.com/apache/flink/pull/8082
 
 
   




[GitHub] [flink] flinkbot commented on issue #8082: [FLINK-12050][Tests] BlockingShutdownTest fails on Java 9

2019-03-29 Thread GitBox
flinkbot commented on issue #8082: [FLINK-12050][Tests] BlockingShutdownTest 
fails on Java 9
URL: https://github.com/apache/flink/pull/8082#issuecomment-478204878
 
 
   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from a committer.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/reviewing-prs.html) for a full explanation of 
the review process.
The bot is tracking the review progress through labels, which are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.

## Bot commands

The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   




[GitHub] [flink] leesf opened a new pull request #8082: [FLINK-12050][Tests] BlockingShutdownTest fails on Java 9

2019-03-29 Thread GitBox
leesf opened a new pull request #8082: [FLINK-12050][Tests] 
BlockingShutdownTest fails on Java 9
URL: https://github.com/apache/flink/pull/8082
 
 
   
   ## What is the purpose of the change
   
   Fix error in BlockingShutdownTest on Java 9.
   
   
   ## Brief change log
   
   Add a `clazz.getName().equals("java.lang.ProcessImpl")` check to the if condition.
   
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
 - The serializers: (no)
 - The runtime per-record code paths (performance sensitive): (no)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
 - The S3 file system connector: (no)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (no)
 - If yes, how is the feature documented? (not documented)
   




[jira] [Comment Edited] (FLINK-12048) ZooKeeperHADispatcherTest failed on Travis

2019-03-29 Thread TisonKun (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804019#comment-16804019
 ] 

TisonKun edited comment on FLINK-12048 at 3/30/19 2:34 AM:
---

-I think this is caused by FLINK-11718: we now start a dispatcher, and when 
{{#onStart}} is called in the main thread, the {{SubmittedJobGraphStore}} has 
only just started.-

-For this test we need to find an approach to explicitly wait for the 
dispatcher to get started.-
 -For production code we need to move 
{{pathCache.getListenable().addListener(new 
SubmittedJobGraphsPathCacheListener());}} to 
{{ZooKeeperSubmittedJobGraphStore#start}} so that {{#onAddedJobGraph}} isn't 
called before the {{ZooKeeperSubmittedJobGraphStore}} has started.-

-Further, and personally, we had better nudge FLINK-10333 to remove 
{{SubmittedJobGraphListener}}.-

-cc [~till.rohrmann]-

{{pathCache.start();}} is called in 
{{ZooKeeperSubmittedJobGraphStore#start()}}. My bad.


was (Author: tison):
I think this is caused because after FLINK-11718 we start a dispatcher and when 
{{#onStart}} called in the main thread, the {{SubmittedJobGraphStore}} just 
started.

For this test we need to find an approach to explicitly wait for the dispatcher 
get started.
For production code we need to move {{pathCache.getListenable().addListener(new 
SubmittedJobGraphsPathCacheListener());}} to 
{{ZooKeeperSubmittedJobGraphStore#start}} and thus {{#onAddedJobGraph}} isn't 
called before the {{ZooKeeperSubmittedJobGraphStore}} started.

Further and personally, we'd better nudge FLINK-10333 to remove 
{{SubmittedJobGraphListener}}

cc [~till.rohrmann]
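The ordering invariant under discussion, that no {{#onAddedJobGraph}} callback may reach a store that has not been started, can be illustrated with a toy model. All names below are made up for illustration; this is not the actual {{ZooKeeperSubmittedJobGraphStore}} code:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the start-ordering invariant: the listener is registered
// inside start(), so a callback can never observe a store that is not
// running. Class names are illustrative, not Flink's real code.
public class StartOrdering {

    interface Listener {
        void onAddedJobGraph(String jobId);
    }

    /** Stand-in for the path cache that delivers ZooKeeper events. */
    static class PathCache {
        private final List<Listener> listeners = new ArrayList<>();

        void addListener(Listener l) { listeners.add(l); }

        void fire(String jobId) {
            for (Listener l : listeners) {
                l.onAddedJobGraph(jobId);
            }
        }
    }

    static class JobGraphStore {
        private final PathCache cache;
        private volatile boolean running;

        JobGraphStore(PathCache cache) { this.cache = cache; }

        void start(Listener listener) {
            running = true;
            // Register only after the running flag is set: events delivered
            // before start() find no listener and are simply dropped.
            cache.addListener(jobId -> {
                if (!running) {
                    throw new IllegalStateException("Not running. Forgot to call start()?");
                }
                listener.onAddedJobGraph(jobId);
            });
        }
    }

    public static void main(String[] args) {
        PathCache cache = new PathCache();
        JobGraphStore store = new JobGraphStore(cache);
        cache.fire("job-1"); // before start(): no listener registered yet
        store.start(jobId -> System.out.println("recovered " + jobId));
        cache.fire("job-2"); // after start(): delivered safely
    }
}
```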

> ZooKeeperHADispatcherTest failed on Travis
> --
>
> Key: FLINK-12048
> URL: https://issues.apache.org/jira/browse/FLINK-12048
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination, Tests
>Affects Versions: 1.8.0, 1.9.0
>Reporter: Chesnay Schepler
>Priority: Critical
>  Labels: test-stability
>
> https://travis-ci.org/apache/flink/builds/512077301
> {code}
> 01:14:56.351 [ERROR] Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time 
> elapsed: 9.671 s <<< FAILURE! - in 
> org.apache.flink.runtime.dispatcher.ZooKeeperHADispatcherTest
> 01:14:56.364 [ERROR] 
> testStandbyDispatcherJobExecution(org.apache.flink.runtime.dispatcher.ZooKeeperHADispatcherTest)
>   Time elapsed: 1.209 s  <<< ERROR!
> org.apache.flink.runtime.util.TestingFatalErrorHandler$TestingException: 
> org.apache.flink.runtime.dispatcher.DispatcherException: Could not start the 
> added job d51eeb908f360e44c0a2004e00a6afd2
>   at 
> org.apache.flink.runtime.dispatcher.ZooKeeperHADispatcherTest.teardown(ZooKeeperHADispatcherTest.java:117)
> Caused by: org.apache.flink.runtime.dispatcher.DispatcherException: Could not 
> start the added job d51eeb908f360e44c0a2004e00a6afd2
> Caused by: java.lang.IllegalStateException: Not running. Forgot to call 
> start()?
> {code}





[jira] [Commented] (FLINK-11828) ZooKeeperHADispatcherTest.testStandbyDispatcherJobRecovery is unstable

2019-03-29 Thread TisonKun (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805325#comment-16805325
 ] 

TisonKun commented on FLINK-11828:
--

Although this is the earlier issue, further discussion happened in FLINK-12048, 
so I am closing this one as a duplicate of FLINK-12048.

> ZooKeeperHADispatcherTest.testStandbyDispatcherJobRecovery is unstable
> --
>
> Key: FLINK-11828
> URL: https://issues.apache.org/jira/browse/FLINK-11828
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.8.0
>Reporter: Andrey Zagrebin
>Priority: Critical
>  Labels: test-stability
>
> I observed locally on Mac that 
> ZooKeeperHADispatcherTest.testStandbyDispatcherJobRecovery sometimes 
> sporadically fails when I run the whole test package 
> org.apache.flink.runtime.dispatcher in IntelliJ Idea:
> {code:java}
> org.apache.flink.runtime.util.TestingFatalErrorHandler$TestingException: 
> org.apache.flink.runtime.dispatcher.DispatcherException: Could not start the 
> added job 3cdec37e27b590a6f87b6c52151aa17d
> at 
> org.apache.flink.runtime.util.TestingFatalErrorHandler.rethrowError(TestingFatalErrorHandler.java:51)
> at 
> org.apache.flink.runtime.dispatcher.ZooKeeperHADispatcherTest.teardown(ZooKeeperHADispatcherTest.java:117)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33)
> at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
> at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
> at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> at org.junit.runners.Suite.runChild(Suite.java:128)
> at org.junit.runners.Suite.runChild(Suite.java:27)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
> at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
> at 
> com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
> at 
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
> at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> Caused by: org.apache.flink.runtime.dispatcher.DispatcherException: Could not 
> start the added job 3cdec37e27b590a6f87b6c52151aa17d
> at 
> org.apache.flink.runtime.dispatcher.Dispatcher.lambda$null$43(Dispatcher.java:1005)
> at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
> at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
> at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:561)

[jira] [Updated] (FLINK-11828) ZooKeeperHADispatcherTest.testStandbyDispatcherJobRecovery is unstable

2019-03-29 Thread TisonKun (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

TisonKun updated FLINK-11828:
-
Release Note:   (was: Although this is the earlier issue, further 
discussion happened in FLINK-12048. Thus close this as a duplication of 
FLINK-12048.)

> ZooKeeperHADispatcherTest.testStandbyDispatcherJobRecovery is unstable
> --
>
> Key: FLINK-11828
> URL: https://issues.apache.org/jira/browse/FLINK-11828
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.8.0
>Reporter: Andrey Zagrebin
>Priority: Critical
>  Labels: test-stability

[jira] [Updated] (FLINK-12048) ZooKeeperHADispatcherTest failed on Travis

2019-03-29 Thread TisonKun (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

TisonKun updated FLINK-12048:
-
Affects Version/s: 1.8.0

> ZooKeeperHADispatcherTest failed on Travis
> --
>
> Key: FLINK-12048
> URL: https://issues.apache.org/jira/browse/FLINK-12048
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination, Tests
>Affects Versions: 1.8.0, 1.9.0
>Reporter: Chesnay Schepler
>Priority: Critical
>  Labels: test-stability





[jira] [Closed] (FLINK-11828) ZooKeeperHADispatcherTest.testStandbyDispatcherJobRecovery is unstable

2019-03-29 Thread TisonKun (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

TisonKun closed FLINK-11828.

  Resolution: Duplicate
Release Note: Although this is the earlier issue, further discussion 
happened in FLINK-12048. Thus close this as a duplication of FLINK-12048.

> ZooKeeperHADispatcherTest.testStandbyDispatcherJobRecovery is unstable
> --
>
> Key: FLINK-11828
> URL: https://issues.apache.org/jira/browse/FLINK-11828
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.8.0
>Reporter: Andrey Zagrebin
>Priority: Critical
>  Labels: test-stability

[jira] [Commented] (FLINK-11935) Remove DateTimeUtils pull-in and fix datetime casting problem

2019-03-29 Thread Rong Rong (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805307#comment-16805307
 ] 

Rong Rong commented on FLINK-11935:
---

Thanks for checking in, [~yanghua]. I think we can skip the Calcite upgrade to 
1.19, because I believe the file did not change between 1.18 and 1.19.

I think further investigation is needed to answer: "can we remove the 
DateTimeUtils file?"

If removing DateTimeUtils and fixing the cast problem is possible, we can just 
go with that path (as you described).

If not, we will need to
 # backport the DateTimeUtils to the latest version Flink uses (which is 1.18),
 # make changes on top as needed,
 # create a Calcite/Avatica JIRA ticket and mark it in the DateTimeUtils file 
to properly log why we still need the file.

Once we upgrade to 1.19 in the master ticket FLINK-11921, we will revisit and 
see whether removing the file is possible during the main upgrade; if not, a 
follow-up ticket will be created.

Do you think this is the right approach?
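For what it's worth, the six-day shift quoted in this issue (1017-11-29 vs. 1017-12-05) matches exactly the Julian vs. proleptic-Gregorian calendar offset around the year 1017, which suggests the cast path renders the millisecond value through a different calendar system than the optimizer. This standalone sketch, unrelated to the actual Calcite/Avatica code paths, reproduces the divergence with plain JDK classes:

```java
import java.time.LocalDate;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

// java.time uses the proleptic Gregorian calendar for all dates, while
// java.util.GregorianCalendar switches to the Julian calendar before
// 1582. Around the year 1017 the two systems differ by exactly 6 days,
// matching the 1017-11-29 vs 1017-12-05 discrepancy in the issue.
public class CalendarShift {

    public static void main(String[] args) {
        // 2017-11-29 minus 12000 months in the proleptic Gregorian calendar
        LocalDate iso = LocalDate.of(2017, 11, 29).minusMonths(12000);
        System.out.println(iso); // 1017-11-29

        // The same calendar fields interpreted by GregorianCalendar denote a
        // Julian date; converting its instant back to ISO shifts it 6 days.
        GregorianCalendar julian = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        julian.clear();
        julian.set(1017, Calendar.NOVEMBER, 29);
        System.out.println(julian.toZonedDateTime().toLocalDate()); // 1017-12-05
    }
}
```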

> Remove DateTimeUtils pull-in and fix datetime casting problem
> -
>
> Key: FLINK-11935
> URL: https://issues.apache.org/jira/browse/FLINK-11935
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / API
>Reporter: Rong Rong
>Assignee: vinoyang
>Priority: Major
>
> This {{DateTimeUtils}} was pulled in as part of FLINK-7235.
> Originally, the time operation was not correctly done via the {{ymdToJulian}} 
> function before the date {{1970-01-01}}, thus we needed the fix, similar to 
> addressing this problem:
> {code:java}
>  Optimized :1017-12-05 22:58:58.998 
>  Expected :1017-11-29 22:58:58.998
>  Actual :1017-12-05 22:58:58.998
> {code}
>  
> However, after pulling in Avatica 1.13, I found out that the optimized plans 
> of the time operations are actually correct. It is in fact the casting part 
> that creates the problem:
> For example, the following:
> *{{(plus(-12000.months, cast('2017-11-29 22:58:58.998', TIMESTAMP))}}*
> results in a StringTestExpression of:
> *{{CAST(1017-11-29 22:58:58.998):VARCHAR(65536) CHARACTER SET "UTF-16LE" 
> COLLATE "ISO-8859-1$en_US$primary" NOT NULL}}*
> but the testing results are:
> {code:java}
>  Optimized :1017-11-29 22:58:58.998
>  Expected :1017-11-29 22:58:58.998
>  Actual :1017-11-23 22:58:58.998
> {code}
>  





[GitHub] [flink] tzulitai commented on issue #8076: [FLINK-12064] [core, State Backends] RocksDBKeyedStateBackend uses incorrect key serializer if reconfigure happens during restore

2019-03-29 Thread GitBox
tzulitai commented on issue #8076: [FLINK-12064] [core, State Backends] 
RocksDBKeyedStateBackend uses incorrect key serializer if reconfigure happens 
during restore
URL: https://github.com/apache/flink/pull/8076#issuecomment-478061022
 
 
   Just to clarify:
   the problem is that the state backend uses the correct key serializer for 
state access, but the wrong one is snapshotted in checkpoints, is that correct?
   
   Otherwise, the title / description implies that it is using the wrong key 
serializer for runtime state access.




[GitHub] [flink] flinkbot commented on issue #8081: [FLINK-11501] [kafka] Add ratelimiting to Kafka consumer

2019-03-29 Thread GitBox
flinkbot commented on issue #8081:  [FLINK-11501] [kafka] Add ratelimiting to 
Kafka consumer
URL: https://github.com/apache/flink/pull/8081#issuecomment-478060987
 
 
   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review Guide](https://flink.apache.org/reviewing-prs.html) for a full explanation of the review process.
   The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

   ## Bot commands
   The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   




[GitHub] [flink] tweise commented on issue #8081: [FLINK-11501] [kafka] Add ratelimiting to Kafka consumer

2019-03-29 Thread GitBox
tweise commented on issue #8081:  [FLINK-11501] [kafka] Add ratelimiting to 
Kafka consumer
URL: https://github.com/apache/flink/pull/8081#issuecomment-478060762
 
 
   CC: @glaksh100 




[GitHub] [flink] tweise opened a new pull request #8081: [FLINK-11501] [kafka] Add ratelimiting to Kafka consumer

2019-03-29 Thread GitBox
tweise opened a new pull request #8081:  [FLINK-11501] [kafka] Add ratelimiting 
to Kafka consumer
URL: https://github.com/apache/flink/pull/8081
 
 
   Backport to 1.8.x
   




[jira] [Commented] (FLINK-12064) RocksDBKeyedStateBackend uses incorrect key serializer if reconfigure happens during restore

2019-03-29 Thread Tzu-Li (Gordon) Tai (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805168#comment-16805168
 ] 

Tzu-Li (Gordon) Tai commented on FLINK-12064:
-

Just to clarify:
the problem is that the state backend uses the correct key serializer for state 
access, but the wrong one is snapshotted in checkpoints, is that correct?
Otherwise, the title implies that it is using the wrong key serializer for 
runtime state access.

> RocksDBKeyedStateBackend uses incorrect key serializer if reconfigure happens 
> during restore
> 
>
> Key: FLINK-12064
> URL: https://issues.apache.org/jira/browse/FLINK-12064
> Project: Flink
>  Issue Type: Bug
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.8.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> As titled, in current {{RocksDBKeyedStateBackend}} we use {{keySerializer}} 
> rather than {{keySerializerProvider.currentSchemaSerializer()}}, which is 
> incorrect. The issue is not revealed in existing UT since current cases 
> didn't check snapshot after state schema migration.
> This is a regression issue caused by the FLINK-10043 refactoring work.
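The bug described above boils down to caching the serializer once instead of resolving it through the provider at snapshot time. A hedged sketch with hypothetical stand-in types (not Flink's actual classes) illustrating the difference:

```java
// Hypothetical stand-ins: always resolve the serializer through the provider
// so that a reconfiguration during restore is picked up.
interface Serializer { String id(); }

class SerializerProvider {
    private Serializer current;

    SerializerProvider(Serializer initial) { this.current = initial; }

    // Restore may reconfigure the serializer to match the snapshot schema.
    void reconfigure(Serializer reconfigured) { this.current = reconfigured; }

    Serializer currentSchemaSerializer() { return current; }
}

class Backend {
    private final SerializerProvider provider;
    private final Serializer cachedAtConstruction; // the buggy shortcut

    Backend(SerializerProvider provider) {
        this.provider = provider;
        this.cachedAtConstruction = provider.currentSchemaSerializer();
    }

    // Correct: look up the current serializer at snapshot time.
    Serializer serializerForSnapshot() { return provider.currentSchemaSerializer(); }

    // Incorrect: may be stale after a reconfiguration during restore.
    Serializer staleSerializer() { return cachedAtConstruction; }
}

public class SerializerProviderDemo {
    public static void main(String[] args) {
        SerializerProvider p = new SerializerProvider(() -> "v1");
        Backend b = new Backend(p);
        p.reconfigure(() -> "v2");
        System.out.println(b.serializerForSnapshot().id() + " vs " + b.staleSerializer().id());
        // prints: v2 vs v1
    }
}
```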





[jira] [Created] (FLINK-12070) Make blocking result partitions consumable multiple times

2019-03-29 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-12070:
-

 Summary: Make blocking result partitions consumable multiple times
 Key: FLINK-12070
 URL: https://issues.apache.org/jira/browse/FLINK-12070
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Network
Reporter: Till Rohrmann


In order to avoid writing produced results multiple times for multiple 
consumers and in order to speed up batch recoveries, we should make the 
blocking result partitions consumable multiple times. At the moment a 
blocking result partition will be released once its consumers have processed all 
data. Instead, the result partition should be released once the next blocking 
result has been produced and all consumers of a blocking result partition have 
terminated. Moreover, blocking results should not hold on to slot resources like 
network buffers or memory, as is currently the case with 
{{SpillableSubpartitions}}.
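The release condition proposed above can be sketched as a small state machine (hypothetical names, not Flink's actual classes): a partition stays consumable until every consumer has terminated and the next blocking result has been produced.

```java
// Hedged sketch of the release condition: keep a blocking result partition
// consumable until (a) every registered consumer has terminated and
// (b) the next blocking result downstream has been produced.
class BlockingPartitionHandle {
    private int activeConsumers;
    private boolean successorProduced;
    private boolean released;

    synchronized void registerConsumer() { activeConsumers++; }

    synchronized void consumerTerminated() {
        activeConsumers--;
        maybeRelease();
    }

    synchronized void successorResultProduced() {
        successorProduced = true;
        maybeRelease();
    }

    private void maybeRelease() {
        // Only now is it safe to free the backing files/buffers.
        if (activeConsumers == 0 && successorProduced && !released) {
            released = true;
        }
    }

    synchronized boolean isReleased() { return released; }
}

public class BlockingPartitionDemo {
    public static void main(String[] args) {
        BlockingPartitionHandle h = new BlockingPartitionHandle();
        h.registerConsumer();
        h.successorResultProduced();        // not sufficient on its own
        System.out.println(h.isReleased()); // prints: false
        h.consumerTerminated();             // last consumer gone -> release
        System.out.println(h.isReleased()); // prints: true
    }
}
```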





[jira] [Created] (FLINK-12069) Add proper lifecycle management for intermediate result partitions

2019-03-29 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-12069:
-

 Summary: Add proper lifecycle management for intermediate result 
partitions
 Key: FLINK-12069
 URL: https://issues.apache.org/jira/browse/FLINK-12069
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Coordination, Runtime / Network
Affects Versions: 1.8.0, 1.9.0
Reporter: Till Rohrmann


In order to properly execute batch jobs, we should make the lifecycle 
management of intermediate result partitions the responsibility of the 
{{JobMaster}}/{{Scheduler}} component. The {{Scheduler}} knows best when an 
intermediate result partition is no longer needed and, thus, can be freed. So 
for example, a blocking intermediate result should only be released after all 
subsequent blocking intermediate results have been completed in order to speed 
up potential failovers.

Moreover, having explicit control over intermediate result partitions could 
also enable use cases like result partition sharing between jobs and even 
across clusters (by simply not releasing the result partitions).





[GitHub] [flink] flinkbot commented on issue #8080: [hotfix][runtime] Improve exception message in SingleInputGate

2019-03-29 Thread GitBox
flinkbot commented on issue #8080: [hotfix][runtime] Improve exception message 
in SingleInputGate
URL: https://github.com/apache/flink/pull/8080#issuecomment-478056398
 
 
   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review Guide](https://flink.apache.org/reviewing-prs.html) for a full explanation of the review process.
   The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

   ## Bot commands
   The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   




[GitHub] [flink] leesf opened a new pull request #8080: [hotfix][runtime] Improve exception message in SingleInputGate

2019-03-29 Thread GitBox
leesf opened a new pull request #8080: [hotfix][runtime] Improve exception 
message in SingleInputGate
URL: https://github.com/apache/flink/pull/8080
 
 
   Improve exception message in SingleInputGate.




[jira] [Created] (FLINK-12068) Backtrack fail over regions if intermediate results are unavailable

2019-03-29 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-12068:
-

 Summary: Backtrack fail over regions if intermediate results are 
unavailable
 Key: FLINK-12068
 URL: https://issues.apache.org/jira/browse/FLINK-12068
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Reporter: Till Rohrmann


The batch failover strategy needs to be able to backtrack fail over regions if 
an intermediate result is unavailable. Either by explicitly checking whether 
the intermediate result partition is available or via a special exception 
indicating that a result partition is no longer available.





[GitHub] [flink] zentol commented on a change in pull request #8036: [FLINK-11431][runtime] Upgrade akka to 2.5

2019-03-29 Thread GitBox
zentol commented on a change in pull request #8036: [FLINK-11431][runtime] 
Upgrade akka to 2.5
URL: https://github.com/apache/flink/pull/8036#discussion_r270454915
 
 

 ##
 File path: flink-dist/src/main/resources/META-INF/NOTICE
 ##
 @@ -41,7 +41,7 @@ The following dependencies all share the same BSD license 
which you find under l
 - org.scala-lang:scala-library:2.11.12
 - org.scala-lang:scala-reflect:2.11.12
 - org.scala-lang.modules:scala-java8-compat_2.11:0.7.0
-- org.scala-lang.modules:scala-parser-combinators_2.11:1.0.4
+- org.scala-lang.modules:scala-parser-combinators_2.11:1.1.1
 
 Review comment:
   right, they do, but without declaring the dependency; given that it's in a 
rather central place, I'd again assume that the tests cover this sufficiently.




[GitHub] [flink] zentol commented on issue #7927: [FLINK-11603][metrics] Port the MetricQueryService to the new RpcEndpoint

2019-03-29 Thread GitBox
zentol commented on issue #7927: [FLINK-11603][metrics] Port the 
MetricQueryService to the new RpcEndpoint
URL: https://github.com/apache/flink/pull/7927#issuecomment-478034908
 
 
   All done, a big part of the PR is now renaming variables/methods from *path 
to *address, which is rather unfortunate...




[jira] [Commented] (FLINK-6227) Introduce the DataConsumptionException for downstream task failure

2019-03-29 Thread zhijiang (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805081#comment-16805081
 ] 

zhijiang commented on FLINK-6227:
-

Thanks for the concern [~till.rohrmann].

I could work on it if needed after the whole fine-grained recovery is 
confirmed. :)

> Introduce the DataConsumptionException for downstream task failure
> --
>
> Key: FLINK-6227
> URL: https://issues.apache.org/jira/browse/FLINK-6227
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Network
>Reporter: zhijiang
>Priority: Minor
>
> It is part of FLIP-1.
> We define a new special exception to indicate the downstream task failure in 
> consuming upstream data. 
> The {{JobManager}} will receive and consider this special exception to 
> calculate the minimum connected sub-graph which is called {{FailoverRegion}}. 
> So the {{DataConsumptionException}} should contain {{ResultPartitionID}} 
> information for {{JobMaster}} tracking upstream executions.
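The exception described above can be sketched as follows (hypothetical shapes, not Flink's actual classes): the point is simply that the exception carries the {{ResultPartitionID}} so the JobMaster can trace the failure back to the producing execution.

```java
// Hedged sketch: a dedicated exception type carrying the id of the result
// partition whose consumption failed.
class ResultPartitionID {
    final String value;
    ResultPartitionID(String value) { this.value = value; }
}

class DataConsumptionException extends Exception {
    private final ResultPartitionID partitionId;

    DataConsumptionException(ResultPartitionID partitionId, Throwable cause) {
        super("Could not consume result partition " + partitionId.value, cause);
        this.partitionId = partitionId;
    }

    // The JobMaster can use this to locate the upstream producer.
    ResultPartitionID getPartitionId() { return partitionId; }
}

public class DataConsumptionDemo {
    public static void main(String[] args) {
        DataConsumptionException e = new DataConsumptionException(
                new ResultPartitionID("partition-42"),
                new java.io.IOException("connection lost"));
        System.out.println(e.getMessage());
        // prints: Could not consume result partition partition-42
    }
}
```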





[jira] [Commented] (FLINK-12048) ZooKeeperHADispatcherTest failed on Travis

2019-03-29 Thread Till Rohrmann (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805079#comment-16805079
 ] 

Till Rohrmann commented on FLINK-12048:
---

The problem is actually a callback to the {{#onAddedJobGraph}} of the second 
{{Dispatcher}} which is so delayed that it is executed after the second 
{{Dispatcher}} has been shut down. Here is a commit with which one can 
reproduce the interleaving locally: 
https://github.com/tillrohrmann/flink/commit/f361cdec484e707061e0cbbd727f417fbe60e8b7.
 As part of FLINK-11843 I want to rework this so that a {{Dispatcher}} only 
runs if it has the leadership, not while it is on standby. This could fix the 
problem. Moreover, we should make sure that no concurrent operations are 
ongoing when we terminate the {{Dispatcher}}.

> ZooKeeperHADispatcherTest failed on Travis
> --
>
> Key: FLINK-12048
> URL: https://issues.apache.org/jira/browse/FLINK-12048
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination, Tests
>Affects Versions: 1.9.0
>Reporter: Chesnay Schepler
>Priority: Critical
>  Labels: test-stability
>
> https://travis-ci.org/apache/flink/builds/512077301
> {code}
> 01:14:56.351 [ERROR] Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time 
> elapsed: 9.671 s <<< FAILURE! - in 
> org.apache.flink.runtime.dispatcher.ZooKeeperHADispatcherTest
> 01:14:56.364 [ERROR] 
> testStandbyDispatcherJobExecution(org.apache.flink.runtime.dispatcher.ZooKeeperHADispatcherTest)
>   Time elapsed: 1.209 s  <<< ERROR!
> org.apache.flink.runtime.util.TestingFatalErrorHandler$TestingException: 
> org.apache.flink.runtime.dispatcher.DispatcherException: Could not start the 
> added job d51eeb908f360e44c0a2004e00a6afd2
>   at 
> org.apache.flink.runtime.dispatcher.ZooKeeperHADispatcherTest.teardown(ZooKeeperHADispatcherTest.java:117)
> Caused by: org.apache.flink.runtime.dispatcher.DispatcherException: Could not 
> start the added job d51eeb908f360e44c0a2004e00a6afd2
> Caused by: java.lang.IllegalStateException: Not running. Forgot to call 
> start()?
> {code}





[GitHub] [flink] zentol commented on a change in pull request #7927: [FLINK-11603][metrics] Port the MetricQueryService to the new RpcEndpoint

2019-03-29 Thread GitBox
zentol commented on a change in pull request #7927: [FLINK-11603][metrics] Port 
the MetricQueryService to the new RpcEndpoint
URL: https://github.com/apache/flink/pull/7927#discussion_r270430314
 
 

 ##
 File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskExecutor.java
 ##
 @@ -851,8 +852,8 @@ public void heartbeatFromResourceManager(ResourceID 
resourceID) {
}
 
@Override
-   public CompletableFuture> 
requestMetricQueryServiceAddress(Time timeout) {
-   return 
CompletableFuture.completedFuture(SerializableOptional.ofNullable(metricQueryServicePath));
+   public CompletableFuture> 
requestMetricQueryServiceGateway(Time timeout) {
 
 Review comment:
   yes it is very similar to the previous behavior.




[GitHub] [flink] flinkbot edited a comment on issue #8079: [FLINK-12054][Tests] HBaseConnectorITCase fails on Java 9

2019-03-29 Thread GitBox
flinkbot edited a comment on issue #8079: [FLINK-12054][Tests] 
HBaseConnectorITCase fails on Java 9
URL: https://github.com/apache/flink/pull/8079#issuecomment-478011172
 
 
   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Review Progress
   
   * ✅ 1. The [description] looks good.
   - Approved by @zentol [PMC]
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review Guide](https://flink.apache.org/reviewing-prs.html) for a full explanation of the review process.
   The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

   ## Bot commands
   The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   




[GitHub] [flink] zentol commented on a change in pull request #8079: [FLINK-12054][Tests] HBaseConnectorITCase fails on Java 9

2019-03-29 Thread GitBox
zentol commented on a change in pull request #8079: [FLINK-12054][Tests] 
HBaseConnectorITCase fails on Java 9
URL: https://github.com/apache/flink/pull/8079#discussion_r270426021
 
 

 ##
 File path: 
flink-connectors/flink-hbase/src/test/java/org/apache/flink/addons/hbase/HBaseTestingClusterAutostarter.java
 ##
 @@ -211,8 +211,7 @@ private static void createHBaseSiteXml(File 
hbaseSiteXmlDirectory, String zookee
 
private static void addDirectoryToClassPath(File directory) {
try {
-   // Get the classloader actually used by 
HBaseConfiguration
-   ClassLoader classLoader = 
HBaseConfiguration.create().getClassLoader();
 
 Review comment:
   The point of this method is to mutate the default classloader so that the 
hbase-site.xml is on the classpath later on.
   
   By creating a new classloader that is immediately discarded once the method 
exits, and never used to actually load anything, this method becomes a no-op, 
which isn't acceptable.
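For background on why this breaks on Java 9 in the first place: the pre-9 trick relied on the system class loader being a `URLClassLoader` (so it could be cast and mutated via a reflective `addURL` call); since Java 9 the system class loader is an internal `AppClassLoader` that no longer extends `URLClassLoader`, so the cast fails. A quick probe:

```java
import java.net.URLClassLoader;

public class SystemClassLoaderProbe {
    public static void main(String[] args) {
        ClassLoader system = ClassLoader.getSystemClassLoader();
        // true on Java 8 (sun.misc.Launcher$AppClassLoader extends URLClassLoader),
        // false on Java 9+ (jdk.internal.loader.ClassLoaders$AppClassLoader does not),
        // which is why the reflective addURL trick stops working.
        System.out.println(system instanceof URLClassLoader);
    }
}
```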




[jira] [Updated] (FLINK-9363) Bump up the Jackson version

2019-03-29 Thread Chesnay Schepler (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-9363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-9363:

Affects Version/s: shaded-6.0
Fix Version/s: shaded-7.0
  Component/s: (was: Runtime / Coordination)
   BuildSystem / Shaded

> Bump up the Jackson version
> ---
>
> Key: FLINK-9363
> URL: https://issues.apache.org/jira/browse/FLINK-9363
> Project: Flink
>  Issue Type: Improvement
>  Components: BuildSystem / Shaded
>Affects Versions: shaded-6.0
>Reporter: Ted Yu
>Assignee: aloyszhang
>Priority: Major
>  Labels: pull-request-available, security
> Fix For: shaded-7.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> CVE's for Jackson :
> CVE-2017-17485
> CVE-2018-5968
> CVE-2018-7489
> We can upgrade to 2.9.5





[jira] [Created] (FLINK-12067) Refactor the constructor of NetworkEnvironment

2019-03-29 Thread zhijiang (JIRA)
zhijiang created FLINK-12067:


 Summary: Refactor the constructor of NetworkEnvironment
 Key: FLINK-12067
 URL: https://issues.apache.org/jira/browse/FLINK-12067
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Network
Reporter: zhijiang
Assignee: zhijiang


The constructor of {{NetworkEnvironment}} could be refactored to only contain 
{{NetworkEnvironmentConfiguration}}, the other related components such as 
{{TaskEventDispatcher}}, {{ResultPartitionManager}}, {{NetworkBufferPool}} 
could be created internally.

We also refactor the process of generating {{NetworkEnvironmentConfiguration}} 
in {{TaskManagerServiceConfiguration}} to use {{numNetworkBuffers}} instead of 
the previous {{networkBufFraction}}, {{networkBufMin}} and {{networkBufMax}}.

Further, we introduce {{NetworkEnvironmentConfigurationBuilder}} for 
creating {{NetworkEnvironmentConfiguration}} easily, especially in tests.
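A builder along these lines might look as follows (a hedged sketch with hypothetical fields and defaults, not the actual Flink API): defaults keep test setup terse, while individual settings can still be overridden fluently.

```java
// Hypothetical configuration holder with only two illustrative settings.
class NetworkEnvironmentConfiguration {
    final int numNetworkBuffers;
    final int pageSize;

    NetworkEnvironmentConfiguration(int numNetworkBuffers, int pageSize) {
        this.numNetworkBuffers = numNetworkBuffers;
        this.pageSize = pageSize;
    }
}

class NetworkEnvironmentConfigurationBuilder {
    private int numNetworkBuffers = 1024;  // assumed defaults for tests
    private int pageSize = 32 * 1024;

    NetworkEnvironmentConfigurationBuilder setNumNetworkBuffers(int n) {
        this.numNetworkBuffers = n;
        return this;
    }

    NetworkEnvironmentConfigurationBuilder setPageSize(int s) {
        this.pageSize = s;
        return this;
    }

    NetworkEnvironmentConfiguration build() {
        return new NetworkEnvironmentConfiguration(numNetworkBuffers, pageSize);
    }
}

public class NetworkConfigBuilderDemo {
    public static void main(String[] args) {
        NetworkEnvironmentConfiguration config =
                new NetworkEnvironmentConfigurationBuilder()
                        .setNumNetworkBuffers(8)
                        .build();
        System.out.println(config.numNetworkBuffers + " buffers of " + config.pageSize + " bytes");
        // prints: 8 buffers of 32768 bytes
    }
}
```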





[GitHub] [flink] flinkbot commented on issue #8079: [FLINK-12054][Tests] HBaseConnectorITCase fails on Java 9

2019-03-29 Thread GitBox
flinkbot commented on issue #8079: [FLINK-12054][Tests] HBaseConnectorITCase 
fails on Java 9
URL: https://github.com/apache/flink/pull/8079#issuecomment-478011172
 
 
   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review Guide](https://flink.apache.org/reviewing-prs.html) for a full explanation of the review process.
   The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

   ## Bot commands
   The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   




[jira] [Updated] (FLINK-12054) HBaseConnectorITCase fails on Java 9

2019-03-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-12054:
---
Labels: pull-request-available  (was: )

> HBaseConnectorITCase fails on Java 9
> 
>
> Key: FLINK-12054
> URL: https://issues.apache.org/jira/browse/FLINK-12054
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System, Connectors / HBase
>Reporter: Chesnay Schepler
>Assignee: leesf
>Priority: Major
>  Labels: pull-request-available
>
> An issue in hbase.
> {code}
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 21.83 sec <<< 
> FAILURE! - in org.apache.flink.addons.hbase.HBaseConnectorITCase
> org.apache.flink.addons.hbase.HBaseConnectorITCase  Time elapsed: 21.829 sec  
> <<< FAILURE!
> java.lang.AssertionError: We should get a URLClassLoader
>   at 
> org.apache.flink.addons.hbase.HBaseConnectorITCase.activateHBaseCluster(HBaseConnectorITCase.java:81)
> {code}





[GitHub] [flink] leesf opened a new pull request #8079: [FLINK-12054] HBaseConnectorITCase fails on Java 9

2019-03-29 Thread GitBox
leesf opened a new pull request #8079: [FLINK-12054] HBaseConnectorITCase fails 
on Java 9
URL: https://github.com/apache/flink/pull/8079
 
 
   
   
   ## What is the purpose of the change
   
   Fix the error in HBaseConnectorITCase on Java 9.
   
   ## Brief change log
   
   Create a new URLClassLoader with directory.
   
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
 - The serializers: (no)
 - The runtime per-record code paths (performance sensitive): (no)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
 - The S3 file system connector: (no)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (no)
 - If yes, how is the feature documented? (not documented)
   




[jira] [Updated] (FLINK-9679) Implement ConfluentRegistryAvroSerializationSchema

2019-03-29 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/FLINK-9679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Wosiński updated FLINK-9679:

Summary: Implement ConfluentRegistryAvroSerializationSchema  (was: 
Implement AvroSerializationSchema)

> Implement ConfluentRegistryAvroSerializationSchema
> --
>
> Key: FLINK-9679
> URL: https://issues.apache.org/jira/browse/FLINK-9679
> Project: Flink
>  Issue Type: Improvement
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Affects Versions: 1.6.0
>Reporter: Yazdan Shirvany
>Assignee: Yazdan Shirvany
>Priority: Major
>  Labels: pull-request-available
>
> Implement AvroSerializationSchema using Confluent Schema Registry





[jira] [Assigned] (FLINK-12050) BlockingShutdownTest fails on Java 9

2019-03-29 Thread leesf (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf reassigned FLINK-12050:
-

Assignee: leesf

> BlockingShutdownTest fails on Java 9
> 
>
> Key: FLINK-12050
> URL: https://issues.apache.org/jira/browse/FLINK-12050
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.9.0
>Reporter: Chesnay Schepler
>Assignee: leesf
>Priority: Major
>
> {code}
> 21:21:28.689 [ERROR] 
> testProcessShutdownBlocking(org.apache.flink.runtime.util.BlockingShutdownTest)
>   Time elapsed: 0.961 s  <<< FAILURE!
> java.lang.AssertionError: Cannot determine process ID
>   at 
> org.apache.flink.runtime.util.BlockingShutdownTest.testProcessShutdownBlocking(BlockingShutdownTest.java:57)
> 21:21:28.689 [ERROR] 
> testProcessExitsDespiteBlockingShutdownHook(org.apache.flink.runtime.util.BlockingShutdownTest)
>   Time elapsed: 0.325 s  <<< FAILURE!
> java.lang.AssertionError: Cannot determine process ID
>   at 
> org.apache.flink.runtime.util.BlockingShutdownTest.testProcessExitsDespiteBlockingShutdownHook(BlockingShutdownTest.java:98)
> {code}





[GitHub] [flink] sorahn commented on issue #8016: [FLINK-10705][web]: Rework Flink Web Dashboard

2019-03-29 Thread GitBox
sorahn commented on issue #8016: [FLINK-10705][web]: Rework Flink Web Dashboard
URL: https://github.com/apache/flink/pull/8016#issuecomment-477990235
 
 
   @vthinkxie I don't really see anything else, it all seems to work as 
advertised.  I think the next step is getting this merged and getting some 
real-world testing.




[GitHub] [flink] tillrohrmann commented on a change in pull request #8002: [FLINK-11923][metrics] MetricRegistryConfiguration provides MetricReporters Suppliers

2019-03-29 Thread GitBox
tillrohrmann commented on a change in pull request #8002: 
[FLINK-11923][metrics] MetricRegistryConfiguration provides MetricReporters 
Suppliers
URL: https://github.com/apache/flink/pull/8002#discussion_r270397980
 
 

 ##
 File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/metrics/MetricRegistryImpl.java
 ##
 @@ -144,11 +135,9 @@ public MetricRegistryImpl(MetricRegistryConfiguration 
config) {
}
}
 
-   Class reporterClass = 
Class.forName(className);
-   MetricReporter reporterInstance = 
(MetricReporter) reporterClass.newInstance();
+   final MetricReporter reporterInstance = 
reporterSetup.getSupplier().get();
 
 Review comment:
   Can't this happen outside of the `MetricRegistryImpl` when we create the 
`MetricReporter`, or is there a contract that certain operations can only happen 
when we create the `MetricRegistryImpl`? If the `MetricReporter` depends on the 
`MetricRegistryImpl`, then I understand why we need a factory. But if not, then 
I think it would be easier to directly start the `MetricReporters`.
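The supplier-based instantiation under discussion can be sketched as follows (a hedged illustration with hypothetical stand-in types, not Flink's actual classes): the registry no longer reflectively instantiates reporters via `Class.forName`/`newInstance`; it just invokes a `Supplier` that was configured up front.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for a metric reporter.
interface MetricReporter { void open(); }

// Carries a pre-configured factory instead of a class name.
class ReporterSetup {
    private final Supplier<MetricReporter> supplier;

    ReporterSetup(Supplier<MetricReporter> supplier) { this.supplier = supplier; }

    Supplier<MetricReporter> getSupplier() { return supplier; }
}

class Registry {
    MetricReporter start(ReporterSetup setup) {
        // Previously (reflective instantiation inside the registry):
        //   Class<?> reporterClass = Class.forName(className);
        //   MetricReporter reporter = (MetricReporter) reporterClass.newInstance();
        MetricReporter reporter = setup.getSupplier().get();
        reporter.open();
        return reporter;
    }
}

public class ReporterSupplierDemo {
    public static void main(String[] args) {
        boolean[] opened = {false};
        MetricReporter reporter = () -> { opened[0] = true; };
        new Registry().start(new ReporterSetup(() -> reporter));
        System.out.println(opened[0]); // prints: true
    }
}
```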




[jira] [Resolved] (FLINK-11898) Support code generation for all Blink built-in functions and operators

2019-03-29 Thread Jark Wu (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jark Wu resolved FLINK-11898.
-
Resolution: Resolved

Resolved in 1.9: 2d66d4ae42576426d9df67d54a99ed57034f90b7

> Support code generation for all Blink built-in functions and operators
> --
>
> Key: FLINK-11898
> URL: https://issues.apache.org/jira/browse/FLINK-11898
> Project: Flink
>  Issue Type: New Feature
>  Components: Table SQL / Planner
>Reporter: Jark Wu
>Assignee: Jark Wu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Support code generation for built-in functions and operators. 
> FLINK-11788 has supported some of the operators. This issue is aiming to 
> complement the functions and operators supported in Flink SQL.
> This should include: CONCAT, LIKE, SUBSTRING, UPPER, LOWER, and so on.





[GitHub] [flink] asfgit closed pull request #8029: [FLINK-11898] [table-planner-blink] Support code generation for all Blink built-in functions and operators

2019-03-29 Thread GitBox
asfgit closed pull request #8029: [FLINK-11898] [table-planner-blink] Support 
code generation for all Blink built-in functions and operators
URL: https://github.com/apache/flink/pull/8029
 
 
   




[jira] [Commented] (FLINK-6227) Introduce the DataConsumptionException for downstream task failure

2019-03-29 Thread Till Rohrmann (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804926#comment-16804926
 ] 

Till Rohrmann commented on FLINK-6227:
--

This issue might still be relevant depending on how we plan to solve the 
backtracking problem. It could now be easier with the introduction of 
FLINK-10289.

> Introduce the DataConsumptionException for downstream task failure
> --
>
> Key: FLINK-6227
> URL: https://issues.apache.org/jira/browse/FLINK-6227
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Network
>Reporter: zhijiang
>Priority: Minor
>
> It is part of FLIP-1.
> We define a new special exception to indicate the downstream task failure in 
> consuming upstream data. 
> The {{JobManager}} will receive and consider this special exception to 
> calculate the minimum connected sub-graph which is called {{FailoverRegion}}. 
> So the {{DataConsumptionException}} should contain {{ResultPartitionID}} 
> information so the {{JobMaster}} can track upstream executions.





[jira] [Commented] (FLINK-10288) Failover Strategy improvement

2019-03-29 Thread Till Rohrmann (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804923#comment-16804923
 ] 

Till Rohrmann commented on FLINK-10288:
---

Point 4. might already be covered by the design of FLINK-4256 but has not been 
implemented yet.

> Failover Strategy improvement
> -
>
> Key: FLINK-10288
> URL: https://issues.apache.org/jira/browse/FLINK-10288
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Reporter: JIN SUN
>Assignee: ryantaocer
>Priority: Major
>
> Flink invests significant effort in making streaming jobs fault tolerant. The 
> checkpoint mechanism and exactly-once semantics set Flink apart from other 
> systems. However, there are still some cases that are not handled very well. 
> These cases apply to both streaming and batch scenarios, and they are 
> orthogonal to the current fault tolerance mechanism. Here is a summary of 
> those cases:
>  # Some failures are non-recoverable, such as a user error: 
> DivideByZeroException. We shouldn't try to restart the task, as it will never 
> succeed. DivideByZeroException is just a simple case; such errors are 
> sometimes not easy to reproduce or predict, as they might only be triggered 
> by specific input data, so we shouldn't retry for all user-code exceptions.
>  # There is no limit on task retries today; unless a SuppressRestartException 
> is encountered, a task will keep retrying until it succeeds. As mentioned 
> above, we shouldn't retry at all in some cases, and for the exceptions we 
> can retry, such as a network exception, should we have a retry limit? We need 
> to retry on any transient issue, but we also need to set a limit to avoid 
> infinite retries and wasted resources. Batch and streaming workloads might 
> need different strategies.
>  # Some exceptions are due to hardware issues, such as disk or network 
> malfunction. When a task or TaskManager fails because of these, we had better 
> detect it and avoid scheduling to that machine next time.
>  # If a task reads from a blocking result partition and its input is not 
> available, we can 'revoke' the producing task: mark the task as failed and 
> rerun the upstream task to regenerate the data. The revoke can propagate up 
> through the chain. In Spark, revoke is naturally supported by lineage.
> To make fault tolerance easier, we need to keep behavior as deterministic as 
> possible. User code is not easy to control, but system-related code we can 
> fix. For example, we should at least make sure that different attempts of the 
> same task have the same inputs (the current codebase has a bug in 
> DataSourceTask that cannot guarantee this). Note that this is tracked by 
> FLINK-10205.
> For details, see this proposal:
> [https://docs.google.com/document/d/1FdZdcA63tPUEewcCimTFy9Iz2jlVlMRANZkO4RngIuk/edit?usp=sharing]
>  
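As an illustration only (not Flink's actual failover code), points 1 and 2 above — skipping retries for non-recoverable user errors and capping retries for transient ones — could be sketched like this in Java; the exception classification used here is a deliberately crude assumption:

```java
public class RetryPolicySketch {

    /**
     * Crude stand-in for failure classification: treat arithmetic/user errors
     * as non-recoverable and everything else as potentially transient.
     */
    static boolean isRecoverable(Throwable t) {
        return !(t instanceof ArithmeticException);
    }

    /**
     * Decide whether attempt number {@code attempt} (0-based) should be
     * retried, given a maximum number of retries for transient failures.
     */
    static boolean shouldRetry(Throwable t, int attempt, int maxRetries) {
        // Never retry non-recoverable failures; cap retries for the rest.
        return isRecoverable(t) && attempt < maxRetries;
    }
}
```

A real strategy would classify far more exception types and, as the issue notes, likely use different limits for batch and streaming workloads.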





[GitHub] [flink] wuchong commented on issue #8029: [FLINK-11898] [table-planner-blink] Support code generation for all Blink built-in functions and operators

2019-03-29 Thread GitBox
wuchong commented on issue #8029: [FLINK-11898] [table-planner-blink] Support 
code generation for all Blink built-in functions and operators
URL: https://github.com/apache/flink/pull/8029#issuecomment-477986803
 
 
   Merging
   




[jira] [Assigned] (FLINK-12020) Add documentation for mesos-appmaster-job.sh

2019-03-29 Thread Jacky Yin (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacky Yin reassigned FLINK-12020:
-

Assignee: Jacky Yin

> Add documentation for mesos-appmaster-job.sh
> 
>
> Key: FLINK-12020
> URL: https://issues.apache.org/jira/browse/FLINK-12020
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / Mesos, Documentation
>Affects Versions: 1.7.2, 1.8.0
>Reporter: Till Rohrmann
>Assignee: Jacky Yin
>Priority: Minor
>
> The Flink documentation is currently lacking information about the 
> {{mesos-appmaster-job.sh}} and how to use it. It would be helpful for our 
> users to add documentation and examples how to use it.





[GitHub] [flink] flinkbot edited a comment on issue #7940: [hotfix][docs] fix error in functions example

2019-03-29 Thread GitBox
flinkbot edited a comment on issue #7940: [hotfix][docs] fix error in functions 
example 
URL: https://github.com/apache/flink/pull/7940#issuecomment-470805294
 
 
   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❗ 3. Needs [attention] from.
   - Needs attention by @dawidwys [committer], @twalthr [PMC], @zentol [PMC]
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/reviewing-prs.html) for a full explanation of 
the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.

Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   




[GitHub] [flink] leesf commented on issue #7940: [hotfix][docs] fix error in functions example

2019-03-29 Thread GitBox
leesf commented on issue #7940: [hotfix][docs] fix error in functions example 
URL: https://github.com/apache/flink/pull/7940#issuecomment-477981097
 
 
   @flinkbot attention @dawidwys would you review this PR in your free time?




[jira] [Commented] (FLINK-9307) Running flink program doesn't print infos in console any more

2019-03-29 Thread Chesnay Schepler (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804907#comment-16804907
 ] 

Chesnay Schepler commented on FLINK-9307:
-

Previously the client was notified of job status updates; this is no longer the 
case since we switched over to REST.

> Running flink program doesn't print infos in console any more
> -
>
> Key: FLINK-9307
> URL: https://issues.apache.org/jira/browse/FLINK-9307
> Project: Flink
>  Issue Type: Improvement
>  Components: Command Line Client
>Affects Versions: 1.6.0
> Environment: Mac OS X 12.10.6
> Flink 1.6-SNAPSHOT commit c8fa8d025684c222582 .
>Reporter: TisonKun
>Priority: Critical
>
> As shown in [this 
> page|https://ci.apache.org/projects/flink/flink-docs-master/quickstart/setup_quickstart.html#run-the-example],
>  when running a flink program, it should have printed infos like
> {code:java}
> Cluster configuration: Standalone cluster with JobManager at /127.0.0.1:6123
>   Using address 127.0.0.1:6123 to connect to JobManager.
>   JobManager web interface address http://127.0.0.1:8081
>   Starting execution of program
>   Submitting job with JobID: 574a10c8debda3dccd0c78a3bde55e1b. Waiting for 
> job completion.
>   Connected to JobManager at 
> Actor[akka.tcp://flink@127.0.0.1:6123/user/jobmanager#297388688]
>   11/04/2016 14:04:50 Job execution switched to status RUNNING.
>   11/04/2016 14:04:50 Source: Socket Stream -> Flat Map(1/1) switched to 
> SCHEDULED
>   11/04/2016 14:04:50 Source: Socket Stream -> Flat Map(1/1) switched to 
> DEPLOYING
>   11/04/2016 14:04:50 Fast TumblingProcessingTimeWindows(5000) of 
> WindowedStream.main(SocketWindowWordCount.java:79) -> Sink: Unnamed(1/1) 
> switched to SCHEDULED
>   11/04/2016 14:04:51 Fast TumblingProcessingTimeWindows(5000) of 
> WindowedStream.main(SocketWindowWordCount.java:79) -> Sink: Unnamed(1/1) 
> switched to DEPLOYING
>   11/04/2016 14:04:51 Fast TumblingProcessingTimeWindows(5000) of 
> WindowedStream.main(SocketWindowWordCount.java:79) -> Sink: Unnamed(1/1) 
> switched to RUNNING
>   11/04/2016 14:04:51 Source: Socket Stream -> Flat Map(1/1) switched to 
> RUNNING
> {code}
> but when I follow the same command, it just printed
> {code:java}
> Starting execution of program
> {code}
> Those messages are printed properly if I use flink 1.4, and are also logged at 
> `log/flink--taskexecutor-0-localhost.log` when running on 
> 1.6-SNAPSHOT.
> Now it breaks [this pull request|https://github.com/apache/flink/pull/5957] 
> and [~Zentol] suspects it's a regression.





[GitHub] [flink] carp84 commented on issue #8078: [FLINK-12066] [State Backends] Remove StateSerializerProvider field in keyed state backends

2019-03-29 Thread GitBox
carp84 commented on issue #8078: [FLINK-12066] [State Backends] Remove 
StateSerializerProvider field in keyed state backends
URL: https://github.com/apache/flink/pull/8078#issuecomment-477978217
 
 
   The current PR also includes the fix for FLINK-12066 (a blocker); I will do a 
rebase after that goes in.




[GitHub] [flink] TisonKun commented on a change in pull request #7927: [FLINK-11603][metrics] Port the MetricQueryService to the new RpcEndpoint

2019-03-29 Thread GitBox
TisonKun commented on a change in pull request #7927: [FLINK-11603][metrics] 
Port the MetricQueryService to the new RpcEndpoint
URL: https://github.com/apache/flink/pull/7927#discussion_r270385597
 
 

 ##
 File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskExecutor.java
 ##
 @@ -851,8 +852,8 @@ public void heartbeatFromResourceManager(ResourceID 
resourceID) {
}
 
@Override
-   public CompletableFuture> 
requestMetricQueryServiceAddress(Time timeout) {
-   return 
CompletableFuture.completedFuture(SerializableOptional.ofNullable(metricQueryServicePath));
+   public CompletableFuture> 
requestMetricQueryServiceGateway(Time timeout) {
 
 Review comment:
   Then the behavior should be like what we previously did with `actorSelection`. 
Thanks for your help; I would also like to learn how to pass an address and 
connect to the RpcEndpoint with it.




[jira] [Updated] (FLINK-12066) Remove StateSerializerProvider field in keyed state backend

2019-03-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-12066:
---
Labels: pull-request-available  (was: )

> Remove StateSerializerProvider field in keyed state backend
> ---
>
> Key: FLINK-12066
> URL: https://issues.apache.org/jira/browse/FLINK-12066
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / State Backends
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.9.0
>
>
> As mentioned in [PR review of 
> FLINK-10043|https://github.com/apache/flink/pull/7674#discussion_r257630962] 
> with Stefan and offline discussion with Gordon, after the refactoring work the 
> serializer passed to the {{RocksDBKeyedStateBackend}} constructor is a final 
> one, thus the {{StateSerializerProvider}} field is no longer needed.
> For {{HeapKeyedStateBackend}}, the only thing stopping us from passing a final 
> serializer is the circular dependency between the backend and 
> {{HeapRestoreOperation}}, and we aim to decouple them by introducing a new 
> {{HeapInternalKeyContext}} as a bridge. For more details, please refer to the 
> upcoming PR.
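The decoupling described above can be illustrated with a small, hypothetical Java sketch: a shared key-context object lets the restore step run before the backend is constructed, removing the circular reference. All class names here are stand-ins, not Flink's real types:

```java
public class KeyContextBridgeSketch {

    /** The bridge both sides can share without referencing each other. */
    static class KeyContext {
        private Object currentKey;

        void setCurrentKey(Object key) { this.currentKey = key; }

        Object getCurrentKey() { return currentKey; }
    }

    /** Restore runs first, against the bridge only — no backend needed. */
    static KeyContext restore() {
        KeyContext ctx = new KeyContext();
        ctx.setCurrentKey("restored-key");
        return ctx;
    }

    /** Backend is constructed afterwards with the already-populated context. */
    static class Backend {
        final KeyContext ctx;

        Backend(KeyContext ctx) {
            this.ctx = ctx;
        }
    }
}
```

Because restore no longer needs a reference to the backend, the serializer can be final at backend construction time.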





[GitHub] [flink] flinkbot commented on issue #8078: [FLINK-12066] [State Backends] Remove StateSerializerProvider field in keyed state backends

2019-03-29 Thread GitBox
flinkbot commented on issue #8078: [FLINK-12066] [State Backends] Remove 
StateSerializerProvider field in keyed state backends
URL: https://github.com/apache/flink/pull/8078#issuecomment-477977949
 
 
   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/reviewing-prs.html) for a full explanation of 
the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.

Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   




[GitHub] [flink] carp84 opened a new pull request #8078: [FLINK-12066] [State Backends] Remove StateSerializerProvider field in keyed state backends

2019-03-29 Thread GitBox
carp84 opened a new pull request #8078: [FLINK-12066] [State Backends] Remove 
StateSerializerProvider field in keyed state backends
URL: https://github.com/apache/flink/pull/8078
 
 
   ## What is the purpose of the change
   
   This PR removes the {{StateSerializerProvider}} field in the rocksdb and heap 
keyed backends, as 
[discussed](https://github.com/apache/flink/pull/7674#discussion_r257630962) 
and planned in the previous review of FLINK-10043.
   
   ## Brief change log
   
   * Removed {{StateSerializerProvider}} field from keyed backends
   * Introduced a new {{HeapInternalKeyContext}} which allows restore to happen 
before backend construction in heap backend builder.
   
   
   ## Verifying this change
   
   This change is already covered by existing tests, such as:
   
   * All tests under `org.apache.flink.runtime.state` package
   * All tests under `org.apache.flink.runtime.state.heap` package
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
 - The serializers: (no)
 - The runtime per-record code paths (performance sensitive): (no)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
 - The S3 file system connector: (no)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (no)
 - If yes, how is the feature documented? (not applicable)
   




[jira] [Commented] (FLINK-9307) Running flink program doesn't print infos in console any more

2019-03-29 Thread TisonKun (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804894#comment-16804894
 ] 

TisonKun commented on FLINK-9307:
-

It is still the case that "running a flink program doesn't print infos in the 
console". However, with FLINK-9282 we updated our documentation, which now says 
this is the intended behavior. It is a question of whether we want a patch to 
print infos in the console as in flink 1.4.

> Running flink program doesn't print infos in console any more
> -
>
> Key: FLINK-9307
> URL: https://issues.apache.org/jira/browse/FLINK-9307
> Project: Flink
>  Issue Type: Improvement
>  Components: Command Line Client
>Affects Versions: 1.6.0
> Environment: Mac OS X 12.10.6
> Flink 1.6-SNAPSHOT commit c8fa8d025684c222582 .
>Reporter: TisonKun
>Priority: Critical
>
> As shown in [this 
> page|https://ci.apache.org/projects/flink/flink-docs-master/quickstart/setup_quickstart.html#run-the-example],
>  when running a flink program, it should have printed infos like
> {code:java}
> Cluster configuration: Standalone cluster with JobManager at /127.0.0.1:6123
>   Using address 127.0.0.1:6123 to connect to JobManager.
>   JobManager web interface address http://127.0.0.1:8081
>   Starting execution of program
>   Submitting job with JobID: 574a10c8debda3dccd0c78a3bde55e1b. Waiting for 
> job completion.
>   Connected to JobManager at 
> Actor[akka.tcp://flink@127.0.0.1:6123/user/jobmanager#297388688]
>   11/04/2016 14:04:50 Job execution switched to status RUNNING.
>   11/04/2016 14:04:50 Source: Socket Stream -> Flat Map(1/1) switched to 
> SCHEDULED
>   11/04/2016 14:04:50 Source: Socket Stream -> Flat Map(1/1) switched to 
> DEPLOYING
>   11/04/2016 14:04:50 Fast TumblingProcessingTimeWindows(5000) of 
> WindowedStream.main(SocketWindowWordCount.java:79) -> Sink: Unnamed(1/1) 
> switched to SCHEDULED
>   11/04/2016 14:04:51 Fast TumblingProcessingTimeWindows(5000) of 
> WindowedStream.main(SocketWindowWordCount.java:79) -> Sink: Unnamed(1/1) 
> switched to DEPLOYING
>   11/04/2016 14:04:51 Fast TumblingProcessingTimeWindows(5000) of 
> WindowedStream.main(SocketWindowWordCount.java:79) -> Sink: Unnamed(1/1) 
> switched to RUNNING
>   11/04/2016 14:04:51 Source: Socket Stream -> Flat Map(1/1) switched to 
> RUNNING
> {code}
> but when I follow the same command, it just printed
> {code:java}
> Starting execution of program
> {code}
> Those messages are printed properly if I use flink 1.4, and are also logged at 
> `log/flink--taskexecutor-0-localhost.log` when running on 
> 1.6-SNAPSHOT.
> Now it breaks [this pull request|https://github.com/apache/flink/pull/5957] 
> and [~Zentol] suspects it's a regression.





[GitHub] [flink] zentol commented on a change in pull request #7927: [FLINK-11603][metrics] Port the MetricQueryService to the new RpcEndpoint

2019-03-29 Thread GitBox
zentol commented on a change in pull request #7927: [FLINK-11603][metrics] Port 
the MetricQueryService to the new RpcEndpoint
URL: https://github.com/apache/flink/pull/7927#discussion_r270384272
 
 

 ##
 File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskExecutor.java
 ##
 @@ -851,8 +852,8 @@ public void heartbeatFromResourceManager(ResourceID 
resourceID) {
}
 
@Override
-   public CompletableFuture> 
requestMetricQueryServiceAddress(Time timeout) {
-   return 
CompletableFuture.completedFuture(SerializableOptional.ofNullable(metricQueryServicePath));
+   public CompletableFuture> 
requestMetricQueryServiceGateway(Time timeout) {
 
 Review comment:
   You can't just add `Serializable` to a class and expect it to magically 
work. The `MQS` extends `RpcEndpoint`, which is not serializable and contains 
loads of fields that are neither.
   
   From what i can tell we have to return the RpcService address instead, but 
again, let me investigate first before making any changes.
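The underlying point can be illustrated with a small Java sketch (names are illustrative, not Flink's code): a plain address string is trivially Java-serializable, while an endpoint object holding runtime state such as threads is not, which is why returning the address is the safer option:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

public class AddressSerializationSketch {

    /** Returns true iff the object survives Java serialization. */
    static boolean isSerializable(Object o) {
        try (ObjectOutputStream oos =
                new ObjectOutputStream(new ByteArrayOutputStream())) {
            oos.writeObject(o);
            return true;
        } catch (IOException e) {
            // NotSerializableException is an IOException.
            return false;
        }
    }

    /** Stand-in for an RpcEndpoint holding non-serializable state. */
    static class Endpoint {
        final Thread mainThread = new Thread(); // Thread is not serializable
    }
}
```

The address string can then be shipped over the wire, and the receiver reconnects to the endpoint via the RpcService.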




[jira] [Created] (FLINK-12066) Remove StateSerializerProvider field in keyed state backend

2019-03-29 Thread Yu Li (JIRA)
Yu Li created FLINK-12066:
-

 Summary: Remove StateSerializerProvider field in keyed state 
backend
 Key: FLINK-12066
 URL: https://issues.apache.org/jira/browse/FLINK-12066
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / State Backends
Reporter: Yu Li
Assignee: Yu Li
 Fix For: 1.9.0


As mentioned in [PR review of 
FLINK-10043|https://github.com/apache/flink/pull/7674#discussion_r257630962] 
with Stefan and offline discussion with Gordon, after the refactoring work the 
serializer passed to the {{RocksDBKeyedStateBackend}} constructor is a final 
one, thus the {{StateSerializerProvider}} field is no longer needed.

For {{HeapKeyedStateBackend}}, the only thing stopping us from passing a final 
serializer is the circular dependency between the backend and 
{{HeapRestoreOperation}}, and we aim to decouple them by introducing a new 
{{HeapInternalKeyContext}} as a bridge. For more details, please refer to the 
upcoming PR.





[GitHub] [flink] TisonKun commented on issue #7927: [FLINK-11603][metrics] Port the MetricQueryService to the new RpcEndpoint

2019-03-29 Thread GitBox
TisonKun commented on issue #7927: [FLINK-11603][metrics] Port the 
MetricQueryService to the new RpcEndpoint
URL: https://github.com/apache/flink/pull/7927#issuecomment-477975190
 
 
   Do a rebase on FLINK-12057 and try to address comments.




[GitHub] [flink] TisonKun commented on a change in pull request #7927: [FLINK-11603][metrics] Port the MetricQueryService to the new RpcEndpoint

2019-03-29 Thread GitBox
TisonKun commented on a change in pull request #7927: [FLINK-11603][metrics] 
Port the MetricQueryService to the new RpcEndpoint
URL: https://github.com/apache/flink/pull/7927#discussion_r270382585
 
 

 ##
 File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskExecutor.java
 ##
 @@ -851,8 +852,8 @@ public void heartbeatFromResourceManager(ResourceID 
resourceID) {
}
 
@Override
-   public CompletableFuture> 
requestMetricQueryServiceAddress(Time timeout) {
-   return 
CompletableFuture.completedFuture(SerializableOptional.ofNullable(metricQueryServicePath));
+   public CompletableFuture> 
requestMetricQueryServiceGateway(Time timeout) {
 
 Review comment:
   Is using `SerializableOptional` reasonable? It forces us to implement 
`Serializable` for `MetricQueryServiceGateway` and thus for 
`MetricQueryService`.




[jira] [Updated] (FLINK-9904) Allow users to control MaxDirectMemorySize

2019-03-29 Thread Till Rohrmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-9904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-9904:
-
Affects Version/s: 1.9.0
   1.8.0
   1.7.2

> Allow users to control MaxDirectMemorySize
> --
>
> Key: FLINK-9904
> URL: https://issues.apache.org/jira/browse/FLINK-9904
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / Scripts
>Affects Versions: 1.4.2, 1.5.1, 1.7.2, 1.8.0, 1.9.0
>Reporter: Himanshu Roongta
>Priority: Minor
>
> For people who use the docker image and run Flink in pods, there is currently 
> no way to update {{MaxDirectMemorySize}} (well, one can create a custom 
> version of 
> [taskmanager.sh|https://github.com/apache/flink/blob/master/flink-dist/src/main/flink-bin/bin/taskmanager.sh]).
> As a result, it starts with a value of 8388607T. If the param 
> {{taskmanager.memory.preallocate}} is set to false (the default), cleanup will 
> only occur when the MaxDirectMemorySize limit is hit and a full GC cycle kicks 
> in. However, pods, especially in Kubernetes, will get killed because pods do 
> not run at such a high value. (In our case we run 8GB per pod.)
> The fix would be to allow it to be configured via {{flink-conf}}. We can still 
> have a default of 8388607T to avoid a breaking change.
>  
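For reference, the JVM flag in question is `-XX:MaxDirectMemorySize`. A hedged sketch of what such a configuration entry might look like follows — the exact key name (`env.java.opts.taskmanager`) and whether a given Flink version forwards it to the JVM are assumptions, not something confirmed by this ticket:

```yaml
# Hypothetical flink-conf.yaml entry; the key name is an assumption.
# Caps direct memory at 2 GiB instead of the effective 8388607T default.
env.java.opts.taskmanager: "-XX:MaxDirectMemorySize=2g"
```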





[jira] [Updated] (FLINK-9904) Allow users to control MaxDirectMemorySize

2019-03-29 Thread Till Rohrmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-9904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-9904:
-
Component/s: (was: Runtime / Coordination)
 Deployment / Scripts

> Allow users to control MaxDirectMemorySize
> --
>
> Key: FLINK-9904
> URL: https://issues.apache.org/jira/browse/FLINK-9904
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / Scripts
>Affects Versions: 1.4.2, 1.5.1
>Reporter: Himanshu Roongta
>Priority: Minor
>
> For people who use the docker image and run Flink in pods, there is currently 
> no way to update {{MaxDirectMemorySize}} (well, one can create a custom 
> version of 
> [taskmanager.sh|https://github.com/apache/flink/blob/master/flink-dist/src/main/flink-bin/bin/taskmanager.sh]).
> As a result, it starts with a value of 8388607T. If the param 
> {{taskmanager.memory.preallocate}} is set to false (the default), cleanup will 
> only occur when the MaxDirectMemorySize limit is hit and a full GC cycle kicks 
> in. However, pods, especially in Kubernetes, will get killed because pods do 
> not run at such a high value. (In our case we run 8GB per pod.)
> The fix would be to allow it to be configured via {{flink-conf}}. We can still 
> have a default of 8388607T to avoid a breaking change.
>  





[jira] [Commented] (FLINK-9904) Allow users to control MaxDirectMemorySize

2019-03-29 Thread Till Rohrmann (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804879#comment-16804879
 ] 

Till Rohrmann commented on FLINK-9904:
--

Yes, please provide a fix for this problem [~hroongta].

> Allow users to control MaxDirectMemorySize
> --
>
> Key: FLINK-9904
> URL: https://issues.apache.org/jira/browse/FLINK-9904
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.4.2, 1.5.1
>Reporter: Himanshu Roongta
>Priority: Minor
>
> For people who use the docker image and run Flink in pods, there is currently 
> no way to update {{MaxDirectMemorySize}} (well, one can create a custom 
> version of 
> [taskmanager.sh|https://github.com/apache/flink/blob/master/flink-dist/src/main/flink-bin/bin/taskmanager.sh]).
> As a result, it starts with a value of 8388607T. If the param 
> {{taskmanager.memory.preallocate}} is set to false (the default), cleanup will 
> only occur when the MaxDirectMemorySize limit is hit and a full GC cycle kicks 
> in. However, pods, especially in Kubernetes, will get killed because pods do 
> not run at such a high value. (In our case we run 8GB per pod.)
> The fix would be to allow it to be configured via {{flink-conf}}. We can still 
> have a default of 8388607T to avoid a breaking change.
>  





[jira] [Updated] (FLINK-9904) Allow users to control MaxDirectMemorySize

2019-03-29 Thread Till Rohrmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-9904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-9904:
-
Priority: Minor  (was: Critical)

> Allow users to control MaxDirectMemorySize
> --
>
> Key: FLINK-9904
> URL: https://issues.apache.org/jira/browse/FLINK-9904
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.4.2, 1.5.1
>Reporter: Himanshu Roongta
>Priority: Minor
>
> For people who use the docker image and run Flink in pods, there is currently 
> no way to update {{MaxDirectMemorySize}} (well, one can create a custom 
> version of 
> [taskmanager.sh|https://github.com/apache/flink/blob/master/flink-dist/src/main/flink-bin/bin/taskmanager.sh]).
> As a result, it starts with a value of 8388607T. If the param 
> {{taskmanager.memory.preallocate}} is set to false (the default), cleanup will 
> only occur when the MaxDirectMemorySize limit is hit and a full GC cycle kicks 
> in. However, pods, especially in Kubernetes, will get killed because pods do 
> not run at such a high value. (In our case we run 8GB per pod.)
> The fix would be to allow it to be configured via {{flink-conf}}. We can still 
> have a default of 8388607T to avoid a breaking change.
>  





[jira] [Commented] (FLINK-6455) Web UI should not show link to upload new jobs when using job mode cluster

2019-03-29 Thread Till Rohrmann (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804877#comment-16804877
 ] 

Till Rohrmann commented on FLINK-6455:
--

Update: With Flip-6 it is no longer possible to submit jobs to a job mode 
cluster. However, we still show the web UI link, which should not be the case.

> Web UI should not show link to upload new jobs when using job mode cluster
> --
>
> Key: FLINK-6455
> URL: https://issues.apache.org/jira/browse/FLINK-6455
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Web Frontend
>Affects Versions: 1.2.1, 1.7.2, 1.8.0
>Reporter: Stephan Ewen
>Priority: Minor
>
> When running a job in job mode, the web UI must not show the 'upload new job' 
> option.





[jira] [Updated] (FLINK-6455) Web UI should not show link to upload new jobs when using job mode cluster

2019-03-29 Thread Till Rohrmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-6455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-6455:
-
Summary: Web UI should not show link to upload new jobs when using job mode 
cluster  (was: Yarn single-job JobManagers should not allow uploads of new 
jobs.)

> Web UI should not show link to upload new jobs when using job mode cluster
> --
>
> Key: FLINK-6455
> URL: https://issues.apache.org/jira/browse/FLINK-6455
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN, Runtime / Web Frontend
>Affects Versions: 1.2.1, 1.7.2, 1.8.0
>Reporter: Stephan Ewen
>Priority: Minor
>
> When running a job in job mode, the web UI must not show the 'upload new job' 
> option.





[jira] [Updated] (FLINK-6455) Web UI should not show link to upload new jobs when using job mode cluster

2019-03-29 Thread Till Rohrmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-6455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-6455:
-
Component/s: (was: Deployment / YARN)

> Web UI should not show link to upload new jobs when using job mode cluster
> --
>
> Key: FLINK-6455
> URL: https://issues.apache.org/jira/browse/FLINK-6455
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Web Frontend
>Affects Versions: 1.2.1, 1.7.2, 1.8.0
>Reporter: Stephan Ewen
>Priority: Minor
>
> When running a job in job mode, the web UI must not show the 'upload new job' 
> option.





[jira] [Updated] (FLINK-12049) ClassLoaderUtilsTest fails on Java 9

2019-03-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-12049:
---
Labels: pull-request-available  (was: )

> ClassLoaderUtilsTest fails on Java 9
> 
>
> Key: FLINK-12049
> URL: https://issues.apache.org/jira/browse/FLINK-12049
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 1.9.0
>Reporter: Chesnay Schepler
>Assignee: leesf
>Priority: Major
>  Labels: pull-request-available
>
> {code}
> 21:21:24.547 [ERROR] Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time 
> elapsed: 0.214 s <<< FAILURE! - in 
> org.apache.flink.runtime.util.ClassLoaderUtilsTest
> 21:21:24.547 [ERROR] 
> testWithAppClassLoader(org.apache.flink.runtime.util.ClassLoaderUtilsTest)  
> Time elapsed: 0.021 s  <<< FAILURE!
> java.lang.AssertionError
>   at 
> org.apache.flink.runtime.util.ClassLoaderUtilsTest.testWithAppClassLoader(ClassLoaderUtilsTest.java:140)
> {code}
> {code}
> public void testWithAppClassLoader() {
>   String result = 
> ClassLoaderUtil.getUserCodeClassLoaderInfo(ClassLoader.getSystemClassLoader());
>   assertTrue(result.toLowerCase().contains("system classloader"));
> {code}
> {{ClassLoader.getSystemClassLoader()}} no longer returns an URLClassLoader on 
> Java 9, but {{ClassLoaderUtil.getUserCodeClassLoaderInfo}} relies on this to 
> extract information about the ClassLoader.
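A defensive check along the following lines would avoid the failure. This is a hypothetical sketch, not the actual fix from the pull request — the class name `ClassLoaderInfo`, the `describe` helper, and its output strings are invented for illustration:

```java
import java.net.URLClassLoader;

// Hypothetical sketch of the defensive check the fix needs. On Java 8 the
// system class loader is a URLClassLoader; on Java 9+ it is an internal
// AppClassLoader, so an info helper must not assume URLClassLoader.
public class ClassLoaderInfo {
    public static String describe(ClassLoader cl) {
        if (cl == ClassLoader.getSystemClassLoader()) {
            return "system classloader";
        }
        if (cl instanceof URLClassLoader) {
            URLClassLoader ucl = (URLClassLoader) cl;
            return "url classloader with " + ucl.getURLs().length + " classpath entries";
        }
        // Java 9+ loaders that are not URLClassLoaders land here.
        return "classloader of type " + cl.getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(describe(ClassLoader.getSystemClassLoader()));
    }
}
```

With this shape, the `testWithAppClassLoader` assertion passes on both Java 8 and Java 9 because the system loader is recognized by identity rather than by its concrete type.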





[jira] [Updated] (FLINK-6455) Yarn single-job JobManagers should not allow uploads of new jobs.

2019-03-29 Thread Till Rohrmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-6455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-6455:
-
Description: When running a job in job mode, the web UI must not show the 
'upload new job' option.  (was: When running a job on Yarn in the 'single job 
cluster' mode, the web UI must not show the 'upload new job' option.

If users actually submit another job via the web UI, the JobManager starts 
sharing the yarn cluster between these two jobs and there will be competition 
for TaskManagers between the jobs.)

> Yarn single-job JobManagers should not allow uploads of new jobs.
> -
>
> Key: FLINK-6455
> URL: https://issues.apache.org/jira/browse/FLINK-6455
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN, Runtime / Web Frontend
>Affects Versions: 1.2.1, 1.7.2, 1.8.0
>Reporter: Stephan Ewen
>Priority: Minor
>
> When running a job in job mode, the web UI must not show the 'upload new job' 
> option.





[GitHub] [flink] leesf opened a new pull request #8077: [FLINK-12049][Tests] ClassLoaderUtilsTest fails on Java 9

2019-03-29 Thread GitBox
leesf opened a new pull request #8077: [FLINK-12049][Tests] 
ClassLoaderUtilsTest fails on Java 9
URL: https://github.com/apache/flink/pull/8077
 
 
   
   
   ## What is the purpose of the change
   
   Fix error in ClassLoaderUtilsTest#testWithAppClassLoader on Java 9.
   
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
 - The serializers: (no)
 - The runtime per-record code paths (performance sensitive): (no)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
 - The S3 file system connector: (no)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (no)
 - If yes, how is the feature documented? (not documented)
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (FLINK-6455) Yarn single-job JobManagers should not allow uploads of new jobs.

2019-03-29 Thread Till Rohrmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-6455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-6455:
-
Issue Type: Improvement  (was: Bug)

> Yarn single-job JobManagers should not allow uploads of new jobs.
> -
>
> Key: FLINK-6455
> URL: https://issues.apache.org/jira/browse/FLINK-6455
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN, Runtime / Web Frontend
>Affects Versions: 1.2.1
>Reporter: Stephan Ewen
>Priority: Critical
>
> When running a job on Yarn in the 'single job cluster' mode, the web UI must 
> not show the 'upload new job' option.
> If users actually submit another job via the web UI, the JobManager starts 
> sharing the yarn cluster between these two jobs and there will be competition 
> for TaskManagers between the jobs.





[jira] [Commented] (FLINK-12064) RocksDBKeyedStateBackend uses incorrect key serializer if reconfigure happens during restore

2019-03-29 Thread Yu Li (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804874#comment-16804874
 ] 

Yu Li commented on FLINK-12064:
---

[~tzulitai] [~srichter] [~aljoscha] FYI.

> RocksDBKeyedStateBackend uses incorrect key serializer if reconfigure happens 
> during restore
> 
>
> Key: FLINK-12064
> URL: https://issues.apache.org/jira/browse/FLINK-12064
> Project: Flink
>  Issue Type: Bug
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.8.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> As titled, in current {{RocksDBKeyedStateBackend}} we use {{keySerializer}} 
> rather than {{keySerializerProvider.currentSchemaSerializer()}}, which is 
> incorrect. The issue is not revealed in existing UT since current cases 
> didn't check snapshot after state schema migration.
> This is a regression issue caused by the FLINK-10043 refactoring work.





[jira] [Updated] (FLINK-6455) Yarn single-job JobManagers should not allow uploads of new jobs.

2019-03-29 Thread Till Rohrmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-6455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-6455:
-
Priority: Minor  (was: Critical)

> Yarn single-job JobManagers should not allow uploads of new jobs.
> -
>
> Key: FLINK-6455
> URL: https://issues.apache.org/jira/browse/FLINK-6455
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN, Runtime / Web Frontend
>Affects Versions: 1.2.1
>Reporter: Stephan Ewen
>Priority: Minor
>
> When running a job on Yarn in the 'single job cluster' mode, the web UI must 
> not show the 'upload new job' option.
> If users actually submit another job via the web UI, the JobManager starts 
> sharing the yarn cluster between these two jobs and there will be competition 
> for TaskManagers between the jobs.





[GitHub] [flink] flinkbot commented on issue #8077: [FLINK-12049][Tests] ClassLoaderUtilsTest fails on Java 9

2019-03-29 Thread GitBox
flinkbot commented on issue #8077: [FLINK-12049][Tests] ClassLoaderUtilsTest 
fails on Java 9
URL: https://github.com/apache/flink/pull/8077#issuecomment-477972236
 
 
   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/reviewing-prs.html) for a full explanation of 
the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.

   ## Bot commands
   The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   




[jira] [Updated] (FLINK-6455) Yarn single-job JobManagers should not allow uploads of new jobs.

2019-03-29 Thread Till Rohrmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-6455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-6455:
-
Affects Version/s: 1.8.0
   1.7.2

> Yarn single-job JobManagers should not allow uploads of new jobs.
> -
>
> Key: FLINK-6455
> URL: https://issues.apache.org/jira/browse/FLINK-6455
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN, Runtime / Web Frontend
>Affects Versions: 1.2.1, 1.7.2, 1.8.0
>Reporter: Stephan Ewen
>Priority: Minor
>
> When running a job on Yarn in the 'single job cluster' mode, the web UI must 
> not show the 'upload new job' option.
> If users actually submit another job via the web UI, the JobManager starts 
> sharing the yarn cluster between these two jobs and there will be competition 
> for TaskManagers between the jobs.





[jira] [Commented] (FLINK-8035) Unable to submit job when HA is enabled

2019-03-29 Thread Till Rohrmann (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804866#comment-16804866
 ] 

Till Rohrmann commented on FLINK-8035:
--

Ping [~longtimer].

> Unable to submit job when HA is enabled
> ---
>
> Key: FLINK-8035
> URL: https://issues.apache.org/jira/browse/FLINK-8035
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.4.0
> Environment: Mac OS X
>Reporter: Robert Metzger
>Priority: Critical
>
> Steps to reproduce:
> - Get Flink 1.4 (f5a0b4bdfb)
> - Get ZK (3.3.6 in this case)
> - Put the following flink-conf.yaml:
> {code}
> high-availability: zookeeper
> high-availability.storageDir: file:///tmp/flink-ha
> high-availability.zookeeper.quorum: localhost:2181
> high-availability.zookeeper.path.cluster-id: /my-namespace
> {code}
> - Start Flink, submit a job (any streaming example will do)
> The job submission will time out. On the JobManager, it seems that the job 
> submission gets stuck when trying to submit something to Zookeeper.
> In the JM UI, the job will sit there in status "CREATED"





[jira] [Commented] (FLINK-10484) New latency tracking metrics format causes metrics cardinality explosion

2019-03-29 Thread Till Rohrmann (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804865#comment-16804865
 ] 

Till Rohrmann commented on FLINK-10484:
---

Can this issue be closed since FLINK-10242 has been backported [~jgrier]?

> New latency tracking metrics format causes metrics cardinality explosion
> 
>
> Key: FLINK-10484
> URL: https://issues.apache.org/jira/browse/FLINK-10484
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Metrics
>Affects Versions: 1.5.4, 1.6.0, 1.6.1
>Reporter: Jamie Grier
>Assignee: Jamie Grier
>Priority: Critical
>
> The new metrics format for latency tracking causes a huge metrics cardinality 
> explosion due to the format and the fact that there is a metric reported for 
> every combination of source subtask index and operator subtask index.  
> Yikes!
> This format is actually responsible for basically taking down our metrics 
> system due to DDOSing our metrics servers (at Lyft).
>  
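The blow-up described above can be sketched with back-of-the-envelope arithmetic: one latency metric per (source subtask, operator subtask) pair means the count grows with the product of the parallelisms. The class name and all figures below are invented examples, not measurements from the issue:

```java
// Illustrative sketch of latency-metric cardinality: one metric per
// (source subtask, operator subtask) combination, summed across sources
// and operators, approximated here as a simple product.
public class LatencyCardinalitySketch {
    static long latencyMetricCount(int numSources, int sourceParallelism,
                                   int numOperators, int operatorParallelism) {
        return (long) numSources * sourceParallelism
                * (long) numOperators * operatorParallelism;
    }

    public static void main(String[] args) {
        // Even a modest topology yields tens of thousands of latency metrics.
        System.out.println(latencyMetricCount(2, 64, 10, 64));
    }
}
```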





[jira] [Commented] (FLINK-9930) Flink slot memory leak after restarting job with static variables

2019-03-29 Thread Till Rohrmann (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804864#comment-16804864
 ] 

Till Rohrmann commented on FLINK-9930:
--

The best solution is not to use static variables in your user code. Otherwise, 
the state will only be cleaned up once the class itself is unloaded.
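The contrast can be sketched in plain Java (no Flink dependencies; the class and method names below are invented for illustration):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch contrasting static state with per-task state.
public class StaticStateSketch {
    // Anti-pattern: static state is owned by the user-code class loader, so
    // cancelling the job does not release it — it survives until the class
    // itself is unloaded, which matches what the reporter saw in the heap dump.
    static final Map<String, byte[]> LEAKY_CACHE = new HashMap<>();

    // Recommended shape: keep state in an instance field and create it in the
    // task's open()-style lifecycle hook, so it becomes unreachable (and
    // garbage-collectable) once the task instance is disposed.
    static class PerTaskFunction {
        private Map<String, byte[]> cache;

        void open() { cache = new HashMap<>(); }

        int cachedEntries() { return cache.size(); }

        void close() { cache = null; }
    }
}
```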

> Flink slot memory leak after restarting job with static variables
> -
>
> Key: FLINK-9930
> URL: https://issues.apache.org/jira/browse/FLINK-9930
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.4.2
>Reporter: Martin Shang
>Priority: Critical
>
> I found the Flink TaskManager exited because of an OOM, and I dumped the heap 
> file.
>  
> The result is that the static variables are not released when the job is 
> canceled, and a new object is generated after I restarted the job.
> The static variable is duplicated per slot because of the slot memory isolation.
> How can I make the variables garbage-collected by Flink?
> [multiple static variable|https://i.stack.imgur.com/KJcIN.jpg]
> [path to gc|https://i.stack.imgur.com/bvrDF.jpg]





[jira] [Closed] (FLINK-9930) Flink slot memory leak after restarting job with static variables

2019-03-29 Thread Till Rohrmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-9930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann closed FLINK-9930.

Resolution: Not A Problem

> Flink slot memory leak after restarting job with static variables
> -
>
> Key: FLINK-9930
> URL: https://issues.apache.org/jira/browse/FLINK-9930
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.4.2
>Reporter: Martin Shang
>Priority: Critical
>
> I found the Flink TaskManager exited because of an OOM, and I dumped the heap 
> file.
>  
> The result is that the static variables are not released when the job is 
> canceled, and a new object is generated after I restarted the job.
> The static variable is duplicated per slot because of the slot memory isolation.
> How can I make the variables garbage-collected by Flink?
> [multiple static variable|https://i.stack.imgur.com/KJcIN.jpg]
> [path to gc|https://i.stack.imgur.com/bvrDF.jpg]





[GitHub] [flink] zentol commented on a change in pull request #7927: [FLINK-11603][metrics] Port the MetricQueryService to the new RpcEndpoint

2019-03-29 Thread GitBox
zentol commented on a change in pull request #7927: [FLINK-11603][metrics] Port 
the MetricQueryService to the new RpcEndpoint
URL: https://github.com/apache/flink/pull/7927#discussion_r270377183
 
 

 ##
 File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskExecutor.java
 ##
 @@ -851,8 +852,8 @@ public void heartbeatFromResourceManager(ResourceID 
resourceID) {
}
 
@Override
-   public CompletableFuture> 
requestMetricQueryServiceAddress(Time timeout) {
-   return 
CompletableFuture.completedFuture(SerializableOptional.ofNullable(metricQueryServicePath));
+   public CompletableFuture> 
requestMetricQueryServiceGateway(Time timeout) {
 
 Review comment:
   This doesn't work at runtime since `Optional` is not serializable. 
Investigating alternatives.
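The underlying issue is easy to demonstrate: `java.util.Optional` does not implement `Serializable`, so it cannot cross a Java-serialization RPC boundary. The following is an illustration only — it is not Flink's `SerializableOptional` or the alternative eventually chosen here:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

// Minimal demonstration that java.util.Optional is not Java-serializable.
public class OptionalSerializationSketch {
    static boolean isJavaSerializable(Object o) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);  // throws NotSerializableException for Optional
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```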




[jira] [Commented] (FLINK-9848) When Parallelism more than available task slots Flink stays idle

2019-03-29 Thread Till Rohrmann (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804860#comment-16804860
 ] 

Till Rohrmann commented on FLINK-9848:
--

Is this still an issue [~yazdanjs]? If not, then please close this JIRA issue.

> When Parallelism more than available task slots Flink stays idle
> 
>
> Key: FLINK-9848
> URL: https://issues.apache.org/jira/browse/FLINK-9848
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.5.0, 1.5.1
>Reporter: Yazdan Shirvany
>Assignee: Yazdan Shirvany
>Priority: Critical
>
> For version 1.4.x, when selecting Parallelism > available task slots, Flink 
> throws the below error right away: 
> {{NoResourceAvailableException: Not enough free slots available to run the 
> job}}
>   
>  but for version 1.5.x there are two different behaviors: sometimes the job 
> runs successfully, and sometimes it throws the below error after 5 minutes: 
>  
> {{org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: 
> Could not allocate all requires slots within timeout of 30 ms. Slots 
> required: 5, slots allocated: 2}}





[jira] [Closed] (FLINK-9710) Make ClusterClient be used as multiple instances in a single jvm process

2019-03-29 Thread Till Rohrmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-9710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann closed FLINK-9710.

Resolution: Not A Problem

The new {{RestClusterClient}} should be usable multiple times. Therefore, I 
will close this issue for the moment. Please re-open this issue if the new 
cluster client has the same problems.

> Make ClusterClient be used as multiple instances in a single jvm process
> 
>
> Key: FLINK-9710
> URL: https://issues.apache.org/jira/browse/FLINK-9710
> Project: Flink
>  Issue Type: Improvement
>  Components: Command Line Client
>Affects Versions: 1.4.2
>Reporter: Chuanlei Ni
>Priority: Critical
>
> We can use `ClusterClient` to submit jobs, but it is designed for the 
> command line, so we cannot use this class in a long-running JVM process that 
> creates multiple cluster clients concurrently.
> This JIRA aims to make `ClusterClient` usable as multiple instances in a 
> single JVM process.





[jira] [Resolved] (FLINK-9132) Cluster runs out of task slots when a job falls into restart loop

2019-03-29 Thread Till Rohrmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-9132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann resolved FLINK-9132.
--
Resolution: Won't Fix

This problem should be fixed in a newer Flink version (>= 1.5). Please try 
such a version and report back if it is not working. 

The community unfortunately no longer supports Flink 1.4.2.

> Cluster runs out of task slots when a job falls into restart loop
> -
>
> Key: FLINK-9132
> URL: https://issues.apache.org/jira/browse/FLINK-9132
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.4.2
> Environment: env.java.opts in flink-conf.yaml file:
>  
> env.java.opts: -Xloggc:/home/user/flink/log/flinkServer-gc.log  -verbose:gc 
> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps 
> -XX:+UseG1GC -XX:MaxGCPauseMillis=150 -XX:InitiatingHeapOccupancyPercent=55 
> -XX:+ParallelRefProcEnabled -XX:ParallelGCThreads=2 -XX:-ResizePLAB 
> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=100M
>Reporter: Alex Smirnov
>Priority: Critical
> Attachments: FailedJob.java, jconsole-classes.png
>
>
> If there's a job which is restarting in a loop, then the Task Manager hosting 
> it goes down after some time. The Job Manager automatically assigns the job to 
> another Task Manager, and the new Task Manager goes down as well. After some 
> time, all Task Managers are gone and the cluster becomes paralyzed.
> I've attached to the TaskManager's java process using jconsole and noticed 
> that the number of loaded classes increases dramatically if a job is in a 
> restarting loop and restores from a checkpoint.
> See the attachment for the graph with G1GC enabled for the node. Standard GC 
> performs even worse - the task manager shuts down within 20 minutes after the 
> restart loop starts.
> I've also attached minimal program to reproduce the problem
>  
> please let me know if additional information is required from me.





[GitHub] [flink] zentol commented on issue #7927: [FLINK-11603][metrics] Port the MetricQueryService to the new RpcEndpoint

2019-03-29 Thread GitBox
zentol commented on issue #7927: [FLINK-11603][metrics] Port the 
MetricQueryService to the new RpcEndpoint
URL: https://github.com/apache/flink/pull/7927#issuecomment-477968927
 
 
   So this doesn't actually work in production since 
`MetricQueryService#start()` was never called in the `MetricRegistryImpl`.
   
   I will fix this and add another test for the MR to ensure this doesn't 
happen again.




[jira] [Closed] (FLINK-8803) Mini Cluster Shutdown with HA unstable, causing test failures

2019-03-29 Thread Till Rohrmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann closed FLINK-8803.

   Resolution: Not A Problem
Fix Version/s: (was: 1.6.5)

Not a problem anymore, because the {{FlinkMiniCluster}} has been removed with 
FLINK-10540.

> Mini Cluster Shutdown with HA unstable, causing test failures
> -
>
> Key: FLINK-8803
> URL: https://issues.apache.org/jira/browse/FLINK-8803
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination, Tests
>Reporter: Stephan Ewen
>Priority: Critical
>
> When the {{FlinkMiniCluster}} is created for HA tests with ZooKeeper, the 
> shutdown is unstable.
> It looks like ZooKeeper may be shut down before the JobManager is shut down, 
> causing the shutdown procedure of the JobManager (specifically 
> {{ZooKeeperSubmittedJobGraphStore.removeJobGraph}}) to block until tests time 
> out.
> Full log: https://api.travis-ci.org/v3/job/346853707/log.txt
> Note that no ZK threads are alive any more, seems ZK is shut down already.
> Relevant Stack Traces:
> {code}
> "main" #1 prio=5 os_prio=0 tid=0x7f973800a800 nid=0x43b4 waiting on 
> condition [0x7f973eb0b000]
>java.lang.Thread.State: TIMED_WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x8966cf18> (a 
> scala.concurrent.impl.Promise$CompletionLatch)
>   at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:212)
>   at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:222)
>   at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:157)
>   at scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:169)
>   at scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:169)
>   at 
> scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>   at scala.concurrent.Await$.ready(package.scala:169)
>   at 
> org.apache.flink.runtime.minicluster.FlinkMiniCluster.startInternalShutdown(FlinkMiniCluster.scala:469)
>   at 
> org.apache.flink.runtime.minicluster.FlinkMiniCluster.stop(FlinkMiniCluster.scala:435)
>   at 
> org.apache.flink.runtime.minicluster.FlinkMiniCluster.closeAsync(FlinkMiniCluster.scala:719)
>   at 
> org.apache.flink.test.util.MiniClusterResource.after(MiniClusterResource.java:104)
>   at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:50)
> ...
> {code}
> {code}
> "flink-akka.actor.default-dispatcher-2" #1012 prio=5 os_prio=0 
> tid=0x7f97394fa800 nid=0x3328 waiting on condition [0x7f971db29000]
>java.lang.Thread.State: TIMED_WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x87f82a70> (a 
> java.util.concurrent.CountDownLatch$Sync)
>   at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>   at 
> org.apache.flink.shaded.curator.org.apache.curator.CuratorZookeeperClient.internalBlockUntilConnectedOrTimedOut(CuratorZookeeperClient.java:336)
>   at 
> org.apache.flink.shaded.curator.org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
>   at 
> org.apache.flink.shaded.curator.org.apache.curator.framework.imps.DeleteBuilderImpl.pathInForeground(DeleteBuilderImpl.java:241)
>   at 
> org.apache.flink.shaded.curator.org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:225)
>   at 
> org.apache.flink.shaded.curator.org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:35)
>   at 
> org.apache.flink.runtime.zookeeper.ZooKeeperStateHandleStore.release(ZooKeeperStateHandleStore.java:478)
>   at 
> org.apache.flink.runtime.zookeeper.ZooKeeperStateHandleStore.releaseAndTryRemove(ZooKeeperStateHandleStore.java:435)
>   at 
> org.apache.flink.runtime.zookeeper.ZooKeeperStateHandleStore.releaseAndTryRemove(ZooKeeperStateHandleStore.java:405)
>   at 
> org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore.removeJobGraph(ZooKeeperSubmittedJobGraphStore.java:266)
>   - locked 

[jira] [Closed] (FLINK-10735) flink on yarn close container exception

2019-03-29 Thread Till Rohrmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann closed FLINK-10735.
-
Resolution: Duplicate

Should be fixed with FLINK-10848.

> flink on yarn close container exception
> ---
>
> Key: FLINK-10735
> URL: https://issues.apache.org/jira/browse/FLINK-10735
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / YARN, Runtime / Coordination
>Affects Versions: 1.6.2
> Environment: Hadoop 2.7
> flink 1.6.2
>Reporter: Fei Feng
>Priority: Critical
>  Labels: yarn
>
> flink on yarn in detached mode: when cancelling a flink job, the yarn 
> resources are released very slowly!
> If the job fails and continuously restarts, it will acquire more and more 
> containers until the resources are used up.
> Log:
> 18/10/31 19:43:59 INFO executiongraph.ExecutionGraph: Job 
> 32F01A0FC50EFE8F4794AD0C45678EC4: xxx switched from state RUNNING to 
> CANCELLING.
>  18/10/31 19:43:59 INFO executiongraph.ExecutionGraph: Source: 
> Kafka09TableSource(SS_SOLD_DATE_SK, SS_SOLD_TIME_SK, SS_ITEM_SK, 
> SS_CUSTOMER_SK, SS_CDEMO_SK, SS_HDEMO_SK, SS_ADDR_SK, SS_STORE_SK, 
> SS_PROMO_SK, SS_TICKET_NUMBER, SS_QUANTITY, SS_WHOLESALE_COST, SS_LIST_PRICE, 
> SS_SALES_PRICE, SS_EXT_DISCOUNT_AMT, SS_EXT_SALES_PRICE, 
> SS_EXT_WHOLESALE_COST, SS_EXT_LIST_PRICE, SS_EXT_TAX, SS_COUPON_AMT, 
> SS_NET_PAID, SS_NET_PAID_INC_TAX, SS_NET_PROFIT, ROWTIME) -> from: 
> (SS_WHOLESALE_COST, SS_SALES_PRICE, SS_COUPON_AMT, SS_NET_PAID, 
> SS_NET_PROFIT, ROWTIME) -> Timestamps/Watermarks -> where: (>(SS_COUPON_AMT, 
> 0)), select: (ROWTIME, SS_WHOLESALE_COST, SS_SALES_PRICE, SS_NET_PAID, 
> SS_NET_PROFIT) -> time attribute: (ROWTIME) (1/3) 
> (0807b5f291f897ac4545dbfdb8ec3448) switched from RUNNING to CANCELING.
>  18/10/31 19:43:59 INFO executiongraph.ExecutionGraph: Source: 
> Kafka09TableSource(SS_SOLD_DATE_SK, SS_SOLD_TIME_SK, SS_ITEM_SK, 
> SS_CUSTOMER_SK, SS_CDEMO_SK, SS_HDEMO_SK, SS_ADDR_SK, SS_STORE_SK, 
> SS_PROMO_SK, SS_TICKET_NUMBER, SS_QUANTITY, SS_WHOLESALE_COST, SS_LIST_PRICE, 
> SS_SALES_PRICE, SS_EXT_DISCOUNT_AMT, SS_EXT_SALES_PRICE, 
> SS_EXT_WHOLESALE_COST, SS_EXT_LIST_PRICE, SS_EXT_TAX, SS_COUPON_AMT, 
> SS_NET_PAID, SS_NET_PAID_INC_TAX, SS_NET_PROFIT, ROWTIME) -> from: 
> (SS_WHOLESALE_COST, SS_SALES_PRICE, SS_COUPON_AMT, SS_NET_PAID, 
> SS_NET_PROFIT, ROWTIME) -> Timestamps/Watermarks -> where: (>(SS_COUPON_AMT, 
> 0)), select: (ROWTIME, SS_WHOLESALE_COST, SS_SALES_PRICE, SS_NET_PAID, 
> SS_NET_PROFIT) -> time attribute: (ROWTIME) (2/3) 
> (a56a70eb6807dacf18fbf272ee6160e2) switched from RUNNING to CANCELING.
>  18/10/31 19:43:59 INFO executiongraph.ExecutionGraph: Source: 
> Kafka09TableSource(SS_SOLD_DATE_SK, SS_SOLD_TIME_SK, SS_ITEM_SK, 
> SS_CUSTOMER_SK, SS_CDEMO_SK, SS_HDEMO_SK, SS_ADDR_SK, SS_STORE_SK, 
> SS_PROMO_SK, SS_TICKET_NUMBER, SS_QUANTITY, SS_WHOLESALE_COST, SS_LIST_PRICE, 
> SS_SALES_PRICE, SS_EXT_DISCOUNT_AMT, SS_EXT_SALES_PRICE, 
> SS_EXT_WHOLESALE_COST, SS_EXT_LIST_PRICE, SS_EXT_TAX, SS_COUPON_AMT, 
> SS_NET_PAID, SS_NET_PAID_INC_TAX, SS_NET_PROFIT, ROWTIME) -> from: 
> (SS_WHOLESALE_COST, SS_SALES_PRICE, SS_COUPON_AMT, SS_NET_PAID, 
> SS_NET_PROFIT, ROWTIME) -> Timestamps/Watermarks -> where: (>(SS_COUPON_AMT, 
> 0)), select: (ROWTIME, SS_WHOLESALE_COST, SS_SALES_PRICE, SS_NET_PAID, 
> SS_NET_PROFIT) -> time attribute: (ROWTIME) (3/3) 
> (a2eca3dc06087dfdaf3fd0200e545cc4) switched from RUNNING to CANCELING.
>  18/10/31 19:43:59 INFO executiongraph.ExecutionGraph: window: 
> (TumblingGroupWindow('w$, 'ROWTIME, 360.millis)), select: 
> (SUM(SS_WHOLESALE_COST) AS EXPR$1, SUM(SS_SALES_PRICE) AS EXPR$2, 
> SUM(SS_NET_PAID) AS EXPR$3, SUM(SS_NET_PROFIT) AS EXPR$4, start('w$) AS 
> w$start, end('w$) AS w$end, rowtime('w$) AS w$rowtime, proctime('w$) AS 
> w$proctime) -> where: (>(EXPR$4, 1000)), select: (CAST(w$start) AS WSTART, 
> EXPR$1, EXPR$2, EXPR$3, EXPR$4) -> to: Row (1/1) 
> (9fa3592c74eda97124a033b6afea6c87) switched from RUNNING to CANCELING.
>  18/10/31 19:43:59 INFO executiongraph.ExecutionGraph: Sink: 
> JDBCAppendTableSink(RS_DAY, RS_WHOLESALE_COST, RS_SALES_PRICE, RS_NET_PAID, 
> RS_NET_PROFIT) (1/3) (6d4f413ace1206714566b610e5bf47b6) switched from RUNNING 
> to CANCELING.
>  18/10/31 19:43:59 INFO executiongraph.ExecutionGraph: Sink: 
> JDBCAppendTableSink(RS_DAY, RS_WHOLESALE_COST, RS_SALES_PRICE, RS_NET_PAID, 
> RS_NET_PROFIT) (2/3) (8871efc7c19abae7f833e02aa1d2107d) switched from RUNNING 
> to CANCELING.
>  18/10/31 19:43:59 INFO executiongraph.ExecutionGraph: Sink: 
> JDBCAppendTableSink(RS_DAY, RS_WHOLESALE_COST, RS_SALES_PRICE, RS_NET_PAID, 
> RS_NET_PROFIT) (3/3) (fcaf0b9247d60a31e3381d70871d929f) switched from RUNNING 
> to CANCELING.
>  18/10/31 19:43:59 INFO executiongraph.ExecutionGraph: Source: 
> Kafka09TableSource(SS_SOLD_DATE_SK, 

[jira] [Commented] (FLINK-9307) Running flink program doesn't print infos in console any more

2019-03-29 Thread Till Rohrmann (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804850#comment-16804850
 ] 

Till Rohrmann commented on FLINK-9307:
--

Is this issue still valid [~Tison]?

> Running flink program doesn't print infos in console any more
> -
>
> Key: FLINK-9307
> URL: https://issues.apache.org/jira/browse/FLINK-9307
> Project: Flink
>  Issue Type: Improvement
>  Components: Command Line Client
>Affects Versions: 1.6.0
> Environment: Mac OS X 12.10.6
> Flink 1.6-SNAPSHOT commit c8fa8d025684c222582.
>Reporter: TisonKun
>Priority: Critical
>
> As shown in [this 
> page|https://ci.apache.org/projects/flink/flink-docs-master/quickstart/setup_quickstart.html#run-the-example],
>  when flink run program, it should have printed infos like
> {code:java}
> Cluster configuration: Standalone cluster with JobManager at /127.0.0.1:6123
>   Using address 127.0.0.1:6123 to connect to JobManager.
>   JobManager web interface address http://127.0.0.1:8081
>   Starting execution of program
>   Submitting job with JobID: 574a10c8debda3dccd0c78a3bde55e1b. Waiting for 
> job completion.
>   Connected to JobManager at 
> Actor[akka.tcp://flink@127.0.0.1:6123/user/jobmanager#297388688]
>   11/04/2016 14:04:50 Job execution switched to status RUNNING.
>   11/04/2016 14:04:50 Source: Socket Stream -> Flat Map(1/1) switched to 
> SCHEDULED
>   11/04/2016 14:04:50 Source: Socket Stream -> Flat Map(1/1) switched to 
> DEPLOYING
>   11/04/2016 14:04:50 Fast TumblingProcessingTimeWindows(5000) of 
> WindowedStream.main(SocketWindowWordCount.java:79) -> Sink: Unnamed(1/1) 
> switched to SCHEDULED
>   11/04/2016 14:04:51 Fast TumblingProcessingTimeWindows(5000) of 
> WindowedStream.main(SocketWindowWordCount.java:79) -> Sink: Unnamed(1/1) 
> switched to DEPLOYING
>   11/04/2016 14:04:51 Fast TumblingProcessingTimeWindows(5000) of 
> WindowedStream.main(SocketWindowWordCount.java:79) -> Sink: Unnamed(1/1) 
> switched to RUNNING
>   11/04/2016 14:04:51 Source: Socket Stream -> Flat Map(1/1) switched to 
> RUNNING
> {code}
> but when I follow the same command, it just printed
> {code:java}
> Starting execution of program
> {code}
> Those message are printed properly if I use flink 1.4, and also logged at 
> `log/flink--taskexecutor-0-localhost.log` when running on 
> 1.6-SNAPSHOT.
> Now it breaks [this pull request|https://github.com/apache/flink/pull/5957] 
> and [~Zentol] suspects it's a regression.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-6949) Add ability to ship custom resource files to YARN cluster

2019-03-29 Thread Mikhail Pryakhin (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804849#comment-16804849
 ] 

Mikhail Pryakhin commented on FLINK-6949:
-

Hi [~till.rohrmann],

Yes, unfortunately it is still an issue. Sorry for being inactive for a long 
time; I hope to contribute to a solution soon.

> Add ability to ship custom resource files to YARN cluster
> -
>
> Key: FLINK-6949
> URL: https://issues.apache.org/jira/browse/FLINK-6949
> Project: Flink
>  Issue Type: Improvement
>  Components: Command Line Client, Deployment / YARN
>Affects Versions: 1.3.0
>Reporter: Mikhail Pryakhin
>Assignee: Mikhail Pryakhin
>Priority: Major
>
> *The problem:*
> When deploying a flink job on YARN it is not possible to specify custom 
> resource files to be shipped to the YARN cluster.
>  
> *The use case description:*
> When running a flink job on multiple environments it becomes necessary to 
> pass environment-related configuration files to the job's runtime. It can be 
> accomplished by packaging configuration files within the job's jar. But 
> having tens of different environments one can easily end up packaging as many 
> jars as there are environments. It would be great to have an ability to 
> separate configuration files from the job artifacts. 
>  
> *The possible solution:*
> add the --yarnship-files option to flink cli to specify files that should be 
> shipped to the YARN cluster.
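For illustration, usage of the proposed option might look like the following. This is a hypothetical sketch: `--yarnship-files` did not exist at the time of this issue, and the paths and job jar name are made up.

```sh
# Hypothetical invocation of the proposed option (not an existing flag):
# ship an environment-specific config file to the YARN cluster separately
# from the job jar, so the same jar works across environments.
flink run -m yarn-cluster \
  --yarnship-files /etc/myjob/prod/app.properties \
  myjob.jar
```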



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-7990) Strange behavior when configuring Logback for logging

2019-03-29 Thread Till Rohrmann (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-7990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804845#comment-16804845
 ] 

Till Rohrmann commented on FLINK-7990:
--

Is this issue still valid [~fhueske]? If not, then please close it.

> Strange behavior when configuring Logback for logging
> -
>
> Key: FLINK-7990
> URL: https://issues.apache.org/jira/browse/FLINK-7990
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Configuration
>Affects Versions: 1.3.2
>Reporter: Fabian Hueske
>Priority: Major
>
> The following issue was reported on the [user 
> mailinglist|https://lists.apache.org/thread.html/c06a9f0b1189bf21d946d3d9728631295c88bfc57043cdbe18409d52@%3Cuser.flink.apache.org%3E]
> {quote}
> I have a single node Flink instance which has the required jars for logback 
> in the lib folder (logback-classic.jar, logback-core.jar, 
> log4j-over-slf4j.jar). I have removed the jars for log4j from the lib folder 
> (log4j-1.2.17.jar, slf4j-log4j12-1.7.7.jar). 'logback.xml' is also correctly 
> updated in 'conf' folder. I have also included 'logback.xml' in the 
> classpath, although this does not seem to be considered while the job is run. 
> Flink refers to logback.xml inside the conf folder only. I have updated 
> pom.xml as per Flink's documentation in order to exclude log4j. I have some 
> log entries set inside a few map and flatmap functions and some log entries 
> outside those functions (eg: "program execution started").
> When I run the job, Flink writes only those logs that are coded outside the 
> transformations. Those logs that are coded inside the transformations (map, 
> flatmap etc) are not getting written to the log file. If this was happening 
> always, I could have assumed that the Task Manager is not writing the logs. 
> But Flink displays a strange behavior regarding this. Whenever I update the 
> logback jars inside the lib folder (due to version changes), during the 
> next job run, all logs (even those inside map and flatmap) are written 
> correctly into the log file. But the logs don't get written in any of the 
> runs after that. This means that my 'logback.xml' file is correct and the 
> settings are also correct. But I don't understand why the same settings don't 
> work while the same job is run again.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-6206) Log task state transitions as warn/error for FAILURE scenarios

2019-03-29 Thread Till Rohrmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-6206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-6206:
-
Priority: Minor  (was: Critical)

> Log task state transitions as warn/error for FAILURE scenarios
> --
>
> Key: FLINK-6206
> URL: https://issues.apache.org/jira/browse/FLINK-6206
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.2.0
>Reporter: Dan Bress
>Priority: Minor
>  Labels: pull-request-available
>
> If a task fails due to an exception, I would like that to be logged at warn 
> or error level. Currently it is logged at info.
> {code}
> private boolean transitionState(ExecutionState currentState, ExecutionState 
> newState, Throwable cause) {
>   if (STATE_UPDATER.compareAndSet(this, currentState, newState)) {
>   if (cause == null) {
>   LOG.info("{} ({}) switched from {} to {}.", 
> taskNameWithSubtask, executionId, currentState, newState);
>   } else {
>   LOG.info("{} ({}) switched from {} to {}.", 
> taskNameWithSubtask, executionId, currentState, newState, cause);
>   }
>   return true;
>   } else {
>   return false;
>   }
>   }
> {code}
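The requested change amounts to selecting the log level based on whether the transition is a failure. A minimal standalone sketch of that idea (names and the enum here are illustrative, not Flink's actual code):

```java
// Illustrative sketch only: pick a log level for a task state transition,
// surfacing failures at ERROR instead of INFO. This mimics the pattern
// requested in FLINK-6206; it is not Flink's implementation.
public class TransitionLogging {

    enum ExecutionState { RUNNING, FINISHED, CANCELED, FAILED }

    // A failure transition (FAILED state, or any transition carrying a cause)
    // should be logged at ERROR; everything else stays at INFO.
    static String levelFor(ExecutionState newState, Throwable cause) {
        return (newState == ExecutionState.FAILED || cause != null) ? "ERROR" : "INFO";
    }

    public static void main(String[] args) {
        System.out.println(levelFor(ExecutionState.FINISHED, null)); // INFO
        System.out.println(levelFor(ExecutionState.FAILED, new RuntimeException("boom"))); // ERROR
    }
}
```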



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-7990) Strange behavior when configuring Logback for logging

2019-03-29 Thread Till Rohrmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-7990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-7990:
-
Priority: Major  (was: Critical)

> Strange behavior when configuring Logback for logging
> -
>
> Key: FLINK-7990
> URL: https://issues.apache.org/jira/browse/FLINK-7990
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Configuration
>Affects Versions: 1.3.2
>Reporter: Fabian Hueske
>Priority: Major
>
> The following issue was reported on the [user 
> mailinglist|https://lists.apache.org/thread.html/c06a9f0b1189bf21d946d3d9728631295c88bfc57043cdbe18409d52@%3Cuser.flink.apache.org%3E]
> {quote}
> I have a single node Flink instance which has the required jars for logback 
> in the lib folder (logback-classic.jar, logback-core.jar, 
> log4j-over-slf4j.jar). I have removed the jars for log4j from the lib folder 
> (log4j-1.2.17.jar, slf4j-log4j12-1.7.7.jar). 'logback.xml' is also correctly 
> updated in 'conf' folder. I have also included 'logback.xml' in the 
> classpath, although this does not seem to be considered while the job is run. 
> Flink refers to logback.xml inside the conf folder only. I have updated 
> pom.xml as per Flink's documentation in order to exclude log4j. I have some 
> log entries set inside a few map and flatmap functions and some log entries 
> outside those functions (eg: "program execution started").
> When I run the job, Flink writes only those logs that are coded outside the 
> transformations. Those logs that are coded inside the transformations (map, 
> flatmap etc) are not getting written to the log file. If this was happening 
> always, I could have assumed that the Task Manager is not writing the logs. 
> But Flink displays a strange behavior regarding this. Whenever I update the 
> logback jars inside the lib folder (due to version changes), during the 
> next job run, all logs (even those inside map and flatmap) are written 
> correctly into the log file. But the logs don't get written in any of the 
> runs after that. This means that my 'logback.xml' file is correct and the 
> settings are also correct. But I don't understand why the same settings don't 
> work while the same job is run again.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] [flink] zentol commented on a change in pull request #8002: [FLINK-11923][metrics] MetricRegistryConfiguration provides MetricReporters Suppliers

2019-03-29 Thread GitBox
zentol commented on a change in pull request #8002: [FLINK-11923][metrics] 
MetricRegistryConfiguration provides MetricReporters Suppliers
URL: https://github.com/apache/flink/pull/8002#discussion_r270372947
 
 

 ##
 File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/metrics/MetricRegistryImpl.java
 ##
 @@ -144,11 +135,9 @@ public MetricRegistryImpl(MetricRegistryConfiguration 
config) {
}
}
 
-   Class<?> reporterClass = Class.forName(className);
-   MetricReporter reporterInstance = (MetricReporter) reporterClass.newInstance();
+   final MetricReporter reporterInstance = reporterSetup.getSupplier().get();
 
 Review comment:
   see 
https://github.com/zentol/flink/commit/b5feb554a965913255c3a7f3ab94430bb37d6c52#diff-65246650592087a2e5a816275f23dcc3


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] zentol commented on a change in pull request #8002: [FLINK-11923][metrics] MetricRegistryConfiguration provides MetricReporters Suppliers

2019-03-29 Thread GitBox
zentol commented on a change in pull request #8002: [FLINK-11923][metrics] 
MetricRegistryConfiguration provides MetricReporters Suppliers
URL: https://github.com/apache/flink/pull/8002#discussion_r270372880
 
 

 ##
 File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/metrics/MetricRegistryImpl.java
 ##
 @@ -144,11 +135,9 @@ public MetricRegistryImpl(MetricRegistryConfiguration 
config) {
}
}
 
-   Class<?> reporterClass = Class.forName(className);
-   MetricReporter reporterInstance = (MetricReporter) reporterClass.newInstance();
+   final MetricReporter reporterInstance = reporterSetup.getSupplier().get();
 
 Review comment:
   In FLINK-11922 I want to add support for reporter factories, with which 
supported reporters would allocate resources in their constructor instead of 
`open`. I don't want that to occur in the configuration, hence the supplier.
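The supplier idea discussed here can be sketched in plain Java: instead of reflective `Class.forName(...).newInstance()`, the configuration hands the registry a deferred factory. All names below are illustrative, not Flink's actual API.

```java
import java.util.function.Supplier;

// Minimal sketch of supplier-based reporter instantiation: construction
// (and any resource allocation in the constructor) is deferred until the
// registry calls get(). Names are hypothetical, not Flink's API.
public class ReporterSupplierSketch {

    interface MetricReporter { String name(); }

    static class JmxReporter implements MetricReporter {
        public String name() { return "jmx"; }
    }

    // The registry never sees a class name or reflection; it only consumes
    // the supplier provided by the configuration/setup object.
    static MetricReporter instantiate(Supplier<MetricReporter> supplier) {
        return supplier.get();
    }

    public static void main(String[] args) {
        MetricReporter r = instantiate(JmxReporter::new);
        System.out.println(r.name()); // jmx
    }
}
```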


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Resolved] (FLINK-6125) Commons httpclient is not shaded anymore in Flink 1.2

2019-03-29 Thread Till Rohrmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann resolved FLINK-6125.
--
Resolution: Later

Closed because of inactivity. If this is still a problem, please re-open this 
issue.

> Commons httpclient is not shaded anymore in Flink 1.2
> -
>
> Key: FLINK-6125
> URL: https://issues.apache.org/jira/browse/FLINK-6125
> Project: Flink
>  Issue Type: Bug
>  Components: Build System, Connectors / Kinesis
>Reporter: Robert Metzger
>Priority: Critical
>
> This has been reported by a user: 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Return-of-Flink-shading-problems-in-1-2-0-td12257.html
> The Kinesis connector requires Flink to not expose any httpclient 
> dependencies. Since Flink 1.2 it seems that we are exposing that dependency 
> again.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-6949) Add ability to ship custom resource files to YARN cluster

2019-03-29 Thread Till Rohrmann (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804842#comment-16804842
 ] 

Till Rohrmann commented on FLINK-6949:
--

Hi [~m.pryahin] is this still an issue or has this been fixed by now?

> Add ability to ship custom resource files to YARN cluster
> -
>
> Key: FLINK-6949
> URL: https://issues.apache.org/jira/browse/FLINK-6949
> Project: Flink
>  Issue Type: Improvement
>  Components: Command Line Client, Deployment / YARN
>Affects Versions: 1.3.0
>Reporter: Mikhail Pryakhin
>Assignee: Mikhail Pryakhin
>Priority: Major
>
> *The problem:*
> When deploying a flink job on YARN it is not possible to specify custom 
> resource files to be shipped to the YARN cluster.
>  
> *The use case description:*
> When running a flink job on multiple environments it becomes necessary to 
> pass environment-related configuration files to the job's runtime. It can be 
> accomplished by packaging configuration files within the job's jar. But 
> having tens of different environments one can easily end up packaging as many 
> jars as there are environments. It would be great to have an ability to 
> separate configuration files from the job artifacts. 
>  
> *The possible solution:*
> add the --yarnship-files option to flink cli to specify files that should be 
> shipped to the YARN cluster.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (FLINK-7351) test instability in JobClientActorRecoveryITCase#testJobClientRecovery

2019-03-29 Thread Till Rohrmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-7351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann closed FLINK-7351.

   Resolution: Not A Problem
Fix Version/s: (was: 1.6.5)

Not a problem anymore with Flip-6.

> test instability in JobClientActorRecoveryITCase#testJobClientRecovery
> --
>
> Key: FLINK-7351
> URL: https://issues.apache.org/jira/browse/FLINK-7351
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination, Tests
>Affects Versions: 1.3.2, 1.4.0
>Reporter: Nico Kruber
>Priority: Critical
>  Labels: test-stability
>
> On a 16-core VM, the following test failed during {{mvn clean verify}}
> {code}
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 22.814 sec 
> <<< FAILURE! - in org.apache.flink.runtime.client.JobClientActorRecoveryITCase
> testJobClientRecovery(org.apache.flink.runtime.client.JobClientActorRecoveryITCase)
>   Time elapsed: 21.299 sec  <<< ERROR!
> org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
> at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply$mcV$sp(JobManager.scala:933)
> at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:876)
> at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:876)
> at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
> at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
> at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
> at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
> at 
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: 
> org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: 
> Not enough free slots available to run the job. You can decrease the operator 
> parallelism or increase the number of slots per TaskManager in the 
> configuration. Resources available to scheduler: Number of instances=0, total 
> number of slots=0, available slots=0
> at 
> org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:334)
> at 
> org.apache.flink.runtime.jobmanager.scheduler.Scheduler.allocateSlot(Scheduler.java:139)
> at 
> org.apache.flink.runtime.executiongraph.Execution.allocateSlotForExecution(Execution.java:368)
> at 
> org.apache.flink.runtime.executiongraph.Execution.scheduleForExecution(Execution.java:309)
> at 
> org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForExecution(ExecutionVertex.java:596)
> at 
> org.apache.flink.runtime.executiongraph.ExecutionJobVertex.scheduleAll(ExecutionJobVertex.java:450)
> at 
> org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleLazy(ExecutionGraph.java:834)
> at 
> org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:814)
> at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1425)
> at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372)
> at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372)
> at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
> at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
> at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
> at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
> at 
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> {code}
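For reference, the `NoResourceAvailableException` above is the generic symptom of too few TaskManager slots for the requested parallelism; it is typically addressed via the slot configuration. A config sketch (real Flink option keys; the values are illustrative):

```yaml
# flink-conf.yaml (sketch): provide enough slots per TaskManager so the
# job's parallelism can be satisfied; values here are illustrative.
taskmanager.numberOfTaskSlots: 4
parallelism.default: 2
```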



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-6949) Add ability to ship custom resource files to YARN cluster

2019-03-29 Thread Till Rohrmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-6949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-6949:
-
Priority: Major  (was: Critical)

> Add ability to ship custom resource files to YARN cluster
> -
>
> Key: FLINK-6949
> URL: https://issues.apache.org/jira/browse/FLINK-6949
> Project: Flink
>  Issue Type: Improvement
>  Components: Command Line Client, Deployment / YARN
>Affects Versions: 1.3.0
>Reporter: Mikhail Pryakhin
>Assignee: Mikhail Pryakhin
>Priority: Major
>
> *The problem:*
> When deploying a flink job on YARN it is not possible to specify custom 
> resource files to be shipped to the YARN cluster.
>  
> *The use case description:*
> When running a flink job on multiple environments it becomes necessary to 
> pass environment-related configuration files to the job's runtime. It can be 
> accomplished by packaging configuration files within the job's jar. But 
> having tens of different environments one can easily end up packaging as many 
> jars as there are environments. It would be great to have an ability to 
> separate configuration files from the job artifacts. 
>  
> *The possible solution:*
> add the --yarnship-files option to flink cli to specify files that should be 
> shipped to the YARN cluster.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (FLINK-7610) xerces lib conflict

2019-03-29 Thread Till Rohrmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-7610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann closed FLINK-7610.

Resolution: Resolved

Should be resolved with child-first classloading.

> xerces lib conflict
> ---
>
> Key: FLINK-7610
> URL: https://issues.apache.org/jira/browse/FLINK-7610
> Project: Flink
>  Issue Type: Bug
>  Components: Build System
>Affects Versions: 1.3.2
>Reporter: Domenico Campagnolo
>Priority: Critical
>
> Flink job fails when I try to use the method validate(new 
> StAXSource(reader)) from the class _javax.xml.validation.Validator_ because 
> StAXSource is [supported since xerces 
> 2.10|https://stackoverflow.com/questions/20493359/staxsource-not-accepted-by-validator-in-jboss-eap-6-1]
>  and the library _flink-shaded-hadoop2-uber-1.3.2_ contains the 2.9.1 version 
> in it. 
> I also included the dependency to the latest version of xerces 2.11.0 into my 
> job-jar but it still does not work.
> I also added that library into _%FLINK_HOME%/lib/_ folder hoping it would be 
> loaded on the classpath at boot, overriding the original one, but it does 
> not work either.
> See the cause:
> _Caused by: java.lang.IllegalArgumentException: *Source parameter of type 
> 'javax.xml.transform.stax.StAXSource' is not accepted by this validator*._
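The failing call can be reproduced standalone with the JDK's built-in validator (which supports `StAXSource`); with an older shaded xerces 2.9.x on the classpath, the same call throws the `IllegalArgumentException` quoted above. The schema and document below are illustrative.

```java
import javax.xml.XMLConstants;
import javax.xml.stream.XMLInputFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import java.io.StringReader;

// Standalone reproduction sketch of the call from the report. The schema
// and XML document are made up for illustration.
public class StaxValidation {

    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">" +
        "<xs:element name=\"greeting\" type=\"xs:string\"/></xs:schema>";

    // Returns true if the document validates; false on any validation error.
    public static boolean validate(String xml) {
        try {
            Validator v = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)))
                .newValidator();
            // The problematic call: StAXSource is only accepted by xerces >= 2.10.
            v.validate(new StAXSource(
                XMLInputFactory.newFactory().createXMLStreamReader(new StringReader(xml))));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(validate("<greeting>hi</greeting>"));
    }
}
```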
> {code:java}
> org.apache.flink.client.program.ProgramInvocationException: The main method 
> caused an error.
>   at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:545)
>   at 
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:419)
>   at 
> org.apache.flink.client.program.OptimizerPlanEnvironment.getOptimizedPlan(OptimizerPlanEnvironment.java:80)
>   at 
> org.apache.flink.client.program.ClusterClient.getOptimizedPlan(ClusterClient.java:318)
>   at 
> org.apache.flink.runtime.webmonitor.handlers.JarActionHandler.getJobGraphAndClassLoader(JarActionHandler.java:72)
>   at 
> org.apache.flink.runtime.webmonitor.handlers.JarRunHandler.handleJsonRequest(JarRunHandler.java:61)
>   at 
> org.apache.flink.runtime.webmonitor.handlers.AbstractJsonRequestHandler.handleRequest(AbstractJsonRequestHandler.java:41)
>   at 
> org.apache.flink.runtime.webmonitor.RuntimeMonitorHandler.respondAsLeader(RuntimeMonitorHandler.java:109)
>   at 
> org.apache.flink.runtime.webmonitor.RuntimeMonitorHandlerBase.channelRead0(RuntimeMonitorHandlerBase.java:97)
>   at 
> org.apache.flink.runtime.webmonitor.RuntimeMonitorHandlerBase.channelRead0(RuntimeMonitorHandlerBase.java:44)
>   at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>   at io.netty.handler.codec.http.router.Handler.routed(Handler.java:62)
>   at 
> io.netty.handler.codec.http.router.DualAbstractHandler.channelRead0(DualAbstractHandler.java:57)
>   at 
> io.netty.handler.codec.http.router.DualAbstractHandler.channelRead0(DualAbstractHandler.java:20)
>   at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>   at 
> org.apache.flink.runtime.webmonitor.HttpRequestHandler.channelRead0(HttpRequestHandler.java:159)
>   at 
> org.apache.flink.runtime.webmonitor.HttpRequestHandler.channelRead0(HttpRequestHandler.java:65)
>   at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>   at 
> io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>   at 
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)
>   at 
> io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:147)
>   at 
> 

[jira] [Updated] (FLINK-7040) Flip-6 client-cluster communication

2019-03-29 Thread Till Rohrmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-7040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-7040:
-
Priority: Major  (was: Critical)

> Flip-6 client-cluster communication
> ---
>
> Key: FLINK-7040
> URL: https://issues.apache.org/jira/browse/FLINK-7040
> Project: Flink
>  Issue Type: New Feature
>  Components: Deployment / Mesos, Runtime / Coordination
>Reporter: Till Rohrmann
>Assignee: Chesnay Schepler
>Priority: Major
>  Labels: flip-6
>
> With the new Flip-6 architecture, the client will communicate with the 
> cluster in a RESTful manner.
> The cluster shall support the following REST calls:
> * List jobs (GET): Get list of all running jobs on the cluster
> * Submit job (POST): Submit a job to the cluster (only supported in session 
> mode)
> * Lookup job leader (GET): Gets the JM leader for the given job
> * Get job status (GET): Get the status of an executed job (and maybe the 
> JobExecutionResult)
> * Cancel job (PUT): Cancel the given job
> * Stop job (PUT): Stops the given job
> * Take savepoint (POST): Take savepoint for given job (How to return the 
> path under which the savepoint was stored? Maybe always having to 
> specify a path)
> * Get KV state (GET): Gets the KV state for the given job and key (Queryable 
> state)
> * Poll/subscribe to notifications for job (GET, WebSocket): Polls new 
> notifications from the execution of the given job/Opens WebSocket to receive 
> notifications
> The first four REST calls will be served by the REST endpoint running in the 
> application master/cluster entrypoint. The other calls will be served by a 
> REST endpoint running alongside the JobManager.
> Detailed information about different implementations and their pros and cons 
> can be found in this document:
> https://docs.google.com/document/d/1eIX6FS9stwraRdSUgRSuLXC1sL7NAmxtuqIXe_jSi-k/edit?usp=sharing
> The implementation will most likely be Netty based.
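The calls proposed above could look like the following. This is purely illustrative: the endpoint paths, host, and port are assumptions, not the REST API that was eventually implemented.

```sh
# Illustrative only -- endpoint paths are assumptions, not the final API.
curl -X GET  http://cluster:8081/jobs                    # list running jobs
curl -X POST http://cluster:8081/jobs -d @jobgraph.bin   # submit job (session mode)
curl -X GET  http://cluster:8081/jobs/<jobid>/status     # get job status
curl -X PUT  http://cluster:8081/jobs/<jobid>/cancel     # cancel job
```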



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-8641) Move BootstrapTools#getTaskManagerShellCommand to flink-yarn

2019-03-29 Thread Till Rohrmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-8641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-8641:
-
Priority: Major  (was: Critical)

> Move BootstrapTools#getTaskManagerShellCommand to flink-yarn
> 
>
> Key: FLINK-8641
> URL: https://issues.apache.org/jira/browse/FLINK-8641
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN, Runtime / Configuration
>Affects Versions: 1.5.0
>Reporter: Chesnay Schepler
>Priority: Major
>
> I would like to move {{getTaskManagerShellCommand()}} and 
> {{getStartCommand()}} from 
> {{org.apache.flink.runtime.clusterframework.BootstrapTools}} in flink-runtime 
> to flink-yarn.
> YARN is the sole user of these methods, and both methods are directly related 
> to the {{YARN_CONTAINER_START_COMMAND_TEMPLATE}} constant in {{ConfigConstants}}.
> We can't move this constant to {{YarnOptions}} at this point, since 
> {{YarnOptions}} lives in {{flink-yarn}} while the above methods require the 
> constant to be accessible from {{flink-runtime}}.
> [~till.rohrmann] Do you see any problems that this move could cause?
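At its core, a start-command template like {{YARN_CONTAINER_START_COMMAND_TEMPLATE}} is expanded by substituting `%key%` placeholders. The sketch below is a minimal illustration of that idea; the template string and keys shown are assumptions, not the exact flink-runtime implementation.

```python
# Minimal sketch of start-command template expansion: replace each %key%
# placeholder with its configured value. Template and keys are illustrative.
def build_start_command(template, values):
    command = template
    for key, value in values.items():
        command = command.replace("%" + key + "%", value)
    return command

# Hypothetical template and values, loosely modeled on a TaskManager launch:
template = "%java% %jvmmem% %class% %args%"
cmd = build_start_command(template, {
    "java": "java",
    "jvmmem": "-Xmx1024m",
    "class": "org.apache.flink.yarn.YarnTaskExecutorRunner",
    "args": "--configDir .",
})
```

Because only YARN consumes this expansion logic, moving both the methods and the template constant into flink-yarn keeps the coupling in one module.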





[jira] [Resolved] (FLINK-5536) Config option: HA

2019-03-29 Thread Till Rohrmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-5536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann resolved FLINK-5536.
--
Resolution: Incomplete

Parent issue closed. Please re-open if you want to pick it up again.

> Config option: HA
> -
>
> Key: FLINK-5536
> URL: https://issues.apache.org/jira/browse/FLINK-5536
> Project: Flink
>  Issue Type: Sub-task
>  Components: Deployment / Mesos
>Reporter: Eron Wright 
>Assignee: Stavros Kontopoulos
>Priority: Major
>
> Configure Flink HA through package options plus good defaults. The main 
> components are ZooKeeper (ZK) configuration and state backend configuration.
> - The ZK information can be defaulted to `master.mesos`, as with other packages.
> - Evaluate whether ZK can be fully configured by default, even if a state 
> backend isn't configured.
> - Use DCOS HDFS as the filesystem for the state backend. Evaluate whether to 
> assume that DCOS HDFS is installed by default, or whether to make it explicit.
> - To use DCOS HDFS, the init script should download the core-site.xml and 
> hdfs-site.xml from the HDFS 'connection' endpoint. Supply a default value 
> for the endpoint address; see 
> [https://docs.mesosphere.com/service-docs/hdfs/connecting-clients/].
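A configuration along these lines might look like the following flink-conf.yaml fragment. The key names follow Flink's ZooKeeper HA options, but the concrete addresses (`master.mesos`, the DC/OS HDFS paths) are illustrative assumptions, not settled defaults.

```yaml
# Hypothetical flink-conf.yaml fragment for the defaults discussed above.
high-availability: zookeeper
high-availability.zookeeper.quorum: master.mesos:2181
high-availability.storageDir: hdfs:///flink/ha
state.backend: filesystem
state.checkpoints.dir: hdfs:///flink/checkpoints
```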





[jira] [Resolved] (FLINK-5534) Config option: 'flink-options'

2019-03-29 Thread Till Rohrmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-5534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann resolved FLINK-5534.
--
Resolution: Incomplete

Parent issue closed. Please re-open if you want to pick it up again.

> Config option: 'flink-options'
> --
>
> Key: FLINK-5534
> URL: https://issues.apache.org/jira/browse/FLINK-5534
> Project: Flink
>  Issue Type: Sub-task
>  Components: Deployment / Mesos
>Reporter: Eron Wright 
>Priority: Critical
>






[jira] [Resolved] (FLINK-5540) CLI: savepoint

2019-03-29 Thread Till Rohrmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-5540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann resolved FLINK-5540.
--
Resolution: Incomplete

Parent issue closed. Please re-open if you want to pick it up again.

> CLI: savepoint
> --
>
> Key: FLINK-5540
> URL: https://issues.apache.org/jira/browse/FLINK-5540
> Project: Flink
>  Issue Type: Sub-task
>  Components: Deployment / Mesos
>Reporter: Eron Wright 
>Priority: Major
>
> Implement CLI support for savepoints, in both the 'run' and 'cancel' 
> operations.




