[GitHub] flink pull request: [FLINK-1521] Chained operators respect reuse
Github user gyfora commented on the pull request: https://github.com/apache/flink/pull/392#issuecomment-74367347 The problem here, I think, is that an error caused by the reusing mapper could be very hard to detect. So some users might have it without realising it. For streaming we copy the input at every chained function call just to be on the safe side, but we'll have to reintroduce the object reuse mode. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-1521) Some Chained Drivers do not respect object-reuse/non-reuse flag
[ https://issues.apache.org/jira/browse/FLINK-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321297#comment-14321297 ] ASF GitHub Bot commented on FLINK-1521: --- Github user gyfora commented on the pull request: https://github.com/apache/flink/pull/392#issuecomment-74367347 The problem here, I think, is that an error caused by the reusing mapper could be very hard to detect. So some users might have it without realising it. For streaming we copy the input at every chained function call just to be on the safe side, but we'll have to reintroduce the object reuse mode. Some Chained Drivers do not respect object-reuse/non-reuse flag --- Key: FLINK-1521 URL: https://issues.apache.org/jira/browse/FLINK-1521 Project: Flink Issue Type: Bug Reporter: Aljoscha Krettek Assignee: Chesnay Schepler -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-1545) Spurious failure of AsynchronousFileIOChannelsTest.testExceptionForwardsToClose
Till Rohrmann created FLINK-1545: Summary: Spurious failure of AsynchronousFileIOChannelsTest.testExceptionForwardsToClose Key: FLINK-1545 URL: https://issues.apache.org/jira/browse/FLINK-1545 Project: Flink Issue Type: Bug Reporter: Till Rohrmann On Travis the test case {{AsynchronousFileIOChannelsTest.testExceptionForwardsToClose}} failed.
{code}
java.lang.AssertionError: did not forward exception
    at org.junit.Assert.fail(Assert.java:88)
    at org.apache.flink.runtime.io.disk.iomanager.AsynchronousFileIOChannelsTest.testExceptionForwardsToClose(AsynchronousFileIOChannelsTest.java:130)
    at org.apache.flink.runtime.io.disk.iomanager.AsynchronousFileIOChannelsTest.testExceptionForwardsToClose(AsynchronousFileIOChannelsTest.java:94)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1531) Custom Kryo Serializer fails in itertation scenario
[ https://issues.apache.org/jira/browse/FLINK-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321520#comment-14321520 ] Johannes commented on FLINK-1531: - Thanks for the info [~StephanEwen] Makes sense to me and considering the old SO issue, you are definitively right, just wanted to raise this issue. Custom Kryo Serializer fails in itertation scenario --- Key: FLINK-1531 URL: https://issues.apache.org/jira/browse/FLINK-1531 Project: Flink Issue Type: Bug Components: Iterations Affects Versions: 0.9 Reporter: Johannes Priority: Minor Fix For: 0.9, 0.8.1 Attachments: TestKryoIterationSerializer.java When using iterations with a custom serializer for a domain object, the iteration will fail.
{code:java}
org.apache.flink.runtime.client.JobExecutionException: com.esotericsoftware.kryo.KryoException: Buffer underflow
    at org.apache.flink.api.java.typeutils.runtime.NoFetchingInput.require(NoFetchingInput.java:76)
    at com.esotericsoftware.kryo.io.Input.readVarInt(Input.java:355)
    at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:109)
    at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:641)
    at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:752)
    at org.apache.flink.api.java.typeutils.runtime.KryoSerializer.deserialize(KryoSerializer.java:198)
    at org.apache.flink.api.java.typeutils.runtime.KryoSerializer.deserialize(KryoSerializer.java:203)
    at org.apache.flink.runtime.io.disk.InputViewIterator.next(InputViewIterator.java:43)
    at org.apache.flink.runtime.iterative.task.IterationHeadPactTask.streamOutFinalOutputBulk(IterationHeadPactTask.java:404)
    at org.apache.flink.runtime.iterative.task.IterationHeadPactTask.run(IterationHeadPactTask.java:377)
    at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:360)
    at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:204)
    at java.lang.Thread.run(Thread.java:745)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-1546) Failed job causes JobManager to shutdown due to uncatched WebFrontend exception
Robert Metzger created FLINK-1546: - Summary: Failed job causes JobManager to shutdown due to uncatched WebFrontend exception Key: FLINK-1546 URL: https://issues.apache.org/jira/browse/FLINK-1546 Project: Flink Issue Type: Bug Components: JobManager Affects Versions: 0.9 Reporter: Robert Metzger
{code}
16:59:26,588 INFO org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1 - Status of job ef19b2b201d4b81f031334cb76eadc78 (Basic Page Rank Example) changed to FAILEDCleanup job ef19b2b201d4b81f031334cb76eadc78..
16:59:26,591 ERROR akka.actor.OneForOneStrategy - Can only archive the job from a terminal state
java.lang.IllegalStateException: Can only archive the job from a terminal state
    at org.apache.flink.runtime.executiongraph.ExecutionGraph.prepareForArchiving(ExecutionGraph.java:648)
    at org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$removeJob(JobManager.scala:508)
    at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:271)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
    at org.apache.flink.yarn.YarnJobManager$$anonfun$receiveYarnMessages$1.applyOrElse(YarnJobManager.scala:70)
    at scala.PartialFunction$OrElse.apply(PartialFunction.scala:162)
    at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:37)
    at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:30)
    at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
    at org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:30)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
    at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:86)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
    at akka.actor.ActorCell.invoke(ActorCell.scala:487)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
    at akka.dispatch.Mailbox.run(Mailbox.scala:221)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
16:59:26,595 INFO org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1 - Stopping webserver.
16:59:26,654 INFO org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1 - Stopped webserver.
16:59:26,656 INFO org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1 - Stopping job manager akka://flink/user/jobmanager.
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1547) Disable automated ApplicationMaster restart
[ https://issues.apache.org/jira/browse/FLINK-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321551#comment-14321551 ] Stephan Ewen commented on FLINK-1547: - I agree, we should also deactivate the restart of other root actors. Right now, the system is not designed to recover from JobManager crashes by pure actor restart (without process restart). While it is possible to extend the system towards that, in its current state, disabling the restart leads to a clearer error behavior. Disable automated ApplicationMaster restart --- Key: FLINK-1547 URL: https://issues.apache.org/jira/browse/FLINK-1547 Project: Flink Issue Type: Bug Components: YARN Client Reporter: Robert Metzger Currently, Flink on YARN is restarting the ApplicationMaster if it crashes. The other components don't support this (frontend tries to reconnect.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-1548) [DISCUSS] Make Scala implicit parameters explicit in the runtime
Stephan Ewen created FLINK-1548: --- Summary: [DISCUSS] Make Scala implicit parameters explicit in the runtime Key: FLINK-1548 URL: https://issues.apache.org/jira/browse/FLINK-1548 Project: Flink Issue Type: Improvement Components: Distributed Runtime Affects Versions: 0.9 Reporter: Stephan Ewen Priority: Minor Fix For: 0.9 Scala's feature of implicit parameters is very powerful and invaluable in the design of nice high level APIs. In the system runtime, though, I think we should not use implicit parameters, as they make the code more tricky to understand and make it harder to figure out where parameters actually come from. The API niceties are not required there. I propose to make all parameters explicit in runtime classes. Right now, this concerns mostly ActorSystem and Timeout parameters. This is nothing we need to do as a separate task; I would suggest changing that whenever we encounter such a method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-1547) Disable automated ApplicationMaster restart
Robert Metzger created FLINK-1547: - Summary: Disable automated ApplicationMaster restart Key: FLINK-1547 URL: https://issues.apache.org/jira/browse/FLINK-1547 Project: Flink Issue Type: Bug Components: YARN Client Reporter: Robert Metzger Currently, Flink on YARN is restarting the ApplicationMaster if it crashes. The other components don't support this (frontend tries to reconnect.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1508] Removes AkkaUtil.ask
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/384
[jira] [Commented] (FLINK-1388) POJO support for writeAsCsv
[ https://issues.apache.org/jira/browse/FLINK-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321635#comment-14321635 ] Robert Metzger commented on FLINK-1388: --- Yes, for private fields you need to make them accessible first. Currently, there is only code in Flink which is accessing these fields as keys. And somewhere we make the fields accessible prior to using them. You can safely make them accessible. I would recommend to do that inside the CSV writer code to make sure it happens properly on all machines. POJO support for writeAsCsv --- Key: FLINK-1388 URL: https://issues.apache.org/jira/browse/FLINK-1388 Project: Flink Issue Type: New Feature Components: Java API Reporter: Timo Walther Assignee: Adnan Khan Priority: Minor It would be great if one could simply write out POJOs in CSV format. {code} public class MyPojo { String a; int b; } {code} to: {code} # CSV file of org.apache.flink.MyPojo: String a, int b Hello World, 42 Hello World 2, 47 ... {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
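The recommendation above, making fields accessible inside the CSV writer code itself, might look roughly like the following. This is only a minimal sketch in plain Java, not Flink code: the class {{PojoCsvSketch}} and the method {{toCsvLine}} are hypothetical names chosen for illustration, the fields of {{MyPojo}} are assumed to be private, and no quoting or escaping of values is attempted.

```java
import java.lang.reflect.Field;
import java.util.StringJoiner;

// The POJO from the issue description, with its fields assumed private for illustration.
class MyPojo {
    private String a;
    private int b;

    MyPojo(String a, int b) {
        this.a = a;
        this.b = b;
    }
}

class PojoCsvSketch {
    // Hypothetical helper, not Flink API: joins the named fields of a POJO into one CSV line.
    // Values are written with String.valueOf; delimiters inside values are not escaped.
    static String toCsvLine(Object pojo, String delimiter, String... fieldNames) {
        StringJoiner line = new StringJoiner(delimiter);
        try {
            for (String name : fieldNames) {
                Field field = pojo.getClass().getDeclaredField(name);
                // Done inside the writer, so it happens wherever the writer runs.
                field.setAccessible(true);
                line.add(String.valueOf(field.get(pojo)));
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot read fields of " + pojo.getClass(), e);
        }
        return line.toString();
    }

    public static void main(String[] args) {
        System.out.println(toCsvLine(new MyPojo("Hello World", 42), ", ", "a", "b"));
    }
}
```

The field names are passed explicitly because {{Class.getDeclaredFields}} does not guarantee any particular order, and a CSV writer needs a stable column order.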
[jira] [Commented] (FLINK-1546) Failed job causes JobManager to shutdown due to uncatched WebFrontend exception
[ https://issues.apache.org/jira/browse/FLINK-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321649#comment-14321649 ] Till Rohrmann commented on FLINK-1546: -- The problem with the uncaught exception in the actor thread is fixed with 589b539c5acdd25f53ef6c9a453198a960ba93d5. However, the interesting question is why the system complains that the current job is not in a terminal state. The log line before says that it switched to {{FAILED}}. Can you reproduce the error [~rmetzger]? Maybe we can add in which state the job is when it throws the exception. Failed job causes JobManager to shutdown due to uncatched WebFrontend exception --- Key: FLINK-1546 URL: https://issues.apache.org/jira/browse/FLINK-1546 Project: Flink Issue Type: Bug Components: JobManager Affects Versions: 0.9 Reporter: Robert Metzger
{code}
16:59:26,588 INFO org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1 - Status of job ef19b2b201d4b81f031334cb76eadc78 (Basic Page Rank Example) changed to FAILEDCleanup job ef19b2b201d4b81f031334cb76eadc78..
16:59:26,591 ERROR akka.actor.OneForOneStrategy - Can only archive the job from a terminal state
java.lang.IllegalStateException: Can only archive the job from a terminal state
    at org.apache.flink.runtime.executiongraph.ExecutionGraph.prepareForArchiving(ExecutionGraph.java:648)
    at org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$removeJob(JobManager.scala:508)
    at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:271)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
    at org.apache.flink.yarn.YarnJobManager$$anonfun$receiveYarnMessages$1.applyOrElse(YarnJobManager.scala:70)
    at scala.PartialFunction$OrElse.apply(PartialFunction.scala:162)
    at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:37)
    at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:30)
    at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
    at org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:30)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
    at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:86)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
    at akka.actor.ActorCell.invoke(ActorCell.scala:487)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
    at akka.dispatch.Mailbox.run(Mailbox.scala:221)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
16:59:26,595 INFO org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1 - Stopping webserver.
16:59:26,654 INFO org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1 - Stopped webserver.
16:59:26,656 INFO org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1 - Stopping job manager akka://flink/user/jobmanager.
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-1539) Runtime context not initialized when running streaming PojoExample
[ https://issues.apache.org/jira/browse/FLINK-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger updated FLINK-1539: -- Affects Version/s: 0.9 Runtime context not initialized when running streaming PojoExample -- Key: FLINK-1539 URL: https://issues.apache.org/jira/browse/FLINK-1539 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.9 Reporter: Márton Balassi Assignee: Gyula Fora When running streaming PojoExample received the following exception:
Exception in thread "main" java.lang.IllegalStateException: The runtime context has not been initialized.
    at org.apache.flink.api.common.functions.AbstractRichFunction.getRuntimeContext(AbstractRichFunction.java:49)
    at org.apache.flink.streaming.api.function.aggregation.SumAggregator$PojoSumAggregator.<init>(SumAggregator.java:149)
    at org.apache.flink.streaming.api.function.aggregation.SumAggregator.getSumFunction(SumAggregator.java:52)
    at org.apache.flink.streaming.api.datastream.DataStream.sum(DataStream.java:632)
    at org.apache.flink.streaming.examples.wordcount.PojoExample.main(PojoExample.java:65)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1388) POJO support for writeAsCsv
[ https://issues.apache.org/jira/browse/FLINK-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321633#comment-14321633 ] Adnan Khan commented on FLINK-1388: --- Hey Timo, So the rules for defining a POJO according to [this|https://github.com/apache/flink/blob/master/docs/internal_types_serialization.md] include this part {quote} All fields in the class (and all superclasses) are either public or have a public getter and a setter method that follows the Java beans naming conventions for getters and setters. {quote} So that means that it could have private fields. I was thinking we should add {{field.setAccessible(true)}} in the implementation for {{PojoField}}. Otherwise something like {{pFieldValue = pField.field.get(myPojo)}} does not seem to work. I ran into this while I was testing the CSV writer with POJOs that have private fields but public getters/setters. POJO support for writeAsCsv --- Key: FLINK-1388 URL: https://issues.apache.org/jira/browse/FLINK-1388 Project: Flink Issue Type: New Feature Components: Java API Reporter: Timo Walther Assignee: Adnan Khan Priority: Minor It would be great if one could simply write out POJOs in CSV format. {code} public class MyPojo { String a; int b; } {code} to: {code} # CSV file of org.apache.flink.MyPojo: String a, int b Hello World, 42 Hello World 2, 47 ... {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1388) POJO support for writeAsCsv
[ https://issues.apache.org/jira/browse/FLINK-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321645#comment-14321645 ] Robert Metzger commented on FLINK-1388: --- How about {{org.apache.flink.api.java.functions}} ? POJO support for writeAsCsv --- Key: FLINK-1388 URL: https://issues.apache.org/jira/browse/FLINK-1388 Project: Flink Issue Type: New Feature Components: Java API Reporter: Timo Walther Assignee: Adnan Khan Priority: Minor It would be great if one could simply write out POJOs in CSV format. {code} public class MyPojo { String a; int b; } {code} to: {code} # CSV file of org.apache.flink.MyPojo: String a, int b Hello World, 42 Hello World 2, 47 ... {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1543) Proper exception handling in actors
[ https://issues.apache.org/jira/browse/FLINK-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321583#comment-14321583 ] ASF GitHub Bot commented on FLINK-1543: --- Github user hsaputra commented on the pull request: https://github.com/apache/flink/pull/394#issuecomment-74384108 @tillrohrmann I had a concern about style and a question about the value of a parameter being passed. It would be nice to address those before committing. Proper exception handling in actors --- Key: FLINK-1543 URL: https://issues.apache.org/jira/browse/FLINK-1543 Project: Flink Issue Type: Improvement Reporter: Till Rohrmann With Akka's actors it is important to not throw exceptions in the actor thread, if one does not want that the actor restarts or stops. Many of the Java components which are called from the actor's receive method throw exceptions which are not properly handled by the actor thread. Therefore, we have to catch these exceptions and handle them properly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
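The pattern the issue description asks for, catching exceptions from called components inside the receive method rather than letting them escape, can be sketched in plain Java without any Akka dependency. Everything here is an assumption made for illustration: {{SafeReceiveSketch}}, {{parsePort}} and the {{failures}} list are hypothetical, and this is not the change made in the pull request.

```java
import java.util.ArrayList;
import java.util.List;

class SafeReceiveSketch {
    // Records failures instead of letting them propagate out of onReceive.
    final List<String> failures = new ArrayList<>();
    int lastPort = -1;

    // Hypothetical component call that throws on bad input.
    static int parsePort(String text) {
        return Integer.parseInt(text.trim());
    }

    // Stand-in for an actor's receive method: no exception leaves this method,
    // so a supervisor would have no reason to restart or stop the actor.
    void onReceive(Object message) {
        try {
            lastPort = parsePort(String.valueOf(message));
        } catch (RuntimeException e) {
            // Handle the failure locally; state from earlier messages is kept.
            failures.add("could not handle '" + message + "': " + e.getMessage());
        }
    }

    public static void main(String[] args) {
        SafeReceiveSketch actor = new SafeReceiveSketch();
        actor.onReceive("6123");
        actor.onReceive("not a port");
        System.out.println(actor.lastPort + " / failures: " + actor.failures.size());
    }
}
```

In a real actor the catch block would more likely log the error or reply to the sender with a failure message; which of these is appropriate depends on the message being handled.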
[GitHub] flink pull request: [FLINK-1543] Adds better exception handling in...
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/394#discussion_r24716021
--- Diff: flink-runtime/src/test/java/org/apache/flink/runtime/taskmanager/TaskManagerTest.java ---
@@ -486,7 +489,8 @@ protected void run() {
 public void onReceive(Object message) throws Exception {
 if(message instanceof RegistrationMessages.RegisterTaskManager){
 final InstanceID iid = new InstanceID();
- getSender().tell(new RegistrationMessages.AcknowledgeRegistration(iid, -1),
+ getSender().tell(new RegistrationMessages.AcknowledgeRegistration(iid, -1,
+ Option.<ActorRef>apply(null)),
--- End diff --
That was the only way I found to create a ```None``` with type parameter ```ActorRef``` in Java. ```None``` itself in Java has no type parameter.
[jira] [Commented] (FLINK-1388) POJO support for writeAsCsv
[ https://issues.apache.org/jira/browse/FLINK-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321642#comment-14321642 ] Adnan Khan commented on FLINK-1388: --- Okay thanks, that sounds good. Also - Where should I put the csv writer code? Right now I have it in {{org.apache.flink.api.java.typeutils.runtime}} POJO support for writeAsCsv --- Key: FLINK-1388 URL: https://issues.apache.org/jira/browse/FLINK-1388 Project: Flink Issue Type: New Feature Components: Java API Reporter: Timo Walther Assignee: Adnan Khan Priority: Minor It would be great if one could simply write out POJOs in CSV format. {code} public class MyPojo { String a; int b; } {code} to: {code} # CSV file of org.apache.flink.MyPojo: String a, int b Hello World, 42 Hello World 2, 47 ... {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1547) Disable automated ApplicationMaster restart
[ https://issues.apache.org/jira/browse/FLINK-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321654#comment-14321654 ] Till Rohrmann commented on FLINK-1547: -- This is not so surprising. I just looked into the {{YarnJobManager}} and I could spot several method calls which throw exceptions which are not properly caught. This causes the {{YarnJobManager}} to crash. Disable automated ApplicationMaster restart --- Key: FLINK-1547 URL: https://issues.apache.org/jira/browse/FLINK-1547 Project: Flink Issue Type: Bug Components: YARN Client Reporter: Robert Metzger Currently, Flink on YARN is restarting the ApplicationMaster if it crashes. The other components don't support this (frontend tries to reconnect.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1543] Adds better exception handling in...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/394
[jira] [Commented] (FLINK-1508) Remove AkkaUtils.ask to encourage explicit future handling
[ https://issues.apache.org/jira/browse/FLINK-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321564#comment-14321564 ] ASF GitHub Bot commented on FLINK-1508: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/384 Remove AkkaUtils.ask to encourage explicit future handling -- Key: FLINK-1508 URL: https://issues.apache.org/jira/browse/FLINK-1508 Project: Flink Issue Type: Bug Reporter: Till Rohrmann {{AkkaUtils.ask}} asks another actor and awaits its response. Since this constitutes a blocking call, it might be potentially harmful when used in an actor thread. In order to encourage developers to program asynchronously I propose to remove this helper function. That forces the developer to handle futures explicitly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
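The difference between awaiting a response and handling the future explicitly can be illustrated with the JDK's {{CompletableFuture}}. This is a sketch under stated assumptions only: it deliberately uses {{java.util.concurrent}} rather than Akka's ask pattern or Scala futures, and {{ask}}, {{askAndAwait}} and {{askForReplyLength}} are hypothetical names, not the removed {{AkkaUtils.ask}} helper.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class ExplicitFutureSketch {
    // Hypothetical stand-in for asking another actor; the reply arrives asynchronously.
    static CompletableFuture<String> ask(String request) {
        return CompletableFuture.supplyAsync(() -> "reply to " + request);
    }

    // Blocking style: the calling thread waits for the reply (or a timeout).
    // Inside an actor thread this would stall all other message processing.
    static String askAndAwait(String request) {
        try {
            return ask(request).get(5, TimeUnit.SECONDS);
        } catch (InterruptedException | ExecutionException | TimeoutException e) {
            throw new IllegalStateException("ask failed", e);
        }
    }

    // Explicit style: returns at once; the caller composes on the future
    // and decides how a failure is mapped (here to -1).
    static CompletableFuture<Integer> askForReplyLength(String request) {
        return ask(request)
                .thenApply(String::length)
                .exceptionally(error -> -1);
    }

    public static void main(String[] args) {
        System.out.println(askAndAwait("ping"));
        // join() is used here only so the demo does not exit before the reply is printed.
        askForReplyLength("ping").thenAccept(System.out::println).join();
    }
}
```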
[jira] [Closed] (FLINK-1543) Proper exception handling in actors
[ https://issues.apache.org/jira/browse/FLINK-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann closed FLINK-1543. Resolution: Fixed Fixed in 589b539c5acdd25f53ef6c9a453198a960ba93d5 Proper exception handling in actors --- Key: FLINK-1543 URL: https://issues.apache.org/jira/browse/FLINK-1543 Project: Flink Issue Type: Improvement Reporter: Till Rohrmann With Akka's actors it is important to not throw exceptions in the actor thread, if one does not want that the actor restarts or stops. Many of the Java components which are called from the actor's receive method throw exceptions which are not properly handled by the actor thread. Therefore, we have to catch these exceptions and handle them properly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1543) Proper exception handling in actors
[ https://issues.apache.org/jira/browse/FLINK-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321600#comment-14321600 ] ASF GitHub Bot commented on FLINK-1543: --- Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/394#discussion_r24716021
--- Diff: flink-runtime/src/test/java/org/apache/flink/runtime/taskmanager/TaskManagerTest.java ---
@@ -486,7 +489,8 @@ protected void run() {
 public void onReceive(Object message) throws Exception {
 if(message instanceof RegistrationMessages.RegisterTaskManager){
 final InstanceID iid = new InstanceID();
- getSender().tell(new RegistrationMessages.AcknowledgeRegistration(iid, -1),
+ getSender().tell(new RegistrationMessages.AcknowledgeRegistration(iid, -1,
+ Option.<ActorRef>apply(null)),
--- End diff --
That was the only way I found to create a ```None``` with type parameter ```ActorRef``` in Java. ```None``` itself in Java has no type parameter. Proper exception handling in actors --- Key: FLINK-1543 URL: https://issues.apache.org/jira/browse/FLINK-1543 Project: Flink Issue Type: Improvement Reporter: Till Rohrmann With Akka's actors it is important to not throw exceptions in the actor thread, if one does not want that the actor restarts or stops. Many of the Java components which are called from the actor's receive method throw exceptions which are not properly handled by the actor thread. Therefore, we have to catch these exceptions and handle them properly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1543) Proper exception handling in actors
[ https://issues.apache.org/jira/browse/FLINK-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321601#comment-14321601 ] ASF GitHub Bot commented on FLINK-1543: --- Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/394#issuecomment-74385078 @hsaputra, sorry I did not see your comments before I committed the PR. I started merging it yesterday but it always failed because of some minor issues on Travis. Therefore, I only looked at the Travis results. Won't happen again. You're right that the extra spaces are missing. Good catch. I'll add them with my next commit. Proper exception handling in actors --- Key: FLINK-1543 URL: https://issues.apache.org/jira/browse/FLINK-1543 Project: Flink Issue Type: Improvement Reporter: Till Rohrmann With Akka's actors it is important to not throw exceptions in the actor thread, if one does not want that the actor restarts or stops. Many of the Java components which are called from the actor's receive method throw exceptions which are not properly handled by the actor thread. Therefore, we have to catch these exceptions and handle them properly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1547) Disable automated ApplicationMaster restart
[ https://issues.apache.org/jira/browse/FLINK-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321631#comment-14321631 ] Robert Metzger commented on FLINK-1547: --- The AM stopped due to: https://issues.apache.org/jira/browse/FLINK-1546. (Even starting a job with the wrong file path causes the JobManager to crash). YARN is automatically restarting the AM. For YARN, it's a super simple fix. I opened the issue just to remind myself that I have to do it. Disable automated ApplicationMaster restart --- Key: FLINK-1547 URL: https://issues.apache.org/jira/browse/FLINK-1547 Project: Flink Issue Type: Bug Components: YARN Client Reporter: Robert Metzger Currently, Flink on YARN is restarting the ApplicationMaster if it crashes. The other components don't support this (frontend tries to reconnect.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1543] Adds better exception handling in...
Github user hsaputra commented on the pull request: https://github.com/apache/flink/pull/394#issuecomment-74388302 No worries, thanks for replying to my concern =)
[jira] [Commented] (FLINK-1543) Proper exception handling in actors
[ https://issues.apache.org/jira/browse/FLINK-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321658#comment-14321658 ] ASF GitHub Bot commented on FLINK-1543: --- Github user hsaputra commented on the pull request: https://github.com/apache/flink/pull/394#issuecomment-74388302 No worries, thanks for replying to my concern =) Proper exception handling in actors --- Key: FLINK-1543 URL: https://issues.apache.org/jira/browse/FLINK-1543 Project: Flink Issue Type: Improvement Reporter: Till Rohrmann With Akka's actors it is important to not throw exceptions in the actor thread, if one does not want that the actor restarts or stops. Many of the Java components which are called from the actor's receive method throw exceptions which are not properly handled by the actor thread. Therefore, we have to catch these exceptions and handle them properly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1179] Add button to JobManager web inte...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/374
[jira] [Commented] (FLINK-1179) Add button to JobManager web interface to request stack trace of a TaskManager
[ https://issues.apache.org/jira/browse/FLINK-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321676#comment-14321676 ] ASF GitHub Bot commented on FLINK-1179: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/374 Add button to JobManager web interface to request stack trace of a TaskManager -- Key: FLINK-1179 URL: https://issues.apache.org/jira/browse/FLINK-1179 Project: Flink Issue Type: New Feature Components: JobManager Reporter: Robert Metzger Assignee: Chiwan Park Priority: Minor Labels: starter This is something I do quite often manually and I think it might be helpful for users as well.
[jira] [Resolved] (FLINK-1179) Add button to JobManager web interface to request stack trace of a TaskManager
[ https://issues.apache.org/jira/browse/FLINK-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger resolved FLINK-1179. --- Resolution: Fixed Fix Version/s: 0.9 Thank you Chiwan for implementing this! The issue has been merged in https://git1-us-west.apache.org/repos/asf?p=flink.git;a=commit;h=da8c02b9 Add button to JobManager web interface to request stack trace of a TaskManager -- Key: FLINK-1179 URL: https://issues.apache.org/jira/browse/FLINK-1179 Project: Flink Issue Type: New Feature Components: JobManager Reporter: Robert Metzger Assignee: Chiwan Park Priority: Minor Labels: starter Fix For: 0.9 This is something I do quite often manually and I think it might be helpful for users as well.
[jira] [Commented] (FLINK-1546) Failed job causes JobManager to shutdown due to uncatched WebFrontend exception
[ https://issues.apache.org/jira/browse/FLINK-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321661#comment-14321661 ] Robert Metzger commented on FLINK-1546: --- Indeed. The uncaught exception doesn't cause the JM to die anymore. Now I see the following output in the logs
{code}
20:21:48,968 INFO org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1 - Received job 49a866e90bce097d9ebb7f2caee0b103 (Read only job).
20:21:49,145 ERROR org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1 - Job submission failed.
org.apache.flink.runtime.JobException: Creating the input splits caused an error: File does not exist: hdfs:/user/robert/datasets/access-100.log
    at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.init(ExecutionJobVertex.java:161)
    at org.apache.flink.runtime.executiongraph.ExecutionGraph.attachJobGraph(ExecutionGraph.java:194)
    at org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$submitJob(JobManager.scala:460)
    at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:171)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
    at org.apache.flink.yarn.YarnJobManager$$anonfun$receiveYarnMessages$1.applyOrElse(YarnJobManager.scala:70)
    at scala.PartialFunction$OrElse.apply(PartialFunction.scala:162)
    at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:37)
    at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:30)
    at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
    at org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:30)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
    at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:86)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
    at akka.actor.ActorCell.invoke(ActorCell.scala:487)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
    at akka.dispatch.Mailbox.run(Mailbox.scala:221)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.io.FileNotFoundException: File does not exist: hdfs:/user/robert/datasets/access-100.log
    at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1128)
    at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1120)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1120)
    at org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.getFileStatus(HadoopFileSystem.java:339)
    at org.apache.flink.api.common.io.FileInputFormat.createInputSplits(FileInputFormat.java:403)
    at org.apache.flink.api.common.io.FileInputFormat.createInputSplits(FileInputFormat.java:51)
    at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.init(ExecutionJobVertex.java:145)
    ... 23 more
20:21:49,151 INFO org.apache.flink.runtime.instance.InstanceManager - Registered TaskManager at akka.tcp://fl...@cloud-34.dima.tu-berlin.de:54138/user/taskmanager as 75fc90247a92e285f4cb45a7028b6fbd. Current number of registered hosts is 10.
20:21:49,517 INFO org.apache.flink.runtime.instance.InstanceManager - Registered TaskManager at akka.tcp://fl...@cloud-36.dima.tu-berlin.de:54285/user/taskmanager as c63f5e6425cf95175d37bea8d6be35fa. Current number of registered hosts is 11.
20:21:49,635 INFO org.apache.flink.runtime.instance.InstanceManager - Registered TaskManager at akka.tcp://fl...@cloud-18.dima.tu-berlin.de:40967/user/taskmanager as d6937096c3dc5989a688e5c23c38853c. Current number of registered hosts is 12.
20:21:49,814 INFO org.apache.flink.runtime.instance.InstanceManager - Registered TaskManager at akka.tcp://fl...@cloud-19.dima.tu-berlin.de:49978/user/taskmanager as 6c7be9dc4480c1acde14df555ae6d472. Current number of registered hosts is 13.
20:21:50,216 INFO org.apache.flink.runtime.instance.InstanceManager - Registered
[GitHub] flink pull request: StreamWindow abstraction + modular window comp...
Github user senorcarbone commented on the pull request: https://github.com/apache/flink/pull/395#issuecomment-74393323 +1 for merging asap. It's an improvement on both semantics and performance prospects for windowing.
[jira] [Commented] (FLINK-1546) Failed job causes JobManager to shutdown due to uncatched WebFrontend exception
[ https://issues.apache.org/jira/browse/FLINK-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321725#comment-14321725 ] Till Rohrmann commented on FLINK-1546: -- Apparently, the specified file is wrong and thus the job cannot be started. I think this behaviour is correct, isn't it? What is interesting, however, is the problem with the non-terminal state. Could you add the current state to the exception thrown in ExecutionGraph.java:648?
Failed job causes JobManager to shutdown due to uncatched WebFrontend exception --- Key: FLINK-1546 URL: https://issues.apache.org/jira/browse/FLINK-1546 Project: Flink Issue Type: Bug Components: JobManager Affects Versions: 0.9 Reporter: Robert Metzger
{code}
16:59:26,588 INFO org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1 - Status of job ef19b2b201d4b81f031334cb76eadc78 (Basic Page Rank Example) changed to FAILEDCleanup job ef19b2b201d4b81f031334cb76eadc78..
16:59:26,591 ERROR akka.actor.OneForOneStrategy - Can only archive the job from a terminal state
java.lang.IllegalStateException: Can only archive the job from a terminal state
    at org.apache.flink.runtime.executiongraph.ExecutionGraph.prepareForArchiving(ExecutionGraph.java:648)
    at org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$removeJob(JobManager.scala:508)
    at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:271)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
    at org.apache.flink.yarn.YarnJobManager$$anonfun$receiveYarnMessages$1.applyOrElse(YarnJobManager.scala:70)
    at scala.PartialFunction$OrElse.apply(PartialFunction.scala:162)
    at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:37)
    at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:30)
    at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
    at org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:30)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
    at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:86)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
    at akka.actor.ActorCell.invoke(ActorCell.scala:487)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
    at akka.dispatch.Mailbox.run(Mailbox.scala:221)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
16:59:26,595 INFO org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1 - Stopping webserver.
16:59:26,654 INFO org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1 - Stopped webserver.
16:59:26,656 INFO org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1 - Stopping job manager akka://flink/user/jobmanager.
{code}
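The request to include the current state in the exception could look roughly like the following sketch. It is not the actual `ExecutionGraph` code: the `SketchJobStatus` enum, its set of states, and the `ArchivingSketch` class are assumptions made up for illustration, and only the method name `prepareForArchiving` and the first half of the message come from the stack trace above.

```java
/** Simplified, hypothetical job states; the real Flink enum may look different. */
enum SketchJobStatus {
    CREATED(false), RUNNING(false), FAILING(false),
    FAILED(true), CANCELED(true), FINISHED(true);

    private final boolean terminal;

    SketchJobStatus(boolean terminal) {
        this.terminal = terminal;
    }

    boolean isTerminalState() {
        return terminal;
    }
}

/** Hypothetical stand-in for the archiving check; not the real ExecutionGraph. */
final class ArchivingSketch {

    static void prepareForArchiving(SketchJobStatus current) {
        if (!current.isTerminalState()) {
            // Naming the offending state makes the log entry actionable.
            throw new IllegalStateException(
                "Can only archive the job from a terminal state, but the current state is "
                    + current + ".");
        }
        // Cleanup of resources that are not needed after archiving would go here.
    }

    public static void main(String[] args) {
        prepareForArchiving(SketchJobStatus.FAILED); // terminal state, passes
        try {
            prepareForArchiving(SketchJobStatus.FAILING);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a message of this shape, a log like the one above would show directly which non-terminal state the job was in when the removal was attempted.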
[GitHub] flink pull request: [FLINK-1549] Adds proper exception handling to...
GitHub user tillrohrmann opened a pull request: https://github.com/apache/flink/pull/397 [FLINK-1549] Adds proper exception handling to YarnJobManager Adds proper exception handling to `YarnJobManager` by catching the thrown exceptions and sending a `StopYarnSession(FinalApplicationStatus.FAILED)` message to itself. This message will shut down the JobManager and the corresponding `ActorSystem`. @rmetzger could you take a look at the changes? You can merge this pull request into a Git repository by running: $ git pull https://github.com/tillrohrmann/flink yarnExceptions Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/397.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #397 commit 7fa65233cd068779ffd8518d40d9547857042d34 Author: Till Rohrmann trohrm...@apache.org Date: 2015-02-14T22:36:45Z [FLINK-1549] [yarn] Adds proper exception handling to YarnJobManager
[jira] [Commented] (FLINK-1549) Add proper exception handling for YarnJobManager
[ https://issues.apache.org/jira/browse/FLINK-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321741#comment-14321741 ] ASF GitHub Bot commented on FLINK-1549: --- GitHub user tillrohrmann opened a pull request: https://github.com/apache/flink/pull/397 [FLINK-1549] Adds proper exception handling to YarnJobManager Adds proper exception handling to `YarnJobManager` by catching the thrown exceptions and sending a `StopYarnSession(FinalApplicationStatus.FAILED)` message to itself. This message will shut down the JobManager and the corresponding `ActorSystem`. @rmetzger could you take a look at the changes? You can merge this pull request into a Git repository by running: $ git pull https://github.com/tillrohrmann/flink yarnExceptions Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/397.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #397 commit 7fa65233cd068779ffd8518d40d9547857042d34 Author: Till Rohrmann trohrm...@apache.org Date: 2015-02-14T22:36:45Z [FLINK-1549] [yarn] Adds proper exception handling to YarnJobManager Add proper exception handling for YarnJobManager Key: FLINK-1549 URL: https://issues.apache.org/jira/browse/FLINK-1549 Project: Flink Issue Type: Bug Reporter: Till Rohrmann The YarnJobManager actor thread calls methods which can throw an exception. These exceptions should be caught and properly handled.
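The pattern described in the pull request (catch the exception, then send a stop message with a FAILED status to oneself) can be sketched in plain Java with a queue standing in for the actor mailbox. This is not the `YarnJobManager` implementation: `SelfStoppingManager`, `StopSession`, `FinalStatus`, `tell`, and `processMailbox` are hypothetical names chosen to mirror `StopYarnSession(FinalApplicationStatus.FAILED)` without depending on Akka or YARN.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical single-threaded stand-in for an actor with a mailbox. */
final class SelfStoppingManager {

    enum FinalStatus { SUCCEEDED, FAILED }

    /** Hypothetical counterpart of a stop-session message. */
    static final class StopSession {
        final FinalStatus status;

        StopSession(FinalStatus status) {
            this.status = status;
        }
    }

    private final Queue<Object> mailbox = new ArrayDeque<>();
    private boolean running = true;
    private FinalStatus finalStatus = null;

    /** Enqueues a message; also used by the manager to message itself. */
    void tell(Object message) {
        mailbox.add(message);
    }

    /** Drains the mailbox until it is empty or the manager has stopped. */
    void processMailbox() {
        while (running && !mailbox.isEmpty()) {
            Object message = mailbox.poll();
            try {
                receive(message);
            } catch (RuntimeException e) {
                // Do not let the exception escape; request an orderly shutdown instead.
                tell(new StopSession(FinalStatus.FAILED));
            }
        }
    }

    private void receive(Object message) {
        if (message instanceof StopSession) {
            finalStatus = ((StopSession) message).status;
            running = false; // a real manager would also release its resources here
        } else if (message instanceof Runnable) {
            ((Runnable) message).run();
        } else {
            throw new IllegalArgumentException("Unknown message: " + message);
        }
    }

    boolean isRunning() {
        return running;
    }

    FinalStatus finalStatus() {
        return finalStatus;
    }

    public static void main(String[] args) {
        SelfStoppingManager manager = new SelfStoppingManager();
        manager.tell((Runnable) () -> { throw new IllegalStateException("boom"); });
        manager.processMailbox();
        System.out.println("running=" + manager.isRunning() + ", status=" + manager.finalStatus());
    }
}
```

One consequence of routing the shutdown through the mailbox, at least in this sketch, is that messages already queued behind the failing one are still processed before the stop message takes effect.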