Re: Cluster execution - Jobmanager unreachable

2015-02-11 Thread Chesnay Schepler
I just tried Till's fix, rebased to the latest master and got a whole lot of these exceptions right away: java.lang.Exception: The slot in which the task was scheduled has been killed (probably loss of TaskManager). at

Re: Cluster execution - Jobmanager unreachable

2015-02-05 Thread Till Rohrmann
It looks to me that the TaskManager does not receive a ConsumerNotificationResult after having sent the ScheduleOrUpdateConsumers message. This can mean either that something went wrong in the ExecutionGraph.scheduleOrUpdateConsumers method or that the connection was disassociated for some reason. The

Re: Cluster execution - Jobmanager unreachable

2015-02-05 Thread Stephan Ewen
I suspect that this is one of the cases where an exception in an actor causes the actor to die (here the job manager). On Thu, Feb 5, 2015 at 10:40 AM, Till Rohrmann trohrm...@apache.org wrote: It looks to me that the TaskManager does not receive a ConsumerNotificationResult after having sent

Re: Cluster execution - Jobmanager unreachable

2015-02-05 Thread Till Rohrmann
I checked, and indeed the scheduleOrUpdateConsumers method can throw an IllegalStateException that is not properly handled at the JobManager level. It is a design decision of Scala not to complain about unhandled exceptions that are otherwise properly annotated in Java code. We

Cluster execution - Jobmanager unreachable

2015-02-04 Thread Chesnay Schepler
Hello, I'm trying to run Python jobs with the latest master on a cluster and get the following exception: Error: The program execution failed: JobManager not reachable anymore. Terminate waiting for job answer. org.apache.flink.client.program.ProgramInvocationException: The program