Re: Job completion or failure callback?

2017-03-10 Thread Robert Metzger
Hi Shannon,

The web UI runs in the same JVM as the JobManager, so log output should go
there.

There is no way of running user code on the JobManager on job completion.
We try not to allow users to execute code on the JobManager: bringing the
JM down will kill the entire cluster :)

What you are asking for is a valid feature. That's why there is a very old
JIRA for adding it: https://issues.apache.org/jira/browse/FLINK-2313.
I'm not aware of anybody working on the feature right now, but I know that
we at data Artisans are considering putting some effort into it for the 1.4
release (no guarantee for this of course; others are obviously free to pick
the issue up any time).

For now, you'll probably have to resort to the web REST API, or to
connecting to the JobManager actor system and subscribing to the
"JobStatusListener" or something :)
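
A minimal sketch of the REST polling approach, under several assumptions: that
the monitoring REST API answers GET /jobs/<jobid> with JSON containing a
top-level "state" field, that FINISHED, FAILED and CANCELED are the terminal
states, and that Java 11+ is available for java.net.http. The class name
JobStatusPoller, its method names, and the base URL and job ID passed on the
command line are hypothetical. The regex parsing is only for illustration; a
real JSON parser would be more robust.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Optional;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JobStatusPoller {
    // Assumption: these are the states after which the job will not change again.
    private static final Set<String> TERMINAL_STATES =
            Set.of("FINISHED", "FAILED", "CANCELED");

    // Assumption: the job summary JSON has a field like "state":"RUNNING".
    private static final Pattern STATE_FIELD =
            Pattern.compile("\"state\"\\s*:\\s*\"([A-Z_]+)\"");

    /** Returns the first "state" value found in the JSON text, if any. */
    public static Optional<String> extractState(String json) {
        Matcher m = STATE_FIELD.matcher(json);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    /** True if the given state is one of the assumed terminal states. */
    public static boolean isTerminal(String state) {
        return TERMINAL_STATES.contains(state);
    }

    /** Polls baseUrl + "/jobs/" + jobId until a terminal state is reported. */
    public static String awaitTerminalState(String baseUrl, String jobId, long pollMillis)
            throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request =
                HttpRequest.newBuilder(URI.create(baseUrl + "/jobs/" + jobId)).GET().build();
        while (true) {
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            Optional<String> state = extractState(response.body());
            if (state.isPresent() && isTerminal(state.get())) {
                return state.get();
            }
            Thread.sleep(pollMillis); // wait before asking again
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0]: base URL of the web frontend, args[1]: job ID (both hypothetical inputs)
        String finalState = awaitTerminalState(args[0], args[1], 5000L);
        System.out.println("Job " + args[1] + " ended in state " + finalState);
    }
}
```

The callback logic (alerting, retrying, and so on) would go where main() prints
the final state. This runs as an external component, so nothing executes on
the JobManager itself.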

I hope that helps.

Regards,
Robert


On Thu, Mar 9, 2017 at 12:29 AM, Shannon Carey <sca...@expedia.com> wrote:

> Hi,
>
> Is there any way we can run a callback on job completion or failure
> without leaving the client running during job execution? For example, when
> we submit the job via the web UI the main() method's call to
> ExecutionEnvironment#execute() appears to be asynchronous with the job
> execution. Therefore, the execute() call returns before the job is
> completed. This is a bit confusing because the behavior is different when
> run from the IDE vs. run in a cluster, and because the signature of the
> returned class JobExecutionResult implies that it can tell you how long
> execution took (it has getNetRuntime()). We would like to be able to detect
> job completion or failure so that we can monitor the success or failure of
> batch jobs, in particular, so that we can react to failures appropriately.
> It seems like the JobManager should be capable of executing callbacks like
> this. Otherwise, we'll have to create an external component that e.g. polls
> the web UI/API for job status.
>
> Does the web UI run in the same JVM as the JobManager (when deployed in
> YARN)? If so, I would expect logs from the main method to appear in the
> JobManager logs. However, for some reason I can't find log messages or
> System.out messages when they are logged in the main() method after
> execute() is called. Why is that?
> Edit: figured it out: OptimizerPlanEnvironment#execute() ends with "throw
> new ProgramAbortException()". Tricky and unexpected. That should definitely
> be mentioned in the javadocs of the execute() method! Even the
> documentation says, "The execute() method is returning a
> JobExecutionResult, this contains execution times and accumulator results."
> which isn't true, or at least isn't always true.
>
> Thanks,
> Shannon
>


Job completion or failure callback?

2017-03-08 Thread Shannon Carey
Hi,

Is there any way we can run a callback on job completion or failure without 
leaving the client running during job execution? For example, when we submit 
the job via the web UI the main() method's call to 
ExecutionEnvironment#execute() appears to be asynchronous with the job
execution. Therefore, the execute() call returns before the job is completed. 
This is a bit confusing because the behavior is different when run from the IDE 
vs. run in a cluster, and because the signature of the returned class
JobExecutionResult implies that it can tell you how long execution took (it has 
getNetRuntime()). We would like to be able to detect job completion or failure 
so that we can monitor the success or failure of batch jobs, in particular, so 
that we can react to failures appropriately. It seems like the JobManager 
should be capable of executing callbacks like this. Otherwise, we'll have to 
create an external component that e.g. polls the web UI/API for job status.

Does the web UI run in the same JVM as the JobManager (when deployed in YARN)? 
If so, I would expect logs from the main method to appear in the JobManager 
logs. However, for some reason I can't find log messages or System.out
messages when they are logged in the main() method after execute() is called. 
Why is that?
Edit: figured it out: OptimizerPlanEnvironment#execute() ends with "throw new 
ProgramAbortException()". Tricky and unexpected. That should definitely be 
mentioned in the javadocs of the execute() method! Even the documentation says, 
"The execute() method is returning a JobExecutionResult, this contains 
execution times and accumulator results." which isn't true, or at least isn't 
always true.

Thanks,
Shannon