Re: Run command after Batch is finished

2020-06-09 Thread Mark Davis
Hi Chesnay,

That is an interesting proposal, thank you!
I was doing something similar with the OutputFormat#close() method respecting 
the Format's parallelism. Using FinalizeOnMaster will make things easier.
But the problem is that several OutputFormats must be synchronized externally - 
every output must check whether other outputs finished already... Quite 
cumbersome.
Also there is a problem with exceptions - the OutputFormats can be never open 
and never closed.

  Mark

‐‐‐ Original Message ‐‐‐
On Monday, June 8, 2020 5:50 PM, Chesnay Schepler  wrote:

> This goes in the right direction; have a look at 
> org.apache.flink.api.common.io.FinalizeOnMaster, which an OutputFormat can 
> implement to run something on the Master after all subtasks have been closed.
>
> On 08/06/2020 17:25, Andrey Zagrebin wrote:
>
>> Hi Mark,
>>
>> I do not know how you output the results in your pipeline.
>> If you use DataSet#output(OutputFormat outputFormat), you could try to 
>> extend the format with a custom close method which should be called once the 
>> task of the sink batch operator is done in the task manager.
>> I also cc Aljoscha, maybe, he has more ideas.
>>
>> Best,
>> Andrey
>>
>> On Sun, Jun 7, 2020 at 1:35 PM Mark Davis  wrote:
>>
>>> Hi Jeff,
>>>
>>> Unfortunately this is not good enough for me.
>>> My clients are very volatile, they start a batch and can go away any moment 
>>> without waiting for it to finish. Think of an elastic web application or an 
>>> AWS Lambda.
>>>
>>> I hopped to find something what could be deployed to the cluster together 
>>> with the batch code. Maybe a hook to a job manager or similar. I do not 
>>> plan to run anything heavy there, just some formal cleanups.
>>> Is there something like this?
>>>
>>> Thank you!
>>>
>>>   Mark
>>>
>>> ‐‐‐ Original Message ‐‐‐
>>> On Saturday, June 6, 2020 4:29 PM, Jeff Zhang  wrote:
>>>
 It would run in the client side where ExecutionEnvironment is created.

 Mark Davis  于2020年6月6日周六 下午8:14写道:

> Hi Jeff,
>
> Thank you very much! That is exactly what I need.
>
> Where the listener code will run in the cluster deployment(YARN, k8s)?
> Will it be sent over the network?
>
> Thank you!
>
>   Mark
>
> ‐‐‐ Original Message ‐‐‐
> On Friday, June 5, 2020 6:13 PM, Jeff Zhang  wrote:
>
>> You can try JobListener which you can register to ExecutionEnvironment.
>>
>> https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/core/execution/JobListener.java
>>
>> Mark Davis  于2020年6月6日周六 上午12:00写道:
>>
>>> Hi there,
>>>
>>> I am running a Batch job with several outputs.
>>> Is there a way to run some code(e.g. release a distributed lock) after 
>>> all outputs are finished?
>>>
>>> Currently I do this in a try-finally block around 
>>> ExecutionEnvironment.execute() call, but I have to switch to the 
>>> detached execution mode - in this mode the finally block is never run.
>>>
>>> Thank you!
>>>
>>>   Mark
>>
>> --
>> Best Regards
>>
>> Jeff Zhang

 --
 Best Regards

 Jeff Zhang

Re: Run command after Batch is finished

2020-06-08 Thread Chesnay Schepler
This goes in the right direction; have a look at 
org.apache.flink.api.common.io.FinalizeOnMaster, which an OutputFormat 
can implement to run something on the Master after all subtasks have 
been closed.


On 08/06/2020 17:25, Andrey Zagrebin wrote:

Hi Mark,

I do not know how you output the results in your pipeline.
If you use DataSet#output(OutputFormat outputFormat), you could try 
to extend the format with a custom close method which should be called 
once the task of the sink batch operator is done in the task manager.

I also cc Aljoscha, maybe, he has more ideas.

Best,
Andrey

On Sun, Jun 7, 2020 at 1:35 PM Mark Davis > wrote:


Hi Jeff,

Unfortunately this is not good enough for me.
My clients are very volatile, they start a batch and can go away
any moment without waiting for it to finish. Think of an elastic
web application or an AWS Lambda.

I hopped to find something what could be deployed to the cluster
together with the batch code. Maybe a hook to a job manager or
similar. I do not plan to run anything heavy there, just some
formal cleanups.
Is there something like this?

Thank you!

  Mark

‐‐‐ Original Message ‐‐‐
On Saturday, June 6, 2020 4:29 PM, Jeff Zhang mailto:zjf...@gmail.com>> wrote:


It would run in the client side where ExecutionEnvironment is
created.

Mark Davis mailto:moda...@protonmail.com>> 于2020年6月6日周六 下午8:14写道:

Hi Jeff,

Thank you very much! That is exactly what I need.

Where the listener code will run in the cluster
deployment(YARN, k8s)?
Will it be sent over the network?

Thank you!

  Mark

‐‐‐ Original Message ‐‐‐
On Friday, June 5, 2020 6:13 PM, Jeff Zhang mailto:zjf...@gmail.com>> wrote:


You can try JobListener which you can register to
ExecutionEnvironment.


https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/core/execution/JobListener.java


Mark Davis mailto:moda...@protonmail.com>> 于2020年6月6日周六
上午12:00写道:

Hi there,

I am running a Batch job with several outputs.
Is there a way to run some code(e.g. release a
distributed lock) after all outputs are finished?

Currently I do this in a try-finally block around
ExecutionEnvironment.execute() call, but I have to
switch to the detached execution mode - in this mode the
finally block is never run.

Thank you!

  Mark



-- 
Best Regards


Jeff Zhang




-- 
Best Regards


Jeff Zhang






Re: Run command after Batch is finished

2020-06-08 Thread Flavio Pompermaier
I usually run some code after env.execute(), it's not elegant but it works
(only if you run the code from Cli client, not from REST one)

On Mon, Jun 8, 2020 at 5:25 PM Andrey Zagrebin  wrote:

> Hi Mark,
>
> I do not know how you output the results in your pipeline.
> If you use DataSet#output(OutputFormat outputFormat), you could try to
> extend the format with a custom close method which should be called once
> the task of the sink batch operator is done in the task manager.
> I also cc Aljoscha, maybe, he has more ideas.
>
> Best,
> Andrey
>
> On Sun, Jun 7, 2020 at 1:35 PM Mark Davis  wrote:
>
>> Hi Jeff,
>>
>> Unfortunately this is not good enough for me.
>> My clients are very volatile, they start a batch and can go away any
>> moment without waiting for it to finish. Think of an elastic web
>> application or an AWS Lambda.
>>
>> I hopped to find something what could be deployed to the cluster together
>> with the batch code. Maybe a hook to a job manager or similar. I do not
>> plan to run anything heavy there, just some formal cleanups.
>> Is there something like this?
>>
>> Thank you!
>>
>>   Mark
>>
>> ‐‐‐ Original Message ‐‐‐
>> On Saturday, June 6, 2020 4:29 PM, Jeff Zhang  wrote:
>>
>> It would run in the client side where ExecutionEnvironment is created.
>>
>> Mark Davis  于2020年6月6日周六 下午8:14写道:
>>
>>> Hi Jeff,
>>>
>>> Thank you very much! That is exactly what I need.
>>>
>>> Where the listener code will run in the cluster deployment(YARN, k8s)?
>>> Will it be sent over the network?
>>>
>>> Thank you!
>>>
>>>   Mark
>>>
>>> ‐‐‐ Original Message ‐‐‐
>>> On Friday, June 5, 2020 6:13 PM, Jeff Zhang  wrote:
>>>
>>> You can try JobListener which you can register to ExecutionEnvironment.
>>>
>>>
>>> https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/core/execution/JobListener.java
>>>
>>>
>>> Mark Davis  于2020年6月6日周六 上午12:00写道:
>>>
 Hi there,

 I am running a Batch job with several outputs.
 Is there a way to run some code(e.g. release a distributed lock) after
 all outputs are finished?

 Currently I do this in a try-finally block around
 ExecutionEnvironment.execute() call, but I have to switch to the detached
 execution mode - in this mode the finally block is never run.

 Thank you!

   Mark

>>>
>>>
>>> --
>>> Best Regards
>>>
>>> Jeff Zhang
>>>
>>>
>>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>>
>>
>>


Re: Run command after Batch is finished

2020-06-08 Thread Andrey Zagrebin
Hi Mark,

I do not know how you output the results in your pipeline.
If you use DataSet#output(OutputFormat outputFormat), you could try to
extend the format with a custom close method which should be called once
the task of the sink batch operator is done in the task manager.
I also cc Aljoscha, maybe, he has more ideas.

Best,
Andrey

On Sun, Jun 7, 2020 at 1:35 PM Mark Davis  wrote:

> Hi Jeff,
>
> Unfortunately this is not good enough for me.
> My clients are very volatile, they start a batch and can go away any
> moment without waiting for it to finish. Think of an elastic web
> application or an AWS Lambda.
>
> I hopped to find something what could be deployed to the cluster together
> with the batch code. Maybe a hook to a job manager or similar. I do not
> plan to run anything heavy there, just some formal cleanups.
> Is there something like this?
>
> Thank you!
>
>   Mark
>
> ‐‐‐ Original Message ‐‐‐
> On Saturday, June 6, 2020 4:29 PM, Jeff Zhang  wrote:
>
> It would run in the client side where ExecutionEnvironment is created.
>
> Mark Davis  于2020年6月6日周六 下午8:14写道:
>
>> Hi Jeff,
>>
>> Thank you very much! That is exactly what I need.
>>
>> Where the listener code will run in the cluster deployment(YARN, k8s)?
>> Will it be sent over the network?
>>
>> Thank you!
>>
>>   Mark
>>
>> ‐‐‐ Original Message ‐‐‐
>> On Friday, June 5, 2020 6:13 PM, Jeff Zhang  wrote:
>>
>> You can try JobListener which you can register to ExecutionEnvironment.
>>
>>
>> https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/core/execution/JobListener.java
>>
>>
>> Mark Davis  于2020年6月6日周六 上午12:00写道:
>>
>>> Hi there,
>>>
>>> I am running a Batch job with several outputs.
>>> Is there a way to run some code(e.g. release a distributed lock) after
>>> all outputs are finished?
>>>
>>> Currently I do this in a try-finally block around
>>> ExecutionEnvironment.execute() call, but I have to switch to the detached
>>> execution mode - in this mode the finally block is never run.
>>>
>>> Thank you!
>>>
>>>   Mark
>>>
>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>>
>>
>>
>
> --
> Best Regards
>
> Jeff Zhang
>
>
>


Re: Run command after Batch is finished

2020-06-07 Thread Mark Davis
Hi Jeff,

Unfortunately this is not good enough for me.
My clients are very volatile, they start a batch and can go away any moment 
without waiting for it to finish. Think of an elastic web application or an AWS 
Lambda.

I hopped to find something what could be deployed to the cluster together with 
the batch code. Maybe a hook to a job manager or similar. I do not plan to run 
anything heavy there, just some formal cleanups.
Is there something like this?

Thank you!

  Mark

‐‐‐ Original Message ‐‐‐
On Saturday, June 6, 2020 4:29 PM, Jeff Zhang  wrote:

> It would run in the client side where ExecutionEnvironment is created.
>
> Mark Davis  于2020年6月6日周六 下午8:14写道:
>
>> Hi Jeff,
>>
>> Thank you very much! That is exactly what I need.
>>
>> Where the listener code will run in the cluster deployment(YARN, k8s)?
>> Will it be sent over the network?
>>
>> Thank you!
>>
>>   Mark
>>
>> ‐‐‐ Original Message ‐‐‐
>> On Friday, June 5, 2020 6:13 PM, Jeff Zhang  wrote:
>>
>>> You can try JobListener which you can register to ExecutionEnvironment.
>>>
>>> https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/core/execution/JobListener.java
>>>
>>> Mark Davis  于2020年6月6日周六 上午12:00写道:
>>>
 Hi there,

 I am running a Batch job with several outputs.
 Is there a way to run some code(e.g. release a distributed lock) after all 
 outputs are finished?

 Currently I do this in a try-finally block around 
 ExecutionEnvironment.execute() call, but I have to switch to the detached 
 execution mode - in this mode the finally block is never run.

 Thank you!

   Mark
>>>
>>> --
>>> Best Regards
>>>
>>> Jeff Zhang
>
> --
> Best Regards
>
> Jeff Zhang

Re: Run command after Batch is finished

2020-06-06 Thread Jeff Zhang
It would run in the client side where ExecutionEnvironment is created.

Mark Davis  于2020年6月6日周六 下午8:14写道:

> Hi Jeff,
>
> Thank you very much! That is exactly what I need.
>
> Where the listener code will run in the cluster deployment(YARN, k8s)?
> Will it be sent over the network?
>
> Thank you!
>
>   Mark
>
> ‐‐‐ Original Message ‐‐‐
> On Friday, June 5, 2020 6:13 PM, Jeff Zhang  wrote:
>
> You can try JobListener which you can register to ExecutionEnvironment.
>
>
> https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/core/execution/JobListener.java
>
>
> Mark Davis  于2020年6月6日周六 上午12:00写道:
>
>> Hi there,
>>
>> I am running a Batch job with several outputs.
>> Is there a way to run some code(e.g. release a distributed lock) after
>> all outputs are finished?
>>
>> Currently I do this in a try-finally block around
>> ExecutionEnvironment.execute() call, but I have to switch to the detached
>> execution mode - in this mode the finally block is never run.
>>
>> Thank you!
>>
>>   Mark
>>
>
>
> --
> Best Regards
>
> Jeff Zhang
>
>
>

-- 
Best Regards

Jeff Zhang


Re: Run command after Batch is finished

2020-06-06 Thread Mark Davis
Hi Jeff,

Thank you very much! That is exactly what I need.

Where the listener code will run in the cluster deployment(YARN, k8s)?
Will it be sent over the network?

Thank you!

  Mark

‐‐‐ Original Message ‐‐‐
On Friday, June 5, 2020 6:13 PM, Jeff Zhang  wrote:

> You can try JobListener which you can register to ExecutionEnvironment.
>
> https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/core/execution/JobListener.java
>
> Mark Davis  于2020年6月6日周六 上午12:00写道:
>
>> Hi there,
>>
>> I am running a Batch job with several outputs.
>> Is there a way to run some code(e.g. release a distributed lock) after all 
>> outputs are finished?
>>
>> Currently I do this in a try-finally block around 
>> ExecutionEnvironment.execute() call, but I have to switch to the detached 
>> execution mode - in this mode the finally block is never run.
>>
>> Thank you!
>>
>>   Mark
>
> --
> Best Regards
>
> Jeff Zhang

Re: Run command after Batch is finished

2020-06-05 Thread Jeff Zhang
You can try JobListener which you can register to ExecutionEnvironment.

https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/core/execution/JobListener.java


Mark Davis  于2020年6月6日周六 上午12:00写道:

> Hi there,
>
> I am running a Batch job with several outputs.
> Is there a way to run some code(e.g. release a distributed lock) after all
> outputs are finished?
>
> Currently I do this in a try-finally block around
> ExecutionEnvironment.execute() call, but I have to switch to the detached
> execution mode - in this mode the finally block is never run.
>
> Thank you!
>
>   Mark
>


-- 
Best Regards

Jeff Zhang


Run command after Batch is finished

2020-06-05 Thread Mark Davis
Hi there,

I am running a Batch job with several outputs.
Is there a way to run some code(e.g. release a distributed lock) after all 
outputs are finished?

Currently I do this in a try-finally block around 
ExecutionEnvironment.execute() call, but I have to switch to the detached 
execution mode - in this mode the finally block is never run.

Thank you!

  Mark