Thank you all for the hints. However, looking at the REST API [1] of Airflow
2.0 I can't find how to set up my DAG (if that is even the right concept).
Do I need to first create a Connection? A DAG? A TaskInstance? How do I
specify the 2 BashOperators?
I was planning to connect to Airflow from Java, so I can't use the Python
API..
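
If I read the docs correctly, the DAG itself cannot be created through the
REST API at all: it has to live as a Python file in the scheduler's dags
folder, and the REST API is only used to trigger runs of it. For the
archives, here is my rough sketch of such a file with two chained
BashOperators (all ids, paths and dates below are made up by me, so take it
with a grain of salt):

```python
# Hypothetical dags/two_flink_jobs.py -- every id, path and date here is
# illustrative, not taken from a real setup.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="two_flink_jobs",
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,  # no schedule: runs only when triggered via REST
) as dag:
    job1 = BashOperator(
        task_id="flink-job-1",
        bash_command="./bin/flink run /path/to/first-job.jar",
    )
    job2 = BashOperator(
        task_id="flink-job-2",
        bash_command="./bin/flink run /path/to/second-job.jar",
    )
    # job2 is scheduled only after job1 has succeeded
    job1 >> job2
```

So, if that is right, the two BashOperators and their ordering are declared
in the file, not via the API.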

[1]
https://airflow.apache.org/docs/apache-airflow/stable/stable-rest-api-ref.html#section/Overview
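
From Java, triggering the whole thing should then just be a plain HTTP POST
to the dagRuns endpoint of the stable REST API, so no Python client is
needed. A small Python sketch only to illustrate the request shape (base
URL and dag id are made up; a real call would also need authentication,
e.g. basic auth, which I leave out here):

```python
# Build the POST request that starts a new run of an existing DAG via the
# Airflow 2.0 stable REST API. "my_flink_dag" and the URL are illustrative.
import json
import urllib.request


def build_trigger_request(base_url, dag_id, conf=None):
    """Return a POST request for /api/v1/dags/{dag_id}/dagRuns."""
    url = f"{base_url}/api/v1/dags/{dag_id}/dagRuns"
    payload = json.dumps({"conf": conf or {}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_trigger_request("http://localhost:8080", "my_flink_dag")
print(req.full_url)
# -> http://localhost:8080/api/v1/dags/my_flink_dag/dagRuns
```

The same request is trivial to issue from Java with any HTTP client, which
is exactly what I'm after.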

On Tue, Feb 2, 2021 at 10:53 AM Arvid Heise <ar...@apache.org> wrote:

> Hi Flavio,
>
> If you know a bit of Python, it's also trivial to add a new Flink operator
> where you can use the REST API.
>
> In general, I'd consider Airflow to be the best choice for your problem,
> especially if it gets more complicated in the future (do something else if
> the first job fails).
>
> If you have specific questions, feel free to ask.
>
> Best,
>
> Arvid
>
> On Tue, Feb 2, 2021 at 10:08 AM 姜鑫 <jiangxin...@gmail.com> wrote:
>
>> Hi Flavio,
>>
>> I probably understand what you need. Apache Airflow is a scheduling
>> framework in which you can define your own dependent operators, so you can
>> define a BashOperator that submits a Flink job to your local Flink cluster.
>> For example:
>> ```
>> t1 = BashOperator(
>>     task_id='flink-wordcount',
>>     bash_command='./bin/flink run flink/build-target/examples/batch/WordCount.jar',
>>     ...
>> )
>> ```
>> Also, Airflow supports submitting jobs to Kubernetes, and you can even
>> implement your own operator if a bash command doesn't meet your needs.
>>
>> Indeed, Flink AI (flink-ai-extended
>> <https://github.com/alibaba/flink-ai-extended> ?) needs an enhanced
>> version of Airflow, but that is mainly for streaming scenarios, where the
>> job never stops. Since yours are all batch jobs, it doesn't help much.
>> Hope this helps.
>>
>> Regards,
>> Xin
>>
>>
>> On Feb 2, 2021, at 4:30 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
>>
>> Hi Xin,
>> let me state first that I have never used Airflow, so I may be missing
>> some background here.
>> I just want to externalize the job scheduling to some consolidated
>> framework, and from what I see Apache Airflow is probably what I need.
>> However, I can't find any good blog post or documentation about how to
>> integrate these two technologies using the REST APIs of both services.
>> I saw that Flink AI decided to use a customized/enhanced version of
>> Airflow [1], but I didn't look into the code to understand how they use it.
>> In my use case I just want to schedule 2 Flink batch jobs using the REST
>> API of Airflow, where the second one is fired after the first.
>>
>> [1]
>> https://github.com/alibaba/flink-ai-extended/tree/master/flink-ai-flow
>>
>> Best,
>> Flavio
>>
>> On Tue, Feb 2, 2021 at 2:43 AM 姜鑫 <jiangxin...@gmail.com> wrote:
>>
>>> Hi Flavio,
>>>
>>> Could you explain what your specific question is? In my opinion, it is
>>> possible to define two Airflow operators that submit dependent Flink
>>> jobs, as long as the first one runs to completion.
>>>
>>> Regards,
>>> Xin
>>>
>>> On Feb 1, 2021, at 6:43 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
>>>
>>> Any advice here?
>>>
>>> On Wed, Jan 27, 2021 at 9:49 PM Flavio Pompermaier <pomperma...@okkam.it>
>>> wrote:
>>>
>>>> Hello everybody,
>>>> is there any suggested way/pointer for scheduling Flink jobs with Apache
>>>> Airflow?
>>>> What I'd like to achieve is the submission (using the REST API of
>>>> Airflow) of 2 jobs, where the second one is executed only if the first
>>>> one succeeds.
>>>>
>>>> Thanks in advance
>>>> Flavio
>>>>
>>>
>>
