Thank you all for the hints. However, looking at the REST API [1] of Airflow 2.0, I can't find how to set up my DAG (if that is the right concept). Do I need to first create a Connection? A DAG? A TaskInstance? How do I specify the two BashOperators? I was planning to connect to Airflow via Java, so I can't use the Python API.
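For context, this is the kind of DAG I have in mind, based on Xin's BashOperator hint: a sketch only, where the DAG id `flink_two_jobs`, the task ids, and the jar paths are placeholders I made up, not anything from the docs. As far as I understand, such a file is not created through the REST API at all; it is a Python file that Airflow picks up from its `dags/` folder, and the REST API then only triggers runs of it.

```python
# Sketch of an Airflow 2.0 DAG file with two chained BashOperators.
# All ids and paths below are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="flink_two_jobs",          # hypothetical DAG id
    start_date=datetime(2021, 2, 1),
    schedule_interval=None,           # no schedule: run only when triggered (e.g. via REST)
    catchup=False,
) as dag:
    job1 = BashOperator(
        task_id="flink-job-1",
        bash_command="./bin/flink run /path/to/first-job.jar",
    )
    job2 = BashOperator(
        task_id="flink-job-2",
        bash_command="./bin/flink run /path/to/second-job.jar",
    )

    # The >> operator declares the dependency: job2 starts only after
    # job1 has finished successfully.
    job1 >> job2
```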
[1] https://airflow.apache.org/docs/apache-airflow/stable/stable-rest-api-ref.html#section/Overview

On Tue, Feb 2, 2021 at 10:53 AM Arvid Heise <ar...@apache.org> wrote:

> Hi Flavio,
>
> If you know a bit of Python, it's also trivial to add a new Flink
> operator where you can use the REST API.
>
> In general, I'd consider Airflow to be the best choice for your problem,
> especially if it gets more complicated in the future (e.g. do something
> else if the first job fails).
>
> If you have specific questions, feel free to ask.
>
> Best,
>
> Arvid
>
> On Tue, Feb 2, 2021 at 10:08 AM 姜鑫 <jiangxin...@gmail.com> wrote:
>
>> Hi Flavio,
>>
>> I probably understand what you need. Apache Airflow is a scheduling
>> framework in which you can define your own dependent operators, so you
>> can define a BashOperator to submit a Flink job to your local Flink
>> cluster. For example:
>>
>> ```
>> t1 = BashOperator(
>>     task_id='flink-wordcount',
>>     bash_command='./bin/flink run flink/build-target/examples/batch/WordCount.jar',
>>     ...
>> )
>> ```
>>
>> Also, Airflow supports submitting jobs to Kubernetes, and you can even
>> implement your own operator if a bash command doesn't meet your needs.
>>
>> Indeed, Flink AI (flink-ai-extended
>> <https://github.com/alibaba/flink-ai-extended>?) needs an enhanced
>> version of Airflow, but that is mainly for streaming scenarios, where
>> the job never stops. In your case, which is all batch jobs, it doesn't
>> help much. Hope this helps.
>>
>> Regards,
>> Xin
>>
>> On Feb 2, 2021, at 4:30 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
>>
>> Hi Xin,
>> let me state first that I have never used Airflow, so I am probably
>> missing some background here.
>> I just want to externalize the job scheduling to some consolidated
>> framework, and from what I see Apache Airflow is probably what I need.
>> However, I can't find any good blog post or documentation about how to
>> integrate these two technologies using the REST APIs of both services.
>> I saw that Flink AI decided to use a customized/enhanced version of
>> Airflow [1], but I didn't look into the code to understand how they use
>> it. In my use case I just want to schedule two Flink batch jobs using
>> the REST API of Airflow, where the second one is fired after the first.
>>
>> [1] https://github.com/alibaba/flink-ai-extended/tree/master/flink-ai-flow
>>
>> Best,
>> Flavio
>>
>> On Tue, Feb 2, 2021 at 2:43 AM 姜鑫 <jiangxin...@gmail.com> wrote:
>>
>>> Hi Flavio,
>>>
>>> Could you explain what your direct question is? In my opinion, it is
>>> possible to define two Airflow operators to submit dependent Flink
>>> jobs, as long as the first one can run to completion.
>>>
>>> Regards,
>>> Xin
>>>
>>> On Feb 1, 2021, at 6:43 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
>>>
>>> Any advice here?
>>>
>>> On Wed, Jan 27, 2021 at 9:49 PM Flavio Pompermaier <pomperma...@okkam.it> wrote:
>>>
>>>> Hello everybody,
>>>> is there any suggested way/pointer to schedule Flink jobs using
>>>> Apache Airflow?
>>>> What I'd like to achieve is the submission (using the REST API of
>>>> Airflow) of two jobs, where the second one can be executed only if
>>>> the first one succeeds.
>>>>
>>>> Thanks in advance,
>>>> Flavio
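P.S. From reading the stable REST API reference, triggering a run of an already-deployed DAG seems to be a plain `POST /api/v1/dags/{dag_id}/dagRuns` call, so from Java it would just be an HTTP request. Here is how I understand it as a stdlib-only sketch; the DAG id, port, and credentials are placeholders, and it assumes the webserver has the basic-auth backend enabled:

```python
# Sketch: build the HTTP request that triggers a DAG run via the
# Airflow 2.x stable REST API. DAG id, URL, and credentials are placeholders.
import base64
import json
import urllib.request

AIRFLOW_URL = "http://localhost:8080/api/v1"  # assumption: default webserver port


def build_trigger_request(dag_id: str, user: str, password: str) -> urllib.request.Request:
    """Build a POST request for /dags/{dag_id}/dagRuns with basic auth."""
    payload = json.dumps({"conf": {}}).encode("utf-8")  # optional run configuration
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"{AIRFLOW_URL}/dags/{dag_id}/dagRuns",
        data=payload,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


# To actually fire the run against a live webserver:
# urllib.request.urlopen(build_trigger_request("flink_two_jobs", "admin", "admin"))
```

The "second job after the first" part would not go through the API at all: the dependency lives inside the DAG file itself, so one trigger call runs both tasks in order.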