I wonder how NiFi would look, or would have to look, to support batch use cases.

On January 22, 2019 at 10:24:10, Boris Tyukin ([email protected]) wrote:

We've looked at both. Airflow might be a much better tool for
coordination/scheduling. Why don't you take one of your pipelines and try
to implement it in both tools?

We really liked Airflow, but unfortunately it was not a good fit for
real-time processes, which is why we decided to go with NiFi. But if you use
NiFi strictly for job coordination and typical ETL-like dependencies, you
will have a hard time. Things that are easy and obvious with Airflow or ETL
tools like Informatica or SSIS are quite difficult with NiFi. Just look at
some examples of the Wait/Notify or merge patterns and you will see why.

IMHO, since NiFi was designed from the ground up to support real-time use
cases rather than batch cases, its design and approach are quite different
from those of batch-oriented tools like Airflow.

Boris

On Fri, Jan 11, 2019 at 12:02 PM Jonathan Meran <[email protected]>
wrote:

> Hello,
>
> I am looking into the possibility of using NiFi as a data pipeline
> orchestration tool. I’m evaluating NiFi along with some other tools, such as
> Airflow and AWS Step Functions/Lambdas.
>
> Has anyone used NiFi as an orchestration/scheduling tool for tasks such as
> submitting Spark jobs to an EMR cluster? These are some of the requirements
> we are considering while evaluating such a tool:
>
>    1. SSH capabilities to execute remote commands
>    2. Rich scheduling (CRON)
>    3. Ability to write custom routines and import custom libraries
>    4. Event-based triggering of a pipeline
>
> Any insight would be helpful. We have used NiFi for about a year now for
> data movement and are familiar with its capabilities. My biggest worry is
> the ability to coordinate with other machines using SSH.
>
> Thanks,
>
> Jon
>
