You may want to monitor https://issues.apache.org/jira/browse/NIFI-3698
On January 11, 2019 at 14:22:24, Jonathan Meran ([email protected]) wrote:

Thanks Joe! We appreciate the kind words and are happy you enjoy our products!

My thinking is aligned with yours for sure. A main driver for the consideration of NiFi for orchestration is that it’s a system we already have up and running and maintain.

Thanks again,
Jon

From: Joe Witt <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Friday, January 11, 2019 at 12:28 PM
To: "[email protected]" <[email protected]>
Subject: Re: NiFI as Data Pipeline Orchestration Tool?

Jon

First things first - Sonos is awesome. Now back to the matter at hand...

NiFi is quite often used for various forms of orchestration of other systems doing their thing. However, I'll state that isn't really its primary purpose, so for pure orchestration cases it can leave you with a less than ideal user experience. NiFi is more about managing the flow of data to and from systems and doing the necessary routing/splitting/forking/joining/merging/transforming/cajoling to make that work well. We're less about telling those other systems what to do with the data or when to run.

Now, having said this, it is pretty common. We have the Spark Livy integration, for example.

I'd recommend you give tools that cater primarily to orchestration a first stab on this, and if you find the problem looks more and more like what I describe, then NiFi is probably appropriate.

Hope that helps a bit. Talking on a terminology basis is tough, as things like ETL, orchestration, and transformation often mean wildly different things to different people.

Thanks

On Fri, Jan 11, 2019 at 12:02 PM Jonathan Meran <[email protected]> wrote:

Hello,

I am looking into the possibility of using NiFi as a Data Pipeline Orchestration Tool. I’m evaluating NiFi along with some other tools such as Airflow and AWS Step Functions/Lambdas. Has anyone used NiFi as an orchestration/scheduling tool for tasks such as submitting Spark jobs to an EMR cluster?

These are some of the requirements we are considering while evaluating such a tool:

1. SSH capabilities to execute remote commands
2. Rich scheduling (CRON)
3. Ability to write custom routines and import custom libraries
4. Event-based triggering of a pipeline

Any insight would be helpful. We have used NiFi for about a year now for data movement and are familiar with its capabilities. My biggest worry is the ability to coordinate with other machines using SSH.

Thanks,
Jon
