My work is mostly old-world ETL. I create data pipelines using SSIS, usually moving data from flat files into a data warehouse. Since I discovered the Hadoop ecosystem, I’ve been looking for ways to speed up data warehouse loads, even to the point where I’m loading data in real time. I’ve seen Storm and Spark used as a way to do that. However, I’m not a Java developer (yet), and Storm has a pretty steep learning curve for me. When NiFi got announced, I looked at it and said, “hey, this looks like SSIS for big data”. So I’ve been kind of looking at it through the lens of visual data flow development.
I’ve seen stacks where Storm is used to load data warehouses, but warehouse loads aren’t that complex; using Java to cleanse data kind of seems like overkill. 95% of my work can get done using SQL, and SSIS is just the traffic cop that decides which stored procs to execute. I guess a better question to ask is, can NiFi be used in place of SSIS (DataStage, Informatica, etc.)? So instead of:

SSIS –> Warehouse (batch processing paradigm)

you have:

Kafka –> NiFi –> Warehouse (real time processing)

Am I even thinking about this correctly? I know we’re talking about moving data between systems, but frequently the move I deal with is on the same box, and some other piece of software drops the files to be processed into a folder.

B.

From: Joe Percivall
Sent: Wednesday, October 28, 2015 7:43 PM
To: [email protected]
Subject: Re: NiFi for CEP

Hey Bob,

It really depends on your definition of CEP (complex event processing). If what you're trying to do is advanced processing on a single piece of data (anything from small sensor data to huge medical data), NiFi could potentially be a great candidate to replace Storm or Spark Streaming. If you're trying to do advanced analytics on many pieces of data or create a stateful response to data, then Storm or Spark Streaming would handle that better (but NiFi would do a great job getting the data from the edge to the other tech!).

There are others that can go into more depth, but that's it in a nutshell.

Joe
- - - - - -
Joseph Percivall
linkedin.com/in/Percivall
e: [email protected]

On Wednesday, October 28, 2015 8:02 PM, "Adaryl "Bob" Wakefield, MBA" <[email protected]> wrote:

I've been studying NiFi and there is something I'm not quite understanding. Can NiFi be used in place of Storm or Spark Streaming to process streaming data?

B.
