Bob,

I can relate to your background and your thought process for NiFi.  My limited
experience with it has highlighted the "simple event processing" portion of
the tool.  It can replace some of the things that SSIS, Informatica, or
Ab Initio are used for, but has a much wider and shallower focus.  The
ETL tools are focused on transformations like joins, aggregates, pivots
(columns to rows or rows to columns), change data capture, etc.  NiFi could do some of
these things, but seems to be designed more for iterative processing of
file objects.  It's a lightweight data processing tool, or a lightweight
service bus, depending on your perspective.

You could probably use the SQL processor to insert records, assign new DW
keys, etc., and if you already have your logic in stored procedures, you
might even be able to use NiFi as a process manager, passing the return
values as needed until you eventually use the email processor to send a
notification that the warehouse load is complete.  But I don't think that
you want to try to manage the tasks associated with relational and
dimensional models from within NiFi itself.
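
To make the "process manager" idea a bit more concrete, here is a rough
Python sketch of that traffic-cop pattern, written outside of NiFi.
Everything in it is made up for illustration: run_load, stage_rows, and
assign_keys are hypothetical names, the step functions stand in for calls
to your stored procedures, and notify stands in for whatever sends the
email.  It is not NiFi's API, just the shape of the flow.

```python
from typing import Any, Callable, List, Tuple

# A step is a (name, function) pair; the function receives the previous
# step's return value. In practice each function would call a stored proc.
Step = Tuple[str, Callable[[Any], Any]]


def run_load(steps: List[Step], notify: Callable[[str], None],
             initial: Any = None) -> Any:
    """Run steps in order, passing each return value to the next step."""
    value = initial
    for name, step in steps:
        try:
            value = step(value)
        except Exception as exc:
            # Report the failing step, then let the error propagate.
            notify(f"Warehouse load failed at step '{name}': {exc}")
            raise
    notify("Warehouse load complete")
    return value


# Hypothetical stand-ins for stored procedure calls (assumptions).
def stage_rows(_: Any) -> int:
    return 3  # pretend three rows were staged


def assign_keys(row_count: int) -> int:
    return row_count  # pretend every staged row received a DW key


if __name__ == "__main__":
    run_load([("stage_rows", stage_rows), ("assign_keys", assign_keys)],
             notify=print)
```

The orchestrator only sequences work and forwards return values; all of
the relational and dimensional logic stays in the database.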

Charlie


On Wed, Oct 28, 2015 at 10:29 PM, Adaryl "Bob" Wakefield, MBA <
[email protected]> wrote:

> My work is mostly old world ETL. I create data pipelines using SSIS where
> I move data usually from flat files into a data warehouse. Since I
> discovered the Hadoop ecosystem, I’ve been looking for ways to speed up
> data warehouse load even to a point where I’m loading data in real time.
> I’ve seen Storm and Spark used as a way to do that. However, I’m not a Java
> developer (yet) and Storm has a pretty steep learning curve for me. When
> NiFi got announced, I looked at it and said, “hey this looks like SSIS for
> big data”. So I’ve been kind of looking at it through the lens of visual
> data flow development.
>
> I’ve seen stacks where Storm is used to load data warehouses, but
> warehouse loads aren’t that complex; using Java to cleanse data kind of
> seems like overkill. 95% of my work can get done using SQL, and SSIS is
> just the traffic cop that tells which stored procs to execute.
>
> I guess a better question to ask is, can NiFi be used in place of SSIS
> (Data Stage, Informatica, etc)? So instead of:
> SSIS –> Warehouse (batch processing paradigm)
> you have
> Kafka –> NiFi –> Warehouse (real time processing)
>
> Am I even thinking about this correctly? I know we’re talking about moving
> data between systems but frequently the move I deal with is on the same box
> and there is some other piece of software that drops the files to be
> processed into a folder.
>
> B.
>
> *From:* Joe Percivall <[email protected]>
> *Sent:* Wednesday, October 28, 2015 7:43 PM
> *To:* [email protected]
> *Subject:* Re: NiFi for CEP
>
> Hey Bob,
>
> It really depends on your definition of CEP (complex event processing). If
> what you're trying to do is advanced processing on a single piece of data
> (anything from small sensor data to huge medical data), NiFi could
> potentially be a great candidate to replace Storm or Spark Streaming. If
> you're trying to do advanced analytics on many pieces of data or create a
> stateful response to data then Storm or Spark Streaming would handle that
> better (but NiFi would do a great job getting the data from the edge to the
> other tech!).
>
> There are others that can go into more depth but that's it in a nutshell,
> Joe
> - - - - - -
> *Joseph Percivall*
> linkedin.com/in/Percivall
> e: [email protected]
>
>
>
>
> On Wednesday, October 28, 2015 8:02 PM, "Adaryl "Bob" Wakefield, MBA" <
> [email protected]> wrote:
>
>
> I've been studying NiFi and there is something I'm not quite
> understanding.
> Can NiFi be used in place of Storm or Spark Streaming to process streaming
> data?
>
>
> B.
>
>
>
>
