Dmitry,

While it is not uncommon to see such comparisons made by various Apache projects against other tools (commercial or otherwise), there have been Apache threads recently suggesting this can be poor form. So I'd prefer to see this thread stay focused on how we as a community see NiFi, rather than attempting to draw too much comparison to other systems specifically.
I'll kick this off with a few key areas:

1) Interactive Command and Control

With NiFi, the API for interacting with a running instance is an HTTP (REST-style) API for all things: not simply gathering stats, but actually understanding and interacting with the flow. You can use it to change the behavior of the flow, add components, remove components, start and stop elements, fork data feeds for exploratory activities, and try new techniques. This is really powerful. We provide a nice user interface as a common way for users to interact with the system, but other systems are also able to interact directly through the API.

2) Data Provenance

NiFi records facts about data as it enters NiFi, such as timing and protocol, along with all the things it does to that data, including manipulations such as enrichment, conversion, filtering, and aggregation. It also tracks where data gets sent and when it is dropped. This is extremely useful for dataflow management: understanding how the flow works and how it has evolved. It really helps with troubleshooting and largely eliminates the "Hey, I never got the data" nonsense that can happen across large organizations. In many ways it eliminates the need for traditional logs in similar systems, because it provides a nicely tracked way to follow the flow rather than making you figure out or guess how various log events relate. Further, NiFi links each event to its underlying content, and while that content is still accessible you can view it or replay it, which enables some really cool use cases and frankly dramatically reduces the time to go live on a new flow or new idea.

3) Security

Since we have interactive command and control, we have control-plane security in addition to the typical data-plane security. On the control plane we're talking about how users or systems interact with NiFi; they can use things like Kerberos, LDAP, or SSL to pass credentials.
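To make point 1 a bit more concrete, here is a rough sketch of what a control-plane call against the REST-style API might look like. The endpoint path, payload fields, and token are illustrative assumptions for the sketch, not a guaranteed match for any particular NiFi version; check the REST API docs for the real contract.

```python
import json
import urllib.request

# Hypothetical host and processor id; the path and payload shape below are
# illustrative, not NiFi's exact schema.
base_url = "https://nifi.example.com:8443/nifi-api"
processor_id = "1234-abcd"

payload = {
    "revision": {"version": 3},  # optimistic-locking revision for the component
    "state": "STOPPED",          # ask the framework to stop this processor
}

req = urllib.request.Request(
    url=f"{base_url}/processors/{processor_id}/run-status",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer <token>",  # control-plane credential passing
    },
    method="PUT",
)

# Against a live instance this request object would be handed to
# urllib.request.urlopen(req); here we just show its shape.
print(req.get_method(), req.full_url)
```

The point is that starting, stopping, and reshaping the flow are ordinary authenticated HTTP calls, so anything that can speak HTTP can drive the system the same way the UI does.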
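And to make point 2 concrete, here is a toy sketch of what a provenance lineage gives you. The event types and field names below are made up for illustration and are not NiFi's actual provenance schema; the idea is simply that each fact about a piece of data is a queryable record.

```python
from dataclasses import dataclass, field

# Illustrative shape of a provenance event; field names are assumptions
# for this sketch, not NiFi's real schema.
@dataclass
class ProvenanceEvent:
    event_type: str        # e.g. RECEIVE, ATTRIBUTES_MODIFIED, SEND, DROP
    component: str         # which part of the flow acted on the data
    timestamp_ms: int
    details: dict = field(default_factory=dict)

# A tiny lineage for one piece of data as it moves through a flow.
lineage = [
    ProvenanceEvent("RECEIVE", "ListenHTTP", 1458500000000, {"transit": "http://src"}),
    ProvenanceEvent("ATTRIBUTES_MODIFIED", "EnrichAttributes", 1458500000250),
    ProvenanceEvent("SEND", "PostToWarehouse", 1458500000900, {"transit": "https://sink"}),
]

# "Did we ever send it, and where?" becomes a lookup, not log spelunking.
sends = [e for e in lineage if e.event_type == "SEND"]
print(len(sends), sends[0].details["transit"])
```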
Further, we have pluggable authorization, which allows you to integrate NiFi nicely with whatever system you have for entitlements of a given authenticated user/system. On the data plane side we support encryption/decryption of content in the flow, which greatly aids a number of security/privacy related cases, and you can of course also do things like two-way SSL based transport. Further, regarding the transport mechanisms, a single NiFi instance can act as a number of different entities at once when talking to external endpoints. This is critical when bridging between network domains, or simply when interacting with different services that have different trust models and certificate regimes.

I have avoided talking about extensibility here and our impressive number of connectors. Everyone claims they are extensible and has a large number of connectors. The reality is that no system in this space is ever done regarding connectors; you will always end up needing to build something to handle some special protocol or format. So the real question is: what is the experience of making extensions? How long does it take? If the framework has a UI, how hard is it to take advantage of it? What about documentation, and the effort required to make it available to the user? How much effort does it take to get that extension to perform well and be resilient to error conditions? For the occasional quick-and-dirty cases, are you given nice options? The script processors we have that support Groovy, Lua, Javascript, and Jython are good examples; from idea to production impact can be minutes.

Now, we certainly have some cool features, but we're not done. Not even close. We absolutely need to make the entire life cycle of managing extensions far better. We should have a registry for these, and it should make finding components, upgrading, and running multiple versions at the same time easy.
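As a flavor of the quick-and-dirty scripting path mentioned above: a script processor typically just applies a small transform to each flow file's content. The function below is a standalone sketch of that kind of transform logic in plain Python (the threshold, field names, and "flagged" attribute are invented for the example; the actual NiFi scripting API, with its session and flow file objects, is not shown here).

```python
import json

# The sort of logic a quick Jython/Groovy script processor might apply to
# each flow file: drop records below a threshold and tag the survivors.
# Field names and the threshold are illustrative assumptions.
def transform(content: str, threshold: int = 10) -> str:
    records = json.loads(content)
    kept = [dict(r, flagged=True) for r in records if r.get("value", 0) >= threshold]
    return json.dumps(kept)

incoming = json.dumps([{"value": 4}, {"value": 42}])
outgoing = transform(incoming)
print(outgoing)
```

The appeal is that logic this small needs no build, packaging, or deployment step: you paste it into the processor, start it, and it is in the flow.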
We also need to make it easier for numerous organizations/teams to operate on the same flow at once, even when they're not coordinating with each other. There is some great work on the near- and longer-term roadmap to go after this.

We have quite a few feature proposals/ideas documented on the NiFi wiki [1]. Would love to hear your input on those.

[1] https://cwiki.apache.org/confluence/display/NIFI/NiFi+Feature+Proposals

Thanks
Joe

On Sun, Mar 20, 2016 at 2:38 PM, Dmitry Goldenberg <[email protected]> wrote:
> Could someone provide a short list of the most essential features and
> benefits of using NiFi vs. other existing ETL / data wrangling tools such as
> for example Pentaho?
>
> Thanks.
> - Dmitry
