Joe,

Your point about not doing a direct comparison is fair. Thanks for your
detailed response; this helps.

As for ideas, two things come to mind for now. The first (as raised
in another thread) is a cohesive command-line interface, useful for
DevOps but also for developers, testing, etc. The second is the
ability to tone down, if not turn off, data provenance. I've
developed custom provenance frameworks, for example for indexing data
into Solr; such specific plugins may also be useful.

I'll be looking at custom processor authoring next; I may have more
ideas after that. The roadmap list you mentioned looks great.

- Dmitry

On Sun, Mar 20, 2016 at 3:07 PM, Joe Witt <[email protected]> wrote:

> Dmitry,
>
> While it is not uncommon to see such comparisons made by various
> Apache projects against other tools (commercial or otherwise), there
> have been Apache threads recently suggesting this can be poor form.
> So I'd prefer to see this thread stay focused on how we as a
> community see NiFi, rather than attempting to draw too much
> comparison to other systems specifically.
>
> I'll kick this off with a few key areas:
> 1) Interactive Command and Control
>
> With NiFi, the API for interacting with a running instance is an
> HTTP (REST-style) API for all things: not simply gathering stats,
> but actually understanding and interacting with the flow.  You can
> use it to change the behavior of the flow, add or remove components,
> start and stop elements, and fork data feeds for exploratory
> activities and trying new techniques.  This is really powerful.  We
> provide a nice user interface as a common way for users to interact
> with the system, but other systems can also interact directly
> through the API.
>
> 2) Data Provenance
>
> NiFi records facts about data as it enters NiFi, such as timing,
> protocol, etc., and everything it does to the data, including
> manipulations such as enrichment, conversion, filtering, and
> aggregation.  It also tracks where data gets sent and when it is
> dropped.  This is extremely useful for dataflow management, such as
> understanding how the flow works and how it has evolved.  It really
> helps with troubleshooting and largely eliminates the "Hey, I never
> got the data" nonsense that can happen across large organizations.
> In many ways it eliminates the use of traditional logs in similar
> systems, because it provides a tracked way to follow the flow rather
> than making you figure out or guess how various log events relate.
> Further, in NiFi we link each event to its underlying content, and
> while that content is still accessible you can view or replay it,
> which enables some really cool use cases and frankly dramatically
> reduces the time to go live on a new flow or new idea.
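>
> To make this concrete: the provenance repository implementation is
> pluggable and chosen in nifi.properties, so its cost can be dialed
> up or down.  A sketch (property names as in recent releases; check
> the admin guide for your version):
>
> ```properties
> # Default: durable, indexed provenance
> nifi.provenance.repository.implementation=org.apache.nifi.provenance.PersistentProvenanceRepository
> # Lighter-weight alternative: in-memory, bounded buffer
> # (events are lost on restart)
> # nifi.provenance.repository.implementation=org.apache.nifi.provenance.VolatileProvenanceRepository
> # nifi.provenance.repository.buffer.size=100000
> ```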
>
> 3) Security
>
> Since we have interactive command and control, we have control-plane
> security in addition to the typical data-plane security.  On the
> control plane we're talking about how users or systems interact with
> NiFi; they can use things like Kerberos, LDAP, and SSL to pass
> credentials.  Further, we have pluggable authorization, which allows
> you to integrate NiFi nicely into whatever system you have for
> entitlements of a given authenticated user or system.  On the
> data-plane side we support encryption/decryption of content in the
> flow, which greatly aids a number of security/privacy-related cases,
> and you can of course also do things like two-way SSL-based
> transport.  Further, regarding the transport mechanisms, a single
> NiFi instance can act as a number of different entities at once when
> talking to external endpoints.  This is critical when bridging
> between network domains, or when interacting with different services
> that have different trust models and certificate regimes.
>
> I have avoided talking about extensibility here and our impressive
> number of connectors.  Everyone claims to be extensible and to have
> a large number of connectors.  The reality is that no system in this
> space is ever done with connectors; you will always end up needing
> to build something for some special protocol or format.  So the real
> question is: what is the experience of making extensions?  How long
> does it take?  If the framework has a UI, how hard is it to take
> advantage of it?  What about documentation and the effort required
> to make it available to the user?  How much effort does it take to
> get an extension to perform well and be resilient to error
> conditions?  For the occasional quick-and-dirty cases, are you given
> nice options?  The script processors we have that support Groovy,
> Lua, JavaScript, and Jython are good examples.  From idea to
> production impact for those can be minutes.
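>
> As a flavor of how small these can be, here is the core of the kind
> of quick transform one might drop into a script processor, shown as
> a pure function (a sketch; the session/flow-file wiring that NiFi's
> scripting processors provide is deliberately elided):
>
> ```python
> import json
>
> # Sketch: redact a sensitive field from a JSON record.  In a NiFi
> # script processor the framework supplies the flow file content;
> # keeping the transform a pure bytes-to-bytes function makes it
> # trivial to test outside NiFi.
> def redact(record_bytes):
>     """Drop the 'ssn' field from a JSON object serialized as bytes."""
>     record = json.loads(record_bytes.decode("utf-8"))
>     record.pop("ssn", None)
>     return json.dumps(record, sort_keys=True).encode("utf-8")
> ```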
>
> Now, we certainly have some cool features, but we're not done.  Not
> even close.  We absolutely need to make the entire life cycle of
> managing extensions far better.  We should have a registry for
> extensions, one that makes finding components, upgrading, running
> multiple versions at the same time, and more, easy.  We also need to
> make it easier for numerous organizations/teams to operate on the
> same flow at once, even when they're not coordinating with each
> other.  There is some great work on the near- and longer-term
> roadmap to go after this.  We have quite a few feature
> proposals/ideas documented on the NiFi wiki [1].  Would love to hear
> your input on those.
>
> [1]
> https://cwiki.apache.org/confluence/display/NIFI/NiFi+Feature+Proposals
>
> Thanks
> Joe
>
> On Sun, Mar 20, 2016 at 2:38 PM, Dmitry Goldenberg
> <[email protected]> wrote:
> > Could someone provide a short list of the most essential features
> > and benefits of using NiFi vs. other existing ETL / data wrangling
> > tools such as, for example, Pentaho?
> >
> > Thanks.
> > - Dmitry
>
