Dmitry,

While it is not uncommon to see such comparisons made by various Apache projects against other tools (commercial or otherwise), there have been Apache threads recently suggesting this can be poor form. So I'd prefer to see this thread stay focused on how we as a community see NiFi, rather than attempting to draw too much comparison to other systems specifically.
I'll kick this off with a few key areas:

1) Interactive Command and Control

With NiFi, the API for interacting with a running instance is an HTTP (REST-style) API for all things: not simply gathering stats, but actually understanding and interacting with the flow. You can use it to change the behavior of the flow, add components, remove components, start and stop elements, fork data feeds for exploratory activities, and try new techniques. This is really powerful. We provide a nice user interface as a common way for users to interact with the system, but other systems are also able to interact directly through the API.

2) Data Provenance

NiFi records facts about data as it enters NiFi, such as timing and protocol, along with all the things it does to that data, including manipulations such as enrichment, conversion, filtering, and aggregation. It also tracks where data gets sent and when it is dropped. This is extremely useful for dataflow management: understanding how the flow works and how it has evolved. It really helps with troubleshooting and largely eliminates the "Hey, I never got the data" nonsense that can happen across large organizations. In many ways it eliminates the need for traditional logs in similar systems, because it provides a nicely tracked way to follow the flow rather than making you figure out or guess how various log events relate. Further, NiFi links each event to its underlying content, and while that content is still accessible you can view it or replay it, which enables some really cool use cases and frankly dramatically reduces the time to go live on a new flow or new idea.

3) Security

Since we have interactive command and control, we have control-plane security in addition to the typical data-plane security. On the control plane we're talking about how users or systems interact with NiFi; they can use things like Kerberos, LDAP, or SSL to pass credentials.
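To make point 1 a bit more concrete, here is a rough sketch of what a control-plane call against the REST-style API might look like. The endpoint path, payload fields, and token are illustrative assumptions for the sketch, not a guaranteed match for any particular NiFi version; check the REST API docs for the real contract.

```python
import json
import urllib.request

# Hypothetical host and processor id; the path and payload shape below are
# illustrative, not NiFi's exact schema.
base_url = "https://nifi.example.com:8443/nifi-api"
processor_id = "1234-abcd"

payload = {
    "revision": {"version": 3},  # optimistic-locking revision for the component
    "state": "STOPPED",          # ask the framework to stop this processor
}

req = urllib.request.Request(
    url=f"{base_url}/processors/{processor_id}/run-status",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer <token>",  # control-plane credential passing
    },
    method="PUT",
)

# Against a live instance this request object would be handed to
# urllib.request.urlopen(req); here we just show its shape.
print(req.get_method(), req.full_url)
```

The point is that starting, stopping, and reshaping the flow are ordinary authenticated HTTP calls, so anything that can speak HTTP can drive the system the same way the UI does.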
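And to make point 2 concrete, here is a toy sketch of what a provenance lineage gives you. The event types and field names below are made up for illustration and are not NiFi's actual provenance schema; the idea is simply that each fact about a piece of data is a queryable record.

```python
from dataclasses import dataclass, field

# Illustrative shape of a provenance event; field names are assumptions
# for this sketch, not NiFi's real schema.
@dataclass
class ProvenanceEvent:
    event_type: str        # e.g. RECEIVE, ATTRIBUTES_MODIFIED, SEND, DROP
    component: str         # which part of the flow acted on the data
    timestamp_ms: int
    details: dict = field(default_factory=dict)

# A tiny lineage for one piece of data as it moves through a flow.
lineage = [
    ProvenanceEvent("RECEIVE", "ListenHTTP", 1458500000000, {"transit": "http://src"}),
    ProvenanceEvent("ATTRIBUTES_MODIFIED", "EnrichAttributes", 1458500000250),
    ProvenanceEvent("SEND", "PostToWarehouse", 1458500000900, {"transit": "https://sink"}),
]

# "Did we ever send it, and where?" becomes a lookup, not log spelunking.
sends = [e for e in lineage if e.event_type == "SEND"]
print(len(sends), sends[0].details["transit"])
```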
Further, we have pluggable authorization, which allows you to integrate NiFi nicely with whatever system you have for entitlements of a given authenticated user/system. On the data plane side we support encryption/decryption of content in the flow, which greatly aids a number of security/privacy related cases, and you can of course also do things like two-way SSL based transport. Further, regarding the transport mechanisms, a single NiFi instance can act as a number of different entities at once when talking to external endpoints. This is critical when bridging between network domains, or simply when interacting with different services that have different trust models and certificate regimes.

I have avoided talking about extensibility here and our impressive number of connectors. Everyone claims they are extensible and has a large number of connectors. The reality is that no system in this space is ever done regarding connectors; you will always end up needing to build something to handle some special protocol or format. So the real question is: what is the experience of making extensions? How long does it take? If the framework has a UI, how hard is it to take advantage of it? What about documentation, and the effort required to make it available to the user? How much effort does it take to get that extension to perform well and be resilient to error conditions? For the occasional quick-and-dirty cases, are you given nice options? The script processors we have that support Groovy, Lua, Javascript, and Jython are good examples; from idea to production impact can be minutes.

Now, we certainly have some cool features, but we're not done. Not even close. We absolutely need to make the entire life cycle of managing extensions far better. We should have a registry for these, and it should make finding components, upgrading, and running multiple versions at the same time easy.
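As a flavor of the quick-and-dirty scripting path mentioned above: a script processor typically just applies a small transform to each flow file's content. The function below is a standalone sketch of that kind of transform logic in plain Python (the threshold, field names, and "flagged" attribute are invented for the example; the actual NiFi scripting API, with its session and flow file objects, is not shown here).

```python
import json

# The sort of logic a quick Jython/Groovy script processor might apply to
# each flow file: drop records below a threshold and tag the survivors.
# Field names and the threshold are illustrative assumptions.
def transform(content: str, threshold: int = 10) -> str:
    records = json.loads(content)
    kept = [dict(r, flagged=True) for r in records if r.get("value", 0) >= threshold]
    return json.dumps(kept)

incoming = json.dumps([{"value": 4}, {"value": 42}])
outgoing = transform(incoming)
print(outgoing)
```

The appeal is that logic this small needs no build, packaging, or deployment step: you paste it into the processor, start it, and it is in the flow.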
We also need to make it easier for numerous organizations/teams to operate on the same flow at once, even when they're not coordinating with each other. There is some great work on the near- and longer-term roadmap to go after this.

We have quite a few feature proposals/ideas documented on the NiFi wiki [1]. Would love to hear your input on those.

[1] https://cwiki.apache.org/confluence/display/NIFI/NiFi+Feature+Proposals

Thanks
Joe

On Sun, Mar 20, 2016 at 2:38 PM, Dmitry Goldenberg <[email protected]> wrote:
> Could someone provide a short list of the most essential features and
> benefits of using NiFi vs. other existing ETL / data wrangling tools such as
> for example Pentaho?
>
> Thanks.
> - Dmitry
