Joe,

Your point about not doing a direct comparison is fair. Thanks for your detailed response; this helps.
As far as ideas go, two things come to mind for now. The first, as raised in another thread, is a cohesive command-line interface (useful for DevOps, but also for developers, testing, etc.). The second is the ability to tone down, if not turn off, data provenance. I've developed custom provenance frameworks for indexing data into Solr, for example; such specific plugins may also be useful. I'll be looking at custom processor authoring next and may have more ideas after that. The roadmap list you mentioned looks great.

- Dmitry

On Sun, Mar 20, 2016 at 3:07 PM, Joe Witt <[email protected]> wrote:
> Dmitry,
>
> While it is not uncommon to see such comparisons made by various Apache
> projects against other tools (commercial or otherwise), there have been
> Apache threads recently suggesting this can be poor form. So, I'd
> prefer to see this thread stay focused on how we as a community see
> NiFi rather than attempting to draw too much comparison to other
> systems specifically.
>
> I'll kick this off with a few key areas:
>
> 1) Interactive Command and Control
>
> With NiFi, the API for interacting with a running instance is an HTTP
> (REST style) API for all things. Not simply gathering stats, but
> actually understanding and interacting with the flow. You can use it
> to change the behavior of the flow, add components, remove components,
> start and stop elements, fork data feeds for exploratory activities,
> and try new techniques. This is really powerful. We provide a nice
> user interface as a common way for users to interact with the system,
> but other systems are also able to interact directly through the API.
>
> 2) Data Provenance
>
> NiFi records facts about data as it enters NiFi, such as timing,
> protocol, etc., and everything it does to the data, including
> manipulations such as enrichment, conversion, filtering, and
> aggregation. It also tracks where data gets sent and when it is
> dropped.
> This is extremely useful for dataflow management, such as
> understanding how the flow works and how it has evolved. It really
> helps with troubleshooting and largely eliminates the "Hey, I never
> got the data" nonsense that can happen across large organizations. In
> many ways it eliminates the use of traditional logs in similar
> systems, because it provides a nicely tracked way to follow the flow
> rather than making you figure out or guess how various log events
> relate. Further, in NiFi we link to the content underlying each event,
> and while the content is still accessible you can view it or replay
> it, which enables some really cool use cases and frankly dramatically
> reduces the time to go live on a new flow or new idea.
>
> 3) Security
>
> Since we have interactive command and control, we have control plane
> security in addition to the typical data plane security. On the
> control plane we're talking about how users or systems interact with
> NiFi. They can use things like Kerberos, LDAP, and SSL to pass
> credentials. Further, we have pluggable authorization, which allows
> you to integrate NiFi nicely into whatever system you have for
> entitlements of a given authenticated user/system. On the data plane
> side we support encryption/decryption of content in the flow, which
> greatly aids a number of security/privacy related cases, and you can
> of course also do things like two-way SSL based transport. Further,
> regarding transport mechanisms, a single NiFi instance can act as a
> number of different entities at once when talking to external
> endpoints. This is critical when bridging between network domains, or
> just when interacting with different services that have different
> trust models and certificate regimes, for instance.
>
> I have avoided talking about extensibility here and our impressive
> number of connectors. Everyone claims to be extensible and to have a
> large number of connectors.
> The reality is that no system in this space is ever done when it comes
> to connectors. You will always end up needing to make something to
> operate on some special protocol or format. So the real question is:
> what is the experience of making extensions? How long does it take?
> If the framework has a UI, how hard is it to take advantage of it?
> What about documentation and the effort required to make extensions
> available to the user? How much effort does it take to get an
> extension to perform well and be resilient to error conditions? For
> the occasional quick-and-dirty cases, are you given nice options? The
> script processors we have that support Groovy, Lua, JavaScript, and
> Jython are good examples. From idea to production impact, those can
> take minutes.
>
> Now, we certainly have some cool features. But we're not done. Not
> even close. We absolutely need to make the entire life cycle of
> managing extensions far better. We should have a registry for these,
> and it should make finding components, upgrading, running multiple
> versions at the same time, and more, easy. We also need to make it
> easier for numerous organizations/teams to operate on the same flow at
> once, even when they're not coordinating with each other. There is
> some great work on the near- and longer-term roadmap to go after this.
> We have quite a few feature proposals/ideas documented on the NiFi
> wiki [1]. Would love to hear your input on those.
>
> [1] https://cwiki.apache.org/confluence/display/NIFI/NiFi+Feature+Proposals
>
> Thanks
> Joe
>
> On Sun, Mar 20, 2016 at 2:38 PM, Dmitry Goldenberg
> <[email protected]> wrote:
> > Could someone provide a short list of the most essential features
> > and benefits of using NiFi vs. other existing ETL / data wrangling
> > tools, such as for example Pentaho?
> >
> > Thanks.
> > - Dmitry
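To make the interactive command and control point above concrete: every action available in the NiFi UI goes through the same REST API, so a script can drive the flow the same way. Here is a minimal Python sketch that builds (but does not send) the request to stop a processor. The base URL, processor id, and revision version are illustrative placeholders, and the endpoint path should be checked against the NiFi REST API documentation for your version:

```python
import json
import urllib.request

# Illustrative address of a local NiFi instance; adjust for your deployment.
BASE = "http://localhost:8080/nifi-api"

def run_status_request(processor_id, state, revision_version):
    """Build a PUT request to change a processor's run state.

    The revision version comes from a prior GET on the processor; NiFi
    uses it for optimistic locking so concurrent edits do not clobber
    each other. The id and version used below are placeholders.
    """
    body = json.dumps({
        "revision": {"version": revision_version},
        "state": state,  # e.g. "RUNNING" or "STOPPED"
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE}/processors/{processor_id}/run-status",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )

req = run_status_request("0158a3f2-0000-1000-8a2f-3e50f7d9a111", "STOPPED", 3)
print(req.get_method(), req.full_url)
```

Sending it is just `urllib.request.urlopen(req)`; on a secured instance you would also attach whatever credentials your control-plane setup (Kerberos, LDAP, certificates) requires.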
