Hey Boris,

There’s a good bit to unpack here, but I’ll try to answer each question.
1) I would say that the target audience for NiFi really is a person in a fairly technical role, though not necessarily a developer. We see a lot of developers using it, as well as data scientists, data engineers, sysadmins, etc. So while there may be quite a few tasks that a non-technical person can achieve, it may be hard to expose the platform to someone without a technical background.

That said, I do believe you’re right about the notion of flow dependencies, and I’ve done some work recently to help improve this. For example, NIFI-7476 [1] makes it possible to configure a Process Group in such a way that only a single FlowFile at a time is allowed into the group, and the data is optionally held within the group until that FlowFile has completed processing, even if it’s split up into many parts. Additionally, NIFI-7509 [2] updates the List* processors so that they can use an optional Record Writer. This makes it possible to get a full listing of a directory from ListFile as a single FlowFile, or a listing of all items in an S3 bucket, an Azure Blob Store, etc. When that is combined with NIFI-7476, it becomes very easy to process an entire directory of files or an entire bucket and wait until all processing is complete before data is transferred on to the next task. (Additionally, NIFI-7552 [3] adds attributes indicating FlowFile counts for each Output Port, so it’s easy to determine whether there were any “processing failures,” etc.)

With all of the above said, I don’t think this necessarily solves, in a simple and generic sense, the requirement to complete Task A, then Task B, and then Task C. But it does put us far closer. This may still be achievable with some nesting of Process Groups, etc., but it won’t be quite as straightforward as I’d like, and it would perhaps add significant latency if only a single FlowFile at a time is allowed through the Process Group.
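In case it helps to see what the NIFI-7476 settings look like programmatically, here is a minimal sketch of building the JSON body for a PUT to /nifi-api/process-groups/{id} over NiFi’s REST API. The group id ("abcd-1234") and revision version are placeholders, and the field names (flowfileConcurrency, flowfileOutboundPolicy) should be double-checked against the REST API docs for your NiFi version:

```python
import json

# Hypothetical helper: builds the request body for
# PUT /nifi-api/process-groups/{id} to enable single-FlowFile
# concurrency with batch output, per NIFI-7476. Verify the field
# names against your NiFi version's REST API documentation.
def build_concurrency_update(group_id, revision_version,
                             concurrency="SINGLE_FLOWFILE_PER_NODE",
                             outbound_policy="BATCH_OUTPUT"):
    return {
        "revision": {"version": revision_version},
        "component": {
            "id": group_id,
            "flowfileConcurrency": concurrency,
            "flowfileOutboundPolicy": outbound_policy,
        },
    }

# "abcd-1234" and revision 3 are placeholders; send this body with
# any HTTP client (curl, requests, etc.) to your NiFi instance.
body = build_concurrency_update("abcd-1234", 3)
print(json.dumps(body, indent=2))
```

The same settings are available in the UI on the Process Group’s configuration dialog, so this is only useful if you are scripting flow deployment.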
Perhaps that can be addressed in the future by having the ability to bulk-transfer all FlowFiles from Queue A to Queue B, and then allowing a “Batch Input” option on a Process Group instead of just “Streaming” vs. “Single FlowFile at a Time.” I do think there will be some future improvements along these lines, though.

2) This should be fairly straightforward. It would basically just be a matter of creating an assembly like the nifi-assembly module, but one that doesn’t include all of the NARs.

3) This probably boils down to some trade-offs and what makes the most sense for your organization. A single, large NiFi deployment generally makes things much easier for the sysadmins, and NiFi’s policies should provide the needed multi-tenancy in terms of authorization. But it doesn’t really offer much in terms of resource isolation. So if resource isolation is important to you, then using separate NiFi deployments is likely desirable.

Hope this helps!
-Mark

[1] https://issues.apache.org/jira/browse/NIFI-7476
[2] https://issues.apache.org/jira/browse/NIFI-7509
[3] https://issues.apache.org/jira/browse/NIFI-7552

On Jun 28, 2020, at 1:04 PM, Boris Tyukin <[email protected]> wrote:

Hi guys,

I am thinking of increasing the footprint of NiFi in my org to extend it to less technical roles. I have a few questions:

1) Are there any plans to support easy dependencies at some point? We are aware of all the current options (Wait/Notify, Kafka, MergeRecord/MergeContent, etc.), and all of them are still hard to use and not reliable. For non-technical roles, we really need a stupid-simple way to define classical dependencies, like running Task C only after Tasks A and B are finished. I realize this is a challenge because of NiFi’s whole FlowFile concept (which we do love, being on the technical side of the house), but I really do not want to get another ETL/scheduling tool.
2) Is it fairly easy to build and support our own custom version of “NiFi-light,” where we remove all the processors that we do not want to expose to non-technical people? The idea is to remove all the processors that consume CPU/RAM, to push those users toward our Big Data systems rather than using NiFi to do the actual processing. We would like to leave those capabilities to our data engineering team while shifting our analysts to an ELT/ELTL paradigm, letting them run SQL and benefit from Big Data engines.

3) What would be the recommended setup for multiple decentralized teams? Separate NiFi instances, where each team supports its own jobs while our admin supports all of the instances? Or one large NiFi cluster where everyone works together? We do not want teams to step on each other’s jobs, see each other’s failure alerts/bulletins, etc. We want it to feel like each team’s own environment. I’m not sure whether NiFi policies are mature enough to provide this sort of isolation.

Thanks,
Boris
