Hi Mark, thanks for the great comments and for working on these improvements. These are great enhancements that we can certainly benefit from - I am thinking of at least two projects we support today.
As far as making it more user-friendly, at some point I looked at Kylo.io and it was quite an interesting project - not sure if it is still alive - but I liked how they created their own UI/tooling around NiFi. I am going to toy with the idea of having a "dumbed-down" version of NiFi.

On Sun, Jun 28, 2020 at 3:36 PM Mark Payne <[email protected]> wrote:

> Hey Boris,
>
> There’s a good bit to unpack here but I’ll try to answer each question.
>
> 1) I would say that the target audience for NiFi really is a person with a
> pretty technical role. Not developers, necessarily, though. We do see a lot
> of developers using it, as well as data scientists, data engineers, sys
> admins, etc. So while there may be quite a few tasks that a non-technical
> person can achieve, it may be hard to expose the platform to someone
> without a technical background.
>
> That said, I do believe that you’re right about the notion of flow
> dependencies. I’ve done some work recently to help improve this. For
> example, NIFI-7476 [1] makes it possible to configure a Process Group in
> such a way that only a single FlowFile at a time is allowed into the group.
> And the data is optionally held within the group until that FlowFile has
> completed processing, even if it’s split up into many parts. Additionally,
> NIFI-7509 [2] updates the List* processors so that they can use an optional
> Record Writer. This makes it possible to get a full listing of a directory
> from ListFile as a single FlowFile. Or a listing of all items in an S3
> bucket or an Azure Blob Store, etc. So when that is combined with
> NIFI-7476, it makes it very easy to process an entire directory of files or
> an entire bucket, etc. and wait until all processing is complete before
> data is transferred on to the next task. (Additionally, NIFI-7552 [3] updates
> this to add attributes indicating FlowFile counts for each Output Port so
> it’s easy to determine if there were any “processing failures” etc.).
>
> So with all of the above said, I don’t think that it necessarily solves in
> a simple and generic sense the requirement to complete Task A, then Task B,
> and then Task C. But it does put us far closer. This may be achievable
> still with some nesting of Process Groups, etc. but it won’t be completely
> as straight-forward as I’d like and would perhaps add significant latency
> if it’s allowing only a single FlowFile at a time through the Process Group.
> Perhaps that can be addressed in the future by having the ability to bulk
> transfer all FlowFiles from Queue A to Queue B, and then allowing a “Batch
> Input” on a Process Group instead of just “Streaming” vs. “Single FlowFile
> at a Time.” I do think there will be some future improvements along these
> lines, though.
>
> 2) This should be fairly straight-forward. It would basically be just
> creating an assembly like the nifi-assembly module but one that doesn’t
> include all of the NARs.
>
> 3) This probably boils down to some trade-offs and what makes most sense
> for your organization. A single, large NiFi deployment makes it much easier
> for the sys admins, generally. The NiFi policies should provide the needed
> multi-tenancy in terms of authorization. But it doesn’t really offer much
> in terms of resource isolation. So, if resource isolation is important to
> you, then using separate NiFi deployments is likely desirable.
>
> Hope this helps!
> -Mark
>
>
> [1] https://issues.apache.org/jira/browse/NIFI-7476
> [2] https://issues.apache.org/jira/browse/NIFI-7509
> [3] https://issues.apache.org/jira/browse/NIFI-7552
>
>
>
> On Jun 28, 2020, at 1:04 PM, Boris Tyukin <[email protected]> wrote:
>
> Hi guys,
>
> I am thinking of increasing the footprint of NiFi in my org and extending it to
> less technical roles. I have a few questions:
>
> 1) Are there any plans to support easy dependencies at some point? We are
> aware of all the current options (Wait/Notify, Kafka,
> MergeRecord/MergeContent, etc.) and all of them are still hard and not
> reliable. For non-technical roles, we really need a dead-simple way to
> define classical dependencies like run task C only after tasks A and B are
> finished. I realize it is a challenge because of the whole concept of NiFi
> with FlowFiles (which we do love, being on the technical side of the house),
> but I really do not want to get another ETL/scheduling tool.
>
> 2) Is it fairly easy to build and support our own custom version of
> NiFi-light, where we remove all the processors that we do not want to expose
> to non-technical people? The idea is to remove all the processors that
> consume CPU/RAM to force them to benefit from our Big Data systems and not use
> NiFi to do the actual processing. We would like to leave these capabilities
> to our data engineering team while shifting our analysts to an ELT/ELTL paradigm
> to let them run SQL and benefit from Big Data engines.
>
> 3) What would be the recommended setup for multiple decentralized teams?
> Separate NiFi instances where they can support their own jobs while our
> admin supports all these instances? Or one large NiFi cluster where everyone
> works on the same NiFi cluster? We do not want them to step on each other's
> jobs, see each other's failure alerts/bulletins, etc. We want to make it look
> like their team's own environment. Not sure if NiFi policies are mature
> enough to provide this sort of isolation.
>
> Thanks,
> Boris
