As far as I can tell, Kylo is dead, based on their public GitHub activity.

Mark,

Would it make sense for us to start modularizing nifi-assembly with more profiles? That way, people like Boris could run something like this:

mvn install -Pinclude-grpc,include-graph,!include-kafka,!include-mongodb

On Mon, Jun 29, 2020 at 11:20 AM Boris Tyukin <[email protected]> wrote:

> Hi Mark, thanks for the great comments and for working on these
> improvements. These are great enhancements that we
> can certainly benefit from. I can think of at least two projects we
> support today.
>
> As far as making it more user-friendly, at some point I looked at Kylo.io
> and it was quite an interesting project (not sure if it is still alive),
> but I liked how they created their own UI/tooling around NiFi.
>
> I am going to toy with the idea of having a "dumbed-down" version of NiFi.
>
> On Sun, Jun 28, 2020 at 3:36 PM Mark Payne <[email protected]> wrote:
>
>> Hey Boris,
>>
>> There’s a good bit to unpack here, but I’ll try to answer each question.
>>
>> 1) I would say that the target audience for NiFi really is a person with
>> a pretty technical role. Not developers, necessarily, though. We do see a
>> lot of developers using it, as well as data scientists, data engineers, sys
>> admins, etc. So while there may be quite a few tasks that a non-technical
>> person can achieve, it may be hard to expose the platform to someone
>> without a technical background.
>>
>> That said, I do believe that you’re right about the notion of flow
>> dependencies. I’ve done some work recently to help improve this. For
>> example, NIFI-7476 [1] makes it possible to configure a Process Group in
>> such a way that only a single FlowFile at a time is allowed into the group.
>> And the data is optionally held within the group until that FlowFile has
>> completed processing, even if it’s split up into many parts. Additionally,
>> NIFI-7509 [2] updates the List* processors so that they can use an optional
>> Record Writer.
>> This makes it possible to get a full listing of a directory
>> from ListFile as a single FlowFile, or a listing of all items in an S3
>> bucket or an Azure Blob Store, etc. So when that is combined with
>> NIFI-7476, it makes it very easy to process an entire directory of files or
>> an entire bucket, etc., and wait until all processing is complete before
>> data is transferred on to the next task. (Additionally, NIFI-7552 [3] updates
>> this to add attributes indicating FlowFile counts for each Output Port, so
>> it’s easy to determine if there were any “processing failures,” etc.)
>>
>> So with all of the above said, I don’t think that it necessarily solves,
>> in a simple and generic sense, the requirement to complete Task A, then Task
>> B, and then Task C. But it does put us far closer. This may still be
>> achievable with some nesting of Process Groups, etc., but it won’t be as
>> straightforward as I’d like, and it would perhaps add significant latency
>> if it’s allowing only a single FlowFile at a time through the Process Group.
>> Perhaps that can be addressed in the future by having the ability to bulk
>> transfer all FlowFiles from Queue A to Queue B, and then allowing a “Batch
>> Input” on a Process Group instead of just “Streaming” vs. “Single FlowFile
>> at a Time.” I do think there will be some future improvements along these
>> lines, though.
>>
>> 2) This should be fairly straightforward. It would basically be just
>> creating an assembly like the nifi-assembly module, but one that doesn’t
>> include all of the NARs.
>>
>> 3) This probably boils down to some trade-offs and what makes most sense
>> for your organization. A single, large NiFi deployment generally makes
>> things much easier for the sys admins. The NiFi policies should provide the
>> needed multi-tenancy in terms of authorization, but they don’t really offer
>> much in terms of resource isolation.
>> So, if resource isolation is important to
>> you, then using separate NiFi deployments is likely desirable.
>>
>> Hope this helps!
>> -Mark
>>
>> [1] https://issues.apache.org/jira/browse/NIFI-7476
>> [2] https://issues.apache.org/jira/browse/NIFI-7509
>> [3] https://issues.apache.org/jira/browse/NIFI-7552
>>
>> On Jun 28, 2020, at 1:04 PM, Boris Tyukin <[email protected]> wrote:
>>
>> Hi guys,
>>
>> I am thinking of increasing the footprint of NiFi in my org to extend it to
>> less technical roles. I have a few questions:
>>
>> 1) Are there any plans to support easy dependencies at some point? We are
>> aware of all the current options (Wait/Notify, Kafka,
>> MergeRecord/MergeContent, etc.), and all of them are still hard to use and
>> not reliable. For non-technical roles, we really need a dead-simple way to
>> define classic dependencies, like running task C only after tasks A and B
>> have finished. I realize it is a challenge because of the whole concept of
>> NiFi with FlowFiles (which we do love, being on the technical side of the
>> house), but I really do not want to bring in another ETL/scheduling tool.
>>
>> 2) Is it fairly easy to build and support our own custom version of
>> NiFi-light, where we remove all the processors that we do not want to expose
>> to non-technical people? The idea is to remove all the processors that
>> consume CPU/RAM, to force them to benefit from our Big Data systems and not
>> use NiFi to do the actual processing. We would like to leave these
>> capabilities to our data engineering team while shifting our analysts to an
>> ELT/ELTL paradigm, letting them run SQL and benefit from Big Data engines.
>>
>> 3) What would be the recommended setup for multiple decentralized teams:
>> separate NiFi instances where they can support their own jobs while our
>> admin supports all these instances, or one large NiFi cluster where everyone
>> works on the same cluster? We do not want them to step on each other's
>> jobs or see each other's failure alerts/bulletins, etc.
>> We want to make it look
>> like their team's own environment. Not sure if NiFi policies are mature
>> enough to provide this sort of isolation.
>>
>> Thanks,
>> Boris
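
Regarding the profile-based build proposed at the top of this thread, here is a minimal sketch of how nifi-assembly's pom.xml might declare one opt-in and one opt-out profile. It is an illustration under stated assumptions, not NiFi's actual build. The profile IDs come from the example command. The NAR artifact IDs are assumed. `hypothetical.exclude.kafka` is a made-up property. The sketch also assumes that the assembly descriptor packages every `nar`-type dependency of the module.

```xml
<profiles>
  <!-- Opt-in profile: the gRPC NAR is bundled only when run with -Pinclude-grpc -->
  <profile>
    <id>include-grpc</id>
    <dependencies>
      <dependency>
        <groupId>org.apache.nifi</groupId>
        <!-- Assumed artifact ID; check the NAR module's actual name -->
        <artifactId>nifi-grpc-nar</artifactId>
        <version>${project.version}</version>
        <type>nar</type>
      </dependency>
    </dependencies>
  </profile>

  <!-- Opt-out profile: active whenever the hypothetical property
       hypothetical.exclude.kafka is NOT set, so Kafka is bundled by default.
       Property-based activation is used instead of activeByDefault, because
       activeByDefault profiles are switched off as soon as any other profile
       in the same POM is activated on the command line. -->
  <profile>
    <id>include-kafka</id>
    <activation>
      <property>
        <name>!hypothetical.exclude.kafka</name>
      </property>
    </activation>
    <dependencies>
      <dependency>
        <groupId>org.apache.nifi</groupId>
        <!-- Assumed artifact ID -->
        <artifactId>nifi-kafka-2-0-nar</artifactId>
        <version>${project.version}</version>
        <type>nar</type>
      </dependency>
    </dependencies>
  </profile>
</profiles>
```

With profiles like these, `!include-kafka` in the `-P` list explicitly deactivates the opt-out profile, and `include-grpc` activates the opt-in one. The Kafka NAR would also have to be removed from the module's regular dependency list, or it would still be packaged. In an interactive bash shell, `!` triggers history expansion, so the profile list needs single quotes: mvn install -P 'include-grpc,include-graph,!include-kafka,!include-mongodb'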
