Re: NiFi-light for analysts

Boris Tyukin Mon, 29 Jun 2020 08:20:41 -0700

Hi Mark, thanks for the great comments and for working on these
improvements. These are great enhancements that we can certainly
benefit from - I am thinking of at least two projects we support today.


As far as making it more user-friendly, at some point I looked at Kylo.io
and it was quite an interesting project - not sure if it is alive still -
but I liked how they created their own UI/tooling around NiFi.

I am going to toy with the idea of having a "dumbed-down" version of NiFi.

On Sun, Jun 28, 2020 at 3:36 PM Mark Payne <[email protected]> wrote:

> Hey Boris,
>
> There’s a good bit to unpack here but I’ll try to answer each question.
>
> 1) I would say that the target audience for NiFi really is a person with a
> pretty technical role. Not developers, necessarily, though. We do see a lot
> of developers using it, as well as data scientists, data engineers, sys
> admins, etc. So while there may be quite a few tasks that a non-technical
> person can achieve, it may be hard to expose the platform to someone
> without a technical background.
>
> That said, I do believe that you’re right about the notion of flow
> dependencies. I’ve done some work recently to help improve this. For
> example, NIFI-7476 [1] makes it possible to configure a Process Group in
> such a way that only a single FlowFile at a time is allowed into the group.
> And the data is optionally held within the group until that FlowFile has
> completed processing, even if it’s split up into many parts. Additionally,
> NIFI-7509 [2] updates the List* processors so that they can use an optional
> Record Writer. This makes it possible to get a full listing of a directory
> from ListFile as a single FlowFile. Or a listing of all items in an S3
> bucket or an Azure Blob Store, etc. So when that is combined with
> NIFI-7476, it makes it very easy to process an entire directory of files or
> an entire bucket, etc. and wait until all processing is complete before
> data is transferred on to the next task. (Additionally, NIFI-7552 [3] updates
> this to add attributes indicating FlowFile counts for each Output Port so
> it’s easy to determine if there were any “processing failures” etc.).
>
> So with all of the above said, I don’t think that it necessarily solves in
> a simple and generic sense the requirement to complete Task A, then Task B,
> and then Task C. But it does put us far closer. This may be achievable
> still with some nesting of Process Groups, etc. but it won’t be completely
> as straight-forward as I’d like and would perhaps add significant latency
> if it’s allowing only a single FlowFile at a time through the Process Group.
> Perhaps that can be addressed in the future by having the ability to bulk
> transfer all FlowFiles from Queue A to Queue B, and then allowing a “Batch
> Input” on a Process Group instead of just “Streaming” vs. “Single FlowFile
> at a Time.” I do think there will be some future improvements along these
> lines, though.
>
> 2) This should be fairly straight-forward. It would basically be just
> creating an assembly like the nifi-assembly module but one that doesn’t
> include all of the nar’s.
>
> 3) This probably boils down to some trade-offs and what makes the most sense
> for your organization. A single, large NiFi deployment makes it much easier
> for the sys admins, generally. The NiFi policies should provide the needed
> multi-tenancy in terms of authorization. But it doesn’t really offer much
> in terms of resource isolation. So, if resource isolation is important to
> you, then using separate NiFi deployments is likely desirable.
>
> Hope this helps!
> -Mark
>
>
> [1] https://issues.apache.org/jira/browse/NIFI-7476
> [2] https://issues.apache.org/jira/browse/NIFI-7509
> [3] https://issues.apache.org/jira/browse/NIFI-7552
>
>
>
> On Jun 28, 2020, at 1:04 PM, Boris Tyukin <[email protected]> wrote:
>
> Hi guys,
>
> I am thinking of increasing the footprint of NiFi in my org by extending it
> to less technical roles. I have a few questions:
>
> 1) Are there any plans to support easy dependencies at some point? We are
> aware of all the current options (Wait/Notify, Kafka,
> MergeRecord/MergeContent etc.) and all of them are still hard and not
> reliable. For non-technical roles, we really need a dead-simple way to
> define classical dependencies like run task C only after tasks A and B are
> finished. I realize it is a challenge because of the whole concept of NiFi
> with FlowFiles (which we do love, being on the technical side of the house),
> but I really do not want to get another ETL/scheduling tool.
>
> 2) Is it fairly easy to build and support our own custom version of
> NiFi-light, where we remove all the processors that we do not want to expose
> to non-technical people? The idea is to remove all the processors that
> consume CPU/RAM, to force them to benefit from our Big Data systems and not
> use NiFi to do the actual processing. We would like to leave these
> capabilities to our data engineering team while shifting our analysts to an
> ELT/ELTL paradigm to let them run SQL and benefit from Big Data engines.
>
> 3) What would be the recommended setup for multiple decentralized teams?
> Separate NiFi instances, where they can support their own jobs while our
> admin supports all these instances? Or one large NiFi cluster, where everyone
> works on the same NiFi cluster? We do not want them to step on each other's
> jobs, see each other's failure alerts/bulletins etc. We want to make it look
> like their team's own environment. Not sure if NiFi policies are mature
> enough to provide this sort of isolation.
>
> Thanks,
> Boris
>
>
>
