Re: NiFi-light for analysts

Mike Thomsen Mon, 29 Jun 2020 09:57:56 -0700

As far as I can tell, Kylo is dead based on their public github activity.

Mark,


Would it make sense for us to start modularizing nifi-assembly with more
profiles? That way people like Boris could run something like this:

mvn install -Pinclude-grpc,include-graph,!include-kafka,!include-mongodb
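
One practical caveat with a command like that: an interactive bash shell treats an unquoted "!" as history expansion, so the profile list is safer in single quotes. A minimal sketch, assuming the profile names above (they are illustrative here, not a claim about which profiles nifi-assembly actually defines):

```shell
# Profile names are the hypothetical ones from the command above.
# Single quotes keep an interactive bash from expanding "!" as history.
PROFILES='include-grpc,include-graph,!include-kafka,!include-mongodb'

# Print the command for review; drop "echo" to actually run the build.
echo mvn install -P"$PROFILES"
```

A leading "!" on a profile name asks Maven to deactivate that profile, so the list both enables and disables bundles in one invocation.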

On Mon, Jun 29, 2020 at 11:20 AM Boris Tyukin <[email protected]> wrote:

> Hi Mark, thanks for the great comments and for working on these
> improvements. These are great enhancements that we can certainly benefit
> from - I am thinking of at least two projects we support today.
>
> As far as making it more user-friendly, at some point I looked at Kylo.io
> and it was quite an interesting project - not sure if it is alive still -
> but I liked how they created their own UI/tooling around NiFi.
>
> I am going to toy with the idea of having a "dumbed-down" version of NiFi.
>
> On Sun, Jun 28, 2020 at 3:36 PM Mark Payne <[email protected]> wrote:
>
>> Hey Boris,
>>
>> There’s a good bit to unpack here but I’ll try to answer each question.
>>
>> 1) I would say that the target audience for NiFi really is a person with
>> a pretty technical role. Not developers, necessarily, though. We do see a
>> lot of developers using it, as well as data scientists, data engineers, sys
>> admins, etc. So while there may be quite a few tasks that a non-technical
>> person can achieve, it may be hard to expose the platform to someone
>> without a technical background.
>>
>> That said, I do believe that you’re right about the notion of flow
>> dependencies. I’ve done some work recently to help improve this. For
>> example, NIFI-7476 [1] makes it possible to configure a Process Group in
>> such a way that only a single FlowFile at a time is allowed into the group.
>> And the data is optionally held within the group until that FlowFile has
>> completed processing, even if it’s split up into many parts. Additionally,
>> NIFI-7509 [2] updates the List* processors so that they can use an optional
>> Record Writer. This makes it possible to get a full listing of a directory
>> from ListFile as a single FlowFile. Or a listing of all items in an S3
>> bucket or an Azure Blob Store, etc. So when that is combined with
>> NIFI-7476, it makes it very easy to process an entire directory of files or
>> an entire bucket, etc. and wait until all processing is complete before
>> data is transferred on to the next task. (Additionally, NIFI-7552 [3] updates
>> this to add attributes indicating FlowFile counts for each Output Port so
>> it’s easy to determine if there were any “processing failures” etc.).
>>
>> So with all of the above said, I don’t think that it necessarily solves
>> in a simple and generic sense the requirement to complete Task A, then Task
>> B, and then Task C. But it does put us far closer. This may be achievable
>> still with some nesting of Process Groups, etc. but it won’t be quite
>> as straightforward as I’d like and would perhaps add significant latency
>> if it’s allowing only a single FlowFile at a time through the Process Group.
>> Perhaps that can be addressed in the future by having the ability to bulk
>> transfer all FlowFiles from Queue A to Queue B, and then allowing a “Batch
>> Input” on a Process Group instead of just “Streaming” vs. “Single FlowFile
>> at a Time.” I do think there will be some future improvements along these
>> lines, though.
>>
>> 2) This should be fairly straight-forward. It would basically be just
>> creating an assembly like the nifi-assembly module but one that doesn’t
>> include all of the NARs.
>>
>> 3) This probably boils down to some trade-offs and what makes most sense
>> for your organization. A single, large NiFi deployment makes it much easier
>> for the sys admins, generally. The NiFi policies should provide the needed
>> multi-tenancy in terms of authorization. But it doesn’t really offer much
>> in terms of resource isolation. So, if resource isolation is important to
>> you, then using separate NiFi deployments is likely desirable.
>>
>> Hope this helps!
>> -Mark
>>
>>
>> [1] https://issues.apache.org/jira/browse/NIFI-7476
>> [2] https://issues.apache.org/jira/browse/NIFI-7509
>> [3] https://issues.apache.org/jira/browse/NIFI-7552
>>
>>
>>
>> On Jun 28, 2020, at 1:04 PM, Boris Tyukin <[email protected]> wrote:
>>
>> Hi guys,
>>
>> I am thinking of increasing the footprint of NiFi in my org by extending it to
>> less technical roles. I have a few questions:
>>
>> 1) Are there any plans to support easy dependencies at some point? We are
>> aware of all the current options (Wait/Notify, Kafka,
>> MergeRecord/MergeContent, etc.) and all of them are still hard and not
>> reliable. For non-technical roles, we really need a dead-simple way to
>> define classical dependencies like run task C only after tasks A and B are
>> finished. I realize it is a challenge because of the whole concept of NiFi
>> with flowfiles (which we do love being on a technical side of the house),
>> but I really do not want to get another ETL/scheduling tool.
>>
>> 2) Is it fairly easy to build and support our own custom version of
>> NiFi-light, where we remove all the processors that we do not want to expose
>> to non-technical people? The idea is to remove all the processors that
>> consume CPU/RAM, to force them to benefit from our Big Data systems and not
>> use NiFi to do the actual processing. We would like to leave these
>> capabilities to our data engineering team while shifting our analysts to an
>> ELT/ELTL paradigm, to let them run SQL and benefit from Big Data engines.
>>
>> 3) What would be the recommended setup for multiple decentralized teams?
>> Separate NiFi instances, where they can support their own jobs while our
>> admin supports all these instances? Or one large NiFi cluster, where everyone
>> works on the same NiFi cluster? We do not want them to step on each other's
>> jobs, see each other's failure alerts/bulletins, etc. We want to make it look
>> like their team's own environment. Not sure if NiFi policies are mature
>> enough to provide this sort of isolation.
>>
>> Thanks,
>> Boris
>>
>>
>>
