Hey Boris,

There’s a good bit to unpack here, but I’ll try to answer each question.

1) I would say that the target audience for NiFi really is a person with a 
pretty technical role. Not developers, necessarily, though. We do see a lot of 
developers using it, as well as data scientists, data engineers, sys admins, 
etc. So while there may be quite a few tasks that a non-technical person can 
achieve, it may be hard to expose the platform to someone without a technical 
background.

That said, I do believe that you’re right about the notion of flow 
dependencies. I’ve done some work recently to help improve this. For example, 
NIFI-7476 [1] makes it possible to configure a Process Group in such a way that 
only a single FlowFile at a time is allowed into the group. And the data is 
optionally held within the group until that FlowFile has completed processing, 
even if it’s split up into many parts. Additionally, NIFI-7509 [2] updates the 
List* processors so that they can use an optional Record Writer. This makes it 
possible to get a full listing of a directory from ListFile as a single 
FlowFile. Or a listing of all items in an S3 bucket or an Azure Blob Store, 
etc. So when that is combined with NIFI-7476, it makes it very easy to process 
an entire directory of files or an entire bucket, etc. and wait until all 
processing is complete before data is transferred on to the next task. 
(Additionally, NIFI-7552 [3] updates this to add attributes indicating FlowFile 
counts for each Output Port, so it’s easy to determine whether there were any 
“processing failures,” etc.)

So with all of the above said, I don’t think this solves, in a simple and 
generic sense, the requirement to complete Task A, then Task B, and then Task 
C. But it does put us far closer. This may still be achievable with some 
nesting of Process Groups, etc., but it won’t be quite as straightforward as 
I’d like, and it would perhaps add significant latency if only a single 
FlowFile at a time is allowed through the Process Group. Perhaps that can be 
addressed in the future by having the ability to bulk-transfer all FlowFiles 
from Queue A to Queue B, and then allowing a "Batch Input" on a Process Group 
instead of just "Streaming" vs. "Single FlowFile at a Time." I do think there 
will be some future improvements along these lines, though.

2) This should be fairly straightforward. It would basically be just a matter 
of creating an assembly like the nifi-assembly module, but one that doesn’t 
include all of the NARs.
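As a rough sketch of what such a "NiFi-light" assembly module might look like: the pom below follows the general shape of nifi-assembly (NAR dependencies declared with `<type>nar</type>`), but the artifact IDs, version, and the choice of which NARs to include are examples only, not a vetted list, and the assembly-plugin configuration is elided.

```xml
<!-- Hypothetical "nifi-light" assembly pom. Same idea as nifi-assembly,
     but depending only on the NARs you actually want to ship.
     Version and artifact IDs below are illustrative examples. -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <groupId>org.apache.nifi</groupId>
    <artifactId>nifi</artifactId>
    <version>1.12.0</version>
  </parent>
  <artifactId>nifi-light-assembly</artifactId>
  <packaging>pom</packaging>
  <dependencies>
    <!-- Framework NARs are required for NiFi to run at all. -->
    <dependency>
      <groupId>org.apache.nifi</groupId>
      <artifactId>nifi-framework-nar</artifactId>
      <type>nar</type>
    </dependency>
    <!-- Include processor NARs selectively; omit the heavyweight ones
         you don't want non-technical users to have. -->
    <dependency>
      <groupId>org.apache.nifi</groupId>
      <artifactId>nifi-standard-nar</artifactId>
      <type>nar</type>
    </dependency>
  </dependencies>
  <!-- maven-assembly-plugin configuration (copied/adapted from
       nifi-assembly) would go here. -->
</project>
```

The trimmed distribution is then just whatever NARs this pom pulls in, so governing the processor surface area becomes a dependency-list decision.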

3) This probably boils down to trade-offs and what makes the most sense for 
your organization. A single, large NiFi deployment is generally much easier 
for the sys admins. The NiFi policies should provide the needed multi-tenancy 
in terms of authorization, but they don’t really offer much in terms of 
resource isolation. So, if resource isolation is important to you, then using 
separate NiFi deployments is likely desirable.

Hope this helps!
-Mark


[1] https://issues.apache.org/jira/browse/NIFI-7476
[2] https://issues.apache.org/jira/browse/NIFI-7509
[3] https://issues.apache.org/jira/browse/NIFI-7552



On Jun 28, 2020, at 1:04 PM, Boris Tyukin <[email protected]> wrote:

Hi guys,

I am thinking to increase the footprint of NiFi in my org to extend it to less 
technical roles. I have a few questions:

1) are there any plans to support easy dependencies at some point? We are 
aware of all the current options (wait-notify, Kafka, MergeRecord/MergeContent, 
etc.) and all of them are still hard and not reliable. For non-technical 
roles, we really need a stupid-simple way to define classical dependencies, 
like run task C only after tasks A and B are finished. I realize it is a 
challenge because of NiFi’s whole FlowFile concept (which we do love, being on 
the technical side of the house), but I really do not want to get another 
ETL/scheduling tool.

2) is it fairly easy to build and support our own custom version of 
NiFi-light, where we remove all the processors that we do not want to expose 
to non-technical people? The idea is to remove all the processors that consume 
CPU/RAM, to force them to benefit from our Big Data systems rather than using 
NiFi to do the actual processing. We would like to leave these capabilities to 
our data engineering team while shifting our analysts to an ELT/ELTL paradigm, 
letting them run SQL and benefit from Big Data engines.

3) what would be the recommended setup for multiple decentralized teams? 
Separate NiFi instances, where each team supports its own jobs while our admin 
supports all the instances? Or one large NiFi cluster where everyone works 
together? We do not want them to step on each other’s jobs, see each other’s 
failure alerts/bulletins, etc. We want to make it look like their team’s own 
environment. Not sure if NiFi policies are mature enough to provide this sort 
of isolation.

Thanks,
Boris
