Hi Dweep,

I would like to add to Pierre Villard's insightful answer.

2) NiFi has at least three filesystem-backed repositories, so the same record is written and read several times at different stages of a single pipeline. This demands high IOPS. Scaling IOPS vertically is very costly and sometimes hits a hard limit; a cluster handles this better by load-balancing flowfiles across nodes.
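As a rough back-of-the-envelope sketch of the point above: the function names and all numbers below are made-up assumptions for illustration, not measurements, and the model ignores batching, OS caching, and uneven distribution of flowfiles.

```python
import math


def iops_per_node(events_per_sec: float, io_ops_per_event: float, nodes: int) -> float:
    """Disk operations per second each node must sustain,
    assuming flowfiles are spread evenly across the cluster."""
    if nodes < 1:
        raise ValueError("nodes must be >= 1")
    return events_per_sec * io_ops_per_event / nodes


def nodes_needed(events_per_sec: float, io_ops_per_event: float,
                 iops_budget_per_node: float) -> int:
    """Smallest node count that keeps every node within its IOPS budget."""
    if iops_budget_per_node <= 0:
        raise ValueError("iops_budget_per_node must be positive")
    return math.ceil(events_per_sec * io_ops_per_event / iops_budget_per_node)


if __name__ == "__main__":
    # Hypothetical workload: 10,000 events/s, each touching the
    # repositories 6 times (reads plus writes) on its way through the flow.
    print(iops_per_node(10_000, 6, nodes=1))   # load on a single machine
    print(iops_per_node(10_000, 6, nodes=3))   # load per node in a 3-node cluster
    print(nodes_needed(10_000, 6, iops_budget_per_node=25_000))
```

The idea is only that the per-node IOPS requirement drops roughly in proportion to the number of nodes, whereas a single machine has to absorb the whole load on its own disks.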
Regards,
Purushotham Pushpavanth

On Mon, 5 Aug 2019 at 15:37, Pierre Villard <[email protected]> wrote:

> Hi Dweep,
>
> I'll let others chime in, but here are some answers to your questions:
>
> 1) Yes - NiFi supports a very fine-grained authorization model and
> several authentication mechanisms.
> Authentication:
> https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#user_authentication
> Authorization:
> https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#multi-tenant-authorization
>
> You can also find resources on the Internet on how to set up authentication
> & authorization.
>
> 2) I'd say that it is up to your requirements and whether you need high
> availability. From a pure performance standpoint, vertical scaling is
> probably enough for your use case unless you have very large amounts of
> data. Clustering will help you achieve even better performance (millions of
> events per second), and will improve reliability in case of failure.
>
> 3) Yes, the data is persisted. There are some parameters that you can tune
> based on your tolerance for data loss.
> Example: nifi.flowfile.repository.always.sync - If set to true, any change
> to the repository will be synchronized to the disk, meaning that NiFi will
> ask the operating system not to cache the information. This is very
> expensive and can significantly reduce NiFi performance. However, if it is
> false, there could be the potential for data loss if either there is a
> sudden power loss or the operating system crashes. The default value is
> false.
>
> In other words, unless you have serious hardware/OS failures, you should
> not lose any data, and everything is persisted and restored when NiFi
> restarts. If avoiding data loss is critical for your system, using a
> broker like Kafka with the ability to replay events could be a possible
> solution.
>
> 4) I recommend this awesome post by Bryan:
> https://bryanbende.com/development/2016/09/15/apache-nifi-and-apache-kafka
>
> 5) There are some options available for the metrics. You can have a look
> at reporting tasks for this purpose. A set of articles you can read is
> available here:
> https://pierrevillard.com/2017/05/11/monitoring-nifi-introduction/
>
> Hope this helps!
> Pierre
>
> On Mon, 5 Aug 2019 at 07:11, Dweep Sharma <[email protected]> wrote:
>
>> Hi All,
>>
>> I have been using NiFi to set up some pipelines. Before I can absorb
>> more use cases into this, I need to understand a few capabilities:
>>
>> 1) Can we set up user authentication in front of the web application? If
>> yes, is there a way to have role-based access for processor groups? I
>> would like certain teams to work only on specific groups and not control
>> all of them.
>>
>> 2) If the major use case only involves reading from RMQ and Kafka,
>> converting to Parquet, and storing in S3, does it make sense to set up a
>> cluster, or is vertical scaling good enough?
>>
>> 3) Are the flow files in the queues (connections between processors)
>> persisted? Would a machine failure or restart cause a loss of data? For
>> instance, messages are dequeued from RMQ and then lost due to a failure.
>> What would be the best way to handle this? I think maintaining a low back
>> pressure threshold can help mitigate the loss.
>>
>> 4) Does the Kafka consumer consume all partitions by default, or is there
>> a way to control that?
>>
>> 5) Can we have some of the processor metrics pushed out as
>> notifications or alerts (flow file count in/out, errors, etc.)?
>>
>> It would be great if someone could share resources that address these.
>>
>> Thanks in advance.
>>
>> -Dweep
