Hey Noel,

As Mike mentioned, performance tuning in NiFi is largely dependent on the type of data you are processing and the types of Processors that make up your flow. In general, there are a few points that I can offer that will help to ensure that you're operating at max efficiency:
- Most dataflows are IO-bound rather than CPU-bound, which means that the disks are often the bottleneck. Using SSDs or multiple hard disks can make a massive performance improvement. NiFi does a good job of taking advantage of large amounts of hardware to scale linearly. I.e., if you can process 1,000 FlowFiles per second and your bottleneck is your disk, adding a second disk may allow you to process 2,000 FlowFiles per second. Adding a 3rd and 4th disk may get you up to 4,000 FlowFiles per second, etc. You can use tools like 'top' to determine how busy your CPU is vs. how many cores you have, and tools like 'iostat' to determine how heavily your disks are utilized.
- NiFi performs better with a smaller number of large FlowFiles than with lots of tiny FlowFiles. You said that your file sizes range from 1 MB to 60 MB, so that's good. However, if you are using any sort of Split processors like SplitText, SplitJson, etc., then that can certainly degrade performance. Instead, try to use the new Record-oriented processors in your flow, as they are able to operate on many records within a single FlowFile without the need to split the records up.
- In the Scheduling tab of many of the processors, you can configure the "Run Duration" using the slider on the right. Most of the time, setting this value to "25 ms" will yield the best results. For processors that do little computation, such as UpdateAttribute, this can make a really big difference.
- I do see that you configured NiFi to use the VolatileProvenanceRepository. Are your FlowFile Repository and Content Repository on the same disk? If possible, I'd recommend that you use 1 disk for the FlowFile Repository; 1 for the app, logs, etc.; and 1 or more for the Content Repository. Separating these repositories onto separate disks can make a big difference.

You can also look at the Summary table (click the menu in the top-right corner and go to Summary). Then go to the Connections tab and sort by the "Queue / Size" column.
This will help you to see where in your flow you have a lot of data queued up, which can help narrow down the bottlenecks. If you see one or more processors whose incoming connections always have a lot of data, then you may need to give that processor more concurrent tasks in the Scheduling tab of its Configure dialog.

If you are able to share more information about your flow, that will help us to offer further guidance.

Thanks
-Mark

On Aug 18, 2017, at 8:45 AM, Mike Thomsen <[email protected]> wrote:

It's very hard to give you any advice without any knowledge of what your data looks like and what processors you're using.

On Thu, Aug 17, 2017 at 7:05 PM, Noel Alex Makumuli <[email protected]> wrote:

Hello guys,

Ever since I tried Apache NiFi, I have been amazed by its powerful features and performance. Every day I want to understand and learn more about NiFi, and I try to improve my knowledge by reading articles and following the forums. NiFi has simplified a lot of work for me as an ordinary user and developer [I have yet to explore the advanced use cases, and I will soon].

NiFi is doing a fantastic job, and I am wondering if there is anything I can do to improve its performance beyond my current settings. I have thousands and thousands of files [more than 20 million] which need ETL, and I would appreciate your two cents on how I can improve the performance [meaning transform more files per day/hour]. The file sizes range from 1 MB to 60 MB.

My setup is very simple. Everything resides in the same location [location]. However, I need to know if there is anything I can do to improve the performance. I am sure there are one or two tweaks I can make to achieve the best performance. Please kindly share some valuable tips with me.
<code>
openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-0ubuntu1.16.04.2-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)
Nifi 1.3
Server RAM 128 GB
</code>

File open limits (/etc/security/limits.conf):

<code>
* hard nofile 500000
* soft nofile 500000
root hard nofile 500000
root soft nofile 500000
session required pam_limits.so
</code>

Is there anything in particular I am doing wrong? Please find the attached files in this mail.

--
NOEL ALEX MAKUMULI
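
As a footnote to Mark's point about putting the FlowFile Repository and Content Repository on separate disks, the layout might look roughly like the nifi.properties fragment below. This is only a sketch: the property names are the usual repository settings found in nifi.properties, but the mount points (/disk1, /disk2, /disk3) and the labels content1 and content2 are hypothetical and would need to match the actual hardware.

```properties
# nifi.properties (fragment); all mount points below are hypothetical

# FlowFile Repository on its own disk
nifi.flowfile.repository.directory=/disk1/flowfile_repository

# Content Repository spread over one or more dedicated disks.
# The suffix after "directory." is an arbitrary label; these entries are
# assumed to take the place of the single ".default" entry.
nifi.content.repository.directory.content1=/disk2/content_repository
nifi.content.repository.directory.content2=/disk3/content_repository

# Provenance kept in memory, as in the setup described in the thread
nifi.provenance.repository.implementation=org.apache.nifi.provenance.VolatileProvenanceRepository
```

Checking disk utilization with 'iostat' before and after such a change, as suggested above, should show whether the disks were in fact the bottleneck.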
