Hey Noel,

As Mike mentioned, performance tuning in NiFi is largely dependent on the type of data you are processing and the types of Processors that make up your flow. In general, there are a few points I can offer that should help ensure you're operating at maximum efficiency:

- Most dataflows are IO-bound rather than CPU-bound, which means the disks are often the bottleneck. Using SSDs or multiple hard disks can make a massive performance improvement. NiFi does a good job of taking advantage of large amounts of hardware and scales roughly linearly. For example, if you can process 1,000 FlowFiles per second and your bottleneck is your disk, adding a second disk may allow you to process 2,000 FlowFiles per second. Adding a 3rd and 4th disk may get you up to 4,000 FlowFiles per second, etc. You can use tools like 'top' to determine how busy your CPU is vs. how many cores you have, and tools like 'iostat' to determine how heavily your disks are utilized.

- NiFi performs better with a smaller number of large FlowFiles than with lots of tiny FlowFiles. You said that your file sizes range from 1 MB to 60 MB, so that's good. However, if you are using any sort of Split processor, such as SplitText or SplitJson, that can certainly degrade performance. Instead, try to use the new Record-oriented processors in your flow, as they are able to operate on many records within a single FlowFile without the need to split the records up.

- In the Scheduling tab of many of the processors, you can configure the "Run Duration" using the slider on the right. Most of the time, setting this value to "25 ms" will yield the best results, because the processor can then handle more FlowFiles per session commit, trading a little latency for throughput. For processors that do little computation, such as UpdateAttribute, this can make a really big difference.

- I do see that you configured NiFi to use the VolatileProvenanceRepository. Are your FlowFile Repository and Content Repository on the same disk? If possible, I'd recommend that you use 1 disk for the FlowFile Repository; 1 for the application, logs, etc.; and 1 or more for the Content Repository. Separating these repositories onto separate disks can make a big difference.
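
To put a number on "how heavily your disks are utilized" (for example, to check whether the disk holding the Content Repository is saturated), here is a minimal Python sketch, assuming a Linux host. It approximates iostat's %util figure from /proc/diskstats. It samples the per-device "milliseconds spent doing I/O" counter twice and divides the difference by the elapsed time. The function names are my own, and this is only an illustration, not part of NiFi.

```python
import time


def read_io_ticks(path="/proc/diskstats"):
    """Return {device: total milliseconds spent doing I/O} from diskstats."""
    ticks = {}
    with open(path) as f:
        for line in f:
            fields = line.split()
            # fields[0:3] are major, minor and device name; fields[12] is the
            # 10th statistic: milliseconds the device had I/O in progress.
            if len(fields) >= 13:
                ticks[fields[2]] = int(fields[12])
    return ticks


def utilization(before_ms, after_ms, interval_ms):
    """Percent of the interval during which the device was busy."""
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    busy = 100.0 * (after_ms - before_ms) / interval_ms
    # Clamp to 0..100 to absorb timing jitter between the two reads.
    return max(0.0, min(100.0, busy))


def sample(interval_s=1.0):
    """Sample every device twice, interval_s apart, and return % busy."""
    before = read_io_ticks()
    start = time.monotonic()
    time.sleep(interval_s)
    after = read_io_ticks()
    elapsed_ms = (time.monotonic() - start) * 1000.0
    return {dev: utilization(before[dev], after[dev], elapsed_ms)
            for dev in after if dev in before}
```

If `sample()` run on the NiFi host during a busy period shows a repository disk pinned near 100% while 'top' shows idle cores, you are likely in the IO-bound case. That is where adding or separating disks should help.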

You can also look at the Summary table (click the menu in the top-right corner and go to Summary). Then go to the Connections tab and sort by the "Queue / Size" column. This will help you see where in your flow you have a lot of data queued up, which can help narrow down the bottlenecks. If you see one or more processors whose incoming connections always have a lot of data, then you may need to give that processor more Concurrent Tasks in its Configure dialog (Scheduling tab).
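
If you take a few snapshots of the Connections tab over time, the "always has a lot of data" check can be sketched as below. This is a hypothetical helper. The processor names, queue counts and threshold are made up for illustration, and the numbers would be copied by hand from the Summary table rather than pulled from any NiFi API.

```python
from typing import Dict, List

# One snapshot: processor name -> FlowFiles queued on its incoming
# connections, as read off the Summary table (hypothetical values below).
Snapshot = Dict[str, int]


def persistent_backlogs(snapshots: List[Snapshot], threshold: int) -> List[str]:
    """Return processors whose incoming queue exceeded `threshold` in every
    snapshot, ordered by average backlog, largest first."""
    if not snapshots:
        return []
    # Only consider processors that appear in every snapshot.
    common = set(snapshots[0]).intersection(*snapshots[1:])
    flagged = [p for p in common if all(s[p] > threshold for s in snapshots)]
    return sorted(
        flagged,
        key=lambda p: sum(s[p] for s in snapshots) / len(snapshots),
        reverse=True,
    )


snapshots = [
    {"ConvertRecord": 30000, "UpdateAttribute": 12000, "PutFile": 50},
    {"ConvertRecord": 28000, "UpdateAttribute": 15000, "PutFile": 9000},
    {"ConvertRecord": 31000, "UpdateAttribute": 11000, "PutFile": 20},
]
print(persistent_backlogs(snapshots, threshold=10000))
# → ['ConvertRecord', 'UpdateAttribute']
```

A processor that only spikes occasionally (like PutFile in the made-up data) is probably not the one to give more concurrent tasks. A consistently backed-up one is the better candidate.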

If you are able to share more information about your flow, we can offer further guidance.

Thanks
-Mark



On Aug 18, 2017, at 8:45 AM, Mike Thomsen <[email protected]> wrote:

It's very hard to give you any advice without any knowledge of what your data looks like and what processors you're using.

On Thu, Aug 17, 2017 at 7:05 PM, Noel Alex Makumuli <[email protected]> wrote:
Hello guys,

Ever since I tried Apache NiFi, I have been amazed by its powerful features and performance.
Every day I try to understand and learn more about NiFi, and I improve my knowledge by reading articles and following the forums.
NiFi has simplified a lot of my work as an ordinary user and developer (I have yet to explore the advanced use cases, but I will soon).

NiFi is doing a fantastic job, and I am wondering if there is anything I can do to improve its performance from my current settings.

I have thousands and thousands of files (more than 20 million) that need ETL, and I would appreciate your two cents on how I can improve the performance, meaning transform more files per day/hour. The file sizes range from 1 MB to 60 MB.

My setup is very simple: everything resides in the same location [location].
However, I need to know if there is anything I can do to improve the performance. I am sure there are one or two tweaks I can make to achieve the best performance. Please kindly share some valuable tips.

<code>
openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-0ubuntu1.16.04.2-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)

Nifi 1.3

Server RAM 128 GB
</code>
Open file limits (/etc/security/limits.conf):
<code>
*         hard    nofile      500000
*         soft    nofile      500000
root      hard    nofile      500000
root      soft    nofile      500000

# PAM line; on Ubuntu this normally goes in /etc/pam.d/common-session, not limits.conf
session required pam_limits.so
</code>
Is there anything in particular I am doing wrong?

Please find the attached files in this email.
--
NOEL ALEX MAKUMULI

