Re: NiFi cluster goes 100% CPU in no time

Mark Payne Mon, 10 Jun 2019 06:47:47 -0700

I don't know that this is actually unexpected. What you observed is that you
had millions of FlowFiles queued up to be processed, yet NiFi was not processing
them at 100% CPU utilization. This typically indicates one of two things: a)
you haven't allocated enough threads, or b) you have a bottleneck other than
CPU, most likely disk I/O.


Once you restarted NiFi, you were in a situation where you had improved your 
disk I/O. If you were previously not at 100% CPU utilization due to a Disk I/O 
bottleneck, and you then removed that bottleneck by improving disk I/O like you 
mentioned, then it makes sense that NiFi would now start consuming more CPU - 
even up to 100% - to handle those millions of FlowFiles that are queued up.
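The reasoning above can be written down as a rough triage rule. This is a minimal Python sketch, not anything NiFi provides: the `NodeSnapshot` fields and the 90% CPU / 20% iowait thresholds are hypothetical, and in practice the numbers would be read off `top`/`iostat` and the NiFi UI.

```python
from dataclasses import dataclass


@dataclass
class NodeSnapshot:
    """Hypothetical point-in-time metrics for one NiFi node."""
    queued_flowfiles: int   # total FlowFiles waiting in connections
    cpu_percent: float      # overall CPU utilisation, 0-100
    iowait_percent: float   # share of CPU time spent waiting on disk, 0-100
    active_threads: int     # threads currently busy in the timer-driven pool
    max_threads: int        # configured maximum timer-driven thread count


def likely_bottleneck(s: NodeSnapshot, busy_cpu: float = 90.0,
                      high_iowait: float = 20.0) -> str:
    """Rough heuristic; the thresholds are illustrative assumptions."""
    if s.queued_flowfiles == 0:
        return "idle"      # nothing backlogged, so low CPU is expected
    if s.cpu_percent >= busy_cpu:
        return "cpu"       # backlog is being worked as fast as the CPU allows
    if s.iowait_percent >= high_iowait:
        return "disk-io"   # threads are mostly waiting on the repositories
    if s.active_threads >= s.max_threads:
        return "threads"   # every thread is busy, yet the CPU has headroom
    return "unknown"
```

With the same multi-million FlowFile backlog, a node held back by slow disks classifies as "disk-io"; once the disks are faster, iowait falls and the identical backlog classifies as "cpu", which is the jump to 100% described here.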



On Jun 10, 2019, at 9:07 AM, Joe Witt <[email protected]> wrote:

Buffering FlowFiles like that is supported by design and is common, so it would be
ideal to figure out what happened.

On Mon, Jun 10, 2019, 9:02 AM Shanker Sneh <[email protected]> wrote:
There were roughly 7 million FlowFiles, with 8 threads (as I have 4 vCPUs in one box).
Max heap allocated is 12 GB, and heap usage was ~60%.

Joe, I think it has something to do with what Wookcock suggested. Clearing up
content and FlowFiles seems to have made the CPU usage manageable.
Give me 1-2 days and I will report back on whether it solves the problem.

On Mon, Jun 10, 2019 at 6:23 PM Joe Witt <[email protected]> wrote:
How many FlowFiles were in the queue? How many threads is NiFi configured to use?
How was the heap?

On Mon, Jun 10, 2019, 8:44 AM Shanker Sneh <[email protected]> wrote:
Thanks, Joe, for reading through this and helping me. :)


  *   NiFi hasn't been upgraded. It's 1.8.0 (the community version of
      Hortonworks DataFlow).
  *   The OS/kernel is the same. I have only added more disk capacity (with
      better I/O).
  *   The JVM is also the same: Java 8.
  *   When CPU is at 100%, top shows just the NiFi Java process. When I provided
      more cores (as many as 16), NiFi used all 16 cores and maxed out at 1600%.

Meanwhile, I am trying to clear up all FlowFiles from disk and start the flows 
afresh.


On Mon, Jun 10, 2019 at 5:42 PM Joe Witt <[email protected]> wrote:
Sneh

It was stable for months but now is high...

Has NiFi been upgraded? What version before vs. now?

Has the OS/kernel been changed?

Has the JVM been updated?

When CPU is at 100%, what does top show?

Thanks

On Mon, Jun 10, 2019, 7:59 AM Shanker Sneh <[email protected]> wrote:
Thanks for the suggestions, Joe.
Actually, the issue persists even after reverting to the
'older-regular-incremental-load' of the data flow, which had worked fine for
months (until a few days ago) on similarly configured hardware while using just
~50% of resources.

These days, one of the two nodes drops out of the NiFi cluster every now and
then as its CPU peaks at 100%, and subsequently the other node reaches 100% CPU
too.
When I restart NiFi on a node, CPU drops to 0 and then spikes to 100% within a
few minutes, even though the data flowing through the pipeline is far too little
to max out my CPU.

The machine config and NiFi config remain untouched, which has left me confused
about where the problem might be. Something that had been running smoothly for
months has now become a challenge.

On Fri, Jun 7, 2019 at 8:16 PM Joe Witt <[email protected]> wrote:
Shanker

It sounds like you've gone through some changes in general and have worked 
through those.  Now you have a flow running with a high volume of data (history 
load) and want to know which parts of the flow are most expensive/consuming the 
CPU.

You should be able to look at the statistics provided on the processors to see
where the majority of CPU time is spent. You can usually reason about this
easily if a processor is doing compression, encryption, etc., and then decide
whether to give it more or fewer threads, batch data together better, etc.
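As a sketch of that reasoning, assuming you have copied each processor's task count and total task time from the five-minute statistics the UI shows, the Python below ranks processors by their share of the total. The `ProcessorStats` type and any processor names you feed it are hypothetical. Task time is time spent running the processor's tasks, so it includes waits on disk or network and is only a proxy for CPU.

```python
from typing import NamedTuple


class ProcessorStats(NamedTuple):
    name: str
    tasks: int        # tasks completed in the 5-minute window
    task_nanos: int   # total task time in that window, in nanoseconds


def rank_by_cpu_time(stats: list[ProcessorStats]) -> list[tuple[str, float, float]]:
    """Return (name, share of total task time, mean ms per task), most expensive first."""
    total = sum(s.task_nanos for s in stats)
    ranked = []
    for s in sorted(stats, key=lambda s: s.task_nanos, reverse=True):
        share = s.task_nanos / total if total else 0.0
        mean_ms = s.task_nanos / s.tasks / 1e6 if s.tasks else 0.0
        ranked.append((s.name, share, mean_ms))
    return ranked
```

A processor with a large share and a high mean time per task (a compression step, say) is the first candidate for more threads or better batching; a large share with a tiny mean time suggests it is simply being scheduled very often.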

The configuration of the VMs, the NiFi instance itself, the flow, and the 
nature of the data are all important to see/understand to be of much help here.

Thanks

On Fri, Jun 7, 2019 at 7:07 AM Shanker Sneh <[email protected]> wrote:
Hello all,

I am facing a strange issue with NiFi 1.8.0 (2 nodes).
My flows had been running fine for months.

Yesterday I had to do a history load, which filled up both of my disks (I have
the FlowFile repository on a separate disk).

I increased the size of both the root and FlowFile disks, then grew the disk
partition and extended the file system (it's an EC2 Linux instance).
But since then my CPU has been spiking to a full 100%, even at regular load
(earlier it used to be somewhere around 50%).
Also, I did not change any config values, thread counts, etc.

To see if that would solve the problem, I upgraded the 2 nodes from 16 GB
(4-core) boxes to 64 GB (16-core) boxes.
But even the larger boxes are maxing out the CPU at 100%.

I tried clearing all repositories and restarting both the NiFi application and
the EC2 instance, but saw no improvement.
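Since the trouble started when the history load filled both disks, it may be worth checking how full each repository's disk is. This is a minimal Python sketch using the standard library's `shutil.disk_usage`; the labels and paths you pass are assumptions and should be the directories configured by `nifi.flowfile.repository.directory` and `nifi.content.repository.directory.default` in nifi.properties. The 80% warning threshold is arbitrary.

```python
import shutil


def repository_usage(paths: dict[str, str],
                     warn_fraction: float = 0.8) -> dict[str, tuple[float, bool]]:
    """Map each repository label to (fraction of its disk used, over-threshold flag)."""
    report = {}
    for label, path in paths.items():
        usage = shutil.disk_usage(path)  # named tuple of bytes: total, used, free
        fraction = usage.used / usage.total if usage.total else 0.0
        report[label] = (fraction, fraction >= warn_fraction)
    return report
```

For example, `repository_usage({"flowfile": "/mnt/flowfile_repo", "content": "/"})` (hypothetical mount points) returns one entry per repository, so a disk that is still nearly full after the resize stands out immediately.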

Kindly point me in the right direction. I am unable to pinpoint anything.

--
Best,
Sneh

