Re: Finding slow down in processing

2024-01-10 Thread Aaron Rich
Hi Joe,

Nothing is load balanced - it's all basic queues.

Mark,
I'm using NiFi 1.19.1.

nifi.performance.tracking.percentage sounds like exactly what I might need.
I'll give that a shot.

Richard,
I hadn't looked at the rotating logs and/or cleared them out. I'll give
that a shot too.

Thank you all. Please keep the suggestions coming.

-Aaron

On Wed, Jan 10, 2024 at 1:34 PM Richard Beare 
wrote:

> I had a similar sounding issue, although not in a Kube cluster. NiFi was
> running in a docker container and the issue was the log rotation
> interacting with the log file being mounted from the host. The mounted log
> file was not deleted on rotation, meaning that once rotation was triggered
> by log file size it would be continually triggered because the new log file
> was never emptied. The clue was that the content of rotated logfiles was
> mostly the same, with only a small number of messages appended to each new
> one. Rotating multi-GB logs was enough to destroy performance, especially
> if it was being triggered frequently by debug messages.
>
> On Thu, Jan 11, 2024 at 7:14 AM Aaron Rich  wrote:
>
>> Hi Joe,
>>
>> They're pretty fixed-size objects at a fixed interval - one 5 MB-ish file,
>> which we break down into individual rows.
>>
>> I went so far as to create a "stress test" where I have a GenerateFlowFile
>> (creating a fixed 100k file, in batches of 1000, every 0.1s) feeding right
>> into a PutFile. I wanted to see the sustained max. It was very stable and
>> fast for over a week of running - but now it's extremely slow. That was
>> about as simple a data flow as I could think of to hit all the different
>> resources (CPU, memory, etc.).
>>
>> I was thinking it might be memory too, but it's slow right at the start
>> when NiFi comes up. I would expect a memory problem to make it slower over
>> time, and the stress test showed it wasn't something that was fluctuating
>> over time.
>>
>> I'm happy to build any other flows that anyone can suggest to help
>> troubleshoot and diagnose the issue.
>>
>> Lars,
>>
>> We haven't changed it between when performance was good and now when it's
>> slow. That is what is throwing me - nothing changed from a NiFi
>> configuration standpoint.
>> My guess is we are having some throttling/resource contention from our
>> provider but I can't determine what/where/how. The Grafana cluster
>> dashboards I have don't indicate issues. If there are suggestions for
>> specific cluster metrics to plot/dashboards to use, I'm happy to build them
>> and contribute them back (I do have a dashboard I need to figure out how to
>> share for creating the "status history" plots in Grafana).
>> The repos aren't full and I tried even blowing them away just to see if
>> that made a difference.
>> I'm not seeing anything new in the logs that indicates an issue... but
>> maybe I'm missing it, so I will look again.
>>
>> By chance, are there any low-level debugging metrics/observability/etc.
>> that would show how long things like writing to the repository disks are
>> taking? There is a part of me that feels this could be a disk I/O resource
>> issue, but I don't know how I can verify that is/isn't the issue.
>>
>> Thank you all for the help and suggestions - please keep them coming as
>> I'm grasping at straws right now.
>>
>> -Aaron
>>
>>
>> On Wed, Jan 10, 2024 at 10:10 AM Joe Witt  wrote:
>>
>>> Aaron,
>>>
>>> The usual suspects are memory consumption leading to high GC leading to
>>> lower performance over time, or back pressure in the flow, etc.. But your
>>> description does not really fit either exactly.  Does your flow see a mix
>>> of large objects and smaller objects?
>>>
>>> Thanks
>>>
>>> On Wed, Jan 10, 2024 at 10:07 AM Aaron Rich 
>>> wrote:
>>>
 Hi all,



 I’m running into an odd issue and hoping someone can point me in the
 right direction.



 I have NiFi 1.19 deployed in a Kube cluster with all the repositories
 volume mounted out. It was processing great, with processors like
 UpdateAttribute sending through 15K/5m and PutFile sending through 3K/5m.



 With nothing changing in the deployment, the performance has dropped to
 UpdateAttribute doing 350/5m and PutFile to 200/5m.



 I’m trying to determine what resource is suddenly dropping our
 performance like this. I don’t see anything on the Kube monitoring that
 stands out and I have restarted, cleaned repos, changed nodes but nothing
 is helping.



 I was hoping there is something from the NiFi POV that can help
 identify the limiting resource. I'm not sure if there is additional
 diagnostic/debug/etc information available beyond the node status graphs.



 Any help would be greatly appreciated.



 Thanks.



 -Aaron

>>>


Re: Finding slow down in processing

2024-01-10 Thread Richard Beare
I had a similar sounding issue, although not in a Kube cluster. NiFi was
running in a docker container and the issue was the log rotation
interacting with the log file being mounted from the host. The mounted log
file was not deleted on rotation, meaning that once rotation was triggered
by log file size it would be continually triggered because the new log file
was never emptied. The clue was that the content of rotated logfiles was
mostly the same, with only a small number of messages appended to each new
one. Rotating multi-GB logs was enough to destroy performance, especially
if it was being triggered frequently by debug messages.
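
A quick way to check for that symptom (the path assumes the default layout of
the official apache/nifi image - adjust for your mount):

    # inside the NiFi container - if rotation is stuck as described, the
    # rotated nifi-app*.log files will all be roughly the same large size
    ls -lh /opt/nifi/nifi-current/logs/
    du -sh /opt/nifi/nifi-current/logs/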

On Thu, Jan 11, 2024 at 7:14 AM Aaron Rich  wrote:

> Hi Joe,
>
> They're pretty fixed-size objects at a fixed interval - one 5 MB-ish file,
> which we break down into individual rows.
>
> I went so far as to create a "stress test" where I have a GenerateFlowFile
> (creating a fixed 100k file, in batches of 1000, every 0.1s) feeding right
> into a PutFile. I wanted to see the sustained max. It was very stable and
> fast for over a week of running - but now it's extremely slow. That was
> about as simple a data flow as I could think of to hit all the different
> resources (CPU, memory, etc.).
>
> I was thinking it might be memory too, but it's slow right at the start
> when NiFi comes up. I would expect a memory problem to make it slower over
> time, and the stress test showed it wasn't something that was fluctuating
> over time.
>
> I'm happy to build any other flows that anyone can suggest to help
> troubleshoot and diagnose the issue.
>
> Lars,
>
> We haven't changed it between when performance was good and now when it's
> slow. That is what is throwing me - nothing changed from a NiFi
> configuration standpoint.
> My guess is we are having some throttling/resource contention from our
> provider but I can't determine what/where/how. The Grafana cluster
> dashboards I have don't indicate issues. If there are suggestions for
> specific cluster metrics to plot/dashboards to use, I'm happy to build them
> and contribute them back (I do have a dashboard I need to figure out how to
> share for creating the "status history" plots in Grafana).
> The repos aren't full and I tried even blowing them away just to see if
> that made a difference.
> I'm not seeing anything new in the logs that indicates an issue... but
> maybe I'm missing it, so I will look again.
>
> By chance, are there any low-level debugging metrics/observability/etc.
> that would show how long things like writing to the repository disks are
> taking? There is a part of me that feels this could be a disk I/O resource
> issue, but I don't know how I can verify that is/isn't the issue.
>
> Thank you all for the help and suggestions - please keep them coming as
> I'm grasping at straws right now.
>
> -Aaron
>
>
> On Wed, Jan 10, 2024 at 10:10 AM Joe Witt  wrote:
>
>> Aaron,
>>
>> The usual suspects are memory consumption leading to high GC leading to
>> lower performance over time, or back pressure in the flow, etc.. But your
>> description does not really fit either exactly.  Does your flow see a mix
>> of large objects and smaller objects?
>>
>> Thanks
>>
>> On Wed, Jan 10, 2024 at 10:07 AM Aaron Rich  wrote:
>>
>>> Hi all,
>>>
>>>
>>>
>>> I’m running into an odd issue and hoping someone can point me in the
>>> right direction.
>>>
>>>
>>>
>>> I have NiFi 1.19 deployed in a Kube cluster with all the repositories
>>> volume mounted out. It was processing great, with processors like
>>> UpdateAttribute sending through 15K/5m and PutFile sending through 3K/5m.
>>>
>>>
>>>
>>> With nothing changing in the deployment, the performance has dropped to
>>> UpdateAttribute doing 350/5m and PutFile to 200/5m.
>>>
>>>
>>>
>>> I’m trying to determine what resource is suddenly dropping our
>>> performance like this. I don’t see anything on the Kube monitoring that
>>> stands out and I have restarted, cleaned repos, changed nodes but nothing
>>> is helping.
>>>
>>>
>>>
>>> I was hoping there is something from the NiFi POV that can help identify
>>> the limiting resource. I'm not sure if there is additional
>>> diagnostic/debug/etc information available beyond the node status graphs.
>>>
>>>
>>>
>>> Any help would be greatly appreciated.
>>>
>>>
>>>
>>> Thanks.
>>>
>>>
>>>
>>> -Aaron
>>>
>>


Re: Finding slow down in processing

2024-01-10 Thread Mark Payne
Aaron,

What version of NiFi are you running? One thing that you can look into if
you’re running a pretty recent version (though the user friendliness is not
great) is to update nifi.properties and set the
“nifi.performance.tracking.percentage” property from 0 to something like 5 or
10. Restart NiFi and let it run for a while.

Then you can run “bin/nifi.sh diagnostics diagnostics1.txt”
That diagnostics1.txt will give a rather detailed breakdown of where you’re
spending your time. Each processor will show how much of your CPU it’s using,
as well as how much CPU time it’s using. It’ll also show how much time was
spent committing transactions, reading from disk, writing to disk, and how much
time that processor was paused for garbage collection. Lots of really detailed
metrics in there. That might help you to have an “aha” moment as to what
exactly the resource is that’s causing poor performance.

Though I will warn you that setting the value above 0, in and of itself, might 
make the system slower if you have a huge graph with many processors. But it 
should definitely help you to narrow down what the resource constraint is. You 
can then turn it back off if necessary.
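
Concretely, that looks something like this (the 10 is just an example value):

    # nifi.properties
    nifi.performance.tracking.percentage=10

    # restart NiFi, let it run for a while, then:
    bin/nifi.sh diagnostics diagnostics1.txt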

Thanks
-Mark


On Jan 10, 2024, at 3:13 PM, Aaron Rich  wrote:

Hi Joe,

They're pretty fixed-size objects at a fixed interval - one 5 MB-ish file,
which we break down into individual rows.

I went so far as to create a "stress test" where I have a GenerateFlowFile
(creating a fixed 100k file, in batches of 1000, every 0.1s) feeding right into
a PutFile. I wanted to see the sustained max. It was very stable and fast for
over a week of running - but now it's extremely slow. That was about as simple
a data flow as I could think of to hit all the different resources (CPU,
memory, etc.).

I was thinking it might be memory too, but it's slow right at the start when
NiFi comes up. I would expect a memory problem to make it slower over time, and
the stress test showed it wasn't something that was fluctuating over time.

I'm happy to build any other flows that anyone can suggest to help troubleshoot
and diagnose the issue.

Lars,

We haven't changed it between when performance was good and now when it's slow.
That is what is throwing me - nothing changed from a NiFi configuration
standpoint.
My guess is we are having some throttling/resource contention from our provider 
but I can't determine what/where/how. The Grafana cluster dashboards I have 
don't indicate issues. If there are suggestions for specific cluster metrics to 
plot/dashboards to use, I'm happy to build them and contribute them back (I do 
have a dashboard I need to figure out how to share for creating the "status 
history" plots in Grafana).
The repos aren't full and I tried even blowing them away just to see if that 
made a difference.
I'm not seeing anything new in the logs that indicates an issue... but maybe
I'm missing it, so I will look again.

By chance, are there any low-level debugging metrics/observability/etc. that
would show how long things like writing to the repository disks are taking?
There is a part of me that feels this could be a disk I/O resource issue, but I
don't know how I can verify that is/isn't the issue.

Thank you all for the help and suggestions - please keep them coming as I'm 
grasping at straws right now.

-Aaron


On Wed, Jan 10, 2024 at 10:10 AM Joe Witt <joe.w...@gmail.com> wrote:
Aaron,

The usual suspects are memory consumption leading to high GC leading to lower 
performance over time, or back pressure in the flow, etc.. But your description 
does not really fit either exactly.  Does your flow see a mix of large objects 
and smaller objects?

Thanks

On Wed, Jan 10, 2024 at 10:07 AM Aaron Rich <aaron.r...@gmail.com> wrote:
Hi all,

I’m running into an odd issue and hoping someone can point me in the right 
direction.

I have NiFi 1.19 deployed in a Kube cluster with all the repositories volume 
mounted out. It was processing great, with processors like UpdateAttribute
sending through 15K/5m and PutFile sending through 3K/5m.

With nothing changing in the deployment, the performance has dropped to 
UpdateAttribute doing 350/5m and PutFile to 200/5m.

I’m trying to determine what resource is suddenly dropping our performance like 
this. I don’t see anything on the Kube monitoring that stands out and I have 
restarted, cleaned repos, changed nodes but nothing is helping.

I was hoping there is something from the NiFi POV that can help identify the 
limiting resource. I'm not sure if there is additional diagnostic/debug/etc 
information available beyond the node status graphs.

Any help would be greatly appreciated.

Thanks.

-Aaron



Re: Finding slow down in processing

2024-01-10 Thread Aaron Rich
Hi Joe,

They're pretty fixed-size objects at a fixed interval - one 5 MB-ish file,
which we break down into individual rows.

I went so far as to create a "stress test" where I have a GenerateFlowFile
(creating a fixed 100k file, in batches of 1000, every 0.1s) feeding right
into a PutFile. I wanted to see the sustained max. It was very stable and fast
for over a week of running - but now it's extremely slow. That was about as
simple a data flow as I could think of to hit all the different resources
(CPU, memory, etc.).

I was thinking it might be memory too, but it's slow right at the start when
NiFi comes up. I would expect a memory problem to make it slower over time,
and the stress test showed it wasn't something that was fluctuating over time.

I'm happy to build any other flows that anyone can suggest to help
troubleshoot and diagnose the issue.

Lars,

We haven't changed it between when performance was good and now when it's
slow. That is what is throwing me - nothing changed from a NiFi configuration
standpoint.
My guess is we are having some throttling/resource contention from our
provider but I can't determine what/where/how. The Grafana cluster
dashboards I have don't indicate issues. If there are suggestions for
specific cluster metrics to plot/dashboards to use, I'm happy to build them
and contribute them back (I do have a dashboard I need to figure out how to
share for creating the "status history" plots in Grafana).
The repos aren't full and I tried even blowing them away just to see if
that made a difference.
I'm not seeing anything new in the logs that indicates an issue... but maybe
I'm missing it, so I will look again.

By chance, are there any low-level debugging metrics/observability/etc. that
would show how long things like writing to the repository disks are taking?
There is a part of me that feels this could be a disk I/O resource issue, but
I don't know how I can verify that is/isn't the issue.
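
One crude way to put a rough number on raw write speed to the repository
mounts (assuming Linux, GNU dd, and room for a 1 GB scratch file - the path
below is just a placeholder):

    dd if=/dev/zero of=/path/to/content_repository/dd-test bs=1M count=1024 oflag=direct
    rm /path/to/content_repository/dd-test

If oflag=direct isn't supported by the filesystem, conv=fdatasync gives a
similar, slightly less strict, measurement.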

Thank you all for the help and suggestions - please keep them coming as I'm
grasping at straws right now.

-Aaron


On Wed, Jan 10, 2024 at 10:10 AM Joe Witt  wrote:

> Aaron,
>
> The usual suspects are memory consumption leading to high GC leading to
> lower performance over time, or back pressure in the flow, etc.. But your
> description does not really fit either exactly.  Does your flow see a mix
> of large objects and smaller objects?
>
> Thanks
>
> On Wed, Jan 10, 2024 at 10:07 AM Aaron Rich  wrote:
>
>> Hi all,
>>
>>
>>
>> I’m running into an odd issue and hoping someone can point me in the
>> right direction.
>>
>>
>>
>> I have NiFi 1.19 deployed in a Kube cluster with all the repositories
>> volume mounted out. It was processing great, with processors like
>> UpdateAttribute sending through 15K/5m and PutFile sending through 3K/5m.
>>
>>
>>
>> With nothing changing in the deployment, the performance has dropped to
>> UpdateAttribute doing 350/5m and PutFile to 200/5m.
>>
>>
>>
>> I’m trying to determine what resource is suddenly dropping our
>> performance like this. I don’t see anything on the Kube monitoring that
>> stands out and I have restarted, cleaned repos, changed nodes but nothing
>> is helping.
>>
>>
>>
>> I was hoping there is something from the NiFi POV that can help identify
>> the limiting resource. I'm not sure if there is additional
>> diagnostic/debug/etc information available beyond the node status graphs.
>>
>>
>>
>> Any help would be greatly appreciated.
>>
>>
>>
>> Thanks.
>>
>>
>>
>> -Aaron
>>
>


Re: Finding slow down in processing

2024-01-10 Thread Joe Obernberger
You can also set the processor's Scheduling -> Run Duration to something
other than 0 ms.
I've found NiFi will do heavy disk I/O when things have been running for
a while / queue sizes are large. I've been using tools like atop to watch
disk I/O. Check settings for the flow, content, and provenance repos.
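
For example (assuming the sysstat package and/or atop are available on the
node or in a debug container):

    # extended per-device stats every 5 seconds; sustained high %util or
    # long await on the repo volumes points at an I/O bottleneck
    iostat -x 5
    # or interactively, sampling every 5 seconds
    atop 5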

Are queues leading into processors load balanced?

-Joe

On 1/10/2024 1:08 PM, Lars Winderling wrote:
Hi Aaron, is the number of threads set sufficiently high? Once I set 
it too low by accident on a very powerful machine, and when we got 
more and more flows, at some point NiFi slowed down tremendously. By 
increasing threads to the recommended setting (a few per core, cf. admin 
docs) we got NiFi back to speed.
Another cause of performance loss might be other workloads in the same 
cluster. In case of some cloud provider, you might also get throttled 
down for high disk/resource/... usage. Just a thought.
Anything in the logs? Maybe your repositories for content, flowfiles 
etc are full, and NiFi cannot cope with archiving and shuffling in the 
background. But there should be an indication in the logs.

Good luck, Lars


On 10 January 2024 18:09:07 CET, Joe Witt  wrote:

Aaron,

The usual suspects are memory consumption leading to high GC
leading to lower performance over time, or back pressure in the
flow, etc.. But your description does not really fit either
exactly.  Does your flow see a mix of large objects and smaller
objects?

Thanks

On Wed, Jan 10, 2024 at 10:07 AM Aaron Rich 
wrote:

Hi all,

I’m running into an odd issue and hoping someone can point me
in the right direction.

I have NiFi 1.19 deployed in a Kube cluster with all the
repositories volume mounted out. It was processing great, with
processors like UpdateAttribute sending through 15K/5m and PutFile
sending through 3K/5m.

With nothing changing in the deployment, the performance has
dropped to UpdateAttribute doing 350/5m and PutFile to 200/5m.

I’m trying to determine what resource is suddenly dropping our
performance like this. I don’t see anything on the Kube
monitoring that stands out and I have restarted, cleaned
repos, changed nodes but nothing is helping.

I was hoping there is something from the NiFi POV that can
help identify the limiting resource. I'm not sure if there is
additional diagnostic/debug/etc information available beyond
the node status graphs.

Any help would be greatly appreciated.

Thanks.

-Aaron




Re: Finding slow down in processing

2024-01-10 Thread Lars Winderling
Hi Aaron, is the number of threads set sufficiently high? Once I set it too low 
by accident on a very powerful machine, and when we got more and more flows, at 
some point NiFi slowed down tremendously. By increasing threads to the 
recommended setting (a few per core, cf. admin docs) we got NiFi back to speed.
Another cause of performance loss might be other workloads in the same cluster. 
In case of some cloud provider, you might also get throttled down for high 
disk/resource/... usage. Just a thought.
Anything in the logs? Maybe your repositories for content, flowfiles etc are 
full, and NiFi cannot cope with archiving and shuffling in the background. But 
there should be an indication in the logs.
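
To check the thread and throttling angles concretely, something along these
lines might help (the pod name nifi-0 is just a placeholder):

    # cores actually visible inside the container, to compare against the
    # "Maximum Timer Driven Thread Count" under Controller Settings
    kubectl exec nifi-0 -- nproc
    # CPU throttling counters (cgroup v1 path; cgroup v2 uses
    # /sys/fs/cgroup/cpu.stat) - a growing nr_throttled means the pod
    # limits or the provider are capping you
    kubectl exec nifi-0 -- cat /sys/fs/cgroup/cpu/cpu.stat
    # live usage vs. requests/limits (needs metrics-server)
    kubectl top pod nifi-0
    kubectl describe pod nifi-0 | grep -A4 -i limits
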
Good luck, Lars

On 10 January 2024 18:09:07 CET, Joe Witt  wrote:
>Aaron,
>
>The usual suspects are memory consumption leading to high GC leading to
>lower performance over time, or back pressure in the flow, etc.. But your
>description does not really fit either exactly.  Does your flow see a mix
>of large objects and smaller objects?
>
>Thanks
>
>On Wed, Jan 10, 2024 at 10:07 AM Aaron Rich  wrote:
>
>> Hi all,
>>
>>
>>
>> I’m running into an odd issue and hoping someone can point me in the right
>> direction.
>>
>>
>>
>> I have NiFi 1.19 deployed in a Kube cluster with all the repositories
>> volume mounted out. It was processing great, with processors like
>> UpdateAttribute sending through 15K/5m and PutFile sending through 3K/5m.
>>
>>
>>
>> With nothing changing in the deployment, the performance has dropped to
>> UpdateAttribute doing 350/5m and PutFile to 200/5m.
>>
>>
>>
>> I’m trying to determine what resource is suddenly dropping our performance
>> like this. I don’t see anything on the Kube monitoring that stands out and
>> I have restarted, cleaned repos, changed nodes but nothing is helping.
>>
>>
>>
>> I was hoping there is something from the NiFi POV that can help identify
>> the limiting resource. I'm not sure if there is additional
>> diagnostic/debug/etc information available beyond the node status graphs.
>>
>>
>>
>> Any help would be greatly appreciated.
>>
>>
>>
>> Thanks.
>>
>>
>>
>> -Aaron
>>


Re: Finding slow down in processing

2024-01-10 Thread Joe Witt
Aaron,

The usual suspects are memory consumption leading to high GC leading to
lower performance over time, or back pressure in the flow, etc.. But your
description does not really fit either exactly.  Does your flow see a mix
of large objects and smaller objects?
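
A quick way to rule the GC theory in or out is to sample the JVM from inside
the container (the <pid> placeholder is whatever jps reports for NiFi):

    jps -l
    jstat -gcutil <pid> 5000

Steadily climbing FGC/FGCT columns, or an old generation pinned near 100%,
would point at memory/GC.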

Thanks

On Wed, Jan 10, 2024 at 10:07 AM Aaron Rich  wrote:

> Hi all,
>
>
>
> I’m running into an odd issue and hoping someone can point me in the right
> direction.
>
>
>
> I have NiFi 1.19 deployed in a Kube cluster with all the repositories
> volume mounted out. It was processing great, with processors like
> UpdateAttribute sending through 15K/5m and PutFile sending through 3K/5m.
>
>
>
> With nothing changing in the deployment, the performance has dropped to
> UpdateAttribute doing 350/5m and PutFile to 200/5m.
>
>
>
> I’m trying to determine what resource is suddenly dropping our performance
> like this. I don’t see anything on the Kube monitoring that stands out and
> I have restarted, cleaned repos, changed nodes but nothing is helping.
>
>
>
> I was hoping there is something from the NiFi POV that can help identify
> the limiting resource. I'm not sure if there is additional
> diagnostic/debug/etc information available beyond the node status graphs.
>
>
>
> Any help would be greatly appreciated.
>
>
>
> Thanks.
>
>
>
> -Aaron
>


Finding slow down in processing

2024-01-10 Thread Aaron Rich
Hi all,



I’m running into an odd issue and hoping someone can point me in the right
direction.



I have NiFi 1.19 deployed in a Kube cluster with all the repositories
volume mounted out. It was processing great, with processors like
UpdateAttribute sending through 15K/5m and PutFile sending through 3K/5m.



With nothing changing in the deployment, the performance has dropped to
UpdateAttribute doing 350/5m and PutFile to 200/5m.



I’m trying to determine what resource is suddenly dropping our performance
like this. I don’t see anything on the Kube monitoring that stands out and
I have restarted, cleaned repos, changed nodes but nothing is helping.
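
If it helps, I can also share node-level details gathered along these lines
(the node name is a placeholder):

    # node conditions such as DiskPressure / MemoryPressure
    kubectl describe node my-node | grep -i pressure
    # live node usage, if metrics-server is installed
    kubectl top node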



I was hoping there is something from the NiFi POV that can help identify
the limiting resource. I'm not sure if there is additional
diagnostic/debug/etc information available beyond the node status graphs.



Any help would be greatly appreciated.



Thanks.



-Aaron