Re: Lagging worker nodes

Joe Witt Thu, 28 Jan 2021 12:19:34 -0800

I'm also assuming this is the same issue Maksym was asking about
yesterday.  Let's try to keep the thread together as this gets discussed.


On Thu, Jan 28, 2021 at 1:10 PM Pierre Villard <[email protected]>
wrote:

> Hi Zilvinas,
>
> I'm afraid we would need more details to help you out here.
>
> From a quick look at the graph, my first question is: there is a host
> (green line) where the number of queued flow files is growing more or less
> constantly. Where in the flow are the flow files accumulating on this
> node? Which processor is creating back pressure? Is there anything in the
> log for this node around the time the flow files start accumulating?
>
> Thanks,
> Pierre
>
> On Fri, Jan 29, 2021 at 00:02, Zilvinas Saltys <
> [email protected]> wrote:
>
>> Hi,
>>
>> We run a 25-node NiFi cluster on version 1.12. We're processing about
>> 2000 files every 5 minutes, each between 100 and 500 megabytes.
>>
>> What I notice is that some workers degrade in performance and keep
>> accumulating a backlog of queued files. The attached screenshots show two
>> hosts, one of which is degraded.
>>
>> One seemingly dead giveaway is that the degraded node starts doing heavy
>> disk read I/O while the other node does none. I ran iostat on those nodes
>> and I know that the read I/O is on the content_repository directory. But
>> it makes no sense to me how some of the nodes that are doing these heavy
>> tasks are doing no disk read I/O. In this example I know that both nodes
>> are processing roughly the same number of files of the same size.
>>
>> The pipeline is somewhat simple:
>> 1) Read from SQS
>> 2) Fetch file contents from S3
>> 3) Publish file contents to Kafka
>> 4) Compress file contents
>> 5) Put compressed contents back to S3
>>
>> To my understanding, all of these operations should require heavy reads
>> from local disk to fetch file contents from the content repository. How is
>> it possible that some nodes process lots of files without showing any
>> disk reads, and then suddenly spike in disk reads and degrade?
>>
>> Any clues would be really helpful.
>> Thanks.
>>
>
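
For comparing disk read I/O between a healthy and a degraded node, as described
above with iostat, the following is a minimal sketch that derives a per-device
read rate from two samples of Linux's /proc/diskstats. It is not part of NiFi;
the function names (parse_diskstats, read_rates_mb_s, sample) are hypothetical,
and it assumes a Linux host where the content repository sits on an
identifiable block device.

```python
import time

# /proc/diskstats reports sectors in fixed 512-byte units.
SECTOR_BYTES = 512


def parse_diskstats(text):
    """Map device name -> cumulative sectors read, given /proc/diskstats text."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        # fields: major, minor, name, reads completed, reads merged, sectors read, ...
        stats[fields[2]] = int(fields[5])
    return stats


def read_rates_mb_s(before, after, interval_s):
    """Per-device read throughput in MB/s between two parsed samples."""
    rates = {}
    for dev, sectors in after.items():
        if dev in before:
            delta_bytes = (sectors - before[dev]) * SECTOR_BYTES
            rates[dev] = delta_bytes / 1e6 / interval_s
    return rates


def sample(interval_s=5.0, path="/proc/diskstats"):
    """Take two samples interval_s seconds apart and return read rates."""
    with open(path) as f:
        before = parse_diskstats(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_diskstats(f.read())
    return read_rates_mb_s(before, after, interval_s)
```

Calling sample() on each node at the same time and comparing the rate for the
device that backs content_repository would give numbers comparable to the
iostat observation, in a form that is easy to log over time.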
