Hi Joe,

Yes, it is the same issue. We followed your advice and reduced the number of threads on our heavy processors (fetch/compress/publish) to a minimum, then increased gradually to 4 until the processing rate became acceptable (about 2000 files per 5 minutes). This is a cluster of 25 nodes with 36 cores each.
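For reference, here is a quick back-of-the-envelope sketch of what the rate above works out to per node. All figures come from this thread (25 nodes, ~2000 files per 5 minutes, files of 100 to 500 MB); the script itself is only illustrative:

```python
# Back-of-the-envelope throughput for the cluster described above.
# Figures taken from the thread: 25 nodes, ~2000 files / 5 min,
# files of 100-500 MB each.

nodes = 25
files_per_5min = 2000
file_mb_min, file_mb_max = 100, 500

# Per-node file rate, assuming load is spread evenly across nodes.
files_per_node_per_min = files_per_5min / nodes / 5

# Per-node data rate needed just to fetch content, at the small and
# large ends of the stated file-size range.
mb_per_node_per_sec_min = files_per_node_per_min * file_mb_min / 60
mb_per_node_per_sec_max = files_per_node_per_min * file_mb_max / 60

print(f"{files_per_node_per_min:.1f} files/node/min")
print(f"~{mb_per_node_per_sec_min:.0f}-{mb_per_node_per_sec_max:.0f} "
      f"MB/s per node just to fetch content")
```

So each node is handling roughly 16 files per minute, on the order of tens to low hundreds of MB/s of content, which is why a handful of concurrent tasks per processor was enough once the bottleneck was removed.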
On Thu, Jan 28, 2021 at 8:19 PM Joe Witt <[email protected]> wrote:

> I'm assuming this is also the same thing Maksym was asking about
> yesterday. Let's try to keep the thread together as this gets discussed.
>
> On Thu, Jan 28, 2021 at 1:10 PM Pierre Villard <[email protected]> wrote:
>
>> Hi Zilvinas,
>>
>> I'm afraid we would need more details to help you out here.
>>
>> My first question from a quick look at the graph: there is a host (green
>> line) where the number of queued flow files is more or less constantly
>> growing. Where in the flow are the flow files accumulating for this node?
>> Which processor is creating back pressure? Is there anything in the log
>> for this node around the time the flow files start accumulating?
>>
>> Thanks,
>> Pierre
>>
>> On Fri, Jan 29, 2021 at 00:02, Zilvinas Saltys <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> We run a 25-node NiFi cluster on version 1.12. We're processing about
>>> 2000 files per 5 minutes, where each file is 100 to 500 megabytes.
>>>
>>> What I notice is that some workers degrade in performance and keep
>>> accumulating a backlog of queued flow files. See the attached
>>> screenshots showing two hosts, one of which is degraded.
>>>
>>> One seemingly dead giveaway is that the degraded node starts doing
>>> heavy, intensive disk read IO while the other node keeps doing none. I
>>> ran iostat on those nodes and confirmed that the read IOs are on the
>>> content_repository directory. But it makes no sense to me that some of
>>> the nodes doing this heavy work show no disk read IO. In this example I
>>> know that both nodes are processing roughly the same number of files,
>>> of the same size.
>>> The pipeline is fairly simple:
>>>
>>> 1) Read from SQS
>>> 2) Fetch file contents from S3
>>> 3) Publish file contents to Kafka
>>> 4) Compress file contents
>>> 5) Put the compressed contents back to S3
>>>
>>> To my understanding, all of these operations should require heavy reads
>>> from local disk to fetch file contents from the content repository. How
>>> is it possible that some nodes process lots of files without showing
>>> any disk reads, and then suddenly spike in disk reads and degrade?
>>>
>>> Any clues would be really helpful.
>>> Thanks.
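To make the quoted question concrete: each file's content plausibly gets streamed out of the content repository several times in that pipeline (publish to Kafka, compress, put to S3). A rough sketch of the read volume that would imply per node, using the thread's numbers (the 3x read factor and the 300 MB average are my assumptions, not measurements):

```python
# Rough estimate of content_repository read volume implied by the
# quoted pipeline. Assumption (mine, not from the thread): steps 3-5
# each stream the file's content once, i.e. ~3 full reads per file.

nodes = 25
files_per_5min = 2000
avg_file_mb = 300            # assumed midpoint of the 100-500 MB range
reads_per_file = 3           # PublishKafka + Compress + PutS3 (assumed)

mb_read_per_node_per_sec = (
    files_per_5min / nodes * avg_file_mb * reads_per_file / (5 * 60)
)
print(f"~{mb_read_per_node_per_sec:.0f} MB/s of content-repo reads per node")
```

If a node processing this volume shows essentially zero disk reads in iostat, one plausible explanation is that those reads are being served from memory (e.g. the OS page cache) rather than from disk, though that is speculation on my part rather than something established in the thread.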
