Aaron,

I ran into an issue where an ExecuteStreamCommand (ESC) processor running with many threads invoked a legacy script that would hang whenever the incoming file was 'inconsistent'. As malformed data randomly streamed through, ESC slowly accumulated stuck threads. Eventually I ran out of threads, and the system just sat waiting for one to become available.
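The failure mode above is a worker thread that blocks indefinitely on a child process that never exits. A hard per-invocation timeout guards against it. Below is a minimal Python sketch of that idea, run outside NiFi. It assumes the legacy script can be launched as an external command that reads its input from stdin. `run_legacy_script` and the "success"/"failure" labels are hypothetical. They stand in for a processor's relationships and are not the ExecuteStreamCommand or InvokeScriptedProcessor API.

```python
import subprocess
import sys


def run_legacy_script(cmd, payload, timeout_s):
    """Run an external command on one input, never waiting longer than timeout_s.

    Returns ("success", stdout_bytes) or ("failure", reason).
    The route names are hypothetical stand-ins for success/failure relationships.
    """
    try:
        result = subprocess.run(
            cmd,
            input=payload,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout_s,  # subprocess.run kills the child if it runs past this
        )
    except subprocess.TimeoutExpired:
        # The hung child has been killed and reaped, so the calling thread is
        # free again instead of blocking forever on malformed input.
        return ("failure", "timeout")
    if result.returncode != 0:
        return ("failure", "exit code %d" % result.returncode)
    return ("success", result.stdout)


if __name__ == "__main__":
    # Hypothetical stand-ins: one well-behaved script and one that hangs.
    upper = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"]
    hang = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_legacy_script(upper, b"ok", 10))   # ('success', b'OK')
    print(run_legacy_script(hang, b"bad", 0.5))  # ('failure', 'timeout')
```

The design choice is to treat "took too long" the same as any other bad input. The file is routed aside and the thread goes back to the pool, so throughput no longer decays toward zero as malformed files arrive.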
The problem was visible in the processor statistics: the flowfiles-out count gradually stepped down to zero as threads became stuck. It might be worth trying InvokeScriptedProcessor or building a custom processor, since either gives you a way to handle these inconsistencies more gracefully.

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.script.InvokeScriptedProcessor/index.html

Thanks,
Lee

On Fri, Jul 15, 2016 at 6:50 AM, Aaron Longfield wrote:

> Hi Mark,
>
> I've been using the G1 garbage collector. I brought the nodes down to 8GB
> heap and let it run overnight, but processing still got stuck and required
> NiFi to be restarted on all nodes. It took longer to happen, but they went
> down after a few hours. Are there any other things I can look into?
>
> Thanks!
>
> -Aaron
>
> On Thu, Jul 14, 2016 at 2:33 PM, Mark Payne wrote:
>
>> Aaron,
>>
>> My guess would be that you are hitting a full garbage collection. With
>> such a huge Java heap, that will cause a "stop the world" pause for quite a
>> long time.
>> Which garbage collector are you using? Have you tried reducing the heap
>> from 48 GB to, say, 4 or 8 GB?
>>
>> Thanks
>> -Mark
>>
>>
>> > On Jul 14, 2016, at 11:14 AM, Aaron Longfield wrote:
>> >
>> > Hi,
>> >
>> > I'm having an issue with a small (two-node) NiFi cluster where the
>> > nodes will stop processing any queued flowfiles. I haven't seen any
>> > error messages logged related to it, and when I attempt to restart the
>> > service, NiFi doesn't respond and the script forcibly kills it. This
>> > leaves multiple flowfile versions hanging around, and generally makes me
>> > feel like it might be causing data loss.
>> >
>> > I'm running the web UI on a different box, and when things stop
>> > working, it stops showing changes to counts in any queues, and the
>> > thread count never changes. It still thinks the nodes are connected and
>> > responding, though.
>> >
>> > My environment is two 8-CPU systems with 60GB of memory, with 48GB given
>> > to the NiFi JVM in bootstrap.conf. I have timer-driven threads limited
>> > to 12 and event-driven threads to 4. The install is on the current
>> > Amazon Linux AMI, using OpenJDK 1.8.0.91 x64.
>> >
>> > Any ideas, other debug steps, or changes that I can try? I'm running
>> > 0.7.0, having upgraded from 0.6.1, but this has been occurring with both
>> > versions. The higher the flowfile volume I push through, the faster this
>> > happens.
>> >
>> > Thanks for any help there is to give!
>> >
>> > -Aaron Longfield
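Mark's suggestion concerns the heap size and garbage collector that the NiFi JVM is launched with, both of which are set in bootstrap.conf. A minimal Python sketch for checking what a node is actually configured with follows. It assumes the `java.arg.<N>=<flag>` line format used in NiFi's bootstrap.conf. Argument numbers can differ between releases, so the sketch scans every such line. `summarize_bootstrap` and the sample contents are hypothetical.

```python
import re


def summarize_bootstrap(text):
    """Summarize heap and GC flags from bootstrap.conf-style text.

    Assumes settings appear as 'java.arg.<N>=<jvm flag>' lines (as in NiFi's
    bootstrap.conf); lines starting with '#' are treated as disabled.
    Returns (max_heap_gb or None, g1_enabled).
    """
    flags = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            continue  # commented-out setting, not passed to the JVM
        m = re.match(r"java\.arg\.\w+\s*=\s*(\S+)", line)
        if m:
            flags.append(m.group(1))

    max_heap_gb = None
    for flag in flags:
        # Only g/m suffixes are recognized in this sketch; other -Xmx forms are skipped.
        m = re.fullmatch(r"-Xmx(\d+)([gGmM])", flag)
        if m:
            size = int(m.group(1))
            max_heap_gb = size if m.group(2) in "gG" else size / 1024
    return max_heap_gb, "-XX:+UseG1GC" in flags


if __name__ == "__main__":
    # Hypothetical contents resembling a 48 GB, G1-enabled setup.
    sample = "java.arg.2=-Xms48g\njava.arg.3=-Xmx48g\njava.arg.13=-XX:+UseG1GC\n"
    print(summarize_bootstrap(sample))  # (48, True)
```

Following Mark's advice would show up here as a maximum heap of 4 or 8 rather than 48.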
