Re: Nifi cluster nodes regularly stop processing any flowfiles

Lee Laim Fri, 15 Jul 2016 09:08:15 -0700

Aaron,

I ran into an issue where an ExecuteStreamCommand (ESC) processor running
with many threads called a legacy script that would hang whenever the
incoming file was 'inconsistent'.  ESC slowly accumulated stuck threads as
malformed data streamed through it at random.  Eventually I ran out of
threads, and the system just sat waiting for one to become available.


This was apparent in the processor statistics: the flowfiles-out count
would step down toward zero as more threads became stuck.

It might be worth trying InvokeScriptedProcessor or building custom
processors as they provide a means to handle these inconsistencies more
gracefully.
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.script.InvokeScriptedProcessor/index.html
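
To illustrate what "more gracefully" could mean here, below is a minimal
Python sketch of bounding an external script with a timeout, so a hung run
is killed and reported instead of holding a thread forever.  It is not NiFi
API code: run_with_timeout and timeout_s are hypothetical names, and the
"legacy script" is simulated with a sleeping child process.

```python
import subprocess
import sys


def run_with_timeout(cmd, data, timeout_s):
    """Feed `data` to `cmd` on stdin; return (ok, stdout_bytes).

    ok is False when the command times out or exits non-zero, so the
    caller can send the input down a failure path instead of waiting.
    """
    try:
        result = subprocess.run(
            cmd,
            input=data,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout_s,  # on expiry, run() kills the child and raises
        )
    except subprocess.TimeoutExpired:
        return False, b""
    return result.returncode == 0, result.stdout


if __name__ == "__main__":
    # Stand-in (assumption) for a legacy script that hangs on bad input.
    hanging = [sys.executable, "-c", "import time; time.sleep(60)"]
    ok, _ = run_with_timeout(hanging, b"malformed", timeout_s=1)
    print("ok" if ok else "timed out or failed; send to a failure path")
```

In a scripted or custom processor, the failure branch would presumably map
to routing the flowfile to a failure relationship, which frees the thread
for the next file.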

Thanks,
Lee





On Fri, Jul 15, 2016 at 6:50 AM, Aaron Longfield <[email protected]>
wrote:

> Hi Mark,
>
> I've been using the G1 garbage collector.  I brought the nodes down to 8GB
> heap and let it run overnight, but processing still got stuck, requiring
> NiFi to be restarted on all nodes.  It took longer to happen, but they went
> down after a few hours.  Are there any other things I can look into?
>
> Thanks!
>
> -Aaron
>
> On Thu, Jul 14, 2016 at 2:33 PM, Mark Payne <[email protected]> wrote:
>
>> Aaron,
>>
>> My guess would be that you are hitting a Full Garbage Collection. With
>> such a huge Java heap, that will cause a "stop the world" pause for quite a
>> long time.
>> Which garbage collector are you using? Have you tried reducing the heap
>> from 48 GB to say 4 or 8 GB?
>>
>> Thanks
>> -Mark
>>
>>
>> > On Jul 14, 2016, at 11:14 AM, Aaron Longfield <[email protected]> wrote:
>> >
>> > Hi,
>> >
>> > I'm having an issue with a small (two node) NiFi cluster where the
>> > nodes will stop processing any queued flowfiles.  I haven't seen any error
>> > messages logged related to it, and when attempting to restart the service,
>> > NiFi doesn't respond and the script forcibly kills it.  This causes
>> > multiple flowfile versions to hang around, and generally makes me feel like
>> > it might be causing data loss.
>> >
>> > I'm running the web UI on a different box, and when things stop
>> > working, it stops showing changes to counts in any queues, and the thread
>> > count never changes.  It still thinks the nodes are connecting and
>> > responding, though.
>> >
>> > My environment is two 8-CPU systems w/ 60GB memory with 48GB given to
>> > the NiFi JVM in bootstrap.conf.  I have timer threads limited to 12, and
>> > event threads to 4.  Install is on the current Amazon Linux AMI and using
>> > OpenJDK 1.8.0.91 x64.
>> >
>> > Any ideas, other debug steps, or changes that I can try?  I'm running
>> > 0.7.0, having upgraded from 0.6.1, but this has been occurring with both
>> > versions.  The higher the flowfile volume I push through, the faster this
>> > happens.
>> >
>> > Thanks for any help there is to give!
>> >
>> > -Aaron Longfield
>>
>>
>
