Hi Joe,

Thank you for your quick response. The system is currently in the deadlocked
state with 10 worker threads spinning, so I've gathered the info you
requested:

- The available space on the partition is 223G free of 500G (same as was
available for 0.6.1)
- java.arg.3=-Xmx4096m in bootstrap.conf
- The thread dump and jstat output are here:
  https://gist.github.com/nologic/1ac064cb42cc16ca45d6ccd1239ce085
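(For reference, the dumps in that gist came from the stock JDK tools; a
one-off capture looks roughly like the sketch below. The jps-based PID
lookup and the output file names are illustrative assumptions, not the
exact commands used.)

```shell
#!/bin/sh
# One-off capture sketch: find the NiFi JVM and snapshot its threads and GC.
# Falls back to this shell's PID so the commands can be tried anywhere;
# errors are kept alongside the data rather than aborting the capture.
PID=$(jps -l 2>/dev/null | awk '/nifi/ {print $1; exit}')
PID=${PID:-$$}
jstack "$PID" > thread-dump.txt 2>&1 || true
# -gcutil: heap/GC utilization percentages; -h5 reprints the header every
# 5 rows; 1000 5 = sample every 1000 ms, 5 samples
jstat -gcutil -h5 "$PID" 1000 5 > gc-stats.txt 2>&1 || true
echo "wrote thread-dump.txt and gc-stats.txt"
```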

Unfortunately, it's hard to predict when the decay starts, and it takes too
long to monitor the system manually. However, if after seeing the attached
dumps you still need thread dumps taken while it decays, I can set up a
timer script.
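(A timer script along these lines would do it; this is a sketch, not the
actual script, and the PID, sample count, interval, and output directory
are all assumptions to adjust.)

```shell
#!/bin/sh
# Hypothetical capture loop: take a jstack thread dump and a jstat GC
# sample COUNT times, INTERVAL seconds apart. In practice you would run
# it with something like COUNT=120 INTERVAL=60 against the NiFi PID;
# the defaults here just make the script runnable anywhere.
PID=${PID:-$$}
COUNT=${COUNT:-1}
INTERVAL=${INTERVAL:-0}
OUT=${OUT:-/tmp/nifi-dumps}
mkdir -p "$OUT"
i=0
while [ "$i" -lt "$COUNT" ]; do
    TS=$(date +%Y%m%d-%H%M%S)
    # jstack/jstat ship with the JDK; errors land in the files, and
    # "|| true" keeps the loop going if a sample fails
    jstack "$PID" > "$OUT/threads-$TS.txt" 2>&1 || true
    jstat -gcutil "$PID" > "$OUT/gc-$TS.txt" 2>&1 || true
    i=$((i + 1))
    sleep "$INTERVAL"
done
echo "captured $COUNT sample(s) in $OUT"
```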

Let me know if you need any more info.

Thanks,
Mike.


On Thu, Feb 16, 2017 at 9:54 PM, Joe Witt <[email protected]> wrote:

> Mike,
>
> Can you capture a series of thread dumps as the gradual decay occurs
> and signal at what point they were generated specifically calling out
> the "now the system is doing nothing" point.  Can you check for space
> available on the system during these times as well.  Also, please
> advise on the behavior of the heap/garbage collection.  Often (not
> always) a gradual decay in performance can suggest an issue with GC as
> you know.  Can you run something like
>
> jstat -gcutil -h5 <pid> 1000
>
> And capture those results in these chunks as well.
>
> This would give us a pretty good picture of the health of the system/
> and JVM around these times.  It is probably too much info for the
> mailing list, so feel free to create a JIRA for this and put
> attachments there or link to gists in github/etc.
>
> Pretty confident we can get to the bottom of what you're seeing quickly.
>
> Thanks
> Joe
>
> On Thu, Feb 16, 2017 at 9:43 PM, Mikhail Sosonkin <[email protected]>
> wrote:
> > Hello,
> >
> > Recently, we've upgraded from 0.6.1 to 1.1.1 and at first everything
> > was working well. However, a few hours later none of the processors
> > were showing any activity. Then, I tried restarting nifi, which caused
> > some flowfiles to get corrupted, as evidenced by exceptions thrown in
> > the nifi-app.log; however, the processors still continued to produce
> > no activity. Next, I stopped the service and deleted all state
> > (content_repository, database_repository, flowfile_repository,
> > provenance_repository, work). Then the processors start working for a
> > few hours (maybe a day) until the deadlock occurs again.
> >
> > So, this cycle continues where I have to periodically reset the
> > service and delete the state to get things moving. Obviously, that's
> > not great. I'll note that the flow.xml file has been changed by the
> > new version of nifi, as I added/removed processors, but 95% of the
> > flow configuration is the same as before the upgrade. So, I'm
> > wondering if there is a configuration setting that causes these
> > deadlocks.
> >
> > What I've been able to observe is that the deadlock is "gradual" in
> > that my flow usually takes about 4-5 threads to execute. The deadlock
> > causes the worker threads to max out at the limit and I'm not even
> > able to stop any processors or list queues. I also have not seen this
> > behavior in a fresh install of Nifi where the flow.xml would start
> > out empty.
> >
> > Can you give me some advice on what to do about this? Would the
> > problem be resolved if I manually rebuilt the flow with the new
> > version of Nifi (not looking forward to that)?
> >
> > Much appreciated.
> >
> > Mike.
> >
> > This email may contain material that is confidential for the sole use
> > of the intended recipient(s).  Any review, reliance or distribution
> > or disclosure by others without express permission is strictly
> > prohibited.  If you are not the intended recipient, please contact
> > the sender and delete all copies of this message.
>

