Ok - very happy to mention there is a PR out, under review, that will
offer an alternative provenance implementation.  Sustained high-rate
testing has shown an out-of-the-box 2.5x improvement with immediate
indexing/results, so all kinds of fun there.

On Fri, Feb 17, 2017 at 12:13 AM, Mikhail Sosonkin <[email protected]> wrote:
> Let me look through the log. I've not seen anything too weird there before,
> but I'll check again. In the UI, I quite often see flows getting slowed
> because provenance can't keep up. But it hasn't been too slow for us, so I
> didn't pay much attention.
>
> On Fri, Feb 17, 2017 at 12:03 AM, Joe Witt <[email protected]> wrote:
>>
>> When I said "one more thing" I definitely lied.
>>
>> Can you see anything in the UI indicating provenance backpressure is
>> being applied? And if you look in the app log, is there anything
>> interesting that isn't too sensitive to share?
>>
>> On Thu, Feb 16, 2017 at 11:56 PM, Joe Witt <[email protected]> wrote:
>> > Mike
>> >
>> > One more thing...can you please grab a couple more thread dumps for us
>> > with 5 to 10 minutes between them?
>> >
>> > I don't see a deadlock, but I do suspect either just crazy slow I/O
>> > going on or a possible livelock.  The thread dumps will help narrow
>> > that down a bit.
>> >
>> > Can you also run 'iostat -xmh 20' for a bit (or its equivalent) on
>> > the system, please?
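The periodic capture requested above can be sketched as a small shell loop. This is just a sketch, assuming JDK tools such as jstack are on the PATH; the PID argument and file names are placeholders:

```shell
#!/bin/sh
# Sketch: take a thread dump every 5 minutes, three times, with
# timestamped file names so they can be lined up against iostat output.
# Usage: ./capture_dumps.sh <nifi-java-pid>

# pure helper so the naming scheme is easy to sanity-check
dump_name() { printf 'threaddump_%s.txt' "$1"; }

PID="${1:-}"
if [ -n "$PID" ]; then
  for i in 1 2 3; do
    jstack "$PID" > "$(dump_name "$(date +%Y%m%d_%H%M%S)")"
    sleep 300   # 5 minutes between dumps, per the request above
  done
fi
# In a second terminal, let 'iostat -xmh 20' run over the same window.
```
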
>> >
>> > Thanks
>> > Joe
>> >
>> > On Thu, Feb 16, 2017 at 11:52 PM, Joe Witt <[email protected]> wrote:
>> >> Mike,
>> >>
>> >> No need for more info.  Heap/GC looks beautiful.
>> >>
>> >> The thread dump however, shows some problems.  The provenance
>> >> repository is locked up.  Numerous threads are sitting here
>> >>
>> >> at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
>> >> at org.apache.nifi.provenance.PersistentProvenanceRepository.persistRecord(PersistentProvenanceRepository.java:757)
>> >>
>> >> This means these are processors committing their sessions and updating
>> >> provenance, but they're waiting on a read lock on the provenance
>> >> repository.  This lock cannot be obtained because a provenance
>> >> maintenance thread is attempting to purge old events and cannot
>> >> complete.
>> >>
>> >> I recall us having addressed this, so I'm looking to see when that
>> >> was done.  If provenance is not critical for you right now, you can
>> >> swap out the persistent implementation for the volatile provenance
>> >> repository.  In nifi.properties change this line
>> >>
>> >>
>> >> nifi.provenance.repository.implementation=org.apache.nifi.provenance.PersistentProvenanceRepository
>> >>
>> >> to
>> >>
>> >>
>> >> nifi.provenance.repository.implementation=org.apache.nifi.provenance.VolatileProvenanceRepository
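If editing by hand is inconvenient, the swap can also be scripted. A minimal sketch, assuming a stock nifi.properties (the path is yours to supply); it keeps a .bak copy of the file:

```shell
# Sketch: flip the provenance implementation property in nifi.properties.
# Takes the path to the properties file; leaves a .bak backup beside it.
swap_provenance() {
  sed -i.bak \
    's/PersistentProvenanceRepository$/VolatileProvenanceRepository/' \
    "$1"
}

# Example (path is an assumption for a default install layout):
# swap_provenance conf/nifi.properties
```
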
>> >>
>> >> The behavior reminds me of this issue, which was fixed in 1.x:
>> >> https://issues.apache.org/jira/browse/NIFI-2395
>> >>
>> >> Need to dig into this more...
>> >>
>> >> Thanks
>> >> Joe
>> >>
>> >> On Thu, Feb 16, 2017 at 11:36 PM, Mikhail Sosonkin <[email protected]>
>> >> wrote:
>> >>> Hi Joe,
>> >>>
>> >>> Thank you for your quick response. The system is currently in the
>> >>> deadlocked state with 10 worker threads spinning, so I'll gather the
>> >>> info you requested:
>> >>>
>> >>> - The available space on the partition is 223G free of 500G (same as
>> >>>   was available for 0.6.1)
>> >>> - java.arg.3=-Xmx4096m in bootstrap.conf
>> >>> - The thread dump and jstat output are here:
>> >>>   https://gist.github.com/nologic/1ac064cb42cc16ca45d6ccd1239ce085
>> >>>
>> >>> Unfortunately, it's hard to predict when the decay starts, and it
>> >>> takes too long to monitor the system manually. However, if, after
>> >>> seeing the attached dumps, you still need thread dumps from while it
>> >>> decays, I can set up a timer script.
>> >>>
>> >>> Let me know if you need any more info.
>> >>>
>> >>> Thanks,
>> >>> Mike.
>> >>>
>> >>>
>> >>> On Thu, Feb 16, 2017 at 9:54 PM, Joe Witt <[email protected]> wrote:
>> >>>>
>> >>>> Mike,
>> >>>>
>> >>>> Can you capture a series of thread dumps as the gradual decay
>> >>>> occurs and note when each was generated, specifically calling out
>> >>>> the "now the system is doing nothing" point?  Can you check for
>> >>>> space available on the system during these times as well?  Also,
>> >>>> please advise on the behavior of the heap/garbage collection.
>> >>>> Often (not always) a gradual decay in performance can suggest an
>> >>>> issue with GC, as you know.  Can you run something like
>> >>>>
>> >>>> jstat -gcutil -h5 <pid> 1000
>> >>>>
>> >>>> and capture those results in these chunks as well?
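One way to capture the jstat output over time is to prefix each line with a wall-clock timestamp, so the GC samples can later be lined up with the thread dumps. A sketch; `<pid>` remains a placeholder for NiFi's Java PID:

```shell
# Sketch: timestamp each line of a command's output so jstat samples
# can be correlated with the thread dumps and iostat windows.
stamp() {
  while IFS= read -r line; do
    printf '%s  %s\n' "$(date +%H:%M:%S)" "$line"
  done
}

# Example (placeholder PID, log file name is an assumption):
# jstat -gcutil -h5 <pid> 1000 | stamp > jstat_gcutil.log
```
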
>> >>>>
>> >>>> This would give us a pretty good picture of the health of the
>> >>>> system and JVM around those times.  It is probably too much info
>> >>>> for the mailing list, so feel free to create a JIRA for this and
>> >>>> put attachments there, or link to gists on GitHub/etc.
>> >>>>
>> >>>> Pretty confident we can get to the bottom of what you're seeing
>> >>>> quickly.
>> >>>>
>> >>>> Thanks
>> >>>> Joe
>> >>>>
>> >>>> On Thu, Feb 16, 2017 at 9:43 PM, Mikhail Sosonkin
>> >>>> <[email protected]>
>> >>>> wrote:
>> >>>> > Hello,
>> >>>> >
>> >>>> > Recently, we've upgraded from 0.6.1 to 1.1.1, and at first
>> >>>> > everything was working well. However, a few hours later none of
>> >>>> > the processors were showing any activity. I then tried restarting
>> >>>> > NiFi, which caused some flowfiles to get corrupted, as evidenced
>> >>>> > by exceptions thrown in nifi-app.log; the processors still showed
>> >>>> > no activity. Next, I stopped the service and deleted all state
>> >>>> > (content_repository, database_repository, flowfile_repository,
>> >>>> > provenance_repository, work). Then the processors start working
>> >>>> > for a few hours (maybe a day) until the deadlock occurs again.
>> >>>> >
>> >>>> > So this cycle continues, where I have to periodically reset the
>> >>>> > service and delete the state to get things moving. Obviously,
>> >>>> > that's not great. I'll note that the flow.xml file has been
>> >>>> > changed by the new version of NiFi (and as I added/removed
>> >>>> > processors), but 95% of the flow configuration is the same as
>> >>>> > before the upgrade. So I'm wondering if there is a configuration
>> >>>> > setting that causes these deadlocks.
>> >>>> >
>> >>>> > What I've been able to observe is that the deadlock is "gradual"
>> >>>> > in that my flow usually takes about 4-5 threads to execute. The
>> >>>> > deadlock causes the worker threads to max out at the limit, and
>> >>>> > I'm not even able to stop any processors or list queues. I also
>> >>>> > have not seen this behavior in a fresh install of NiFi where the
>> >>>> > flow.xml starts out empty.
>> >>>> >
>> >>>> > Can you give me some advice on what to do about this? Would the
>> >>>> > problem be resolved if I manually rebuilt the flow with the new
>> >>>> > version of NiFi (not looking forward to that)?
>> >>>> >
>> >>>> > Much appreciated.
>> >>>> >
>> >>>> > Mike.
>> >>>> >
>> >>>> > This email may contain material that is confidential for the sole
>> >>>> > use of
>> >>>> > the
>> >>>> > intended recipient(s).  Any review, reliance or distribution or
>> >>>> > disclosure
>> >>>> > by others without express permission is strictly prohibited.  If
>> >>>> > you are
>> >>>> > not
>> >>>> > the intended recipient, please contact the sender and delete all
>> >>>> > copies
>> >>>> > of
>> >>>> > this message.
>> >>>
>
>
>
