k - very happy to mention there is a PR out, under review, that will offer an alternative provenance implementation. Sustained high-rate testing has shown an out-of-the-box 2.5x improvement with immediate indexing/results, so all kinds of fun there.
On Fri, Feb 17, 2017 at 12:13 AM, Mikhail Sosonkin <[email protected]> wrote:
> Let me look through the log. I've not seen anything too weird there before,
> but I'll check again. In the UI, I quite normally see flows getting slowed
> because provenance can't keep up. But it hasn't been too slow for us, so I
> didn't pay much attention.
>
> On Fri, Feb 17, 2017 at 12:03 AM, Joe Witt <[email protected]> wrote:
>> When I said one more thing I definitely lied.
>>
>> Can you see anything in the UI indicating provenance backpressure is
>> being applied, and if you look in the app log is there anything
>> interesting that isn't too sensitive to share?
>>
>> On Thu, Feb 16, 2017 at 11:56 PM, Joe Witt <[email protected]> wrote:
>> > Mike
>> >
>> > One more thing... can you please grab a couple more thread dumps for us
>> > with 5 to 10 mins between?
>> >
>> > I don't see a deadlock but do suspect either just crazy slow IO going
>> > on or a possible livelock. The thread dumps will help narrow that down
>> > a bit.
>> >
>> > Can you run 'iostat -xmh 20' for a bit (or its equivalent) on the
>> > system too please.
>> >
>> > Thanks
>> > Joe
>> >
>> > On Thu, Feb 16, 2017 at 11:52 PM, Joe Witt <[email protected]> wrote:
>> >> Mike,
>> >>
>> >> No need for more info. Heap/GC looks beautiful.
>> >>
>> >> The thread dump, however, shows some problems. The provenance
>> >> repository is locked up. Numerous threads are sitting here:
>> >>
>> >> at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
>> >> at org.apache.nifi.provenance.PersistentProvenanceRepository.persistRecord(PersistentProvenanceRepository.java:757)
>> >>
>> >> This means these are processors committing their sessions and updating
>> >> provenance, but they're waiting on a read lock to provenance. This lock
>> >> cannot be obtained because a provenance maintenance thread is
>> >> attempting to purge old events and cannot.
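[Editor's note] The periodic thread-dump capture Joe asks for above can be scripted rather than run by hand. A minimal sketch, assuming `jstack` is on the PATH and you know the NiFi JVM's pid; the function name, output layout, and defaults here are ours, not from the thread:

```shell
#!/usr/bin/env bash
# Hypothetical helper: take a series of thread dumps of the NiFi JVM,
# spaced a few minutes apart, writing each to a timestamped file.
# take_dumps <pid> <count> <interval_seconds> <out_dir>
take_dumps() {
  local pid="$1" count="$2" interval="$3" out="$4" i
  mkdir -p "$out"
  for i in $(seq 1 "$count"); do
    # jstack ships with the JDK; number and timestamp each dump so they
    # can be correlated with iostat output later
    jstack "$pid" > "$out/threaddump-$i-$(date +%Y%m%dT%H%M%S).txt" || return 1
    if [ "$i" -lt "$count" ]; then sleep "$interval"; fi
  done
}
# Typical use, with Joe's iostat request running in another terminal:
#   iostat -xmh 20
#   take_dumps "$(pgrep -f org.apache.nifi)" 3 300 /tmp/nifi-diag
```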
>> >>
>> >> I recall us having addressed this, so I am looking to see when that
>> >> was addressed. If provenance is not critical for you right now, you
>> >> can swap out the persistent implementation with the volatile
>> >> provenance repository. In nifi.properties change this line:
>> >>
>> >> nifi.provenance.repository.implementation=org.apache.nifi.provenance.PersistentProvenanceRepository
>> >>
>> >> to
>> >>
>> >> nifi.provenance.repository.implementation=org.apache.nifi.provenance.VolatileProvenanceRepository
>> >>
>> >> The behavior reminds me of this issue, which was fixed in 1.x:
>> >> https://issues.apache.org/jira/browse/NIFI-2395
>> >>
>> >> Need to dig into this more...
>> >>
>> >> Thanks
>> >> Joe
>> >>
>> >> On Thu, Feb 16, 2017 at 11:36 PM, Mikhail Sosonkin <[email protected]> wrote:
>> >>> Hi Joe,
>> >>>
>> >>> Thank you for your quick response. The system is currently in the
>> >>> deadlock state with 10 worker threads spinning. So, I'll gather the
>> >>> info you requested.
>> >>>
>> >>> - The available space on the partition is 223G free of 500G (same as
>> >>>   was available for 0.6.1)
>> >>> - java.arg.3=-Xmx4096m in bootstrap.conf
>> >>> - Thread dump and jstats are here:
>> >>>   https://gist.github.com/nologic/1ac064cb42cc16ca45d6ccd1239ce085
>> >>>
>> >>> Unfortunately, it's hard to predict when the decay starts, and it
>> >>> takes too long to monitor the system manually. However, if after
>> >>> seeing the attached dumps you still need thread dumps taken while it
>> >>> decays, I can set up a timer script.
>> >>>
>> >>> Let me know if you need any more info.
>> >>>
>> >>> Thanks,
>> >>> Mike.
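[Editor's note] The nifi.properties swap Joe describes above is a one-line edit. A hedged sketch that scripts it; the property key is from the thread, while the function name, backup convention, and example path are ours:

```shell
#!/usr/bin/env bash
# Sketch: point the provenance implementation at the volatile repository.
# swap_provenance_impl <path-to-nifi.properties>
swap_provenance_impl() {
  local props="$1"
  cp "$props" "$props.bak"  # keep the original so the change is reversible
  # rewrite the implementation line in place (GNU/BSD sed -i with suffix)
  sed -i.tmp \
    's|^nifi\.provenance\.repository\.implementation=.*|nifi.provenance.repository.implementation=org.apache.nifi.provenance.VolatileProvenanceRepository|' \
    "$props" && rm -f "$props.tmp"
}
# Usage (restart NiFi afterwards; note volatile provenance does not
# survive a restart):
#   swap_provenance_impl /opt/nifi/conf/nifi.properties
```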
>> >>>
>> >>>
>> >>> On Thu, Feb 16, 2017 at 9:54 PM, Joe Witt <[email protected]> wrote:
>> >>>>
>> >>>> Mike,
>> >>>>
>> >>>> Can you capture a series of thread dumps as the gradual decay
>> >>>> occurs and signal at what point they were generated, specifically
>> >>>> calling out the "now the system is doing nothing" point? Can you
>> >>>> check for space available on the system during these times as well?
>> >>>> Also, please advise on the behavior of the heap/garbage collection.
>> >>>> Often (not always) a gradual decay in performance can suggest an
>> >>>> issue with GC, as you know. Can you run something like
>> >>>>
>> >>>> jstat -gcutil -h5 <pid> 1000
>> >>>>
>> >>>> and capture those results in these chunks as well.
>> >>>>
>> >>>> This would give us a pretty good picture of the health of the
>> >>>> system and JVM around these times. The info is probably too much
>> >>>> for the mailing list, so feel free to create a JIRA for this and
>> >>>> put attachments there or link to gists in github/etc.
>> >>>>
>> >>>> Pretty confident we can get to the bottom of what you're seeing
>> >>>> quickly.
>> >>>>
>> >>>> Thanks
>> >>>> Joe
>> >>>>
>> >>>> On Thu, Feb 16, 2017 at 9:43 PM, Mikhail Sosonkin <[email protected]> wrote:
>> >>>> > Hello,
>> >>>> >
>> >>>> > Recently, we've upgraded from 0.6.1 to 1.1.1 and at first
>> >>>> > everything was working well. However, a few hours later none of
>> >>>> > the processors were showing any activity. Then, I tried restarting
>> >>>> > nifi, which caused some flowfiles to get corrupted, evidenced by
>> >>>> > exceptions thrown in the nifi-app.log; however, the processors
>> >>>> > still continued to produce no activity. Next, I stopped the
>> >>>> > service and deleted all state (content_repository,
>> >>>> > database_repository, flowfile_repository, provenance_repository,
>> >>>> > work).
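[Editor's note] Joe's `jstat -gcutil -h5 <pid> 1000` request above can be wrapped so the samples land in a file ready to attach to a JIRA. A sketch; the wrapper name and defaults are ours, while the jstat invocation is the one from the thread plus an explicit sample count:

```shell
#!/usr/bin/env bash
# Sketch: capture GC utilization stats for the NiFi JVM.
# jstat -gcutil prints heap-region utilization once per interval (1000 ms
# here); -h5 reprints the header every 5 rows so long captures stay readable.
# capture_gc <pid> <samples> <out_file>
capture_gc() {
  local pid="$1" samples="${2:-60}" out="$3"
  mkdir -p "$(dirname "$out")"
  # tee shows the samples live and keeps a copy on disk for the JIRA/gist
  jstat -gcutil -h5 "$pid" 1000 "$samples" | tee "$out"
}
# Usage:
#   capture_gc "$(pgrep -f org.apache.nifi)" 120 /tmp/nifi-diag/gcutil.log
```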
>> >>>> > Then the processors start working for a few hours (maybe a day)
>> >>>> > until the deadlock occurs again.
>> >>>> >
>> >>>> > So, this cycle continues where I have to periodically reset the
>> >>>> > service and delete the state to get things moving. Obviously,
>> >>>> > that's not great. I'll note that the flow.xml file has been
>> >>>> > changed by the new version of nifi, as I added/removed processors,
>> >>>> > but 95% of the flow configuration is the same as before the
>> >>>> > upgrade. So, I'm wondering if there is a configuration setting
>> >>>> > that causes these deadlocks.
>> >>>> >
>> >>>> > What I've been able to observe is that the deadlock is "gradual"
>> >>>> > in that my flow usually takes about 4-5 threads to execute. The
>> >>>> > deadlock causes the worker threads to max out at the limit, and
>> >>>> > I'm not even able to stop any processors or list queues. I also
>> >>>> > have not seen this behavior in a fresh install of Nifi where the
>> >>>> > flow.xml would start out empty.
>> >>>> >
>> >>>> > Can you give me some advice on what to do about this? Would the
>> >>>> > problem be resolved if I manually rebuilt the flow with the new
>> >>>> > version of Nifi (not looking forward to that)?
>> >>>> >
>> >>>> > Much appreciated.
>> >>>> >
>> >>>> > Mike.
>> >>>> >
>> >>>> > This email may contain material that is confidential for the sole
>> >>>> > use of the intended recipient(s). Any review, reliance or
>> >>>> > distribution or disclosure by others without express permission is
>> >>>> > strictly prohibited. If you are not the intended recipient, please
>> >>>> > contact the sender and delete all copies of this message.
