Russ,

The ticket you reference [1] is still open, and I did not see any changes in
0.7.2 or 1.1.2 that would indicate any fix was included. You can create a PR
with your code in it (or ask someone to do it if you're not comfortable with
GitHub).
[1] https://issues.apache.org/jira/browse/NIFI-3364

Andy LoPresto
[email protected]
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69

> On Feb 17, 2017, at 8:31 AM, Russell Bateman <[email protected]> wrote:
>
> Mikhail,
>
> I have a short article with step-by-step information and comments on how I
> profile NiFi. You'll want the latest NiFi release, however, because the Java
> Flight Recorder JVM arguments are very order-dependent. (I'm assuming that
> NiFi 1.1.2 and 0.7.2 have the fix for the conf/bootstrap.conf numeric-argument
> order.) I've been using this for a couple of months and finally got around to
> writing it up from my personal notes in a more usable form:
>
> http://www.javahotchocolate.com/notes/jfr.html
>
> I hope this is helpful.
>
> Russ
>
> On 02/16/2017 10:18 PM, Mikhail Sosonkin wrote:
>> It's been a while since I've used a profiler, but I'll give it a shot when
>> I get to a place with a faster internet link :)
>>
>> On Fri, Feb 17, 2017 at 12:08 AM, Tony Kurc <[email protected]> wrote:
>> Mike, also regarding what Joe asked about backpressure "not being applied":
>> if you're good with a profiler, I think Joe and I both gravitated to
>> 0x00000006c533b770 being locked at
>> org.apache.nifi.provenance.PersistentProvenanceRepository.persistRecord(PersistentProvenanceRepository.java:757).
>> It would be interesting to see whether that section is taking longer over
>> time.
>>
>> On Thu, Feb 16, 2017 at 11:56 PM, Joe Witt <[email protected]> wrote:
>> Mike,
>>
>> One more thing: can you please grab a couple more thread dumps for us,
>> with 5 to 10 minutes between them?
>>
>> I don't see a deadlock, but I do suspect either just crazy slow IO or a
>> possible livelock. The thread dumps will help narrow that down a bit.
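[Editor's note] The spaced thread dumps Joe asks for can be captured with a small loop. A sketch, assuming the JDK's `jstack` is on the PATH; the pid-file path in the commented invocation is an assumption about the install layout:

```shell
#!/bin/sh
# Capture several thread dumps with a fixed pause between them, one file each.
# Usage: capture_thread_dumps <pid> <count> <interval-seconds>
capture_thread_dumps() {
    pid=$1; count=$2; interval=$3
    i=1
    while [ "$i" -le "$count" ]; do
        out="threaddump-${pid}-$(date +%Y%m%dT%H%M%S).txt"
        echo "writing $out"
        jstack "$pid" > "$out" 2>/dev/null || echo "jstack failed for pid $pid"
        i=$((i + 1))
        if [ "$i" -le "$count" ]; then sleep "$interval"; fi
    done
}

# Joe asked for 5-10 minutes between dumps; the pid-file path is assumed:
# capture_thread_dumps "$(cat /opt/nifi/run/nifi.pid)" 3 300
```

Timestamped filenames make it easy to line each dump up with the point in the decay at which it was taken.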
>>
>> Can you also run 'iostat -xmh 20' for a bit (or its equivalent) on the
>> system, please?
>>
>> Thanks
>> Joe
>>
>> On Thu, Feb 16, 2017 at 11:52 PM, Joe Witt <[email protected]> wrote:
>> > Mike,
>> >
>> > No need for more info. Heap/GC looks beautiful.
>> >
>> > The thread dump, however, shows some problems. The provenance
>> > repository is locked up. Numerous threads are sitting here:
>> >
>> >   at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
>> >   at org.apache.nifi.provenance.PersistentProvenanceRepository.persistRecord(PersistentProvenanceRepository.java:757)
>> >
>> > This means these are processors committing their sessions and updating
>> > provenance, but they're waiting on a read lock to provenance. This lock
>> > cannot be obtained because a provenance maintenance thread is
>> > attempting to purge old events and cannot.
>> >
>> > I recall us having addressed this, so I am looking to see when that
>> > was done. If provenance is not critical for you right now, you can
>> > swap out the persistent implementation for the volatile provenance
>> > repository. In nifi.properties change this line
>> >
>> > nifi.provenance.repository.implementation=org.apache.nifi.provenance.PersistentProvenanceRepository
>> >
>> > to
>> >
>> > nifi.provenance.repository.implementation=org.apache.nifi.provenance.VolatileProvenanceRepository
>> >
>> > The behavior reminds me of this issue, which was fixed in 1.x:
>> > https://issues.apache.org/jira/browse/NIFI-2395
>> >
>> > Need to dig into this more...
>> >
>> > Thanks
>> > Joe
>> >
>> > On Thu, Feb 16, 2017 at 11:36 PM, Mikhail Sosonkin <[email protected]> wrote:
>> >> Hi Joe,
>> >>
>> >> Thank you for your quick response. The system is currently in the
>> >> deadlocked state with 10 worker threads spinning.
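[Editor's note] The repository swap Joe suggests above is a one-line edit to nifi.properties, followed by a NiFi restart. A minimal sketch; the install path in the commented invocation is an assumption, and a .bak copy is kept so the change is easy to revert:

```shell
# Point nifi.properties at the volatile provenance repository, keeping a .bak copy.
# Usage: swap_provenance_repo <path-to-nifi.properties>
swap_provenance_repo() {
    props=$1
    sed -i.bak \
        -e 's|^nifi\.provenance\.repository\.implementation=.*|nifi.provenance.repository.implementation=org.apache.nifi.provenance.VolatileProvenanceRepository|' \
        "$props"
    # Echo the resulting line so the change can be eyeballed.
    grep '^nifi.provenance.repository.implementation=' "$props"
}

# swap_provenance_repo /opt/nifi/conf/nifi.properties   # path is an assumption
```

Note the trade-off Joe implies: the volatile repository keeps provenance in memory only, so events are lost on restart.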
>> >> So, I'll gather the info you requested:
>> >>
>> >> - The available space on the partition is 223G free of 500G (the same
>> >>   as was available for 0.6.1)
>> >> - java.arg.3=-Xmx4096m in bootstrap.conf
>> >> - The thread dump and jstat output are here:
>> >>   https://gist.github.com/nologic/1ac064cb42cc16ca45d6ccd1239ce085
>> >>
>> >> Unfortunately, it's hard to predict when the decay starts, and it takes
>> >> too long to monitor the system manually. However, if after seeing the
>> >> attached dumps you still need thread dumps taken while it decays, I can
>> >> set up a timer script.
>> >>
>> >> Let me know if you need any more info.
>> >>
>> >> Thanks,
>> >> Mike.
>> >>
>> >> On Thu, Feb 16, 2017 at 9:54 PM, Joe Witt <[email protected]> wrote:
>> >>> Mike,
>> >>>
>> >>> Can you capture a series of thread dumps as the gradual decay occurs,
>> >>> and note at what point each was generated, specifically calling out
>> >>> the "now the system is doing nothing" point? Can you check the space
>> >>> available on the system during these times as well? Also, please
>> >>> advise on the behavior of the heap/garbage collection. Often (not
>> >>> always) a gradual decay in performance can suggest an issue with GC,
>> >>> as you know. Can you run something like
>> >>>
>> >>> jstat -gcutil -h5 <pid> 1000
>> >>>
>> >>> and capture those results in these chunks as well?
>> >>>
>> >>> This would give us a pretty good picture of the health of the system
>> >>> and JVM around these times. It is probably too much info for the
>> >>> mailing list, so feel free to create a JIRA for this and put the
>> >>> attachments there, or link to gists on GitHub, etc.
>> >>>
>> >>> I'm pretty confident we can get to the bottom of what you're seeing
>> >>> quickly.
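[Editor's note] Joe's `jstat -gcutil -h5 <pid> 1000` prints GC utilization percentages every second, repeating the column header every five rows. A small wrapper that logs a bounded number of samples to a timestamped file might look like this; the JDK's `jstat` on the PATH and the pid-file path in the commented invocation are assumptions:

```shell
# Log a bounded run of jstat GC-utilization samples to a timestamped file.
# Usage: capture_gc_stats <pid> <sample-count>
capture_gc_stats() {
    pid=$1; samples=$2
    log="gcutil-${pid}-$(date +%Y%m%dT%H%M%S).log"
    echo "sampling pid $pid ($samples samples, 1s apart) into $log"
    # -gcutil: utilization percentages; -h5: header every 5 rows; 1000 ms interval
    jstat -gcutil -h5 "$pid" 1000 "$samples" > "$log" 2>/dev/null \
        || echo "jstat failed for pid $pid"
}

# capture_gc_stats "$(cat /opt/nifi/run/nifi.pid)" 60   # pid-file path is an assumption
```

Pairing each log with a thread dump taken at the same time makes it easier to tell a GC problem from the lock contention discussed above.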
>> >>>
>> >>> Thanks
>> >>> Joe
>> >>>
>> >>> On Thu, Feb 16, 2017 at 9:43 PM, Mikhail Sosonkin <[email protected]> wrote:
>> >>> > Hello,
>> >>> >
>> >>> > Recently, we upgraded from 0.6.1 to 1.1.1, and at first everything
>> >>> > was working well. However, a few hours later none of the processors
>> >>> > were showing any activity. I then tried restarting NiFi, which caused
>> >>> > some flowfiles to get corrupted, as evidenced by exceptions thrown in
>> >>> > nifi-app.log; the processors still showed no activity. Next, I
>> >>> > stopped the service and deleted all state (content_repository,
>> >>> > database_repository, flowfile_repository, provenance_repository,
>> >>> > work). Then the processors start working for a few hours (maybe a
>> >>> > day) until the deadlock occurs again.
>> >>> >
>> >>> > So, this cycle continues, where I have to periodically reset the
>> >>> > service and delete the state to get things moving. Obviously, that's
>> >>> > not great. I'll note that the flow.xml file has been changed by the
>> >>> > new version of NiFi, as I added/removed processors, but 95% of the
>> >>> > flow configuration is the same as before the upgrade. So, I'm
>> >>> > wondering if there is a configuration setting that causes these
>> >>> > deadlocks.
>> >>> >
>> >>> > What I've been able to observe is that the deadlock is "gradual", in
>> >>> > that my flow usually takes about 4-5 threads to execute. The deadlock
>> >>> > causes the worker threads to max out at the limit, and I'm not even
>> >>> > able to stop any processors or list queues. I also have not seen this
>> >>> > behavior in a fresh install of NiFi where the flow.xml starts out
>> >>> > empty.
>> >>> >
>> >>> > Can you give me some advice on what to do about this?
>> >>> > Would the problem be resolved if I manually rebuilt the flow with
>> >>> > the new version of NiFi (not looking forward to that)?
>> >>> >
>> >>> > Much appreciated.
>> >>> >
>> >>> > Mike.
>> >>> >
>> >>> > This email may contain material that is confidential for the sole
>> >>> > use of the intended recipient(s). Any review, reliance or
>> >>> > distribution or disclosure by others without express permission is
>> >>> > strictly prohibited. If you are not the intended recipient, please
>> >>> > contact the sender and delete all copies of this message.
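[Editor's note] For reference, the reset cycle Mike describes (stop NiFi, delete the repository state directories, restart) could be scripted as below. This discards all queued flowfiles and provenance history, so it is a last resort; the directory names follow Mike's list, and the NiFi home layout (`bin/nifi.sh`) is an assumption about a conventional install:

```shell
# Stop NiFi, delete the repository state directories Mike lists, and restart.
# WARNING: this destroys all queued flowfiles and provenance history.
# Usage: reset_nifi_state <nifi-home>
reset_nifi_state() {
    nifi_home=${1:?usage: reset_nifi_state <nifi-home>}
    "$nifi_home/bin/nifi.sh" stop
    for d in content_repository database_repository flowfile_repository \
             provenance_repository work; do
        rm -rf "${nifi_home:?}/${d:?}"
        echo "removed $d"
    done
    "$nifi_home/bin/nifi.sh" start
}
```

The `:?` parameter guards ensure `rm -rf` never runs against an empty path if the function is called incorrectly.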
