I was able to bring the NiFi UI back up after this change to the limit on the number of open files. Thank you all very much for your help and insights. -Jim
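For anyone who wants to confirm that a raised limit actually took effect, here is a minimal sketch, assuming a Linux box and a standard CPython interpreter run from the same account and shell that starts NiFi. The helper names open_file_limits and count_open_fds are hypothetical, made up for this illustration; reading another process's /proc/<pid>/fd directory (for example the NiFi JVM's) generally requires the same user or root.

```python
import os
import resource


def open_file_limits():
    """Return the (soft, hard) open-file limits for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def count_open_fds(pid="self"):
    """Count open descriptors by listing /proc/<pid>/fd (Linux only).

    `pid` is a hypothetical parameter for this sketch; pass a numeric
    process id to inspect another process you are allowed to read.
    """
    return len(os.listdir("/proc/%s/fd" % pid))


if __name__ == "__main__":
    soft, hard = open_file_limits()
    print("soft limit: %s, hard limit: %s" % (soft, hard))
    print("descriptors open in this process: %d" % count_open_fds())
```

The soft limit is the one a process actually hits, so that is the number to compare against the descriptor count when the "too many files" error shows up.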
On Tue, Mar 28, 2017 at 8:51 AM, James McMahon <[email protected]> wrote:
> Thank you Aldrin. I do have AutoResumeState set to false currently. The
> start of my jetty server fails when it tries to start the flowfile
> controller. I can't bring the UI up at all. I'm hoping that the system parm
> changes allow me to restart NiFi without blowing away my
> flowfile_repository. I'll certainly let you know how that plays out. -Jim
>
> On Tue, Mar 28, 2017 at 8:46 AM, Aldrin Piri <[email protected]> wrote:
>
>> Jim,
>>
>> In terms of trying to ease NiFi at start up, you could also try setting
>> nifi.flowcontroller.autoResumeState to false in your nifi.properties.
>> Depending on how your flow and scripts are constructed, this may allow you
>> to piecewise alleviate any large queues/processing of files that could be
>> causing the issue at hand. You could additionally bypass the possible
>> troublesome script processors to cache this data to disk elsewhere as a
>> stop gap measure.
>>
>> On Tue, Mar 28, 2017 at 8:17 AM, Joe Witt <[email protected]> wrote:
>>
>>> Jim,
>>>
>>> It is very possible/likely that correcting the number of file handles
>>> linux allows a process to have will get nifi back on track.
>>>
>>> Thanks
>>> Joe
>>>
>>> On Tue, Mar 28, 2017 at 8:13 AM, James McMahon <[email protected]>
>>> wrote:
>>> > No apology necessary Aldrin. I'm much obliged to you and to Joe for all
>>> > your help. My game plan is as follows:
>>> > 1- speak with the admin of my Linux box about executing all the sys
>>> > admin "best practice" changes
>>> > 2- barring doing them all, at minimum increase max permitted open files
>>> > from 1024 to 50000
>>> > 3- reboot my Linux box, and then attempt to start NiFi
>>> > 4- if 3 fails, rm -rf ./flowfile_repository on this, my dev box. Start
>>> > nifi, get in there, and eliminate that Python logging. Find another way
>>> > to log results to a system file, perhaps using a NiFi processor.
>>> >
>>> > - Jim
>>> >
>>> > On Tue, Mar 28, 2017 at 7:54 AM, Aldrin Piri <[email protected]> wrote:
>>> >>
>>> >> Hi Jim,
>>> >>
>>> >> Apologies for terse response earlier, was typing from phone.
>>> >>
>>> >> I am assuming you are on a Linux system.
>>> >>
>>> >> First and foremost, do check out the Sys Admin guide [1]. In particular,
>>> >> scope out the best practices [2] for configuration which will have you
>>> >> increase your open file handles.
>>> >>
>>> >> I do suspect that your hunches are correct, and while this will aid and
>>> >> maybe avoid the issue, getting those resources properly closed out will
>>> >> be the right thing to track down.
>>> >>
>>> >> Regardless of state, production or dev, there are certainly ways to
>>> >> manage this a bit more and work files through in an iterative manner.
>>> >>
>>> >> Please report back if these avenues don't solve your issues and we can
>>> >> dive a little deeper if needed.
>>> >>
>>> >> [1] https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html
>>> >> [2] https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#configuration-best-practices
>>> >>
>>> >> On Tue, Mar 28, 2017 at 7:46 AM, James McMahon <[email protected]>
>>> >> wrote:
>>> >>>
>>> >>> Hi Aldrin. Yes sir, of course: my environment is NiFi v0.7. I have my
>>> >>> content, flowfile, and provenance repositories on separate independent
>>> >>> disk devices. In my nifi.properties file,
>>> >>> nifi.flowfile.repository.partitions equals 256, and always.sync is
>>> >>> false. My nifi.queue.swap.threshold is 20000. Since I am currently in
>>> >>> development and so this is not a production process, I have set
>>> >>> nifi.flowcontroller.autoResumeState to false. In conf/bootstrap.conf,
>>> >>> my JVM memory settings are -Xms1024m and -Xmx4096m.
>>> >>>
>>> >>> In fact I have not yet applied the best practices from the Sys Admin
>>> >>> Guide.
>>> >>> I will speak with them about doing this today. I am a little hesitant
>>> >>> to just jump into making the seven system changes you detail. NiFi does
>>> >>> run on this box, but so do other processes that may be impacted. What's
>>> >>> good for NiFi may not be good for these other processes, and so I want
>>> >>> to ask first.
>>> >>>
>>> >>> My scripts employ a Python stream callback to grab values from select
>>> >>> attributes, populate those into a Python dictionary object, generate a
>>> >>> json object from that dictionary object, and replace the flowfile
>>> >>> contents with that dictionary object. These scripts are called by
>>> >>> ExecuteScript processors. Similar scripts are used at various points
>>> >>> throughout my workflow, near the end of each branch. Those had been
>>> >>> working without any problems until I tried to introduce Python logging
>>> >>> yesterday. I suspect I am not releasing file handler resources and
>>> >>> logger objects as flowfiles flow through these ExecuteScript
>>> >>> processors - maybe? I really am only making educated guesses at this
>>> >>> stage. My first objective today is to get NiFi to come back up.
>>> >>>
>>> >>> Please tell me: while I am in a dev state right now, had I been in a
>>> >>> production state what would have been the repercussions of deleting in
>>> >>> its entirety the flowfile_repository, which includes all its journal
>>> >>> files?
>>> >>>
>>> >>> Thanks very much in advance for your help.
>>> >>>
>>> >>> Jim
>>> >>>
>>> >>> On Tue, Mar 28, 2017 at 6:57 AM, Aldrin Piri <[email protected]>
>>> >>> wrote:
>>> >>>>
>>> >>>> Hi Jim,
>>> >>>>
>>> >>>> In getting to the root cause, could you please provide information on
>>> >>>> your environment? Did you apply the best practices listed in the
>>> >>>> System Administrator's guide? Could you provide some details on what
>>> >>>> your scripts are doing?
>>> >>>>
>>> >>>> If the data is not of importance, removing the Flowfile Repo should
>>> >>>> get you going. You can additionally remove the content repo, but this
>>> >>>> should be cleaned up by the framework as no flowfiles will point to
>>> >>>> said content.
>>> >>>>
>>> >>>>
>>> >>>> Aldrin Piri
>>> >>>> Sent from my mobile device.
>>> >>>>
>>> >>>> On Mar 28, 2017, at 06:12, James McMahon <[email protected]> wrote:
>>> >>>>
>>> >>>> I noticed, too, that I have many partitions, partition-0 to
>>> >>>> partition-255 to be exact. These all have journal files in them. So I
>>> >>>> suspect that the journal file I cited is not specifically the problem
>>> >>>> in and of itself, but instead is the point where the allowable open
>>> >>>> files threshold is reached. I'm wondering if I have to recover by
>>> >>>> deleting all these partitions? -Jim
>>> >>>>
>>> >>>> On Tue, Mar 28, 2017 at 5:58 AM, James McMahon <[email protected]>
>>> >>>> wrote:
>>> >>>>>
>>> >>>>> While trying to use Python logging from two scripts I call via two
>>> >>>>> independent ExecuteScript processors, I seem to have inadvertently
>>> >>>>> created a condition where I have too many files open. This is causing
>>> >>>>> a serious challenge for me, because when I attempt to start nifi
>>> >>>>> (v0.7.1) it fails.
>>> >>>>>
>>> >>>>> The log indicates that the flow controller cannot be started, and it
>>> >>>>> cites the cause as this:
>>> >>>>> org.apache.nifi.web.NiFiCoreException: Unable to start Flow Controller
>>> >>>>> .
>>> >>>>> . (many stack trace entries)
>>> >>>>> .
>>> >>>>> Caused by: java.nio.file.FileSystemException:
>>> >>>>> /mnt/flow_repo/flowfile_repository/partition-86/83856.journal: Too many
>>> >>>>> files open
>>> >>>>>
>>> >>>>> In a situation like this, what is the best practice for recovery? Is
>>> >>>>> it permissible to simply delete this journal file? What are the
>>> >>>>> negative repercussions of doing that?
>>> >>>>>
>>> >>>>> I did already try deleting my provenance_repository, but that did not
>>> >>>>> allow nifi to restart. (NiFi did re-establish my provenance_repository
>>> >>>>> at restart).
>>> >>>>>
>>> >>>>> Thanks very much in advance for your help. -Jim
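On the suspicion raised in the thread that the Python logging was not releasing file handlers: a minimal sketch of one way to avoid piling up handlers, assuming the script body runs once per flowfile and that the logging module's state survives between runs inside the script engine (an assumption about the ExecuteScript environment, not something confirmed in this thread). The function names get_flow_logger and release_flow_logger, the logger name, and the log path are all hypothetical.

```python
import logging


def get_flow_logger(name, path):
    """Return a logger that carries exactly one FileHandler.

    logging.getLogger returns the same object for the same name, so the
    handler is attached only on the first call; later calls reuse it
    instead of opening the log file again.
    """
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.FileHandler(path)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger


def release_flow_logger(logger):
    """Close and detach every handler so the underlying file is released."""
    for handler in list(logger.handlers):
        handler.close()
        logger.removeHandler(handler)
```

Calling logging.FileHandler(...) and addHandler(...) unconditionally on every run would open the file again each time, which is one plausible way to run into the open-files limit; the guard on logger.handlers is what prevents that here.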
