Thank you, Aldrin. I do currently have nifi.flowcontroller.autoResumeState set to false. Startup of my Jetty server fails when it tries to start the Flow Controller, so I can't bring the UI up at all. I'm hoping that the system parameter changes will allow me to restart NiFi without blowing away my flowfile_repository. I'll certainly let you know how that plays out.

-Jim
On Tue, Mar 28, 2017 at 8:46 AM, Aldrin Piri <[email protected]> wrote:
> Jim,
>
> In terms of trying to ease NiFi at startup, you could also try setting
> nifi.flowcontroller.autoResumeState to false in your nifi.properties.
> Depending on how your flow and scripts are constructed, this may allow you
> to piecewise alleviate any large queues/processing of files that could be
> causing the issue at hand. You could additionally bypass the possibly
> troublesome script processors to cache this data to disk elsewhere as a
> stopgap measure.
>
> On Tue, Mar 28, 2017 at 8:17 AM, Joe Witt <[email protected]> wrote:
>> Jim,
>>
>> It is very possible, even likely, that correcting the number of file
>> handles Linux allows a process to have will get NiFi back on track.
>>
>> Thanks
>> Joe
>>
>> On Tue, Mar 28, 2017 at 8:13 AM, James McMahon <[email protected]> wrote:
>>> No apology necessary, Aldrin. I'm much obliged to you and to Joe for all
>>> your help. My game plan is as follows:
>>> 1- Speak with the admin of my Linux box about executing all the sys admin
>>>    "best practice" changes.
>>> 2- Barring doing them all, at minimum increase the max permitted open
>>>    files from 1024 to 50000.
>>> 3- Reboot my Linux box, and then attempt to start NiFi.
>>> 4- If 3 fails, rm -rf ./flowfile_repository on this, my dev box. Start
>>>    NiFi, get in there, and eliminate that Python logging. Find another way
>>>    to log results to a system file, perhaps using a NiFi processor.
>>>
>>> - Jim
>>>
>>> On Tue, Mar 28, 2017 at 7:54 AM, Aldrin Piri <[email protected]> wrote:
>>>> Hi Jim,
>>>>
>>>> Apologies for the terse response earlier; I was typing from my phone.
>>>>
>>>> I am assuming you are on a Linux system.
>>>>
>>>> First and foremost, do check out the Sys Admin guide [1]. In particular,
>>>> scope out the best practices [2] for configuration, which will have you
>>>> increase your open file handles.
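Before and after the admin raises the limit (step 2 of the plan above), it helps to confirm what open-file limit a process actually gets. The following is a minimal sketch using Python's standard `resource` module (Linux/Unix only). It reads the soft and hard `RLIMIT_NOFILE` values for the process that runs it and compares the soft value, roughly what `ulimit -n` shows, with the 50000 target. `TARGET_NOFILE`, `open_file_limits`, and `meets_target` are hypothetical names chosen for illustration. The script reports only its own limits, so treat it as a sanity check run as the same user and under the same login/service setup as NiFi, not as proof of what the running NiFi JVM received. The lasting change belongs in the system configuration that the admin guide's best practices describe.

```python
import resource

# Hypothetical constant: the target from step 2 of the plan (1024 -> 50000).
TARGET_NOFILE = 50000


def open_file_limits():
    """Return the (soft, hard) RLIMIT_NOFILE values for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def meets_target(target=TARGET_NOFILE):
    """True if the soft limit already allows at least `target` open files."""
    soft, _hard = open_file_limits()
    # RLIM_INFINITY means the process has no open-file limit at all.
    return soft == resource.RLIM_INFINITY or soft >= target


if __name__ == "__main__":
    soft, hard = open_file_limits()
    print("soft limit:", soft, "hard limit:", hard)
    print("meets %d target:" % TARGET_NOFILE, meets_target())
```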
>>>>
>>>> I do suspect that your hunches are correct. While this will aid and
>>>> maybe avoid the issue, getting those resources properly closed out will
>>>> be the right thing to track down.
>>>>
>>>> Regardless of state, production or dev, there are certainly ways to
>>>> manage this a bit more and work files through in an iterative manner.
>>>>
>>>> Please report back if these avenues don't solve your issues, and we can
>>>> dive a little deeper if needed.
>>>>
>>>> [1] https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html
>>>> [2] https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#configuration-best-practices
>>>>
>>>> On Tue, Mar 28, 2017 at 7:46 AM, James McMahon <[email protected]> wrote:
>>>>> Hi Aldrin. Yes sir, of course: my environment is NiFi v0.7. I have my
>>>>> content, flowfile, and provenance repositories on separate, independent
>>>>> disk devices. In my nifi.properties file,
>>>>> nifi.flowfile.repository.partitions equals 256, and
>>>>> nifi.flowfile.repository.always.sync is false. My
>>>>> nifi.queue.swap.threshold is 20000. Since I am currently in development
>>>>> and this is not a production process, I have set
>>>>> nifi.flowcontroller.autoResumeState to false. In conf/bootstrap.conf, my
>>>>> JVM memory settings are -Xms1024m and -Xmx4096m.
>>>>>
>>>>> In fact, I have not yet applied the best practices from the Sys Admin
>>>>> Guide. I will speak with the admin about doing this today. I am a little
>>>>> hesitant to just jump into making the seven system changes you detail.
>>>>> NiFi does run on this box, but so do other processes that may be
>>>>> impacted. What's good for NiFi may not be good for these other
>>>>> processes, so I want to ask first.
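To double-check the settings listed above against the file NiFi actually reads, a small reader can pull out just those keys. This is a simplified, hypothetical helper (`read_properties`, `report`, `KEYS_OF_INTEREST` are illustrative names). It handles only `key=value` lines and `#`/`!` comments, not the full `java.util.Properties` syntax: there are no `:` separators and no continuation lines. That should be enough for a file written in plain `key=value` form. The embedded sample simply restates the values quoted in the message.

```python
import io

# Property keys mentioned in this thread.
KEYS_OF_INTEREST = (
    "nifi.flowcontroller.autoResumeState",
    "nifi.flowfile.repository.partitions",
    "nifi.flowfile.repository.always.sync",
    "nifi.queue.swap.threshold",
)


def read_properties(stream):
    """Parse simple key=value lines into a dict.

    Simplified on purpose: skips blank lines and '#'/'!' comments, but does
    not implement ':' separators or backslash line continuations.
    """
    props = {}
    for raw in stream:
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def report(props, keys=KEYS_OF_INTEREST):
    """Map each key of interest to its value, or None if it is not set."""
    return {key: props.get(key) for key in keys}


# Sample input restating the values from the message above.
sample = io.StringIO(
    "# excerpt of nifi.properties\n"
    "nifi.flowcontroller.autoResumeState=false\n"
    "nifi.flowfile.repository.partitions=256\n"
    "nifi.flowfile.repository.always.sync=false\n"
    "nifi.queue.swap.threshold=20000\n"
)
settings = report(read_properties(sample))
```

Against a real install, assuming it is run from the NiFi home directory, something like `with open("conf/nifi.properties") as f: print(report(read_properties(f)))` would show the same keys. A `None` value means the key is absent and NiFi falls back to its default.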
>>>>>
>>>>> My scripts employ a Python stream callback to grab values from select
>>>>> attributes, populate those into a Python dictionary object, generate a
>>>>> JSON object from that dictionary object, and replace the flowfile
>>>>> contents with that JSON. These scripts are called by ExecuteScript
>>>>> processors. Similar scripts are used at various points throughout my
>>>>> workflow, near the end of each branch. They had been working without
>>>>> any problems until I tried to introduce Python logging yesterday. I
>>>>> suspect I am not releasing file handler resources and logger objects as
>>>>> flowfiles flow through these ExecuteScript processors, but I really am
>>>>> only making educated guesses at this stage. My first objective today is
>>>>> to get NiFi to come back up.
>>>>>
>>>>> Please tell me: while I am in a dev state right now, had I been in a
>>>>> production state, what would have been the repercussions of deleting
>>>>> the flowfile_repository in its entirety, including all its journal
>>>>> files?
>>>>>
>>>>> Thanks very much in advance for your help.
>>>>>
>>>>> Jim
>>>>>
>>>>> On Tue, Mar 28, 2017 at 6:57 AM, Aldrin Piri <[email protected]> wrote:
>>>>>> Hi Jim,
>>>>>>
>>>>>> In getting to the root cause, could you please provide information on
>>>>>> your environment? Did you apply the best practices listed in the
>>>>>> System Administrator's Guide? Could you provide some details on what
>>>>>> your scripts are doing?
>>>>>>
>>>>>> If the data is not of importance, removing the flowfile repo should get
>>>>>> you going. You can additionally remove the content repo, but it should
>>>>>> be cleaned up by the framework anyway, since no flowfiles will point to
>>>>>> that content.
>>>>>>
>>>>>> Aldrin Piri
>>>>>> Sent from my mobile device.
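Jim's hunch about unreleased file handlers matches a common Python `logging` pitfall. Within one interpreter, `logging.getLogger(name)` returns the same logger object on every call. A script that creates and attaches a new `FileHandler` each time it runs therefore keeps every earlier handler alive, along with its open file.

The sketch below uses standard CPython to contrast that pattern with attaching the handler only once. It assumes the interpreter state, and with it the `logging` module, persists across ExecuteScript invocations. That assumption is consistent with the symptoms but is not confirmed in this thread. The logger names, the log path, and all helper functions are hypothetical. `attributes_to_json` only mirrors the attribute-to-JSON step described above; it is not the actual NiFi stream callback.

```python
import json
import logging
import os
import tempfile

# Hypothetical log location, for illustration only.
LOG_PATH = os.path.join(tempfile.gettempdir(), "flow_results.log")


def leaky_get_logger(path=LOG_PATH):
    """Anti-pattern: attach a brand-new FileHandler on every call.

    getLogger returns the same logger each time, so handlers (and their
    open file descriptors) pile up with every flowfile processed.
    """
    logger = logging.getLogger("flow_results_leaky")
    logger.addHandler(logging.FileHandler(path))
    return logger


def get_logger(path=LOG_PATH):
    """Attach a single FileHandler the first time; reuse it afterwards."""
    logger = logging.getLogger("flow_results")
    if not logger.handlers:
        handler = logging.FileHandler(path)
        handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger


def close_handlers(logger):
    """Detach and close every handler, releasing its file descriptor."""
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()


def attributes_to_json(attributes, keys):
    """Build a JSON document from the selected attributes that are present."""
    return json.dumps({k: attributes[k] for k in keys if k in attributes})
```

Simulating 50 flowfiles, `leaky_get_logger` ends up holding 50 handlers and 50 open files, while `get_logger` holds one. Alternatively, step 4 of the plan above could avoid Python file handlers entirely. ExecuteScript also binds the processor's own logger to the script as `log`, which routes messages through NiFi's logging.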
>>>>>>
>>>>>> On Mar 28, 2017, at 06:12, James McMahon <[email protected]> wrote:
>>>>>>
>>>>>> I noticed, too, that I have many partitions, partition-0 to
>>>>>> partition-255 to be exact. These all have journal files in them. So I
>>>>>> suspect that the journal file I cited is not specifically the problem
>>>>>> in and of itself, but is instead the point where the allowable open
>>>>>> files threshold is reached. I'm wondering if I have to recover by
>>>>>> deleting all these partitions? -Jim
>>>>>>
>>>>>> On Tue, Mar 28, 2017 at 5:58 AM, James McMahon <[email protected]> wrote:
>>>>>>> While trying to use Python logging from two scripts I call via two
>>>>>>> independent ExecuteScript processors, I seem to have inadvertently
>>>>>>> created a condition where I have too many files open. This is causing
>>>>>>> a serious challenge for me, because when I attempt to start NiFi
>>>>>>> (v0.7.1), it fails.
>>>>>>>
>>>>>>> The log indicates that the flow controller cannot be started, and it
>>>>>>> cites the cause as this:
>>>>>>>
>>>>>>> org.apache.nifi.web.NiFiCoreException: Unable to start Flow Controller
>>>>>>> .
>>>>>>> . (many stack trace entries)
>>>>>>> .
>>>>>>> Caused by: java.nio.file.FileSystemException:
>>>>>>> /mnt/flow_repo/flowfile_repository/partition-86/83856.journal: Too many files open
>>>>>>>
>>>>>>> In a situation like this, what is the best practice for recovery? Is
>>>>>>> it permissible to simply delete this journal file? What are the
>>>>>>> negative repercussions of doing that?
>>>>>>>
>>>>>>> I did already try deleting my provenance_repository, but that did not
>>>>>>> allow NiFi to restart. (NiFi did re-establish my provenance_repository
>>>>>>> at restart.)
>>>>>>>
>>>>>>> Thanks very much in advance for your help. -Jim
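Jim suspects that partition-86's journal is simply where the limit ran out. One way to sanity-check that is to count what is on disk. The sketch below assumes the `partition-N/<id>.journal` layout described above. `count_journals`, `headroom`, and `make_sample_repo` are hypothetical helpers; the last one builds a throwaway directory so the others can be tried without touching a real repository. The comparison with the soft limit is only a rough indicator. It assumes the journal files add to the descriptor count during startup, as Jim suspects, while the content and provenance repositories, sockets, and libraries hold descriptors of their own.

```python
import os
import resource
import tempfile


def count_journals(repo_dir):
    """Return (partition_dirs, journal_files) under a flowfile repository.

    Assumes the layout <repo>/partition-N/<id>.journal.
    """
    partitions = 0
    journals = 0
    for name in os.listdir(repo_dir):
        path = os.path.join(repo_dir, name)
        if name.startswith("partition-") and os.path.isdir(path):
            partitions += 1
            journals += sum(1 for f in os.listdir(path) if f.endswith(".journal"))
    return partitions, journals


def headroom(journals):
    """Soft open-file limit minus the journal count; None if unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return None
    return soft - journals


def make_sample_repo(partitions=2, journals_per_partition=2):
    """Hypothetical helper: build a throwaway directory mimicking the layout."""
    root = tempfile.mkdtemp(prefix="flowfile_repository_")
    for p in range(partitions):
        part = os.path.join(root, "partition-%d" % p)
        os.makedirs(part)
        for j in range(journals_per_partition):
            open(os.path.join(part, "%d.journal" % j), "w").close()
        # A non-journal file that should not be counted.
        open(os.path.join(part, "notes.txt"), "w").close()
    # A directory that is not a partition and should be ignored.
    os.makedirs(os.path.join(root, "other"))
    return root
```

On the box in question, `count_journals("/mnt/flow_repo/flowfile_repository")` would use the path from the stack trace above. If the count alone approaches the soft limit reported by `headroom`, that supports raising the limit before deleting anything.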
