*Yikes*! Message received. We have now done all but one of them, that one being "Set how long sockets stay in a TIMED_WAIT state when closed". According to our system administrator, we are unable to do this because - and I'm paraphrasing here - we do not have all the necessary components or libraries installed.
We did do all the others. Every one. The lone one we did not do does not sound like a showstopper. Please do let me know if you disagree.

-Jim

On Tue, Mar 28, 2017 at 9:37 AM, Joe Witt <[email protected]> wrote:
> jim - definitely take the time to walk through the best practices
> guide. Some are more like "if you don't do this it will probably kill
> the process - practices".
>
> On Tue, Mar 28, 2017 at 9:27 AM, James McMahon <[email protected]> wrote:
> > I have been able to bring Nifi UI back up with this change to the limit on
> > number of open files. Thank you all very much for your help and insights.
> > -Jim
> >
> > On Tue, Mar 28, 2017 at 8:51 AM, James McMahon <[email protected]> wrote:
> >>
> >> Thank you Aldrin. I do have AutoResumeState set to false currently. The
> >> start of my jetty server fails when it tries to start the flowfile
> >> controller. I can't bring the UI up at all. I'm hoping that the system parm
> >> changes allow me to restart NiFi without blowing away my
> >> flowfile_repository. I'll certainly let you know how that plays out. -Jim
> >>
> >> On Tue, Mar 28, 2017 at 8:46 AM, Aldrin Piri <[email protected]> wrote:
> >>>
> >>> Jim,
> >>>
> >>> In terms of trying to ease NiFi at start up, you could also try setting
> >>> nifi.flowcontroller.autoResumeState to false in your nifi.properties.
> >>> Depending on how your flow and scripts are constructed, this may allow you
> >>> to piecewise alleviate any large queues/processing of files that could be
> >>> causing the issue at hand. You could additionally bypass the possible
> >>> troublesome script processors to cache this data to disk elsewhere as a
> >>> stopgap measure.
> >>>
> >>> On Tue, Mar 28, 2017 at 8:17 AM, Joe Witt <[email protected]> wrote:
> >>>>
> >>>> Jim,
> >>>>
> >>>> It is very possible/likely that correcting the number of file handles
> >>>> linux allows a process to have will get nifi back on track.
> >>>>
> >>>> Thanks
> >>>> Joe
> >>>>
> >>>> On Tue, Mar 28, 2017 at 8:13 AM, James McMahon <[email protected]> wrote:
> >>>> > No apology necessary Aldrin. I'm much obliged to you and to Joe for all your
> >>>> > help. My game plan is as follows:
> >>>> > 1- speak with the admin of my Linux box about executing all the sys admin
> >>>> > "best practice" changes
> >>>> > 2- barring doing them all, at minimum increase max permitted open files from
> >>>> > 1024 to 50000
> >>>> > 3- reboot my Linux box, and then attempt to start NiFi
> >>>> > 4- if 3 fails, rm -rf ./flowfile_repository on this, my dev box. Start nifi,
> >>>> > get in there, and eliminate that Python logging. Find another way to log
> >>>> > results to a system file, perhaps using a NiFi processor.
> >>>> >
> >>>> > - Jim
> >>>> >
> >>>> > On Tue, Mar 28, 2017 at 7:54 AM, Aldrin Piri <[email protected]> wrote:
> >>>> >>
> >>>> >> Hi Jim,
> >>>> >>
> >>>> >> Apologies for terse response earlier, was typing from phone.
> >>>> >>
> >>>> >> I am assuming you are on a Linux system.
> >>>> >>
> >>>> >> First and foremost, do check out the Sys Admin guide [1]. In particular,
> >>>> >> scope out the best practices [2] for configuration which will have you
> >>>> >> increase your open file handles.
> >>>> >>
> >>>> >> I do suspect that your hunches are correct, and while this will aid and
> >>>> >> maybe avoid the issue, getting those resources properly closed out will be
> >>>> >> the right thing to track down.
> >>>> >>
> >>>> >> Regardless of state, production or dev, there are certainly ways to manage
> >>>> >> this a bit more and work files through in an iterative manner.
> >>>> >>
> >>>> >> Please report back if these avenues don't solve your issues and we can
> >>>> >> dive a little deeper if needed.
> >>>> >>
> >>>> >> [1] https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html
> >>>> >> [2] https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#configuration-best-practices
> >>>> >>
> >>>> >> On Tue, Mar 28, 2017 at 7:46 AM, James McMahon <[email protected]> wrote:
> >>>> >>>
> >>>> >>> Hi Aldrin. Yes sir, of course: my environment is NiFi v0.7. I have my
> >>>> >>> content, flowfile, and provenance repositories on separate independent disk
> >>>> >>> devices. In my nifi.properties file, nifi.flowfile.repository.partitions
> >>>> >>> equals 256, and always.sync is false. My nifi.queue.swap.threshold is 20000.
> >>>> >>> Since I am currently in development and so this is not a production process,
> >>>> >>> I have set nifi.flowcontroller.autoResumeState to false. In
> >>>> >>> conf/bootstrap.conf, my JVM memory settings are -Xms1024m and -Xmx4096m.
> >>>> >>>
> >>>> >>> In fact I have not yet applied the best practices from the Sys Admin
> >>>> >>> Guide. I will speak with them about doing this today. I am a little hesitant
> >>>> >>> to just jump into making the seven system changes you detail. NiFi does run
> >>>> >>> on this box, but so do other processes that may be impacted. What's good for
> >>>> >>> NiFi may not be good for these other processes, and so I want to ask first.
> >>>> >>>
> >>>> >>> My scripts employ a Python stream callback to grab values from select
> >>>> >>> attributes, populate those into a Python dictionary object, generate a json
> >>>> >>> object from that dictionary object, and replace the flowfile contents with
> >>>> >>> that dictionary object. These scripts are called by ExecuteScript
> >>>> >>> processors.
> >>>> >>> Similar scripts are used at various points throughout my
> >>>> >>> workflow, near the end of each branch. Those had been working without any
> >>>> >>> problems until I tried to introduce Python logging yesterday. I suspect I am
> >>>> >>> not releasing file handler resources and logger objects as flowfiles flow
> >>>> >>> through these ExecuteScript processors - maybe? I really am only making
> >>>> >>> educated guesses at this stage. My first objective today is to get NiFi to
> >>>> >>> come back up.
> >>>> >>>
> >>>> >>> Please tell me: while I am in a dev state right now, had I been in a
> >>>> >>> production state what would have been the repercussions of deleting in its
> >>>> >>> entirety the flowfile_repository, which includes all its journal files?
> >>>> >>>
> >>>> >>> Thanks very much in advance for your help.
> >>>> >>>
> >>>> >>> Jim
> >>>> >>>
> >>>> >>> On Tue, Mar 28, 2017 at 6:57 AM, Aldrin Piri <[email protected]> wrote:
> >>>> >>>>
> >>>> >>>> Hi Jim,
> >>>> >>>>
> >>>> >>>> In getting to the root cause, could you please provide information on
> >>>> >>>> your environment? Did you apply the best practices listed in the System
> >>>> >>>> Administrator's guide? Could you provide some details on what your scripts
> >>>> >>>> are doing?
> >>>> >>>>
> >>>> >>>> If the data is not of importance, removing the Flowfile Repo should get
> >>>> >>>> you going. You can additionally remove the content repo, but this should be
> >>>> >>>> cleaned up by the framework as no flowfiles will point to said content.
> >>>> >>>>
> >>>> >>>>
> >>>> >>>> Aldrin Piri
> >>>> >>>> Sent from my mobile device.
> >>>> >>>>
> >>>> >>>> On Mar 28, 2017, at 06:12, James McMahon <[email protected]> wrote:
> >>>> >>>>
> >>>> >>>> I noticed, too, that I have many partitions, partition-0 to
> >>>> >>>> partition-255 to be exact. These all have journal files in them. So I
> >>>> >>>> suspect that the journal file I cited is not specifically the problem in and
> >>>> >>>> of itself, but instead is the point where the allowable open files threshold
> >>>> >>>> is reached. I'm wondering if I have to recover by deleting all these
> >>>> >>>> partitions? -Jim
> >>>> >>>>
> >>>> >>>> On Tue, Mar 28, 2017 at 5:58 AM, James McMahon <[email protected]> wrote:
> >>>> >>>>>
> >>>> >>>>> While trying to use Python logging from two scripts I call via two
> >>>> >>>>> independent ExecuteScript processors, I seem to have inadvertently created a
> >>>> >>>>> condition where I have too many files open. This is causing a serious
> >>>> >>>>> challenge for me, because when I attempt to start nifi (v0.7.1) it fails.
> >>>> >>>>>
> >>>> >>>>> The log indicates that the flow controller cannot be started, and it
> >>>> >>>>> cites the cause as this:
> >>>> >>>>> org.apache.nifi.web.NiFiCoreException: Unable to start Flow Controller
> >>>> >>>>> .
> >>>> >>>>> . (many stack trace entries)
> >>>> >>>>> .
> >>>> >>>>> Caused by: java.nio.file.FileSystemException: /mnt/flow_repo/flowfile_repository/partition-86/83856.journal: Too many files open
> >>>> >>>>>
> >>>> >>>>> In a situation like this, what is the best practice for recovery? Is it
> >>>> >>>>> permissible to simply delete this journal file? What are the negative
> >>>> >>>>> repercussions of doing that?
> >>>> >>>>>
> >>>> >>>>> I did already try deleting my provenance_repository, but that did not
> >>>> >>>>> allow nifi to restart. (NiFi did re-establish my provenance_repository at
> >>>> >>>>> restart).
> >>>> >>>>>
> >>>> >>>>> Thanks very much in advance for your help. -Jim
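
On the file-handle hunch quoted above: one common way this kind of leak arises is a script that attaches a new logging.FileHandler every time it runs, so each run opens the log file again and never closes it. The scripts themselves are not shown in this thread, so the following is only a minimal sketch under that assumption. The helper names get_logger and release_logger, the logger name, and the log path are all hypothetical; only Python's standard logging module is used.

```python
import logging


def get_logger(name, log_path):
    """Return a logger with a single FileHandler, however often this is called.

    Hypothetical helper: `name` and `log_path` are illustrative values,
    not taken from the scripts discussed in the thread.
    """
    logger = logging.getLogger(name)
    # logging.getLogger returns the same object for the same name, so any
    # handler added by an earlier call is still attached. Add one only if
    # the logger has none yet; otherwise each call would open the file again.
    if not logger.handlers:
        handler = logging.FileHandler(log_path)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger


def release_logger(logger):
    """Detach and close every handler so its file descriptor is released."""
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()
```

A script would call get_logger("my_script", "/some/path/my_script.log") at the top and log through the returned object. Whether the logger survives between ExecuteScript invocations depends on how the script engine is reused, which is not established here; if it does not survive, calling release_logger in a finally block at the end of each run is the more cautious choice.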
