I do agree. Glad you're back on track.
On Tue, Mar 28, 2017 at 9:44 AM, James McMahon <[email protected]> wrote:
> *yikes*! Message received. We have now done all but one of them. That one
> being Set How long Sockets Stay in a TIMED_WAIT State When Closed. According
> to our system administrator we are unable to do this because - and I'm
> paraphrasing here - we do not have all the necessary components or libraries
> installed.
>
> We did do all the others. Every one. The lone one we did not do does not
> sound like a showstopper. Please do let me know if you disagree. -Jim
>
> On Tue, Mar 28, 2017 at 9:37 AM, Joe Witt <[email protected]> wrote:
>>
>> jim - definitely take the time to walk through the best practices
>> guide. Some are more like "if you don't do this it will probably kill
>> the process - practices".
>>
>> On Tue, Mar 28, 2017 at 9:27 AM, James McMahon <[email protected]>
>> wrote:
>> > I have been able to bring the NiFi UI back up with this change to the
>> > limit on number of open files. Thank you all very much for your help
>> > and insights.
>> > -Jim
>> >
>> > On Tue, Mar 28, 2017 at 8:51 AM, James McMahon <[email protected]>
>> > wrote:
>> >>
>> >> Thank you Aldrin. I do have AutoResumeState set to false currently. The
>> >> start of my jetty server fails when it tries to start the flowfile
>> >> controller. I can't bring the UI up at all. I'm hoping that the system
>> >> parm changes allow me to restart NiFi without blowing away my
>> >> flowfile_repository. I'll certainly let you know how that plays out.
>> >> -Jim
>> >>
>> >> On Tue, Mar 28, 2017 at 8:46 AM, Aldrin Piri <[email protected]>
>> >> wrote:
>> >>>
>> >>> Jim,
>> >>>
>> >>> In terms of trying to ease NiFi at start up, you could also try
>> >>> setting nifi.flowcontroller.autoResumeState to false in your
>> >>> nifi.properties. Depending on how your flow and scripts are
>> >>> constructed, this may allow you to piecewise alleviate any large
>> >>> queues/processing of files that could be causing the issue at hand.
>> >>> You could additionally bypass the possible troublesome script
>> >>> processors to cache this data to disk elsewhere as a stop gap measure.
>> >>>
>> >>> On Tue, Mar 28, 2017 at 8:17 AM, Joe Witt <[email protected]> wrote:
>> >>>>
>> >>>> Jim,
>> >>>>
>> >>>> It is very possible/likely that correcting the number of file handles
>> >>>> linux allows a process to have will get nifi back on track.
>> >>>>
>> >>>> Thanks
>> >>>> Joe
>> >>>>
>> >>>> On Tue, Mar 28, 2017 at 8:13 AM, James McMahon <[email protected]>
>> >>>> wrote:
>> >>>> > No apology necessary Aldrin. I'm much obliged to you and to Joe for
>> >>>> > all your help. My game plan is as follows:
>> >>>> > 1- speak with the admin of my Linux box about executing all the sys
>> >>>> > admin "best practice" changes
>> >>>> > 2- barring doing them all, at minimum increase max permitted open
>> >>>> > files from 1024 to 50000
>> >>>> > 3- reboot my Linux box, and then attempt to start NiFi
>> >>>> > 4- if 3 fails, rm -rf ./flowfile_repository on this, my dev box.
>> >>>> > Start nifi, get in there, and eliminate that Python logging. Find
>> >>>> > another way to log results to a system file, perhaps using a NiFi
>> >>>> > processor.
>> >>>> >
>> >>>> > - Jim
>> >>>> >
>> >>>> > On Tue, Mar 28, 2017 at 7:54 AM, Aldrin Piri <[email protected]>
>> >>>> > wrote:
>> >>>> >>
>> >>>> >> Hi Jim,
>> >>>> >>
>> >>>> >> Apologies for terse response earlier, was typing from phone.
>> >>>> >>
>> >>>> >> I am assuming you are on a Linux system.
>> >>>> >>
>> >>>> >> First and foremost, do check out the Sys Admin guide [1]. In
>> >>>> >> particular, scope out the best practices [2] for configuration
>> >>>> >> which will have you increase your open file handles.
>> >>>> >>
>> >>>> >> I do suspect that your hunches are correct, and while this will
>> >>>> >> aid and maybe avoid the issue, getting those resources properly
>> >>>> >> closed out will be the right thing to track down.
>> >>>> >>
>> >>>> >> Regardless of state, production or dev, there are certainly ways
>> >>>> >> to manage this a bit more and work files through in an iterative
>> >>>> >> manner.
>> >>>> >>
>> >>>> >> Please report back if these avenues don't solve your issues and we
>> >>>> >> can dive a little deeper if needed.
>> >>>> >>
>> >>>> >> [1]
>> >>>> >> https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html
>> >>>> >> [2]
>> >>>> >> https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#configuration-best-practices
>> >>>> >>
>> >>>> >> On Tue, Mar 28, 2017 at 7:46 AM, James McMahon
>> >>>> >> <[email protected]> wrote:
>> >>>> >>>
>> >>>> >>> Hi Aldrin. Yes sir, of course: my environment is NiFi v0.7. I
>> >>>> >>> have my content, flowfile, and provenance repositories on
>> >>>> >>> separate independent disk devices. In my nifi.properties file,
>> >>>> >>> nifi.flowfile.repository.partitions equals 256, and always.sync
>> >>>> >>> is false. My nifi.queue.swap.threshold is 20000. Since I am
>> >>>> >>> currently in development and so this is not a production
>> >>>> >>> process, I have set nifi.flowcontroller.autoResumeState to
>> >>>> >>> false. In conf/bootstrap.conf, my JVM memory settings are
>> >>>> >>> -Xms1024m and -Xmx4096m.
>> >>>> >>>
>> >>>> >>> In fact I have not yet applied the best practices from the Sys
>> >>>> >>> Admin Guide. I will speak with them about doing this today. I am
>> >>>> >>> a little hesitant to just jump into making the seven system
>> >>>> >>> changes you detail. NiFi does run on this box, but so do other
>> >>>> >>> processes that may be impacted. What's good for NiFi may not be
>> >>>> >>> good for these other processes, and so I want to ask first.
>> >>>> >>>
>> >>>> >>> My scripts employ a Python stream callback to grab values from
>> >>>> >>> select attributes, populate those into a Python dictionary
>> >>>> >>> object, generate a json object from that dictionary object, and
>> >>>> >>> replace the flowfile contents with that dictionary object. These
>> >>>> >>> scripts are called by ExecuteScript processors. Similar scripts
>> >>>> >>> are used at various points throughout my workflow, near the end
>> >>>> >>> of each branch. Those had been working without any problems
>> >>>> >>> until I tried to introduce Python logging yesterday. I suspect I
>> >>>> >>> am not releasing file handler resources and logger objects as
>> >>>> >>> flowfiles flow through these ExecuteScript processors - maybe? I
>> >>>> >>> really am only making educated guesses at this stage. My first
>> >>>> >>> objective today is to get NiFi to come back up.
>> >>>> >>>
>> >>>> >>> Please tell me: while I am in a dev state right now, had I been
>> >>>> >>> in a production state what would have been the repercussions of
>> >>>> >>> deleting in its entirety the flowfile_repository, which includes
>> >>>> >>> all its journal files?
>> >>>> >>>
>> >>>> >>> Thanks very much in advance for your help.
>> >>>> >>>
>> >>>> >>> Jim
>> >>>> >>>
>> >>>> >>> On Tue, Mar 28, 2017 at 6:57 AM, Aldrin Piri
>> >>>> >>> <[email protected]> wrote:
>> >>>> >>>>
>> >>>> >>>> Hi Jim,
>> >>>> >>>>
>> >>>> >>>> In getting to the root cause, could you please provide
>> >>>> >>>> information on your environment? Did you apply the best
>> >>>> >>>> practices listed in the System Administrator's guide? Could you
>> >>>> >>>> provide some details on what your scripts are doing?
>> >>>> >>>>
>> >>>> >>>> If the data is not of importance, removing the Flowfile Repo
>> >>>> >>>> should get you going. You can additionally remove the content
>> >>>> >>>> repo, but this should be cleaned up by the framework as no
>> >>>> >>>> flowfiles will point to said content.
>> >>>> >>>>
>> >>>> >>>>
>> >>>> >>>> Aldrin Piri
>> >>>> >>>> Sent from my mobile device.
>> >>>> >>>>
>> >>>> >>>> On Mar 28, 2017, at 06:12, James McMahon <[email protected]>
>> >>>> >>>> wrote:
>> >>>> >>>>
>> >>>> >>>> I noticed, too, that I have many partitions, partition-0 to
>> >>>> >>>> partition-255 to be exact. These all have journal files in them.
>> >>>> >>>> So I suspect that the journal file I cited is not specifically
>> >>>> >>>> the problem in and of itself, but instead is the point where the
>> >>>> >>>> allowable open files threshold is reached. I'm wondering if I
>> >>>> >>>> have to recover by deleting all these partitions? -Jim
>> >>>> >>>>
>> >>>> >>>> On Tue, Mar 28, 2017 at 5:58 AM, James McMahon
>> >>>> >>>> <[email protected]> wrote:
>> >>>> >>>>>
>> >>>> >>>>> While trying to use Python logging from two scripts I call via
>> >>>> >>>>> two independent ExecuteScript processors, I seem to have
>> >>>> >>>>> inadvertently created a condition where I have too many files
>> >>>> >>>>> open. This is causing a serious challenge for me, because when
>> >>>> >>>>> I attempt to start nifi (v0.7.1) it fails.
>> >>>> >>>>>
>> >>>> >>>>> The log indicates that the flow controller cannot be started,
>> >>>> >>>>> and it cites the cause as this:
>> >>>> >>>>> org.apache.nifi.web.NiFiCoreException: Unable to start Flow
>> >>>> >>>>> Controller
>> >>>> >>>>> .
>> >>>> >>>>> . (many stack trace entries)
>> >>>> >>>>> .
>> >>>> >>>>> Caused by: java.nio.file.FileSystemException:
>> >>>> >>>>> /mnt/flow_repo/flowfile_repository/partition-86/83856.journal:
>> >>>> >>>>> Too many files open
>> >>>> >>>>>
>> >>>> >>>>> In a situation like this, what is the best practice for
>> >>>> >>>>> recovery? Is it permissible to simply delete this journal file?
>> >>>> >>>>> What are the negative repercussions of doing that?
>> >>>> >>>>>
>> >>>> >>>>> I did already try deleting my provenance_repository, but that
>> >>>> >>>>> did not allow nifi to restart. (NiFi did re-establish my
>> >>>> >>>>> provenance_repository at restart).
>> >>>> >>>>>
>> >>>> >>>>> Thanks very much in advance for your help. -Jim
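
For anyone finding this thread later: the suspicion raised above is that the scripts opened a new log file handle for every flowfile without ever releasing it. The thread does not confirm that this was the actual cause, so what follows is only a minimal sketch of how that kind of leak is usually avoided with Python's standard logging module. The function names (get_flow_logger, close_flow_logger), the logger name, and the log path are hypothetical and are not taken from Jim's scripts.

```python
import logging
import os
import tempfile


def get_flow_logger(name, log_path):
    """Return a logger that has exactly one FileHandler for log_path.

    logging.getLogger(name) hands back the same logger object on every
    call, so adding a fresh FileHandler each time a script runs would
    open one more file descriptor per run. Reusing an existing handler
    avoids that.
    """
    logger = logging.getLogger(name)
    target = os.path.abspath(log_path)
    for handler in logger.handlers:
        if isinstance(handler, logging.FileHandler) and handler.baseFilename == target:
            return logger  # already configured; reuse the open file handle
    handler = logging.FileHandler(target)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


def close_flow_logger(logger):
    """Detach and close every handler so its file descriptor is released."""
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()


# Hypothetical demo: simulate 1000 script invocations against one log file.
log_path = os.path.join(tempfile.mkdtemp(), "flow.log")
for i in range(1000):
    log = get_flow_logger("nifi.script.demo", log_path)
    log.info("processed flowfile %d", i)

handler_count = len(log.handlers)  # one handler, not one per invocation
close_flow_logger(log)
```

Without the check in get_flow_logger, each of the 1000 iterations would attach its own FileHandler and hold its own open file, which is the kind of growth that can run into a per-process open-file limit such as the 1024 mentioned above.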
