Jim - definitely take the time to walk through the best practices guide. Some are more like "if you don't do this it will probably kill the process" practices.
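For what it's worth, here is a minimal sketch of how one might check the open-file limit from Python before and after the sysadmin change. The 50000 figure is simply the target Jim mentions further down the thread, and the function names are made up for illustration. Note that this reports the limit of the Python process running the check, which only reflects what NiFi gets if it is run as the same user in the same kind of session.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def meets_recommendation(soft, recommended=50000):
    """True if the soft limit is unlimited or at least `recommended`.

    The default of 50000 is the target quoted in this thread,
    not a value read from NiFi itself.
    """
    return soft == resource.RLIM_INFINITY or soft >= recommended


if __name__ == "__main__":
    soft, hard = open_file_limits()
    print("soft limit: %s, hard limit: %s" % (soft, hard))
    if not meets_recommendation(soft):
        print("soft limit is below the target discussed in this thread")
```

Run it as the account that starts NiFi, so the numbers are at least comparable to what the NiFi process inherits.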
On Tue, Mar 28, 2017 at 9:27 AM, James McMahon <[email protected]> wrote:
> I have been able to bring Nifi UI back up with this change to the limit on
> number of open files. Thank you all very much for your help and insights.
> -Jim
>
> On Tue, Mar 28, 2017 at 8:51 AM, James McMahon <[email protected]> wrote:
>>
>> Thank you Aldrin. I do have AutoResumeState set to false currently. The
>> start of my jetty server fails when it tries to start the flowfile
>> controller. I can't bring the UI up at all. I'm hoping that the system parm
>> changes allow me to restart NiFi without blowing away my
>> flowfile_repository. I'll certainly let you know how that plays out. -Jim
>>
>> On Tue, Mar 28, 2017 at 8:46 AM, Aldrin Piri <[email protected]> wrote:
>>>
>>> Jim,
>>>
>>> In terms of trying to ease NiFi at start up, you could also try setting
>>> nifi.flowcontroller.autoResumeState to false in your nifi.properties.
>>> Depending on how your flow and scripts are constructed, this may allow you
>>> to piecewise alleviate any large queues/processing of files that could be
>>> causing the issue at hand. You could additionally bypass the possible
>>> troublesome script processors to cache this data to disk elsewhere as a stop
>>> gap measure.
>>>
>>> On Tue, Mar 28, 2017 at 8:17 AM, Joe Witt <[email protected]> wrote:
>>>>
>>>> Jim,
>>>>
>>>> It is very possible/likely that correcting the number of file handles
>>>> linux allows a process to have will get nifi back on track.
>>>>
>>>> Thanks
>>>> Joe
>>>>
>>>> On Tue, Mar 28, 2017 at 8:13 AM, James McMahon <[email protected]>
>>>> wrote:
>>>> > No apology necessary Aldrin. I'm much obliged to you and to Joe for
>>>> > all your help.
>>>> > My game plan is as follows:
>>>> > 1- speak with the admin of my Linux box about executing all the sys
>>>> > admin "best practice" changes
>>>> > 2- barring doing them all, at minimum increase max permitted open
>>>> > files from 1024 to 50000
>>>> > 3- reboot my Linux box, and then attempt to start NiFi
>>>> > 4- if 3 fails, rm -rf ./flowfile_repository on this, my dev box. Start
>>>> > nifi, get in there, and eliminate that Python logging. Find another
>>>> > way to log results to a system file, perhaps using a NiFi processor.
>>>> >
>>>> > - Jim
>>>> >
>>>> > On Tue, Mar 28, 2017 at 7:54 AM, Aldrin Piri <[email protected]>
>>>> > wrote:
>>>> >>
>>>> >> Hi Jim,
>>>> >>
>>>> >> Apologies for terse response earlier, was typing from phone.
>>>> >>
>>>> >> I am assuming you are on a Linux system.
>>>> >>
>>>> >> First and foremost, do check out the Sys Admin guide [1]. In
>>>> >> particular, scope out the best practices [2] for configuration which
>>>> >> will have you increase your open file handles.
>>>> >>
>>>> >> I do suspect that your hunches are correct, and while this will aid
>>>> >> and maybe avoid the issue, getting those resources properly closed out
>>>> >> will be the right thing to track down.
>>>> >>
>>>> >> Regardless of state, production or dev, there are certainly ways to
>>>> >> manage this a bit more and work files through in an iterative manner.
>>>> >>
>>>> >> Please report back if these avenues don't solve your issues and we
>>>> >> can dive a little deeper if needed.
>>>> >>
>>>> >> [1] https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html
>>>> >> [2] https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#configuration-best-practices
>>>> >>
>>>> >> On Tue, Mar 28, 2017 at 7:46 AM, James McMahon <[email protected]>
>>>> >> wrote:
>>>> >>>
>>>> >>> Hi Aldrin. Yes sir, of course: my environment is NiFi v0.7.
>>>> >>> I have my content, flowfile, and provenance repositories on separate
>>>> >>> independent disk devices. In my nifi.properties file,
>>>> >>> nifi.flowfile.repository.partitions equals 256, and always.sync is
>>>> >>> false. My nifi.queue.swap.threshold is 20000. Since I am currently in
>>>> >>> development and so this is not a production process, I have set
>>>> >>> nifi.flowcontroller.autoResumeState to false. In conf/bootstrap.conf,
>>>> >>> my JVM memory settings are -Xms1024m and -Xmx4096m.
>>>> >>>
>>>> >>> In fact I have not yet applied the best practices from the Sys Admin
>>>> >>> Guide. I will speak with them about doing this today. I am a little
>>>> >>> hesitant to just jump into making the seven system changes you detail.
>>>> >>> NiFi does run on this box, but so do other processes that may be
>>>> >>> impacted. What's good for NiFi may not be good for these other
>>>> >>> processes, and so I want to ask first.
>>>> >>>
>>>> >>> My scripts employ a Python stream callback to grab values from select
>>>> >>> attributes, populate those into a Python dictionary object, generate a
>>>> >>> json object from that dictionary object, and replace the flowfile
>>>> >>> contents with that dictionary object. These scripts are called by
>>>> >>> ExecuteScript processors. Similar scripts are used at various points
>>>> >>> throughout my workflow, near the end of each branch. Those had been
>>>> >>> working without any problems until I tried to introduce Python logging
>>>> >>> yesterday. I suspect I am not releasing file handler resources and
>>>> >>> logger objects as flowfiles flow through these ExecuteScript
>>>> >>> processors - maybe? I really am only making educated guesses at this
>>>> >>> stage. My first objective today is to get NiFi to come back up.
>>>> >>>
>>>> >>> Please tell me: while I am in a dev state right now, had I been in a
>>>> >>> production state what would have been the repercussions of deleting
>>>> >>> in its entirety the flowfile_repository, which includes all its
>>>> >>> journal files?
>>>> >>>
>>>> >>> Thanks very much in advance for your help.
>>>> >>>
>>>> >>> Jim
>>>> >>>
>>>> >>> On Tue, Mar 28, 2017 at 6:57 AM, Aldrin Piri <[email protected]>
>>>> >>> wrote:
>>>> >>>>
>>>> >>>> Hi Jim,
>>>> >>>>
>>>> >>>> In getting to the root cause, could you please provide information
>>>> >>>> on your environment? Did you apply the best practices listed in the
>>>> >>>> System Administrator's guide? Could you provide some details on what
>>>> >>>> your scripts are doing?
>>>> >>>>
>>>> >>>> If the data is not of importance, removing the Flowfile Repo should
>>>> >>>> get you going. You can additionally remove the content repo, but this
>>>> >>>> should be cleaned up by the framework as no flowfiles will point to
>>>> >>>> said content.
>>>> >>>>
>>>> >>>>
>>>> >>>> Aldrin Piri
>>>> >>>> Sent from my mobile device.
>>>> >>>>
>>>> >>>> On Mar 28, 2017, at 06:12, James McMahon <[email protected]>
>>>> >>>> wrote:
>>>> >>>>
>>>> >>>> I noticed, too, that I have many partitions, partition-0 to
>>>> >>>> partition-255 to be exact. These all have journal files in them. So I
>>>> >>>> suspect that the journal file I cited is not specifically the problem
>>>> >>>> in and of itself, but instead is the point where the allowable open
>>>> >>>> files threshold is reached. I'm wondering if I have to recover by
>>>> >>>> deleting all these partitions?
>>>> >>>> -Jim
>>>> >>>>
>>>> >>>> On Tue, Mar 28, 2017 at 5:58 AM, James McMahon
>>>> >>>> <[email protected]> wrote:
>>>> >>>>>
>>>> >>>>> While trying to use Python logging from two scripts I call via two
>>>> >>>>> independent ExecuteScript processors, I seem to have inadvertently
>>>> >>>>> created a condition where I have too many files open. This is
>>>> >>>>> causing a serious challenge for me, because when I attempt to start
>>>> >>>>> nifi (v0.7.1) it fails.
>>>> >>>>>
>>>> >>>>> The log indicates that the flow controller cannot be started, and it
>>>> >>>>> cites the cause as this:
>>>> >>>>> org.apache.nifi.web.NiFiCoreException: Unable to start Flow Controller
>>>> >>>>> .
>>>> >>>>> . (many stack trace entries)
>>>> >>>>> .
>>>> >>>>> Caused by: java.nio.file.FileSystemException:
>>>> >>>>> /mnt/flow_repo/flowfile_repository/partition-86/83856.journal: Too
>>>> >>>>> many files open
>>>> >>>>>
>>>> >>>>> In a situation like this, what is the best practice for recovery? Is
>>>> >>>>> it permissible to simply delete this journal file? What are the
>>>> >>>>> negative repercussions of doing that?
>>>> >>>>>
>>>> >>>>> I did already try deleting my provenance_repository, but that did
>>>> >>>>> not allow nifi to restart. (NiFi did re-establish my
>>>> >>>>> provenance_repository at restart).
>>>> >>>>>
>>>> >>>>> Thanks very much in advance for your help. -Jim
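On Jim's guess earlier in the thread that file handlers are not being released: one common way this can happen (an assumption here, since the actual scripts were never posted) is attaching a new logging.FileHandler every time the script body runs. Loggers returned by logging.getLogger are shared for the life of the process, so each extra handler keeps another file open. Below is a minimal sketch in plain Python of attaching the handler only once and closing handlers explicitly. The helper names, logger name, and log path are hypothetical, and nothing in it is NiFi-specific.

```python
import logging
import os
import tempfile


def get_file_logger(name, path):
    """Return the named logger with at most one FileHandler for `path`."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    target = os.path.abspath(path)
    # Reuse an existing handler for this file instead of opening another one.
    for handler in logger.handlers:
        if isinstance(handler, logging.FileHandler) and handler.baseFilename == target:
            return logger
    handler = logging.FileHandler(target)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


def release_handlers(logger):
    """Detach and close every handler so its file descriptor is released."""
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()


if __name__ == "__main__":
    # Hypothetical log location, used only for this demonstration.
    log_path = os.path.join(tempfile.mkdtemp(), "example.log")
    for _ in range(3):
        log = get_file_logger("nifi.example", log_path)
        log.info("processed one flowfile")
    print("handlers attached: %d" % len(log.handlers))
    release_handlers(log)
```

Whether the guard or the explicit close is the better fit depends on how often the script body is re-evaluated, which this thread does not establish.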
