Thank you Joe. The command 'ulimit -a' tells me that my open files are limited to 1024, so I am not currently allowing more than that, as you indicated. Aldrin referred me to the sys admin "best practices", which seem to call for 50000. So I need to get with the sys administrator of this box this morning and address this deficiency.
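If I understand the usual Linux approach correctly, the persistent change would go in /etc/security/limits.conf - this assumes a standard PAM setup and that NiFi runs under a dedicated "nifi" service account, both of which I will confirm with the admin:

    nifi  soft  nofile  50000
    nifi  hard  nofile  50000

A new login session (or service restart) would be needed for it to take effect, and I can verify afterward by running 'ulimit -a' as that account before starting NiFi.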
If I crank that up to 50000 and then restart my box, is it possible I will then be able to start NiFi *without* having to blow away my flowfile repository? Because if that is the case, that is the much preferred path and the one I will follow. (I have also sketched, in a P.S. at the bottom of this thread, the logging-handler cleanup I suspect my scripts are missing.)

-Jim

On Tue, Mar 28, 2017 at 7:50 AM, Joe Witt <[email protected]> wrote:

> It would mean lost data. It should not be necessary.
>
> As far as system config changes, and specifically open file handles, this one is very important. Run 'ulimit -a' and see what it says for open file handles. It must be larger than 1024.
>
> On Mar 28, 2017 7:46 AM, "James McMahon" <[email protected]> wrote:
>
> Hi Aldrin. Yes sir, of course: my environment is NiFi v0.7. I have my content, flowfile, and provenance repositories on separate, independent disk devices. In my nifi.properties file, nifi.flowfile.repository.partitions equals 256, and always.sync is false. My nifi.queue.swap.threshold is 20000. Since I am currently in development and this is not a production process, I have set nifi.flowcontroller.autoResumeState to false. In conf/bootstrap.conf, my JVM memory settings are -Xms1024m and -Xmx4096m.
>
> In fact I have not yet applied the best practices from the Sys Admin Guide. I will speak with our administrators about doing this today. I am a little hesitant to just jump into making the seven system changes you detail. NiFi does run on this box, but so do other processes that may be impacted. What's good for NiFi may not be good for these other processes, and so I want to ask first.
>
> My scripts employ a Python stream callback to grab values from select attributes, populate those into a Python dictionary object, generate a JSON object from that dictionary, and replace the flowfile contents with that JSON object. These scripts are called by ExecuteScript processors. Similar scripts are used at various points throughout my workflow, near the end of each branch. Those had been working without any problems until I tried to introduce Python logging yesterday. I suspect I am not releasing file handler resources and logger objects as flowfiles flow through these ExecuteScript processors - maybe? I really am only making educated guesses at this stage. My first objective today is to get NiFi to come back up.
>
> Please tell me: while I am in a dev state right now, had I been in a production state, what would have been the repercussions of deleting the flowfile_repository in its entirety, including all its journal files?
>
> Thanks very much in advance for your help.
>
> Jim
>
> On Tue, Mar 28, 2017 at 6:57 AM, Aldrin Piri <[email protected]> wrote:
>
>> Hi Jim,
>>
>> In getting to the root cause, could you please provide information on your environment? Did you apply the best practices listed in the System Administrator's guide? Could you provide some details on what your scripts are doing?
>>
>> If the data is not of importance, removing the FlowFile repo should get you going. You can additionally remove the content repo, but this should be cleaned up by the framework, as no flowfiles will point to said content.
>>
>> Aldrin Piri
>> Sent from my mobile device.
>>
>> On Mar 28, 2017, at 06:12, James McMahon <[email protected]> wrote:
>>
>> I noticed, too, that I have many partitions - partition-0 to partition-255, to be exact. These all have journal files in them.
>> So I suspect that the journal file I cited is not specifically the problem in and of itself, but instead is the point where the allowable open-files threshold is reached. I'm wondering if I have to recover by deleting all these partitions? -Jim
>>
>> On Tue, Mar 28, 2017 at 5:58 AM, James McMahon <[email protected]> wrote:
>>
>>> While trying to use Python logging from two scripts I call via two independent ExecuteScript processors, I seem to have inadvertently created a condition where I have too many files open. This is causing a serious challenge for me, because when I attempt to start NiFi (v0.7.1) it fails.
>>>
>>> The log indicates that the flow controller cannot be started, and it cites the cause as this:
>>>
>>> org.apache.nifi.web.NiFiCoreException: Unable to start Flow Controller
>>> .
>>> . (many stack trace entries)
>>> .
>>> Caused by: java.nio.file.FileSystemException: /mnt/flow_repo/flowfile_repository/partition-86/83856.journal: Too many files open
>>>
>>> In a situation like this, what is the best practice for recovery? Is it permissible to simply delete this journal file? What are the negative repercussions of doing that?
>>>
>>> I did already try deleting my provenance_repository, but that did not allow NiFi to restart. (NiFi did re-establish my provenance_repository at restart.)
>>>
>>> Thanks very much in advance for your help. -Jim
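P.S. Regarding my suspicion above that my scripts leak logging resources: here is a minimal sketch of the handler-per-invocation cleanup I intend to try in my ExecuteScript (Jython) bodies. The attribute names, logger name, and log path are placeholders, and I have not yet run this against our actual flow, so treat it as a starting point rather than a fix:

    import json
    import logging
    from org.apache.nifi.processor.io import StreamCallback

    # Placeholders -- substitute the real attribute names and log destination.
    ATTR_NAMES = ['filename', 'path']
    LOG_PATH = '/tmp/executescript-example.log'

    class AttrsToJson(StreamCallback):
        # Replaces the flowfile content with a JSON object built from attributes.
        def __init__(self, attrs):
            self.attrs = attrs
        def process(self, inputStream, outputStream):
            outputStream.write(bytearray(json.dumps(self.attrs).encode('utf-8')))

    flowFile = session.get()
    if flowFile is not None:
        attrs = dict((name, flowFile.getAttribute(name)) for name in ATTR_NAMES)
        logger = logging.getLogger('attrs_to_json')
        logger.setLevel(logging.INFO)
        handler = logging.FileHandler(LOG_PATH)
        logger.addHandler(handler)
        try:
            logger.info('writing JSON for flowfile %s', flowFile.getAttribute('uuid'))
            flowFile = session.write(flowFile, AttrsToJson(attrs))
            session.transfer(flowFile, REL_SUCCESS)
        finally:
            # Without these two lines, every invocation leaves one file
            # descriptor open -- which matches the 'Too many open files' failure.
            logger.removeHandler(handler)
            handler.close()

If opening and closing the handler on every flowfile proves too expensive, I understand a handler created once per processor (rather than once per flowfile) is the other common approach; the essential point is that every addHandler must be paired with removeHandler and close somewhere.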
