No apology necessary, Aldrin. I'm much obliged to you and to Joe for all your help. My game plan is as follows:

1- speak with the admin of my Linux box about executing all the sys admin "best practice" changes
2- barring doing them all, at minimum increase the max permitted open files from 1024 to 50000
3- reboot my Linux box, and then attempt to start NiFi
4- if 3 fails, rm -rf ./flowfile_repository on this, my dev box. Start NiFi, get in there, and eliminate that Python logging. Find another way to log results to a system file, perhaps using a NiFi processor.
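For step 2, a quick way to see the open-file limit a process actually inherits, and how many descriptors it currently holds, is a short Python check. This is a minimal, Linux-specific sketch; the 50000 target is simply the figure from the plan above.

```python
import os
import resource

# Soft/hard limits on open file descriptors for this process
# (the soft limit is what `ulimit -n` reports in the launching shell).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("open-file limit: soft=%s hard=%s" % (soft, hard))

# Linux-specific: count descriptors currently held by this process.
print("descriptors in use: %d" % len(os.listdir("/proc/self/fd")))

if soft != resource.RLIM_INFINITY and soft < 50000:
    print("soft limit is below the 50000 target from the best practices")
```

Run under the same account that launches NiFi, since limits are per-user and per-session.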
- Jim

On Tue, Mar 28, 2017 at 7:54 AM, Aldrin Piri <[email protected]> wrote:

> Hi Jim,
>
> Apologies for the terse response earlier, was typing from phone.
>
> I am assuming you are on a Linux system.
>
> First and foremost, do check out the Sys Admin guide [1]. In particular, scope out the best practices [2] for configuration, which will have you increase your open file handles.
>
> I do suspect that your hunches are correct, and while this will aid and maybe avoid the issue, getting those resources properly closed out will be the right thing to track down.
>
> Regardless of state, production or dev, there are certainly ways to manage this a bit more and work files through in an iterative manner.
>
> Please report back if these avenues don't solve your issues and we can dive a little deeper if needed.
>
> [1] https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html
> [2] https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#configuration-best-practices
>
> On Tue, Mar 28, 2017 at 7:46 AM, James McMahon <[email protected]> wrote:
>
>> Hi Aldrin. Yes sir, of course: my environment is NiFi v0.7. I have my content, flowfile, and provenance repositories on separate independent disk devices. In my nifi.properties file, nifi.flowfile.repository.partitions equals 256, and always.sync is false. My nifi.queue.swap.threshold is 20000. Since I am currently in development and so this is not a production process, I have set nifi.flowcontroller.autoResumeState to false. In conf/bootstrap.conf, my JVM memory settings are -Xms1024m and -Xmx4096m.
>>
>> In fact I have not yet applied the best practices from the Sys Admin Guide. I will speak with them about doing this today. I am a little hesitant to just jump into making the seven system changes you detail. NiFi does run on this box, but so do other processes that may be impacted.
>> What's good for NiFi may not be good for these other processes, and so I want to ask first.
>>
>> My scripts employ a Python stream callback to grab values from select attributes, populate those into a Python dictionary object, generate a JSON object from that dictionary object, and replace the flowfile contents with that JSON object. These scripts are called by ExecuteScript processors. Similar scripts are used at various points throughout my workflow, near the end of each branch. Those had been working without any problems until I tried to introduce Python logging yesterday. I suspect I am not releasing file handler resources and logger objects as flowfiles flow through these ExecuteScript processors - maybe? I really am only making educated guesses at this stage. My first objective today is to get NiFi to come back up.
>>
>> Please tell me: while I am in a dev state right now, had I been in a production state, what would have been the repercussions of deleting in its entirety the flowfile_repository, which includes all its journal files?
>>
>> Thanks very much in advance for your help.
>>
>> Jim
>>
>> On Tue, Mar 28, 2017 at 6:57 AM, Aldrin Piri <[email protected]> wrote:
>>
>>> Hi Jim,
>>>
>>> In getting to the root cause, could you please provide information on your environment? Did you apply the best practices listed in the System Administrator's guide? Could you provide some details on what your scripts are doing?
>>>
>>> If the data is not of importance, removing the Flowfile Repo should get you going. You can additionally remove the content repo, but this should be cleaned up by the framework as no flowfiles will point to said content.
>>>
>>> Aldrin Piri
>>> Sent from my mobile device.
>>>
>>> On Mar 28, 2017, at 06:12, James McMahon <[email protected]> wrote:
>>>
>>> I noticed, too, that I have many partitions, partition-0 to partition-255 to be exact.
>>> These all have journal files in them. So I suspect that the journal file I cited is not specifically the problem in and of itself, but instead is the point where the allowable open files threshold is reached. I'm wondering if I have to recover by deleting all these partitions? -Jim
>>>
>>> On Tue, Mar 28, 2017 at 5:58 AM, James McMahon <[email protected]> wrote:
>>>
>>>> While trying to use Python logging from two scripts I call via two independent ExecuteScript processors, I seem to have inadvertently created a condition where I have too many files open. This is causing a serious challenge for me, because when I attempt to start NiFi (v0.7.1) it fails.
>>>>
>>>> The log indicates that the flow controller cannot be started, and it cites the cause as this:
>>>>
>>>> org.apache.nifi.web.NiFiCoreException: Unable to start Flow Controller
>>>> .
>>>> . (many stack trace entries)
>>>> .
>>>> Caused by: java.nio.file.FileSystemException: /mnt/flow_repo/flowfile_repository/partition-86/83856.journal: Too many files open
>>>>
>>>> In a situation like this, what is the best practice for recovery? Is it permissible to simply delete this journal file? What are the negative repercussions of doing that?
>>>>
>>>> I did already try deleting my provenance_repository, but that did not allow NiFi to restart. (NiFi did re-establish my provenance_repository at restart.)
>>>>
>>>> Thanks very much in advance for your help. -Jim
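On the suspected root cause from upthread: if the script attaches a new `logging.FileHandler` for every flowfile and never closes it, each flowfile through ExecuteScript leaks one file descriptor (and stacks a duplicate handler on the shared logger) until the OS limit is hit. A hedged, plain-CPython sketch of the release pattern, outside NiFi, with an illustrative logger name and path not taken from the thread:

```python
import logging

def log_result(message, path="/tmp/flow_results.log"):
    """Write one line to a log file, releasing the handle afterwards."""
    logger = logging.getLogger("flow.results")  # reused by name, not re-created
    handler = logging.FileHandler(path)         # opens the file descriptor
    logger.addHandler(handler)
    try:
        logger.warning(message)
    finally:
        # Without these two lines, every call leaks one open descriptor
        # and piles a duplicate handler onto the shared logger.
        logger.removeHandler(handler)
        handler.close()
```

An alternative design is to create the handler once per processor start rather than per flowfile; either way, the handler must eventually be closed explicitly.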
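For completeness, the transform Jim describes upthread (select attributes into a dictionary, render the dictionary as JSON, replace the flowfile content) reduces to something like the sketch below when pulled out of NiFi. Inside ExecuteScript the attribute map would come from `flowFile.getAttribute(...)` and the JSON string would be written back inside a StreamCallback; here it is shown as a plain function for clarity.

```python
import json

def attributes_to_json(attributes, keys):
    """Pick selected attributes and render them as a JSON document.

    `attributes` stands in for the flowfile's attribute map; keys that
    are absent come through as null rather than raising.
    """
    record = {k: attributes.get(k) for k in keys}
    return json.dumps(record, sort_keys=True)
```

Nothing in this function holds a file handle, which is consistent with the observation that these scripts ran cleanly until the logging calls were added.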
