I have been able to bring the NiFi UI back up with this change to the limit on
the number of open files. Thank you all very much for your help and insights.
-Jim
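
For anyone who lands here with the same symptom: below is a minimal Python sketch, not taken from this thread, for checking the per-process open-file limit that the change above adjusts. It uses the standard `resource` module (Unix only). The function names `open_file_limits` and `raise_soft_limit` are made up for illustration, and the sketch only affects the Python process that runs it. The limit NiFi itself sees is normally set system-wide by an administrator, as the best practices in the Admin Guide describe.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) open-file limits for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_limit(target):
    """Raise the soft open-file limit toward `target`, capped at the hard limit.

    Only the current process (and children it starts afterwards) is affected.
    Returns the (soft, hard) limits in effect after the call.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft, hard  # already unlimited, nothing to raise
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)  # an unprivileged process cannot exceed hard
    if target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print("open-file limits (soft, hard):", open_file_limits())
```

Running it as the account that starts NiFi is one way to see whether a low default such as the 1024 mentioned below is still in effect for that account's sessions.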

On Tue, Mar 28, 2017 at 8:51 AM, James McMahon <[email protected]> wrote:

> Thank you Aldrin. I do have AutoResumeState set to false currently. The
> start of my jetty server fails when it tries to start the flowfile
> controller. I can't bring the UI up at all. I'm hoping that the system parameter
> changes allow me to restart NiFi without blowing away my
> flowfile_repository. I'll certainly let you know how that plays out. -Jim
>
> On Tue, Mar 28, 2017 at 8:46 AM, Aldrin Piri <[email protected]> wrote:
>
>> Jim,
>>
>> In terms of trying to ease NiFi at start up, you could also try setting
>> nifi.flowcontroller.autoResumeState to false in your nifi.properties.
>> Depending on how your flow and scripts are constructed, this may allow you
>> to piecewise alleviate any large queues/processing of files that could be
>> causing the issue at hand.  You could additionally bypass the possible
>> troublesome script processors to cache this data to disk elsewhere as a
>> stop gap measure.
>>
>> On Tue, Mar 28, 2017 at 8:17 AM, Joe Witt <[email protected]> wrote:
>>
>>> Jim,
>>>
>>> It is very possible/likely that correcting the number of file handles
>>> Linux allows a process to have will get NiFi back on track.
>>>
>>> Thanks
>>> Joe
>>>
>>> On Tue, Mar 28, 2017 at 8:13 AM, James McMahon <[email protected]>
>>> wrote:
>>> > No apology necessary Aldrin. I'm much obliged to you and to Joe for
>>> all your
>>> > help. My game plan is as follows:
>>> > 1- speak with the admin of my Linux box about executing all the sys
>>> admin
>>> > "best practice" changes
>>> > 2- barring doing them all, at minimum increase max permitted open
>>> files from
>>> > 1024 to 50000
>>> > 3- reboot my Linux box, and then attempt to start NiFi
>>> > 4- if 3 fails, rm -rf ./flowfile_repository on this, my dev box. Start
>>> nifi,
>>> > get in there, and eliminate that Python logging. Find another way to
>>> log
>>> > results to a system file, perhaps using a NiFi processor.
>>> >
>>> > - Jim
>>> >
>>> > On Tue, Mar 28, 2017 at 7:54 AM, Aldrin Piri <[email protected]>
>>> wrote:
>>> >>
>>> >> Hi Jim,
>>> >>
>>> >> Apologies for terse response earlier, was typing from phone.
>>> >>
>>> >> I am assuming you are on a Linux system.
>>> >>
>>> >> First and foremost, do check out the Sys Admin guide [1]. In particular,
>>> >> scope out the best practices [2] for configuration, which will have you
>>> >> increase your open file handles.
>>> >>
>>> >> I do suspect that your hunches are correct, and while this will aid
>>> and
>>> >> maybe avoid the issue, getting those resources properly closed out
>>> will be
>>> >> the right thing to track down.
>>> >>
>>> >> Regardless of state, production or dev, there are certainly ways to
>>> manage
>>> >> this a bit more and work files through in an iterative manner.
>>> >>
>>> >> Please report back if these avenues don't solve your issues and we can
>>> >> dive a little deeper if needed.
>>> >>
>>> >> [1] https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html
>>> >> [2] https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#configuration-best-practices
>>> >>
>>> >> On Tue, Mar 28, 2017 at 7:46 AM, James McMahon <[email protected]>
>>> >> wrote:
>>> >>>
>>> >>> Hi Aldrin. Yes sir, of course: my environment is NiFi v0.7. I have my
>>> >>> content, flowfile, and provenance repositories on separate independent
>>> >>> disk devices. In my nifi.properties file,
>>> >>> nifi.flowfile.repository.partitions equals 256, and always.sync is false.
>>> >>> My nifi.queue.swap.threshold is 20000. Since I am currently in development
>>> >>> and this is not a production process, I have set
>>> >>> nifi.flowcontroller.autoResumeState to false. In conf/bootstrap.conf, my
>>> >>> JVM memory settings are -Xms1024m and -Xmx4096m.
>>> >>>
>>> >>> In fact I have not yet applied the best practices from the Sys Admin
>>> >>> Guide. I will speak with them about doing this today. I am a little
>>> >>> hesitant to just jump into making the seven system changes you detail.
>>> >>> NiFi does run on this box, but so do other processes that may be
>>> >>> impacted. What's good for NiFi may not be good for these other processes,
>>> >>> and so I want to ask first.
>>> >>>
>>> >>> My scripts employ a Python stream callback to grab values from select
>>> >>> attributes, populate those into a Python dictionary object, generate
>>> a json
>>> >>> object from that dictionary object, and replace the flowfile
>>> contents with
>>> >>> that dictionary object. These scripts are called by ExecuteScript
>>> >>> processors. Similar scripts are used at various points throughout my
>>> >>> workflow, near the end of each branch. Those had been working
>>> without any
>>> >>> problems until I tried to introduce Python logging yesterday. I
>>> suspect I am
>>> >>> not releasing file handler resources and logger objects as flowfiles
>>> flow
>>> >>> through these ExecuteScript processors - maybe? I really am only
>>> making
>>> >>> educated guesses at this stage. My first objective today is to get
>>> NiFi to
>>> >>> come back up.
>>> >>>
>>> >>> Please tell me: while I am in a dev state right now, had I been in a
>>> >>> production state what would have been the repercussions of deleting
>>> in its
>>> >>> entirety the flowfile_repository, which includes all its journal
>>> files?
>>> >>>
>>> >>> Thanks very much in advance for your help.
>>> >>>
>>> >>> Jim
>>> >>>
>>> >>> On Tue, Mar 28, 2017 at 6:57 AM, Aldrin Piri <[email protected]>
>>> >>> wrote:
>>> >>>>
>>> >>>> Hi Jim,
>>> >>>>
>>> >>>> In getting to the root cause, could you please provide information
>>> on
>>> >>>> your environment?  Did you apply the best practices listed in the
>>> System
>>> >>>> Administrator's guide?  Could you provide some details on what your
>>> scripts
>>> >>>> are doing?
>>> >>>>
>>> >>>> If the data is not of importance, removing the Flowfile Repo should
>>> get
>>> >>>> you going. You can additionally remove the content repo, but this
>>> should be
>>> >>>> cleaned up by the framework as no flowfiles will point to said
>>> content.
>>> >>>>
>>> >>>>
>>> >>>> Aldrin Piri
>>> >>>> Sent from my mobile device.
>>> >>>>
>>> >>>> On Mar 28, 2017, at 06:12, James McMahon <[email protected]>
>>> wrote:
>>> >>>>
>>> >>>> I noticed, too, that I have many partitions, partition-0 to
>>> >>>> partition-255 to be exact. These all have journal files in them. So
>>> I
>>> >>>> suspect that the journal file I cited is not specifically the
>>> problem in and
>>> >>>> of itself, but instead is the point where the allowable open files
>>> threshold
>>> >>>> is reached. I'm wondering if I have to recover by deleting all these
>>> >>>> partitions? -Jim
>>> >>>>
>>> >>>> On Tue, Mar 28, 2017 at 5:58 AM, James McMahon <[email protected]> wrote:
>>> >>>>>
>>> >>>>> While trying to use Python logging from two scripts I call via two
>>> >>>>> independent ExecuteScript processors, I seem to have inadvertently
>>> created a
>>> >>>>> condition where I have too many files open. This is causing a
>>> serious
>>> >>>>> challenge for me, because when I attempt to start nifi (v0.7.1) it
>>> fails.
>>> >>>>>
>>> >>>>> The log indicates that the flow controller cannot be started, and
>>> it
>>> >>>>> cites the cause as this:
>>> >>>>> org.apache.nifi.web.NiFiCoreException: Unable to start Flow Controller
>>> >>>>> .
>>> >>>>> . (many stack trace entries)
>>> >>>>> .
>>> >>>>> Caused by: java.nio.file.FileSystemException:
>>> >>>>> /mnt/flow_repo/flowfile_repository/partition-86/83856.journal:
>>> Too many
>>> >>>>> files open
>>> >>>>>
>>> >>>>> In a situation like this, what is the best practice for recovery?
>>> Is it
>>> >>>>> permissible to simply delete this journal file? What are the
>>> negative
>>> >>>>> repercussions of doing that?
>>> >>>>>
>>> >>>>> I did already try deleting my provenance_repository, but that did
>>> not
>>> >>>>> allow nifi to restart. (NiFi did re-establish my
>>> provenance_repository at
>>> >>>>> restart).
>>> >>>>>
>>> >>>>> Thanks very much in advance for your help. -Jim
>>> >>>>
>>> >>>>
>>> >>>
>>> >>
>>> >
>>>
>>
>>
>
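
The suspected cause discussed above (file handlers and logger objects not being released in the ExecuteScript scripts) was never confirmed in this thread. As a hedged illustration of how that kind of leak can arise in Python generally: loggers returned by `logging.getLogger` are shared across the whole process, so a script that attaches a new `FileHandler` every time it runs keeps one more file open on each run. The sketch below attaches the handler only once and offers a way to close it. The names `get_file_logger` and `close_file_logger`, the logger name, and the log path are all hypothetical, and this is plain Python, not NiFi API.

```python
import logging
import os
import tempfile


def get_file_logger(name, path):
    """Return a logger writing to `path`, attaching its FileHandler only once."""
    logger = logging.getLogger(name)  # same object on every call for this name
    logger.setLevel(logging.INFO)
    wanted = os.path.abspath(path)
    for handler in logger.handlers:
        if isinstance(handler, logging.FileHandler) and handler.baseFilename == wanted:
            return logger  # already configured, do not open the file again
    handler = logging.FileHandler(path)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


def close_file_logger(name):
    """Close and detach every handler on the named logger, releasing its files."""
    logger = logging.getLogger(name)
    for handler in list(logger.handlers):
        handler.close()
        logger.removeHandler(handler)


if __name__ == "__main__":
    # Hypothetical log location, used here only so the sketch can run anywhere.
    log_path = os.path.join(tempfile.mkdtemp(), "flow.log")
    for i in range(3):
        # Simulates the script body running once per flowfile.
        get_file_logger("nifi.sketch", log_path).info("processed flowfile %d", i)
    print("handlers attached:", len(logging.getLogger("nifi.sketch").handlers))
    close_file_logger("nifi.sketch")
```

Without the check in `get_file_logger`, each simulated run would add another handler holding its own open file, which is one way a process could drift toward its open-file limit.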
