Re: Cannot Restart Nifi

James McMahon Tue, 28 Mar 2017 05:52:10 -0700

Thank you Aldrin. I do have autoResumeState set to false currently. The
start of my Jetty server fails when it tries to start the flow
controller, so I can't bring the UI up at all. I'm hoping that the system
parameter changes allow me to restart NiFi without blowing away my
flowfile_repository. I'll certainly let you know how that plays out. -Jim


On Tue, Mar 28, 2017 at 8:46 AM, Aldrin Piri <[email protected]> wrote:

> Jim,
>
> To ease NiFi's load at startup, you could also try
> setting nifi.flowcontroller.autoResumeState to false in your
> nifi.properties.  Depending on how your flow and scripts are constructed,
> this may allow you to piecewise alleviate any large queues/processing of
> files that could be causing the issue at hand.  You could also bypass
> the possibly troublesome script processors and cache this data to
> disk elsewhere as a stopgap measure.
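For reference, the property Aldrin mentions is a plain `key=value` line in
nifi.properties, which NiFi reads only at startup, so the edit belongs in
the file while NiFi is stopped. A minimal Python sketch of making that edit
programmatically, assuming the usual `key=value` form (no `:` separators
or line continuations); `set_nifi_property` is a hypothetical helper, not
part of NiFi:

```python
import re
from pathlib import Path


def set_nifi_property(path, key, value):
    """Set key=value in a .properties file such as nifi.properties.

    Hypothetical helper. Rewrites the first uncommented line for `key`
    in place, or appends the property if no such line exists. Comments
    and line order are preserved. Returns the previous value, or None
    if the key was absent.
    """
    props = Path(path)
    lines = props.read_text(encoding="utf-8").splitlines()
    # Commented lines start with '#', so they never match this pattern.
    pattern = re.compile(r"^\s*" + re.escape(key) + r"\s*=(.*)$")
    previous = None
    for i, line in enumerate(lines):
        match = pattern.match(line)
        if match:
            previous = match.group(1).strip()
            lines[i] = f"{key}={value}"
            break
    else:
        # Key not present: append it at the end of the file.
        lines.append(f"{key}={value}")
    props.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return previous
```

For example, `set_nifi_property("conf/nifi.properties",
"nifi.flowcontroller.autoResumeState", "false")` rewrites the existing line
and returns its old value; the conf/ path assumes a default install layout.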
>
> On Tue, Mar 28, 2017 at 8:17 AM, Joe Witt <[email protected]> wrote:
>
>> Jim,
>>
>> It is quite likely that raising the number of file handles
>> Linux allows a process to have will get NiFi back on track.
>>
>> Thanks
>> Joe
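To see the limit Joe is referring to, `ulimit -n` in the shell that
launches NiFi reports the soft limit; the sketch below reads the same
per-process limit (RLIMIT_NOFILE) from Python's standard `resource` module,
which is Unix only. The 50000 threshold is taken from the plan later in
this thread, and the function names are hypothetical. Raising the soft
limit this way affects only the calling process and children started after
the change; a persistent change, as the NiFi Administration Guide describes
it, is made in /etc/security/limits.conf by the system administrator.

```python
import resource

# Threshold taken from the plan in this thread (1024 -> 50000).
RECOMMENDED_NOFILE = 50000


def check_nofile(recommended=RECOMMENDED_NOFILE):
    """Return (soft, hard, ok) for this process's open-file limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    ok = soft == resource.RLIM_INFINITY or soft >= recommended
    return soft, hard, ok


def raise_soft_limit(target=RECOMMENDED_NOFILE):
    """Raise the soft limit toward `target`, never above the hard limit.

    Only this process and children started afterwards are affected;
    this is not a substitute for a persistent system setting.
    Returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard == resource.RLIM_INFINITY:
        new_soft = target
    else:
        new_soft = min(target, hard)
    # Only ever raise; an unprivileged process may go up to the hard limit.
    if soft != resource.RLIM_INFINITY and new_soft > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

A soft limit of 1024, as on Jim's box, would make `check_nofile()` report
`ok` as False.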
>>
>> On Tue, Mar 28, 2017 at 8:13 AM, James McMahon <[email protected]> wrote:
>> > No apology necessary, Aldrin. I'm much obliged to you and to Joe for all
>> > your help. My game plan is as follows:
>> > 1- Speak with the admin of my Linux box about applying all the sys admin
>> > "best practice" changes.
>> > 2- Barring doing them all, at minimum increase the maximum permitted open
>> > files from 1024 to 50000.
>> > 3- Reboot my Linux box, and then attempt to start NiFi.
>> > 4- If 3 fails, rm -rf ./flowfile_repository on this, my dev box. Start
>> > NiFi, get in there, and eliminate that Python logging. Find another way to
>> > log results to a system file, perhaps using a NiFi processor.
>> >
>> > - Jim
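A more cautious variant of step 4, sketched in Python: rename the
repository out of the way instead of using `rm -rf`, so it can be put back
if deleting turns out to have been the wrong call. This assumes NiFi is
fully stopped and that, as happened with the provenance repository earlier
in the thread, NiFi recreates an empty directory at startup. `move_aside`
and the `.bak-<timestamp>` naming are hypothetical; the backup still
occupies disk space until it is removed.

```python
import shutil
import time
from pathlib import Path


def move_aside(repo_dir):
    """Rename a repository directory to <name>.bak-<timestamp>.

    Hypothetical helper; run only while NiFi is stopped.
    Returns the backup path.
    """
    src = Path(repo_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"no such repository directory: {src}")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = src.with_name(f"{src.name}.bak-{stamp}")
    # Refuse to reuse a name: shutil.move into an existing directory
    # would nest the repository inside it instead of renaming it.
    if dest.exists():
        raise FileExistsError(f"backup already exists: {dest}")
    shutil.move(str(src), str(dest))
    return dest
```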
>> >
>> > On Tue, Mar 28, 2017 at 7:54 AM, Aldrin Piri <[email protected]> wrote:
>> >>
>> >> Hi Jim,
>> >>
>> >> Apologies for the terse response earlier; I was typing from my phone.
>> >>
>> >> I am assuming you are on a Linux system.
>> >>
>> >> First and foremost, do check out the Sys Admin guide [1]. In particular,
>> >> look over the configuration best practices [2], which will have you
>> >> increase your open file handles.
>> >>
>> >> I do suspect that your hunches are correct. While this will help with,
>> >> and maybe avoid, the issue, tracking down why those resources are not
>> >> being properly closed out will be the right thing to do.
>> >>
>> >> Regardless of state, production or dev, there are certainly ways to
>> >> manage this a bit more and work files through in an iterative manner.
>> >>
>> >> Please report back if these avenues don't solve your issues and we can
>> >> dive a little deeper if needed.
>> >>
>> >> [1] https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html
>> >> [2] https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#configuration-best-practices
>> >>
>> >> On Tue, Mar 28, 2017 at 7:46 AM, James McMahon <[email protected]> wrote:
>> >>>
>> >>> Hi Aldrin. Yes sir, of course: my environment is NiFi v0.7. I have my
>> >>> content, flowfile, and provenance repositories on separate, independent
>> >>> disk devices. In my nifi.properties file, nifi.flowfile.repository.partitions
>> >>> equals 256, and always.sync is false. My nifi.queue.swap.threshold is 20000.
>> >>> Since I am currently in development and this is not a production process,
>> >>> I have set nifi.flowcontroller.autoResumeState to false. In
>> >>> conf/bootstrap.conf, my JVM memory settings are -Xms1024m and -Xmx4096m.
>> >>>
>> >>> In fact, I have not yet applied the best practices from the Sys Admin
>> >>> Guide. I will speak with my system admins about doing this today. I am a
>> >>> little hesitant to jump straight into making the seven system changes you
>> >>> detail. NiFi does run on this box, but so do other processes that may be
>> >>> impacted. What's good for NiFi may not be good for these other processes,
>> >>> so I want to ask first.
>> >>>
>> >>> My scripts employ a Python stream callback to grab values from selected
>> >>> attributes, populate those into a Python dictionary object, generate a
>> >>> JSON object from that dictionary, and replace the flowfile contents with
>> >>> that JSON. These scripts are called by ExecuteScript processors. Similar
>> >>> scripts are used at various points throughout my workflow, near the end
>> >>> of each branch. They had been working without any problems until I tried
>> >>> to introduce Python logging yesterday. I suspect I am not releasing file
>> >>> handler resources and logger objects as flowfiles flow through these
>> >>> ExecuteScript processors, but I really am only making educated guesses
>> >>> at this stage. My first objective today is to get NiFi to come back up.
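One common way to leak file handles from a script like this is to create a
new `logging.FileHandler` each time the script body runs and never close
it; every such handler keeps its log file open. A plain CPython sketch of
the alternative, under stated assumptions: the logger name and log path are
hypothetical, and `handle_flowfile` stands in for the body an ExecuteScript
processor would run, with a dict standing in for the flowfile attributes.
Because `logging.getLogger` returns the same logger object for the same
name within one interpreter, guarding on `logger.handlers` attaches only
one handler however many flowfiles pass through.

```python
import json
import logging
import os
import tempfile

# Hypothetical log location and logger name, for illustration only.
LOG_PATH = os.path.join(tempfile.gettempdir(), "flow_results.log")
LOGGER_NAME = "flow_results"


def get_results_logger(path=LOG_PATH):
    """Return a shared logger, attaching its FileHandler only once."""
    logger = logging.getLogger(LOGGER_NAME)
    if not logger.handlers:
        handler = logging.FileHandler(path)
        handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
        logger.propagate = False
    return logger


def attributes_to_json(attributes, keys):
    """Copy the selected attributes into a dict and serialize it as JSON."""
    selected = {k: attributes[k] for k in keys if k in attributes}
    return json.dumps(selected, sort_keys=True)


def handle_flowfile(attributes, keys):
    """Stand-in for one script invocation: build JSON, log it, return it."""
    payload = attributes_to_json(attributes, keys)
    get_results_logger().info(payload)
    return payload

# The leaky pattern, by contrast, runs something like this on every call,
# opening one more file that is never closed:
#     logger.addHandler(logging.FileHandler(path))
```

Inside ExecuteScript itself, the processor also binds a `log` object to the
script that writes to NiFi's own application log, which may avoid opening
files from the script altogether.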
>> >>>
>> >>> Please tell me: while I am in a dev state right now, had I been in a
>> >>> production state, what would have been the repercussions of deleting
>> >>> the flowfile_repository in its entirety, including all its journal files?
>> >>>
>> >>> Thanks very much in advance for your help.
>> >>>
>> >>> Jim
>> >>>
>> >>> On Tue, Mar 28, 2017 at 6:57 AM, Aldrin Piri <[email protected]> wrote:
>> >>>>
>> >>>> Hi Jim,
>> >>>>
>> >>>> In getting to the root cause, could you please provide information on
>> >>>> your environment?  Did you apply the best practices listed in the
>> >>>> System Administrator's Guide?  Could you provide some details on what
>> >>>> your scripts are doing?
>> >>>>
>> >>>> If the data is not of importance, removing the flowfile repo should
>> >>>> get you going. You can additionally remove the content repo, but that
>> >>>> should be cleaned up by the framework anyway, as no flowfiles will
>> >>>> point to that content.
>> >>>>
>> >>>>
>> >>>> Aldrin Piri
>> >>>> Sent from my mobile device.
>> >>>>
>> >>>> On Mar 28, 2017, at 06:12, James McMahon <[email protected]> wrote:
>> >>>>
>> >>>> I noticed, too, that I have many partitions, partition-0 to
>> >>>> partition-255 to be exact, and they all have journal files in them. So
>> >>>> I suspect that the journal file I cited is not the problem in and of
>> >>>> itself, but is instead the point where the open-files threshold is
>> >>>> reached. Will I have to recover by deleting all these partitions? -Jim
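One way to test this hunch on Linux is to look at what the NiFi process
actually holds open. The sketch below groups a process's open descriptors
by parent directory using the /proc/<pid>/fd symlinks, so a large number of
open journals under the flowfile repository partitions, or of handles on
one log directory, would stand out. `open_files_by_dir` is a hypothetical
helper; the NiFi JVM's PID could come from `ps` or `pgrep -f nifi`, and
`lsof -p <pid>` gives a similar view from the shell.

```python
import os
from collections import Counter


def open_files_by_dir(pid):
    """Count a process's open file descriptors, grouped by parent directory.

    Linux only: reads the symlinks under /proc/<pid>/fd. Inspecting another
    user's process usually requires running as that user or as root.
    Non-file descriptors are grouped by type ("socket", "pipe", ...).
    """
    fd_dir = f"/proc/{pid}/fd"
    counts = Counter()
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            continue  # descriptor was closed between listdir and readlink
        if target.startswith("/"):
            counts[os.path.dirname(target)] += 1
        else:
            # e.g. "socket:[12345]" -> "socket"
            counts[target.split(":")[0]] += 1
    return counts
```

`open_files_by_dir(pid).most_common(10)` would then list the directories
holding the most descriptors.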
>> >>>>
>> >>>> On Tue, Mar 28, 2017 at 5:58 AM, James McMahon <[email protected]> wrote:
>> >>>>>
>> >>>>> While trying to use Python logging from two scripts I call via two
>> >>>>> independent ExecuteScript processors, I seem to have inadvertently
>> >>>>> created a condition where I have too many files open. This is causing
>> >>>>> a serious challenge for me, because when I attempt to start NiFi
>> >>>>> (v0.7.1), it fails.
>> >>>>>
>> >>>>> The log indicates that the flow controller cannot be started, and it
>> >>>>> cites the cause as this:
>> >>>>> org.apache.nifi.web.NiFiCoreException: Unable to start Flow Controller
>> >>>>> .
>> >>>>> . (many stack trace entries)
>> >>>>> .
>> >>>>> Caused by: java.nio.file.FileSystemException:
>> >>>>> /mnt/flow_repo/flowfile_repository/partition-86/83856.journal: Too many
>> >>>>> open files
>> >>>>>
>> >>>>> In a situation like this, what is the best practice for recovery? Is
>> >>>>> it permissible to simply delete this journal file? What are the
>> >>>>> negative repercussions of doing that?
>> >>>>>
>> >>>>> I already tried deleting my provenance_repository, but that did not
>> >>>>> allow NiFi to restart. (NiFi did re-establish my provenance_repository
>> >>>>> at restart.)
>> >>>>>
>> >>>>> Thanks very much in advance for your help. -Jim
>> >>>>
>> >>>>
>> >>>
>> >>
>> >
>>
>
>
