Re: Cannot Restart Nifi

James McMahon Tue, 28 Mar 2017 06:44:51 -0700

*yikes*! Message received. We have now done all but one of them. The one we
have not done is "Set how long sockets stay in a TIMED_WAIT state when
closed." According to our system administrator we are unable to do this
because - and I'm paraphrasing here - we do not have all the necessary
components or libraries installed.


We did do all the others. Every one. The lone one we did not do does not
sound like a showstopper. Please do let me know if you disagree. -Jim
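
A quick way to confirm that the raised open-files limit is actually in effect is sketched below in Python. This is an illustrative check, not something from the NiFi docs: the helper names are hypothetical, the 50000 target is the figure mentioned further down in this thread, and the script has to be run as the same user (and from the same kind of session) that starts NiFi, since the limit is per process.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def limit_is_sufficient(soft, wanted=50000):
    """True if the soft limit is unlimited or at least `wanted`.

    `wanted` defaults to the 50000 figure discussed in this thread;
    it is an assumption, not a NiFi requirement.
    """
    return soft == resource.RLIM_INFINITY or soft >= wanted


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print("soft limit: %s, hard limit: %s" % (soft, hard))
    if not limit_is_sufficient(soft):
        print("soft limit is below the target; the change may not have taken effect")
```

If the soft limit still reports the old value, the new setting has probably not been picked up by the session that launches NiFi.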

On Tue, Mar 28, 2017 at 9:37 AM, Joe Witt <[email protected]> wrote:

> jim - definitely take the time to walk through the best practices
> guide.  Some are more like "if you don't do this it will probably kill
> the process - practices".
>
> On Tue, Mar 28, 2017 at 9:27 AM, James McMahon <[email protected]>
> wrote:
> > I have been able to bring the NiFi UI back up with this change to the limit
> > on number of open files. Thank you all very much for your help and insights.
> > -Jim
> >
> > On Tue, Mar 28, 2017 at 8:51 AM, James McMahon <[email protected]>
> > wrote:
> >>
> >> Thank you Aldrin. I do have AutoResumeState set to false currently. The
> >> start of my jetty server fails when it tries to start the flowfile
> >> controller. I can't bring the UI up at all. I'm hoping that the system parm
> >> changes allow me to restart NiFi without blowing away my
> >> flowfile_repository. I'll certainly let you know how that plays out. -Jim
> >>
> >> On Tue, Mar 28, 2017 at 8:46 AM, Aldrin Piri <[email protected]>
> >> wrote:
> >>>
> >>> Jim,
> >>>
> >>> In terms of trying to ease NiFi at start up, you could also try setting
> >>> nifi.flowcontroller.autoResumeState to false in your nifi.properties.
> >>> Depending on how your flow and scripts are constructed, this may allow you
> >>> to piecewise alleviate any large queues/processing of files that could be
> >>> causing the issue at hand.  You could additionally bypass the possibly
> >>> troublesome script processors to cache this data to disk elsewhere as a
> >>> stopgap measure.
> >>>
> >>> On Tue, Mar 28, 2017 at 8:17 AM, Joe Witt <[email protected]> wrote:
> >>>>
> >>>> Jim,
> >>>>
> >>>> It is very possible/likely that correcting the number of file handles
> >>>> linux allows a process to have will get nifi back on track.
> >>>>
> >>>> Thanks
> >>>> Joe
> >>>>
> >>>> On Tue, Mar 28, 2017 at 8:13 AM, James McMahon <[email protected]>
> >>>> wrote:
> >>>> > No apology necessary Aldrin. I'm much obliged to you and to Joe for
> >>>> > all your
> >>>> > help. My game plan is as follows:
> >>>> > 1- speak with the admin of my Linux box about executing all the sys
> >>>> > admin
> >>>> > "best practice" changes
> >>>> > 2- barring doing them all, at minimum increase max permitted open
> >>>> > files from
> >>>> > 1024 to 50000
> >>>> > 3- reboot my Linux box, and then attempt to start NiFi
> >>>> > 4- if 3 fails, rm -rf ./flowfile_repository on this, my dev box.
> >>>> > Start nifi,
> >>>> > get in there, and eliminate that Python logging. Find another way to
> >>>> > log
> >>>> > results to a system file, perhaps using a NiFi processor.
> >>>> >
> >>>> > - Jim
> >>>> >
> >>>> > On Tue, Mar 28, 2017 at 7:54 AM, Aldrin Piri <[email protected]>
> >>>> > wrote:
> >>>> >>
> >>>> >> Hi Jim,
> >>>> >>
> >>>> >> Apologies for terse response earlier, was typing from phone.
> >>>> >>
> >>>> >> I am assuming you are on a Linux system.
> >>>> >>
> >>>> >> First and foremost, do check out the Sys Admin guide [1]. In
> >>>> >> particular, scope out the best practices [2] for configuration which
> >>>> >> will have you increase your open file handles.
> >>>> >>
> >>>> >> I do suspect that your hunches are correct, and while this will aid
> >>>> >> and
> >>>> >> maybe avoid the issue, getting those resources properly closed out
> >>>> >> will be
> >>>> >> the right thing to track down.
> >>>> >>
> >>>> >> Regardless of state, production or dev, there are certainly ways to
> >>>> >> manage
> >>>> >> this a bit more and work files through in an iterative manner.
> >>>> >>
> >>>> >> Please report back if these avenues don't solve your issues and we
> >>>> >> can
> >>>> >> dive a little deeper if needed.
> >>>> >>
> >>>> >> [1]
> >>>> >> https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html
> >>>> >> [2]
> >>>> >> https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#configuration-best-practices
> >>>> >>
> >>>> >> On Tue, Mar 28, 2017 at 7:46 AM, James McMahon <[email protected]>
> >>>> >> wrote:
> >>>> >>>
> >>>> >>> Hi Aldrin. Yes sir, of course: my environment is NiFi v0.7. I have
> >>>> >>> my
> >>>> >>> content, flowfile, and provenance repositories on separate
> >>>> >>> independent disk
> >>>> >>> devices. In my nifi.properties file,
> >>>> >>> nifi.flowfile.repository.partitions
> >>>> >>> equals 256, and always.sync is false. My nifi.queue.swap.threshold
> >>>> >>> is 20000.
> >>>> >>> Since I am currently in development and so this is not a production
> >>>> >>> process,
> >>>> >>> I have set nifi.flowcontroller.autoResumeState to false. In
> >>>> >>> conf/bootstrap.conf, my JVM memory settings are -Xms1024m and
> >>>> >>> -Xmx4096m.
> >>>> >>>
> >>>> >>> In fact I have not yet applied the best practices from the Sys Admin
> >>>> >>> Guide. I will speak with our system administrator about doing this
> >>>> >>> today. I am a little hesitant to just jump into making the seven
> >>>> >>> system changes you detail. NiFi does run on this box, but so do
> >>>> >>> other processes that may be impacted. What's good for NiFi may not
> >>>> >>> be good for these other processes, and so I want to ask first.
> >>>> >>>
> >>>> >>> My scripts employ a Python stream callback to grab values from select
> >>>> >>> attributes, populate those into a Python dictionary object, generate
> >>>> >>> a json object from that dictionary object, and replace the flowfile
> >>>> >>> contents with that dictionary object. These scripts are called by
> >>>> >>> ExecuteScript processors. Similar scripts are used at various points
> >>>> >>> throughout my workflow, near the end of each branch. Those had been
> >>>> >>> working without any problems until I tried to introduce Python
> >>>> >>> logging yesterday. I suspect I am not releasing file handler
> >>>> >>> resources and logger objects as flowfiles flow through these
> >>>> >>> ExecuteScript processors - maybe? I really am only making educated
> >>>> >>> guesses at this stage. My first objective today is to get NiFi to
> >>>> >>> come back up.
> >>>> >>>
> >>>> >>> Please tell me: while I am in a dev state right now, had I been in a
> >>>> >>> production state what would have been the repercussions of deleting
> >>>> >>> in its entirety the flowfile_repository, which includes all its
> >>>> >>> journal files?
> >>>> >>>
> >>>> >>> Thanks very much in advance for your help.
> >>>> >>>
> >>>> >>> Jim
> >>>> >>>
> >>>> >>> On Tue, Mar 28, 2017 at 6:57 AM, Aldrin Piri <[email protected]>
> >>>> >>> wrote:
> >>>> >>>>
> >>>> >>>> Hi Jim,
> >>>> >>>>
> >>>> >>>> In getting to the root cause, could you please provide information
> >>>> >>>> on your environment?  Did you apply the best practices listed in the
> >>>> >>>> System Administrator's guide?  Could you provide some details on
> >>>> >>>> what your scripts are doing?
> >>>> >>>>
> >>>> >>>> If the data is not of importance, removing the Flowfile Repo should
> >>>> >>>> get you going. You can additionally remove the content repo, but
> >>>> >>>> this should be cleaned up by the framework as no flowfiles will
> >>>> >>>> point to said content.
> >>>> >>>>
> >>>> >>>>
> >>>> >>>> Aldrin Piri
> >>>> >>>> Sent from my mobile device.
> >>>> >>>>
> >>>> >>>> On Mar 28, 2017, at 06:12, James McMahon <[email protected]>
> >>>> >>>> wrote:
> >>>> >>>>
> >>>> >>>> I noticed, too, that I have many partitions, partition-0 to
> >>>> >>>> partition-255 to be exact. These all have journal files in them. So
> >>>> >>>> I suspect that the journal file I cited is not specifically the
> >>>> >>>> problem in and of itself, but instead is the point where the
> >>>> >>>> allowable open files threshold is reached. I'm wondering if I have
> >>>> >>>> to recover by deleting all these partitions? -Jim
> >>>> >>>>
> >>>> >>>> On Tue, Mar 28, 2017 at 5:58 AM, James McMahon
> >>>> >>>> <[email protected]>
> >>>> >>>> wrote:
> >>>> >>>>>
> >>>> >>>>> While trying to use Python logging from two scripts I call via two
> >>>> >>>>> independent ExecuteScript processors, I seem to have inadvertently
> >>>> >>>>> created a condition where I have too many files open. This is
> >>>> >>>>> causing a serious challenge for me, because when I attempt to start
> >>>> >>>>> nifi (v0.7.1) it fails.
> >>>> >>>>>
> >>>> >>>>> The log indicates that the flow controller cannot be started, and
> >>>> >>>>> it cites the cause as this:
> >>>> >>>>> org.apache.nifi.web.NiFiCoreException: Unable to start Flow Controller
> >>>> >>>>> .
> >>>> >>>>> . (many stack trace entries)
> >>>> >>>>> .
> >>>> >>>>> Caused by: java.nio.file.FileSystemException:
> >>>> >>>>> /mnt/flow_repo/flowfile_repository/partition-86/83856.journal: Too
> >>>> >>>>> many files open
> >>>> >>>>>
> >>>> >>>>> In a situation like this, what is the best practice for recovery?
> >>>> >>>>> Is it permissible to simply delete this journal file? What are the
> >>>> >>>>> negative repercussions of doing that?
> >>>> >>>>>
> >>>> >>>>> I did already try deleting my provenance_repository, but that did
> >>>> >>>>> not allow nifi to restart. (NiFi did re-establish my
> >>>> >>>>> provenance_repository at restart).
> >>>> >>>>>
> >>>> >>>>> Thanks very much in advance for your help. -Jim
> >>>> >>>>
> >>>> >>>>
> >>>> >>>
> >>>> >>
> >>>> >
> >>>
> >>>
> >>
> >
>
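
On the suspicion quoted above, that Python logging inside the ExecuteScript scripts leaves file handles open: the thread never confirms this as the cause, and the original scripts are not shown. One common way to leak handles with the standard `logging` module is to attach a new `FileHandler` every time a script body runs. The sketch below shows a pattern that avoids this. The function names, logger name, and log path are hypothetical, and it only uses standard-library calls that should also be available in the Jython 2.7 engine.

```python
import logging
import os
import tempfile


def get_logger(name, path):
    """Return a named logger that has at most one FileHandler attached.

    logging.getLogger(name) returns the same object on every call, so
    adding a handler unconditionally on each script run would open one
    more file each time. The guard below prevents that.
    """
    logger = logging.getLogger(name)
    if not any(isinstance(h, logging.FileHandler) for h in logger.handlers):
        handler = logging.FileHandler(path)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger


def close_logger(logger):
    """Close and detach every handler so the underlying files are released."""
    for handler in list(logger.handlers):
        handler.close()
        logger.removeHandler(handler)


if __name__ == "__main__":
    # Hypothetical log location, used only for this demonstration.
    log_path = os.path.join(tempfile.gettempdir(), "nifi_script_demo.log")
    for i in range(1000):
        # Simulates the script body running once per flowfile.
        get_logger("nifi.demo", log_path).info("processed flowfile %d", i)
    demo = logging.getLogger("nifi.demo")
    print("handlers attached: %d" % len(demo.handlers))
    close_logger(demo)
```

Whether to keep one long-lived handler or to call `close_logger` at the end of each run is a design choice. The first is cheaper, while the second guarantees no handle outlives a single execution.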
