I do agree.  Glad you're back on track.

On Tue, Mar 28, 2017 at 9:44 AM, James McMahon <[email protected]> wrote:
> *yikes*! Message received. We have now done all but one of them. That one
> being Set How long Sockets Stay in a TIMED_WAIT State When Closed. According
> to our system administrator we are unable to do this because - and I'm
> paraphrasing here - we do not have all the necessary components or libraries
> installed.
>
> We did do all the others. Every one. The lone one we did not do does not
> sound like a showstopper. Please do let me know if you disagree. -Jim
>
> On Tue, Mar 28, 2017 at 9:37 AM, Joe Witt <[email protected]> wrote:
>>
>> Jim - definitely take the time to walk through the best practices
>> guide.  Some are more like "if you don't do this it will probably kill
>> the process" practices.
>>
>> On Tue, Mar 28, 2017 at 9:27 AM, James McMahon <[email protected]>
>> wrote:
>> > I have been able to bring the NiFi UI back up with this change to the limit
>> > on
>> > number of open files. Thank you all very much for your help and
>> > insights.
>> > -Jim
>> >
>> > On Tue, Mar 28, 2017 at 8:51 AM, James McMahon <[email protected]>
>> > wrote:
>> >>
>> >> Thank you Aldrin. I do have AutoResumeState set to false currently. The
>> >> start of my jetty server fails when it tries to start the flowfile
>> >> controller. I can't bring the UI up at all. I'm hoping that the system
>> >> parm
>> >> changes allow me to restart NiFi without blowing away my
>> >> flowfile_repository. I'll certainly let you know how that plays out.
>> >> -Jim
>> >>
>> >> On Tue, Mar 28, 2017 at 8:46 AM, Aldrin Piri <[email protected]>
>> >> wrote:
>> >>>
>> >>> Jim,
>> >>>
>> >>> In terms of trying to ease NiFi at start up, you could also try
>> >>> setting
>> >>> nifi.flowcontroller.autoResumeState to false in your nifi.properties.
>> >>> Depending on how your flow and scripts are constructed, this may allow
>> >>> you
>> >>> to piecewise alleviate any large queues/processing of files that could
>> >>> be
>> >>> causing the issue at hand.  You could additionally bypass the possible
>> >>> troublesome script processors to cache this data to disk elsewhere as
>> >>> a stop
>> >>> gap measure.
>> >>>
>> >>> On Tue, Mar 28, 2017 at 8:17 AM, Joe Witt <[email protected]> wrote:
>> >>>>
>> >>>> Jim,
>> >>>>
>> >>>> It is very possible/likely that correcting the number of file handles
>> >>>> Linux allows a process to have will get NiFi back on track.
>> >>>>
>> >>>> Thanks
>> >>>> Joe
>> >>>>
>> >>>> On Tue, Mar 28, 2017 at 8:13 AM, James McMahon <[email protected]>
>> >>>> wrote:
>> >>>> > No apology necessary Aldrin. I'm much obliged to you and to Joe for
>> >>>> > all your
>> >>>> > help. My game plan is as follows:
>> >>>> > 1- speak with the admin of my Linux box about executing all the sys
>> >>>> > admin
>> >>>> > "best practice" changes
>> >>>> > 2- barring doing them all, at minimum increase max permitted open
>> >>>> > files from
>> >>>> > 1024 to 50000
>> >>>> > 3- reboot my Linux box, and then attempt to start NiFi
>> >>>> > 4- if 3 fails, rm -rf ./flowfile_repository on this, my dev box.
>> >>>> > Start
>> >>>> > nifi,
>> >>>> > get in there, and eliminate that Python logging. Find another way
>> >>>> > to
>> >>>> > log
>> >>>> > results to a system file, perhaps using a NiFi processor.
>> >>>> >
>> >>>> > - Jim
>> >>>> >
>> >>>> > On Tue, Mar 28, 2017 at 7:54 AM, Aldrin Piri <[email protected]>
>> >>>> > wrote:
>> >>>> >>
>> >>>> >> Hi Jim,
>> >>>> >>
>> >>>> >> Apologies for terse response earlier, was typing from phone.
>> >>>> >>
>> >>>> >> I am assuming you are on a Linux system.
>> >>>> >>
>> >>>> >> First and foremost, do check out the Sys Admin guide [1]. In
>> >>>> >> particular,
>> >>>> >> scope out the best practices [2] for configuration which will have
>> >>>> >> you
>> >>>> >> increase your open file handles.
>> >>>> >>
>> >>>> >> I do suspect that your hunches are correct, and while this will
>> >>>> >> aid
>> >>>> >> and
>> >>>> >> maybe avoid the issue, getting those resources properly closed out
>> >>>> >> will be
>> >>>> >> the right thing to track down.
>> >>>> >>
>> >>>> >> Regardless of state, production or dev, there are certainly ways
>> >>>> >> to
>> >>>> >> manage
>> >>>> >> this a bit more and work files through in an iterative manner.
>> >>>> >>
>> >>>> >> Please report back if these avenues don't solve your issues and we
>> >>>> >> can
>> >>>> >> dive a little deeper if needed.
>> >>>> >>
>> >>>> >> [1]
>> >>>> >>
>> >>>> >> https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html
>> >>>> >> [2]
>> >>>> >>
>> >>>> >>
>> >>>> >> https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#configuration-best-practices
>> >>>> >>
>> >>>> >> On Tue, Mar 28, 2017 at 7:46 AM, James McMahon
>> >>>> >> <[email protected]>
>> >>>> >> wrote:
>> >>>> >>>
>> >>>> >>> Hi Aldrin. Yes sir, of course: my environment is NiFi v0.7. I
>> >>>> >>> have
>> >>>> >>> my
>> >>>> >>> content, flowfile, and provenance repositories on separate
>> >>>> >>> independent disk
>> >>>> >>> devices. In my nifi.properties file,
>> >>>> >>> nifi.flowfile.repository.partitions
>> >>>> >>> equals 256, and always.sync is false. My
>> >>>> >>> nifi.queue.swap.threshold
>> >>>> >>> is 20000.
>> >>>> >>> Since I am currently in development and so this is not a
>> >>>> >>> production
>> >>>> >>> process,
>> >>>> >>> I have set nifi.flowcontroller.autoResumeState to false. In
>> >>>> >>> conf/bootstrap.conf, my JVM memory settings are -Xms1024m and
>> >>>> >>> -Xmx4096m.
>> >>>> >>>
>> >>>> >>> In fact I have not yet applied the best practices from the Sys
>> >>>> >>> Admin
>> >>>> >>> Guide. I will speak with them about doing this today. I am a
>> >>>> >>> little
>> >>>> >>> hesitant
>> >>>> >>> to just jump into making the seven system changes you detail.
>> >>>> >>> NiFi
>> >>>> >>> does run
>> >>>> >>> on this box, but so do other processes that may be impacted.
>> >>>> >>> What's
>> >>>> >>> good for
>> >>>> >>> NiFi may not be good for these other processes, and so I want to
>> >>>> >>> ask
>> >>>> >>> first.
>> >>>> >>>
>> >>>> >>> My scripts employ a Python stream callback to grab values from
>> >>>> >>> select
>> >>>> >>> attributes, populate those into a Python dictionary object,
>> >>>> >>> generate
>> >>>> >>> a json
>> >>>> >>> object from that dictionary object, and replace the flowfile
>> >>>> >>> contents with
>> >>>> >>> that json object. These scripts are called by ExecuteScript
>> >>>> >>> processors. Similar scripts are used at various points throughout
>> >>>> >>> my
>> >>>> >>> workflow, near the end of each branch. Those had been working
>> >>>> >>> without any
>> >>>> >>> problems until I tried to introduce Python logging yesterday. I
>> >>>> >>> suspect I am
>> >>>> >>> not releasing file handler resources and logger objects as
>> >>>> >>> flowfiles
>> >>>> >>> flow
>> >>>> >>> through these ExecuteScript processors - maybe? I really am only
>> >>>> >>> making
>> >>>> >>> educated guesses at this stage. My first objective today is to
>> >>>> >>> get
>> >>>> >>> NiFi to
>> >>>> >>> come back up.
>> >>>> >>>
>> >>>> >>> Please tell me: while I am in a dev state right now, had I been
>> >>>> >>> in a
>> >>>> >>> production state what would have been the repercussions of
>> >>>> >>> deleting
>> >>>> >>> in its
>> >>>> >>> entirety the flowfile_repository, which includes all its journal
>> >>>> >>> files?
>> >>>> >>>
>> >>>> >>> Thanks very much in advance for your help.
>> >>>> >>>
>> >>>> >>> Jim
>> >>>> >>>
>> >>>> >>> On Tue, Mar 28, 2017 at 6:57 AM, Aldrin Piri
>> >>>> >>> <[email protected]>
>> >>>> >>> wrote:
>> >>>> >>>>
>> >>>> >>>> Hi Jim,
>> >>>> >>>>
>> >>>> >>>> In getting to the root cause, could you please provide
>> >>>> >>>> information
>> >>>> >>>> on
>> >>>> >>>> your environment?  Did you apply the best practices listed in
>> >>>> >>>> the
>> >>>> >>>> System
>> >>>> >>>> Administrator's guide?  Could you provide some details on what
>> >>>> >>>> your
>> >>>> >>>> scripts
>> >>>> >>>> are doing?
>> >>>> >>>>
>> >>>> >>>> If the data is not of importance, removing the Flowfile Repo
>> >>>> >>>> should
>> >>>> >>>> get
>> >>>> >>>> you going. You can additionally remove the content repo, but
>> >>>> >>>> this
>> >>>> >>>> should be
>> >>>> >>>> cleaned up by the framework as no flowfiles will point to said
>> >>>> >>>> content.
>> >>>> >>>>
>> >>>> >>>>
>> >>>> >>>> Aldrin Piri
>> >>>> >>>> Sent from my mobile device.
>> >>>> >>>>
>> >>>> >>>> On Mar 28, 2017, at 06:12, James McMahon <[email protected]>
>> >>>> >>>> wrote:
>> >>>> >>>>
>> >>>> >>>> I noticed, too, that I have many partitions, partition-0 to
>> >>>> >>>> partition-255 to be exact. These all have journal files in them.
>> >>>> >>>> So
>> >>>> >>>> I
>> >>>> >>>> suspect that the journal file I cited is not specifically the
>> >>>> >>>> problem in and
>> >>>> >>>> of itself, but instead is the point where the allowable open
>> >>>> >>>> files
>> >>>> >>>> threshold
>> >>>> >>>> is reached. I'm wondering if I have to recover by deleting all
>> >>>> >>>> these
>> >>>> >>>> partitions? -Jim
>> >>>> >>>>
>> >>>> >>>> On Tue, Mar 28, 2017 at 5:58 AM, James McMahon
>> >>>> >>>> <[email protected]>
>> >>>> >>>> wrote:
>> >>>> >>>>>
>> >>>> >>>>> While trying to use Python logging from two scripts I call via
>> >>>> >>>>> two
>> >>>> >>>>> independent ExecuteScript processors, I seem to have
>> >>>> >>>>> inadvertently
>> >>>> >>>>> created a
>> >>>> >>>>> condition where I have too many files open. This is causing a
>> >>>> >>>>> serious
>> >>>> >>>>> challenge for me, because when I attempt to start nifi (v0.7.1)
>> >>>> >>>>> it
>> >>>> >>>>> fails.
>> >>>> >>>>>
>> >>>> >>>>> The log indicates that the flow controller cannot be started,
>> >>>> >>>>> and
>> >>>> >>>>> it
>> >>>> >>>>> cites the cause as this:
>> >>>> >>>>> org.apache.nifi.web.NiFiCoreException: Unable to start Flow
>> >>>> >>>>> Controller
>> >>>> >>>>> .
>> >>>> >>>>> . (many stack trace entries)
>> >>>> >>>>> .
>> >>>> >>>>> Caused by: java.nio.file.FileSystemException:
>> >>>> >>>>> /mnt/flow_repo/flowfile_repository/partition-86/83856.journal:
>> >>>> >>>>> Too
>> >>>> >>>>> many
>> >>>> >>>>> open files
>> >>>> >>>>>
>> >>>> >>>>> In a situation like this, what is the best practice for
>> >>>> >>>>> recovery?
>> >>>> >>>>> Is it
>> >>>> >>>>> permissible to simply delete this journal file? What are the
>> >>>> >>>>> negative
>> >>>> >>>>> repercussions of doing that?
>> >>>> >>>>>
>> >>>> >>>>> I did already try deleting my provenance_repository, but that
>> >>>> >>>>> did
>> >>>> >>>>> not
>> >>>> >>>>> allow nifi to restart. (NiFi did re-establish my
>> >>>> >>>>> provenance_repository at
>> >>>> >>>>> restart).
>> >>>> >>>>>
>> >>>> >>>>> Thanks very much in advance for your help. -Jim
>> >>>> >>>>
>> >>>> >>>>
>> >>>> >>>
>> >>>> >>
>> >>>> >
>> >>>
>> >>>
>> >>
>> >
>
>
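
A note on the suspected cause discussed in the thread (file handler resources
not being released by the Python logging added to the ExecuteScript scripts).
The sketch below is only an illustration in plain Python, not the scripts from
this thread: it assumes the script body is evaluated again for each flowfile,
so that an unguarded logging.FileHandler(...) call would open one more file
handle on every run. The logger name, the log path, and the helper names
get_script_logger and close_script_logger are hypothetical.

```python
import logging
import os
import tempfile


def get_script_logger(name, log_path):
    """Return a logger that carries at most one FileHandler for log_path."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    target = os.path.abspath(log_path)
    # Reuse the handler left by an earlier evaluation instead of opening
    # the same file again (each FileHandler holds its own open file).
    for handler in logger.handlers:
        if isinstance(handler, logging.FileHandler) and handler.baseFilename == target:
            return logger
    handler = logging.FileHandler(target)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


def close_script_logger(logger):
    """Close and detach every handler so the file handles are released."""
    for handler in list(logger.handlers):
        handler.close()
        logger.removeHandler(handler)


# Simulate the script being evaluated once per flowfile.
log_file = os.path.join(tempfile.mkdtemp(), "flow_results.log")  # hypothetical path
for i in range(100):
    log = get_script_logger("flow_results", log_file)
    log.info("processed flowfile %d", i)

handler_count = len(logging.getLogger("flow_results").handlers)
print("handlers attached after 100 runs:", handler_count)

close_script_logger(logging.getLogger("flow_results"))
```

Without the guard in get_script_logger, each of the 100 runs would attach a
new handler and hold another open file, which is one plausible way to reach a
1024 open-file limit. Whether this matches what happened here is a guess, in
line with Jim's own "educated guesses" above.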

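The fix that brought the UI back was raising the limit on open files from 1024
to 50000. As a rough way to confirm what a process actually gets, the sketch
below reads the limit with Python's standard resource module (Unix only). It
reports the limit of the process that runs it, so it is only meaningful when
run as the same account, and in the same kind of session, that starts NiFi;
that is an assumption about the setup, not something stated in the thread. The
50000 figure is the value from Jim's plan.

```python
import resource

RECOMMENDED = 50000  # value from the plan in this thread


def open_file_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def describe(limit):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


soft_limit, hard_limit = open_file_limits()
print("soft limit:", describe(soft_limit))
print("hard limit:", describe(hard_limit))

# The soft limit is the one enforced on open(); warn if it is still low.
if soft_limit != resource.RLIM_INFINITY and soft_limit < RECOMMENDED:
    print("soft limit is below", RECOMMENDED)
```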