Jim - definitely take the time to walk through the best practices
guide.  Some are less "best practices" and more "if you don't do this,
it will probably kill the process" practices.
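
For reference, here is a minimal Python sketch for checking the open-file
limit discussed in this thread. It assumes a Linux host with /proc mounted,
and that it is run with a regular Python interpreter as the same user that
runs NiFi, since the limit is per process and inherited from the launching
shell. The function names are illustrative only and are not part of NiFi.

```python
import os
import resource


def open_file_limits():
    """Return the (soft, hard) open-file-descriptor limits for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def count_open_fds(pid="self"):
    """Count open descriptors by listing /proc/<pid>/fd (Linux only).

    For pid="self" the count includes the descriptor used to read the
    directory itself, so treat the number as approximate.
    """
    return len(os.listdir("/proc/%s/fd" % pid))


if __name__ == "__main__":
    soft, hard = open_file_limits()
    print("soft limit: %s, hard limit: %s" % (soft, hard))
    print("open descriptors in this process: %d" % count_open_fds())
```

Passing the NiFi JVM's process id to count_open_fds (as the NiFi user or
root) gives a rough view of how close that process is to its limit.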

On Tue, Mar 28, 2017 at 9:27 AM, James McMahon <[email protected]> wrote:
> I have been able to bring Nifi UI back up with this change to the limit on
> number of open files. Thank you all very much for your help and insights.
> -Jim
>
> On Tue, Mar 28, 2017 at 8:51 AM, James McMahon <[email protected]> wrote:
>>
>> Thank you Aldrin. I do have AutoResumeState set to false currently. The
>> start of my jetty server fails when it tries to start the flowfile
>> controller. I can't bring the UI up at all. I'm hoping that the system parameter
>> changes allow me to restart NiFi without blowing away my
>> flowfile_repository. I'll certainly let you know how that plays out. -Jim
>>
>> On Tue, Mar 28, 2017 at 8:46 AM, Aldrin Piri <[email protected]> wrote:
>>>
>>> Jim,
>>>
>>> In terms of trying to ease NiFi at start up, you could also try setting
>>> nifi.flowcontroller.autoResumeState to false in your nifi.properties.
>>> Depending on how your flow and scripts are constructed, this may allow you
>>> to piecewise alleviate any large queues/processing of files that could be
>>> causing the issue at hand.  You could additionally bypass the possible
>>> troublesome script processors to cache this data to disk elsewhere as a stop
>>> gap measure.
>>>
>>> On Tue, Mar 28, 2017 at 8:17 AM, Joe Witt <[email protected]> wrote:
>>>>
>>>> Jim,
>>>>
>>>> It is very possible/likely that correcting the number of file handles
>>>> linux allows a process to have will get nifi back on track.
>>>>
>>>> Thanks
>>>> Joe
>>>>
>>>> On Tue, Mar 28, 2017 at 8:13 AM, James McMahon <[email protected]>
>>>> wrote:
>>>> > No apology necessary Aldrin. I'm much obliged to you and to Joe for
>>>> > all your
>>>> > help. My game plan is as follows:
>>>> > 1- speak with the admin of my Linux box about executing all the sys
>>>> > admin
>>>> > "best practice" changes
>>>> > 2- barring doing them all, at minimum increase max permitted open
>>>> > files from
>>>> > 1024 to 50000
>>>> > 3- reboot my Linux box, and then attempt to start NiFi
>>>> > 4- if 3 fails, rm -rf ./flowfile_repository on this, my dev box. Start
>>>> > nifi,
>>>> > get in there, and eliminate that Python logging. Find another way to
>>>> > log
>>>> > results to a system file, perhaps using a NiFi processor.
>>>> >
>>>> > - Jim
>>>> >
>>>> > On Tue, Mar 28, 2017 at 7:54 AM, Aldrin Piri <[email protected]>
>>>> > wrote:
>>>> >>
>>>> >> Hi Jim,
>>>> >>
>>>> >> Apologies for terse response earlier, was typing from phone.
>>>> >>
>>>> >> I am assuming you are on a Linux system.
>>>> >>
>>>> >> First and foremost, do check out the Sys Admin guide [1]. In
>>>> >> particular,
>>>> >> scope out the best practices [2] for configuration which will have
>>>> >> you
>>>> >> increase your open file handles.
>>>> >>
>>>> >> I do suspect that your hunches are correct, and while this will aid
>>>> >> and
>>>> >> maybe avoid the issue, getting those resources properly closed out
>>>> >> will be
>>>> >> the right thing to track down.
>>>> >>
>>>> >> Regardless of state, production or dev, there are certainly ways to
>>>> >> manage
>>>> >> this a bit more and work files through in an iterative manner.
>>>> >>
>>>> >> Please report back if these avenues don't solve your issues and we
>>>> >> can
>>>> >> dive a little deeper if needed.
>>>> >>
>>>> >> [1]
>>>> >> https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html
>>>> >> [2]
>>>> >>
>>>> >> https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#configuration-best-practices
>>>> >>
>>>> >> On Tue, Mar 28, 2017 at 7:46 AM, James McMahon <[email protected]>
>>>> >> wrote:
>>>> >>>
>>>> >>> Hi Aldrin. Yes sir, of course: my environment is NiFi v0.7. I have
>>>> >>> my
>>>> >>> content, flowfile, and provenance repositories on separate
>>>> >>> independent disk
>>>> >>> devices. In my nifi.properties file,
>>>> >>> nifi.flowfile.repository.partitions
>>>> >>> equals 256, and always.sync is false. My nifi.queue.swap.threshold
>>>> >>> is 20000.
>>>> >>> Since I am currently in development and so this is not a production
>>>> >>> process,
>>>> >>> I have set nifi.flowcontroller.autoResumeState to false. In
>>>> >>> conf/bootstrap.conf, my JVM memory settings are -Xms1024m and
>>>> >>> -Xmx4096m.
>>>> >>>
>>>> >>> In fact I have not yet applied the best practices from the Sys Admin
>>>> >>> Guide. I will speak with them about doing this today. I am a little
>>>> >>> hesitant
>>>> >>> to just jump into making the seven system changes you detail. NiFi
>>>> >>> does run
>>>> >>> on this box, but so do other processes that may be impacted. What's
>>>> >>> good for
>>>> >>> NiFi may not be good for these other processes, and so I want to ask
>>>> >>> first.
>>>> >>>
>>>> >>> My scripts employ a Python stream callback to grab values from
>>>> >>> select
>>>> >>> attributes, populate those into a Python dictionary object, generate
>>>> >>> a JSON
>>>> >>> object from that dictionary object, and replace the flowfile
>>>> >>> contents with
>>>> >>> that JSON object. These scripts are called by ExecuteScript
>>>> >>> processors. Similar scripts are used at various points throughout my
>>>> >>> workflow, near the end of each branch. Those had been working
>>>> >>> without any
>>>> >>> problems until I tried to introduce Python logging yesterday. I
>>>> >>> suspect I am
>>>> >>> not releasing file handler resources and logger objects as flowfiles
>>>> >>> flow
>>>> >>> through these ExecuteScript processors - maybe? I really am only
>>>> >>> making
>>>> >>> educated guesses at this stage. My first objective today is to get
>>>> >>> NiFi to
>>>> >>> come back up.
>>>> >>>
>>>> >>> Please tell me: while I am in a dev state right now, had I been in a
>>>> >>> production state what would have been the repercussions of deleting
>>>> >>> in its
>>>> >>> entirety the flowfile_repository, which includes all its journal
>>>> >>> files?
>>>> >>>
>>>> >>> Thanks very much in advance for your help.
>>>> >>>
>>>> >>> Jim
>>>> >>>
>>>> >>> On Tue, Mar 28, 2017 at 6:57 AM, Aldrin Piri <[email protected]>
>>>> >>> wrote:
>>>> >>>>
>>>> >>>> Hi Jim,
>>>> >>>>
>>>> >>>> In getting to the root cause, could you please provide information
>>>> >>>> on
>>>> >>>> your environment?  Did you apply the best practices listed in the
>>>> >>>> System
>>>> >>>> Administrator's guide?  Could you provide some details on what your
>>>> >>>> scripts
>>>> >>>> are doing?
>>>> >>>>
>>>> >>>> If the data is not of importance, removing the Flowfile Repo should
>>>> >>>> get
>>>> >>>> you going. You can additionally remove the content repo, but this
>>>> >>>> should be
>>>> >>>> cleaned up by the framework as no flowfiles will point to said
>>>> >>>> content.
>>>> >>>>
>>>> >>>>
>>>> >>>> Aldrin Piri
>>>> >>>> Sent from my mobile device.
>>>> >>>>
>>>> >>>> On Mar 28, 2017, at 06:12, James McMahon <[email protected]>
>>>> >>>> wrote:
>>>> >>>>
>>>> >>>> I noticed, too, that I have many partitions, partition-0 to
>>>> >>>> partition-255 to be exact. These all have journal files in them. So
>>>> >>>> I
>>>> >>>> suspect that the journal file I cited is not specifically the
>>>> >>>> problem in and
>>>> >>>> of itself, but instead is the point where the allowable open files
>>>> >>>> threshold
>>>> >>>> is reached. I'm wondering if I have to recover by deleting all
>>>> >>>> these
>>>> >>>> partitions? -Jim
>>>> >>>>
>>>> >>>> On Tue, Mar 28, 2017 at 5:58 AM, James McMahon
>>>> >>>> <[email protected]>
>>>> >>>> wrote:
>>>> >>>>>
>>>> >>>>> While trying to use Python logging from two scripts I call via two
>>>> >>>>> independent ExecuteScript processors, I seem to have inadvertently
>>>> >>>>> created a
>>>> >>>>> condition where I have too many files open. This is causing a
>>>> >>>>> serious
>>>> >>>>> challenge for me, because when I attempt to start nifi (v0.7.1) it
>>>> >>>>> fails.
>>>> >>>>>
>>>> >>>>> The log indicates that the flow controller cannot be started, and
>>>> >>>>> it
>>>> >>>>> cites the cause as this:
>>>> >>>>> org.apache.nifi.web.NiFiCoreException: Unable to start Flow
>>>> >>>>> Controller
>>>> >>>>> .
>>>> >>>>> . (many stack trace entries)
>>>> >>>>> .
>>>> >>>>> Caused by: java.nio.file.FileSystemException:
>>>> >>>>> /mnt/flow_repo/flowfile_repository/partition-86/83856.journal: Too
>>>> >>>>> many
>>>> >>>>> files open
>>>> >>>>>
>>>> >>>>> In a situation like this, what is the best practice for recovery?
>>>> >>>>> Is it
>>>> >>>>> permissible to simply delete this journal file? What are the
>>>> >>>>> negative
>>>> >>>>> repercussions of doing that?
>>>> >>>>>
>>>> >>>>> I did already try deleting my provenance_repository, but that did
>>>> >>>>> not
>>>> >>>>> allow nifi to restart. (NiFi did re-establish my
>>>> >>>>> provenance_repository at
>>>> >>>>> restart).
>>>> >>>>>
>>>> >>>>> Thanks very much in advance for your help. -Jim
>>>> >>>>
>>>> >>>>
>>>> >>>
>>>> >>
>>>> >
>>>
>>>
>>
>

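On the suspected handle leak from Python logging: the scripts are not shown
in this thread, so the following is only a sketch under an assumption. If a
script attaches a new logging.FileHandler each time it runs, and the logging
module's state survives between runs, every run opens another file handle
that is never closed. One common way to avoid that is to attach the handler
only once and close it explicitly when done. The logger name, log path, and
helper names below are hypothetical.

```python
import logging
import os
import tempfile


def get_file_logger(name, path):
    """Return a logger that writes to path, attaching a handler only once."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # logging.getLogger returns the same object for the same name, so an
    # unconditional addHandler here would open one more file per call.
    if not logger.handlers:
        handler = logging.FileHandler(path)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


def release_file_logger(logger):
    """Close and detach every handler so the underlying files are released."""
    for handler in list(logger.handlers):
        handler.close()
        logger.removeHandler(handler)


if __name__ == "__main__":
    # Hypothetical log location, used only for this demonstration.
    log_path = os.path.join(tempfile.gettempdir(), "nifi_script_example.log")
    for i in range(1000):
        log = get_file_logger("example.script", log_path)
        log.info("processed flowfile %d", i)
    # The handler count stays at one no matter how many times we ask.
    print("handlers attached: %d" % len(log.handlers))
    release_file_logger(log)
```

Whether logger state actually persists across ExecuteScript invocations
depends on how the script engine is managed, so this is worth verifying in
the environment in question before relying on it.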