No apology necessary, Aldrin. I'm much obliged to you and to Joe for all your help. My game plan is as follows:

1- speak with the admin of my Linux box about executing all the sys admin "best practice" changes
2- barring doing them all, at minimum increase the max permitted open files from 1024 to 50000
3- reboot my Linux box, and then attempt to start NiFi
4- if 3 fails, rm -rf ./flowfile_repository on this, my dev box. Start NiFi, get in there, and eliminate that Python logging. Find another way to log results to a system file, perhaps using a NiFi processor.
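For step 2, a quick way to see the open-file limit a process actually inherits, and how many descriptors it currently holds, is a short Python check. This is a minimal, Linux-specific sketch; the 50000 target is simply the figure from the plan above.

```python
import os
import resource

# Soft/hard limits on open file descriptors for this process
# (the soft limit is what `ulimit -n` reports in the launching shell).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("open-file limit: soft=%s hard=%s" % (soft, hard))

# Linux-specific: count descriptors currently held by this process.
print("descriptors in use: %d" % len(os.listdir("/proc/self/fd")))

if soft != resource.RLIM_INFINITY and soft < 50000:
    print("soft limit is below the 50000 target from the best practices")
```

Run under the same account that launches NiFi, since limits are per-user and per-session.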
- Jim

On Tue, Mar 28, 2017 at 7:54 AM, Aldrin Piri <[email protected]> wrote:

> Hi Jim,
>
> Apologies for the terse response earlier, was typing from phone.
>
> I am assuming you are on a Linux system.
>
> First and foremost, do check out the Sys Admin guide [1]. In particular, scope out the best practices [2] for configuration, which will have you increase your open file handles.
>
> I do suspect that your hunches are correct, and while this will aid and maybe avoid the issue, getting those resources properly closed out will be the right thing to track down.
>
> Regardless of state, production or dev, there are certainly ways to manage this a bit more and work files through in an iterative manner.
>
> Please report back if these avenues don't solve your issues and we can dive a little deeper if needed.
>
> [1] https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html
> [2] https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#configuration-best-practices
>
> On Tue, Mar 28, 2017 at 7:46 AM, James McMahon <[email protected]> wrote:
>
>> Hi Aldrin. Yes sir, of course: my environment is NiFi v0.7. I have my content, flowfile, and provenance repositories on separate independent disk devices. In my nifi.properties file, nifi.flowfile.repository.partitions equals 256, and always.sync is false. My nifi.queue.swap.threshold is 20000. Since I am currently in development and so this is not a production process, I have set nifi.flowcontroller.autoResumeState to false. In conf/bootstrap.conf, my JVM memory settings are -Xms1024m and -Xmx4096m.
>>
>> In fact I have not yet applied the best practices from the Sys Admin Guide. I will speak with them about doing this today. I am a little hesitant to just jump into making the seven system changes you detail. NiFi does run on this box, but so do other processes that may be impacted.
>> What's good for NiFi may not be good for these other processes, and so I want to ask first.
>>
>> My scripts employ a Python stream callback to grab values from select attributes, populate those into a Python dictionary object, generate a JSON object from that dictionary object, and replace the flowfile contents with that JSON object. These scripts are called by ExecuteScript processors. Similar scripts are used at various points throughout my workflow, near the end of each branch. Those had been working without any problems until I tried to introduce Python logging yesterday. I suspect I am not releasing file handler resources and logger objects as flowfiles flow through these ExecuteScript processors - maybe? I really am only making educated guesses at this stage. My first objective today is to get NiFi to come back up.
>>
>> Please tell me: while I am in a dev state right now, had I been in a production state, what would have been the repercussions of deleting in its entirety the flowfile_repository, which includes all its journal files?
>>
>> Thanks very much in advance for your help.
>>
>> Jim
>>
>> On Tue, Mar 28, 2017 at 6:57 AM, Aldrin Piri <[email protected]> wrote:
>>
>>> Hi Jim,
>>>
>>> In getting to the root cause, could you please provide information on your environment? Did you apply the best practices listed in the System Administrator's guide? Could you provide some details on what your scripts are doing?
>>>
>>> If the data is not of importance, removing the Flowfile Repo should get you going. You can additionally remove the content repo, but this should be cleaned up by the framework as no flowfiles will point to said content.
>>>
>>> Aldrin Piri
>>> Sent from my mobile device.
>>>
>>> On Mar 28, 2017, at 06:12, James McMahon <[email protected]> wrote:
>>>
>>> I noticed, too, that I have many partitions, partition-0 to partition-255 to be exact.
>>> These all have journal files in them. So I suspect that the journal file I cited is not specifically the problem in and of itself, but instead is the point where the allowable open files threshold is reached. I'm wondering if I have to recover by deleting all these partitions? -Jim
>>>
>>> On Tue, Mar 28, 2017 at 5:58 AM, James McMahon <[email protected]> wrote:
>>>
>>>> While trying to use Python logging from two scripts I call via two independent ExecuteScript processors, I seem to have inadvertently created a condition where I have too many files open. This is causing a serious challenge for me, because when I attempt to start NiFi (v0.7.1) it fails.
>>>>
>>>> The log indicates that the flow controller cannot be started, and it cites the cause as this:
>>>>
>>>> org.apache.nifi.web.NiFiCoreException: Unable to start Flow Controller
>>>> .
>>>> . (many stack trace entries)
>>>> .
>>>> Caused by: java.nio.file.FileSystemException: /mnt/flow_repo/flowfile_repository/partition-86/83856.journal: Too many files open
>>>>
>>>> In a situation like this, what is the best practice for recovery? Is it permissible to simply delete this journal file? What are the negative repercussions of doing that?
>>>>
>>>> I did already try deleting my provenance_repository, but that did not allow NiFi to restart. (NiFi did re-establish my provenance_repository at restart.)
>>>>
>>>> Thanks very much in advance for your help. -Jim
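On the suspected root cause from upthread: if the script attaches a new `logging.FileHandler` for every flowfile and never closes it, each flowfile through ExecuteScript leaks one file descriptor (and stacks a duplicate handler on the shared logger) until the OS limit is hit. A hedged, plain-CPython sketch of the release pattern, outside NiFi, with an illustrative logger name and path not taken from the thread:

```python
import logging

def log_result(message, path="/tmp/flow_results.log"):
    """Write one line to a log file, releasing the handle afterwards."""
    logger = logging.getLogger("flow.results")  # reused by name, not re-created
    handler = logging.FileHandler(path)         # opens the file descriptor
    logger.addHandler(handler)
    try:
        logger.warning(message)
    finally:
        # Without these two lines, every call leaks one open descriptor
        # and piles a duplicate handler onto the shared logger.
        logger.removeHandler(handler)
        handler.close()
```

An alternative design is to create the handler once per processor start rather than per flowfile; either way, the handler must eventually be closed explicitly.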
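For completeness, the transform Jim describes upthread (select attributes into a dictionary, render the dictionary as JSON, replace the flowfile content) reduces to something like the sketch below when pulled out of NiFi. Inside ExecuteScript the attribute map would come from `flowFile.getAttribute(...)` and the JSON string would be written back inside a StreamCallback; here it is shown as a plain function for clarity.

```python
import json

def attributes_to_json(attributes, keys):
    """Pick selected attributes and render them as a JSON document.

    `attributes` stands in for the flowfile's attribute map; keys that
    are absent come through as null rather than raising.
    """
    record = {k: attributes.get(k) for k in keys}
    return json.dumps(record, sort_keys=True)
```

Nothing in this function holds a file handle, which is consistent with the observation that these scripts ran cleanly until the logging calls were added.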
