Thank you, Joe. Do you advise, then, that we tune some parameters now, or is it acceptable to allow NiFi to ... self-regulate ... as it appears to be doing? If you suggest tuning, which ones should I look at - nifi.provenance.repository.index.threads? I notice that at present I have that set to a robust 1.
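For concreteness, here is the block I believe you mean in my conf/nifi.properties. The current values are what my 0.7.x install has today; the commented line is just my guess at what giving indexing "more threads" would look like, so please correct me if a different property is the right lever:

    # provenance indexing - currently the default single thread
    nifi.provenance.repository.index.threads=1
    # e.g., a hypothetical bump:
    # nifi.provenance.repository.index.threads=2

    # query threads are separate from index threads; default shown
    nifi.provenance.repository.query.threads=2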
That improvement with 1.2.0 sounds like it will make a big difference. Sadly, as you recall, it may be some time before 1.2.x is available to me. (For the archive, I've recapped the exact retention change and restart sequence in a P.S. below the quoted thread.)

On Thu, May 25, 2017 at 11:05 AM, Joe Witt <[email protected]> wrote:
> Jim,
>
> That provenance warning is not related to archive/retention. It is
> provenance telling you it can only index events so fast; at present it is
> falling behind, so it will slow the flow to ensure things don't get too
> far out of balance. However, there are configuration properties that let
> you give provenance indexing more threads. Also, we created a new
> provenance implementation, available in NiFi 1.2.0, which is multiple
> times faster with immediate indexing.
>
> Thanks
>
> On Thu, May 25, 2017 at 11:03 AM, James McMahon <[email protected]> wrote:
>> Absolutely. Thank you for looking into this, Aldrin.
>>
>> I do indeed have NiFi configured as a service. I've stopped and started
>> it dozens of times over the life of my workflow development these recent
>> months. It has always previously started up like a champ. On this
>> particular occasion I did this:
>>
>> service nifi stop
>>
>> as user nifi. It shut down, and the logs presented no errors. I then did
>> this:
>>
>> service nifi start
>>
>> as user nifi. The bootstrap log contained the INFO messages I shared
>> with you above.
>>
>> Our data flow has not taxed NiFi much at all. There was no data
>> processing through at the time. We had recently done two bulk ingests of
>> large data directories. The content repo had indicated 46% full, but
>> after I let it sit overnight it had dropped back down to a typical level
>> of 3-6%. As I learned yesterday, my archive retention being set to 12
>> hours explained why the content repo held on to all that capacity after
>> all my 100,000 files had processed through late yesterday.
>>
>> Early this morning I modified my conf/nifi.properties to drop my archive
>> retention to 1 hour from 12 hours. This was when I tried and failed to
>> restart.
>>
>> We've since rebooted the host and NiFi came right up. With my new
>> archive retention value in place, I tried processing about 16,000 files
>> through. They flew through, but I have noticed a warning that I believe
>> is caused by my change to archive retention: WARNING The rate of the
>> dataflow is exceeding the provenance recording rate. Slowing down flow
>> to accommodate.
>>
>> What else can I tell you? I suppose it would help to mention that my
>> three major repos - content, flowfile, provenance - are on separate
>> local disk devices.
>>
>> My workflow load peaks when I process approximately 100,000 files
>> totaling 50 GB through the flow. The content repo maxes out at 46% of
>> our 50 GB capacity. The provenance and flowfile repos never peak into
>> the double digits. I do some custom parsing and custom logging in
>> InvokeScriptedProcessors. I employ HandleHttpResponse and
>> HandleHttpRequest processors.
>>
>> I've not yet watched memory usage on the box as I run, but I'll try a
>> 'watch -n [#] free -m' later to see what happens. My NiFi instance runs
>> with JVM memory parms in bootstrap.conf of -Xms4096m and -Xmx8192m.
>>
>> Jim
>>
>> On Thu, May 25, 2017 at 10:38 AM, Aldrin Piri <[email protected]> wrote:
>>> If you happen to remember, could you get more specific about your
>>> sequence of operations? Is NiFi installed as a service? If so, was it
>>> restarted? Did you just issue a nifi.sh restart?
>>>
>>> Do you have any CM tooling (Puppet, Chef, Salt, etc.) that is managing
>>> this process/system?
>>>
>>> Could you tell us what the bootstrap log says prior to those lines in
>>> terms of shutting down?
>>>
>>> Would you be able to describe the load exerted on the system by the
>>> flow? A bit of an amorphous question, but is/was the system heavily
>>> taxed running NiFi?
>>>
>>> The section you hit _should_ only be hit if NiFi (the flow process and
>>> not the bootstrap) terminates for some reason (e.g., hit an
>>> out-of-memory case). I have a few notions as to how the right
>>> confluence of events could have gotten you there otherwise, so any
>>> additional details would be great to vet their possible culpability.
>>>
>>> Thanks!
>>>
>>> On Thu, May 25, 2017 at 10:10 AM, James McMahon <[email protected]> wrote:
>>>> I did inspect the log more closely. It offers little additional
>>>> insight. Here is what it says (unable to export; I had to transcribe
>>>> it myself):
>>>>
>>>> [date] [time],### INFO [main] org.apache.nifi.bootstrap.RunNiFi Status
>>>> File no longer exists. Will not restart NiFi
>>>> [date] [time],### INFO [main] o.a.n.b.NotificationServiceManager
>>>> Successfully loaded the following 0 services: [ ]
>>>> [date] [time],### INFO [main] org.apache.nifi.bootstrap.RunNiFi
>>>> Registered no Notification Services for Notification Type NIFI_STARTED
>>>> [date] [time],### INFO [main] org.apache.nifi.bootstrap.RunNiFi
>>>> Registered no Notification Services for Notification Type NIFI_STOPPED
>>>> [date] [time],### INFO [main] org.apache.nifi.bootstrap.RunNiFi
>>>> Registered no Notification Services for Notification Type NIFI_DIED
>>>> [date] [time],### INFO [main] org.apache.nifi.bootstrap.Command Apache
>>>> NiFi is not running
>>>>
>>>> My hope is that we can figure out what happens to this status file,
>>>> and how I can prevent it from going missing.
>>>>
>>>> Jim
>>>>
>>>> On Thu, May 25, 2017 at 9:37 AM, Joe Witt <[email protected]> wrote:
>>>>> I don't think rebooting the system had anything to do with NiFi's
>>>>> ability to start up. But I'm not sure I understand that particular
>>>>> part of the logic in the code, in terms of the case it was defending
>>>>> against.
>>>>>
>>>>> On Thu, May 25, 2017 at 9:34 AM, James McMahon <[email protected]> wrote:
>>>>>> Will do, Joe. I'll dig for that now.
>>>>>>
>>>>>> The infrastructure group did reboot the box, which had been up and
>>>>>> running for nearly two months. NiFi did indeed come up following the
>>>>>> reboot. I still want to get you this log information so that I can
>>>>>> learn what triggers such a situation, and whether there is a more
>>>>>> refined way to resolve it than a full system reboot. There are other
>>>>>> things running on the resource, and I should try to minimize the
>>>>>> impact a full reboot has on them.
>>>>>>
>>>>>> Let me see about that log content. Thank you again.
>>>>>>
>>>>>> On Thu, May 25, 2017 at 9:25 AM, Joe Witt <[email protected]> wrote:
>>>>>>> Jim,
>>>>>>>
>>>>>>> The code relevant to that log output is here [1]. Can you share the
>>>>>>> bootstrap output before/after that output?
>>>>>>>
>>>>>>> [1] https://github.com/apache/nifi/blob/rel/nifi-0.7.1/nifi-bootstrap/src/main/java/org/apache/nifi/bootstrap/RunNiFi.java
>>>>>>>
>>>>>>> Thanks
>>>>>>> Joe
>>>>>>>
>>>>>>> On Thu, May 25, 2017 at 9:11 AM, James McMahon <[email protected]> wrote:
>>>>>>>> I am running NiFi 0.7.x and have been running with great stability
>>>>>>>> for a long period of time. This morning I tried to make this
>>>>>>>> change in my conf/nifi.properties file:
>>>>>>>>
>>>>>>>> nifi.content.repository.archive.max.retention.period=1 hour
>>>>>>>>
>>>>>>>> Reduced from the default of 12 hours. A relatively simple change,
>>>>>>>> but it requires a NiFi restart to take effect.
>>>>>>>>
>>>>>>>> My restart attempt throws no errors to the nifi app log, but in
>>>>>>>> the bootstrap log I do see this:
>>>>>>>>
>>>>>>>> org.apache.nifi.bootstrap.RunNiFi Status file no longer exists.
>>>>>>>> Will not restart NiFi
>>>>>>>>
>>>>>>>> I've done some digging, and all I could find was rebooting the box
>>>>>>>> in hopes of resolving it. I am reaching out to the infrastructure
>>>>>>>> group that owns the server now, asking them to do so. In parallel,
>>>>>>>> I would also like to understand why this happened, and where,
>>>>>>>> exactly, this status file should be.
>>>>>>>>
>>>>>>>> Can I resolve this by manually recreating such a status file with
>>>>>>>> certain permissions and ownership?
>>>>>>>>
>>>>>>>> Thanks in advance for your help. -Jim
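P.S. For the archive, here is the exact change and restart sequence from the thread above, recapped in one place (the service name and the nifi user reflect my install; yours may differ):

    # conf/nifi.properties - reduced from the 12 hour default
    nifi.content.repository.archive.max.retention.period=1 hour

    # then, as user nifi:
    service nifi stop
    service nifi start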
