Absolutely. Thank you for looking into this, Aldrin. I do indeed have NiFi configured as a service. I've stopped and started it dozens of times over the course of my workflow development these recent months, and it has always started up like a champ. On this particular occasion I did this: service nifi stop as user nifi. It shut down, and the logs showed no errors. I then did this: service nifi start as user nifi. The bootstrap log contained the INFO messages I shared with you earlier.
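For anyone retracing this, a minimal sketch of how the bootstrap log could be searched for the message in question. The function names and the log path in the usage note are assumptions made for illustration; they are not taken from NiFi itself or from this thread.

```python
import re
from typing import Iterable, List

# Match the message whether it is written "Status File" or "Status file",
# since both capitalizations appear in this thread.
STATUS_FILE_GONE = re.compile(r"status file no longer exists", re.IGNORECASE)


def find_status_file_events(lines: Iterable[str]) -> List[str]:
    """Return the log lines that report the bootstrap status file is gone."""
    return [line.rstrip("\n") for line in lines if STATUS_FILE_GONE.search(line)]


def scan_log(path: str) -> List[str]:
    """Read a log file (hypothetical path supplied by the caller) and filter it."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_status_file_events(handle)
```

Calling scan_log("logs/nifi-bootstrap.log") would list each occurrence, assuming the bootstrap log lives at that path on your install; adjust it to match your own layout.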
Our data flow has not taxed NiFi much at all. No data was processing through at the time. We had recently done two bulk ingests of large data directories. The content repo had indicated 46% full, but after I let it sit overnight it had dropped back down to a typical level of 3-6%. As I learned yesterday, my archive retention setting of 12 hours explains why the content repo was holding on to all that capacity after all my 100,000 files had processed through late yesterday.

Early this morning I modified my conf/nifi.properties to drop my archive retention to 1 hour from 12 hours. This was when I tried and failed to restart. We've since rebooted the host and NiFi came right up. With my new archive retention value in place, I tried processing about 16,000 files through. They flew through, but I have noticed a warning that I believe is caused by my change to archive retention:

WARNING The rate of the dataflow is exceeding the provenance recording rate. Slowing down flow to accommodate.

What else can I tell you? I suppose it would help to mention that my three major repos - content, flowfile, provenance - are on separate local disk devices. My workflow load peaks when I try to process approximately 100,000 files totaling 50 GB through the flow. The content repo maxes out at 46% of our 50 GB capacity. The provenance and flowfile repos never peak into the double digits.

I do some custom parsing and custom logging in InvokeScriptedProcessors. I employ HandleHttpRequest and HandleHttpResponse processors. I've not yet watched memory usage on the box as I run, but I'll try to use a 'watch -n [#] free -m' later to see what happens. My NiFi instance runs with JVM memory parameters in bootstrap.conf of -Xms4096m and -Xmx8192m.

Jim

On Thu, May 25, 2017 at 10:38 AM, Aldrin Piri <[email protected]> wrote:

> If you happen to remember, could you get more specific into your sequence
> of operations? Is nifi installed as a service?
> If so, was it restarted
> Did you just issue a nifi.sh restart?
>
> Do you have any CM tooling (Puppet, Chef, Salt, etc) that is managing this
> process/system?
>
> Could you tell us what the bootstrap log says prior to those lines in
> terms of shutting down?
>
> Would you be able to describe the load exerted on the system by the flow?
> A bit of an amorphous question, but is/was the system heavily taxed running
> NiFi?
>
> The section you hit _should_ only be hit if NiFi (the flow process and not
> the bootstrap) terminates for some reason (e.g. - Hit an out of memory
> case). I have a few notions as to how the right confluence of events could
> have gotten you otherwise, so any additional details would be great to vet
> their possible culpability.
>
> Thanks!
>
> On Thu, May 25, 2017 at 10:10 AM, James McMahon <[email protected]>
> wrote:
>
>> I did inspect the log more closely. It offers little additional insight.
>> Here is what it says (unable to export, had to transcribe myself):
>>
>> [date] [time],### INFO [main] org.apache.nifi.bootstrap.RunNiFi Status
>> File no longer exists. Will not restart NiFi
>> [date] [time],### INFO [main] o.a.n.b.NotificationServiceManager
>> Successfully loaded the following 0 services: [ ]
>> [date] [time],### INFO [main] org.apache.nifi.bootstrap.RunNiFi
>> Registered no Notification Services for Notification Type NIFI_STARTED
>> [date] [time],### INFO [main] org.apache.nifi.bootstrap.RunNiFi
>> Registered no Notification Services for Notification Type NIFI_STOPPED
>> [date] [time],### INFO [main] org.apache.nifi.bootstrap.RunNiFi
>> Registered no Notification Services for Notification Type NIFI_DIED
>> [date] [time],### INFO [main] org.apache.nifi.bootstrap.Command Apache
>> NiFi is not running
>>
>> My hope is that we can figure out what happens to this status file, and
>> how I can prevent it from nonexistence.
>>
>> Jim
>>
>> On Thu, May 25, 2017 at 9:37 AM, Joe Witt <[email protected]> wrote:
>>
>>> I don't think rebooting the system had anything to do with NiFi's
>>> ability to startup. But i'm not sure I understand that particular
>>> part of logic in the code in terms of the case it was defending
>>> against.
>>>
>>> On Thu, May 25, 2017 at 9:34 AM, James McMahon <[email protected]>
>>> wrote:
>>> > Will do Joe. I'll dig for that now.
>>> >
>>> > Infrastructure Group did reboot the box, which had been up and running for
>>> > nearly two months. NiFi did indeed come up following the reboot. I still
>>> > want to try and get you this log information so that I can learn what
>>> > triggers such a situation, and whether there is a more refined way to solve
>>> > it than full system reboot. There are other things running on the resource
>>> > and I should try to minimize impact to them by fully rebooting.
>>> >
>>> > Let me see about that log content. Thank you again.
>>> >
>>> > On Thu, May 25, 2017 at 9:25 AM, Joe Witt <[email protected]> wrote:
>>> >>
>>> >> Jim,
>>> >>
>>> >> The code relevant to that log output is here [1]. Can you share the
>>> >> bootstrap output before/after that output?
>>> >>
>>> >> [1]
>>> >> https://github.com/apache/nifi/blob/rel/nifi-0.7.1/nifi-bootstrap/src/main/java/org/apache/nifi/bootstrap/RunNiFi.java
>>> >>
>>> >> Thanks
>>> >> Joe
>>> >>
>>> >> On Thu, May 25, 2017 at 9:11 AM, James McMahon <[email protected]>
>>> >> wrote:
>>> >> > Am running NiFi 0.7.x. Have been running with great stability for a long
>>> >> > period of time. Tried this morning to make this change in my
>>> >> > nifi.properties conf file:
>>> >> >
>>> >> > nifi.content.repository.archive.max.retention.period=1 hour
>>> >> >
>>> >> > Reduced from the default of 12 hours. Relatively simple change, requires
>>> >> > a nifi restart to take effect.
>>> >> >
>>> >> > My restart attempt throws no errors to the nifi app log, but in the
>>> >> > bootstrap log I do see this:
>>> >> > org.apache.nifi.bootstrap.RunNiFi Status file no longer exists. Will not
>>> >> > restart NiFi
>>> >> >
>>> >> > I've done some digging and all I could find is rebooting the box in
>>> >> > hopes of resolving. Am reaching out to the infrastructure group that owns
>>> >> > the server now, asking them to do so. Would like to also in parallel
>>> >> > understand why this happened, and where, exactly, this status file
>>> >> > should be?
>>> >> >
>>> >> > Can I resolve this by manually recreating such a status file with
>>> >> > certain permissions and ownership?
>>> >> >
>>> >> > Thanks in advance for your help. -Jim
