Thank you, Joe. Do you advise, then, that we tune some parameters now, or is it acceptable to allow NiFi to ... self-regulate ... as it appears to be doing? If you suggest tuning, which ones should I look at - nifi.provenance.repository.index.threads? I notice that at present I have that set to a robust 1.
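For concreteness, here is the block I believe you mean in my conf/nifi.properties. The current values are what my 0.7.x install has today; the commented line is just my guess at what giving indexing "more threads" would look like, so please correct me if a different property is the right lever:

    # provenance indexing - currently the default single thread
    nifi.provenance.repository.index.threads=1
    # e.g., a hypothetical bump:
    # nifi.provenance.repository.index.threads=2

    # query threads are separate from index threads; default shown
    nifi.provenance.repository.query.threads=2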
That improvement with 1.2.0 sounds like it will make a big difference. Sadly, as you recall, it may be some time before 1.2.x is available to me. (For the archive, I've recapped the exact retention change and restart sequence in a P.S. below the quoted thread.)

On Thu, May 25, 2017 at 11:05 AM, Joe Witt <[email protected]> wrote:
> Jim,
>
> That provenance warning is not related to archive/retention. It is
> provenance telling you it can only index events so fast; at present it is
> falling behind, so it will slow the flow to ensure things don't get too
> far out of balance. However, there are configuration properties that let
> you give provenance indexing more threads. Also, we created a new
> provenance implementation, available in NiFi 1.2.0, which is multiple
> times faster with immediate indexing.
>
> Thanks
>
> On Thu, May 25, 2017 at 11:03 AM, James McMahon <[email protected]> wrote:
>> Absolutely. Thank you for looking into this, Aldrin.
>>
>> I do indeed have NiFi configured as a service. I've stopped and started
>> it dozens of times over the life of my workflow development these recent
>> months. It has always previously started up like a champ. On this
>> particular occasion I did this:
>>
>> service nifi stop
>>
>> as user nifi. It shut down, and the logs presented no errors. I then did
>> this:
>>
>> service nifi start
>>
>> as user nifi. The bootstrap log contained the INFO messages I shared
>> with you above.
>>
>> Our data flow has not taxed NiFi much at all. There was no data
>> processing through at the time. We had recently done two bulk ingests of
>> large data directories. The content repo had indicated 46% full, but
>> after I let it sit overnight it had dropped back down to a typical level
>> of 3-6%. As I learned yesterday, my archive retention being set to 12
>> hours explained why the content repo held on to all that capacity after
>> all my 100,000 files had processed through late yesterday.
>>
>> Early this morning I modified my conf/nifi.properties to drop my archive
>> retention to 1 hour from 12 hours. This was when I tried and failed to
>> restart.
>>
>> We've since rebooted the host and NiFi came right up. With my new
>> archive retention value in place, I tried processing about 16,000 files
>> through. They flew through, but I have noticed a warning that I believe
>> is caused by my change to archive retention: WARNING The rate of the
>> dataflow is exceeding the provenance recording rate. Slowing down flow
>> to accommodate.
>>
>> What else can I tell you? I suppose it would help to mention that my
>> three major repos - content, flowfile, provenance - are on separate
>> local disk devices.
>>
>> My workflow load peaks when I process approximately 100,000 files
>> totaling 50 GB through the flow. The content repo maxes out at 46% of
>> our 50 GB capacity. The provenance and flowfile repos never peak into
>> the double digits. I do some custom parsing and custom logging in
>> InvokeScriptedProcessors. I employ HandleHttpResponse and
>> HandleHttpRequest processors.
>>
>> I've not yet watched memory usage on the box as I run, but I'll try a
>> 'watch -n [#] free -m' later to see what happens. My NiFi instance runs
>> with JVM memory parms in bootstrap.conf of -Xms4096m and -Xmx8192m.
>>
>> Jim
>>
>> On Thu, May 25, 2017 at 10:38 AM, Aldrin Piri <[email protected]> wrote:
>>> If you happen to remember, could you get more specific about your
>>> sequence of operations? Is NiFi installed as a service? If so, was it
>>> restarted? Did you just issue a nifi.sh restart?
>>>
>>> Do you have any CM tooling (Puppet, Chef, Salt, etc.) that is managing
>>> this process/system?
>>>
>>> Could you tell us what the bootstrap log says prior to those lines in
>>> terms of shutting down?
>>>
>>> Would you be able to describe the load exerted on the system by the
>>> flow? A bit of an amorphous question, but is/was the system heavily
>>> taxed running NiFi?
>>>
>>> The section you hit _should_ only be hit if NiFi (the flow process and
>>> not the bootstrap) terminates for some reason (e.g., hit an
>>> out-of-memory case). I have a few notions as to how the right
>>> confluence of events could have gotten you there otherwise, so any
>>> additional details would be great to vet their possible culpability.
>>>
>>> Thanks!
>>>
>>> On Thu, May 25, 2017 at 10:10 AM, James McMahon <[email protected]> wrote:
>>>> I did inspect the log more closely. It offers little additional
>>>> insight. Here is what it says (unable to export; I had to transcribe
>>>> it myself):
>>>>
>>>> [date] [time],### INFO [main] org.apache.nifi.bootstrap.RunNiFi Status
>>>> File no longer exists. Will not restart NiFi
>>>> [date] [time],### INFO [main] o.a.n.b.NotificationServiceManager
>>>> Successfully loaded the following 0 services: [ ]
>>>> [date] [time],### INFO [main] org.apache.nifi.bootstrap.RunNiFi
>>>> Registered no Notification Services for Notification Type NIFI_STARTED
>>>> [date] [time],### INFO [main] org.apache.nifi.bootstrap.RunNiFi
>>>> Registered no Notification Services for Notification Type NIFI_STOPPED
>>>> [date] [time],### INFO [main] org.apache.nifi.bootstrap.RunNiFi
>>>> Registered no Notification Services for Notification Type NIFI_DIED
>>>> [date] [time],### INFO [main] org.apache.nifi.bootstrap.Command Apache
>>>> NiFi is not running
>>>>
>>>> My hope is that we can figure out what happens to this status file,
>>>> and how I can prevent it from going missing.
>>>>
>>>> Jim
>>>>
>>>> On Thu, May 25, 2017 at 9:37 AM, Joe Witt <[email protected]> wrote:
>>>>> I don't think rebooting the system had anything to do with NiFi's
>>>>> ability to start up. But I'm not sure I understand that particular
>>>>> part of the logic in the code, in terms of the case it was defending
>>>>> against.
>>>>>
>>>>> On Thu, May 25, 2017 at 9:34 AM, James McMahon <[email protected]> wrote:
>>>>>> Will do, Joe. I'll dig for that now.
>>>>>>
>>>>>> The infrastructure group did reboot the box, which had been up and
>>>>>> running for nearly two months. NiFi did indeed come up following the
>>>>>> reboot. I still want to get you this log information so that I can
>>>>>> learn what triggers such a situation, and whether there is a more
>>>>>> refined way to resolve it than a full system reboot. There are other
>>>>>> things running on the resource, and I should try to minimize the
>>>>>> impact a full reboot has on them.
>>>>>>
>>>>>> Let me see about that log content. Thank you again.
>>>>>>
>>>>>> On Thu, May 25, 2017 at 9:25 AM, Joe Witt <[email protected]> wrote:
>>>>>>> Jim,
>>>>>>>
>>>>>>> The code relevant to that log output is here [1]. Can you share the
>>>>>>> bootstrap output before/after that output?
>>>>>>>
>>>>>>> [1] https://github.com/apache/nifi/blob/rel/nifi-0.7.1/nifi-bootstrap/src/main/java/org/apache/nifi/bootstrap/RunNiFi.java
>>>>>>>
>>>>>>> Thanks
>>>>>>> Joe
>>>>>>>
>>>>>>> On Thu, May 25, 2017 at 9:11 AM, James McMahon <[email protected]> wrote:
>>>>>>>> I am running NiFi 0.7.x and have been running with great stability
>>>>>>>> for a long period of time. This morning I tried to make this
>>>>>>>> change in my conf/nifi.properties file:
>>>>>>>>
>>>>>>>> nifi.content.repository.archive.max.retention.period=1 hour
>>>>>>>>
>>>>>>>> Reduced from the default of 12 hours. A relatively simple change,
>>>>>>>> but it requires a NiFi restart to take effect.
>>>>>>>>
>>>>>>>> My restart attempt throws no errors to the nifi app log, but in
>>>>>>>> the bootstrap log I do see this:
>>>>>>>>
>>>>>>>> org.apache.nifi.bootstrap.RunNiFi Status file no longer exists.
>>>>>>>> Will not restart NiFi
>>>>>>>>
>>>>>>>> I've done some digging, and all I could find was rebooting the box
>>>>>>>> in hopes of resolving it. I am reaching out to the infrastructure
>>>>>>>> group that owns the server now, asking them to do so. In parallel,
>>>>>>>> I would also like to understand why this happened, and where,
>>>>>>>> exactly, this status file should be.
>>>>>>>>
>>>>>>>> Can I resolve this by manually recreating such a status file with
>>>>>>>> certain permissions and ownership?
>>>>>>>>
>>>>>>>> Thanks in advance for your help. -Jim
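P.S. For the archive, here is the exact change and restart sequence from the thread above, recapped in one place (the service name and the nifi user reflect my install; yours may differ):

    # conf/nifi.properties - reduced from the 12 hour default
    nifi.content.repository.archive.max.retention.period=1 hour

    # then, as user nifi:
    service nifi stop
    service nifi start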
