Nick,

Thanks for reporting back.
Just to confirm the scenario: you ran overnight without any stalling, and then, while nothing was stalled, you stopped and started the GeoEnrichIP processor, which then didn't consume anything from the incoming queue? Or were things already stalled from overnight, and you stopped and started the processor to see if it would start processing again?

I noticed in your nifi.properties you lowered the swap threshold to 1k; the default is 20k. Was there a specific reason for lowering it so much? Would you be able to do another test with it set back to 20k?

The way swapping works is that when the active queue for a processor reaches the threshold (1k in your case), it starts putting any additional flow files onto a separate swap queue, and when the swap queue reaches 10k it starts writing these swapped flow files to disk in batches of 10k. I wouldn't expect setting the threshold to 1k to cause no processing to happen, but it will definitely cause a lot of extra work, because as soon as 10k flow files are swapped back in, you are already over the 1k threshold again.

One other thing to check would be whether any crazy garbage collection is happening during these stalls. You could probably connect JVisualVM to one of your NiFi JVM processes and see if the GC activity graph is spiking up.

-Bryan

On Thu, Dec 22, 2016 at 11:36 AM, Nick Carenza <[email protected]> wrote:

> I replaced the Kafka processor with PublishKafka_0_10. It didn't start
> consuming from the stalled queue. I cleared all the queues again and it ran
> overnight without stalling, longer than it has before. I stopped and
> started the GeoEnrichIP processor just now to see if it would stall, and it
> did. I should be able to restart a processor like that, right, and it should
> start consuming the queue again? As soon as I clear the stalled queue,
> whether or not it's full, it starts flowing again.
>
> Thanks,
> Nick
>
> On Wed, Dec 21, 2016 at 11:34 AM, Bryan Bende <[email protected]> wrote:
>
>> Thanks for the info.
>>
>> Since your Kafka broker is 0.10.1, I would be curious if you experience
>> the same behavior switching to PublishKafka_0_10.
>>
>> The Kafka processors line up like this...
>>
>> GetKafka/PutKafka use the 0.8.x Kafka client
>> ConsumeKafka/PublishKafka use the 0.9.x Kafka client
>> ConsumeKafka_0_10/PublishKafka_0_10 use the 0.10.x Kafka client
>>
>> In some cases it is possible to use a version of the client with a
>> different version of the broker, but it usually works best to use the
>> client that goes with the broker.
>>
>> I'm wondering if your PutKafka processor is getting stuck somehow, which
>> then causes back-pressure to build up all the way back to your TCP
>> processor, since it looked like all your queues were filled up.
>>
>> It is entirely possible that there is something else going on, but maybe
>> we can eliminate the Kafka processor from the list of possible problems by
>> testing with PublishKafka_0_10.
>>
>> -Bryan
>>
>> On Wed, Dec 21, 2016 at 2:25 PM, Nick Carenza <[email protected]> wrote:
>>
>>> Hey Bryan,
>>>
>>> Thanks for taking the time!
>>>
>>> - This is NiFi 1.1.0. I had the same troubles on 1.0.0 and upgraded
>>> recently in the hope there was a fix for the issue.
>>> - Kafka is version 2.11-0.10.1.0
>>> - I am using the PutKafka processor.
>>>
>>> - Nick
>>>
>>> On Wed, Dec 21, 2016 at 11:19 AM, Bryan Bende <[email protected]> wrote:
>>>
>>>> Hey Nick,
>>>>
>>>> Sorry to hear about these troubles. A couple of questions...
>>>>
>>>> - What version of NiFi is this?
>>>> - What version of Kafka are you using?
>>>> - Which Kafka processor in NiFi are you using? It looks like PutKafka,
>>>> but just confirming.
>>>>
>>>> Thanks,
>>>>
>>>> Bryan
>>>>
>>>> On Wed, Dec 21, 2016 at 2:00 PM, Nick Carenza <[email protected]> wrote:
>>>>
>>>>> I am running into an issue where a processor will stop receiving flow
>>>>> files from its queue.
>>>>>
>>>>> flow: tcp --(100,000)--> evaljsonpath --(100,000)--> geoip --(100,000)--> putkafka
>>>>>
>>>>> This time, putkafka is the processor that has stopped receiving
>>>>> flowfiles.
>>>>>
>>>>> When I try to list the queue, I get a message that says the queue has
>>>>> no flow files in it. I checked the HTTP request, and the response says
>>>>> there are 100,000 flow files in the queue but the flowFileSummaries
>>>>> array is empty.
>>>>>
>>>>>> GET /nifi-api/flowfile-queues/1d72b81f-0159-1000-d09b-dc33e81b35c2/listing-requests/22754339-0159-1000-2dc9-07db09366132 HTTP/1.1
>>>>>> {
>>>>>>   "listingRequest": {
>>>>>>     "id": "22754339-0159-1000-2dc9-07db09366132",
>>>>>>     "uri": "http://ipaddress:8080/nifi-api/flowfile-queues/1d72b81f-0159-1000-d09b-dc33e81b35c2/listing-requests/22754339-0159-1000-2dc9-07db09366132",
>>>>>>     "submissionTime": "12/21/2016 17:37:07.385 UTC",
>>>>>>     "lastUpdated": "17:37:07 UTC",
>>>>>>     "percentCompleted": 100,
>>>>>>     "finished": true,
>>>>>>     "maxResults": 100,
>>>>>>     "state": "Completed successfully",
>>>>>>     "queueSize": {
>>>>>>       "byteCount": 288609476,
>>>>>>       "objectCount": 100000
>>>>>>     },
>>>>>>     "flowFileSummaries": [],
>>>>>>     "sourceRunning": true,
>>>>>>     "destinationRunning": true
>>>>>>   }
>>>>>> }
>>>>>
>>>>> I tried stopping and starting all the processors, replacing the
>>>>> putkafka with a new duplicate putkafka processor and moving the queue
>>>>> over to it, and restarting Kafka itself. I ran a dump with all the
>>>>> processors "running".
>>>>>
>>>>> Since this is not running in a production environment, as a last
>>>>> resort I cleared the queue and then everything started flowing again.
>>>>>
>>>>> I have experienced this issue many times since I began evaluating
>>>>> NiFi. I have heard of others having great success with it, so I am
>>>>> convinced I have misconfigured something. I have tried to provide any
>>>>> relevant configuration information here:
>>>>>
>>>>> # nifi.properties
>>>>> nifi.version=1.1.0
>>>>> nifi.flowcontroller.autoResumeState=true
>>>>> nifi.flowcontroller.graceful.shutdown.period=10 sec
>>>>> nifi.flowservice.writedelay.interval=500 ms
>>>>> nifi.administrative.yield.duration=30 sec
>>>>> nifi.bored.yield.duration=10 millis
>>>>> nifi.state.management.provider.local=local-provider
>>>>> nifi.swap.manager.implementation=org.apache.nifi.controller.FileSystemSwapManager
>>>>> nifi.queue.swap.threshold=1000
>>>>> nifi.swap.in.period=5 sec
>>>>> nifi.swap.in.threads=1
>>>>> nifi.swap.out.period=5 sec
>>>>> nifi.swap.out.threads=4
>>>>> nifi.cluster.is.node=false
>>>>> nifi.build.tag=nifi-1.1.0-RC2
>>>>> nifi.build.branch=NIFI-3100-rc2
>>>>> nifi.build.revision=f61e42c
>>>>> nifi.build.timestamp=2016-11-26T04:39:37Z
>>>>>
>>>>> # JVM memory settings
>>>>> java.arg.2=-Xms28g
>>>>> java.arg.3=-Xmx28g
>>>>> java.arg.13=-XX:+UseG1GC
>>>>>
>>>>> controller settings:
>>>>> timer driven thread count: 10-20 (I have tried values from 10 to 20 and still experience the issue)
>>>>> event driven thread count: 5 (haven't touched)
>>>>>
>>>>> processors:
>>>>> concurrency: 1-20 (I have tried values from 1 to 20 and still experience the issue)
>>>>> scheduling: timer driven (run schedule: 0, run duration: 0)
>>>>>
>>>>> queues:
>>>>> backpressure flowfile count: 100,000
>>>>> backpressure flowfile size: 1G
>>>>>
>>>>> machine:
>>>>> 128g RAM
>>>>> 20 CPU
>>>>> disk: 3T
>>>>>
>>>>> ---
>>>>>
>>>>> Really I have 2 questions:
>>>>>
>>>>> 1. Why is this happening?
>>>>> 2. Once the flow is in this state, how can I get it flowing again without losing flowfiles?
>>>>>
>>>>> Thanks,
>>>>> Nick
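
The swap-threshold thrashing Bryan describes in the thread comes down to simple arithmetic: a swap-in batch of 10k flow files lands on a queue whose active threshold is only 1k, so most of the batch is immediately eligible to be swapped right back out. A minimal sketch (the constant mirrors NiFi's 10k swap batch size; the helper function is illustrative only, not a NiFi API):

```python
# Toy arithmetic behind the swap thrashing described above.
# SWAP_IN_BATCH mirrors NiFi's 10k swap batch size; the helper is
# illustrative only, not a NiFi API.
SWAP_IN_BATCH = 10_000

def backlog_over_threshold(threshold: int) -> int:
    """Flow files sitting above the active-queue threshold immediately
    after one swap-in batch lands on an otherwise-drained queue."""
    return max(0, SWAP_IN_BATCH - threshold)

print(backlog_over_threshold(1_000))   # 9000: eligible to be swapped right back out
print(backlog_over_threshold(20_000))  # 0: the default threshold absorbs a full batch
```

With `nifi.queue.swap.threshold=1000`, every swap-in leaves 9,000 flow files over the threshold again, so the swap manager keeps churning disk I/O; at the default 20k, a full batch fits under the threshold.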

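On the garbage-collection check Bryan suggests: when attaching JVisualVM is inconvenient (e.g. on a headless server), the same signal is available from the command line with `jstat`, which ships with the JDK. A sketch (the `<nifi-pid>` placeholder must be filled in from the `jps` output; exact process names depend on how NiFi was launched):

```shell
# Find the NiFi JVM's PID (jps ships with the JDK).
jps -l | grep -i nifi

# Sample GC statistics every second. Watch the FGC (full-GC count) and
# GCT (total GC time) columns; numbers that climb rapidly during a
# stall point at garbage-collection pressure.
jstat -gcutil <nifi-pid> 1000
```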