Nick,

Good news: I was able to reproduce this, and I am fairly confident that as long as you increase the swap threshold above 10k you shouldn't see this problem anymore.
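For reference, the change Bryan suggests is a one-line edit in nifi.properties (20000 is the default value he mentions later in the thread):

```properties
# nifi.properties: raise the queue swap threshold back to the default
nifi.queue.swap.threshold=20000
```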
I created this JIRA, which further describes what is happening:
https://issues.apache.org/jira/browse/NIFI-3250

Thanks,

Bryan

On Thu, Dec 22, 2016 at 12:59 PM, Bryan Bende <[email protected]> wrote:

> Nick,
>
> Thanks for reporting back.
>
> Just to confirm the scenario: you ran overnight without any stalling
> happening, and then, while nothing was stalled, you stopped and started
> the GeoEnrichIP processor, which then didn't consume anything from the
> incoming queue? Or were things already stalled from overnight, and you
> stopped and started the processor to see if it would start processing
> again?
>
> I noticed in your nifi.properties you lowered the swap threshold to 1k;
> the default is 20k. Was there a specific reason for lowering it so much?
> Would you be able to do another test putting that back to 20k?
>
> The way swapping works is that when the active queue for a processor
> reaches the threshold (1k in your case), it starts putting any additional
> flow files onto a separate swap queue, and when the swap queue reaches
> 10k it starts writing these swapped flow files to disk in batches of 10k.
>
> I wouldn't expect setting the threshold to 1k to cause no processing to
> happen, but it will definitely cause a lot of extra work, because as soon
> as 10k flow files are swapped back in, you are already over the 1k
> threshold again.
>
> One other thing to check would be whether any heavy garbage collection is
> happening during these stalls. You could probably connect JVisualVM to
> one of your NiFi JVM processes and see if the GC activity graph is
> spiking up.
>
> -Bryan
>
> On Thu, Dec 22, 2016 at 11:36 AM, Nick Carenza <[email protected]> wrote:
>
>> I replaced the Kafka processor with PublishKafka_0_10. It didn't start
>> consuming from the stalled queue. I cleared all the queues again and it
>> ran overnight without stalling, longer than it has before.
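To make the numbers in Bryan's swap explanation concrete, here is a small Python sketch. It only models the arithmetic he describes (threshold, swap queue, 10k batches); the function and variable names are hypothetical and this is not NiFi's actual implementation:

```python
# Illustrative model of the swap behaviour described above (hypothetical
# helper, not NiFi code): flow files beyond the active-queue threshold go
# to a swap queue, and swapped flow files move to/from disk in 10k batches.

SWAP_BATCH = 10_000

def enqueue(active_count, incoming, threshold):
    """Split `incoming` flow files between the active queue and swap."""
    room = max(0, threshold - active_count)
    to_active = min(incoming, room)
    return active_count + to_active, incoming - to_active

# With the default 20k threshold, 15k incoming flow files all stay active:
print(enqueue(0, 15_000, 20_000))   # (15000, 0)

# With a 1k threshold, 14k of them are swapped out, and as soon as one
# 10k batch is swapped back in, the queue is far over the threshold again:
active, swapped = enqueue(0, 15_000, 1_000)
print(active, swapped)              # 1000 14000
print(active + SWAP_BATCH > 1_000)  # True
```

This is the churn Bryan describes: with a 1k threshold, every 10k swap-in immediately re-triggers swap-out.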
>> I stopped and started the GeoEnrichIP processor just now to see if it
>> would stall, and it did. I should be able to restart a processor like
>> that, right, and it should start consuming the queue again? As soon as I
>> clear the stalled queue, whether or not it's full, it starts flowing
>> again.
>>
>> Thanks,
>> Nick
>>
>> On Wed, Dec 21, 2016 at 11:34 AM, Bryan Bende <[email protected]> wrote:
>>
>>> Thanks for the info.
>>>
>>> Since your Kafka broker is 0.10.1, I would be curious whether you
>>> experience the same behavior switching to PublishKafka_0_10.
>>>
>>> The Kafka processors line up like this:
>>>
>>> GetKafka/PutKafka use the 0.8.x Kafka client
>>> ConsumeKafka/PublishKafka use the 0.9.x Kafka client
>>> ConsumeKafka_0_10/PublishKafka_0_10 use the 0.10.x Kafka client
>>>
>>> In some cases it is possible to use a version of the client with a
>>> different version of the broker, but it usually works best to use the
>>> client that goes with the broker.
>>>
>>> I'm wondering if your PutKafka processor is getting stuck somehow,
>>> which then causes back-pressure to build up all the way back to your
>>> TCP processor, since it looked like all your queues were filled up.
>>>
>>> It is entirely possible that there is something else going on, but
>>> maybe we can eliminate the Kafka processor from the list of possible
>>> problems by testing with PublishKafka_0_10.
>>>
>>> -Bryan
>>>
>>> On Wed, Dec 21, 2016 at 2:25 PM, Nick Carenza <[email protected]> wrote:
>>>
>>>> Hey Bryan,
>>>>
>>>> Thanks for taking the time!
>>>>
>>>> - This is NiFi 1.1.0. I had the same troubles on 1.0.0 and upgraded
>>>>   recently with the hope there was a fix for the issue.
>>>> - Kafka is version 2.11-0.10.1.0.
>>>> - I am using the PutKafka processor.
>>>>
>>>> - Nick
>>>>
>>>> On Wed, Dec 21, 2016 at 11:19 AM, Bryan Bende <[email protected]> wrote:
>>>>
>>>>> Hey Nick,
>>>>>
>>>>> Sorry to hear about these troubles. A couple of questions...
>>>>>
>>>>> - What version of NiFi is this?
>>>>> - What version of Kafka are you using?
>>>>> - Which Kafka processor in NiFi are you using? It looks like
>>>>>   PutKafka, but just confirming.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Bryan
>>>>>
>>>>> On Wed, Dec 21, 2016 at 2:00 PM, Nick Carenza <[email protected]> wrote:
>>>>>
>>>>>> I am running into an issue where a processor will stop receiving
>>>>>> flow files from its queue.
>>>>>>
>>>>>> flow: tcp --(100,000)--> evaljsonpath --(100,000)--> geoip
>>>>>> --(100,000)--> putkafka
>>>>>>
>>>>>> This time, putkafka is the processor that has stopped receiving
>>>>>> flowfiles.
>>>>>>
>>>>>> When I try to list the queue, I get a message that says the queue
>>>>>> has no flow files in it. I checked the HTTP request, and the
>>>>>> response says there are 100,000 flow files in the queue, but the
>>>>>> flowFileSummaries array is empty.
>>>>>>
>>>>>>> GET /nifi-api/flowfile-queues/1d72b81f-0159-1000-d09b-dc33e81b35c2/listing-requests/22754339-0159-1000-2dc9-07db09366132 HTTP/1.1
>>>>>>>
>>>>>>> {
>>>>>>>   "listingRequest": {
>>>>>>>     "id": "22754339-0159-1000-2dc9-07db09366132",
>>>>>>>     "uri": "http://ipaddress:8080/nifi-api/flowfile-queues/1d72b81f-0159-1000-d09b-dc33e81b35c2/listing-requests/22754339-0159-1000-2dc9-07db09366132",
>>>>>>>     "submissionTime": "12/21/2016 17:37:07.385 UTC",
>>>>>>>     "lastUpdated": "17:37:07 UTC",
>>>>>>>     "percentCompleted": 100,
>>>>>>>     "finished": true,
>>>>>>>     "maxResults": 100,
>>>>>>>     "state": "Completed successfully",
>>>>>>>     "queueSize": {
>>>>>>>       "byteCount": 288609476,
>>>>>>>       "objectCount": 100000
>>>>>>>     },
>>>>>>>     "flowFileSummaries": [],
>>>>>>>     "sourceRunning": true,
>>>>>>>     "destinationRunning": true
>>>>>>>   }
>>>>>>> }
>>>>>>
>>>>>> I tried stopping and starting all the processors, replacing the
>>>>>> putkafka with a new duplicate putkafka processor and moving the
>>>>>> queue over to it, and restarting kafka itself.
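As an aside, the inconsistency in the listing response above (a completed listing with a non-zero objectCount but an empty flowFileSummaries array) is easy to detect programmatically. A small Python sketch, where the helper name is hypothetical and only the JSON fields shown in the response above are assumed:

```python
import json

def listing_is_inconsistent(body: str) -> bool:
    """True if a finished listing reports queued flow files but no summaries."""
    req = json.loads(body)["listingRequest"]
    return (req["finished"]
            and req["queueSize"]["objectCount"] > 0
            and not req["flowFileSummaries"])

# Trimmed version of the response shown above:
response = """{
  "listingRequest": {
    "finished": true,
    "state": "Completed successfully",
    "queueSize": {"byteCount": 288609476, "objectCount": 100000},
    "flowFileSummaries": []
  }
}"""

print(listing_is_inconsistent(response))  # True
```

A check like this could be run against the listing-request endpoint to flag stalled queues automatically instead of eyeballing the raw JSON.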
>>>>>> I ran a dump with all the processors "running".
>>>>>>
>>>>>> Since this is not running in a production environment, as a last
>>>>>> resort I cleared the queue, and then everything started flowing
>>>>>> again.
>>>>>>
>>>>>> I have experienced this issue many times since I began evaluating
>>>>>> NiFi. I have heard of others having great success with it, so I am
>>>>>> convinced I have misconfigured something. I have tried to provide
>>>>>> any relevant configuration information here:
>>>>>>
>>>>>> # nifi.properties
>>>>>> nifi.version=1.1.0
>>>>>> nifi.flowcontroller.autoResumeState=true
>>>>>> nifi.flowcontroller.graceful.shutdown.period=10 sec
>>>>>> nifi.flowservice.writedelay.interval=500 ms
>>>>>> nifi.administrative.yield.duration=30 sec
>>>>>> nifi.bored.yield.duration=10 millis
>>>>>> nifi.state.management.provider.local=local-provider
>>>>>> nifi.swap.manager.implementation=org.apache.nifi.controller.FileSystemSwapManager
>>>>>> nifi.queue.swap.threshold=1000
>>>>>> nifi.swap.in.period=5 sec
>>>>>> nifi.swap.in.threads=1
>>>>>> nifi.swap.out.period=5 sec
>>>>>> nifi.swap.out.threads=4
>>>>>> nifi.cluster.is.node=false
>>>>>> nifi.build.tag=nifi-1.1.0-RC2
>>>>>> nifi.build.branch=NIFI-3100-rc2
>>>>>> nifi.build.revision=f61e42c
>>>>>> nifi.build.timestamp=2016-11-26T04:39:37Z
>>>>>>
>>>>>> # JVM memory settings
>>>>>> java.arg.2=-Xms28g
>>>>>> java.arg.3=-Xmx28g
>>>>>> java.arg.13=-XX:+UseG1GC
>>>>>>
>>>>>> controller settings:
>>>>>> timer driven thread count: 10-20 (I have tried values from 10 to 20
>>>>>> and still experience the issue)
>>>>>> event driven thread count: 5 (haven't touched)
>>>>>>
>>>>>> processors:
>>>>>> concurrency: 1-20 (I have tried values from 1 to 20 and still
>>>>>> experience the issue)
>>>>>> scheduling: timer driven (run-schedule: 0, run-duration: 0)
>>>>>>
>>>>>> queues:
>>>>>> backpressure flowfile count: 100,000
>>>>>> backpressure flowfile size: 1G
>>>>>>
>>>>>> machine:
>>>>>> 128g ram
>>>>>> 20 cpu
>>>>>> disk: 3T
>>>>>>
>>>>>> ---
>>>>>>
>>>>>> Really I have 2 questions:
>>>>>>
>>>>>> 1. Why is this happening?
>>>>>> 2. Once the flow is in this state, how can I get it flowing again
>>>>>>    without losing flow files?
>>>>>>
>>>>>> Thanks,
>>>>>> Nick
