Re: Back pressure deadlock

Mark Payne Mon, 23 Oct 2017 07:05:33 -0700

Arne,

Fair enough. NiFi could perhaps be smarter about looping connections instead of 
stopping at self-loops.


Another approach, which I have used, is to avoid a flow that loops the way you
laid out (PublishJMS -> LogAttribute -> back to PublishJMS). Instead, connect
the 'failure' relationship to PublishJMS as a self-loop and also to the
LogAttribute (or alerting processor, or whatever you have), and then set an
age-off on the connection to that processor. In this setup, even if the
log/alerting processor is having trouble, the age-off keeps that connection
from applying back pressure to PublishJMS. When data is being sent to some sort
of alerting or status-publishing step, age-off is typically appropriate (though
granted, it may not be 100% of the time).
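
To illustrate why an age-off keeps the alerting connection from staying full, here is a minimal toy sketch. It is not NiFi code: the `AgingQueue` class, its fields, and the numbers are all hypothetical, and it only assumes that entries older than a configured age are dropped from the queue.

```python
from collections import deque


class AgingQueue:
    """Toy connection queue with an object threshold and an age-off limit.

    Hypothetical model for illustration only; not NiFi's implementation.
    """

    def __init__(self, threshold: int, max_age_s: float):
        self.threshold = threshold
        self.max_age_s = max_age_s
        self._items = deque()  # (enqueue_time, payload), oldest first

    def put(self, payload: str, now: float) -> None:
        self._items.append((now, payload))

    def expire(self, now: float) -> int:
        """Drop entries older than max_age_s and return how many were dropped."""
        dropped = 0
        while self._items and now - self._items[0][0] > self.max_age_s:
            self._items.popleft()
            dropped += 1
        return dropped

    def is_full(self, now: float) -> bool:
        """Report whether the queue is at its threshold after aging off old entries."""
        self.expire(now)
        return len(self._items) >= self.threshold


# The alerting processor is stalled, so nothing is ever consumed from this queue.
alerts = AgingQueue(threshold=3, max_age_s=60.0)
for t in (0.0, 1.0, 2.0):
    alerts.put(f"failure-{t}", now=t)

full_while_fresh = alerts.is_full(now=10.0)     # entries are still younger than 60 s
full_after_age_off = alerts.is_full(now=120.0)  # all entries have aged off
```

In this model the queue is full only until its entries expire, so a stalled consumer cannot hold back pressure on the upstream processor indefinitely. The trade-off is that the expired entries are lost.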

Another useful approach to consider in such a case may actually be to have 
Reporting Tasks [1] that would monitor the flow for large queues,
etc. While you can build such monitoring capabilities into the flow, I am a fan 
personally of 'pulling up' this logic out of the flow because it tends
to result in much cleaner, easier-to-understand, and easier-to-implement flows.
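
As a rough sketch of the kind of check such out-of-flow monitoring might perform, the snippet below flags queues that are close to their back pressure threshold. It does not use the Reporting Task API: `QueueSnapshot`, its fields, the 0.8 ratio, and the sample numbers are all assumptions made up for illustration, and how the snapshots are collected is left out entirely.

```python
from typing import List, NamedTuple


class QueueSnapshot(NamedTuple):
    """Hypothetical point-in-time view of one connection's queue."""
    connection: str
    queued: int
    threshold: int


def queues_near_limit(snapshots: List[QueueSnapshot], ratio: float = 0.8) -> List[str]:
    """Return names of connections whose queue is at or above ratio * threshold."""
    return [
        s.connection
        for s in snapshots
        if s.threshold > 0 and s.queued >= ratio * s.threshold
    ]


# Made-up sample data, not taken from a real flow.
sample = [
    QueueSnapshot("PublishJMS/failure", queued=9500, threshold=10000),
    QueueSnapshot("PublishJMS/success", queued=10, threshold=10000),
]
flagged = queues_near_limit(sample)
```

Keeping this check outside the flow means the flow itself needs no extra connections for alerting, which is the "pulling up" idea described above.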

So I'm certainly not saying that what NiFi does is correct and perfect and 
can't be improved upon - any solution can probably be improved upon,
and NiFi is certainly rapidly improving each day. But I wanted to point out 
some ways that you can think about attacking the concerns that you
have with the current implementation.

Thanks!
-Mark


[1] http://nifi.apache.org/docs/nifi-docs/html/user-guide.html#Reporting_Tasks



On Oct 23, 2017, at 9:45 AM, Arne Degenring wrote:

Hi Mark,

Thanks for clarifying that self-looping connections will still be processed in 
back pressure situations.

For this specific case, we can probably live without the additional routing to 
the logging component and back.

I think, however, that there are cases where such ping-pong routing on failure
can be very useful, for example to actively alert someone or to publish some
information on a status page.

Therefore I feel it would be great if NiFi could be extended to avoid such back 
pressure deadlock situations. Maybe through some kind of automatic deadlock 
detection, or by marking certain incoming relations as not back pressure 
relevant (same as self-looping connections).

Thanks,
Arne


On 23. Oct 2017, at 15:00, Mark Payne wrote:

Hi Arne,

Generally, the approach used in such a situation is to route failure back to
the PublishJMS processor itself (without diverting first to a LogAttribute
processor). The PublishJMS processor itself should be logging an error with the
FlowFile's identity. Troubleshooting can then be done by inspecting the queue
(right-click, List Queue) or via Data Provenance [1]. When a processor
encounters back pressure, it will still continue to process data that comes in
on self-looping connections. So the failure relationship would still get
processed.
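
The difference between the two layouts can be sketched with a small toy model. This is a simplified illustration, not NiFi's scheduler: the `Connection` class and `can_run` function are hypothetical, and the only rule assumed is the one described above, namely that a full outgoing connection stops its source processor unless the connection is a self-loop.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Connection:
    """Hypothetical model of a connection between two processors."""
    name: str
    source: str
    dest: str
    threshold: int
    queued: int = 0

    def is_full(self) -> bool:
        return self.queued >= self.threshold

    def is_self_loop(self) -> bool:
        return self.source == self.dest


def can_run(processor: str, connections: List[Connection]) -> bool:
    """A processor may run unless one of its outgoing non-self-loop connections is full."""
    return not any(
        c.source == processor and c.is_full() and not c.is_self_loop()
        for c in connections
    )


# Layout from the original question: failure goes through LogAttribute and back.
looped = [
    Connection("failure", "PublishJMS", "LogAttribute", threshold=3, queued=3),
    Connection("retry", "LogAttribute", "PublishJMS", threshold=3, queued=3),
]

# Suggested layout: failure is routed straight back to PublishJMS.
self_looped = [
    Connection("failure", "PublishJMS", "PublishJMS", threshold=3, queued=3),
]

publish_blocked = not can_run("PublishJMS", looped)      # failure queue is full
log_blocked = not can_run("LogAttribute", looped)        # retry queue is full
publish_retries = can_run("PublishJMS", self_looped)     # self-loop is exempt
```

With both queues full, neither processor in the looped layout may run, so nothing can drain either queue. In the self-loop layout the full failure queue does not stop PublishJMS, so retries continue once the JMS problem clears.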

Does this help?

Thanks
-Mark



[1] http://nifi.apache.org/docs/nifi-docs/html/user-guide.html#data_provenance



On Oct 23, 2017, at 6:46 AM, Arne Degenring wrote:

Hi,

We came across a situation in which we experience a kind of “back pressure
deadlock”.

In our setup, this occurs around PublishJMS when the target JMS queue is full. 
Please find attached a screenshot of the relevant flow.

We route the failure relation to a logging component, and then back to
PublishJMS for retry. Sooner or later, the failure and retry queues become full
and produce back pressure towards the main input (which is good). The problem
is that the same back pressure is also applied to the retry queue.

In this situation, PublishJMS will not be called at all any longer. Even when 
the JMS problem resolves, the whole thing stays deadlocked.

Is there a recommended way to avoid such a situation?

Obviously, an admin can temporarily increase the back pressure threshold of the 
failure connection, once the JMS problem is resolved. But it would be nicer if 
the problem could resolve automatically, i.e. PublishJMS should keep retrying 
somehow.

Any hints?

Thanks,
Arne



<backpressure-deadlock.png>
