I'm also interested; personally, I've never felt comfortable with how little visibility Messenger's API currently provides into things like connection failures.
Tangentially related, perhaps - I'd like to see errors reported via the event collector interface. While my issue is engine related, perhaps Messenger should provide applications access to the event "bus"? I've opened a bug against the engine event model to include errors (at least for the transport/connection objects):

https://issues.apache.org/jira/browse/PROTON-656

-K

----- Original Message -----
> From: "Fraser Adams" <[email protected]>
> To: [email protected]
> Sent: Monday, September 8, 2014 2:07:23 PM
> Subject: Re: proton Messenger error handling/recovery REQUEST FEEDBACK!
>
> Messenger gurus seem to be keeping their heads down a bit.
>
> Is it *really* just Alan and I who are interested to understand the
> error handling/reconnection behaviour of Messenger?
>
> Is anybody using it in "industrial strength" applications or is it just
> being used in quick and dirty demos? Without error handling and
> reconnection mechanisms I'm struggling to see how it can be used for the
> former.
>
> I can likely hack things and Alan also mentioned that he "cheats", but
> I'd really like to know from people who really understand messenger how
> to do it *properly*.
>
> Frase
>
>
> On 05/09/14 14:17, Alan Conway wrote:
> > On Thu, 2014-09-04 at 18:28 +0100, Fraser Adams wrote:
> >> On 03/09/14 23:29, Alan Conway wrote:
> >>> On Wed, 2014-09-03 at 20:05 +0100, Fraser Adams wrote:
> >>>> Hello,
> >>>> I've probably missed something, but I don't know how to reliably detect
> >>>> failures and reconnect.
> >>>>
> >>>> So if I sent to an address with a freshly stood up Messenger instance
> >>>> and the address can't be found things aren't too bad and I wind up with
> >>>> an ECONNREFUSED that I could do something with, however if I've been
> >>>> sending messages to a valid address then I kill off the consumer I see
> >>>> a:
> >>>>
> >>>> [0x513380]:ERROR amqp:connection:framing-error connection aborted
> >>>> [0x513380]:ERROR[-2] connection aborted
> >>>>
> >>>> CONNECTION ERROR connection aborted (remote)
> >>>>
> >>>> The thing is that all of these are *internally* generated messages sent
> >>>> to the console via fprintf, so my *application* doesn't really know
> >>>> about them (though I could be crafty and interpose my own cheeky fprintf
> >>>> to intercept them). That doesn't quite sound like the desired behaviour
> >>>> for a robust system?
> >>>>
> >>>>
> >>>> Similarly should I actually trap an error what's the correct way to
> >>>> continue, as it happens currently my app carries on silently doing
> >>>> nothing useful and continuing to do so even when the peer restarts (so
> >>>> there is no magic internal reconnection logic as far as I can see).
> >>>>
> >>>> do I have to do a
> >>>> messenger.stop()
> >>>> messenger.start()
> >>>>
> >>>> cycle to get things going again, I'm guessing so, but I'll like to know
> >>>> what the "correct"/expected way to create Messenger code that is robust
> >>>> against remote failures, as far as I can see there are no examples of
> >>>> that sort of thing?
> >>> I've come up against similar problems, I think it's an area that needs
> >>> some work in Proton. Is anybody already working on/thinking about this
> >>> area?
> >>>
> >>> Cheers,
> >>> Alan.
> >>>
> >> I'd definitely like to know how others deal with this sort of thing.
> > I cheat.
> > I've been using proton in dispatch system tests, I come up
> > against these issues when I start up some proton/dispatch network and
> > try to use it too quickly before things have settled down. I have some
> > tweaks in my test harness to wait till things are ready so there are no
> > errors :) That's not a solution for general non-test situations -
> > although knowing how to wait till things are ready is always useful.
> >
> > https://svn.apache.org/repos/asf/qpid/dispatch/trunk/tests/system_test.py
> >
> > class Messenger adds a "flush" method that pumps the Messenger event
> > loop till there is no more work to do. Otherwise subscribe() in
> > particular gives no way to tell when the subscription is active.
> >
> > Note: My situation is a bit special in that dispatch creates addresses
> > dynamically on subscribe and my tests involve slow stuff like waypoints
> > to brokers etc. That introduces a delay in subscribe that probably isn't
> > visible when the address is created beforehand.
> >
> > There's also Qpidd.wait_ready and Qdrouterd.wait_ready that wait for
> > qpidd and dispatch router to be ready respectively so I can be sure that
> > when I connect with proton they'll be listening. Those wait for the
> > expected listening ports to be connectable and in the case of dispatch
> > also does a qmf check to make sure that all expected outgoing connectors
> > are there.
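
[Inline note: the "wait for the listening port to be connectable" half of that is easy to sketch in plain Python. This is not the code from system_test.py, and it does nothing about the qmf check; `wait_port` and its parameters are names made up here for illustration. It only polls until a TCP connect succeeds or a deadline passes.]

```python
import socket
import time


def wait_port(host, port, timeout=10.0, interval=0.1, connect_timeout=1.0):
    """Poll until a TCP connect to (host, port) succeeds.

    Hypothetical helper, not part of proton or dispatch.
    Returns True once a connection is accepted, False if `timeout`
    seconds pass without a successful connect.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening; close at once.
            with socket.create_connection((host, port), timeout=connect_timeout):
                return True
        except OSError:
            # Refused, unreachable, or timed out: retry until the deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A test harness would call something like `wait_port("127.0.0.1", 5672)` before creating the Messenger and give up if it returns False. A connectable port only shows that the peer is listening, not that an address or subscription is active.
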
> >
> >> For info notwithstanding not necessarily being able to trap all the
> >> errors without being devious around fprintf (which to be fair works,
> >> but it's a bit sneaky and if you have multiple Messenger instances won't
> >> tell you which one the error relates to) but when I do get an error I
> >> appear to have to start from scratch - in other words:
> >>
> >> message.free();
> >> messenger.free();
> >> message = new proton.Message();
> >> messenger = new proton.Messenger();
> >> messenger.start();
> >>
> >> If I try to restart the original messenger or use existing queue I get
> >> no joy. It's not the end of the world but I've no idea what robust
> >> Messenger code is *supposed* to look like.
> >>
> >> Presumably Alan and I aren't the only people who might like to be able
> >> to trap errors and restart? Or does every one else write code that never
> >> fails ;->
> > I always wondered how everybody but me can do that. Sigh. For you and me
> > I think we need to do some work on proton's error handling.
> >
> > - proton (or any library!) should NEVER EVER write anything direct to
> > stdout or stderr. It needs a (very simple) logging facility that can
> > write to stderr by default but can be redirected elsewhere.
> > - proton should never log an error without also returning some useful
> > error condition to the application.
> >
> > Proton has some useful pn_error_* functions, they just need to be used
> > more widely. In dispatch I introduced an errno-style thread-local error
> > code/message (in proton it would be a pn_error_t*) That allows sensible
> > error messages out of functions that want to return something else (e.g.
> > pointer or null and set the thread error) It also allows you to work
> > around lazy error handling (temporarily of course (hahahaha)) - a caller
> > couple of stack frames up can detect an error even if intermediate
> > functions didn't check & propagate errors properly.
> > I'm not advocating lazy error checking but in C it is hard to get everything.
> >
> > FEEDBACK PLEASE: anyone think this is a great/horrible idea? Does proton
> > already do things I've missed that would make this unnecessary?
> >
> > Cheers,
> > Alan.
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [email protected]
> > For additional commands, e-mail: [email protected]
> >
>

--
-K
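
P.S. To make the errno-style idea from Alan's mail concrete, here is a rough Python analogue. It is not dispatch's or proton's actual code (there it would be C and a pn_error_t*); every name below (`set_error`, `clear_error`, `last_error`, `parse_port`, `lazy_middle`) is hypothetical. It shows a function that returns None and records a per-thread error, and a caller a couple of frames up that can still see that error even though the function in between never checked.

```python
import threading

# Per-thread storage, playing the role of a thread-local errno.
_state = threading.local()


def set_error(code, message):
    """Record an error for the calling thread only."""
    _state.error = (code, message)


def clear_error():
    """Reset the calling thread's error before starting an operation."""
    _state.error = None


def last_error():
    """Return (code, message) for the calling thread, or None if unset."""
    return getattr(_state, "error", None)


def parse_port(text):
    """Return the port as an int, or None after setting the thread error."""
    try:
        port = int(text)
    except ValueError:
        set_error(1, f"not a number: {text!r}")
        return None
    if not 0 < port < 65536:
        set_error(2, f"port out of range: {port}")
        return None
    return port


def lazy_middle(text):
    # Intermediate function that neither checks nor propagates errors.
    return parse_port(text)


clear_error()
if lazy_middle("amqp") is None:
    # The outer caller can still recover the reason for the failure.
    print("failed:", last_error())
```

The cost of this pattern is that callers have to remember to clear the error before an operation, otherwise a stale value from an earlier failure can be misread.
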
