I'm also interested; personally, I've never felt comfortable with the lack 
of visibility into things like connection failures in the API that Messenger 
currently provides.

Tangentially related, perhaps - I'd like to see errors reported via the event 
collector interface. While my issue is engine-related, perhaps Messenger 
should provide applications with access to the event "bus"?

I've opened a bug against the engine event model to include errors (at least 
for the transport/connection objects):

https://issues.apache.org/jira/browse/PROTON-656
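To make the event-"bus" idea concrete, here is a minimal, self-contained sketch of what application access to an error-carrying event collector could look like. Every name here (collector_push, collector_pop, EV_TRANSPORT_ERROR) is an illustrative stand-in, not the real proton engine API:

```c
#include <string.h>

/* Toy model of an event collector: the transport pushes error events
   instead of fprintf-ing them, and the application drains the queue. */
enum ev_type { EV_NONE, EV_MESSAGE, EV_TRANSPORT_ERROR };

struct ev { enum ev_type type; char text[64]; };

#define MAX_EVENTS 16
static struct ev queue[MAX_EVENTS];
static int head = 0, tail = 0;

void collector_push(enum ev_type type, const char *text) {
    if ((tail + 1) % MAX_EVENTS == head) return; /* full: drop the event */
    queue[tail].type = type;
    strncpy(queue[tail].text, text, sizeof(queue[tail].text) - 1);
    queue[tail].text[sizeof(queue[tail].text) - 1] = '\0';
    tail = (tail + 1) % MAX_EVENTS;
}

/* Application-side drain loop: returns 0 when no events remain. */
int collector_pop(struct ev *out) {
    if (head == tail) return 0;
    *out = queue[head];
    head = (head + 1) % MAX_EVENTS;
    return 1;
}
```

The point is simply that a "connection aborted" becomes data the application can inspect, rather than console output it has to intercept.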


-K

----- Original Message -----
> From: "Fraser Adams" <[email protected]>
> To: [email protected]
> Sent: Monday, September 8, 2014 2:07:23 PM
> Subject: Re: proton Messenger error handling/recovery REQUEST FEEDBACK!
> 
> Messenger gurus seem to be keeping their heads down a bit.
> 
> Is it *really* just Alan and I who are interested in understanding the
> error handling/reconnection behaviour of Messenger?
> 
> Is anybody using it in "industrial strength" applications or is it just
> being used in quick and dirty demos? Without error handling and
> reconnection mechanisms I'm struggling to see how it can be used for the
> former.
> 
> I can likely hack things, and Alan also mentioned that he "cheats", but
> I'd really like to know from people who really understand Messenger how
> to do it *properly*.
> 
> Frase
> 
> 
> On 05/09/14 14:17, Alan Conway wrote:
> > On Thu, 2014-09-04 at 18:28 +0100, Fraser Adams wrote:
> >> On 03/09/14 23:29, Alan Conway wrote:
> >>> On Wed, 2014-09-03 at 20:05 +0100, Fraser Adams wrote:
> >>>> Hello,
> >>>> I've probably missed something, but I don't know how to reliably detect
> >>>> failures and reconnect.
> >>>>
> >>>> So if I send to an address with a freshly stood-up Messenger instance
> >>>> and the address can't be found, things aren't too bad and I wind up with
> >>>> an ECONNREFUSED that I could do something with. However, if I've been
> >>>> sending messages to a valid address and then I kill off the consumer, I
> >>>> see:
> >>>>
> >>>> [0x513380]:ERROR amqp:connection:framing-error connection aborted
> >>>> [0x513380]:ERROR[-2] connection aborted
> >>>>
> >>>> CONNECTION ERROR connection aborted (remote)
> >>>>
> >>>> The thing is that all of these are *internally* generated messages sent
> >>>> to the console via fprintf, so my *application* doesn't really know
> >>>> about them (though I could be crafty and interpose my own cheeky fprintf
> >>>> to intercept them). That doesn't quite sound like the desired behaviour
> >>>> for a robust system?
> >>>>
> >>>>
> >>>> Similarly, should I actually trap an error, what's the correct way to
> >>>> continue? As it happens, my app currently carries on silently doing
> >>>> nothing useful, and continues to do so even when the peer restarts (so
> >>>> there is no magic internal reconnection logic as far as I can see).
> >>>>
> >>>> Do I have to do a
> >>>> messenger.stop()
> >>>> messenger.start()
> >>>>
> >>>> cycle to get things going again? I'm guessing so, but I'd like to know
> >>>> the "correct"/expected way to create Messenger code that is robust
> >>>> against remote failures; as far as I can see there are no examples of
> >>>> that sort of thing?
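For what it's worth, the stop()/start() cycle asked about above can be wrapped in a simple retry loop. This is only a sketch of the pattern: teardown and try_start are hypothetical stand-ins for whatever stop/start calls the binding actually exposes, and a real version would sleep with backoff between attempts:

```c
/* Hypothetical stand-ins for the real messenger stop/start calls. */
static int attempts_before_success = 3;   /* simulate a peer that comes back */
static void teardown(void) { /* messenger.stop() would go here */ }
static int try_start(void) {              /* messenger.start(); 0 on failure */
    return --attempts_before_success <= 0;
}

/* Retry the full stop/start cycle until it succeeds or we give up. */
int reconnect(int max_retries) {
    for (int i = 0; i < max_retries; i++) {
        teardown();
        if (try_start()) return 1;
        /* a real version would sleep here, ideally with backoff */
    }
    return 0;
}
```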
> >>> I've come up against similar problems, I think it's an area that needs
> >>> some work in Proton. Is anybody already working on/thinking about this
> >>> area?
> >>>
> >>> Cheers,
> >>> Alan.
> >>>
> >> I'd definitely like to know how others deal with this sort of thing.
> > I cheat. I've been using proton in dispatch system tests, I come up
> > against these issues when I start up some proton/dispatch network and
> > try to use it too quickly before things have settled down. I have some
> > tweaks in my test harness to wait till things are ready so there are no
> > errors :) That's not a solution for general non-test situations -
> > although knowing how to wait till things are ready is always useful.
> >
> > https://svn.apache.org/repos/asf/qpid/dispatch/trunk/tests/system_test.py
> >
> > class Messenger adds a "flush" method that pumps the Messenger event
> > loop till there is no more work to do. Otherwise subscribe() in
> > particular gives no way to tell when the subscription is active.
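The "flush" idea - pump the event loop until there's no outstanding work - amounts to the loop below. Here do_work is a hypothetical stand-in for whatever non-blocking "do one unit of work, report whether any remains" call the messenger provides:

```c
/* Hypothetical stand-in for a non-blocking work call that returns
   non-zero while there is still work outstanding. */
static int outstanding = 3;
static int do_work(void) {
    if (outstanding > 0) outstanding--;
    return outstanding > 0;
}

/* flush: keep pumping until the engine reports no more work to do. */
void flush(void) {
    while (do_work())
        ;
}
```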
> >
> > Note: My situation is a bit special in that dispatch creates addresses
> > dynamically on subscribe and my tests involve slow stuff like waypoints
> > to brokers etc. That introduces a delay in subscribe that probably isn't
> > visible when the address is created beforehand.
> >
> > There's also Qpidd.wait_ready and Qdrouterd.wait_ready that wait for
> > qpidd and dispatch router to be ready respectively so I can be sure that
> > when I connect with proton they'll be listening. Those wait for the
> > expected listening ports to be connectable and, in the case of dispatch,
> > also do a QMF check to make sure that all expected outgoing connectors
> > are there.
> >
> >> For info: notwithstanding not necessarily being able to trap all the
> >> errors without being devious around fprintf (which, to be fair, works,
> >> but it's a bit sneaky, and if you have multiple Messenger instances it
> >> won't tell you which one the error relates to), when I do get an error I
> >> appear to have to start from scratch - in other words:
> >>
> >> message.free();
> >> messenger.free();
> >> message = new proton.Message();
> >> messenger = new proton.Messenger();
> >> messenger.start();
> >>
> >> If I try to restart the original messenger or reuse the existing queue I
> >> get no joy. It's not the end of the world, but I've no idea what robust
> >> Messenger code is *supposed* to look like.
> >>
> >> Presumably Alan and I aren't the only people who might like to be able
> >> to trap errors and restart? Or does every one else write code that never
> >> fails ;->
> > I always wondered how everybody but me can do that. Sigh. For you and me
> > I think we need to do some work on proton's error handling.
> >
> > - proton (or any library!) should NEVER EVER write anything direct to
> > stdout or stderr. It needs a (very simple) logging facility that can
> > write to stderr by default but can be redirected elsewhere.
> > - proton should never log an error without also returning some useful
> > error condition to the application.
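A redirectable logging hook of the kind described here can be very small. This is a sketch, not proton's actual API - set_logger and log_error are made-up names for illustration; the default sink writes to stderr, and an application can swap in its own:

```c
#include <stdarg.h>
#include <stdio.h>

typedef void (*log_fn)(const char *msg, void *ctx);

/* Default sink: stderr, as suggested above. */
static void default_log(const char *msg, void *ctx) {
    (void)ctx;
    fprintf(stderr, "%s\n", msg);
}

static log_fn current_log = default_log;
static void *log_ctx = NULL;

/* Passing NULL restores the stderr default. */
void set_logger(log_fn fn, void *ctx) {
    current_log = fn ? fn : default_log;
    log_ctx = ctx;
}

void log_error(const char *fmt, ...) {
    char buf[512];
    va_list ap;
    va_start(ap, fmt);
    vsnprintf(buf, sizeof(buf), fmt, ap);
    va_end(ap);
    current_log(buf, log_ctx);
}

/* Example application sink that captures the last message. */
static char captured[512];
static void capture_log(const char *msg, void *ctx) {
    (void)ctx;
    snprintf(captured, sizeof(captured), "%s", msg);
}
```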
> >
> > Proton has some useful pn_error_* functions; they just need to be used
> > more widely. In dispatch I introduced an errno-style thread-local error
> > code/message (in proton it would be a pn_error_t*). That allows sensible
> > error messages out of functions that want to return something else (e.g.
> > return a pointer or null and set the thread error). It also allows you
> > to work around lazy error handling (temporarily of course (hahahaha)) -
> > a caller a couple of stack frames up can detect an error even if
> > intermediate functions didn't check & propagate errors properly. I'm not
> > advocating lazy error checking, but in C it is hard to get everything
> > right.
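The errno-style thread-local slot described here fits in a few lines of C11. Everything below is illustrative: the names are made up, and in proton the payload would be a pn_error_t* rather than a code/text pair:

```c
#include <string.h>

/* One error slot per thread, errno-style (C11 _Thread_local). */
static _Thread_local int err_code = 0;
static _Thread_local char err_text[256];

void error_set(int code, const char *text) {
    err_code = code;
    strncpy(err_text, text, sizeof(err_text) - 1);
    err_text[sizeof(err_text) - 1] = '\0';
}
int error_code(void) { return err_code; }
const char *error_text(void) { return err_text; }
void error_clear(void) { err_code = 0; err_text[0] = '\0'; }

/* A function that wants to return a pointer can return NULL and
   set the thread error, as described above. */
const char *lookup(const char *key) {
    if (strcmp(key, "known") != 0) {
        error_set(-2, "no such key");
        return NULL;
    }
    return "value";
}
```

A caller a few frames up can then check error_code() even if intermediate functions forgot to propagate the failure.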
> >
> > FEEDBACK PLEASE: anyone think this is a great/horrible idea? Does proton
> > already do things I've missed that would make this unnecessary?
> >
> > Cheers,
> > Alan.
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [email protected]
> > For additional commands, e-mail: [email protected]
> >
> 
> 
> 
> 

-- 
-K
