I think the main problem here is not in Messenger itself but in the protocol engine, a problem I discovered a while ago: https://issues.apache.org/jira/browse/PROTON-644?filter=-2. There is no way for Messenger to recover from or handle the error, because it cannot see the connection fail.
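For context on why the failure is invisible: on a non-blocking socket, connect() returning -1 with errno == EINPROGRESS only means the attempt has started. The outcome (for example ECONNREFUSED when no broker is listening) appears later, when the socket polls writable and its SO_ERROR option is read. The sketch below is plain POSIX, not Proton's API. start_nonblocking_connect and finish_connect are hypothetical helper names, and the poll()-based wait is an assumption about how a caller might surface the error, not a description of what the engine does.

```c
#define _POSIX_C_SOURCE 200112L
#include <errno.h>
#include <fcntl.h>
#include <netdb.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not part of Proton): start a non-blocking TCP
 * connect in the same way as the pn_connect snippet quoted in Fraser's
 * message. Returns a socket that may still be connecting, or -1 on an
 * immediate failure. */
static int start_nonblocking_connect(const char *host, const char *port)
{
    struct addrinfo hints;
    struct addrinfo *addr = NULL;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, port, &hints, &addr) != 0)
        return -1;

    int sock = socket(addr->ai_family, addr->ai_socktype, addr->ai_protocol);
    if (sock == -1) {
        freeaddrinfo(addr);
        return -1;
    }

    int flags = fcntl(sock, F_GETFL, 0);
    if (flags == -1 || fcntl(sock, F_SETFL, flags | O_NONBLOCK) == -1 ||
        (connect(sock, addr->ai_addr, addr->ai_addrlen) == -1 &&
         errno != EINPROGRESS)) {
        freeaddrinfo(addr);
        close(sock);
        return -1;
    }
    freeaddrinfo(addr);
    /* A valid fd, but the connect may still fail asynchronously. */
    return sock;
}

/* Hypothetical helper: wait up to timeout_ms for the pending connect.
 * Returns 0 once connected, the socket's pending error (for example
 * ECONNREFUSED) if the connect failed, or ETIMEDOUT if it is still
 * in progress. */
static int finish_connect(int sock, int timeout_ms)
{
    struct pollfd pfd;
    pfd.fd = sock;
    pfd.events = POLLOUT;
    pfd.revents = 0;

    int n = poll(&pfd, 1, timeout_ms);
    if (n == -1)
        return errno;
    if (n == 0)
        return ETIMEDOUT;

    /* Writability alone does not mean success: SO_ERROR holds the
     * result of the asynchronous connect. */
    int err = 0;
    socklen_t len = sizeof err;
    if (getsockopt(sock, SOL_SOCKET, SO_ERROR, &err, &len) == -1)
        return errno;
    return err;
}
```

In Fraser's scenario, start_nonblocking_connect("localhost", "5672") would hand back a valid fd, and only the later finish_connect call would report the refusal; whoever owns the socket at that point still has to close() it, which is the cleanup Fraser could not get Messenger to perform. On some platforms a loopback refusal can also be reported by connect() itself, in which case the first helper returns -1.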
-- Marcel

On Sat, Oct 4, 2014 at 7:40 PM, Fraser Adams wrote:
> Is there any way to recover from Messenger errors short of completely
> freeing the messenger instance and starting with a new one?
>
> I've been deliberately making it fail, for example by starting a messenger
> with subscriptions like this:
> amqp://~0.0.0.0,localhost:5672
>
> With no broker running, the first subscription should succeed and the
> second one should fail.
>
> In my case it's a bit more awkward because it's fully asynchronous, but what
> I see in this case is that it creates a connection instance to
> localhost:5672, because in pn_connect there is a test for
>
> if (connect(sock, addr->ai_addr, addr->ai_addrlen) == -1) {
>   if (errno != EINPROGRESS) {
>     pn_i_error_from_errno(io->error, "connect");
>     freeaddrinfo(addr);
>     close(sock);
>     return PN_INVALID_SOCKET;
>   }
> }
>
> With my connect on a non-blocking socket, errno is set to EINPROGRESS, so
> the socket ends up being valid, but it subsequently fails to connect.
>
> I've actually got a listener that can detect the "Connection refused", but
> what I can't seem to do is cleanly clear the connection object.
>
> I've tried all sorts of hacks around
> pn_messenger_resolve/pni_messenger_reclaim. In that case
> pn_messenger_resolve found the connection object for the name
> "localhost:5672" without a problem; I then tried a pni_messenger_reclaim
> hack to clear it, but that didn't seem to close the underlying socket.
>
> I also tried to find, via pn_messenger_selectable, the selectable that
> matched the file descriptor of the failed connection, and then tried a
> pni_connection_finalize(sel) hack. In that case I seem to free up the
> connection and the underlying socket gets closed, but when I subsequently
> try to connect (to the working amqp://~0.0.0.0), although I get an accept on
> the right file descriptor, I then get an assertion failure at
> messenger.c,151,pni_context at Error
>
> So in short: a connection object gets created because of a connect on a
> non-blocking socket, which subsequently and asynchronously fails to
> connect, and there doesn't seem to be any way to tidy up that failed
> connection.
>
> To be clear, if I have subscriptions
> amqp://~0.0.0.0,localhost:5672
>
> and I ignore any errors and don't bother trying to tidy up, and I
> subsequently do a client connection to amqp://0.0.0.0, my client connects
> fine, but on the next file descriptor up from the one created by the failed
> localhost:5672 connection. So basically my failed subscription has leaked a
> connection. That is, the listen fd for amqp://~0.0.0.0 is 3, the (failed) fd
> for localhost:5672 is 4, and when I connect to amqp://~0.0.0.0 the accept fd
> is 5. It really should be 4, but I can't get shot of the connection object
> etc. for localhost:5672.
>
> The only way to deal with it seems to be to free and create a new messenger
> whenever anything fails, which is a pain because the subscription
> amqp://~0.0.0.0 is actually fine.
>
> TBH Messenger's error handling is driving me nuts. It has been mentioned in
> a few threads that it might be better to give up on Messenger and just use
> the engine.
>
> Is Messenger really irredeemably broken? Without decent error
> handling/recovery it's of very little use in a production environment.
>
> Frase
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
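Fraser's expectation that the accept fd "really should be 4" rests on the POSIX rule that a newly created descriptor gets the lowest-numbered unused slot, so one descriptor that is never closed shifts every later socket() or accept() up by one. A minimal sketch of that rule using plain sockets rather than Proton; fd_gap_after is a hypothetical name, and the optionally leaked first socket stands in for the failed localhost:5672 connection.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Illustrative only (not Proton code). Opens one socket, optionally
 * "leaks" it (leaves it open), then opens a second socket and reports
 * how far the second fd landed from the first.
 * Returns second - first, or -1 if a socket could not be created. */
static int fd_gap_after(int leak)
{
    int first = socket(AF_INET, SOCK_STREAM, 0);
    if (first == -1)
        return -1;
    if (!leak)
        close(first);           /* the slot becomes free again */

    int second = socket(AF_INET, SOCK_STREAM, 0);
    if (second == -1) {
        if (leak)
            close(first);
        return -1;
    }

    int gap = second - first;
    close(second);
    if (leak)
        close(first);           /* tidy up the "leaked" fd after the demo */
    return gap;
}
```

In a single-threaded process, fd_gap_after(0) returns 0 because the freed slot is reused, while fd_gap_after(1) returns a positive gap, which corresponds to the 4 versus 5 shift Fraser observed.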
