Hi Alena,

In mlm_client.xml there is a state named "defaults" which is inherited by many other states, including "disconnecting". When the client is in the "disconnecting" state and the server reconnects, it will send a heartbeat, which the client answers with a connection ping; upon receiving a connection pong from the server, the client moves from the "disconnecting" state into the "connected" state.
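The transition path described above can be sketched as a tiny C state machine. This is a minimal, hypothetical model: the names (`client_t`, `client_dispatch`, `handle_defaults`, `EV_HEARTBEAT`, ...) are made up for illustration and are not the zproto-generated Malamute client. It assumes, as the explanation implies, that the heartbeat and pong handlers live in "defaults", so "disconnecting" simply falls back to them.

```c
#include <assert.h>

/* Hypothetical model of the reconnect path; state and event names echo
 * mlm_client.xml, but this is NOT the zproto-generated Malamute client. */
typedef enum { ST_DISCONNECTING, ST_CONNECTED } state_t;
typedef enum { EV_HEARTBEAT, EV_CONNECTION_PONG, EV_OTHER } event_t;

typedef struct {
    state_t state;
    int pings_sent;     /* how many CONNECTION PINGs the model "sent" */
} client_t;

/* Handlers of the shared "defaults" state (assumed location). */
static int handle_defaults (client_t *client, event_t event)
{
    switch (event) {
        case EV_HEARTBEAT:
            client->pings_sent++;            /* answer with CONNECTION PING */
            return 1;
        case EV_CONNECTION_PONG:
            client->state = ST_CONNECTED;    /* server answered: reconnected */
            return 1;
        default:
            return 0;                        /* event not handled */
    }
}

/* "disconnecting" defines no handlers for these events in this sketch. */
static int handle_disconnecting (client_t *client, event_t event)
{
    (void) client;
    (void) event;
    return 0;
}

/* Try the state's own handlers first, then the inherited "defaults". */
static state_t client_dispatch (client_t *client, event_t event)
{
    int handled = 0;
    if (client->state == ST_DISCONNECTING)
        handled = handle_disconnecting (client, event);
    if (!handled)
        handle_defaults (client, event);
    return client->state;
}
```

In the real client these events come from the socket and timers; here they are fed in by hand, so a heartbeat followed by a pong moves a client that starts in `ST_DISCONNECTING` to `ST_CONNECTED`.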
//Kevin

2016-04-18 8:47 GMT+02:00 Alena Chernikava <[email protected]>:
> Hi,
>
> I would like to ask some questions and point out some problems in the
> Malamute broker.
>
> I am facing a problem with the client reconnect procedure in Malamute.
> A formal description usually helps me understand a problem better, so I
> started my investigation by drawing the state machine of the Malamute
> client. It helped me a lot :) Right away I found some "strange
> behaviors". Before I try to "experiment" with fixes, I would like to ask
> some questions to make things clearer for me (maybe some of this was
> done intentionally).
>
> In the attachment you can find my hand-made visualization of the state
> machine (I made it for myself, so it has my thoughts written down)
> (GREEN - states, RED - events, BLUE - actions). It is not complete, but
> it has already helped me spot some potential and real problems. Below I
> describe the issues I found (the numbering is the same as on the
> picture).
>
> 1. Re-connection problem. This is actually the main problem I want to
> discuss.
>
> Situation:
> The client sends 3 PINGs and does not receive any PONGs back. After this
> the client ends up in the "disconnected" state. I would call it a black
> hole state, as the client cannot normally recover from it (back to the
> "connected" state) or move anywhere else.
>
> Analysis:
> * We can destroy the client. We will move out of the "disconnected"
> state, but we have destroyed the client. :) End of work, nothing to do.
> Everything is fine.
> * We can move to the "connected" state if the client receives a "PONG"
> from the server, or to the "HAVE ERROR" state if the client receives an
> "ERROR" from the server. To receive any response from the server, we
> need to send something to the server. And here we are: the client does
> not send anything to the server :( PINGs are disabled in
> "mlm_client.xml" from the very beginning.
>
> Questions:
> * Why were PINGs disabled in the "disconnected" state?
> * What was the basic idea behind the "reconnect" implementation?
>
> Proposal:
> Enable PINGs. When the server receives a PING from an "unknown client",
> it will send "ERROR" back, which will trigger the "reconnection"
> procedure. I am still not sure the client would reconnect correctly, but
> at least we would give it a chance to do so; right now the client has
> no chance to reconnect (if the server is off for a longer period).
>
> 2. Take a look at the picture in the right corner.
>
> In mlm_client.xml:
>
> <state name = "connecting" inherit = "defaults">
>     <event name = "OK" next = "connected">
>         <action name = "signal success" />
>         <action name = "client is connected" />
>     </event>
>
> This ordering means the following code can pass (and I actually saw
> this behavior a couple of times):
>
> int rv = mlm_client_connect (client, endpoint, timeout, address);
> assert (rv == 0);
> assert (mlm_client_connected (client) == false);
>
> Proposal: do "signal success" after "client is connected".
> Question: is there any reason to leave the order as it is?
>
> 3+4. There is one point I didn't understand from the code: when is the
> client supposed to start heartbeating?
> I thought it should happen after the client got the "OK" response from
> the server, but from the state machine I see that heartbeating starts in
> the "connecting" state (while waiting for the response from the
> server). Is this a bug, or was it done intentionally?
>
> 5. This is just a bug; I will fix it later. If mlm_client_connect didn't
> work the first time, the client should remain in the "start" state.
>
> 6. This is a potential problem. If "PONG" comes before the "OK" message
> from the server, mlm_client_set_producer/consumer/worker will not end
> correctly and potentially will never return. I propose: return to the
> "confirming" state and wait for the "OK" response from the server. Do
> you think this would break anything?
>
> Thank you for reading this; looking forward to your reply.
> Alena Chernikava
> _______________________________________________
> zeromq-dev mailing list
> [email protected]
> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
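Issue 1 (the "disconnected" black hole) and the proposed fix can be modelled in a few lines of C. The sketch is hypothetical: only the limit of 3 unanswered PINGs and the "connected", "disconnected" and "HAVE ERROR" state names come from the message; the struct, the `ping_when_disconnected` switch and the event names are illustrative, not Malamute's code. It assumes, as the proposal says, that the server answers a PING from an unknown client with ERROR.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of issue 1; not Malamute's actual client code. */
#define MAX_MISSED_PONGS 3   /* 3 unanswered PINGs, as described above */

typedef enum { ST_CONNECTED, ST_DISCONNECTED, ST_HAVE_ERROR } state_t;
typedef enum { EV_EXPIRED, EV_PONG, EV_ERROR } event_t;

typedef struct {
    state_t state;
    int missed;                   /* PINGs sent with no PONG yet */
    int pings_sent;               /* total PINGs sent */
    bool ping_when_disconnected;  /* hypothetical switch for the proposal */
} client_t;

static void handle (client_t *c, event_t ev)
{
    switch (ev) {
        case EV_EXPIRED:                  /* heartbeat timer fired */
            if (c->state == ST_CONNECTED || c->ping_when_disconnected) {
                c->pings_sent++;          /* send a PING */
                if (c->state == ST_CONNECTED && ++c->missed >= MAX_MISSED_PONGS)
                    c->state = ST_DISCONNECTED;
            }
            break;
        case EV_PONG:                     /* server is alive and knows us */
            c->missed = 0;
            c->state = ST_CONNECTED;
            break;
        case EV_ERROR:                    /* e.g. PING from an unknown client */
            c->state = ST_HAVE_ERROR;     /* reconnection would start here */
            break;
    }
}
```

With the switch off, a fourth timer expiry in "disconnected" sends nothing, so no PONG or ERROR can ever arrive; with it on, the extra PING gives the server a chance to answer with ERROR and move the client to "HAVE ERROR".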
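The ordering problem in issue 2 can be reduced to two actions on the "OK" event. This is a hypothetical, single-threaded sketch: it assumes, as in zproto-generated clients, that the state machine runs in an actor thread separate from the caller of mlm_client_connect (), and it models the worst case of that race by letting the woken "caller" read the connected flag at the exact moment it is signalled. The names `client_t`, `on_ok_current` and `on_ok_proposed` are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the "OK" event in the "connecting" state. */
typedef struct {
    bool connected;        /* what mlm_client_connected () would report */
    bool seen_by_caller;   /* value the woken caller observes */
} client_t;

/* Action "client is connected": update the connected flag. */
static void client_is_connected (client_t *self)
{
    self->connected = true;
}

/* Action "signal success": wake the caller of mlm_client_connect ().
 * Worst case of the race: the caller checks the flag immediately. */
static void signal_success (client_t *self)
{
    self->seen_by_caller = self->connected;
}

/* Order currently used in mlm_client.xml. */
static void on_ok_current (client_t *self)
{
    signal_success (self);
    client_is_connected (self);
}

/* Proposed order: update the state first, then wake the caller. */
static void on_ok_proposed (client_t *self)
{
    client_is_connected (self);
    signal_success (self);
}
```

With the current order the caller can observe `connected == false` even though the connect call succeeded; with the proposed order it cannot. This is the usual rule for any signal/wait hand-off: whatever the woken thread may inspect should already be updated before it is woken.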
