Hi all,

I am trying to debug a weird state between mlm clients and the broker. I was
not sure how up to date the PDF is with the recent state definitions, so
I have hacked together a zproto_dot.gsl (as usual, procrastinating is better
than the real work ;-)), so that the state diagrams are generated
automatically by zproto.

You can see the result here:

https://github.com/vyskocilm/malamute-core/blob/master/src/mlm_client.svg
https://github.com/vyskocilm/malamute-core/blob/master/src/mlm_server.svg
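
For anyone curious what such a generator boils down to, here is a minimal
sketch in C. It is not the actual zproto_dot.gsl: the `transition_t` type and
the `write_dot` function are hypothetical names made up for illustration. It
only shows the idea of turning a table of (state, event, next state) entries
into a Graphviz digraph.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

//  One edge of the state machine: in state `from`, event `event` leads to `to`
typedef struct {
    const char *from;
    const char *event;
    const char *to;
} transition_t;

//  Write the transitions as a Graphviz digraph into `buf`.
//  Returns the number of characters written, or -1 if `buf` is too small.
static int
write_dot (char *buf, size_t size, const transition_t *table, size_t count)
{
    assert (buf && table);
    int rc = snprintf (buf, size, "digraph client {\n");
    if (rc < 0 || (size_t) rc >= size)
        return -1;
    size_t used = (size_t) rc;

    for (size_t i = 0; i < count; i++) {
        rc = snprintf (buf + used, size - used,
                       "    \"%s\" -> \"%s\" [label=\"%s\"];\n",
                       table [i].from, table [i].to, table [i].event);
        if (rc < 0 || (size_t) rc >= size - used)
            return -1;
        used += (size_t) rc;
    }
    rc = snprintf (buf + used, size - used, "}\n");
    if (rc < 0 || (size_t) rc >= size - used)
        return -1;
    return (int) (used + (size_t) rc);
}
```

Feeding it the single transition quoted further down in this thread
("connecting", event "OK", next "connected") gives one edge that `dot -Tsvg`
can render.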

Bye
Michal

On Wed, Apr 20, 2016 at 6:57 PM, Pieter Hintjens <[email protected]> wrote:
> Sounds right to me.
>
> On Mon, Apr 18, 2016 at 1:29 PM, Kevin Sapper <[email protected]> wrote:
>> Okay, it seems I was a little bit too quick :(. Great analysis btw :)
>>
>> You're correct, the client cannot recover from the disconnected state. The
>> heartbeat event has been overridden so that the client itself stops sending
>> heartbeats to the server. But this results in the client ignoring any
>> heartbeats from the revived server. This is definitely a bug! Instead of
>> ignoring the heartbeat events, we need to stop the client heartbeat timer and
>> restart it upon reconnect.
>>
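The fix described above could look roughly like the following sketch. It is a
toy model, not the zproto-generated engine: `client_t`, `client_handle` and the
event names are hypothetical, and it assumes that heartbeats coming from the
server can be told apart from the client's own timer.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { ST_CONNECTED, ST_DISCONNECTED } state_t;
typedef enum { EV_EXPIRED, EV_SERVER_HEARTBEAT, EV_PONG } event_t;

typedef struct {
    state_t state;
    bool timer_running;     //  Client-side heartbeat timer
    int pings_sent;         //  Connection pings sent to the server
} client_t;

static void
client_handle (client_t *self, event_t event)
{
    assert (self);
    switch (event) {
    case EV_EXPIRED:
        //  Server went silent: stop our own heartbeating, keep listening
        self->timer_running = false;
        self->state = ST_DISCONNECTED;
        break;
    case EV_SERVER_HEARTBEAT:
        //  Not ignored, even when disconnected: answer with a ping
        self->pings_sent++;
        break;
    case EV_PONG:
        //  Server answered: connected again, restart the timer
        self->timer_running = true;
        self->state = ST_CONNECTED;
        break;
    }
}
```

The point of the design is that stopping the timer silences the client without
making it deaf to a server that comes back.
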
>> @hintjens please correct me if I'm wrong.
>>
>> 2016-04-18 13:13 GMT+02:00 Kevin Sapper <[email protected]>:
>>>
>>> Hi Alena,
>>>
>>> In mlm_client.xml there is a state named "defaults" which is inherited
>>> by many others, including "disconnecting". When the client is in the
>>> "disconnecting" state and the server reconnects, it will send a heartbeat,
>>> which the client will answer with a connection ping, and upon a connection
>>> pong from the server the client will move from the "disconnecting" state
>>> into the "connected" state.
>>>
>>> //Kevin
>>>
>>> 2016-04-18 8:47 GMT+02:00 Alena Chernikava <[email protected]>:
>>>>
>>>> Hi,
>>>>
>>>> I would like to ask some questions and point out some problems in the
>>>> Malamute broker.
>>>>
>>>> I am facing a problem with the client reconnect procedure in Malamute.
>>>> Usually a formal description allows me to understand a problem better,
>>>> which is why I started the investigation by creating a visualization of
>>>> the state machine for the Malamute client. I would say it helped me a
>>>> lot :) Right away I found some "strange behaviors". I would like to ask
>>>> some questions to make things clearer for me (maybe it was done
>>>> intentionally) before I try to "experiment" with fixes.
>>>>
>>>> In the attachment you can find my hand-made visualization of the state
>>>> machine (I was doing it for myself, so it has my thoughts written down).
>>>> (GREEN - states, RED - events, BLUE - actions). It is not complete, but it
>>>> has already helped me to spot some potential and real problems. Here I
>>>> describe some issues I found (numbering is the same as in the picture).
>>>>
>>>> 1. Re-connection problem. It is actually the main problem I want to
>>>> discuss.
>>>>
>>>> Situation:
>>>> The client sends 3 PINGs and does not receive any PONGs back. After this
>>>> the client ends up in the "disconnected" state. I would say that it is a
>>>> black hole state, as the client cannot normally recover from it (to the
>>>> "connected" state) or at least move somewhere else.
>>>>
>>>> Analysis:
>>>> * We can destroy the client. We will move out of the "disconnected" state,
>>>> but we have destroyed the client. :) End of work, nothing to do. Everything
>>>> is fine.
>>>> * We can move to the "connected" state if the client receives "PONG"
>>>> from the server, or we can move to the "HAVE ERROR" state if the client
>>>> receives "ERROR" from the server. In order to receive a response from the
>>>> server, we need to send something to the server. And here we are: the
>>>> client does not send anything to the server :( PINGs are disabled in
>>>> "mlm_client.xml" from the very beginning.
>>>>
>>>> Questions:
>>>> * Why was PING disabled in the "disconnected" state?
>>>> * What was the basic idea behind the "reconnect" implementation?
>>>>
>>>> Proposal:
>>>> Enable PINGs. When the server receives a PING from an "unknown client" it
>>>> will send "ERROR" back, which will trigger the "reconnection" procedure. I
>>>> am still not sure whether the client would reconnect correctly, but at
>>>> least we can give it a chance to do so, because right now the client has no
>>>> chance to reconnect (if the server is off for a longer period).
>>>>
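To make the "black hole" concrete, here is a small self-contained model of one
heartbeat interval spent in the "disconnected" state. It is only a sketch under
assumptions: `server_on_ping` and `disconnected_step` are hypothetical helpers,
not Malamute code, and the server behavior (ERROR for an unknown client) is the
one assumed in the proposal above.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { ST_DISCONNECTED, ST_CONNECTED, ST_HAVE_ERROR } state_t;
typedef enum { REPLY_NONE, REPLY_PONG, REPLY_ERROR } reply_t;

//  Hypothetical server: silent when down, PONG for a client it knows,
//  ERROR for a client it does not know (e.g. after a restart)
static reply_t
server_on_ping (bool server_up, bool client_known)
{
    if (!server_up)
        return REPLY_NONE;
    return client_known? REPLY_PONG: REPLY_ERROR;
}

//  One heartbeat interval spent in the "disconnected" state
static state_t
disconnected_step (bool ping_enabled, bool server_up, bool client_known)
{
    //  Without a PING nothing is sent, so no reply can ever arrive
    reply_t reply = ping_enabled
        ? server_on_ping (server_up, client_known)
        : REPLY_NONE;

    switch (reply) {
    case REPLY_PONG:
        return ST_CONNECTED;
    case REPLY_ERROR:
        return ST_HAVE_ERROR;   //  Reconnection would start from here
    default:
        return ST_DISCONNECTED;
    }
}
```

With `ping_enabled` false the function returns `ST_DISCONNECTED` for every
combination of the other two arguments, which is exactly the situation
described in item 1.
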
>>>> 2. Take a look at the right corner of the picture.
>>>>
>>>> In mlm_client.xml:
>>>>
>>>>     <state name = "connecting" inherit = "defaults">
>>>>         <event name = "OK" next = "connected">
>>>>             <action name = "signal success" />
>>>>             <action name = "client is connected" />
>>>>         </event>
>>>> Because of this, the following code can pass (and I have actually seen
>>>> such behavior a couple of times):
>>>>       int rv = mlm_client_connect (client, endpoint, timeout, address);
>>>>       assert (rv == 0);
>>>>       assert (mlm_client_connected (client) == false);
>>>>
>>>> Proposal: do "signal success" after "client is connected".
>>>> Question: is there any reason to leave the order as it is?
>>>>
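The effect of the action order in item 2 can be shown with a tiny
single-threaded model. This is a sketch, not the generated client: the struct
and the three functions are hypothetical, and the real caller runs in another
thread, so there the window is a race rather than a certainty. Here "signal
success" simply records what the woken caller would see at that instant.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool connected;             //  What mlm_client_connected would report
    bool seen_by_caller;        //  Value observed when the caller wakes up
} client_t;

//  Action "signal success": the caller wakes up and looks at the flag
static void
signal_success (client_t *self)
{
    self->seen_by_caller = self->connected;
}

//  Action "client is connected": the flag is finally set
static void
client_is_connected (client_t *self)
{
    self->connected = true;
}

//  Run the two actions of the "OK" event in the chosen order and
//  return what the caller observed
static bool
handle_ok (bool signal_first)
{
    client_t client = { false, false };
    if (signal_first) {
        signal_success (&client);
        client_is_connected (&client);
    }
    else {
        client_is_connected (&client);
        signal_success (&client);
    }
    assert (client.connected);
    return client.seen_by_caller;
}
```

`handle_ok (true)` models the current order and returns false, while
`handle_ok (false)` models the proposed order and returns true.
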
>>>> 3+4. There is one point I didn't understand from the code: when is the
>>>> client supposed to start heartbeating?
>>>> I thought that it should happen after the client gets the "OK" response
>>>> from the server, but from the state machine I see that heartbeating starts
>>>> in the "connecting" state (while waiting for the response from the server).
>>>> Is this a bug, or was it done intentionally?
>>>>
>>>> 5. It is just a bug, I will fix it later. If mlm_client_connect didn't
>>>> work the first time, the client should remain in the "start" state.
>>>>
>>>> 6. It is a potential problem. If "PONG" comes before the "OK" message
>>>> from the server, mlm_client_set_producer/consumer/worker will not finish
>>>> correctly and potentially will never return. I propose to return to the
>>>> "confirming" state and wait for the "OK" response from the server. Do you
>>>> think this would break anything?
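Item 6 can be modelled the same way. The sketch below is an assumption about
the behavior described above, not a copy of the real state machine:
`client_handle` and the `stay_on_pong` switch are hypothetical, and the model
covers only the "confirming" state.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { ST_CONFIRMING, ST_CONNECTED } state_t;
typedef enum { EV_OK, EV_PONG } event_t;

typedef struct {
    state_t state;
    bool caller_released;   //  True once the blocked API call may return
} client_t;

static void
client_handle (client_t *self, event_t event, bool stay_on_pong)
{
    assert (self);
    if (self->state != ST_CONFIRMING)
        return;             //  Sketch only models the "confirming" state
    if (event == EV_OK) {
        //  Confirmation arrived: release the caller
        self->caller_released = true;
        self->state = ST_CONNECTED;
    }
    else
    if (event == EV_PONG && !stay_on_pong)
        //  Assumed current behavior: leaves without releasing the caller
        self->state = ST_CONNECTED;
    //  With stay_on_pong set, PONG leaves the state unchanged
}
```

With `stay_on_pong` false, a PONG followed by an OK never sets
`caller_released`; with it true, the client stays in "confirming" until the OK
arrives.
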
>>>>
>>>> Thank you for reading this, looking forward to your reply.
>>>> Alena Chernikava
>>>> _______________________________________________
>>>> zeromq-dev mailing list
>>>> [email protected]
>>>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>>>>
>>>
>>



-- 
best regards
     Michal Vyskocil