Brian, Mato,

> Thanks for the quick reply. It helps to understand the overall philosophy.
>
>> Actually, this behaviour is normal and is not unique to the Python
>> bindings.
>>
>> Calling zmq_connect() (which is what s.connect() does) just means "please
>> try to connect asynchronously, now or later". You will only get an error
>> if the endpoint is *invalid* (e.g. host doesn't resolve, etc.), not if the
>> other end is not present.
>
> OK, that makes sense - different from what I am used to, but probably OK...
>
>> Same goes for recv/send -- 0MQ does autoreconnect and both recv/send are
>> entirely asynchronous. So if the other end goes away, your data will get
>> sent once it comes back.
>
> What if the other end never comes back? Is there a way of clearing the
> queue of messages that would have been delivered to that endpoint? I guess
> it depends on the type of socket, right?
Exactly. Specifying the socket type determines the algorithm used to handle
broken connections. In the case of PUB/SUB the messages are simply dropped
once the queue overflows. In the case of REQ/REP, inaccessible connections
are skipped. Once there's no accessible connection and the queue limits are
reached, the send function will block.

> I would imagine that a socket type that round-robin distributes to a set
> of endpoints would just skip any endpoint that disconnects? What about
> reply/request queues or multicast?

REQ/REP doesn't work over multicast right now. I haven't seen a compelling
use case for the functionality, by the way. If you have one, please do share
it.

>> We realise that there are many use cases where people do want to know if
>> a peer is present, at least for those transports where it makes sense,
>> but the implications of doing this properly (which means e.g. synchronous
>> zmq_send(), which defeats queueing and batching, etc.) need more thought.
>
> OK, I think I see why you are thinking of a synchronous send now. This is
> pretty subtle, as we definitely want things to be asynchronous. What I am
> thinking of is a sort of "delivery confirmation" that is itself async.
> Imagine send having a callback that would be called upon message delivery
> or failure. Or it could return an object like a deferred
> (http://twistedmatrix.com/documents/current/core/howto/defer.html). At
> some level, the underlying networking code does hit errors in these cases,
> and those errors are async. The question is how to represent them in the
> calling code (that does recv/send).
>
>> I would suggest implementing a "ping" function at the application level.
>> Send a message every X seconds and if you don't get a reply within Y
>> seconds then take evasive action.
>
> Yes, with low latency, this might be a great option. But there still has
> to be some way of handling messages that won't ever be sent because the
> receiving endpoint has truly died.
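For what it's worth, an application-level ping along those lines could look
roughly like this with the Python bindings (just a sketch: the endpoint, the
timeout and the b"ping"/b"pong" convention are arbitrary choices of mine, and
it assumes the peer runs a REP socket that answers every ping):

    import zmq

    # Minimal heartbeat sketch. Endpoint, timeout and message bodies are
    # made up; the peer is assumed to answer b"ping" with b"pong".
    PING_TIMEOUT_MS = 2000   # the "Y seconds" from above, in milliseconds

    ctx = zmq.Context()

    def peer_is_alive(endpoint="tcp://127.0.0.1:5555"):
        s = ctx.socket(zmq.REQ)
        s.setsockopt(zmq.LINGER, 0)   # don't keep unanswered pings around
        s.connect(endpoint)
        s.send(b"ping")

        poller = zmq.Poller()
        poller.register(s, zmq.POLLIN)
        # poll() takes milliseconds on current pyzmq; old releases differ.
        events = dict(poller.poll(PING_TIMEOUT_MS))

        alive = s in events and s.recv() == b"pong"
        s.close()
        return alive

    if not peer_is_alive():
        print("no pong within the timeout - take evasive action")

A fresh REQ socket per ping keeps REQ's strict send/recv alternation simple;
what "evasive action" means (alerting, rerouting, dropping queued work) is
entirely up to the application.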
Ok, this topic needs more thought. To start the discussion, let me summarise
the facts that have to be taken into account:

1. It should be made clear what 'disconnection' means. At the networking
   level there are no disconnections; there are only packets either getting
   through or not getting through. Disconnection can mean various things:
   a.) I've sent a packet and haven't got an ACK for N seconds.
   b.) I've sent a message and haven't got an ACK for N seconds.
   c.) I've sent a message and the peer application hasn't acknowledged
       that it has processed the transaction for N seconds.
   d.) No data has been received from the peer for N seconds (heartbeats).
   etc.

2. When should the disconnection notification be delivered?
   a.) Immediately when it happens.
   b.) It should be stored and delivered on the next 0MQ function call.
   c.) It should be placed into the queue and delivered just after the last
       message we got before the disconnection.

3. Each 0MQ socket handles N "connections". Supposing the connections are
   anonymous, the disconnection notification would simply state "one of the
   connections was broken" - which is not of much use aside from keeping
   track of the number of open connections. What's the use case here?

4. With multicast transports, the sender is not even aware of all the
   receivers (though the receiver is aware of the senders) and thus it is
   certainly not aware of receiver "disconnections". How does this fit into
   the bigger picture?

5. If there's a middlebox on the path from sender to receiver (say
   zmq_forwarder), as in A->B->C, when does a disconnection have to be
   reported to A? If the A-B connection breaks? What about a B-C
   disconnection? It prevents the passage of messages in the same way as an
   A-B disconnection does. How should the event be passed back to A?

Overall, my feeling is that disconnection notifications are an inherently
flawed concept (please do argue with the point). What's needed instead is an
ACK mechanism, moving the responsibility for message transfer between the
nodes on the path, dead letter queues, etc.

That brings us to the reliability topic: Can message delivery be acknowledged
once the next node has stored it in memory? Or should it be stored
persistently, so that it survives power failures? Or should it be replicated
on multiple boxes to survive HD failure? Or should we wait for an ACK from
the peer application? Should the application send the ACK itself once it has
processed the business transaction associated with the message? Should it do
so within a DB-like transaction? Should we support XA distributed
transactions, so that sending the ACK happens atomically with committing the
transaction results into the database? Etc.
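To make the ACK idea a bit more concrete, an end-to-end, application-level
acknowledgement could be shaped roughly like this (again just a sketch built
on assumptions of mine - the endpoint, the b"ACK" reply and the dead-letter
list are invented, and the receiver is assumed to be a REP socket that
replies only after it has fully processed the message; nothing like this is
built into 0MQ itself):

    import zmq

    ACK_TIMEOUT_MS = 3000   # how long to wait for the ACK
    dead_letters = []       # messages we gave up on, kept for inspection

    ctx = zmq.Context()

    def send_with_ack(msg, endpoint="tcp://127.0.0.1:5556"):
        s = ctx.socket(zmq.REQ)
        s.setsockopt(zmq.LINGER, 0)
        s.connect(endpoint)
        s.send(msg)

        poller = zmq.Poller()
        poller.register(s, zmq.POLLIN)
        if dict(poller.poll(ACK_TIMEOUT_MS)).get(s):
            acked = s.recv() == b"ACK"
        else:
            # No ACK in time: keep the message instead of silently losing it.
            dead_letters.append(msg)
            acked = False
        s.close()
        return acked

    # e.g. send_with_ack(b"some business message")

Whether the ACK should be sent once the message is merely held in memory,
once it is on disk, or only after the business transaction commits is exactly
the open question above - the sketch only shows where such a decision would
plug in.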
Martin