Hi all,

We normally use ZeroMQ with PGM as our message bus, which we use to publish events between server-side services.
The issue is that we need to support environments where multicast is not available (such as the Amazon cloud). I'm working on a design for a TCP-based message bus and would like your thoughts on it.

There are three major requirements: services must be able to come and go without reconfiguring the system, the design must be brokerless, and we must be able to recover messages lost between a publisher and a subscriber (because of connection problems), as PGM does.

We have three types of components: a discovery service, publishers and subscribers.

The discovery service is a standalone service that holds the list of all subscribers in the network. Each subscriber pings the discovery service every X seconds; when a subscriber has not pinged the service for more than Y seconds, it is considered dead. Whenever a new subscriber appears, the discovery service publishes a message to all the publishers. For high availability there is more than one discovery service (probably three).

When a publisher starts, it asks the discovery service for the list of all subscribers and subscribes to notifications about new subscribers (it asks every configured discovery service and takes the first answer, and it subscribes to all of the discovery services). After getting the list, the publisher connects to all of the subscribers, and it also connects to every new subscriber as it appears. The publisher ignores dead-subscriber notifications, mostly because I don't know how to handle them: the notification can come from one discovery service while the subscriber is still considered alive by the others.

All messages the publisher sends are numbered, and the publisher keeps the last X messages it sent to support recovery of lost messages. Each publisher has a unique random id. If a publisher has not sent a message for X seconds, it sends a keep-alive message to all subscribers.
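To make the bookkeeping above more concrete, here is a rough Python sketch of the two pieces of state involved: the discovery service's liveness table and the publisher's numbered history. It is only an in-memory illustration with no networking, and all the names (SubscriberRegistry, PublisherLog, timeout_s, history_size) are made up for this example rather than part of any existing library.

```python
import time
import uuid
from collections import deque


class SubscriberRegistry:
    """Discovery-service side: tracks subscribers by their last ping.

    Hypothetical helper; timeout_s plays the role of "Y seconds".
    """

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock          # injectable so the logic can be tested
        self.last_ping = {}         # subscriber address -> time of last ping

    def ping(self, address):
        """Record a ping; return True if this subscriber was not known yet."""
        is_new = address not in self.last_ping
        self.last_ping[address] = self.clock()
        return is_new

    def expire(self):
        """Drop and return the subscribers silent for more than timeout_s."""
        now = self.clock()
        dead = [a for a, t in self.last_ping.items() if now - t > self.timeout_s]
        for address in dead:
            del self.last_ping[address]
        return dead

    def alive(self):
        """Return the addresses of all subscribers currently considered alive."""
        return sorted(self.last_ping)


class PublisherLog:
    """Publisher side: numbers outgoing messages and keeps the last N of them.

    Hypothetical helper; history_size is the "last X messages" to retain.
    """

    def __init__(self, history_size):
        self.publisher_id = uuid.uuid4().hex        # unique random id
        self.next_seq = 1
        self.history = deque(maxlen=history_size)   # (seq, payload), oldest first

    def stamp(self, payload):
        """Assign the next sequence number and remember the message."""
        seq = self.next_seq
        self.next_seq += 1
        self.history.append((seq, payload))
        return self.publisher_id, seq, payload

    def replay(self, first, last):
        """Return messages first..last, or None if any are no longer held."""
        if not self.history:
            return None
        if first < self.history[0][0] or last > self.history[-1][0]:
            return None
        return [(s, p) for s, p in self.history if first <= s <= last]
```

A ping that returns True is the point at which the discovery service would notify the publishers, and a replay that returns None corresponds to the case where the publisher no longer has the requested messages.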
As mentioned, the subscriber pings the discovery services every X seconds. When the subscriber gets a message from a publisher for the first time, it saves the message number. From then on, if the subscriber detects a gap in the message numbers, it connects directly to the publisher (using request-response) and asks for the missing messages. The only problem is that, while messages are missing, the subscriber stops handling new messages from all publishers until the missing messages are restored. If the publisher no longer has those messages, the subscriber should raise an exception or restart the entire service.

The only thing the subscribers and publishers need to know is the addresses of the discovery services.

The reason I want the publisher to connect to the subscriber is so that when the connection drops, the publisher can detect it and reconnect (the subscriber may not be able to detect it, because it doesn't send any data to the publishers).

Thanks, I would very much appreciate your comments.

Doron
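P.S. In case it helps the discussion, the gap detection on the subscriber side could look roughly like the Python sketch below. It is only an illustration: GapDetector and its method names are invented for this example, and the actual request to the publisher for the missing range is left out.

```python
class GapDetector:
    """Subscriber side: remembers the last message number seen per publisher.

    Hypothetical helper for illustration; no networking is done here.
    """

    def __init__(self):
        self.last_seq = {}   # publisher id -> last message number handled

    def on_message(self, publisher_id, seq):
        """Classify an incoming message.

        Returns ("deliver", None) for the first or next expected message,
        ("duplicate", None) for a number already handled, or
        ("gap", (first_missing, last_missing)) when numbers were skipped.
        """
        last = self.last_seq.get(publisher_id)
        if last is None or seq == last + 1:
            self.last_seq[publisher_id] = seq
            return ("deliver", None)
        if seq <= last:
            return ("duplicate", None)
        # Numbers last+1 .. seq-1 never arrived and must be requested.
        return ("gap", (last + 1, seq - 1))

    def recovered(self, publisher_id, up_to_seq):
        """Record that everything up to up_to_seq has now been handled."""
        self.last_seq[publisher_id] = up_to_seq
```

On a "gap" result the subscriber would ask that publisher for the returned range over request-response, handle the recovered messages followed by the one that revealed the gap, and then call recovered() with that message's number.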
_______________________________________________ zeromq-dev mailing list [email protected] http://lists.zeromq.org/mailman/listinfo/zeromq-dev
