https://bugzilla.wikimedia.org/show_bug.cgi?id=43449
--- Comment #14 from Brandon Black <[email protected]> ---

The daemon logs some stats to a file, which we could pick up and graph (but currently do not). These would give you the rate of multicast purge requests the daemon is receiving, and show whether it is failing to process any of them because some high-impact bug is overflowing its queue.

The larger issue, which makes that monitoring relatively ineffective, is that the requests arrive over multicast, which is unreliable by design. They could be lost in the sender's output buffers, anywhere in the network, or discarded at the receiving cache (local buffering issues), and we'd have no indication that it was happening.

Upgrading from multicast is also an expensive proposition in terms of complexity (after all, the reason we're using it is that it's simple and efficient). We've thrown around some ideas about replacing plain multicast with Pragmatic General Multicast (http://en.wikipedia.org/wiki/Pragmatic_General_Multicast), likely using ZeroMQ (http://zeromq.org/) as the communications abstraction layer, to solve the unreliability problem. This would give us a reliable sequence-number system, with retransmission handled at that layer. That means adding ZeroMQ support to the PHP code that sends the purge requests and to vhtcpd, and most likely also building out a redundant, cooperating set of middleboxes to act as publish/subscribe multiplexers.

I'm not fond of going down this path unless we see a strong need to upgrade from multicast, though. It smells of too much complexity for the problem we're trying to solve, and suggests there may be a better mechanism if we rethink how purging is done in general. In any case, I think all of that would be outside the scope of this ticket.
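To illustrate what graphing the daemon's stats could look like: a minimal sketch that turns two snapshots of cumulative counters into per-second rates. It assumes the stats file contains whitespace-separated `name:value` integer counters; that format and the counter names used here (`inpkts_recvd`, `queue_overflows`) are hypothetical, not vhtcpd's documented output.

```python
from __future__ import annotations


def parse_stats(line: str) -> dict[str, int]:
    """Parse whitespace-separated 'name:value' integer counters.

    The line format is an assumption for illustration only.
    """
    stats: dict[str, int] = {}
    for token in line.split():
        name, _, value = token.partition(":")
        stats[name] = int(value)
    return stats


def rates(prev: dict[str, int], curr: dict[str, int], interval_s: float) -> dict[str, float]:
    """Per-second rate of each counter between two snapshots.

    Counters that went backwards (e.g. the daemon restarted and reset
    them) are skipped rather than reported as negative rates.
    """
    out: dict[str, float] = {}
    for name, value in curr.items():
        if name in prev and value >= prev[name]:
            out[name] = (value - prev[name]) / interval_s
    return out


if __name__ == "__main__":
    # Two hypothetical snapshots taken 60 seconds apart.
    before = parse_stats("inpkts_recvd:1000 queue_overflows:0")
    after = parse_stats("inpkts_recvd:1600 queue_overflows:3")
    print(rates(before, after, 60.0))
```

A non-zero `queue_overflows` rate would flag the "large-impact bug" case. As the comment points out, though, these counters only describe packets that actually reached the daemon; anything dropped in the sender's buffers or the network never shows up in them.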
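The reason a sequence-number system fixes the "no indication" problem can be sketched briefly. This is a hypothetical illustration, not PGM's wire protocol or ZeroMQ's API: it assumes each purge message carries a per-sender, monotonically increasing sequence number, which the current multicast purges do not. With that, a receiver can tell exactly which messages it missed; a PGM-style receiver would then ask the sender to retransmit them (a negative acknowledgement, or NAK).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GapTracker:
    """Tracks the next expected sequence number per sender (hypothetical)."""

    expected: dict[str, int] = field(default_factory=dict)

    def receive(self, sender: str, seq: int) -> list[int]:
        """Record one message and return sequence numbers now known missing.

        A reliable-multicast receiver would NAK these numbers so the
        sender retransmits them; this sketch only detects the gaps.
        """
        nxt = self.expected.get(sender)
        if nxt is None:
            # First message seen from this sender: start tracking here.
            self.expected[sender] = seq + 1
            return []
        if seq < nxt:
            # Duplicate or late (e.g. retransmitted) message: no new gap.
            return []
        missing = list(range(nxt, seq))
        self.expected[sender] = seq + 1
        return missing


if __name__ == "__main__":
    tracker = GapTracker()
    for seq in (100, 101, 104):
        gaps = tracker.receive("appserver1", seq)
        if gaps:
            print(f"appserver1: missed {gaps}, would request retransmission")
```

State is kept per sender on the assumption that each sending host numbers its own messages independently. The sketch also shows why the comment expects real complexity: someone still has to buffer old messages so they can be retransmitted, which is part of what the proposed middleboxes would do.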
