Hi Steve,

End-to-end flow control is something I'd really love to see. It sounds like your proposal won't fix all the problems we're seeing with flow control, though.
A problem we've seen is a kind of permanent congestion: the receiver gets a burst of several hundred CPG messages queued up and never really recovers. The sender keeps sending just enough CPG messages that the receiver never clears its queue, but it doesn't run out of memory either, so the receiver's queue can hover in this state indefinitely. On our setup, a healthcheck mechanism detects that the receiver has locked up (some operations block due to flow-control congestion) and eventually restarts the process.

As an interim workaround on our setup, I fudged the token backlog calculation to gradually force the sender to back off, so the sender's totem message queue fills up and it starts getting TRY_AGAIN errors.

I was wondering whether end-to-end flow control at the CPG group level is a possible/feasible option that would solve both this case and the OOM one? E.g. the CPG library code could send an internal message to notify the rest of the CPG group whenever the flow-control status for an application changes.

Regards,
Tim

_______________________________________________
Openais mailing list
[email protected]
https://lists.linux-foundation.org/mailman/listinfo/openais
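The interim workaround described above amounts to a sender-side backoff on TRY_AGAIN. As a rough illustration of why that keeps the queue bounded, here is a minimal toy simulation — the names `send_cpg_message`, `CS_ERR_TRY_AGAIN` and the queue cap merely echo the corosync C API (`cpg_mcast_joined()` and its error codes) and are stand-ins, not real libcpg calls:

```python
# Toy model: sends fail with TRY_AGAIN once the (simulated) totem
# message queue is full, and the sender backs off instead of spinning,
# giving the receiver time to drain. All names/values are assumptions
# for illustration only; this is not the libcpg API.

QUEUE_CAP = 8          # assumed cap on the sender's totem message queue
queue_len = 0

CS_OK, CS_ERR_TRY_AGAIN = "CS_OK", "CS_ERR_TRY_AGAIN"

def send_cpg_message():
    """Stub standing in for something like cpg_mcast_joined()."""
    global queue_len
    if queue_len >= QUEUE_CAP:
        return CS_ERR_TRY_AGAIN
    queue_len += 1
    return CS_OK

def receiver_tick():
    """The receiver drains one queued message per tick."""
    global queue_len
    queue_len = max(0, queue_len - 1)

def send_with_backoff(max_retries=3):
    """Sender-side pattern: on TRY_AGAIN, pause (modelled here as one
    receiver tick) rather than retrying immediately, so the queue
    stays bounded and the receiver can catch up."""
    for _ in range(max_retries + 1):
        if send_cpg_message() == CS_OK:
            return True
        receiver_tick()
    return False

sent = sum(send_with_backoff() for _ in range(100))
print(f"sent={sent} queue_len={queue_len}")
```

In this model every send eventually succeeds and the queue never grows past its cap — which is the behaviour the fudged token backlog calculation forces, at the cost of pushing the congestion back onto the sender as TRY_AGAIN errors.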
