Hi Steve,

End-to-end flow control is something I'd really love to see. It sounds like
your proposal won't fix all of the flow control problems we're seeing, though.

One problem we've seen is a kind of permanent congestion: the receiver gets a
burst of several hundred CPG messages queued up and never really recovers. The
sender keeps sending just enough CPG messages that the receiver never clears
its queue, but never runs out of memory either, so the receiver's queue can
hover in this state indefinitely. On our setup, a healthcheck mechanism detects
that the receiver has locked up (some operations block due to flow control
congestion) and eventually restarts the process.
(As an interim workaround on our setup, I fudged the token backlog
calculation to gradually force the sender to back off, so the sender's totem
message queue fills up and it starts getting TRY_AGAIN errors.)

I was wondering whether end-to-end flow control at the CPG group level is a
feasible option that would solve both this case and the OOM one? E.g. the CPG
library could send an internal message notifying the rest of the CPG group
whenever an application's flow control status changes.
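To make the idea concrete, here is a rough sketch of what I have in mind,
under the assumption that members multicast an internal "flow control state
changed" message and senders hold off while any member is congested. All
names here (member_state, on_flow_control_msg, group_may_send) are
hypothetical; none of this is existing corosync API.

```c
#include <stdbool.h>

#define MAX_MEMBERS 8

/* Last flow-control state announced by each group member. */
struct member_state {
    bool congested;
};

static struct member_state group[MAX_MEMBERS];

/* Handle an internal flow-control notification multicast by
 * member `id` when its delivery queue crosses a threshold. */
static void on_flow_control_msg(int id, bool congested)
{
    group[id].congested = congested;
}

/* A sender may multicast application messages only while no
 * member has announced congestion. */
static bool group_may_send(int nmembers)
{
    for (int i = 0; i < nmembers; i++)
        if (group[i].congested)
            return false;
    return true;
}
```

The attraction is that the congested receiver, not the transport, decides
when the group should stop sending, which covers both the hovering-queue case
and the OOM case.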

Regards,
Tim
_______________________________________________
Openais mailing list
[email protected]
https://lists.linux-foundation.org/mailman/listinfo/openais
