In my system, I have dedicated dispatcher services that just know where to forward the different messages.
For shutdown, I send a "shut up" message to all services, which afterwards refuse to do anything. Refused messages are written to the DB by the dispatchers, as are fresh messages that may have been on the wire before the "shut up" was processed. Using some counters in the service queue devices, and some special "info messages" that are still processed after a "shut up", I detect when the system becomes silent (no more messages), and then I send a "self kill" message to all services, which terminate peacefully.
(Dispatcher services are quieted and killed separately; they ignore the "normal" service commands.)
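The sequence above can be sketched roughly as follows. This is only a minimal single-process simulation, not the real system: the class names, the `SHUT_UP` / `SELF_KILL` constants, the `in_flight` counter and the list standing in for the DB are all assumptions made for illustration, and the special "info messages" and the separate dispatcher shutdown are left out.

```python
from collections import deque


class Service:
    """Hypothetical service: works until told to shut up, then refuses."""

    def __init__(self, name):
        self.name = name
        self.quiet = False
        self.alive = True
        self.handled = []

    def handle(self, msg):
        """Return True if the message was accepted, False if refused."""
        if msg == "SELF_KILL":
            self.alive = False
            return True
        if msg == "SHUT_UP":
            self.quiet = True
            return True
        if self.quiet:
            return False  # refused: the dispatcher will persist it
        self.handled.append(msg)
        return True


class Dispatcher:
    """Hypothetical dispatcher: forwards messages, persists refused ones."""

    def __init__(self, services):
        self.services = {s.name: s for s in services}
        self.wire = deque()   # messages still "on the wire"
        self.in_flight = 0    # counter used to detect silence
        self.db = []          # stand-in for the real database

    def send(self, target, msg):
        self.wire.append((target, msg))
        self.in_flight += 1

    def drain(self):
        """Deliver until the system is silent (nothing in flight)."""
        while self.in_flight > 0:
            target, msg = self.wire.popleft()
            self.in_flight -= 1
            if not self.services[target].handle(msg):
                self.db.append((target, msg))


def shutdown_demo():
    a, b = Service("a"), Service("b")
    d = Dispatcher([a, b])
    d.send("a", "job1")            # processed normally
    for name in d.services:
        d.send(name, "SHUT_UP")
    d.send("a", "job2")            # arrives after "shut up": refused, persisted
    d.drain()                      # silence reached here
    for name in d.services:
        d.send(name, "SELF_KILL")
    d.drain()
    return d.db, a.handled, [a.alive, b.alive]
```

In this sketch, whatever ends up in `db` is what a later startup would replay to get back to the state before the shutdown.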
Up to now, this concept has worked great ... OK, the system is far from finished, but startup, stress testing of ~30 service instances of 8 different types (one of them the dispatcher type), and shutdown run solid as a rock, and the persistence during shutdown can already be used to do a startup in exactly the state before the last shutdown ... kind of "suspend to DB".
Sven
---------------------------------------------------------
E = mc² ± 2dBA ----- everything is relative
---------------------------------------------------------
-----Original Message-----
Date: Tue, 23 Nov 2010 14:49:44 +0100
Subject: Re: [zeromq-dev] Mac OS X: test_shutdown_stress sometimes fails
From: Martin Sustrik <[email protected]>
To: Dhammika Pathirana <[email protected]>
Dhammika,
> I dunno, maybe we should simplify this.
> Why don't we add a refcount?
As a quick workaround -- yes. Do you have a patch for that kind of solution?
However, thinking about it conceptually, the problem is more generic.
Namely, object A can call object B which in turn calls object A. In such
a scenario, the inner call on A works on an inconsistent state as the outer
call isn't yet completed. The other way round, the outer call on A gets the
state changed underneath its feet when it calls B (see seq1.png attached).
The real solution, IMO, would be to use events to sequence actions on
individual objects. That way there won't be an inner and an outer call;
rather, there will be two event handlers executed one after another (see
seq2.png).
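
The difference between the two call patterns can be illustrated with a small sketch. It is not 0MQ code: the `EventLoop` class and the `flush`, `activate` and `detach` method names are hypothetical, and only the session/engine naming is borrowed from this discussion. With direct calls the session is re-entered while its outer handler is still running; with posted events the same handlers run strictly one after another.

```python
from collections import deque


class EventLoop:
    """Hypothetical loop: runs posted handlers one at a time, FIFO."""

    def __init__(self):
        self._events = deque()

    def post(self, handler):
        self._events.append(handler)

    def run(self):
        while self._events:
            self._events.popleft()()


class Session:  # plays the role of object A
    def __init__(self, call, log):
        self.call = call
        self.log = log
        self.engine = None
        self.in_handler = False

    def flush(self):
        self.in_handler = True
        self.log.append("session.flush begin")
        self.call(self.engine.activate)   # direct call or posted event
        self.log.append("session.flush end")
        self.in_handler = False

    def detach(self):
        # Record whether we were entered while flush() was still running.
        tag = " (reentrant!)" if self.in_handler else ""
        self.log.append("session.detach" + tag)


class Engine:  # plays the role of object B
    def __init__(self, call, log, session):
        self.call = call
        self.log = log
        self.session = session

    def activate(self):
        self.log.append("engine.activate")
        self.call(self.session.detach)    # calls back into the session


def trace(use_events):
    log = []
    loop = EventLoop()
    call = loop.post if use_events else (lambda handler: handler())
    session = Session(call, log)
    session.engine = Engine(call, log, session)
    call(session.flush)
    loop.run()
    return log
```

`trace(False)` logs `session.detach (reentrant!)` between the begin and end of `flush`, while `trace(True)` logs the complete `flush` first and only then `engine.activate` and `session.detach`.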
A brief code review of 0MQ shows that this kind of problem only happens
within a cluster of objects composed of the session, engine and
decoder/encoder. The rest of the system (sockets, pipes) is using events
already and doesn't have to be changed.
Martin
