Use a similar PUB/SUB socket at the bottom of the workers. You mirror the topology in synchronization space.
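A rough sketch of the bookkeeping side of this, in case it helps. It assumes each node is given a UUID and sends that ID back when it acknowledges a command; the `AckTracker` class and its method names are made up for illustration, and no 0MQ sockets are involved here, only the logic the master would run on whatever arrives over the feedback channel.

```python
import uuid


class AckTracker:
    """Tracks which nodes have acknowledged a command, by ID rather than by count.

    Hypothetical helper, not part of 0MQ.
    """

    def __init__(self, expected_ids):
        self.expected = set(expected_ids)  # nodes we must hear from
        self.seen = set()                  # nodes that have acknowledged so far

    def ack(self, node_id):
        """Record an acknowledgement; duplicates and unknown IDs change nothing."""
        if node_id in self.expected:
            self.seen.add(node_id)
        return self.complete()

    def missing(self):
        """IDs of the nodes that have not acknowledged yet."""
        return self.expected - self.seen

    def complete(self):
        return self.seen == self.expected


# Three workers, each identified by a UUID (assumed to be assigned at startup).
workers = [str(uuid.uuid4()) for _ in range(3)]
tracker = AckTracker(workers)

tracker.ack(workers[0])
tracker.ack(workers[0])       # duplicate LAZY: a plain counter would now read 2
tracker.ack("not-a-worker")   # stray message from an unknown node
print(tracker.complete())     # False: two workers are still outstanding

tracker.ack(workers[1])
tracker.ack(workers[2])
print(tracker.complete())     # True
```

Because acknowledgements are stored in a set keyed by node ID, a repeated or stray message cannot make the master believe everyone has reported, which is the failure mode a bare count is exposed to.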
On Aug 27, 2010, at 10:01 AM, Andrew Hume <[email protected]> wrote:

> thanks! that was just the input i was after.
>
> my intent is to do out-of-band signalling,
> but because 0MQ doesn't provide clean startup/termination semantics,
> and because of the uncertainty caused by buffering, i had to simulate
> one step of the signalling by sending NO-OPs.
>
> if i don't use NO-OPs, and purely use OOB signalling,
> how do i know when a worker is done with its work?
> how do i know when a ventilator's work messages have all been delivered?
> and, if possible, the answer shouldn't contain any time-related waits.
>
> On Aug 27, 2010, at 9:01 AM, Matt Weinstein wrote:
>
>> IMO
>>
>> You're trying to get state control messages to flow through the system; this
>> method is a hybrid "in band" and "out of band" system.
>>
>> You probably should choose one or the other.
>>
>> OOB - You mirror the topology with a group of PUB/SUB sockets, top to bottom
>> IB - you put an input at the top of the ventilators and send inband
>> messages downstream. In this case it might be useful to have signaling
>> points (devices) that let local components know what's going on without the
>> stream of NOPs.
>>
>> I don't think both IB and OOB are necessary, and it will be easier to build
>> a correct solution if you choose just one.
>>
>> In both cases UUIDs would be good to ensure that all nodes have been
>> accounted for. Counting is not particularly safe in a distributed
>> environment.
>>
>> Best,
>> Matt
>>
>> On Aug 26, 2010, at 10:05 PM, Andrew Hume wrote:
>>
>>> i need some advice. i do not yet grok the feng shui of zeromq,
>>> and thus seek advice from those who do.
>>>
>>> i have a fairly normal setup similar to the parallel pipeline example in
>>> the guide.
>>> except that i have a handful of ventilators, and a handful of sinks.
>>> so far, so good. we just use the PUSH/PULL pattern.
>>>
>>> here is where it gets harder.
>>> i need to be able to essentially pause
>>> the ventilators, adjust the number of workers and sinks, and then
>>> unpause the ventilators WITHOUT losing any packets.
>>>
>>> the best (!?) solution i have so far is
>>>
>>> a) add a PUSH/PULL feedback socket (with all sinks and workers PUSH,
>>> and the master is a PULL)
>>> b) add a PUB/SUB command socket (with all ventilators, sinks and workers SUB,
>>> and the master PUB)
>>>
>>> c) we send an "IDLE" command to the ventilators; they pause their normal work
>>> and start sending NO-OP work items
>>> d) as each worker starts getting NO-OPs, they push a "LAZY" message to the master.
>>> they forward the NO-OP to the sinks.
>>> e) when the master sees k LAZY messages (where k is the existing number of workers),
>>> it rearranges the workers (killing some or starting new ones). new workers send NO-OPs.
>>> f) when each sink starts getting NO-OPs, it sends a "LAZY" message to the master.
>>> g) when the master has done e), and seen NO-OPs from each of the j sinks, it
>>> rearranges the sinks. when each new sink starts getting NO-OPs, it sends a
>>> LAZY to the master.
>>>
>>> h) when the master receives m "LAZY"s (where m is the number of new sinks), it sends a "GO"
>>> command to the ventilators, who then stop sending NO-OPs and start sending real work.
>>>
>>> -------------------------------------
>>>
>>> pros: i believe this scheme will work. and the additional cost of two sockets is modest.
>>> cons: it is tedious to send NO-OPs, but i don't know how else to flush the buffers
>>> and synchronise everyone. it does involve knowing how many things there are,
>>> but that is part of an external configuration in any case.
>>>
>>> is this the (or a) right way to do this? is there a better way?
>>>
>>> andrew
>>>
>>> ------------------
>>> Andrew Hume (best -> Telework) +1 732-886-1886
>>> [email protected] (Work) +1 973-360-8651
>>> AT&T Labs - Research; member of USENIX and LOPSA

_______________________________________________
zeromq-dev mailing list
[email protected]
http://lists.zeromq.org/mailman/listinfo/zeromq-dev
