Use a similar PUB/SUB socket at the bottom of the workers. You mirror the 
topology in synchronization space. 



On Aug 27, 2010, at 10:01 AM, Andrew Hume <[email protected]> wrote:

> thanks! that was just the input i was after.
> 
> my intent is to do out-of-band signalling,
> but because 0MQ doesn't provide clean startup/termination semantics,
> and because of the uncertainty caused by buffering, i had to simulate
> one step of the signalling by sending NO-OPs.
> 
> if i don't use NO-OPs, and purely use OOB signalling,
> how do i know when a worker is done with its work?
> how do i know when a ventilator's work messages have all been delivered?
> and, if possible, the answer shouldn't contain any time-related waits.
> 
> On Aug 27, 2010, at 9:01 AM, Matt Weinstein wrote:
> 
>> IMO
>> 
>> You're trying to get state control messages to flow through the system; this 
>> method is a hybrid "in band" and "out of band" system.
>> 
>> You probably should choose one or the other.
>> 
>> OOB - You mirror the topology with a group of PUB/SUB sockets, top to bottom
>> IB -  you put an input at the top of the ventilators and send inband 
>> messages downstream.  In this case it might be useful to have signaling 
>> points (devices) that let local components know what's going on without the 
>> stream of NOPs.
>> 
>> I don't think both IB and OOB are necessary, and it will be easier to build 
>> a correct solution if you choose just one.
>> 
>> In both cases UUIDs would be good to ensure that all nodes have been 
>> accounted for.  Counting is not particularly safe in a distributed 
>> environment.
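
A minimal sketch of the UUID accounting idea above, in plain Python with no ZeroMQ calls. The `Roster` class, its method names, and the assumption that each node stamps its feedback message with a `uuid4` identifier are illustrative only; none of them come from this thread or from the 0MQ API. The point is that a set of expected IDs ignores a duplicate report and flags an unknown one, where a bare counter would miscount without noticing.

```python
import uuid


class Roster:
    """Tracks which known nodes have reported in (hypothetical helper)."""

    def __init__(self, node_ids):
        self.expected = set(node_ids)
        self.reported = set()

    def report(self, node_id):
        """Record a report; return False for a node that was never registered."""
        if node_id not in self.expected:
            return False
        self.reported.add(node_id)  # a duplicate report changes nothing
        return True

    def missing(self):
        """IDs that are expected but have not reported yet."""
        return self.expected - self.reported

    def complete(self):
        return not self.missing()


workers = [str(uuid.uuid4()) for _ in range(3)]
roster = Roster(workers)
roster.report(workers[0])
roster.report(workers[0])   # duplicate: a plain counter would now read 2
roster.report(workers[1])
stranger_ok = roster.report(str(uuid.uuid4()))  # unknown node is rejected
```

Here the master would rearrange workers only once `roster.complete()` is true, and `roster.missing()` names exactly which nodes it is still waiting for.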
>> 
>> Best,
>> Matt
>> 
>> On Aug 26, 2010, at 10:05 PM, Andrew Hume wrote:
>> 
>>> i need some advice. i do not yet grok the feng shui of zeromq,
>>> and thus seek advice from those who do.
>>> 
>>> i have a fairly normal setup similar to the parallel pipeline example in 
>>> the guide.
>>> except that i have a handful of ventilators, and a handful of sinks.
>>> so far, so good. we just use the PUSH/PULL pattern.
>>> 
>>> here is where it gets harder. i need to be able to essentially pause
>>> the ventilators, adjust the number of workers and sinks, and then
>>> unpause the ventilators WITHOUT losing any packets.
>>> 
>>> the best (!?) solution i have so far is
>>> 
>>> a) add a PUSH/PULL feedback socket (with all sinks and workers PUSH,
>>> and the master is a PULL)
>>> b) add a PUB/SUB command socket (with all ventilators, sinks and workers 
>>> SUB,
>>> and the master PUB)
>>> 
>>> c) we send an "IDLE" command to the ventilators; they pause their normal 
>>> work
>>> and start sending NO-OP work items
>>> d) as each worker starts getting NO-OPs, they push a "LAZY" message to the 
>>> master.
>>> they forward the NO-OP to the sinks.
>>> e) when the master sees k LAZY messages (where k is the existing number of 
>>> workers),
>>> it rearranges the workers (killing some or starting new ones). new workers 
>>> send NO-OPs.
>>> f) when each sink starts getting NO-OPs, it sends a "LAZY" message to the 
>>> master.
>>> g) when the master has done e), and seen a LAZY from each of the j sinks, it
>>> rearranges the sinks. when each new sink starts getting NO-OPs, it sends a 
>>> LAZY to the master.
>>> 
>>> h) when the master receives m "LAZY"s (where m is the number of new sinks), 
>>> it sends a "GO"
>>> command to the ventilators, who then stop sending NO-OPs and start sending 
>>> real work.
>>> 
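
The flush in steps c) to e) can be pictured with a toy model. This is a sketch in plain Python, assuming each ventilator-to-worker connection behaves as a FIFO pipe (modelled here by a `deque`). The names `NOOP` and `drain` and the message tuples are made up for illustration, and no ZeroMQ calls are involved. It shows only why a NO-OP works as a marker: by the time a worker reads it from a pipe, everything queued earlier on that same pipe has already been read.

```python
from collections import deque

NOOP = ("NO-OP", None)  # hypothetical marker message


def drain(pipe, worker_id, feedback):
    """Process a FIFO pipe until the first NO-OP, then report LAZY once."""
    done = []
    while pipe:
        kind, payload = pipe.popleft()
        if kind == "NO-OP":
            feedback.append(("LAZY", worker_id))
            break
        done.append(payload)  # stand-in for real work
    return done


# One pipe per worker; the ventilator queued real work before it was told IDLE.
pipes = {
    "w1": deque([("WORK", 1), ("WORK", 2)]),
    "w2": deque([("WORK", 3)]),
}
for pipe in pipes.values():
    pipe.append(NOOP)  # after IDLE the ventilator emits NO-OPs
    pipe.append(NOOP)

feedback = []  # stand-in for the PUSH/PULL feedback socket
processed = {w: drain(p, w, feedback) for w, p in pipes.items()}
lazy_workers = {w for tag, w in feedback if tag == "LAZY"}
```

The model covers one pipe per worker only. With several ventilators feeding the same worker, the worker would presumably need to see a marker from each of them before reporting LAZY.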
>>> -------------------------------------
>>> 
>>> pros: i believe this scheme will work. and the additional cost of two 
>>> sockets is modest.
>>> cons: it is tedious to send NO-OPs, but i don't know how else to flush the 
>>> buffers
>>> and synchronise everyone. it does involve knowing how many things there are,
>>> but that is part of an external configuration in any case.
>>> 
>>> is this the (or a) right way to do this? is there a better way?
>>> 
>>>     andrew
>>> 
>>> ------------------
>>> Andrew Hume  (best -> Telework) +1 732-886-1886
>>> [email protected]  (Work) +1 973-360-8651
>>> AT&T Labs - Research; member of USENIX and LOPSA
>>> 
>>> 
>>> 
>>> _______________________________________________
>>> zeromq-dev mailing list
>>> [email protected]
>>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev