You can't simply leave an element in the queue until a consumer finishes processing it; otherwise, multiple consumers may end up processing it. What about the following:

- Use a failure detector to detect which consumers are up;
- Before removing an element from the queue, a consumer creates a znode (it can't be ephemeral) flagging that the consumer is processing that element, which I'll call "/processing/pr";
- Once the consumer is done with the element, it removes "/processing/pr";
- There is a watchdog client that periodically verifies whether the list of elements being processed contains any element held by a crashed consumer. In that case, this client places the element back in the queue. The watchdog needs to be careful not to generate duplicates, as it is possible that the consumer crashes after it has created "/processing/pr" but before it has had time to delete the element from the queue. (Both pieces are sketched below.)
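In code, the consumer side might look roughly like the sketch below, written against the plain Java ZooKeeper client. Everything concrete in it is an assumption for illustration: the paths /queue, /processing, and /consumers (all assumed to exist already), the pr- prefix, string payloads, and the newline framing inside the flag node. The part that matters is the ordering: register an ephemeral node for the failure detector, create the persistent flag (carrying the payload, so the watchdog can recover it later), and only then delete the queue node.

import static java.nio.charset.StandardCharsets.UTF_8;

import java.util.Collections;
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

public class QueueConsumer {
    private final ZooKeeper zk;
    private final String consumerId;
    private String currentItem; // item claimed by the most recent take()

    public QueueConsumer(ZooKeeper zk, String consumerId)
            throws KeeperException, InterruptedException {
        this.zk = zk;
        this.consumerId = consumerId;
        // Ephemeral registration doubles as the failure detector: if this
        // consumer's session dies, the node below disappears automatically.
        zk.create("/consumers/" + consumerId, new byte[0],
                Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
    }

    /** Claims one element and returns its payload, or null if the queue is empty. */
    public String take() throws KeeperException, InterruptedException {
        List<String> items = zk.getChildren("/queue", false);
        Collections.sort(items); // sequential names, so oldest first
        for (String item : items) {
            byte[] payload;
            try {
                payload = zk.getData("/queue/" + item, false, null);
            } catch (KeeperException.NoNodeException e) {
                continue; // another consumer got this one first
            }
            String flag = "/processing/pr-" + item;
            try {
                // Persistent on purpose: the flag has to survive our crash so
                // the watchdog can recover the payload stored inside it.
                zk.create(flag,
                        (consumerId + "\n" + new String(payload, UTF_8)).getBytes(UTF_8),
                        Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
            } catch (KeeperException.NodeExistsException e) {
                continue; // another consumer already claimed this item
            }
            try {
                zk.delete("/queue/" + item, -1);
            } catch (KeeperException.NoNodeException e) {
                // The item was fully consumed between our read and our claim;
                // release the stale flag and keep looking.
                zk.delete(flag, -1);
                continue;
            }
            currentItem = item;
            return new String(payload, UTF_8);
        }
        return null;
    }

    /** Once processing succeeds, remove the in-progress flag. */
    public void done() throws KeeperException, InterruptedException {
        zk.delete("/processing/pr-" + currentItem, -1);
    }
}

Storing the payload inside the flag node is what lets the watchdog requeue the element even though the queue znode is already gone.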

I'm not sure there is a more efficient solution, because now you need to implement a failure-detection mechanism through ZooKeeper (or externally, if you prefer) and an extra watchdog process.
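For what it's worth, that watchdog could be little more than a periodic sweep comparing the flags under /processing against the live ephemeral nodes under /consumers. The sketch below uses the same made-up layout as above, and it additionally assumes consumer ids are unique per session, so that once a registration node is gone, that id never comes back:

import static java.nio.charset.StandardCharsets.UTF_8;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

public class Watchdog {
    private final ZooKeeper zk;

    public Watchdog(ZooKeeper zk) {
        this.zk = zk;
    }

    /** Run this periodically, e.g. from a timer thread. */
    public void sweep() throws KeeperException, InterruptedException {
        for (String flag : zk.getChildren("/processing", false)) {
            String path = "/processing/" + flag;
            byte[] raw;
            try {
                raw = zk.getData(path, false, null);
            } catch (KeeperException.NoNodeException e) {
                continue; // the consumer finished between our two reads
            }
            // Flag data is "<consumerId>\n<payload>", as written by take().
            String[] parts = new String(raw, UTF_8).split("\n", 2);
            String owner = parts[0];
            String payload = parts[1];
            if (zk.exists("/consumers/" + owner, false) != null) {
                continue; // owner's ephemeral node is still there: it's alive
            }
            // The duplicate pitfall mentioned above: the owner may have died
            // after creating the flag but before deleting the queue node, in
            // which case the element is still queued and must not be re-added.
            String item = flag.substring("pr-".length());
            if (zk.exists("/queue/" + item, false) == null) {
                zk.create("/queue/item-", payload.getBytes(UTF_8),
                        Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT_SEQUENTIAL);
            }
            try {
                zk.delete(path, -1);
            } catch (KeeperException.NoNodeException ignored) {
                // another watchdog instance cleaned it up concurrently
            }
        }
    }
}

One remaining hole: the requeued element gets a fresh sequential name, so if the watchdog itself crashes between the create and the delete, the next sweep would requeue the same payload again. A production version would want to make that step idempotent.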

-Flavio

On Jan 8, 2009, at 4:15 PM, Stuart White wrote:

I'm interested in using ZooKeeper to provide a distributed
producer/consumer queue for my distributed application.

Of course, I've been studying the recipes provided for queues, barriers, etc.

My question is: how can I prevent packets of work from being lost if a
process crashes?

For example, following the distributed queue recipe, when a consumer
takes an item from the queue, it removes the first "item" znode under
the "queue" znode.  But, if the consumer immediately crashes after
removing the item from the queue, that item is lost.

Is there a recipe or recommended approach to ensure that no queue
items are lost in the event of process failure?

Thanks!
