RE: Distributed queue: how to ensure no lost items?
That's a good point. You could put a child znode under queue-X that contains the processing history, i.e. who tried to process the item and what time they started.

ben

From: Hiram Chirino [chir...@gmail.com]
Sent: Monday, January 12, 2009 8:48 AM
To: zookeeper-user@hadoop.apache.org
Subject: Re: Distributed queue: how to ensure no lost items?
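Ben's processing-history idea could be sketched like this. Note this is a minimal in-memory simulation of the znode layout, not real ZooKeeper calls; the paths, the history child naming, and the helper names are all illustrative assumptions, not part of any official recipe.

```python
import time

# In-memory stand-in for a znode tree; a real implementation would use a
# ZooKeeper client. All paths and names here are illustrative.
znodes = {"/queue/queue-0000000001": b"work item"}

def record_attempt(item_path, consumer_id):
    """Append a processing-history child under the queue item: who tried
    to process it and what time they started."""
    seq = sum(1 for p in znodes if p.startswith(item_path + "/history-"))
    child = "%s/history-%010d" % (item_path, seq)
    znodes[child] = ("%s started at %d" % (consumer_id, int(time.time()))).encode()
    return child

def attempts(item_path):
    """A new consumer reads the history to see whether the item may
    already have been processed by a crashed worker."""
    return sorted(p for p in znodes if p.startswith(item_path + "/history-"))

record_attempt("/queue/queue-0000000001", "consumer-A")
record_attempt("/queue/queue-0000000001", "consumer-B")
print(len(attempts("/queue/queue-0000000001")))  # 2 recorded attempts
```

A consumer that finds a non-empty history knows the item may have been worked on before and can run its double-check first.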
Re: Distributed queue: how to ensure no lost items?
At least once is generally the case in queuing systems unless you can do a distributed transaction with your consumer. What comes in handy in an at-least-once case is letting the consumer know that a message may have 'potentially' already been processed. That way he can double-check first before he goes off and processes the message again. But adding that info in ZK might be more expensive than doing the double check every time in the consumer anyway.

On Thu, Jan 8, 2009 at 11:42 AM, Benjamin Reed wrote:
> We should expand that section. The current queue recipe guarantees that
> things are consumed at most once. [...]

--
Regards,
Hiram

Blog: http://hiramchirino.com

Open Source SOA
http://open.iona.com
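Hiram's consumer-side double check amounts to making the consumer idempotent. A minimal sketch, assuming the application keeps its own record of processed item IDs (here just a set standing in for whatever durable store the application actually uses; all names are illustrative):

```python
# Record of items already processed; in practice this would be the
# application's own durable store, not an in-memory set.
processed_ids = set()

def consume(item_id, payload, side_effects):
    """Process an item idempotently: under at-least-once delivery the
    same item may arrive twice, so skip it if we have seen it before."""
    if item_id in processed_ids:
        return False               # duplicate delivery, nothing to do
    side_effects.append(payload)   # the actual work
    processed_ids.add(item_id)
    return True

log = []
consume("queue-0000000001", "hello", log)
consume("queue-0000000001", "hello", log)  # redelivered after a crash
print(log)  # the work ran exactly once: ['hello']
```

This is the trade-off Hiram mentions: the check runs on every delivery, but it avoids storing extra state in ZooKeeper.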
RE: Distributed queue: how to ensure no lost items?
We should expand that section. The current queue recipe guarantees that things are consumed at most once. To guarantee at least once, the consumer creates an ephemeral node queue-X-inprocess to indicate that the node is being processed. Once the queue element has been processed, the consumer deletes queue-X and queue-X-inprocess (in that order).

Using an ephemeral node means that if a consumer crashes, the *-inprocess node will be deleted, allowing the queue elements it was working on to be consumed by someone else. Putting the *-inprocess nodes at the same level as the queue-X nodes allows the consumer to get the list of queue elements and the inprocess flags with the same getChildren call. The *-inprocess flag ensures that only one consumer is processing a given item. By deleting queue-X before queue-X-inprocess, we make sure that no other consumer will see queue-X as available for consumption after it is processed and before it is deleted.

This is at least once, because the consumer has a race condition: the consumer may process the item and then crash before it can delete the corresponding queue-X node.

ben

-----Original Message-----
From: Stuart White [mailto:stuart.whi...@gmail.com]
Sent: Thursday, January 08, 2009 7:15 AM
To: zookeeper-user@hadoop.apache.org
Subject: Distributed queue: how to ensure no lost items?

I'm interested in using ZooKeeper to provide a distributed producer/consumer queue for my distributed application.

Of course I've been studying the recipes provided for queues, barriers, etc...

My question is: how can I prevent packets of work from being lost if a process crashes?

For example, following the distributed queue recipe, when a consumer takes an item from the queue, it removes the first "item" znode under the "queue" znode. But, if the consumer immediately crashes after removing the item from the queue, that item is lost.

Is there a recipe or recommended approach to ensure that no queue items are lost in the event of process failure?

Thanks!
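The inprocess-flag protocol Ben describes can be sketched as a toy simulation: plain dicts stand in for the znode children, and removing a session's ephemerals models ZooKeeper expiring a crashed consumer's session. A real client would create the flag as an ephemeral znode; everything here is an illustrative sketch of the recipe, not an official implementation.

```python
queue = {"queue-1": "a", "queue-2": "b"}  # queue-X item znodes
ephemerals = {}                           # inprocess flag -> owning session

def claim(session):
    """One getChildren-equivalent scan yields both the items and the
    inprocess flags, since they live at the same level."""
    for name in sorted(queue):
        flag = name + "-inprocess"
        if flag not in ephemerals:
            ephemerals[flag] = session  # ephemeral create; fails if it exists
            return name
    return None

def finish(session, name):
    """Delete queue-X first, then the flag, so no other consumer can see
    the item as available after it has been processed."""
    del queue[name]
    del ephemerals[name + "-inprocess"]

def session_expired(session):
    """ZooKeeper deletes a crashed session's ephemeral nodes for us."""
    for flag in [f for f, s in ephemerals.items() if s == session]:
        del ephemerals[flag]

claim("sess-A")             # consumer A claims queue-1
session_expired("sess-A")   # A crashes before finishing
print(claim("sess-B"))      # queue-1 is available again: at least once
```

The race Ben points out sits between the work itself and finish(): a consumer that crashes after processing but before deleting queue-X leaves the item claimable again, hence at-least-once.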
Re: Distributed queue: how to ensure no lost items?
You can't simply leave an element in the queue until a consumer finishes processing it; otherwise multiple consumers may end up processing it. What about the following:

- Use a failure detector to detect which consumers are up;
- Before removing an element from the queue, a consumer creates a znode (it can't be ephemeral) flagging that the consumer is processing that element, which I'll call "/processing/pr";
- Once the consumer is done with the element, it removes "/processing/pr";
- There is a watchdog client that periodically verifies whether the list of elements being processed contains any element being processed by a crashed consumer. In that case, this client places the element back in the queue. This process needs to be careful not to generate duplicates, as it is possible that the consumer crashes after it has created "/processing/pr" but before it has time to delete the element from the queue.

I'm not sure if there is a more efficient solution, because now you need to implement a failure detection mechanism through ZooKeeper (or externally if you prefer) and an extra watchdog process.

-Flavio

On Jan 8, 2009, at 4:15 PM, Stuart White wrote:
> I'm interested in using ZooKeeper to provide a distributed
> producer/consumer queue for my distributed application. [...]
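Flavio's watchdog can be sketched in the same simulated style: a scan over the persistent "/processing" entries that re-queues anything held by a consumer the failure detector reports as down. The dicts, the element name, and the live-set failure detector are all illustrative stand-ins, not real ZooKeeper or detector APIs.

```python
queue = []                  # pending elements
processing = {}             # element -> consumer working on it ("/processing/pr")
live = {"consumer-1"}       # consumers the failure detector believes are up

def take(consumer):
    """Create the /processing entry for the element first, then remove
    the element from the queue, matching the order in the recipe."""
    element = queue[0]
    processing[element] = consumer
    queue.pop(0)
    return element

def watchdog():
    """Re-queue elements held by crashed consumers. This can duplicate
    work (the consumer may have crashed between creating /processing/pr
    and finishing), which is why consumers must still be idempotent."""
    for element, consumer in list(processing.items()):
        if consumer not in live:
            del processing[element]
            queue.append(element)

queue.append("pr-42")
take("consumer-1")
live.discard("consumer-1")  # detector declares the consumer dead
watchdog()
print(queue)  # ['pr-42'] is back in the queue
```

Running the watchdog periodically bounds how long a crashed consumer's element can stay stranded, at the cost of the extra failure-detection machinery Flavio notes.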