We have implemented a distributed queue (similar to AWS SQS) and a job queue in Cassandra.
Vram

On Sat, Jun 26, 2010 at 1:56 PM, Andrew Miklas <and...@pagerduty.com> wrote:
> Hi all,
>
> Has anyone written a work-queue implementation using Cassandra?
>
> There's a section in the UseCase wiki page for "A distributed Priority Job
> Queue" which looks perfect, but unfortunately it hasn't been filled in yet.
> http://wiki.apache.org/cassandra/UseCases#A_distributed_Priority_Job_Queue
>
> I've been thinking about how best to do this, but every solution I've
> thought of seems to have some serious drawback. The "range ghost" problem
> in particular creates some issues. I'm assuming each job has a row within
> some column family, where the row's key is the time at which the job should
> be run. To find the next job, you'd do a range query with a start a few
> hours in the past, and an end at the current time. Once a job is completed,
> you delete the row.
>
> The problem here is that you have to scan through deleted-but-not-yet-GCed
> rows each time you run the query. Is there a better way?
>
> Preventing more than one worker from starting the same job seems like it
> would be a problem too. You'd either need an external locking manager, or
> have to use some other protocol where workers write their ID into the row
> and then immediately read it back to confirm that they are the owner of the
> job.
>
> Any ideas here? Has anyone come up with a nice implementation? Is
> Cassandra not well suited for queue-like tasks?
>
> Thanks,
>
> Andrew
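
For anyone following the thread, here is a rough sketch of the scheme described in the quoted message: rows keyed by run time, a range scan that still has to step over deleted rows, and a write-then-read-back claim. It is an in-memory stand-in only, not Cassandra client code and not the implementation mentioned in the reply. The names ToyJobStore, schedule, due_jobs, try_claim and complete are all made up for illustration, run times are assumed to be unique, and the "earliest claim wins" tie-break is one assumed way to make the read-back decisive. On a real cluster the read-back is only as trustworthy as the consistency level used for the write and the read.

```python
import bisect
import itertools
from dataclasses import dataclass, field


@dataclass
class Row:
    payload: str
    claims: dict = field(default_factory=dict)  # worker_id -> claim sequence number
    deleted: bool = False  # stands in for a tombstone that has not been GCed yet


class ToyJobStore:
    """Hypothetical in-memory model of a time-keyed job column family."""

    def __init__(self):
        self._keys = []                  # sorted run-at timestamps (row keys)
        self._rows = {}                  # run-at timestamp -> Row
        self._clock = itertools.count()  # stand-in for write timestamps
        self.rows_scanned = 0            # how many rows range scans have touched

    def schedule(self, run_at, payload):
        # Assumes run_at is unique; a real schema would need a tie-breaker.
        bisect.insort(self._keys, run_at)
        self._rows[run_at] = Row(payload)

    def due_jobs(self, start, end):
        """Range scan over [start, end]; deleted rows are still visited."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_right(self._keys, end)
        due = []
        for key in self._keys[lo:hi]:
            self.rows_scanned += 1  # cost is paid even for tombstoned rows
            if not self._rows[key].deleted:
                due.append(key)
        return due

    def try_claim(self, run_at, worker_id):
        """Write our ID into the row, then read all claims back."""
        row = self._rows[run_at]
        row.claims.setdefault(worker_id, next(self._clock))
        # Assumed rule: the earliest claim owns the job, ties broken by ID.
        winner = min(row.claims, key=lambda w: (row.claims[w], w))
        return winner == worker_id

    def complete(self, run_at):
        # Deleting only marks the row; the key stays in the scan range.
        self._rows[run_at].deleted = True
```

After complete() is called, later due_jobs() calls keep counting the finished row in rows_scanned, which is the "range ghost" cost raised in the quoted message. One common way to bound that cost is to narrow the scan window as jobs finish, but that is a design choice outside this sketch.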