Hi Duy Hai,
On 04 Apr 2014, at 20:48, DuyHai Doan <doanduy...@gmail.com> wrote:

> @Jan
>
> Your use-case is different from what I thought. So basically you have only
> one data source (the feed) and many consumers (the workers).
>
> Only one worker is allowed to consume the feed at a time.
>
> This can be modeled very easily using a distributed lock with C.A.S.

Thank you - this confirms a design I am using in a similar scenario. However,
I have problems there when concurrency gets high - maybe my C* version is a
too-early 2.x release. But since you confirm the design, I'll dig deeper into
the issue. Also, I'll try to spread out the workers using modulo calculations
somehow.

Having said that - my impression was that what you suggest is a sort of
'brute-force coordination' and that there is likely a more clever design -
like your queuing example. That is why I went to the list for answers.

Thank you either way!

Jan

> CREATE TABLE feed_lock (
>     lock text PRIMARY KEY,
>     worker_token text
> );
>
> 1) First, initialize the table with: INSERT INTO feed_lock (lock) VALUES ('lock').
> The primary key value is hard-coded and is always the same: 'lock'; it does not
> really matter. At this step the column worker_token is null, since we did not
> insert any value into it.
>
> 2) If a worker w1 wants to get the lock: UPDATE feed_lock SET worker_token='token1'
> WHERE lock='lock' IF worker_token=null.
> w1 acquires the lock only if it is not held by any other worker (IF worker_token=null).
>
> 3) Concurrently, if another worker w2 tries to acquire the lock, it will fail,
> since worker_token is no longer null.
>
> 4) w1 releases the lock with: UPDATE feed_lock SET worker_token=null
> WHERE lock='lock' IF worker_token='token1'.
> The important detail here is that only w1 knows the value of token1, so only
> w1 is able to release the lock.
>
> This is a very simple design.
>
> Now it can happen that the lock is never released, because the worker holding
> it crashed and the secret token is lost. To circumvent that, you can hold the
> lock using an update with TTL. The lock is then released once the TTL expires,
> no matter what. If processing the feed takes longer than the TTL, the current
> worker can always extend the lease with another update with TTL to reset the
> TTL value.
>
> Hope that helps
>
> Regards
>
> Duy Hai DOAN
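For the archives, here is the complete lifecycle as I read your suggestion, in
CQL. It is only a sketch: the 60-second TTL and the 'token1' value are
placeholders, and each of the conditional statements reports in the [applied]
column of its result whether it actually took effect.

-- One-off setup, as in step 1: a single hard-coded row whose
-- worker_token starts out null.
CREATE TABLE feed_lock (
    lock text PRIMARY KEY,
    worker_token text
);
INSERT INTO feed_lock (lock) VALUES ('lock');

-- Acquire: succeeds only if nobody holds the lock. The TTL frees the lock
-- automatically if the holder crashes before releasing it.
UPDATE feed_lock USING TTL 60
SET worker_token = 'token1'
WHERE lock = 'lock'
IF worker_token = null;

-- Renew the lease while the work is still running (resets the TTL).
UPDATE feed_lock USING TTL 60
SET worker_token = 'token1'
WHERE lock = 'lock'
IF worker_token = 'token1';

-- Release: only the holder knows 'token1', so only it can clear the column.
UPDATE feed_lock SET worker_token = null
WHERE lock = 'lock'
IF worker_token = 'token1';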
> On Fri, Apr 4, 2014 at 6:48 PM, Jan Algermissen <jan.algermis...@nordsc.com> wrote:
> Hi DuyHai,
>
> On 04 Apr 2014, at 13:58, DuyHai Doan <doanduy...@gmail.com> wrote:
>
> > @Jan
> >
> > This subject of distributed workers & queues has been discussed on the
> > mailing list many times.
>
> Sorry + thanks.
>
> Unfortunately, I do not want to use C* as a queue, but to coordinate workers
> that page through an (XML) data feed of events every N seconds.
>
> Let me try again:
>
> - I have N instances of the same system, replicated to ensure work is being
>   done despite failure of instances
> - the instances are masterless and know nothing about each other. Giving them
>   an integer ID isn't really possible, and the number of instances isn't
>   really known
> - there is a schedule controlling how often the feed is read, say once every
>   minute
> - the schedule might change by way of an administrator of the 'feed polling'
> - the worker instances check for work every, e.g., 10 secs
> - once a worker starts, it checks whether there is work to do (the schedule
>   aspect) and, if so, starts polling the feed until the last event has been
>   reached
> - during that time, no other worker must poll the feed
> - once the working worker is done, it saves the timestamp or ID of the last
>   seen event and sets the next schedule
> - the processing of the events might take much longer than the schedule
>   intervals
>
> I hope this explains more what I am up to. Maybe I can adapt your suggestion,
> I just do not see how.
>
> Jan
>
> > Basically one implementation can be:
> >
> > 1) p data providers, c data consumers
> > 2) create partitions (physical rows) of an arbitrary number of columns
> >    (let's say 10 000, not too big though). Partition key = bucket number (#b)
> > 3) assign an integer id (pId) to each provider, same for each consumer (cId)
> > 4) each provider can only write messages into bucket numbers such that
> >    #b mod p = pId mod p
> > 5) once the provider reaches 10 000 messages per bucket, it switches to the
> >    next one, with new #b = old #b + p
> > 6) the consumers follow the same rule for bucket switching
> >
> > Example:
> >
> > p = 5, c = 3
> >
> > - p1 writes messages into buckets {1,6,11,16...}  // 1, 1+5, 1+5+5, ...
> > - p2 writes messages into buckets {2,7,12,17...}  // 2, 2+5, 2+5+5, ...
> > - p3 writes messages into buckets {3,8,13,18...}
> > - p4 writes messages into buckets {4,9,14,19...}
> > - p5 writes messages into buckets {5,10,15,20...}
> >
> > - c1 consumes messages from buckets {1,4,7,10...}  // 1, 1+3, 1+3+3, ...
> > - c2 consumes messages from buckets {2,5,8,11...}
> > - c3 consumes messages from buckets {3,6,9,12...}
> >
> > Of course, consumers cannot re-put messages into the bucket, otherwise the
> > counting (10 000 elements/bucket) is screwed up.
> >
> > Alternatively, you can insert messages with a TTL to automatically expire
> > "consumed" buckets after a while, saving you the hassle of cleaning up old
> > buckets to reclaim disk space.
> >
> > There are other implementations based on a distributed lock using C* C.A.S
> > as well, but the above algorithm does not require any lock.
> >
> > Regards
> >
> > Duy Hai DOAN
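Coming back to the queue idea quoted above: this is roughly how I picture the
bucket layout in CQL. Again just a sketch - the table and column names are
mine, and the numbers follow the p = 5, c = 3 example.

-- One bucket is one partition; rows inside it are ordered by a timeuuid.
CREATE TABLE feed_queue (
    bucket int,
    seq timeuuid,
    message text,
    PRIMARY KEY (bucket, seq)
);

-- Provider p1 (pId = 1, p = 5) writes to buckets 1, 6, 11, ... and moves on
-- to bucket #b + p after roughly 10 000 messages. The TTL (here one day) is
-- the variant where consumed buckets simply expire instead of being cleaned up.
INSERT INTO feed_queue (bucket, seq, message)
VALUES (1, now(), 'payload')
USING TTL 86400;

-- Consumer c1 (cId = 1, c = 3) drains buckets 1, 4, 7, ... in order.
SELECT seq, message FROM feed_queue WHERE bucket = 1;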
> > On Fri, Apr 4, 2014 at 12:47 PM, prem yadav <ipremya...@gmail.com> wrote:
> > Oh ok. I thought you did not have a Cassandra cluster already. Sorry about
> > that.
> >
> > On Fri, Apr 4, 2014 at 11:42 AM, Jan Algermissen <jan.algermis...@nordsc.com> wrote:
> >
> > On 04 Apr 2014, at 11:18, prem yadav <ipremya...@gmail.com> wrote:
> >
> >> Though Cassandra can work, to me it looks like you could use a persistent
> >> queue, for example RabbitMQ, to implement this. All your workers can
> >> subscribe to a queue.
> >> In fact, why not just MySQL?
> >
> > Hey, I have got a C* cluster that can (potentially) do CAS.
> >
> > Why would I set up a MySQL cluster to solve that problem?
> >
> > And yeah, I could use a queue or Redis or whatnot, but I want to avoid yet
> > another moving part :-)
> >
> > Jan
> >
> >> On Thu, Apr 3, 2014 at 11:44 PM, Jan Algermissen <jan.algermis...@nordsc.com> wrote:
> >> Hi,
> >>
> >> maybe someone knows a nice solution to the following problem:
> >>
> >> I have N worker processes that are intentionally masterless and do not
> >> know about each other - they are stateless and independent instances of a
> >> given service system.
> >>
> >> These workers need to poll an event feed, say about every 10 seconds, and
> >> persist a state after processing the polled events, so the next worker
> >> knows where to continue processing events.
> >>
> >> I would like to use C*'s CAS feature to coordinate the workers and protect
> >> the shared state (a row or cell in a C* keyspace, too).
> >>
> >> Has anybody done something similar and can suggest a 'clever' data model
> >> design and interaction?
> >>
> >> Jan
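PS, mostly as a note to self: the shared state I mean by 'a row or cell in a
C* keyspace' would be something as small as this - table and column names are
purely illustrative (the last processed event plus the next scheduled poll).

-- Written by the worker that finished a run, while it still holds the lock:
-- last_event_id is the ID (or timestamp) of the last processed event,
-- next_poll is when the feed is due to be read again.
CREATE TABLE feed_state (
    feed text PRIMARY KEY,
    last_event_id text,
    next_poll timestamp
);

UPDATE feed_state
SET last_event_id = 'evt-0815', next_poll = '2014-04-05 10:00:00+0000'
WHERE feed = 'events';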