Allen Gilliland wrote:
>
> Elias Torres wrote:
>> Not sure if we need a whole extra table. wouldn't task.name.propName =
>> value be enough?
>
> well, that's possible i suppose, but then we are stuffing a lot of
> information into that single property and forcing ourselves to parse it
> out. capturing the various states then becomes a bit more confusing, i.e.
>
>   task.planet.refreshEntries = 987523897, 78345897238, 1800 (locked)
>   task.planet.refreshEntries = 872837348 (unlocked)
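[Editor's note: to make the parsing burden Allen describes concrete, here is a minimal sketch. Roller itself is Java; Python is used here only for brevity, and the meaning of the numeric fields (last run, lock time, lease seconds) is my assumption based on the sample values above, not something the thread specifies.]

```python
def parse_task_property(value: str) -> dict:
    """Parse an overloaded task property value such as
    '987523897, 78345897238, 1800 (locked)' or '872837348 (unlocked)'.
    Field meanings are assumed: last run, lock time, lease seconds."""
    value = value.strip()
    locked = value.endswith("(locked)")
    # strip the trailing '(locked)' / '(unlocked)' state marker
    numbers = value.rsplit("(", 1)[0]
    fields = [int(f) for f in numbers.split(",") if f.strip()]
    if locked:
        last_run, lock_time, lease_secs = fields
        return {"locked": True, "lastRun": last_run,
                "lockTime": lock_time, "leaseSeconds": lease_secs}
    return {"locked": False, "lastRun": fields[0]}
```

Every reader and writer of the property has to agree on this ad-hoc grammar, which is exactly the fragility being argued about.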
That's not what I meant, it was more like:

  task.planetRefreshEntries.status = locked
  task.planetRefreshEntries[worker1].lastRun = 987523897
  ...

>
> and that assumes that we wouldn't want to extend that data at all,
> because if we did want to track some kind of other data then we would
> be screwed. i was thinking that it may be nice to capture some basic
> error status from the tasks in the event that they fail for some
> reason, in which case you may not want to keep running a task if it's
> failed 3 times in a row. so a task could have a 'Status' attribute
> which indicates the state of the task, like running, paused, error,
> waiting, etc.
>
> ... but a table is fine too.

-Elias

>>
>> Another problem I thought of would be that always the same node gets
>> to perform the task, which also brings up the synchronized clock
>> issue again.
>
> depends on the task, but that would probably be true. i'm not sure
> why that's an issue though. if we do set up a separate table for
> tracking the tasks and their state then it becomes easier to work on
> some sql code to use the db time rather than the cluster node's time.
>
> doing that given the old proposal kind of defeats the purpose because
> in the old proposal the point was to implement it in a way that
> didn't require work on the backend because we were just using a
> simple runtime property, but if we are going to do work on the
> backend then we should probably just build out a more complete
> solution.
>
> -- Allen
>
>
>>
>> -Elias
>>
>> Allen Gilliland wrote:
>>> I had thought about this a little more and one thing that I was
>>> planning to neglect because of time was the fact that simply using
>>> a lock doesn't prevent a task from being run too frequently. i.e.
>>> if a task is meant to run once an hour and you have 2 members in
>>> your cluster they may have been started 30mins apart from each
>>> other so that the end result is that the task runs every 30 mins.
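[Editor's note: the table-based tracking Allen sketches above, a 'Status' attribute plus a stop-after-3-consecutive-failures rule, could look roughly like this. A sketch only: Roller is Java, SQLite stands in for the real database, and all table, column, and task names here are invented.]

```python
import sqlite3

# illustrative schema only; column names and the sample task are invented
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tasks (
        name       TEXT PRIMARY KEY,            -- unique task name
        last_run   INTEGER,                     -- when the task last ran
        locked     INTEGER NOT NULL DEFAULT 0,  -- currently running?
        lease_secs INTEGER,                     -- how long a run is leased for
        status     TEXT NOT NULL DEFAULT 'waiting',  -- running/paused/error/waiting
        failures   INTEGER NOT NULL DEFAULT 0   -- consecutive failure count
    )""")
conn.execute("INSERT INTO tasks (name, lease_secs) VALUES ('planetRefresh', 1800)")
conn.commit()

def should_run(name, max_failures=3):
    """Skip tasks that are paused or have failed too many times in a row."""
    status, failures = conn.execute(
        "SELECT status, failures FROM tasks WHERE name = ?", (name,)).fetchone()
    return status != 'paused' and failures < max_failures

def record_failure(name):
    """Bump the failure count and flag the task as errored."""
    conn.execute("UPDATE tasks SET failures = failures + 1, status = 'error' "
                 "WHERE name = ?", (name,))
    conn.commit()
```

After three consecutive `record_failure` calls, `should_run` returns False and the scheduler would leave the task alone until an admin intervenes.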
>>>
>>> Perhaps a better solution to this would be to actually create a
>>> simple "tasks" table which can maintain one row per task and keep
>>> track of ...
>>>
>>> 1. when the task was last run
>>> 2. if the task is currently running (locked)
>>> 3. how long the currently running task is leased for
>>> 4. ?? possible others (how it was triggered, etc)
>>>
>>> This would actually work much better at synchronizing the running
>>> of the tasks, so that even if you had 10 cluster members running,
>>> the task itself is still only run at the correct interval.
>>>
>>> Should I update the proposal to work this way?
>>>
>>> -- Allen
>>>
>>>
>>> Anil Gangolli wrote:
>>>> The backend db-specific SQL required may be worthwhile.
>>>>
>>>> Synchronization can't be assumed to be perfect, so if you do use
>>>> the webapp host's time, you will want to delay lease grabbing
>>>> until after some grace period that is larger than a max (assumed)
>>>> clock difference between the hosts (or conversely require an
>>>> extending renewal by the lease holder if the remaining lease time
>>>> is lower than that).
>>>>
>>>> --a.
>>>>
>>>>
>>>> ----- Original Message -----
>>>> From: "Allen Gilliland" <[EMAIL PROTECTED]>
>>>> To: <[email protected]>
>>>> Sent: Thursday, September 21, 2006 10:40 AM
>>>> Subject: Re: Proposal: Clustered Tasks via Locking
>>>>
>>>>
>>>>> I was actually thinking that a lazier approach of "that's the sys
>>>>> admin's job" was preferable. Setting up a way to get the current
>>>>> time from the db requires backend work and i would prefer not to
>>>>> do that if we can. My expectation is that anyone who has a large
>>>>> enough installation to need 2+ servers working in a cluster
>>>>> should also be able to make sure that each cluster member has
>>>>> synchronized time. Most other clustering software has that same
>>>>> expectation.
>>>>>
>>>>> -- Allen
>>>>>
>>>>>
>>>>> Elias Torres wrote:
>>>>>> One more thing..
>>>>>> I think you'd need to come up with a SQL query that would test
>>>>>> for the lock using the database server's time and not the app
>>>>>> server's time. Basically a SELECT and INSERT/UPDATE together
>>>>>> using CURRENT_TIME() or whatever to make sure we don't run into
>>>>>> clock drifts.
>>>>>>
>>>>>> -Elias
>>>>>>
>>>>>> Allen Gilliland wrote:
>>>>>>> Elias Torres wrote:
>>>>>>>> I like the proposal and I think it's very important/useful.
>>>>>>>>
>>>>>>>> I would suggest though to not use a hard-coded expiration
>>>>>>>> mechanism and instead use a leasing mechanism. I propose that
>>>>>>>> a task says it needs the lock for X number of minutes/hours
>>>>>>>> and writes the time it started and the lease amount. It's
>>>>>>>> just a subtle tweak, but it optimizes the scheduling a bit,
>>>>>>>> so a quick task like saving referrers can get a 3-min lease
>>>>>>>> and not block 3 hours of thread time. Additionally, tasks
>>>>>>>> could store their name, so parallel tasks can work w/o
>>>>>>>> blocking each other and only tasks with the same service name
>>>>>>>> wait on each other. Obviously, a task can extend its lease if
>>>>>>>> it needs to run for more time.
>>>>>>>>
>>>>>>>> For example, let's store this as the property:
>>>>>>>>
>>>>>>>>   property: task.indexing  value: 12:00:01,3mins
>>>>>>>
>>>>>>> yep, I can do it that way. I guess I consider this to be the
>>>>>>> same thing because the lease time for a given task is not
>>>>>>> likely to ever change, so if the task knows what the lease
>>>>>>> time is for its lock then there is no reason the lease needs
>>>>>>> to be in the db. Obviously if the lease time may vary for a
>>>>>>> given lock then your approach makes a lot more sense.
>>>>>>>
>>>>>>> Either way will work, but yours is slightly more flexible so
>>>>>>> I'll do it that way. For the actual property I am going to
>>>>>>> simplify the value so that it's just long<time>, long<lease>.
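[Editor's note: Elias's idea of testing and taking the lock in a single statement stamped with the database's own clock can be sketched as below. SQLite (via Python) stands in for the real database purely for illustration; a production RDBMS would use its own CURRENT_TIMESTAMP-style function, and the table and column names are invented.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE tasks (
    name        TEXT PRIMARY KEY,
    locked      INTEGER NOT NULL DEFAULT 0,
    lease_start INTEGER,
    lease_secs  INTEGER NOT NULL)""")
conn.execute("INSERT INTO tasks (name, lease_secs) VALUES ('indexing', 180)")
conn.commit()

def try_acquire(name):
    """Grab the lease in one conditional UPDATE. The WHERE clause matches
    only if the task is unlocked or its lease has expired, and the lease
    start is stamped with the *database's* clock (strftime('%s','now') in
    SQLite), so cluster members with drifting clocks cannot both win."""
    cur = conn.execute("""
        UPDATE tasks
           SET locked = 1,
               lease_start = CAST(strftime('%s','now') AS INTEGER)
         WHERE name = ?
           AND (locked = 0
                OR lease_start + lease_secs
                   < CAST(strftime('%s','now') AS INTEGER))
        """, (name,))
    conn.commit()
    # exactly one row updated means this node won the lock
    return cur.rowcount == 1
```

The first caller updates one row and wins; a second caller inside the lease window matches no rows and backs off, with no app-server clock involved at all.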
>>>>>>>
>>>>>>> So if you see the lease in the db it would be
>>>>>>>
>>>>>>>   property: task.indexing  value: 983472893, 1800
>>>>>>>
>>>>>>> This way it's just easier for the application to use the
>>>>>>> values without actually having to worry about parsing date
>>>>>>> strings.
>>>>>>>
>>>>>>> Thanks for the suggestion.
>>>>>>>
>>>>>>> -- Allen
>>>>>>>
>>>>>>>
>>>>>>>> In other words, let's re-invent JINI.
>>>>>>>>
>>>>>>>> -Elias
>>>>>>>>
>>>>>>>> Allen Gilliland wrote:
>>>>>>>>> This is a really short one, but I did a proposal anyway. I'd
>>>>>>>>> like to add a simple locking mechanism to the various
>>>>>>>>> background tasks that we have so that running them in
>>>>>>>>> clustered environments is safe from synchronization issues
>>>>>>>>> and we can prevent a task from running at the same time on
>>>>>>>>> multiple machines in the cluster.
>>>>>>>>>
>>>>>>>>> http://rollerweblogger.org/wiki/Wiki.jsp?page=Proposal_ClusteredTasksViaLocking
>>>>>>>>>
>>>>>>>>> Since this is such a short proposal I'd like to go ahead and
>>>>>>>>> propose a vote on the proposal as is, since I don't expect
>>>>>>>>> there is a need for lots of discussion. This would go into
>>>>>>>>> Roller 3.1.
>>>>>>>>>
>>>>>>>>> -- Allen
>>>>>>>>>
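[Editor's note: combining Allen's `long<time>, long<lease>` value format with Anil's clock-skew grace period gives an expiry check like the following. A sketch only (Roller is Java); the 60-second default skew allowance is an arbitrary illustration, not a value from the thread.]

```python
def lease_expired(value, now, clock_skew_secs=60):
    """Check whether a lock stored as '<startTime>, <leaseSeconds>'
    (e.g. task.indexing = '983472893, 1800') has expired.  Per Anil's
    caveat, a grace period larger than the assumed max clock difference
    between hosts is added before another node may steal the lease."""
    start, lease = (int(part) for part in value.split(","))
    return now > start + lease + clock_skew_secs

def format_lease(start, lease):
    # plain numeric 'long<time>, long<lease>' value: no date-string parsing
    return "%d, %d" % (start, lease)
```

A node that merely drifts a few seconds behind the lease holder never sees the lease as expired, while a genuinely dead holder loses the lock once the lease plus grace period has elapsed.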
