I have one item with this proposal that could use an opinion. I have completed the code for this and tested that it all works, but I am a little torn about where the acquireLock() and releaseLock() methods should go. There are two options ...

1. Put the code in a new Manager called TaskLockManager, as I mentioned in the proposal. The pro is that it's clean and isolated; the con is that it may be extraneous and non-cohesive.

2. Put the code in the existing ThreadManager. The pro here is that we don't need a new Manager class, and it makes a lot of sense from a cohesion point of view. The con is that the ThreadManager then becomes a persistent class, which is a little more confusing since it mixes persistent and non-persistent behavior.

I think right now I prefer #2, but either way is fine.
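For concreteness, here is a rough sketch of what the lock methods might look like under option #2, written in Python as pseudocode for the eventual Java manager. All names are hypothetical, and the in-memory lock map is just a stand-in for wherever the lock state actually lives:

```python
class ThreadManager:
    """Hypothetical sketch: option #2, with the lock methods living
    alongside the existing thread-management behavior."""

    def __init__(self):
        # task name -> lease expiry time; stand-in for persistent storage
        self._locks = {}

    def acquire_lock(self, task_name, lease_seconds, now):
        # grant the lock only if no unexpired lease exists for this task
        expires = self._locks.get(task_name)
        if expires is not None and expires > now:
            return False
        self._locks[task_name] = now + lease_seconds
        return True

    def release_lock(self, task_name):
        # releasing a lock we don't hold is treated as a no-op here
        self._locks.pop(task_name, None)
```

The same two methods would work unchanged in a standalone TaskLockManager; the sketch is only about the API surface, not the placement.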

-- Allen


Allen Gilliland wrote:
Okay, one more time, if anyone is interested ...

http://rollerweblogger.org/wiki/Wiki.jsp?page=Proposal_ClusteredTasksViaLocking

That's what I'm planning to do.

-- Allen


Allen Gilliland wrote:
I didn't hear opinions on this from anyone other than Elias, so I am going to go ahead and plan to make a table for tracking tasks/threads, and set up the new clustered tasks code to consult that table to do locking, etc.

-- Allen


Elias Torres wrote:

Allen Gilliland wrote:

Elias Torres wrote:
Not sure if we need a whole extra table. Wouldn't task.name.propName = value be enough?

Well, that's possible I suppose, but then we are stuffing a lot of information into that single property and forcing ourselves to parse it out. Capturing the various states then becomes a bit more confusing, i.e.

task.planet.refreshEntries = 987523897, 78345897238, 1800 (locked)
task.planet.refreshEntries = 872837348 (unlocked)

That's not what I meant; it was more like:

task.planetRefreshEntries.status = locked
task.planetRefreshEntries[worker1].lastRun = 987523897
...

And that assumes that we wouldn't want to extend that data at all, because if we did want to track some other kind of data then we would be screwed. I was thinking that it may be nice to capture some basic error status from the tasks in the event that they fail for some reason, in which case you may not want to keep running a task if it's failed 3 times in a row. So a task could have a 'Status' attribute that indicates the state of the task: running, paused, error, waiting, etc.
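A minimal sketch of that idea in Python pseudocode; the status values and the 3-failures-in-a-row threshold are the ones mentioned above, and everything else (class and method names) is hypothetical:

```python
# hypothetical status values, per the discussion above
RUNNING, PAUSED, ERROR, WAITING = "running", "paused", "error", "waiting"
MAX_CONSECUTIVE_FAILURES = 3

class TaskStatus:
    """Track consecutive failures and stop scheduling a task after
    3 errors in a row."""

    def __init__(self):
        self.status = WAITING
        self.failures = 0

    def record_success(self):
        # a successful run resets the failure counter
        self.failures = 0
        self.status = WAITING

    def record_failure(self):
        self.failures += 1
        # after 3 failures in a row, flip to the error state
        if self.failures >= MAX_CONSECUTIVE_FAILURES:
            self.status = ERROR
        else:
            self.status = WAITING

    def runnable(self):
        return self.status != ERROR
```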



... but a table is fine too.

-Elias

Another problem I see is that the same node always gets to perform the task, which also brings up the synchronized clock issue again.

Depends on the task, but that would probably be true. I'm not sure why that's an issue though. If we do set up a separate table for tracking the tasks and their state, then it becomes easier to write some SQL that uses the db time rather than the cluster nodes' time.
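As a tiny illustration of treating the database as the single clock for all cluster members (sqlite here is only a stand-in for whatever database Roller is actually configured with, and the function name is made up):

```python
import sqlite3

def get_db_time(conn):
    # read "now" from the database itself, so every cluster member
    # shares one clock regardless of local clock drift
    return int(conn.execute("SELECT strftime('%s', 'now')").fetchone()[0])

conn = sqlite3.connect(":memory:")
db_now = get_db_time(conn)  # unix epoch seconds, per the db's clock
```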

Doing that under the old proposal kind of defeats the purpose, because the point of the old proposal was to implement this in a way that didn't require backend work, since we were just using a simple runtime property. But if we are going to do work on the backend, then we should probably just build out a more complete solution.

-- Allen


-Elias

Allen Gilliland wrote:
I had thought about this a little more, and one thing I was planning to neglect because of time is the fact that simply using a lock doesn't prevent a task from being run too frequently. I.e., if a task is meant to run once an hour and you have 2 members in your cluster, they may have been started 30 minutes apart from each other, so the end result is that the task runs every 30 minutes.

Perhaps a better solution would be to create a simple "tasks" table which maintains one row per task and keeps track of ...

1. when the task was last run
2. if the task is currently running (locked)
3. how long the current running task is leased for
4. ?? possible others (how it was triggered, etc)

This would work much better at synchronizing the running of the tasks, so that even if you had 10 cluster members, the task itself is still only run at the correct interval.
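A hypothetical sketch of such a table and the "is it due?" check, using Python with sqlite as a stand-in for the real database; the table name, column names, and helper function are all invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE roller_tasks (
        name          TEXT PRIMARY KEY,   -- task name
        last_run      INTEGER,            -- 1. when the task was last run (epoch secs)
        is_locked     INTEGER DEFAULT 0,  -- 2. if the task is currently running
        lease_expires INTEGER,            -- 3. how long the current run is leased for
        triggered_by  TEXT                -- 4. how it was triggered, etc.
    )
""")
conn.execute("INSERT INTO roller_tasks (name) VALUES ('refreshEntries')")

def due(conn, name, interval_secs, now):
    # a task is due only if it isn't locked AND at least one full
    # interval has elapsed since the last run, cluster-wide
    last_run, locked = conn.execute(
        "SELECT last_run, is_locked FROM roller_tasks WHERE name = ?",
        (name,),
    ).fetchone()
    return not locked and (last_run is None or now - last_run >= interval_secs)
```

Because last_run lives in one shared row, a second cluster member that wakes up 30 minutes after the first sees the recent run and skips it, which is the behavior described above.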

Should I update the proposal to work this way?

-- Allen


Anil Gangolli wrote:
The backend db-specific SQL required may be worthwhile.

Synchronization can't be assumed to be perfect, so if you do use the webapp host's time, you will want to delay lease grabbing until after a grace period that is larger than the maximum assumed clock difference between the hosts (or, conversely, require the lease holder to renew the lease whenever the remaining lease time drops below that threshold).
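That grace-period rule could be sketched like this (hypothetical Python; the 30-second skew bound and function names are arbitrary examples, not anything decided here):

```python
MAX_CLOCK_SKEW = 30  # assumed max clock difference between hosts, in seconds

def safe_to_grab(lease_expires, now):
    # only grab an expired lease after a grace period larger than the
    # maximum assumed clock skew between hosts
    return now >= lease_expires + MAX_CLOCK_SKEW

def should_renew(lease_expires, now):
    # conversely, the current holder renews once its remaining lease
    # time is lower than the skew threshold
    return (lease_expires - now) < MAX_CLOCK_SKEW
```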

--a.


----- Original Message ----- From: "Allen Gilliland"
<[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Thursday, September 21, 2006 10:40 AM
Subject: Re: Proposal: Clustered Tasks via Locking


I was actually thinking that a lazier approach of "that's the sysadmin's job" was preferable. Setting up a way to get the current time from the db requires backend work, and I would prefer not to do that if we can avoid it. My expectation is that anyone who has a large enough installation to need 2+ servers working in a cluster should also be able to make sure that each cluster member has synchronized time. Most other clustering software has that same expectation.

-- Allen


Elias Torres wrote:
One more thing... I think you'd need to come up with a SQL query that tests for the lock using the database server's time and not the app server's time. Basically a SELECT and INSERT/UPDATE together, using CURRENT_TIME() or whatever, to make sure we don't run into clock drift.
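Something along those lines might look like the following sketch (Python with sqlite standing in for the real database; table and column names are invented). The test-and-grab happens in a single UPDATE whose WHERE clause consults the database's own clock, and the row count tells the caller whether it won the lock:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE task_locks (
        name          TEXT PRIMARY KEY,
        client        TEXT,
        lease_expires INTEGER DEFAULT 0   -- epoch secs, per the db's clock
    )
""")
conn.execute("INSERT INTO task_locks (name) VALUES ('indexing')")

def try_acquire(conn, task, client, lease_secs):
    # atomic test-and-set: the lease expiry check and the grab happen in
    # one statement, and both use the database's clock, not the node's
    cur = conn.execute(
        """UPDATE task_locks
           SET client = ?,
               lease_expires = CAST(strftime('%s','now') AS INTEGER) + ?
           WHERE name = ?
             AND lease_expires < CAST(strftime('%s','now') AS INTEGER)""",
        (client, lease_secs, task),
    )
    return cur.rowcount == 1
```

With this shape, two nodes racing for the same task can't both win: whichever UPDATE commits first leaves an unexpired lease that makes the other's WHERE clause match zero rows.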

-Elias

Allen Gilliland wrote:
Elias Torres wrote:
I like the proposal and I think it's very important/useful.

I would suggest, though, not using a hard-coded expiration mechanism, and instead using a leasing mechanism. I propose that a task says it needs the lock for X number of minutes/hours and writes the time it started plus the lease amount. It's just a subtle tweak, but it optimizes the scheduling a bit, so a quick task like saving referrers can get a 3-min lease and not block 3 hours of thread time. Additionally, tasks could store their name, so parallel tasks can work without blocking each other and only tasks with the same service name wait on each other. Obviously, a task can extend its lease if it needs to run for more time.

For example, let's store this as the property: task.indexing value: 12:00:01,3mins
Yep, I can do it that way. I guess I consider this to be the same thing, because the lease time for a given task is not likely to ever change, so if the task knows what the lease time is for its lock then there is no reason the lease needs to be in the db. Obviously, if the lease time may vary for a given lock, then your approach makes a lot more sense.

Either way will work, but yours is slightly more flexible, so I'll do it that way. For the actual property I am going to simplify the value so that it's just long<time>, long<lease>.

So if you see the lease in the db it would be property: task.indexing value: 983472893, 1800

This way it's just easier for the application to use the values without actually having to worry about parsing date strings.
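A quick sketch of consuming that long<time>, long<lease> value format (the helper names are hypothetical; note there is no date-string parsing, just two integers):

```python
def parse_lease(value):
    # split "983472893, 1800" into (start_time, lease_seconds)
    start, lease = (int(part.strip()) for part in value.split(","))
    return start, lease

def lease_expired(value, now):
    start, lease = parse_lease(value)
    return now >= start + lease
```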

Thanks for the suggestion.

-- Allen


In other words, let's re-invent JINI.

-Elias

Allen Gilliland wrote:
This is a really short one, but I did a proposal anyway. I'd like to add a simple locking mechanism to the various background tasks that we have, so that running them in clustered environments is safe from synchronization issues and we can prevent a task from running at the same time on multiple machines in the cluster.

http://rollerweblogger.org/wiki/Wiki.jsp?page=Proposal_ClusteredTasksViaLocking

Since this is such a short proposal, I'd like to go ahead and propose a vote on it as is, since I don't expect there is a need for lots of discussion. This would go into Roller 3.1.

-- Allen
