Allen Gilliland wrote:
>
> Elias Torres wrote:
>> Not sure if we need a whole extra table. wouldn't task.name.propName =
>> value be enough?
>
> well, that's possible i suppose, but then we are stuffing a lot of
> information into that single property and forcing ourselves to parse it
> out. capturing the various states then becomes a bit more confusing, i.e.
>
>   task.planet.refreshEntries = 987523897, 78345897238, 1800 (locked)
>   task.planet.refreshEntries = 872837348 (unlocked)
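[Editor's note: to make the parsing burden Allen describes concrete, here is a minimal sketch. Roller itself is Java; Python is used here only for brevity, and the meaning of the numeric fields (last run, lock time, lease seconds) is my assumption based on the sample values above, not something the thread specifies.]

```python
def parse_task_property(value: str) -> dict:
    """Parse an overloaded task property value such as
    '987523897, 78345897238, 1800 (locked)' or '872837348 (unlocked)'.
    Field meanings are assumed: last run, lock time, lease seconds."""
    value = value.strip()
    locked = value.endswith("(locked)")
    # strip the trailing '(locked)' / '(unlocked)' state marker
    numbers = value.rsplit("(", 1)[0]
    fields = [int(f) for f in numbers.split(",") if f.strip()]
    if locked:
        last_run, lock_time, lease_secs = fields
        return {"locked": True, "lastRun": last_run,
                "lockTime": lock_time, "leaseSeconds": lease_secs}
    return {"locked": False, "lastRun": fields[0]}
```

Every reader and writer of the property has to agree on this ad-hoc grammar, which is exactly the fragility being argued about.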
That's not what I meant, it was more like:

  task.planetRefreshEntries.status = locked
  task.planetRefreshEntries[worker1].lastRun = 987523897
  ...

>
> and that assumes that we wouldn't want to extend that data at all,
> because if we did want to track some kind of other data then we would
> be screwed. i was thinking that it may be nice to capture some basic
> error status from the tasks in the event that they fail for some
> reason, in which case you may not want to keep running a task if it's
> failed 3 times in a row. so a task could have a 'Status' attribute
> which indicates the state of the task, like running, paused, error,
> waiting, etc.
>
> ... but a table is fine too.

-Elias

>>
>> Another problem I thought of would be that always the same node gets
>> to perform the task, which also brings up the synchronized clock
>> issue again.
>
> depends on the task, but that would probably be true. i'm not sure
> why that's an issue though. if we do set up a separate table for
> tracking the tasks and their state then it becomes easier to work on
> some sql code to use the db time rather than the cluster node's time.
>
> doing that given the old proposal kind of defeats the purpose because
> in the old proposal the point was to implement it in a way that
> didn't require work on the backend because we were just using a
> simple runtime property, but if we are going to do work on the
> backend then we should probably just build out a more complete
> solution.
>
> -- Allen
>
>
>>
>> -Elias
>>
>> Allen Gilliland wrote:
>>> I had thought about this a little more and one thing that I was
>>> planning to neglect because of time was the fact that simply using
>>> a lock doesn't prevent a task from being run too frequently. i.e.
>>> if a task is meant to run once an hour and you have 2 members in
>>> your cluster they may have been started 30mins apart from each
>>> other so that the end result is that the task runs every 30 mins.
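[Editor's note: the table-based tracking Allen sketches above, a 'Status' attribute plus a stop-after-3-consecutive-failures rule, could look roughly like this. A sketch only: Roller is Java, SQLite stands in for the real database, and all table, column, and task names here are invented.]

```python
import sqlite3

# illustrative schema only; column names and the sample task are invented
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tasks (
        name       TEXT PRIMARY KEY,            -- unique task name
        last_run   INTEGER,                     -- when the task last ran
        locked     INTEGER NOT NULL DEFAULT 0,  -- currently running?
        lease_secs INTEGER,                     -- how long a run is leased for
        status     TEXT NOT NULL DEFAULT 'waiting',  -- running/paused/error/waiting
        failures   INTEGER NOT NULL DEFAULT 0   -- consecutive failure count
    )""")
conn.execute("INSERT INTO tasks (name, lease_secs) VALUES ('planetRefresh', 1800)")
conn.commit()

def should_run(name, max_failures=3):
    """Skip tasks that are paused or have failed too many times in a row."""
    status, failures = conn.execute(
        "SELECT status, failures FROM tasks WHERE name = ?", (name,)).fetchone()
    return status != 'paused' and failures < max_failures

def record_failure(name):
    """Bump the failure count and flag the task as errored."""
    conn.execute("UPDATE tasks SET failures = failures + 1, status = 'error' "
                 "WHERE name = ?", (name,))
    conn.commit()
```

After three consecutive `record_failure` calls, `should_run` returns False and the scheduler would leave the task alone until an admin intervenes.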
>>>
>>> Perhaps a better solution to this would be to actually create a
>>> simple "tasks" table which can maintain one row per task and keep
>>> track of ...
>>>
>>> 1. when the task was last run
>>> 2. if the task is currently running (locked)
>>> 3. how long the currently running task is leased for
>>> 4. ?? possible others (how it was triggered, etc)
>>>
>>> This would actually work much better at synchronizing the running
>>> of the tasks, so that even if you had 10 cluster members running,
>>> the task itself is still only run at the correct interval.
>>>
>>> Should I update the proposal to work this way?
>>>
>>> -- Allen
>>>
>>>
>>> Anil Gangolli wrote:
>>>> The backend db-specific SQL required may be worthwhile.
>>>>
>>>> Synchronization can't be assumed to be perfect, so if you do use
>>>> the webapp host's time, you will want to delay lease grabbing
>>>> until after some grace period that is larger than a max (assumed)
>>>> clock difference between the hosts (or conversely require an
>>>> extending renewal by the lease holder if the remaining lease time
>>>> is lower than that).
>>>>
>>>> --a.
>>>>
>>>>
>>>> ----- Original Message -----
>>>> From: "Allen Gilliland" <[EMAIL PROTECTED]>
>>>> To: <[email protected]>
>>>> Sent: Thursday, September 21, 2006 10:40 AM
>>>> Subject: Re: Proposal: Clustered Tasks via Locking
>>>>
>>>>
>>>>> I was actually thinking that a lazier approach of "that's the sys
>>>>> admin's job" was preferable. Setting up a way to get the current
>>>>> time from the db requires backend work and i would prefer not to
>>>>> do that if we can. My expectation is that anyone who has a large
>>>>> enough installation to need 2+ servers working in a cluster
>>>>> should also be able to make sure that each cluster member has
>>>>> synchronized time. Most other clustering software has that same
>>>>> expectation.
>>>>>
>>>>> -- Allen
>>>>>
>>>>>
>>>>> Elias Torres wrote:
>>>>>> One more thing..
>>>>>> I think you'd need to come up with a SQL query that would test
>>>>>> for the lock using the database server's time and not the app
>>>>>> server's time. Basically a SELECT and INSERT/UPDATE together
>>>>>> using CURRENT_TIME() or whatever to make sure we don't run into
>>>>>> clock drifts.
>>>>>>
>>>>>> -Elias
>>>>>>
>>>>>> Allen Gilliland wrote:
>>>>>>> Elias Torres wrote:
>>>>>>>> I like the proposal and I think it's very important/useful.
>>>>>>>>
>>>>>>>> I would suggest though to not use a hard-coded expiration
>>>>>>>> mechanism and instead use a leasing mechanism. I propose that
>>>>>>>> a task says it needs the lock for X number of minutes/hours
>>>>>>>> and writes the time it started and the lease amount. It's
>>>>>>>> just a subtle tweak, but it optimizes the scheduling a bit,
>>>>>>>> so a quick task like saving referrers can get a 3-min lease
>>>>>>>> and not block 3 hours of thread time. Additionally, tasks
>>>>>>>> could store their name, so parallel tasks can work w/o
>>>>>>>> blocking each other and only tasks with the same service name
>>>>>>>> wait on each other. Obviously, a task can extend its lease if
>>>>>>>> it needs to run for more time.
>>>>>>>>
>>>>>>>> For example, let's store this as the property:
>>>>>>>>
>>>>>>>>   property: task.indexing  value: 12:00:01,3mins
>>>>>>>
>>>>>>> yep, I can do it that way. I guess I consider this to be the
>>>>>>> same thing because the lease time for a given task is not
>>>>>>> likely to ever change, so if the task knows what the lease
>>>>>>> time is for its lock then there is no reason the lease needs
>>>>>>> to be in the db. Obviously if the lease time may vary for a
>>>>>>> given lock then your approach makes a lot more sense.
>>>>>>>
>>>>>>> Either way will work, but yours is slightly more flexible so
>>>>>>> I'll do it that way. For the actual property I am going to
>>>>>>> simplify the value so that it's just long<time>, long<lease>.
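[Editor's note: Elias's idea of testing and taking the lock in a single statement stamped with the database's own clock can be sketched as below. SQLite (via Python) stands in for the real database purely for illustration; a production RDBMS would use its own CURRENT_TIMESTAMP-style function, and the table and column names are invented.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE tasks (
    name        TEXT PRIMARY KEY,
    locked      INTEGER NOT NULL DEFAULT 0,
    lease_start INTEGER,
    lease_secs  INTEGER NOT NULL)""")
conn.execute("INSERT INTO tasks (name, lease_secs) VALUES ('indexing', 180)")
conn.commit()

def try_acquire(name):
    """Grab the lease in one conditional UPDATE. The WHERE clause matches
    only if the task is unlocked or its lease has expired, and the lease
    start is stamped with the *database's* clock (strftime('%s','now') in
    SQLite), so cluster members with drifting clocks cannot both win."""
    cur = conn.execute("""
        UPDATE tasks
           SET locked = 1,
               lease_start = CAST(strftime('%s','now') AS INTEGER)
         WHERE name = ?
           AND (locked = 0
                OR lease_start + lease_secs
                   < CAST(strftime('%s','now') AS INTEGER))
        """, (name,))
    conn.commit()
    # exactly one row updated means this node won the lock
    return cur.rowcount == 1
```

The first caller updates one row and wins; a second caller inside the lease window matches no rows and backs off, with no app-server clock involved at all.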
>>>>>>>
>>>>>>> So if you see the lease in the db it would be
>>>>>>>
>>>>>>>   property: task.indexing  value: 983472893, 1800
>>>>>>>
>>>>>>> This way it's just easier for the application to use the
>>>>>>> values without actually having to worry about parsing date
>>>>>>> strings.
>>>>>>>
>>>>>>> Thanks for the suggestion.
>>>>>>>
>>>>>>> -- Allen
>>>>>>>
>>>>>>>
>>>>>>>> In other words, let's re-invent JINI.
>>>>>>>>
>>>>>>>> -Elias
>>>>>>>>
>>>>>>>> Allen Gilliland wrote:
>>>>>>>>> This is a really short one, but I did a proposal anyway. I'd
>>>>>>>>> like to add a simple locking mechanism to the various
>>>>>>>>> background tasks that we have so that running them in
>>>>>>>>> clustered environments is safe from synchronization issues
>>>>>>>>> and we can prevent a task from running at the same time on
>>>>>>>>> multiple machines in the cluster.
>>>>>>>>>
>>>>>>>>> http://rollerweblogger.org/wiki/Wiki.jsp?page=Proposal_ClusteredTasksViaLocking
>>>>>>>>>
>>>>>>>>> Since this is such a short proposal I'd like to go ahead and
>>>>>>>>> propose a vote on the proposal as is, since I don't expect
>>>>>>>>> there is a need for lots of discussion. This would go into
>>>>>>>>> Roller 3.1.
>>>>>>>>>
>>>>>>>>> -- Allen
>>>>>>>>>
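[Editor's note: combining Allen's `long<time>, long<lease>` value format with Anil's clock-skew grace period gives an expiry check like the following. A sketch only (Roller is Java); the 60-second default skew allowance is an arbitrary illustration, not a value from the thread.]

```python
def lease_expired(value, now, clock_skew_secs=60):
    """Check whether a lock stored as '<startTime>, <leaseSeconds>'
    (e.g. task.indexing = '983472893, 1800') has expired.  Per Anil's
    caveat, a grace period larger than the assumed max clock difference
    between hosts is added before another node may steal the lease."""
    start, lease = (int(part) for part in value.split(","))
    return now > start + lease + clock_skew_secs

def format_lease(start, lease):
    # plain numeric 'long<time>, long<lease>' value: no date-string parsing
    return "%d, %d" % (start, lease)
```

A node that merely drifts a few seconds behind the lease holder never sees the lease as expired, while a genuinely dead holder loses the lock once the lease plus grace period has elapsed.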
