I have a few things that need clarification, and I'm also experiencing some odd behavior with the scheduler. I'm using my app's db instance (MySQL) for the scheduler.
At the bottom of scheduler.py:

```python
from gluon.scheduler import Scheduler
scheduler = Scheduler(db, heartbeat=3)
```

I start my workers like this.

Head node:

```sh
python web2py.py -K myapp:upload,myapp:upload,myapp:upload,myapp:upload,myapp:upload,myapp:download,myapp:download,myapp:download,myapp:download,myapp:download,myapp:head_monitorQ
```

5 compute nodes:

```sh
GROUP0="myapp:"$HOSTNAME"_comp_0:compQ"
GROUP1="myapp:"$HOSTNAME"_comp_1:compQ"
GROUP2="myapp:"$HOSTNAME"_comp_2:compQ"
GROUP3="myapp:"$HOSTNAME"_comp_3:compQ"
GROUP4="myapp:"$HOSTNAME"_comp_4:compQ"
GROUP5="myapp:"$HOSTNAME"_comp_5:compQ"
GROUP6="myapp:"$HOSTNAME"_comp_6:compQ"
MON="myapp:"$HOSTNAME"_monitorQ"
python web2py.py -K $GROUP0,$GROUP1,$GROUP2,$GROUP3,$GROUP4,$GROUP5,$GROUP6,$MON
```

The head node runs five "upload" and five "download" workers. Each compute node runs seven "compQ" workers that do the actual work. The hostname-based group names are unique so I can remotely manage the workers. The monitorQ workers run a task every 30s to provide hardware monitoring to my application.

1) I need to dynamically enable/disable workers to match the available hardware. I was hoping to do this with the disable/resume commands, but the behavior isn't what I had hoped for (though I think it is what was intended). I would like to send a command that stops a worker from being assigned/picking up jobs until a resume is issued. From the docs and from experimenting, it looks like all disable does is sleep the worker for a little while, after which it gets right back to work. To get my desired behavior I currently issue a terminate command, but then I need to ssh into each compute node and restart the workers when I want to scale back up, which works but is less than ideal. *Is there any way to "toggle" a worker into a disabled state?* (See the first sketch after question 2 for the workaround I'm experimenting with.)

2) A previous post from Niphlod explains the worker assignment:

> A QUEUED task is not picked up by a worker, it is first ASSIGNED to a worker that can pick up only the ones ASSIGNED to him. The "assignment" phase is important because:
> - the group_name parameter is honored (a task queued with the group_name 'foo' gets assigned only to workers that process 'foo' tasks (the group_names column in scheduler_worker))
> - DISABLED, KILL and TERMINATE workers are "removed" from the assignment altogether
> - in multiple-worker situations the QUEUED tasks are split amongst workers evenly, and workers "know in advance" what tasks they are allowed to execute (the assignment allows the scheduler to set up n "independent" queues for the n ACTIVE workers)

This is an issue for me because my tasks do not have uniform run times: some jobs take 4 minutes while others take 4 hours. I keep getting into situations where one node sits there with plenty of idle workers that apparently have no tasks to pick up, while another node chugs along with a backlog of assigned tasks. Sometimes a single worker on a node is left with all the assigned tasks while the other workers sit idle. *Is there any built-in way to periodically force a reassignment of tasks to deal with this type of situation?* (The second sketch below is the workaround I'm considering.)
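For (1), the workaround I've been experimenting with is writing the worker status directly into the scheduler_worker table instead of terminating. This is a minimal sketch, assuming a direct status update is honored between heartbeats (I don't know whether a worker will overwrite it on its own) and that worker names include the hostname, which is web2py's default; `park_node_workers` and `resume_node_workers` are just names I made up:

```python
# Hypothetical helpers: park/resume every worker on a node by writing
# the status column of scheduler_worker directly.

def park_node_workers(db, hostname):
    # DISABLED workers are skipped by the assignment phase
    db(db.scheduler_worker.worker_name.contains(hostname)).update(status='DISABLED')
    db.commit()

def resume_node_workers(db, hostname):
    db(db.scheduler_worker.worker_name.contains(hostname)).update(status='ACTIVE')
    db.commit()

# scale a compute node out of the pool, then back in later
park_node_workers(db, 'node03')
resume_node_workers(db, 'node03')
```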
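For (2), since I haven't found anything built in, this second sketch is the periodic "rebalance" task I'm considering: it pushes ASSIGNED (not yet RUNNING) tasks back to QUEUED so the next assignment pass can spread them across whatever workers are idle. `rebalance` is a name I made up, and there may be a race with a worker that is just picking a task up, so treat it as an experiment rather than a vetted solution:

```python
# Hypothetical periodic task: un-assign tasks that no worker has started
# yet, so the ticker redistributes them on its next assignment pass.

def rebalance():
    db(db.scheduler_task.status == 'ASSIGNED').update(
        status='QUEUED',
        assigned_worker_name='',
    )
    db.commit()
    return 'rebalanced'

# run it every 5 minutes from the head node's monitor group
scheduler.queue_task(rebalance, period=300, repeats=0, group_name='head_monitorQ')
```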
3) I had been using immediate=True on all of my tasks. I started to see occasional db deadlock errors when scheduling jobs with queue_task(). Removing immediate=True seemed to fix the problem. *Is there any reason why immediate could be causing deadlocks?* (My current queueing code is sketched below.)
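In case it helps: this is roughly what my queueing looks like now, a plain queue_task call with an explicit commit and no immediate=True (the task name and parameters are invented for the example):

```python
# Queue a job without immediate=True: the task simply waits for the
# ticker's next assignment pass instead of forcing one right away.
scheduler.queue_task(
    'process_upload',           # hypothetical task name
    pvars={'file_id': 1234},    # hypothetical parameters
    group_name='upload',
    timeout=4 * 3600,           # some of these jobs run for hours
)
db.commit()  # make the queued row visible to the workers
```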

