Hi Niphlod and Dave,

I think the worker is deleted because its last heartbeat doesn't change 
for a while. When I comment out the line "dead_workers.delete()" in 
gluon/scheduler.py, the task stays stuck as QUEUED with no new run record 
created, and the worker that should have been deleted keeps an unchanging 
last_heartbeat. When I leave the line "dead_workers.delete()" in place, the 
situation remains the same as in my original email. 
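For what it's worth, my understanding of the dead-worker check (a simplified sketch in plain Python, not the actual gluon/scheduler.py code; the constants here are illustrative guesses) is that a worker whose last_heartbeat is older than some multiple of the heartbeat interval is considered dead and deleted:

```python
from datetime import datetime, timedelta

HEARTBEAT = 3  # seconds between heartbeats (illustrative value)
GRACE = 10     # hypothetical multiplier before a worker counts as dead

def is_dead(last_heartbeat, now=None):
    """Return True if the worker's heartbeat is too old."""
    now = now or datetime.now()
    return now - last_heartbeat > timedelta(seconds=HEARTBEAT * GRACE)

# A worker whose last heartbeat was 5 minutes ago would be removed:
stale = datetime.now() - timedelta(minutes=5)
print(is_dead(stale))  # True
```

That matches what I observe: once the heartbeat stops updating, the worker row disappears and the task is requeued.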

I think the task is never actually processed: I added a single test line, 
"os.system('touch <some_folder>/test.log')", at the beginning of the task 
function, but this "test.log" file is never created on the server. My guess 
is that the worker dies right after it is assigned the task, before the task 
body even starts executing, and the task then goes back to QUEUED. 
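In case it matters, here is the kind of probe I mean, written in plain Python instead of shelling out to touch (the path here is just an example; any writable location works). If this file never appears, the task body was never entered:

```python
import datetime
import os
import tempfile

# Example probe path; in my app I point this at a folder I can inspect.
PROBE = os.path.join(tempfile.gettempdir(), "scheduler_probe.log")

def my_task():
    # Probe: append a timestamp so we can tell whether the task body
    # ever starts executing.
    with open(PROBE, "a") as f:
        f.write("task entered at %s\n" % datetime.datetime.now())
    # ... the real task work would go here ...
```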

I have the configuration in /etc/init/web2py-scheduler.conf, and the 
service is running; that is why a new worker pops up whenever the previous 
one dies. 
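For reference, my upstart job follows the usual web2py recipe and looks roughly like this (the exec path is abbreviated/hypothetical); the respawn stanza is what restarts the worker process whenever it dies:

```
description "web2py scheduler"
start on runlevel [2345]
stop on runlevel [!2345]
respawn
exec python /path/to/web2py/web2py.py -K AGIS
```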

When I set a 10s timeout on the task, the behavior is the same as before: 
multiple runs are generated for the same task, and they never finish even 
though the assigned workers have died. I guess the time from RUNNING back 
to QUEUED is very short for this task.
 
Here is the scheduler_run table from appadmin:

id   task_id  status   start_time           stop_time  run_output  run_result  traceback  worker_name
130  run (4)  RUNNING  2015-04-29 21:41:32  None       None        None        None       ip-172-31-2-6...
131  run (4)  RUNNING  2015-04-29 21:41:47  None       None        None        None       ip-172-31-2-6...
132  run (4)  RUNNING  2015-04-29 21:42:03  None       None        None        None       ip-172-31-2-6...
133  run (4)  RUNNING  2015-04-29 21:42:18  None       None        None        None       ip-172-31-2-6...
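For what it's worth, the start times above are almost evenly spaced; computing the gaps (times copied straight from the table):

```python
from datetime import datetime

# start_time values of runs 130-133 from the scheduler_run table above
starts = [
    "2015-04-29 21:41:32",
    "2015-04-29 21:41:47",
    "2015-04-29 21:42:03",
    "2015-04-29 21:42:18",
]
times = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in starts]
gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
print(gaps)  # [15.0, 16.0, 15.0]
```

A new run appearing every ~15 seconds looks like the period at which a freshly respawned worker picks the task up again after the previous one dies.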
I also cleaned all scheduler-related tables in /databases/ and in my MySQL 
database and restarted from empty scheduler tables, but the result is 
still the same.  

One difference between the server in the original region (which works fine) 
and the current one is that in the current region I use port 8000 for 
HTTPS access instead of the default 443. This is because the current 
server is in China, where there are restrictions on HTTP/HTTPS access over 
ports 80/8080/443. I am not sure whether this affects the workers, since 
the worker name only contains host information, not the port.
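As far as I can tell (this is my reading, simplified; please correct me if wrong), the worker name is built from the hostname and process id only, so the HTTP/HTTPS port never enters the picture:

```python
import os
import socket

# My understanding: the scheduler identifies a worker as "hostname#pid",
# which matches the "ip-172-31-2-6..." names in my scheduler_run table.
# The port web2py serves on is not part of the name.
worker_name = "%s#%s" % (socket.gethostname(), os.getpid())
print(worker_name)
```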

Niphlod, 

May I ask how I can enable DEBUG mode and check the logging? I do see 
logging calls in gluon/scheduler.py, but I don't know where to find the 
logging output in a log file. 
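In case it helps to clarify what I'm after: in plain Python I would expect something like the following to capture a named logger's DEBUG output to a file. The logger name and the log path here are just my guesses, not something I found documented:

```python
import logging
import os
import tempfile

LOG_PATH = os.path.join(tempfile.gettempdir(), "scheduler_debug.log")

# Assumption: gluon/scheduler.py logs through a named logger, so attaching
# a DEBUG-level FileHandler to that name should capture its output.
logger = logging.getLogger("web2py.scheduler.AGIS")  # logger name is my guess
logger.setLevel(logging.DEBUG)
handler = logging.FileHandler(LOG_PATH)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
logger.addHandler(handler)
logger.debug("scheduler debug logging configured")
```

Is this the right idea, or is there a command-line switch I should use when starting the worker instead?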

Thanks!


On Wednesday, April 29, 2015 at 4:04:27 PM UTC-4, Niphlod wrote:
>
> whoa. Thank god I put some statistics in to help with debugging..........
>
> It seems that the worker processes are handling one task and dying in the 
> middle of the process. 
> Also, it seems that your task is never-ending. 
> Workers that don't have a task to process are running fine (empty_runs 
> on 93164 and 93168 is 20, which means they're quietly looping), so as 
> long as they don't process any task they seem to be healthy.
> Every execution is done by a different worker, which points towards the 
> previous analysis of the worker dying in the middle of processing the 
> task.
> Worker processes are not respawned automatically by web2py, so it's likely 
> you have in place some kind of supervisor that restarts them when they die. 
> I'd do two things: 1) set a timeout on the task, just to see if it's 
> so long that it blocks somewhere (it may be leaking); 2) inspect the logs 
> of the worker processes to see where they get stuck. 
>
> I'd also stop all the workers, start one in the console with DEBUG 
> logging, and see what's going on "in real time".
>
>
