Hi Niphlod and Dave,
I think the worker is deleted because its last_heartbeat doesn't change
for a while. When I comment out the line "#dead_workers.delete()" in
gluon/scheduler.py, the task is stuck as QUEUED with no new run record
created, and the worker that was supposed to be deleted keeps its
unchanging last_heartbeat. When I keep the line "dead_workers.delete()",
the situation remains the same as in my original email.
I think the task is never actually processed: I added a single test line,
"os.system('touch <some_folder>/test.log')", at the beginning of the task
function, but this "test.log" file is never created on the server. I guess
the worker dies right after it is assigned a task, even before the task
starts executing, and then the task is set back to QUEUED.
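For reference, here is a minimal, self-contained sketch of that instrumentation trick (the sentinel path and the task body are hypothetical stand-ins; the point is only that the file is created if and only if the scheduler ever enters the function):

```python
import os
import tempfile

# Hypothetical sentinel path; the original used "<some_folder>/test.log".
SENTINEL = os.path.join(tempfile.gettempdir(), "test.log")

def my_task():
    # Very first statement of the task: touch a sentinel file. If this
    # file never appears, the worker died before the task body started.
    with open(SENTINEL, "a"):
        os.utime(SENTINEL, None)
    # ... the real task work would go here ...
    return "done"
```

If the sentinel never shows up while runs keep being created, the failure is happening between task assignment and task execution, not inside the task itself.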
I have the configuration in /etc/init/web2py-scheduler.conf, and the
service is running; that is why a new worker pops up when the previous one
dies.
When I set a 10s timeout on the task, the behavior is the same as before:
multiple runs are generated for the same task, and they never end even
though the assigned workers have died. I guess the task goes from RUNNING
back to QUEUED very quickly.
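For what it's worth, my understanding is that the scheduler runs each task in a child process and enforces the timeout by killing that process. A rough standalone emulation of that mechanism (not web2py code, just a sketch of the idea):

```python
import multiprocessing
import time

def long_task():
    # Stand-in for a task that blocks far longer than its timeout.
    time.sleep(60)

def run_with_timeout(target, timeout):
    # Run the task in a child process; if it outlives the timeout,
    # terminate it -- roughly what the scheduler does before marking
    # the run as TIMEOUT instead of COMPLETED.
    p = multiprocessing.Process(target=target)
    p.start()
    p.join(timeout)
    if p.is_alive():
        p.terminate()
        p.join()
        return "TIMEOUT"
    return "COMPLETED"

if __name__ == "__main__":
    print(run_with_timeout(long_task, 1))
```

If the worker process itself dies, there is nothing left to enforce the timeout or to close the run record, which would explain runs staying RUNNING forever.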
scheduler_run (from appadmin at https://54.223.188.59:8000/AGIS/appadmin):

id   task_id      status   start_time           stop_time  run_output  run_result  traceback  worker_name
130  run (task 4) RUNNING  2015-04-29 21:41:32  None       None        None        None       ip-172-31-2-6...
131  run (task 4) RUNNING  2015-04-29 21:41:47  None       None        None        None       ip-172-31-2-6...
132  run (task 4) RUNNING  2015-04-29 21:42:03  None       None        None        None       ip-172-31-2-6...
133  run (task 4) RUNNING  2015-04-29 21:42:18  None       None        None        None       ip-172-31-2-6...
I also cleaned all scheduler-related tables in /database/ and in my MySQL
database and restarted from empty scheduler tables, but the result is
still the same.
One difference between the server in the original region (which works fine)
and the current one is that in the current region I use port 8000 for
HTTPS access instead of the default 443. This is because the current
server is in China, where there are restrictions on HTTP/HTTPS access over
ports 80/8080/443. I am not sure whether this affects the workers, since
the worker name only contains IP information, not the port.
Niphlod,
May I ask how I can enable DEBUG mode and check the logging? I do see
logging calls in gluon/scheduler.py, but I don't know where to find the
logging output in a log file.
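(My current guess, in plain Python terms: since gluon/scheduler.py uses the standard logging module, something like the following should surface its DEBUG messages on the console. The logger name "web2py.scheduler" is an assumption on my part; the exact name should be checked against the logging.getLogger(...) call in gluon/scheduler.py.)

```python
import logging

# Assumption: the scheduler obtains its logger under a name like
# "web2py.scheduler" -- verify against the getLogger(...) call in
# gluon/scheduler.py. This routes its DEBUG output to the console.
logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logging.getLogger("web2py.scheduler").setLevel(logging.DEBUG)
```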
Thanks!
On Wednesday, April 29, 2015 at 4:04:27 PM UTC-4, Niphlod wrote:
>
> whoa. Thank god I put some statistics in to help with debugging..........
>
> It seems that the worker processes are handling one task and dying in the
> middle of the process.
> Also, it seems that your task is never-ending.
> Workers that don't have a task to process are running fine (empty_runs
> on 93164 and 93168 is 20, which means they're quietly looping), so as
> long as they don't process any task they seem to be healthy.
> Every execution is done by a different worker, which points towards
> the previous analysis of the worker dying in the middle of processing the
> task.
> Worker processes are not respawned automatically by web2py, so it's likely
> you have in place some kind of supervisor that restarts them when they die.
> I'd do two things: 1) set a timeout on the task, just to see if it's
> so long that it blocks somewhere (it may be leaking); 2) inspect the logs
> of the worker processes to see where they get stuck.
>
> I'd also stop all the workers, start one in the console with DEBUG
> logging, and see what's going on "in real-time".
>
>
--
Resources:
- http://web2py.com
- http://web2py.com/book (Documentation)
- http://github.com/web2py/web2py (Source code)
- https://code.google.com/p/web2py/issues/list (Report Issues)
---
You received this message because you are subscribed to the Google Groups
"web2py-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
For more options, visit https://groups.google.com/d/optout.