First of all: I decided to use web2py for my purposes because it is awesome
;)
I believe this is not a web2py bug or anything like that; it is more likely
an OS/systemd related issue.
Let me explain what I do and what the environment is. I work in a lab where
we automate many tests on physical devices (STBs, phones and so on).
I have a single source tree for the master (Ubuntu Server) and the slave
servers (Ubuntu Server/Desktop). The master runs uwsgi + nginx + MySQL +
web2py. The slaves use the same source, but only spawn tests within
scheduler processes.
I need to connect many physical devices to the slaves (climate chambers,
Arduino for IR control, v4l2 capture cards, Ethernet-controlled power
sources, power supply instruments, measurement instruments and so on).
I built a GUI with qooxdoo where the user can write Python code that
allocates physical devices and runs specific test scenarios to examine the
condition of the DUT (Device Under Test).
These tests sometimes need to run for tens of hours. The workflow can be
described as:
- the user writes a script
- the test is enqueued as a task in the db (JobGraph does a perfect job for
  me, because I need to control the execution sequence, mainly due to
  physical devices like climate chambers; an allocated lab instrument
  cannot be used by two tests at the same time, and JobGraph can take care
  of that; a rough sketch of how I queue such tasks follows this list)
- every slave has its own unique group name
- DUTs and lab instruments are bound to a specific slave, i.e. to a
  scheduler group name
- the slave executes the test scenario programmed by the user
- a test is nothing more than an overridden unittest TestCase
- every lab instrument has a child process which logs its parameters
  (temperature, humidity, voltage and so on)
- for each DUT an instance of a class is also created that spawns child
  processes (video freeze detection based on GStreamer, UDP/TCP/telnet
  interfaces to interact with the STB)
- test scenarios contain plenty of sleeps; for example, a scenario may
  demand that the STB stays in a climate chamber for 20 h at a specific
  temperature and humidity
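For illustration, this is roughly how a test ends up in the queue (a
simplified sketch, not my real controller code; run_test_case, the ids and
the timeout value are made up, scheduler is the app's
gluon.scheduler.Scheduler instance):

from gluon.scheduler import JobGraph

# every slave listens on its own group; slave_hostname is a placeholder here
slave_group = 'atms:%s' % slave_hostname

warmup = scheduler.queue_task('run_test_case',
                              pvars=dict(case_id=1, duts=[32]),
                              group_name=slave_group,
                              timeout=200 * 3600)  # tests can run for >100 h
main = scheduler.queue_task('run_test_case',
                            pvars=dict(case_id=2, duts=[32]),
                            group_name=slave_group,
                            timeout=200 * 3600)

# JobGraph makes sure the second test only starts after the first one
# that uses the same climate chamber has finished
jg = JobGraph(db, 'chamber_job')
jg.add_deps(warmup.id, main.id)
jg.validate('chamber_job')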
My systemd service file looks like this:
[Unit]
Description=ATMS workers
After=network-online.target
Wants=network-online.target

[Service]
User=<USER>
Restart=on-failure
RestartSec=120
Environment=DISPLAY=:<DISPLAY_NB> # usually 0
Environment=XAUTHORITY=/home/<USER>/.Xauthority
EnvironmentFile={{INSTALL}}/web2py_venv/web2py/applications/atms/private/atms.env
ExecStartPre=/bin/sh -c "${WEB2PYPY} ${WEB2PY} -S atms -M -R ${WEB2PYDIR}/applications/atms/systemd/on_start.py -P"
ExecStart=/bin/sh -c "${WEB2PYPY} ${WEB2PY} -K atms:%H,atms:%H"
ExecStop=/bin/sh -c "${WEB2PYPY} ${WEB2PY} -S atms -M -R ${WEB2PYDIR}/applications/atms/systemd/on_stop.py -P"

[Install]
# graphical because I had to make some kind of preview with ximagesink for a fast check whether video is OK on the STB
WantedBy=graphical.target
Alias=atms.service
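The on_start.py/on_stop.py hooks run inside the web2py environment
(-S atms -M), so they see the app's db and the scheduler tables. For
reference, the kind of thing such a stop hook can do (a minimal sketch
assuming the default scheduler_worker table, not my exact script):

# on_stop.py - executed via: web2py.py -S atms -M -R .../on_stop.py
# db comes from the app's models loaded by -M
import socket

hostname = socket.gethostname()

# ask the workers on this host to finish their current task and exit;
# 'TERMINATE' is the graceful variant, 'KILL' would stop them immediately
db(db.scheduler_worker.worker_name.startswith(hostname)).update(status='TERMINATE')
db.commit()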
I realised that for a very long test (the last one was planned to run for
more than 100 h) I got something like this in the logs:
gru 11 12:01:52 slaveX sh[2184]: File "/atms/web2py_venv/web2py/gluon/packages/dal/pydal/adapters/base.py", line 1435, in
gru 11 12:01:52 slaveX sh[2184]: return str(long(obj))
gru 11 12:01:52 slaveX sh[2184]: File "/atms/web2py_venv/web2py/gluon/packages/dal/pydal/objects.py", line 82, in <lambda>
gru 11 12:01:52 slaveX sh[2184]: __long__ = lambda self: long(self.get('id'))
gru 11 12:01:52 slaveX sh[2184]: TypeError: long() argument must be a string or a number, not 'NoneType'
The test was stopped 20h before it was supposed to be finished :/
After some digging I found that right before these errors I got this one:
gru 11 12:01:34 slaveX sh[2184]: ERROR:web2py.app.atms:[(</tmp/taskId10672_caseId852_duts32/test_script.py.TestCase testMethod=test_example>, 'Traceback (most recent call last):\n File "/tmp/taskId10672_caseId852_duts32/test_script.py", line 90, in test_example\n sleep(M10)\n File "/atms/web2py_venv/web2py/gluon/scheduler.py", line 702, in <lambda>\n signal.signal(signal.SIGTERM, lambda signum, stack_frame: sys.exit(1))\nSystemExit: 1\n')]
gru 11 12:01:34 slaveX sh[2184]: DEBUG:web2py.app.atms: new task report: FAILED
gru 11 12:01:34 slaveX sh[2184]: DEBUG:web2py.app.atms: traceback:
Traceback (most recent call last):
...and many, many tracebacks with errors after that.
Line 702 in scheduler.py is:
signal.signal(signal.SIGTERM, lambda signum, stack_frame: sys.exit(1))
...inside the scheduler's loop function. What does this mean? Was the
process stopped because the kernel, systemd or something else decided to
kill it? Could the long sleep calls have something to do with it?
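As far as I can tell, that handler simply turns an incoming SIGTERM into
SystemExit inside whatever the task happens to be doing, which matches my
traceback. A standalone sketch (pure Python, nothing web2py specific) of a
SIGTERM arriving during a long sleep:

import os, signal, sys, threading, time

# same handler as scheduler.py line 702
signal.signal(signal.SIGTERM, lambda signum, stack_frame: sys.exit(1))

# pretend someone sends SIGTERM to the worker 1 s into a long sleep
threading.Timer(1, lambda: os.kill(os.getpid(), signal.SIGTERM)).start()

try:
    time.sleep(600)  # stands in for sleep(M10) in my test scenario
except SystemExit:
    print("SystemExit raised inside sleep() - same traceback as in my logs")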
Has anyone encountered similar problems? Do you have any idea how to
prevent this behaviour?
Thank you in advance for any response :)