First of all: I decided to use web2py for my purposes because it is awesome
;)
I believe this is not a web2py bug or anything like that; it is more likely
an OS/systemd related issue.
Let me explain what I do and what the environment is. I work in a lab where
we automate many tests on physical devices (STBs, phones and so on).
I have a single source tree for the master (Ubuntu Server) and the slave
servers (Ubuntu Server/Desktop). The master runs uwsgi + nginx + MySQL +
web2py. The slaves use the same source, but only spawn tests within
scheduler processes.
I need to connect many physical devices to the slaves (climate chambers,
Arduino for IR control, v4l2 capture cards, Ethernet-controlled power
sources, power supply instruments, measurement instruments and so on).
I built a GUI with qooxdoo where the user can write Python code that
allocates physical devices and runs specific test scenarios to examine the
condition of the DUT (Device Under Test).
These tests sometimes need to run for tens of hours. The workflow can be
described as:
- the user writes a script
- the test is enqueued as a task in the db (JobGraph does a perfect job for
  me, because I need to control the execution sequence, mainly due to
  physical devices like climate chambers; an allocated lab instrument
  cannot be used by two tests at the same time, and JobGraph can take care
  of that; a rough sketch of how I queue such tasks follows this list)
- every slave has its own unique group name
- DUTs and lab instruments are bound to a specific slave, i.e. to a
  scheduler group name
- the slave executes the test scenario programmed by the user
- a test is nothing more than an overridden unittest TestCase
- every lab instrument has a child process which logs its parameters
  (temperature, humidity, voltage and so on)
- for each DUT an instance of a class is also created that spawns child
  processes (video freeze detection based on GStreamer, UDP/TCP/telnet
  interfaces to interact with the STB)
- test scenarios contain plenty of sleeps; for example, a scenario may
  demand that the STB stays in a climate chamber for 20 h at a specific
  temperature and humidity
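For illustration, this is roughly how a test ends up in the queue (a
simplified sketch, not my real controller code; run_test_case, the ids and
the timeout value are made up, scheduler is the app's
gluon.scheduler.Scheduler instance):

from gluon.scheduler import JobGraph

# every slave listens on its own group; slave_hostname is a placeholder here
slave_group = 'atms:%s' % slave_hostname

warmup = scheduler.queue_task('run_test_case',
                              pvars=dict(case_id=1, duts=[32]),
                              group_name=slave_group,
                              timeout=200 * 3600)  # tests can run for >100 h
main = scheduler.queue_task('run_test_case',
                            pvars=dict(case_id=2, duts=[32]),
                            group_name=slave_group,
                            timeout=200 * 3600)

# JobGraph makes sure the second test only starts after the first one
# that uses the same climate chamber has finished
jg = JobGraph(db, 'chamber_job')
jg.add_deps(warmup.id, main.id)
jg.validate('chamber_job')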
My systemd service file looks like this:
[Unit]
Description=ATMS workers
After=network-online.target
Wants=network-online.target

[Service]
User=<USER>
Restart=on-failure
RestartSec=120
Environment=DISPLAY=:<DISPLAY_NB> # usually 0
Environment=XAUTHORITY=/home/<USER>/.Xauthority
EnvironmentFile={{INSTALL}}/web2py_venv/web2py/applications/atms/private/atms.env
ExecStartPre=/bin/sh -c "${WEB2PYPY} ${WEB2PY} -S atms -M -R ${WEB2PYDIR}/applications/atms/systemd/on_start.py -P"
ExecStart=/bin/sh -c "${WEB2PYPY} ${WEB2PY} -K atms:%H,atms:%H"
ExecStop=/bin/sh -c "${WEB2PYPY} ${WEB2PY} -S atms -M -R ${WEB2PYDIR}/applications/atms/systemd/on_stop.py -P"

[Install]
# graphical because I had to make some kind of preview with ximagesink for a fast check whether video is OK on the STB
WantedBy=graphical.target
Alias=atms.service
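The on_start.py/on_stop.py hooks run inside the web2py environment
(-S atms -M), so they see the app's db and the scheduler tables. For
reference, the kind of thing such a stop hook can do (a minimal sketch
assuming the default scheduler_worker table, not my exact script):

# on_stop.py - executed via: web2py.py -S atms -M -R .../on_stop.py
# db comes from the app's models loaded by -M
import socket

hostname = socket.gethostname()

# ask the workers on this host to finish their current task and exit;
# 'TERMINATE' is the graceful variant, 'KILL' would stop them immediately
db(db.scheduler_worker.worker_name.startswith(hostname)).update(status='TERMINATE')
db.commit()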
I realised that for a very long test (the last one was planned to run for
more than 100 h) I got something like this in the logs:
gru 11 12:01:52 slaveX sh[2184]: File "/atms/web2py_venv/web2py/gluon/packages/dal/pydal/adapters/base.py", line 1435, in
gru 11 12:01:52 slaveX sh[2184]: return str(long(obj))
gru 11 12:01:52 slaveX sh[2184]: File "/atms/web2py_venv/web2py/gluon/packages/dal/pydal/objects.py", line 82, in <lambda>
gru 11 12:01:52 slaveX sh[2184]: __long__ = lambda self: long(self.get('id'))
gru 11 12:01:52 slaveX sh[2184]: TypeError: long() argument must be a string or a number, not 'NoneType'
The test was stopped 20h before it was supposed to be finished :/
After some digging I found that right before these errors I got this one:
gru 11 12:01:34 slaveX sh[2184]: ERROR:web2py.app.atms:[(</tmp/taskId10672_caseId852_duts32/test_script.py.TestCase testMethod=test_example>, 'Traceback (most recent call last):\n File "/tmp/taskId10672_caseId852_duts32/test_script.py", line 90, in test_example\n sleep(M10)\n File "/atms/web2py_venv/web2py/gluon/scheduler.py", line 702, in <lambda>\n signal.signal(signal.SIGTERM, lambda signum, stack_frame: sys.exit(1))\nSystemExit: 1\n')]
gru 11 12:01:34 slaveX sh[2184]: DEBUG:web2py.app.atms: new task report: FAILED
gru 11 12:01:34 slaveX sh[2184]: DEBUG:web2py.app.atms: traceback:
Traceback (most recent call last):
...and many, many tracebacks with errors after that.
Line 702 in scheduler.py is:
signal.signal(signal.SIGTERM, lambda signum, stack_frame: sys.exit(1))
...inside the scheduler's loop function. What does this mean? Was the
process stopped because the kernel, systemd or something else decided to
kill it? Could the long sleep calls have something to do with it?
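As far as I can tell, that handler simply turns an incoming SIGTERM into
SystemExit inside whatever the task happens to be doing, which matches my
traceback. A standalone sketch (pure Python, nothing web2py specific) of a
SIGTERM arriving during a long sleep:

import os, signal, sys, threading, time

# same handler as scheduler.py line 702
signal.signal(signal.SIGTERM, lambda signum, stack_frame: sys.exit(1))

# pretend someone sends SIGTERM to the worker 1 s into a long sleep
threading.Timer(1, lambda: os.kill(os.getpid(), signal.SIGTERM)).start()

try:
    time.sleep(600)  # stands in for sleep(M10) in my test scenario
except SystemExit:
    print("SystemExit raised inside sleep() - same traceback as in my logs")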
Has anyone encountered similar problems? Do you have any idea how to
prevent this behaviour?
Thank you in advance for any response :)