That piece of code is there so that the worker can be terminated by a
SIGTERM (the signal sent by kill, or by systemd when it stops a service;
Ctrl+C actually sends SIGINT), which is useful for development purposes.
It *should* have nothing to do with long-running tasks, but to be honest I
have never had a single task alive for more than an hour. Frankly, I don't
know how to test it: sitting in front of a terminal for 4 days is not
really feasible.
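
What can be checked quickly is the mechanism itself, without waiting for days. Below is a minimal sketch, assuming Python 3.5+ on a POSIX system (the traceback in the quoted message comes from Python 2, so treat this as an illustration, not as scheduler code). The function name run_sleep_with_sigterm and its parameters are made up for this example; only the handler line is taken from the quoted scheduler.py. It installs the same handler, sleeps in the main thread, and has a timer thread deliver SIGTERM as a stand-in for whatever external actor sent the signal.

```python
import signal
import sys
import threading
import time


def run_sleep_with_sigterm(signal_after=0.2, sleep_for=30.0):
    """Sleep in the main thread while SIGTERM is delivered to it.

    Must be called from the main thread. Returns (exit_code, elapsed_seconds);
    exit_code is None if the sleep finished without being interrupted.
    """
    # Same handler as the line quoted from gluon/scheduler.py.
    previous = signal.signal(signal.SIGTERM,
                             lambda signum, stack_frame: sys.exit(1))
    main_thread_id = threading.main_thread().ident
    # Hypothetical sender: a timer thread standing in for kill/systemd.
    timer = threading.Timer(signal_after, signal.pthread_kill,
                            args=(main_thread_id, signal.SIGTERM))
    started = time.monotonic()
    timer.start()
    try:
        time.sleep(sleep_for)  # plays the role of sleep(M10) in the test script
        exit_code = None
    except SystemExit as exc:
        # The handler ran inside the sleep call and raised SystemExit there.
        exit_code = exc.code
    finally:
        timer.cancel()
        signal.signal(signal.SIGTERM, previous)  # restore the old handler
    return exit_code, time.monotonic() - started


if __name__ == "__main__":
    code, elapsed = run_sleep_with_sigterm()
    print("exit code:", code, "after %.1fs" % elapsed)
```

If the sketch behaves as expected, the sleep ends early with SystemExit raised from inside the sleep call, which matches the shape of the traceback in the quoted log. That would suggest the sleep length is not the cause: the handler only fires when something actually sends SIGTERM to the process.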
On Monday, December 12, 2016 at 2:58:47 PM UTC+1, Zbigniew Pomianowski
wrote:
>
> First of all: I decided to use web2py for my purposes because it is awesome
> ;)
> I don't believe this is a web2py bug or anything of the kind. It may be
> more of an OS- or systemd-related issue.
>
> Let me explain what I do and what is the environment. I work in a lab
> where we try to automate many tests on physical devices (like STBs and
> phones).
> I have a single source for master (ubuntu server) and slave servers
> (ubuntu server/desktop). Master is configured with uwsgi+nginx+mysql+web2py
> services. Then I do have slaves that use the same source, but can spawn
> tests within scheduler processes.
>
> I need to connect many physical devices to the slaves (climate chambers,
> Arduino for IR control, v4l2 capture cards, Ethernet-controlled power
> sources, power supply instruments, measurement instruments, and so on).
> I decided to build a GUI using qooxdoo where the user can write Python code
> that allocates physical devices and runs specific test scenarios to examine
> the condition of the DUT (Device Under Test).
> These tests sometimes need to run for tens of hours. So the workflow
> can be described as:
>
> - the user writes a script
> - the test is enqueued as a task in the db (JobGraph works perfectly
> for me because I need to control the execution sequence, mainly because
> of physical devices such as climate chambers; an allocated lab
> instrument cannot be used by two tests at the same time, and JobGraph
> takes care of that)
> - every slave has its unique group-name
> - DUTs and lab instruments are bound to a specific slave, i.e. to its
> scheduler group-name
> - the slave executes the test scenario programmed by the user
> - a test is nothing more than an overridden TestUnit
> - every lab instrument has a child process that logs parameters
> (temperature, humidity, voltage, and so on)
> - for each DUT an instance of a class is also created that spawns child
> processes (video freeze detection based on gstreamer, udp/tcp/telnet
> interface to interact with the STB)
> - the test scenario contains plenty of sleeps: it may demand, for
> example, that the STB stays in a climate chamber for 20h at a specific
> temperature and humidity
>
> My systemd service file looks like this:
> [Unit]
> Description=ATMS workers
> After=network-online.target
> Wants=network-online.target
>
> [Service]
> User=<USER>
> Restart=on-failure
> RestartSec=120
> # usually 0
> Environment=DISPLAY=:<DISPLAY_NB>
> Environment=XAUTHORITY=/home/<USER>/.Xauthority
> EnvironmentFile={{INSTALL}}/web2py_venv/web2py/applications/atms/private/atms.env
> ExecStartPre=/bin/sh -c "${WEB2PYPY} ${WEB2PY} -S atms -M -R ${WEB2PYDIR}/applications/atms/systemd/on_start.py -P"
> ExecStart=/bin/sh -c "${WEB2PYPY} ${WEB2PY} -K atms:%H,atms:%H"
> ExecStop=/bin/sh -c "${WEB2PYPY} ${WEB2PY} -S atms -M -R ${WEB2PYDIR}/applications/atms/systemd/on_stop.py -P"
>
> [Install]
> # graphical because I had to make some kind of preview with ximagesink for
> # a fast check that the video is OK on the STB
> WantedBy=graphical.target
> Alias=atms.service
>
>
> I realised that for a very long test (the last one was planned to run for
> more than 100h) I got something like this in the logs:
> gru 11 12:01:52 slaveX sh[2184]: File "/atms/web2py_venv/web2py/gluon/packages/dal/pydal/adapters/base.py", line 1435, in
> gru 11 12:01:52 slaveX sh[2184]: return str(long(obj))
> gru 11 12:01:52 slaveX sh[2184]: File "/atms/web2py_venv/web2py/gluon/packages/dal/pydal/objects.py", line 82, in <lambda
> gru 11 12:01:52 slaveX sh[2184]: __long__ = lambda self: long(self.get('id'))
> gru 11 12:01:52 slaveX sh[2184]: TypeError: long() argument must be a string or a number, not 'NoneType'
>
> The test was stopped 20h before it was supposed to finish :/
> After some digging I found that before these errors I got this one:
> gru 11 12:01:34 slaveX sh[2184]: ERROR:web2py.app.atms:[(</tmp/taskId10672_caseId852_duts32/test_script.py.TestCase testMethod=test_example>, 'Traceback (most recent call last):\n File "/tmp/taskId10672_caseId852_duts32/test_script.py", line 90, in test_example\n sleep(M10)\n File "/atms/web2py_venv/web2py/gluon/scheduler.py", line 702, in <lambda>\n signal.signal(signal.SIGTERM, lambda signum, stack_frame: sys.exit(1))\nSystemExit: 1\n')]
> gru 11 12:01:34 slaveX sh[2184]: DEBUG:web2py.app.atms: new task report: FAILED
> gru 11 12:01:34 slaveX sh[2184]: DEBUG:web2py.app.atms: traceback: Traceback (most recent call last):
> ... and many, many more tracebacks with errors after that
>
> Line 702 in scheduler.py is:
> signal.signal(signal.SIGTERM, lambda signum, stack_frame: sys.exit(1))
> ...in the scheduler's loop function. What does it mean? Was the process
> stopped because the kernel, systemd, or something else decided to do so?
> Could the long sleep calls have something to do with it?
> Has anyone encountered similar problems? Do you have any idea how to
> prevent such behavior?
>
> Thank you in advance for any response :)
>
>