I'm naturally curious to know the culprit in your env, but unfortunately I
can't reproduce it on any of my systems (*buntu 14.04 + postgresql/sqlite
or Win2012r2 + postgresql/sqlite/mssql).
It needs to be said that I mostly hate mysql (and would never trust it nor
recommend it to the worst of my enemies), but that's totally a personal
bias. I know several people who use mysql and never report any issues
whatsoever.
Dunno if I'm the "most powered user" of the scheduler around, but I run
anywhere from 4 to 20 workers (even 50, but "cheating" with the
redis-backed version) and this kind of issue never happened, except with a
really long task producing a really long stdout output (which, once
identified, is easy to suppress, and btw that problem has mattered less
since a few releases ago).
"My" worker processes are usually alive for more than a week (i.e. same
pid), so it's not a problem of phantom processes or leaks.
Although pretty heavy (resource-wise) compared to pyres, celery, huey, rq
(just to name the "famous" ones), web2py's scheduler, which HAS to work on
every OS supported by web2py, is rather simple: each worker spawns a single
process with a one-element queue to communicate over ... that process
handles a single task, and then dies. Everything is trashed and recreated
at the next task pick-up.
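
In pseudo-code the pattern looks roughly like this (a minimal sketch, not
the actual scheduler source; run_task and the exact timeout handling are my
own simplification):

import multiprocessing

def executor(queue, task, out):
    # child side: run exactly one task, push the result back, then exit
    # (`queue` carries status updates in the real scheduler; unused here)
    out.put(task())

def run_task(task, timeout):
    # a fresh process and fresh queues per task: nothing survives a pick-up
    out = multiprocessing.Queue()
    queue = multiprocessing.Queue(maxsize=1)  # the 1-element channel
    p = multiprocessing.Process(target=executor, args=(queue, task, out))
    p.start()
    p.join(timeout)       # wait at most `timeout` seconds
    if p.is_alive():      # still running -> the task timed out
        p.terminate()
        p.join()          # reap the child so no <defunct> entry lingers
        return None
    return out.get()      # child exited cleanly, result is ready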
On Wednesday, November 2, 2016 at 5:47:42 PM UTC+1, Erwn Ltmann wrote:
>
> Hi Niphlod,
>
> your replies are always a pleasure to me. :)
>
> On Wednesday, November 2, 2016 at 12:00:48 PM UTC+1, Niphlod wrote:
>>
>> I'd say there are a LOT of strange things going on on your system, since
>> you're reporting several different issues that nobody ever faced and all in
>> the last week.
>>
>
> Concerning deadlocks and zombies - right? Both issues show up when using
> the scheduler, not web2py in general. And only when I start more than one
> worker.
>
>> zombie processes shouldn't be there unless you improperly killed a worker
>> process. Python can't really do anything about it, and that's why there's
>> a specific API to kill (or terminate) a worker.
>>
>
> You're right, the killer is the scheduler itself. Why? The scheduler
> terminates a task once it passes the timeout. The timeout happens even
> though the task, as defined, should never hit it. In the zombie situation
> the sub process is stuck in the sem_wait() function (according to pstack).
> I don't know why. But it happens before the 'executor' function is even
> entered, because no debug line is printed at the entry point of that
> function.
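>
> Just to illustrate the mechanics (a standalone demo of mine, not scheduler
> code): a terminate()d child stays in the process table as <defunct> until
> the parent join()s it.
>
>     import multiprocessing, os, time
>
>     def child():
>         time.sleep(3600)  # stand-in for a task that never returns
>
>     if __name__ == '__main__':
>         p = multiprocessing.Process(target=child)
>         p.start()
>         p.terminate()  # what the scheduler does on a timeout
>         time.sleep(1)  # the child is dead now, but not yet reaped ...
>         os.system('ps -o pid,stat,comm --ppid %d' % os.getpid())
>         # ... so ps lists it with STAT 'Z' until p.join() runs
>         p.join()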
>
> Ok. That's all I know. I have different RHEL systems (RHEL 6, RHEL 5)
> with python 2.7.12 and MariaDB. Not really exotic.
>
> Thank you for your endurance
> Erwn
>
>
>
>>
>> On Wednesday, November 2, 2016 at 10:53:58 AM UTC+1, Erwn Ltmann wrote:
>>>
>>> Dear all,
>>>
>>> I'm astonished that a lot of sub processes spawned by the scheduler
>>> workers never finish.
>>>
>>> pstree -p 16731
>>>
>>>>
>>>> bash(16731)---python2.7(24545)-+-python2.7(24564)---{python2.7}(24565)
>>>>                                |-python2.7(24572)-+-python2.7(1110)
>>>>                                |                  |-python2.7(8647)
>>>>                                |                  |-python2.7(11747)
>>>>                                |                  |-python2.7(14117)
>>>>                                |                  |-python2.7(14302)
>>>>
>>>
>>> Pid 16731 is my shell, from which I started the scheduler with four
>>> workers:
>>>
>>> w2p -K arm:ticker,arm,arm,arm
>>>>
>>>
>>> Pid 24564 is the ticker worker (it only holds the ticker) and 24572 is
>>> one of the three standard workers that process my task's function.
>>>
>>> My first focus was on the task function itself. But even if I clip the
>>> function ('return True' right at its start) the zombies are still there.
>>> My next analysis step was to print the pid at the entry point of the
>>> 'executor' function of scheduler.py. For the zombie processes this debug
>>> point is never reached. Next I printed the list of zombie processes
>>> (multiprocessing.active_children()) at the exit point of tasks which
>>> passed the timeout (see the async function). That's the point in the
>>> scheduler code where 'task timeout' is printed. The timeout itself is
>>> expected, given a process that never returns a result. But how is it
>>> possible?
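>>>
>>> (Note that multiprocessing.active_children() also join()s children that
>>> have already finished, as a side effect, so every process it still lists
>>> is genuinely alive - i.e. stuck - and not just an unreaped corpse. My
>>> debug line is something along these lines, assuming the scheduler's
>>> logger is in scope:)
>>>
>>>> for p in multiprocessing.active_children():
>>>>     logger.debug('still alive: %s pid=%s' % (p.name, p.pid))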
>>>
>>> Here's the output of my extra debug line in the timeout branch of the
>>> async function:
>>>
>>>> 09:09:47.752 [24576] Process-4:488,
>>>> 09:14:28.907 [24576] Process-4:488, Process-4:1125,
>>>> 09:15:59.526 [24576] Process-4:488, Process-4:1125, Process-4:1301,
>>>> 09:20:35.924 [24576] Process-4:488, Process-4:1880, Process-4:1125, Process-4:1301,
>>>>
>>>
>>> Why is the 'executor' function never even entered?
>>>
>>>> def async(self, task):
>>>>     ...
>>>>     out = multiprocessing.Queue()
>>>>     queue = multiprocessing.Queue(maxsize=1)
>>>>     p = multiprocessing.Process(target=executor, args=(queue, task, out))
>>>>     ...
>>>>     if p.is_alive():
>>>>         p.terminate()
>>>>         logger.debug(' +- Zombie (%s)' % multiprocessing.active_children())
>>>>
>>>
>>> And here is the extra debug line in executor:
>>>
>>>> def executor(queue, task, out):
>>>>     """The function used to execute tasks in the background process."""
>>>>     logger.debug('    task started PID:%s -> %s' % (os.getppid(), os.getpid()))
>>>>     ...
>>>>
>>>
>>> Of course, I have to stress the scheduler to produce zombies. The rate
>>> is 1 in 1000 tasks - in my case, 25 times per hour!
>>>
>>> Can anybody clarify this? Maybe it's a pure python issue.
>>>
>>> Thx,
>>> Erwn
>>>
>>