[sage-devel] Re: zombie problem solved (hopefully)

Robert Bradshaw Wed, 08 Nov 2006 13:22:58 -0800

On Wed, 8 Nov 2006, William Stein wrote:

> On Wed, 08 Nov 2006 07:27:39 -0800, Gonzalo Tornaria
> <[EMAIL PROTECTED]> wrote:
>
>> On Wed, Nov 08, 2006 at 02:23:18AM -0800, William Stein wrote:
>>> Good question.  Longterm there are a couple of issues:
>>>
>>>   (1) How do you tell the monitor about new processes that get spawned?
>>>       You could put that info in a temp file, but that feels a little
>>>       clunky.
>>
>> You can read from a pipe or some other form of IPC (e.g. unix socket)
>
> Yes, that could work.


This is what I was envisioning too.
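
A minimal sketch of the pipe/socket idea, assuming a Unix datagram socket at a path that both the monitor and the spawning process agree on. The function names and the one-PID-per-datagram message format are invented for illustration and are not taken from any existing SAGE code.

```python
import os
import socket
import tempfile


def make_listener(path):
    """Monitor side: create a Unix datagram socket bound to `path`."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    sock.bind(path)
    return sock


def register_pid(path, pid):
    """Spawner side: tell the monitor about a newly spawned process."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as client:
        client.sendto(str(pid).encode("ascii"), path)


def receive_pid(sock):
    """Monitor side: block until one registration arrives and return the PID."""
    data, _ = sock.recvfrom(64)
    return int(data.decode("ascii"))


if __name__ == "__main__":
    # Hypothetical socket location; a real monitor would pick a fixed path.
    path = os.path.join(tempfile.mkdtemp(), "monitor.sock")
    listener = make_listener(path)
    register_pid(path, os.getpid())
    print("monitor now watching", receive_pid(listener))
    listener.close()
    os.unlink(path)
```

Compared with a temp file, the monitor simply blocks on the socket and needs no polling or file cleanup per process.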

>
>>>   (3) I want to continue to support the spawned processes running on
>>>       other computers (or as different users) via ssh.  With a separate
>>>       monitor for each spawned process this is possible (though two ssh
>>>       sessions would be needed).  This isn't possible if there is only
>>>       one monitor, since it can only run on one computer.
>>
>> Just run one monitor for each host.
>
> OK, but then what about question 1 again?  In particular, telling a monitor
> over the network about new processes would be complicated.

I don't think the monitor would have to run on the client computer. Rather,
if I sent several jobs to, say, sage.math, there could be a single monitor
reached over one ssh connection to sage.math that monitors all of them.
Then there would be one monitor per host.

>
>>>   (4) For reasons I don't understand, the slave process doesn't really
>>>       die until the monitor exits.  If in the monitor script instead of
>>>       doing a sys.exit(0) after the kill, I continue the monitor running,
>>>       then the process the monitor is watching doesn't terminate as it
>>>       should.  This is on OS X Intel, and is rather odd, but isn't an
>>>       issue with the 1-monitor per process model.
>>
>> See wait(2) and waitpid(2) (and also wait3 and wait4 for resource
>> information).
>>
>> Essentially, when a process dies, it stays in "zombie" state so that
>> one can (a) get exit status (b) get resource usage information (c)
>> dump core IIRC, etc.
>>
>> The usual trick to spawn e.g. a daemon is to fork / setsid(2) / fork,
>> run the process in question as a grandchild, and let the child die;
>> because of the setsid(2) call, the process is not adopted by its
>> grandparent, but by the init process, which is supposed to clean up on
>> exit of any process.  [ See also setsid(8) ]
>>
>> Since we are talking about a monitor, the sensible thing is that the
>> monitor waits for all its subprocesses.
>
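
For reference, the fork / setsid / fork trick described above can be sketched in Python roughly as follows. The function name spawn_detached is hypothetical. Note that the grandchild is re-parented to init because the intermediate child exits; the setsid call is what detaches it from the original session and controlling terminal.

```python
import os


def spawn_detached(argv):
    """Run argv as a detached grandchild; the caller never has to wait on it."""
    pid = os.fork()
    if pid > 0:
        # Original process: reap the short-lived intermediate child right
        # away so that it does not linger as a zombie.
        os.waitpid(pid, 0)
        return

    # Intermediate child: start a new session, then fork once more.
    os.setsid()
    if os.fork() > 0:
        os._exit(0)  # intermediate child exits immediately

    # Grandchild: now an orphan, so init (or a subreaper) adopts it and
    # reaps it when it eventually exits.
    try:
        os.execvp(argv[0], argv)
    finally:
        # Only reached if exec failed.
        os._exit(127)
```

The price of this approach is that the original process can no longer wait on the program it started, so it gets neither the exit status nor the resource usage.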
> One point that might not have been clear from my previous posting
> is that the monitor does not have any subprocesses.  The gap/gp/magma,
> etc., process that it monitors is a sibling rather than a subprocess.
>
>> In addition, the monitor can get information about resource usage,
>> which could be interesting.
>
> Yes.
>
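
A small sketch of collecting resource usage with wait4, assuming the monitor is the parent of the process it watches. That is not the sibling arrangement described above, since wait only works on one's own children. The name run_and_account is made up for illustration.

```python
import os


def run_and_account(argv):
    """Run argv as a child, reap it, and return (exit_code, rusage)."""
    pid = os.fork()
    if pid == 0:
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed

    # wait4 reaps the child (so no zombie is left behind) and also returns
    # the resource usage the kernel accumulated for it.
    _, status, rusage = os.wait4(pid, 0)
    if os.WIFEXITED(status):
        code = os.WEXITSTATUS(status)
    else:
        code = -os.WTERMSIG(status)  # negative signal number if killed
    return code, rusage
```

The returned rusage object carries fields such as ru_utime, ru_stime and ru_maxrss.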
>>>   (5) The overhead is minimal -- it really is only 2MB to run a minimal
>>>       Python process.
>>
>> However small, it's still O(n).
>
> Yes but for a typically running SAGE program n is about 3-4, at most.
> There's no reason in SAGE to launch numerous subprocesses.

Maybe I'm just not used to the idea of having 64GB of RAM yet, but 2MB,
though it won't bring a system to its knees, doesn't seem tiny to me.
Maybe it is because of orphan processes, but I often see more than 3-4
sage processes running at a time. E.g. one might have a notebook with
several worksheets (each of which has its own sage subprocess), which may
in turn have, say, magma and mathematica and maxima running.

>
>> BTW, isn't it better to kill -15 first, wait some time, then kill -9
>> (give a chance to cleanup in case there is a SIGTERM handler)
>
> Yes.  Good point.
>
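
The kill -15, wait, kill -9 escalation could look something like this. It is only a sketch, assuming the process was started through subprocess.Popen by the code doing the killing; the function name and the default grace period are arbitrary.

```python
import subprocess


def terminate_gently(proc, grace=5.0):
    """Send SIGTERM, wait up to `grace` seconds, then fall back to SIGKILL.

    Returns the process's return code (negative signal number if killed).
    """
    proc.terminate()  # SIGTERM, i.e. kill -15: lets a handler clean up
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL, i.e. kill -9: cannot be caught or ignored
        return proc.wait()
```

Both wait calls also reap the process, so it does not stay around as a zombie afterwards.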
> Many thanks for your email.
>
> Anyway, this process monitor thing is a completely general purpose
> unix tool.  It really a priori has nothing to do with SAGE.  Most
> of the suggestions on the list are to turn it from what I wrote
> into a generic daemon.  I wonder -- has such a generic daemon for
> process monitoring *already* been written and I just don't know
> about it?
>
> William

