[issue18120] multiprocessing: garbage collector fails to GC Pipe() end when spawning child process

2013-06-03 Thread Richard Oudkerk

Richard Oudkerk added the comment:

On 03/06/2013 1:02am, spresse1 wrote:
> What's really bugging me is that it remains open and I can't fetch a
> reference.  If I could do either of these, I'd be happy.
> ...
> Perhaps I really want to be implementing with os.fork().  Sigh, I was
> trying to save myself some effort...

I don't see how using os.fork() would make things any easier.  In either 
case you need to prepare a list of fds which the child process should 
close before it starts, or alternatively a list of fds *not* to close.

The real issue is that there is no way for multiprocessing (or 
os.fork()) to automatically infer which fds the child process is going 
to use: if you don't explicitly close the unneeded ones, the child process 
will inherit all of them.

It might be helpful if multiprocessing exposed a function to close all 
fds except those specified -- see close_all_fds_except() at

http://hg.python.org/sandbox/sbt/file/5d4397a38445/Lib/multiprocessing/popen_spawn_posix.py#l81

Remembering not to close stdout (fd=1) and stderr (fd=2), you could use 
it like

    def foo(reader):
        close_all_fds_except([1, 2, reader.fileno()])
        ...

    r, w = Pipe(False)
    p = Process(target=foo, args=(r,))

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18120
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18120] multiprocessing: garbage collector fails to GC Pipe() end when spawning child process

2013-06-03 Thread spresse1

spresse1 added the comment:

> I don't see how using os.fork() would make things any easier.  In either
> case you need to prepare a list of fds which the child process should
> close before it starts, or alternatively a list of fds *not* to close.

With fork() I control where the processes diverge much more readily.  I could 
create the pipe in the main process, fork, close unnecessary fds, then call 
into the class that represents the operation of the subprocess (i.e., do it the 
C way).  This way the class never needs to know about pipes it doesn't care 
about, and I can ensure that unnecessary pipes get closed.  So I get the clean, 
understandable semantics I was after and my pipes get closed.  The only thing I 
lose is Windows interoperability.

I could reimplement the close_all_fds_except() call (in straight Python, using 
os.closerange()).  That seems like a reasonable solution, if a bit of a hack.  
However, given that pipes are exposed by multiprocessing, would it make sense 
to try to get this function incorporated into multiprocessing itself?

I also think that with introspection it would be possible for the 
multiprocessing module to be aware of which file descriptors are still actively 
referenced (i.e., 0, 1, 2 are always referenced; introspect through objects in 
the child to see if they have a fileno() method).  However, I can't state this 
as a certainty without going off and actually implementing such a version.  
Additionally, I can make absolutely no promises as to the speed of this.  
Perhaps, if it functioned, it would be an option one could turn on for cases 
like mine.

--




[issue18120] multiprocessing: garbage collector fails to GC Pipe() end when spawning child process

2013-06-03 Thread Richard Oudkerk

Richard Oudkerk added the comment:

On 03/06/2013 3:07pm, spresse1 wrote:
> I could reimplement the close_all_fds_except() call (in straight Python,
> using os.closerange()).  That seems like a reasonable solution, if a bit
> of a hack.  However, given that pipes are exposed by multiprocessing,
> would it make sense to try to get this function incorporated into
> multiprocessing itself?

close_all_fds_except() is already pure python:

    try:
        MAXFD = os.sysconf("SC_OPEN_MAX")
    except Exception:
        MAXFD = 256

    def close_all_fds_except(fds):
        fds = list(fds) + [-1, MAXFD]
        fds.sort()
        for i in range(len(fds) - 1):
            os.closerange(fds[i] + 1, fds[i + 1])
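
The sentinel trick in that function (bracketing the kept fds with -1 and MAXFD, 
sorting, and closing each gap with os.closerange()) can be illustrated with a 
side-effect-free sketch that only computes the ranges. The helper name here is 
mine, not part of multiprocessing:

```python
def closerange_args(keep_fds, maxfd=256):
    # Bracket the fds to keep with the sentinels -1 and maxfd, sort,
    # and every gap between neighbours becomes a half-open range
    # [lo + 1, hi) that os.closerange() would be asked to close.
    fds = sorted(list(keep_fds) + [-1, maxfd])
    return [(fds[i] + 1, fds[i + 1])
            for i in range(len(fds) - 1)
            if fds[i] + 1 < fds[i + 1]]

# Keeping stdout (1), stderr (2) and fd 5, with a maxfd of 10:
print(closerange_args([1, 2, 5], 10))  # [(0, 1), (3, 5), (6, 10)]
```

Note that fd 0 (stdin) gets closed unless it is listed, consistent with the 
reminder above to keep at least 1 and 2.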

> I also think that with introspection it would be possible for the
> multiprocessing module to be aware of which file descriptors are still
> actively referenced (i.e., 0, 1, 2 are always referenced; introspect
> through objects in the child to see if they have a fileno() method).
> However, I can't state this as a certainty without going off and
> actually implementing such a version.  Additionally, I can make
> absolutely no promises as to the speed of this.  Perhaps, if it
> functioned, it would be an option one could turn on for cases like mine.

So you want a way to visit all objects directly or indirectly referenced 
by the process object, so you can check whether they have a fileno() 
method?  At the C level all object types which support GC define a 
tp_traverse function, so maybe that could be made available from pure 
Python.

But really, this sounds rather fragile.

--




[issue18120] multiprocessing: garbage collector fails to GC Pipe() end when spawning child process

2013-06-03 Thread spresse1

spresse1 added the comment:

Oooh, thanks.  I'll use that.

 But really, this sounds rather fragile.

Absolutely.  I concur there is no good way to do this.

--




[issue18120] multiprocessing: garbage collector fails to GC Pipe() end when spawning child process

2013-06-03 Thread Richard Oudkerk

Richard Oudkerk added the comment:

Actually, you can use gc.get_referents(obj) which returns the direct children 
of obj (and is presumably implemented using tp_traverse).
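
A rough sketch of that traversal idea, assuming nothing beyond 
gc.get_referents(); the helper name and the object cap are mine:

```python
import gc
import os

def find_filelike(root, max_objects=100000):
    # Depth-first walk of everything reachable from root via
    # gc.get_referents(), collecting objects that expose a fileno()
    # method.  The cap keeps the walk from crawling the whole heap.
    seen, stack, found = set(), [root], []
    while stack and len(seen) < max_objects:
        obj = stack.pop()
        if id(obj) in seen:
            continue
        seen.add(id(obj))
        if hasattr(obj, "fileno"):
            found.append(obj)
        stack.extend(gc.get_referents(obj))
    return found

class Holder:
    pass

h = Holder()
h.f = open(os.devnull, "rb")
hits = find_filelike(h)
print(any(obj is h.f for obj in hits))  # True
h.f.close()
```

As noted earlier in the thread, this stays fragile: an object can hold an fd 
without exposing fileno(), and extension types need not support traversal.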

I will close.

--
resolution:  -> rejected
stage:  -> committed/rejected
status: open -> closed




[issue18120] multiprocessing: garbage collector fails to GC Pipe() end when spawning child process

2013-06-02 Thread spresse1

New submission from spresse1:

[Code demonstrating issue attached]

When overloading multiprocessing.Process and using pipes, a reference to a pipe 
spawned in the parent is not properly garbage collected in the child.  This 
causes the write end of the pipe to be held open with no reference to it in the 
child process, and therefore no way to close it.  Therefore, it can never throw 
EOFError.

Expected behavior:
1. Create a pipe with multiprocessing.Pipe(False)
2. Pass read end to a class which subclasses multiprocessing.Process
3. Close write end in parent process
4. Receive EOFError from read end

Actual behavior:
1. Create a pipe with multiprocessing.Pipe(False)
2. Pass read end to a class which subclasses multiprocessing.Process
3. Close write end in parent process
4. Never receive EOFError from read end

Examining /proc/[pid]/fd/ for each process indicates that a write pipe is 
still open in the child process, though none should be.  Additionally, no write 
pipe is open in the parent process.  It is my belief that this is the write 
pipe spawned in the parent, which is incorrectly remaining open in the child 
even though there are no references to it.

Tested on 2.7.3 and 3.2.3
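
The attached bugon.tar.gz is not reproduced here, but the reported scenario can 
be sketched roughly as follows (names are illustrative; the fork start method 
is forced explicitly to match the fork()-based behaviour under discussion):

```python
import multiprocessing

# Force the fork start method: in 2013 it was the only POSIX start
# method, and the reported behaviour depends on fd inheritance via fork().
ctx = multiprocessing.get_context("fork")

class Reader(ctx.Process):
    def __init__(self, conn):
        super().__init__()
        self.conn = conn

    def run(self):
        # Only the read end was passed in, but fork() also copied the
        # parent's open write-end fd into this process, so recv()
        # blocks forever instead of raising EOFError.
        try:
            while True:
                self.conn.recv()
        except EOFError:
            pass

r, w = ctx.Pipe(False)
p = Reader(r)
p.start()
r.close()          # parent does not read
w.close()          # step 3: close the write end in the parent
p.join(timeout=1)  # step 4 never happens: the child is still blocked
alive_after_close = p.is_alive()
print(alive_after_close)  # True: the inherited write end keeps the pipe open
p.terminate()
p.join()
```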

--
components: Library (Lib)
files: bugon.tar.gz
messages: 190492
nosy: spresse1
priority: normal
severity: normal
status: open
title: multiprocessing: garbage collector fails to GC Pipe() end when spawning 
child process
versions: Python 2.7, Python 3.2
Added file: http://bugs.python.org/file30448/bugon.tar.gz




[issue18120] multiprocessing: garbage collector fails to GC Pipe() end when spawning child process

2013-06-02 Thread Matthias Lee

Changes by Matthias Lee matthias.a@gmail.com:


--
nosy: +madmaze




[issue18120] multiprocessing: garbage collector fails to GC Pipe() end when spawning child process

2013-06-02 Thread spresse1

spresse1 added the comment:

Now also tested with source-built python 3.3.2.  Issue still exists, same 
example files.

--




[issue18120] multiprocessing: garbage collector fails to GC Pipe() end when spawning child process

2013-06-02 Thread Richard Oudkerk

Richard Oudkerk added the comment:

The way to deal with this is to pass the write end of the pipe to the child 
process so that the child process can explicitly close it -- there is no reason 
to expect garbage collection to make this happen automatically.
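
A minimal sketch of that workaround (illustrative code using the fork start 
method; both connection ends are passed so the child can explicitly close the 
one it does not use):

```python
import multiprocessing

ctx = multiprocessing.get_context("fork")

def child(reader, writer):
    # Close the inherited copy of the write end straight away; once the
    # parent closes its own copy too, no open write end remains and
    # recv() raises EOFError as expected.
    writer.close()
    received = []
    try:
        while True:
            received.append(reader.recv())
    except EOFError:
        pass
    assert received == ["hello"]

r, w = ctx.Pipe(False)
p = ctx.Process(target=child, args=(r, w))  # pass BOTH ends to the child
p.start()
r.close()        # parent does not read
w.send("hello")
w.close()        # now the child holds the only open ends of the pipe
p.join(timeout=5)
exit_ok = (p.exitcode == 0)
print(exit_ok)   # True: the child saw EOFError and exited cleanly
```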

You don't explain the difference between functional.py and nonfunctional.py.  
The most obvious thing is that nonfunctional.py seems to have messed-up 
indentation: you have a while loop in the class body instead of in 
the run() method.

--
nosy: +sbt




[issue18120] multiprocessing: garbage collector fails to GC Pipe() end when spawning child process

2013-06-02 Thread spresse1

spresse1 added the comment:

The difference is that nonfunctional.py does not pass the write end of the 
parent's pipe to the child.  functional.py does, and closes it immediately 
after forking into a new process.  This is what you mentioned to me as a 
workaround.  Corrected code (for indentation) attached.

Why SHOULDN'T I expect this pipe to be closed automatically in the child?  Per 
the documentation for multiprocessing.Connection.close():
"This is called automatically when the connection is garbage collected."

The write end of that pipe goes out of scope and has no references in the 
child process.  Therefore, per my understanding, it should be garbage 
collected (in the child process).  Where am I wrong about this?

--
Added file: http://bugs.python.org/file30449/bugon.tar.gz




[issue18120] multiprocessing: garbage collector fails to GC Pipe() end when spawning child process

2013-06-02 Thread Richard Oudkerk

Richard Oudkerk added the comment:

> The write end of that pipe goes out of scope and has no references in
> the child process.  Therefore, per my understanding, it should be
> garbage collected (in the child process).  Where am I wrong about this?

The function which starts the child process by (indirectly) invoking os.fork() 
never gets a chance to finish in the child process, so nothing goes out of 
scope.

Anyway, relying on garbage collection to close resources for you is always a 
bit dodgy.

--




[issue18120] multiprocessing: garbage collector fails to GC Pipe() end when spawning child process

2013-06-02 Thread spresse1

spresse1 added the comment:

So you're telling me that when I spawn a new child process, I have to deal with 
the entirety of my parent process's memory staying around forever?  I would 
have expected this to call fork(), which gives the child plenty of chance to 
clean up, then call exec(), which loads the new executable.  Either that, or 
the same instance of the Python interpreter is used, just with the knowledge 
that it should execute the child function and then exit.  Keeping around all 
the state that will never be used in the second case seems sloppy on the part 
of Python.

The semantics in this case are much better if the pipe gets GC'd.  I see no 
reason my child process should have to know about pipe ends it never uses in 
order to close them.

--




[issue18120] multiprocessing: garbage collector fails to GC Pipe() end when spawning child process

2013-06-02 Thread Richard Oudkerk

Richard Oudkerk added the comment:

> So you're telling me that when I spawn a new child process, I have to
> deal with the entirety of my parent process's memory staying around
> forever?

With a copy-on-write implementation of fork() this is quite likely to use less 
memory than starting a fresh process for the child.  And it is certainly much 
faster.

> I would have expected this to call fork(), which gives the child
> plenty of chance to clean up, then call exec() which loads the new
> executable.

There is an experimental branch (http://hg.python.org/sandbox/sbt) which 
optionally behaves like that.  Note that "clean up" means closing all fds not 
explicitly passed, and has nothing to do with garbage collection.

--




[issue18120] multiprocessing: garbage collector fails to GC Pipe() end when spawning child process

2013-06-02 Thread spresse1

spresse1 added the comment:

>> So you're telling me that when I spawn a new child process, I have to
>> deal with the entirety of my parent process's memory staying around
>> forever?
>
> With a copy-on-write implementation of fork() this is quite likely to
> use less memory than starting a fresh process for the child.  And it is
> certainly much faster.

Fair enough.

>> I would have expected this to call fork(), which gives the child
>> plenty of chance to clean up, then call exec() which loads the new
>> executable.
>
> There is an experimental branch (http://hg.python.org/sandbox/sbt)
> which optionally behaves like that.  Note that "clean up" means closing
> all fds not explicitly passed, and has nothing to do with garbage
> collection.

I appreciate the pointer, but I am writing code intended for distribution - 
using an experimental branch isn't useful.

What I'm still trying to grasp is why Python explicitly leaves the parent 
process's info around in the child.  It seems like there is no benefit 
(besides, perhaps, speed), and that this choice leads to non-intuitive 
behavior - like this.

--




[issue18120] multiprocessing: garbage collector fails to GC Pipe() end when spawning child process

2013-06-02 Thread Richard Oudkerk

Richard Oudkerk added the comment:

> What I'm still trying to grasp is why Python explicitly leaves the
> parent process's info around in the child.  It seems like there is
> no benefit (besides, perhaps, speed), and that this choice leads to
> non-intuitive behavior - like this.

The Windows implementation does not use fork() but still exhibits the 
same behaviour in this respect (except in the experimental branch 
mentioned before).  The real issue is that fds/handles will get 
inherited by the child process unless you explicitly close them. 
(Actually, on Windows you need to find a way to inject specific handles 
from the parent into the child process.)

The behaviour you call non-intuitive is natural to someone used to using 
fork() and pipes on Unix.  multiprocessing really started as a 
cross-platform work-around for the lack of fork() on Windows.

Using fork() is also a lot more flexible: many things that work fine on 
Unix will not work correctly on Windows because of pickle-issues.

The main problem with fork() is that forking a process with multiple 
threads can be problematic.

--




[issue18120] multiprocessing: garbage collector fails to GC Pipe() end when spawning child process

2013-06-02 Thread spresse1

spresse1 added the comment:

I'm actually a *nix programmer by trade, so I'm pretty familiar with that 
behavior =p  However, I'm also used to inheriting some way to refer to these 
fds, so that I can close them.  Perhaps I've just missed somewhere a call to 
ask the process for a list of open fds?  That would, to me, be an acceptable 
workaround - I could close all the fds I didn't wish to inherit.

What's really bugging me is that it remains open and I can't fetch a 
reference.  If I could do either of these, I'd be happy.

Maybe this is more an issue with the semantics of multiprocessing?  In that 
this behavior is perfectly reasonable with os.fork() but makes some difficulty 
here.

Perhaps I really want to be implementing with os.fork().  Sigh, I was trying 
to save myself some effort...

--
