subject:"\[issue6721\] Locks in the standard library should be sanitized on fork"


Connor Wolf added the comment:

> Python 3.5.1+ (default, Mar 30 2016, 22:46:26)

Whatever the stock 3.5 on ubuntu 16.04 x64 is.

I've actually been running into a whole horde of really bizarre issues related 
to what I /think/ is locking in stdout. 

Basically, I have a context where I have thousands and thousands of (relatively 
short lived) `multiprocessing.Process()` processes, and over time they all get 
wedged (basically, I have ~4-32 processes alive at any time, but they all get 
recycled every few minutes).

After doing some horrible 
(https://github.com/fake-name/ReadableWebProxy/blob/master/logSetup.py#L21-L78) 
hackery in the logging module, I'm not seeing processes get wedged there, but I 
do still encounter issues with what I can only assume is a lock in the print 
statement. I'm hooking into a wedged process using 
[pystuck](https://github.com/alonho/pystuck)

durr@rwpscrape:/media/Storage/Scripts/ReadableWebProxy⟫ pystuck --port 6675
Welcome to the pystuck interactive shell.
Use the 'modules' dictionary to access remote modules (like 'os', or '__main__')
Use the `%show threads` magic to display all thread stack traces.

In [1]: show threads
<_MainThread(MainThread, started 140574012434176)>
  File "runScrape.py", line 74, in 
go()
  File "runScrape.py", line 57, in go
runner.run()
  File "/media/Storage/Scripts/ReadableWebProxy/WebMirror/Runner.py", line 453, 
in run
living = sum([manager.check_run_jobs() for manager in managers])
  File "/media/Storage/Scripts/ReadableWebProxy/WebMirror/Runner.py", line 453, 
in 
living = sum([manager.check_run_jobs() for manager in managers])
  File "/media/Storage/Scripts/ReadableWebProxy/WebMirror/Runner.py", line 364, 
in check_run_jobs
proc.start()
  File "/usr/lib/python3.5/multiprocessing/process.py", line 105, in start
self._popen = self._Popen(self)
  File "/usr/lib/python3.5/multiprocessing/context.py", line 212, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
  File "/usr/lib/python3.5/multiprocessing/context.py", line 267, in _Popen
return Popen(process_obj)
  File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 20, in __init__
self._launch(process_obj)
  File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 74, in _launch
code = process_obj._bootstrap()
  File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
self.run()
  File "/usr/lib/python3.5/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
  File "/media/Storage/Scripts/ReadableWebProxy/WebMirror/Runner.py", line 145, 
in run
run.go()
  File "/media/Storage/Scripts/ReadableWebProxy/WebMirror/Runner.py", line 101, 
in go
self.log.info("RunInstance starting!")
  File "/usr/lib/python3.5/logging/__init__.py", line 1279, in info
self._log(INFO, msg, args, **kwargs)
  File "/usr/lib/python3.5/logging/__init__.py", line 1415, in _log
self.handle(record)
  File "/usr/lib/python3.5/logging/__init__.py", line 1425, in handle
self.callHandlers(record)
  File "/usr/lib/python3.5/logging/__init__.py", line 1487, in callHandlers
hdlr.handle(record)
  File "/usr/lib/python3.5/logging/__init__.py", line 855, in handle
self.emit(record)
  File "/media/Storage/Scripts/ReadableWebProxy/logSetup.py", line 134, in emit
print(outstr)


  File "/usr/lib/python3.5/threading.py", line 882, in _bootstrap
self._bootstrap_inner()
  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
self.run()
  File "/usr/lib/python3.5/threading.py", line 862, in run
self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.5/dist-packages/rpyc/utils/server.py", line 241, 
in start
self.accept()
  File "/usr/local/lib/python3.5/dist-packages/rpyc/utils/server.py", line 128, 
in accept
sock, addrinfo = self.listener.accept()
  File "/usr/lib/python3.5/socket.py", line 195, in accept
fd, addr = self._accept()


  File "/usr/local/lib/python3.5/dist-packages/pystuck/thread_probe.py", line 
15, in thread_frame_generator
yield (thread_, frame)


So, somehow the print() statement is blocking, which I have /no/ idea how to go 
about debugging. I assume there's a lock /in/ the print statement function 
call, and I'm probably going to look into wrapping both the print() call and 
the multiprocessing.Process() call  execution in a single, shared 
multiprocessing lock, but that
seems like a very patchwork solution to something that should just work.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in the standard library should be sanitized on fork

2016-07-08 Thread Gregory P. Smith


Gregory P. Smith added the comment:

My intent is not to block anything.  I'm Just explaining why I'm not motivated 
to spend much time on this issue myself.  Others are welcome to.

subprocess is not related to this issue, it has been fixed for use with threads 
(in 3.2 and higher) with an extremely widely used drop in replacement back-port 
for 2.7 https://pypi.python.org/pypi/subprocess32.  But even 2.7's poor 
subprocess implementation never triggered this specific issue in the first 
place (unless someone logged from a pre_exec_fn which would be a laughable 
thing to do anyways).

multiprocessing: It has spawn (as of 3.4) and forkserver methods both of which 
can help avoid this issue.  Caveats: spawn understandably has negative 
performance implications and forkserver requires the forkserver to be forked 
before threads potentially holding locks have been started.

As for the gross hacky monkey patching workaround: That was the approach I took 
in 
https://github.com/google/python-atfork/blob/master/atfork/stdlib_fixer.py#L51

Definitely a hack, but one that does work on existing interpreters.

Conner & lesha: Which Python version(s) are you using?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in the standard library should be sanitized on fork

2016-07-08 Thread lesha


lesha added the comment:

On a moment's reflection, a lot of my prior comment is wrong. Sorry about that. 

- glog does not, that I know of, sanitize locks on fork. You just shouldn't log 
after fork but before exec.

- Using `pthread_atfork` to clean up the `logging` lock might be enough to make 
it safe from the "just forked" context, but without adding fairly exhaustive 
tests around this logic, it would be fragile with respect to further 
improvements to `logging`. So even just making this one library safe is a 
considerable amount of work.

So I retract most of my previous opinion. Sorry.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in the standard library should be sanitized on fork

2016-07-08 Thread lesha

lesha added the comment:

I work on a multi-million-line C++ codebase that uses fork() from multithreaded
programs all over the place. We use `glog` ubiquitously.

This bug here that spans 6 years and has affected dozens of people
(conservatively) simply does not exist for us. That is because glog takes the
pragmatic approach of sanitizing its mutex on fork:

https://github.com/google/glog/blob/4d391fe692ae6b9e0105f473945c415a3ce5a401/src/base/mutex.h#L249

In my opinion, "thou shalt never fork() in a threaded program" is impractical
purism. It is good to be aware of the dangers that lie therein, but it is
completely possible to safely spawn **subprocesses** from multithreaded
programs on modern OSes like Linux.

Python's subprocess **ought** to be safe to use in threaded programs. Any
issues with this (aside from `pre_exec_fn`, obviously) are bugs in
`subprocess`.

Here is a C++ implementation of the concept that can be safely used in threaded
programs:

https://github.com/facebook/folly/blob/master/folly/Subprocess.cpp

Note that unlike Python's subprocess `pre_exec_fn`, the C++ analog is very loud
in warning you about the scary world in which your closure will execute:

https://github.com/facebook/folly/blob/master/folly/Subprocess.h#L252

The point to my message is simple: there is a pragmatic way to save hundreds of
debugging hours for users of Python. People are going to find it necessary to
do such things from time to time, so instead of telling them to redesign their
programs, it is better to give them a safer tool.

Taking the glog approach in `logging` has no cost to the standard library, but
it does have real world benefits.

Please don't block shipping such a fix.

___
Python tracker

___
___
Python-bugs-list mailing list
Unsubscribe:
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in the standard library should be sanitized on fork


Connor Wolf added the comment:

Arrrgh, s/threading/multiprocessing/g in my last message.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in the standard library should be sanitized on fork


Connor Wolf added the comment:

> IMNSHO, working on "fixes" for this issue while ignoring the larger 
> application design flaw elephant in the room doesn't make a lot of sense.

I understand the desire for a canonically "correct" fix, but it seems the issue 
with fixing it "correctly" has lead to the /actual/ implementation being broken 
for at least 6 years now.

As it is, my options are:
A. Rewrite the many, many libraries I use that internally spawn threads.
B. Not use multiprocessing.

(A) is prohibitive from a time perspective (I don't even know how many 
libraries I'd have to rewrite!), and (B) means I'd get 1/24-th of my VMs 
performance, so it's somewhat prohibitive.

At the moment, I've thrown together a horrible, horrible fix where I reach into 
the logging library (which is where I'm seeing deadlocks), and manually iterate 
over all attached log managers, resetting the locks in each immediately when 
each process spawns. 
This is, I think it can be agreed, a horrible, horrible hack, but in my 
particular case it works (the worst case result is garbled console output for a 
line or two). 

---

If a canonical fix is not possible, at least add a facility to the threading 
fork() call that lets the user decide what to do. In my case, my program is 
wedging in the logging system, and I am entirely OK with having transiently 
garbled logs, if it means I don't wind up deadlocking and having to force kill 
the interpreter (which is, I think, far /more/ destructive an action).

If I could basically do `multiprocessing.Process(*args, *kwargs, 
_clear_locks=True)`, that would be entirely sufficient, and not change existing 
behaviour at all.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in the standard library should be sanitized on fork

2016-07-08 Thread Gregory P. Smith


Gregory P. Smith added the comment:

For me the momentum on fixing these things has stalled because I no longer work 
on any code that runs into this.  There is a fundamental problem: You cannot 
safely use threading and os.fork() in the same application per POSIX rules.  So 
even if the standard library and interpreter to tried to force its locks into 
some sort of consistent state post os.fork(), the much more fundamental POSIX 
problem remains.

IMNSHO, working on "fixes" for this issue while ignoring the larger application 
design flaw elephant in the room doesn't make a lot of sense.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in the standard library should be sanitized on fork


Connor Wolf added the comment:

Is anything happening with these fixes? This is still an issue (I'm running 
into it now)?

--
nosy: +Connor.Wolf

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in the standard library should be sanitized on fork

2016-03-24 Thread A. Jesse Jiryu Davis


Changes by A. Jesse Jiryu Davis :


--
nosy: +emptysquare

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in the standard library should be sanitized on fork

2014-11-03 Thread Nir Soffer


Changes by Nir Soffer nir...@gmail.com:


--
nosy: +nirs

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6721
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in the standard library should be sanitized on fork

2014-11-02 Thread Maries Ionel Cristian


Changes by Maries Ionel Cristian ionel...@gmail.com:


--
nosy: +ionel.mc

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6721
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in the standard library should be sanitized on fork

2014-08-23 Thread Dan O'Reilly


Changes by Dan O'Reilly oreil...@gmail.com:


--
nosy: +dan.oreilly

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6721
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in the standard library should be sanitized on fork

2014-06-30 Thread Tshepang Lekhonkhobe


Changes by Tshepang Lekhonkhobe tshep...@gmail.com:


--
title: Locks in python standard library should be sanitized on fork - Locks in 
the standard library should be sanitized on fork
versions: +Python 3.5 -Python 3.3

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6721
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in the standard library should be sanitized on fork

2014-06-30 Thread Tshepang Lekhonkhobe


Changes by Tshepang Lekhonkhobe tshep...@gmail.com:


--
nosy: +tshepang

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6721
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in python standard library should be sanitized on fork

2014-04-22 Thread Forest Bond


Changes by Forest Bond for...@alittletooquiet.net:


--
nosy: +forest_atq

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6721
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in python standard library should be sanitized on fork

2012-06-02 Thread Tomaž Šolc


Changes by Tomaž Šolc tomaz.s...@tablix.org:


--
nosy:  -avian

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6721
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in python standard library should be sanitized on fork

2012-06-02 Thread Vinay Sajip


Vinay Sajip vinay_sa...@yahoo.co.uk added the comment:

  Use file locks in logging, whenever possible.

Logging doesn't just log to files, and moreover, also has locks to serialise 
access to internal data structures (nothing to do with files). Hence, using 
file locks in logging is not going to magically solve problems caused in 
threading+forking scenarios.

Apart from logging a commonly used part of the stdlib library which uses locks, 
I don't think this issue is to do with logging, specifically; logging uses 
locks in an unexceptional, conventional way, much as any other code might. 
Whatever solution is come up with for this thorny issue, it needs to be 
generic, in my view; otherwise we might just be papering over the cracks.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6721
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in python standard library should be sanitized on fork

2012-06-02 Thread Richard Oudkerk


Richard Oudkerk shibt...@gmail.com added the comment:

Lesha, the problems about magical __del__ methods you are worried about 
actually have nothing to do with threading and locks.  Even in a single 
threaded program using fork, exactly the same issues of potential corruption 
would be present because the object might be finalized at the same in multiple 
processes.

The idea that protecting the object with a thread lock will help you is 
seriously misguided UNLESS you also make sure you acquire them all before the 
fork -- and then you need to worry about the order in which you acquire all 
these locks.  There are much easier and more direct ways to deal with the issue 
than wrapping all objects with locks and trying to acquire them all before 
forking.

You could of course use multiprocessing.Lock() if you want a lock shared 
between processes.  But even then, depending on what the __del__ method does, 
it is likely that you will not want the object to be finalized in both 
processes.

However, the suggestion that locked-before-fork-locks should by default raise 
an error is reasonable enough.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6721
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in python standard library should be sanitized on fork

2012-06-01 Thread Gregory P. Smith


Gregory P. Smith g...@krypto.org added the comment:

We could make any later attempt to acquire or release a lock that was 
reinitialized while it was held raise an exception.

Such exception raising behavior should be conditional at lock construction 
time; some code (such as logging) never wants to deal with one because it is 
unacceptable for it to ever fail or deadlock.  It should also be possible for 
the exception to be usefully handled if caught; locks should gain an API to 
clear their internal reinitialized while held flag.

Q: Should .release() raise the exception?  Or just warn?  I'm thinking no 
exception on release().  Releasing a lock that was held during 
re-initialization could just reset the reinitialized while held flag.

The acquire() that would deadlock or crash today would be where raising an 
exception makes the most sense.

Deadlocks are unacceptable.  The whole point of this bug is that we can do 
better.  An exception would provide a stack trace of exactly which thing where 
caused the offending operation so that the code can be fixed to not misuse 
locks in the first place or that specific lock can be changed to silent 
reinitialization on fork.  (or better yet, the fork can be removed entirely)

Both behaviors are better than today.  This change would surface bugs in 
people's code much better than difficult to debug deadlocks.

It should be a pretty straightforward change to reinit_locks_2 (Patch Set 6) to 
behave that way.

Looking back, raising an exception is pretty much what I suggested in 
http://bugs.python.org/issue6721#msg94115 2.5 years ago.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6721
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in python standard library should be sanitized on fork

2012-06-01 Thread Richard Oudkerk


Richard Oudkerk shibt...@gmail.com added the comment:

  conn = MySQLConn()
  start_thread1(conn)
  start_thread2(conn):
  while True:
if os.fork() == 0:  # child
  raise Exception('doom')  # triggers destructor

There is no guarantee here that the lock will be held at the time of the fork.  
So even if we ensure that a lock acquired before the fork stayed lock, we won't 
necessarily get a deadlock.

More importantly, you should never fork without ensuring that you exit with 
os._exit() or os.exec*().  So your example should be something like

  conn = MySQLConn()
  start_thread1(conn)
  start_thread2(conn):
  while True:
if os.fork() == 0:  # child
  try:
raise Exception('doom')  # does NOT trigger destructor
  except:
sys.excepthook(*sys.exc_info())
os._exit(1)
  else:
os._exit(0)

With this hard exit the destructor never runs.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6721
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in python standard library should be sanitized on fork

2012-06-01 Thread Vinay Sajip


Vinay Sajip vinay_sa...@yahoo.co.uk added the comment:

 Re threading locks cannot be used to protect things outside of a
 single process:

 The Python standard library already violates this, in that the
 logging module uses such a lock to protect the file/socket/whatever,
 to which it is writing.

logging is not doing anything to protect things *outside* of a single process - 
the logging docs make that clear, and give specific recommendations for how to 
use logging in a multi-process scenario. Logging is just using locks to manage 
contention between multiple threads in a single process. In that sense, it is 
no different to any other Python code that uses locks.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6721
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in python standard library should be sanitized on fork


lesha pybug.20.le...@xoxy.net added the comment:

I feel like I'm failing to get my thesis across. I'll write it out fully:

== Thesis start ==

Basic fact: It is an error to use threading locks in _any_ way after a
fork. I think we mostly agree on this. The programs we discussing are
**inherently buggy**.

We disagree on the right action when such a bug happens. I see 3 possibilities:

1) deadlock (the current behavior, if the lock was held in the parent at the 
time of fork)

2) continue to execute:
 a) as if nothing happened (the current behavior, if the lock was not
held in the parent)
 b) throw an Exception (equivalent to a, see below)

3) crash hard.

I think both 1 and 3 are tolerable, while 2 is **completely unsafe**
because the resulting behavior of the program is unexpected and unpredictable 
(data corruption, deletion, random actions, etc).

== Thesis end ==



I will now address Gregory's, Richard's, and Vinay's comments in view
of this thesis:



1) Gregory suggests throwing an exception when the locks are used in a
child. He also discusses some cases, in which he believes one could
safely continue execution.

My responses:

a) Throwing an exception is tatamount to continuing execution.

Imagine that the parent has a tempfile RAII object that erases the
file after the object disappears, or in some exception handler.

The destructor / handler will now get called in the child... and the
parent's tempfile is gone. Good luck tracking that one down.

b) In general, is not safe to continue execution on release(). If you
release() and reinitialize, the lock could still later be reused by
both parent and child, and there would still be contention leading to
data corruption.

c) Re: deadlocks are unacceptable...

A deadlock is better than data corruption. Whether you prefer a
deadlock or a crash depends on whether your system is set up to dump
core. You can always debug a deadlock with gdb. A crash without a core
dump is impossible to diagnose. However, a crash is harder to ignore,
and it lets the process recover. So, in my book, though I'm not 100%
certain: hard crash  deadlock  corruption

d) However, we can certainly do better than today:

i) Right now, we sometimes deadlock, and sometimes continue execution.
It would be better to deadlock always (or crash always), no matter how
the child uses the lock.

ii) We can log before deadlocking (this is hard in general, because
it's unclear where to log to), but it would immensely speed up
debugging.

iii) We can hard-crash with an extra-verbose stack dump (i.e. dump the lock 
details in addition to the stack)



2) Richard explains how my buggy snippets are buggy, and how to fix them.

I respond: Richard, thanks for explaining how to avoid these bugs!

Nonetheless, people make bugs all the time, especially in areas like
this. I made these bugs. I now know better, mostly, but I wouldn't bet on it.

We should choose the safest way to handle these bugs: deadlocking
always, or crashing always. Reinitializing the locks is going to cost
Python users a lot more in the long run. Deadlocking _sometimes_, as we do now, 
is equally bad. 

Also, even your code is potentially unsafe: when you execute the
excepthook in the child, you could be running custom exception logic,
or even a custom excepthook. Those could well-intentionedly, but
stupidly, destroy some of the parent's valuable data.



3) Vinay essentially says using logging after fork is user error. 

I respond: Yes, it is. In any other logging library, this error would only 
result in mangled log lines, but no lasting harm.

In Python, you sometimes get a deadlock, and other times, mangled lines.

 logging is not doing anything to protect things *outside* of a single process 

A file is very much outside a single process. If you are logging to a file, the 
only correct way is to use a file lock. Thus, I stand by my assertion that 
logging is buggy.

Windows programs generally have no problems with this. fork() on UNIX gives you 
both the rope and the gallows to hang yourself.

Specifically for logging, I think reasonable options include:

a) [The Right Way (TM)] Using a file lock + CLOEXEC when available; this lets 
multiple processes cooperate safely.

b) It's okay to deadlock  log with an explanation of why the deadlock is 
happening.

c) It's okay to crash with a similar explanation.

d) It's pretty okay even to reinitialize logs, although mangled log lines do 
prevent automated parsing.



I really hope that my compact thesis can help us get closer to a consensus, 
instead of arguing about the details of specific bugs.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6721
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in python standard library should be sanitized on fork


lesha pybug.20.le...@xoxy.net added the comment:

A slightly more ambitious solution than crashing / deadlocking always is to 
have Python automatically spawn a fork server whenever you start using 
threads.

Then, you would be able to have subprocess work cleanly, and not worry about 
any of this stuff.

I don't know if we want to take the perf hit on import threading though...

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6721
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in python standard library should be sanitized on fork


lesha pybug.20.le...@xoxy.net added the comment:

Actually, we might be able to automatically spawn a safe fork server _only_ 
when people start mixing threading and subprocess.

I'm not totally sure if this would allow us to salvage multiprocessing as 
well...

The tricky bit is that we'd need to proxy into the fork server all the calls 
having to do with file descriptors / sockets that we would want to pass into 
the child processes.

That suggests to me that it'll be really hard to do this in a 
backwards-compatible way.

Given that subprocess is a pretty broken library, this might be a good time to 
replace it anyway.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6721
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in python standard library should be sanitized on fork

2012-06-01 Thread Gregory P. Smith


Gregory P. Smith g...@krypto.org added the comment:

subprocess has nothing to do with this bug.  subprocess is safe as of Python 
3.2 (and the subprocess32 backport for 2.x).  Its preexec_fn argument is 
already documented as an unsafe legacy.  If you want to replace subprocess, go 
ahead, write something new and post it on pypi.  That is out of the scope of 
this issue.

Look at the original message I opened this bug with.  I *only* want to make the 
standard library use of locks not be a source of deadlocks as it is 
unacceptable for a standard library itself to force your code to adopt a 
threads only or a fork only programming style.  How we do that is irrelevant; I 
merely started the discussion with one suggestion.

Third party libraries are always free to hang their users however they see fit.

If you want to log something before deadlocking, writing directly to the 
stderr file descriptor is the best that can be done.  That is what exceptions 
that escape __del__ destructors do.

logging, http.cookiejar, _strptime  - all use locks that could be dealt with in 
a sane manner to avoid deadlocks after forking.

Queue, concurrent.futures  threading.Condition  - may not make sense to fix as 
these are pretty threading specific as is and should just carry the don't 
fork caveats in their documentation.


(My *real* preference would be to remove os.fork() from the standard library.  
Not going to happen.)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6721
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in python standard library should be sanitized on fork


lesha pybug.20.le...@xoxy.net added the comment:

1) I'm totally in favor of making the standard library safe. For that purpose, 
I think we should do a combination of:

a) Use file locks in logging, whenever possible.

b) Introduce LockUnsafelyReinitializedAtFork, using a generation counter, or 
whatever else, which can be used by the few places in the standard library that 
can safely deal with lock reinitialization.

2) http://docs.python.org/library/subprocess.html#module-subprocess does not 
actually document that preexec_fn is unsafe and in need of deprecation. New 
users will continue to shoot themselves in the foot.

3) I think that in addition to making the standard library safe, all other 
locks need to be made sane (crash or deadlock), so that we at least always 
avoid the option 2) continue to execute the child despite it relying on an 
unsafe lock.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6721
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in python standard library should be sanitized on fork

2012-05-31 Thread Richard Oudkerk


Richard Oudkerk shibt...@gmail.com added the comment:

Attached is an updated version of Charles-François's reinit_locks.diff.

Changes:

* Handles RLock by assuming that if self-count != 0 when we acquire
  the lock, then the lock must have been reinitialized by 
PyThread_ReInitLocks().

* Applies existing fork tests for Lock to RLock.

* Fixes capitalization issues with 
PyThread_ReInitLocks()/PyThread_ReinitLocks().

* Defines PyThread_ReInitLocks() to be empty on non-pthread platforms.

Note that RLock._is_owned() is unreliable after a fork until RLock.acquire() 
has been called.

Also, no synchronization has been added for the list of locks.  Are 
PyThread_allocate_lock() and PyThread_free_lock() supposed to be safe to call 
while not holding the GIL?

--
Added file: http://bugs.python.org/file25776/reinit_locks_2.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6721
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in python standard library should be sanitized on fork


lesha pybug.20.le...@xoxy.net added the comment:

I am really alarmed by the reinit_locks patches.

I scanned the comment history, and looked at the patch. I may have missed 
something, but it looks to me like the basic behavior is this:

After fork(), all locks are replaced by brand-new lock objects that are NOT 
locked.

*Grim Prediction*: This is going to cause some disastrous, unexpected, and 
hilarious data loss or corruption for somebody.

Here is why:

  class MySQLConn:

def __init__(self):
  self.lock = Lock()

def doWork(self):
  self.lock.acquire()
  # do a sequence of DB operations that must not be interrupted,
  # and cannot be made transactional.
  self.lock.release()

Run this in a thread:

  def thread1(conn):
while True:
  conn.doWork()
  time.sleep(0.053)

Run this in another thread:

  def thread2(conn):
while True:
  conn.doWork()
  time.sleep(0.071)

Run this in a third thread:

  def thread2():
while True:
  subprocess.call([ls, -l])
  time.sleep(0.3)

With reinit_locks(), this will eventually break horribly. 

a) fork() is called with the DB lock held by thread1.
b) Some time passes before the child gets to exec().
c) In that time, the child's thread2 gets to doWork(). 
d) Simultaneously, the parent's doWork is still running and holding a lock.
e) Thanks to reinit_locks, the child's thread2 does not have a lock, and it 
will merrily proceed to corrupt the parent's work.

So I think this approach is basically doomed.

I think my approach of marking _some_ locks as safe to reinit upon fork is 
workable (i.e. to solve the bad interaction with logging or import). 

However, there's also an orthogonal approach that might work well:

1) Right before the first thread gets created in any Python program, fork off a 
fork() server. 

From then on, subprocess will only use the fork server to call commands.

Thoughts?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6721
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in python standard library should be sanitized on fork

2012-05-31 Thread Richard Oudkerk


Richard Oudkerk shibt...@gmail.com added the comment:

 a) fork() is called with the DB lock held by thread1.
 b) Some time passes before the child gets to exec().
 c) In that time, the child's thread2 gets to doWork(). 
 d) Simultaneously, the parent's doWork is still running and holding a lock.
 e) Thanks to reinit_locks, the child's thread2 does not have a lock, and 
 it will merrily proceed to corrupt the parent's work.

You seem to be saying that all three threads survive the fork.

I think forkall() on Solaris acts like that, but the normal fork() function 
does not.  Only the thread which performs fork() will survive in the child 
process.

So doWork() never runs in the child process, and the lock is never used in the 
child process.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6721
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in python standard library should be sanitized on fork


lesha pybug.20.le...@xoxy.net added the comment:

 I think forkall() on Solaris acts like that, but the normal fork() 
 function does not.  Only the thread which performs fork() will survive 
 in the child process.

Sorry, brain fail. A slightly more contrived failure case is this:

subprocess.Popen(
  ..., 
  preexec_fn=lambda: conn.doWork()
)

Everything else is the same.

Another failure case is:

  class MySQLConn:
... doWork as before ...

def __del__(self):
  self.doWork()

Followed by:

  def thread3(conn):
while True:
  subprocess.call(['nonexistent_program']) 
  time.sleep(0.1)

The destructor will fire in the child and corrupt the parent's data.

An analogous example is:

  conn = MySQLConn()
  start_thread1(conn)
  start_thread2(conn):
  while True:
if os.fork() == 0:  # child
  raise Exception('doom')  # triggers destructor

Basically, it is really really dangerous to release locks that protect any 
resources that are not copied by fork (i.e. network resources, files, DB 
connections, etc, etc).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6721
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in python standard library should be sanitized on fork

2012-05-31 Thread Gregory P. Smith


Gregory P. Smith g...@krypto.org added the comment:

Anyone using a preexec function in subprocess has already declared that they 
like deadlocks so that isn't an issue. :)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6721
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in python standard library should be sanitized on fork


lesha pybug.20.le...@xoxy.net added the comment:

Deadlocks are dandy, but corruption is cruel.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6721
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in python standard library should be sanitized on fork

2012-05-31 Thread Gregory P. Smith


Gregory P. Smith g...@krypto.org added the comment:

threading locks cannot be used to protect things outside of a single process.  
Any code using them to do that is broken.

In your examples you are suggesting a class that wants to do one or more mysql 
actions within a destructor and worried that the __del__ method would be called 
in the fork()'ed child process.

With the subprocess module, this will never happen.  the child exec's or does a 
hard exit.   
http://hg.python.org/cpython/file/bd2c2def77a7/Modules/_posixsubprocess.c#l634

When someone is using os.fork() directly, they are responsible for all 
destructors in their application behaving sanely within the child process.

Destructors are an evil place to put code that does actual work and are best 
avoided.  When required, they must be written defensively because they really 
cannot depend on much of the Python execution environment around them being in 
a functional state as they have no control over _when_ they will be called 
during shutdown.  Nothing new here.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6721
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in python standard library should be sanitized on fork


lesha pybug.20.le...@xoxy.net added the comment:

Re threading locks cannot be used to protect things outside of a single 
process:

The Python standard library already violates this, in that the logging module 
uses such a lock to protect the file/socket/whatever, to which it is writing.

If I had a magic wand that could fix all the places in the world where people 
do this, I'd accept your argument.

In practice, threading locks are abused in this way all the time.

Most people don't even think about the interaction of fork and threads until 
they hit a bug of this nature.


Right now, we are discussing a patch that will take broken code, and instead of 
having it deadlock, make it actually destroy data. 

I think this is a bad idea. That is all I am arguing.

I am glad that my processes deadlocked instead of corrupting files. A deadlock 
is easier to diagnose.


You are right: subprocess does do a hard exit on exceptions. However, the 
preexec_fn and os.fork() cases definitely happen in the wild. I've done both of 
these before.


I'm arguing for a simple thing: let's not increase the price of error. A 
deadlock sucks, but corrupted data sucks much worse -- it's both harder to 
debug, and harder to fix.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6721
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in python standard library should be sanitized on fork

2012-05-30 Thread Richard Oudkerk


Richard Oudkerk shibt...@gmail.com added the comment:

  Is there any particular reason not to merge Charles-François's 
  reinit_locks.diff?
  
  Reinitialising all locks to unlocked after a fork seems the only sane 
  option.

 I agree with this. 
 I haven't looked at the patch very closely. I think perhaps each lock
 could have an optional callback for specific code to be run after
 forking, but that may come in another patch.
 (this would allow to make e.g. the C RLock fork-safe)

An alternative way of handling RLock.acquire() would be to always start by 
trying a non-blocking acquire while holding the GIL: if this succeeds and 
self-rlock_count != 0 then we can assume that the lock was cleared by 
PyThread_ReinitLocks().

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6721
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in python standard library should be sanitized on fork

2012-05-23 Thread Richard Oudkerk


Richard Oudkerk shibt...@gmail.com added the comment:

 (1) Good catch. I suspect that this could be mitigated even if we cared 
 about LinuxThreads. I haven't looked, but there's got to be a way to 
 determine if we are a thread or a fork child.

Using a generation count would probably work just as well as the PID: main 
process has generation 0, children have generation 1, grandchildren have 
generation 2, ...

 (2) I think I didn't explain my idea very well. I don't mean that we 
 should release *all* locks on fork. That will end in disaster, as 
 Charles-François amply explained.

So what are you suggesting?  That a lock of the default type should raise
an error if you try to acquire it when it has been acquired in a previous
process?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6721
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in python standard library should be sanitized on fork

2012-05-23 Thread lesha


lesha pybug.20.le...@xoxy.net added the comment:

 So what are you suggesting?  That a lock of the default type should
 raise an error if you try to acquire it when it has been acquired in a 
 previous process?

I was suggesting a way to make 'logging' fork-safe. No more, no less.

Does what my previous comment make sense in light of this?

 Using a generation count

Sure, that's a good idea.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6721
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in python standard library should be sanitized on fork

2012-05-22 Thread lesha


lesha pybug.20.le...@xoxy.net added the comment:

This is a reply to: http://bugs.python.org/issue6721#msg151266

Charles-François raises two problems:

1) LinuxThreads have different PIDs per thread. He then points out that 
LinuxThreads have long been deprecated.

2) you cannot release locks on fork() because that might let the forked process 
access protected resources.

My replies:

(1) Good catch. I suspect that this could be mitigated even if we cared about 
LinuxThreads. I haven't looked, but there's got to be a way to determine if we 
are a thread or a fork child.

(2) I think I didn't explain my idea very well. I don't mean that we should 
release *all* locks on fork. That will end in disaster, as Charles-François 
amply explained.

All I meant is that we could introduce a special lock class ForkClearedRLock 
that self-releases on fork(). We could even use Charles-François's reinit magic 
for this.

Using ForkClearedRLock in logging would prevent deadlocks. The only potential 
harm that would come from this is that your logfile might get pretty ugly, i.e. 
the fork parent and child might be printing simultaneously, resulting in logs 
like:

Error: parentparentparError: childchildchildchildchild
entparentparent

It's not great, but it's definitely better than deadlocking.

I don't think logging can do anything more sensible across fork() anyway.

Did this explanation make more sense?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6721
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in python standard library should be sanitized on fork

2012-05-17 Thread Antoine Pitrou


Antoine Pitrou pit...@free.fr added the comment:

Should we go forward on this?

--
assignee: gregory.p.smith - 
stage: test needed - patch review
type: behavior - enhancement
versions:  -Python 2.7, Python 3.2

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6721
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in python standard library should be sanitized on fork

2012-05-17 Thread Gregory P. Smith


Gregory P. Smith g...@krypto.org added the comment:

going forward with reinit_locks.diff makes sense.

I've added comments to it in the code review link.  It is Patch Set 3

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6721
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in python standard library should be sanitized on fork

2012-02-13 Thread Vinay Sajip


Changes by Vinay Sajip vinay_sa...@yahoo.co.uk:


--
nosy: +vinay.sajip

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6721
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in python standard library should be sanitized on fork

2012-01-23 Thread sbt


sbt shibt...@gmail.com added the comment:

Attached is a patch (without documentation) which creates an atfork module for 
Unix.

Apart from the atfork() function modelled on pthread_atfork() there is also a 
get_fork_lock() function.  This returns a recursive lock which is held whenever 
a child process is created using os.fork(), subprocess.Popen() or 
multiprocessing.Process().  It can be used like

with atfork.get_fork_lock():
r, w = os.pipe()
pid = os.fork()
if pid == 0:
try:
os.close(r)
# do something with w
finally:
os._exit(0)
else:
os.close(w)

# do something with r

This prevents processes forked by other threads from accidentally inheriting 
the writable end (which would potentially cause EOF to be delayed when reading 
from the pipe).  It can also be used to eliminate the potential race where you 
create an fd and then set the CLOEXEC flag on it.

The patch modifies Popen() and Process.start() to acquire the lock when they 
create their pipes.  (A race condition previously made Process.sentinel and 
Process.join() potentially unreliable in a multithreaded program.)

Note that using the deprecated os.popen?() and os.spawn?() functions can still 
cause accidental inheritance of fds.

(I have also done a hopefully complete patch to multiprocessing to optionally 
allow fork+exec on Unix -- see Issue 8713.)

--
Added file: http://bugs.python.org/file24303/atfork.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6721
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in python standard library should be sanitized on fork

2012-01-23 Thread sbt


sbt shibt...@gmail.com added the comment:

Is there any particular reason not to merge Charles-François's 
reinit_locks.diff?

Reinitialising all locks to unlocked after a fork seems the only sane option.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6721
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in python standard library should be sanitized on fork

2012-01-23 Thread Antoine Pitrou


Antoine Pitrou pit...@free.fr added the comment:

 Is there any particular reason not to merge Charles-François's 
 reinit_locks.diff?
 
 Reinitialising all locks to unlocked after a fork seems the only sane option.

I agree with this. 
I haven't looked at the patch very closely. I think perhaps each lock
could have an optional callback for specific code to be run after
forking, but that may come in another patch.
(this would allow to make e.g. the C RLock fork-safe)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6721
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in python standard library should be sanitized on fork

2012-01-14 Thread Jesús Cea Avión


Changes by Jesús Cea Avión j...@jcea.es:


--
nosy: +jcea

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6721
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in python standard library should be sanitized on fork

2012-01-14 Thread Charles-François Natali


Charles-François Natali neolo...@free.fr added the comment:

 However, extending RLock to provide ForkClearedRLock (this would be used by 
 logging, i.e.) is quite straighforward.

 The extended class would simply need to record the process ID, in which the 
 lock was created, and the process ID, which is trying to acquire it.  Done!

There are at least two problems with this approach.
- with LinuxThreads, threads have different PIDs, so this would break.
 LinuxThreads have now been replaced with NPTL, so this may not be a
showstopper, though
- however, the other problem is more serious: it has the exact same
problem as the lock reinitialization upon fork(): locks are used to
protect critical sections, to make sure that threads see a consistent
state: if you simply proceed and reset/acquire the lock when the
process is not the last one that owned it, the invariants protected by
the lock will be broken.
The proper solution is to setup handlers to be called upon fork, that
not only reset locks, but also internal state of objects they protect.
However, this is a dull and boring task, and would require patching
dozens of different places. It's been on my todo list for some time...
Another solution would be to remove the only place in the standard
library where a bare fork() is used, in multiprocessing (issue #8713).
Then, it's the user's problem if he calls fork() :-)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6721
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in python standard library should be sanitized on fork

2012-01-14 Thread Gregory P. Smith


Gregory P. Smith g...@krypto.org added the comment:

A new lock type will NOT solve this.  It is ALWAYS okay to clear all 
thread/threading module locks after a fork.

They are and always have been process-local by definition so they are also by 
definition 100% invalid to any child process.

Anyone who has written code using them to lock an out-of-process resource has 
written code that is already broken today. Thread locks can't guard network 
resources.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6721
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in python standard library should be sanitized on fork

2012-01-13 Thread lesha


lesha pybug.20.le...@xoxy.net added the comment:

Just wanted to say that I spent something like 8 hours debugging a subprocess + 
threading + logging deadlock on a real production system. 

I suspected one of my locks at first, but I couldn't find any. The post-fork 
code was very simple, and I didn't suspect that logging would be subject to the 
same issue.

The good news that I see a very clean solution for fixing this.

We can't free all locks across fork -- that is unsafe and mad, because the 
child might end up corrupting some shared (network) resource, for example/

However, extending RLock to provide ForkClearedRLock (this would be used by 
logging, i.e.) is quite straighforward.

The extended class would simply need to record the process ID, in which the 
lock was created, and the process ID, which is trying to acquire it.  Done!

--
nosy: +lesha

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6721
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in python standard library should be sanitized on fork

2011-08-31 Thread Nir Aides


Nir Aides n...@winpdb.org added the comment:

For the record, turns out there was a bit of misunderstanding. 

I used the term deprecate above to mean warn users (through documentation) 
that they should not use (a feature) and not in its Python-dev sense of 
remove (a feature) after a period of warning.

I do not think the possibility to mix threading and multiprocessing together 
should be somehow forcibly disabled. 

Anyway, since my view does not seem to resonate with core developers I I'll 
give it a rest for now.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6721
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in python standard library should be sanitized on fork

2011-08-31 Thread Charles-François Natali


Charles-François Natali neolo...@free.fr added the comment:

 Anyway, since my view does not seem to resonate with core developers I I'll
 give it a rest for now.

Well, the problem is that many views have been expressed in this
thread, which doesn't help getting a clear picture of what's needed to
make progress on this issue.
AFAIC, I think the following seems reasonable:
1) add an atfork module which provides a generic and
pthread_atfork-like mechanism to setup handlers that must be called
after fork (right now several modules use their own ad-hoc mechanism)
2) for multiprocessing, call exec() after fork() (issue #8713)
3) for buffered file objects locks, use the approach similar to the
patch I posted (reinit locks in the child process right after fork())

Does that sound reasonable?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6721
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in python standard library should be sanitized on fork

2011-08-29 Thread sbt


sbt shibt...@gmail.com added the comment:

multiprocessing.util already has register_after_fork() which it uses for 
cleaning up certain things when a new process (launched by multiprocessing) is 
starting.  This is very similar to the proposed atfork mechanism.

Multiprocessing assumes that it is always safe to delete lock objects.  If 
reinit_locks.diff is committed then I guess this won't be a problem.

I will try to go through multiprocessing's use of threads: 

Queue
-

Queue's have a feeder thread which pushes objects in to the underlying pipe as 
soon as possible.  The state which can be modified by this thread is a 
threading.Condition object and a collections.deque buffer.  Both of these are 
replaced by fresh copies by the after-fork mechanism.

However, because objects in the buffer may have __del__ methods or weakref 
callbacks associated, arbitrary code may be run by the background thread if the 
reference count falls to zero.

Simply pickling the argument of put() before adding it to the buffer fixes that 
problem -- see the patch for Issue 10886.  With this patch I think Queue's use 
of threads is fork-safe.

Pool


If a fork occurs while a pool is running then a forked process will get a copy 
of the pool object in an inconsistent state -- but that does not matter since 
trying to use a pool from a forked process will *never* work.

Also, some of a pool's methods support callbacks which can execute arbitrary 
code in a background thread.  This can create inconsistent state in a forked 
process

As with Queue.put, pool methods should pickle immediately for similar reasons.

I would suggest documenting clearly that a pool should only ever be used or 
deleted by the process which created it.  We can use register_after_fork to 
make all of a pool's methods raise an error after a fork.  

We should also document that callbacks should only be used if no more processes 
will be forked.

allow_connection_pickling
-

Currently multiprocessing.allow_connection_pickling() does not work because 
types are registered with ForkingPickler instead of copyreg -- see Issue 4892.  
However, the code in multiprocessing.reduction uses a background thread to 
support the transfer of sockets/connections between processes.

If this code is ever resurrected I think the use of register_after_fork makes 
this safe.

Managers


A manager uses a threaded server process.  This is not a problem unless you 
create a user defined manager which forks new processes.   The documentation 
should just say Don't Do That.


I think multiprocessing's threading issues are all fixable.

--
nosy: +sbt

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6721
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in python standard library should be sanitized on fork

2011-07-28 Thread Nir Aides


Nir Aides n...@winpdb.org added the comment:

Hi Gregory,

 Gregory P. Smith g...@krypto.org added the comment:
 No Python thread is ever fork safe because the Python interpreter itself can 
 never be made fork safe. 
 Nor should anyone try to make the interpreter itself safe. It is too complex 
 and effectively impossible to guarantee.

a) I think the term Guarantee is not meaningful here since the interpreter is 
probably too complex to guarantee it does not contain other serious problems.
b) If no Python thread is ever fork safe, can you illustrate how a trivial 
Python thread spinning endlessly might deadlock a child forked by another 
Python thread?

I was not able to find reports of deadlocks clearly related to multiprocessing 
worker threads so they could be practically safe already, to the point other 
Python-Dev developers would be inclined to bury this as a theoretical problem :)

Anyway, there exists at least the problem of forking from the pool worker 
thread and possibly other issues, so the code should be reviewed.
Another latent problem is multiprocessing logging which is disabled by default?


 There is no general solution to this, fork and threading is simply broken in 
 POSIX and no amount of duct tape outside of the OS kernel can fix it. 

This is why we should sanitize the multithreading module and deprecate mixing 
of threading and multiprocessing. 
I bet most developers using Python are not even aware of this problem. 
We should make sure they are through documentation.

Here is another way to look at the current situation:

1) Don't use threading for concurrency because of the GIL.
2) Don't mix threading with multiprocessing because threading and forking don't 
mix well.
3) Don't use multiprocessing because it can deadlock.

We should make sure developers are aware of (2) and can use (3) safely***.


 My only desire is that we attempt to do the right thing when possible with 
 the locks we know about within the standard library.

Right, with an atfork() mechanism.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6721
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in python standard library should be sanitized on fork


Steffen Daode Nurpmeso sdao...@googlemail.com added the comment:

If Nir's analysis is right, and Antoines comment pushes me into
this direction, (i personally have not looked at that code),
then multiprocessing is completely brain-damaged and has been
implemented by a moron.
And yes, I know this is a bug tracker, and even that of Python.

Nir should merge his last two messages into a single mail to
python-dev, and those guys should give Nir or Thomas or a group of
persons who have time and mental power a hg(1) repo clone and
committer access to that and multiprocessing should be rewritten,
maybe even from scratch, but i dunno.

For the extremely unlikely case that all that doesn't happen maybe
the patch of neologix should make it?

--Steffen
Ciao, sdaoden(*)(gmail.com)
() ascii ribbon campaign - against html e-mail
/\ www.asciiribbon.org - against proprietary attachments

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6721
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in python standard library should be sanitized on fork


Steffen Daode Nurpmeso sdao...@googlemail.com added the comment:

Um, and just to add: i'm not watching out for anything, and it won't
and it can't be me:

?0%0[steffen@sherwood sys]$ grep -F smp CHANGELOG.svn -B3 | grep -E 
'^r[[:digit:]]+' | tail -n 1
r162 | steffen | 2006-01-18 18:29:58 +0100 (Wed, 18 Jan 2006) | 35 lines

--Steffen
Ciao, sdaoden(*)(gmail.com)
() ascii ribbon campaign - against html e-mail
/\ www.asciiribbon.org - against proprietary attachments

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6721
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in python standard library should be sanitized on fork

2011-07-19 Thread Nir Aides


Nir Aides n...@winpdb.org added the comment:

 then multiprocessing is completely brain-damaged and has been
 implemented by a moron.

Please do not use this kind of language. 
Being disrespectful to other people hurts the discussion.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6721
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in python standard library should be sanitized on fork


Steffen Daode Nurpmeso sdao...@googlemail.com added the comment:

P.S.:
I have to apologize, it's Tomaž, not Thomas.
(And unless i'm mistaken this is pronounced TomAsch rather than
the english Tommes, so i was just plain wrong.)

--Steffen
Ciao, sdaoden(*)(gmail.com)
() ascii ribbon campaign - against html e-mail
/\ www.asciiribbon.org - against proprietary attachments

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6721
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6721] Locks in python standard library should be sanitized on fork