Bruce Eckel wrote:
> I'd like to restart this discussion; I didn't mean to put forth active
> objects as "the" solution, only that it seems to be one of the better,
> more OO solutions that I've seen so far.
>
> What I'd really like to figure out is the "pythonic" solution for
> concurrency.  Guido and I got as far as agreeing that it wasn't
> threads.
I've pondered this problem.  Python deals programmers a double whammy
when it comes to threads: not only is threading as unsafe as it is in
other languages, but the GIL also prevents you from using multiple
processors.  Thus there's more pressure to improve concurrency in
Python than there is elsewhere.

I like to use fork(), but fork has its own set of surprises.  In
particular, from the programmer's point of view, forking creates a
disassociated copy of every object except files.  Also, there's no
Pythonic way for the two processes to communicate once the child has
started.

It's tempting to create a library around fork() that solves the
communication problem, but the copied objects are still a major source
of bugs.  Imagine what would happen if you forked a Zope process with
an open ZODB.  If both the parent and child change ZODB objects, ZODB
is likely to corrupt itself, since the two processes share file
descriptors.  Thus forking can be just as dangerous as threading.

Therefore, I think a better Python concurrency model would be a lot
like the subprocess module, but designed for calling Python code.  I
can already think of several ways I would use such a module.
Something like the following would solve problems I've encountered
with threads, forking, and the subprocess module:

import pyprocess

proc = pyprocess.start('mypackage.mymodule', 'myfunc', arg1, arg2=5)
while proc.running():
    pass  # do something else
res = proc.result()

This code doesn't specify whether the subprocess should continue to
exist after the function completes (or raises an exception).  I can
think of two ways to deal with that:

1) Provide two APIs.  The first API stops the subprocess upon function
completion.  The second API allows the parent to call other functions
in the subprocess, but never more than one function at a time.

2) Always leave subprocesses running, but use a 'with' statement to
guarantee the subprocess will be closed quickly.  I prefer this
option.

I think my suggestion fits most of your objectives.

> 1) It works by default, so that novices can use it without falling
> into the deep well of threading. That is, a program that you write
> using threading is broken by default, and the tool you have to fix
> it is "inspection." I want something that allows me to say "this is
> a task. Go." and have it work without the python programmer having
> to study and understand several tomes on the subject.

Done, IMHO.

> 2) Tasks can be automatically distributed among processors, so it
> solves the problems of (a) making python run faster (b) how to
> utilize multiprocessor systems.

Done.  The OS automatically maps subprocesses to other processors.

> 3) Tasks are cheap enough that I can make thousands of them, to
> solve modeling problems (in which I also lump games). This is really
> a solution to a certain type of program complexity -- if I can just
> assign a task to each logical modeling unit, it makes such a system
> much easier to program.

Perhaps the suggested module should have a queue-oriented API.  Usage
would look like this:

import pyprocess

queue = pyprocess.ProcessQueue(max_processes=4)
task = queue.put('mypackage.mymodule', 'myfunc', arg1, arg2=5)

Then you can create as many tasks as you like; parallelism will be
limited to 4 concurrent tasks.  A variation of ProcessQueue might
manage the concurrency limit automatically.
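
To make this concrete, here is a minimal sketch of both APIs, built
only on the subprocess and pickle modules we already have.  Everything
in it is hypothetical (the Process and ProcessQueue classes, start(),
put()); it illustrates the proposed interface rather than implementing
it seriously, so it ignores exception propagation, oversized results,
and subprocess reuse:

import pickle
import subprocess
import sys
import time

# Code run in the child interpreter: read (module, function, args,
# kwargs) from stdin, call the function, pickle the result to stdout.
_CHILD = """
import importlib, pickle, sys
modname, funcname, args, kwargs = pickle.load(sys.stdin.buffer)
func = getattr(importlib.import_module(modname), funcname)
pickle.dump(func(*args, **kwargs), sys.stdout.buffer)
"""

class Process:
    """One function call running in a child interpreter."""

    def __init__(self, modname, funcname, *args, **kwargs):
        self._proc = subprocess.Popen(
            [sys.executable, '-c', _CHILD],
            stdin=subprocess.PIPE, stdout=subprocess.PIPE)
        pickle.dump((modname, funcname, args, kwargs), self._proc.stdin)
        self._proc.stdin.close()

    def running(self):
        # Assumes the pickled result fits in the pipe buffer;
        # otherwise the child blocks writing and never exits.
        return self._proc.poll() is None

    def result(self):
        # Blocks until the child finishes; raises EOFError if the
        # child died without producing a result.
        value = pickle.load(self._proc.stdout)
        self._proc.wait()
        return value

def start(modname, funcname, *args, **kwargs):
    return Process(modname, funcname, *args, **kwargs)

class ProcessQueue:
    """Run tasks with at most max_processes children alive at once."""

    def __init__(self, max_processes=4):
        self.max_processes = max_processes
        self._active = []

    def put(self, modname, funcname, *args, **kwargs):
        # Naive throttle: poll until a slot frees up.  A real
        # implementation would queue pending tasks instead.
        while True:
            self._active = [p for p in self._active if p.running()]
            if len(self._active) < self.max_processes:
                break
            time.sleep(0.05)
        proc = Process(modname, funcname, *args, **kwargs)
        self._active.append(proc)
        return proc

For option 2 above, Process could also grow __enter__ and __exit__
methods so that a 'with' block terminates the child promptly.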

> 4) Tasks are "self-guarding," so they prevent other tasks from
> interfering with them. The only way tasks can communicate with each
> other is through some kind of formal mechanism (something queue-ish,
> I'd imagine).

Done.  Subprocesses have their own Python namespace.  Subprocesses
receive messages through function calls and send messages by
returning from functions.

> 5) Deadlock is prevented by default. I suspect livelock could still
> happen; I don't know if it's possible to eliminate that.

No locking is done at all.  (That makes me uneasy, though; have I
just moved the locking problems to the application developer?)

> 6) It's natural to make an object that is actor-ish. That is, this
> concurrency approach works intuitively with objects.

Anything pickleable is legal.

> 7) Complexity should be eliminated as much as possible. If it
> requires greater limitations on what you can do in exchange for a
> clear, simple, and safe programming model, that sounds pythonic to
> me. The way I see it, if we can't easily use tasks without getting
> into trouble, people won't use them. But if we have a model that
> allows people to (for example) make easy use of multiple processors,
> they will use that approach and the (possible) extra overhead that
> you pay for the simplicity will be absorbed by the extra CPUs.

I think the solution is very simple.

> 8) It should not exclude the possibility of mobile tasks/active
> objects, ideally with something relatively straightforward such as
> Linda-style tuple spaces.

The proposed module could serve as a guide for a very similar module
that sends tasks to other machines.

Shane