Another great post, Glenn!! Very well laid-out and posed!! Thanks for taking the time to lay all that out.
> Questions for Andy: is the type of work you want to do in independent
> threads mostly pure Python?  Or with libraries that you can control to
> some extent?  Are those libraries reentrant?  Could they be made
> reentrant?  How much of the Python standard library would need to be
> available in reentrant mode to provide useful functionality for those
> threads?  I think you want PyC

I think you've defined everything perfectly, and you're of course correct
about my love for the PyC model. :^)

Like any software that's meant to be used without restrictions, our code
and frameworks always use a context object pattern so that there's never
any non-const global/shared data. I would go as far as to say that this is
the case with more performance-oriented software than you might think,
since it's usually a given for us to have to be parallel-friendly in as
many ways as possible. Perhaps Patrick can back me up there.

As to what modules are "essential"... As you point out, once reentrant
module implementations caught on in a PyC or hybrid world, I think we'd
start to see a real effort to whip them into compliance--there's just so
much to be gained, IMHO. But to answer the question, there are the obvious
ones (operator, math, etc.), string/buffer processing (string, re), C
bridge stuff (struct, array), and OS basics (time, file system, etc.).
Nice-to-haves would be buffer and image decompression (zlib, libpng, etc.),
crypto modules, and xml. As far as I can tell, all of these modules already
contain little, if any, global data, so I have to believe they'd be
straightforward to make "PyC happy". Patrick, what do you see you guys
using?

> > That's the rub... In our case, we're doing image and video
> > manipulation--stuff not good to be messaging from address space to
> > address space. The same argument holds for numerical processing with
> > large data sets. The workers handing back huge data sets via
> > messaging isn't very attractive.
>
> In the multiprocessing module environment could you not use shared
> memory, then, for the large shared data items?

As I understand things, the multiprocessing module puts stuff in a child
process (i.e. a separate address space), so the only way to get stuff
to/from it is via IPC, which can include a shared/mapped memory region.
Unfortunately, a shared address region doesn't work when you have large
and opaque objects (e.g. a rendered CoreVideo movie in the QuickTime API,
or 300 megs of audio data that just went through a DSP). Then you've got
the hit of serialization if you've got intricate data structures that
would normally need to be serialized, such as a hashtable or something.
Also, if I may speak for commercial developers out there who are just
looking to get the job done without new code, it's almost always
preferable to use just a single high-level sync object (for when the job
is complete) than to start child processes and use IPC. The former is
just WAY less code, plain and simple.

Andy
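
For illustration, a rough sketch of the single-sync-object shape described
above: the worker mutates a large in-memory buffer in place and the main
thread waits on one Event. Names and sizes are made up, and in practice
the heavy lifting would be a C extension call that releases the GIL.

    import threading

    # Hypothetical example: one worker, one big in-memory buffer, one sync
    # object. No child process, no IPC, no serialization of the result.
    frames = bytearray(300 * 1024 * 1024)   # e.g. decoded audio/video data
    done = threading.Event()

    def process_frames(buf):
        # stand-in for the real DSP/image work (done in C in practice)
        for i in range(0, len(buf), 4096):
            buf[i] ^= 0xFF

    def worker():
        process_frames(frames)   # operates directly on the shared buffer
        done.set()               # the single high-level sync object

    threading.Thread(target=worker).start()
    done.wait()   # block until the job is complete; 'frames' was modified in place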