Re: 2.6, 3.0, and truly independent intepreters
On Nov 5, 5:09 pm, Paul Boddie [EMAIL PROTECTED] wrote: Anyway, to keep things constructive, I should ask (again) whether you looked at tinypy [1] and whether that might possibly satisfy your embedded requirements. Actually, I'm starting to get into the tinypy codebase and have been talking in detail with the leads for that project (I just branched it, in fact). TP indeed has all the right ingredients for a CPythonES API, so I'm currently working on a first draft. Interestingly, the TP VM is largely based on Lua's implementation and stresses compactness. One challenge is that its design may be overly compact, making it a little tricky to extend and maintain (but I anticipate things will improve as we rev it). When I have a draft of this CPythonES API, I plan to post here for everyone to look at and give feedback on. The only thing that sucks is that I have a lot of other commitments right now, so I can't spend the time on this that I'd like to. Once we have that API finalized, I'll be able to start offering some bounties for filling in some of its implementation. In any case, I look forward to updating folks here on our progress! Andy -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
On Nov 6, 8:25 am, sturlamolden [EMAIL PROTECTED] wrote: On Nov 5, 8:44 pm, Andy O'Meara [EMAIL PROTECTED] wrote: In a few earlier posts, I went into detail about what's meant there: http://groups.google.com/group/comp.lang.python/browse_thread/thread/... All this says is: 1. The cost of serialization and deserialization is too large. 2. Complex data structures cannot be placed in shared memory. The first claim is unsubstantiated. It depends on how much and what you serialize. Right, but I'm telling you that it *is* substantial... Unfortunately, you can't serialize thousands of opaque OS objects (which undoubtedly contain sub-allocations and pointers) in a frame-based, performance-centric app. Please consider that others (such as myself) are not trying to be difficult here--turns out that we're actually professionals. Again, I'm not the type to compare credentials, but it would be nice if you considered that you aren't the final authority on real-time professional software development. The second claim is plain wrong. You can put anything you want in shared memory. The mapping address of the shared memory segment may vary, but it can be dealt with (basically use integers instead of pointers, and use the base address as an offset.) I explained this in other posts: OS objects are opaque and their serialization has to be done via their APIs, which is never marketed as being fast *OR* cheap. I've gone into this many times and in many posts. Saying that it can't be done is silly before you have tried. Your attitude and unwillingness to look at the use cases listed by myself and others in this thread show that this discussion may not be a good use of your time. In any case, you haven't even acknowledged that a package can't wag the dog when it comes to app development--and that's the bottom line and root liability. Andy -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
On Nov 6, 9:02 pm, sturlamolden [EMAIL PROTECTED] wrote: On Nov 7, 12:22 am, Walter Overby [EMAIL PROTECTED] wrote: I read Andy to stipulate that the pipe needs to transmit hundreds of megs of data and/or thousands of data structure instances. I doubt he'd be happy with memcpy either. My instinct is that contention for a lock could be the quicker option. If he needs to communicate that amount of data very often, he has a serious design problem. Hmmm... Your comment there seems to be an indicator that you don't have a lot of experience with real-time, performance-centric apps. Consider my previously listed examples of video rendering and programmatic effects in real-time. You need to have a lot of stuff in threads being worked on, and as Walter described, using a signal rather than serialization is the clear choice. Or, consider Patrick's case where you have massive amounts of audio being run through a DSP--it just doesn't make sense to serialize an intricate, high-level object when you could otherwise just hand it off via a single sync step. Walter and Paul really get what's being said here, so that should be an indicator to take a step back for a moment and ease up a bit... C'mon, man--we're all on the same side here! :^) Andy -- http://mail.python.org/mailman/listinfo/python-list
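To make the single-sync-step handoff concrete, here is a minimal sketch (my own illustration, not code from the thread): a large frame moves between threads by reference in one synchronized enqueue, with no bytes copied, whereas any IPC route would have to serialize the whole buffer first.

# Sketch: in-process handoff of a large buffer between threads.
# The put() is a single synchronized step; no data is copied.
import queue
import threading

FRAME_BYTES = 32 * 1024 * 1024        # stand-in for one video frame

handoff = queue.Queue()

def producer():
    frame = bytearray(FRAME_BYTES)    # allocated once, in-process
    handoff.put(frame)                # enqueues a reference: O(1)

def consumer():
    frame = handoff.get()             # the same object, same memory
    frame[0] = 0xFF                   # mutate in place, no copy made

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()

# For contrast, shipping the frame through IPC would serialize it first,
# e.g. pickle.dumps(frame) -- a full copy of all 32 MB per handoff.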
Re: 2.6, 3.0, and truly independent intepreters
On 7 Nov, 03:02, sturlamolden [EMAIL PROTECTED] wrote: On Nov 7, 12:22 am, Walter Overby [EMAIL PROTECTED] wrote: I read Andy to stipulate that the pipe needs to transmit hundreds of megs of data and/or thousands of data structure instances. I doubt he'd be happy with memcpy either. My instinct is that contention for a lock could be the quicker option. If he needs to communicate that amount of data very often, he has a serious design problem. As far as I can tell, he wants to keep the data in one place and just pass a pointer around between execution contexts. The apparent issue with using shared memory segments for this is that he relies on existing components which have their own allocation preferences. So although you or I might choose shared memory if writing this stuff from scratch, he doesn't appear to have this option. The inquirer hasn't acknowledged my remarks about tinypy, but I know that if I were considering dropping $4 and/or 2-3 man-months, I'd at least have a look at what those people have done and whether there's any mileage in using it before starting a new, embeddable implementation of Python from scratch. Paul -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
On Nov 7, 11:46 am, Paul Boddie [EMAIL PROTECTED] wrote: As far as I can tell, he wants to keep the data in one place and just pass a pointer around between execution contexts. This would be the easiest solution if Python were designed to do this from the beginning. I have previously stated that I believe the lack of a context pointer in Python's C API is a design flaw, albeit one that is difficult to change. If the alternative is to rewrite the whole CPython interpreter, I would say it is easier to try a proxy object design instead (either using multiprocessing or an outproc ActiveX object). -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
On Nov 4, 6:51 pm, Paul Boddie [EMAIL PROTECTED] wrote: The language features look a lot like what others have already been offering for a while: keywords for parallelised constructs (cilk_for) which are employed by solutions for various languages (C# and various C++ libraries spring immediately to mind); spawning and synchronisation are typically supported in existing Python solutions, although obviously not using language keywords. Yes, but there is not a 'concurrency platform' that takes care of things like load balancing and testing for race conditions. If you spawn with cilk++, the result is not that a new process or thread is spawned. The task is put in a queue (scheduled using work stealing), and executed by a pool of threads/processes. Multiprocessing makes it easy to write concurrent algorithms (as opposed to subprocess or popen), but automatic load balancing is something it does not do. It also does not identify and warn the programmer about race conditions. It does not have a barrier synchronization paradigm, but one can be constructed. java.util.concurrent.forkjoin is actually based on cilk. Something like cilk can easily be built on top of the multiprocessing module. Extra keywords can and should be avoided. But it is easier in Python than C. Keywords are used in cilk++ because they can be defined out by the preprocessor, thus restoring the original sequential code. In Python we can e.g. use a decorator instead. -- http://mail.python.org/mailman/listinfo/python-list
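For what it's worth, the decorator idea sketches out readily on top of multiprocessing.Pool (this is a rough sketch of mine, not the poster's code; spawn, sync, and work are invented names): a decorated function, when called, is scheduled on a pool and returns a handle, and sync() is the barrier. Note the caveat in the comments: Pool draws tasks from one shared queue, so this mimics cilk's syntax but not its work-stealing scheduler.

# Sketch: cilk-style spawn/sync built on multiprocessing.
import multiprocessing

_registry = {}
_pool = None

def _invoke(name, args, kwargs):
    # Runs inside a worker process; looks the real function up by name
    # so the decorated wrapper itself never has to be pickled.
    return _registry[name](*args, **kwargs)

def spawn(func):
    """Decorator: calling the function schedules it and returns a handle."""
    _registry[func.__name__] = func
    def spawned(*args, **kwargs):
        return _pool.apply_async(_invoke, (func.__name__, args, kwargs))
    return spawned

def sync(handles):
    """Barrier, like cilk's sync: wait for all spawned tasks to finish."""
    return [h.get() for h in handles]

@spawn
def work(n):
    return sum(i * i for i in range(n))

if __name__ == '__main__':
    _pool = multiprocessing.Pool()               # one worker per core
    handles = [work(1000000) for _ in range(4)]  # four tasks in flight
    print(sync(handles))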
Re: 2.6, 3.0, and truly independent intepreters
On Nov 5, 8:44 pm, Andy O'Meara [EMAIL PROTECTED] wrote: In a few earlier posts, I went into detail about what's meant there: http://groups.google.com/group/comp.lang.python/browse_thread/thread/... http://groups.google.com/group/comp.lang.python/msg/edae2840ab432344 http://groups.google.com/group/comp.lang.python/msg/5be213c31519217b All this says is: 1. The cost of serialization and deserialization is too large. 2. Complex data structures cannot be placed in shared memory. The first claim is unsubstantiated. It depends on how much and what you serialize. If you use something like NumPy arrays, the cost of pickling is tiny. Erlang is a language specifically designed for concurrent programming, yet it does not allow anything to be shared. The second claim is plain wrong. You can put anything you want in shared memory. The mapping address of the shared memory segment may vary, but it can be dealt with (basically use integers instead of pointers, and use the base address as an offset.) Pyro is a Python project that has investigated this. With Pyro you can put any Python object in a shared memory region. You can also use NumPy record arrays to put very complex data structures in shared memory. What do you gain by placing multiple interpreters in the same process? You will avoid the complication that the mapping address of the shared memory region may be different. But this is a problem that has been worked out and solved. Instead you get a lot of issues dealing with DLL loading and unloading (Python extension objects). The multiprocessing module has something called proxy objects, which also deals with this issue. An object is hosted in a server process, and client processes may access it through synchronized IPC calls. Inside the client process the remote object looks like any other Python object. The synchronized IPC is hidden away in an abstraction layer. In Windows, you can also construct outproc ActiveX objects, which are not that different from multiprocessing's proxy objects. If you need to place a complex object in shared memory: 1. Check if a NumPy record array may suffice (dtypes may be nested). It will if you don't have dynamically allocated pointers inside the data structure. 2. Consider using multiprocessing's proxy objects or outproc ActiveX objects. 3. Go to http://pyro.sourceforge.net, download the code and read the documentation. Saying that it can't be done is silly before you have tried. Programmers are not that good at guessing where the bottlenecks reside, even if we think we do. -- http://mail.python.org/mailman/listinfo/python-list
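To illustrate the "integers instead of pointers" technique, here is a small sketch of my own (the node layout and names are invented): linked-list nodes live in one shared array, and links are stored as integer offsets into that array, so the structure stays valid no matter what address each process maps the memory at.

# Sketch: a linked list in shared memory, linked by offsets, not pointers.
import multiprocessing

NODES = 8
VALUE, NEXT = 0, 1        # each node is a [value, next_offset] pair
NIL = -1                  # end-of-list marker

def walk(shm, head):
    # Child process: follow integer offsets through the shared array.
    total, i = 0, head
    while i != NIL:
        total += shm[i * 2 + VALUE]
        i = shm[i * 2 + NEXT]
    print('sum over shared list:', total)

if __name__ == '__main__':
    shm = multiprocessing.Array('l', NODES * 2, lock=False)
    # Build the list 10 -> 20 -> 30; links are slot numbers, not addresses.
    for slot, (value, nxt) in enumerate([(10, 1), (20, 2), (30, NIL)]):
        shm[slot * 2 + VALUE] = value
        shm[slot * 2 + NEXT] = nxt
    p = multiprocessing.Process(target=walk, args=(shm, 0))
    p.start()
    p.join()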
Re: 2.6, 3.0, and truly independent intepreters
Hi, I've been following this discussion, and although I'm not nearly the Python expert that others on this thread are, I think I understand Andy's point of view. His premises seem to include at least: 1. His Python code does not control the creation of the threads. That is done at the app level. 2. Perhaps more importantly, his Python code does not control the allocation of the data he needs to operate on. He's got, for example, an opaque OS object that is manipulated by CPU-intensive OS functions. sturlamolden suggests a few approaches: 1. Check if a NumPy record array may suffice (dtypes may be nested). It will if you don't have dynamically allocated pointers inside the data structure. I suspect that the OS is very likely to have dynamically allocated pointers inside their opaque structures. 2. Consider using multiprocessing's proxy objects or outproc ActiveX objects. I don't understand how this would help. If these large data structures reside only in one remote process, then the overhead of proxying the data into another process for manipulation requires too much IPC, or at least so Andy stipulates. 3. Go to http://pyro.sourceforge.net, download the code and read the documentation. I don't see how this solves the problem with 2. I admit I have only cursory knowledge, but I understand remoting approaches to have the same weakness. I understand Andy's problem to be that he needs to operate on a large amount of in-process data from several threads, and each thread mixes CPU-intensive C functions with callbacks to Python utility functions. He contends that, even though he releases the GIL in the CPU-bound C functions, the reacquisition of the GIL for the utility functions causes unacceptable contention slowdowns in the current implementation of CPython. After reading Martin's posts, I think I also understand his point of view. Is the time spent in these Python callbacks so large compared to the C functions that you really have to wait? If so, then Andy has crossed over into writing performance-critical code in Python. Andy proposes that the Python community could work on making that possible, but Martin cautions that it may be very hard to do so. If I understand them correctly, none of these concerns are silly. Walter. -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
On Nov 6, 6:05 pm, Walter Overby [EMAIL PROTECTED] wrote: I don't understand how this would help. If these large data structures reside only in one remote process, then the overhead of proxying the data into another process for manipulation requires too much IPC, or at least so Andy stipulates. Perhaps it will, or perhaps not. Reading or writing to a pipe has slightly more overhead than a memcpy. There are things that Python needs to do that are slower than the IPC. In this case, the real constraint would probably be contention for the object in the server, not the IPC. (And don't blame it on the GIL, because putting a lock around the object would not be any better.) 3. Go to http://pyro.sourceforge.net, download the code and read the documentation. I don't see how this solves the problem with 2. It puts Python objects in shared memory. Shared memory is the fastest form of IPC there is. The overhead is basically zero. The only constraint will be contention for the object. I understand Andy's problem to be that he needs to operate on a large amount of in-process data from several threads, and each thread mixes CPU-intensive C functions with callbacks to Python utility functions. He contends that, even though he releases the GIL in the CPU-bound C functions, the reacquisition of the GIL for the utility functions causes unacceptable contention slowdowns in the current implementation of CPython. Yes, callbacks to Python are expensive. But is the problem the GIL? Instead of contention for the GIL, he seems to prefer contention for a complex object. Is that any better? It too has to be protected by a lock. If I understand them correctly, none of these concerns are silly. No, they are not. But I think he underestimates what multiple processes can do. The objects in 'multiprocessing' are already a lot faster than their 'threading' and 'Queue' counterparts. -- http://mail.python.org/mailman/listinfo/python-list
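The pipe-versus-memcpy comparison is easy to measure rather than argue about. A quick harness along these lines (my sketch, with an arbitrary 16 MB payload; the pipe figure also absorbs some child start-up lag):

# Sketch: time an in-process copy against pushing the same bytes
# through a multiprocessing pipe.
import time
import multiprocessing

def drain(conn):
    conn.recv_bytes()              # child: read the one message, then exit

if __name__ == '__main__':
    block = b'x' * (16 * 1024 * 1024)

    t0 = time.time()
    copy = bytearray(block)        # plain in-process copy (memcpy-like)
    t1 = time.time()

    parent, child = multiprocessing.Pipe()
    p = multiprocessing.Process(target=drain, args=(child,))
    p.start()
    t2 = time.time()
    parent.send_bytes(block)       # the same bytes, through an OS pipe
    p.join()
    t3 = time.time()

    print('copy: %.4fs  pipe: %.4fs' % (t1 - t0, t3 - t2))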
Re: 2.6, 3.0, and truly independent intepreters
On Nov 6, 2:03 pm, sturlamolden [EMAIL PROTECTED] wrote: On Nov 6, 6:05 pm, Walter Overby [EMAIL PROTECTED] wrote: I don't understand how this would help. If these large data structures reside only in one remote process, then the overhead of proxying the data into another process for manipulation requires too much IPC, or at least so Andy stipulates. Perhaps it will, or perhaps not. Reading or writing to a pipe has slightly more overhead than a memcpy. There are things that Python needs to do that are slower than the IPC. In this case, the real constraint would probably be contention for the object in the server, not the IPC. (And don't blame it on the GIL, because putting a lock around the object would not be any better.) (I'm not blaming anything on the GIL.) I read Andy to stipulate that the pipe needs to transmit hundreds of megs of data and/or thousands of data structure instances. I doubt he'd be happy with memcpy either. My instinct is that contention for a lock could be the quicker option. And don't forget, he says he's got an opaque OS object. He asked the group to explain how to send that via IPC to another process. I surely don't know how. 3. Go to http://pyro.sourceforge.net, download the code and read the documentation. I don't see how this solves the problem with 2. It puts Python objects in shared memory. Shared memory is the fastest form of IPC there is. The overhead is basically zero. The only constraint will be contention for the object. I don't think he has Python objects to work with. I'm persuaded when he says: when you're talking about large, intricate data structures (which include opaque OS object refs that use process-associated allocators), even a shared memory region between the child process and the parent can't do the job. Why aren't you persuaded? snip Yes, callbacks to Python are expensive. But is the problem the GIL? Instead of contention for the GIL, he seems to prefer contention for a complex object. Is that any better? It too has to be protected by a lock. At a couple points, Andy has expressed his preference for a single high level sync object to synchronize access to the data, at least that's my reading. What he doesn't seem to prefer is the slowdown arising from the Python callbacks acquiring the GIL. I think that would be an additional lock, and that's near the heart of Andy's concern, as I read him. If I understand them correctly, none of these concerns are silly. No, they are not. But I think he underestimates what multiple processes can do. The objects in 'multiprocessing' are already a lot faster than their 'threading' and 'Queue' counterparts. Andy has complimented 'multiprocessing' as a huge huge step. He just offers a scenario where multiprocessing might not be the best solution, and so far, I see no evidence he is wrong. That's not underestimation, in my estimation! Walter. -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
On Nov 7, 12:22 am, Walter Overby [EMAIL PROTECTED] wrote: I read Andy to stipulate that the pipe needs to transmit hundreds of megs of data and/or thousands of data structure instances. I doubt he'd be happy with memcpy either. My instinct is that contention for a lock could be the quicker option. If he needs to communicate that amount of data very often, he has a serious design problem. A pipe can transmit hundreds of megs in a split second by the way. And don't forget, he says he's got an opaque OS object. He asked the group to explain how to send that via IPC to another process. I surely don't know how. This is a typical situation where one could use a proxy object. Let one server process own the opaque OS object, and multiple client processes access it via IPC calls to the server. I don't think he has Python objects to work with. I'm persuaded when he says: when you're talking about large, intricate data structures (which include opaque OS object refs that use process-associated allocators), even a shared memory region between the child process and the parent can't do the job. Why aren't you persuaded? I am persuaded that shared memory may be difficult in that particular case. I am not persuaded that multiple processes cannot be used, because one can let one server process own the object. -- http://mail.python.org/mailman/listinfo/python-list
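The server-owns-the-object arrangement is what multiprocessing's manager machinery provides out of the box. A minimal sketch (mine; FrameStore is a hypothetical stand-in for a wrapper around the opaque OS object):

# Sketch: one server process owns the object; clients get a proxy and
# every method call is forwarded to the server over IPC.
from multiprocessing.managers import BaseManager

class FrameStore(object):
    """Stand-in for a wrapper around an opaque, process-bound OS object."""
    def __init__(self):
        self._frames = 0
    def process(self, n):
        self._frames += n
        return self._frames

class StoreManager(BaseManager):
    pass

StoreManager.register('FrameStore', FrameStore)

if __name__ == '__main__':
    manager = StoreManager()
    manager.start()                # the real object lives in this process
    store = manager.FrameStore()   # a proxy; calls go over IPC
    print(store.process(3))        # -> 3
    print(store.process(4))        # -> 7; state is kept server-side
    manager.shutdown()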
Re: 2.6, 3.0, and truly independent intepreters
On Nov 4, 10:59 am, sturlamolden [EMAIL PROTECTED] wrote: On Nov 4, 4:27 pm, Andy O'Meara [EMAIL PROTECTED] wrote: People in the scientific and academic communities have to understand that the dynamics in commercial software can involve *very* different needs, and they have to show some open-mindedness there. You are aware that BDFL's employer is a company called Google? Python is not just used in academic settings. Turns out I have heard of Google (and how about you be a little more courteous). If you've read the posts in this thread, you'll note that the needs outlined in this thread are quite different than the needs and interests of Google. Note that my point was that python *could* and *should* be used more in end-user/desktop applications, but it can't wag the dog to use my earlier statement. Furthermore, I gave you a link to cilk++. This is a simple tool that allows you to parallelize existing C or C++ software using three small keywords. Sorry if it wasn't clear, but we need the features associated with an embedded interpreter. I checked out cilk++ when you linked it and although it seems pretty cool, it's not a good fit for us for a number of reasons. Also, we like the idea of helping support a FOSS project rather than license a proprietary product (again, to be clear, using cilk isn't even appropriate for our situation). As other posts have gone into extensive detail, multiprocessing unfortunately doesn't handle the massive/complex data structures situation (see my posts regarding real-time video processing). That is something I don't believe. Why can't multiprocessing handle that? In a few earlier posts, I went into detail about what's meant there: http://groups.google.com/group/comp.lang.python/browse_thread/thread/9d995e4a1153a1b2/09aaca3d94ee7a04?lnk=st#09aaca3d94ee7a04 http://groups.google.com/group/comp.lang.python/msg/edae2840ab432344 http://groups.google.com/group/comp.lang.python/msg/5be213c31519217b For Christ sake, researchers write global climate models using MPI. And you think a toy problem like 'real-time video processing' is a show stopper for using multiple processes. I'm not sure why you're posting this sort of stuff when it seems like you haven't checked out earlier posts in this thread. Also, you do yourself and the people here a disservice in the way that you're speaking to me here. You never know who you're really talking to or who's reading. Andy -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
On 5 Nov, 20:44, Andy O'Meara [EMAIL PROTECTED] wrote: On Nov 4, 10:59 am, sturlamolden [EMAIL PROTECTED] wrote: For Christ sake, researchers write global climate models using MPI. And you think a toy problem like 'real-time video processing' is a show stopper for using multiple processes. I'm not sure why you're posting this sort of stuff when it seems like you haven't checked out earlier posts in this thread. Also, you do yourself and the people here a disservice in the way that you're speaking to me here. You never know who you're really talking to or who's reading. I think your remarks about people in the scientific and academic communities went down the wrong way, giving (or perhaps reinforcing) the impression that such people live carefree lives and write software unconstrained by external factors. Anyway, to keep things constructive, I should ask (again) whether you looked at tinypy [1] and whether that might possibly satisfy your embedded requirements. As I noted before, the developers might share your outlook on a number of matters. Otherwise, you might peruse the list of Python implementations: http://wiki.python.org/moin/implementation Paul [1] http://www.tinypy.org/ -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
On Nov 3, 7:11 pm, Andy O'Meara [EMAIL PROTECTED] wrote: My hope was that the increasing interest and value associated with flexible, multi-core/free-thread support is at a point where there's a critical mass of CPython developer interest (as indicated by various serious projects specifically meant to offer this support). Unfortunately, based on the posts in this thread, it's becoming clear that the scale of code changes, design changes, and testing that are necessary in order to offer this support is just too large unless the entire community is committed to the cause. I've been watching this debate from the side line. First let me say that there are several solutions to the multicore problem. Multiple independent interpreters embedded in a process is one possibility, but not the only one. Unwillingness to implement this in CPython does not imply unwillingness to exploit the next generation of processors. One thing that should be done is to make sure the Python interpreter and standard libraries release the GIL wherever they can. The multiprocessing package has almost the same API as you would get from your suggestion, the only difference being that multiple processes are involved. This is however hidden from the user, and (almost) hidden from the programmer. Let's see what multiprocessing can do: - Independent interpreters? Yes. - Shared memory? Yes. - Shared (proxy) objects? Yes. - Synchronization objects (locks, etc.)? Yes. - IPC? Yes. - Queues? Yes. - API different from threads? Not really. Here is one example of what the multiprocessing package can do, written by yours truly: http://scipy.org/Cookbook/KDTree Multicore programming is also more than using more than one thread or process. There is something called 'load balancing'. If you want to make efficient use of more than one core, not only must the serial algorithm be expressed as parallel, you must also take care to distribute the work evenly. Further, one should avoid as much resource contention as possible, and avoid races, deadlocks and livelocks. Java's concurrent package has sophisticated load balancers like the work-stealing scheduler in ForkJoin. Efficient multicore programming needs other abstractions than the 'thread' object (cf. what cilk++ is trying to do). It would certainly be possible to make Python do something similar. And whether threads or processes are responsible for the concurrency is not at all important. Today it is easiest to achieve multicore concurrency on CPython using multiple processes. The most 'advanced' language for multicore programming today is Erlang. It uses a 'share-nothing' message-passing strategy. Python can do the same as Erlang using the Candygram package (candygram.sourceforge.net). Changing the Candygram package to use multiprocessing instead of Python threads is not a major undertaking. The GIL is not evil by the way. SBCL also has a lock that protects the compiler. Ruby is getting a GIL. So all it comes down to is this: Why do you want multiple independent interpreters in a process, as opposed to multiple processes? Even if you did manage to embed multiple interpreters in a process, it would not give the programmer any benefit over the multiprocessing package. If you have multiple embedded interpreters, they cannot share anything. They must communicate serialized objects or use proxy objects. That is the same thing the multiprocessing package does. So why do you want this particular solution? S.M. -- http://mail.python.org/mailman/listinfo/python-list
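On the "API different from threads? Not really" point, a minimal producer/consumer sketch (mine): this is the standard threaded pattern with the multiprocessing names dropped in, and swapping Process and Queue back to their threading/Queue counterparts recovers the threaded version line for line.

# Sketch: the threading-style API, but backed by a separate process.
import multiprocessing

def worker(inbox, outbox):
    for item in iter(inbox.get, None):   # run until a None sentinel
        outbox.put(item * item)

if __name__ == '__main__':
    inbox, outbox = multiprocessing.Queue(), multiprocessing.Queue()
    w = multiprocessing.Process(target=worker, args=(inbox, outbox))
    w.start()
    for i in range(5):
        inbox.put(i)
    inbox.put(None)                      # tell the worker to stop
    print(sorted(outbox.get() for _ in range(5)))
    w.join()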
Re: 2.6, 3.0, and truly independent intepreters
If you are serious about multicore programming, take a look at: http://www.cilk.com/ Now if we could make Python do something like that, people would perhaps start to think about writing Python programs for more than one processor. -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
On Nov 4, 9:38 am, sturlamolden [EMAIL PROTECTED] wrote: First let me say that there are several solutions to the multicore problem. Multiple independent interpreters embedded in a process is one possibility, but not the only one. No one disagrees there. However, the motivation of this thread has been to make people here consider that it's much more preferable for CPython to have as few restrictions as possible with how it's used. I think many people here assume that python is the showcase item in industrial and commercial use, but it's generally just one of many pieces of machinery that serve the app's function (so the tail can't wag the dog when it comes to app design). Some people in this thread have made comments such as make your app run in python or change your app requirements but in the world of production schedules and making sure payroll is met, those options just can't happen. People in the scientific and academic communities have to understand that the dynamics in commercial software can involve *very* different needs, and they have to show some open-mindedness there. The multiprocessing package has almost the same API as you would get from your suggestion, the only difference being that multiple processes are involved. As other posts have gone into extensive detail, multiprocessing unfortunately doesn't handle the massive/complex data structures situation (see my posts regarding real-time video processing). I'm not sure if you've followed all the discussion, but multiple processes are off the table (this is discussed at length, so just flip back into the thread history). Andy -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
On Nov 4, 4:27 pm, Andy O'Meara [EMAIL PROTECTED] wrote: People in the scientific and academic communities have to understand that the dynamics in commercial software can involve *very* different needs, and they have to show some open-mindedness there. You are aware that BDFL's employer is a company called Google? Python is not just used in academic settings. Furthermore, I gave you a link to cilk++. This is a simple tool that allows you to parallelize existing C or C++ software using three small keywords. This is the kind of tool I believe would be useful. That is not an academic judgement. It makes it easy to take existing software and make it run efficiently on multicore processors. As other posts have gone into extensive detail, multiprocessing unfortunately doesn't handle the massive/complex data structures situation (see my posts regarding real-time video processing). That is something I don't believe. Why can't multiprocessing handle that? Is using a proxy object out of the question? Is putting the complex object in shared memory out of the question? Is having multiple copies of the object out of the question (did you see my kd-tree example)? Using multiple independent interpreters inside a process does not make this any easier. For Christ sake, researchers write global climate models using MPI. And you think a toy problem like 'real-time video processing' is a show stopper for using multiple processes. -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
On 4 Nov, 16:00, sturlamolden [EMAIL PROTECTED] wrote: If you are serious about multicore programming, take a look at: http://www.cilk.com/ Now if we could make Python do something like that, people would perhaps start to think about writing Python programs for more than one processor. The language features look a lot like what others have already been offering for a while: keywords for parallelised constructs (cilk_for) which are employed by solutions for various languages (C# and various C++ libraries spring immediately to mind); spawning and synchronisation are typically supported in existing Python solutions, although obviously not using language keywords. The more interesting aspects of the referenced technology seem to be hyperobjects which, as far as I can tell, are shared global objects, along with the way the work actually gets distributed and scheduled - something which would require slashing through the white paper aspects of the referenced site and actually reading the academic papers associated with the work. I've considered doing something like hyperobjects for a while, and this does fit in somewhat with recent discussions about shared memory and managing contention for that resource using the communications channels found in, amongst other solutions, the pprocess module. I currently have no real motivation to implement this myself, however. Paul -- http://mail.python.org/mailman/listinfo/python-list
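For what it's worth, the flagship hyperobject in cilk++ is the reducer, and that part translates roughly to Python (my sketch; partial_sum stands in for real work): each worker reduces a private view of the data, and the views are merged at the join point, so no lock guards a shared total inside the loop.

# Sketch: a reducer-style sum; workers never contend for a shared total.
import multiprocessing

def partial_sum(chunk):
    # Each worker folds its own slice into a private accumulator.
    total = 0
    for x in chunk:
        total += x
    return total

if __name__ == '__main__':
    data = range(1000000)
    n = 4
    step = len(data) // n
    chunks = [data[i * step:(i + 1) * step] for i in range(n)]
    pool = multiprocessing.Pool(n)
    print(sum(pool.map(partial_sum, chunks)))   # the merge, at the "sync"
    pool.close()
    pool.join()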
Re: 2.6, 3.0, and truly independent intepreters
On Oct 30, 6:39 pm, Terry Reedy [EMAIL PROTECTED] wrote: Their professor is Lars Bak, the lead architect of the Google V8 Javascript engine. They spent some time working on V8 in the last couple months. then they will be at home with pyv8 - which is a combination of the pyjamas python-to-javascript compiler and google's v8 engine. in pyv8, thanks to v8 (and the judicious application of boost) it's possible to call out to external c-based modules. so not only do you get the benefits of the (much) faster execution speed of v8, along with its garbage collection, but also you still get access to external modules. so... their project's done, already! l. -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
On Oct 30, 11:09 pm, alex23 [EMAIL PROTECTED] wrote: On Oct 31, 2:05 am, Andy O'Meara [EMAIL PROTECTED] wrote: I don't follow you there. If you're referring to multiprocessing, our concerns are: - Maturity (am I willing to tell my partners and employees that I'm betting our future on a brand-new module that imposes significant restrictions as to how our app operates?) - Liability (am I ready to invest our resources into lots of new python module-specific code to find out that a platform that we want to target isn't supported or has problems?). Like it or not, we're a company and we have to show sensitivity about new or fringe packages that make our codebase less agile -- C/C++ continues to win the day in that department. I don't follow this...wouldn't both of these concerns be even more true for modifying the CPython interpreter to provide the functionality you want? A great point, for sure. So, basically, the motivation and goal of this entire thread is to get an understanding for how enthusiastic/interested the CPython dev community is in the concepts/enhancements under discussion and for all of us to better understand the root issues. So my response is basically that it was my intention to seek official/sanctioned development (and contribute direct developer support and compensation). My hope was that the increasing interest and value associated with flexible, multi-core/free-thread support is at a point where there's a critical mass of CPython developer interest (as indicated by various serious projects specifically meant to offer this support). Unfortunately, based on the posts in this thread, it's becoming clear that the scale of code changes, design changes, and testing that are necessary in order to offer this support is just too large unless the entire community is committed to the cause. Meanwhile, as many posts in the thread have pointed out, issues such as free threading and easy/clean/compartmentalized use of python are of rising importance to app developers shopping for an interpreter to embed. So unless/until CPython offers the flexibility some apps require as an embedded interpreter, we commercial guys are unfortunately forced to use alternatives to python. I just think it'd be a huge win for everyone (app developers, the python dev community, and python proliferation in general) if python made its way into more commercial and industrial applications (in an embedded capacity). Andy -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
Patrick Stinson wrote: Speaking of the big picture, is this how it normally works when someone says Here's some code and a problem and I'm willing to pay for a solution? In an open-source volunteer context, time is generally more valuable than money. Most people can't just drop part of their regular employment temporarily, so unless there's quite a *lot* of money being offered (enough to offer someone full-time employment, for example) it doesn't necessarily make any more man-hours available. -- Greg -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
Okay, here's the bottom line: * This is not about the GIL. This is about *completely* isolated interpreters; most of the time when we want to remove the GIL we want a single interpreter with lots of shared data. * Your use case, although not common, is not extraordinarily rare either. It'd be nice to support. * If CPython had supported it all along we would continue to maintain it. * However, since it's not supported today, it's not worth the time invested, API incompatibility, and general breakage it would imply. * Although it's far more work than just solving your problem, if I were to remove the GIL I'd go all the way and allow shared objects. Great recap (although saying it's not about the GIL may cause some people to lose track of the root issues here, but your following comment about GIL removal shows that we're on the same page). So there's really only two options here: * get a short-term bodge that works, like hacking the 3rd party library to use your shared-memory allocator. Should be far less work than hacking all of CPython. The problem there is that we're not talking about a single 3rd party API/allocator--there are many, including the OS, which has its own internal allocators. My video encoding example is meant to illustrate a point, but the real-world use case is where there's allocators all over the place from all kinds of APIs, and when you want your C module to reenter the interpreter often to execute python helper code. * invest yourself in solving the *entire* problem (GIL removal with shared python objects). Well, as I mentioned, I do represent a company willing and able to expend real resources here. However, as you pointed out, there's some serious work at hand here (sadly--it didn't have to be this way) and there seem to be some really polarized people here who don't seem as interested as I am in making python more attractive for app developers shopping for an interpreter to embed. From our point of view, there are two other options which, the more we uncover in this discussion, unfortunately seem to be the only way out: 3) Start a new python implementation, let's call it CPythonES, specifically targeting performance apps and using an explicit object/context concept to permit the free threading under discussion here. The idea would be to just implement the core language, feature set, and a handful of modules. I refer you to that list I made earlier of essential modules. 4) Drop python, switch to Lua. The interesting thing about (3) is that it'd be in the same spirit as how OpenGL ES came to be (except in place of the need for free threading was the fact the standard OpenGL API was too overgrown and painful for the embedded scale). We're currently doing our own in-house version of (3), but we unfortunately have other priorities at the moment that would otherwise slow this down. Given the direction of many-core machines these days, option (3) or (4), for us, isn't a question of *if*, it's a question of *when*. So that's basically where we're at right now. As to my earlier point about representing a company ready to spend real resources, please email me off-list if anyone here would have an interest in an open CPythonES project (and get full compensation). I can say for sure that we'd be able to lead with API framework design work--that's my personal strength and we have a lot of real world experience there. Andy -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
On Wed, Oct 29, 2008 at 8:05 PM, Glenn Linderman [EMAIL PROTECTED] wrote: On approximately 10/29/2008 3:45 PM, came the following characters from the keyboard of Patrick Stinson: If you are dealing with lots of data like in video or sound editing, you would just keep the data in shared memory and send the reference over IPC to the worker process. Otherwise, if you marshal and send you are looking at a temporary doubling of the memory footprint of your app because the data will be copied, and marshaling overhead. Right. Sounds, and is, easy, if the data is all directly allocated by the application. But when pieces are allocated by 3rd party libraries, that use the C-runtime allocator directly, then it becomes more difficult to keep everything in shared memory. One _could_ replace the C-runtime allocator, I suppose, but that could have some adverse effects on other code, that doesn't need its data to be in shared memory. So it is somewhat between a rock and a hard place. By avoiding shared memory, such problems are sidestepped... until you run smack into the GIL. If you do not have shared memory: You don't need threads, ergo: You don't get penalized by the GIL. Threads are only useful when you have the requirement of large in-memory data structures shared and modified by a pool of workers. -jesse -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
On Oct 28, 6:11 pm, Martin v. Löwis [EMAIL PROTECTED] wrote: Because then we're back into the GIL not permitting threads efficient core use on CPU bound scripts running on other threads (when they otherwise could). Why do you think so? For C code that is carefully written, the GIL allows *very well* to write CPU bound scripts running on other threads. (please do get back to Jesse's original remark in case you have lost the thread :-) I don't follow you there. If you're referring to multiprocessing, our concerns are: - Maturity (am I willing to tell my partners and employees that I'm betting our future on a brand-new module that imposes significant restrictions as to how our app operates?) - Liability (am I ready to invest our resources into lots of new python module-specific code to find out that a platform that we want to target isn't supported or has problems?). Like it or not, we're a company and we have to show sensitivity about new or fringe packages that make our codebase less agile -- C/C++ continues to win the day in that department. - Shared memory -- for the reasons listed in my other posts, IPC or a shared/mapped memory region doesn't work for our situation (and I venture to say, for many real world situations otherwise you'd see end-user/common apps use forking more often than threading). It turns out that this isn't an exotic case at all: there's a *ton* of utility gained by making calls back into the interpreter. The best example is that since code is more easily maintained in python than in C, a lot of the module utility code is likely to be in python. You should really reconsider writing performance-critical code in Python. I don't follow you there... Performance-critical code in Python?? Suppose you're doing pixel-level filters on images or video, or Patrick needs to apply a DSP to some audio... Our app's performance would *tank*, in a MAJOR way (that, and/or background tasks would take 100x+ longer to do their work). Regardless of the issue under discussion, a lot of performance can be gained by using flattened data structures, fewer pointers, less reference counting, fewer objects, and so on - in the inner loops of the computation. You didn't reveal what *specific* computation you perform, so it's difficult to give specific advice. I tried to list some abbreviated examples in other posts, but here's some elaboration: - Pixel-level effects and filters, where some filters may use C procs while others may call back into the interpreter to execute logic -- while some do both, multiple times. - Image and video analysis/recognition where there's TONS of intricate data structures and logic. Those data structures and logic are easiest to develop and maintain in python, but you'll often want to call back to C procs which will, in turn, want to access Python (as well as C-level) data structures. The common pattern here is where there's a serious mix of C and python code and data structures, BUT it can all be done with a free-thread mentality since the finish point is unambiguous and distinct -- where all the results are handed back to the main app in a black and white handoff. It's *really* important for an app to freely make calls into its interpreter (or the interpreter's data structures) without having to perform lock/unlocking because that affords an app a *lot* of options and design paths. It's just not practical to be locking and unlocking the GIL when you want to operate on python data structures or call back into python. 
You seem to have placed the burden of proof on my shoulders for an app to deserve the ability to free-thread when using 3rd party packages, so how about we just agree it's not an unreasonable desire for a package (such as python) to support it and move on with the discussion. Again, if you do heavy-lifting in Python, you should consider rewriting the performance-critical parts in C. You may find that the need for multiple CPUs even goes away. Well, the entire premise we're operating under here is that we're dealing with embarrassingly easy parallelization scenarios, so when you suggest that the need for multiple CPUs may go away, I'm worried that you're not keeping the big picture in mind. I appreciate your arguments that a PyC concept is a lot of work requiring some careful design, but let's not kill the discussion just because of that. Any discussion in this newsgroup is futile, except when it either a) leads to a solution that is already possible, and the OP didn't envision, or b) is followed up by code contributions from one of the participants. If neither is likely to result, killing the discussion is the most productive thing we can do. Well, most others here seem to have a lot different definition of what qualifies as a futile discussion, so how about you allow the rest of us to continue to discuss these issues and possible solutions. And, for the record, I've said multiple times I'm ready to
Re: 2.6, 3.0, and truly independent intepreters
On Thu, Oct 30, 2008 at 12:05 PM, Andy O'Meara [EMAIL PROTECTED] wrote: On Oct 28, 6:11 pm, Martin v. Löwis [EMAIL PROTECTED] wrote: Because then we're back into the GIL not permitting threads efficient core use on CPU bound scripts running on other threads (when they otherwise could). Why do you think so? For C code that is carefully written, the GIL allows *very well* to write CPU bound scripts running on other threads. (please do get back to Jesse's original remark in case you have lost the thread :-) I don't follow you there. If you're referring to multiprocessing, our concerns are: - Maturity (am I willing to tell my partners and employees that I'm betting our future on a brand-new module that imposes significant restrictions as to how our app operates?) - Liability (am I ready to invest our resources into lots of new python module-specific code to find out that a platform that we want to target isn't supported or has problems?). Like it or not, we're a company and we have to show sensitivity about new or fringe packages that make our codebase less agile -- C/C++ continues to win the day in that department. - Shared memory -- for the reasons listed in my other posts, IPC or a shared/mapped memory region doesn't work for our situation (and I venture to say, for many real world situations otherwise you'd see end-user/common apps use forking more often than threading). FWIW (and again, I am not saying MP is good for your problem domain) - multiprocessing works on Windows, OS/X, Linux and Solaris quite well. The only platforms it has problems on right now are *BSD and AIX. It has plenty of tests (I want more more more) and has a decent amount of usage, if my mail box and bug list are any indication. Multiprocessing is not *new* - it's a branch of the pyprocessing package. Multiprocessing is written in C, so as for the less agile - I don't see how it's any less agile than what you've talked about. If you wanted true platform insensitivity, then Java is a better bet :) As for your final point: - Shared memory -- for the reasons listed in my other posts, IPC or a shared/mapped memory region doesn't work for our situation (and I venture to say, for many real world situations otherwise you'd see end-user/common apps use forking more often than threading). I philosophically disagree with you here. PThreads and shared memory as they exist today are largely based on Java's influence on the world. I would argue that the reason most people use threads as opposed to processes is simply based on ease of use and entry (which is ironic, given how many problems it causes). Not because they *need* the shared memory aspects of it, or because they could not decompose the problem into Actors/message passing, but because threads (a) are there (e.g. in Java, Python, etc.), (b) allow you to share anything (which allows you to take horrible shortcuts), and (c) are what everyone knows at this point. Even luminaries such as Brian Goetz and many, many others have pointed out that threading, as it exists today, is fundamentally difficult to get right. Ergo the renaissance (read: echo chamber) towards Erlang-style concurrency. For many real world applications - threading is just simple. This is why Multiprocessing exists at all - to attempt to make forking/IPC as simple as the API to threading. 
It's not foolproof, but the goal was to open the door to multiple cores with a familiar API: Quoting PEP 371: The pyprocessing package offers a method to side-step the GIL allowing applications within CPython to take advantage of multi-core architectures without asking users to completely change their programming paradigm (i.e.: dropping threaded programming for another concurrent approach - Twisted, Actors, etc). The Processing package offers CPython a known API which mirrors, albeit in a PEP 8 compliant manner, that of the threading API, with known semantics and easy scalability. I would argue that most of the people taking part in this discussion are working on real world applications - sure, multiprocessing as it exists today, right now - may not support your use case, but it was evaluated to fit *many* use cases. Most of the people here are working in pure python, or they're using a few extension modules here and there (in C). Again, when you say threads and processes, most people here are going to think import threading, fork() or import multiprocessing Please correct me if I am wrong in understanding what you want: You are making threads in another language (not via the threading API), embedding python in those threads, but you want to be able to share objects/state between those threads, and independent interpreters. You want to be able to pass state from one interpreter to another via shared memory (e.g. pointers/contexts/etc). Example:
ParentAppFoo makes 10 threads (in C)
Each thread gets an itty bitty python interpreter
ParentAppFoo gets an object (video) to render
Rather than marshal that object, you pass a pointer to the object to the children
You want to pass that pointer to an existing, or newly created, itty bitty python interpreter for mangling
Itty bitty python interpreter passes the object back to a C module via a pointer/context
If the above is wrong, I think possibly outlining it in the above form may help people conceptualize it - I really don't think you're talking about python-level processes or threads. 
Re: 2.6, 3.0, and truly independent intepreters
Jesse Noller wrote: Even luminaries such as Brian Goetz and many, many others have pointed out that threading, as it exists today, is fundamentally difficult to get right. Ergo the renaissance (read: echo chamber) towards Erlang-style concurrency. I think this is slightly missing what Andy is saying. Andy is trying something that would look much more like Erlang-style concurrency than classic threads - green processes to use someone else's term. AFAIK, Erlang processes aren't really processes at the OS level. Instead, they are called processes because they only communicate through message passing. When multiple processes are running in the same os-level-multi-threaded interpreter, the interpreter cheats to make the message passing fast. I think Andy is thinking along the same lines. With a Python subinterpreter per thread, he is suggesting intra-process message passing as a way to get concurrency. It's actually not too far from what he is doing already, but he is fighting OS-level shared library semantics to do it. Instead, if Python supported a per-subinterpreter GIL and per-subinterpreter state, then you could theoretically get to a good place: - You only initialize subinterpreters if you need them, so single-process Python doesn't pay a large (any?) penalty - Intra-process message passing can be fast, but still has the no-shared-state benefits of the Erlang concurrency model - There are fewer changes to the Python core, because the GIL doesn't go away No, this isn't whole-hog free threading (or safe threading), there are restrictions that go along with this model - but there would be benefits. -- http://mail.python.org/mailman/listinfo/python-list
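The share-nothing model being described can be faked today with ordinary threads and queues (my illustration): each "process" owns its state and is reachable only through a mailbox, so sending a message is just enqueuing a reference, which is why same-OS-process message passing can be made cheap.

# Sketch: Erlang-flavored message passing between two in-process tasks.
import threading
import queue

mailboxes = {'counter': queue.Queue(), 'main': queue.Queue()}

def counter():
    count = 0                          # private state, never shared
    while True:
        msg = mailboxes['counter'].get()
        if msg == 'stop':
            break
        count += msg
        mailboxes['main'].put(count)   # reply by message, not shared state

t = threading.Thread(target=counter)
t.start()
for n in (1, 2, 3):
    mailboxes['counter'].put(n)
    print(mailboxes['main'].get())     # prints 1, 3, 6
mailboxes['counter'].put('stop')
t.join()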
Re: 2.6, 3.0, and truly independent intepreters
On Oct 30, 1:00 pm, Jesse Noller [EMAIL PROTECTED] wrote: Multiprocessing is written in C, so as for the less agile - I don't see how it's any less agile than what you've talked about. Sorry for not being more specific there, but by less agile I meant that an app's codebase is less agile if python is an absolute requirement. If I was told tomorrow that for some reason we had to drop python and go with something else, it's my job to have chosen a codebase path/roadmap such that my response back isn't just well, we're screwed then. Consider modern PC games. They have huge code bases that use DirectX and OpenGL and having a roadmap of flexibility is paramount so packages they choose to use are used in a contained and hedged fashion. It's a survival tactic for a company not to entrench themselves in a package or technology if they don't have to (and that's what I keep trying to raise in the thread--that the python dev community should embrace development that makes python a leading candidate for lightweight use). Companies want to build flexible, powerful codebases that are married to as few components as possible. - Shared memory -- for the reasons listed in my other posts, IPC or a shared/mapped memory region doesn't work for our situation (and I venture to say, for many real world situations otherwise you'd see end-user/common apps use forking more often than threading). I would argue that the reason most people use threads as opposed to processes is simply based on ease of use and entry (which is ironic, given how many problems it causes). No, we're in agreement here -- I was just trying to offer a more detailed explanation of ease of use. It's easy because memory is shared and no IPC, serialization, or special allocator code is required. And as we both agree, it's far from easy once those threads start to interact with each other. But again, my goal here is to stay on the embarrassingly easy parallelization scenarios. I would argue that most of the people taking part in this discussion are working on real world applications - sure, multiprocessing as it exists today, right now - may not support your use case, but it was evaluated to fit *many* use cases. And as I've mentioned, it's a totally great endeavor to be super proud of. That suite of functionality alone opens some *huge* doors for python and I hope folks that use it appreciate how much time and thought that undoubtedly had to go into it. You get total props, for sure, and your work is a huge and unique credit to the community. Please correct me if I am wrong in understanding what you want: You are making threads in another language (not via the threading API), embedding python in those threads, but you want to be able to share objects/state between those threads, and independent interpreters. You want to be able to pass state from one interpreter to another via shared memory (e.g. pointers/contexts/etc). Example:
ParentAppFoo makes 10 threads (in C)
Each thread gets an itty bitty python interpreter
ParentAppFoo gets an object (video) to render
Rather than marshal that object, you pass a pointer to the object to the children
You want to pass that pointer to an existing, or newly created, itty bitty python interpreter for mangling
Itty bitty python interpreter passes the object back to a C module via a pointer/context
If the above is wrong, I think possibly outlining it in the above form may help people conceptualize it - I really don't think you're talking about python-level processes or threads. 
Yeah, you have it right-on there, with the added fact that the C and python execution (and data access) are highly intertwined (so getting and releasing the GIL would have to be happening all over). For example, consider the dynamics, logic, algorithms, and data structures associated with image and video effects and image and video recognition/analysis. Andy -- http://mail.python.org/mailman/listinfo/python-list
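For the embarrassingly easy scenarios Andy wants to stay on, the process-based version really is one call; a sketch (mine; process_frame is a made-up stand-in for a real per-frame filter). The catch, per Andy's earlier posts, is that this only pays off when a frame is cheap to serialize, since Pool copies every frame to and from the workers.

# Sketch: embarrassingly parallel per-frame processing with a Pool.
import multiprocessing

def process_frame(frame):
    # Pretend filter: invert 8-bit pixel values.
    return bytes(255 - b for b in frame)

if __name__ == '__main__':
    frames = [bytes([i]) * 1024 for i in range(16)]   # fake frames
    pool = multiprocessing.Pool()
    results = pool.map(process_frame, frames)         # one task per frame
    pool.close()
    pool.join()
    print(len(results), 'frames processed')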
Re: 2.6, 3.0, and truly independent intepreters
On Thu, Oct 30, 2008 at 1:54 PM, Andy O'Meara [EMAIL PROTECTED] wrote: On Oct 30, 1:00 pm, Jesse Noller [EMAIL PROTECTED] wrote: Multiprocessing is written in C, so as for the less agile - I don't see how it's any less agile than what you've talked about. Sorry for not being more specific there, but by less agile I meant that an app's codebase is less agile if python is an absolute requirement. If I was told tomorrow that for some reason we had to drop python and go with something else, it's my job to have chosen a codebase path/roadmap such that my response back isn't just well, we're screwed then. Consider modern PC games. They have huge code bases that use DirectX and OpenGL and having a roadmap of flexibility is paramount so packages they choose to use are used in a contained and hedged fashion. It's a survival tactic for a company not to entrench themselves in a package or technology if they don't have to (and that's what I keep trying to raise in the thread--that the python dev community should embrace development that makes python a leading candidate for lightweight use). Companies want to build flexible, powerful codebases that are married to as few components as possible. - Shared memory -- for the reasons listed in my other posts, IPC or a shared/mapped memory region doesn't work for our situation (and I venture to say, for many real world situations otherwise you'd see end-user/common apps use forking more often than threading). I would argue that the reason most people use threads as opposed to processes is simply based on ease of use and entry (which is ironic, given how many problems it causes). No, we're in agreement here -- I was just trying to offer a more detailed explanation of ease of use. It's easy because memory is shared and no IPC, serialization, or special allocator code is required. And as we both agree, it's far from easy once those threads start to interact with each other. But again, my goal here is to stay on the embarrassingly easy parallelization scenarios. That's why when I'm using threads, I stick to Queues. :) I would argue that most of the people taking part in this discussion are working on real world applications - sure, multiprocessing as it exists today, right now - may not support your use case, but it was evaluated to fit *many* use cases. And as I've mentioned, it's a totally great endeavor to be super proud of. That suite of functionality alone opens some *huge* doors for python and I hope folks that use it appreciate how much time and thought that undoubtedly had to go into it. You get total props, for sure, and your work is a huge and unique credit to the community. Thanks - I'm just a cheerleader and pusher-into-core, R Oudkerk is the implementor. He and everyone else who has helped deserve more credit than me by far. My main interest, and the reason I brought it up (again), is that I'm interested in making it better :) Please correct me if I am wrong in understanding what you want: You are making threads in another language (not via the threading API), embedding python in those threads, but you want to be able to share objects/state between those threads, and independent interpreters. You want to be able to pass state from one interpreter to another via shared memory (e.g. pointers/contexts/etc). 
Example: ParentAppFoo makes 10 threads (in C) Each thread gets an itty bitty python interpreter ParentAppFoo gets an object (video) to render Rather than marshal that object, you pass a pointer to the object to the children You want to pass that pointer to an existing, or newly created itty bitty python interpreter for mangling Itty bitty python interpreter passes the object back to a C module via a pointer/context If the above is wrong, I think possibly outlining it in the above form may help people conceptualize it - I really don't think you're talking about python-level processes or threads. Yeah, you have it right-on there, with the added fact that the C and python execution (and data access) are highly intertwined (so getting and releasing the GIL would have to be happening all over). For example, consider the dynamics, logic, algorithms, and data structures associated with image and video effects and image and video recognition/analysis. okie doke! -- http://mail.python.org/mailman/listinfo/python-list
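For the record, Jesse's outline maps onto the C API that exists today roughly as follows -- a minimal sketch only. Py_NewInterpreter() and PyCObject_FromVoidPtr() are real (2.x-era) calls; the renderer module and the video_frame pointer are invented for illustration. The catch, and the whole point of this thread, is that every interpreter created this way still shares the single GIL:

#include <Python.h>

/* Sketch of "itty bitty interpreter per C thread". Assumes
 * Py_Initialize() and PyEval_InitThreads() already ran in main().
 * Real API: PyEval_AcquireLock, Py_NewInterpreter, PyCObject_*.
 * Invented: the "renderer" module and the video_frame pointer. */
static void render_in_interpreter(void *video_frame)
{
    PyThreadState *ts;
    PyObject *handle, *mod;

    PyEval_AcquireLock();                   /* take the (shared!) GIL     */
    ts = Py_NewInterpreter();               /* itty bitty interpreter     */

    handle = PyCObject_FromVoidPtr(video_frame, NULL); /* pointer, no copy */
    mod = PyImport_ImportModule("renderer");           /* invented module  */
    if (mod && handle) {
        PyObject *r = PyObject_CallMethod(mod, "render", "O", handle);
        Py_XDECREF(r);
    }
    if (PyErr_Occurred())
        PyErr_Clear();
    Py_XDECREF(mod);
    Py_XDECREF(handle);

    Py_EndInterpreter(ts);                  /* tear it down...            */
    PyEval_ReleaseLock();                   /* ...and hand back the GIL   */
}

Ten such C threads run fine -- they just never run Python code at the same time, which is exactly Andy's complaint.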
Re: 2.6, 3.0, and truly independent intepreters
On 30 Okt, 14:12, Andy O'Meara [EMAIL PROTECTED] wrote: 3) Start a new python implementation, let's call it CPythonES [...] 4) Drop python, switch to Lua. Have you looked at tinypy? I'm not sure about the concurrency aspects of the implementation, but the developers are not completely unfamiliar with game development, and there is a certain amount of influence from Lua: http://www.tinypy.org/ It might also be a more appropriate starting point than CPython for experimentation. Paul -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
Andy O'Meara wrote: On Oct 28, 6:11 pm, Martin v. Löwis [EMAIL PROTECTED] wrote: You should really reconsider writing performance-critical code in Python. I don't follow you there... Performance-critical code in Python?? Martin expressed what he meant better later: Again, if you do heavy-lifting in Python, you should consider rewriting the performance-critical parts in C. I tried to list some abbreviated examples in other posts, but here's some elaboration: ... The common pattern here is where there's a serious mix of C and python code and data structures. I get the feeling that what you are doing is more variegated than what most others are doing with Python. And the reason is that what you are doing is apparently not possible with *stock* CPython. Again, it is a chicken-and-egg type problem. You might find this of interest from the PyDev list just hours ago. Hi to all Python developers For a student project in a course on virtual machines, we are evaluating the possibility to experiment with removing the GIL from CPython. We have read the arguments against doing this at http://www.python.org/doc/faq/library/#can-t-we-get-rid-of-the-global-interpreter-lock. But we think it might be possible to do this with a different approach than what has been tried till now. The main reason for the necessity of the GIL is reference counting. We believe that most of the slowdown in the free threading implementation of Greg Stein was due to the need for atomic refcounting, as this mail seems to confirm: http://mail.python.org/pipermail/python-ideas/2007-April/000414.html So we want to change CPython into having a real garbage collector - removing all reference counting, and then the need for locks (or atomic inc/dec ops) should be highly alleviated. Preferably the GC should be a high-performance one, for instance a generational one. We believe that it can run quite a lot faster than ref-counting. Shared data structures would obviously get their own lock. Immutable objects (especially shared global objects, like True, False, Null) would not. Most of the interpreter structure would be per-thread, at that point. We do not know how Greg Stein did his locking in the free threads patch, but as a part of the course we learned there exist much faster ways of locking than using OS-locks (faster for the uncontended case) that are used in e.g. the HOT-SPOT java-compiler. This might make free threading in python more attractive than some pessimists think. (http://blogs.sun.com/dave/entry/biased_locking_in_hotspot) In particular, we are talking about making the uncontended case go fast, not about the independent part of stack-allocating the mutex structure, which can only be done and is only needed in Java. These ideas are similar to the ones used by Linux fast mutexes (futexes), the implementation of mutexes in NPTL. We have read this mail thread - so it seems that our idea surfaced, but Greg didn't completely love it (he wanted to optimize refcounting instead): http://mail.python.org/pipermail/python-ideas/2007-April/000436.html He was not totally negative however. His main objections are about: - cache locality (He is in our opinion partially right, as seen in another paper some time ago - any GC, and a copying GC in particular, doubles the amount of used memory, so it's less cache-friendly). But still GCs are overall competitive with or faster than explicit management, and surely much faster than refcounting. 
We know it is the plan for PyPy to work in this way, and also that Jython and IronPython work like that (using the host VM's GC), so it seems to be somehow agreeable with the python semantics (perhaps not really with __del__, but they are not really nice anyway). Was this ever tried for CPython? Any other comments, encouragements or warnings on the project idea? Best regards: Paolo, Sigurd [EMAIL PROTECTED] Guido's response: It's not that I have any love for the GIL, it just is the best compromise I could find. I expect that you won't be able to do better, but I wish you luck anyway. And a bit more explanation from Van Lindberg: Just an FYI, these two particular students already introduced themselves on the PyPy list. Paolo is a masters student with experience in the Linux kernel; Sigurd is a PhD candidate. Their professor is Lars Bak, the lead architect of the Google V8 Javascript engine. They spent some time working on V8 in the last couple months. I agree that you should continue the discussion. Just let Martin ignore it for a while until you need further input from him. Terry Jan Reedy -- http://mail.python.org/mailman/listinfo/python-list
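The students' central hypothesis -- that atomic refcounting was most of Greg Stein's free-threading slowdown -- is easy to state in code. A minimal C sketch (GCC builtins; the FakeObject type is invented) of the two increments a free-threaded CPython must choose between:

#include <stdio.h>

/* A stand-in for PyObject's refcount field. */
typedef struct { long ob_refcnt; } FakeObject;

static void incref_plain(FakeObject *op)
{
    op->ob_refcnt++;                          /* safe only under the GIL    */
}

static void incref_atomic(FakeObject *op)
{
    __sync_fetch_and_add(&op->ob_refcnt, 1);  /* what free threading needs  */
}

int main(void)
{
    FakeObject o = { 0 };
    long i;
    for (i = 0; i < 100000000L; i++)
        incref_plain(&o);               /* swap in incref_atomic() to compare */
    printf("refcnt = %ld\n", o.ob_refcnt);
    return 0;
}

On x86 the atomic form compiles to a locked add; hammered from several cores on hot shared objects like None and True, those locked cycles (and the cache-line ping-pong they cause) are the slowdown the linked mail points to.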
Re: 2.6, 3.0, and truly independent intepreters
Why do you think so? For C code that is carefully written, the GIL allows *very well* for writing CPU-bound scripts running on other threads. (please do get back to Jesse's original remark in case you have lost the thread :-) I don't follow you there. If you're referring to multiprocessing No, I'm not. I refer to regular, plain, multi-threading. It turns out that this isn't an exotic case at all: there's a *ton* of utility gained by making calls back into the interpreter. The best example is that since code is more easily maintained in python than in C, a lot of the module utility code is likely to be in python. You should really reconsider writing performance-critical code in Python. I don't follow you there... Performance-critical code in Python?? I probably expressed myself incorrectly (being not a native speaker of English): If you were writing performance-critical code in Python, you should reconsider (i.e. you should rewrite it in C). It's not clear whether this calling back into Python is in the performance-critical path. If it is, then reconsider. I tried to list some abbreviated examples in other posts, but here's some elaboration: - Pixel-level effects and filters, where some filters may use C procs while others may call back into the interpreter to execute logic -- while some do both, multiple times. Ok. For a plain C proc, release the GIL before the proc, and reacquire it afterwards. For a proc that calls into the interpreter: a) if it is performance-critical, reconsider writing it in C, or reformulate so that it stops being performance critical (e.g. through caching) b) else, reacquire the GIL before calling back into Python, then release the GIL before continuing the proc - Image and video analysis/recognition where there's TONS of intricate data structures and logic. Those data structures and logic are easiest to develop and maintain in python, but you'll often want to call back to C procs which will, in turn, want to access Python (as well as C-level) data structures. Not sure what the processing is, or what processing you need to do. The data structures themselves are surely not performance critical (not being algorithms). If you really run Python algorithms on these structures, then my approach won't help you (except for the general recommendation to find some expensive sub-algorithm and rewrite that in C, so that it both becomes faster and can release the GIL). It's just not practical to be locking and unlocking the GIL when you want to operate on python data structures or call back into python. This I don't understand. I find that fairly easy to do. You seem to have placed the burden of proof on my shoulders for an app to deserve the ability to free-thread when using 3rd party packages, so how about we just agree it's not an unreasonable desire for a package (such as python) to support it and move on with the discussion. Not at all - I don't want a proof. I just want agreement on Jesse Noller's claim # A c-level module, on the other hand, can sidestep/release # the GIL at will, and go on its merry way and process away. If neither is likely to result, killing the discussion is the most productive thing we can do. Well, most others here seem to have a very different definition of what qualifies as a futile discussion, so how about you allow the rest of us to continue to discuss these issues and possible solutions. 
And, for the record, I've said multiple times I'm ready to contribute monetarily, professionally, and personally, so if that doesn't qualify as the precursor to code contributions from one of the participants then I don't know WHAT does. Ok, I apologize for having misunderstood you here. Regards, Martin -- http://mail.python.org/mailman/listinfo/python-list
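Martin's (a)/(b) recipe is compact enough to sketch. The GIL macros below are real C API; heavy_pixel_pass() and the Python callback are invented stand-ins for the filter work under discussion:

#include <Python.h>

/* (a) Release the GIL around the pure-C heavy lifting; (b) reacquire
 * it only for the moment the proc calls back into Python. */
static void heavy_pixel_pass(unsigned char *pixels, long n)
{
    long i;
    for (i = 0; i < n; i++)                 /* stand-in for a real filter */
        pixels[i] = (unsigned char)(255 - pixels[i]);
}

/* Called from a thread that currently holds the GIL. */
void filter_frame(unsigned char *pixels, long n, PyObject *callback)
{
    PyObject *r;

    Py_BEGIN_ALLOW_THREADS                  /* GIL released: other threads run */
    heavy_pixel_pass(pixels, n);
    Py_END_ALLOW_THREADS                    /* GIL reacquired                  */

    r = PyObject_CallFunction(callback, "l", n);  /* back into the interpreter */
    Py_XDECREF(r);
}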
Re: 2.6, 3.0, and truly independent intepreters
On Wed, Oct 29, 2008 at 4:05 PM, Glenn Linderman [EMAIL PROTECTED] wrote: On approximately 10/29/2008 3:45 PM, came the following characters from the keyboard of Patrick Stinson: If you are dealing with lots of data like in video or sound editing, you would just keep the data in shared memory and send the reference over IPC to the worker process. Otherwise, if you marshal and send you are looking at a temporary doubling of the memory footprint of your app because the data will be copied, and marshaling overhead. Right. Sounds, and is, easy, if the data is all directly allocated by the application. But when pieces are allocated by 3rd party libraries, that use the C-runtime allocator directly, then it becomes more difficult to keep everything in shared memory. good point. One _could_ replace the C-runtime allocator, I suppose, but that could have some adverse effects on other code, that doesn't need its data to be in shared memory. So it is somewhat between a rock and a hard place. ewww scary. mousetraps for sale? By avoiding shared memory, such problems are sidestepped... until you run smack into the GIL. -- Glenn -- http://nevcal.com/ === A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking -- http://mail.python.org/mailman/listinfo/python-list
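Patrick's keep-it-in-shared-memory scheme, sketched with POSIX calls. shm_open()/mmap() are real; the segment name and frame size are invented -- and, per Glenn's caveat, this only covers data your own code allocates, not what a 3rd-party library mallocs internally. Link with -lrt on Linux:

#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    const size_t frame_bytes = 1920 * 1080 * 4;       /* one RGBA frame */

    int fd = shm_open("/myapp_frame42", O_CREAT | O_RDWR, 0600);
    if (fd < 0 || ftruncate(fd, frame_bytes) != 0)
        return 1;

    unsigned char *frame = mmap(NULL, frame_bytes, PROT_READ | PROT_WRITE,
                                MAP_SHARED, fd, 0);
    if (frame == MAP_FAILED)
        return 1;

    memset(frame, 0x80, frame_bytes);       /* "render" something          */
    /* ...now send just the string "/myapp_frame42" over IPC; the worker
     * process shm_open()s and mmap()s the same segment -- no copy made. */

    munmap(frame, frame_bytes);
    close(fd);
    shm_unlink("/myapp_frame42");
    return 0;
}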
Re: 2.6, 3.0, and truly independent intepreters
On approximately 10/30/2008 6:26 AM, came the following characters from the keyboard of Jesse Noller: On Wed, Oct 29, 2008 at 8:05 PM, Glenn Linderman [EMAIL PROTECTED] wrote: On approximately 10/29/2008 3:45 PM, came the following characters from the keyboard of Patrick Stinson: If you are dealing with lots of data like in video or sound editing, you would just keep the data in shared memory and send the reference over IPC to the worker process. Otherwise, if you marshal and send you are looking at a temporary doubling of the memory footprint of your app because the data will be copied, and marshaling overhead. Right. Sounds, and is, easy, if the data is all directly allocated by the application. But when pieces are allocated by 3rd party libraries, that use the C-runtime allocator directly, then it becomes more difficult to keep everything in shared memory. One _could_ replace the C-runtime allocator, I suppose, but that could have some adverse effects on other code, that doesn't need its data to be in shared memory. So it is somewhat between a rock and a hard place. By avoiding shared memory, such problems are sidestepped... until you run smack into the GIL. If you do not have shared memory: You don't need threads, ergo: You don't get penalized by the GIL. Threads are only useful when you need to have that requirement of large in-memory data structures shared and modified by a pool of workers. The whole point of this thread is to talk about large in-memory data structures that are shared and modified by a pool of workers. My reference to shared memory was specifically referring to the concept of sharing memory between processes... a particular OS feature that is called shared memory. The need for sharing memory among a pool of workers is still the premise. Threads do that automatically, without the need for the OS shared memory feature, that brings with it the need for a special allocator to allocate memory in the shared memory area vs the rest of the address space. Not to pick on you, particularly, Jesse, but this particular response made me finally understand why there has been so much repetition of the same issues and positions over and over and over in this thread: instead of comprehending the whole issue, people are responding to small fragments of it, with opinions that may be perfectly reasonable for that fragment, but missing the big picture, or the explanation made when the same issue was raised in a different sub-thread. -- Glenn -- http://nevcal.com/ === A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
Speaking of the big picture, is this how it normally works when someone says Here's some code and a problem and I'm willing to pay for a solution? I've never really walked that path with a project of this complexity (I guess it's the backwards-compatibility that makes it confusing), but is this problem just too complex so we have to keep talking and talking on forum after forum? Afraid to fork? I know I am. How many people are qualified to tackle Andy's problem? Are all of them busy or uninterested? Is the current code in a tight spot where it just can't be fixed without really jabbing that FORK in so deep that the patch will die when your project does? Personally I think this problem is super-awesome on the hobbyist's fun scale. I'd totally take the time to let my patch do the talking but I haven't read enough of the (2.5) code. So, I resort to simply reading the newsgroups and python code to better understand the mechanics of the problem :( On Thu, Oct 30, 2008 at 2:54 PM, Glenn Linderman [EMAIL PROTECTED] wrote: [snip] -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
On Oct 30, 8:23 pm, Patrick Stinson [EMAIL PROTECTED] wrote: [snip] The scale of this issue is why so little progress gets made, yes. I intend to solve it regardless of getting paid (and have been working on various aspects for quite a while now), but as you can see from this thread it's very difficult to convince anybody that my approach is the *right* approach. -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
On Oct 31, 2:05 am, Andy O'Meara [EMAIL PROTECTED] wrote: I don't follow you there. If you're referring to multiprocessing, our concerns are: - Maturity (am I willing to tell my partners and employees that I'm betting our future on a brand-new module that imposes significant restrictions as to how our app operates?) - Liability (am I ready to invest our resources into lots of new python module-specific code to find out that a platform that we want to target isn't supported or has problems?). Like it or not, we're a company and we have to show sensitivity about new or fringe packages that make our codebase less agile -- C/C++ continues to win the day in that department. I don't follow this... wouldn't both of these concerns be even more true for modifying the CPython interpreter to provide the functionality you want? -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
Close, I work currently for EastWest :) Well, I actually like almost everything else about CPython; considering my audio work, the only major problem I've had is with the GIL. I like the purist community, and I like the code, since integrating it on both platforms has been relatively clean, and required *zero* support. Frankly, with the exception of some windows deployment issues relating to static linking of libpython and some extensions, it's been a dream lib to use. Further, I really appreciate the discussions that happen in these lists, and I think that this particular problem is a wonderful example of a situation that requires tons of miscellaneous opinions and input from all angles - especially at this stage. I think that this problem has lots of standing discussion and lots of potential solutions and/or workarounds, and it would be cool for someone to aggregate and paraphrase that stuff into a page to assist those thinking about doing some patching. That's probably something that the coder would do themselves though. On Fri, Oct 24, 2008 at 10:25 AM, Andy O'Meara [EMAIL PROTECTED] wrote: So we are sitting on this music platform with unimaginable possibilities in the music world (of which python does not play a role), but those little CPU spikes caused by the GIL at low latencies won't let us have it. AFAIK, there is no music scripting language out there that would come close, and yet we are so close! This is a big deal. Perfectly said, Patrick. It pains me to know how widespread python *could* be in commercial software! Also, good points about people being longwinded and that code talks. Sadly, the time alone I've spent in the last couple days on this thread is scary, but I'm committed now, I guess. :^( I look at the length of the posts of some of these guys and I have to wonder what the heck they do for a living! As I mentioned, however, I'm close to just blowing the whistle on this crap and starting to make CPythonES (as I call it, in the spirit of the ES in OpenGLES). Like you, we just want the core features of python in a clean, tidy, *reliable* fashion--something that we can ship and not lose sleep (or support hours) over. Basically, I imagine developing an interpreter designed for dev houses like yours and mine (you're Ableton or Propellerhead, right?)--a python version of lua, if you will. The nice thing about it is that it could start fresh and small, but I have a feeling it would really catch on because every commercial dev house would choose it over CPython any day of the week and it would be completely disjoint from CPython. Andy -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
Wow, man. Excellent post. You want a job? The gui could use PyA threads for sure, and the audio thread could use PyC threads. It would not be a problem to limit the audio thread to only reentrant libraries. This kind of thought is what I had in mind about finding a compromise, especially in the way that PyD would not break old code assuming that it could eventually be ported. On Fri, Oct 24, 2008 at 11:02 AM, Glenn Linderman [EMAIL PROTECTED] wrote: On approximately 10/24/2008 8:42 AM, came the following characters from the keyboard of Andy O'Meara: Glenn, great post and points! Thanks. I need to admit here that while I've got a fair bit of professional programming experience, I'm quite new to Python -- I've not learned its internals, nor even the full extent of its rich library. So I have some questions that are partly about the goals of the applications being discussed, partly about how Python is constructed, and partly about how the library is constructed. I'm hoping to get a better understanding of all of these; perhaps once a better understanding is achieved, limitations will be understood, and maybe solutions be achievable. Let me define some speculative Python interpreters; I think the first is today's Python: PyA: Has a GIL. PyA threads can run within a process; but are effectively serialized to the places where the GIL is obtained/released. Needs the GIL because that solves lots of problems with non-reentrant code (an example of non-reentrant code is code that uses global (C global, or C static) variables – note that I'm not talking about Python vars declared global... they are only module global). In this model, non-reentrant code could include pieces of the interpreter, and/or extension modules. PyB: No GIL. PyB threads acquire/release a lock around each reference to a global variable (like with feature). Requires massive recoding of all code that contains global variables. Reduces performance significantly by the increased cost of obtaining and releasing locks. PyC: No locks. Instead, recoding is done to eliminate global variables (interpreter requires a state structure to be passed in). Extension modules that use globals are prohibited... this eliminates large portions of the library, or requires massive recoding. PyC threads do not share data between threads except by explicit interfaces. PyD: (a hybrid of PyA and PyC). The interpreter is recoded to eliminate global variables, and each interpreter instance is provided a state structure. There is still a GIL, however, because globals are potentially still used by some modules. Code is added to detect use of global variables by a module, or some contract is written whereby a module can be declared to be reentrant and global-free. PyA threads will obtain the GIL as they would today. PyC threads would be available to be created. PyC instances refuse to call non-reentrant modules, but also need not obtain the GIL... PyC threads would have limited module support initially, but over time, most modules can be migrated to be reentrant and global-free, so they can be used by PyC instances. Most 3rd-party libraries today are starting to care about reentrancy anyway, because of the popularity of threads. The assumptions here are that: Data-1) A Python interpreter doesn't provide any mechanism to share normal data among threads, they are independent... but message passing works. Data-2) A Python interpreter could be extended to provide mechanisms to share special data, and the data would come with an implicit lock. 
Data-3) A Python interpreter could be extended to provide unlocked access to special data, requiring the application to handle the synchronization between threads. Data of type 2 could be used to control access to data of type 3. This type of data could be large, or frequently referenced data, but only by a single thread at a time, with major handoffs to a different thread synchronized by the application in whatever way it chooses. Context-1) A Python interpreter would know about threads it spawns, and could pass in a block of context (in addition to the state structure) as a parameter to a new thread. That block of context would belong to the thread as long as it exists, and return to the spawner when the thread completes. An embedded interpreter would also be given a block of context (in addition to the state structure). This would allow application context to be created and passed around. Pointers to shared memory structures might be typical context in the embedded case. Context-2) Embedded Python interpreters could be spawned either as PyA threads or PyC threads. PyC threads would be limited to modules that are reentrant. I think that PyB and PyC are the visions that people see, which argue against implementing independent interpreters. PyB isn't truly independent, because data are shared, recoding is required, and performance suffers. Ick. PyC requires massive recoding and gives up, at least initially, every module that can't be made reentrant.
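For what it's worth, the property PyC describes already ships in Lua's C API, which this thread keeps reaching for as the comparison point: all interpreter state lives behind an explicit lua_State*, so per-thread interpreters need no global lock. A small, real Lua 5.1 sketch (the loop body is an arbitrary stand-in for work; compile with gcc demo.c -llua -lpthread):

#include <lua.h>
#include <lauxlib.h>
#include <lualib.h>
#include <pthread.h>

static void *worker(void *arg)
{
    lua_State *L = luaL_newstate();   /* per-thread, fully independent    */
    luaL_openlibs(L);
    luaL_dostring(L, "x = 0; for i = 1, 1e7 do x = x + i end");
    lua_close(L);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);  /* both interpreters run    */
    pthread_create(&t2, NULL, worker, NULL);  /* truly in parallel: there */
    pthread_join(t1, NULL);                   /* is no global lock to     */
    pthread_join(t2, NULL);                   /* contend on               */
    return 0;
}

This is essentially PyC realized: nothing stops two states from running flat out on two cores, because nothing is shared unless the host application shares it.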
Re: 2.6, 3.0, and truly independent intepreters
On Fri, Oct 24, 2008 at 12:51 PM, Andy O'Meara [EMAIL PROTECTED] wrote: Another great post, Glenn!! Very well laid-out and posed!! Thanks for taking the time to lay all that out. Questions for Andy: is the type of work you want to do in independent threads mostly pure Python? Or with libraries that you can control to some extent? Are those libraries reentrant? Could they be made reentrant? How much of the Python standard library would need to be available in reentrant mode to provide useful functionality for those threads? I think you want PyC I think you've defined everything perfectly, and you're of course correct about my love for the PyC model. :^) Like any software that's meant to be used without restrictions, our code and frameworks always use a context object pattern so that there's never any non-const global/shared data. I would go as far as to say that this is the case with more performance-oriented software than you may think since it's usually a given for us to have to be parallel friendly in as many ways as possible. Perhaps Patrick can back me up there. And I will. As to what modules are essential... As you point out, once reentrant module implementations catch on in a PyC or hybrid world, I think we'd start to see real effort to whip them into compliance-- there's just so much to be gained imho. But to answer the question, there's the obvious ones (operator, math, etc), string/buffer processing (string, re), C bridge stuff (struct, array), and OS basics (time, file system, etc). Nice-to-haves would be buffer and image decompression (zlib, libpng, etc), crypto modules, and xml. As far as I can imagine, I have to believe all of these modules already contain little, if any, global data, so I have to believe they'd be super easy to make PyC happy. Patrick, what would you see you guys using? We don't need anything :) Since our goal is just to use python as a scripting language/engine to our MIDI application, all we really need is to make calls to the api that we expose using __builtins__. You know, the standard python library is pretty sick, but the syntax, object model, and import mechanics of python itself is an **equally exportable function** of the code. Funny that I'm lucky enough to say: Screw the extension modules - I just want the LANGUAGE. But, I can't have it. That's the rub... In our case, we're doing image and video manipulation--stuff not good to be messaging from address space to address space. The same argument holds for numerical processing with large data sets. The workers handing back huge data sets via messaging isn't very attractive. In the multiprocessing module environment could you not use shared memory, then, for the large shared data items? As I understand things, multiprocessing puts stuff in a child process (i.e. a separate address space), so the only way to get stuff to/from it is via IPC, which can include a shared/mapped memory region. Unfortunately, a shared address region doesn't work when you have large and opaque objects (e.g. a rendered CoreVideo movie in the QuickTime API or 300 megs of audio data that just went through a DSP). Then you've got the hit of serialization if you've got intricate data structures (that would normally need to be serialized, such as a hashtable or something). 
Also, if I may speak for commercial developers out there who are just looking to get the job done without new code, it's almost always preferable to just use a single high-level sync object (for when the job is complete) than to start a child process and use IPC. The former is just WAY less code, plain and simple. Andy -- http://mail.python.org/mailman/listinfo/python-list
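The single high-level sync object pattern Andy means is roughly this (real pthreads API; the Job struct and the commented-out render call are invented): the worker receives a raw pointer, and completion is one condition-variable signal -- no serialization code anywhere. Compile with -lpthread:

#include <pthread.h>
#include <stdlib.h>

typedef struct {
    unsigned char  *pixels;           /* big opaque buffer, never copied */
    size_t          nbytes;
    int             done;
    pthread_mutex_t lock;
    pthread_cond_t  cond;
} Job;

static void *worker(void *arg)
{
    Job *job = arg;
    /* render_frame(job->pixels, job->nbytes); heavy C work goes here */
    pthread_mutex_lock(&job->lock);
    job->done = 1;
    pthread_cond_signal(&job->cond);  /* the single sync step            */
    pthread_mutex_unlock(&job->lock);
    return NULL;
}

int main(void)
{
    Job job = { malloc(1 << 20), 1 << 20, 0,
                PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER };
    pthread_t t;
    pthread_create(&t, NULL, worker, &job);

    pthread_mutex_lock(&job.lock);
    while (!job.done)                 /* wait for completion; that's it  */
        pthread_cond_wait(&job.cond, &job.lock);
    pthread_mutex_unlock(&job.lock);

    pthread_join(t, NULL);
    free(job.pixels);
    return 0;
}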
Re: 2.6, 3.0, and truly independent intepreters
On 28 Okt, 21:03, Rhamphoryncus [EMAIL PROTECTED] wrote: * get a short-term bodge that works, like hacking the 3rd party library to use your shared-memory allocator. Should be far less work than hacking all of CPython. Did anyone come up with a reason why shared memory couldn't be used for the purpose described by the inquirer? With the disadvantages of serialisation circumvented, that would leave issues of contention, and on such matters I have to say that I'm skeptical about solutions which try and make concurrent access to CPython objects totally transparent, mostly because it appears to be quite a lot of work to get right (as POSH illustrates, and as your own safethread work shows), and also because systems where contention is spread over a large surface (any object can potentially be accessed by any process at any time) are likely to incur a lot of trouble for the dubious benefit of being vague about which objects are actually being shared. Paul -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
On Oct 29, 7:20 am, Paul Boddie [EMAIL PROTECTED] wrote: Did anyone come up with a reason why shared memory couldn't be used for the purpose described by the inquirer? [snip] I believe large existing libraries were the reason. Thus my suggestion of the evil fork+mmap abuse. -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
If you are dealing with lots of data like in video or sound editing, you would just keep the data in shared memory and send the reference over IPC to the worker process. Otherwise, if you marshal and send you are looking at a temporary doubling of the memory footprint of your app because the data will be copied, and marshaling overhead. On Fri, Oct 24, 2008 at 3:50 PM, Andy O'Meara [EMAIL PROTECTED] wrote: Are you familiar with the API at all? Multiprocessing was designed to mimic threading in about every way possible, the only restriction on shared data is that it must be serializable, but even then you can override or customize the behavior. Also, inter process communication is done via pipes. It can also be done with messages if you want to tweak the manager(s). I apologize in advance if I don't understand something correctly, but as I understand them, everything has to be serialized in order to go through IPC. So when you're talking about thousands of objects, buffers, and/or large OS opaque objects (e.g. memory-resident video and images), that seems like a pretty rough hit of run-time resources. Please don't misunderstand my comments to suggest that multiprocessing isn't great stuff. On the contrary, it's very impressive and it singlehandedly catapults python *way* closer to efficient CPU bound processing than it ever was before. All I mean to say is that in the case of using a shared address space with a worker pthread per spare core to do CPU bound work, it's a really big win not to have to serialize stuff. And in the case of hundreds of megs of data and/or thousands of data structure instances, it's a deal breaker to serialize and unserialize everything just so that it can be sent through IPC. It's a deal breaker for most performance-centric apps because of the unnecessary runtime resource hit and because now all those data structures being passed around have to have accompanying serialization code written (and maintained) for them. That's actually what I meant when I made the comment that a high level sync object in a shared address space is better than sending it all through IPC (when the data sets are wild and crazy). From a C/C++ point of view, I would venture to say that it's always a huge win to just stick those embarrassingly easy parallelization cases into the thread with a sync object than forking and using IPC and having to write all the serialization code. And in the case of huge data types-- such as video or image rendering--it makes me nervous to think of serializing it all just so it can go through IPC when it could just be passed using a pointer change and a single sync object. So, if I'm missing something and there's a way to pass data structures without serialization, then I'd definitely like to learn more (sorry in advance if I missed something there). When I took a look at multiprocessing my concerns were: - serialization (discussed above) - maturity (are we ready to bet the farm that mp is going to work properly on the platforms we need it to?) Again, I'm psyched that multiprocessing appeared in 2.6 and it's a huge huge step in getting everyone to unlock the power of python! But, then some of the tidbits described above are additional data points for you and others to chew on. I can tell you they're pretty important points for any performance-centric software provider (us, game developers--from EA to Ambrosia, and A/V production app developers like Patrick). 
Andy -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
On Oct 26, 6:57 pm, Andy O'Meara [EMAIL PROTECTED] wrote: Grrr... I posted a ton of lengthy replies to you and other recent posts here using Google and none of them made it, argh. Poof. There's nothing that fires me up more than lost work, so I'll have to revert to short and simple answers for the time being. Argh, damn. On Oct 25, 1:26 am, greg [EMAIL PROTECTED] wrote: Andy O'Meara wrote: I would definitely agree if there was a context (i.e. environment) object passed around then perhaps we'd have the best of all worlds. Moreover, I think this is probably the *only* way that totally independent interpreters could be realized. Converting the whole C API to use this strategy would be a very big project. Also, on the face of it, it seems like it would render all existing C extension code obsolete, although it might be possible to do something clever with macros to create a compatibility layer. Another thing to consider is that passing all these extra pointers around everywhere is bound to have some effect on performance. I'm with you on all counts, so no disagreement there. On the passing a ptr everywhere issue, perhaps one idea is that all objects could have an additional field that would point back to their parent context (ie. their interpreter). So the only prototypes that would have to be modified to contain the context ptr would be the ones that don't inherently operate on objects (e.g. importing a module). Trying to directly share objects like this is going to create contention. The refcounting becomes the sequential portion of Amdahl's Law. This is why safethread doesn't scale very well: I share a massive amount of objects. An alternative, actually simpler, is to create proxies to your real object. The proxy object has a pointer to the real object and the context containing it. When you call a method it serializes the arguments, acquires the target context's GIL (while releasing yours), and deserializes in the target context. Once the method returns it reverses the process. There's two reasons why this may perform well for you: First, operations done purely in C may cheat (if so designed). A copy from one memory buffer to another memory buffer may be given two proxies as arguments, but then operate directly on the target objects (ie without serialization). Second, if a target context is idle you can enter it (acquiring its GIL) without any context switch. Of course that scenario is full of maybes, which is why I have little interest in it... An even better scenario is if your memory buffer's methods are in pure C and it's a simple object (no pointers). You can stick the memory buffer in shared memory and have multiple processes manipulate it from C. More maybes. An evil trick if you need pointers, but control the allocation, is to take advantage of the fork model. Have a master process create a bunch of blank files (temp files if linux doesn't allow /dev/zero), mmap them all using MAP_SHARED, then fork and utilize. The addresses will be inherited from the master process, so any pointers within them will be usable across all processes. If you ever want to return memory to the system you can close that file, then have all processes use MAP_SHARED|MAP_FIXED to overwrite it. Evil, but should be disturbingly effective, and still doesn't require modifying CPython. -- http://mail.python.org/mailman/listinfo/python-list
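The evil trick in miniature (real POSIX calls; the sizes and payload are invented). Because the child inherits the parent's mappings at the same addresses, a pointer stored inside the shared region stays valid across the fork:

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    size_t len = 1 << 20;
    /* MAP_ANONYMOUS|MAP_SHARED behaves like the blank-file variant on
     * Linux; use a real temp file where anonymous shared maps aren't
     * available. */
    char *region = mmap(NULL, len, PROT_READ | PROT_WRITE,
                        MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED)
        return 1;

    char **slot = (char **)region;        /* a pointer INTO the region... */
    *slot = region + 64;                  /* ...stored inside the region  */
    strcpy(*slot, "hello");

    if (fork() == 0) {                    /* child inherits the mapping   */
        printf("child reads: %s\n", *slot);   /* same address, still valid */
        _exit(0);
    }
    wait(NULL);
    munmap(region, len);
    return 0;
}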
Re: 2.6, 3.0, and truly independent intepreters
Glenn Linderman wrote: so a 3rd party library might be called to decompress the stream into a set of independently allocated chunks, each containing one frame (each possibly consisting of several allocations of memory for associated metadata) that is independent of other frames We use a combination of a dictionary + RGB data for this purpose. Using a dictionary works out pretty nicely for the metadata, and obviously one attribute holds the frame data as a binary blob. http://www.kamaelia.org/Components/pydoc/Kamaelia.Codec.YUV4MPEG gives some idea of structure and usage. The example given there is this: Pipeline( RateControlledFileReader(video.dirac,readmode=bytes, ...), DiracDecoder(), FrameToYUV4MPEG(), SimpleFileWriter(output.yuv4mpeg) ).run() Now all of those components are generator components. That's useful since: a) we can structure the code to show what it does more clearly, and it still runs efficiently inside a single process b) We can change this over to using multiple processes trivially: ProcessPipeline( RateControlledFileReader(video.dirac,readmode=bytes, ...), DiracDecoder(), FrameToYUV4MPEG(), SimpleFileWriter(output.yuv4mpeg) ).run() This version uses multiple processes (under the hood using Paul Boddie's pprocess library, since this support predates the multiprocessing module support in python). The big issue with *this* version however is that due to pprocess (and friends) pickling data to be sent across OS pipes, the data throughput on this would be lousy. Specifically in this example, if we could change it such that the high level API was this: ProcessPipeline( RateControlledFileReader(video.dirac,readmode=bytes, ...), DiracDecoder(), FrameToYUV4MPEG(), SimpleFileWriter(output.yuv4mpeg) use_shared_memory_IPC = True, ).run() That would be pretty useful, for some hopefully obvious reasons. I suppose ideally we'd just use shared_memory_IPC for everything and just go back to this: ProcessPipeline( RateControlledFileReader(video.dirac,readmode=bytes, ...), DiracDecoder(), FrameToYUV4MPEG(), SimpleFileWriter(output.yuv4mpeg) ).run() But essentially for us, this is an optimisation problem, not a how do I even begin to use this problem. Since it is an optimisation problem, it also strikes me as reasonable to consider it OK to special purpose and specialise such links until you get an approach that's reasonable for general purpose data. In theory, poshmodule.sourceforge.net, with a bit of TLC, would be a good candidate, or a good starting point, for that optimisation work (since it does work in Linux, contrary to a reply in the thread - I've not tested it under windows :). If someone's interested in building that, then someone redoing our MiniAxon tutorial using processes and shared-memory IPC rather than generators would be a relatively gentle/structured approach to dealing with this: * http://www.kamaelia.org/MiniAxon/ The reason I suggest that is because any time we think about fiddling and creating a new optimisation approach or concurrency approach, we tend to build a MiniAxon prototype to flesh out the various issues involved. Michael -- http://www.kamaelia.org/Home -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
Philip Semanchuk wrote: On Oct 25, 2008, at 7:53 AM, Michael Sparks wrote: Glenn Linderman wrote: In the multiprocessing module environment could you not use shared memory, then, for the large shared data items? If the posh module had a bit of TLC, it would be extremely useful for this,... http://poshmodule.sourceforge.net/ Last time I checked that was Windows-only. Has that changed? I've only tested it under Linux where it worked, but it does clearly need a bit of work :) The only IPC modules for Unix that I'm aware of are one which I adopted (for System V semaphores and shared memory) and one which I wrote (for POSIX semaphores and shared memory). http://NikitaTheSpider.com/python/shm/ http://semanchuk.com/philip/posix_ipc/ I'll take a look at those - poshmodule does need a bit of TLC and doesn't appear to be maintained. If anyone wants to wrap POSH cleverness around them, go for it! If not, maybe I'll make the time someday. I personally don't have the time to do this, but I'd be very interested in hearing of someone building an up-to-date version. (Indeed, something like this would be extremely useful for everyone to have in the standard library now that the multiprocessing library is in the standard library) Michael. -- http://www.kamaelia.org/Home -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
On Oct 26, 10:11 pm, James Mills [EMAIL PROTECTED] wrote: On Mon, Oct 27, 2008 at 12:03 PM, Andy O'Meara [EMAIL PROTECTED] wrote: I think we miscommunicated there--I'm actually agreeing with you. I was trying to make the same point you were: that intricate and/or large structures are meant to be passed around by a top-level pointer, not using serialization/messaging. This is what I've been trying to explain to others here; that IPC and shared memory unfortunately aren't viable options, leaving app threads (rather than child processes) as the solution. Andy, Why don't you just use a temporary file system (ram disk) to store the data that your app is manipulating. All you need to pass around then is a file descriptor. --JamesMills Unfortunately, it's the penalty of serialization and unserialization. When you're talking about stuff like memory-resident images and video (complete with their intricate and complex codecs), then the only option is to be passing around a couple pointers rather than take the hit of serialization (which is huge for video, for example). I've gone into more detail in some other posts but I could have missed something. Andy -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
On Oct 27, 4:05 am, Martin v. Löwis [EMAIL PROTECTED] wrote: Andy O'Meara wrote: Well, when you're talking about large, intricate data structures (which include opaque OS object refs that use process-associated allocators), even a shared memory region between the child process and the parent can't do the job. Otherwise, please describe in detail how I'd get an opaque OS object (e.g. an OS ref that refers to memory-resident video) from the child process back to the parent process. WHAT PARENT PROCESS? In the same address space, to me, means a single process only, not multiple processes, and no parent process anywhere. If you have just multiple threads, the notion of passing data from a child process back to the parent process is meaningless. I know... I was just responding to you and others here who keep beating the fork drum. I was just trying to make it clear that a shared address space is the only way to go. Ok, good, so we're in agreement that threads are the only way to deal with the intricate and complex data set issue in a performance-centric application. Again, the big picture that I'm trying to plant here is that there really is a serious need for truly independent interpreters/contexts in a shared address space. I understand that this is your mission in this thread. However, why is that your problem? Why can't you just use the existing (limited) multiple-interpreters machinery, and solve your problems with that? Because then we're back into the GIL not permitting threads efficient core use on CPU bound scripts running on other threads (when they otherwise could). Just so we're on the same page, when they otherwise could is relevant here because that's the important given: that each interpreter (context) truly never has any contact with others. An example would be python scripts that generate video programmatically using an initial set of params and use an in-house C module to construct frames (which in turn makes and modifies python C objects that wrap to intricate codec-related data structures). Suppose you wanted to render 3 of these at the same time, one on each thread (3 threads). With the GIL in place, these threads can't get anywhere close to their potential. Your response thus far is that the C module should release the GIL before it commences its heavy lifting. Well, the problem is that during its heavy lifting it may need to call back into its interpreter. It turns out that this isn't an exotic case at all: there's a *ton* of utility gained by making calls back into the interpreter. The best example is that since code is more easily maintained in python than in C, a lot of the module utility code is likely to be in python. Unsurprisingly, this is the situation myself and many others are in: where we want to subsequently use the interpreter within the C module (so, as I understand it, the proposal to have the C module release the GIL unfortunately doesn't work as a general solution). For most industry-caliber packages, the expectation and convention (unless documented otherwise) is that the app can make as many contexts as it wants in whatever threads it wants because the convention is that the app must (a) never use one context's objects in another context, and (b) never use a context at the same time from more than one thread. That's all I'm really trying to look at here. And that's indeed the case for Python, too. 
The app can make as many subinterpreters as it wants to, and it must not pass objects from one subinterpreter to another one, nor should it use a single interpreter from more than one thread (although that is actually supported by Python - but it surely won't hurt if you restrict yourself to a single thread per interpreter). I'm not following you there... I thought we're all in agreement that the existing C modules are FAR from being reentrant, regularly making use of static/global objects. The point I had made before is that other industry-caliber packages specifically don't have restrictions in *any* way. I appreciate your argument that a PyC concept is a lot of work and careful design, but let's not kill the discussion just because of that. The fact remains that the video encoding scenario described above is a pretty reasonable situation, and as more people are commenting in this thread, there's an increasing need to offer apps more flexibility when it comes to multi-threaded use. Andy -- http://mail.python.org/mailman/listinfo/python-list
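Andy's three-renderer scenario reduces to a skeleton like the one below (real GIL calls; build_frame(), the loop counts, and the per-frame Python snippet are invented stand-ins). Every PyGILState_Ensure() is a funnel: the three C threads do their pure-C work in parallel, but all their callbacks into Python serialize on the single process-wide lock:

#include <Python.h>
#include <pthread.h>

static void build_frame(int frame) { (void)frame; /* heavy pure-C codec work */ }

static void *render_stream(void *arg)
{
    int frame;
    for (frame = 0; frame < 1000; frame++) {
        build_frame(frame);                        /* runs in parallel       */

        PyGILState_STATE g = PyGILState_Ensure();  /* the serialization point */
        PyRun_SimpleString("x = 1 + 1");           /* stand-in for per-frame
                                                      utility logic in python */
        PyGILState_Release(g);
    }
    return NULL;
}

int main(void)
{
    pthread_t t[3];
    int i;

    Py_Initialize();
    PyEval_InitThreads();
    PyThreadState *save = PyEval_SaveThread();  /* let workers take the GIL */

    for (i = 0; i < 3; i++) pthread_create(&t[i], NULL, render_stream, NULL);
    for (i = 0; i < 3; i++) pthread_join(t[i], NULL);

    PyEval_RestoreThread(save);
    Py_Finalize();
    return 0;
}

The more of each loop iteration that lives on the Python side, the closer the three threads collapse to single-core throughput.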
Re: 2.6, 3.0, and truly independent intepreters
Glenn Linderman wrote: So your 50% number is just a scare tactic, it would seem, based on wild guesses. Was there really any benefit to the comment? All I was really trying to say is that it would be a mistake to assume that the overhead will be negligible, as that would be just as much a wild guess as 50%. -- Greg -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
On Oct 25, 9:46 am, M.-A. Lemburg [EMAIL PROTECTED] wrote: These discussions pop up every year or so and I think that most of them are not really all that necessary, since the GIL isn't all that bad. Thing is, if the topic keeps coming up, then that may be an indicator that change is truly needed. Someone much wiser than me once shared that a measure of the usefulness and quality of a package (or API) is how easily it can be added to an application--of any flavor--without the application needing to change. So in the rising world of idle cores and worker threads, I do see an increasing concern over the GIL. Although I recognize that the debate is lengthy, heated, and has strong arguments on both sides, my reading on the issue makes me feel like there's a bias for the pro-GIL side because of the volume of design and coding work associated with considering various alternatives (such as Glenn's Py* concepts). And I DO respect and appreciate where the pro-GIL people come from: who the heck wants to do all that work and recoding so that a tiny percent of developers can benefit? And my best response is that as unfortunate as it is, python needs to be more multi-threaded app- friendly if we hope to attract the next generation of app developers that want to just drop python into their app (and not have to change their app around python). For example, Lua has that property, as evidenced by its rapidly growing presence in commercial software (Blizzard uses it heavily, for example). Furthermore, there are lots of ways to tune the CPython VM to make it more or less responsive to thread switches via the various sys.set*() functions in the sys module. Most computing or I/O intense C extensions, built-in modules and object implementations already release the GIL for you, so it usually doesn't get in the way all that often. The main issue I take there is that it's often highly useful for C modules to make subsequent calls back into the interpreter. I suppose the response to that is to acquire the GIL before reentry, but it just seems to be more code and responsibility in scenarios where it's not necessary. Although that code and protocol may come easy to veteran CPython developers, let's not forget that an important goal is to attract new developers and companies to the scene, where they get their thread-independent code up and running using python without any unexpected reengineering. Again, why are companies choosing Lua over Python when it comes to an easy and flexible drop-in interpreter? And please take my points here to be exploratory, and not hostile or accusatory, in nature. Andy -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
On Oct 27, 10:55 pm, Glenn Linderman [EMAIL PROTECTED] wrote: And I think we still are miscommunicating! Or maybe communicating anyway! So when you said object, I actually don't know whether you meant Python object or something else. I assumed Python object, which may not have been correct... but read on, I think the stuff below clears it up. Then when you mentioned thousands of objects, I imagined thousands of Python objects, and somehow transforming the blob into same... and back again. My apologies to you and others here on my use of objects -- I use the term generically and mean it to *not* refer to python objects (for all the reasons discussed here). Python only makes up a small part of our app, hence my habit of using objects to refer to other APIs' allocated and opaque objects (including our own and OS APIs). For all the reasons we've discussed, in our world, python objects don't travel around outside of our python C modules -- when python objects need to be passed to other parts of the app, they're converted into their non-python (portable) equivalents (ints, floats, buffers, etc--but most of the time, the objects are PyCObjects, so they can enter and leave a python context with negligible overhead). I venture to say this is pretty standard when any industry app uses a package (such as python), for various reasons: - Portability/Future (e.g. if we do decide to drop Python and go with Lua, the changes are limited to only one region of code). - Sanity (having any API's objects show up in places far away goes against easy-to-follow code). - MT flexibility (because we never use static/global storage, we have all kinds of options when it comes to multithreading). For example, recall that by throwing python in multiple dynamic libs, we were able to achieve the GIL-less interpreter independence that we want (albeit ghetto and a pain). Andy -- http://mail.python.org/mailman/listinfo/python-list
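The PyCObject round-trip Andy refers to really is negligible-overhead -- it is just a pointer wrap and unwrap. PyCObject_FromVoidPtr() and PyCObject_AsVoidPtr() are real Python 2.x C API (since replaced by PyCapsule); the VideoFrame type is an invented stand-in for an opaque app object:

#include <Python.h>

typedef struct VideoFrame VideoFrame;   /* opaque to Python on purpose */

/* C -> Python: wrap the pointer, hand the PyObject* to script code. */
PyObject *frame_to_py(VideoFrame *f)
{
    return PyCObject_FromVoidPtr(f, NULL);  /* NULL destructor: app owns it */
}

/* Python -> C: recover the very same pointer on the way back out.   */
VideoFrame *frame_from_py(PyObject *obj)
{
    return (VideoFrame *)PyCObject_AsVoidPtr(obj);
}

No copying, no serialization -- which is why the pattern keeps the python/C boundary cheap even for 300-meg buffers.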
Re: 2.6, 3.0, and truly independent intepreters
On Oct 28, 9:30 am, Andy O'Meara [EMAIL PROTECTED] wrote: [snip] Okay, here's the bottom line: * This is not about the GIL. This is about *completely* isolated interpreters; most of the time when we want to remove the GIL we want a single interpreter with lots of shared data. * Your use case, although not common, is not extraordinarily rare either. It'd be nice to support. * If CPython had supported it all along we would continue to maintain it. * However, since it's not supported today, it's not worth the time invested, API incompatibility, and general breakage it would imply. * Although it's far more work than just solving your problem, if I were to remove the GIL I'd go all the way and allow shared objects. So there are really only two options here: * get a short-term bodge that works, like hacking the 3rd party library to use your shared-memory allocator. Should be far less work than hacking all of CPython. 
* invest yourself in solving the *entire* problem (GIL removal with shared python objects). -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
Because then we're back into the GIL not permitting threads efficient core use on CPU bound scripts running on other threads (when they otherwise could). Why do you think so? For C code that is carefully written, the GIL permits CPU-bound work to proceed on other threads *very well*. (please do get back to Jesse's original remark in case you have lost the thread :-) An example would be python scripts that generate video programmatically using an initial set of params and use an in-house C module to construct frames (which in turn makes and modifies python C objects that wrap intricate codec-related data structures). Suppose you wanted to render 3 of these at the same time, one on each thread (3 threads). With the GIL in place, these threads can't get anywhere close to their potential. Your response thus far is that the C module should release the GIL before it commences its heavy lifting. Well, the problem is that during its heavy lifting it may need to call back into its interpreter. So it should reacquire the GIL then. Assuming the other threads all do their heavy lifting, it should immediately get the GIL, fetch some data, release the GIL, and continue to do heavy lifting. If it's truly CPU-bound, I hope it doesn't spend most of its time in the Python API, but in true computation. It turns out that this isn't an exotic case at all: there's a *ton* of utility gained by making calls back into the interpreter. The best example is that since code is more easily maintained in python than in C, a lot of the module utility code is likely to be in python. You should really reconsider writing performance-critical code in Python. Regardless of the issue under discussion, a lot of performance can be gained by using flattened data structures, fewer pointers, less reference counting, fewer objects, and so on - in the inner loops of the computation. You didn't reveal what *specific* computation you perform, so it's difficult to give specific advice. Unsurprisingly, this is the situation myself and many others are in: where we want to subsequently use the interpreter within the C module (so, as I understand it, the proposal to have the C module release the GIL unfortunately doesn't work as a general solution). Not if you do the actual computation in Python, no. However, this subthread started with Jesse's remark that you *can* release the GIL in C code. Again, if you do heavy lifting in Python, you should consider rewriting the performance-critical parts in C. You may find that the need for multiple CPUs even goes away. I appreciate your argument that a PyC concept is a lot of work requiring some careful design, but let's not kill the discussion just because of that. Any discussion in this newsgroup is futile, except when it either a) leads to a solution that is already possible, and the OP didn't envision, or b) is followed up by code contributions from one of the participants. If neither is likely to result, killing the discussion is the most productive thing we can do. Regards, Martin -- http://mail.python.org/mailman/listinfo/python-list
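A minimal Python-side sketch of the release-the-GIL pattern Martin describes, assuming a threaded CPython build whose zlib module drops the GIL around its C compression loop (as stock builds do): the CPU-bound worker threads can then overlap on a multi-core machine, reacquiring the GIL only to touch Python objects. The timing print is illustrative, not a benchmark.

    import os, threading, time, zlib

    data = os.urandom(4 * 1024 * 1024)   # 4 MB of incompressible noise

    def job():
        for _ in range(10):
            zlib.compress(data)          # GIL is released inside the C call

    threads = [threading.Thread(target=job) for _ in range(4)]
    t0 = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print 'elapsed: %.2fs' % (time.time() - t0)   # scales with cores, GIL and all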
Re: 2.6, 3.0, and truly independent intepreters
Andy O'Meara wrote: On Oct 24, 9:52 pm, Martin v. Löwis [EMAIL PROTECTED] wrote: A c-level module, on the other hand, can sidestep/release the GIL at will, and go on its merry way and process away. ...Unless part of the C module execution involves the need to do CPU-bound work on another thread through a different python interpreter, right? Wrong. [...] So I think the disconnect here is that maybe you're envisioning threads being created *in* python. To be clear, we're talking about making threads at the app level and making it a given for the app to take its safety in its own hands. No. Whether or not threads are created by Python or the application does not matter for my "Wrong" evaluation: in either case, C module execution can easily side-step/release the GIL. As far as I can tell, it seems CPython's current state can't do CPU-bound parallelization in the same address space. That's not true. Well, when you're talking about large, intricate data structures (which include opaque OS object refs that use process-associated allocators), even a shared memory region between the child process and the parent can't do the job. Otherwise, please describe in detail how I'd get an opaque OS object (e.g. an OS ref that refers to memory-resident video) from the child process back to the parent process. WHAT PARENT PROCESS? In the same address space, to me, means a single process only, not multiple processes, and no parent process anywhere. If you have just multiple threads, the notion of passing data from a child process back to the parent process is meaningless. Again, the big picture that I'm trying to plant here is that there really is a serious need for truly independent interpreters/contexts in a shared address space. I understand that this is your mission in this thread. However, why is that your problem? Why can't you just use the existing (limited) multiple-interpreters machinery, and solve your problems with that? For most industry-caliber packages, the expectation and convention (unless documented otherwise) is that the app can make as many contexts as it wants in whatever threads it wants because the convention is that the app must (a) never use one context's objects in another context, and (b) never use a context at the same time from more than one thread. That's all I'm really trying to look at here. And that's indeed the case for Python, too. The app can make as many subinterpreters as it wants to, and it must not pass objects from one subinterpreter to another one, nor should it use a single interpreter from more than one thread (although that is actually supported by Python - but it surely won't hurt if you restrict yourself to a single thread per interpreter). Regards, Martin -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
As far as I can tell, it seems CPython's current state can't do CPU-bound parallelization in the same address space. That's not true. Um... So let's say you have an opaque object ref from the OS that represents hundreds of megs of data (e.g. memory-resident video). How do you get that back to the parent process without serialization and IPC? What parent process? I thought you were talking about multi-threading? What should really happen is just use the same address space so just a pointer changes hands. THAT's why I'm saying that a separate address space is generally a deal breaker when you have large or intricate data sets (ie. when performance matters). Right. So use a single address space, multiple threads, and perform the heavy computations in C code. I don't see how Python is in the way at all. Many people do that, and it works just fine. That's what Jesse (probably) meant with his remark A c-level module, on the other hand, can sidestep/release the GIL at will, and go on its merry way and process away. Please reconsider this; it might be a solution to your problem. Regards, Martin -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
Andy O'Meara wrote: Um... So let's say you have an opaque object ref from the OS that represents hundreds of megs of data (e.g. memory-resident video). How do you get that back to the parent process without serialization and IPC? What should really happen is just use the same address space so just a pointer changes hands. THAT's why I'm saying that a separate address space is generally a deal breaker when you have large or intricate data sets (ie. when performance matters). You can try to assign the buffer in the shared memory space, that can be managed by Nikita the Spider's shm module. Then you can implement what would be essentially a systolic array structure, passing the big buffer along to the processes who may, or may not, be running on different processors, to do whatever magic each process has to do, to complete the whole transformation. (filter, fft, decimation, compression, mpeg, whatever...) (aside: This may be faster than forking an OS process - don't subprocesses get a COPY of the parent's environment?) But this will give you only one process running at a time, as you can't do stuff simultaneously to the same data. So you will need to split a real big ram area into your big buffers so that each of the processes you contemplate running separately can be given one 100 M area (out of the shared big one) to own to do its magic on. When it is finished, it passes the ownership back, and the block is assigned to the next process in the sequence, while a new block from the OS is assigned to the first process, and so on. So you still have shared ram IPC, but there is no serialisation. And you don't move the data, unless you want to. You can update or twiddle in place. It's the serialisation that kills the performance. And the pointers can be passed by the same mechanism, if I understand what shm does after a quick look. So you can build a real ripsnorter - it rips this, while it snorts the previous and tears the antepenultimate... - Hendrik -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
Grrr... I posted a ton of lengthy replies to you and other recent posts here using Google and none of them made it, argh. Poof. There's nothing that fires me up more than lost work, so I'll have to revert to short and simple answers for the time being. Argh, damn. On Oct 25, 1:26 am, greg [EMAIL PROTECTED] wrote: Andy O'Meara wrote: I would definitely agree if there was a context (i.e. environment) object passed around then perhaps we'd have the best of all worlds. Moreover, I think this is probably the *only* way that totally independent interpreters could be realized. Converting the whole C API to use this strategy would be a very big project. Also, on the face of it, it seems like it would render all existing C extension code obsolete, although it might be possible to do something clever with macros to create a compatibility layer. Another thing to consider is that passing all these extra pointers around everywhere is bound to have some effect on performance. I'm with you on all counts, so no disagreement there. On the passing a ptr everywhere issue, perhaps one idea is that all objects could have an additional field that would point back to their parent context (ie. their interpreter). So the only prototypes that would have to be modified to contain the context ptr would be the ones that don't inherently operate on objects (e.g. importing a module). On Oct 25, 1:54 am, greg [EMAIL PROTECTED] wrote: Andy O'Meara wrote: - each worker thread makes its own interpreter, pops scripts off a work queue, and manages exporting (and then importing) result data to other parts of the app. I hope you realize that starting up one of these interpreters is going to be fairly expensive. It will have to create its own versions of all the builtin constants and type objects, and import its own copy of all the modules it uses. Yeah, for sure. And I'd say that's a pretty well established convention already out there for any industry package. The pattern I'd expect to see is where the app starts worker threads, starts interpreters in one or more of each, and throws jobs to different ones (and the interpreter would persist to move on to subsequent jobs). One wonders if it wouldn't be cheaper just to fork the process. Shared memory can be used to transfer large lumps of data if needed. As I mentioned, when you're talking about intricate data structures, OS opaque objects (ie. that have their own internal allocators), or huge data sets, even a shared memory region unfortunately can't fit the bill. Andy -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
On Oct 24, 9:52 pm, Martin v. Löwis [EMAIL PROTECTED] wrote: A c-level module, on the other hand, can sidestep/release the GIL at will, and go on its merry way and process away. ...Unless part of the C module execution involves the need to do CPU-bound work on another thread through a different python interpreter, right? Wrong. Let's take a step back and remind ourselves of the big picture. The goal is to have independent interpreters running in pthreads that the app starts and controls. Each interpreter never at any point does any thread-related stuff in any way. For example, each script job just does meat and potatoes CPU work, using callbacks that, say, programmatically use OS APIs to edit and transform frame data. So I think the disconnect here is that maybe you're envisioning threads being created *in* python. To be clear, we're talking about making threads at the app level and making it a given for the app to take its safety in its own hands. As far as I can tell, it seems CPython's current state can't do CPU-bound parallelization in the same address space. That's not true. Well, when you're talking about large, intricate data structures (which include opaque OS object refs that use process-associated allocators), even a shared memory region between the child process and the parent can't do the job. Otherwise, please describe in detail how I'd get an opaque OS object (e.g. an OS ref that refers to memory-resident video) from the child process back to the parent process. Again, the big picture that I'm trying to plant here is that there really is a serious need for truly independent interpreters/contexts in a shared address space. Consider stuff like libpng, zlib, libjpg, or whatever: the use pattern is always the same: make a context object, do your work in the context, and take it down. For most industry-caliber packages, the expectation and convention (unless documented otherwise) is that the app can make as many contexts as it wants in whatever threads it wants because the convention is that the app must (a) never use one context's objects in another context, and (b) never use a context at the same time from more than one thread. That's all I'm really trying to look at here. Andy -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
And in the case of hundreds of megs of data ... and I would be surprised at someone that would embed hundreds of megs of data into an object such that it had to be serialized... seems like the proper design is to point at the data, or a subset of it, in a big buffer. Then data transfers would just transfer the offset/length and the reference to the buffer. and/or thousands of data structure instances, ... and this is another surprise! You have thousands of objects (data structure instances) to move from one thread to another? I think we miscommunicated there--I'm actually agreeing with you. I was trying to make the same point you were: that intricate and/or large structures are meant to be passed around by a top-level pointer, not using serialization/messaging. This is what I've been trying to explain to others here; that IPC and shared memory unfortunately aren't viable options, leaving app threads (rather than child processes) as the solution. Of course, I know that data get large, but typical multimedia streams are large, binary blobs. I was under the impression that processing them usually proceeds along the lines of keeping offsets into the blobs, and interpreting, etc. Editing is usually done by making a copy of a blob, transforming it or a subset in some manner during the copy process, resulting in a new, possibly different-sized blob. Your instincts are right. I'd only add on that when you're talking about data structures associated with an intricate video format, the complexity and depth of the data structures is insane -- the LAST thing you want to burn cycles on is serializing and unserializing that stuff (so IPC is out)--again, we're already on the same page here. I think at one point you made the comment that shared memory is a solution to handle large data sets between a child process and the parent. Although this is certainly true in principle, it doesn't hold up in practice since complex data structures often contain 3rd party and OS API objects that have their own allocators. For example, in video encoding, there are TONS of objects that comprise memory-resident video from all kinds of APIs, so the idea of having them allocated from a shared/mapped memory block isn't even possible. Again, I only raise this to offer evidence that doing real-world work in a child process is a deal breaker--a shared address space is just way too much to give up. Andy -- http://mail.python.org/mailman/listinfo/python-list
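To put a rough number on the point Andy and Glenn agree on: within one address space, handing an object between threads moves a single reference, whatever the payload size, while any serialization-based scheme pays per byte (and opaque OS objects often can't be serialized at all). A sketch with illustrative timings only:

    import Queue, cPickle, os, time

    blob = os.urandom(8 * 1024 * 1024)      # stand-in for a big, intricate result
    q = Queue.Queue()

    t0 = time.time()
    q.put(blob)                             # a pointer changes hands...
    q.get()
    print 'queue hand-off:    %.6fs' % (time.time() - t0)

    t0 = time.time()
    cPickle.loads(cPickle.dumps(blob, 2))   # ...versus what any IPC scheme must do
    print 'pickle round trip: %.6fs' % (time.time() - t0)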
Re: 2.6, 3.0, and truly independent intepreters
On Mon, Oct 27, 2008 at 12:03 PM, Andy O'Meara [EMAIL PROTECTED] wrote: I think we miscommunicated there--I'm actually agreeing with you. I was trying to make the same point you were: that intricate and/or large structures are meant to be passed around by a top-level pointer, not using serialization/messaging. This is what I've been trying to explain to others here; that IPC and shared memory unfortunately aren't viable options, leaving app threads (rather than child processes) as the solution. Andy, Why don't you just use a temporary file system (ram disk) to store the data that your app is manipulating. All you need to pass around then is a file descriptor. --JamesMills -- Problems are solved by method -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
Andy O'Meara wrote: - each worker thread makes its own interpreter, pops scripts off a work queue, and manages exporting (and then importing) result data to other parts of the app. I hope you realize that starting up one of these interpreters is going to be fairly expensive. It will have to create its own versions of all the builtin constants and type objects, and import its own copy of all the modules it uses. One wonders if it wouldn't be cheaper just to fork the process. Shared memory can be used to transfer large lumps of data if needed. -- Greg -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
Glenn Linderman wrote: If Py_None corresponds to None in Python syntax ... then it is a fixed constant and could be left global, probably. No, it couldn't, because it's a reference-counted object like any other Python object, and therefore needs to be protected against simultaneous refcount manipulation by different threads. So each interpreter would need its own instance of Py_None. The same goes for all the other built-in constants and type objects -- there are dozens of these. The cost is one more push on every function call. Which sounds like it could be a rather high cost! If (just a wild guess) each function has an average of 2 parameters, then this is increasing the amount of argument pushing going on by 50%... On many platforms, there is the concept of TLS, or thread-local storage. That's another possibility, although doing it that way would require you to have a separate thread for each interpreter, which you mightn't always want. -- Greg -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
Andy O'Meara wrote: In our case, we're doing image and video manipulation--stuff not good to be messaging from address space to address space. Have you considered using shared memory? Using mmap or equivalent, you can arrange for a block of memory to be shared between processes. Then you can dump the big lump of data to be transferred in there, and send a short message through a pipe to the other process to let it know it's there. -- Greg -- http://mail.python.org/mailman/listinfo/python-list
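A rough sketch of greg's suggestion, assuming a Unix-like platform: an anonymous shared mapping survives fork(), so both processes see the same block, and a one-byte pipe message plays the role of the short "it's there" notification.

    import mmap, os

    SIZE = 1024 * 1024
    buf = mmap.mmap(-1, SIZE)       # anonymous mapping, shared across fork()
    r, w = os.pipe()

    if os.fork() == 0:              # child: dump the big lump in place
        buf[0:5] = 'hello'
        os.write(w, 'x')            # short "data ready" signal
        os._exit(0)
    else:                           # parent: wait for the signal, read in place
        os.read(r, 1)
        print buf[0:5]              # no copy, no serialization
        os.wait()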
Re: 2.6, 3.0, and truly independent intepreters
Rhamphoryncus wrote: A list is not shareable, so it can only be used within the monitor it's created within, but the list type object is shareable. Type objects contain dicts, which allow arbitrary values to be stored in them. What happens if one thread puts a private object in there? It becomes visible to other threads using the same type object. If it's not safe for sharing, bad things happen. Python's data model is not conducive to making a clear distinction between private and shared objects, except at the level of an entire interpreter. -- Greg -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
If Py_None corresponds to None in Python syntax (sorry I'm not familiar with Python internals yet; glad you are commenting, since you are), then it is a fixed constant and could be left global, probably. If None remains global, then type(None) also remains global, and type(None).__bases__[0]. Then type(None).__bases__[0].__subclasses__() will yield interesting results. This is essentially the status quo. But if we want a separate None for each interpreter, or if we just use Py_None as an example global variable to use to answer the question, then here goes. There are a number of problems with that approach. The biggest one is that it is theoretical. Of course I'm aware of thread-local variables, and the abstract possibility of collecting all global variables in a single data structure (in fact, there is already an interpreter structure and per-interpreter state in Python). I wasn't claiming that it was impossible to solve that problem - just that it is not simple. If you want to find out what all the problems are, please try implementing it for real. Regards, Martin -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
Hi Andy, Andy wrote: However, we require true thread/interpreter independence so python 2 has been frustrating at times, to say the least. Please don't start with but really, python supports multiple interpreters because I've been there many many times with people. And, yes, I'm aware of the multiprocessing module added in 2.6, but that stuff isn't lightweight and isn't suitable at all for many environments (including ours). This is a very conflicting set of statements and whilst you appear to be extremely clear on what you want here, and why multiprocessing and associated techniques are not appropriate, this does sound very conflicting. I'm guessing I'm not the only person who finds this a little odd. Based on the size of the thread, having read it all, I'm guessing also that you're not going to have an immediate solution but a work around. However, also based on reading it, I think it's a usecase that would be generally useful in embedding python. So, I'll give it a stab as to what I think you're after. The scenario as I understand it is this:
* You have an application written in C, C++ or similar.
* You've been providing users the ability to script it or customise it in some fashion using scripts.
Based on the conversation:
* This worked well, and you really liked the results, but...
* You only had one interpreter embedded in the system
* You were allowing users to use multiple scripts
Suddenly you go from: Single script, single memory space. To multiple scripts, unconstrained shared memory space. That then causes pain for you and your users. So as a result, you decided to look for this scenario:
* A mechanism that allows each script to think it's the only script running on the python interpreter.
* But to still have only one embedded instance of the interpreter.
* With the primary motivation to eliminate the unconstrained shared memory causing breakage to your software.
So, whilst the multiprocessing module gives you this: * With the primary motivation to eliminate the unconstrained shared memory causing breakage to your software. It's (for whatever reason) too heavyweight for you, due to the multiprocess usage. At a guess the reason for this is because you allow the user to run lots of these little scripts. Essentially what this means is that you want green processes. One workaround of achieving that may be to find a way to force threads in python to ONLY be allowed access to (and only update) thread local values, rather than default to shared values. The reason I say that, is because the closest you get to green processes in python at the moment is /inside/ a python generator. It's nowhere near the level you want, but it's what made me think of the idea of green processes. Specifically if you have the canonical example of a python generator:

    def fib():
        a, b = 1, 1
        while 1:
            a, b = b, a + b
            yield a

Then no matter how many times I run that, the values are local, and can't impact each other. Now clearly this isn't what you want, but on some level it's *similar*. You want to be able to do: run(this_script) and then when (this_script) is running only use a local environment. Now, if you could change the threading API, such that there was a means of forcing all value lookups to look in thread local store before looking outside the thread local store [1], then this would give you a much greater level of safety. [1] I don't know if there is or isn't - I've not been sufficiently interested to look...
I suspect that this would also be a very nice easy win for many multi-threaded applications as well, reducing accidental data sharing. Indeed, reversing things such that rather than doing this:

    myLocal = threading.local()
    myLocal.X = 5

Allowing a thread to force the default to be the other way round:

    systemGlobals = threading.globals()
    systemGlobals.X = 5

Would make a big difference. Furthermore, it would also mean that the following:

    import MyModule
    from MyOtherModule import whizzy_thing

I don't know if such a change would be sufficient to stop the python interpreter going bang for extension modules though :-) I suspect also that this change, whilst potentially fraught with difficulties, would be incredibly useful in python implementations that are GIL-free (such as Jython or IronPython). Now, this for me is entirely theoretical because I don't know much about python's threading implementation (because I've never needed to), but it does seem to me to be the easier win than looking for truly independent interpreters... It would also be more generally useful, since it would make accidental sharing of data (which is where threads really hurt people most) much harder. Since it was raised in the thread, I'd like to say use Kamaelia, but your usecase is slightly different as I understand it. You want to take existing stuff that won't be written in any particular way, to
Re: 2.6, 3.0, and truly independent intepreters
Andy O'Meara wrote: Yeah, that's the idea--let the highest levels run and coordinate the show. Yes, this works really well in python and it's lots of fun. We've found so far you need at minimum the following parts to a little co-ordination language: Pipeline Graphline Carousel Seq OneShot PureTransformer TPipe Filter Backplane PublishTo SubscribeTo The interesting thing to me about this is in most systems these would be patterns of behaviour in activities, whereas in python/kamaelia these are concrete things you can drop things into. As you'd expect this all becomes highly declarative. In practice the world is slightly messier than a theoretical document would like to suggest, primarily because if you consider things like pygame, sometimes you only have a resource instantiated once in a single process. So you do need a mechanism for advertising services inside a process and looking those up. (The Backplane idea though helps with wrapping those up a lot I admit, for certain sorts of service :) And sometimes you do need to just share data, and when you do that's when STM is useful. But concurrent python systems are fun to build :-) Michael. -- http://www.kamaelia.org/GetKamaelia -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
Glenn Linderman wrote: In the module multiprocessing environment could you not use shared memory, then, for the large shared data items? If the poshmodule had a bit of TLC, it would be extremely useful for this, since it does (surprisingly) still work with python 2.5, but does need a bit of TLC to make it usable. http://poshmodule.sourceforge.net/ Michael -- http://www.kamaelia.org/GetKamaelia -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
Andy O'Meara wrote: basically, it seems that we're talking about the embarrassingly parallel scenario raised in that paper We build applications in Kamaelia and then discover afterwards that they're embarrassingly parallel and just work. (we have an introspector that can look inside running systems and show us the structure that's going on - very useful for debugging) My current favourite example of this is a tool created for teaching small children to read and write: http://www.kamaelia.org/SpeakAndWrite Uses gesture recognition and speech synthesis, has a top level view of around 15 concurrent components, with significant numbers of nested ones. (OK, that's not embarrassingly parallel since it's only around 50 things, but the whiteboard, with around 200 concurrent things, is) The trick is to stop viewing concurrency as the problem, but to find a way to use it as a tool for making it easier to write code. That program was a 10 hour or so hack. You end up focussing on the problem you want to solve, and naturally gain a concurrent friendly system. Everything else (GILs, shared memory etc) then just becomes an optimisation problem - something only to be done if you need it. My previous favourite examples were based around digital TV, or user generated content transcode pipelines. My reason for preferring the speak and write at the moment is because it's a problem you wouldn't normally think of as benefitting from concurrency, when in this case it benefitted by being made easier to write in the first place. Regards, Michael -- http://www.kamaelia.org/GetKamaelia -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
Jesse Noller wrote: http://www.kamaelia.org/Home Thanks for the mention :) I don't think it's a good fit for the original poster's question, but a solution to the original poster's question would be generally useful IMO, _especially_ on python implementations without a GIL (where threads are the more natural approach to using multiple processors). The approach I think would be useful would perhaps be allowing python to have some concept of green processes - that is, threads that can only see thread local values, or that search/update thread local space before checking globals, ie flipping

    X = threading.local()
    X.foo = bar

To something like:

    X = greenprocesses.shared()
    X.foo = bar

Or even just changing the search for values from:
* Search local context
* Search global context
To:
* Search thread local context
* Search local context
* Search global context
Would probably be quite handy, and eliminate whole classes of bugs for people using threads. (Probably introduce all sorts of new ones of course, but perhaps ones easier to isolate) However, I suspect this is also *a lot* easier to say than to implement :-) (that said, I did hack on the python internals once (cf pep 318) so it might be quite pleasant to try) It's also independent of any discussions regarding the GIL of course since it would just make life generally safer for people. BTW, regarding Kamaelia - regarding something you said on your blog - whilst the components list on /Components looks like a large amount of extra stuff you have to comprehend to use, you don't. (The interdependency between components is actually very low.) The core that someone needs to understand is the contents of this: http://www.kamaelia.org/MiniAxon/ Which is sufficient to get someone started. (based on testing with a couple of dozen novice developers now :) If someone doesn't want to rewrite their app to be kamaelia based, they can cherry pick stuff, by running kamaelia's scheduler in the background and using components in a file-handle like fashion: * http://www.kamaelia.org/AxonHandle The reason /Components contains all those things isn't because we're trying to make it into a swiss army knife, it's because it's been useful in domains that have generated those components which are generally reusable :-) Michael. -- http://www.kamaelia.org/GetKamaelia -- http://mail.python.org/mailman/listinfo/python-list
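Michael's thread-local-first lookup can at least be toyed with today at the object level. A sketch only - greenprocesses is his hypothetical module, and this class merely illustrates the proposed search order (thread-local first, then shared) using the threading.local that 2.x already has:

    import threading

    class GreenNamespace(object):
        def __init__(self, **shared):
            self._shared = shared
            self._local = threading.local()
        def __getattr__(self, name):         # reads: local first, then shared
            try:
                return getattr(self._local, name)
            except AttributeError:
                try:
                    return self._shared[name]
                except KeyError:
                    raise AttributeError(name)
        def __setattr__(self, name, value):  # writes: always thread-local
            if name in ('_shared', '_local'):
                object.__setattr__(self, name, value)
            else:
                setattr(self._local, name, value)

    ns = GreenNamespace(X='shared default')
    ns.X = 'mine alone'    # other threads still see 'shared default'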
Re: 2.6, 3.0, and truly independent intepreters
These discussions pop up every year or so and I think that most of them are not really all that necessary, since the GIL isn't all that bad. Some pointers into the past:
* http://effbot.org/pyfaq/can-t-we-get-rid-of-the-global-interpreter-lock.htm Fredrik on the GIL
* http://mail.python.org/pipermail/python-dev/2000-April/003605.html Greg Stein's proposal to move forward on free threading
* http://www.sauria.com/~twl/conferences/pycon2005/20050325/Python%20at%20Google.notes (scroll down to the QA section) Greg Stein on whether the GIL really does matter that much
Furthermore, there are lots of ways to tune the CPython VM to make it more or less responsive to thread switches via the various sys.set*() functions in the sys module. Most computing or I/O intense C extensions, built-in modules and object implementations already release the GIL for you, so it usually doesn't get in the way all that often. So you have the option of using a single process with multiple threads, allowing efficient sharing of data. Or you use multiple processes and OS mechanisms to share data (shared memory, memory mapped files, message passing, pipes, shared file descriptors, etc.). Both have their pros and cons. There's no general answer to the problem of how to make best use of multi-core processors, multiple linked processors or any of the more advanced parallel processing mechanisms (http://en.wikipedia.org/wiki/Parallel_computing). The answers will always have to be application specific. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Oct 25 2008) Python/Zope Consulting and Support ...http://www.egenix.com/ mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/ Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 -- http://mail.python.org/mailman/listinfo/python-list
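For reference, the main knob in the sys.set*() family MAL mentions, on the 2.x line, is the check interval - how many bytecodes run between the interpreter's offers to switch threads:

    import sys

    print sys.getcheckinterval()    # 100 by default
    sys.setcheckinterval(1000)      # switch less often: lower overhead,
                                    # coarser thread-switch granularity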
Re: 2.6, 3.0, and truly independent intepreters
Glenn Linderman wrote: On approximately 10/24/2008 8:39 PM, came the following characters from the keyboard of Terry Reedy: Glenn Linderman wrote: For example, Python presently has a rather stupid algorithm for string concatenation. Yes, CPython 2.x, x <= 5, did. Python the language has syntax and semantics. Python implementations have algorithms that fulfill the defined semantics. I can buy that, but when Python is not qualified, CPython should be assumed, as it predominates. People do that, and it sometimes leads to unnecessary confusion. As to the present discussion, is it about:
* changing Python, the language
* changing all Python implementations
* changing CPython, the leading implementation
* branching CPython with a compiler switch, much as there was one for including Unicode or not
* forking CPython
* modifying an existing module
* adding a new module
* making better use of the existing facilities
* some combination of the above
Of course, the latest official release should probably also be assumed, but that is so recent, People do that, and it sometimes leads to unnecessary confusion. People routinely post version-specific problems and questions without specifying the version (or platform when relevant). In a month or so, there will be *2* latest official releases. There will be more confusion without qualification. few have likely upgraded as yet... I should have qualified the statement. * Is the target of this discussion 2.7 or 3.1 (some changes would be 3.1 only). [diversion to the side topic] If there is more than one reference to a guaranteed immutable object, such as a string, the 'stupid' algorithm seems necessary to me. In-place modification of a shared immutable would violate semantics. Absolutely. But after the first iteration, there is only one reference to string. Which is to say, 'string' is the only reference to the object it refers to. You are right, so I presume that the optimization described would then kick in. But I have not read the code, and CPython optimizations are not part of the *language* reference. [back to the main topic] There is some discussion/debate/confusion about how much of the stdlib is 'standard Python library' versus 'standard CPython library'. [And there is some feeling that standard Python modules should have a default Python implementation that any implementation can use until it optionally replaces it with a faster compiled version.] Hence my question about the target of this discussion and the first three options listed above. Terry Jan Reedy -- http://mail.python.org/mailman/listinfo/python-list
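For the record, the concatenation idioms under discussion: the join() spelling is guaranteed linear on any implementation, while += on a str is only optimized in place by CPython (2.4 and later) when the string is the sole reference.

    parts = []
    for i in range(1000):
        parts.append(str(i))
    s = ''.join(parts)       # O(n) everywhere

    s2 = ''
    for i in range(1000):
        s2 += str(i)         # may be O(n) on CPython, O(n**2) elsewhere
    assert s == s2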
Re: 2.6, 3.0, and truly independent intepreters
On Oct 25, 2008, at 7:53 AM, Michael Sparks wrote: Glenn Linderman wrote: In the module multiprocessing environment could you not use shared memory, then, for the large shared data items? If the poshmodule had a bit of TLC, it would be extremely useful for this, since it does (surprisingly) still work with python 2.5, but does need a bit of TLC to make it usable. http://poshmodule.sourceforge.net/ Last time I checked that was Windows-only. Has that changed? The only IPC modules for Unix that I'm aware of are one which I adopted (for System V semaphores & shared memory) and one which I wrote (for POSIX semaphores & shared memory). http://NikitaTheSpider.com/python/shm/ http://semanchuk.com/philip/posix_ipc/ If anyone wants to wrap POSH cleverness around them, go for it! If not, maybe I'll make the time someday. Cheers Philip -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
There are a number of problems with that approach. The biggest one is that it is theoretical. Not theoretical. Used successfully in Perl. Perhaps it is indeed what Perl does, I know nothing about that. However, it *is* theoretical for Python. Please trust me that there are many many many many pitfalls in it, each needing a separate solution, most likely with no equivalent in Perl. If you had a working patch, *then* it would be practical. Granted Perl is quite a different language than Python, but then there are some basic similarities in the concepts. Yes - just as much as both are implemented in C :-( Perhaps you should list the problems, instead of vaguely claiming that there are a number of them. Hard to respond to such a vague claim. As I said: go implement it, and you will find out. Unless you are really going at an implementation, I don't want to spend my time explaining it to you. But the approach is sound; nearly any monolithic program can be turned into a multithreaded program containing one monolith per thread using such a technique. I'm not debating that. I just claim that it is far from simple. Regards, Martin -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
On Oct 24, 9:52 pm, Martin v. Löwis [EMAIL PROTECTED] wrote: A c-level module, on the other hand, can sidestep/release the GIL at will, and go on its merry way and process away. ...Unless part of the C module execution involves the need to do CPU-bound work on another thread through a different python interpreter, right? Wrong. (even if the interpreter is 100% independent, yikes). Again, wrong. For example, have a python C module designed to programmatically generate images (and video frames) in RAM for immediate and subsequent use in animation. Meanwhile, we'd like to have a pthread with its own interpreter with an instance of this module and have it dequeue jobs as they come in (in fact, there'd be one of these threads for each excess core present on the machine). I don't understand how this example involves multiple threads. You mention a single thread (running the module), and you mention designing a module. Where is the second thread? Glenn seems to be following me here... The point is to have as many threads as the app wants, each in its own world, running without restriction (performance wise). Maybe the app wants to run a thread for each extra core on the machine. Perhaps the disconnect here is that when I've been saying start a thread, I mean the app starts an OS thread (e.g. pthread) with the given that any contact with other threads is managed at the app level (as opposed to starting threads through python). So, as far as python knows, there's zero mention or use of threading in any way, *anywhere*. As far as I can tell, it seems CPython's current state can't do CPU-bound parallelization in the same address space. That's not true. Um... So let's say you have an opaque object ref from the OS that represents hundreds of megs of data (e.g. memory-resident video). How do you get that back to the parent process without serialization and IPC? What should really happen is just use the same address space so just a pointer changes hands. THAT's why I'm saying that a separate address space is generally a deal breaker when you have large or intricate data sets (ie. when performance matters). Andy -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
On Oct 24, 9:40 pm, Martin v. Löwis [EMAIL PROTECTED] wrote: It seems to me that the very simplest move would be to remove global static data so the app could provide all thread-related data, which Andy suggests through references to the QuickTime API. This would suggest compiling python without thread support so as to leave it up to the application. I'm not sure whether you realize that this is not simple at all. Consider this fragment

    if (string == Py_None || index >= state->lastmark ||
            !state->mark[index] || !state->mark[index+1]) {
        if (empty)
            /* want empty string */
            i = j = 0;
        else {
            Py_INCREF(Py_None);
            return Py_None;

The way to think about it is that, ideally in PyC, there are never any global variables. Instead, all globals are now part of a context (ie. an interpreter) and it would presumably be illegal to ever use them in a different context. I'd say this is already the expectation and convention for any modern, industry-grade software package marketed as an extension for apps. Industry app developers just want to drop in a 3rd party package, make as many contexts as they want (in as many threads as they want), and expect to use each context without restriction (since they're ensuring contexts never interact with each other). For example, if I use zlib, libpng, or libjpg, I can make as many contexts as I want and put them in whatever threads I want. In the app, the only thing I'm on the hook for is to: (a) never use objects from one context in another context, and (b) ensure that I never make any calls into a module from more than one thread at the same time. Both of these requirements are trivial to follow in the embarrassingly easy parallelization scenarios, and that's why I started this thread in the first place. :^) Andy -- http://mail.python.org/mailman/listinfo/python-list
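Andy's zlib convention, sketched from the Python side (his real use is at the C level, so this only shows the shape of it): each thread builds and owns its private context and never shares it, so nothing above the module needs a lock.

    import os, threading, zlib

    def worker(chunk):
        ctx = zlib.compressobj()    # this thread's private context
        out = ctx.compress(chunk) + ctx.flush()
        print '%d -> %d bytes' % (len(chunk), len(out))

    threads = [threading.Thread(target=worker, args=(os.urandom(65536),))
               for _ in range(3)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()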
Re: 2.6, 3.0, and truly independent intepreters
On Oct 24, 10:24 pm, Glenn Linderman [EMAIL PROTECTED] wrote: And in the case of hundreds of megs of data ... and I would be surprised at someone that would embed hundreds of megs of data into an object such that it had to be serialized... seems like the proper design is to point at the data, or a subset of it, in a big buffer. Then data transfers would just transfer the offset/length and the reference to the buffer. and/or thousands of data structure instances, ... and this is another surprise! You have thousands of objects (data structure instances) to move from one thread to another? Heh, no, we're actually in agreement here. I'm saying that in the case where the data sets are large and/or intricate, a single top-level pointer changing hands is *always* the way to go rather than serialization. For example, suppose you had some nifty python code and C procs that were doing lots of image analysis, outputting tons of intricate and rich data structures. Once the thread is done with that job, all that output is trivially transferred back to the appropriate thread by a pointer changing hands. Of course, I know that data get large, but typical multimedia streams are large, binary blobs. I was under the impression that processing them usually proceeds along the lines of keeping offsets into the blobs, and interpreting, etc. Editing is usually done by making a copy of a blob, transforming it or a subset in some manner during the copy process, resulting in a new, possibly different-sized blob. No, you're definitely right-on, with the additional point that the representation of multimedia usually employs intricate and diverse data structures (imagine the data structure representation of a movie encoded in a modern codec, such as H.264, complete with paths, regions, pixel flow, geometry, transformations, and textures). As we both agree, that's something that you *definitely* want to move around via a single pointer (and not in a serialized form). Hence, my position that apps that use python can't be forced to go through IPC or else: (a) there's a performance/resource waste to serialize and unserialize large or intricate data sets, and (b) they're required to write and maintain serialization code that otherwise doesn't serve any other purpose. Andy -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
Andy O'Meara wrote: I would definitely agree if there was a context (i.e. environment) object passed around then perhaps we'd have the best of all worlds. Moreover, I think this is probably the *only* way that totally independent interpreters could be realized. Converting the whole C API to use this strategy would be a very big project. Also, on the face of it, it seems like it would render all existing C extension code obsolete, although it might be possible to do something clever with macros to create a compatibility layer. Another thing to consider is that passing all these extra pointers around everywhere is bound to have some effect on performance. Good points--I would agree with you on all counts there. On the passing a context everywhere performance hit, perhaps one idea is that all objects could have an additional field that would point back to their parent context (ie. their interpreter). So the only prototypes that would have to be modified to contain the context ptr would be the ones that inherently don't take any objects. This would conveniently and generally correspond to procs associated with interpreter control (e.g. importing modules, shutting down modules, etc). Andy O'Meara wrote: - each worker thread makes its own interpreter, pops scripts off a work queue, and manages exporting (and then importing) result data to other parts of the app. I hope you realize that starting up one of these interpreters is going to be fairly expensive. Absolutely. I had just left that issue out in an effort to keep the discussion pointed, but it's a great point to raise. My response is that, like any 3rd party industry package, I'd say this is the expectation (that context startup and shutdown is non-trivial and should be minimized for performance reasons). For simplicity, my examples didn't talk about this issue but in practice, it'd be typical for apps to have their worker interpreters persist as they chew through jobs. Andy -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
On Oct 25, 12:29 am, greg [EMAIL PROTECTED] wrote: Rhamphoryncus wrote: A list is not shareable, so it can only be used within the monitor it's created within, but the list type object is shareable. Type objects contain dicts, which allow arbitrary values to be stored in them. What happens if one thread puts a private object in there? It becomes visible to other threads using the same type object. If it's not safe for sharing, bad things happen. Python's data model is not conducive to making a clear distinction between private and shared objects, except at the level of an entire interpreter. shareable type objects (enabled by a __future__ import) use a shareddict, which requires all keys and values to themselves be shareable objects. Although it's a significant semantic change, in many cases it's easy to deal with: replace mutable (unshareable) global constants with immutable ones (ie list -> tuple, set -> frozenset). If you've got some global state you move it into a monitor (which doesn't scale, but that's your design). The only time this really fails is when you're deliberately storing arbitrary mutable objects from any thread, and later inspecting them from any other thread (such as our new ABC system's cache). If you want to store an object, but only to give it back to the original thread, I've got a way to do that. -- http://mail.python.org/mailman/listinfo/python-list
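Rhamphoryncus's rule of thumb, in two lines (the constant names here are made up for illustration):

    SUPPORTED_CODECS = ('h264', 'vp6', 'theora')   # was a list: now shareable
    RESERVED_NAMES = frozenset(['out', 'err'])     # was a set: now shareable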
Re: 2.6, 3.0, and truly independent intepreters
Glenn Linderman wrote: On approximately 10/25/2008 12:01 AM, came the following characters from the keyboard of Martin v. Löwis: If None remains global, then type(None) also remains global, and type(None).__bases__[0]. Then type(None).__bases__[0].__subclasses__() will yield interesting results. This is essentially the status quo. I certainly don't grok the implications of what you say above, as I barely grok the semantics of it. Not only is there a link from a class to its base classes, there is a link to all its subclasses as well. Since every class is ultimately a subclass of 'object', this means that starting from *any* object, you can work your way up the __bases__ chain until you get to 'object', then walk the subclass hierarchy and find every class in the system. This means that if any object at all is shared, then all class objects, and any object reachable from them, are shared as well. -- Greg -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
Andy wrote: 1) Independent interpreters (this is the easier one--and solved, in principle anyway, by PEP 3121, by Martin v. Löwis Something like that is necessary for independent interpreters, but not sufficient. There are also all the built-in constants and type objects to consider. Most of these are statically allocated at the moment. 2) Barriers to free threading. As Jesse describes, this is simply just the GIL being in place, but of course it's there for a reason. It's there because (1) doesn't hold and there was never any specs/guidance put forward about what should and shouldn't be done in multi-threaded apps No, it's there because it's necessary for acceptable performance when multiple threads are running in one interpreter. Independent interpreters wouldn't mean the absence of a GIL; it would only mean each interpreter having its own GIL. -- Greg -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
You seem confused. PEP 3121 is for isolated interpreters (ie emulated processes), not threading. Just a small remark: this wasn't the primary objective of the PEP. The primary objective was to support module cleanup in a reliable manner, to eventually allow modules to be garbage-collected properly. However, I also kept the isolated interpreters feature in mind there. Regards, Martin -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
Instead of appdomains (one interpreter per thread), or free threading, you could use multiple processes. Take a look at the new multiprocessing module in Python 2.6. It has roughly the same interface as Python's threading and queue modules, but uses processes instead of threads. Processes are scheduled independently by the operating system. The objects in the multiprocessing module also tend to have much better performance than their threading and queue counterparts. If you have a problem with threads due to the GIL, the multiprocessing module will most likely take care of it. There is a fundamental problem with using homebrew loading of multiple (but renamed) copies of PythonXX.dll that is easily overlooked. That is, extension modules (.pyd) are DLLs as well. Even if required by two interpreters, they will only be loaded into the process image once. Thus you have to rename all of them as well, or you will get havoc with refcounts. Not to speak of what will happen if a Windows HANDLE is closed by one interpreter while still needed by another. It is almost guaranteed to bite you, sooner or later. There are other options as well: - Use IronPython. It does not have a GIL. - Use Jython. It does not have a GIL. - Use pywin32 to create isolated outproc COM servers in Python. (I'm not sure what the effect of inproc servers would be.) - Use os.fork() if your platform supports it (Linux, Unix, Apple, Cygwin, Windows Vista SUA). This is the standard posix way of doing multiprocessing. It is almost unbeatable if you have a fast copy-on-write implementation of fork (that is, all platforms except Cygwin). -- http://mail.python.org/mailman/listinfo/python-list
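A sketch of the multiprocessing module sturlamolden points to, new in 2.6: the code has the same shape as its threading-plus-Queue equivalent, but each worker is a real OS process with its own GIL.

    from multiprocessing import Process, Queue

    def worker(inbox, outbox):
        for item in iter(inbox.get, None):   # None is the stop token
            outbox.put(item * item)

    if __name__ == '__main__':
        inbox, outbox = Queue(), Queue()
        p = Process(target=worker, args=(inbox, outbox))
        p.start()
        for i in range(5):
            inbox.put(i)
        inbox.put(None)
        print [outbox.get() for _ in range(5)]
        p.join()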
Re: 2.6, 3.0, and truly independent intepreters
On Oct 24, 9:35 am, sturlamolden [EMAIL PROTECTED] wrote: Instead of appdomains (one interpreter per thread), or free threading, you could use multiple processes. Take a look at the new multiprocessing module in Python 2.6. That's mentioned earlier in the thread. There is a fundamental problem with using homebrew loading of multiple (but renamed) copies of PythonXX.dll that is easily overlooked. That is, extension modules (.pyd) are DLLs as well. Tell me about it--there's all kinds of problems and maintenance liabilities with our approach. That's why I'm here talking about this stuff. There are other options as well: - Use IronPython. It does not have a GIL. - Use Jython. It does not have a GIL. - Use pywin32 to create isolated outproc COM servers in Python. (I'm not sure what the effect of inproc servers would be.) - Use os.fork() if your platform supports it (Linux, Unix, Apple, Cygwin, Windows Vista SUA). This is the standard posix way of doing multiprocessing. It is almost unbeatable if you have a fast copy-on- write implementation of fork (that is, all platforms except Cygwin). This is discussed earlier in the thread--they're unfortunately all out. -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
Terry Reedy wrote: Everything in DLLs is compiled C extensions. I see about 15 for Windows 3.0. Ah, weren't that wonderful times back in the days of Win3.0, when DLL-hell was inhabited by only 15 libraries? *sigh* ... although ... wait, didn't Win3.0 have more than that already? Maybe you meant Windows 1.0? SCNR-ly, Stefan -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
On Oct 24, 3:58 pm, Andy O'Meara [EMAIL PROTECTED] wrote: This is discussed earlier in the thread--they're unfortunately all out. It occurs to me that tcl is doing what you want. Have you ever thought of not using Python? That aside, the fundamental problem is what I perceive as a fundamental design flaw in Python's C API. In Java JNI, each function takes a JNIEnv* pointer as its first argument. There is nothing that prevents you from embedding several JVMs in a process. Python can create embedded subinterpreters, but it works differently. It swaps subinterpreters like a finite state machine: only one is concurrently active, and the GIL is shared. The approach is fine, except it kills free threading of subinterpreters. The argument seems to be that Apache's mod_python somehow depends on it (for reasons I don't understand). -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
On Oct 24, 2:12 am, greg [EMAIL PROTECTED] wrote: Andy wrote: 1) Independent interpreters (this is the easier one--and solved, in principle anyway, by PEP 3121, by Martin v. Löwis Something like that is necessary for independent interpreters, but not sufficient. There are also all the built-in constants and type objects to consider. Most of these are statically allocated at the moment. Agreed--I was just trying to speak generally. Or, put another way, there's no hope for independent interpreters without the likes of PEP 3121. Also, as Martin pointed out, there's the issue of module cleanup that some guys here may underestimate (and I'm glad Martin pointed out the importance of it). Without the module cleanup, every time a dynamic library using python loads and unloads you've got leaks. This issue is a real problem for us since our software is loaded and unloaded many many times in a host app (iTunes, WMP, etc). I hadn't raised it here yet (and I don't want to turn the discussion to this), but lack of multiple load and unload support has been another painful issue that we didn't expect to encounter when we went with python. 2) Barriers to free threading. As Jesse describes, this is simply just the GIL being in place, but of course it's there for a reason. It's there because (1) doesn't hold and there was never any specs/guidance put forward about what should and shouldn't be done in multi-threaded apps No, it's there because it's necessary for acceptable performance when multiple threads are running in one interpreter. Independent interpreters wouldn't mean the absence of a GIL; it would only mean each interpreter having its own GIL. I see what you're saying, but let's note that what you're talking about at this point is an interpreter containing protection from the client level violating (supposed) direction put forth in python multithreaded guidelines. Glenn Linderman's post really gets at what's at hand here. It's really important to consider that it's not a given that python (or any framework) has to be designed against hazardous use. Again, I refer you to the diagrams and guidelines in the QuickTime API: http://developer.apple.com/technotes/tn/tn2125.html They tell you point-blank what you can and can't do, and it's that simple. Their engineers can then simply create the implementation around those specs and not weigh any of the implementation down with sync mechanisms. I'm in the camp that simplicity and convention win the day when it comes to an API. It's safe to say that software engineers expect and assume that a thread that doesn't have contact with other threads (except for explicit, controlled message/object passing) will run unhindered and safely, so I raise an eyebrow at the GIL (or any internal helper sync stuff) holding up a thread's performance when the app is designed to not need lower-level global locks. Anyway, let's talk about solutions. My company is looking to support a python dev community endeavor that allows the following: - an app makes N worker threads (using the OS) - each worker thread makes its own interpreter, pops scripts off a work queue, and manages exporting (and then importing) result data to other parts of the app. Generally, we're talking about CPU-bound work here. - each interpreter has the essentials (e.g. math support, string support, re support, and so on -- I realize this is open-ended, but work with me here). Let's guesstimate about what kind of work we're talking about here and if this is even in the realm of possibility.
If we find that it *is* possible, let's figure out what level of work we're talking about. From there, I can get serious about writing up a PEP/spec, paid support, and so on. Regards, Andy -- http://mail.python.org/mailman/listinfo/python-list
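For concreteness, here is roughly what that worker model looks like against today's embedding API -- a minimal sketch in the Python 2.x C API as I understand it (error handling omitted, scripts invented for illustration). Per-thread sub-interpreters already exist; the catch, flagged in the comments, is that they all contend for the one process-wide GIL, so the CPU-bound scripts never actually run in parallel:

    #include <Python.h>
    #include <pthread.h>

    /* One worker: create a private sub-interpreter, run one script, tear it
     * down.  The catch: every sub-interpreter shares the single process-wide
     * GIL, so these "parallel" workers actually serialize. */
    static void *worker(void *script)
    {
        PyEval_AcquireLock();                     /* take the shared GIL     */
        PyThreadState *ts = Py_NewInterpreter();  /* fresh sub-interpreter   */

        PyRun_SimpleString((const char *)script); /* runs while holding GIL  */

        Py_EndInterpreter(ts);                    /* destroy the interpreter */
        PyEval_ReleaseLock();                     /* hand the GIL back       */
        return NULL;
    }

    int main(void)
    {
        Py_Initialize();
        PyEval_InitThreads();                          /* create the GIL     */
        PyThreadState *main_ts = PyEval_SaveThread();  /* release it in main */

        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, "x = sum(xrange(10000000))");
        pthread_create(&t2, NULL, worker, "y = sum(xrange(10000000))");
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);

        PyEval_RestoreThread(main_ts);
        Py_Finalize();
        return 0;
    }

The N-threads/N-interpreters shape Andy asks for is all there; what's missing is exactly what this thread is about -- the workers still take turns on one lock.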
Re: 2.6, 3.0, and truly independent interpreters
That aside, the fundamental problem is what I perceive as a fundamental design flaw in Python's C API. In Java JNI, each function takes a JNIEnv* pointer as its first argument. There is nothing that prevents you from embedding several JVMs in a process. Python can create embedded subinterpreters, but it works differently. It swaps subinterpreters like a finite state machine: only one is concurrently active, and the GIL is shared.

Bingo, it seems that you've hit it right on the head there. Sadly, that's why I regard this thread as largely futile (but I'm an optimist when it comes to cool software communities, so here I am). I've been afraid to say it for fear of getting mauled by everyone here, but I would definitely agree: if there were a context (i.e. environment) object passed around, then perhaps we'd have the best of all worlds. *winces*

This is discussed earlier in the thread--they're unfortunately all out.

It occurs to me that tcl is doing what you want. Have you ever thought of not using Python?

Bingo again. Our research says that the options are tcl, perl (although it's generally untested and not recommended by the community--definitely dealbreakers for a commercial user like us), and lua. Also, I'd rather saw off my own right arm than adopt perl, so that's out. :^) As I mentioned, we're looking to either (1) support a python dev community effort, (2) make our own high-performance python interpreter (that uses an env object as you described), or (3) drop python and go to lua. I'm favoring them in the order I list them, but the more I discuss the issue with folks here, the more people seem to be unfortunately very divided on (1). Andy -- http://mail.python.org/mailman/listinfo/python-list
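The tcl comparison is easy to make concrete, since every Tcl entry point takes the interpreter handle explicitly -- the same shape as JNI's JNIEnv*. A sketch against the long-standing Tcl C API (untested; assumes a threads-enabled Tcl 8.4-era build, where one interpreter per thread is the supported model; error handling omitted):

    #include <tcl.h>

    /* Each thread owns a fully independent Tcl interpreter.  Every Tcl call
     * takes the handle explicitly, so there is no process-wide lock; the
     * model is simply one interpreter per thread. */
    static void *worker(void *script)
    {
        Tcl_Interp *interp = Tcl_CreateInterp();  /* private interpreter */
        Tcl_Init(interp);                         /* load Tcl's stdlib   */

        Tcl_Eval(interp, (const char *)script);   /* runs unhindered     */

        Tcl_DeleteInterp(interp);
        return NULL;
    }

This is exactly the "context object passed around" design being wished for above.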
Re: 2.6, 3.0, and truly independent interpreters
I'm not finished reading the whole thread yet, but I've got some things below to respond to this post with.

On Thu, Oct 23, 2008 at 9:30 AM, Glenn Linderman [EMAIL PROTECTED] wrote: On approximately 10/23/2008 12:24 AM, came the following characters from the keyboard of Christian Heimes: Andy wrote: 2) Barriers to free threading. As Jesse describes, this is simply the GIL being in place, but of course it's there for a reason. It's there because (1) doesn't hold and there were never any specs or guidance put forward about what should and shouldn't be done in multi-threaded apps (see my QuickTime API example). Perhaps if we could go back in time, we would not put the GIL in place, strict guidelines regarding multithreaded use would have been established, and PEP 3121 would have been mandatory for C modules. Then again--screw that, if I could go back in time, I'd just go for the lottery tickets!! :^)

I've been following this discussion with interest, as it certainly seems that multi-core/multi-CPU machines are the coming thing, and many applications will need to figure out how to use them effectively.

I'm very - not absolute, but very - sure that Guido and the initial designers of Python would have added the GIL anyway. The GIL makes Python faster on single-core machines and more stable on multi-core machines. Other language designers think the same way. Ruby recently got a GIL. The article http://www.infoq.com/news/2007/05/ruby-threading-futures explains the rationales for a GIL in Ruby. The article also includes a quote from Guido about threading in general. Several people inside and outside the Python community think that threads are dangerous and don't scale. The paper http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-1.pdf sums it up nicely. It explains why modern processors are going to cause more and more trouble with the Java approach to threads, too.

Reading this PDF paper is extremely interesting (albeit somewhat dependent on understanding abstract theories of computation; I have enough math background to follow it, sort of, and most of the text can be read even without fully understanding the theoretical abstractions).

I have already heard people saying that Java applications are buggy. I don't believe that general sequential programs written in Java are any buggier than programs written in other languages... so I had interpreted that to mean (based on some inquiry) that complex, multi-threaded Java applications are buggy. And while I also don't believe that complex, multi-threaded programs written in Java are any buggier than complex, multi-threaded programs written in other languages, it does seem to be true that Java is one of the currently popular languages in which to write complex, multi-threaded programs, because of its language support for threads and concurrency primitives. These reports were from people who are not programmers but field IT people, who have bought and/or support software and/or hardware with drivers written in Java that seem to have non-ideal behavior, (apparently only) curable by stopping/restarting the application or driver, or sometimes requiring a reboot.

The paper explains many traps that lead to complex, multi-threaded programs being buggy, and being hard to test. I have worked with parallel machines, applications, and databases for 25 years, and can appreciate the succinct expression of the problems explained within the paper, and can, from experience, agree with its premises and conclusions.
Parallel applications have only been commercial successes when the parallelism is tightly constrained to well-controlled patterns that could be easily understood. Threads, especially in cooperation with languages that use memory pointers, have the potential to get out of control in inexplicable ways.

Python *must* gain means of concurrent execution of CPU-bound code eventually to survive in the market. But it must get the right means or we are going to suffer the consequences.

This statement, after reading the paper, seems somewhat in line with the author's premise that language acceptability requires that a language be self-contained/monolithic, and potentially sufficient to implement itself. That seems to also be one of the reasons that Java is used today for threaded applications. It does seem to be true, given current hardware trends, that _some mechanism_ must be provided to obtain the benefit of multiple cores/CPUs for a single application, and that Python must either implement or interface to that mechanism to continue to be a viable language for large-scale application development.

Andy seems to want an implementation of independent Python processes implemented as threads within a single address space, that can be coordinated by an outer application. This actually corresponds to the model promulgated in the paper as being most likely to succeed.
Re: 2.6, 3.0, and truly independent interpreters
Glenn, great post and points!

Andy seems to want an implementation of independent Python processes implemented as threads within a single address space, that can be coordinated by an outer application. This actually corresponds to the model promulgated in the paper as being most likely to succeed.

Yeah, that's the idea--let the highest levels run and coordinate the show.

It does seem simpler and more efficient to simply copy data from one memory location to another, rather than send it in a message, especially if the data are large.

That's the rub... In our case, we're doing image and video manipulation--stuff not good to be messaging from address space to address space. The same argument holds for numerical processing with large data sets. The workers handing back huge data sets via messaging isn't very attractive.

One thing Andy hasn't yet explained (or I missed) is why any of his application is coded in a language other than Python.

Our software runs in real time (so performance is paramount), interacts with other static libraries, depends on worker threads to perform real-time image manipulation, and leverages Windows and Mac OS API concepts and features. Python's performance hits have generally been a huge challenge with our animators because they often have to go back and massage their python code to improve execution performance. So, in short, there are many reasons why we use python as a part rather than a whole.

The other area of pain that I mentioned in one of my other posts is that what we ship, above all, can't be flaky. The lack of module cleanup (intended to be addressed by PEP 3121), using a duplicate copy of the python dynamic lib, and namespace black magic to achieve independent interpreters are all examples that have made using python for us much more challenging and time-consuming than we ever anticipated.

Again, if it turns out nothing can be done about our needs (which appears to be more and more like the case), I think it's important for everyone here to consider the points raised here in the last week. Moreover, realize that the python dev community really stands to gain from making python usable as a tool (rather than a monolith). This fact alone has caused lua to *rapidly* rise in popularity with software companies looking to embed a powerful, lightweight interpreter in their software.

As a python language fan and enthusiast, don't let lua win! (I say this endearingly of course--I have the utmost respect for both communities and I only want to see CPython be an attractive pick when a company is looking to embed a language that won't intrude upon their app's design). Andy -- http://mail.python.org/mailman/listinfo/python-list
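Since lua keeps coming up as the yardstick, it's worth showing concretely why it wins those embedding evaluations. The whole Lua C API takes an explicit lua_State* -- the context-object pattern discussed throughout this thread -- so truly independent per-thread interpreters take a few lines (Lua 5.1 API; the scripts are placeholders, error handling omitted):

    #include <lua.h>
    #include <lauxlib.h>
    #include <lualib.h>
    #include <pthread.h>

    /* Every Lua API call takes the lua_State* explicitly, so each worker
     * owns a truly independent interpreter and the two scripts below can
     * run on two cores at once -- no global lock anywhere. */
    static void *worker(void *script)
    {
        lua_State *L = luaL_newstate();          /* step 1: make the context */
        luaL_openlibs(L);                        /* load Lua's stdlib        */
        luaL_dostring(L, (const char *)script);  /* CPU-bound, unhindered    */
        lua_close(L);                            /* final step: tear down    */
        return NULL;
    }

    int main(void)
    {
        pthread_t a, b;
        pthread_create(&a, NULL, worker, "x = 0; for i = 1, 1e7 do x = x + i end");
        pthread_create(&b, NULL, worker, "y = 0; for i = 1, 1e7 do y = y + i end");
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        return 0;
    }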
Re: 2.6, 3.0, and truly independent interpreters
We are in the same position as Andy here. I think that something that would help people like us produce something in code form is a collection of information outlining the problem and suggested solutions, the appropriate parts of CPython's current threading API, and the pros and cons of the various proposed solutions to the different levels of the problem. The most valuable information I've found is contained in the many (lengthy!) discussions like this one, a few related PEPs, and the CPython docs, but has anyone condensed the state of the problem into a wiki or something similar? Maybe we should start one?

For example, Guido's post here http://www.artima.com/weblogs/viewpost.jsp?thread=214235 describes some possible solutions to the problem, like interpreter-specific locks, or fine-grained object locks, and he also mentions the primary requirement of not harming the performance of single-threaded apps. As I understand it, that requirement does not rule out new build configurations that provide some level of concurrency, as long as you can still compile python so as to perform as well on single-threaded apps.

To add to the heap of use cases, the most important thing to us is to simply have the python language and the sip/PyQt modules available to us. All we wanted to do was embed the interpreter and language core as a local scripting engine, so had we patched python to provide concurrent execution, we wouldn't have cared about all of the other unsupported extension modules since our scripts are quite application-specific. It seems to me that the very simplest move would be to remove global static data so the app could provide all thread-related data, which Andy suggests through references to the QuickTime API. This would suggest compiling python without thread support so as to leave it up to the application.

Anyway, I'm having fun reading all of these papers and news postings, but it's true that code talks, and it could be a little easier if the state of the problems was condensed. This could be an intense and fun project, but frankly it's a little tough to keep it all in my head. Is there a wiki or something out there or should we start one, or do I just need to read more code?

On Fri, Oct 24, 2008 at 6:40 AM, Andy O'Meara [EMAIL PROTECTED] wrote: [...]
Re: 2.6, 3.0, and truly independent interpreters
As a side note to the performance question, we are executing python code in an audio thread that is used in all of the top-end music production environments. We have found the language to perform extremely well when executed at control-rate frequency, meaning we aren't doing DSP computations, just responding to less-frequent events like user input and MIDI messages. So we are sitting on this music platform with unimaginable possibilities in the music world (in which python currently plays no role), but those little CPU spikes caused by the GIL at low latencies won't let us have it. AFAIK, there is no music scripting language out there that would come close, and yet we are so close! This is a big deal.

On Fri, Oct 24, 2008 at 7:42 AM, Andy O'Meara [EMAIL PROTECTED] wrote: [...] -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent interpreters
Stefan Behnel wrote: Terry Reedy wrote: Everything in DLLs is compiled C extensions. I see about 15 for Windows 3.0.

Ah, weren't those wonderful times back in the days of Win3.0, when DLL-hell was inhabited by only 15 libraries? *sigh* ... although ... wait, didn't Win3.0 have more than that already? Maybe you meant Windows 1.0? SCNR-ly,

Is that the equivalent of a smiley? Or did you really not understand what I wrote? -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent interpreters
On Fri, Oct 24, 2008 at 10:40 AM, Andy O'Meara [EMAIL PROTECTED] wrote: [...]

Point of order! Just for my own sanity if anything :) I think some minor clarifications are in order.

What are threads within Python: Python has built-in support for POSIX lightweight threads. This is what most people are talking about when they see, hear, and say threads - they mean POSIX pthreads (http://en.wikipedia.org/wiki/POSIX_Threads); this is not what you (Andy) seem to be asking for. Pthreads are attractive due to the fact that they exist within a single interpreter, can share memory all willy-nilly, etc. Python does in fact use OS-level pthreads when you request multiple threads.

The Global Interpreter Lock is fundamentally designed to make the interpreter easier to maintain and safer: developers do not need to worry about other code stepping on their namespace.
This makes things thread-safe, inasmuch as having multiple pthreads within the same interpreter space modifying global state and variables at once is, well, bad. A C-level module, on the other hand, can sidestep/release the GIL at will, and go on its merry way and process away.

POSIX threads/pthreads/threads as we get from Java allow unsafe programming styles. These programming styles are of the shared everything deadlock lol kind. The GIL *partially* protects against some of the pitfalls. You do not seem to be asking for pthreads :) http://www.python.org/doc/faq/library/#can-t-we-get-rid-of-the-global-interpreter-lock http://en.wikipedia.org/wiki/Multi-threading

However, then there are processes. The difference between threads and processes is that they do *not share memory*, but they can share state via shared queues/pipes/message passing. What you seem to be asking for is the ability to completely fork independent Python interpreters, with their own namespaces, and coordinate work via a shared queue accessed with pipes or some other communications mechanism. Correct?

Multiprocessing, as it exists within python 2.6 today, actually forks (see trunk/Lib/multiprocessing/forking.py) a completely independent interpreter per process created, and then constructs pipes to inter-communicate and queues to coordinate work. I am not suggesting this is good for you - I'm trying to get to exactly what you're asking for. Fundamentally, allowing total free-threading with Posix threads,
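To make the process model concrete: stripped of the Python layer, the shape multiprocessing builds on (on POSIX systems) is plain fork-plus-pipe, as in the bare sketch below. This is illustrative only, not the actual forking.py logic; error handling omitted:

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/wait.h>

    /* Fork a completely independent child (its own address space -- its own
     * interpreter, in the Python case) and communicate over a pipe. */
    int main(void)
    {
        int fd[2];
        pipe(fd);

        pid_t pid = fork();
        if (pid == 0) {                       /* child: independent copy   */
            close(fd[0]);
            const char *result = "42\n";      /* stand-in for real work    */
            write(fd[1], result, strlen(result));
            close(fd[1]);
            _exit(0);
        }

        close(fd[1]);                         /* parent: collect the result */
        char buf[16] = {0};
        read(fd[0], buf, sizeof buf - 1);
        printf("child said: %s", buf);
        close(fd[0]);
        waitpid(pid, NULL, 0);
        return 0;
    }

Everything that crosses fd[1] has to be flattened into bytes -- which is precisely the serialization cost Andy objects to for large or opaque data.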
Re: 2.6, 3.0, and truly independent interpreters
The Global Interpreter Lock is fundamentally designed to make the interpreter easier to maintain and safer: developers do not need to worry about other code stepping on their namespace. This makes things thread-safe, inasmuch as having multiple pthreads within the same interpreter space modifying global state and variables at once is, well, bad. A C-level module, on the other hand, can sidestep/release the GIL at will, and go on its merry way and process away.

...Unless part of the C module execution involves the need to do CPU-bound work on another thread through a different python interpreter, right? (even if the interpreter is 100% independent, yikes). For example, take a python C module designed to programmatically generate images (and video frames) in RAM for immediate and subsequent use in animation. Meanwhile, we'd like to have a pthread with its own interpreter with an instance of this module and have it dequeue jobs as they come in (in fact, there'd be one of these threads for each excess core present on the machine). As far as I can tell, it seems CPython's current state can't support CPU-bound parallelization in the same address space (basically, it seems that we're talking about the embarrassingly parallel scenario raised in that paper).

Why does it have to be in the same address space?

Convenience and simplicity--the same reasons that most APIs let you hang yourself if the app does dumb things with threads. Also, when the data sets that you need to send to and from each process are large, using the same address space makes more and more sense.

So, just to clarify - Andy, do you want one interpreter, $N threads (e.g. PThreads) or the ability to fork multiple heavyweight processes?

Sorry if I haven't been clear, but we're talking about the app starting a pthread, making a fresh/clean/independent interpreter, and then being responsible for its safety at the highest level (with the payoff of each of these threads executing without hindrance). No different than if you used most APIs out there, where step 1 is always to make and init a context object and the final step is always to destroy/take-down that context object. I'm a lousy writer sometimes, but I feel bad if you took the time to describe threads vs processes. The only reason I raised IPC with my messaging isn't very attractive comment was to respond to Glenn Linderman's points regarding tradeoffs of shared memory vs not. Andy -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent interpreters
On Fri, Oct 24, 2008 at 3:17 PM, Andy O'Meara [EMAIL PROTECTED] wrote: I'm a lousy writer sometimes, but I feel bad if you took the time to describe threads vs processes. The only reason I raised IPC with my messaging isn't very attractive comment was to respond to Glenn Linderman's points regarding tradeoffs of shared memory vs not.

I actually took the time to bring anyone listening in up to speed, and to clarify so I could better understand your use case. Don't feel bad; things in the thread are moving fast and I just wanted to clear it up. Ideally, we all want to improve the language and the interpreter. However, trying to push it towards a particular use case is dangerous given the idea of general use. -jesse -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent interpreters
On Oct 24, 1:02 pm, Glenn Linderman [EMAIL PROTECTED] wrote: On approximately 10/24/2008 8:42 AM, came the following characters from the keyboard of Andy O'Meara: Glenn, great post and points!

Thanks. I need to admit here that while I've got a fair bit of professional programming experience, I'm quite new to Python -- I've not learned its internals, nor even the full extent of its rich library. So I have some questions that are partly about the goals of the applications being discussed, partly about how Python is constructed, and partly about how the library is constructed. I'm hoping to get a better understanding of all of these; perhaps once a better understanding is achieved, limitations will be understood, and maybe solutions will be achievable.

Let me define some speculative Python interpreters; I think the first is today's Python:

PyA: Has a GIL. PyA threads can run within a process, but are effectively serialized to the places where the GIL is obtained/released. Needs the GIL because that solves lots of problems with non-reentrant code (an example of non-reentrant code is code that uses global (C global, or C static) variables -- note that I'm not talking about Python vars declared global... they are only module global). In this model, non-reentrant code could include pieces of the interpreter, and/or extension modules.

PyB: No GIL. PyB threads acquire/release a lock around each reference to a global variable (like with feature). Requires massive recoding of all code that contains global variables. Reduces performance significantly by the increased cost of obtaining and releasing locks.

PyC: No locks. Instead, recoding is done to eliminate global variables (the interpreter requires a state structure to be passed in). Extension modules that use globals are prohibited... this eliminates large portions of the library, or requires massive recoding. PyC threads do not share data between threads except by explicit interfaces.

PyD: (a hybrid of PyA and PyC). The interpreter is recoded to eliminate global variables, and each interpreter instance is provided a state structure. There is still a GIL, however, because globals are potentially still used by some modules. Code is added to detect use of global variables by a module, or some contract is written whereby a module can be declared to be reentrant and global-free. PyA threads will obtain the GIL as they would today. PyC threads would be available to be created. PyC instances refuse to call non-reentrant modules, but also need not obtain the GIL... PyC threads would have limited module support initially, but over time, most modules can be migrated to be reentrant and global-free, so they can be used by PyC instances. Most 3rd-party libraries today are starting to care about reentrancy anyway, because of the popularity of threads.

PyE: objects are reclassified as shareable or non-shareable, many types are now only allowed to be shareable. A module and its classes become shareable with the use of a __future__ import, and their shareddict uses a read-write lock for scalability. Most other shareable objects are immutable. Each thread is run in its own private monitor, and thus protected from the normal threading memory model nasties. Alas, this gives you all the semantics, but you still need scalable garbage collection... and CPython's refcounting needs the GIL.
Our software runs in real time (so performance is paramount), interacts with other static libraries, depends on worker threads to perform real-time image manipulation, and leverages Windows and Mac OS API concepts and features. Python's performance hits have generally been a huge challenge with our animators because they often have to go back and massage their python code to improve execution performance. So, in short, there are many reasons why we use python as a part rather than a whole. [...] As a python language fan and enthusiast, don't let lua win! (I say this endearingly of course--I have the utmost respect for both communities and I only want to see CPython be an attractive pick when a company is looking to embed a language that won't intrude upon their app's design).

I agree with the problem, and the desire to make python fill all niches, but let's just say I'm more ambitious with my solution. ;) -- http://mail.python.org/mailman/listinfo/python-list
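The PyC idea of "recoding to eliminate global variables" is a mechanical refactor at the C level, shown in miniature below. This is an illustrative fragment, not actual CPython code; it's the same shape JNIEnv and lua_State impose, and the direction PEP 3121 points for module state:

    /* Before: one static copy per process -- every interpreter collides. */
    static int call_count = 0;
    int do_call_unsafe(void) { return ++call_count; }

    /* After: module state moves into a structure that each interpreter
     * owns, and every function takes that state explicitly. */
    typedef struct {
        int call_count;             /* one copy per interpreter: safe */
    } module_state;

    int do_call(module_state *st) { return ++st->call_count; }

The hard part, as PyD acknowledges, isn't the shape of the fix -- it's applying it across the interpreter and every extension module.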
Re: 2.6, 3.0, and truly independent interpreters
Another great post, Glenn!! Very well laid-out and posed!! Thanks for taking the time to lay all that out.

Questions for Andy: is the type of work you want to do in independent threads mostly pure Python? Or with libraries that you can control to some extent? Are those libraries reentrant? Could they be made reentrant? How much of the Python standard library would need to be available in reentrant mode to provide useful functionality for those threads? I think you want PyC

I think you've defined everything perfectly, and you're of course correct about my love for the PyC model. :^) Like any software that's meant to be used without restrictions, our code and frameworks always use a context object pattern so that there's never any non-const global/shared data. I would go as far as to say that this is the case with more performance-oriented software than you may think, since it's usually a given for us to have to be parallel friendly in as many ways as possible. Perhaps Patrick can back me up there.

As to what modules are essential... As you point out, once reentrant module implementations caught on in a PyC or hybrid world, I think we'd start to see real effort to whip them into compliance--there's just so much to be gained, imho. But to answer the question, there's the obvious ones (operator, math, etc), string/buffer processing (string, re), C bridge stuff (struct, array), and OS basics (time, file system, etc). Nice-to-haves would be buffer and image decompression (zlib, libpng, etc), crypto modules, and xml. As far as I can imagine, I have to believe all of these modules already contain little, if any, global data, so I have to believe they'd be super easy to make PyC happy. Patrick, what would you see you guys using?

That's the rub... In our case, we're doing image and video manipulation--stuff not good to be messaging from address space to address space. The same argument holds for numerical processing with large data sets. The workers handing back huge data sets via messaging isn't very attractive.

In the module multiprocessing environment could you not use shared memory, then, for the large shared data items?

As I understand things, multiprocessing puts stuff in a child process (i.e. a separate address space), so the only way to get stuff to/from it is via IPC, which can include a shared/mapped memory region. Unfortunately, a shared address region doesn't work when you have large and opaque objects (e.g. a rendered CoreVideo movie in the QuickTime API or 300 megs of audio data that just went through a DSP). Then you've got the hit of serialization if you've got intricate data structures (that would normally need to be serialized, such as a hashtable or something). Also, if I may speak for commercial developers out there who are just looking to get the job done without new code, it's usually always preferable to use just a single high-level sync object (for when the job is complete) than to start a child process and use IPC. The former is just WAY less code, plain and simple. Andy -- http://mail.python.org/mailman/listinfo/python-list
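The "single high-level sync object" pattern is worth spelling out, since it's the baseline any IPC alternative gets compared against. A generic POSIX sketch (names invented for illustration; the result buffer stays in-process and changes hands by pointer, never copied):

    #include <pthread.h>

    /* The app hands a job to an already-running worker and waits on one
     * condition variable for completion.  No child process, no IPC. */
    typedef struct {
        pthread_mutex_t lock;
        pthread_cond_t  done_cv;
        int             done;
        void           *result;     /* large buffer handed over by pointer */
    } job_sync;

    void job_init(job_sync *js)
    {
        pthread_mutex_init(&js->lock, NULL);
        pthread_cond_init(&js->done_cv, NULL);
        js->done = 0;
        js->result = NULL;
    }

    void job_complete(job_sync *js, void *result)  /* worker side */
    {
        pthread_mutex_lock(&js->lock);
        js->result = result;
        js->done = 1;
        pthread_cond_signal(&js->done_cv);
        pthread_mutex_unlock(&js->lock);
    }

    void *job_wait(job_sync *js)                   /* app side */
    {
        pthread_mutex_lock(&js->lock);
        while (!js->done)
            pthread_cond_wait(&js->done_cv, &js->lock);
        pthread_mutex_unlock(&js->lock);
        return js->result;
    }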
Re: 2.6, 3.0, and truly independent interpreters
On approximately 10/24/2008 1:09 PM, came the following characters from the keyboard of Rhamphoryncus: On Oct 24, 1:02 pm, Glenn Linderman [EMAIL PROTECTED] wrote: [...]

PyE: objects are reclassified as shareable or non-shareable, many types are now only allowed to be shareable. A module and its classes become shareable with the use of a __future__ import, and their shareddict uses a read-write lock for scalability. Most other shareable objects are immutable. Each thread is run in its own private monitor, and thus protected from the normal threading memory model nasties. Alas, this gives you all the semantics, but you still need scalable garbage collection... and CPython's refcounting needs the GIL.

Hmm.
So I think your PyE is an attempt to be more explicit about what I said above in PyC: PyC threads do not share data between threads except by explicit interfaces. I consider your definitions of shared data types somewhat orthogonal to the types of threads, in that both PyA and PyC threads could use these new shared data items.

I think/hope that you meant that many types are now only allowed to be non-shareable? At least, I think that should be the default; they should be within the context of a single, independent interpreter instance, so other interpreters don't even know they exist, much less how to share them. If so, then I understand most of the rest of your paragraph, and it could be a way of providing shared objects, perhaps.

I don't understand the comment that CPython's refcounting needs the GIL... yes, it needs the GIL if multiple threads see the object, but not for private objects... only one thread uses the private objects... so today's refcounting should suffice... with each interpreter doing its own refcounting and collecting its own garbage. Shared objects would have to do refcounting in a protected way, under some lock. One easy solution would be to have just two types of objects: non-shared private objects in a thread, and global shared objects; access to global shared objects would require grabbing the GIL, and then accessing the object, and releasing the GIL. An interface could allow for grabbing
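Glenn's "refcounting in a protected way, under some lock" for shared objects reduces to something like the fragment below (an illustrative sketch, not CPython internals). It also hints at the cost: two lock round-trips per incref/decref pair, which is the per-object-lock overhead Guido's post (cited earlier in this thread) warns would hurt single-threaded performance:

    #include <pthread.h>

    /* Locked refcounting for *shared* objects only.  Private objects keep
     * plain refcounts inside their own interpreter and never touch a lock. */
    typedef struct {
        int             refcnt;
        pthread_mutex_t lock;
    } shared_head;

    void shared_incref(shared_head *ob)
    {
        pthread_mutex_lock(&ob->lock);
        ob->refcnt++;
        pthread_mutex_unlock(&ob->lock);
    }

    int shared_decref(shared_head *ob)  /* returns 1 when the object dies */
    {
        pthread_mutex_lock(&ob->lock);
        int dead = (--ob->refcnt == 0);
        pthread_mutex_unlock(&ob->lock);
        return dead;
    }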
Re: 2.6, 3.0, and truly independent interpreters
On Fri, Oct 24, 2008 at 4:51 PM, Andy O'Meara [EMAIL PROTECTED] wrote: In the module multiprocessing environment could you not use shared memory, then, for the large shared data items? [...] It's usually always preferable to use just a single high-level sync object (for when the job is complete) than to start a child process and use IPC. The former is just WAY less code, plain and simple.

Are you familiar with the API at all? Multiprocessing was designed to mimic threading in about every way possible; the only restriction on shared data is that it must be serializable, but even then you can override or customize the behavior. Also, inter-process communication is done via pipes. It can also be done with messages if you want to tweak the manager(s). -jesse -- http://mail.python.org/mailman/listinfo/python-list
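For completeness, the shared/mapped memory region option looks like this at the OS level -- a POSIX sketch (error checks omitted; link with -lrt on Linux). multiprocessing exposes comparable facilities (e.g. multiprocessing.Array), though as Andy notes none of this helps with opaque OS objects that only their own APIs can serialize:

    #include <fcntl.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* Both processes map the same named segment, so only the name crosses
     * the process boundary -- the payload is never copied or serialized. */
    int main(void)
    {
        const size_t size = 300 * 1024 * 1024;   /* e.g. a large audio buffer */
        int fd = shm_open("/bigbuf", O_CREAT | O_RDWR, 0600);
        ftruncate(fd, size);

        unsigned char *buf = mmap(NULL, size, PROT_READ | PROT_WRITE,
                                  MAP_SHARED, fd, 0);
        memset(buf, 0, size);    /* producer writes; a consumer maps "/bigbuf" */

        munmap(buf, size);
        close(fd);
        shm_unlink("/bigbuf");
        return 0;
    }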