subject:"2.6, 3.0, and truly independent intepreters"

On Nov 4, 6:51 pm, Paul Boddie [EMAIL PROTECTED] wrote:

 The language features look a lot like what others have already been
 offering for a while: keywords for parallelised constructs (clik_for)
 which are employed by solutions for various languages (C# and various C
 ++ libraries spring immediately to mind); spawning and synchronisation
 are typically supported in existing Python solutions, although
 obviously not using language keywords.

Yes, but there is not a 'concurrency platform' that takes care of
things like load balancing and testing for race conditions. If you
spawn with cilk++, the result is not that a new process or thread is
spawned. The task is put in a queue (scheduled using work stealing),
and executed by a pool of threads/processes.   Multiprocessing makes
it easy to write concurrent algorithms (as opposed to subprocess or
popen), but automatic load balancing is something it does not do. It
also does not identify and warn the programmer about race conditions.
It does not have a barrier synchronization paradigm, but it can be
constructed.

java.util.concurrent.forkjoin is actually based on cilk.

Something like cilk can easily be built on top of the multiprocessing
module. Extra keywords can and should be avoided. But it is easier in
Python than C. Keywords are used in cilk++ because they can be defined
out by the preprocessor, thus restoring the original seqential code.
In Python we can e.g. use a decorator instead.


--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

On Nov 5, 8:44 pm, Andy O'Meara [EMAIL PROTECTED] wrote:

In a few earlier posts, I went into details what's meant there:

http://groups.google.com/group/comp.lang.python/browse_thread/thread/...http://groups.google.com/group/comp.lang.python/msg/edae2840ab432344http://groups.google.com/group/comp.lang.python/msg/5be213c31519217b

All this says is:

1. The cost of serialization and deserialization is to large.
2. Complex data structures cannot be placed in shared memory.

The first claim is unsubstantiated. It depends on how much and what
you serialize. If you use something like NumPy arrays, the cost of
pickling is tiny. Erlang is a language specifically designed for
concurrent programming, yet it does not allow anything to be shared.

The second claim is plain wrong. You can put anything you want in
shared memory. The mapping address of the shared memory segment may
vary, but it can be dealt with (basically use integers instead of
pointers, and use the base address as offset.) Pyro is a Python
project that has investigated this. With Pyro you can put any Python
object in a shared memory region. You can also use NumPy record arrays
to put very complex data structures in shared memory.

What do you gain by placing multiple interpreters in the same process?
You will avoid the complication that the mapping address of the shared
memory region may be different. But this is a problem that has been
worked out and solved. Instead you get a lot of issues dealing with
DLL loading and unloading (Python extension objects).

The multiprocessing module has something called proxy objects, which
also deals with this issue. An object is hosed in a server process,
and client processes may access it through synchronized IPC calls.
Inside the client process the remote object looks like any other
Python object. The synchronized IPC is hidden away in an abstraction
layer. In Windows, you can also construct outproc ActiveX objects,
which are not that different from multiprocessing's proxy objects.

If you need to place a complex object in shared memory:

1. Check if a NumPy record array may suffice (dtypes may be nested).
It will if you don't have dynamically allocated pointers inside the
data structure.

2. Consider using multiprocessing's proxy objects or outproc ActiveX
objects.

3. Go to http://pyro.sourceforge.net, download the code and read the
documentation.

Saying that it can't be done is silly before you have tried.
Programmers are not that good at guessing where the bottlenecks
reside, even if we think we do.

--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-11-06 Thread Walter Overby

Hi,

I've been following this discussion, and although I'm not nearly the
Python expert that others on this thread are, I think I understand
Andy's point of view.  His premises seem to include at least:

1. His Python code does not control the creation of the threads.  That
is done at the app level.
2. Perhaps more importantly, his Python code does not control the
allocation of the data he needs to operate on.  He's got, for example,
an opaque OS object that is manipulated by CPU-intensive OS
functions.

sturlamolden suggests a few approaches:

 1. Check if a NumPy record array may suffice (dtypes may be nested).
 It will if you don't have dynamically allocated pointers inside the
 data structure.

I suspect that the OS is very likely to have dynamically allocated
pointers inside their opaque structures.

 2. Consider using multiprocessing's proxy objects or outproc ActiveX
 objects.

I don't understand how this would help.  If these large data
structures reside only in one remote process, then the overhead of
proxying the data into another process for manipulation requires too
much IPC, or at least so Andy stipulates.

 3. Go to http://pyro.sourceforge.net, download the code and read the
 documentation.

I don't see how this solves the problem with 2.  I admit I have only
cursory knowledge, but I understand remoting approaches to have the
same weakness.

I understand Andy's problem to be that he needs to operate on a large
amount of in-process data from several threads, and each thread mixes
CPU-intensive C functions with callbacks to Python utility functions.
He contends that, even though he releases the GIL in the CPU-bound C
functions, the reacquisition of the GIL for the utility functions
causes unacceptable contention slowdowns in the current implementation
of CPython.

After reading Martin's posts, I think I also understand his point of
view.  Is the time spent in these Python callbacks so large compared
to the C functions that you really have to wait?  If so, then Andy has
crossed over into writing performance-critical code in Python.  Andy
proposes that the Python community could work on making that possible,
but Martin cautions that it may be very hard to do so.

If I understand them correctly, none of these concerns are silly.

Walter.
--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

On Nov 6, 6:05 pm, Walter Overby [EMAIL PROTECTED] wrote:

 I don't understand how this would help.  If these large data
 structures reside only in one remote process, then the overhead of
 proxying the data into another process for manipulation requires too
 much IPC, or at least so Andy stipulates.

Perhaps it will, or perhaps not. Reading or writing to a pipe has
slightly more overhead than a memcpy. There are things that Python
needs to do that are slower than the IPC. In this case, the real
constraint would probably be contention for the object in the server,
not the IPC. (And don't blame it on the GIL, because putting a lock
around the object would not be any better.)


  3. Go tohttp://pyro.sourceforge.net, download the code and read the
  documentation.

 I don't see how this solves the problem with 2.

It puts Python objects in shared memory. Shared memory is the fastest
form of IPC there is. The overhead is basically zero. The only
constraint will be contention for the object.


 I understand Andy's problem to be that he needs to operate on a large
 amount of in-process data from several threads, and each thread mixes
 CPU-intensive C functions with callbacks to Python utility functions.
 He contends that, even though he releases the GIL in the CPU-bound C
 functions, the reacquisition of the GIL for the utility functions
 causes unacceptable contention slowdowns in the current implementation
 of CPython.

Yes, callbacks to Python are expensive. But is the problem the GIL?
Instead of contention for the GIL, he seems to prefer contention for a
complex object. Is that any better? It too has to be protected by a
lock.


 If I understand them correctly, none of these concerns are silly.

No they are not. But I think he underestimates what multiple processes
can do. The objects in 'multiprocessing' are already a lot faster than
their 'threading' and 'Queue' counterparts.



--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-11-06 Thread Walter Overby

On Nov 6, 2:03 pm, sturlamolden [EMAIL PROTECTED] wrote:
 On Nov 6, 6:05 pm, Walter Overby [EMAIL PROTECTED] wrote:

  I don't understand how this would help.  If these large data
  structures reside only in one remote process, then the overhead of
  proxying the data into another process for manipulation requires too
  much IPC, or at least so Andy stipulates.

 Perhaps it will, or perhaps not. Reading or writing to a pipe has
 slightly more overhead than a memcpy. There are things that Python
 needs to do that are slower than the IPC. In this case, the real
 constraint would probably be contention for the object in the server,
 not the IPC. (And don't blame it on the GIL, because putting a lock
 around the object would not be any better.)

(I'm not blaming anything on the GIL.)

I read Andy to stipulate that the pipe needs to transmit hundreds of
megs of data and/or thousands of data structure instances.  I doubt
he'd be happy with memcpy either.  My instinct is that contention for
a lock could be the quicker option.

And don't forget, he says he's got an opaque OS object.  He asked
the group to explain how to send that via IPC to another process.  I
surely don't know how.

   3. Go tohttp://pyro.sourceforge.net, download the code and read the
   documentation.

  I don't see how this solves the problem with 2.

 It puts Python objects in shared memory. Shared memory is the fastest
 form of IPC there is. The overhead is basically zero. The only
 constraint will be contention for the object.

I don't think he has Python objects to work with.  I'm persuaded when
he says: when you're talking about large, intricate data structures
(which include opaque OS object refs that use process-associated
allocators), even a shared memory region between the child process and
the parent can't do the job.

Why aren't you persuaded?

snip

 Yes, callbacks to Python are expensive. But is the problem the GIL?
 Instead of contention for the GIL, he seems to prefer contention for a
 complex object. Is that any better? It too has to be protected by a
 lock.

At a couple points, Andy has expressed his preference for a single
high level sync object to synchronize access to the data, at least
that's my reading.  What he doesn't seem to prefer is the slowdown
arising from the Python callbacks acquiring the GIL.  I think that
would be an additional lock, and that's near the heart of Andy's
concern, as I read him.

  If I understand them correctly, none of these concerns are silly.

 No they are not. But I think he underestimates what multiple processes
 can do. The objects in 'multiprocessing' are already a lot faster than
 their 'threading' and 'Queue' counterparts.

Andy has complimented 'multiprocessing' as a huge huge step.  He
just offers a scenario where multiprocessing might not be the best
solution, and so far, I see no evidence he is wrong.  That's not
underestimation, in my estimation!

Walter.
--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

On Nov 7, 12:22 am, Walter Overby [EMAIL PROTECTED] wrote:

 I read Andy to stipulate that the pipe needs to transmit hundreds of
 megs of data and/or thousands of data structure instances.  I doubt
 he'd be happy with memcpy either.  My instinct is that contention for
 a lock could be the quicker option.

If he needs to communicate that amount of data very often, he has a
serious design problem.

A pipe can transmit hundreds of megs in a split second by the way.


 And don't forget, he says he's got an opaque OS object.  He asked
 the group to explain how to send that via IPC to another process.  I
 surely don't know how.

This is a typical situation where one could use a proxy object. Let
one server process own the opaque OS object, and multiple client
processes access it via IPC calls to the server.


 I don't think he has Python objects to work with.  I'm persuaded when
 he says: when you're talking about large, intricate data structures
 (which include opaque OS object refs that use process-associated
 allocators), even a shared memory region between the child process and
 the parent can't do the job.

 Why aren't you persuaded?

I am persuaded that shared memory may be difficult in that particular
case. I am not persuaded that multiple processes cannot be used,
because one can let one server process own the object.



--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-11-05 Thread Andy O'Meara

On Nov 4, 10:59 am, sturlamolden [EMAIL PROTECTED] wrote:
On Nov 4, 4:27 pm, Andy O'Meara [EMAIL PROTECTED] wrote:

People
in the scientific and academic communities have to understand that the
dynamics in commercial software are can be *very* different needs and
have to show some open-mindedness there.

You are beware that BDFL's employer is a company called Google? Python
is not just used in academic settings.

Turns out I have heard of Google (and how about you be a little more
courteous). If you've read the posts in this thread, you'll note that
the needs outlined in this thread are quite different than the needs
and interests of Google. Note that my point was that python *could*
and *should* be used more in end-user/desktop applications, but it
can't wag the dog to use my earlier statement.

Furthermore, I gave you a link to cilk++. This is a simple tool that
allows you to parallelize existing C or C++ software using three small
keywords.

Sorry if it wasn't clear, but we need the features associated with an
embedded interpreter. I checked out clik++ when you linked it and
although it seems pretty cool, it's not a good fit for us for a number
of reasons. Also, we like the idea of helping support a FOSS project
rather than license a proprietary product (again, to be clear, using
cilk isn't even appropriate for our situation).

As other posts have gone into extensive detail, multiprocessing
unfortunately don't handle the massive/complex data structures
situation (see my posts regarding real-time video processing).

That is something I don't believe. Why can't multiprocessing handle
that?

In a few earlier posts, I went into details what's meant there:

http://groups.google.com/group/comp.lang.python/browse_thread/thread/9d995e4a1153a1b2/09aaca3d94ee7a04?lnk=st#09aaca3d94ee7a04
http://groups.google.com/group/comp.lang.python/msg/edae2840ab432344
http://groups.google.com/group/comp.lang.python/msg/5be213c31519217b

For Christ sake, researchers
write global climate models using MPI. And you think a toy problem
like 'real-time video processing' is a show stopper for using multiple
processes.

I'm not sure why you're posting this sort of stuff when it seems like
you haven't checked out earlier posts in the this thread. Also, you
do yourself and the people here a disservice in the way that you're
speaking to me here. You never know who you're really talking to or
who's reading.

Andy

--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-11-05 Thread Paul Boddie

On 5 Nov, 20:44, Andy O'Meara [EMAIL PROTECTED] wrote:
 On Nov 4, 10:59 am, sturlamolden [EMAIL PROTECTED] wrote:

  For Christ sake, researchers
  write global climate models using MPI. And you think a toy problem
  like 'real-time video processing' is a show stopper for using multiple
  processes.

 I'm not sure why you're posting this sort of stuff when it seems like
 you haven't checked out earlier posts in the this thread.  Also, you
 do yourself and the people here a disservice in the way that you're
 speaking to me here.  You never know who you're really talking to or
 who's reading.

I think your remarks about people in the scientific and academic
communities went down the wrong way, giving (or perhaps reinforcing)
the impression that such people live carefree lives and write software
unconstrained by external factors.

Anyway, to keep things constructive, I should ask (again) whether you
looked at tinypy [1] and whether that might possibly satisfy your
embedded requirements. As I noted before, the developers might share
your outlook on a number of matters. Otherwise, you might peruse the
list of Python implementations:

http://wiki.python.org/moin/implementation

Paul

[1] http://www.tinypy.org/
--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-11-04 Thread sturlamolden

On Nov 3, 7:11 pm, Andy O'Meara [EMAIL PROTECTED] wrote:

 My hope was that the increasing interest and value associated with
 flexible, multi-core/free-thread support is at a point where there's
 a critical mass of CPython developer interest (as indicated by various
 serious projects specifically meant to offer this support).
 Unfortunately, based on the posts in this thread, it's becoming clear
 that the scale of code changes, design changes, and testing that are
 necessary in order to offer this support is just too large unless the
 entire community is committed to the cause.

I've been watching this debate from the side line.

First let me say that there are several solutions to the multicore
problem. Multiple independendent interpreters embedded in a process is
one possibility, but not the only. Unwillingness to implement this in
CPython does not imply unwillingness to exploit the next generation of
processors.

One thing that should be done, is to make sure the Python interpreter
and standard libraries release the GIL wherever they can.

The multiprocessing package has almost the same API as you would get
from your suggestion, the only difference being that multiple
processes is involved. This is however hidden from the user, and
(almost) hidden from the programmer.

Let see what multiprocessing can do:

- Independent interpreters? Yes.
- Shared memory? Yes.
- Shared (proxy) objects? Yes.
- Synchronization objects (locks, etc.)? Yes.
- IPC? Yes.
- Queues? Yes.
- API different from threads? Not really.

Here is one example of what the multiprocessing package can do,
written by yours truly:

http://scipy.org/Cookbook/KDTree

Multicore programming is also more than using more than one thread or
process. There is something called 'load balancing'. If you want to
make efficient use of more than one core, not only must the serial
algorithm be expressed as parallel, you must also take care to
distribute the work evenly. Further, one should avoid as much resource
contention as possible, and avoid races, deadlocks and livelocks.
Java's concurrent package has sophisticated load balancers like the
work-stealing scheduler in ForkJoin. Efficient multicore programming
needs other abstractions than the 'thread' object (cf. what cilk++ is
trying to do). It would certainly be possible to make Python do
something similar. And whether threads or processes is responsible for
the concurrency is not at all important. Today it it is easiest to
achieve multicore concurrency on CPython using multiple processes.

The most 'advanced' language for multicore programming today is
Erlang. It uses a 'share-nothing' message-passing strategy. Python can
do the same as Erlang using the Candygram package
(candygram.sourceforege.net). Changing the Candygram package to use
Multiprocessing instead of Python threads is not a major undertaking.

The GIL is not evil by the way. SBCL also has a lock that protects the
compiler. Ruby is getting a GIL.

So all it comes down to is this:

Why do you want multiple independent interpreters in a process, as
opposed to multiple processes?

Even if you did manage to embed multiple interpreters in a process, it
would not give the programmer any benefit over the multiprocessing
package. If you have multiple embedded interpreters, they cannot share
anything. They must communicate serialized objects or use proxy
objects. That is the same thing the multiprocessing package do.

So why do you want this particular solution?






S.M.
































--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-11-04 Thread sturlamolden


If you are serious about multicore programming, take a look at:

http://www.cilk.com/

Now if we could make Python do something like that, people would
perhaps start to think about writing Python programs for more than one
processor.

--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-11-04 Thread Andy O'Meara

On Nov 4, 9:38 am, sturlamolden [EMAIL PROTECTED] wrote:


 First let me say that there are several solutions to the multicore
 problem. Multiple independendent interpreters embedded in a process is
 one possibility, but not the only.''

No one is disagrees there.  However, motivation of this thread has
been to make people here consider that it's much more preferable for
CPython have has few restrictions as possible with how it's used.  I
think many people here assume that python is the showcase item in
industrial and commercial use, but it's generally just one of many
pieces of machinery that serve the app's function (so the tail can't
wag the dog when it comes to app design).  Some people in this thread
have made comments such as make your app run in python or change
your app requirements but in the world of production schedules and
making sure payroll is met, those options just can't happen.  People
in the scientific and academic communities have to understand that the
dynamics in commercial software are can be *very* different needs and
have to show some open-mindedness there.


 The multiprocessing package has almost the same API as you would get
 from your suggestion, the only difference being that multiple
 processes is involved.

As other posts have gone into extensive detail, multiprocessing
unfortunately don't handle the massive/complex data structures
situation (see my posts regarding real-time video processing).  I'm
not sure if you've followed all the discussion, but multiple processes
is off the table (this is discussed at length, so just flip back into
the thread history).


Andy


--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-11-04 Thread sturlamolden

On Nov 4, 4:27 pm, Andy O'Meara [EMAIL PROTECTED] wrote:

 People
 in the scientific and academic communities have to understand that the
 dynamics in commercial software are can be *very* different needs and
 have to show some open-mindedness there.

You are beware that BDFL's employer is a company called Google? Python
is not just used in academic settings.

Furthermore, I gave you a link to cilk++. This is a simple tool that
allows you to parallelize existing C or C++ software using three small
keywords. This is the kind of tool I believe would be useful. That is
not an academic judgement. It makes it easy to take existing software
and make it run efficiently on multicore processors.



 As other posts have gone into extensive detail, multiprocessing
 unfortunately don't handle the massive/complex data structures
 situation (see my posts regarding real-time video processing).  

That is something I don't believe. Why can't multiprocessing handle
that? Is using a proxy object out of the question? Is putting the
complex object in shared memory out of the question? Is having
multiple copies of the object out of the question (did you see my kd-
tree example)? Using multiple independent interpreters inside a
process does not make this any easier. For Christ sake, researchers
write global climate models using MPI. And you think a toy problem
like 'real-time video processing' is a show stopper for using multiple
processes.




--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-11-04 Thread Paul Boddie

On 4 Nov, 16:00, sturlamolden [EMAIL PROTECTED] wrote:
 If you are serious about multicore programming, take a look at:

 http://www.cilk.com/

 Now if we could make Python do something like that, people would
 perhaps start to think about writing Python programs for more than one
 processor.

The language features look a lot like what others have already been
offering for a while: keywords for parallelised constructs (clik_for)
which are employed by solutions for various languages (C# and various C
++ libraries spring immediately to mind); spawning and synchronisation
are typically supported in existing Python solutions, although
obviously not using language keywords. The more interesting aspects of
the referenced technology seem to be hyperobjects which, as far as I
can tell, are shared global objects, along with the way the work
actually gets distributed and scheduled - something which would
require slashing through the white paper aspects of the referenced
site and actually reading the academic papers associated with the
work.

I've considered doing something like hyperobjects for a while, and
this does fit in somewhat with recent discussions about shared memory
and managing contention for that resource using the communications
channels found in, amongst other solutions, the pprocess module. I
currently have no real motivation to implement this myself, however.

Paul
--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-11-04 Thread lkcl

On Oct 30, 6:39 pm, Terry Reedy [EMAIL PROTECTED] wrote:
 Their professor is Lars Bak, the lead architect of the Google 
 V8Javascriptengine. They spent some time working on V8 in the last couple
 months.

 then they will be at home with pyv8 - which is a combination of the
pyjamas python-to-javascript compiler and google's v8 engine.

 in pyv8, thanks to v8 (and the judicious application of boost) it's
possible to call out to external c-based modules.

 so not only do you get the benefits of the (much) faster execution
speed of v8, along with its garbage collection, but also you still get
access to external modules.

 so... their project's done, already!

 l.
--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-11-03 Thread Andy O'Meara

On Oct 30, 11:09 pm, alex23 [EMAIL PROTECTED] wrote:
 On Oct 31, 2:05 am, Andy O'Meara [EMAIL PROTECTED] wrote:

  I don't follow you there.  If you're referring to multiprocessing, our
  concerns are:

  - Maturity (am I willing to tell my partners and employees that I'm
  betting our future on a brand-new module that imposes significant
  restrictions as to how our app operates?)
  - Liability (am I ready to invest our resources into lots of new
  python module-specific code to find out that a platform that we want
  to target isn't supported or has problems?).  Like it not, we're a
  company and we have to show sensitivity about new or fringe packages
  that make our codebase less agile -- C/C++ continues to win the day in
  that department.

 I don't follow this...wouldn't both of these concerns be even more
 true for modifying the CPython interpreter to provide the
 functionality you want?


A great point, for sure.  So, basically, the motivation and goal of
this entire thread is to get an understanding for how enthusiastic/
interested the CPython dev community is at the concepts/enhancements
under discussion and for all of us to better understand the root
issues.  So my response is basically that it was my intention to seek
official/sanctioned development (and contribute developer direct
support and compensation).

My hope was that the increasing interest and value associated with
flexible, multi-core/free-thread support is at a point where there's
a critical mass of CPython developer interest (as indicated by various
serious projects specifically meant to offer this support).
Unfortunately, based on the posts in this thread, it's becoming clear
that the scale of code changes, design changes, and testing that are
necessary in order to offer this support is just too large unless the
entire community is committed to the cause.

Meanwhile, as many posts in the thread have pointed out, issues such
as free threading and easy/clean/compartmentalized use of python are
of rising importance to app developers shopping for an interpreter to
embed.  So unless/until CPython offers the flexibility some apps
require as an embedded interpreter, we commercial guys are
unfortunately forced to use alternatives to python.  I just think it'd
be huge win for everyone (app developers, the python dev community,
and python proliferation in general) if python made its way into more
commercial and industrial applications (in an embedded capacity).


Andy






--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-10-31 Thread greg


Patrick Stinson wrote:

Speaking of the big picture, is this how it normally works when
someone says Here's some code and a problem and I'm willing to pay
for a solution?


In an open-source volunteer context, time is generally more
valuable than money. Most people can't just drop part of
their regular employment temporarily, so unless there's
quite a *lot* of money being offered (enough to offer someone
full-time employment, for example) it doesn't necessarily
make any more man-hours available.

--
Greg
--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-10-30 Thread Andy O'Meara



 Okay, here's the bottom line:
 * This is not about the GIL.  This is about *completely* isolated
 interpreters; most of the time when we want to remove the GIL we want
 a single interpreter with lots of shared data.
 * Your use case, although not common, is not extraordinarily rare
 either.  It'd be nice to support.
 * If CPython had supported it all along we would continue to maintain
 it.
 * However, since it's not supported today, it's not worth the time
 invested, API incompatibility, and general breakage it would imply.
 * Although it's far more work than just solving your problem, if I
 were to remove the GIL I'd go all the way and allow shared objects.


Great recap (although saying it's not about the GIL may cause some
people lose track of the root issues here, but your following comment
GIL removal shows that we're on the same page).

 So there's really only two options here:
 * get a short-term bodge that works, like hacking the 3rd party
 library to use your shared-memory allocator.  Should be far less work
 than hacking all of CPython.

The problem there is that we're not talking about a single 3rd party
API/allocator--there's many, including the OS which has its own
internal allocators.  My video encoding example is meant to illustrate
a point, but the real-world use case is where there's allocators all
over the place from all kinds of APIs, and when you want your C module
to reenter the interpreter often to execute python helper code.

 * invest yourself in solving the *entire* problem (GIL removal with
 shared python objects).

Well, as I mentioned, I do represent a company willing an able to
expend real resources here.  However, as you pointed out, there's some
serious work at hand here (sadly--it didn't have to be this way) and
there seems to be some really polarized people here that don't seem as
interested as I am to make python more attractive for app developers
shopping for an interpreter to embed.

From our point of view, there's two other options which unfortunately
seem to be the only out the more we seem to uncover with this
discussion:

3) Start a new python implementation, let's call it CPythonES,
specifically targeting performance apps and uses an explicit object/
context concept to permit the free threading under discussion here.
The idea would be to just implement the core language, feature set,
and a handful of modules.  I refer you to that list I made earlier of
essential modules.

4) Drop python, switch to Lua.

The interesting thing about (3) is that it'd be in the same spirit as
how OpenGL ES came to be (except in place of the need for free
threading was the fact the standard OpenGL API was too overgrown and
painful for the embedded scale).

We're currently our own in-house version of (3), but we unfortunately
have other priorities at the moment that would otherwise slow this
down.  Given the direction of many-core machines these days, option
(3) or (4), for us, isn't a question of *if*, it's a question of
*when*.  So that's basically where we're at right now.

As to my earlier point about representing a company ready to spend
real resources, please email me off-list if anyone here would have an
interest in an open CPythonES project (and get full compensation).
I can say for sure that we'd be able to lead with API framework design
work--that's my personal strength and we have a lot of real world
experience there.

Andy


--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-10-30 Thread Jesse Noller

On Wed, Oct 29, 2008 at 8:05 PM, Glenn Linderman [EMAIL PROTECTED] wrote:
 On approximately 10/29/2008 3:45 PM, came the following characters from the
 keyboard of Patrick Stinson:

 If you are dealing with lots of data like in video or sound editing,
 you would just keep the data in shared memory and send the reference
 over IPC to the worker process. Otherwise, if you marshal and send you
 are looking at a temporary doubling of the memory footprint of your
 app because the data will be copied, and marshaling overhead.

 Right.  Sounds, and is, easy, if the data is all directly allocated by the
 application.  But when pieces are allocated by 3rd party libraries, that use
 the C-runtime allocator directly, then it becomes more difficult to keep
 everything in shared memory.

 One _could_ replace the C-runtime allocator, I suppose, but that could have
 some adverse effects on other code, that doesn't need its data to be in
 shared memory.  So it is somewhat between a rock and a hard place.

 By avoiding shared memory, such problems are sidestepped... until you run
 smack into the GIL.

If you do not have shared memory: You don't need threads, ergo: You
don't get penalized by the GIL. Threads are only useful when you need
to have that requirement of large in-memory data structures shared and
modified by a pool of workers.

-jesse
--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-10-30 Thread Andy O'Meara

On Oct 28, 6:11 pm, Martin v. Löwis [EMAIL PROTECTED] wrote:
  Because then we're back into the GIL not permitting threads efficient
  core use on CPU bound scripts running on other threads (when they
  otherwise could).

 Why do you think so? For C code that is carefully written, the GIL
 allows *very well* to write CPU bound scripts running on other threads.
 (please do get back to Jesse's original remark in case you have lost
 the thread :-)


I don't follow you there.  If you're referring to multiprocessing, our
concerns are:

- Maturity (am I willing to tell my partners and employees that I'm
betting our future on a brand-new module that imposes significant
restrictions as to how our app operates?)
- Liability (am I ready to invest our resources into lots of new
python module-specific code to find out that a platform that we want
to target isn't supported or has problems?).  Like it not, we're a
company and we have to show sensitivity about new or fringe packages
that make our codebase less agile -- C/C++ continues to win the day in
that department.
- Shared memory -- for the reasons listed in my other posts, IPC or a
shared/mapped memory region doesn't work for our situation (and I
venture to say, for many real world situations otherwise you'd see end-
user/common apps use forking more often than threading).



  It's turns out that this isn't an exotic case
  at all: there's a *ton* of utility gained by making calls back into
  the interpreter. The best example is that since code more easily
  maintained in python than in C, a lot of the module utility code is
  likely to be in python.

 You should really reconsider writing performance-critical code in
 Python.

I don't follow you there...  Performance-critical code in Python??
Suppose you're doing pixel-level filters on images or video, or
Patrick needs to apply a DSP to some audio...  Our app's performance
would *tank*, in a MAJOR way (that, and/or background tasks would take
100x+ longer to do their work).

 Regardless of the issue under discussion, a lot of performance
 can be gained by using flattened data structures, less pointer,
 less reference counting, less objects, and so on - in the inner loops
 of the computation. You didn't reveal what *specific* computation you
 perform, so it's difficult to give specific advise.

I tried to list some abbreviated examples in other posts, but here's
some elaboration:

- Pixel-level effects and filters, where some filters may use C procs
while others may call back into the interpreter to execute logic --
while some do both, multiple times.
- Image and video analysis/recognition where there's TONS of intricate
data structures and logic.  Those data structures and logic are
easiest to develop and maintain in python, but you'll often want to
call back to C procs which will, in turn, want to access Python (as
well as C-level) data structures.

The common pattern here is where there's a serious mix of C and python
code and data structures, BUT it can all be done with a free-thread
mentality since the finish point is unambiguous and distinct -- where
all the results are handed back to the main app in a black and
white handoff.  It's *really* important for an app to freely make
calls into its interpreter (or the interpreter's data structures)
without having to perform lock/unlocking because that affords an app a
*lot* of options and design paths.  It's just not practical to be
locking and locking the GIL when you want to operate on python data
structures or call back into python.

You seem to have placed the burden of proof on my shoulders for an app
to deserve the ability to free-thread when using 3rd party packages,
so how about we just agree it's not an unreasonable desire for a
package (such as python) to support it and move on with the
discussion.


 Again, if you do heavy-lifting in Python, you should consider to rewrite
 the performance-critical parts in C. You may find that the need for
 multiple CPUs goes even away.

Well, the entire premise we're operating under here is that we're
dealing with embarrassingly easy parallelization scenarios, so when
you suggest that the need for multiple CPUs may go away, I'm worried
that you're not keeping the big picture in mind.


  I appreciate your arguments these a PyC concept is a lot of work with
  some careful design work, but let's not kill the discussion just
  because of that.

 Any discussion in this newsgroup is futile, except when it either
 a) leads to a solution that is already possible, and the OP didn't
 envision, or
 b) is followed up by code contributions from one of the participants.

 If neither is likely to result, killing the discussion is the most
 productive thing we can do.


Well, most others here seem to have a lot different definition of what
qualifies as a futile discussion, so how about you allow the rest of
us continue to discuss these issues and possible solutions.  And, for
the record, I've said multiple times I'm ready to

Re: 2.6, 3.0, and truly independent intepreters

2008-10-30 Thread Jesse Noller

On Thu, Oct 30, 2008 at 12:05 PM, Andy O'Meara [EMAIL PROTECTED] wrote:
 On Oct 28, 6:11 pm, Martin v. Löwis [EMAIL PROTECTED] wrote:
  Because then we're back into the GIL not permitting threads efficient
  core use on CPU bound scripts running on other threads (when they
  otherwise could).

 Why do you think so? For C code that is carefully written, the GIL
 allows *very well* to write CPU bound scripts running on other threads.
 (please do get back to Jesse's original remark in case you have lost
 the thread :-)


 I don't follow you there.  If you're referring to multiprocessing, our
 concerns are:

 - Maturity (am I willing to tell my partners and employees that I'm
 betting our future on a brand-new module that imposes significant
 restrictions as to how our app operates?)
 - Liability (am I ready to invest our resources into lots of new
 python module-specific code to find out that a platform that we want
 to target isn't supported or has problems?).  Like it not, we're a
 company and we have to show sensitivity about new or fringe packages
 that make our codebase less agile -- C/C++ continues to win the day in
 that department.
 - Shared memory -- for the reasons listed in my other posts, IPC or a
 shared/mapped memory region doesn't work for our situation (and I
 venture to say, for many real world situations otherwise you'd see end-
 user/common apps use forking more often than threading).


FWIW (and again, I am not saying MP is good for your problem domain) -
multiprocessing works on windows, OS/X, Linux and Solaris quite well.
The only platforms it has problems on right now *BSD and AIX. It has
plenty of tests (I want more more more) and has a decent amount of
usage is my mail box and bug list are any indication.

Multiprocessing is not *new* - it's a branch of the pyprocessing package.

Multiprocessing is written in C, so as for the less agile - I don't
see how it's any less agile then what you've talked about. If you
wanted true platform insensitivity, then Java is a better bet :) As
for your final point:

 - Shared memory -- for the reasons listed in my other posts, IPC or a
 shared/mapped memory region doesn't work for our situation (and I
 venture to say, for many real world situations otherwise you'd see end-
 user/common apps use forking more often than threading).


I philosophically disagree with you here. PThreads and Shared memory
as it is today, is largely based on Java's influence on the world. I
would argue that the reason most people use threads as opposed to
processes is simply based on ease of use and entry (which is ironic,
given how many problems it causes). Not because they *need* the shared
memory aspects of it, or because they could not decompose the problem
into Actors/message passing, but because threads:

A are there (e.g. in Java, Python, etc)
B allow you to share anything (which allows you to take horrible shortcuts)
C is what everyone knows at this point.

Even luminaries such as Brian Goetz and many, many others have pointed
out that threading, as it exists today is fundamentally difficult to
get right. Ergo the renaissance (read: echo chamber) towards
Erlang-style concurrency.

For many real world applications - threading is just simple. This
is why Multiprocessing exists at all - to attempt to make forking/IPC
as simple as the API to threading. It's not foolproof, but the goal
was to open the door to multiple cores with a familiar API:

Quoting PEP 371:

The pyprocessing package offers a method to side-step the GIL
allowing applications within CPython to take advantage of
multi-core architectures without asking users to completely change
their programming paradigm (i.e.: dropping threaded programming
for another concurrent approach - Twisted, Actors, etc).

The Processing package offers CPython a known API which mirrors
albeit in a PEP 8 compliant manner, that of the threading API,
with known semantics and easy scalability.

I would argue that most of the people taking part in this discussion
are working on real world applications - sure, multiprocessing as it
exists today, right now - may not support your use case, but it was
evaluated to fit *many* use cases.

Most of the people here are working in Pure python, or they're using a
few extension modules here and there (in C). Again, when you say
threads and processes, most people here are going to think import
threading, fork() or import multiprocessing

Please correct me if I am wrong in understanding what you want: You
are making threads in another language (not via the threading API),
embed python in those threads, but you want to be able to share
objects/state between those threads, and independent interpreters. You
want to be able to pass state from one interpreter to another via
shared memory (e.g. pointers/contexts/etc).

Example:

ParentAppFoo makes 10 threads (in C)
Each thread gets an itty bitty python interpreter
ParentAppFoo gets a object(video) to render
Rather then

Re: 2.6, 3.0, and truly independent intepreters

2008-10-30 Thread VanL

Jesse Noller wrote:

 Even luminaries such as Brian Goetz and many, many others have pointed
 out that threading, as it exists today is fundamentally difficult to
 get right. Ergo the renaissance (read: echo chamber) towards
 Erlang-style concurrency.

I think this is slightly missing what Andy is saying. Andy is trying
something that would look much more like Erlang-style concurrency than
classic threads - green processes to use someone else's term.

AFAIK, Erlang processes aren't really processes at the OS level.
Instead, they are named processes because they only communicate through
message passing. When multiple processes are running in the same
os-level-multi-threaded interpreter, the interpreter cheats to make the
message passing fast.

I think Andy is thinking along the same lines. With a Python
subinterpreter per thread, he is suggesting intra-process message
passing as a way to get concurrency.

Its actually not too far from what he is doing already, but he is
fighting OS-level shared library semantics to do it. Instead, if Python
supported a per-subinterpreter GIL and per-subinterpreter state, then
you could theoretically get to a good place:

- You only initialize subinterpreters if you need them, so
single-process Python doesn't pay a large (any?) penalty
- Intra-process message passing can be fast, but still has the
no-shared-state benefits of the Erlang concurrency model
- There are fewer changes to the Python core, because the GIL doesn't go
away

No, this isn't whole-hog free threading (or safe threading), there are
restrictions that go along with this model - but there would be benefits.

--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-10-30 Thread Andy O'Meara

On Oct 30, 1:00 pm, Jesse Noller [EMAIL PROTECTED] wrote:


 Multiprocessing is written in C, so as for the less agile - I don't
 see how it's any less agile then what you've talked about.

Sorry for not being more specific there, but by less agile I meant
that an app's codebase is less agile if python is an absolute
requirement.  If I was told tomorrow that for some reason we had to
drop python and go with something else, it's my job to have chosen a
codebase path/roadmap such that my response back isn't just well,
we're screwed then.  Consider modern PC games.  They have huge code
bases that use DirectX and OpenGL and having a roadmap of flexibility
is paramount so packages they choose to use are used in a contained
and hedged fashion.  It's a survival tactic for a company not to
entrench themselves in a package or technology if they don't have to
(and that's what I keep trying to raise in the thread--that the python
dev community should embrace development that makes python a leading
candidate for lightweight use).  Companies want to build a flexible,
powerful codebases that are married to as few components as
possible.


  - Shared memory -- for the reasons listed in my other posts, IPC or a
  shared/mapped memory region doesn't work for our situation (and I
  venture to say, for many real world situations otherwise you'd see end-
  user/common apps use forking more often than threading).

 I would argue that the reason most people use threads as opposed to
 processes is simply based on ease of use and entry (which is ironic,
 given how many problems it causes).

No, we're in agreement here -- I was just trying to offer a more
detailed explanation of ease of use.  It's easy because memory is
shared and no IPC, serialization, or special allocator code is
required.  And as we both agree, it's far from easy once those
threads to interact with each other.  But again, my goal here is to
stay on the embarrassingly easy parallelization scenarios.



 I would argue that most of the people taking part in this discussion
 are working on real world applications - sure, multiprocessing as it
 exists today, right now - may not support your use case, but it was
 evaluated to fit *many* use cases.

And as I've mentioned, it's a totally great endeavor to be super proud
of.  That suite of functionality alone opens some *huge* doors for
python and I hope folks that use it appreciate how much time and
thought that undoubtably had to go into it.  You get total props, for
sure, and you're work is a huge and unique credit to the community.


 Please correct me if I am wrong in understanding what you want: You
 are making threads in another language (not via the threading API),
 embed python in those threads, but you want to be able to share
 objects/state between those threads, and independent interpreters. You
 want to be able to pass state from one interpreter to another via
 shared memory (e.g. pointers/contexts/etc).

 Example:

 ParentAppFoo makes 10 threads (in C)
 Each thread gets an itty bitty python interpreter
 ParentAppFoo gets a object(video) to render
 Rather then marshal that object, you pass a pointer to the object to
 the children
 You want to pass that pointer to an existing, or newly created itty
 bitty python interpreter for mangling
 Itty bitty python interpreter passes the object back to a C module via
 a pointer/context

 If the above is wrong, I think possible outlining it in the above form
 may help people conceptualize it - I really don't think you're talking
 about python-level processes or threads.


Yeah, you have it right-on there, with added fact that the C and
python execution (and data access) are highly intertwined (so getting
and releasing the GIL would have to be happening all over).  For
example, consider and the dynamics, logic, algorithms, and data
structures associated with image and video effects and image and video
image recognition/analysis.


Andy


--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-10-30 Thread Jesse Noller

On Thu, Oct 30, 2008 at 1:54 PM, Andy O'Meara [EMAIL PROTECTED] wrote:
 On Oct 30, 1:00 pm, Jesse Noller [EMAIL PROTECTED] wrote:


 Multiprocessing is written in C, so as for the less agile - I don't
 see how it's any less agile then what you've talked about.

 Sorry for not being more specific there, but by less agile I meant
 that an app's codebase is less agile if python is an absolute
 requirement.  If I was told tomorrow that for some reason we had to
 drop python and go with something else, it's my job to have chosen a
 codebase path/roadmap such that my response back isn't just well,
 we're screwed then.  Consider modern PC games.  They have huge code
 bases that use DirectX and OpenGL and having a roadmap of flexibility
 is paramount so packages they choose to use are used in a contained
 and hedged fashion.  It's a survival tactic for a company not to
 entrench themselves in a package or technology if they don't have to
 (and that's what I keep trying to raise in the thread--that the python
 dev community should embrace development that makes python a leading
 candidate for lightweight use).  Companies want to build a flexible,
 powerful codebases that are married to as few components as
 possible.


  - Shared memory -- for the reasons listed in my other posts, IPC or a
  shared/mapped memory region doesn't work for our situation (and I
  venture to say, for many real world situations otherwise you'd see end-
  user/common apps use forking more often than threading).

 I would argue that the reason most people use threads as opposed to
 processes is simply based on ease of use and entry (which is ironic,
 given how many problems it causes).

 No, we're in agreement here -- I was just trying to offer a more
 detailed explanation of ease of use.  It's easy because memory is
 shared and no IPC, serialization, or special allocator code is
 required.  And as we both agree, it's far from easy once those
 threads to interact with each other.  But again, my goal here is to
 stay on the embarrassingly easy parallelization scenarios.


That's why when I'm using threads, I stick to Queues. :)



 I would argue that most of the people taking part in this discussion
 are working on real world applications - sure, multiprocessing as it
 exists today, right now - may not support your use case, but it was
 evaluated to fit *many* use cases.

 And as I've mentioned, it's a totally great endeavor to be super proud
 of.  That suite of functionality alone opens some *huge* doors for
 python and I hope folks that use it appreciate how much time and
 thought that undoubtably had to go into it.  You get total props, for
 sure, and you're work is a huge and unique credit to the community.


Thanks - I'm just a cheerleader and pusher-into-core, R Oudkerk is the
implementor. He and everyone else who has helped deserve more credit
than me by far.

My main interest, and the reason I brought it up (again) is that I'm
interested in making it better :)


 Please correct me if I am wrong in understanding what you want: You
 are making threads in another language (not via the threading API),
 embed python in those threads, but you want to be able to share
 objects/state between those threads, and independent interpreters. You
 want to be able to pass state from one interpreter to another via
 shared memory (e.g. pointers/contexts/etc).

 Example:

 ParentAppFoo makes 10 threads (in C)
 Each thread gets an itty bitty python interpreter
 ParentAppFoo gets a object(video) to render
 Rather then marshal that object, you pass a pointer to the object to
 the children
 You want to pass that pointer to an existing, or newly created itty
 bitty python interpreter for mangling
 Itty bitty python interpreter passes the object back to a C module via
 a pointer/context

 If the above is wrong, I think possible outlining it in the above form
 may help people conceptualize it - I really don't think you're talking
 about python-level processes or threads.


 Yeah, you have it right-on there, with added fact that the C and
 python execution (and data access) are highly intertwined (so getting
 and releasing the GIL would have to be happening all over).  For
 example, consider and the dynamics, logic, algorithms, and data
 structures associated with image and video effects and image and video
 image recognition/analysis.

okie doke!
--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-10-30 Thread Paul Boddie

On 30 Okt, 14:12, Andy O'Meara [EMAIL PROTECTED] wrote:

 3) Start a new python implementation, let's call it CPythonES

[...]

 4) Drop python, switch to Lua.

Have you looked at tinypy? I'm not sure about the concurrency aspects
of the implementation, but the developers are not completely
unfamiliar with game development, and there is a certain amount of
influence from Lua:

http://www.tinypy.org/

It might also be a more appropriate starting point than CPython for
experimentation.

Paul
--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-10-30 Thread Terry Reedy


Andy O'Meara wrote:

On Oct 28, 6:11 pm, Martin v. Löwis [EMAIL PROTECTED] wrote:



You should really reconsider writing performance-critical code in
Python.


I don't follow you there...  Performance-critical code in Python??


Martin meant what he said better later
 Again, if you do heavy-lifting in Python, you should consider to
 rewrite the performance-critical parts in C.


I tried to list some abbreviated examples in other posts, but here's
some elaboration:

...

The common pattern here is where there's a serious mix of C and python
code and data structures,


I get the feeling that what you are doing is more variegated that what 
most others are doing with Python.  And the reason is that what you are 
doing is apparently not possible with *stock* CPython.  Again, it is a 
chicken-and-egg type problem.


You might find this of interest from the PyDev list just hours ago.

Hi to all Python developers

For a student project in a course on virtual machines, we are
evaluating the possibility to
experiment with removing the GIL from CPython

We have read the arguments against doing this at
http://www.python.org/doc/faq/library/#can-t-we-get-rid-of-the-global-interpreter-lock.

But we think it might be possible to do this with a different approach
than what has been tried till now.

The main reason for the necessity of the GIL is reference counting.

We believe that most of the slowdown in the free threading
implementation of Greg Stein was due to the need of atomic
refcounting, as this mail seems to confirm:
http://mail.python.org/pipermail/python-ideas/2007-April/000414.html

So we want to change CPython into having a real garbage collector -
removing all reference counting, and then the need for locks (or
atomic inc/dec ops) should be
highly alleviated.

Preferably the GC should be a high-performance one for instance a
generational one.

We believe that it can run quite a lot faster than ref-counting.

Shared datastructures would get their lock obviously.
Immutable objects (especially shared global objects, like True, False, Null)
would not.

Most of the interpreter structure would be per-thread, at that point.

We do not know how Greg Stein did his locking in the free threads
patch, but as a part of the course we learned there exists much faster
ways of locking than using OS-locks (faster for the uncontented case)
that are used in e.g. the HOT-SPOT java-compiler. This might make
free threading in python more attractive than some pessimists think.
(http://blogs.sun.com/dave/entry/biased_locking_in_hotspot)
In particular, we are talking about making the uncontended case go fast,
not about the independent part of stack-allocating the mutex
structure, which can only be done and is only needed in Java.

These ideas are similar to the ones used by Linux fast mutexes
(futexes), the implementation of mutexes in NPTL.

We have read this mail thread - so it seems that our idea surfaced,
but Greg didn't completely love it (he wanted to optimize refcounting
instead):
http://mail.python.org/pipermail/python-ideas/2007-April/000436.html

He was not totally negative however. His main objections are about:
- cache locality (He is in our opinion partially right, as seen in some
other paper time ago - any GC, copying GC in particular, doubles the
amount of used memory, so it's less cache-friendly). But still GCs are
overall competitive or faster than explicit management, and surely
much faster of refcounting.

We know it is the plan for PyPy to work in this way, and also that
Jython and Ironpython works like that (using the host vm's GC), so it
seems to be somehow agreeable with the python semantics (perhaps not
really with __del__ but they are not really nice anyway).

Was this ever tried for CPython?

Any other comments, encouragements or warnings on the project-idea?

Best regards: Paolo, Sigurd [EMAIL PROTECTED]


Guido's response

It's not that I have any love for the GIL, it just is the best
compromise I could find. I expect that you won't be able to do better,
but I wish you luck anyway.


And a bit more explanation from Van Lindberg

Just an FYI, these two particular students already introduced themselves
on the PyPy list. Paolo is a masters student with experience in the
Linux kernel; Sigurd is a PhD candidate.

Their professor is Lars Bak, the lead architect of the Google V8
Javascript engine. They spent some time working on V8 in the last couple
months.


I agree that you should continue the discussion.  Just let Martin ignore 
it for awhile until you need further input from him.


Terry Jan Reedy

--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-10-30 Thread Martin v. Löwis

 Why do you think so? For C code that is carefully written, the GIL
 allows *very well* to write CPU bound scripts running on other threads.
 (please do get back to Jesse's original remark in case you have lost
 the thread :-)

 
 I don't follow you there.  If you're referring to multiprocessing

No, I'm not. I refer to regular, plain, multi-threading.

 It's turns out that this isn't an exotic case
 at all: there's a *ton* of utility gained by making calls back into
 the interpreter. The best example is that since code more easily
 maintained in python than in C, a lot of the module utility code is
 likely to be in python.
 You should really reconsider writing performance-critical code in
 Python.
 
 I don't follow you there...  Performance-critical code in Python??

I probably expressed myself incorrectly (being not a native speaker
of English): If you were writing performance-critical in Python,
you should reconsider (i.e. you should rewrite it in C).

It's not clear whether this calling back into Python is in the
performance-critical path. If it is, then reconsider.

 I tried to list some abbreviated examples in other posts, but here's
 some elaboration:
 
 - Pixel-level effects and filters, where some filters may use C procs
 while others may call back into the interpreter to execute logic --
 while some do both, multiple times.

Ok. For a plain C proc, release the GIL before the proc, and reacquire
it afterwards. For a proc that calls into the interpreter:
a) if it is performance-critical, reconsider writing it in C, or
   reformulate so that it stops being performance critical (e.g.
   through caching)
b) else, reacquire the GIL before calling back into Python, then
   release the GIL before continuing the proc

 - Image and video analysis/recognition where there's TONS of intricate
 data structures and logic.  Those data structures and logic are
 easiest to develop and maintain in python, but you'll often want to
 call back to C procs which will, in turn, want to access Python (as
 well as C-level) data structures.

Not sure what the processing is, or what processing you need to do.
The data structures themselves are surely not performance critical
(not being algorithms). If you really run Python algorithms on these
structures, then my approach won't help you (except for the general
recommendation to find some expensive sub-algorithm and rewrite that
in C, so that it both becomes faster and can release the GIL).

 It's just not practical to be
 locking and locking the GIL when you want to operate on python data
 structures or call back into python.

This I don't understand. I find that fairly easy to do.

 You seem to have placed the burden of proof on my shoulders for an app
 to deserve the ability to free-thread when using 3rd party packages,
 so how about we just agree it's not an unreasonable desire for a
 package (such as python) to support it and move on with the
 discussion.

Not at all - I don't want a proof. I just want agreement on Jesse
Noller's claim

# A c-level module, on the other hand, can sidestep/release
# the GIL at will, and go on it's merry way and process away.

 If neither is likely to result, killing the discussion is the most
 productive thing we can do.

 
 Well, most others here seem to have a lot different definition of what
 qualifies as a futile discussion, so how about you allow the rest of
 us continue to discuss these issues and possible solutions.  And, for
 the record, I've said multiple times I'm ready to contribute
 monetarily, professionally, and personally, so if that doesn't qualify
 as the precursor to code contributions from one of the participants
 then I don't know WHAT does.

Ok, I apologize for having misunderstood you here.

Regards,
Martin
--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-10-30 Thread Patrick Stinson

On Wed, Oct 29, 2008 at 4:05 PM, Glenn Linderman [EMAIL PROTECTED] wrote:
 On approximately 10/29/2008 3:45 PM, came the following characters from the
 keyboard of Patrick Stinson:

 If you are dealing with lots of data like in video or sound editing,
 you would just keep the data in shared memory and send the reference
 over IPC to the worker process. Otherwise, if you marshal and send you
 are looking at a temporary doubling of the memory footprint of your
 app because the data will be copied, and marshaling overhead.

 Right.  Sounds, and is, easy, if the data is all directly allocated by the
 application.  But when pieces are allocated by 3rd party libraries, that use
 the C-runtime allocator directly, then it becomes more difficult to keep
 everything in shared memory.

good point.


 One _could_ replace the C-runtime allocator, I suppose, but that could have
 some adverse effects on other code, that doesn't need its data to be in
 shared memory.  So it is somewhat between a rock and a hard place.

ewww scary. mousetraps for sale?


 By avoiding shared memory, such problems are sidestepped... until you run
 smack into the GIL.

 --
 Glenn -- http://nevcal.com/
 ===
 A protocol is complete when there is nothing left to remove.
 -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking


--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-10-30 Thread Glenn Linderman

On approximately 10/30/2008 6:26 AM, came the following characters from 
the keyboard of Jesse Noller:

On Wed, Oct 29, 2008 at 8:05 PM, Glenn Linderman [EMAIL PROTECTED] wrote:
  

On approximately 10/29/2008 3:45 PM, came the following characters from the
keyboard of Patrick Stinson:


If you are dealing with lots of data like in video or sound editing,
you would just keep the data in shared memory and send the reference
over IPC to the worker process. Otherwise, if you marshal and send you
are looking at a temporary doubling of the memory footprint of your
app because the data will be copied, and marshaling overhead.
  

Right.  Sounds, and is, easy, if the data is all directly allocated by the
application.  But when pieces are allocated by 3rd party libraries, that use
the C-runtime allocator directly, then it becomes more difficult to keep
everything in shared memory.

One _could_ replace the C-runtime allocator, I suppose, but that could have
some adverse effects on other code, that doesn't need its data to be in
shared memory.  So it is somewhat between a rock and a hard place.

By avoiding shared memory, such problems are sidestepped... until you run
smack into the GIL.



If you do not have shared memory: You don't need threads, ergo: You
don't get penalized by the GIL. Threads are only useful when you need
to have that requirement of large in-memory data structures shared and
modified by a pool of workers.


The whole point of this thread is to talk about large in-memory data 
structures that are shared and modified by a pool of workers.


My reference to shared memory was specifically referring to the concept 
of sharing memory between processes... a particular OS feature that is 
called shared memory.


The need for sharing memory among a pool of workers is still the 
premise.  Threads do that automatically, without the need for the OS 
shared memory feature, that brings with it the need for a special 
allocator to allocate memory in the shared memory area vs the rest of 
the address space.


Not to pick on you, particularly, Jesse, but this particular response 
made me finally understand why there has been so much repetition of the 
same issues and positions over and over and over in this thread: instead 
of comprehending the whole issue, people are responding to small 
fragments of it, with opinions that may be perfectly reasonable for that 
fragment, but missing the big picture, or the explanation made when the 
same issue was raised in a different sub-thread.


--
Glenn -- http://nevcal.com/
===
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-10-30 Thread Patrick Stinson

Speaking of the big picture, is this how it normally works when
someone says Here's some code and a problem and I'm willing to pay
for a solution? I've never really walked that path with a project of
this complexity (I guess it's the backwards-compatibility that makes
it confusing), but is this problem just too complex so we have to keep
talking and talking on forum after forum? Afraid to fork? I know I am.
How many people are qualified to tackle Andy's problem? Are all of
them busy or uninterested? Is the current code in a tight spot where
it just can't be fixed without really jabbing that FORK in so deep
that the patch will die when your project does?

Personally I think this problem is super-awesome on the hobbyest's fun
scale. I'd totally take the time to let my patch do the talking but I
haven't read enough of the (2.5) code. So, I resort to simply reading
the newsgroups and python code to better understand the mechanics
problem :(

On Thu, Oct 30, 2008 at 2:54 PM, Glenn Linderman [EMAIL PROTECTED] wrote:
 On approximately 10/30/2008 6:26 AM, came the following characters from the
 keyboard of Jesse Noller:

 On Wed, Oct 29, 2008 at 8:05 PM, Glenn Linderman [EMAIL PROTECTED]
 wrote:


 On approximately 10/29/2008 3:45 PM, came the following characters from
 the
 keyboard of Patrick Stinson:


 If you are dealing with lots of data like in video or sound editing,
 you would just keep the data in shared memory and send the reference
 over IPC to the worker process. Otherwise, if you marshal and send you
 are looking at a temporary doubling of the memory footprint of your
 app because the data will be copied, and marshaling overhead.


 Right.  Sounds, and is, easy, if the data is all directly allocated by
 the
 application.  But when pieces are allocated by 3rd party libraries, that
 use
 the C-runtime allocator directly, then it becomes more difficult to keep
 everything in shared memory.

 One _could_ replace the C-runtime allocator, I suppose, but that could
 have
 some adverse effects on other code, that doesn't need its data to be in
 shared memory.  So it is somewhat between a rock and a hard place.

 By avoiding shared memory, such problems are sidestepped... until you run
 smack into the GIL.


 If you do not have shared memory: You don't need threads, ergo: You
 don't get penalized by the GIL. Threads are only useful when you need
 to have that requirement of large in-memory data structures shared and
 modified by a pool of workers.

 The whole point of this thread is to talk about large in-memory data
 structures that are shared and modified by a pool of workers.

 My reference to shared memory was specifically referring to the concept of
 sharing memory between processes... a particular OS feature that is called
 shared memory.

 The need for sharing memory among a pool of workers is still the premise.
  Threads do that automatically, without the need for the OS shared memory
 feature, that brings with it the need for a special allocator to allocate
 memory in the shared memory area vs the rest of the address space.

 Not to pick on you, particularly, Jesse, but this particular response made
 me finally understand why there has been so much repetition of the same
 issues and positions over and over and over in this thread: instead of
 comprehending the whole issue, people are responding to small fragments of
 it, with opinions that may be perfectly reasonable for that fragment, but
 missing the big picture, or the explanation made when the same issue was
 raised in a different sub-thread.

 --
 Glenn -- http://nevcal.com/
 ===
 A protocol is complete when there is nothing left to remove.
 -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking


--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-10-30 Thread Rhamphoryncus

On Oct 30, 8:23 pm, Patrick Stinson [EMAIL PROTECTED]
wrote:
 Speaking of the big picture, is this how it normally works when
 someone says Here's some code and a problem and I'm willing to pay
 for a solution? I've never really walked that path with a project of
 this complexity (I guess it's the backwards-compatibility that makes
 it confusing), but is this problem just too complex so we have to keep
 talking and talking on forum after forum? Afraid to fork? I know I am.
 How many people are qualified to tackle Andy's problem? Are all of
 them busy or uninterested? Is the current code in a tight spot where
 it just can't be fixed without really jabbing that FORK in so deep
 that the patch will die when your project does?

 Personally I think this problem is super-awesome on the hobbyest's fun
 scale. I'd totally take the time to let my patch do the talking but I
 haven't read enough of the (2.5) code. So, I resort to simply reading
 the newsgroups and python code to better understand the mechanics
 problem :(

The scale of this issue is why so little progress gets made, yes.  I
intend to solve it regardless of getting paid (and have been working
on various aspects for quite a while now), but as you can see from
this thread it's very difficult to convince anybody that my approach
is the *right* approach.
--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-10-30 Thread alex23

On Oct 31, 2:05 am, Andy O'Meara [EMAIL PROTECTED] wrote:
 I don't follow you there.  If you're referring to multiprocessing, our
 concerns are:

 - Maturity (am I willing to tell my partners and employees that I'm
 betting our future on a brand-new module that imposes significant
 restrictions as to how our app operates?)
 - Liability (am I ready to invest our resources into lots of new
 python module-specific code to find out that a platform that we want
 to target isn't supported or has problems?).  Like it not, we're a
 company and we have to show sensitivity about new or fringe packages
 that make our codebase less agile -- C/C++ continues to win the day in
 that department.

I don't follow this...wouldn't both of these concerns be even more
true for modifying the CPython interpreter to provide the
functionality you want?
--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

Close, I work currently for EastWest :)

Well, I actually like almost everything else about CPython,
considering my audio work the only major problem I've had is with the
GIL. I like the purist community, and I like the code, since
integrating it on both platforms has been relatively clean, and
required *zero* support. Frankly, with the exception of some windows
deployment issues relating to static linking of libpython and some
extensions, it's been a dream lib to use.

Further, I really appreciate the discussions that happen in these
lists, and I think that this particular problem is a wonderful example
of a situation that requires tons of miscellaneous opinions and input
from all angles - especially at this stage. I think that this problem
has lots of standing discussion and lots of potential solutions and/or
workarounds, and it would be cool for someone to aggregate and
paraphrase that stuff into a page to assist those thinking about doing
some patching. That's probably something that the coder would do
themselves though.

On Fri, Oct 24, 2008 at 10:25 AM, Andy O'Meara [EMAIL PROTECTED] wrote:

 So we are sitting this music platform with unimaginable possibilities
 in the music world (of which python does not play a role), but those
 little CPU spikes caused by the GIL at low latencies won't let us have
 it. AFAIK, there is no music scripting language out there that would
 come close, and yet we are so close! This is a big deal.


 Perfectly said, Patrick.  It pains me to know how widespread python
 *could* be in commercial software!

 Also, good points about people being longwinded and that code talks.

 Sadly, the time alone I've spend in the last couple days on this
 thread is scary, but I'm committed now, I guess.  :^(   I look at the
 length of the posts of some of these guys and I have to wonder what
 the heck they do for a living!

 As I mentioned, however, I'm close to just blowing the whistle on this
 crap and start making CPythonES (as I call it, in the spirit of the
 ES in OpenGLES).  Like you, we just want the core features of
 python in a clean, tidy, *reliable* fashion--something that we can
 ship and not lose sleep (or support hours) over.  Basically, I imagine
 developing an interpreter designed for dev houses like yours and mine
 (you're Ableton or Propellerhead, right?)--a python version of lua, if
 you will.  The nice thing about it is that is could start fresh and
 small, but I have a feeling it would really catch on because every
 commercial dev house would choose it over CPython any day of the week
 and it would be completely disjoint form CPython.

 Andy

--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

Wow, man. Excellent post. You want a job?

The gui could use PyA threads for sure, and the audio thread could use
PyC threads. It would not be a problem to limit the audio thread to
only reentrant libraries.

This kind of thought is what I had in mind about finding a compromise,
especially in the way that PyD would not break old code assuming that
it could eventually be ported.

On Fri, Oct 24, 2008 at 11:02 AM, Glenn Linderman [EMAIL PROTECTED] wrote:
 On approximately 10/24/2008 8:42 AM, came the following characters from the
 keyboard of Andy O'Meara:

 Glenn, great post and points!


 Thanks. I need to admit here that while I've got a fair bit of professional
 programming experience, I'm quite new to Python -- I've not learned its
 internals, nor even the full extent of its rich library. So I have some
 questions that are partly about the goals of the applications being
 discussed, partly about how Python is constructed, and partly about how the
 library is constructed. I'm hoping to get a better understanding of all of
 these; perhaps once a better understanding is achieved, limitations will be
 understood, and maybe solutions be achievable.

 Let me define some speculative Python interpreters; I think the first is
 today's Python:

 PyA: Has a GIL. PyA threads can run within a process; but are effectively
 serialized to the places where the GIL is obtained/released. Needs the GIL
 because that solves lots of problems with non-reentrant code (an example of
 non-reentrant code, is code that uses global (C global, or C static)
 variables – note that I'm not talking about Python vars declared global...
 they are only module global). In this model, non-reentrant code could
 include pieces of the interpreter, and/or extension modules.

 PyB: No GIL. PyB threads acquire/release a lock around each reference to a
 global variable (like with feature). Requires massive recoding of all code
 that contains global variables. Reduces performance significantly by the
 increased cost of obtaining and releasing locks.

 PyC: No locks. Instead, recoding is done to eliminate global variables
 (interpreter requires a state structure to be passed in). Extension modules
 that use globals are prohibited... this eliminates large portions of the
 library, or requires massive recoding. PyC threads do not share data between
 threads except by explicit interfaces.

 PyD: (A hybrid of PyA  PyC). The interpreter is recoded to eliminate global
 variables, and each interpreter instance is provided a state structure.
 There is still a GIL, however, because globals are potentially still used by
 some modules. Code is added to detect use of global variables by a module,
 or some contract is written whereby a module can be declared to be reentrant
 and global-free. PyA threads will obtain the GIL as they would today. PyC
 threads would be available to be created. PyC instances refuse to call
 non-reentrant modules, but also need not obtain the GIL... PyC threads would
 have limited module support initially, but over time, most modules can be
 migrated to be reentrant and global-free, so they can be used by PyC
 instances. Most 3rd-party libraries today are starting to care about
 reentrancy anyway, because of the popularity of threads.

 The assumptions here are that:

 Data-1) A Python interpreter doesn't provide any mechanism to share normal
 data among threads, they are independent... but message passing works.
 Data-2) A Python interpreter could be extended to provide mechanisms to
 share special data, and the data would come with an implicit lock.
 Data-3) A Python interpreter could be extended to provide unlocked access to
 special data, requiring the application to handle the synchronization
 between threads. Data of type 2 could be used to control access to data of
 type 3. This type of data could be large, or frequently referenced data, but
 only by a single thread at a time, with major handoffs to a different thread
 synchronized by the application in whatever way it chooses.

 Context-1) A Python interpreter would know about threads it spawns, and
 could pass in a block of context (in addition to the state structure) as a
 parameter to a new thread. That block of context would belong to the thread
 as long as it exists, and return to the spawner when the thread completes.
 An embedded interpreter would also be given a block of context (in addition
 to the state structure). This would allow application context to be created
 and passed around. Pointers to shared memory structures, might be typical
 context in the embedded case.

 Context-2) Embedded Python interpreters could be spawned either as PyA
 threads or PyC threads. PyC threads would be limited to modules that are
 reentrant.


 I think that PyB and PyC are the visions that people see, which argue
 against implementing independent interpreters. PyB isn't truly independent,
 because data are shared, recoding is required, and performance suffers. Ick.
 PyC requires

Re: 2.6, 3.0, and truly independent intepreters

On Fri, Oct 24, 2008 at 12:51 PM, Andy O'Meara [EMAIL PROTECTED] wrote:

 Another great post, Glenn!!  Very well laid-out and posed!! Thanks for
 taking the time to lay all that out.


 Questions for Andy: is the type of work you want to do in independent
 threads mostly pure Python? Or with libraries that you can control to
 some extent? Are those libraries reentrant? Could they be made
 reentrant? How much of the Python standard library would need to be
 available in reentrant mode to provide useful functionality for those
 threads? I think you want PyC


 I think you've defined everything perfectly, and you're you're of
 course correct about my love for for the PyC model.  :^)

 Like any software that's meant to be used without restrictions, our
 code and frameworks always use a context object pattern so that
 there's never and non-const global/shared data).  I would go as far to
 say that this is the case with more performance-oriented software than
 you may think since it's usually a given for us to have to be parallel
 friendly in as many ways as possible.  Perhaps Patrick can back me up
 there.

And I will.


 As to what modules are essential...  As you point out, once
 reentrant module implementations caught on in PyC or hybrid world, I
 think we'd start to see real effort to whip them into compliance--
 there's just so much to be gained imho.  But to answer the question,
 there's the obvious ones (operator, math, etc), string/buffer
 processing (string, re), C bridge stuff (struct, array), and OS basics
 (time, file system, etc).  Nice-to-haves would be buffer and image
 decompression (zlib, libpng, etc), crypto modules, and xml. As far as
 I can imagine, I have to believe all of these modules already contain
 little, if any, global data, so I have to believe they'd be super easy
 to make PyC happy.  Patrick, what would you see you guys using?


We don't need anything :) Since our goal is just to use python as a
scripting language/engine to our MIDI application, all we really need
is to make calls to the api that we expose using __builtins__.

You know, the standard python library is pretty siick, but the
syntax, object model, and import mechanics of python itself is an
**equally exportable function** of the code. Funny that I'm lucky
enough to say:

Screw the extension modules - I just want the LANGUAGE. But, I can't have it.


  That's the rub...  In our case, we're doing image and video
  manipulation--stuff not good to be messaging from address space to
  address space.  The same argument holds for numerical processing with
  large data sets.  The workers handing back huge data sets via
  messaging isn't very attractive.

 In the module multiprocessing environment could you not use shared
 memory, then, for the large shared data items?


 As I understand things, the multiprocessing puts stuff in a child
 process (i.e. a separate address space), so the only to get stuff to/
 from it is via IPC, which can include a shared/mapped memory region.
 Unfortunately, a shared address region doesn't work when you have
 large and opaque objects (e.g. a rendered CoreVideo movie in the
 QuickTime API or 300 megs of audio data that just went through a
 DSP).  Then you've got the hit of serialization if you're got
 intricate data structures (that would normally would need to be
 serialized, such as a hashtable or something).  Also, if I may speak
 for commercial developers out there who are just looking to get the
 job done without new code, it's usually always preferable to just a
 single high level sync object (for when the job is complete) than to
 start a child processes and use IPC.  The former is just WAY less
 code, plain and simple.


 Andy


 --
 http://mail.python.org/mailman/listinfo/python-list

--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-10-29 Thread Paul Boddie

On 28 Okt, 21:03, Rhamphoryncus [EMAIL PROTECTED] wrote:
 * get a short-term bodge that works, like hacking the 3rd party
 library to use your shared-memory allocator.  Should be far less work
 than hacking all of CPython.

Did anyone come up with a reason why shared memory couldn't be used
for the purpose described by the inquirer? With the disadvantages of
serialisation circumvented, that would leave issues of contention, and
on such matters I have to say that I'm skeptical about solutions which
try and make concurrent access to CPython objects totally transparent,
mostly because it appears to be quite a lot of work to get right (as
POSH illustrates, and as your own safethread work shows), and also
because systems where contention is spread over a large surface (any
object can potentially be accessed by any process at any time) are
likely to incur a lot of trouble for the dubious benefit of being
vague about which objects are actually being shared.

Paul
--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-10-29 Thread Rhamphoryncus

On Oct 29, 7:20 am, Paul Boddie [EMAIL PROTECTED] wrote:
 On 28 Okt, 21:03, Rhamphoryncus [EMAIL PROTECTED] wrote:

  * get a short-term bodge that works, like hacking the 3rd party
  library to use your shared-memory allocator.  Should be far less work
  than hacking all of CPython.

 Did anyone come up with a reason why shared memory couldn't be used
 for the purpose described by the inquirer? With the disadvantages of
 serialisation circumvented, that would leave issues of contention, and
 on such matters I have to say that I'm skeptical about solutions which
 try and make concurrent access to CPython objects totally transparent,
 mostly because it appears to be quite a lot of work to get right (as
 POSH illustrates, and as your own safethread work shows), and also
 because systems where contention is spread over a large surface (any
 object can potentially be accessed by any process at any time) are
 likely to incur a lot of trouble for the dubious benefit of being
 vague about which objects are actually being shared.

I believe large existing libraries were the reason.  Thus my
suggestion of the evil fork+mmap abuse.
--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

If you are dealing with lots of data like in video or sound editing,
you would just keep the data in shared memory and send the reference
over IPC to the worker process. Otherwise, if you marshal and send you
are looking at a temporary doubling of the memory footprint of your
app because the data will be copied, and marshaling overhead.

On Fri, Oct 24, 2008 at 3:50 PM, Andy O'Meara [EMAIL PROTECTED] wrote:


 Are you familiar with the API at all? Multiprocessing was designed to
 mimic threading in about every way possible, the only restriction on
 shared data is that it must be serializable, but event then you can
 override or customize the behavior.

 Also, inter process communication is done via pipes. It can also be
 done with messages if you want to tweak the manager(s).


 I apologize in advance if I don't understand something correctly, but
 as I understand them, everything has to be serialized in order to go
 through IPC.  So when you're talking about thousands of objects,
 buffers, and/or large OS opaque objects (e.g. memory-resident video
 and images), that seems like a pretty rough hit of run-time resources.

 Please don't misunderstand my comments to suggest that multiprocessing
 isn't great stuff.  On the contrary, it's very impressive and it
 singlehandedly catapults python *way* closer to efficient CPU bound
 processing than it ever was before.  All I mean to say is that in the
 case where using a shared address space with a worker pthread per
 spare core to do CPU bound work, it's a really big win not to have to
 serialize stuff.  And in the case of hundreds of megs of data and/or
 thousands of data structure instances, it's a deal breaker to
 serialize and unserialize everything just so that it can be sent
 though IPC.  It's a deal breaker for most performance-centric apps
 because of the unnecessary runtime resource hit and because now all
 those data structures being passed around have to have accompanying
 serialization code written (and maintained) for them.   That's
 actually what I meant when I made the comment that a high level sync
 object in a shared address space is better then sending it all
 through IPC (when the data sets are wild and crazy).  From a C/C++
 point of view, I would venture to say that it's always a huge win to
 just stick those embarrassingly easy parallelization cases into the
 thread with a sync object than forking and using IPC and having to
 write all the serialization code. And in the case of huge data types--
 such as video or image rendering--it makes me nervous to think of
 serializing it all just so it can go through IPC when it could just be
 passed using a pointer change and a single sync object.

 So, if I'm missing something and there's a way so pass data structures
 without serialization, then I'd definitely like to learn more (sorry
 in advance if I missed something there).  When I took a look at
 multiprocessing my concerns where:
   - serialization (discussed above)
   - maturity (are we ready to bet the farm that mp is going to work
 properly on the platforms we need it to?)

 Again, I'm psyched that multiprocessing appeared in 2.6 and it's a
 huge huge step in getting everyone to unlock the power of python!
 But, then some of the tidbits described above are additional data
 points for you and others to chew on.  I can tell you they're pretty
 important points for any performance-centric software provider (us,
 game developers--from EA to Ambrosia, and A/V production app
 developers like Patrick).

 Andy










 --
 http://mail.python.org/mailman/listinfo/python-list

--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-10-28 Thread Rhamphoryncus

On Oct 26, 6:57 pm, Andy O'Meara [EMAIL PROTECTED] wrote:
 Grrr... I posted a ton of lengthy replies to you and other recent
 posts here using Google and none of them made it, argh. Poof. There's
 nothing that fires more up more than lost work,  so I'll have to
 revert short and simple answers for the time being.  Argh, damn.

 On Oct 25, 1:26 am, greg [EMAIL PROTECTED] wrote:



  Andy O'Meara wrote:
   I would definitely agree if there was a context (i.e. environment)
   object passed around then perhaps we'd have the best of all worlds.

  Moreover, I think this is probably the *only* way that
  totally independent interpreters could be realized.

  Converting the whole C API to use this strategy would be
  a very big project. Also, on the face of it, it seems like
  it would render all existing C extension code obsolete,
  although it might be possible to do something clever with
  macros to create a compatibility layer.

  Another thing to consider is that passing all these extra
  pointers around everywhere is bound to have some effect
  on performance.

 I'm with you on all counts, so no disagreement there.  On the passing
 a ptr everywhere issue, perhaps one idea is that all objects could
 have an additionalfieldthat would point back to their parent context
 (ie. their interpreter).  So the only prototypes that would have to be
 modified to contain the context ptr would be the ones that don't
 inherently operate on objects (e.g. importing a module).

Trying to directly share objects like this is going to create
contention.  The refcounting becomes the sequential portion of
Amdahl's Law.  This is why safethread doesn't scale very well: I share
a massive amount of objects.

An alternative, actually simpler, is to create proxies to your real
object.  The proxy object has a pointer to the real object and the
context containing it.  When you call a method it serializes the
arguments, acquires the target context's GIL (while releasing yours),
and deserializes in the target context.  Once the method returns it
reverses the process.

There's two reasons why this may perform well for you: First,
operations done purely in C may cheat (if so designed).  A copy from
one memory buffer to another memory buffer may be given two proxies as
arguments, but then operate directly on the target objects (ie without
serialization).

Second, if a target context is idle you can enter it (acquiring its
GIL) without any context switch.

Of course that scenario is full of maybes, which is why I have
little interest in it..

An even better scenario is if your memory buffer's methods are in pure
C and it's a simple object (no pointers).  You can stick the memory
buffer in shared memory and have multiple processes manipulate it from
C.  More maybes.

An evil trick if you need pointers, but control the allocation, is to
take advantage of the fork model.  Have a master process create a
bunch of blank files (temp files if linux doesn't allow /dev/zero),
mmap them all using MAP_SHARED, then fork and utilize.  The addresses
will be inherited from the master process, so any pointers within them
will be usable across all processes.  If you ever want to return
memory to the system you can close that file, then have all processes
use MAP_SHARED|MAP_FIXED to overwrite it.  Evil, but should be
disturbingly effective, and still doesn't require modifying CPython.
--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-10-28 Thread Michael Sparks

Glenn Linderman wrote:

 so a 3rd party library might be called to decompress the stream into a
 set of independently allocated chunks, each containing one frame (each
 possibly consisting of several allocations of memory for associated
 metadata) that is independent of other frames

We use a combination of a dictionary + RGB data for this purpose. Using a
dictionary works out pretty nicely for the metadata, and obviously one
attribute holds the frame data as a binary blob.

http://www.kamaelia.org/Components/pydoc/Kamaelia.Codec.YUV4MPEG gives some
idea structure and usage. The example given there is this:

Pipeline( RateControlledFileReader(video.dirac,readmode=bytes, ...),
  DiracDecoder(),
  FrameToYUV4MPEG(),
  SimpleFileWriter(output.yuv4mpeg)
).run()

Now all of those components are generator components.

That's useful since:
   a) we can structure the code to show what it does more clearly, and it
  still run efficiently inside a single process
   b) We can change this over to using multiple processes trivially:

ProcessPipeline(
  RateControlledFileReader(video.dirac,readmode=bytes, ...),
  DiracDecoder(),
  FrameToYUV4MPEG(),
  SimpleFileWriter(output.yuv4mpeg)
).run()

This version uses multiple processes (under the hood using Paul Boddies
pprocess library, since this support predates the multiprocessing module
support in python).

The big issue with *this* version however is that due to pprocess (and
friends) pickling data to be sent across OS pipes, the data throughput on
this would be lowsy. Specifically in this example, if we could change it
such that the high level API was this:

ProcessPipeline(
  RateControlledFileReader(video.dirac,readmode=bytes, ...),
  DiracDecoder(),
  FrameToYUV4MPEG(),
  SimpleFileWriter(output.yuv4mpeg)
use_shared_memory_IPC = True,
).run()

That would be pretty useful, for some hopefully obvious reasons. I suppose
ideally we'd just use shared_memory_IPC for everything and just go back to
this:

ProcessPipeline(
  RateControlledFileReader(video.dirac,readmode=bytes, ...),
  DiracDecoder(),
  FrameToYUV4MPEG(),
  SimpleFileWriter(output.yuv4mpeg)
).run()

But essentially for us, this is an optimisation problem, not a how do I
even begin to use this problem. Since it is an optimisation problem, it
also strikes me as reasonable to consider it OK to special purpose and
specialise such links until you get an approach that's reasonable for
general purpose data.

In theory, poshmodule.sourceforge.net, with a bit of TLC would be a good
candidate or good candidate starting point for that optimisation work
(since it does work in Linux, contrary to a reply in the thread - I've not
tested it under windows :).

If someone's interested in building that, then someone redoing our MiniAxon
tutorial using processes  shared memory IPC rather than generators would
be a relatively gentle/structured approach to dealing with this:

   * http://www.kamaelia.org/MiniAxon/

The reason I suggest that is because any time we think about fiddling and
creating a new optimisation approach or concurrency approach, we tend to
build a MiniAxon prototype to flesh out the various issues involved.


Michael
--
http://www.kamaelia.org/Home

--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-10-28 Thread Michael Sparks

Philip Semanchuk wrote:
 On Oct 25, 2008, at 7:53 AM, Michael Sparks wrote:
 Glenn Linderman wrote:
 In the module multiprocessing environment could you not use shared
 memory, then, for the large shared data items?

 If the poshmodule had a bit of TLC, it would be extremely useful for
 this,... http://poshmodule.sourceforge.net/
 
 Last time I checked that was Windows-only. Has that changed?

I've only tested it under Linux where it worked, but does clearly need a bit
of work :)

 The only IPC modules for Unix that I'm aware of are one which I
 adopted (for System V semaphores  shared memory) and one which I
 wrote (for POSIX semaphores  shared memory).
 
 http://NikitaTheSpider.com/python/shm/
 http://semanchuk.com/philip/posix_ipc/

I'll take a look at those - poshmodule does need a bit of TLC and doesn't
appear to be maintained.

 If anyone wants to wrap POSH cleverness around them, go for it! If
 not, maybe I'll make the time someday.

I personally don't have the time do do this, but I'd be very interested in
hearing someone building an up-to-date version. (Indeed, something like
this would be extremely useful for everyone to have in the standard library
now that the multiprocessing library is in the standard library)


Michael.
--
http://www.kamaelia.org/Home

--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

On Oct 26, 10:11 pm, James Mills [EMAIL PROTECTED]
wrote:
 On Mon, Oct 27, 2008 at 12:03 PM, Andy O'Meara [EMAIL PROTECTED] wrote:
  I think we miscommunicated there--I'm actually agreeing with you.  I
  was trying to make the same point you were: that intricate and/or
  large structures are meant to be passed around by a top-level pointer,
  not using and serialization/messaging.  This is what I've been trying
  to explain to others here; that IPC and shared memory unfortunately
  aren't viable options, leaving app threads (rather than child
  processes) as the solution.

 Andy,

 Why don't you just use a temporary file
 system (ram disk) to store the data that
 your app is manipulating. All you need to
 pass around then is a file descriptor.

 --JamesMills

Unfortunately, it's the penalty of serialization and unserialization.
When you're talking about stuff like memory-resident images and video
(complete with their intricate and complex codecs), then the only
option is to be passing around a couple pointers rather then take the
hit of serialization (which is huge for video, for example).  I've
gone into more detail in some other posts but I could have missed
something.


Andy



--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

On Oct 27, 4:05 am, Martin v. Löwis [EMAIL PROTECTED] wrote:
 Andy O'Meara wrote:



  Well, when you're talking about large, intricate data structures
  (which include opaque OS object refs that use process-associated
  allocators), even a shared memory region between the child process and
  the parent can't do the job.  Otherwise, please describe in detail how
  I'd get an opaque OS object (e.g. an OS ref that refers to memory-
  resident video) from the child process back to the parent process.

 WHAT PARENT PROCESS? In the same address space, to me, means
 a single process only, not multiple processes, and no parent process
 anywhere. If you have just multiple threads, the notion of passing
 data from a child process back to the parent process is
 meaningless.

I know...  I was just responding to you and others here keep beating
the fork drum.  I just trying make it clear that a shared address
space is the only way to go.  Ok, good, so we're in agreement that
threads is the only way to deal with the intricate and complex data
set issue in a performance-centric application.


  Again, the big picture that I'm trying to plant here is that there
  really is a serious need for truly independent interpreters/contexts
  in a shared address space.

 I understand that this is your mission in this thread. However, why
 is that your problem? Why can't you just use the existing (limited)
 multiple-interpreters machinery, and solve your problems with that?

Because then we're back into the GIL not permitting threads efficient
core use on CPU bound scripts running on other threads (when they
otherwise could).  Just so we're on the same page, when they
otherwise could is relevant here because that's the important given:
that each interpreter (context) truly never has any context with
others.

An example would be python scripts that generate video programatically
using an initial set of params and use an in-house C module to
construct frame (which in turn make and modify python C objects that
wrap to intricate codec related data structures).  Suppose you wanted
to render 3 of these at the same time, one on each thread (3
threads).  With the GIL in place, these threads can't anywhere close
to their potential.  Your response thus far is that the C module
should release the GIL before it commences its heavy lifting.  Well,
the problem is that if during its heavy lifting it needs to call back
into its interpreter.  It's turns out that this isn't an exotic case
at all: there's a *ton* of utility gained by making calls back into
the interpreter. The best example is that since code more easily
maintained in python than in C, a lot of the module utility code is
likely to be in python.  Unsurprisingly, this is the situation myself
and many others are in: where we want to subsequently use the
interpreter within the C module (so, as I understand it, the proposal
to have the C module release the GIL unfortunately doesn't work as a
general solution).


  For most
  industry-caliber packages, the expectation and convention (unless
  documented otherwise) is that the app can make as many contexts as its
  wants in whatever threads it wants because the convention is that the
  app is must (a) never use one context's objects in another context,
  and (b) never use a context at the same time from more than one
  thread.  That's all I'm really trying to look at here.

 And that's indeed the case for Python, too. The app can make as many
 subinterpreters as it wants to, and it must not pass objects from one
 subinterpreter to another one, nor should it use a single interpreter
 from more than one thread (although that is actually supported by
 Python - but it surely won't hurt if you restrict yourself to a single
 thread per interpreter).


I'm not following you there...  I thought we're all in agreement that
the existing C modules are FAR from being reentrant, regularly making
use of static/global objects. The point I had made before is that
other industry-caliber packages specifically don't have restrictions
in *any* way.

I appreciate your arguments these a PyC concept is a lot of work with
some careful design work, but let's not kill the discussion just
because of that.  The fact remains that the video encoding scenario
described above is a pretty reasonable situation, and as more people
are commenting in this thread, there's an increasing need to offer
apps more flexibility when it comes to multi-threaded use.


Andy




--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-10-28 Thread Greg Ewing


Glenn Linderman wrote:

So your 50% number is just a scare tactic, it would seem, based on wild 
guesses.  Was there really any benefit to the comment?


All I was really trying to say is that it would be a
mistake to assume that the overhead will be negligible,
as that would be just as much a wild guess as 50%.

--
Greg

--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

On Oct 25, 9:46 am, M.-A. Lemburg [EMAIL PROTECTED] wrote:
 These discussion pop up every year or so and I think that most of them
 are not really all that necessary, since the GIL isn't all that bad.


Thing is, if the topic keeps coming up, then that may be an indicator
that change is truly needed.  Someone much wiser than me once shared
that a measure of the usefulness and quality of a package (or API) is
how easily it can be added to an application--of any flavors--without
the application needing to change.

So in the rising world of idle cores and worker threads, I do see an
increasing concern over the GIL.  Although I recognize that the debate
is lengthy, heated, and has strong arguments on both sides, my reading
on the issue makes me feel like there's a bias for the pro-GIL side
because of the volume of design and coding work associated with
considering various alternatives (such as Glenn's Py* concepts).
And I DO respect and appreciate where the pro-GIL people come from:
who the heck wants to do all that work and recoding so that a tiny
percent of developers can benefit?  And my best response is that as
unfortunate as it is, python needs to be more multi-threaded app-
friendly if we hope to attract the next generation of app developers
that want to just drop python into their app (and not have to change
their app around python).  For example, Lua has that property, as
evidenced by its rapidly growing presence in commercial software
(Blizzard uses it heavily, for example).


 Furthermore, there are lots of ways to tune the CPython VM to make
 it more or less responsive to thread switches via the various sys.set*()
 functions in the sys module.

 Most computing or I/O intense C extensions, built-in modules and object
 implementations already release the GIL for you, so it usually doesn't
 get in the way all that often.


The main issue I take there is that it's often highly useful for C
modules to make subsequent calls back into the interpreter. I suppose
the response to that is to call the GIL before reentry, but it just
seems to be more code and responsibility in scenarios where it's no
necessary.  Although that code and protocol may come easy to veteran
CPython developers, let's not forget that an important goal is to
attract new developers and companies to the scene, where they get
their thread-independent code up and running using python without any
unexpected reengineering.  Again, why are companies choosing Lua over
Python when it comes to an easy and flexible drop-in interpreter?  And
please take my points here to be exploratory, and not hostile or
accusatory, in nature.


Andy


--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

On Oct 27, 10:55 pm, Glenn Linderman [EMAIL PROTECTED] wrote:


 And I think we still are miscommunicating!  Or maybe communicating anyway!

 So when you said object, I actually don't know whether you meant
 Python object or something else.  I assumed Python object, which may not
 have been correct... but read on, I think the stuff below clears it up.


 Then when you mentioned thousands of objects, I imagined thousands of
 Python objects, and somehow transforming the blob into same... and back
 again.  

My apologies to you and others here on my use of objects -- I'm use
the term generically and mean it to *not* refer to python objects (for
the all the reasons discussed here).  Python only makes up a small
part of our app, hence my habit of objects to refer to other APIs'
allocated and opaque objects (including our own and OS APIs).  For all
the reasons we've discussed, in our world, python objects don't travel
around outside of our python C modules -- when python objects need to
be passed to other parts of the app, they're converted into their non-
python (portable) equivalents (ints, floats, buffers, etc--but most of
the time, the objects are PyCObjects, so they can enter and leave a
python context with negligible overhead). I venture to say this is
pretty standard when any industry app uses a package (such as python),
for various reasons:
   - Portability/Future (e.g.  if we do decode to drop Python and go
with Lua, the changes are limited to only one region of code).
   - Sanity (having any API's objects show up in places far away
goes against easy-to-follow code).
   - MT flexibility (because we always never use static/global
storage, we have all kinds of options when it comes to
multithreading).  For example, recall that by throwing python in
multiple dynamic libs, we were able to achieve the GIL-less
interpreter independence that we want (albeit ghetto and a pain).



Andy



--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-10-28 Thread Rhamphoryncus

On Oct 28, 9:30 am, Andy O'Meara [EMAIL PROTECTED] wrote:
 On Oct 25, 9:46 am, M.-A. Lemburg [EMAIL PROTECTED] wrote:

  These discussion pop up every year or so and I think that most of them
  are not really all that necessary, since the GIL isn't all that bad.

 Thing is, if the topic keeps coming up, then that may be an indicator
 that change is truly needed.  Someone much wiser than me once shared
 that a measure of the usefulness and quality of a package (or API) is
 how easily it can be added to an application--of any flavors--without
 the application needing to change.

 So in the rising world of idle cores and worker threads, I do see an
 increasing concern over the GIL.  Although I recognize that the debate
 is lengthy, heated, and has strong arguments on both sides, my reading
 on the issue makes me feel like there's a bias for the pro-GIL side
 because of the volume of design and coding work associated with
 considering various alternatives (such as Glenn's Py* concepts).
 And I DO respect and appreciate where the pro-GIL people come from:
 who the heck wants to do all that work and recoding so that a tiny
 percent of developers can benefit?  And my best response is that as
 unfortunate as it is, python needs to be more multi-threaded app-
 friendly if we hope to attract the next generation of app developers
 that want to just drop python into their app (and not have to change
 their app around python).  For example, Lua has that property, as
 evidenced by its rapidly growing presence in commercial software
 (Blizzard uses it heavily, for example).



  Furthermore, there are lots of ways to tune the CPython VM to make
  it more or less responsive to thread switches via the various sys.set*()
  functions in the sys module.

  Most computing or I/O intense C extensions, built-in modules and object
  implementations already release the GIL for you, so it usually doesn't
  get in the way all that often.

 The main issue I take there is that it's often highly useful for C
 modules to make subsequent calls back into the interpreter. I suppose
 the response to that is to call the GIL before reentry, but it just
 seems to be more code and responsibility in scenarios where it's no
 necessary.  Although that code and protocol may come easy to veteran
 CPython developers, let's not forget that an important goal is to
 attract new developers and companies to the scene, where they get
 their thread-independent code up and running using python without any
 unexpected reengineering.  Again, why are companies choosing Lua over
 Python when it comes to an easy and flexible drop-in interpreter?  And
 please take my points here to be exploratory, and not hostile or
 accusatory, in nature.

 Andy

Okay, here's the bottom line:
* This is not about the GIL.  This is about *completely* isolated
interpreters; most of the time when we want to remove the GIL we want
a single interpreter with lots of shared data.
* Your use case, although not common, is not extraordinarily rare
either.  It'd be nice to support.
* If CPython had supported it all along we would continue to maintain
it.
* However, since it's not supported today, it's not worth the time
invested, API incompatibility, and general breakage it would imply.
* Although it's far more work than just solving your problem, if I
were to remove the GIL I'd go all the way and allow shared objects.

So there's really only two options here:
* get a short-term bodge that works, like hacking the 3rd party
library to use your shared-memory allocator.  Should be far less work
than hacking all of CPython.
* invest yourself in solving the *entire* problem (GIL removal with
shared python objects).
--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-10-28 Thread Martin v. Löwis

 Because then we're back into the GIL not permitting threads efficient
 core use on CPU bound scripts running on other threads (when they
 otherwise could).

Why do you think so? For C code that is carefully written, the GIL
allows *very well* to write CPU bound scripts running on other threads.
(please do get back to Jesse's original remark in case you have lost
the thread :-)

 An example would be python scripts that generate video programatically
 using an initial set of params and use an in-house C module to
 construct frame (which in turn make and modify python C objects that
 wrap to intricate codec related data structures).  Suppose you wanted
 to render 3 of these at the same time, one on each thread (3
 threads).  With the GIL in place, these threads can't anywhere close
 to their potential.  Your response thus far is that the C module
 should release the GIL before it commences its heavy lifting.  Well,
 the problem is that if during its heavy lifting it needs to call back
 into its interpreter.

So it should reacquire the GIL then. Assuming the other threads
all do their heavy lifting, it should immediately get the GIL,
fetch some data, release the GIL, and continue to do heavy lifting.
If it's truly CPU-bound, I hope it doesn't spend most of its time
in Python API, but in true computation.

 It's turns out that this isn't an exotic case
 at all: there's a *ton* of utility gained by making calls back into
 the interpreter. The best example is that since code more easily
 maintained in python than in C, a lot of the module utility code is
 likely to be in python.

You should really reconsider writing performance-critical code in
Python. Regardless of the issue under discussion, a lot of performance
can be gained by using flattened data structures, less pointer,
less reference counting, less objects, and so on - in the inner loops
of the computation. You didn't reveal what *specific* computation you
perform, so it's difficult to give specific advise.

 Unsurprisingly, this is the situation myself
 and many others are in: where we want to subsequently use the
 interpreter within the C module (so, as I understand it, the proposal
 to have the C module release the GIL unfortunately doesn't work as a
 general solution).

Not if you do the actual computation in Python, no. However, this
subthread started with Jesse's remark that you *can* release the GIL
in C code.

Again, if you do heavy-lifting in Python, you should consider to rewrite
the performance-critical parts in C. You may find that the need for
multiple CPUs goes even away.

 I appreciate your arguments these a PyC concept is a lot of work with
 some careful design work, but let's not kill the discussion just
 because of that.

Any discussion in this newsgroup is futile, except when it either
a) leads to a solution that is already possible, and the OP didn't
envision, or
b) is followed up by code contributions from one of the participants.

If neither is likely to result, killing the discussion is the most
productive thing we can do.

Regards,
Maritn
--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-10-27 Thread Martin v. Löwis

Andy O'Meara wrote:
 On Oct 24, 9:52 pm, Martin v. Löwis [EMAIL PROTECTED] wrote:
 A c-level module, on the other hand, can sidestep/release
 the GIL at will, and go on it's merry way and process away.
 ...Unless part of the C module execution involves the need do CPU-
 bound work on another thread through a different python interpreter,
 right?
 Wrong.
[...]
 
 So I think the disconnect here is that maybe you're envisioning
 threads being created *in* python.  To be clear, we're talking out
 making threads at the app level and making it a given for the app to
 take its safety in its own hands.

No. Whether or not threads are created by Python or the application
does not matter for my Wrong evaluation: in either case, C module
execution can easily side-step/release the GIL.

 As far as I can tell, it seems
 CPython's current state can't CPU bound parallelization in the same
 address space.
 That's not true.

 
 Well, when you're talking about large, intricate data structures
 (which include opaque OS object refs that use process-associated
 allocators), even a shared memory region between the child process and
 the parent can't do the job.  Otherwise, please describe in detail how
 I'd get an opaque OS object (e.g. an OS ref that refers to memory-
 resident video) from the child process back to the parent process.

WHAT PARENT PROCESS? In the same address space, to me, means
a single process only, not multiple processes, and no parent process
anywhere. If you have just multiple threads, the notion of passing
data from a child process back to the parent process is
meaningless.

 Again, the big picture that I'm trying to plant here is that there
 really is a serious need for truly independent interpreters/contexts
 in a shared address space.

I understand that this is your mission in this thread. However, why
is that your problem? Why can't you just use the existing (limited)
multiple-interpreters machinery, and solve your problems with that?

 For most
 industry-caliber packages, the expectation and convention (unless
 documented otherwise) is that the app can make as many contexts as its
 wants in whatever threads it wants because the convention is that the
 app is must (a) never use one context's objects in another context,
 and (b) never use a context at the same time from more than one
 thread.  That's all I'm really trying to look at here.

And that's indeed the case for Python, too. The app can make as many
subinterpreters as it wants to, and it must not pass objects from one
subinterpreter to another one, nor should it use a single interpreter
from more than one thread (although that is actually supported by
Python - but it surely won't hurt if you restrict yourself to a single
thread per interpreter).

Regards,
Martin
--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-10-26 Thread Martin v. Löwis

 As far as I can tell, it seems
 CPython's current state can't CPU bound parallelization in the same
 address space.
 That's not true.

 
 Um...  So let's say you have a opaque object ref from the OS that
 represents hundreds of megs of data (e.g. memory-resident video).  How
 do you get that back to the parent process without serialization and
 IPC?

What parent process? I thought you were talking about multi-threading?

 What should really happen is just use the same address space so
 just a pointer changes hands.  THAT's why I'm saying that a separate
 address space is  generally a deal breaker when you have large or
 intricate data sets (ie. when performance matters).

Right. So use a single address space, multiple threads, and perform the
heavy computations in C code. I don't see how Python is in the way at
all. Many people do that, and it works just fine. That's what
Jesse (probably) meant with his remark

 A c-level module, on the other hand, can sidestep/release
 the GIL at will, and go on it's merry way and process away.

Please reconsider this; it might be a solution to your problem.

Regards,
Martin
--
http://mail.python.org/mailman/listinfo/python-list

Re:: 2.6, 3.0, and truly independent intepreters

2008-10-26 Thread Hendrik van Rooyen

Andy O'Meara  Wrote:

Um...  So let's say you have a opaque object ref from the OS that
represents hundreds of megs of data (e.g. memory-resident video).  How
do you get that back to the parent process without serialization and
IPC?  What should really happen is just use the same address space so
just a pointer changes hands.  THAT's why I'm saying that a separate
address space is  generally a deal breaker when you have large or
intricate data sets (ie. when performance matters).

You can try to assign the buffer in the shared memory space, that can
be managed by Nikita the Spider's shm module.

Then you can implement what would be essentially a systolic array
structure, passing the big buffer along to the processes who
may, or may not, be running on different processors, to do whatever
magic each process has to do, to complete the whole transformation.

(filter, fft, decimation, compression, mpeg, whatever...)

aside
This may be faster than forking an OS thread - don't subprocesses get
a COPY of the parent's environment?
\aside

But this will give you only one process running at a time, as you
can't do stuff simultaneously to the same data.

So you will need to split a real big ram area into your big buffers so
that each of the processes you contemplate running seperately can be given
one 100 M area (out of the shared big one) to own to do its magic on.
When it is finished, it passes the ownership back, and the block is
assigned to the next process in the sequence, while a new block from
the OS is assigned to the first process, and so on.

So you still have shared ram IPC, but there is no serialisation.
And you don't move the data, unless you want to. You can update
or twiddle in place. Its the serialisation that kills the performance.
And the pointers can be passed by the same mechanism, if I 
understand what shm does after a quick look.

So you can build a real ripsnorter - it rips this, while it
snorts the previous and tears the antepenultimate...

- Hendrik



--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-10-26 Thread Andy O'Meara


Grrr... I posted a ton of lengthy replies to you and other recent
posts here using Google and none of them made it, argh. Poof. There's
nothing that fires more up more than lost work,  so I'll have to
revert short and simple answers for the time being.  Argh, damn.


On Oct 25, 1:26 am, greg [EMAIL PROTECTED] wrote:
 Andy O'Meara wrote:
  I would definitely agree if there was a context (i.e. environment)
  object passed around then perhaps we'd have the best of all worlds.

 Moreover, I think this is probably the *only* way that
 totally independent interpreters could be realized.

 Converting the whole C API to use this strategy would be
 a very big project. Also, on the face of it, it seems like
 it would render all existing C extension code obsolete,
 although it might be possible to do something clever with
 macros to create a compatibility layer.

 Another thing to consider is that passing all these extra
 pointers around everywhere is bound to have some effect
 on performance.


I'm with you on all counts, so no disagreement there.  On the passing
a ptr everywhere issue, perhaps one idea is that all objects could
have an additional field that would point back to their parent context
(ie. their interpreter).  So the only prototypes that would have to be
modified to contain the context ptr would be the ones that don't
inherently operate on objects (e.g. importing a module).


On Oct 25, 1:54 am, greg [EMAIL PROTECTED] wrote:
 Andy O'Meara wrote:
  - each worker thread makes its own interpreter, pops scripts off a
  work queue, and manages exporting (and then importing) result data to
  other parts of the app.

 I hope you realize that starting up one of these interpreters
 is going to be fairly expensive. It will have to create its
 own versions of all the builtin constants and type objects,
 and import its own copy of all the modules it uses.


Yeah, for sure. And I'd say that's a pretty well established
convention already out there for any industry package.  The pattern
I'd expect to see is where the app starts worker threads, starts
interpreters in one or more of each, and throws jobs to different ones
(and the interpreter would persist to move on to subsequent jobs).

 One wonders if it wouldn't be cheaper just to fork the
 process. Shared memory can be used to transfer large lumps
 of data if needed.


As I mentioned, wen you're talking about intricate data structures, OS
opaque objects (ie. that have their own internal allocators), or huge
data sets, even a shared memory region unfortunately  can't fit the
bill.


Andy
--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-10-26 Thread Andy O'Meara

On Oct 24, 9:52 pm, Martin v. Löwis [EMAIL PROTECTED] wrote:
  A c-level module, on the other hand, can sidestep/release
  the GIL at will, and go on it's merry way and process away.

  ...Unless part of the C module execution involves the need do CPU-
  bound work on another thread through a different python interpreter,
  right?

 Wrong.


Let's take a step back and remind ourselves of the big picture.  The
goal is to have independent interpreters running in pthreads that the
app starts and controls.  Each interpreter never at any point is doing
any thread-related stuff in any way.  For example, each script job
just does meat an potatoes CPU work, using callbacks that, say,
programatically use OS APIs to edit and transform frame data.

So I think the disconnect here is that maybe you're envisioning
threads being created *in* python.  To be clear, we're talking out
making threads at the app level and making it a given for the app to
take its safety in its own hands.




  As far as I can tell, it seems
  CPython's current state can't CPU bound parallelization in the same
  address space.

 That's not true.


Well, when you're talking about large, intricate data structures
(which include opaque OS object refs that use process-associated
allocators), even a shared memory region between the child process and
the parent can't do the job.  Otherwise, please describe in detail how
I'd get an opaque OS object (e.g. an OS ref that refers to memory-
resident video) from the child process back to the parent process.

Again, the big picture that I'm trying to plant here is that there
really is a serious need for truly independent interpreters/contexts
in a shared address space.  Consider stuff like libpng, zlib, ipgjpg,
or whatever, the use pattern is always the same: make a context
object, do your work in the context, and take it down.  For most
industry-caliber packages, the expectation and convention (unless
documented otherwise) is that the app can make as many contexts as its
wants in whatever threads it wants because the convention is that the
app is must (a) never use one context's objects in another context,
and (b) never use a context at the same time from more than one
thread.  That's all I'm really trying to look at here.


Andy




--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-10-26 Thread Andy O'Meara



  And in the case of hundreds of megs of data

 ... and I would be surprised at someone that would embed hundreds of
 megs of data into an object such that it had to be serialized... seems
 like the proper design is to point at the data, or a subset of it, in a
 big buffer.  Then data transfers would just transfer the offset/length
 and the reference to the buffer.

  and/or thousands of data structure instances,

 ... and this is another surprise!  You have thousands of objects (data
 structure instances) to move from one thread to another?


I think we miscommunicated there--I'm actually agreeing with you.  I
was trying to make the same point you were: that intricate and/or
large structures are meant to be passed around by a top-level pointer,
not using and serialization/messaging.  This is what I've been trying
to explain to others here; that IPC and shared memory unfortunately
aren't viable options, leaving app threads (rather than child
processes) as the solution.


 Of course, I know that data get large, but typical multimedia streams
 are large, binary blobs.  I was under the impression that processing
 them usually proceeds along the lines of keeping offsets into the blobs,
 and interpreting, etc.  Editing is usually done by making a copy of a
 blob, transforming it or a subset in some manner during the copy
 process, resulting in a new, possibly different-sized blob.


Your instincts are right.  I'd only add on that when you're talking
about data structures associated with an intricate video format, the
complexity and depth of the data structures is insane -- the LAST
thing you want to burn cycles on is serializing and unserializing that
stuff (so IPC is out)--again, we're already on the same page here.

I think at one point you made the comment that shared memory is a
solution to handle large data sets between a child process and the
parent.  Although this is certainty true in principle, it doesn't hold
up in practice since complex data structures often contain 3rd party
and OS API objects that have their own allocators.  For example, in
video encoding, there's TONS of objects that comprise memory-resident
video from all kinds of APIs, so the idea of having them allocated
from shared/mapped memory block isn't even possible. Again, I only
raise this to offer evidence that doing real-world work in a child
process is a deal breaker--a shared address space is just way too much
to give up.


Andy
--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-10-26 Thread James Mills

On Mon, Oct 27, 2008 at 12:03 PM, Andy O'Meara [EMAIL PROTECTED] wrote:
 I think we miscommunicated there--I'm actually agreeing with you.  I
 was trying to make the same point you were: that intricate and/or
 large structures are meant to be passed around by a top-level pointer,
 not using and serialization/messaging.  This is what I've been trying
 to explain to others here; that IPC and shared memory unfortunately
 aren't viable options, leaving app threads (rather than child
 processes) as the solution.

Andy,

Why don't you just use a temporary file
system (ram disk) to store the data that
your app is manipulating. All you need to
pass around then is a file descriptor.

--JamesMills

-- 
--
-- Problems are solved by method
--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters


Andy O'Meara wrote:


- each worker thread makes its own interpreter, pops scripts off a
work queue, and manages exporting (and then importing) result data to
other parts of the app.


I hope you realize that starting up one of these interpreters
is going to be fairly expensive. It will have to create its
own versions of all the builtin constants and type objects,
and import its own copy of all the modules it uses.

One wonders if it wouldn't be cheaper just to fork the
process. Shared memory can be used to transfer large lumps
of data if needed.

--
Greg
--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters


Glenn Linderman wrote:

If Py_None corresponds to None in Python syntax ... then 
it is a fixed constant and could be left global, probably.


No, it couldn't, because it's a reference-counted object
like any other Python object, and therefore needs to be
protected against simultaneous refcount manipulation by
different threads. So each interpreter would need its own
instance of Py_None.

The same goes for all the other built-in constants and
type objects -- there are dozens of these.


The cost is one more push on every function call,


Which sounds like it could be a rather high cost! If
(just a wild guess) each function has an average of 2
parameters, then this is increasing the amount of
argument pushing going on by 50%...


On many platforms, there is the concept of TLS, or thread-local storage.


That's another possibility, although doing it that
way would require you to have a separate thread for
each interpreter, which you mightn't always want.

--
Greg
--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters


Andy O'Meara wrote:


In our case, we're doing image and video
manipulation--stuff not good to be messaging from address space to
address space.


Have you considered using shared memory?

Using mmap or equivalent, you can arrange for a block of
memory to be shared between processes. Then you can dump
the big lump of data to be transferred in there, and send
a short message through a pipe to the other process to
let it know it's there.

--
Greg
--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters


Rhamphoryncus wrote:

A list
is not shareable, so it can only be used within the monitor it's
created within, but the list type object is shareable.


Type objects contain dicts, which allow arbitrary values
to be stored in them. What happens if one thread puts
a private object in there? It becomes visible to other
threads using the same type object. If it's not safe
for sharing, bad things happen.

Python's data model is not conducive to making a clear
distinction between private and shared objects,
except at the level of an entire interpreter.

--
Greg
--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-10-25 Thread Martin v. Löwis

 If Py_None corresponds to None in Python syntax (sorry I'm not familiar
 with Python internals yet; glad you are commenting, since you are), then
 it is a fixed constant and could be left global, probably.

If None remains global, then type(None) also remains global, and
type(None),__bases__[0]. Then type(None).__bases__[0].__subclasses__()
will yield interesting results. This is essentially the status quo.

 But if we
 want a separate None for each interpreter, or if we just use Py_None as
 an example global variable to use to answer the question then here goes

There are a number of problems with that approach. The biggest one is
that it is theoretical. Of course I'm aware of thread-local variables,
and the abstract possibility of collecting all global variables in
a single data structure (in fact, there is already an interpreter
structure and per-interpreter state in Python). I wasn't claiming that
it was impossible to solve that problem - just that it is not simple.
If you want to find out what all the problems are, please try
implementing it for real.

Regards,
Martin
--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

Hi Andy,


Andy wrote:

 However, we require true thread/interpreter
 independence so python 2 has been frustrating at time, to say the
 least.  Please don't start with but really, python supports multiple
 interpreters because I've been there many many times with people.
 And, yes, I'm aware of the multiprocessing module added in 2.6, but
 that stuff isn't lightweight and isn't suitable at all for many
 environments (including ours).

This is a very conflicting set of statements and whilst you appear to be
extremely clear on what you want here, and why multiprocessing, and
associated techniques are not appropriate, this does sound very
conflicting. I'm guessing I'm not the only person who finds this a
little odd.

Based on the size of the thread, having read it all, I'm guessing also
that you're not going to have an immediate solution but a work around.
However, also based on reading it, I think it's a usecase that would be
generally useful in embedding python.

So, I'll give it a stab as to what I think you're after.

The scenario as I understand it is this:
* You have an application written in C,C++ or similar.
* You've been providing users the ability to script it or customise it
  in some fashion using scripts.

Based on the conversation:
* This worked well, and you really liked the results, but...
* You only had one interpreter embedded in the system
* You were allowing users to use multiple scripts

Suddenly you go from: Single script, single memory space.
To multiple scripts, unconstrained shared shared memory space.

That then causes pain for you and your users. So as a result, you decided to
look for this scenario:
* A mechanism that allows each script to think it's the only script
  running on the python interpreter.
* But to still have only one embedded instance of the interpreter.
* With the primary motivation to eliminate the unconstrained shared
  memory causing breakage to your software.

So, whilst the multiprocessing module gives you this:
* With the primary motivation to eliminate the unconstrained shared
  memory causing breakage to your software.

It's (for whatever reason) too heavyweight for you, due to the multiprocess
usage. At a guess the reason for this is because you allow the user to run
lots of these little scripts.

Essentially what this means is that you want green processes.

One workaround of achieving that may be to find a way to force threads in
python to ONLY be allowed access to (and only update) thread local values,
rather than default to shared values.

The reason I say that, is because the closest you get to green processes in
python at the moment is /inside/ a python generator. It's nowhere near the
level you want, but it's what made me think of the idea of green processes.

Specifically if you have the canonical example of a python generator:

def fib():
a,b = 1,1
while 1:
a,b = b, a+b
yield 1

Then no matter how many times I run that, the values are local, and can't
impact each other. Now clearly this isn't what you want, but on some level
it's *similar*.

You want to be able to do:
run(this_script)

and then when (this_script) is running only use a local environment.

Now, if you could change the threading API, such that there was a means of
forcing all value lookups to look in thread local store before looking
outside the thread local store [1], then this would give you a much greater
level of safety.

[1] I don't know if there is or isn't I've not been sufficiently interested
to look...

I suspect that this would also be a very nice easy win for many
multi-threaded applications as well, reducing accidental data sharing.

Indeed, reversing things such that rather than doing this:
   myLocal = threading.local()
   myLocal.X = 5

Allowing a thread to force the default to be the other way round:
   systemGlobals = threading.globals()
   systemGlobals = 5

Would make a big difference. Furthermore, it would also mean that the
following:
   import MyModule
   from MyOtherModule import whizzy thing

I don't know if such a change would be sufficient to stop the python
interpreter going bang for extension modules though :-)

I suspect also that this change, whilst potentially fraught with
difficulties, would be incredibly useful in python implementations
that are GIL-free (such as Jython or IronPython)

Now, this for me is entirely theoretical because I don't know much about
python's threading implementation (because I've never needed to), but it
does seem to me to be the easier win than looking for truly independent
interpreters...

It would also be more generally useful, since it would make accidental
sharing of data (which is where threads really hurt people most) much
harder.

Since it was raised in the thread, I'd like to say use Kamaelia, but your
usecase is slightly different as I understand it. You want to take existing
stuff that won't be written in any particular way, to

Re: 2.6, 3.0, and truly independent intepreters

Andy O'Meara wrote:

 Yeah, that's the idea--let the highest levels run and coordinate the
 show.

Yes, this works really well in python and it's lots of fun. We've found so
far you need at minimum the following parts to a co-ordination little
language:

Pipeline
Graphline
Carousel
Seq
OneShot
PureTransformer
TPipe
Filter
Backplane
PublishTo
SubscribeTo

The interesting thing to me about this is in most systems these would be
patterns of behaviour in activities, whereas in python/kamaelia these are
concrete things you can drop things into. As you'd expect this all becomes
highly declarative.

In practice the world is slightly messier than a theoretical document would
like to suggest, primarily because if you consider things like pygame,
sometimes you have only have a resource instantiated once in a single
process. So you do need a mechanism for advertising services inside a
process and looking those up. (The Backplane idea though helps with
wrapping those up a lot I admit, for certain sorts of service :)

And sometimes you do need to just share data, and when you do that's when
STM is useful.

But concurrent python systems are fun to build :-)


Michael.
-- 
http://www.kamaelia.org/GetKamaelia

--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

Glenn Linderman wrote:

 In the module multiprocessing environment could you not use shared
 memory, then, for the large shared data items?

If the poshmodule had a bit of TLC, it would be extremely useful for this,
since it does (surprisingly) still work with python 2.5, but does need a
bit of TLC to make it usable.

http://poshmodule.sourceforge.net/


Michael
--
http://www.kamaelia.org/GetKamaelia
--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

Andy O'Meara wrote:

 basically, it seems that we're talking about the
 embarrassingly parallel scenario raised in that paper

We build applications in Kamaelia and then discover afterwards that they're
embarrassingly parallel and just work. (we have an introspector that can
look inside running systems and show us the structure that's going on -
very useful for debugging)

My current favourite example of this is a tool created to teaching small
children to read and write:
   http://www.kamaelia.org/SpeakAndWrite

Uses gesture recognition and speech synthesis, has a top level view of
around 15 concurrent components, with signficant numbers of nested ones.

(OK, that's not embarrasingly parallel since it's only around 50 things, but
the whiteboard with around 200 concurrent things, is)

The trick is to stop viewing concurrency as the problem, but to find a way
to use it as a tool for making it easier to write code. That program was a
10 hour or so hack. You end up focussing on the problem you want to solve,
and naturally gain a concurrent friendly system.

Everything else (GIL's, shared memory etc) then just becomes an
optimisation problem - something only to be done if you need it.

My previous favourite examples were based around digital TV, or user
generated content transcode pipelines.

My reason for preferring the speak and write at the moment is because its a
problem you wouldn't normally think of as benefitting from concurrency,
when in this case it benefitted by being made easier to write in the first
place.

Regards,



Michael
--
http://www.kamaelia.org/GetKamaelia

--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

Jesse Noller wrote:

 http://www.kamaelia.org/Home

Thanks for the mention :)

I don't think it's a good fit for the original poster's question, but a
solution to the original poster's question would be generally useful IMO,
_especially_ on python implementations without a GIL (where threads are the
more natural approach to using multiple processes  multiple processors).

The approach I think would be useful would perhaps by allowing python to
have some concept of green processes - that is threads that can only see
thread local values or they search/update thread local space before
checking globals, ie flipping

   X = threading.local()
   X.foo = bar

To something like:
   X = greenprocesses.shared()
   X.foo = bar

Or even just changing the search for values from:
   * Search local context
   * Search global context

To:
   * Search thread local context
   * Search local context
   * Search global context

Would probably be quite handy, and eliminate whole classes of bugs for
people using threads. (Probably introduce all sorts of new ones of course,
but perhaps easier to isolate ones)

However, I suspect this is also *a lot* easier to say than to implement :-)

(that said, I did hack on the python internals once (cf pep 318) so it might
be quite pleasant to try)

It's also independent of any discussions regarding the GIL of course since
it would just make life generally safer for people.

BTW, regarding Kamaelia - regarding something you said on your blog - whilst
the components list on /Components looks like a large amount of extra stuff
you have to comprehend to use, you don't. (The interdependency between
components is actually very low.)

The core that someone needs to understand is the contents of this:
http://www.kamaelia.org/MiniAxon/

Which is sufficient to get someone started. (based on testing with a couple
of dozen novice developers now :)

If someone doesn't want to rewrite their app to be kamaelia based, they can
cherry pick stuff, by running kamaelia's scheduler in the background and
using components in a file-handle like fashion:
* http://www.kamaelia.org/AxonHandle

The reason /Components contains all those things isn't because we're trying
to make it into a swiss army knife, it's because it's been useful in
domains that have generated those components which are generally
reusable :-)



Michael.
--
http://www.kamaelia.org/GetKamaelia

--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-10-25 Thread M.-A. Lemburg

These discussion pop up every year or so and I think that most of them
are not really all that necessary, since the GIL isn't all that bad.

Some pointers into the past:

 * http://effbot.org/pyfaq/can-t-we-get-rid-of-the-global-interpreter-lock.htm
   Fredrik on the GIL

 * http://mail.python.org/pipermail/python-dev/2000-April/003605.html
   Greg Stein's proposal to move forward on free threading

 * 
http://www.sauria.com/~twl/conferences/pycon2005/20050325/Python%20at%20Google.notes
   (scroll down to the QA section)
   Greg Stein on whether the GIL really does matter that much

Furthermore, there are lots of ways to tune the CPython VM to make
it more or less responsive to thread switches via the various sys.set*()
functions in the sys module.

Most computing or I/O intense C extensions, built-in modules and object
implementations already release the GIL for you, so it usually doesn't
get in the way all that often.

So you have the option of using a single process with multiple
threads, allowing efficient sharing of data. Or you use multiple
processes and OS mechanisms to share data (shared memory, memory
mapped files, message passing, pipes, shared file descriptors, etc.).

Both have their pros and cons.

There's no general answer to the
problem of how to make best use of multi-core processors, multiple
linked processors or any of the more advanced parallel processing
mechanisms (http://en.wikipedia.org/wiki/Parallel_computing).
The answers will always have to be application specific.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Oct 25 2008)
 Python/Zope Consulting and Support ...http://www.egenix.com/
 mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
 mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/


 Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! 


   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
   Registered at Amtsgericht Duesseldorf: HRB 46611
--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-10-25 Thread Terry Reedy


Glenn Linderman wrote:
On approximately 10/24/2008 8:39 PM, came the following characters from 
the keyboard of Terry Reedy:

Glenn Linderman wrote:

For example, Python presently has a rather stupid algorithm for 
string concatenation.


Yes, CPython2.x, x=5 did.

Python the language has syntax and semantics.  Python implementations 
have algorithms that fulfill the defined semantics.


I can buy that, but when Python is not qualified, CPython should be 
assumed, as it predominates.


People do that, and it sometimes leads to unnecessary confusion.  As to 
the present discussion, is it about

* changing Python, the language
* changing all Python implementations
* changing CPython, the leading implementation
* branching CPython with a compiler switch, much as there was one for 
including Unicode or not.

* forking CPython
* modifying an existing module
* adding a new module
* making better use of the existing facilities
* some combination of the above

 Of course, the latest official release

should probably also be assumed, but that is so recent,


People do that, and it sometimes leads to unnecessary confusion.  People 
routine posted version specific problems and questions without 
specifying the version (or platform when relevant).  In a month or so, 
there will be *2* latest official releases.  There will be more 
confusion without qualification.


few have likely 
upgraded as yet... I should have qualified the statement.


* Is the target of this discussion 2.7 or 3.1 (some changes would be 3.1 
only).


[diversion to the side topic]

If there is more than one reference to a guaranteed immutable object, 
such as a string, the 'stupid' algorithm seem necessary to me.  
In-place modification of a shared immutable would violate semantics.


Absolutely.  But after the first iteration, there is only one reference 
to string.


Which is to say, 'string' is the only reference to its object it refers 
too.  You are right, so I presume that the optimization described would 
then kick in.  But I have not read the code, and CPython optimizations 
are not part of the *language* reference.


[back to the main topic]

There is some discussion/debate/confusion about how much of the stdlib 
is 'standard Python library' versus 'standard CPython library'.  [And 
there is some feeling that standard Python modules should have a default 
Python implementation that any implementation can use until it 
optionally replaces it with a faster compiled version.]  Hence my 
question about the target of this discussion and the first three options 
listed above.


Terry Jan Reedy

--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-10-25 Thread Philip Semanchuk



On Oct 25, 2008, at 7:53 AM, Michael Sparks wrote:


Glenn Linderman wrote:


In the module multiprocessing environment could you not use shared
memory, then, for the large shared data items?


If the poshmodule had a bit of TLC, it would be extremely useful for  
this,
since it does (surprisingly) still work with python 2.5, but does  
need a

bit of TLC to make it usable.

http://poshmodule.sourceforge.net/


Last time I checked that was Windows-only. Has that changed?

The only IPC modules for Unix that I'm aware of are one which I  
adopted (for System V semaphores  shared memory) and one which I  
wrote (for POSIX semaphores  shared memory).


http://NikitaTheSpider.com/python/shm/
http://semanchuk.com/philip/posix_ipc/


If anyone wants to wrap POSH cleverness around them, go for it! If  
not, maybe I'll make the time someday.


Cheers
Philip
--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-10-25 Thread Martin v. Löwis

 There are a number of problems with that approach. The biggest one is
 that it is theoretical. 
 
 Not theoretical.  Used successfully in Perl. 

Perhaps it is indeed what Perl does, I know nothing about that.
However, it *is* theoretical for Python. Please trust me that
there are many many many many pitfalls in it, each needing a
separate solution, most likely with no equivalent in Perl.

If you had a working patch, *then* it would be practical.

 Granted Perl is quite a
 different language than Python, but then there are some basic
 similarities in the concepts.

Yes - just as much as both are implemented in C :-(

 Perhaps you should list the problems, instead of vaguely claiming that
 there are a number of them.  Hard to respond to such a vague claim.

As I said: go implement it, and you will find out. Unless you are
really going at an implementation, I don't want to spend my time
explaining it to you.

 But the approach is sound; nearly any monolithic
 program can be turned into a multithreaded program containing one
 monolith per thread using such a technique.

I'm not debating that. I just claim that it is far from simple.

Regards,
Martin


--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

On Oct 24, 9:52 pm, Martin v. Löwis [EMAIL PROTECTED] wrote:
  A c-level module, on the other hand, can sidestep/release
  the GIL at will, and go on it's merry way and process away.

  ...Unless part of the C module execution involves the need do CPU-
  bound work on another thread through a different python interpreter,
  right?

 Wrong.

  (even if the interpreter is 100% independent, yikes).

 Again, wrong.

  For
  example, have a python C module designed to programmatically generate
  images (and video frames) in RAM for immediate and subsequent use in
  animation.  Meanwhile, we'd like to have a pthread with its own
  interpreter with an instance of this module and have it dequeue jobs
  as they come in (in fact, there'd be one of these threads for each
  excess core present on the machine).

 I don't understand how this example involves multiple threads. You
 mention a single thread (running the module), and you mention designing
 a  module. Where is the second thread?

Glenn seems to be following me here...  The point is to have any many
threads as the app wants, each in it's own world, running without
restriction (performance wise).  Maybe the app wants to run a thread
for each extra core on the machine.

Perhaps the disconnect here is that when I've been saying start a
thread, I mean the app starts an OS thread (e.g. pthread) with the
given that any contact with other threads is managed at the app level
(as opposed to starting threads through python).  So, as far as python
knows, there's zero mention or use of threading in any way,
*anywhere*.


  As far as I can tell, it seems
  CPython's current state can't CPU bound parallelization in the same
  address space.

 That's not true.


Um...  So let's say you have a opaque object ref from the OS that
represents hundreds of megs of data (e.g. memory-resident video).  How
do you get that back to the parent process without serialization and
IPC?  What should really happen is just use the same address space so
just a pointer changes hands.  THAT's why I'm saying that a separate
address space is  generally a deal breaker when you have large or
intricate data sets (ie. when performance matters).

Andy


--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

On Oct 24, 9:40 pm, Martin v. Löwis [EMAIL PROTECTED] wrote:
  It seems to me that the very simplest move would be to remove global
  static data so the app could provide all thread-related data, which
  Andy suggests through references to the QuickTime API. This would
  suggest compiling python without thread support so as to leave it up
  to the application.

 I'm not sure whether you realize that this is not simple at all.
 Consider this fragment

     if (string == Py_None || index = state-lastmark ||
 !state-mark[index] || !state-mark[index+1]) {
         if (empty)
             /* want empty string */
             i = j = 0;
         else {
             Py_INCREF(Py_None);
             return Py_None;



The way to think about is that, ideally in PyC, there are never any
global variables.  Instead, all globals are now part of a context
(ie. a interpreter) and it would presumably be illegal to ever use
them in a different context. I'd say this is already the expectation
and convention for any modern, industry-grade software package
marketed as extension for apps.  Industry app developers just want to
drop in a 3rd party package, make as many contexts as they want (in as
many threads as they want), and expect to use each context without
restriction (since they're ensuring contexts never interact with each
other).  For example, if I use zlib, libpng, or libjpg, I can make as
many contexts as I want and put them in whatever threads I want.  In
the app, the only thing I'm on the hook for is to: (a) never use
objects from one context in another context, and (b) ensure that I'm
never make any calls into a module from more than one thread at the
same time.  Both of these requirements are trivial to follow in the
embarrassingly easy parallelization scenarios, and that's why I
started this thread in the first place.  :^)

Andy



--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

On Oct 24, 10:24 pm, Glenn Linderman [EMAIL PROTECTED] wrote:

  And in the case of hundreds of megs of data

 ... and I would be surprised at someone that would embed hundreds of
 megs of data into an object such that it had to be serialized... seems
 like the proper design is to point at the data, or a subset of it, in a
 big buffer.  Then data transfers would just transfer the offset/length
 and the reference to the buffer.

  and/or thousands of data structure instances,

 ... and this is another surprise!  You have thousands of objects (data
 structure instances) to move from one thread to another?

Heh, no, we're actually in agreement here.  I'm saying that in the
case where the data sets are large and/or intricate, a single top-
level pointer changing hands is *always* the way to go rather than
serialization.  For example, suppose you had some nifty python code
and C procs that were doing lots of image analysis, outputting tons of
intricate and rich data structures.  Once the thread is done with that
job, all that output is trivially transferred back to the appropriate
thread by a pointer changing hands.


 Of course, I know that data get large, but typical multimedia streams
 are large, binary blobs.  I was under the impression that processing
 them usually proceeds along the lines of keeping offsets into the blobs,
 and interpreting, etc.  Editing is usually done by making a copy of a
 blob, transforming it or a subset in some manner during the copy
 process, resulting in a new, possibly different-sized blob.

No, you're definitely right-on, with the the additional point that the
representation of multimedia usually employs intricate and diverse
data structures (imagine the data structure representation of a movie
encoded in modern codec, such as H.264, complete with paths, regions,
pixel flow, geometry, transformations, and textures).  As we both
agree, that's something that you *definitely* want to move around via
a single pointer (and not in a serialized form).  Hence, my position
that apps that use python can't be forced to go through IPC or else:
(a) there's a performance/resource waste to serialize and unserialize
large or intricate data sets, and (b) they're required to write and
maintain serialization code that otherwise doesn't serve any other
purpose.

Andy



--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters


 Andy O'Meara wrote:
  I would definitely agree if there was a context (i.e. environment)
  object passed around then perhaps we'd have the best of all worlds.

 Moreover, I think this is probably the *only* way that
 totally independent interpreters could be realized.

 Converting the whole C API to use this strategy would be
 a very big project. Also, on the face of it, it seems like
 it would render all existing C extension code obsolete,
 although it might be possible to do something clever with
 macros to create a compatibility layer.

 Another thing to consider is that passing all these extra
 pointers around everywhere is bound to have some effect
 on performance.


Good points--I would agree with you on all counts there.  On the
passing a context everywhere performance hit, perhaps one idea is
that all objects could have an additional field that would point back
to their parent context (ie. their interpreter).  So the only
prototypes that would have to be modified to contain the context ptr
would be the ones that inherently don't take any objects. This would
conveniently and generally correspond to procs associated with
interpreter control (e.g. importing modules, shutting down modules,
etc).


 Andy O'Meara wrote:
  - each worker thread makes its own interpreter, pops scripts off a
  work queue, and manages exporting (and then importing) result data to
  other parts of the app.

 I hope you realize that starting up one of these interpreters
 is going to be fairly expensive.

Absolutely.  I had just left that issue out in an effort to keep the
discussion pointed, but it's a great point to raise.  My response is
that, like any 3rd party industry package, I'd say this is the
expectation (that context startup and shutdown is non-trivial and to
should be minimized for performance reasons).  For simplicity, my
examples didn't talk about this issue but in practice, it'd be typical
for apps to have their worker interpreters persist as they chew
through jobs.


Andy


--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-10-25 Thread Rhamphoryncus

On Oct 25, 12:29 am, greg [EMAIL PROTECTED] wrote:
 Rhamphoryncus wrote:
  A list
  is not shareable, so it can only be used within the monitor it's
  created within, but the list type object is shareable.

 Type objects contain dicts, which allow arbitrary values
 to be stored in them. What happens if one thread puts
 a private object in there? It becomes visible to other
 threads using the same type object. If it's not safe
 for sharing, bad things happen.

 Python's data model is not conducive to making a clear
 distinction between private and shared objects,
 except at the level of an entire interpreter.

shareable type objects (enabled by a __future__ import) use a
shareddict, which requires all keys and values to themselves be
shareable objects.

Although it's a significant semantic change, in many cases it's easy
to deal with: replace mutable (unshareable) global constants with
immutable ones (ie list - tuple, set - frozenset).  If you've got
some global state you move it into a monitor (which doesn't scale, but
that's your design).  The only time this really fails is when you're
deliberately storing arbitrary mutable objects from any thread, and
later inspecting them from any other thread (such as our new ABC
system's cache).  If you want to store an object, but only to give it
back to the original thread, I've got a way to do that.
--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters


Glenn Linderman wrote:
On approximately 10/25/2008 12:01 AM, came the following characters from 
the keyboard of Martin v. Löwis:



If None remains global, then type(None) also remains global, and
type(None),__bases__[0]. Then type(None).__bases__[0].__subclasses__()
will yield interesting results. This is essentially the status quo.


I certainly don't grok the implications of what you say above, 
as I barely grok the semantics of it.


Not only is there a link from a class to its base classes, there
is a link to all its subclasses as well.

Since every class is ultimately a subclass of 'object', this means
that starting from *any* object, you can work your way up the
__bases__ chain until you get to 'object', then walk the sublass
hierarchy and find every class in the system.

This means that if any object at all is shared, then all class
objects, and any object reachable from them, are shared as well.

--
Greg
--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-10-24 Thread greg


Andy wrote:


1) Independent interpreters (this is the easier one--and solved, in
principle anyway, by PEP 3121, by Martin v. Löwis


Something like that is necessary for independent interpreters,
but not sufficient. There are also all the built-in constants
and type objects to consider. Most of these are statically
allocated at the moment.


2) Barriers to free threading.  As Jesse describes, this is simply
just the GIL being in place, but of course it's there for a reason.
It's there because (1) doesn't hold and there was never any specs/
guidance put forward about what should and shouldn't be done in multi-
threaded apps


No, it's there because it's necessary for acceptable performance
when multiple threads are running in one interpreter. Independent
interpreters wouldn't mean the absence of a GIL; it would only
mean each interpreter having its own GIL.

--
Greg
--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-10-24 Thread Martin v. Löwis

 You seem confused.  PEP 3121 is for isolated interpreters (ie emulated
 processes), not threading.

Just a small remark: this wasn't the primary objective of the PEP.
The primary objective was to support module cleanup in a reliable
manner, to allow eventually to get modules garbage-collected properly.
However, I also kept the isolated interpreters feature in mind there.

Regards,
Martin
--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-10-24 Thread sturlamolden


Instead of appdomains (one interpreter per thread), or free
threading, you could use multiple processes. Take a look at the new
multiprocessing module in Python 2.6. It has roughly the same
interface as Python's threading and queue modules, but uses processes
instead of threads. Processes are scheduled independently by the
operating system. The objects in the multiprocessing module also tend
to have much better performance than their threading and queue
counterparts. If you have a problem with threads due to the GIL, the
multiprocessing module with most likely take care of it.

There is a fundamental problem with using homebrew loading of multiple
(but renamed) copies of PythonXX.dll that is easily overlooked. That
is, extension modules (.pyd) are DLLs as well. Even if required by two
interpreters, they will only be loaded into the process image once.
Thus you have to rename all of them as well, or you will get havoc
with refcounts. Not to speak of what will happen if a Windows HANDLE
is closed by one interpreter while still needed by another. It is
almost guaranteed to bite you, sooner or later.

There are other options as well:

- Use IronPython. It does not have a GIL.

- Use Jython. It does not have a GIL.

- Use pywin32 to create isolated outproc COM servers in Python. (I'm
not sure what the effect of inproc servers would be.)

- Use os.fork() if your platform supports it (Linux, Unix, Apple,
Cygwin, Windows Vista SUA). This is the standard posix way of doing
multiprocessing. It is almost unbeatable if you have a fast copy-on-
write implementation of fork (that is, all platforms except Cygwin).












--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

On Oct 24, 9:35 am, sturlamolden [EMAIL PROTECTED] wrote:
 Instead of appdomains (one interpreter per thread), or free
 threading, you could use multiple processes. Take a look at the new
 multiprocessing module in Python 2.6.

That's mentioned earlier in the thread.


 There is a fundamental problem with using homebrew loading of multiple
 (but renamed) copies of PythonXX.dll that is easily overlooked. That
 is, extension modules (.pyd) are DLLs as well.

Tell me about it--there's all kinds of problems and maintenance
liabilities with our approach.  That's why I'm here talking about this
stuff.

 There are other options as well:

 - Use IronPython. It does not have a GIL.

 - Use Jython. It does not have a GIL.

 - Use pywin32 to create isolated outproc COM servers in Python. (I'm
 not sure what the effect of inproc servers would be.)

 - Use os.fork() if your platform supports it (Linux, Unix, Apple,
 Cygwin, Windows Vista SUA). This is the standard posix way of doing
 multiprocessing. It is almost unbeatable if you have a fast copy-on-
 write implementation of fork (that is, all platforms except Cygwin).

This is discussed earlier in the thread--they're unfortunately all
out.

--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-10-24 Thread Stefan Behnel

Terry Reedy wrote:
 Everything in DLLs is compiled C extensions.  I see about 15 for Windows
 3.0.

Ah, weren't that wonderful times back in the days of Win3.0, when DLL-hell was
inhabited by only 15 libraries? *sigh*

... although ... wait, didn't Win3.0 have more than that already? Maybe you
meant Windows 1.0?

SCNR-ly,

Stefan
--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-10-24 Thread sturlamolden

On Oct 24, 3:58 pm, Andy O'Meara [EMAIL PROTECTED] wrote:

 This is discussed earlier in the thread--they're unfortunately all
 out.

It occurs to me that tcl is doing what you want. Have you ever thought
of not using Python?

That aside, the fundamental problem is what I perceive a fundamental
design flaw in Python's C API. In Java JNI, each function takes a
JNIEnv* pointer as their first argument. There  is nothing the
prevents you from embedding several JVMs in a process. Python can
create embedded subinterpreters, but it works differently. It swaps
subinterpreters like a finite state machine: only one is concurrently
active, and the GIL is shared. The approach is fine, except it kills
free threading of subinterpreters. The argument seems to be that
Apache's mod_python somehow depends on it (for reasons I don't
understand).





--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

On Oct 24, 2:12 am, greg [EMAIL PROTECTED] wrote:
 Andy wrote:
  1) Independent interpreters (this is the easier one--and solved, in
  principle anyway, by PEP 3121, by Martin v. Löwis

 Something like that is necessary for independent interpreters,
 but not sufficient. There are also all the built-in constants
 and type objects to consider. Most of these are statically
 allocated at the moment.


Agreed--I  was just trying to speak generally.  Or, put another way,
there's no hope for independent interpreters without the likes of PEP
3121.  Also, as Martin pointed out, there's the issue of module
cleanup some guys here may underestimate (and I'm glad Martin pointed
out the importance of it).  Without the module cleanup, every time a
dynamic library using python loads and unloads you've got leaks.  This
issue is a real problem for us since our software is loaded and
unloaded many many times in a host app (iTunes, WMP, etc).  I hadn't
raised it here yet (and I don't want to turn the discussion to this),
but lack of multiple load and unload support has been another painful
issue that we didn't expect to encounter when we went with python.


  2) Barriers to free threading.  As Jesse describes, this is simply
  just the GIL being in place, but of course it's there for a reason.
  It's there because (1) doesn't hold and there was never any specs/
  guidance put forward about what should and shouldn't be done in multi-
  threaded apps

 No, it's there because it's necessary for acceptable performance
 when multiple threads are running in one interpreter. Independent
 interpreters wouldn't mean the absence of a GIL; it would only
 mean each interpreter having its own GIL.


I see what you're saying, but let's note that what you're talking
about at this point is an interpreter containing protection from the
client level violating (supposed) direction put forth in python
multithreaded guidelines.  Glenn Linderman's post really gets at
what's at hand here.  It's really important to consider that it's not
a given that python (or any framework) has to be designed against
hazardous use.  Again, I refer you to the diagrams and guidelines in
the QuickTime API:

http://developer.apple.com/technotes/tn/tn2125.html

They tell you point-blank what you can and can't do, and it's that's
simple.  Their engineers can then simply create the implementation
around those specs and not weigh any of the implementation down with
sync mechanisms.  I'm in the camp that simplicity and convention wins
the day when it comes to an API.  It's safe to say that software
engineers expect and assume that a thread that doesn't have contact
with other threads (except for explicit, controlled message/object
passing) will run unhindered and safely, so I raise an eyebrow at the
GIL (or any internal helper sync stuff) holding up an thread's
performance when the app is designed to not need lower-level global
locks.

Anyway, let's talk about solutions.  My company looking to support
python dev community endeavor that allows the following:

- an app makes N worker threads (using the OS)

- each worker thread makes its own interpreter, pops scripts off a
work queue, and manages exporting (and then importing) result data to
other parts of the app.  Generally, we're talking about CPU-bound work
here.

- each interpreter has the essentials (e.g. math support, string
support, re support, and so on -- I realize this is open-ended, but
work with me here).

Let's guesstimate about what kind of work we're talking about here and
if this is even in the realm of possibility.  If we find that it *is*
possible, let's figure out what level of work we're talking about.
From there, I can get serious about writing up a PEP/spec, paid
support, and so on.

Regards,
Andy





--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters



 That aside, the fundamental problem is what I perceive a fundamental
 design flaw in Python's C API. In Java JNI, each function takes a
 JNIEnv* pointer as their first argument. There  is nothing the
 prevents you from embedding several JVMs in a process. Python can
 create embedded subinterpreters, but it works differently. It swaps
 subinterpreters like a finite state machine: only one is concurrently
 active, and the GIL is shared.

Bingo, it seems that you've hit it right on the head there.  Sadly,
that's why I regard this thread largely futile (but I'm an optimist
when it comes to cool software communities so here I am).  I've been
afraid to say it for fear of getting mauled by everyone here, but I
would definitely agree if there was a context (i.e. environment)
object passed around then perhaps we'd have the best of all worlds.
*winces*



  This is discussed earlier in the thread--they're unfortunately all
  out.

 It occurs to me that tcl is doing what you want. Have you ever thought
 of not using Python?

Bingo again.  Our research says that the options are tcl, perl
(although it's generally untested and not recommended by the
community--definitely dealbreakers for a commercial user like us), and
lua.  Also, I'd rather saw off my own right arm than adopt perl, so
that's out.  :^)

As I mentioned, we're looking to either (1) support a python dev
community effort, (2) make our own high-performance python interpreter
(that uses an env object as you described), or (3) drop python and go
to lua.  I'm favoring them in the order I list them, but the more I
discuss the issue with folks here, the more people seem to be
unfortunately very divided on (1).

Andy



--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-10-24 Thread Patrick Stinson

I'm not finished reading the whole thread yet, but I've got some
things below to respond to this post with.

On Thu, Oct 23, 2008 at 9:30 AM, Glenn Linderman [EMAIL PROTECTED] wrote:
 On approximately 10/23/2008 12:24 AM, came the following characters from the
 keyboard of Christian Heimes:

 Andy wrote:

 2) Barriers to free threading.  As Jesse describes, this is simply
 just the GIL being in place, but of course it's there for a reason.
 It's there because (1) doesn't hold and there was never any specs/
 guidance put forward about what should and shouldn't be done in multi-
 threaded apps (see my QuickTime API example).  Perhaps if we could go
 back in time, we would not put the GIL in place, strict guidelines
 regarding multithreaded use would have been established, and PEP 3121
 would have been mandatory for C modules.  Then again--screw that, if I
 could go back in time, I'd just go for the lottery tickets!! :^)


 I've been following this discussion with interest, as it certainly seems
 that multi-core/multi-CPU machines are the coming thing, and many
 applications will need to figure out how to use them effectively.

 I'm very - not absolute, but very - sure that Guido and the initial
 designers of Python would have added the GIL anyway. The GIL makes Python
 faster on single core machines and more stable on multi core machines. Other
 language designers think the same way. Ruby recently got a GIL. The article
 http://www.infoq.com/news/2007/05/ruby-threading-futures explains the
 rationales for a GIL in Ruby. The article also holds a quote from Guido
 about threading in general.

 Several people inside and outside the Python community think that threads
 are dangerous and don't scale. The paper
 http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-1.pdf sums it up
 nicely, It explains why modern processors are going to cause more and more
 trouble with the Java approach to threads, too.

 Reading this PDF paper is extremely interesting (albeit somewhat dependent
 on understanding abstract theories of computation; I have enough math
 background to follow it, sort of, and most of the text can be read even
 without fully understanding the theoretical abstractions).

 I have already heard people talking about Java applications are buggy.  I
 don't believe that general sequential programs written in Java are any
 buggier than programs written in other languages... so I had interpreted
 that to mean (based on some inquiry) that complex, multi-threaded Java
 applications are buggy.  And while I also don't believe that complex,
 multi-threaded programs written in Java are any buggier than complex,
 multi-threaded programs written in other languages, it does seem to be true
 that Java is one of the currently popular languages in which to write
 complex, multi-threaded programs, because of its language support for
 threads and concurrency primitives.  These reports were from people that are
 not programmers, but are field IT people, that have bought and/or support
 software and/or hardware with drivers, that are written in Java, and seem to
 have non-ideal behavior, (apparently only) curable by stopping/restarting
 the application or driver, or sometimes requiring a reboot.

 The paper explains many traps that lead to complex, multi-threaded programs
 being buggy, and being hard to test.  I have worked with parallel machines,
 applications, and databases for 25 years, and can appreciate the succinct
 expression of the problems explained within the paper, and can, from
 experience, agree with its premises and conclusions.  Parallel applications
 only have been commercial successes when the parallelism is tightly
 constrained to well-controlled patterns that could be easily understood.
  Threads, especially in cooperation with languages that use memory
 pointers, have the potential to get out of control, in inexplicable ways.


 Python *must* gain means of concurrent execution of CPU bound code
 eventually to survive on the market. But it must get the right means or we
 are going to suffer the consequences.

 This statement, after reading the paper, seems somewhat in line with the
 author's premise that language acceptability requires that a language be
 self-contained/monolithic, and potentially sufficient to implement itself.
  That seems to also be one of the reasons that Java is used today for
 threaded applications.  It does seem to be true, given current hardware
 trends, that _some mechanism_ must be provided to obtain the benefit of
 multiple cores/CPUs to a single application, and that Python must either
 implement or interface to that mechanism to continue to be a viable language
 for large scale application development.

 Andy seems to want an implementation of independent Python processes
 implemented as threads within a single address space, that can be
 coordinated by an outer application.  This actually corresponds to the model
 promulgated in the paper as being most likely to succeed.

Re: 2.6, 3.0, and truly independent intepreters


Glenn, great post and points!


 Andy seems to want an implementation of independent Python processes
 implemented as threads within a single address space, that can be
 coordinated by an outer application.  This actually corresponds to the
 model promulgated in the paper as being most likely to succeed.

Yeah, that's the idea--let the highest levels run and coordinate the
show.


 It does seem simpler and more efficient to simply copy
 data from one memory location to another, rather than send it in a
 message, especially if the data are large.

That's the rub...  In our case, we're doing image and video
manipulation--stuff not good to be messaging from address space to
address space.  The same argument holds for numerical processing with
large data sets.  The workers handing back huge data sets via
messaging isn't very attractive.

 One thing Andy hasn't yet explained (or I missed) is why any of his
 application is coded in a language other than Python.  

Our software runs in real time (so performance is paramount),
interacts with other static libraries, depends on worker threads to
perform real-time image manipulation, and leverages Windows and Mac OS
API concepts and features.  Python's performance hits have generally
been a huge challenge with our animators because they often have to go
back and massage their python code to improve execution performance.
So, in short, there are many reasons why we use python as a part
rather than a whole.

The other area of pain that I mentioned in one of my other posts is
that what we ship, above all, can't be flaky.  The lack of module
cleanup (intended to be addressed by PEP 3121), using a duplicate copy
of the python dynamic lib, and namespace black magic to achieve
independent interpreters are all examples that have made using python
for us much more challenging and time-consuming then we ever
anticipated.

Again, if it turns out nothing can be done about our needs (which
appears to be more and more like the case), I think it's important for
everyone here to consider the points raised here in the last week.
Moreover, realize that the python dev community really stands to gain
from making python usable as a tool (rather than a monolith).  This
fact alone has caused lua to *rapidly* rise in popularity with
software companies looking to embed a powerful, lightweight
interpreter in their software.

As a python language fan an enthusiast, don't let lua win!  (I say
this endearingly of course--I have the utmost respect for both
communities and I only want to see CPython be an attractive pick when
a company is looking to embed a language that won't intrude upon their
app's design).


Andy
--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-10-24 Thread Patrick Stinson

We are in the same position as Andy here.

I think that something that would help people like us produce
something in code form is a collection of information outlining the
problem and suggested solutions, appropriate parts of the CPython's
current threading API, and pros and cons of the many various proposed
solutions to the different levels of the problem. The most valuable
information I've found is contained in the many (lengthy!) discussions
like this one, a few related PEP's, and the CPython docs, but has
anyone condensed the state of the problem into a wiki or something
similar? Maybe we should start one?

For example, Guido's post here
http://www.artima.com/weblogs/viewpost.jsp?thread=214235describes some
possible solutions to the problem, like interpreter-specific locks, or
fine-grained object locks, and he also mentions the primary
requirement of not harming from the performance of single-threaded
apps. As I understand it, that requirement does not rule out new build
configurations that provide some level of concurrency, as long as you
can still compile python so as to perform as well on single-threaded
apps.

To add to the heap of use cases, the most important thing to us is to
simple have the python language and the sip/PyQt modules available to
us. All we wanted to do was embed the interpreter and language core as
a local scripting engine, so had we patched python to provide
concurrent execution, we wouldn't have cared about all of the other
unsuppported extension modules since our scripts are quite
application-specific.

It seems to me that the very simplest move would be to remove global
static data so the app could provide all thread-related data, which
Andy suggests through references to the QuickTime API. This would
suggest compiling python without thread support so as to leave it up
to the application.

Anyway, I'm having fun reading all of these papers and news postings,
but it's true that code talks, and it could be a little easier if the
state of the problems was condensed. This could be an intense and fun
project, but frankly it's a little tough to keep it all in my head. Is
there a wiki or something out there or should we start one, or do I
just need to read more code?

On Fri, Oct 24, 2008 at 6:40 AM, Andy O'Meara [EMAIL PROTECTED] wrote:
 On Oct 24, 2:12 am, greg [EMAIL PROTECTED] wrote:
 Andy wrote:
  1) Independent interpreters (this is the easier one--and solved, in
  principle anyway, by PEP 3121, by Martin v. Löwis

 Something like that is necessary for independent interpreters,
 but not sufficient. There are also all the built-in constants
 and type objects to consider. Most of these are statically
 allocated at the moment.


 Agreed--I  was just trying to speak generally.  Or, put another way,
 there's no hope for independent interpreters without the likes of PEP
 3121.  Also, as Martin pointed out, there's the issue of module
 cleanup some guys here may underestimate (and I'm glad Martin pointed
 out the importance of it).  Without the module cleanup, every time a
 dynamic library using python loads and unloads you've got leaks.  This
 issue is a real problem for us since our software is loaded and
 unloaded many many times in a host app (iTunes, WMP, etc).  I hadn't
 raised it here yet (and I don't want to turn the discussion to this),
 but lack of multiple load and unload support has been another painful
 issue that we didn't expect to encounter when we went with python.


  2) Barriers to free threading.  As Jesse describes, this is simply
  just the GIL being in place, but of course it's there for a reason.
  It's there because (1) doesn't hold and there was never any specs/
  guidance put forward about what should and shouldn't be done in multi-
  threaded apps

 No, it's there because it's necessary for acceptable performance
 when multiple threads are running in one interpreter. Independent
 interpreters wouldn't mean the absence of a GIL; it would only
 mean each interpreter having its own GIL.


 I see what you're saying, but let's note that what you're talking
 about at this point is an interpreter containing protection from the
 client level violating (supposed) direction put forth in python
 multithreaded guidelines.  Glenn Linderman's post really gets at
 what's at hand here.  It's really important to consider that it's not
 a given that python (or any framework) has to be designed against
 hazardous use.  Again, I refer you to the diagrams and guidelines in
 the QuickTime API:

 http://developer.apple.com/technotes/tn/tn2125.html

 They tell you point-blank what you can and can't do, and it's that's
 simple.  Their engineers can then simply create the implementation
 around those specs and not weigh any of the implementation down with
 sync mechanisms.  I'm in the camp that simplicity and convention wins
 the day when it comes to an API.  It's safe to say that software
 engineers expect and assume that a thread that doesn't have contact
 with other threads

Re: 2.6, 3.0, and truly independent intepreters

2008-10-24 Thread Patrick Stinson

As a side note to the performance question, we are executing python
code in an audio thread that is used in all of the top-end music
production environments. We have found the language to perform
extremely well when executed at control-rate frequency, meaning we
aren't doing DSP computations, just responding to less-frequent events
like user input and MIDI messages.

So we are sitting this music platform with unimaginable possibilities
in the music world (of which python does not play a role), but those
little CPU spikes caused by the GIL at low latencies won't let us have
it. AFAIK, there is no music scripting language out there that would
come close, and yet we are so close! This is a big deal.

On Fri, Oct 24, 2008 at 7:42 AM, Andy O'Meara [EMAIL PROTECTED] wrote:

 Glenn, great post and points!


 Andy seems to want an implementation of independent Python processes
 implemented as threads within a single address space, that can be
 coordinated by an outer application.  This actually corresponds to the
 model promulgated in the paper as being most likely to succeed.

 Yeah, that's the idea--let the highest levels run and coordinate the
 show.


 It does seem simpler and more efficient to simply copy
 data from one memory location to another, rather than send it in a
 message, especially if the data are large.

 That's the rub...  In our case, we're doing image and video
 manipulation--stuff not good to be messaging from address space to
 address space.  The same argument holds for numerical processing with
 large data sets.  The workers handing back huge data sets via
 messaging isn't very attractive.

 One thing Andy hasn't yet explained (or I missed) is why any of his
 application is coded in a language other than Python.

 Our software runs in real time (so performance is paramount),
 interacts with other static libraries, depends on worker threads to
 perform real-time image manipulation, and leverages Windows and Mac OS
 API concepts and features.  Python's performance hits have generally
 been a huge challenge with our animators because they often have to go
 back and massage their python code to improve execution performance.
 So, in short, there are many reasons why we use python as a part
 rather than a whole.

 The other area of pain that I mentioned in one of my other posts is
 that what we ship, above all, can't be flaky.  The lack of module
 cleanup (intended to be addressed by PEP 3121), using a duplicate copy
 of the python dynamic lib, and namespace black magic to achieve
 independent interpreters are all examples that have made using python
 for us much more challenging and time-consuming then we ever
 anticipated.

 Again, if it turns out nothing can be done about our needs (which
 appears to be more and more like the case), I think it's important for
 everyone here to consider the points raised here in the last week.
 Moreover, realize that the python dev community really stands to gain
 from making python usable as a tool (rather than a monolith).  This
 fact alone has caused lua to *rapidly* rise in popularity with
 software companies looking to embed a powerful, lightweight
 interpreter in their software.

 As a python language fan an enthusiast, don't let lua win!  (I say
 this endearingly of course--I have the utmost respect for both
 communities and I only want to see CPython be an attractive pick when
 a company is looking to embed a language that won't intrude upon their
 app's design).


 Andy
 --
 http://mail.python.org/mailman/listinfo/python-list

--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-10-24 Thread Terry Reedy


Stefan Behnel wrote:

Terry Reedy wrote:

Everything in DLLs is compiled C extensions.  I see about 15 for Windows
3.0.


Ah, weren't that wonderful times back in the days of Win3.0, when DLL-hell was
inhabited by only 15 libraries? *sigh*

... although ... wait, didn't Win3.0 have more than that already? Maybe you
meant Windows 1.0?

SCNR-ly,


Is that the equivalent of a smilely? or did you really not understand 
what I wrote?


--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

On Fri, Oct 24, 2008 at 10:40 AM, Andy O'Meara [EMAIL PROTECTED] wrote:
  2) Barriers to free threading.  As Jesse describes, this is simply
  just the GIL being in place, but of course it's there for a reason.
  It's there because (1) doesn't hold and there was never any specs/
  guidance put forward about what should and shouldn't be done in multi-
  threaded apps

 No, it's there because it's necessary for acceptable performance
 when multiple threads are running in one interpreter. Independent
 interpreters wouldn't mean the absence of a GIL; it would only
 mean each interpreter having its own GIL.


 I see what you're saying, but let's note that what you're talking
 about at this point is an interpreter containing protection from the
 client level violating (supposed) direction put forth in python
 multithreaded guidelines.  Glenn Linderman's post really gets at
 what's at hand here.  It's really important to consider that it's not
 a given that python (or any framework) has to be designed against
 hazardous use.  Again, I refer you to the diagrams and guidelines in
 the QuickTime API:

 http://developer.apple.com/technotes/tn/tn2125.html

 They tell you point-blank what you can and can't do, and it's that's
 simple.  Their engineers can then simply create the implementation
 around those specs and not weigh any of the implementation down with
 sync mechanisms.  I'm in the camp that simplicity and convention wins
 the day when it comes to an API.  It's safe to say that software
 engineers expect and assume that a thread that doesn't have contact
 with other threads (except for explicit, controlled message/object
 passing) will run unhindered and safely, so I raise an eyebrow at the
 GIL (or any internal helper sync stuff) holding up an thread's
 performance when the app is designed to not need lower-level global
 locks.

 Anyway, let's talk about solutions.  My company looking to support
 python dev community endeavor that allows the following:

 - an app makes N worker threads (using the OS)

 - each worker thread makes its own interpreter, pops scripts off a
 work queue, and manages exporting (and then importing) result data to
 other parts of the app.  Generally, we're talking about CPU-bound work
 here.

 - each interpreter has the essentials (e.g. math support, string
 support, re support, and so on -- I realize this is open-ended, but
 work with me here).

 Let's guesstimate about what kind of work we're talking about here and
 if this is even in the realm of possibility.  If we find that it *is*
 possible, let's figure out what level of work we're talking about.
 From there, I can get serious about writing up a PEP/spec, paid
 support, and so on.

Point of order! Just for my own sanity if anything :) I think some
minor clarifications are in order.

What are threads within Python:

Python has built in support for POSIX light weight threads. This is
what most people are talking about when they see, hear and say
threads - they mean Posix Pthreads
(http://en.wikipedia.org/wiki/POSIX_Threads) this is not what you
(Adam) seem to be asking for. PThreads are attractive due to the fact
they exist within a single interpreter, can share memory all willy
nilly, etc.

Python does in fact, use OS-Level pthreads when you request multiple threads.

The Global Interpreter Lock is fundamentally designed to make the
interpreter easier to maintain and safer: Developers do not need to
worry about other code stepping on their namespace. This makes things
thread-safe, inasmuch as having multiple PThreads within the same
interpreter space modifying global state and variable at once is,
well, bad. A c-level module, on the other hand, can sidestep/release
the GIL at will, and go on it's merry way and process away.

POSIX Threads/pthreads/threads as we get from Java, allow unsafe
programming styles. These programming styles are of the shared
everything deadlock lol kind. The GIL *partially* protects against
some of the pitfalls. You do not seem to be asking for pthreads :)

http://www.python.org/doc/faq/library/#can-t-we-get-rid-of-the-global-interpreter-lock
http://en.wikipedia.org/wiki/Multi-threading

However, then there are processes.

The difference between threads and processes is that they do *not
share memory* but they can share state via shared queues/pipes/message
passing - what you seem to be asking for - is the ability to
completely fork independent Python interpreters, with their own
namespace and coordinate work via a shared queue accessed with pipes
or some other communications mechanism. Correct?

Multiprocessing, as it exists within python 2.6 today actually forks
(see trunk/Lib/multiprocessing/forking.py) a completely independent
interpreter per process created and then construct pipes to
inter-communicate, and queue to do work coordination. I am not
suggesting this is good for you - I'm trying to get to exactly what
you're asking for.

Fundamentally, allowing total free-threading with Posix threads,

Re: 2.6, 3.0, and truly independent intepreters

On Fri, Oct 24, 2008 at 12:30 PM, Jesse Noller [EMAIL PROTECTED] wrote:
 On Fri, Oct 24, 2008 at 10:40 AM, Andy O'Meara [EMAIL PROTECTED] wrote:
  2) Barriers to free threading.  As Jesse describes, this is simply
  just the GIL being in place, but of course it's there for a reason.
  It's there because (1) doesn't hold and there was never any specs/
  guidance put forward about what should and shouldn't be done in multi-
  threaded apps

 No, it's there because it's necessary for acceptable performance
 when multiple threads are running in one interpreter. Independent
 interpreters wouldn't mean the absence of a GIL; it would only
 mean each interpreter having its own GIL.


 I see what you're saying, but let's note that what you're talking
 about at this point is an interpreter containing protection from the
 client level violating (supposed) direction put forth in python
 multithreaded guidelines.  Glenn Linderman's post really gets at
 what's at hand here.  It's really important to consider that it's not
 a given that python (or any framework) has to be designed against
 hazardous use.  Again, I refer you to the diagrams and guidelines in
 the QuickTime API:

 http://developer.apple.com/technotes/tn/tn2125.html

 They tell you point-blank what you can and can't do, and it's that's
 simple.  Their engineers can then simply create the implementation
 around those specs and not weigh any of the implementation down with
 sync mechanisms.  I'm in the camp that simplicity and convention wins
 the day when it comes to an API.  It's safe to say that software
 engineers expect and assume that a thread that doesn't have contact
 with other threads (except for explicit, controlled message/object
 passing) will run unhindered and safely, so I raise an eyebrow at the
 GIL (or any internal helper sync stuff) holding up an thread's
 performance when the app is designed to not need lower-level global
 locks.

 Anyway, let's talk about solutions.  My company looking to support
 python dev community endeavor that allows the following:

 - an app makes N worker threads (using the OS)

 - each worker thread makes its own interpreter, pops scripts off a
 work queue, and manages exporting (and then importing) result data to
 other parts of the app.  Generally, we're talking about CPU-bound work
 here.

 - each interpreter has the essentials (e.g. math support, string
 support, re support, and so on -- I realize this is open-ended, but
 work with me here).

 Let's guesstimate about what kind of work we're talking about here and
 if this is even in the realm of possibility.  If we find that it *is*
 possible, let's figure out what level of work we're talking about.
 From there, I can get serious about writing up a PEP/spec, paid
 support, and so on.

 Point of order! Just for my own sanity if anything :) I think some
 minor clarifications are in order.

 What are threads within Python:

 Python has built in support for POSIX light weight threads. This is
 what most people are talking about when they see, hear and say
 threads - they mean Posix Pthreads
 (http://en.wikipedia.org/wiki/POSIX_Threads) this is not what you
 (Adam) seem to be asking for. PThreads are attractive due to the fact
 they exist within a single interpreter, can share memory all willy
 nilly, etc.

 Python does in fact, use OS-Level pthreads when you request multiple threads.

 The Global Interpreter Lock is fundamentally designed to make the
 interpreter easier to maintain and safer: Developers do not need to
 worry about other code stepping on their namespace. This makes things
 thread-safe, inasmuch as having multiple PThreads within the same
 interpreter space modifying global state and variable at once is,
 well, bad. A c-level module, on the other hand, can sidestep/release
 the GIL at will, and go on it's merry way and process away.

 POSIX Threads/pthreads/threads as we get from Java, allow unsafe
 programming styles. These programming styles are of the shared
 everything deadlock lol kind. The GIL *partially* protects against
 some of the pitfalls. You do not seem to be asking for pthreads :)

 http://www.python.org/doc/faq/library/#can-t-we-get-rid-of-the-global-interpreter-lock
 http://en.wikipedia.org/wiki/Multi-threading

 However, then there are processes.

 The difference between threads and processes is that they do *not
 share memory* but they can share state via shared queues/pipes/message
 passing - what you seem to be asking for - is the ability to
 completely fork independent Python interpreters, with their own
 namespace and coordinate work via a shared queue accessed with pipes
 or some other communications mechanism. Correct?

 Multiprocessing, as it exists within python 2.6 today actually forks
 (see trunk/Lib/multiprocessing/forking.py) a completely independent
 interpreter per process created and then construct pipes to
 inter-communicate, and queue to do work coordination. I am not
 suggesting this is good for you - I'm trying

Re: 2.6, 3.0, and truly independent intepreters




 The Global Interpreter Lock is fundamentally designed to make the
 interpreter easier to maintain and safer: Developers do not need to
 worry about other code stepping on their namespace. This makes things
 thread-safe, inasmuch as having multiple PThreads within the same
 interpreter space modifying global state and variable at once is,
 well, bad. A c-level module, on the other hand, can sidestep/release
 the GIL at will, and go on it's merry way and process away.

...Unless part of the C module execution involves the need do CPU-
bound work on another thread through a different python interpreter,
right? (even if the interpreter is 100% independent, yikes).  For
example, have a python C module designed to programmatically generate
images (and video frames) in RAM for immediate and subsequent use in
animation.  Meanwhile, we'd like to have a pthread with its own
interpreter with an instance of this module and have it dequeue jobs
as they come in (in fact, there'd be one of these threads for each
excess core present on the machine).  As far as I can tell, it seems
CPython's current state can't CPU bound parallelization in the same
address space (basically, it seems that we're talking about the
embarrassingly parallel scenario raised in that paper).  Why does it
have to be in same address space?  Convenience and simplicity--the
same reasons that most APIs let you hang yourself if the app does dumb
things with threads.  Also, when the data sets that you need to send
to and from each process is large, using the same address space makes
more and more sense.


 So, just to clarify - Andy, do you want one interpreter, $N threads
 (e.g. PThreads) or the ability to fork multiple heavyweight
 processes?

Sorry if I haven't been clear, but we're talking the app starting a
pthread, making a fresh/clean/independent interpreter, and then being
responsible for its safety at the highest level (with the payoff of
each of these threads executing without hinderance).  No different
than if you used most APIs out there where step 1 is always to make
and init a context object and the final step is always to destroy/take-
down that context object.

I'm a lousy writer sometimes, but I feel bad if you took the time to
describe threads vs processes.  The only reason I raised IPC with my
messaging isn't very attractive comment was to respond to Glenn
Linderman's points regarding tradeoffs of shared memory vs no.


Andy



--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

On Fri, Oct 24, 2008 at 3:17 PM, Andy O'Meara [EMAIL PROTECTED] wrote:

 I'm a lousy writer sometimes, but I feel bad if you took the time to
 describe threads vs processes.  The only reason I raised IPC with my
 messaging isn't very attractive comment was to respond to Glenn
 Linderman's points regarding tradeoffs of shared memory vs no.


I actually took the time to bring anyone listening in up to speed, and
to clarify so I could better understand your use case. Don't feel bad,
things in the thread are moving fast and I just wanted to clear it up.

Ideally, we all want to improve the language, and the interpreter.
However trying to push it towards a particular use case is dangerous
given the idea of general use.

-jesse
--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-10-24 Thread Rhamphoryncus

On Oct 24, 1:02 pm, Glenn Linderman [EMAIL PROTECTED] wrote:
 On approximately 10/24/2008 8:42 AM, came the following characters from
 the keyboard of Andy O'Meara:

  Glenn, great post and points!

 Thanks. I need to admit here that while I've got a fair bit of
 professional programming experience, I'm quite new to Python -- I've not
 learned its internals, nor even the full extent of its rich library. So
 I have some questions that are partly about the goals of the
 applications being discussed, partly about how Python is constructed,
 and partly about how the library is constructed. I'm hoping to get a
 better understanding of all of these; perhaps once a better
 understanding is achieved, limitations will be understood, and maybe
 solutions be achievable.

 Let me define some speculative Python interpreters; I think the first is
 today's Python:

 PyA: Has a GIL. PyA threads can run within a process; but are
 effectively serialized to the places where the GIL is obtained/released.
 Needs the GIL because that solves lots of problems with non-reentrant
 code (an example of non-reentrant code, is code that uses global (C
 global, or C static) variables – note that I'm not talking about Python
 vars declared global... they are only module global). In this model,
 non-reentrant code could include pieces of the interpreter, and/or
 extension modules.

 PyB: No GIL. PyB threads acquire/release a lock around each reference to
 a global variable (like with feature). Requires massive recoding of
 all code that contains global variables. Reduces performance
 significantly by the increased cost of obtaining and releasing locks.

 PyC: No locks. Instead, recoding is done to eliminate global variables
 (interpreter requires a state structure to be passed in). Extension
 modules that use globals are prohibited... this eliminates large
 portions of the library, or requires massive recoding. PyC threads do
 not share data between threads except by explicit interfaces.

 PyD: (A hybrid of PyA  PyC). The interpreter is recoded to eliminate
 global variables, and each interpreter instance is provided a state
 structure. There is still a GIL, however, because globals are
 potentially still used by some modules. Code is added to detect use of
 global variables by a module, or some contract is written whereby a
 module can be declared to be reentrant and global-free. PyA threads will
 obtain the GIL as they would today. PyC threads would be available to be
 created. PyC instances refuse to call non-reentrant modules, but also
 need not obtain the GIL... PyC threads would have limited module support
 initially, but over time, most modules can be migrated to be reentrant
 and global-free, so they can be used by PyC instances. Most 3rd-party
 libraries today are starting to care about reentrancy anyway, because of
 the popularity of threads.

PyE: objects are reclassified as shareable or non-shareable, many
types are now only allowed to be shareable.  A module and its classes
become shareable with the use of a __future__ import, and their
shareddict uses a read-write lock for scalability.  Most other
shareable objects are immutable.  Each thread is run in its own
private monitor, and thus protected from the normal threading memory
module nasties.  Alas, this gives you all the semantics, but you still
need scalable garbage collection.. and CPython's refcounting needs the
GIL.


  Our software runs in real time (so performance is paramount),
  interacts with other static libraries, depends on worker threads to
  perform real-time image manipulation, and leverages Windows and Mac OS
  API concepts and features.  Python's performance hits have generally
  been a huge challenge with our animators because they often have to go
  back and massage their python code to improve execution performance.
  So, in short, there are many reasons why we use python as a part
  rather than a whole.
[...]
  As a python language fan an enthusiast, don't let lua win!  (I say
  this endearingly of course--I have the utmost respect for both
  communities and I only want to see CPython be an attractive pick when
  a company is looking to embed a language that won't intrude upon their
  app's design).

I agree with the problem, and desire to make python fill all niches,
but let's just say I'm more ambitious with my solution. ;)
--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters


Another great post, Glenn!!  Very well laid-out and posed!! Thanks for
taking the time to lay all that out.


 Questions for Andy: is the type of work you want to do in independent
 threads mostly pure Python? Or with libraries that you can control to
 some extent? Are those libraries reentrant? Could they be made
 reentrant? How much of the Python standard library would need to be
 available in reentrant mode to provide useful functionality for those
 threads? I think you want PyC


I think you've defined everything perfectly, and you're you're of
course correct about my love for for the PyC model.  :^)

Like any software that's meant to be used without restrictions, our
code and frameworks always use a context object pattern so that
there's never and non-const global/shared data).  I would go as far to
say that this is the case with more performance-oriented software than
you may think since it's usually a given for us to have to be parallel
friendly in as many ways as possible.  Perhaps Patrick can back me up
there.

As to what modules are essential...  As you point out, once
reentrant module implementations caught on in PyC or hybrid world, I
think we'd start to see real effort to whip them into compliance--
there's just so much to be gained imho.  But to answer the question,
there's the obvious ones (operator, math, etc), string/buffer
processing (string, re), C bridge stuff (struct, array), and OS basics
(time, file system, etc).  Nice-to-haves would be buffer and image
decompression (zlib, libpng, etc), crypto modules, and xml. As far as
I can imagine, I have to believe all of these modules already contain
little, if any, global data, so I have to believe they'd be super easy
to make PyC happy.  Patrick, what would you see you guys using?


  That's the rub...  In our case, we're doing image and video
  manipulation--stuff not good to be messaging from address space to
  address space.  The same argument holds for numerical processing with
  large data sets.  The workers handing back huge data sets via
  messaging isn't very attractive.

 In the module multiprocessing environment could you not use shared
 memory, then, for the large shared data items?


As I understand things, the multiprocessing puts stuff in a child
process (i.e. a separate address space), so the only to get stuff to/
from it is via IPC, which can include a shared/mapped memory region.
Unfortunately, a shared address region doesn't work when you have
large and opaque objects (e.g. a rendered CoreVideo movie in the
QuickTime API or 300 megs of audio data that just went through a
DSP).  Then you've got the hit of serialization if you're got
intricate data structures (that would normally would need to be
serialized, such as a hashtable or something).  Also, if I may speak
for commercial developers out there who are just looking to get the
job done without new code, it's usually always preferable to just a
single high level sync object (for when the job is complete) than to
start a child processes and use IPC.  The former is just WAY less
code, plain and simple.


Andy


--
http://mail.python.org/mailman/listinfo/python-list

Re: 2.6, 3.0, and truly independent intepreters

2008-10-24 Thread Glenn Linderman

On approximately 10/24/2008 1:09 PM, came the following characters from 
the keyboard of Rhamphoryncus:

On Oct 24, 1:02 pm, Glenn Linderman [EMAIL PROTECTED] wrote:
  

On approximately 10/24/2008 8:42 AM, came the following characters from
the keyboard of Andy O'Meara:



Glenn, great post and points!
  

Thanks. I need to admit here that while I've got a fair bit of
professional programming experience, I'm quite new to Python -- I've not
learned its internals, nor even the full extent of its rich library. So
I have some questions that are partly about the goals of the
applications being discussed, partly about how Python is constructed,
and partly about how the library is constructed. I'm hoping to get a
better understanding of all of these; perhaps once a better
understanding is achieved, limitations will be understood, and maybe
solutions be achievable.

Let me define some speculative Python interpreters; I think the first is
today's Python:

PyA: Has a GIL. PyA threads can run within a process; but are
effectively serialized to the places where the GIL is obtained/released.
Needs the GIL because that solves lots of problems with non-reentrant
code (an example of non-reentrant code, is code that uses global (C
global, or C static) variables – note that I'm not talking about Python
vars declared global... they are only module global). In this model,
non-reentrant code could include pieces of the interpreter, and/or
extension modules.

PyB: No GIL. PyB threads acquire/release a lock around each reference to
a global variable (like with feature). Requires massive recoding of
all code that contains global variables. Reduces performance
significantly by the increased cost of obtaining and releasing locks.

PyC: No locks. Instead, recoding is done to eliminate global variables
(interpreter requires a state structure to be passed in). Extension
modules that use globals are prohibited... this eliminates large
portions of the library, or requires massive recoding. PyC threads do
not share data between threads except by explicit interfaces.

PyD: (A hybrid of PyA  PyC). The interpreter is recoded to eliminate
global variables, and each interpreter instance is provided a state
structure. There is still a GIL, however, because globals are
potentially still used by some modules. Code is added to detect use of
global variables by a module, or some contract is written whereby a
module can be declared to be reentrant and global-free. PyA threads will
obtain the GIL as they would today. PyC threads would be available to be
created. PyC instances refuse to call non-reentrant modules, but also
need not obtain the GIL... PyC threads would have limited module support
initially, but over time, most modules can be migrated to be reentrant
and global-free, so they can be used by PyC instances. Most 3rd-party
libraries today are starting to care about reentrancy anyway, because of
the popularity of threads.



PyE: objects are reclassified as shareable or non-shareable, many
types are now only allowed to be shareable.  A module and its classes
become shareable with the use of a __future__ import, and their
shareddict uses a read-write lock for scalability.  Most other
shareable objects are immutable.  Each thread is run in its own
private monitor, and thus protected from the normal threading memory
module nasties.  Alas, this gives you all the semantics, but you still
need scalable garbage collection.. and CPython's refcounting needs the
GIL.
  


Hmm.  So I think your PyE is an instance is an attempt to be more 
explicit about what I said above in PyC: PyC threads do not share data 
between threads except by explicit interfaces.  I consider your 
definitions of shared data types somewhat orthogonal to the types of 
threads, in that both PyA and PyC threads could use these new shared 
data items.


I think/hope that you meant that many types are now only allowed to be 
non-shareable?  At least, I think that should be the default; they 
should be within the context of a single, independent interpreter 
instance, so other interpreters don't even know they exist, much less 
how to share them.  If so, then I understand most of the rest of your 
paragraph, and it could be a way of providing shared objects, perhaps.


I don't understand the comment that CPython's refcounting needs the 
GIL... yes, it needs the GIL if multiple threads see the object, but not 
for private objects... only one threads uses the private objects... so 
today's refcounting should suffice... with each interpreter doing its 
own refcounting and collecting its own garbage.


Shared objects would have to do refcounting in a protected way, under 
some lock.  One easy solution would be to have just two types of 
objects; non-shared private objects in a thread, and global shared 
objects; access to global shared objects would require grabbing the GIL, 
and then accessing the object, and releasing the GIL.   An interface 
could allow for grabbing

Re: 2.6, 3.0, and truly independent intepreters