Re: [pypy-dev] pre-emptive micro-threads utilizing shared memory message passing?

Kevin Ar18 Tue, 27 Jul 2010 11:20:23 -0700

I won't bother giving individual replies.  It's 
going to take me some time to go through all that information on the 
GIL, so I guess there's not much of a reply I can give anyway.  :)  Let me 
explain what this is all about in greater detail.




BTW, if there are more links on the GIL, feel free to post.

> Anonymous memory-mapped regions would work, with a suitable data
> abstraction. Or even memory-mapped files, which aren't really all that
> different on systems anymore.
I considered that... however, that would mean writing a significant library to 
convert Python data types to C/machine types and I wasn't looking forward to 
that prospect... although after some experimenting, maybe I will find that it 
won't be that big a deal for my particular situation.
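
A minimal sketch of what that conversion could look like for a simple fixed-size 
record, using only the standard mmap and struct modules.  The record layout 
(one int id plus one float value) and the helper names write_record and 
read_record are assumptions made up for illustration; real FBP data would 
likely need a richer format.

```python
import mmap
import struct

# Hypothetical fixed record layout: 4-byte int id, 8-byte float value,
# little-endian with no padding (12 bytes per record).
RECORD = struct.Struct("<id")

def write_record(buf, index, rec_id, value):
    """Pack one (id, value) pair into slot `index` of the shared buffer."""
    RECORD.pack_into(buf, index * RECORD.size, rec_id, value)

def read_record(buf, index):
    """Unpack slot `index` back into a tuple of Python objects."""
    return RECORD.unpack_from(buf, index * RECORD.size)

# Anonymous memory-mapped region (fileno=-1) with room for 16 records.
shared = mmap.mmap(-1, 16 * RECORD.size)
write_record(shared, 3, 42, 2.5)
result = read_record(shared, 3)
```

For flat numeric records like this the conversion is short; variable-length or 
nested Python objects are where it would grow into a real library.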

-----------------------
What this is all about:
-----------------------
I am attempting to experiment with FBP - Flow Based Programming 
(http://www.jpaulmorrison.com/fbp/ and book: 
http://www.jpaulmorrison.com/fbp/book.pdf).  There is something very similar in 
Python: http://www.kamaelia.org/MiniAxon.html  Also, there are some 
similarities to Erlang (the share-nothing memory model)... and, on a very 
broad level, similarities to functional languages.

Consider p74 and p75 of the FBP book 
(http://www.jpaulmorrison.com/fbp/book.pdf).  Programs essentially consist of 
many "black boxes" connected together.  A box receives data, processes it, and 
either passes it along to another box, sends it to output, or drops/deletes it.  
Each box is like a mini-program written in a traditional programming language 
(like C++ or Python).

The process of connecting the boxes together was actually designed to be 
programmed visually, as you can see from the examples in the book (I have no 
idea if it works well, as I am merely starting to experiment with it).

Each box is a self-contained "program"; the only data it has access to falls 
into 3 parts:
(1) Its own internal variables
(2) The "in ports": These are connections from other boxes allowing the box to 
receive data to be processed (very similar to the arguments in a function call)
(3) The "out ports": After processing the data, the box sends results to various 
"out ports" (which, in turn, go to another box's "in port" or to system 
output).  There is no "return" like in functions... and a box can continually 
generate many pieces of data on the "out ports", unlike a function, which only 
generates one return.
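
The three parts above can be sketched in Python roughly as follows.  This is 
only an illustration: the Box and Splitter classes, the port names "IN" and 
"OUT", and the use of plain deques as ports are all assumptions, not anything 
taken from the FBP book or from Kamaelia.

```python
from collections import deque

class Box:
    """A black box: private state plus named in/out ports (plain deques here)."""
    def __init__(self, in_names, out_names):
        self.inputs = {name: deque() for name in in_names}
        self.outputs = {name: deque() for name in out_names}

    def step(self):
        raise NotImplementedError

class Splitter(Box):
    """Hypothetical box: turns each incoming line into many outgoing words."""
    def __init__(self):
        super().__init__(["IN"], ["OUT"])
        self.lines_seen = 0  # internal variable, invisible to other boxes

    def step(self):
        line = self.inputs["IN"].popleft()
        self.lines_seen += 1
        for word in line.split():
            # Many pieces of data may leave per input; there is no "return".
            self.outputs["OUT"].append(word)

splitter = Splitter()
splitter.inputs["IN"].append("flow based programming")
splitter.step()
words = list(splitter.outputs["OUT"])
```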


------------------------
At this point, my understanding of the FBP concept is extremely limited.  
Unfortunately, the author does not have very detailed documentation on the 
implementation details.  So, I am going to try exploring the concept on my own 
and see if I can actually use it in some production code.


Implementation of FBP requires a custom scheduler for several reasons:
(1) A box can only run if it has actual data on the "in port(s)".  Thus, the 
scheduler would only schedule boxes to run when they can actually process some 
data.
(2) In theory, it may be possible to end up with hundreds or thousands of these 
lightweight boxes.  Using heavyweight OS threads or processes for every one 
is out of the question.


The Kamaelia website describes a simplistic single-threaded way to write a 
scheduler in Python that would work for the FBP concept (even though they had 
never heard of FBP when they designed Kamaelia).  Based on that, it seems like 
writing a simple scheduler would be rather easy.
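
A minimal sketch of such a single-threaded scheduler, using generators as the 
micro-threads.  It is not Kamaelia's actual code; the function names doubler, 
collector and run are invented here, and the "run only when the in port has 
data" rule is the FBP requirement described above.

```python
from collections import deque

def doubler(inbox, outbox):
    """Hypothetical box: doubles every number it receives."""
    while True:
        while inbox:
            outbox.append(inbox.popleft() * 2)
        yield  # hand control back to the scheduler

def collector(inbox, results):
    """Hypothetical box: moves everything it receives into a result list."""
    while True:
        while inbox:
            results.append(inbox.popleft())
        yield

def run(boxes):
    """Round-robin over the boxes; step a box only if its in port holds data.

    Stops once a full pass finds no box with pending input.
    """
    progress = True
    while progress:
        progress = False
        for inbox, box in boxes:
            if inbox:
                next(box)
                progress = True

source, pipe, results = deque([1, 2, 3]), deque(), []
run([(source, doubler(source, pipe)), (pipe, collector(pipe, results))])
```

Each yield is a voluntary switch point, so this is cooperative rather than 
pre-emptive scheduling: a box that never yields would stall every other box.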


In a perfect world, here's what I might do:
* Assume a quad-core CPU
(1) Spawn 1 process
(2) Spawn 4 threads & assign each thread to only 1 core -- in other words, 
don't let the OS handle moving threads around to different cores
(3) Inside each thread, have a mini scheduler that switches back and forth 
between the many micro-threads (or "boxes") -- note that the OS should not 
handle any of the switching between micro-threads/boxes, as it does it all wrong 
(and is too heavyweight) for this situation.
(4) Using a shared memory queue, each of the 4 schedulers can get the next box 
to run... or add more boxes to the schedule queue.

(5) Each box has access to its "in ports" and "out ports" only -- and nothing 
else.  These can be implemented as shared memory for speed.
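
A rough sketch of steps (1) to (4) with the standard threading and queue 
modules.  Several things here are assumptions: the names scheduler and 
square_box are invented, each "box" is reduced to a plain function call, and 
core pinning uses os.sched_setaffinity, which is only available on Linux and is 
treated as best-effort.  Note also that under CPython the GIL still lets only 
one of these threads execute Python bytecode at a time, so this shows the 
structure rather than a real speedup.

```python
import os
import queue
import threading

NUM_WORKERS = 4                 # assumption: one scheduler thread per core
ready = queue.Queue()           # shared queue of boxes that are ready to run
results = []
results_lock = threading.Lock()

def square_box(n):
    """Hypothetical box: one unit of work producing one result."""
    with results_lock:
        results.append(n * n)

def scheduler(index, cores):
    # Best-effort pinning; on Linux, pid 0 is understood to mean the caller.
    if cores:
        try:
            os.sched_setaffinity(0, {cores[index % len(cores)]})
        except OSError:
            pass
    while True:
        item = ready.get()
        if item is None:        # sentinel: shut this scheduler down
            break
        box, arg = item
        box(arg)

cores = sorted(os.sched_getaffinity(0)) if hasattr(os, "sched_getaffinity") else []
workers = [threading.Thread(target=scheduler, args=(i, cores))
           for i in range(NUM_WORKERS)]
for w in workers:
    w.start()
for n in range(10):
    ready.put((square_box, n))
for _ in workers:
    ready.put(None)             # one sentinel per scheduler thread
for w in workers:
    w.join()
```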


Some notes:
Garbage Collection - I noticed that one of the issues mentioned about the GIL 
was garbage collection.  Within the FBP concept, this MIGHT be easily solved: 
(a) only 1 running piece of code (1 box) can access a piece of data at a time, 
so there are no worries about whether there are dangling pointers to the 
var/object somewhere, etc... (b) data must be manually "dropped" inside a box 
to get rid of it; thus, there is no need to go checking for data that is not 
used anymore.

Threading protection - In theory, there are significantly fewer threading issues 
since: (a) only one box can control/access data at a time (b) the only place 
where there is contention is when you push/pop from the in/out ports ... and 
that is trivial to protect against.
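
As an illustration of point (b), a port can be sketched as a queue whose push 
and pop are guarded by a single lock.  The Port class and the producer function 
are hypothetical names, and returning None for an empty port is just one 
possible convention.

```python
import threading
from collections import deque

class Port:
    """A connection between two boxes; the only shared, lock-protected state."""
    def __init__(self):
        self._items = deque()
        self._lock = threading.Lock()

    def push(self, item):
        with self._lock:
            self._items.append(item)

    def pop(self):
        """Return the next item, or None if the port is empty."""
        with self._lock:
            return self._items.popleft() if self._items else None

def producer(port, base):
    """Hypothetical upstream box: pushes 100 consecutive integers."""
    for i in range(100):
        port.push(base + i)

port = Port()
threads = [threading.Thread(target=producer, args=(port, base))
           for base in (0, 1000)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Drain the port from the consuming side.
received = []
while (item := port.pop()) is not None:
    received.append(item)
```

The lock matters mainly for the check-then-pop in pop(); everything a box does 
with an item after popping it needs no locking, since no other box can see it.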



Anyway, I appreciate the replies.  At this point, I guess I'll just go for a 
simplistic implementation to get a feel for how things work.  Then, maybe I can 
check whether something better can be done in PyPy.
                                          