Parallel Python
Has anybody tried to run parallel Python applications? It appears that if your application is computation-bound, using the 'thread' or 'threading' modules will not get you any speedup. That is because the Python interpreter uses the GIL (Global Interpreter Lock) for internal bookkeeping. The latter allows only one thread to execute Python byte-code at a time, even on a multiprocessor computer. To overcome this limitation, I've created the ppsmp module: http://www.parallelpython.com

It provides an easy way to run parallel Python applications on SMP computers. I would appreciate any comments/suggestions regarding it. Thank you!

-- http://mail.python.org/mailman/listinfo/python-list
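The GIL point above can be checked with a small experiment. The sketch below uses the standard library's concurrent.futures module (added in Python 3.2, long after this thread, and unrelated to ppsmp) to run the same CPU-bound function in a thread pool and in a process pool. On a standard CPython build with the GIL, the process pool should typically finish sooner on a multi-core machine, because each worker process has its own interpreter and its own GIL. Timings depend on the machine, so only the results are compared. count_primes is a hypothetical workload.

```python
import multiprocessing as mp
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

# Prefer "fork" where available so workers inherit functions defined in this
# file; elsewhere the platform default is used and the file must be importable.
CTX = mp.get_context("fork") if "fork" in mp.get_all_start_methods() else mp.get_context()


def count_primes(limit):
    """Deliberately CPU-bound: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run(executor_cls, limits, workers=4, **kwargs):
    """Map count_primes over `limits` with the given executor; return (results, seconds)."""
    start = time.perf_counter()
    with executor_cls(max_workers=workers, **kwargs) as pool:
        results = list(pool.map(count_primes, limits))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    limits = [20_000] * 4
    thread_results, t_threads = run(ThreadPoolExecutor, limits)
    proc_results, t_procs = run(ProcessPoolExecutor, limits, mp_context=CTX)
    assert thread_results == proc_results
    print(f"threads: {t_threads:.2f}s  processes: {t_procs:.2f}s")
```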
Re: Parallel Python
> I always thought that if you use multiple processes (e.g. os.fork) then
> Python can take advantage of multiple processors. I think the GIL locks
> one processor only. The problem is that one interpreted can be run on
> one processor only. Am I not right? Is your ppm module runs the same
> interpreter on multiple processors? That would be very interesting, and
> something new.
>
> Or does it start multiple interpreters? Another way to do this is to
> start multiple processes and let them communicate through IPC or a local
> network.

That's right. ppsmp starts multiple interpreters in separate processes and organizes communication between them through IPC. Originally ppsmp was designed to speed up an existing application which is written in pure Python but is quite computationally expensive (other ways of optimizing it were used as well). It was also required that the application run out of the box on most standard Linux distributions (they all include CPython).
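A bare-bones sketch of "multiple interpreters in separate processes communicating through IPC", using the standard library's multiprocessing.Pipe rather than ppsmp's own protocol (which the thread does not describe). The worker function and the request format are hypothetical; the reply carries the child's process ID to show that a different interpreter did the work.

```python
import multiprocessing as mp
import os

# Prefer "fork" where available so the child inherits functions defined in this
# file; elsewhere the platform default is used and the file must be importable.
CTX = mp.get_context("fork") if "fork" in mp.get_all_start_methods() else mp.get_context()


def worker(conn):
    """Child process loop: square each number received until the None sentinel arrives."""
    while True:
        request = conn.recv()              # blocks until the parent sends a message
        if request is None:                # sentinel: no more work
            break
        conn.send((os.getpid(), request * request))
    conn.close()


def square_remotely(values):
    """Send each value to a child interpreter over a Pipe; return its (pid, square) replies."""
    parent_end, child_end = CTX.Pipe()
    child = CTX.Process(target=worker, args=(child_end,))
    child.start()
    replies = []
    for value in values:
        parent_end.send(value)             # pickled and written to the pipe
        replies.append(parent_end.recv())  # wait for the child's answer
    parent_end.send(None)
    child.join()
    return replies


if __name__ == "__main__":
    for pid, square in square_remotely([1, 2, 3]):
        print(f"parent {os.getpid()} got {square} from child {pid}")
```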
Re: Parallel Python
sturlamolden wrote:
> [EMAIL PROTECTED] wrote:
>
> > That's right. ppsmp starts multiple interpreters in separate
> > processes and organize communication between them through IPC.
>
> Thus you are basically reinventing MPI.
>
> http://mpi4py.scipy.org/
> http://en.wikipedia.org/wiki/Message_Passing_Interface

Thanks for bringing that into consideration. I am well aware of MPI and have written several programs in C/C++ and Fortran which use it. I would agree that MPI is the most common solution for running software on a cluster (computers connected by a network). There is, however, another parallelization approach: PVM (Parallel Virtual Machine) http://www.csm.ornl.gov/pvm/pvm_home.html. I would say ppsmp is more similar to the latter. By the way, the PP site has links to different Python parallelization techniques (including MPI): http://www.parallelpython.com/component/option,com_weblinks/catid,14/Itemid,23/

The main difference between the MPI Python solutions and ppsmp is that with MPI you have to organize both the computation {MPI_Comm_rank(MPI_COMM_WORLD, &id); if id==1 then ... else } and the data distribution (MPI_Send / MPI_Recv) yourself, while with ppsmp you just submit a function with its arguments to the execution server and retrieve the results later. That makes the transition from serial Python software to parallel much simpler with ppsmp than with MPI. To make this point clearer, here is a short example.

Serial code (2 lines):

    for input in inputs:
        print "Sum of primes below", input, "is", sum_primes(input)

Parallel code (3 lines):

    jobs = [(input, job_server.submit(sum_primes, (input,), (isprime,), ("math",))) for input in inputs]
    for input, job in jobs:
        print "Sum of primes below", input, "is", job()

In this example parallel execution was added at the cost of 1 line of code!

The other difference with MPI is that ppsmp dynamically decides where to run each given job.
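The example above leaves sum_primes and isprime undefined. Below is a self-contained sketch; the two function bodies are assumptions, and instead of pp's job_server it uses the standard library's concurrent.futures.ProcessPoolExecutor (Python 3.2+), whose submit()/result() pair has the same "submit now, collect later" shape. A Future's value is fetched with .result() rather than by calling the job object as in pp, and no dependency or module tuples are passed because the worker processes get isprime and math from this module. Idle workers pick up the next pending job, which is similar in spirit to the dynamic scheduling mentioned just above.

```python
import math
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor

# Prefer "fork" where available so workers inherit functions defined in this
# file; elsewhere the platform default is used and the file must be importable.
CTX = mp.get_context("fork") if "fork" in mp.get_all_start_methods() else mp.get_context()


def isprime(n):
    """Trial-division primality test (assumed implementation)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True


def sum_primes(n):
    """Sum of all primes below n (assumed implementation)."""
    return sum(x for x in range(2, n) if isprime(x))


def serial(inputs):
    return [(n, sum_primes(n)) for n in inputs]


def parallel(inputs, workers=4):
    with ProcessPoolExecutor(max_workers=workers, mp_context=CTX) as pool:
        # submit() returns a Future immediately; result() blocks until it is done
        jobs = [(n, pool.submit(sum_primes, n)) for n in inputs]
        return [(n, job.result()) for n, job in jobs]


if __name__ == "__main__":
    for n, total in parallel([100_000, 100_100, 100_200]):
        print("Sum of primes below", n, "is", total)
```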
For example, if there are other active processes running on the system, ppsmp will make greater use of the processors which are free. Since in MPI the whole task is usually divided equally between processors at the beginning, the overall runtime will be determined by the slowest process (the one which shares a processor with another running program). In this particular case ppsmp will outperform MPI.

The third, probably less important, difference is that with MPI-based parallel Python code you must have MPI installed on the system.

Overall, ppsmp is still a work in progress and there are other interesting features which I would like to implement. This is the main reason why I do not open the source of ppsmp: to have better control over its future development, as advised here: http://en.wikipedia.org/wiki/Freeware :-)

Best regards,
Vitalii
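The load-balancing argument in Vitalii's post can be illustrated with a toy simulation instead of real processes. The model is an illustrative assumption, not a measurement of ppsmp or MPI: 4 processors, one running at half speed because it shares its CPU with another program, and 40 equal tasks. The static version splits the tasks into equal contiguous blocks up front (the usual MPI pattern described above); the dynamic version hands each task to whichever processor becomes free first.

```python
import heapq


def static_makespan(costs, speeds):
    """Equal contiguous blocks assigned up front; return the time the last processor finishes."""
    p = len(speeds)
    block = -(-len(costs) // p)  # ceiling division
    return max(sum(costs[i * block:(i + 1) * block]) / speeds[i] for i in range(p))


def dynamic_makespan(costs, speeds):
    """Each task goes to the processor that is free earliest; return the finish time."""
    # Heap entries: (time this processor becomes free, processor index)
    free_at = [(0.0, i) for i in range(len(speeds))]
    heapq.heapify(free_at)
    finish = 0.0
    for cost in costs:
        t, i = heapq.heappop(free_at)
        t += cost / speeds[i]
        finish = max(finish, t)
        heapq.heappush(free_at, (t, i))
    return finish


if __name__ == "__main__":
    tasks = [1.0] * 40
    speeds = [1.0, 1.0, 1.0, 0.5]  # last processor is shared with another program
    print(static_makespan(tasks, speeds))   # 20.0: the slow processor holds everyone up
    print(dynamic_makespan(tasks, speeds))  # 12.0: the fast processors absorb extra tasks
```

With equal speeds both strategies give the same result; the gap only appears when processors are unevenly loaded, which is the situation the post describes.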
Re: Parallel Python
>
> Thus there are different levels of parallelization:
>
> 1 file/database based; multiple batch jobs
> 2 Message Passing, IPC, RPC, ...
> 3 Object Sharing
> 4 Sharing of global data space (Threads)
> 5 Local parallelism / Vector computing, MMX, 3DNow,...
>
> There are good reasons for all of these levels.
> Yet "parallel python" to me fakes to be on level 3 or 4 (or even 5 :-) ),
> while its just a level 2
> system, where "passing", "remote", "inter-process" ... are the right vocables.

In one of the previous posts I mentioned that ppsmp is based on processes + IPC, which makes it a level 2 parallelization system, the same level as MPI. This is also evident from the fact that it is written entirely in Python, as Python objects cannot be shared between processes from pure Python code (POSH can do such sharing because it is an extension written in C).
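The difference between level 2 (message passing) and level 3 (object sharing) can be seen with the standard library's multiprocessing module, used here as a stand-in rather than ppsmp itself: a list handed to a child process is copied, so the child's change only reaches the parent when it is sent back as a message. The function names are hypothetical.

```python
import multiprocessing as mp

# Prefer "fork" where available so the child inherits functions defined in this
# file; elsewhere the platform default is used and the file must be importable.
CTX = mp.get_context("fork") if "fork" in mp.get_all_start_methods() else mp.get_context()


def append_in_child(data, out):
    """Modify the child's own copy of `data`, then send that copy back as a message."""
    data.append("added in child")
    out.put(data)


def demo():
    data = ["original"]
    out = CTX.Queue()
    child = CTX.Process(target=append_in_child, args=(data, out))
    child.start()
    returned = out.get()  # read the message before join() so the child can exit cleanly
    child.join()
    return data, returned


if __name__ == "__main__":
    parent_view, child_view = demo()
    print("parent still sees:", parent_view)
    print("child sent back:  ", child_view)
```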
Re: Parallel Python
> Looks interesting, but is there any way to use this for a cluster of
> machines over a network (not smp)?

Networking capabilities will be included in the next release of the Parallel Python software (http://www.parallelpython.com), which is coming soon.

> Couldn't you just provide similar conveniences on top of MPI? Searching
> for "Python MPI" yields a lot of existing work (as does "Python PVM"),
> so perhaps someone has already done so.

Yes, it's possible to do it on top of any environment which supports IPC.

> That's one more project... It seems that there is significant
> interest in parallel computing in Python. Perhaps we should start a
> special interest group? Not so much in order to work on a single
> project; I believe that at the current state of parallel computing we
> still need many different approaches to be tried. But an exchange of
> experience could well be useful for all of us.

Well, I may just add that everybody is welcome to start a discussion regarding any parallel Python project or idea in this forum: http://www.parallelpython.com/component/option,com_smf/Itemid,29/board,2.0
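To illustrate the remark that such conveniences can be built on top of any environment supporting IPC, here is a hypothetical minimal job server layered on two multiprocessing queues. Like the pp calls shown earlier in the thread, submit() returns a callable that yields the result; everything else (class name, worker loop, message format) is an assumption. It has no error handling: an exception inside a submitted function kills that worker, and close() assumes every job's result has been collected.

```python
import multiprocessing as mp

# Prefer "fork" where available so workers inherit functions defined in this
# file; elsewhere the platform default is used and the file must be importable.
CTX = mp.get_context("fork") if "fork" in mp.get_all_start_methods() else mp.get_context()


def _serve(tasks, results):
    """Worker loop: run (job_id, func, args) messages until a None sentinel arrives."""
    for job_id, func, args in iter(tasks.get, None):
        results.put((job_id, func(*args)))


class TinyJobServer:
    """Hypothetical job server: submit() returns a callable that returns the job's result."""

    def __init__(self, nworkers=2):
        self._tasks = CTX.Queue()
        self._results = CTX.Queue()
        self._done = {}  # job_id -> result, filled as result messages arrive
        self._next_id = 0
        self._workers = [CTX.Process(target=_serve, args=(self._tasks, self._results))
                         for _ in range(nworkers)]
        for w in self._workers:
            w.start()

    def submit(self, func, args=()):
        job_id = self._next_id
        self._next_id += 1
        self._tasks.put((job_id, func, args))  # func must be picklable (module-level)

        def job():
            # Results may arrive out of order: stash others until ours shows up.
            while job_id not in self._done:
                finished_id, value = self._results.get()
                self._done[finished_id] = value
            return self._done[job_id]

        return job

    def close(self):
        for _ in self._workers:
            self._tasks.put(None)
        for w in self._workers:
            w.join()


if __name__ == "__main__":
    srv = TinyJobServer(nworkers=2)
    jobs = [srv.submit(pow, (2, k)) for k in range(8)]
    print([job() for job in jobs])
    srv.close()
```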
Re: Threaded for loop
John wrote:
> I want to do something like this:
>
> for i = 1 in range(0,N):
>  for j = 1 in range(0,N):
>    D[i][j] = calculate(i,j)
>
> I would like to now do this using a fixed number of threads, say 10
> threads.
> What is the easiest way to do the "parfor" in python?
>
> Thanks in advance for your help,

As was already mentioned, threads will not help in terms of parallelism (only one thread will actually be working at a time). If you want to calculate this in parallel, here is an easy solution:

    import ppsmp

    # start with 10 processes
    srv = ppsmp.Server(10)

    D = [[None] * N for _ in range(N)]
    f = []
    for i in range(N):
        for j in range(N):
            # it might be a little bit more complex if 'calculate' depends
            # on other modules or calls other functions
            f.append(srv.submit(calculate, (i, j)))

    for i in range(N):
        for j in range(N):
            # calling a job object waits for it and returns its result
            D[i][j] = f.pop(0)()

You can get the latest version of the ppsmp module here: http://www.parallelpython.com/
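For readers without ppsmp/pp, the same "parfor" with a fixed pool of 10 worker processes can be sketched with the standard library's concurrent.futures. This is a swapped-in technique, not the ppsmp answer above, and calculate is a hypothetical placeholder for John's function. map() preserves input order, so the flat result list can be cut back into rows.

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor

# Prefer "fork" where available so workers inherit functions defined in this
# file; elsewhere the platform default is used and the file must be importable.
CTX = mp.get_context("fork") if "fork" in mp.get_all_start_methods() else mp.get_context()


def calculate(i, j):
    """Hypothetical stand-in for John's calculate(i, j); replace with the real work."""
    return i * j


def parfor_table(n, workers=10):
    """Return D with D[i][j] = calculate(i, j) for 0 <= i, j < n, using `workers` processes."""
    rows = [i for i in range(n) for _ in range(n)]  # i index of every cell, row by row
    cols = [j for _ in range(n) for j in range(n)]  # matching j index of every cell
    with ProcessPoolExecutor(max_workers=workers, mp_context=CTX) as pool:
        # chunksize=n sends one row's worth of (i, j) pairs per message to a worker
        flat = list(pool.map(calculate, rows, cols, chunksize=max(1, n)))
    return [flat[i * n:(i + 1) * n] for i in range(n)]


if __name__ == "__main__":
    for row in parfor_table(4):
        print(row)
```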
Re: Threaded for loop
John wrote:
> Thanks. Does it matter if I call shell commands os.system...etc in
> calculate?
>
> Thanks,
> --j

The os.system command ignores important changes in the environment (redirected streams) and would not work with the current version of ppsmp. There is, however, a very simple workaround. Use

    print os.popen("yourcommand").read()

instead of

    os.system("yourcommand")

Here is a complete working example of that code: http://www.parallelpython.com/component/option,com_smf/Itemid,29/topic,13.0
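The os.popen workaround captures a command's output through a pipe instead of letting the command write to the (redirected) standard output. In current Python, subprocess.run with capture_output=True is the usual standard-library way to do the same. This is a general sketch; the thread does not confirm how it behaves inside ppsmp/pp workers. The calculate() below is hypothetical and uses the running Python interpreter as a portable stand-in for a shell command.

```python
import subprocess
import sys


def run_command(args):
    """Run an external command and return its standard output as text.

    The output is captured through a pipe rather than written to the
    inherited stdout; check=True raises CalledProcessError on failure.
    """
    completed = subprocess.run(args, capture_output=True, text=True, check=True)
    return completed.stdout


def calculate(i, j):
    """Hypothetical calculate() that shells out and parses the command's output."""
    out = run_command([sys.executable, "-c", f"print({i} * {j})"])
    return int(out)


if __name__ == "__main__":
    print(calculate(6, 7))
```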
Re: Parallel Python
On Jan 12, 11:52 am, Neal Becker <[EMAIL PROTECTED]> wrote:
> [EMAIL PROTECTED] wrote:
> > Has anybody tried to run parallel python applications?
> > It appears that if your application is computation-bound using 'thread'
> > or 'threading' modules will not get you any speedup. That is because
> > python interpreter uses GIL(Global Interpreter Lock) for internal
> > bookkeeping. The later allows only one python byte-code instruction to
> > be executed at a time even if you have a multiprocessor computer.
> > To overcome this limitation, I've created ppsmp module:
> > http://www.parallelpython.com
> > It provides an easy way to run parallel python applications on smp
> > computers.
> > I would appreciate any comments/suggestions regarding it.
> > Thank you!
>
> Looks interesting, but is there any way to use this for a cluster of
> machines over a network (not smp)?

There are 2 major updates regarding Parallel Python (http://www.parallelpython.com):

1) Since version 1.2, the Parallel Python software can be used for cluster-wide (or even Internet-wide) parallelization. It has also been renamed accordingly to pp (the module is backward compatible with ppsmp).

2) Parallel Python has become open source (under the BSD license): http://www.parallelpython.com/content/view/18/32/
Re: Can Parallel Python run on a multi-CPU server?
Hi,

That is definitely possible! To achieve the best performance, split your calculation either into 128 equal parts or into >>128 parts of any size (then load balancing will spread the workload equally). Let us know the results. If you need any help with parallelization, feel free to request it here: http://www.parallelpython.com/component/option,com_smf/Itemid,29/

Thank you!

On Feb 7, 2:13 am, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I'm interested in Parallel Python and I learned from the website of Parallel Python
> that it can run on SMP and clusters. But can it run on a our muti-CPU
> server ?
> We are running an origin3800 server with 128 CPUs.
>
> Thanks.
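The two splitting strategies suggested above (128 equal parts, or many more than 128 parts so the load balancer can even things out) can be produced with a small helper. The helper name and the 8x oversubscription factor in the usage line are illustrative assumptions; each (lo, hi) pair would then be submitted as a separate job.

```python
def split_range(start, stop, parts):
    """Split [start, stop) into `parts` contiguous (lo, hi) chunks whose sizes differ by at most 1."""
    size, extra = divmod(stop - start, parts)
    chunks = []
    lo = start
    for k in range(parts):
        hi = lo + size + (1 if k < extra else 0)  # the first `extra` chunks get one more item
        chunks.append((lo, hi))
        lo = hi
    return chunks


if __name__ == "__main__":
    equal_parts = split_range(0, 1_000_000, 128)      # one job per CPU
    many_parts = split_range(0, 1_000_000, 128 * 8)   # many smaller jobs for load balancing
    print(len(equal_parts), len(many_parts))
```

Equal parts work best when every CPU is equally fast and idle; many smaller parts cost some extra scheduling overhead but let fast or idle CPUs take on more of the work.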