Parallel Python

2007-01-06 Thread parallelpython
Has anybody tried to run parallel python applications?
It appears that if your application is computation-bound, using the 'thread'
or 'threading' modules will not give you any speedup. That is because the
Python interpreter uses a GIL (Global Interpreter Lock) for internal
bookkeeping. The latter allows only one thread to execute Python byte-code
at a time, even if you have a multiprocessor computer.
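
A minimal sketch of how one might observe this, assuming a standard
(GIL-enabled) CPython build. The function names and loop sizes are
illustrative only, and the timings will vary from machine to machine; the
point is just that the threaded run is not expected to be meaningfully
faster than the sequential one for pure-Python, CPU-bound work.

```python
import threading
import time


def count_down(n):
    """Pure-Python, CPU-bound loop: the running thread holds the GIL."""
    while n > 0:
        n -= 1


def run_sequential(n, repeats):
    """Run the loop `repeats` times one after another; return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        count_down(n)
    return time.perf_counter() - start


def run_threaded(n, repeats):
    """Run the same loops in `repeats` threads; return elapsed seconds."""
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(repeats)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n, repeats = 2_000_000, 4
    print("sequential: %.2f s" % run_sequential(n, repeats))
    print("threaded:   %.2f s" % run_threaded(n, repeats))
```
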
To overcome this limitation, I've created the ppsmp module:
http://www.parallelpython.com
It provides an easy way to run parallel Python applications on SMP
computers.
I would appreciate any comments/suggestions regarding it.
Thank you!

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Parallel Python

2007-01-10 Thread parallelpython
> I always thought that if you use multiple processes (e.g. os.fork) then
> Python can take advantage of multiple processors. I think the GIL locks
> one processor only. The problem is that one interpreted can be run on
> one processor only. Am I not right? Is your ppm module runs the same
> interpreter on multiple processors? That would be very interesting, and
> something new.
>
>
> Or does it start multiple interpreters? Another way to do this is to
> start multiple processes and let them communicate through IPC or a local
> network.

   That's right. ppsmp starts multiple interpreters in separate
processes and organizes communication between them through IPC.
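
   The general idea (separate interpreter processes exchanging data over
pipes) can be sketched with the standard subprocess module. This is not
the ppsmp implementation; the worker source, the function names, and the
sum-of-squares task are all hypothetical and only illustrate the technique.

```python
import subprocess
import sys

# Source run by each child interpreter: read an integer from stdin and
# print the sum of squares below it. The task itself is a hypothetical example.
WORKER_SOURCE = (
    "import sys\n"
    "n = int(sys.stdin.read())\n"
    "print(sum(i * i for i in range(n)))\n"
)


def start_worker(n):
    """Start a separate interpreter process and send it its input over a pipe."""
    proc = subprocess.Popen(
        [sys.executable, "-c", WORKER_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    proc.stdin.write(str(n))
    proc.stdin.close()  # signals end of input so the child can start computing
    return proc


def collect(proc):
    """Read the child's answer from its stdout pipe and wait for it to exit."""
    output = proc.stdout.read()
    proc.stdout.close()
    proc.wait()
    return int(output)


def sum_squares_in_processes(inputs):
    # Start every child first so they can run at the same time, then collect.
    workers = [start_worker(n) for n in inputs]
    return [collect(proc) for proc in workers]
```

Because each child is its own interpreter with its own GIL, the children
can run on different processors at the same time.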

   Originally ppsmp was designed to speed up an existing application
which is written in pure Python but is quite computationally expensive
(other ways to optimize it were used too). It was also required
that the application run out of the box on most standard Linux
distributions (they all include CPython).



Re: Parallel Python

2007-01-11 Thread parallelpython
sturlamolden wrote:
> [EMAIL PROTECTED] wrote:
>
> >That's right. ppsmp starts multiple interpreters in separate
> > processes and organize communication between them through IPC.
>
> Thus you are basically reinventing MPI.
>
> http://mpi4py.scipy.org/
> http://en.wikipedia.org/wiki/Message_Passing_Interface

Thanks for bringing that into consideration.

I am well aware of MPI and have written several programs in C/C++ and
Fortran which use it.
I would agree that MPI is the most common solution for running software on a
cluster (computers connected by a network). There is, however, another
parallelization approach: PVM (Parallel Virtual Machine),
http://www.csm.ornl.gov/pvm/pvm_home.html. I would say ppsmp is more
similar to the latter.

By the way, the PP site has links to different Python parallelization
techniques (including MPI):
http://www.parallelpython.com/component/option,com_weblinks/catid,14/Itemid,23/

The main difference between MPI-based Python solutions and ppsmp is that with
MPI you have to organize both the computation
{MPI_Comm_rank(MPI_COMM_WORLD, &id); if id==1 then ... else } and
the data distribution (MPI_Send / MPI_Recv) yourself, while with ppsmp
you just submit a function with its arguments to the execution server and
retrieve the results later.
That makes the transition from serial Python software to parallel much
simpler with ppsmp than with MPI.

To make this point clearer, here is a short example:
serial code, 2 lines
---
for input in inputs:
    print("Sum of primes below %s is %s" % (input, sum_primes(input)))
---
parallel code, 3 lines
---
# submit arguments: function, its arguments, functions it depends on,
# modules it needs
jobs = [(input, job_server.submit(sum_primes, (input,), (isprime,),
         ("math",))) for input in inputs]
for input, job in jobs:
    print("Sum of primes below %s is %s" % (input, job()))
---
In this example parallel execution was added at the cost of 1 line of
code!
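
For readers without ppsmp installed, the same submit-then-collect pattern
can be sketched with the standard concurrent.futures module. This is an
analogy, not the ppsmp API: the isprime and sum_primes bodies below are
assumed implementations (the post does not show them), and
sum_primes_parallel is a hypothetical helper name.

```python
import math
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def isprime(n):
    """Return True if n is prime (simple trial division)."""
    if n < 2:
        return False
    for divisor in range(2, math.isqrt(n) + 1):
        if n % divisor == 0:
            return False
    return True


def sum_primes(n):
    """Sum of all primes strictly below n."""
    return sum(x for x in range(2, n) if isprime(x))


def sum_primes_parallel(inputs, executor_cls=ProcessPoolExecutor):
    """Submit one job per input, then collect the results in input order.

    Pass ThreadPoolExecutor as executor_cls for a quick check that does
    not start extra processes (no speedup for CPU-bound work, though).
    """
    with executor_cls() as executor:
        jobs = [(n, executor.submit(sum_primes, n)) for n in inputs]
        return [(n, job.result()) for n, job in jobs]


if __name__ == "__main__":
    for n, total in sum_primes_parallel([100000, 100100, 100200]):
        print("Sum of primes below %d is %d" % (n, total))
```

Here job.result() plays the role that calling job() plays in the ppsmp
example: it blocks until that particular job has finished.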

The other difference from MPI is that ppsmp dynamically decides where
to run each given job. For example, if there are other active processes
running in the system, ppsmp will make greater use of the
processors which are free. Since in MPI the whole task is usually
divided equally between processors at the beginning, the overall
runtime will be determined by the slowest running process (the one
which shares a processor with another running program). In this
particular case ppsmp will outperform MPI.
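
The argument can be illustrated with a toy model. This is a simulation
under stated assumptions, not a measurement of ppsmp or MPI: every task
costs one time unit on a free processor, a shared processor runs at half
speed, and the function names are hypothetical.

```python
import heapq


def static_makespan(n_tasks, speeds):
    """Split tasks evenly up front; the slowest worker sets the finish time."""
    base, extra = divmod(n_tasks, len(speeds))
    finish_times = []
    for index, speed in enumerate(speeds):
        share = base + (1 if index < extra else 0)
        finish_times.append(share / speed)
    return max(finish_times)


def dynamic_makespan(n_tasks, speeds):
    """Hand each task to whichever worker becomes free first."""
    free_at = [(0.0, index) for index in range(len(speeds))]
    heapq.heapify(free_at)
    makespan = 0.0
    for _ in range(n_tasks):
        now, index = heapq.heappop(free_at)
        done = now + 1.0 / speeds[index]
        makespan = max(makespan, done)
        heapq.heappush(free_at, (done, index))
    return makespan


# One free processor (speed 1.0) and one shared with another program (0.5).
print(static_makespan(12, [1.0, 0.5]))   # 12.0
print(dynamic_makespan(12, [1.0, 0.5]))  # 8.0
```

In this model the even split waits for the slow processor to finish its
six tasks, while the dynamic scheme lets the free processor take eight of
the twelve.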

The third, probably less important, difference is that with MPI-based
parallel Python code you must have MPI installed on the system.

Overall ppsmp is still a work in progress and there are other interesting
features which I would like to implement. This is the main reason why I
have not opened the source of ppsmp: to have better control of its future
development, as advised here: http://en.wikipedia.org/wiki/Freeware :-)

Best regards,
Vitalii



Re: Parallel Python

2007-01-11 Thread parallelpython
>
> Thus there are different levels of parallelization:
>
> 1 file/database based; multiple batch jobs
> 2 Message Passing, IPC, RPC, ...
> 3 Object Sharing
> 4 Sharing of global data space (Threads)
> 5 Local parallelism / Vector computing, MMX, 3DNow,...
>
> There are good reasons for all of these levels.
> Yet "parallel python" to me fakes to be on level 3 or 4 (or even 5 :-) ), 
> while its just a level 2
> system, where "passing", "remote", "inter-process" ... are the right vocables.
In one of the previous posts I mentioned that ppsmp is based on
processes + IPC, which makes it a system with level 2 parallelization,
the same level as MPI.
That also follows from the fact that it's written completely in Python,
as Python objects cannot be shared due to the GIL (POSH can do sharing
because it's an extension written in C).
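
A small sketch of what "level 2" means in practice: data sent to another
process is serialized and rebuilt there, so the receiver works on a copy.
The use of pickle as the wire format is an assumption made for
illustration, and send_through_pipe is a hypothetical stand-in for real
IPC.

```python
import pickle


def send_through_pipe(obj):
    """Stand-in for IPC: serialize the object, then rebuild it on the 'other side'."""
    return pickle.loads(pickle.dumps(obj))


original = {"values": [1, 2, 3]}
received = send_through_pipe(original)
received["values"].append(4)  # the worker mutates its own copy

print(original["values"])  # the sender's object is unchanged
print(received["values"])
```
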



Re: Parallel Python

2007-01-13 Thread parallelpython
> Looks interesting, but is there any way to use this for a cluster of
> machines over a network (not smp)?

Networking capabilities will be included in the next release of
Parallel Python software (http://www.parallelpython.com), which is
coming soon.


> Couldn't you just provide similar conveniences on top of MPI? Searching
> for "Python MPI" yields a lot of existing work (as does "Python PVM"),
> so perhaps someone has already done so.

Yes, it's possible to do it on top of any environment which
supports IPC.

> That's one more project... It seems that there is significant
> interest in parallel computing in Python. Perhaps we should start a
> special interest group? Not so much in order to work on a single
> project; I believe that at the current state of parallel computing we
> still need many different approaches to be tried. But an exchange of
> experience could well be useful for all of us.
Well, I would just add that everybody is welcome to start a discussion
about any parallel Python project or idea in this forum:
http://www.parallelpython.com/component/option,com_smf/Itemid,29/board,2.0



Re: Threaded for loop

2007-01-13 Thread parallelpython
John wrote:
> I want to do something like this:
>
> for i = 1 in range(0,N):
>  for j = 1 in range(0,N):
>   D[i][j] = calculate(i,j)
>
> I would like to now do this using a fixed number of threads, say 10
> threads.
> What is the easiest way to do the "parfor" in python?
>
> Thanks in advance for your help,

As was already mentioned, threads will not help in terms of
parallelism (only one thread will actually be working at a time). If you
want to calculate this in parallel, here is an easy solution:

import ppsmp

# N, calculate and D are defined as in the question
# start with 10 processes
srv = ppsmp.Server(10)

f = []

for i in range(0, N):
    for j in range(0, N):
        # it might be a little more complex if 'calculate' depends on
        # other modules or calls other functions
        f.append(srv.submit(calculate, (i, j)))

for i in range(0, N):
    for j in range(0, N):
        # calling the job object waits for and returns its result
        D[i][j] = f.pop(0)()
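
For comparison, a similar "parfor" can be sketched with the standard
concurrent.futures module. This is not ppsmp code: the calculate body
is a hypothetical stand-in for the one in the question, and
parallel_matrix is an illustrative helper name.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from itertools import product


def calculate(i, j):
    """Hypothetical stand-in for the poster's calculate(i, j)."""
    return i * j


def calculate_pair(pair):
    """Unpack one (i, j) pair so the function can be used with map()."""
    return calculate(*pair)


def parallel_matrix(n, workers=10, executor_cls=ProcessPoolExecutor):
    """Fill an n x n matrix by evaluating calculate(i, j) in a pool of workers."""
    pairs = list(product(range(n), repeat=2))
    with executor_cls(max_workers=workers) as executor:
        # map() returns results in the same order as `pairs`.
        values = list(executor.map(calculate_pair, pairs))
    return [values[row * n:(row + 1) * n] for row in range(n)]


if __name__ == "__main__":
    print(parallel_matrix(3))
```
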

You can get the latest version of ppsmp module here:
http://www.parallelpython.com/



Re: Threaded for loop

2007-01-14 Thread parallelpython
John wrote:
> Thanks. Does it matter if I call shell commands os.system...etc in
> calculate?
>
> Thanks,
> --j

The os.system command ignores important changes in the environment
(redirected streams) and will not work with the current version of ppsmp.
There is a very simple workaround, though: use
print(os.popen("yourcommand").read())
instead of os.system("yourcommand")
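
The reason the workaround helps: a command started with os.system writes
straight to the inherited output file descriptor, so a replaced
sys.stdout never sees it, whereas output that is captured and then
printed does go through sys.stdout. A sketch of the same idea with the
subprocess module follows; run_and_print is a hypothetical helper, and
the StringIO redirection only imitates what a job runner might do.

```python
import contextlib
import io
import subprocess
import sys


def run_and_print(args):
    """Run a command, capture its output, and print it via sys.stdout."""
    completed = subprocess.run(args, capture_output=True, text=True, check=True)
    print(completed.stdout, end="")


# Redirect Python-level stdout, as a job runner might do to collect output.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    run_and_print([sys.executable, "-c", "print('hello from child')"])

captured = buffer.getvalue()
```
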


Here is a complete working example of that code:
http://www.parallelpython.com/component/option,com_smf/Itemid,29/topic,13.0



Re: Parallel Python

2007-02-04 Thread parallelpython
On Jan 12, 11:52 am, Neal Becker <[EMAIL PROTECTED]> wrote:
> [EMAIL PROTECTED] wrote:
> > Has anybody tried to run parallel python applications?
> > It appears that if your application is computation-bound using 'thread'
> > or 'threading' modules will not get you any speedup. That is because
> > python interpreter uses GIL(Global Interpreter Lock) for internal
> > bookkeeping. The later allows only one python byte-code instruction to
> > be executed at a time even if you have a multiprocessor computer.
> > To overcome this limitation, I've created ppsmp module:
> > http://www.parallelpython.com
> > It provides an easy way to run parallel python applications on smp
> > computers.
> > I would appreciate any comments/suggestions regarding it.
> > Thank you!
>
> Looks interesting, but is there any way to use this for a cluster of
> machines over a network (not smp)?

There are 2 major updates regarding Parallel Python:
http://www.parallelpython.com

1) Now (since version 1.2) Parallel Python can be used for
cluster-wide parallelization (or even Internet-wide). It has also been
renamed accordingly: pp (the module is backward compatible with ppsmp)

2) Parallel Python became open source (under the BSD license):
http://www.parallelpython.com/content/view/18/32/



Re: Can Parallel Python run on a muti-CPU server ?

2007-02-09 Thread parallelpython
Hi,

That is definitely possible!
To achieve the best performance, split your calculation either into 128
equal parts or into many more than 128 (>>128) parts of any size (load
balancing will then spread the workload evenly). Let us know the results;
if you need any help with parallelization, feel free to request it here:
http://www.parallelpython.com/component/option,com_smf/Itemid,29/
Thank you!
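
A minimal sketch of the "equal parts" option, assuming the work can be
indexed as range(total). The helper name split_range is hypothetical;
each (start, stop) pair would become the argument of one submitted job.

```python
def split_range(total, parts):
    """Split range(total) into `parts` contiguous (start, stop) chunks of near-equal size."""
    base, extra = divmod(total, parts)
    chunks = []
    start = 0
    for index in range(parts):
        # The first `extra` chunks take one additional item each.
        stop = start + base + (1 if index < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


print(split_range(10, 3))  # [(0, 4), (4, 7), (7, 10)]
```
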

On Feb 7, 2:13 am, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
wrote:
> Hi all,
>
> I'm interested in Parallel Python and I learned from the website of Parallel Python
> that it can run on SMP and clusters. But can it run on our multi-CPU
> server ?
> We are running an origin3800 server with 128 CPUs.
>
> Thanks.

