Re: [OMPI users] speed up this problem by MPI

2010-01-29 Thread Jed Brown
On Fri, 29 Jan 2010 11:25:09 -0500, Richard Treumann  
wrote:
> Any support for automatic serialization of C++ objects would need to be in
> some sophisticated utility that is not part of MPI.  There may be such
> utilities but I do not think anyone who has been involved in the discussion
> knows of one you can use.  I certainly do not.

C++ really doesn't offer sufficient type introspection to implement
something like this.  Boost.MPI offers serialization for a few types
(e.g. some STL containers), but the general solution that you would like
just doesn't exist (you'd have to write special code for every type you
want to be able to operate on).
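
For what it's worth, sending an STL container with Boost.MPI looks something
like this (an untested sketch; assumes Boost.MPI and Boost.Serialization are
built and linked, and at least two processes):

 #include <boost/mpi.hpp>
 #include <boost/serialization/vector.hpp>
 #include <vector>

 int main(int argc, char **argv)
 {
     boost::mpi::environment env(argc, argv);  // wraps MPI_Init/MPI_Finalize
     boost::mpi::communicator world;

     if (world.rank() == 0) {
         std::vector<double> coeff(3, 1.0);
         world.send(1, 0, coeff);              // serialized automatically
     } else if (world.rank() == 1) {
         std::vector<double> coeff;
         world.recv(0, 0, coeff);              // resized and filled on receipt
     }
     return 0;
 }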

Python can do things like this, mpi4py can operate transparently on any
(pickleable) object, and also offers complete bindings to the low-level
MPI interface.  CL-MPI (Common Lisp) can also do these things, but it's
much less mature than mpi4py.

Jed


Re: [OMPI users] speed up this problem by MPI

2010-01-29 Thread Eugene Loh

Tim wrote:


By serialization, I mean in the context of data storage and transmission. See 
http://en.wikipedia.org/wiki/Serialization

e.g., if a structure or class contains a pointer to memory outside itself, one 
has to send the contents of that memory in addition to the structure or class 
itself, right?
 

Okay, yes.  There are also MPI_Pack/MPI_Unpack functions that take 
general data types and pack them into contiguous (serialized) buffers.  
But you first have to describe to MPI what those data structures look 
like.  And it can certainly get complicated.
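
The pack side of that approach looks roughly like this (an untested sketch; the
struct and its out-of-line array are invented for illustration, and the
receiver would call MPI_Unpack on the MPI_PACKED message in the same order):

 #include <mpi.h>

 struct Sample { int n; double *data; };  // data lives outside the struct

 void send_sample(Sample s, int dest, MPI_Comm comm)
 {
     int bytes_int = 0, bytes_dbl = 0;
     MPI_Pack_size(1, MPI_INT, comm, &bytes_int);
     MPI_Pack_size(s.n, MPI_DOUBLE, comm, &bytes_dbl);

     int size = bytes_int + bytes_dbl;
     int position = 0;
     char *buf = new char[size];
     MPI_Pack(&s.n, 1, MPI_INT, buf, size, &position, comm);        // length first
     MPI_Pack(s.data, s.n, MPI_DOUBLE, buf, size, &position, comm); // then payload
     MPI_Send(buf, position, MPI_PACKED, dest, 0, comm);
     delete [] buf;
 }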


I think I don't have much else to contribute here.  There are lots of 
options and decisions to make based on the particulars of your data 
structures.  The general problem can certainly be complicated, as you've 
already indicated.  Good luck.


Re: [OMPI users] speed up this problem by MPI

2010-01-29 Thread Richard Treumann

Tim

MPI is a library providing support for passing messages among several
distinct processes.  It offers datatype constructors that let an
application describe complex layouts of data in the local memory of a
process so a message can be sent from a complex data layout or received
into a complex layout.

MPI does not have access to decisions made by the C++ compiler or the C++
runtime so the MPI library cannot deduce the layout for you.  To use MPI
you must either organize the data in some way that is easy to describe with
MPI datatypes or you must do rather complex data type constructions for
every message sent or received.
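
For a simple fixed-layout struct, such a construction looks roughly like this
(an untested sketch; the struct is invented for illustration, and anything
holding pointers or STL members needs the per-object work described above):

 #include <mpi.h>

 struct Particle { int id; double pos[3]; };

 // Build an MPI datatype matching Particle's in-memory layout.
 MPI_Datatype make_particle_type()
 {
     Particle p;
     int blocklens[2] = {1, 3};
     MPI_Aint base, disps[2];
     MPI_Get_address(&p, &base);
     MPI_Get_address(&p.id, &disps[0]);
     MPI_Get_address(&p.pos, &disps[1]);
     disps[0] -= base;
     disps[1] -= base;
     MPI_Datatype types[2] = {MPI_INT, MPI_DOUBLE};
     MPI_Datatype particle_type;
     MPI_Type_create_struct(2, blocklens, disps, types, &particle_type);
     MPI_Type_commit(&particle_type);
     return particle_type;  // now usable to send/receive arrays of Particle
 }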

Any support for automatic serialization of C++ objects would need to be in
some sophisticated utility that is not part of MPI.  There may be such
utilities but I do not think anyone who has been involved in the discussion
knows of one you can use.  I certainly do not.

 Dick

Dick Treumann  -  MPI Team
IBM Systems & Technology Group
Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
Tele (845) 433-7846 Fax (845) 433-8363




  
From: Tim <timlee...@yahoo.com>
To: Open MPI Users <us...@open-mpi.org>
Date: 01/29/2010 11:11 AM
Subject: Re: [OMPI users] speed up this problem by MPI
Sent by: users-boun...@open-mpi.org


By serialization, I mean in the context of data storage and transmission.
See http://en.wikipedia.org/wiki/Serialization

e.g., if a structure or class contains a pointer to memory outside itself, one
has to send the contents of that memory in addition to the structure or class
itself, right?

--- On Fri, 1/29/10, Eugene Loh <eugene@sun.com> wrote:

> From: Eugene Loh <eugene....@sun.com>
> Subject: Re: [OMPI users] speed up this problem by MPI
> To: "Open MPI Users" <us...@open-mpi.org>
> Date: Friday, January 29, 2010, 11:06 AM
> Tim wrote:
>
> > Sorry, my typo. I meant to say OpenMPI documentation.
> >
> Okay.  "Open (space) MPI" is simply an implementation
> of the MPI standard -- e.g.,
http://www.mpi-forum.org/docs/mpi21-report.pdf .
> I imagine an on-line search will turn up a variety of
> tutorials and explanations of that standard.  But the
> standard, itself, is somewhat readable.
>
> > How to send/receive and broadcast objects of
> self-defined class and of std::vector? If using
> MPI_Type_struct, the setup becomes complicated if the class
> has various types of data members, and a data member of
> another class.
> >
> I don't really know any C++, but I guess you're looking at
> it the right way.  That is, use derived MPI data types
> and "it's complicated".
>
> > How to deal with serialization problems?
> >
> Which serialization problems?  You seem to have a
> split/join problem.  The master starts, at some point
> there is parallel computation, then the master does more
> work at the end.







Re: [OMPI users] speed up this problem by MPI

2010-01-29 Thread Tim
By serialization, I mean in the context of data storage and transmission. See 
http://en.wikipedia.org/wiki/Serialization

e.g., if a structure or class contains a pointer to memory outside itself, one 
has to send the contents of that memory in addition to the structure or class 
itself, right?

--- On Fri, 1/29/10, Eugene Loh <eugene@sun.com> wrote:

> From: Eugene Loh <eugene@sun.com>
> Subject: Re: [OMPI users] speed up this problem by MPI
> To: "Open MPI Users" <us...@open-mpi.org>
> Date: Friday, January 29, 2010, 11:06 AM
> Tim wrote:
> 
> > Sorry, my typo. I meant to say OpenMPI documentation.
> > 
> Okay.  "Open (space) MPI" is simply an implementation
> of the MPI standard -- e.g., http://www.mpi-forum.org/docs/mpi21-report.pdf . 
> I imagine an on-line search will turn up a variety of
> tutorials and explanations of that standard.  But the
> standard, itself, is somewhat readable.
> 
> > How to send/receive and broadcast objects of
> self-defined class and of std::vector? If using
> MPI_Type_struct, the setup becomes complicated if the class
> has various types of data members, and a data member of
> another class.
> >  
> I don't really know any C++, but I guess you're looking at
> it the right way.  That is, use derived MPI data types
> and "it's complicated".
> 
> > How to deal with serialization problems?
> >  
> Which serialization problems?  You seem to have a
> split/join problem.  The master starts, at some point
> there is parallel computation, then the master does more
> work at the end.






Re: [OMPI users] speed up this problem by MPI

2010-01-29 Thread Eugene Loh

Tim wrote:


Sorry, my typo. I meant to say OpenMPI documentation.

Okay.  "Open (space) MPI" is simply an implementation of the MPI 
standard -- e.g., http://www.mpi-forum.org/docs/mpi21-report.pdf .  I 
imagine an on-line search will turn up a variety of tutorials and 
explanations of that standard.  But the standard, itself, is somewhat 
readable.



How to send/receive and broadcast objects of self-defined class and of 
std::vector? If using MPI_Type_struct, the setup becomes complicated if the 
class has various types of data members, and a data member of another class.
 

I don't really know any C++, but I guess you're looking at it the right 
way.  That is, use derived MPI data types and "it's complicated".



How to deal with serialization problems?
 

Which serialization problems?  You seem to have a split/join problem.  
The master starts, at some point there is parallel computation, then the 
master does more work at the end.


Re: [OMPI users] speed up this problem by MPI

2010-01-29 Thread Tim
Sorry, my typo. I meant to say OpenMPI documentation. 

How to send/receive and broadcast objects of self-defined class and of 
std::vector? If using MPI_Type_struct, the setup becomes complicated if the 
class has various types of data members, and a data member
of another class.

How to deal with serialization problems?

Are there some good references for these problems?

--- On Fri, 1/29/10, Eugene Loh <eugene@sun.com> wrote:

> From: Eugene Loh <eugene@sun.com>
> Subject: Re: [OMPI users] speed up this problem by MPI
> To: "Open MPI Users" <us...@open-mpi.org>
> Date: Friday, January 29, 2010, 10:39 AM
> Tim wrote:
> 
> > BTW: I would like to find some official documentation
> of OpenMP, but there seems to be none?
> >  
> OpenMP (a multithreading specification) has "nothing" to do
> with Open MPI (an implementation of MPI, a message-passing
> specification).  Assuming you meant OpenMP, try their
> web site:  http://openmp.org






Re: [OMPI users] speed up this problem by MPI

2010-01-29 Thread Eugene Loh

Tim wrote:


BTW: I would like to find some official documentation of OpenMP, but there 
seems to be none?
 

OpenMP (a multithreading specification) has "nothing" to do with Open 
MPI (an implementation of MPI, a message-passing specification).  
Assuming you meant OpenMP, try their web site:  http://openmp.org


Re: [OMPI users] speed up this problem by MPI

2010-01-29 Thread Tim
Thanks!

How to send/receive and broadcast objects of self-defined class and of 
std::vector? 

How to deal with serialization problems?

BTW: I would like to find some official documentation of OpenMP, but there 
seems to be none?

--- On Fri, 1/29/10, Eugene Loh <eugene@sun.com> wrote:

> From: Eugene Loh <eugene@sun.com>
> Subject: Re: [OMPI users] speed up this problem by MPI
> To: "Open MPI Users" <us...@open-mpi.org>
> Date: Friday, January 29, 2010, 12:50 AM
> Tim wrote:
> 
> > Sorry, complicated_computation() and f() are
> simplified too much. They do take more inputs. 
> > Among the inputs to complicated_computation(), some are
> passed from the main() to f() by address since it is a big
> array, some are passed by value, some are created inside f()
> before the call to complicated_computation(). 
> > so actually (although not exactly) the code is like:
> >  
> I think I'm agreeing with Terry.  But, to add more
> detail:
> 
> >     int main(int argc, char ** argv)
> >     {
> >         int size;
> >         double *feature = new double[1000];
> >         // compute values of elements of "feature"
> >         // some operations
> >
> The array "feature" can be computed by the master and then
> broadcast, or it could be computed redundantly by each
> process.
>
> >         f(size, feature);
> >         // some operations
> >         delete [] feature;
> >         return 0;
> >     }
> >
> >     void f(int size, double *feature)
> >     {
> >         vector<double> coeff;
> >         // read from a file into elements of coeff
> >
> Similarly, coeff can be read in by the master and then
> broadcast, or it could be read redundantly by each process,
> or each process could read only the portion that it will
> need.
>
> >         MyClass myobj;
> >         double * array = new double [coeff.size()];
> >         for (int i = 0; i < coeff.size(); i++) // need to speed up by MPI.
> >         {
> >             array[i] = myobj.complicated_computation(size, coeff[i], feature); // time consuming
> >         }
> >
> Each process loops only over the iterations that correspond
> to its rank.  Then, the master gathers all results.
>
> >         // some operations using all elements in array
> >         delete [] array;
> >     }
> >  
> Once the slaves have finished their computations and sent
> their results to the master, they may exit.  The slaves
> will be launched at the same time as the master, but
> presumably have less to do than the master does before the
> "parallel loop" starts.  If you don't want slaves
> consuming excessive CPU time while they wait for the master,
> fix that problem later once you have the basic code
> working.






Re: [OMPI users] speed up this problem by MPI

2010-01-29 Thread Eugene Loh

Tim wrote:

Sorry, complicated_computation() and f() are simplified too much. They do take more inputs. 

Among the inputs to complicated_computation(), some are passed from the main() to f() by address since it is a big array, some are passed by value, some are created inside f() before the call to complicated_computation(). 


so actually (although not exactly) the code is like:
 


I think I'm agreeing with Terry.  But, to add more detail:

int main(int argc, char ** argv)
{
    int size;
    double *feature = new double[1000];
    // compute values of elements of "feature"
    // some operations

The array "feature" can be computed by the master and then broadcast, or 
it could be computed redundantly by each process.


    f(size, feature);
    // some operations
    delete [] feature;
    return 0;
}

void f(int size, double *feature)
{
    vector<double> coeff;
    // read from a file into elements of coeff


Similarly, coeff can be read in by the master and then broadcast, or it 
could be read redundantly by each process, or each process could read 
only the portion that it will need.




    MyClass myobj;
    double * array = new double [coeff.size()];
    for (int i = 0; i < coeff.size(); i++) // need to speed up by MPI.
    {
        array[i] = myobj.complicated_computation(size, coeff[i], feature); // time consuming
    }
 

Each process loops only over the iterations that correspond to its 
rank.  Then, the master gathers all results.


    // some operations using all elements in array
    delete [] array;
}
 

Once the slaves have finished their computations and sent their results 
to the master, they may exit.  The slaves will be launched at the same 
time as the master, but presumably have less to do than the master does 
before the "parallel loop" starts.  If you don't want slaves consuming 
excessive CPU time while they wait for the master, fix that problem 
later once you have the basic code working.
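
Putting those pieces together, f() might become something like the following
(an untested sketch: block partition with a gather at rank 0; MyClass, the file
read, and the 1000-element feature array are stand-ins from the posted code,
coeff is assumed to be a vector<double>, and each rank is assumed to get at
least one iteration):

 #include <mpi.h>
 #include <vector>

 void f(int size, double *feature)
 {
     int rank, nprocs;
     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
     MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

     std::vector<double> coeff;
     if (rank == 0) { /* read from a file into elements of coeff */ }
     int n = (int) coeff.size();
     MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
     coeff.resize(n);
     MPI_Bcast(&coeff[0], n, MPI_DOUBLE, 0, MPI_COMM_WORLD);
     MPI_Bcast(feature, 1000, MPI_DOUBLE, 0, MPI_COMM_WORLD);

     // Block partition: rank r owns iterations [lo, hi).
     int lo = rank * n / nprocs, hi = (rank + 1) * n / nprocs;
     std::vector<double> local(hi - lo);
     MyClass myobj;
     for (int i = lo; i < hi; i++)
         local[i - lo] = myobj.complicated_computation(size, coeff[i], feature);

     // Gather the (possibly uneven) slices back to rank 0.
     std::vector<int> counts(nprocs), displs(nprocs);
     for (int r = 0; r < nprocs; r++) {
         displs[r] = r * n / nprocs;
         counts[r] = (r + 1) * n / nprocs - displs[r];
     }
     std::vector<double> array(rank == 0 ? n : 1);
     MPI_Gatherv(&local[0], hi - lo, MPI_DOUBLE,
                 &array[0], &counts[0], &displs[0], MPI_DOUBLE,
                 0, MPI_COMM_WORLD);

     if (rank == 0) { /* some operations using all elements in array */ }
 }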


Re: [OMPI users] speed up this problem by MPI

2010-01-29 Thread Terry Frankcombe
In rank 0's main, broadcast feature to all processes.
In f, calculate a slice of array based on rank, then either send/recv
back to rank 0, or maybe gather.
Only rank 0 does everything else.  (The other ranks must call f after
recv'ing feature.)
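
In MPI terms the first of those steps is just (1000 being the length of
feature in the posted code):

 // Executed on every rank after MPI_Init: rank 0's feature
 // values are copied into every other rank's feature array.
 MPI_Bcast(feature, 1000, MPI_DOUBLE, 0, MPI_COMM_WORLD);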


On Thu, 2010-01-28 at 21:23 -0800, Tim wrote:
> Sorry, complicated_computation() and f() are simplified too much. They do 
> take more inputs. 
> 
> Among the inputs to complicated_computation(), some are passed from the main() 
> to f() by address since it is a big array, some are passed by value, some are 
> created inside f() before the call to complicated_computation(). 
> 
> so actually (although not exactly) the code is like:
> 
>  int main(int argc, char ** argv)
>  {
>      int size;
>      double *feature = new double[1000];
>      // compute values of elements of "feature"
>      // some operations
>      f(size, feature);
>      // some operations
>      delete [] feature;
>      return 0;
>  }
>
>  void f(int size, double *feature)
>  {
>      vector<double> coeff;
>      // read from a file into elements of coeff
>      MyClass myobj;
>      double * array = new double [coeff.size()];
>      for (int i = 0; i < coeff.size(); i++) // need to speed up by MPI.
>      {
>          array[i] = myobj.complicated_computation(size, coeff[i], feature); // time consuming
>      }
>      // some operations using all elements in array
>      delete [] array;
>  }
> 
> --- On Thu, 1/28/10, Eugene Loh <eugene@sun.com> wrote:
> 
> > From: Eugene Loh <eugene@sun.com>
> > Subject: Re: [OMPI users] speed up this problem by MPI
> > To: "Open MPI Users" <us...@open-mpi.org>
> > Date: Thursday, January 28, 2010, 11:40 PM
> > Tim wrote:
> > 
> > > Thanks Eugene!
> > > 
> > > My case, after simplification, is to speed up the
> > time-consuming computation in the loop below by assigning
> > iterations to several nodes in a cluster by MPI. Each
> > iteration of the loop computes each element of an array. The
> > computation of each element is independent of others in the
> > array.
> > >     int main(int argc, char ** argv)
> > >     {
> > >         // some operations
> > >         f(size);
> > >         // some operations
> > >         return 0;
> > >     }
> > >
> > >     void f(int size)
> > >     {
> > >         // some operations
> > >         int i;
> > >         double * array = new double [size];
> > >         for (i = 0; i < size; i++) // need to speed up by MPI.
> > >         {
> > >             array[i] = complicated_computation(); // time consuming
> >
> > What are the inputs to complicated_computation()?
> > Does each process know what the inputs are?  Or, do
> > they need to come from the master process?  Are there
> > many inputs?
> >
> > >         }
> > >         // some operations using all elements in array
> > >         delete [] array;
> > >     }



Re: [OMPI users] speed up this problem by MPI

2010-01-29 Thread Tim
Sorry, complicated_computation() and f() are simplified too much. They do take 
more inputs. 

Among the inputs to complicated_computation(), some are passed from the main() 
to f() by address since it is a big array, some are passed by value, some are 
created inside f() before the call to complicated_computation(). 

so actually (although not exactly) the code is like:

 int main(int argc, char ** argv)   
 {   
  int size;
  double *feature = new double[1000];
 // compute values of elements of "feature"
 // some operations  
 f(size, feature);   
 // some operations  
 delete [] feature;   
 return 0;   
 }   

 void f(int size, double *feature)   
 {   
 vector<double> coeff;
 // read from a file into elements of coeff
 MyClass myobj;
 double * array =  new double [coeff.size()];   
 for (int i = 0; i < coeff.size(); i++) // need to speed up by MPI.
 {
 array[i] = myobj.complicated_computation(size, coeff[i], feature); // time consuming
 }   
 // some operations using all elements in array 
 delete [] array;
 }

--- On Thu, 1/28/10, Eugene Loh <eugene@sun.com> wrote:

> From: Eugene Loh <eugene@sun.com>
> Subject: Re: [OMPI users] speed up this problem by MPI
> To: "Open MPI Users" <us...@open-mpi.org>
> Date: Thursday, January 28, 2010, 11:40 PM
> Tim wrote:
> 
> > Thanks Eugene!
> > 
> > My case, after simplified, is to speed up the
> time-consuming computation in the loop below by assigning
> iterations to several nodes in a cluster by MPI. Each
> iteration of the loop computes each element of an array. The
> computation of each element is independent of others in the
> array.
> >     int main(int argc, char ** argv)
> >     {
> >         // some operations
> >         f(size);
> >         // some operations
> >         return 0;
> >     }
> >
> >     void f(int size)
> >     {
> >         // some operations
> >         int i;
> >         double * array = new double [size];
> >         for (i = 0; i < size; i++) // need to speed up by MPI.
> >         {
> >             array[i] = complicated_computation(); // time consuming
> >
> What are the inputs to complicated_computation()?
> Does each process know what the inputs are?  Or, do
> they need to come from the master process?  Are there
> many inputs?
>
> >         }
> >         // some operations using all elements in array
> >         delete [] array;
> >     }






Re: [OMPI users] speed up this problem by MPI

2010-01-28 Thread Eugene Loh

Tim wrote:


Thanks Eugene!

My case, after simplification, is to speed up the time-consuming computation in the 
loop below by assigning iterations to several nodes in a cluster by MPI. Each 
iteration of the loop computes each element of an array. The computation of 
each element is independent of others in the array.
  
int main(int argc, char ** argv)
{
    // some operations
    f(size);
    // some operations
    return 0;
}

void f(int size)
{
    // some operations
    int i;
    double * array = new double [size];
    for (i = 0; i < size; i++) // need to speed up by MPI.
    {
        array[i] = complicated_computation(); // time consuming
 

What are the inputs to complicated_computation()?  Does each process 
know what the inputs are?  Or, do they need to come from the master 
process?  Are there many inputs?


    }
    // some operations using all elements in array
    delete [] array;
}
 



Re: [OMPI users] speed up this problem by MPI

2010-01-28 Thread Gus Correa

Hi Tim

Your OpenMP layout suggests that there are no data dependencies
in your "complicated_computation()" and the operations therein
are local.
I will assume this is true in what I suggest.

In MPI you could use MPI_Scatter to distribute the (initial)
array values before the computational loop,
and MPI_Gather to collect the results after the loop.
This approach would stay relatively close
to your current program logic/structure.

The process that distributes and collects the array,
typically rank 0, takes responsibility to read/initialize,
and write/report the results.
Normally it also takes part in the computation,
as there is no reason for it to be just the "master",
and sit idle while the "slave" processes do the work.

On this ("master", rank 0) process the array would be allocated with
the "global" "size".
On the remaining processes ("slaves"), the allocated array
could be smaller, just as big as to hold the array segment that is 
computed/manipulated there.

How much memory you need to allocate depends on how many
processes you launch, and can be controlled dynamically,
at run time (see below).

At the very beginning of the program you need to
1) initialize MPI (MPI_Init),
2) get each process rank (MPI_Comm_rank), and
3) get the number of processes (MPI_Comm_size).
Memory allocation would probably come after that,
once you know how many processes are at work.

At the end of the program you need to
4) shut MPI down (MPI_Finalize).

In OpenMP you can use $OMP_NUM_THREADS to decide at run time
how many threads to use.
In MPI this is done when you launch the executable
by the mpirun command:  "mpirun -n $NPROC my_mpi_executable",
where $NPROC is the counterpart of $OMP_NUM_THREADS,
i.e., the number of processes you want to launch.
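
A skeleton of that structure might look like the following (untested; only the
gather side is shown since the loop in question has no per-element input array,
complicated_computation() is from the original post, and size is assumed
divisible by the number of processes for brevity):

 #include <mpi.h>

 double complicated_computation();  // from the original post

 int main(int argc, char **argv)
 {
     MPI_Init(&argc, &argv);                  // 1) initialize MPI
     int rank, nprocs;
     MPI_Comm_rank(MPI_COMM_WORLD, &rank);    // 2) my rank
     MPI_Comm_size(MPI_COMM_WORLD, &nprocs);  // 3) number of processes

     const int size = 1000;                   // assume size % nprocs == 0
     int local_n = size / nprocs;
     double *local = new double[local_n];     // each rank holds only a segment
     double *array = (rank == 0) ? new double[size] : 0;

     for (int i = 0; i < local_n; i++)
         local[i] = complicated_computation();  // every rank does its share

     MPI_Gather(local, local_n, MPI_DOUBLE,
                array, local_n, MPI_DOUBLE, 0, MPI_COMM_WORLD);

     if (rank == 0) { /* some operations using all elements in array */ }

     delete [] local;
     delete [] array;                         // deleting a null pointer is fine
     MPI_Finalize();                          // 4) shut MPI down
     return 0;
 }

Compiled with the MPI wrapper compiler (mpicc/mpicxx) and launched as above,
e.g. "mpirun -n 4 ./my_mpi_executable".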

If you have access to a library, check Peter S. Pacheco's book
"Parallel Programming with MPI", as it has examples similar to
your problem, and will get you going with MPI in no time.
You will also need to check the syntactic details of the MPI functions.

I hope this helps.
Gus Correa
-
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
-



Tim wrote:

Hi,

(1). I am wondering how I can speed up the time-consuming computation in the 
loop of my code below using MPI?
   
 int main(int argc, char ** argv)   
 {   
 // some operations   
 f(size);   
 // some operations 
 return 0;   
 }   

 void f(int size)   
 {   
 // some operations  
 int i;   
 double * array =  new double [size];   
 for (i = 0; i < size; i++) // how can I use MPI to speed up this loop to compute all elements in the array?   
 {   
 array[i] = complicated_computation(); // time consuming computation   
 }   
 // some operations using all elements in array   
 delete [] array;  
 }


As shown in the code, I want to do some operations before and after the part to 
be parallelized with MPI, but I don't know how to specify where the parallel part 
begins and ends.

(2) My current code is using OpenMP to speed up the computation. 

 void f(int size)   
 {   
 // some operations   
 int i;   
 double * array =  new double [size];   
 omp_set_num_threads(_nb_threads);  
 #pragma omp parallel shared(array) private(i)  
 {
 #pragma omp for schedule(dynamic) nowait  
 for (i = 0; i < size; i++) // how can I use MPI to speed up this loop to compute all elements in the array?   
 {   
 array[i] = complicated_computation(); // time consuming computation   
 }  
 } 
 // some operations using all elements in array   
 }


I wonder, if I change to MPI, is it possible to have the code written for both 
OpenMP and MPI? If so, how should I write the code, and how do I compile and 
run it?

Thanks and regards!


  




Re: [OMPI users] speed up this problem by MPI

2010-01-28 Thread Tim
Thanks Eugene!

My case, after simplification, is to speed up the time-consuming computation in the 
loop below by assigning iterations to several nodes in a cluster by MPI. Each 
iteration of the loop computes each element of an array. The computation of 
each element is independent of others in the array.

 int main(int argc, char ** argv)   
 {   
 // some operations   
 f(size);   
 // some operations 
 return 0;   
 }   

 void f(int size)   
 {   
 // some operations 
 int i;   
 double * array =  new double [size];   
 for (i = 0; i < size; i++) // need to speed up by MPI.
 {   
 array[i] = complicated_computation(); // time consuming
 }   
 // some operations using all elements in array   
 delete [] array; 
 }

--- On Thu, 1/28/10, Eugene Loh <eugene@sun.com> wrote:

> From: Eugene Loh <eugene@sun.com>
> Subject: Re: [OMPI users] speed up this problem by MPI
> To: "Open MPI Users" <us...@open-mpi.org>
> Date: Thursday, January 28, 2010, 8:31 PM
> Tim wrote:
> 
> > Thanks, Eugene.
> > 
> > I admit I am not smart enough to understand well how to
> use MPI, but I did read some basic materials about it and
> understand how some simple problems are solved by MPI. 
> > But dealing with an array in my case, I am not certain
> about how to apply MPI to it. Are you saying to use send and
> receive to transfer the value computed for each element from
> child process to parent process?
> > 
> You can, but typically that would entail too much
> communication overhead for each element.
> 
> > Do you allocate a copy of the array for each process?
> >  
> You can, but typically that would entail excessive memory
> consumption.
> 
> Typically, one allocates only a portion of the array on
> each process.  E.g., if the array has 10,000 elements
> and you have four processes, the first gets the first 2,500
> elements, the second the next 2,500, and so on.
> 
> > Also I only need the loop that computes every element
> of the array to be parallelized.
> > 
> If you only need the initial computation of array elements
> to be parallelized, perhaps any of the above strategies
> could work.  It depends on how expensive the
> computation of each element is.
> 
> > Someone said that the parallel part begins with
> MPI_Init and ends with MPI_Finalize,
> > 
> Well, usually all processes are launched in parallel. 
> So, the parallel begins "immediately."  Inter-process
> communications using MPI, however, must take place between
> the MPI_Init and MPI_Finalize calls.
> 
> > and one can do any serial computations before and/or
> after these calls. But I have written some MPI programs, and
> found that the parallel part is not restricted between
> MPI_Init and MPI_Finalize, but instead the whole program. If
> the rest part of the code has to be wrapped for process with
> ID 0, I have little idea about how to apply that to my case
> since the rest part would be the parts before and after the
> loop in the function and the whole in main().
> > 
> I don't understand your case very clearly.  I will
> take a guess.  You could have all processes start and
> call MPI_Init.  Then, slave processes can go to sleep,
> waking occasionally to check if the master has sent a signal
> to begin computation.  The master does what it has to
> do and then sends wake signals.  Each slave computes
> its portion and sends that portion back to the master. 
> Each slave exits.  The master gathers all the pieces
> and resumes its computation.  Does that sound right?






Re: [OMPI users] speed up this problem by MPI

2010-01-28 Thread Terry Frankcombe
On Thu, 2010-01-28 at 17:05 -0800, Tim wrote:
> Also I only need the loop that computes every element of the array to
> be parallelized. Someone said that the parallel part begins with
> MPI_Init and ends with MPI_Finalize, and one can do any serial
> computations before and/or after these calls. But I have written some
> MPI programs, and found that the parallel part is not restricted
> between MPI_Init and MPI_Finalize, but instead the whole program. If
> the rest part of the code has to be wrapped for process with ID 0, I
> have little idea about how to apply that to my case since the rest
> part would be the parts before and after the loop in the function and
> the whole in main().

I think you're being polluted by your OpenMP experience!  ;-)

Unlike in OpenMP, there is no concept of "parallel region" when using
MPI.  MPI allows you to pass data between processes.  That's all.  It's
up to you to write your code in such a way that the data is used to allow
parallel computation.

Often MPI_Init and MPI_Finalize are amongst the first and last things
done in a parallel code, respectively.  They effectively say "set up
stuff so I can pass messages effectively" and "clean that up".  Each
process runs from start to finish "independently".

As an aside, using MPI is much more invasive than OpenMP.  Parallelising
an existing serial code can be hard with MPI.  But if you start from
scratch you usually end up with a better code with MPI than with OpenMP
(e.g. MPI makes you think about data locality, whereas you can ignore
all the bad things bad locality does and still have a working code with
OpenMP.)




Re: [OMPI users] speed up this problem by MPI

2010-01-28 Thread Natarajan CS
Hi Tim,
Sorry to add something in the same vein as Eugene's reply. I think
this is an excellent resource:
http://ci-tutor.ncsa.illinois.edu/login.php. It's a great, detailed online
course! Before I took proper classes, this helped me a lot!!

On Thu, Jan 28, 2010 at 7:05 PM, Tim <timlee...@yahoo.com> wrote:

> Thanks, Eugene.
>
> I admit I am not smart enough to understand well how to use MPI, but I did
> read some basic materials about it and understand how some simple problems
> are solved by MPI.
>
> But dealing with an array in my case, I am not certain about how to apply
> MPI to it. Are you saying to use send and receive to transfer the value
> computed for each element from child process to parent process? Do you
> allocate a copy of the array for each process?
>
> Also I only need the loop that computes every element of the array to be
> parallelized. Someone said that the parallel part begins with MPI_Init and
> ends with MPI_Finalize, and one can do any serial computations before and/or
> after these calls. But I have written some MPI programs, and found that the
> parallel part is not restricted between MPI_Init and MPI_Finalize, but
> instead the whole program. If the rest part of the code has to be wrapped
> for process with ID 0, I have little idea about how to apply that to my case
> since the rest part would be the parts before and after the loop in the
> function and the whole in main().
>
> If someone could give a sample of how to apply MPI in my case, it will
> clarify a lot of my questions. Usually I can learn a lot from good examples.
>
> Thanks!
>
> --- On Thu, 1/28/10, Eugene Loh <eugene....@sun.com> wrote:
>
> > From: Eugene Loh <eugene@sun.com>
> > Subject: Re: [OMPI users] speed up this problem by MPI
> > To: "Open MPI Users" <us...@open-mpi.org>
> > Date: Thursday, January 28, 2010, 7:30 PM
> > Take a look at some introductory MPI
> > materials to learn how to use MPI and what it's about.
> > There should be resources on-line... take a look around.
> >
> > The main idea is that you would have many processes, each
> > process would have part of the array.  Thereafter, if a
> > process needs data or results from any other process, such
> > data would have to be exchanged between the processes
> > explicitly.
> >
> > Many codes have both OpenMP and MPI parallelization, but
> > you should first familiarize yourself with the basics of MPI
> > before dealing with "hybrid" codes.
> >
> > Tim wrote:
> >
> > > Hi,
> > >
> > > (1). I am wondering how I can speed up the
> > time-consuming computation in the loop of my code below
> > using MPI?
> > >     int main(int argc, char ** argv)
> > >     {
> > >         // some operations
> > >         f(size);
> > >         // some operations
> > >         return 0;
> > >     }
> > >
> > >     void f(int size)
> > >     {
> > >         // some operations
> > >         int i;
> > >         double * array = new double [size];
> > >         for (i = 0; i < size; i++) // how can I use MPI to speed up this loop to compute all elements in the array?
> > >         {
> > >             array[i] = complicated_computation(); // time consuming computation
> > >         }
> > >         // some operations using all elements in array
> > >         delete [] array;
> > >     }
> > >
> > > As shown in the code, I want to do some operations
> > before and after the part to be parallelized with MPI, but I
> > don't know how to specify where the parallel part begins and
> > ends.
> > >
> > > (2) My current code is using OpenMP to speed up the
> > computation.
> > >     void f(int size)
> > >     {
> > >         // some operations
> > >         int i;
> > >         double * array = new double [size];
> > >         omp_set_num_threads(_nb_threads);
> > >         #pragma omp parallel shared(array) private(i)
> > >         {
> > >             #pragma omp for schedule(dynamic) nowait
> > >             for (i = 0; i < size; i++) // how can I use MPI to speed up this loop to compute all elements in the array?
> > >             {
> > >                 array[i] = complicated_computation(); // time consuming computation
> > >             }
> > >         }
> > >         // some operations using all elements in array
> > >     }
> > >
> > > I wonder, if I change to MPI, is it possible to
> > have the code written for both OpenMP and MPI? If it is
> > possible, how should I write the code, and how do I compile and
> > run it?
> > >
>
>
>
>


Re: [OMPI users] speed up this problem by MPI

2010-01-28 Thread Eugene Loh

Tim wrote:


Thanks, Eugene.

I admit I am not smart enough to understand well how to use MPI, but I did read some basic materials about it and understand how some simple problems are solved by MPI. 


But dealing with an array in my case, I am not certain about how to apply MPI 
to it. Are you saying to use send and receive to transfer the value computed 
for each element from child process to parent process?

You can, but typically that would entail too much communication overhead 
for each element.



Do you allocate a copy of the array for each process?
 


You can, but typically that would entail excessive memory consumption.

Typically, one allocates only a portion of the array on each process.  
E.g., if the array has 10,000 elements and you have four processes, the 
first gets the first 2,500 elements, the second the next 2,500, and so on.
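
In code, that split might look like (a sketch; rank and nprocs come from
MPI_Comm_rank and MPI_Comm_size, and the integer arithmetic also handles a
size that does not divide evenly):

 // Rank r owns elements [lo, hi) of a size-element array.
 int lo = rank * size / nprocs;
 int hi = (rank + 1) * size / nprocs;
 double *local = new double[hi - lo];  // allocate only this rank's portion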



Also I only need the loop that computes every element of the array to be 
parallelized.

If you only need the initial computation of array elements to be 
parallelized, perhaps any of the above strategies could work.  It 
depends on how expensive the computation of each element is.



Someone said that the parallel part begins with MPI_Init and ends with 
MPI_Finalize,

Well, usually all processes are launched in parallel.  So, the parallel 
begins "immediately."  Inter-process communications using MPI, however, 
must take place between the MPI_Init and MPI_Finalize calls.



and one can do any serial computations before and/or after these calls. But I 
have written some MPI programs, and found that the parallel part is not 
restricted between MPI_Init and MPI_Finalize, but instead the whole program. If 
the rest part of the code has to be wrapped for process with ID 0, I have 
little idea about how to apply that to my case since the rest part would be the 
parts before and after the loop in the function and the whole in main().

I don't understand your case very clearly.  I will take a guess.  You 
could have all processes start and call MPI_Init.  Then, slave processes 
can go to sleep, waking occasionally to check if the master has sent a 
signal to begin computation.  The master does what it has to do and then 
sends wake signals.  Each slave computes its portion and sends that 
portion back to the master.  Each slave exits.  The master gathers all 
the pieces and resumes its computation.  Does that sound right?
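
On the slave side, that handshake might look like (an untested sketch; the
tags, the integer "go" message, and the local/local_n result buffer are
invented for illustration, and a plain blocking MPI_Recv is the simplest form
of waiting, though many implementations busy-wait inside it):

 // On a slave rank: wait for the master's signal, compute, send back, exit.
 const int TAG_GO = 1, TAG_RESULT = 2;
 int go;
 MPI_Recv(&go, 1, MPI_INT, 0, TAG_GO, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
 // ... compute this rank's portion into local[0..local_n) ...
 MPI_Send(local, local_n, MPI_DOUBLE, 0, TAG_RESULT, MPI_COMM_WORLD);
 MPI_Finalize();  // slave exits; the master gathers the pieces and resumes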


Re: [OMPI users] speed up this problem by MPI

2010-01-28 Thread Tim
Thanks, Eugene.

I admit I am not smart enough to understand well how to use MPI, but I did read 
some basic materials about it and understand how some simple problems are 
solved by MPI. 

But dealing with an array in my case, I am not certain about how to apply MPI 
to it. Are you saying to use send and receive to transfer the value computed 
for each element from child process to parent process? Do you allocate a copy 
of the array for each process?

Also I only need the loop that computes every element of the array to be 
parallelized. Someone said that the parallel part begins with MPI_Init and ends 
with MPI_Finalize, and one can do any serial computations before and/or after 
these calls. But I have written some MPI programs, and found that the parallel 
part is not restricted between MPI_Init and MPI_Finalize, but instead the whole 
program. If the rest part of the code has to be wrapped for process with ID 0, 
I have little idea about how to apply that to my case since the rest part would 
be the parts before and after the loop in the function and the whole in main().

If someone could give a sample of how to apply MPI in my case, it will clarify 
a lot of my questions. Usually I can learn a lot from good examples.

Thanks!

--- On Thu, 1/28/10, Eugene Loh <eugene@sun.com> wrote:

> From: Eugene Loh <eugene@sun.com>
> Subject: Re: [OMPI users] speed up this problem by MPI
> To: "Open MPI Users" <us...@open-mpi.org>
> Date: Thursday, January 28, 2010, 7:30 PM
> Take a look at some introductory MPI
> materials to learn how to use MPI and what it's about. 
> There should be resources on-line... take a look around.
> 
> The main idea is that you would have many processes, each
> process would have part of the array.  Thereafter, if a
> process needs data or results from any other process, such
> data would have to be exchanged between the processes
> explicitly.
> 
> Many codes have both OpenMP and MPI parallelization, but
> you should first familiarize yourself with the basics of MPI
> before dealing with "hybrid" codes.
> 
> Tim wrote:
> 
> > Hi,
> > 
> > (1). I am wondering how I can speed up the
> time-consuming computation in the loop of my code below
> using MPI?
> >     int main(int argc, char ** argv)
> >     {
> >         // some operations
> >         f(size);
> >         // some operations
> >         return 0;
> >     }
> >
> >     void f(int size)
> >     {
> >         // some operations
> >         int i;
> >         double * array = new double [size];
> >         for (i = 0; i < size; i++) // how can I use MPI to speed up this loop to compute all elements in the array?
> >         {
> >             array[i] = complicated_computation(); // time consuming computation
> >         }
> >         // some operations using all elements in array
> >         delete [] array;
> >     }
> > 
> > As shown in the code, I want to do some operations
> before and after the part to be parallelized with MPI, but I
> don't know how to specify where the parallel part begins and
> ends.
> > 
> > (2) My current code is using OpenMP to speed up the
> computation. 
> >     void f(int size)
> >     {
> >         // some operations
> >         int i;
> >         double * array = new double [size];
> >         omp_set_num_threads(_nb_threads);
> >         #pragma omp parallel shared(array) private(i)
> >         {
> >             #pragma omp for schedule(dynamic) nowait
> >             for (i = 0; i < size; i++) // how can I use MPI to speed up this loop to compute all elements in the array?
> >             {
> >                 array[i] = complicated_computation(); // time consuming computation
> >             }
> >         }
> >         // some operations using all elements in array
> >     }
> > 
> > I wonder, if I change to MPI, is it possible to
> have the code written for both OpenMP and MPI? If it is
> possible, how should I write the code, and how do I compile and
> run it?
> >  






Re: [OMPI users] speed up this problem by MPI

2010-01-28 Thread Eugene Loh
Take a look at some introductory MPI materials to learn how to use MPI 
and what it's about.  There should be resources on-line... take a look 
around.


The main idea is that you would have many processes, each process would 
have part of the array.  Thereafter, if a process needs data or results 
from any other process, such data would have to be exchanged between the 
processes explicitly.


Many codes have both OpenMP and MPI parallelization, but you should 
first familiarize yourself with the basics of MPI before dealing with 
"hybrid" codes.


Tim wrote:


Hi,

(1). I am wondering how I can speed up the time-consuming computation in the 
loop of my code below using MPI?
  
int main(int argc, char ** argv)
{
    // some operations
    f(size);
    // some operations
    return 0;
}

void f(int size)
{
    // some operations
    int i;
    double * array = new double [size];
    for (i = 0; i < size; i++) // how can I use MPI to speed up this loop to compute all elements in the array?
    {
        array[i] = complicated_computation(); // time consuming computation
    }
    // some operations using all elements in array
    delete [] array;
}


As shown in the code, I want to do some operations before and after the part to 
be parallelized with MPI, but I don't know how to specify where the parallel part 
begins and ends.

(2) My current code is using OpenMP to speed up the computation. 

void f(int size)
{
    // some operations
    int i;
    double * array = new double [size];
    omp_set_num_threads(_nb_threads);
    #pragma omp parallel shared(array) private(i)
    {
        #pragma omp for schedule(dynamic) nowait
        for (i = 0; i < size; i++) // how can I use MPI to speed up this loop to compute all elements in the array?
        {
            array[i] = complicated_computation(); // time consuming computation
        }
    }
    // some operations using all elements in array
}


I wonder, if I change to MPI, is it possible to have the code written for both 
OpenMP and MPI? If so, how should I write the code, and how do I compile 
and run it?
 



[OMPI users] speed up this problem by MPI

2010-01-28 Thread Tim
Hi,

(1). I am wondering how I can speed up the time-consuming computation in the 
loop of my code below using MPI?

 int main(int argc, char ** argv)
 {
     // some operations
     f(size);
     // some operations
     return 0;
 }

 void f(int size)
 {
     // some operations
     int i;
     double * array = new double [size];
     for (i = 0; i < size; i++) // how can I use MPI to speed up this loop to compute all elements in the array?
     {
         array[i] = complicated_computation(); // time consuming computation
     }
     // some operations using all elements in array
     delete [] array;
 }

As shown in the code, I want to do some operations before and after the part to 
be parallelized with MPI, but I don't know how to specify where the parallel part 
begins and ends.

(2) My current code is using OpenMP to speed up the computation. 

 void f(int size)
 {
     // some operations
     int i;
     double * array = new double [size];
     omp_set_num_threads(_nb_threads);
     #pragma omp parallel shared(array) private(i)
     {
         #pragma omp for schedule(dynamic) nowait
         for (i = 0; i < size; i++) // how can I use MPI to speed up this loop to compute all elements in the array?
         {
             array[i] = complicated_computation(); // time consuming computation
         }
     }
     // some operations using all elements in array
 }

I wonder, if I change to MPI, is it possible to have the code written for both 
OpenMP and MPI? If so, how should I write the code, and how do I compile 
and run it?

Thanks and regards!