Hi jeffy,
Thanks for your reply.
I don't understand how MPI_Reduce would be useful here.
Say I have 3 processes and each process has the array [1,2,3,4].
When each process calculates the prefix sum using CUDA, each process will
have the array [1,3,6,10],
so if I use MPI_Reduce to gather results it
You probably want MPI_Reduce, instead.
http://www.open-mpi.org/doc/v1.6/man3/MPI_Reduce.3.php
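
To make the suggestion concrete: MPI_Reduce with MPI_SUM combines the buffers from all ranks element by element and leaves the result on the root. The sketch below only simulates that arithmetic in plain C so it can be run without an MPI installation; the function name elementwise_sum_reduce and the flat "one row per task" layout are assumptions made for illustration, not part of MPI.

```c
#include <assert.h>
#include <stddef.h>

/* Simulates the arithmetic of a sum reduction across tasks.
   `contribs` holds ntasks rows of `count` values, one row per task
   (hypothetical layout, chosen only for this illustration).
   `result` receives the element-wise sum, as the root would. */
static void elementwise_sum_reduce(const long *contribs, size_t ntasks,
                                   size_t count, long *result)
{
    assert(contribs != NULL && result != NULL);
    for (size_t i = 0; i < count; ++i) {
        long sum = 0;
        for (size_t task = 0; task < ntasks; ++task) {
            sum += contribs[task * count + i];
        }
        result[i] = sum;
    }
}
```

If three tasks each contribute [1,3,6,10], the root would end up with [3,9,18,30]: one element-wise sum, not a concatenation of the three arrays.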
On May 15, 2012, at 11:27 PM, Rohan Deshpande wrote:
> I am performing Prefix scan operation on cluster
>
> I have 3 MPI tasks and master task is responsible for distributing the data
>
> Now,
I am performing a prefix scan operation on a cluster.
I have 3 MPI tasks, and the master task is responsible for distributing the data.
Now, each task calculates the sum of its own part of the array using GPUs and
returns the results to the master task.
The master task also calculates its own part of the array using a GPU.
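
One possible way to turn the per-task results into a global prefix scan is sketched below. It assumes each task owns one contiguous chunk of equal length, scans it locally, and then adds the total of all lower-numbered chunks as an offset. The sketch runs the "tasks" in a plain C loop so it needs neither MPI nor a GPU; the helper names local_inclusive_scan and simulated_distributed_scan are made up for illustration. In a real MPI program the offset could probably be obtained with MPI_Exscan (or MPI_Scan) over the chunk totals, with the CUDA kernel taking the place of the local scan.

```c
#include <assert.h>
#include <stddef.h>

/* Inclusive prefix sum of one chunk, in place; returns the chunk total.
   Stands in for the per-task GPU scan (assumption for this sketch). */
static long local_inclusive_scan(long *chunk, size_t n)
{
    long running = 0;
    for (size_t i = 0; i < n; ++i) {
        running += chunk[i];
        chunk[i] = running;
    }
    return running;
}

/* Simulates ntasks tasks, each owning chunk_len consecutive elements
   of `data`. On return, `data` holds the global inclusive prefix sum. */
static void simulated_distributed_scan(long *data, size_t ntasks,
                                       size_t chunk_len)
{
    assert(data != NULL);
    long offset = 0; /* sum of all chunks owned by lower-numbered tasks */
    for (size_t task = 0; task < ntasks; ++task) {
        long *chunk = data + task * chunk_len;
        long total = local_inclusive_scan(chunk, chunk_len);
        /* Shift the local scan by everything that came before it. */
        for (size_t i = 0; i < chunk_len; ++i) {
            chunk[i] += offset;
        }
        offset += total;
    }
}
```

For the 12-element input 1,2,3,4,1,2,3,4,1,2,3,4 split over 3 tasks, each local scan gives [1,3,6,10], the offsets are 0, 10 and 20, and the combined result is 1,3,6,10,11,13,16,20,21,23,26,30.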