I discovered that a minor change cost me dearly (I thought I had tested this single change, but apparently I didn't track the timing data closely enough).

MPI_Type_create_struct performs well only when all the data is contiguous in memory (at least for OpenMPI 1.1.2).

Is this normal or expected?

In my case the program has an f90 structure with 11 integers, 2 logicals, and five 50-element integer arrays, but in the first stage of the program only the first element of each of those arrays is used. Even so, with MPI_Type_create_struct it is more efficient to send the entire 263 words of contiguous memory (58 seconds) than to try to send only the 18 words of noncontiguous memory (64 seconds). In the second stage it is 33 words, and there it becomes 47 seconds vs. 163 seconds, an extra 116 seconds, which accounts for most of the jump in my overall wall-clock time from 130 to 278 seconds. The third stage increases from 13 seconds to 37 seconds. (The two datatypes are sketched below.)
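
For reference, here is a minimal sketch of the two datatypes involved; the type layout and field names below are hypothetical stand-ins for my real structure:

   program struct_sketch
     implicit none
     include 'mpif.h'

     ! Hypothetical layout standing in for the real structure:
     ! 11 integers, 2 logicals, five 50-element integer arrays (263 words).
     type work_t
        sequence
        integer :: scalars(11)
        logical :: flags(2)
        integer :: a1(50), a2(50), a3(50), a4(50), a5(50)
     end type work_t

     type(work_t) :: w
     integer :: ierr, full_type, partial_type
     integer :: blocklens(7), types(7)
     integer(kind=MPI_ADDRESS_KIND) :: base, displs(7)

     call MPI_Init(ierr)

     ! Displacement of each block relative to the start of the variable.
     call MPI_Get_address(w,         base,      ierr)
     call MPI_Get_address(w%scalars, displs(1), ierr)
     call MPI_Get_address(w%flags,   displs(2), ierr)
     call MPI_Get_address(w%a1,      displs(3), ierr)
     call MPI_Get_address(w%a2,      displs(4), ierr)
     call MPI_Get_address(w%a3,      displs(5), ierr)
     call MPI_Get_address(w%a4,      displs(6), ierr)
     call MPI_Get_address(w%a5,      displs(7), ierr)
     displs = displs - base

     types(1)   = MPI_INTEGER
     types(2)   = MPI_LOGICAL
     types(3:7) = MPI_INTEGER

     ! Stage 1: only the first element of each array -> 18 noncontiguous words.
     blocklens = (/ 11, 2, 1, 1, 1, 1, 1 /)
     call MPI_Type_create_struct(7, blocklens, displs, types, partial_type, ierr)
     call MPI_Type_commit(partial_type, ierr)

     ! Whole record: 263 words, effectively one contiguous block.
     blocklens = (/ 11, 2, 50, 50, 50, 50, 50 /)
     call MPI_Type_create_struct(7, blocklens, displs, types, full_type, ierr)
     call MPI_Type_commit(full_type, ierr)

     ! Either datatype is then sent with count 1, e.g.
     call MPI_Bcast(w, 1, partial_type, 0, MPI_COMM_WORLD, ierr)

     call MPI_Type_free(partial_type, ierr)
     call MPI_Type_free(full_type, ierr)
     call MPI_Finalize(ierr)
   end program struct_sketch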

Because I need to send this block of data back and forth a lot, I was hoping to find a way to speed up the transfer of this odd block of data and a couple of other variables. I may try MPI_PACK and MPI_UNPACK on the structure (roughly as sketched below), but calling those many times can't be more efficient.
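
If I do try it, the pack side would look roughly like this, continuing from the hypothetical work_t variable w in the sketch above (field names are still illustrative; the buffer is sized with MPI_Pack_size):

   ! Pack the 18 stage-1 words of the hypothetical structure w.
   character(len=1), allocatable :: pkbuf(:)
   integer :: ierr, position, sz_int, sz_log, pksize

   call MPI_Pack_size(16, MPI_INTEGER, MPI_COMM_WORLD, sz_int, ierr)  ! 11 scalars + 5 first elements
   call MPI_Pack_size(2,  MPI_LOGICAL, MPI_COMM_WORLD, sz_log, ierr)
   pksize = sz_int + sz_log
   allocate(pkbuf(pksize))

   position = 0
   call MPI_Pack(w%scalars, 11, MPI_INTEGER, pkbuf, pksize, position, MPI_COMM_WORLD, ierr)
   call MPI_Pack(w%flags,    2, MPI_LOGICAL, pkbuf, pksize, position, MPI_COMM_WORLD, ierr)
   call MPI_Pack(w%a1(1),    1, MPI_INTEGER, pkbuf, pksize, position, MPI_COMM_WORLD, ierr)
   call MPI_Pack(w%a2(1),    1, MPI_INTEGER, pkbuf, pksize, position, MPI_COMM_WORLD, ierr)
   call MPI_Pack(w%a3(1),    1, MPI_INTEGER, pkbuf, pksize, position, MPI_COMM_WORLD, ierr)
   call MPI_Pack(w%a4(1),    1, MPI_INTEGER, pkbuf, pksize, position, MPI_COMM_WORLD, ierr)
   call MPI_Pack(w%a5(1),    1, MPI_INTEGER, pkbuf, pksize, position, MPI_COMM_WORLD, ierr)

   ! All ranks use the same pksize; receivers MPI_Unpack the fields
   ! back out in the same order.
   call MPI_Bcast(pkbuf, pksize, MPI_PACKED, 0, MPI_COMM_WORLD, ierr)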

Previously I was equivalencing the structure to an integer array and sending the integer array as a quick and dirty solution to get started, and it worked. Not completely portable, no doubt.
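
Roughly, that quick-and-dirty version looked like the following (hypothetical names again; it leans on the structure being a SEQUENCE type of default-kind integers and logicals):

   type(work_t) :: w         ! same hypothetical SEQUENCE type as in the first sketch
   integer      :: wbuf(263)
   integer      :: ierr
   equivalence (w, wbuf)     ! storage-associate the structure with a flat integer array

   ! The whole structure then goes as one contiguous 263-word broadcast.
   call MPI_Bcast(wbuf, 263, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr)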

Michael

P.S. I don't currently have valgrind installed on this cluster, and it is not part of the Debian Linux 3.1r3 distribution. Having no experience with valgrind, I'm not sure how useful it will be with an MPI program of 500+ subroutines and 50K+ lines running on 16 processes. It took us a while to get profiling working for the OpenMP version of this code.

On Mar 6, 2007, at 11:28 AM, George Bosilca wrote:

I doubt this comes from MPI_Pack/MPI_Unpack. The difference is 137
seconds for 5 calls. That's basically 27 seconds per call to MPI_Pack,
for packing 8 integers. I know the code, and I can say with certainty
that there is no way to spend 27 seconds there.

Can you run your application under valgrind with the callgrind tool?
That will give you some basic information about where the time is
spent, and give us additional information about where to look.
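
Something like the following should work (assuming a valgrind 3.x
build, where callgrind is included; the program name is of course a
placeholder):

   mpirun -np 16 valgrind --tool=callgrind ./your_app
   callgrind_annotate callgrind.out.<pid>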

   Thanks,
     george.

On Mar 6, 2007, at 11:26 AM, Michael wrote:

I have a section of code where I need to send 8 separate integers via
BCAST.

Initially I was just putting the 8 integers into an array and then
sending that array.

I just tried using MPI_PACK on those 8 integers and I'm seeing a
massive slowdown in the code, even though I have a lot of other
communication and this section is used only 5 times.  I went from 140
seconds to 277 seconds on 16 processors using TCP over a dual gigabit
ethernet setup (I'm the only user working on this system today).
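
Roughly, the two variants look like this (variable names are
illustrative, and in real code the scratch buffer would be sized
with MPI_Pack_size):

   integer :: ivals(8), ierr, position
   character(len=1) :: pkbuf(64)   ! scratch for the packed variant

   ! Fast variant: the 8 scalars copied into ivals(1:8) on the root,
   ! then one broadcast of the plain integer array.
   call MPI_Bcast(ivals, 8, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr)

   ! Slow variant (what I tried): pack the same 8 integers and
   ! broadcast the packed buffer; all ranks use the same fixed buffer
   ! size and the receivers MPI_Unpack the 8 integers back out.
   position = 0
   call MPI_Pack(ivals, 8, MPI_INTEGER, pkbuf, 64, position, &
                 MPI_COMM_WORLD, ierr)
   call MPI_Bcast(pkbuf, 64, MPI_PACKED, 0, MPI_COMM_WORLD, ierr)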

This was run with OpenMPI 1.1.2 to maintain compatibility with a
major HPC site.

Is there a known problem with MPI_PACK/UNPACK in OpenMPI?

Michael

"Half of what I say is meaningless; but I say it so that the other
half may reach you"
                                   Kahlil Gibran

