Re: [OMPI users] MPI_Allgather problem

2012-01-27 Thread Jeff Squyres
On Jan 27, 2012, at 5:12 AM, Brett Tully wrote:

> Looking at the change log for 1.5.1 I see:
> - Use memmove (instead of memcpy) when necessary (e.g., source and 
> destination overlap).

Checking the logs, it looks like that fix was in 1.4.3, too.

Do you know if your application has sends/receives where the source and 
destination overlap?

Just curious -- have you run your application through a memory-checking 
debugger, like valgrind?  Sometimes application memory corruption can show up 
in very strange (and non-deterministic) ways.

> It seems as though this might be a likely candidate for a change that might 
> fix my problems if I am indeed using 1.5.3 following the installation of 
> OpenFOAM?
> 
> On Fri, Jan 27, 2012 at 10:02 AM, Brett Tully wrote:
> Interesting. In the same set of updates, I installed OpenFOAM from their 
> Ubuntu deb package and it claims to ship with openmpi. I just downloaded 
> their Third-party source tar and unzipped it to see what version of openmpi 
> they are using, and it is 1.5.3. However, when I do man openmpi, or 
> ompi_info, I get the same version as before (1.4.3). How do I determine for 
> sure what is being included when I compile something using mpicc?

You need to be sure that the mpicc (etc.) you are using to compile your app 
exactly matches the mpirun.  "mpicc --showme:version" will show you the version 
that it is using.  In general, you should be able to run "which mpicc; which 
mpirun; which ompi_info" to see where your executables are coming from.  This 
will likely give you a good clue as to a) whether everything matches (you 
also want your LD_LIBRARY_PATH to match, or your desired libmpi.so to be in 
a system default library path), and b) what version they are. 

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] MPI_Allgather problem

2012-01-27 Thread TERRY DONTJE
 ompi_info should tell you the current version of Open MPI your path is 
pointing to.
Are you sure your path is pointing to the area that the OpenFOAM package 
delivered Open MPI into?


--td
On 1/27/2012 5:02 AM, Brett Tully wrote:
Interesting. In the same set of updates, I installed OpenFOAM from 
their Ubuntu deb package and it claims to ship with openmpi. I just 
downloaded their Third-party source tar and unzipped it to see what 
version of openmpi they are using, and it is 1.5.3. However, when I do 
man openmpi, or ompi_info, I get the same version as before (1.4.3). 
How do I determine for sure what is being included when I compile 
something using mpicc?


Thanks,
Brett.


On Thu, Jan 26, 2012 at 10:05 PM, Jeff Squyres wrote:


What version did you upgrade to?  (we don't control the Ubuntu
packaging)

I see a bullet in the soon-to-be-released 1.4.5 release notes:

- Fix obscure cases where MPI_ALLGATHER could crash.  Thanks to Andrew
 Senin for reporting the problem.

But that would be surprising if this is what fixed your issue,
especially since it's not released yet.  :-)



On Jan 26, 2012, at 5:24 AM, Brett Tully wrote:

> As of two days ago, this problem has disappeared and the tests
> that I had written and run each night are now passing. Having
> looked through the update log of my machine (Ubuntu 11.10) it
> appears as though I got a new version of mpi-default-dev
> (0.6ubuntu1). I would like to understand this problem in more
> detail -- is it possible to see what changed in this update?
> Thanks,
> Brett.
>
>
>
> On Fri, Dec 9, 2011 at 6:43 PM, teng ma wrote:
> I guess your output is from different ranks. You can add rank info
> inside the print to tell them apart, as follows:
>
> (void) printf("rank %d: gathered[%d].node = %d\n", rank, i,
> gathered[i].node);
>
> From my side, I did not see anything wrong with your code in
> Open MPI 1.4.3. After I add the rank, the output is:
> rank 5: gathered[0].node = 0
> rank 5: gathered[1].node = 1
> rank 5: gathered[2].node = 2
> rank 5: gathered[3].node = 3
> rank 5: gathered[4].node = 4
> rank 5: gathered[5].node = 5
> rank 3: gathered[0].node = 0
> rank 3: gathered[1].node = 1
> rank 3: gathered[2].node = 2
> rank 3: gathered[3].node = 3
> rank 3: gathered[4].node = 4
> rank 3: gathered[5].node = 5
> rank 1: gathered[0].node = 0
> rank 1: gathered[1].node = 1
> rank 1: gathered[2].node = 2
> rank 1: gathered[3].node = 3
> rank 1: gathered[4].node = 4
> rank 1: gathered[5].node = 5
> rank 0: gathered[0].node = 0
> rank 0: gathered[1].node = 1
> rank 0: gathered[2].node = 2
> rank 0: gathered[3].node = 3
> rank 0: gathered[4].node = 4
> rank 0: gathered[5].node = 5
> rank 4: gathered[0].node = 0
> rank 4: gathered[1].node = 1
> rank 4: gathered[2].node = 2
> rank 4: gathered[3].node = 3
> rank 4: gathered[4].node = 4
> rank 4: gathered[5].node = 5
> rank 2: gathered[0].node = 0
> rank 2: gathered[1].node = 1
> rank 2: gathered[2].node = 2
> rank 2: gathered[3].node = 3
> rank 2: gathered[4].node = 4
> rank 2: gathered[5].node = 5
>
> Is that what you expected?
>
> On Fri, Dec 9, 2011 at 12:03 PM, Brett Tully wrote:
> Dear all,
>
> I have not used OpenMPI much before, but am maintaining a large
> legacy application. We noticed a bug to do with a call to
> MPI_Allgather as summarised in this post to Stackoverflow:
> http://stackoverflow.com/questions/8445398/mpi-allgather-produces-inconsistent-results
>
> In the process of looking further into the problem, I noticed
> that the following function results in strange behaviour.
>
> void test_all_gather() {
>
> struct _TEST_ALL_GATHER {
> int node;
> };
>
> int ierr, size, rank;
> ierr = MPI_Comm_size(MPI_COMM_WORLD, &size);
> ierr = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>
> struct _TEST_ALL_GATHER local;
> struct _TEST_ALL_GATHER *gathered;
>
> gathered = (struct _TEST_ALL_GATHER*) malloc(size *
> sizeof(*gathered));
>
> local.node = rank;
>
> MPI_Allgather(&local, sizeof(struct _TEST_ALL_GATHER), MPI_BYTE,
> gathered, sizeof(struct _TEST_ALL_GATHER), MPI_BYTE,
> MPI_COMM_WORLD);
>
> int i;
> for (i = 0; i < numnodes; ++i) {
> (void) printf("gathered[%d].node = %d\n", i,
> gathered[i].node);
> }
>
> FREE(gathered);
> }
>
> At one point, this function printed the following:
> gathered[0].node = 2
> gathered[1].node = 3
> gathered[2].node = 2
> gathered[3].node = 3
> 

Re: [OMPI users] MPI_Allgather problem

2012-01-27 Thread Brett Tully
Looking at the change log for 1.5.1 I see:
- Use memmove (instead of memcpy) when necessary (e.g., source
and destination overlap).

It seems as though this might be a likely candidate for a change that might
fix my problems if I am indeed using 1.5.3 following the installation of
OpenFOAM?

On Fri, Jan 27, 2012 at 10:02 AM, Brett Tully wrote:

> Interesting. In the same set of updates, I installed OpenFOAM from their
> Ubuntu deb package and it claims to ship with openmpi. I just downloaded
> their Third-party source tar and unzipped it to see what version of openmpi
> they are using, and it is 1.5.3. However, when I do man openmpi, or
> ompi_info, I get the same version as before (1.4.3). How do I determine for
> sure what is being included when I compile something using mpicc?
>
> Thanks,
> Brett.
>
>
>
> On Thu, Jan 26, 2012 at 10:05 PM, Jeff Squyres  wrote:
>
>> What version did you upgrade to?  (we don't control the Ubuntu packaging)
>>
>> I see a bullet in the soon-to-be-released 1.4.5 release notes:
>>
>> - Fix obscure cases where MPI_ALLGATHER could crash.  Thanks to Andrew
>>  Senin for reporting the problem.
>>
>> But that would be surprising if this is what fixed your issue, especially
>> since it's not released yet.  :-)
>>
>>
>>
>> On Jan 26, 2012, at 5:24 AM, Brett Tully wrote:
>>
>> > As of two days ago, this problem has disappeared and the tests that I
>> had written and run each night are now passing. Having looked through the
>> update log of my machine (Ubuntu 11.10) it appears as though I got a new
>> version of mpi-default-dev (0.6ubuntu1). I would like to understand this
>> problem in more detail -- is it possible to see what changed in this update?
>> > Thanks,
>> > Brett.
>> >
>> >
>> >
>> > On Fri, Dec 9, 2011 at 6:43 PM, teng ma  wrote:
>> > I guess your output is from different ranks. You can add rank info
>> inside the print to tell them apart, as follows:
>> >
>> > (void) printf("rank %d: gathered[%d].node = %d\n", rank, i,
>> gathered[i].node);
>> >
>> > From my side, I did not see anything wrong with your code in Open MPI
>> 1.4.3. After I add the rank, the output is:
>> > rank 5: gathered[0].node = 0
>> > rank 5: gathered[1].node = 1
>> > rank 5: gathered[2].node = 2
>> > rank 5: gathered[3].node = 3
>> > rank 5: gathered[4].node = 4
>> > rank 5: gathered[5].node = 5
>> > rank 3: gathered[0].node = 0
>> > rank 3: gathered[1].node = 1
>> > rank 3: gathered[2].node = 2
>> > rank 3: gathered[3].node = 3
>> > rank 3: gathered[4].node = 4
>> > rank 3: gathered[5].node = 5
>> > rank 1: gathered[0].node = 0
>> > rank 1: gathered[1].node = 1
>> > rank 1: gathered[2].node = 2
>> > rank 1: gathered[3].node = 3
>> > rank 1: gathered[4].node = 4
>> > rank 1: gathered[5].node = 5
>> > rank 0: gathered[0].node = 0
>> > rank 0: gathered[1].node = 1
>> > rank 0: gathered[2].node = 2
>> > rank 0: gathered[3].node = 3
>> > rank 0: gathered[4].node = 4
>> > rank 0: gathered[5].node = 5
>> > rank 4: gathered[0].node = 0
>> > rank 4: gathered[1].node = 1
>> > rank 4: gathered[2].node = 2
>> > rank 4: gathered[3].node = 3
>> > rank 4: gathered[4].node = 4
>> > rank 4: gathered[5].node = 5
>> > rank 2: gathered[0].node = 0
>> > rank 2: gathered[1].node = 1
>> > rank 2: gathered[2].node = 2
>> > rank 2: gathered[3].node = 3
>> > rank 2: gathered[4].node = 4
>> > rank 2: gathered[5].node = 5
>> >
>> > Is that what you expected?
>> >
>> > On Fri, Dec 9, 2011 at 12:03 PM, Brett Tully 
>> wrote:
>> > Dear all,
>> >
>> > I have not used OpenMPI much before, but am maintaining a large legacy
>> application. We noticed a bug to do with a call to MPI_Allgather as
>> summarised in this post to Stackoverflow:
>> http://stackoverflow.com/questions/8445398/mpi-allgather-produces-inconsistent-results
>> >
>> > In the process of looking further into the problem, I noticed that the
>> following function results in strange behaviour.
>> >
>> > void test_all_gather() {
>> >
>> > struct _TEST_ALL_GATHER {
>> > int node;
>> > };
>> >
>> > int ierr, size, rank;
>> > ierr = MPI_Comm_size(MPI_COMM_WORLD, &size);
>> > ierr = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>> >
>> > struct _TEST_ALL_GATHER local;
>> > struct _TEST_ALL_GATHER *gathered;
>> >
>> > gathered = (struct _TEST_ALL_GATHER*) malloc(size *
>> sizeof(*gathered));
>> >
>> > local.node = rank;
>> >
>> > MPI_Allgather(&local, sizeof(struct _TEST_ALL_GATHER), MPI_BYTE,
>> > gathered, sizeof(struct _TEST_ALL_GATHER), MPI_BYTE,
>> MPI_COMM_WORLD);
>> >
>> > int i;
>> > for (i = 0; i < numnodes; ++i) {
>> > (void) printf("gathered[%d].node = %d\n", i, gathered[i].node);
>> > }
>> >
>> > FREE(gathered);
>> > }
>> >
>> > At one point, this function printed the following:
>> > gathered[0].node = 2
>> > gathered[1].node = 3
>> > gathered[2].node = 2
>> > gathered[3].node = 3
>> > gathered[4].node = 4
>> > gathered[5].node = 5
>> >
>> > Can anyone suggest a 

Re: [OMPI users] MPI_Allgather problem

2012-01-27 Thread Brett Tully
Interesting. In the same set of updates, I installed OpenFOAM from their
Ubuntu deb package and it claims to ship with openmpi. I just downloaded
their Third-party source tar and unzipped it to see what version of openmpi
they are using, and it is 1.5.3. However, when I do man openmpi, or
ompi_info, I get the same version as before (1.4.3). How do I determine for
sure what is being included when I compile something using mpicc?

Thanks,
Brett.


On Thu, Jan 26, 2012 at 10:05 PM, Jeff Squyres  wrote:

> What version did you upgrade to?  (we don't control the Ubuntu packaging)
>
> I see a bullet in the soon-to-be-released 1.4.5 release notes:
>
> - Fix obscure cases where MPI_ALLGATHER could crash.  Thanks to Andrew
>  Senin for reporting the problem.
>
> But that would be surprising if this is what fixed your issue, especially
> since it's not released yet.  :-)
>
>
>
> On Jan 26, 2012, at 5:24 AM, Brett Tully wrote:
>
> > As of two days ago, this problem has disappeared and the tests that I
> had written and run each night are now passing. Having looked through the
> update log of my machine (Ubuntu 11.10) it appears as though I got a new
> version of mpi-default-dev (0.6ubuntu1). I would like to understand this
> problem in more detail -- is it possible to see what changed in this update?
> > Thanks,
> > Brett.
> >
> >
> >
> > On Fri, Dec 9, 2011 at 6:43 PM, teng ma  wrote:
> > I guess your output is from different ranks. You can add rank info
> inside the print to tell them apart, as follows:
> >
> > (void) printf("rank %d: gathered[%d].node = %d\n", rank, i,
> gathered[i].node);
> >
> > From my side, I did not see anything wrong with your code in Open MPI
> 1.4.3. After I add the rank, the output is:
> > rank 5: gathered[0].node = 0
> > rank 5: gathered[1].node = 1
> > rank 5: gathered[2].node = 2
> > rank 5: gathered[3].node = 3
> > rank 5: gathered[4].node = 4
> > rank 5: gathered[5].node = 5
> > rank 3: gathered[0].node = 0
> > rank 3: gathered[1].node = 1
> > rank 3: gathered[2].node = 2
> > rank 3: gathered[3].node = 3
> > rank 3: gathered[4].node = 4
> > rank 3: gathered[5].node = 5
> > rank 1: gathered[0].node = 0
> > rank 1: gathered[1].node = 1
> > rank 1: gathered[2].node = 2
> > rank 1: gathered[3].node = 3
> > rank 1: gathered[4].node = 4
> > rank 1: gathered[5].node = 5
> > rank 0: gathered[0].node = 0
> > rank 0: gathered[1].node = 1
> > rank 0: gathered[2].node = 2
> > rank 0: gathered[3].node = 3
> > rank 0: gathered[4].node = 4
> > rank 0: gathered[5].node = 5
> > rank 4: gathered[0].node = 0
> > rank 4: gathered[1].node = 1
> > rank 4: gathered[2].node = 2
> > rank 4: gathered[3].node = 3
> > rank 4: gathered[4].node = 4
> > rank 4: gathered[5].node = 5
> > rank 2: gathered[0].node = 0
> > rank 2: gathered[1].node = 1
> > rank 2: gathered[2].node = 2
> > rank 2: gathered[3].node = 3
> > rank 2: gathered[4].node = 4
> > rank 2: gathered[5].node = 5
> >
> > Is that what you expected?
> >
> > On Fri, Dec 9, 2011 at 12:03 PM, Brett Tully 
> wrote:
> > Dear all,
> >
> > I have not used OpenMPI much before, but am maintaining a large legacy
> application. We noticed a bug to do with a call to MPI_Allgather as
> summarised in this post to Stackoverflow:
> http://stackoverflow.com/questions/8445398/mpi-allgather-produces-inconsistent-results
> >
> > In the process of looking further into the problem, I noticed that the
> following function results in strange behaviour.
> >
> > void test_all_gather() {
> >
> > struct _TEST_ALL_GATHER {
> > int node;
> > };
> >
> > int ierr, size, rank;
> > ierr = MPI_Comm_size(MPI_COMM_WORLD, &size);
> > ierr = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> >
> > struct _TEST_ALL_GATHER local;
> > struct _TEST_ALL_GATHER *gathered;
> >
> > gathered = (struct _TEST_ALL_GATHER*) malloc(size *
> sizeof(*gathered));
> >
> > local.node = rank;
> >
> > MPI_Allgather(&local, sizeof(struct _TEST_ALL_GATHER), MPI_BYTE,
> > gathered, sizeof(struct _TEST_ALL_GATHER), MPI_BYTE,
> MPI_COMM_WORLD);
> >
> > int i;
> > for (i = 0; i < numnodes; ++i) {
> > (void) printf("gathered[%d].node = %d\n", i, gathered[i].node);
> > }
> >
> > FREE(gathered);
> > }
> >
> > At one point, this function printed the following:
> > gathered[0].node = 2
> > gathered[1].node = 3
> > gathered[2].node = 2
> > gathered[3].node = 3
> > gathered[4].node = 4
> > gathered[5].node = 5
> >
> > Can anyone suggest a place to start looking into why this might be
> happening? There is a section of the code that calls MPI_Comm_split, but I
> am not sure if that is related...
> >
> > Running on Ubuntu 11.10 and a summary of ompi_info:
> > Package: Open MPI buildd@allspice Distribution
> > Open MPI: 1.4.3
> > Open MPI SVN revision: r23834
> > Open MPI release date: Oct 05, 2010
> > Open RTE: 1.4.3
> > Open RTE SVN revision: r23834
> > Open RTE release date: Oct 05, 2010
> > OPAL: 1.4.3
> > OPAL SVN 

Re: [OMPI users] MPI_Allgather problem

2012-01-26 Thread Jeff Squyres
What version did you upgrade to?  (we don't control the Ubuntu packaging)

I see a bullet in the soon-to-be-released 1.4.5 release notes:

- Fix obscure cases where MPI_ALLGATHER could crash.  Thanks to Andrew
  Senin for reporting the problem.

But that would be surprising if this is what fixed your issue, especially since 
it's not released yet.  :-)



On Jan 26, 2012, at 5:24 AM, Brett Tully wrote:

> As of two days ago, this problem has disappeared and the tests that I had 
> written and run each night are now passing. Having looked through the update 
> log of my machine (Ubuntu 11.10) it appears as though I got a new version of 
> mpi-default-dev (0.6ubuntu1). I would like to understand this problem in more 
> detail -- is it possible to see what changed in this update?
> Thanks,
> Brett.
> 
> 
> 
> On Fri, Dec 9, 2011 at 6:43 PM, teng ma  wrote:
> I guess your output is from different ranks. You can add rank info inside 
> the print to tell them apart, as follows:
> 
> (void) printf("rank %d: gathered[%d].node = %d\n", rank, i, gathered[i].node);
> 
> From my side, I did not see anything wrong with your code in Open MPI 1.4.3. 
> After I add the rank, the output is:
> rank 5: gathered[0].node = 0
> rank 5: gathered[1].node = 1
> rank 5: gathered[2].node = 2
> rank 5: gathered[3].node = 3
> rank 5: gathered[4].node = 4
> rank 5: gathered[5].node = 5
> rank 3: gathered[0].node = 0
> rank 3: gathered[1].node = 1
> rank 3: gathered[2].node = 2
> rank 3: gathered[3].node = 3
> rank 3: gathered[4].node = 4
> rank 3: gathered[5].node = 5
> rank 1: gathered[0].node = 0
> rank 1: gathered[1].node = 1
> rank 1: gathered[2].node = 2
> rank 1: gathered[3].node = 3
> rank 1: gathered[4].node = 4
> rank 1: gathered[5].node = 5
> rank 0: gathered[0].node = 0
> rank 0: gathered[1].node = 1
> rank 0: gathered[2].node = 2
> rank 0: gathered[3].node = 3
> rank 0: gathered[4].node = 4
> rank 0: gathered[5].node = 5
> rank 4: gathered[0].node = 0
> rank 4: gathered[1].node = 1
> rank 4: gathered[2].node = 2
> rank 4: gathered[3].node = 3
> rank 4: gathered[4].node = 4
> rank 4: gathered[5].node = 5
> rank 2: gathered[0].node = 0
> rank 2: gathered[1].node = 1
> rank 2: gathered[2].node = 2
> rank 2: gathered[3].node = 3
> rank 2: gathered[4].node = 4
> rank 2: gathered[5].node = 5
> 
> Is that what you expected? 
> 
> On Fri, Dec 9, 2011 at 12:03 PM, Brett Tully  wrote:
> Dear all,
> 
> I have not used OpenMPI much before, but am maintaining a large legacy 
> application. We noticed a bug to do with a call to MPI_Allgather as 
> summarised in this post to Stackoverflow: 
> http://stackoverflow.com/questions/8445398/mpi-allgather-produces-inconsistent-results
> 
> In the process of looking further into the problem, I noticed that the 
> following function results in strange behaviour.
> 
> void test_all_gather() {
> 
> struct _TEST_ALL_GATHER {
> int node;
> };
> 
> int ierr, size, rank;
> ierr = MPI_Comm_size(MPI_COMM_WORLD, &size);
> ierr = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> 
> struct _TEST_ALL_GATHER local;
> struct _TEST_ALL_GATHER *gathered;
> 
> gathered = (struct _TEST_ALL_GATHER*) malloc(size * sizeof(*gathered));
> 
> local.node = rank;
> 
> MPI_Allgather(&local, sizeof(struct _TEST_ALL_GATHER), MPI_BYTE, 
> gathered, sizeof(struct _TEST_ALL_GATHER), MPI_BYTE, MPI_COMM_WORLD);
> 
> int i;
> for (i = 0; i < numnodes; ++i) {
> (void) printf("gathered[%d].node = %d\n", i, gathered[i].node);
> }
> 
> FREE(gathered);
> }
> 
> At one point, this function printed the following:
> gathered[0].node = 2
> gathered[1].node = 3
> gathered[2].node = 2
> gathered[3].node = 3
> gathered[4].node = 4
> gathered[5].node = 5
> 
> Can anyone suggest a place to start looking into why this might be happening? 
> There is a section of the code that calls MPI_Comm_split, but I am not sure 
> if that is related...
> 
> Running on Ubuntu 11.10 and a summary of ompi_info:
> Package: Open MPI buildd@allspice Distribution
> Open MPI: 1.4.3
> Open MPI SVN revision: r23834
> Open MPI release date: Oct 05, 2010
> Open RTE: 1.4.3
> Open RTE SVN revision: r23834
> Open RTE release date: Oct 05, 2010
> OPAL: 1.4.3
> OPAL SVN revision: r23834
> OPAL release date: Oct 05, 2010
> Ident string: 1.4.3
> Prefix: /usr
> Configured architecture: x86_64-pc-linux-gnu
> Configure host: allspice
> Configured by: buildd
> 
> Thanks!
> Brett
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
> 
> -- 
> | Teng Ma  Univ. of Tennessee |
> | t...@cs.utk.edu  Knoxville, TN |
> | http://web.eecs.utk.edu/~tma/   |
> 

Re: [OMPI users] MPI_Allgather problem

2012-01-26 Thread Brett Tully
As of two days ago, this problem has disappeared and the tests that I had
written and run each night are now passing. Having looked through the
update log of my machine (Ubuntu 11.10) it appears as though I got a new
version of mpi-default-dev (0.6ubuntu1). I would like to understand this
problem in more detail -- is it possible to see what changed in this update?
Thanks,
Brett.


>
> On Fri, Dec 9, 2011 at 6:43 PM, teng ma  wrote:
>
>> I guess your output is from different ranks. You can add rank info
>> inside the print to tell them apart, as follows:
>>
>> (void) printf("rank %d: gathered[%d].node = %d\n", rank, i,
>> gathered[i].node);
>>
>> From my side, I did not see anything wrong with your code in Open MPI
>> 1.4.3. After I add the rank, the output is:
>> rank 5: gathered[0].node = 0
>> rank 5: gathered[1].node = 1
>> rank 5: gathered[2].node = 2
>> rank 5: gathered[3].node = 3
>> rank 5: gathered[4].node = 4
>> rank 5: gathered[5].node = 5
>> rank 3: gathered[0].node = 0
>> rank 3: gathered[1].node = 1
>> rank 3: gathered[2].node = 2
>> rank 3: gathered[3].node = 3
>> rank 3: gathered[4].node = 4
>> rank 3: gathered[5].node = 5
>> rank 1: gathered[0].node = 0
>> rank 1: gathered[1].node = 1
>> rank 1: gathered[2].node = 2
>> rank 1: gathered[3].node = 3
>> rank 1: gathered[4].node = 4
>> rank 1: gathered[5].node = 5
>> rank 0: gathered[0].node = 0
>> rank 0: gathered[1].node = 1
>> rank 0: gathered[2].node = 2
>> rank 0: gathered[3].node = 3
>> rank 0: gathered[4].node = 4
>> rank 0: gathered[5].node = 5
>> rank 4: gathered[0].node = 0
>> rank 4: gathered[1].node = 1
>> rank 4: gathered[2].node = 2
>> rank 4: gathered[3].node = 3
>> rank 4: gathered[4].node = 4
>> rank 4: gathered[5].node = 5
>> rank 2: gathered[0].node = 0
>> rank 2: gathered[1].node = 1
>> rank 2: gathered[2].node = 2
>> rank 2: gathered[3].node = 3
>> rank 2: gathered[4].node = 4
>> rank 2: gathered[5].node = 5
>>
>> Is that what you expected?
>>
>> On Fri, Dec 9, 2011 at 12:03 PM, Brett Tully wrote:
>>
>>> Dear all,
>>>
>>> I have not used OpenMPI much before, but am maintaining a large legacy
>>> application. We noticed a bug to do with a call to MPI_Allgather as
>>> summarised in this post to Stackoverflow:
>>> http://stackoverflow.com/questions/8445398/mpi-allgather-produces-inconsistent-results
>>>
>>> In the process of looking further into the problem, I noticed that the
>>> following function results in strange behaviour.
>>>
>>> void test_all_gather() {
>>>
>>> struct _TEST_ALL_GATHER {
>>> int node;
>>> };
>>>
>>> int ierr, size, rank;
>>> ierr = MPI_Comm_size(MPI_COMM_WORLD, &size);
>>> ierr = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>
>>> struct _TEST_ALL_GATHER local;
>>> struct _TEST_ALL_GATHER *gathered;
>>>
>>> gathered = (struct _TEST_ALL_GATHER*) malloc(size *
>>> sizeof(*gathered));
>>>
>>> local.node = rank;
>>>
>>> MPI_Allgather(&local, sizeof(struct _TEST_ALL_GATHER), MPI_BYTE,
>>> gathered, sizeof(struct _TEST_ALL_GATHER), MPI_BYTE,
>>> MPI_COMM_WORLD);
>>>
>>> int i;
>>> for (i = 0; i < numnodes; ++i) {
>>> (void) printf("gathered[%d].node = %d\n", i, gathered[i].node);
>>> }
>>>
>>> FREE(gathered);
>>> }
>>>
>>> At one point, this function printed the following:
>>> gathered[0].node = 2
>>> gathered[1].node = 3
>>> gathered[2].node = 2
>>> gathered[3].node = 3
>>> gathered[4].node = 4
>>> gathered[5].node = 5
>>>
>>> Can anyone suggest a place to start looking into why this might be
>>> happening? There is a section of the code that calls MPI_Comm_split, but I
>>> am not sure if that is related...
>>>
>>> Running on Ubuntu 11.10 and a summary of ompi_info:
>>> Package: Open MPI buildd@allspice Distribution
>>> Open MPI: 1.4.3
>>> Open MPI SVN revision: r23834
>>> Open MPI release date: Oct 05, 2010
>>> Open RTE: 1.4.3
>>> Open RTE SVN revision: r23834
>>> Open RTE release date: Oct 05, 2010
>>> OPAL: 1.4.3
>>> OPAL SVN revision: r23834
>>> OPAL release date: Oct 05, 2010
>>> Ident string: 1.4.3
>>> Prefix: /usr
>>> Configured architecture: x86_64-pc-linux-gnu
>>> Configure host: allspice
>>> Configured by: buildd
>>>
>>> Thanks!
>>> Brett
>>>
>>>
>>
>>
>>
>> --
>> | Teng Ma  Univ. of Tennessee |
>> | t...@cs.utk.edu  Knoxville, TN |
>> | http://web.eecs.utk.edu/~tma/   |
>>
>>
>
>


Re: [OMPI users] MPI_Allgather problem

2011-12-09 Thread teng ma
I guess your output is from different ranks. You can add rank info
inside the print to tell them apart, as follows:

(void) printf("rank %d: gathered[%d].node = %d\n", rank, i,
gathered[i].node);

From my side, I did not see anything wrong with your code in Open MPI
1.4.3. After I add the rank, the output is:
rank 5: gathered[0].node = 0
rank 5: gathered[1].node = 1
rank 5: gathered[2].node = 2
rank 5: gathered[3].node = 3
rank 5: gathered[4].node = 4
rank 5: gathered[5].node = 5
rank 3: gathered[0].node = 0
rank 3: gathered[1].node = 1
rank 3: gathered[2].node = 2
rank 3: gathered[3].node = 3
rank 3: gathered[4].node = 4
rank 3: gathered[5].node = 5
rank 1: gathered[0].node = 0
rank 1: gathered[1].node = 1
rank 1: gathered[2].node = 2
rank 1: gathered[3].node = 3
rank 1: gathered[4].node = 4
rank 1: gathered[5].node = 5
rank 0: gathered[0].node = 0
rank 0: gathered[1].node = 1
rank 0: gathered[2].node = 2
rank 0: gathered[3].node = 3
rank 0: gathered[4].node = 4
rank 0: gathered[5].node = 5
rank 4: gathered[0].node = 0
rank 4: gathered[1].node = 1
rank 4: gathered[2].node = 2
rank 4: gathered[3].node = 3
rank 4: gathered[4].node = 4
rank 4: gathered[5].node = 5
rank 2: gathered[0].node = 0
rank 2: gathered[1].node = 1
rank 2: gathered[2].node = 2
rank 2: gathered[3].node = 3
rank 2: gathered[4].node = 4
rank 2: gathered[5].node = 5

Is that what you expected?

On Fri, Dec 9, 2011 at 12:03 PM, Brett Tully wrote:

> Dear all,
>
> I have not used OpenMPI much before, but am maintaining a large legacy
> application. We noticed a bug to do with a call to MPI_Allgather as
> summarised in this post to Stackoverflow:
> http://stackoverflow.com/questions/8445398/mpi-allgather-produces-inconsistent-results
>
> In the process of looking further into the problem, I noticed that the
> following function results in strange behaviour.
>
> void test_all_gather() {
>
> struct _TEST_ALL_GATHER {
> int node;
> };
>
> int ierr, size, rank;
> ierr = MPI_Comm_size(MPI_COMM_WORLD, &size);
> ierr = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>
> struct _TEST_ALL_GATHER local;
> struct _TEST_ALL_GATHER *gathered;
>
> gathered = (struct _TEST_ALL_GATHER*) malloc(size * sizeof(*gathered));
>
> local.node = rank;
>
> MPI_Allgather(&local, sizeof(struct _TEST_ALL_GATHER), MPI_BYTE,
> gathered, sizeof(struct _TEST_ALL_GATHER), MPI_BYTE,
> MPI_COMM_WORLD);
>
> int i;
> for (i = 0; i < numnodes; ++i) {
> (void) printf("gathered[%d].node = %d\n", i, gathered[i].node);
> }
>
> FREE(gathered);
> }
>
> At one point, this function printed the following:
> gathered[0].node = 2
> gathered[1].node = 3
> gathered[2].node = 2
> gathered[3].node = 3
> gathered[4].node = 4
> gathered[5].node = 5
>
> Can anyone suggest a place to start looking into why this might be
> happening? There is a section of the code that calls MPI_Comm_split, but I
> am not sure if that is related...
>
> Running on Ubuntu 11.10 and a summary of ompi_info:
> Package: Open MPI buildd@allspice Distribution
> Open MPI: 1.4.3
> Open MPI SVN revision: r23834
> Open MPI release date: Oct 05, 2010
> Open RTE: 1.4.3
> Open RTE SVN revision: r23834
> Open RTE release date: Oct 05, 2010
> OPAL: 1.4.3
> OPAL SVN revision: r23834
> OPAL release date: Oct 05, 2010
> Ident string: 1.4.3
> Prefix: /usr
> Configured architecture: x86_64-pc-linux-gnu
> Configure host: allspice
> Configured by: buildd
>
> Thanks!
> Brett
>
>



-- 
| Teng Ma  Univ. of Tennessee |
| t...@cs.utk.edu  Knoxville, TN |
| http://web.eecs.utk.edu/~tma/   |


[OMPI users] MPI_Allgather problem

2011-12-09 Thread Brett Tully
Dear all,

I have not used OpenMPI much before, but am maintaining a large legacy
application. We noticed a bug to do with a call to MPI_Allgather as
summarised in this post to Stackoverflow:
http://stackoverflow.com/questions/8445398/mpi-allgather-produces-inconsistent-results

In the process of looking further into the problem, I noticed that the
following function results in strange behaviour.

void test_all_gather() {

struct _TEST_ALL_GATHER {
int node;
};

int ierr, size, rank;
ierr = MPI_Comm_size(MPI_COMM_WORLD, &size);
ierr = MPI_Comm_rank(MPI_COMM_WORLD, &rank);

struct _TEST_ALL_GATHER local;
struct _TEST_ALL_GATHER *gathered;

gathered = (struct _TEST_ALL_GATHER*) malloc(size * sizeof(*gathered));

local.node = rank;

MPI_Allgather(&local, sizeof(struct _TEST_ALL_GATHER), MPI_BYTE,
gathered, sizeof(struct _TEST_ALL_GATHER), MPI_BYTE,
MPI_COMM_WORLD);

int i;
for (i = 0; i < numnodes; ++i) {
(void) printf("gathered[%d].node = %d\n", i, gathered[i].node);
}

FREE(gathered);
}

At one point, this function printed the following:
gathered[0].node = 2
gathered[1].node = 3
gathered[2].node = 2
gathered[3].node = 3
gathered[4].node = 4
gathered[5].node = 5

Can anyone suggest a place to start looking into why this might be
happening? There is a section of the code that calls MPI_Comm_split, but I
am not sure if that is related...

Running on Ubuntu 11.10 and a summary of ompi_info:
Package: Open MPI buildd@allspice Distribution
Open MPI: 1.4.3
Open MPI SVN revision: r23834
Open MPI release date: Oct 05, 2010
Open RTE: 1.4.3
Open RTE SVN revision: r23834
Open RTE release date: Oct 05, 2010
OPAL: 1.4.3
OPAL SVN revision: r23834
OPAL release date: Oct 05, 2010
Ident string: 1.4.3
Prefix: /usr
Configured architecture: x86_64-pc-linux-gnu
Configure host: allspice
Configured by: buildd

Thanks!
Brett