Re: [OMPI users] possible bug exercised by mpi4py

2012-05-25 TERRY DONTJE
BTW, the changes prior to r26496 failed some of the MTT test runs on several systems. So if the current implementation is deemed not "correct", I suspect we will need to figure out whether any changes to the tests need to be made. See http://www.open-mpi.org/mtt/index.php?do_redir=2066

Re: [OMPI users] possible bug exercised by mpi4py

2012-05-25 George Bosilca
On May 24, 2012, at 23:48, Dave Goodell wrote:
> On May 24, 2012, at 10:34 PM CDT, George Bosilca wrote:
>
>> On May 24, 2012, at 23:18, Dave Goodell wrote:
>>
>>> So I take back my prior "right". Upon further inspection of the text and the MPICH2 code I believe it

Re: [OMPI users] possible bug exercised by mpi4py

2012-05-25 Dave Goodell
On May 24, 2012, at 10:34 PM CDT, George Bosilca wrote:
> On May 24, 2012, at 23:18, Dave Goodell wrote:
>
>> So I take back my prior "right". Upon further inspection of the text and the MPICH2 code I believe it to be true that the number of the elements in the

Re: [OMPI users] possible bug exercised by mpi4py

2012-05-25 George Bosilca
On May 24, 2012, at 23:18, Dave Goodell wrote:
> On May 24, 2012, at 8:13 PM CDT, Jeff Squyres wrote:
>
>> On May 24, 2012, at 11:57 AM, Lisandro Dalcin wrote:
>>
>>> The standard says this:
>>>
>>> "Within each group, all processes provide the same recvcounts

Re: [OMPI users] possible bug exercised by mpi4py

2012-05-25 Dave Goodell
On May 24, 2012, at 8:13 PM CDT, Jeff Squyres wrote:
> On May 24, 2012, at 11:57 AM, Lisandro Dalcin wrote:
>
>> The standard says this:
>>
>> "Within each group, all processes provide the same recvcounts argument, and provide input vectors of sum_i^n recvcounts[i] elements stored in the send buffers, where n is the size of the group"

Re: [OMPI users] possible bug exercised by mpi4py

2012-05-25 Dave Goodell
On May 24, 2012, at 10:57 AM CDT, Lisandro Dalcin wrote:
> On 24 May 2012 12:40, George Bosilca wrote:
>
>> I don't see much difference with the other collective. The generic behavior is that you apply the operation on the local group but the result is moved into

Re: [OMPI users] possible bug exercised by mpi4py

2012-05-24 Jeff Squyres
On May 24, 2012, at 11:57 AM, Lisandro Dalcin wrote:
> The standard says this:
>
> "Within each group, all processes provide the same recvcounts argument, and provide input vectors of sum_i^n recvcounts[i] elements stored in the send buffers, where n is the size of the group"
>
> So, I

Re: [OMPI users] possible bug exercised by mpi4py

2012-05-24 Lisandro Dalcin
On 24 May 2012 12:40, George Bosilca wrote:
> On May 24, 2012, at 11:22, Jeff Squyres wrote:
>
>> On May 24, 2012, at 11:10 AM, Lisandro Dalcin wrote:
>>
>>> So I checked them all, and I found SCATTERV, GATHERV, and REDUCE_SCATTER all had the issue. Now fixed on the trunk, and will be in 1.6.1.

Re: [OMPI users] possible bug exercised by mpi4py

2012-05-24 George Bosilca
On May 24, 2012, at 11:22, Jeff Squyres wrote:
> On May 24, 2012, at 11:10 AM, Lisandro Dalcin wrote:
>
>>> So I checked them all, and I found SCATTERV, GATHERV, and REDUCE_SCATTER all had the issue. Now fixed on the trunk, and will be in 1.6.1.
>>
>> Please be careful with

Re: [OMPI users] possible bug exercised by mpi4py

2012-05-24 Dave Goodell
On May 24, 2012, at 10:22 AM CDT, Jeff Squyres wrote:
> I read it to be: reduce the data in the local group, scatter the results to the remote group.
>
> As such, the reduce COUNT is sum(recvcounts), and is used for the reduction in the local group. Then use recvcounts to scatter it to
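
To make the count arithmetic of the reading quoted above concrete, here is a small illustrative sketch. The group sizes and counts are invented, and the thread has not yet settled whether this reading of the standard is the correct one, so treat it as one interpretation rather than settled semantics.

/* Illustrative values only. Suppose an intercommunicator whose remote
 * group has 3 processes. Under the reading quoted above, recvcounts
 * has one entry per remote process, the reduction across the local
 * group covers sum(recvcounts) elements, and recvcounts then splits
 * the reduced vector among the remote processes. */
static const int recvcounts[3] = { 2, 3, 4 };
static const int reduce_count  = 2 + 3 + 4;  /* 9 elements reduced in the local group */
/* Remote rank 0 then receives 2 elements, remote rank 1 receives 3,
 * and remote rank 2 receives 4. */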

Re: [OMPI users] possible bug exercised by mpi4py

2012-05-24 Jeff Squyres
On May 24, 2012, at 11:10 AM, Lisandro Dalcin wrote:
>> So I checked them all, and I found SCATTERV, GATHERV, and REDUCE_SCATTER all had the issue. Now fixed on the trunk, and will be in 1.6.1.
>
> Please be careful with REDUCE_SCATTER[_BLOCK]. My understanding of the MPI standard is

Re: [OMPI users] possible bug exercised by mpi4py

2012-05-24 Lisandro Dalcin
On 24 May 2012 10:28, Jeff Squyres wrote:
> On May 24, 2012, at 6:53 AM, Jonathan Dursi wrote:
>
>> It seems like this might also be an issue for gatherv and reduce_scatter as well.
>
> Gah. I spot-checked a few of these before my first commit, but didn't see these.

Re: [OMPI users] possible bug exercised by mpi4py

2012-05-24 George Bosilca
This bug could appear in any collective that supports intercommunicators and in which we check the receive buffer(s) for consistency. In addition to what Jeff already fixed, I fixed it in ALLTOALLV, ALLTOALLW, and GATHER.

george.

On May 24, 2012, at 09:37, Jeff Squyres wrote:
> On May
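
The sizing rule these consistency checks must respect can be sketched as follows. This is an illustrative example of mine, not Open MPI code, and it assumes an intercommunicator built elsewhere: for intercommunicator ALLTOALLV, a process exchanges data only with the remote group, so every counts/displacements array is indexed by remote ranks and must be validated against the remote group size.

#include <mpi.h>
#include <stdlib.h>

/* Exchange one int with every process of the remote group over an
 * already-constructed intercommunicator. Note that counts/displs are
 * sized by the remote group -- exactly what a consistency check on an
 * intercommunicator has to validate against. */
static void alltoall_one_int(MPI_Comm intercomm, int *recvbuf /* remote_size ints */)
{
    int remote_size;
    MPI_Comm_remote_size(intercomm, &remote_size);

    int *counts  = malloc(remote_size * sizeof(int));
    int *displs  = malloc(remote_size * sizeof(int));
    int *sendbuf = malloc(remote_size * sizeof(int));
    for (int i = 0; i < remote_size; i++) {
        counts[i]  = 1;  /* one int to/from each remote rank */
        displs[i]  = i;
        sendbuf[i] = i;
    }

    MPI_Alltoallv(sendbuf, counts, displs, MPI_INT,
                  recvbuf, counts, displs, MPI_INT, intercomm);

    free(sendbuf);
    free(displs);
    free(counts);
}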

Re: [OMPI users] possible bug exercised by mpi4py

2012-05-24 Jeff Squyres
On May 24, 2012, at 9:28 AM, Jeff Squyres wrote:
> So I checked them all, and I found SCATTERV, GATHERV, and REDUCE_SCATTER all had the issue. Now fixed on the trunk, and will be in 1.6.1.

I forgot to mention -- this issue exists waaay back in the Open MPI code base. I spot-checked Open

Re: [OMPI users] possible bug exercised by mpi4py

2012-05-24 Jeff Squyres
On May 24, 2012, at 6:53 AM, Jonathan Dursi wrote:
> It seems like this might also be an issue for gatherv and reduce_scatter as well.

Gah. I spot-checked a few of these before my first commit, but didn't see these. So I checked them all, and I found SCATTERV, GATHERV, and REDUCE_SCATTER all had the issue. Now fixed on the trunk, and will be in 1.6.1.

Re: [OMPI users] possible bug exercised by mpi4py

2012-05-24 Jonathan Dursi
It seems like this might also be an issue for gatherv and reduce_scatter as well.

- Jonathan

--
Jonathan Dursi | SciNet, Compute/Calcul Canada | www.SciNetHPC.ca

Re: [OMPI users] possible bug exercised by mpi4py

2012-05-24 Jeff Squyres
Many thanks for transcoding to C; this was a major help in debugging the issue. Thankfully, it turned out to be a simple bug. OMPI's parameter checking for MPI_ALLGATHERV was using the *local* group size when checking the recvcounts parameter, where it really should have been using the *remote* group size.
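
Lisandro's allgather.c is not reproduced in this archive, but a minimal sketch of a reproducer in the same spirit might look like the following; all names here are mine, and the odd/even split is an assumption. For an intercommunicator, MPI_Allgatherv's recvcounts describes the blocks received from the remote group, so it has remote-group-size entries. Run with an odd process count (e.g. mpirun -np 5) so the two groups differ in size and a local-size check walks off the end of the array.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int rank, color, remote_size;
    MPI_Comm intracomm, intercomm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Split odd and even world ranks into two groups and bridge them
     * with an intercommunicator; the group leaders are world ranks 0
     * and 1. */
    color = rank % 2;
    MPI_Comm_split(MPI_COMM_WORLD, color, rank, &intracomm);
    MPI_Intercomm_create(intracomm, 0, MPI_COMM_WORLD,
                         (color == 0) ? 1 : 0, 0, &intercomm);

    /* Each process receives one block from every process in the
     * *remote* group, so recvcounts/displs have remote_size entries. */
    MPI_Comm_remote_size(intercomm, &remote_size);
    int *recvcounts = malloc(remote_size * sizeof(int));
    int *displs     = malloc(remote_size * sizeof(int));
    int *recvbuf    = malloc(remote_size * sizeof(int));
    for (int i = 0; i < remote_size; i++) {
        recvcounts[i] = 1;
        displs[i]     = i;
    }

    MPI_Allgatherv(&rank, 1, MPI_INT,
                   recvbuf, recvcounts, displs, MPI_INT, intercomm);

    printf("rank %d: allgatherv across the intercomm succeeded\n", rank);

    free(recvbuf); free(displs); free(recvcounts);
    MPI_Comm_free(&intercomm);
    MPI_Comm_free(&intracomm);
    MPI_Finalize();
    return 0;
}

With the pre-fix validation, a rank whose local group is larger than its remote group would read recvcounts entries past the end of the array, which matches the padding-sensitive behavior Jonathan Dursi reported.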

Re: [OMPI users] possible bug exercised by mpi4py

2012-05-23 Jonathan Dursi
On 23 May 9:37PM, Jonathan Dursi wrote:
> On the other hand, it works everywhere if I pad the rcounts array with an extra valid value (0 or 1, or for that matter 783), or replace the allgatherv with an allgather.

... and it fails with 7 even where it worked (but succeeds with 8) if I pad

Re: [OMPI users] possible bug exercised by mpi4py

2012-05-23 Bennet Fauber
In case it is helpful to those who may not have the Intel compilers, these are the libraries against which the two executables of Lisandro's allgather.c get linked. With Intel compilers:

$ ldd a.out

Re: [OMPI users] possible bug exercised by mpi4py

2012-05-23 Jonathan Dursi
Fails for me with 1.4.3 with gcc, but works with Intel; works with 1.4.4 with gcc or Intel; fails with 1.5.5 with either. Succeeds with Intel MPI. On the other hand, it works everywhere if I pad the rcounts array with an extra valid value (0 or 1, or for that matter 783), or replace the
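
These padding observations line up with the validation bug Jeff describes above: recvcounts was being checked against the local group size, so on a process whose local group is larger than its remote group, the check reads past the end of a correctly (remote-)sized array, and its outcome depends on whatever memory happens to follow. A hypothetical illustration, assuming a 3-versus-2 group split for the 5-process run:

/* rcounts only needs 2 real entries (the remote group size), but a
 * check that loops to local_size = 3 reads a third entry; padding
 * with one extra valid value makes that stray read harmless. */
int rcounts[3] = { 1, 1, /* pad: */ 0 };  /* only rcounts[0..1] are meaningful */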

Re: [OMPI users] possible bug exercised by mpi4py

2012-05-23 Bennet Fauber
On Wed, 23 May 2012, Lisandro Dalcin wrote:
> On 23 May 2012 19:04, Jeff Squyres wrote:
>> Thanks for all the info! But still, can we get a copy of the test in C? That would make it significantly easier for us to tell if there is a problem with Open MPI -- mainly because we

Re: [OMPI users] possible bug exercised by mpi4py

2012-05-23 Jeff Squyres
Thanks for all the info! But still, can we get a copy of the test in C? That would make it significantly easier for us to tell if there is a problem with Open MPI -- mainly because we don't know anything about the internals of mpi4py.

On May 23, 2012, at 5:43 PM, Bennet Fauber wrote:

Re: [OMPI users] possible bug exercised by mpi4py

2012-05-23 Bennet Fauber
Jeff,

Well, not really, since the test is written in Python ;-) The mpi4py source code is at http://code.google.com/p/mpi4py/downloads/list but I'm not sure what else I can provide; I'm more the reporting middleman here. I'd be happy to try to connect you and the

Re: [OMPI users] possible bug exercised by mpi4py

2012-05-23 Bennet Fauber
Thanks, Ralph,

On Wed, 23 May 2012, Ralph Castain wrote:
> I don't honestly think many of us have any knowledge of mpi4py. Does this test work with other MPIs?

The mpi4py developers have said they've never seen this using MPICH2. I have not been able to test that myself.

> MPI_Allgather

Re: [OMPI users] possible bug exercised by mpi4py

2012-05-23 Jeff Squyres
Can you provide us with a C version of the test?

On May 23, 2012, at 4:52 PM, Bennet Fauber wrote:
> I've installed the latest mpi4py-1.3 on several systems, and there is a repeated bug when running
>
> $ mpirun -np 5 python test/runtests.py
>
> where it throws an error on mpigather

Re: [OMPI users] possible bug exercised by mpi4py

2012-05-23 Ralph Castain
I don't honestly think many of us have any knowledge of mpi4py. Does this test work with other MPIs? MPI_Allgather seems to be passing our tests, so I suspect it is something in the binding. If you can provide the actual test, I'm willing to take a look at it.

On May 23, 2012, at 2:52 PM,

[OMPI users] possible bug exercised by mpi4py

2012-05-23 Bennet Fauber
I've installed the latest mpi4py-1.3 on several systems, and there is a repeated bug when running

$ mpirun -np 5 python test/runtests.py

where it throws an error on mpigather with openmpi-1.4.4 and hangs with openmpi-1.3. It runs to completion and passes all tests when run with -np