BTW, the changes prior to r26496 failed some of the MTT test runs on
several systems. So if the current implementation is deemed not
"correct", I suspect we will need to figure out what changes need to be
made to the tests.
See http://www.open-mpi.org/mtt/index.php?do_redir=2066
On May 24, 2012, at 23:48 , Dave Goodell wrote:
> On May 24, 2012, at 10:34 PM CDT, George Bosilca wrote:
>
>> On May 24, 2012, at 23:18, Dave Goodell wrote:
>>
>>> So I take back my prior "right". Upon further inspection of the text and
>>> the MPICH2 code I believe it
On May 24, 2012, at 10:34 PM CDT, George Bosilca wrote:
> On May 24, 2012, at 23:18, Dave Goodell wrote:
>
>> So I take back my prior "right". Upon further inspection of the text and
>> the MPICH2 code I believe it to be true that the number of the elements in
>> the
On May 24, 2012, at 23:18, Dave Goodell wrote:
> On May 24, 2012, at 8:13 PM CDT, Jeff Squyres wrote:
>
>> On May 24, 2012, at 11:57 AM, Lisandro Dalcin wrote:
>>
>>> The standard says this:
>>>
>>> "Within each group, all processes provide the same recvcounts
>>> argument, and provide input vectors of sum_i^n recvcounts[i] elements
>>> stored in the send buffers, where n is the size of the group"
On May 24, 2012, at 8:13 PM CDT, Jeff Squyres wrote:
> On May 24, 2012, at 11:57 AM, Lisandro Dalcin wrote:
>
>> The standard says this:
>>
>> "Within each group, all processes provide the same recvcounts
>> argument, and provide input vectors of sum_i^n recvcounts[i] elements
>> stored in the send buffers, where n is the size of the group"
On May 24, 2012, at 10:57 AM CDT, Lisandro Dalcin wrote:
> On 24 May 2012 12:40, George Bosilca wrote:
>
>> I don't see much difference with the other collective. The generic behavior
>> is that you apply the operation on the local group but the result is moved
>> into
On May 24, 2012, at 11:57 AM, Lisandro Dalcin wrote:
> The standard says this:
>
> "Within each group, all processes provide the same recvcounts
> argument, and provide input vectors of sum_i^n recvcounts[i] elements
> stored in the send buffers, where n is the size of the group"
>
> So, I
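Taken literally, the sizing rule in the quoted standard text can be sketched as a small helper (plain C, no MPI calls; `required_sendbuf_len` is an illustrative name for this sketch, not an MPI or OMPI function):

```c
#include <assert.h>
#include <stddef.h>

/* Per the quoted text: all processes in a group pass the same
 * recvcounts array, and each send buffer must hold
 * sum_i recvcounts[i] elements, where n is the size of the group
 * the counts describe. */
static size_t required_sendbuf_len(const int *recvcounts, int n)
{
    size_t total = 0;
    for (int i = 0; i < n; ++i)
        total += (size_t)recvcounts[i];
    return total;
}
```

The ambiguity the thread is circling is exactly which group's size plays the role of n here.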
On 24 May 2012 12:40, George Bosilca wrote:
> On May 24, 2012, at 11:22 , Jeff Squyres wrote:
>
>> On May 24, 2012, at 11:10 AM, Lisandro Dalcin wrote:
>>
So I checked them all, and I found SCATTERV, GATHERV, and REDUCE_SCATTER
all had the issue. Now fixed on the trunk, and will be in 1.6.1.
On May 24, 2012, at 11:22 , Jeff Squyres wrote:
> On May 24, 2012, at 11:10 AM, Lisandro Dalcin wrote:
>
>>> So I checked them all, and I found SCATTERV, GATHERV, and REDUCE_SCATTER
>>> all had the issue. Now fixed on the trunk, and will be in 1.6.1.
>>
>> Please be careful with
On May 24, 2012, at 10:22 AM CDT, Jeff Squyres wrote:
> I read it to be: reduce the data in the local group, scatter the results to
> the remote group.
>
> As such, the reduce COUNT is sum(recvcounts), and is used for the reduction
> in the local group. Then use recvcounts to scatter it to the remote group.
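Jeff's reading above amounts to a two-phase operation, and the count arithmetic can be sketched as follows (plain C, illustrative names only; note the thread itself disputes which group `recvcounts` indexes, so this merely encodes the reading quoted here, with phase 1 reducing sum(recvcounts) elements in the local group and phase 2 giving remote rank i exactly recvcounts[i] of them):

```c
#include <assert.h>

/* Total element count used for the phase-1 reduction within the
 * local group: the sum of all the scatter shares. */
static int reduce_count(const int *recvcounts, int nranks)
{
    int total = 0;
    for (int i = 0; i < nranks; ++i)
        total += recvcounts[i];
    return total;
}

/* Offset into the reduced vector of the slice that rank i of the
 * receiving group gets in phase 2. */
static int scatter_disp(const int *recvcounts, int i)
{
    int d = 0;
    for (int j = 0; j < i; ++j)
        d += recvcounts[j];
    return d;
}
```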
On May 24, 2012, at 11:10 AM, Lisandro Dalcin wrote:
>> So I checked them all, and I found SCATTERV, GATHERV, and REDUCE_SCATTER all
>> had the issue. Now fixed on the trunk, and will be in 1.6.1.
>
> Please be careful with REDUCE_SCATTER[_BLOCK] . My understanding of
> the MPI standard is
On 24 May 2012 10:28, Jeff Squyres wrote:
> On May 24, 2012, at 6:53 AM, Jonathan Dursi wrote:
>
>> It seems like this might also be an issue for gatherv and reduce_scatter as
>> well.
>
>
> Gah. I spot-checked a few of these before my first commit, but didn't see
> these.
This bug had the opportunity to appear in all collectives supporting
intercommunicators where we check the receive buffer(s) consistency. In
addition to what Jeff fixed already, I fix it in ALLTOALLV, ALLTOALLW and
GATHER.
george.
On May 24, 2012, at 09:37 , Jeff Squyres wrote:
> On May
On May 24, 2012, at 9:28 AM, Jeff Squyres wrote:
> So I checked them all, and I found SCATTERV, GATHERV, and REDUCE_SCATTER all
> had the issue. Now fixed on the trunk, and will be in 1.6.1.
I forgot to mention -- this issue exists waaay back in the Open MPI code base.
I spot-checked Open
On May 24, 2012, at 6:53 AM, Jonathan Dursi wrote:
> It seems like this might also be an issue for gatherv and reduce_scatter as
> well.
Gah. I spot-checked a few of these before my first commit, but didn't see
these.
So I checked them all, and I found SCATTERV, GATHERV, and REDUCE_SCATTER
all had the issue. Now fixed on the trunk, and will be in 1.6.1.
It seems like this might also be an issue for gatherv and reduce_scatter
as well.
- Jonathan
--
Jonathan Dursi | SciNet, Compute/Calcul Canada | www.SciNetHPC.ca
Many thanks for trans-coding to C; this was a major help in debugging the issue.
Thankfully, it turned out to be a simple bug. OMPI's parameter checking for
MPI_ALLGATHERV was using the *local* group size when checking the recvcounts
parameter, where it really should have been using the remote group size.
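A minimal sketch of what such a check looks like once it distinguishes the two cases (illustrative plain C, not the actual Open MPI source; `recvcounts_ok` and its parameters are made up for this sketch):

```c
#include <assert.h>
#include <stdbool.h>

/* For an intracommunicator, recvcounts has one entry per local rank;
 * for an intercommunicator, one entry per rank of the REMOTE group.
 * The bug was validating the array against local_size in both cases. */
static bool recvcounts_ok(const int *recvcounts, bool is_intercomm,
                          int local_size, int remote_size)
{
    int n = is_intercomm ? remote_size : local_size;
    for (int i = 0; i < n; ++i)
        if (recvcounts[i] < 0)
            return false;
    return true;
}
```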
On 23 May 9:37PM, Jonathan Dursi wrote:
On the other hand, it works everywhere if I pad the rcounts array with
an extra valid value (0 or 1, or for that matter 783), or replace the
allgatherv with an allgather.
.. and it fails with 7 even where it worked (but succeeds with 8) if I
pad
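Jonathan's padding observation is consistent with the check reading one group's worth of entries from an array sized for the other group. A plain-C simulation of that failure mode (one oversized buffer stands in for the caller's array plus whatever memory happens to follow it, so there is no actual undefined behavior in this sketch):

```c
#include <assert.h>
#include <stdbool.h>

/* The buggy validation loops over n entries regardless of how many
 * the caller actually allocated. */
static bool buggy_check(const int *rcounts, int n)
{
    for (int i = 0; i < n; ++i)
        if (rcounts[i] < 0)
            return false;
    return true;
}
```

Whether the stray memory past the real array happens to hold a "valid" count decides whether the check passes, which is why padding rcounts with any extra valid value (0, 1, or 783) made the test succeed.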
In case it is helpful to those who may not have the Intel compilers, these
are the libraries against which the two executables of Lisandro's
allgather.c get linked:
with Intel compilers:
$ ldd a.out
Fails for me with 1.4.3 with gcc, but works with intel; works with 1.4.4
with gcc or intel; fails with 1.5.5 with either. Succeeds with intelmpi.
On the other hand, it works everywhere if I pad the rcounts array with
an extra valid value (0 or 1, or for that matter 783), or replace the
On Wed, 23 May 2012, Lisandro Dalcin wrote:
On 23 May 2012 19:04, Jeff Squyres wrote:
Thanks for all the info!
But still, can we get a copy of the test in C? That would make it
significantly easier for us to tell if there is a problem with Open MPI --
mainly because we don't know anything about the internals of mpi4py.
Thanks for all the info!
But still, can we get a copy of the test in C? That would make it
significantly easier for us to tell if there is a problem with Open MPI --
mainly because we don't know anything about the internals of mpi4py.
On May 23, 2012, at 5:43 PM, Bennet Fauber wrote:
>
Jeff,
Well, not really, since the test is written in python ;-)
The mpi4py source code is at
http://code.google.com/p/mpi4py/downloads/list
but I'm not sure what else I can provide, though.
I'm more the reporting middleman here. I'd be happy to try to connect you
and the
Thanks, Ralph,
On Wed, 23 May 2012, Ralph Castain wrote:
I don't honestly think many of us have any knowledge of mpi4py. Does
this test work with other MPIs?
The mpi4py developers have said they've never seen this using mpich2. I
have not been able to test that myself.
MPI_Allgather seems to be passing our tests, so I suspect it is something in
the binding.
Can you provide us with a C version of the test?
On May 23, 2012, at 4:52 PM, Bennet Fauber wrote:
> I've installed the latest mpi4py-1.3 on several systems, and there is a
> repeated bug when running
>
> $ mpirun -np 5 python test/runtests.py
>
> where it throws an error on mpigather with openmpi-1.4.4 and hangs with
> openmpi-1.3.
I don't honestly think many of us have any knowledge of mpi4py. Does this test
work with other MPIs?
MPI_Allgather seems to be passing our tests, so I suspect it is something in
the binding. If you can provide the actual test, I'm willing to take a look at
it.
On May 23, 2012, at 2:52 PM,
I've installed the latest mpi4py-1.3 on several systems, and there is a
repeated bug when running
$ mpirun -np 5 python test/runtests.py
where it throws an error on mpigather with openmpi-1.4.4 and hangs with
openmpi-1.3.
It runs to completion and passes all tests when run with -np