I have tested for the MPI_ABORT problem I was seeing and it appears
to be fixed in the trunk.
Michael
On Oct 28, 2006, at 8:45 AM, Jeff Squyres wrote:
Sorry for the delay on this -- is this still the case with the OMPI
trunk?
We think we finally have all the issues solved with MPI_ABORT on the trunk.
On Sat, 2006-10-28 at 08:45 -0400, Jeff Squyres wrote:
> Sorry for the delay on this -- is this still the case with the OMPI
> trunk?
>
> We think we finally have all the issues solved with MPI_ABORT on the
> trunk.
>
Nah, it was a problem with overutilization, i.e. 4 tasks on 2 CPUs in one
On Mon, 2006-10-16 at 10:13 +0200, Åke Sandgren wrote:
> On Fri, 2006-10-06 at 00:04 -0400, Jeff Squyres wrote:
> > On 10/5/06 2:42 PM, "Michael Kluskens" wrote:
> >
> > > System: BLACS 1.1p3 on Debian Linux 3.1r3 on dual-opteron, gcc 3.3.5,
> > > Intel ifort 9.0.32 all tests
On 10/5/06 10:04 PM, "Jeff Squyres" wrote:
> On 10/5/06 2:42 PM, "Michael Kluskens" wrote:
>
>> The final auxiliary test is for BLACS_ABORT.
>> Immediately after this message, all processes should be killed.
>> If processes survive the call, your
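For reference, here is a minimal standalone C sketch of the behavior that check exercises -- my own illustration, not the BLACS tester itself: one rank calls MPI_Abort, and every process in the job is expected to be killed.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Rank 0 aborts the whole job; if MPI_Abort works as the test
       expects, no rank should survive past this call. */
    if (rank == 0) {
        printf("rank 0 calling MPI_Abort\n");
        fflush(stdout);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    /* Any process that reaches this barrier survived the abort --
       exactly the failure the tester is checking for. */
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}

Running it with, e.g., "mpirun -np 4 ./abort_test" should terminate all four processes, not just rank 0.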
On 10/5/06 2:42 PM, "Michael Kluskens" wrote:
> System: BLACS 1.1p3 on Debian Linux 3.1r3 on dual-opteron, gcc 3.3.5,
> Intel ifort 9.0.32 all tests with 4 processors (comments below)
>
> OpenMPI 1.1.1 patched and OpenMPI 1.1.2 patched:
> C & F tests: no errors with default
Thanks Michael.
The seg-fault is related to an orterun problem. I noticed it
yesterday and we are trying to find a fix. As for the rest, I'm quite
happy that the BLACS problem was solved.
Thanks for your help,
george.
On Oct 5, 2006, at 2:42 PM, Michael Kluskens wrote:
On Oct 4, 2006, at 7:51 PM, George Bosilca wrote:
This is the correct patch (the same as the previous one, minus the
debugging statements).
On Oct 4, 2006, at 7:42 PM, George Bosilca wrote:
The problem was found and fixed. Until the patch gets applied to the
1.1 and 1.2 branches, please use the attached patch.
Thanks for your help in discovering and fixing this bug,
george.
ddt.patch
Description: Binary data
On Oct 4, 2006, at 5:32 PM, George Bosilca wrote:
That's just amazing. We pass all the trapezoidal tests, but we fail
the general ones (rectangular matrix) if the leading dimension of the
matrix on the destination processor is greater than the leading
dimension on the sender. At least now I've narrowed down the place
where the error occurs ...
OK, that was my five minutes in the hall of shame. Setting the verbosity
level in bt.dat to 6 gives me enough information to know exactly the
data-type shape. Now I know how to fix things ...
george.
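To make the failing pattern concrete, here is a small C sketch of the communication shape George describes -- my own illustration, not the BLACS internals or the actual failing test: a column-major m x n submatrix sent from a buffer with one leading dimension and received into a buffer with a larger leading dimension, using MPI_Type_vector on both sides.

#include <mpi.h>
#include <stdlib.h>

/* Datatype for an m x n column-major submatrix stored with leading
   dimension lda: n blocks (columns) of m doubles, stride lda apart. */
static MPI_Datatype submatrix_type(int m, int n, int lda)
{
    MPI_Datatype t;
    MPI_Type_vector(n, m, lda, MPI_DOUBLE, &t);
    MPI_Type_commit(&t);
    return t;
}

int main(int argc, char **argv)
{
    int rank, m = 4, n = 3, lda_send = 5, lda_recv = 8; /* recv lda > send lda */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        double *a = calloc((size_t)lda_send * n, sizeof(double));
        MPI_Datatype t = submatrix_type(m, n, lda_send);
        MPI_Send(a, 1, t, 1, 0, MPI_COMM_WORLD);
        MPI_Type_free(&t);
        free(a);
    } else if (rank == 1) {
        double *b = calloc((size_t)lda_recv * n, sizeof(double));
        MPI_Datatype t = submatrix_type(m, n, lda_recv);
        MPI_Recv(b, 1, t, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Type_free(&t);
        free(b);
    }

    MPI_Finalize();
    return 0;
}

The two type signatures match (n*m doubles each); only the strides differ, which is the situation described above as producing bad data.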
On Oct 4, 2006, at 4:35 PM, George Bosilca wrote:
I'm working on this bug. As far as I can see, the patch from bug 365
does not help us here. However, on my 64-bit machines (not Opteron but
G5) I don't get the segfault. Anyway, I do get the bad data transmission
for tests #1 and #51. So far my main problem is that I cannot
reproduce these
On Oct 4, 2006, at 8:22 AM, Harald Forbert wrote:
The TRANSCOMM setting that we are using here, and that I think is the
correct one, is "-DUseMpi2", since OpenMPI implements the corresponding
MPI-2 calls (a Bmake.inc sketch follows after the quoted text below).
You need a recent version of BLACS for this setting
to be available (1.1 with patch 3 should
> Additional note on the BLACS vs. OpenMPI 1.1.1 & 1.3 problems:
>
> The BLACS install program xtc_CsameF77 says to not use -DCsameF77
> with OpenMPI; however, because of an oversight I used it in my first
> tests -- for OpenMPI 1.1.1 the errors are the same with and without
> this
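For anyone configuring this, a hedged sketch of how that setting is usually expressed in the BLACS Bmake.inc (variable name per the stock BLACS makefiles; the rest of the file is omitted):

#  Bmake.inc fragment: use the MPI-2 communicator translation calls
TRANSCOMM = -DUseMpi2

The BLACS libraries and the tester need to be rebuilt after changing this for the setting to take effect.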
Additional note on the BLACS vs. OpenMPI 1.1.1 & 1.3 problems:
The BLACS install program xtc_CsameF77 says to not use -DCsameF77
with OpenMPI; however, because of an oversight I used it in my first
tests -- for OpenMPI 1.1.1 the errors are the same with and without
this setting;
Thanks Michael -- I've updated ticket 356 with this info for v1.1, and
created ticket 464 for the trunk (v1.3) issue.
https://svn.open-mpi.org/trac/ompi/ticket/356
https://svn.open-mpi.org/trac/ompi/ticket/464
On 10/3/06 10:53 AM, "Michael Kluskens" wrote:
Summary:
OpenMPI 1.1.1 and 1.3a1r11943 have different bugs with regard to
BLACS 1.1p3.
1.3 fails where 1.1.1 passes, and vice versa.
(1.1.1): The integer, real, and double precision SDRV tests fail cases 1 &
51, then there are lots of errors until the integer SUM test, after which all tests pass.
(1.3): No errors