I have tested for the MPI_ABORT problem I was seeing and it appears
to be fixed in the trunk.
Michael
On Oct 28, 2006, at 8:45 AM, Jeff Squyres wrote:
Sorry for the delay on this -- is this still the case with the OMPI
trunk?
We think we finally have all the issues solved with MPI_ABORT on the
trunk.
On Sat, 2006-10-28 at 08:45 -0400, Jeff Squyres wrote:
> Sorry for the delay on this -- is this still the case with the OMPI
> trunk?
>
> We think we finally have all the issues solved with MPI_ABORT on the
> trunk.
>
Nah, it was a problem with overutilization, i.e. 4 tasks on 2 CPUs in
one node.
Sorry for the delay on this -- is this still the case with the OMPI
trunk?
We think we finally have all the issues solved with MPI_ABORT on the
trunk.
On Oct 16, 2006, at 8:29 AM, Åke Sandgren wrote:
On Mon, 2006-10-16 at 10:13 +0200, Åke Sandgren wrote:
> On Fri, 2006-10-06 at 00:04 -0400, Jeff Squyres wrote:
> > On 10/5/06 2:42 PM, "Michael Kluskens" wrote:
> >
> > > System: BLACS 1.1p3 on Debian Linux 3.1r3 on dual-opteron, gcc 3.3.5,
> > > Intel ifort 9.0.32; all tests with 4 processors (comments below)
On Fri, 2006-10-06 at 00:04 -0400, Jeff Squyres wrote:
> On 10/5/06 2:42 PM, "Michael Kluskens" wrote:
>
> > System: BLACS 1.1p3 on Debian Linux 3.1r3 on dual-opteron, gcc 3.3.5,
> > Intel ifort 9.0.32; all tests with 4 processors (comments below)
> >
> > OpenMPI 1.1.1 patched and OpenMPI 1.1.2 patched:
On Oct 6, 2006, at 12:04 AM, Jeff Squyres wrote:
On 10/5/06 2:42 PM, "Michael Kluskens" wrote:
System: BLACS 1.1p3 on Debian Linux 3.1r3 on dual-opteron, gcc 3.3.5,
Intel ifort 9.0.32; all tests with 4 processors (comments below)
Good. Can you expand on what you mean by "slowed down"?
Bad
On Oct 5, 2006, at 4:41 PM, George Bosilca wrote:
Once you run the performance tests please let me know the outcome.
Ignoring the other issue I just posted, here are timings for BLACS
1.1p3 Tester with OpenMPI & MPICH2 on two nodes of a dual-opteron
system running Debian Linux 3.1r3, compi
On 10/5/06 10:04 PM, "Jeff Squyres" wrote:
> On 10/5/06 2:42 PM, "Michael Kluskens" wrote:
>
>> The final auxiliary test is for BLACS_ABORT.
>> Immediately after this message, all processes should be killed.
>> If processes survive the call, your BLACS_ABORT is incorrect.
>> {0,2}, pnum=2,
On 10/5/06 2:42 PM, "Michael Kluskens" wrote:
> System: BLACS 1.1p3 on Debian Linux 3.1r3 on dual-opteron, gcc 3.3.5,
> Intel ifort 9.0.32; all tests with 4 processors (comments below)
>
> OpenMPI 1.1.1 patched and OpenMPI 1.1.2 patched:
> C & F tests: no errors with default data set. F test
Thanks Michael.
The seg-fault is related to some orterun problem. I noticed it
yesterday and we are trying to find a fix. For the rest, I'm quite
happy that the BLACS problem was solved.
Thanks for your help,
george.
On Oct 5, 2006, at 2:42 PM, Michael Kluskens wrote:
On Oct 4, 2006, at 7:51 PM, George Bosilca wrote:
This is the correct patch (same as previous minus the debugging
statements).
Thanks,
george.
[Attachment: ddt.patch]
On Oct 4, 2006, at 7:42 PM, George Bosilca wrote:
The problem was found and fixed. Until the patch gets applied to the
1.1 and 1.2 branches, please use the attached patch.
The problem was found and fixed. Until the patch gets applied to the
1.1 and 1.2 branches, please use the attached patch.
Thanks for your help in discovering and fixing this bug,
george.
[Attachment: ddt.patch]
On Oct 4, 2006, at 5:32 PM, George Bosilca wrote:
That's just amazing. We pass all the trapezoidal tests but we fail
the general ones (rectangular matrix) if the leading dimension of the
matrix on the destination processor is greater than the leading
dimension on the sender. At least now I have narrowed down the place
where the error occurs ...
OK, that was my five-minute hall of shame. Setting the verbosity level
in bt.dat to 6 gives me enough information to know exactly the
data-type shape. Now I know how to fix things ...
george.
On Oct 4, 2006, at 4:35 PM, George Bosilca wrote:
I'm working on this bug. As far as I can see, the patch from bug 365
does not help us here. However, on my 64-bit machines (not Opteron but
G5) I don't get the segfault. Anyway, I get the bad data transmission
for tests #1 and #51. So far my main problem is that I cannot
reproduce these errors
On Oct 4, 2006, at 8:22 AM, Harald Forbert wrote:
The TRANSCOMM setting that we are using here, and that I think is the
correct one, is "-DUseMpi2", since OpenMPI implements the corresponding
MPI-2 calls. You need a recent version of BLACS for this setting
to be available (1.1 with patch 3 should be
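[Editor's note: the TRANSCOMM setting Harald mentions lives in the BLACS Bmake.inc build file. A sketch of the relevant fragment, assuming the stock BLACS 1.1 makefile layout; the install path is a placeholder.]

```make
# Fragment of a BLACS Bmake.inc (variable names per the stock BLACS 1.1
# build files; MPIdir is a placeholder path).  -DUseMpi2 tells BLACS to
# use the MPI-2 handle-conversion calls (MPI_Comm_f2c and friends)
# instead of guessing the Fortran/C communicator representation.
MPIdir    = /usr/local/openmpi
MPILIBdir = $(MPIdir)/lib
MPIINCdir = $(MPIdir)/include
TRANSCOMM = -DUseMpi2
```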
> Additional note on the BLACS vs. OpenMPI 1.1.1 & 1.3 problems:
>
> The BLACS install program xtc_CsameF77 says to not use -DCsameF77
> with OpenMPI; however, because of an oversight I used it in my first
> tests -- for OpenMPI 1.1.1 the errors are the same with and without
> this setting
Additional note on the BLACS vs. OpenMPI 1.1.1 & 1.3 problems:
The BLACS install program xtc_CsameF77 says to not use -DCsameF77
with OpenMPI; however, because of an oversight I used it in my first
tests -- for OpenMPI 1.1.1 the errors are the same with and without
this setting; however
Thanks Michael -- I've updated ticket 356 with this info for v1.1, and
created ticket 464 for the trunk (v1.3) issue.
https://svn.open-mpi.org/trac/ompi/ticket/356
https://svn.open-mpi.org/trac/ompi/ticket/464
On 10/3/06 10:53 AM, "Michael Kluskens" wrote:
Summary:
OpenMPI 1.1.1 and 1.3a1r11943 have different bugs with regard to
BLACS 1.1p3.
1.3 fails where 1.1.1 passes and vice versa.
(1.1.1): Integer, real, double precision SDRV tests fail cases 1 &
51, then lots of errors until Integer SUM test then all tests pass.
(1.3): No errors un