Cisco is no longer an IB vendor, but I seem to recall that these kinds of errors typically indicate a fabric problem. Have you run layer 0 and layer 1 diagnostics to make sure the fabric is clean?

On Sep 11, 2009, at 8:09 AM, Rolf Vandevaart wrote:

Hi, how exactly do you run this to get this error? I tried it and it
worked for me.

burl-ct-x2200-16 50 => mpirun -mca btl_openib_warn_default_gid_prefix 0 \
    -mca btl self,sm,openib -np 2 -host burl-ct-x2200-16,burl-ct-x2200-17 \
    -mca btl_openib_ib_timeout 16 a.out
I am 0 at 1252670691
I am 1 at 1252670559
I am 0 at 1252670692
I am 1 at 1252670559
burl-ct-x2200-16 51 =>

Rolf

On 09/11/09 07:18, Ake Sandgren wrote:
> Hi!
>
> The following code misbehaves when run over openib.
>
> Open MPI: 1.3.3
> With openib it dies with "error polling HP CQ with status WORK REQUEST FLUSHED ERROR status number 5"; with tcp or shmem it works as expected.
>
>
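> #include <stdio.h>
> #include <stdlib.h>
> #include <time.h>
> #include <unistd.h>   /* sleep() */
> #include "mpi.h"
>
> int main(int argc, char *argv[])
> {
>     int          rank;
>     int          n;
>
>     MPI_Init( &argc, &argv );
>
>     MPI_Comm_rank( MPI_COMM_WORLD, &rank );
>
>     /* time_t is not necessarily an int; cast so the format matches */
>     fprintf(stderr, "I am %d at %ld\n", rank, (long) time(NULL));
>     fflush(stderr);
>
>     n = 4;
>     /* MPI_INT is the C datatype for int (MPI_INTEGER is Fortran's) */
>     MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
>     fprintf(stderr, "I am %d at %ld\n", rank, (long) time(NULL));
>     fflush(stderr);
>
>     /* The root stays out of MPI for 60 s after the Bcast, mimicking a
>        long stretch of computation between MPI calls. */
>     if (rank == 0) {
>       sleep(60);
>     }
>     MPI_Barrier(MPI_COMM_WORLD);
>
>     MPI_Finalize( );
>     exit(0);
> }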
>
> I know the internal Open MPI reason it behaves this way.
> But I think an application should be allowed to behave like this.
>
> This example is a bit contrived, but there are codes where a similar
> situation can occur, i.e. the Bcast root doing lots of other work
> after the Bcast before its next MPI call. VASP is a candidate for this.
>


--

=========================
rolf.vandeva...@sun.com
781-442-3043
=========================
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Jeff Squyres
jsquy...@cisco.com
