Re: [OMPI devel] Fwd: MPI_INPLACE problem

2006-09-27 Thread Jeff Squyres
Ah, I see the problem now.

I am working on a fix; many thanks for the report!  (Further updates will be
on http://svn.open-mpi.org/trac/ompi/ticket/430)


On 9/27/06 10:24 AM, "Lisandro Dalcin"  wrote:

> Here is an example of the problem I have with MPI_IN_PLACE in OMPI.
> Hoping this can be useful. Perhaps the problem is not in the OMPI
> sources, but in my particular build. I've configured with:
> 
> $ head -n 7 config.log | tail -n 1
>   $ ./configure --disable-dlopen --prefix /usr/local/openmpi/1.1.1
> 
> First I present a very simple program that gives the right results
> with OMPI; next, a small modification changing the sendcount
> argument, which now gives wrong results.
> 
> Using MPICH2, both versions give the same, correct result.
> 
> My environment:
> --
> 
> $ echo $PATH
> /usr/local/openmpi/1.1.1/bin:/usr/kerberos/bin:/usr/lib/ccache/bin:/usr/local/
> bin:/bin:/usr/bin:/usr/X11R6/bin:.
> 
> $ echo $LD_LIBRARY_PATH
> /usr/local/openmpi/1.1.1/lib:/usr/local/openmpi/1.1.1/lib/openmpi
> 
> First test program
> -
> 
> This stupid program gathers the values of comm.rank at a root process
> with rank = comm.size/2 and prints the gathered values.
> 
> $ cat gather.c
> #include 
> #include 
> 
> int main() {
>   int size, rank, root;
>   MPI_Init(NULL, NULL);
>   MPI_Comm_size(MPI_COMM_WORLD, );
>   MPI_Comm_rank(MPI_COMM_WORLD, );
>   root = size/2;
>   if (rank == root) {
> int i;
> int *buf = (int *) malloc(size * sizeof(int));
> for (i=0; i buf[rank] = rank;
> MPI_Gather(MPI_IN_PLACE, 1, MPI_DATATYPE_NULL,
>buf,  1, MPI_INT,
>root, MPI_COMM_WORLD);
> for (i=0; i printf("\n");
> free(buf);
>   } else {
> MPI_Gather(, 1, MPI_INT,
>NULL,  0, MPI_DATATYPE_NULL,
>root, MPI_COMM_WORLD);
>   }
>   MPI_Finalize();
> }
> 
> Run results:
> ------------
> $ mpicc gather.c
> $ mpirun -n 5 a.out
> 0,1,2,3,4,
> 
> 
> Second test program
> ---
> 
> I only modify the sendcount argument at the root process, which is the
> one passing sendbuf=MPI_IN_PLACE. Per the MPI standard, sendcount and
> sendtype are ignored at the root when MPI_IN_PLACE is used, so the
> result should be unchanged.
> 
> $ cat gather.c
> #include <stdio.h>
> #include <stdlib.h>
> #include <mpi.h>
> 
> int main() {
>   int size, rank, root;
>   MPI_Init(NULL, NULL);
>   MPI_Comm_size(MPI_COMM_WORLD, &size);
>   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>   root = size/2;
>   if (rank == root) {
>     int i;
>     int *buf = (int *) malloc(size * sizeof(int));
>     /* fill with -1 so any entry the gather fails to write stands out */
>     for (i=0; i<size; i++) buf[i] = -1;
>     buf[rank] = rank;
>     /* sendcount is now 0; it should still be ignored at the root */
>     MPI_Gather(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL,
>                buf,  1, MPI_INT,
>                root, MPI_COMM_WORLD);
>     for (i=0; i<size; i++) printf("%d,", buf[i]);
>     printf("\n");
>     free(buf);
>   } else {
>     MPI_Gather(&rank, 1, MPI_INT,
>                NULL,  0, MPI_DATATYPE_NULL,
>                root, MPI_COMM_WORLD);
>   }
>   MPI_Finalize();
>   return 0;
> }
> 
> Run results:
> ------------
> $ mpicc gather.c
> $ mpirun -n 5 a.out
> -1,-1,2,-1,-1,


-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems


[OMPI devel] ORTE Tutorial Materials

2006-09-27 Thread Ralph H Castain
Hello all

The materials for Thursday's session of the ORTE tutorial are now complete
and stable. I have posted them on the OpenRTE web site at:

http://www.open-rte.org/papers/tutorial-sept-2006/index.php

Both PowerPoint and PDF (printed two slides/page) formats are available.

I should have the materials for Friday on the web site later today,
though I may not completely finish them until Thursday night (sigh).

The broadcast will be done using a shared desktop approach that involves a
virtual whiteboard. I haven't used this before, but I'm hoping it will all
work satisfactorily.

For those non-LANL folks attending in person, I remind you that cell phones
and computers (unless specifically pre-approved by the Lab) are not allowed
in the building. I will attempt to have hardcopies of the materials for you
to use (can't promise just yet).

Cheers
Ralph




Re: [OMPI devel] some possible bugs

2006-09-27 Thread Edgar Gabriel

Lisandro,

do you have an example of the extended collective operation tests
that fail? It would help track down the problem. I had a quick look at
our implementation but I cannot find an obvious problem, so an example
would be extremely helpful.


Thanks
Edgar



 - Some extended collective communications failed (not by raising
   errors, but by aborting and printing a trace to stdout) when using
   intercommunicators. Sometimes the problems appeared when
   size(local_group) != size(remote_group). However, MPI_Barrier and
   MPI_Bcast worked well. I still could not find the reasons for those
   failures. I've found a similar problem in MPICH2 when configured
   with error-checking enabled (they had a bug in some error-checking
   macros; I reported the issue and they confirmed it).
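
For reference, a minimal sketch of the kind of reproducer being requested,
assuming the failure involves an extended collective over an
intercommunicator with unequal group sizes (the uneven split and the
choice of MPI_Allgather are illustrative assumptions, not taken from the
original report):

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

/* Build an intercommunicator whose local and remote groups have
   different sizes, then call an extended collective on it.
   Run with at least 3 processes so the groups are unequal. */
int main(void) {
  int size, rank, color, rsize, i;
  MPI_Comm intra, inter;
  MPI_Init(NULL, NULL);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  /* Unequal split: rank 0 alone vs. everyone else. */
  color = (rank == 0) ? 0 : 1;
  MPI_Comm_split(MPI_COMM_WORLD, color, rank, &intra);

  /* Local leader is rank 0 of each intracomm; the remote leader is
     the other group's leader expressed as a MPI_COMM_WORLD rank. */
  MPI_Intercomm_create(intra, 0, MPI_COMM_WORLD,
                       (color == 0) ? 1 : 0, 1234, &inter);

  /* Extended collective: each process receives one int from every
     process in the remote group. */
  MPI_Comm_remote_size(inter, &rsize);
  int *rbuf = (int *) malloc(rsize * sizeof(int));
  MPI_Allgather(&rank, 1, MPI_INT, rbuf, 1, MPI_INT, inter);

  for (i = 0; i < rsize; i++) printf("%d,", rbuf[i]);
  printf("\n");

  free(rbuf);
  MPI_Comm_free(&inter);
  MPI_Comm_free(&intra);
  MPI_Finalize();
  return 0;
}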






Re: [OMPI devel] btl_openib_max_btls

2006-09-27 Thread Nysal Jan

I was using the v1.2 branch. Gleb's fix has resolved the problem.
Thanks
--Nysal

On 9/25/06, Jeff Squyres  wrote:


What version of Open MPI are you using?

We had a bug with this on the trunk and [unreleased] v1.2 branch; it was
just fixed within the last few hours in both places.  It should not be a
problem in the released v1.1 series.

Can you confirm that you were using the OMPI trunk or the v1.2 branch?  If
you're seeing this in the v1.1 series, then we need to look at this a bit
closer...


On 9/22/06 1:25 PM, "Nysal Jan"  wrote:

> The ompi_info command shows the following description for the
> "btl_openib_max_btls" parameter:
> MCA btl: parameter "btl_openib_max_btls" (current value: "-1")  Maximum
> number of HCA ports to use (-1 = use all available, otherwise must be >= 1)
>
> Even though I specify "mpirun --mca btl_openib_max_btls 1 .", 2 openib
> btls are created (the HCA has 2 ports).
> When I try to run Open MPI across 2 nodes (one node has an HCA with 2 ports
> and the other has only one port), both endpoints send the QP information
> over to the peer. Only one endpoint exists at the peer, so it prints the
> following error message:
> [0,1,1][btl_openib_endpoint.c:706:mca_btl_openib_endpoint_recv] can't find
> suitable endpoint for this peer
>
> [0,1,0][btl_openib_endpoint.c:913:mca_btl_openib_endpoint_connect] error
> posting receive errno says Operation now in progress
>
> [0,1,0][btl_openib_endpoint.c:737:mca_btl_openib_endpoint_recv] endpoint
> connect error: -1
>
> Is "btl_openib_max_btls" the maximum number of BTLs or the maximum number of
> BTLs per port (which is what the current implementation "init_one_hca()"
> looks like)?
>
> -Nysal
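
To make the two readings in the question concrete, here is a hypothetical
sketch (illustrative only; not the actual init_one_hca() code, whose
structure may differ) of why a per-port cap would produce 2 BTLs even
with max_btls set to 1:

#include <stdio.h>

/* Hypothetical illustration only -- not Open MPI source.  Counts how
   many openib BTLs would be created for one HCA with 2 ports and
   btl_openib_max_btls = 1 under each reading of the parameter. */
int main(void) {
  const int num_ports = 2, max_btls = 1;
  int p, total = 0, per_port_total = 0;

  /* Reading 1: global cap -- stop once max_btls BTLs exist. */
  for (p = 0; p < num_ports; p++)
    if (total < max_btls)
      total++;                       /* -> 1 BTL */

  /* Reading 2: per-port cap -- the counter restarts for each port,
     so every port still gets a BTL, matching the 2 BTLs observed. */
  for (p = 0; p < num_ports; p++) {
    int this_port = 0;
    if (this_port < max_btls) {
      this_port++;
      per_port_total++;              /* -> 2 BTLs */
    }
  }

  printf("global cap: %d BTL(s), per-port cap: %d BTL(s)\n",
         total, per_port_total);
  return 0;
}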


--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems