[OMPI devel] Broken master

2015-11-05 Thread Rolf vandeVaart
Hi Ralph:

Just an FYI that the following change broke the use of --host on master last 
night.


[rvandevaart@drossetti-ivy4 ompi-master-rolfv]$ git bisect bad
169c44258d5c98870872b77166390d4f9a81105e is the first bad commit
commit 169c44258d5c98870872b77166390d4f9a81105e
Author: Ralph Castain 
List-Post: devel@lists.open-mpi.org
Date:   Tue Nov 3 19:00:28 2015 -0800

Fix missing check


[rvandevaart@drossetti-ivy4 src]$ mpirun -host drossetti-ivy4 -np 2 
MPI_Isend_ator_c
[drossetti-ivy4:28764] *** Process received signal ***
[drossetti-ivy4:28764] Signal: Segmentation fault (11)
[drossetti-ivy4:28764] Signal code: Address not mapped (1)
[drossetti-ivy4:28764] Failing at address: 0x347976692d69
[drossetti-ivy4:28764] [ 0] /lib64/libpthread.so.0(+0xf710)[0x7f633fa42710]
[drossetti-ivy4:28764] [ 1] 
/ivylogin/home/rvandevaart/ompi-repos/ompi-master-rolfv/dbg/lib/libopen-pal.so.0(+0x7b1c2)[0x7f63409821c2]
[drossetti-ivy4:28764] [ 2] 
/ivylogin/home/rvandevaart/ompi-repos/ompi-master-rolfv/dbg/lib/libopen-pal.so.0(opal_argv_split+0x25)[0x7f63409821fb]
[drossetti-ivy4:28764] [ 3] 
/ivylogin/home/rvandevaart/ompi-repos/ompi-master-rolfv/dbg/lib/libopen-rte.so.0(orte_util_add_dash_host_nodes+0x143)[0x7f6340c82830]
[drossetti-ivy4:28764] [ 4] 
/ivylogin/home/rvandevaart/ompi-repos/ompi-master-rolfv/dbg/lib/libopen-rte.so.0(orte_plm_base_setup_virtual_machine+0x1008)[0x7f634086]
[drossetti-ivy4:28764] [ 5] 
/ivylogin/home/rvandevaart/ompi-repos/ompi-master-rolfv/dbg/lib/openmpi/mca_plm_rsh.so(+0x68b1)[0x7f633dc008b1]
[drossetti-ivy4:28764] [ 6] 
/ivylogin/home/rvandevaart/ompi-repos/ompi-master-rolfv/dbg/lib/libopen-pal.so.0(opal_libevent2022_event_base_loop+0x53c)[0x7f63409a070c]
[drossetti-ivy4:28764] [ 7] mpirun[0x4050a1]
[drossetti-ivy4:28764] [ 8] mpirun[0x4034a4]
[drossetti-ivy4:28764] [ 9] 
/lib64/libc.so.6(__libc_start_main+0xfd)[0x7f633f6bdd1d]
[drossetti-ivy4:28764] [10] mpirun[0x4033c9]
[drossetti-ivy4:28764] *** End of error message ***
Segmentation fault
[rvandevaart@drossetti-ivy4 src]$




[OMPI devel] Open MPI Weekly Meetings

2015-11-03 Thread Rolf vandeVaart (via Doodle)
Hi there,

Rolf vandeVaart (rvandeva...@nvidia.com) invites you to participate in
the Doodle poll "Open MPI Weekly Meetings."

Should we have Open MPI weekly meetings during SC15 and Thanksgiving
week? Let me know if you want to attend one or both of them.

Participate now
https://doodle.com/poll/bvfikckzpucgba8w?tmail=poll_invitecontact_participant_invitation_with_message=pollbtn



Re: [OMPI devel] The issue with OMPI_FREE_LIST_GET_MT()

2015-09-16 Thread Rolf vandeVaart
The bfo was my creation many years ago.  Can we keep it around for a little 
longer?  If we blow it away, then we should probably clean up all the code I 
also have in the openib BTL for supporting failover.  There is also some 
configure code that would have to go as well.

Rolf

>-Original Message-
>From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan
>Hjelm
>Sent: Wednesday, September 16, 2015 1:43 PM
>To: Open MPI Developers
>Subject: Re: [OMPI devel] The issue with OMPI_FREE_LIST_GET_MT()
>
>* PGP Signed by an unknown key
>
>
>Not sure. I give a +1 for blowing them away. We can bring them back later if
>needed.
>
>-Nathan
>
>On Wed, Sep 16, 2015 at 01:19:24PM -0400, George Bosilca wrote:
>>As they don't even compile why are we keeping them around?
>>  George.
>>On Wed, Sep 16, 2015 at 12:05 PM, Nathan Hjelm 
>wrote:
>>
>>  iboffload and bfo are opal ignored by default. Neither exists in the
>>  release branch.
>>
>>  -Nathan
>>  On Wed, Sep 16, 2015 at 12:02:29PM -0400, George Bosilca wrote:
>>  >While looking into a possible fix for this problem we should also
>>  cleanup
>>  >in the trunk the leftover from the OMPI_FREE_LIST.
>>  >$find . -name "*.[ch]" -exec grep -Hn OMPI_FREE_LIST_GET_MT {} +
>>  >./opal/mca/btl/usnic/btl_usnic_compat.h:161:
>>  > OMPI_FREE_LIST_GET_MT(list, (item))
>>  >./ompi/mca/pml/bfo/pml_bfo_recvreq.h:89:
>>  >OMPI_FREE_LIST_GET_MT(_pml_base_recv_requests, item);
>>  \
>>  >./ompi/mca/bcol/iboffload/bcol_iboffload_task.h:149:
>>  > OMPI_FREE_LIST_GET_MT(>tasks_free, item);
>>  >./ompi/mca/bcol/iboffload/bcol_iboffload_task.h:206:
>>  > OMPI_FREE_LIST_GET_MT(task_list, item);
>>  >./ompi/mca/bcol/iboffload/bcol_iboffload_frag.c:107:
>>  > OMPI_FREE_LIST_GET_MT(>frags_free[qp_index], item);
>>  >./ompi/mca/bcol/iboffload/bcol_iboffload_frag.c:146:
>>  > OMPI_FREE_LIST_GET_MT(>frags_free[qp_index], item);
>>  >./ompi/mca/bcol/iboffload/bcol_iboffload_frag.c:208:
>>  > OMPI_FREE_LIST_GET_MT(>device-
>>frags_free[qp_index],
>>  item);
>>  >./ompi/mca/bcol/iboffload/bcol_iboffload_qp_info.c:156:
>>  > OMPI_FREE_LIST_GET_MT(>frags_free[qp_index], item);
>>  >./ompi/mca/bcol/iboffload/bcol_iboffload_collfrag.h:130:
>>  > OMPI_FREE_LIST_GET_MT(>collfrags_free, item);
>>  >./ompi/mca/bcol/iboffload/bcol_iboffload_frag.h:115:
>>  > OMPI_FREE_LIST_GET_MT(>ml_frags_free, item);
>>  >I wonder how these are even compiling ...
>>  >  George.
>>  >On Wed, Sep 16, 2015 at 11:59 AM, George Bosilca
>>  
>>  >wrote:
>>  >
>>  >  Alexey,
>>  >  This is not necessarily the fix for all cases. Most of the
>>  internal uses
>>  >  of the free_list can easily accommodate the fact that no more
>>  >  elements are available. Based on your description of the problem
>>  I would
>>  >  assume you encounter this problem once the
>>  >  MCA_PML_OB1_RECV_REQUEST_ALLOC is called. In this particular
>case
>>  the
>>  >  problem is the fact that we call OMPI_FREE_LIST_GET_MT and that
>>  the
>>  >  upper level is unable to correctly deal with the case where the
>>  returned
>>  >  item is NULL. In this particular case the real fix is to use the
>>  >  blocking version of the free_list accessor (similar to the case
>>  for
>>  >  send) OMPI_FREE_LIST_WAIT_MT.
>>  >  It is also possible that I misunderstood your problem. IF the
>>  solution
>>  >  above doesn't work can you describe exactly where the NULL return
>>  of the
>>  >  OMPI_FREE_LIST_GET_MT is creating an issue?
>>  >  George.
>>  >  On Wed, Sep 16, 2015 at 9:03 AM, Aleksej Ryzhih
>>  >   wrote:
>>  >
>>  >Hi all,
>>  >
>>  >We experimented with MPI+OpenMP hybrid application
>>  >(MPI_THREAD_MULTIPLE support level)  where several threads
>>  submit a
>>  >lot of MPI_Irecv() requests simultaneously and encountered an
>>  >intermittent bug OMPI_ERR_TEMP_OUT_OF_RESOURCE after
>>  >MCA_PML_OB1_RECV_REQUEST_ALLOC()  because
>>  OMPI_FREE_LIST_GET_MT()
>>  > returned NULL.  Investigating this bug we found that sometimes
>>  the
>>  >thread calling ompi_free_list_grow()  doesn't have any free items
>>  in
>>  >LIFO list at exit because other threads  retrieved  all new
>>  items at
>>  >opal_atomic_lifo_pop()
>>  >
>>  >So we suggest changing OMPI_FREE_LIST_GET_MT() as below:
>>  >
>>  >
>>  >
>>  >#define OMPI_FREE_LIST_GET_MT(fl,
>>  >item)
>> 
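
For context on the race described above, a minimal sketch in plain C (illustrative only,
not the actual OMPI free-list code) of why the thread that grows the list can still come
up empty, and why a blocking WAIT-style accessor avoids the NULL return:

#include <stdlib.h>

/* Hypothetical stand-ins for the structures discussed above. */
typedef struct item { struct item *next; } item_t;
typedef struct { item_t *head; } lifo_t;   /* a lock-free LIFO in the real code */

static item_t *lifo_pop(lifo_t *l)         /* returns NULL when empty */
{
    item_t *it = l->head;
    if (it) l->head = it->next;
    return it;
}

static void lifo_grow(lifo_t *l, int n)    /* push n freshly allocated items */
{
    for (int i = 0; i < n; ++i) {
        item_t *it = malloc(sizeof(*it));
        it->next = l->head;
        l->head = it;
    }
}

/* GET-style accessor: may legitimately return NULL under MPI_THREAD_MULTIPLE.
 * Thread A finds the list empty and grows it, but other threads pop every new
 * item before A retries, so A still gets NULL and the caller surfaces
 * OMPI_ERR_TEMP_OUT_OF_RESOURCE, which is the symptom Aleksej reports. */
static item_t *get_nonblocking(lifo_t *l)
{
    item_t *it = lifo_pop(l);
    if (NULL == it) {
        lifo_grow(l, 32);
        it = lifo_pop(l);    /* can still be NULL in the multi-threaded case */
    }
    return it;
}

/* WAIT-style accessor (what George suggests for the recv-request path):
 * keep growing and retrying until an item is obtained; never return NULL. */
static item_t *get_blocking(lifo_t *l)
{
    item_t *it;
    while (NULL == (it = lifo_pop(l))) {
        lifo_grow(l, 32);
    }
    return it;
}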

[OMPI devel] Dual rail IB card problem

2015-08-31 Thread Rolf vandeVaart
There was a problem reported on the users list about Open MPI always picking 
one Mellanox card when there were two in the machine.


http://www.open-mpi.org/community/lists/users/2015/08/27507.php


We dug a little deeper and I think this has to do with how hwloc is figuring 
out where one of the cards is located.  This verbose output (with some extra 
printfs) shows that it cannot figure out which NUMA node mlx4_0 is closest to. 
It can only determine that it is located on HWLOC_OBJ_SYSTEM, and therefore Open MPI 
assumes a distance of 0.0.  Because of that (and smaller distance is better), the Open MPI 
library always picks mlx4_0 for all sockets.  I am trying to figure out whether this 
is a hwloc or an Open MPI bug.  Any thoughts on this?
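
The selection logic at issue boils down to something like the sketch below (illustrative C;
the table and helper are made up, not the actual hwloc or openib code).  The point is that a
device whose locality cannot be resolved below the machine/system object falls through with
a distance of 0.0, and since the smallest distance wins, it beats every correctly resolved
device:

#include <stdio.h>

/* distance_to: latency from my NUMA node to the device's NUMA node.
 * dev_numa < 0 models "hwloc could only place the device on the machine/system
 * object", which is where the 0.0 fallback comes from. */
static float distance_to(int my_numa, int dev_numa, const float latency[4][4])
{
    if (dev_numa < 0) {
        return 0.0f;
    }
    return latency[my_numa][dev_numa];
}

int main(void)
{
    const float latency[4][4] = {
        {1.0f, 2.1f, 2.1f, 2.1f},
        {2.1f, 1.0f, 2.1f, 2.1f},
        {2.1f, 2.1f, 1.0f, 2.1f},
        {2.1f, 2.1f, 2.1f, 1.0f},
    };
    int my_numa = 0;
    float d1 = distance_to(my_numa,  1, latency);   /* mlx4_1 resolves: 2.10 */
    float d0 = distance_to(my_numa, -1, latency);   /* mlx4_0 does not: 0.00 */
    /* "Smaller is better": 0.00 < 2.10, so mlx4_0 wins for every socket. */
    printf("mlx4_1=%.2f mlx4_0=%.2f -> pick %s\n",
           d1, d0, (d0 < d1) ? "mlx4_0" : "mlx4_1");
    return 0;
}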


[node1.local:05821] Checking distance for device=mlx4_1
[node1.local:05821] hwloc_distances->nbobjs=4
[node1.local:05821] hwloc_distances->latency[0]=1.00
[node1.local:05821] hwloc_distances->latency[1]=2.10
[node1.local:05821] hwloc_distances->latency[2]=2.10
[node1.local:05821] hwloc_distances->latency[3]=2.10
[node1.local:05821] hwloc_distances->latency[4]=2.10
[node1.local:05821] hwloc_distances->latency[5]=1.00
[node1.local:05821] hwloc_distances->latency[6]=2.10
[node1.local:05821] hwloc_distances->latency[7]=2.10
[node1.local:05821] ibv_obj->type = 4
[node1.local:05821] ibv_obj->logical_index=1
[node1.local:05821] my_obj->logical_index=0
[node1.local:05821] Proc is bound: distance=2.10

[node1.local:05821] Checking distance for device=mlx4_0
[node1.local:05821] hwloc_distances->nbobjs=4
[node1.local:05821] hwloc_distances->latency[0]=1.00
[node1.local:05821] hwloc_distances->latency[1]=2.10
[node1.local:05821] hwloc_distances->latency[2]=2.10
[node1.local:05821] hwloc_distances->latency[3]=2.10
[node1.local:05821] hwloc_distances->latency[4]=2.10
[node1.local:05821] hwloc_distances->latency[5]=1.00
[node1.local:05821] hwloc_distances->latency[6]=2.10
[node1.local:05821] hwloc_distances->latency[7]=2.10
[node1.local:05821] ibv_obj->type = 1 <-HWLOC_OBJ_MACHINE
[node1.local:05821] ibv_obj->type set to NULL
[node1.local:05821] Proc is bound: distance=0.00

[node1.local:05821] [rank=0] openib: skipping device mlx4_1; it is too far away
[node1.local:05821] [rank=0] openib: using port mlx4_0:1
[node1.local:05821] [rank=0] openib: using port mlx4_0:2


Machine (1024GB)
  NUMANode L#0 (P#0 256GB) + Socket L#0 + L3 L#0 (30MB)
L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1)
L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2)
L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3)
L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4 + PU L#4 (P#4)
L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5 + PU L#5 (P#5)
L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6 + PU L#6 (P#6)
L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7 + PU L#7 (P#7)
L2 L#8 (256KB) + L1d L#8 (32KB) + L1i L#8 (32KB) + Core L#8 + PU L#8 (P#8)
L2 L#9 (256KB) + L1d L#9 (32KB) + L1i L#9 (32KB) + Core L#9 + PU L#9 (P#9)
L2 L#10 (256KB) + L1d L#10 (32KB) + L1i L#10 (32KB) + Core L#10 + PU L#10 
(P#10)
L2 L#11 (256KB) + L1d L#11 (32KB) + L1i L#11 (32KB) + Core L#11 + PU L#11 
(P#11)
  NUMANode L#1 (P#1 256GB)
Socket L#1 + L3 L#1 (30MB)
  L2 L#12 (256KB) + L1d L#12 (32KB) + L1i L#12 (32KB) + Core L#12 + PU L#12 
(P#12)
  L2 L#13 (256KB) + L1d L#13 (32KB) + L1i L#13 (32KB) + Core L#13 + PU L#13 
(P#13)
  L2 L#14 (256KB) + L1d L#14 (32KB) + L1i L#14 (32KB) + Core L#14 + PU L#14 
(P#14)
  L2 L#15 (256KB) + L1d L#15 (32KB) + L1i L#15 (32KB) + Core L#15 + PU L#15 
(P#15)
  L2 L#16 (256KB) + L1d L#16 (32KB) + L1i L#16 (32KB) + Core L#16 + PU L#16 
(P#16)
  L2 L#17 (256KB) + L1d L#17 (32KB) + L1i L#17 (32KB) + Core L#17 + PU L#17 
(P#17)
  L2 L#18 (256KB) + L1d L#18 (32KB) + L1i L#18 (32KB) + Core L#18 + PU L#18 
(P#18)
  L2 L#19 (256KB) + L1d L#19 (32KB) + L1i L#19 (32KB) + Core L#19 + PU L#19 
(P#19)
  L2 L#20 (256KB) + L1d L#20 (32KB) + L1i L#20 (32KB) + Core L#20 + PU L#20 
(P#20)
  L2 L#21 (256KB) + L1d L#21 (32KB) + L1i L#21 (32KB) + Core L#21 + PU L#21 
(P#21)
  L2 L#22 (256KB) + L1d L#22 (32KB) + L1i L#22 (32KB) + Core L#22 + PU L#22 
(P#22)
  L2 L#23 (256KB) + L1d L#23 (32KB) + L1i L#23 (32KB) + Core L#23 + PU L#23 
(P#23)
HostBridge L#5
  PCIBridge
PCI 15b3:1003
  Net L#7 "ib2"
  Net L#8 "ib3"
  OpenFabrics L#9 "mlx4_1"

  NUMANode L#2 (P#2 256GB) + Socket L#2 + L3 L#2 (30MB)
L2 L#24 (256KB) + L1d L#24 (32KB) + L1i L#24 (32KB) + Core L#24 + PU L#24 
(P#24)
L2 L#25 (256KB) + L1d L#25 (32KB) + L1i L#25 (32KB) + Core L#25 + PU L#25 
(P#25)
L2 L#26 (256KB) + L1d L#26 (32KB) + L1i L#26 (32KB) + Core L#26 + PU L#26 
(P#26)

Re: [OMPI devel] pgi and fortran in master

2015-08-26 Thread Rolf vandeVaart
I just tested this against the PGI 15.7 compiler and I see the same thing.  It 
appears that we get this error on some of the files called out in 
ompi/mpi/fortran/use-mpi-f08/mpi-f-interfaces-bind.h as not having an 
"easy-peasy" solution.  All the other files compile just fine.  I checked the 
list of failing files against the list called out in mpi-f-interfaces-bind.h: 
that file calls out 32 files, and the 20 failing files listed below are a 
subset of them.  Maybe that is a clue to what is going wrong.


  0 inform,   0 warnings,   9 severes, 0 fatal for mpi_cart_create_f08
  0 inform,   0 warnings,   9 severes, 0 fatal for mpi_cart_get_f08
  0 inform,   0 warnings,   9 severes, 0 fatal for mpi_cart_map_f08
  0 inform,   0 warnings,   9 severes, 0 fatal for mpi_cart_sub_f08
  0 inform,   0 warnings,   9 severes, 0 fatal for mpi_comm_get_attr_f08
  0 inform,   0 warnings,   9 severes, 0 fatal for mpi_comm_test_inter_f08
  0 inform,   0 warnings,   9 severes, 0 fatal for 
mpi_dist_graph_create_adjacent_f08
  0 inform,   0 warnings,   9 severes, 0 fatal for mpi_dist_graph_create_f08
  0 inform,   0 warnings,   9 severes, 0 fatal for 
mpi_dist_graph_neighbors_count_f08
  0 inform,   0 warnings,   9 severes, 0 fatal for mpi_graph_create_f08
  0 inform,   0 warnings,   9 severes, 0 fatal for mpi_info_get_f08
  0 inform,   0 warnings,   9 severes, 0 fatal for mpi_info_get_valuelen_f08
  0 inform,   0 warnings,   9 severes, 0 fatal for mpi_intercomm_merge_f08
  0 inform,   0 warnings,   9 severes, 0 fatal for mpi_op_commutative_f08
  0 inform,   0 warnings,   9 severes, 0 fatal for mpi_op_create_f08
  0 inform,   0 warnings,   9 severes, 0 fatal for mpi_type_get_attr_f08
  0 inform,   0 warnings,   9 severes, 0 fatal for mpi_win_get_attr_f08
  0 inform,   0 warnings,   9 severes, 0 fatal for mpi_win_test_f08
  0 inform,   0 warnings,   9 severes, 0 fatal for mpi_file_get_atomicity_f08
  0 inform,   0 warnings,   9 severes, 0 fatal for mpi_file_set_atomicity_f08





From: devel  on behalf of Paul Hargrove 

Sent: Wednesday, August 26, 2015 6:50 AM
To: Open MPI Developers
Subject: [OMPI devel] pgi and fortran in master

It looks like current and past PGI fortran compilers that are happy with 1.8.x 
and 1.10.x are unhappy with master:

/bin/sh ../../../../libtool  --tag=FC   --mode=compile pgf90 -DHAVE_CONFIG_H 
-I. 
-I/scratch/scratchdirs/hargrove/OMPI/openmpi-master-linux-x86_64-pgi-14.7/openmpi-dev-2371-gea935df/ompi/mpi/fortran/use-mpi-f08
 -I../../../../opal/include -I../../../../ompi/include 
-I../../../../oshmem/include 
-I../../../../opal/mca/hwloc/hwloc1110/hwloc/include/private/autogen 
-I../../../../opal/mca/hwloc/hwloc1110/hwloc/include/hwloc/autogen 
-I../../../../ompi/mpiext/cuda/c   
-I/scratch/scratchdirs/hargrove/OMPI/openmpi-master-linux-x86_64-pgi-14.7/openmpi-dev-2371-gea935df
 -I../../../.. 
-I/scratch/scratchdirs/hargrove/OMPI/openmpi-master-linux-x86_64-pgi-14.7/openmpi-dev-2371-gea935df/opal/include
 
-I/scratch/scratchdirs/hargrove/OMPI/openmpi-master-linux-x86_64-pgi-14.7/openmpi-dev-2371-gea935df/orte/include
 -I../../../../orte/include 
-I/scratch/scratchdirs/hargrove/OMPI/openmpi-master-linux-x86_64-pgi-14.7/openmpi-dev-2371-gea935df/ompi/include
 
-I/scratch/scratchdirs/hargrove/OMPI/openmpi-master-linux-x86_64-pgi-14.7/openmpi-dev-2371-gea935df/oshmem/include
   
-I/scratch/scratchdirs/hargrove/OMPI/openmpi-master-linux-x86_64-pgi-14.7/openmpi-dev-2371-gea935df/opal/mca/hwloc/hwloc1110/hwloc/include
 
-I/scratch/scratchdirs/hargrove/OMPI/openmpi-master-linux-x86_64-pgi-14.7/BLD/opal/mca/hwloc/hwloc1110/hwloc/include
 
-I/scratch/scratchdirs/hargrove/OMPI/openmpi-master-linux-x86_64-pgi-14.7/openmpi-dev-2371-gea935df/opal/mca/event/libevent2022/libevent
 
-I/scratch/scratchdirs/hargrove/OMPI/openmpi-master-linux-x86_64-pgi-14.7/openmpi-dev-2371-gea935df/opal/mca/event/libevent2022/libevent/include
 
-I/scratch/scratchdirs/hargrove/OMPI/openmpi-master-linux-x86_64-pgi-14.7/BLD/opal/mca/event/libevent2022/libevent/include
 -I../../../../ompi/include 
-I/scratch/scratchdirs/hargrove/OMPI/openmpi-master-linux-x86_64-pgi-14.7/openmpi-dev-2371-gea935df/ompi/include
 -I../../../../ompi/mpi/fortran/use-mpi-ignore-tkr -I. 
-I/scratch/scratchdirs/hargrove/OMPI/openmpi-master-linux-x86_64-pgi-14.7/openmpi-dev-2371-gea935df
  -g -c -o cart_create_f08.lo 
/scratch/scratchdirs/hargrove/OMPI/openmpi-master-linux-x86_64-pgi-14.7/openmpi-dev-2371-gea935df/ompi/mpi/fortran/use-mpi-f08/cart_create_f08.F90
libtool: compile:  pgf90 -DHAVE_CONFIG_H -I. 
-I/scratch/scratchdirs/hargrove/OMPI/openmpi-master-linux-x86_64-pgi-14.7/openmpi-dev-2371-gea935df/ompi/mpi/fortran/use-mpi-f08
 -I../../../../opal/include -I../../../../ompi/include 
-I../../../../oshmem/include 
-I../../../../opal/mca/hwloc/hwloc1110/hwloc/include/private/autogen 

[OMPI devel] Open MPI 1.8.6 memory leak

2015-07-01 Thread Rolf vandeVaart
There have been two reports on the user list about memory leaks.  I have 
reproduced this leak with LAMMPS.  Note that this has nothing to do with 
CUDA-aware features.  The steps that Stefan has provided make it easy to 
reproduce.

Here are some more specific steps to reproduce derived from Stefan.

1. clone LAMMPS (git clone 
git://git.lammps.org/lammps-ro.git lammps)
2. cd src/, compile with openMPI 1.8.6.  To do this, set your path to Open MPI 
and type "make mpi"
3. run the example listed in lammps/examples/melt. To do this, first copy 
"lmp_mpi" from the src directory into the melt directory.  Then you need to 
modify the in.melt file so that it will run for a while.  Change "run 25" to 
"run25"
4. you can run by mpirun -np 2 lmp_mpi < in.melt

For reference, here is the memory consumption for both 1.8.5 and 1.8.6.  1.8.5 stays 
very stable, whereas 1.8.6 almost triples after 6 minutes of running.

Open MPI 1.8.5

USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
3234126907 59.0  0.0 329672 14584 pts/16   Rl   16:24   0:00 
./lmp_mpi_185_nocuda
3234126908 60.0  0.0 329672 14676 pts/16   Rl   16:24   0:00 
./lmp_mpi_185_nocuda
USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
3234126907 98.3  0.0 329672 14932 pts/16   Rl   16:24   0:30 
./lmp_mpi_185_nocuda
3234126908 98.5  0.0 329672 14932 pts/16   Rl   16:24   0:30 
./lmp_mpi_185_nocuda
USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
3234126907 98.9  0.0 329672 14960 pts/16   Rl   16:24   1:00 
./lmp_mpi_185_nocuda
3234126908 99.1  0.0 329672 14952 pts/16   Rl   16:24   1:00 
./lmp_mpi_185_nocuda
USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
3234126907 99.1  0.0 329672 14960 pts/16   Rl   16:24   1:30 
./lmp_mpi_185_nocuda
3234126908 99.3  0.0 329672 14952 pts/16   Rl   16:24   1:30 
./lmp_mpi_185_nocuda
USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
3234126907 99.2  0.0 329672 14960 pts/16   Rl   16:24   2:00 
./lmp_mpi_185_nocuda
3234126908 99.4  0.0 329672 14952 pts/16   Rl   16:24   2:00 
./lmp_mpi_185_nocuda
USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
3234126907 99.3  0.0 329672 14960 pts/16   Rl   16:24   2:30 
./lmp_mpi_185_nocuda
3234126908 99.5  0.0 329672 14952 pts/16   Rl   16:24   2:30 
./lmp_mpi_185_nocuda
USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
3234126907 99.4  0.0 329672 14960 pts/16   Rl   16:24   2:59 
./lmp_mpi_185_nocuda
3234126908 99.5  0.0 329672 14952 pts/16   Rl   16:24   3:00 
./lmp_mpi_185_nocuda
USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
3234126907 99.4  0.0 329672 14960 pts/16   Rl   16:24   3:29 
./lmp_mpi_185_nocuda
3234126908 99.6  0.0 329672 14956 pts/16   Rl   16:24   3:30 
./lmp_mpi_185_nocuda
USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
3234126907 99.4  0.0 329672 14960 pts/16   Rl   16:24   3:59 
./lmp_mpi_185_nocuda
3234126908 99.6  0.0 329672 14956 pts/16   Rl   16:24   4:00 
./lmp_mpi_185_nocuda
USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
3234126907 99.4  0.0 329672 14960 pts/16   Rl   16:24   4:29 
./lmp_mpi_185_nocuda
3234126908 99.6  0.0 329672 14956 pts/16   Rl   16:24   4:30 
./lmp_mpi_185_nocuda
USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
3234126907 99.5  0.0 329672 14960 pts/16   Rl   16:24   4:59 
./lmp_mpi_185_nocuda
3234126908 99.6  0.0 329672 14956 pts/16   Rl   16:24   5:00 
./lmp_mpi_185_nocuda
USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
3234126907 99.5  0.0 329672 14960 pts/16   Rl   16:24   5:29 
./lmp_mpi_185_nocuda
3234126908 99.6  0.0 329672 14956 pts/16   Rl   16:24   5:29 
./lmp_mpi_185_nocuda
USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
3234126907 99.5  0.0 329672 14960 pts/16   Rl   16:24   5:59 
./lmp_mpi_185_nocuda
3234126908 99.6  0.0 329672 14956 pts/16   Rl   16:24   5:59 
./lmp_mpi_185_nocuda
USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND

Open MPI 1.8.6

USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
3234126755  0.0  0.0 330288 15368 pts/16   Rl   16:10   0:00 
./lmp_mpi_186_nocuda
3234126756  0.0  0.0 330284 15376 pts/16   Rl   16:10   0:00 
./lmp_mpi_186_nocuda
USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
3234126755  100  0.0 409856 94976 pts/16   Rl   16:10   0:30 
./lmp_mpi_186_nocuda
3234126756  100  0.0 409848 94904 pts/16   Rl   16:10   0:30 
./lmp_mpi_186_nocuda
USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
3234126755  100  0.1 489292 174320 pts/16  Rl   16:10   1:00 
./lmp_mpi_186_nocuda
3234126756  100  0.1 489288 174536 pts/16  Rl   16:10   1:00 

Re: [OMPI devel] smcuda higher exclusivity than anything else?

2015-05-20 Thread Rolf vandeVaart
A few observations.
1. The smcuda btl is only built when --with-cuda is part of the configure line 
so folks who do not do this will not even have this btl and will never run into 
this issue.
2. The priority of the smcuda btl has been higher since Open MPI 1.7.5 (March 
2014).  The idea is that if someone configured in CUDA-aware support, then they 
should not explicitly have to adjust the priority of the smcuda btl to get it 
selected.
3. This issue popped up because I made a change in the smcuda btl between 1.8.4 
and 1.8.5.  The change was that the btl_smcuda_max_send_size was bumped from 
32k to 128K.  This had a positive effect when sending and receiving GPU 
buffers.  I knew it would somewhat negatively affect host memory transfers, but 
figured that was a fair tradeoff.  Based on this report, that may not have been 
the right decision.  If one runs with Open MPI 1.8.5 and sets --mca 
btl_smcuda_max_send_size 32768, then one sees the same performance as 1.8.4 and 
similar to what one gets with the sm btl.

Interesting idea to disqualify this BTL if there are no GPUs on the machine. 

Aurelien, would that seem like a good solution?

Rolf

PS: Unfortunately, the max_send_size value is used for both GPU and CPU 
transfers, and the optimal value for each is different.
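
A rough sketch of the kind of disqualification being floated above (illustrative only, not
the actual smcuda component code, and it assumes that probing the CUDA runtime for devices
is an acceptable check at selection time):

#include <cuda_runtime.h>

/* Hypothetical selection-time check: return 0 to take smcuda out of the
 * running when the node has no usable GPU, so sm/vader win by default. */
static int smcuda_node_has_gpu(void)
{
    int ndev = 0;
    if (cudaSuccess != cudaGetDeviceCount(&ndev) || 0 == ndev) {
        return 0;   /* no GPUs: disqualify smcuda */
    }
    return 1;       /* GPUs present: keep smcuda and its higher exclusivity */
}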

>-Original Message-
>From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph
>Castain
>Sent: Wednesday, May 20, 2015 3:25 PM
>To: Open MPI Developers
>Subject: Re: [OMPI devel] smcuda higher exclusivity than anything else?
>
>Rolf - this doesn’t sound right to me. I assume that smcuda is only supposed
>to build if cuda support was found/requested, but if there are no cuda
>adapters, then I would have thought it should disqualify itself.
>
>Can we do something about this for 1.8.6?
>
>> On May 20, 2015, at 11:14 AM, Aurélien Bouteiller 
>wrote:
>>
>> I was making basic performance measurements on our machine after
>installing 1.8.5, and the performance was looking bad. It turns out that the
>smcuda btl has a higher exclusivity than both vader and sm, even on machines
>with no nvidia adapters. Is there a strong reason why the default exclusivity 
>is
>set so high ? Of course it can be easily fixed with a couple of mca options, 
>but
>unsuspecting users that “just run” will experience 1/3 overhead across the
>board for shared memory communication according to my measurements.
>>
>>
>> Side note: from my understanding of the smcuda component, performance
>> should be identical to the regular sm component (as long as no GPU
>operation are required). This is not the case, there is some performance
>penalty with smcuda compared to sm.
>>
>> Aurelien
>>
>> --
>> Aurélien Bouteiller ~~ https://icl.cs.utk.edu/~bouteill/
>>



Re: [OMPI devel] is anyone seeing this on their intel/inifinipath cluster?

2015-05-04 Thread Rolf vandeVaart
I am seeing it on my cluster too.

[ivy4:27085] mca_base_component_repository_open: unable to open mca_btl_usnic: 
/ivylogin/home/rvandevaart/ompi-repos/ompi-master-uvm/64-dbg/lib/libmca_common_libfabric.so.0:
 undefined symbol: psmx_eq_open (ignored)
[ivy4:27085] mca_base_component_repository_open: unable to open mca_mtl_ofi: 
/ivylogin/home/rvandevaart/ompi-repos/ompi-master-uvm/64-dbg/lib/libmca_common_libfabric.so.0:
 undefined symbol: psmx_eq_open (ignored)

Rolf

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Howard Pritchard
Sent: Friday, May 01, 2015 6:08 PM
To: Open MPI Developers List
Subject: [OMPI devel] is anyone seeing this on their intel/inifinipath cluster?

Hi Folks,

I'm doing some work with master on an intel/infinipath system and there are some odd 
undefined symbol errors showing up:

/users/hpp/ompi_install/lib/libmca_common_libfabric.so.0: undefined symbol: 
psmx_eq_open

anyone else seeing this on their intel/infinipath system?

What's bizarre is that psmx_eq_open shouldn't be visible outside of the 
libfabric.so itself.  So
having libfabric internal symbols required in an ompi mca lib seems to be 
incorrect.

Howard
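
For reference on why that looks wrong: an internal symbol is normally built with hidden
visibility, so nothing outside the shared object can carry an undefined reference to it at
load time.  A generic illustration in C with GCC visibility attributes (not the libfabric
or OMPI build itself):

/* Inside the provider's shared library. */
__attribute__((visibility("hidden")))
int internal_eq_open(void)          /* never appears in the dynamic symbol table */
{
    return 0;
}

__attribute__((visibility("default")))
int public_api_call(void)           /* deliberately exported entry point */
{
    return internal_eq_open();
}

If a symbol intended to stay internal nevertheless shows up as an undefined reference in
another library, something in the build exported it or another object picked up a
reference to it, which is the incorrect situation described above.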




Re: [OMPI devel] c_accumulate

2015-04-20 Thread Rolf vandeVaart
Hi Gilles:
Is your failure similar to this ticket?
https://github.com/open-mpi/ompi/issues/393
Rolf

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Gilles Gouaillardet
Sent: Monday, April 20, 2015 9:12 AM
To: Open MPI Developers
Subject: [OMPI devel] c_accumulate

Folks,

I (sometimes) get a failure with the c_accumulate test from the ibm test 
suite on one host with 4 mpi tasks.

So far, I have only been able to observe this on linux/sparc with the vader btl.

here is a snippet of the test:

  MPI_Win_create(&RecvBuff, sizeOfInt, 1, MPI_INFO_NULL,
                 MPI_COMM_WORLD, &Win);

  SendBuff = rank + 100;
  RecvBuff = 0;

  /* Accumulate to everyone, just for the heck of it */

  MPI_Win_fence(MPI_MODE_NOPRECEDE, Win);
  for (i = 0; i < size; ++i)
    MPI_Accumulate(&SendBuff, 1, MPI_INT, i, 0, 1, MPI_INT, MPI_SUM, Win);
  MPI_Win_fence((MPI_MODE_NOPUT | MPI_MODE_NOSUCCEED), Win);

When the test fails, RecvBuff is (rank+100) instead of the accumulated value 
(100 * nprocs + (nprocs - 1) * nprocs / 2).

I am not familiar with one-sided operations nor MPI_Win_fence.
That being said, I find it suspicious that RecvBuff is initialized *after* 
MPI_Win_create ...

Does MPI_Win_fence imply MPI_Barrier?

If not, I guess RecvBuff should be initialized *before* MPI_Win_create.

Makes sense?

(and if it does make sense, then this issue is not related to sparc, and vader 
is not the root cause)

Cheers,

Gilles
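
A minimal reordering of the snippet above along the lines Gilles suggests (reusing the
test's own declarations; whether it changes the observed failure is exactly the open
question in this thread).  MPI_Win_fence synchronizes the window's access/exposure
epochs but is not specified as a general MPI_Barrier:

  /* Initialize the buffers before exposing RecvBuff through the window. */
  SendBuff = rank + 100;
  RecvBuff = 0;

  MPI_Win_create(&RecvBuff, sizeOfInt, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &Win);

  MPI_Win_fence(MPI_MODE_NOPRECEDE, Win);
  for (i = 0; i < size; ++i)
    MPI_Accumulate(&SendBuff, 1, MPI_INT, i, 0, 1, MPI_INT, MPI_SUM, Win);
  MPI_Win_fence((MPI_MODE_NOPUT | MPI_MODE_NOSUCCEED), Win);

  /* Each rank should end up with 100 * size + size * (size - 1) / 2. */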



Re: [OMPI devel] Problems with some IBM neighbor tests

2015-04-03 Thread Rolf vandeVaart
I ended up looking at this and it was a bug in this set of tests.  Needed to 
check for MPI_COMM_NULL in a few places.
This has been fixed.
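
For reference, a sketch of the kind of guard that was missing (illustrative, not the
actual ibm test code).  If the test builds a cartesian communicator over a fixed grid of
4 slots, then with np=6 the two surplus ranks get MPI_COMM_NULL back from MPI_Cart_create
and must not call MPI_Cart_coords on it; that would line up with the MPI_ERR_COMM failure
reported by process [...,4] in the output below:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int crank, coords[2];
    int dims[2] = {2, 2}, periods[2] = {0, 0};   /* 4 slots in the grid */
    MPI_Comm cart;

    MPI_Init(&argc, &argv);
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &cart);

    /* Ranks that are not part of the 2x2 grid receive MPI_COMM_NULL. */
    if (MPI_COMM_NULL != cart) {
        MPI_Comm_rank(cart, &crank);
        MPI_Cart_coords(cart, crank, 2, coords);
        printf("cart rank %d -> (%d,%d)\n", crank, coords[0], coords[1]);
        MPI_Comm_free(&cart);
    }
    MPI_Finalize();
    return 0;
}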

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Rolf vandeVaart
Sent: Thursday, April 02, 2015 10:10 AM
To: de...@open-mpi.org
Subject: [OMPI devel] Problems with some IBM neighbor tests


I just recently bumped running some tests from np=4 to np=6.  I am now seeing 
failures on the following tests in the ibm/collective directory.

ineighbor_allgather, ineighbor_allgatherv, ineighbor_alltoall, 
ineighbor_alltoallv, ineighbor_alltoallw

neighbor_allgather, neighbor_allgatherv, neighbor_alltoall, neighbor_alltoallv, 
neighbor_alltoallw



The test fails like this:

[rvandevaart@drossetti-ivy4 collective]$ mpirun -np 6 ineighbor_allgather
Testing MPI_Neighbor_allgather on cartesian communicator
[drossetti-ivy4:26205] *** An error occurred in MPI_Cart_coords
[drossetti-ivy4:26205] *** reported by process [3563978753,4]
[drossetti-ivy4:26205] *** on communicator MPI_COMM_WORLD
[drossetti-ivy4:26205] *** MPI_ERR_COMM: invalid communicator
[drossetti-ivy4:26205] *** MPI_ERRORS_ARE_FATAL (processes in this communicator 
will now abort,
[drossetti-ivy4:26205] ***and potentially your MPI job)
Pass!

However, these tests appear to pass for multiples of 4 like np=4, 8, 12, 16, 
20, etc...



Anyone know if this is a bug in the test or in the library?  This happens on both 1.8 and 
master.

Thanks,

Rolf





[OMPI devel] Problems with some IBM neighbor tests

2015-04-02 Thread Rolf vandeVaart
I just recently bumped running some tests from np=4 to np=6.  I am now seeing 
failures on the following tests in the ibm/collective directory.

ineighbor_allgather, ineighbor_allgatherv, ineighbor_alltoall, 
ineighbor_alltoallv, ineighbor_alltoallw

neighbor_allgather, neighbor_allgatherv, neighbor_alltoall, neighbor_alltoallv, 
neighbor_alltoallw


The test fails like this:

[rvandevaart@drossetti-ivy4 collective]$ mpirun -np 6 ineighbor_allgather
Testing MPI_Neighbor_allgather on cartesian communicator
[drossetti-ivy4:26205] *** An error occurred in MPI_Cart_coords
[drossetti-ivy4:26205] *** reported by process [3563978753,4]
[drossetti-ivy4:26205] *** on communicator MPI_COMM_WORLD
[drossetti-ivy4:26205] *** MPI_ERR_COMM: invalid communicator
[drossetti-ivy4:26205] *** MPI_ERRORS_ARE_FATAL (processes in this communicator 
will now abort,
[drossetti-ivy4:26205] ***and potentially your MPI job)
Pass!

However, these tests appear to pass for multiples of 4 like np=4, 8, 12, 16, 
20, etc...


Anyone know if this is a bug in the test or in the library?  This happens on both 1.8 and 
master.

Thanks,

Rolf



[OMPI devel] New binding warnings in master

2015-03-20 Thread Rolf vandeVaart
Greetings:

I am now seeing the following message for all my calls to mpirun on ompi 
master.  This started with last night's MTT run.  Is this intentional?


[rvandevaart@ivy0 ~]$ mpirun -np 1 hostname
--
WARNING: a request was made to bind a process. While the system
supports binding the process itself, at least one node does NOT
support binding memory to the process location.

  Node:  ivy0

This usually is due to not having the required NUMA support installed
on the node. In some Linux distributions, the required support is
contained in the libnumactl and libnumactl-devel packages.
This is a warning only; your job will continue, though performance may be 
degraded.
--
ivy0.nvidia.com



On another note, I noticed on both 1.8 and master that we get a different number 
of processes if we specify the hostname.  This is not too big a deal, but it surprised 
me.

[rvandevaart@ivy0 ~]$ /opt/openmpi/v1.8.4/bin/mpirun hostname
ivy0.nvidia.com
ivy0.nvidia.com
ivy0.nvidia.com
ivy0.nvidia.com
ivy0.nvidia.com
ivy0.nvidia.com
ivy0.nvidia.com
ivy0.nvidia.com
ivy0.nvidia.com
ivy0.nvidia.com
ivy0.nvidia.com
ivy0.nvidia.com
ivy0.nvidia.com
ivy0.nvidia.com
ivy0.nvidia.com
ivy0.nvidia.com
ivy0.nvidia.com
ivy0.nvidia.com
ivy0.nvidia.com
ivy0.nvidia.com
[rvandevaart@ivy0 ~]$ /opt/openmpi/v1.8.4/bin/mpirun -host ivy0 hostname
ivy0.nvidia.com
[rvandevaart@ivy0 ~]$



[OMPI devel] BML changes

2015-02-26 Thread Rolf vandeVaart
This message is mostly for Nathan, but figured I would go with the wider 
distribution. I have noticed some different behaviour that I assume started 
with this change.


https://github.com/open-mpi/ompi/commit/4bf7a207e90997e75ba1c60d9d191d9d96402d04


I am noticing that the openib BTL will also be used for on-node communication 
even though the sm (or smcuda) BTL is available. I think that with the 
aforementioned change the openib BTL is listed as an available BTL that 
supports RDMA. Looking at the bml_endpoint in the debugger, it appears that the 
sm BTL is listed as the eager and send BTL, but openib is listed as the RDMA 
BTL. Looking at the logic in pml_ob1_sendreq.h, it looks like we can end up 
selecting the openib BTL for some of the communication. I ran with various 
verbosity settings and saw that this was happening. With v1.8, we only appear 
to use the sm (or smcuda) BTL.
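
The mechanics being described look roughly like the sketch below (illustrative only; the
real structures are mca_bml_base_endpoint_t and the logic in pml_ob1_sendreq.h, which are
considerably more involved).  The gist is that the endpoint keeps separate eager/send and
RDMA BTL lists, so once openib advertises itself as an RDMA-capable path it can be picked
for part of an on-node transfer even though sm/smcuda own the eager and send paths:

#include <stddef.h>

/* Hypothetical, simplified view of a BML endpoint. */
typedef struct { const char *name; size_t eager_limit; } btl_t;

typedef struct {
    btl_t *btl_eager;   /* e.g. sm or smcuda */
    btl_t *btl_send;    /* e.g. sm or smcuda */
    btl_t *btl_rdma;    /* e.g. openib, once it registers as an RDMA BTL */
} bml_endpoint_t;

/* Simplified send-path choice: small messages go over the eager BTL, while
 * large ones may be handed to whatever BTL sits on the RDMA list. */
static btl_t *pick_btl(const bml_endpoint_t *ep, size_t msg_size)
{
    if (msg_size <= ep->btl_eager->eager_limit) {
        return ep->btl_eager;                      /* sm/smcuda, as on v1.8 */
    }
    return (NULL != ep->btl_rdma) ? ep->btl_rdma   /* openib can win here */
                                  : ep->btl_send;
}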


I am wondering if this was intentional with this change or maybe a side effect.


Rolf




Re: [OMPI devel] Solaris/x86-64 SEGV with 1.8-latest

2014-12-17 Thread Rolf vandeVaart
I think this has already been fixed by Ralph this morning.  I had observed the 
same issue, but it is now gone.

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Brice Goglin
Sent: Wednesday, December 17, 2014 3:53 PM
To: de...@open-mpi.org
Subject: Re: [OMPI devel] Solaris/x86-64 SEGV with 1.8-latest

On 17/12/2014 21:43, Paul Hargrove wrote:

Dbx gives me
t@1 (l@1) terminated by signal SEGV (no mapping at the fault address)
Current function is opal_hwloc172_hwloc_get_obj_by_depth
   74 return topology->levels[depth][idx];
(dbx) where
current thread: t@1
=>[1] opal_hwloc172_hwloc_get_obj_by_depth(topology = 0x4d49e0, depth = 0, idx 
= 0), line 74 in "traversal.c"
  [2] opal_hwloc172_hwloc_get_root_obj(topology = 0x4d49e0), line 118 in 
"helper.h"
  [3] opal_hwloc_base_get_nbobjs_by_type(topo = 0x4d49e0, target = 
OPAL_HWLOC172_hwloc_OBJ_CORE, cache_level = 0, rtype = '\003'), line 833 in 
"hwloc_base_util.c"
  [4] orte_rmaps_rr_byobj(jdata = 0x43c940, app = 0x483fe0, node_list = 
0xfd7fffdff4b0, num_slots = 2, num_procs = 2U, target = 
OPAL_HWLOC172_hwloc_OBJ_CORE, cache_level = 0), line 495 in "rmaps_rr_mappers.c"
  [5] orte_rmaps_rr_map(jdata = 0x43c940), line 165 in "rmaps_rr.c"
  [6] orte_rmaps_base_map_job(fd = -1, args = 4, cbdata = 0x4a3300), line 277 
in "rmaps_base_map_job.c"
  [7] event_process_active_single_queue(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 
0xfd7fe453afbc
  [8] event_process_active(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfd7fe453b361
  [9] opal_libevent2021_event_base_loop(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 
0xfd7fe453bc79
  [10] orterun(argc = 9, argv = 0xfd7fffdffa58), line 1081 in "orterun.c"
  [11] main(argc = 9, argv = 0xfd7fffdffa58), line 13 in "main.c"
(dbx) print depth
depth = 0
(dbx) print index
index = 0xfd7fff19c174

Pretty sure that index value is bogus.


I see "idx" instead of "index" in the code above. index may be a pointer to the 
"index()" function in your standard library?
Anyway, depth=0 and idx=0 is totally valid, especially when called from 
hwloc_get_root_obj(). Something bad happened to the topology object? Can you 
print the contents of topology and topology->nblevels and topology->levels ?

Brice



Re: [OMPI devel] coll ml error with some nonblocking collectives

2014-09-15 Thread Rolf vandeVaart
Confirmed that trunk version r32658 does pass the test.

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Pritchard Jr., 
Howard
Sent: Monday, September 15, 2014 4:16 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] coll ml error with some nonblocking collectives

Hi Rolf,

This may be related to change set 32659.

If you back this change out, do the tests pass?


Howard




From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Rolf vandeVaart
Sent: Monday, September 15, 2014 8:55 AM
To: de...@open-mpi.org
Subject: [OMPI devel] coll ml error with some nonblocking collectives


I wonder if anyone else is seeing this failure. Not sure when this started but 
it is only on the trunk. Here is a link to my failures as well as an example 
below that. There are a variety of nonblocking collectives failing like this.



http://mtt.open-mpi.org/index.php?do_redir=2208



[rvandevaart@drossetti-ivy0 collective]$ mpirun --mca btl self,sm,tcp -host 
drossetti-ivy0,drossetti-ivy0,drossetti-ivy1,drossetti-ivy1 iallreduce
--

ML detected an unrecoverable error on intrinsic communicator MPI_COMM_WORLD

The program will now abort
--
[drossetti-ivy0.nvidia.com:04664] 3 more processes have sent help message 
help-mpi-coll-ml.txt / coll-ml-check-fatal-error
[rvandevaart@drossetti-ivy0 collective]$





[OMPI devel] coll ml error with some nonblocking collectives

2014-09-15 Thread Rolf vandeVaart
I wonder if anyone else is seeing this failure. Not sure when this started but 
it is only on the trunk. Here is a link to my failures as well as an example 
below that. There are a variety of nonblocking collectives failing like this.


http://mtt.open-mpi.org/index.php?do_redir=2208


[rvandevaart@drossetti-ivy0 collective]$ mpirun --mca btl self,sm,tcp -host 
drossetti-ivy0,drossetti-ivy0,drossetti-ivy1,drossetti-ivy1 iallreduce
--

ML detected an unrecoverable error on intrinsic communicator MPI_COMM_WORLD

The program will now abort
--
[drossetti-ivy0.nvidia.com:04664] 3 more processes have sent help message 
help-mpi-coll-ml.txt / coll-ml-check-fatal-error
[rvandevaart@drossetti-ivy0 collective]$




[OMPI devel] Errors on aborting programs on 1.8 r32515

2014-08-13 Thread Rolf vandeVaart
I noticed MTT failures from last night and then reproduced this morning on 1.8 
branch.  Looks like maybe a double free.  I assume it is related to fixes for 
aborting programs. Maybe related to 
https://svn.open-mpi.org/trac/ompi/changeset/32508 but not sure.

[rvandevaart@drossetti-ivy0 environment]$ pwd
/ivylogin/home/rvandevaart/tests/ompi-tests/trunk/ibm/environment
[rvandevaart@drossetti-ivy0 environment]$ mpirun --mca odls_base_verbose 20 -np 
2 abort
[...stuff deleted...]
[drossetti-ivy0.nvidia.com:14953] [[58714,0],0] odls: sending message to tag 30 
on child [[58714,1],0]
[drossetti-ivy0.nvidia.com:14953] [[58714,0],0] odls: sending message to tag 30 
on child [[58714,1],1]
[drossetti-ivy0.nvidia.com:14953] [[58714,0],0] odls: sending message to tag 30 
on child [[58714,1],0]
[drossetti-ivy0.nvidia.com:14953] [[58714,0],0] odls: sending message to tag 30 
on child [[58714,1],1]
**
This program tests MPI_ABORT and generates error messages
ERRORS ARE EXPECTED AND NORMAL IN THIS PROGRAM!!
**
--
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD 
with errorcode 3.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--
[drossetti-ivy0.nvidia.com:14953] [[58714,0],0] odls:wait_local_proc child 
process [[58714,1],0] pid 14955 terminated
[drossetti-ivy0.nvidia.com:14953] [[58714,0],0] odls:waitpid_fired child 
[[58714,1],0] exit code 3
[drossetti-ivy0.nvidia.com:14953] [[58714,0],0] odls:waitpid_fired checking 
abort file /tmp/openmpi-sessions-rvandevaart@drossetti-ivy0_0/58714/1/0/aborted 
for child [[58714,1],0]
[drossetti-ivy0.nvidia.com:14953] [[58714,0],0] odls:waitpid_fired child 
[[58714,1],0] died by call to abort
*** glibc detected *** mpirun: double free or corruption (fasttop): 
0x0130e210 ***

From gdb:
gdb) where
#0  0x7f75ede138e5 in raise () from /lib64/libc.so.6
#1  0x7f75ede1504d in abort () from /lib64/libc.so.6
#2  0x7f75ede517f7 in __libc_message () from /lib64/libc.so.6
#3  0x7f75ede57126 in malloc_printerr () from /lib64/libc.so.6
#4  0x7f75eef9eac4 in odls_base_default_wait_local_proc (pid=14955, 
status=768, cbdata=0x0)
at ../../../../orte/mca/odls/base/odls_base_default_fns.c:2007
#5  0x7f75eef60a78 in do_waitall (options=0) at 
../../orte/runtime/orte_wait.c:554
#6  0x7f75eef60712 in orte_wait_signal_callback (fd=17, event=8, 
arg=0x7f75ef201400) at ../../orte/runtime/orte_wait.c:421
#7  0x7f75eecaecbe in event_signal_closure (base=0x1278370, 
ev=0x7f75ef201400)
at ../../../../../../opal/mca/event/libevent2021/libevent/event.c:1081
#8  0x7f75eecaf7e0 in event_process_active_single_queue (base=0x1278370, 
activeq=0x12788f0)
at ../../../../../../opal/mca/event/libevent2021/libevent/event.c:1359
#9  0x7f75eecafaca in event_process_active (base=0x1278370)
at ../../../../../../opal/mca/event/libevent2021/libevent/event.c:1437
#10 0x7f75eecb0148 in opal_libevent2021_event_base_loop (base=0x1278370, 
flags=1)
at ../../../../../../opal/mca/event/libevent2021/libevent/event.c:1645
#11 0x00405572 in orterun (argc=7, argv=0x7fffbdf1dd08) at 
../../../../orte/tools/orterun/orterun.c:1078
#12 0x00403904 in main (argc=7, argv=0x7fffbdf1dd08) at 
../../../../orte/tools/orterun/main.c:13
(gdb) up
#1  0x7f75ede1504d in abort () from /lib64/libc.so.6
(gdb) up
#2  0x7f75ede517f7 in __libc_message () from /lib64/libc.so.6
(gdb) up
#3  0x7f75ede57126 in malloc_printerr () from /lib64/libc.so.6
(gdb) up
#4  0x7f75eef9eac4 in odls_base_default_wait_local_proc (pid=14955, 
status=768, cbdata=0x0)
at ../../../../orte/mca/odls/base/odls_base_default_fns.c:2007
2007            free(abortfile);
(gdb) print abortfile
$1 = 0x130e210 ""
(gdb) 
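
The glibc message and the trace point at abortfile being freed twice in the waitpid
callback.  A generic illustration of that failure mode and the usual guard (not the
actual ORTE code):

#include <stdlib.h>
#include <string.h>

/* If a cleanup path like this can run twice for the same child (or the string
 * is also freed elsewhere), the second free() trips glibc's
 * "double free or corruption" abort seen above. */
static void handle_child_exit(char **abortfile)
{
    if (NULL != *abortfile) {
        free(*abortfile);
        *abortfile = NULL;   /* guard: a second pass becomes a harmless no-op */
    }
}

int main(void)
{
    char *abortfile = strdup("/tmp/openmpi-sessions-.../1/0/aborted");
    handle_child_exit(&abortfile);
    handle_child_exit(&abortfile);   /* safe with the guard, fatal without it */
    return 0;
}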


[OMPI devel] RFC: Change default behavior of calling ibv_fork_init

2014-07-31 Thread Rolf vandeVaart
WHAT: Change default behavior in openib to not call ibv_fork_init() even if 
available.
WHY: There are some strange interactions with ummunotify that cause errors.  In 
addition, see the points below.
WHEN: After next weekly meeting, August 5, 2014
DETAILS:  This change will just be a couple of lines.  Current default behavior 
is to call ibv_fork_init() if support exists. New default behavior is to call 
it only if asked for.
Essentially, default setting of btl_openib_want_fork_support will change from 
-1 (use it if available) to 0 (do not use unless asked for)
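
The tri-state logic being changed can be summarized with a small sketch (illustrative,
not the actual openib component source): a negative value means "use it if available",
zero means "off unless explicitly requested", and a positive value means the user asked
for it.

#include <infiniband/verbs.h>

static int maybe_fork_init(int want_fork_support, int have_fork_support)
{
    if (want_fork_support > 0 ||
        (want_fork_support < 0 && have_fork_support)) {
        /* Old default (-1) reaches this branch whenever support exists. */
        return ibv_fork_init();
    }
    /* New default (0): skip ibv_fork_init() unless explicitly asked for. */
    return 0;
}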


Here are details from an earlier post last year.  
http://www.open-mpi.org/community/lists/devel/2013/12/13395.php
Subject: [OMPI devel] RFC: Calling ibv_fork_init() in the openib BTL
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
List-Post: devel@lists.open-mpi.org
Date: 2013-12-06 10:15:02
To those who care about the openib BTL...
SHORT VERSION
-
Do you really want to call ibv_fork_init() in the openib BTL by default?
MORE DETAIL
---
Rolf V. pointed out to me yesterday that we're calling ibv_fork_init() in the 
openib BTL. He asked if we did the same in the usnic BTL. We don't, and here's 
why:
1. it adds a slight performance penalty for ibv_reg_mr/ibv_dereg_mr
2. the only thing ibv_fork_init() protects against is the child sending from 
memory that it thinks should already be registered:
-
MPI_Init(...)
if (0 == fork()) {
ibv_post_send(some_previously_pinned_buffer, ...);
// ^^ this can't work because the buffer is *not* pinned in the child
// (for lack of a longer explanation here)
}
-
3. ibv_fork_init() is not intended to protect against a child invoking an MPI 
function (if they do that; they get what they deserve!).
Note that #2 can't happen, because MPI doesn't expose its protection domains, 
queue pairs, or registrations (or any of its verbs constructs) at all.
Hence, all ibv_fork_init() does is a) impose a performance penalty, and b) make 
memory physically unavailable in a child process, such that:

ibv_fork_init();
a = malloc(...);
a[0] = 17;
ibv_reg_mr(a, ...);
if (0 == fork()) {
printf("this is a[0]: %d\n", a[0]);
// ^^ This will segv
}
-
But the registered memory may actually be useful in the child.
So I just thought I'd pass this along, and ask the openib-caring people of the 
world if you really still want to be calling ibv_fork_init() by default in the 
openib BTL.
--
Jeff Squyres




Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-07-30 Thread Rolf vandeVaart
Thanks Ralph and Gilles!  All is looking good for me again.  I think all tests 
are passing; I will check the results again tomorrow.

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Wednesday, July 30, 2014 10:49 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

I just fixed this one - all that was required was an ampersand, as the name was 
being passed into the function instead of a pointer to the name.

r32357
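
A generic illustration of that failure mode (hypothetical types, not the actual bcol/orte
code): when the 64-bit name value itself is handed to a parameter that expects a pointer
to the name, the callee dereferences the jobid/vpid bits as an address.  On a
little-endian machine, a name with jobid 0x92350001 and vpid 1 packs to exactly the bogus
name1=0x192350001 shown in the trace quoted below.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct { uint32_t jobid; uint32_t vpid; } proc_name_t;

/* A compare helper that, like many generic interfaces, takes void*. */
static int compare_names(const void *p1, const void *p2)
{
    const proc_name_t *a = p1, *b = p2;
    if (a->jobid != b->jobid) return (a->jobid < b->jobid) ? -1 : 1;
    return (a->vpid < b->vpid) ? -1 : (a->vpid > b->vpid);
}

int main(void)
{
    proc_name_t me   = { 0x92350001u, 1 };
    proc_name_t peer = { 0x92350001u, 0 };
    uint64_t raw;
    memcpy(&raw, &me, sizeof(raw));               /* the name as 64 raw bits */

    printf("%d\n", compare_names(&me, &peer));    /* correct: pass &name */

    if (0) {
        /* The bug class: a missing '&' hidden behind a cast makes the callee
         * treat 0x192350001 as an address and SIGSEGV. */
        compare_names((const void *)(uintptr_t)raw, &peer);
    }
    return 0;
}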

On Jul 30, 2014, at 7:43 AM, Gilles GOUAILLARDET 
<gilles.gouaillar...@gmail.com> wrote:


Rolf,

r32353 can be seen as a suspect...
Even if it is correct, it might have exposed the bug discussed in #4815 even 
more (e.g. we hit the bug 100% after the fix)

does the attached patch to #4815 fixes the problem ?

If yes, and if you see this issue as a showstopper, feel free to commit it and 
drop a note to #4815
( I am afk until tomorrow)

Cheers,

Gilles

Rolf vandeVaart <rvandeva...@nvidia.com> wrote:

Just an FYI that my trunk version (r32355) does not work at all anymore if I do 
not include "--mca coll ^ml".  Here is a stack trace from the ibm/pt2pt/send 
test running on a single node.



(gdb) where

#0  0x7f6c0d1321d0 in ?? ()

#1  

#2  0x7f6c183abd52 in orte_util_compare_name_fields (fields=15 '\017', 
name1=0x192350001, name2=0xbaf76c) at ../../orte/util/name_fns.c:522

#3  0x7f6c0bea17be in bcol_basesmuma_smcm_allgather_connection 
(sm_bcol_module=0x7f6bf3b68040, module=0xb3d200, peer_list=0x7f6c0c0a6748, 
back_files=0x7f6bf3ffd6c8,

comm=0x6037a0, input=..., base_fname=0x7f6c0bea2606 "sm_payload_mem_", 
map_all=false) at 
../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_smcm.c:237

#4  0x7f6c0be98307 in bcol_basesmuma_bank_init_opti 
(payload_block=0xbc0f60, data_offset=64, bcol_module=0x7f6bf3b68040, 
reg_data=0xba28c0)

at ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_buf_mgmt.c:302

#5  0x7f6c0cced386 in mca_coll_ml_register_bcols (ml_module=0xba5c40) at 
../../../../../ompi/mca/coll/ml/coll_ml_module.c:510

#6  0x7f6c0cced68f in ml_module_memory_initialization (ml_module=0xba5c40) 
at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:558

#7  0x7f6c0ccf06b1 in ml_discover_hierarchy (ml_module=0xba5c40) at 
../../../../../ompi/mca/coll/ml/coll_ml_module.c:1539

#8  0x7f6c0ccf4e0b in mca_coll_ml_comm_query (comm=0x6037a0, 
priority=0x7fffe7991b58) at 
../../../../../ompi/mca/coll/ml/coll_ml_module.c:2963

#9  0x7f6c18cc5b09 in query_2_0_0 (component=0x7f6c0cf50940, comm=0x6037a0, 
priority=0x7fffe7991b58, module=0x7fffe7991b90)

at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:372

#10 0x7f6c18cc5ac8 in query (component=0x7f6c0cf50940, comm=0x6037a0, 
priority=0x7fffe7991b58, module=0x7fffe7991b90)

at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:355

#11 0x7f6c18cc59d2 in check_one_component (comm=0x6037a0, 
component=0x7f6c0cf50940, module=0x7fffe7991b90)

at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:317

#12 0x7f6c18cc5818 in check_components (components=0x7f6c18f46ef0, 
comm=0x6037a0) at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:281

#13 0x7f6c18cbe3c9 in mca_coll_base_comm_select (comm=0x6037a0) at 
../../../../ompi/mca/coll/base/coll_base_comm_select.c:117

#14 0x7f6c18c52301 in ompi_mpi_init (argc=1, argv=0x7fffe79924c8, 
requested=0, provided=0x7fffe79922e8) at ../../ompi/runtime/ompi_mpi_init.c:918

#15 0x7f6c18c86e92 in PMPI_Init (argc=0x7fffe799234c, argv=0x7fffe7992340) 
at pinit.c:84

#16 0x00401056 in main (argc=1, argv=0x7fffe79924c8) at send.c:32

(gdb) up

#1  

(gdb) up

#2  0x7f6c183abd52 in orte_util_compare_name_fields (fields=15 '\017', 
name1=0x192350001, name2=0xbaf76c) at ../../orte/util/name_fns.c:522

522   if (name1->jobid < name2->jobid) {

(gdb) print name1

$1 = (const orte_process_name_t *) 0x192350001

(gdb) print *name1

Cannot access memory at address 0x192350001

(gdb) print name2

$2 = (const orte_process_name_t *) 0xbaf76c

(gdb) print *name2

$3 = {jobid = 2452946945, vpid = 1}

(gdb)







>-Original Message-

>From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Gilles

>Gouaillardet

>Sent: Wednesday, July 30, 2014 2:16 AM

>To: Open MPI Developers

>Subject: Re: [OMPI devel] trunk compilation errors in jenkins

>

>George,

>

>#4815 is indirectly related to the move :

>

>in bcol/basesmuma, we used to compare ompi_process_name_t, and now

>we (try to) compare an ompi_process_name_t and an opal_process_name_t

>(which causes a glory SIGSEGV)

>

>i proposed a temporary patch which is both broken and unelegant, could you

>please advise a correct solution ?

>

>Cheers,

>

>Gilles

>

>On 2014/07/

Re: [OMPI devel] trunk compilation errors in jenkins

2014-07-30 Thread Rolf vandeVaart
Just an FYI that my trunk version (r32355) does not work at all anymore if I do 
not include "--mca coll ^ml".  Here is a stack trace from the ibm/pt2pt/send 
test running on a single node.



(gdb) where

#0  0x7f6c0d1321d0 in ?? ()

#1  

#2  0x7f6c183abd52 in orte_util_compare_name_fields (fields=15 '\017', 
name1=0x192350001, name2=0xbaf76c) at ../../orte/util/name_fns.c:522

#3  0x7f6c0bea17be in bcol_basesmuma_smcm_allgather_connection 
(sm_bcol_module=0x7f6bf3b68040, module=0xb3d200, peer_list=0x7f6c0c0a6748, 
back_files=0x7f6bf3ffd6c8,

comm=0x6037a0, input=..., base_fname=0x7f6c0bea2606 "sm_payload_mem_", 
map_all=false) at 
../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_smcm.c:237

#4  0x7f6c0be98307 in bcol_basesmuma_bank_init_opti 
(payload_block=0xbc0f60, data_offset=64, bcol_module=0x7f6bf3b68040, 
reg_data=0xba28c0)

at ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_buf_mgmt.c:302

#5  0x7f6c0cced386 in mca_coll_ml_register_bcols (ml_module=0xba5c40) at 
../../../../../ompi/mca/coll/ml/coll_ml_module.c:510

#6  0x7f6c0cced68f in ml_module_memory_initialization (ml_module=0xba5c40) 
at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:558

#7  0x7f6c0ccf06b1 in ml_discover_hierarchy (ml_module=0xba5c40) at 
../../../../../ompi/mca/coll/ml/coll_ml_module.c:1539

#8  0x7f6c0ccf4e0b in mca_coll_ml_comm_query (comm=0x6037a0, 
priority=0x7fffe7991b58) at 
../../../../../ompi/mca/coll/ml/coll_ml_module.c:2963

#9  0x7f6c18cc5b09 in query_2_0_0 (component=0x7f6c0cf50940, comm=0x6037a0, 
priority=0x7fffe7991b58, module=0x7fffe7991b90)

at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:372

#10 0x7f6c18cc5ac8 in query (component=0x7f6c0cf50940, comm=0x6037a0, 
priority=0x7fffe7991b58, module=0x7fffe7991b90)

at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:355

#11 0x7f6c18cc59d2 in check_one_component (comm=0x6037a0, 
component=0x7f6c0cf50940, module=0x7fffe7991b90)

at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:317

#12 0x7f6c18cc5818 in check_components (components=0x7f6c18f46ef0, 
comm=0x6037a0) at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:281

#13 0x7f6c18cbe3c9 in mca_coll_base_comm_select (comm=0x6037a0) at 
../../../../ompi/mca/coll/base/coll_base_comm_select.c:117

#14 0x7f6c18c52301 in ompi_mpi_init (argc=1, argv=0x7fffe79924c8, 
requested=0, provided=0x7fffe79922e8) at ../../ompi/runtime/ompi_mpi_init.c:918

#15 0x7f6c18c86e92 in PMPI_Init (argc=0x7fffe799234c, argv=0x7fffe7992340) 
at pinit.c:84

#16 0x00401056 in main (argc=1, argv=0x7fffe79924c8) at send.c:32

(gdb) up

#1  

(gdb) up

#2  0x7f6c183abd52 in orte_util_compare_name_fields (fields=15 '\017', 
name1=0x192350001, name2=0xbaf76c) at ../../orte/util/name_fns.c:522

522   if (name1->jobid < name2->jobid) {

(gdb) print name1

$1 = (const orte_process_name_t *) 0x192350001

(gdb) print *name1

Cannot access memory at address 0x192350001

(gdb) print name2

$2 = (const orte_process_name_t *) 0xbaf76c

(gdb) print *name2

$3 = {jobid = 2452946945, vpid = 1}

(gdb)







>-Original Message-
>From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Gilles
>Gouaillardet
>Sent: Wednesday, July 30, 2014 2:16 AM
>To: Open MPI Developers
>Subject: Re: [OMPI devel] trunk compilation errors in jenkins
>
>George,
>
>#4815 is indirectly related to the move :
>
>in bcol/basesmuma, we used to compare ompi_process_name_t, and now
>we (try to) compare an ompi_process_name_t and an opal_process_name_t
>(which causes a glory SIGSEGV)
>
>i proposed a temporary patch which is both broken and unelegant, could you
>please advise a correct solution ?
>
>Cheers,
>
>Gilles
>
>On 2014/07/27 7:37, George Bosilca wrote:
>> If you have any issue with the move, I'll be happy to help and/or support
>you on your last move toward a completely generic BTL. To facilitate your
>work I exposed a minimalistic set of OMPI information at the OPAL level. Take
>a look at opal/util/proc.h for more info, but please try not to expose more.
>
>___
>devel mailing list
>de...@open-mpi.org
>Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>Link to this post: 
>http://www.open-
>mpi.org/community/lists/devel/2014/07/15348.php

---
This email message is for the sole use of the intended recipient(s) and may 
contain
confidential information.  Any unauthorized review, use, disclosure or 
distribution
is prohibited.  If you are not the intended recipient, please contact the 
sender by
reply email and destroy all copies of the original message.

Re: [OMPI devel] RFC: Bump minimum sm pool size to 128K from 64K

2014-07-26 Thread Rolf vandeVaart
Yes (my mistake)


Sent from my iPhone

On Jul 26, 2014, at 3:19 PM, "George Bosilca" 
<bosi...@icl.utk.edu> wrote:

We are talking MB not KB isn't it?

  George.



On Thu, Jul 24, 2014 at 2:57 PM, Rolf vandeVaart 
<rvandeva...@nvidia.com> wrote:
WHAT: Bump up the minimum sm pool size to 128K from 64K.
WHY: When running OSU benchmark on 2 nodes and utilizing a larger 
btl_smcuda_max_send_size, we can run into the case where the free list cannot 
grow.  This is not a common case, but it is something that folks sometimes 
experiment with.  Also note that this minimum was set back 5 years ago so it 
seems that it could be time to bump it up.
WHEN: Tuesday, July 29, 2014 after weekly concall if there are no objections.


[rvandevaart@ivy0 ompi-trunk-regerror]$ svn diff 
ompi/mca/mpool/sm/mpool_sm_component.c
Index: ompi/mca/mpool/sm/mpool_sm_component.c
===
--- ompi/mca/mpool/sm/mpool_sm_component.c  (revision 32293)
+++ ompi/mca/mpool/sm/mpool_sm_component.c  (working copy)
@@ -80,7 +80,7 @@
 }
 };

-static long default_min = 67108864;
+static long default_min = 134217728;
 static unsigned long long ompi_mpool_sm_min_size;
 static int ompi_mpool_sm_verbose;

[rvandevaart@drossetti-ivy0 ompi-trunk-regerror]$
---
This email message is for the sole use of the intended recipient(s) and may 
contain
confidential information.  Any unauthorized review, use, disclosure or 
distribution
is prohibited.  If you are not the intended recipient, please contact the 
sender by
reply email and destroy all copies of the original message.
---
___
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: 
http://www.open-mpi.org/community/lists/devel/2014/07/15257.php

___
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: 
http://www.open-mpi.org/community/lists/devel/2014/07/15273.php


[OMPI devel] RFC: Bump minimum sm pool size to 128K from 64K

2014-07-24 Thread Rolf vandeVaart
WHAT: Bump up the minimum sm pool size to 128K from 64K.  
WHY: When running OSU benchmark on 2 nodes and utilizing a larger 
btl_smcuda_max_send_size, we can run into the case where the free list cannot 
grow.  This is not a common case, but it is something that folks sometimes 
experiment with.  Also note that this minimum was set back 5 years ago so it 
seems that it could be time to bump it up.
WHEN: Tuesday, July 29, 2014 after weekly concall if there are no objections.
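
(If anyone wants to experiment before this goes in: the minimum can normally also be raised at run time instead of rebuilding.  I believe the value above is exposed as the MCA parameter mpool_sm_min_size, but please double-check the exact name with ompi_info before relying on it.  Something along the lines of:

  mpirun -np 2 --mca mpool_sm_min_size 134217728 --mca btl_smcuda_max_send_size 262144 ./osu_bw

where osu_bw and the 256K send size are just placeholders for whatever benchmark and settings you are using.)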


[rvandevaart@ivy0 ompi-trunk-regerror]$ svn diff 
ompi/mca/mpool/sm/mpool_sm_component.c
Index: ompi/mca/mpool/sm/mpool_sm_component.c
===
--- ompi/mca/mpool/sm/mpool_sm_component.c  (revision 32293)
+++ ompi/mca/mpool/sm/mpool_sm_component.c  (working copy)
@@ -80,7 +80,7 @@
 }
 };
 
-static long default_min = 67108864;
+static long default_min = 134217728;
 static unsigned long long ompi_mpool_sm_min_size;
 static int ompi_mpool_sm_verbose;
 
[rvandevaart@drossetti-ivy0 ompi-trunk-regerror]$ 
---
This email message is for the sole use of the intended recipient(s) and may 
contain
confidential information.  Any unauthorized review, use, disclosure or 
distribution
is prohibited.  If you are not the intended recipient, please contact the 
sender by
reply email and destroy all copies of the original message.
---


Re: [OMPI devel] PML-bfo deadlocks for message size > eager limit after connection loss

2014-07-24 Thread Rolf vandeVaart
My guess is that no one is testing the bfo PML.  However, I would have expected 
it to still work with Open MPI 1.6.5.  From your description, it works for 
smaller messages but fails with larger ones?  So, if you just send smaller 
messages and pull the cable, things work correctly?

One idea is to reduce the output you are getting so you can focus on just the 
failover information.  There is no need for any ORTE debug information as that 
is not involved in the failover.  I would go with these:

mpirun -np 2 --hostfile /opt/ddt/nodes --pernode --mca pml bfo --mca btl 
self,sm,openib --mca btl_openib_port_error_failover 1 --mca 
btl_openib_verbose_failover 100 --mca pml_bfo_verbose 100 

You can drop this:  --mca btl_openib_failover_enabled 1  (that is on by default)
 
In terms of where you can debug, most of the failover support code is in two 
files.
ompi/mca/pml/bfo/pml_bfo_failover.c
ompi/mca/btl/openib/btl_openib_failover.c

There is also a README here:
ompi/mca/pml/bfo/README

You could also try running without eager RDMA enabled: --mca 
btl_openib_use_eager_rdma 0

Rolf

>-Original Message-
>From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Christoph
>Niethammer
>Sent: Thursday, July 24, 2014 7:54 AM
>To: Open MPI Developers
>Subject: [OMPI devel] PML-bfo deadlocks for message size > eager limit after
>connection loss
>
>Hello,
>
>Is there anybody using/testing the bfo PML - especially with messages > eager
>limit?
>
>Tests using messages > eager limit with the bfo PML seem to deadlock in
>Open MPI 1.6.5 as soon as one of two infiniband connections gets lost (tested
>by disconnecting wire).
>I did not have an opportunity to test 1.8/trunk up to now.
>
>Tests were executed with the following mpirun options:
>
>mpirun -np 2 --hostfile /opt/ddt/nodes --pernode --mca pml bfo --mca
>btl_base_exclude tcp --mca pml bfo --mca btl_openib_port_error_failover 1 -
>-mca btl_openib_failover_enabled 1 --mca btl_openib_port_error_failover 1 -
>-verbose --mca oob_tcp_verbose 100 --mca btl_openib_verbose_failover 100
>--mca btl_openib_verbose 100 --mca btl_base_verbose 100 --mca
>bml_base_verbose 100 --mca pml_bfo_verbose 100 --mca pml_base_verbose
>100 --mca opal_progress_debug 100 --mca orte_debug_verbose 100 --mca
>pml_v_verbose 100 --mca orte_base_help_aggregate 0
>
>Some log output is attached below.
>
>I would appreciate any feedback concerning current status of the bfo PML as
>well as ideas how to debug and where to search for the problem inside the
>Open MPI code base.
>
>
>Best regards
>Christoph Niethammer
>
>--
>
>Christoph Niethammer
>High Performance Computing Center Stuttgart (HLRS) Nobelstrasse 19
>70569 Stuttgart
>
>Tel: ++49(0)711-685-87203
>email: nietham...@hlrs.de
>http://www.hlrs.de/people/niethammer
>
>
>
>
>
>
>[vm2:21970] defining message event: iof_hnp_receive.c 227 [vm1:16449]
>Rank 0 receiving ...
>[vm2:21970] [[22205,0],0] got show_help from [[22205,1],0]
>--
>The OpenFabrics stack has reported a network error event.  Open MPI will try
>to continue, but your job may end up failing.
>
>  Local host:vm1
>  MPI process PID:   16449
>  Error number:  10 (IBV_EVENT_PORT_ERR)
>
>This error may indicate connectivity problems within the fabric; please contact
>your system administrator.
>--
>[vm1][[22205,1],0][btl_openib.c:1350:mca_btl_openib_prepare_dst] frag-
>>sg_entry.lkey = 1829372025 .addr = 1e1bee0 frag-
>>segment.seg_key.key32[0] = 1829372025
>[vm1][[22205,1],0][btl_openib.c:1350:mca_btl_openib_prepare_dst] frag-
>>sg_entry.lkey = 1829372025 .addr = 1e28230 frag-
>>segment.seg_key.key32[0] = 1829372025 [vm2:21970] defining message
>event: iof_hnp_receive.c 227 [vm1:16449]  Bandwidth [MB/s]: 594.353640
>[vm1:16449] Rank 0: loop: 1100 [vm1:16449] Rank 0 sending ...
>[vm2:21970] defining message event: iof_hnp_receive.c 227 [vm2:21970]
>defining message event: iof_hnp_receive.c 227
>[vm1][[22205,1],0][btl_openib_failover.c:696:mca_btl_openib_endpoint_noti
>fy] [vm1:16449] BTL openib error: rank=0 mapping out lid=2:name=mthca0 to
>rank=1 on node=vm2 [vm1:16449] IB: Finished checking for pending_frags,
>total moved=0 [vm1:16449] IB: Finished checking for pending_frags, total
>moved=0 Error sending BROKEN CONNECTION buffer (Success)
>[[22205,1],1][btl_openib_component.c:3496:handle_wc] from vm2 to: 192
>error polling LP CQ with status RETRY EXCEEDED ERROR status number 12 for
>wr_id bdba80 opcode 1  vendor error 129 qp_idx 0 [vm2:21970] [[22205,0],0]
>got show_help from [[22205,1],1]
>--
>The InfiniBand retry count between two MPI processes has been exceeded.
>"Retry count" is defined in the InfiniBand spec 1.2 (section 12.7.38):
>
>The total number of times that the sender wishes the receiver to
>retry timeout, packet sequence, etc. errors 

Re: [OMPI devel] Onesided failures

2014-07-16 Thread Rolf vandeVaart
Sounds like a good plan.  Thanks for looking into this Gilles!
Regards,
Rolf

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Gilles GOUAILLARDET
Sent: Wednesday, July 16, 2014 9:53 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] Onesided failures

Rolf,

From the man page of MPI_Win_allocate_shared

It is the user's responsibility to ensure that the communicator comm represents 
a group of processes that can create a shared memory segment that can be 
accessed by all processes in the group

And from the mtt logs, you are running 4 tasks on 2 nodes.

Unless I am missing something obvious, I will update the test tomorrow and add
a comm split to ensure MPI_Win_allocate_shared is called from a single-node
communicator, and skip the test if that is impossible; a sketch of the idea is below.
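
For reference, here is a minimal sketch of that change (just an illustration of the
MPI-3 calls involved, not the actual test code; error handling and the use of the
window itself are omitted):

    MPI_Comm nodecomm;
    MPI_Win  win;
    int     *baseptr;

    /* Split MPI_COMM_WORLD into per-node communicators so that every rank
     * in nodecomm can actually share memory with its peers. */
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &nodecomm);

    /* Allocate the shared window on the single-node communicator only. */
    MPI_Win_allocate_shared(sizeof(int), sizeof(int), MPI_INFO_NULL,
                            nodecomm, &baseptr, &win);

    /* ... exercise the window ... */

    MPI_Win_free(&win);
    MPI_Comm_free(&nodecomm);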

Cheers,

Gilles

Rolf vandeVaart <rvandeva...@nvidia.com> wrote:
On both 1.8 and trunk (as Ralph mentioned in meeting) we are seeing three tests 
fail.
http://mtt.open-mpi.org/index.php?do_redir=2205

Ibm/onesided/win_allocate_shared
Ibm/onesided/win_allocated_shared_mpifh
Ibm/onesided/win_allocated_shared_usempi

Is there a ticket that covers these failures?

Thanks,
Rolf

This email message is for the sole use of the intended recipient(s) and may 
contain confidential information.  Any unauthorized review, use, disclosure or 
distribution is prohibited.  If you are not the intended recipient, please 
contact the sender by reply email and destroy all copies of the original 
message.



[OMPI devel] Onesided failures

2014-07-16 Thread Rolf vandeVaart
On both 1.8 and trunk (as Ralph mentioned in meeting) we are seeing three tests 
fail.
http://mtt.open-mpi.org/index.php?do_redir=2205

Ibm/onesided/win_allocate_shared
Ibm/onesided/win_allocated_shared_mpifh
Ibm/onesided/win_allocated_shared_usempi

Is there a ticket that covers these failures?

Thanks,
Rolf

---
This email message is for the sole use of the intended recipient(s) and may 
contain
confidential information.  Any unauthorized review, use, disclosure or 
distribution
is prohibited.  If you are not the intended recipient, please contact the 
sender by
reply email and destroy all copies of the original message.
---


[OMPI devel] New crash on trunk (r32246)

2014-07-15 Thread Rolf vandeVaart
With the latest trunk (r32246) I am getting crashes while the program is 
shutting down.  I assume this is related to some of the changes George just 
made.  George, can you take a look when you get a chance?
Looks like everyone is getting the segv during shutdown (mpirun, orted, and 
application)  Stacktrace of the application shows this:

Program terminated with signal 11, Segmentation fault.
#0  0x7fc48c6a3145 in opal_class_finalize () at 
../../opal/class/opal_object.c:175
175free(cls->cls_construct_array);
Missing separate debuginfos, use: debuginfo-install 
glibc-2.12-1.107.el6_4.5.x86_64 libgcc-4.4.7-3.el6.x86_64
(gdb) where
#0  0x7fc48c6a3145 in opal_class_finalize () at 
../../opal/class/opal_object.c:175
#1  0x7fc48c6a8253 in opal_finalize_util () at 
../../opal/runtime/opal_finalize.c:110
#2  0x7fc48d2697e9 in ompi_mpi_finalize () at 
../../ompi/runtime/ompi_mpi_finalize.c:454
#3  0x7fc48d2925a9 in PMPI_Finalize () at pfinalize.c:46
#4  0x00401687 in main (argc=1, argv=0x7fff0e936fb8) at isend.c:109
(gdb) quit

mpirun -host drossetti-ivy0,drossetti-ivy1 -np 2 --mca pml ob1 --mca btl 
sm,tcp,self --mca coll_ml_disable_allgather 1 --mca 
btl_openib_warn_default_gid_prefix 0 isend [drossetti-ivy0:13073] *** Process 
received signal *** [drossetti-ivy0:13073] Signal: Segmentation fault (11) 
[drossetti-ivy0:13073] Signal code: Address not mapped (1) 
[drossetti-ivy0:13073] Failing at address: 0x7fc48abb2d68 
[drossetti-ivy0:13073] [ 0] /lib64/libpthread.so.0(+0xf500)[0x7fc48d005500]
[drossetti-ivy0:13073] [ 1] 
/ivylogin/home/rvandevaart/ompi-repos/ompi-trunk-original/64-dbg-nocuda/lib/libopen-pal.so.0(opal_class_finalize+0x4a)[0x7fc48c6a3145]
[drossetti-ivy0:13073] [ 2] 
/ivylogin/home/rvandevaart/ompi-repos/ompi-trunk-original/64-dbg-nocuda/lib/libopen-pal.so.0(opal_finalize_util+0xc3)[0x7fc48c6a8253]
[drossetti-ivy0:13073] [ 3] 
/ivylogin/home/rvandevaart/ompi-repos/ompi-trunk-original/64-dbg-nocuda/lib/libmpi.so.0(ompi_mpi_finalize+0xc4c)[0x7fc48d2697e9]
[drossetti-ivy0:13073] [ 4] 
/ivylogin/home/rvandevaart/ompi-repos/ompi-trunk-original/64-dbg-nocuda/lib/libmpi.so.0(PMPI_Finalize+0x59)[0x7fc48d2925a9]
[drossetti-ivy0:13073] [ 5] isend[0x401687] [drossetti-ivy0:13073] [ 6] 
/lib64/libc.so.6(__libc_start_main+0xfd)[0x7fc48cc81cdd]
[drossetti-ivy0:13073] [ 7] isend[0x400f49] [drossetti-ivy0:13073] *** End of 
error message *** [drossetti-ivy1:29629] *** Process received signal *** 
[drossetti-ivy1:29629] Signal: Segmentation fault (11) [drossetti-ivy1:29629] 
Signal code: Address not mapped (1) [drossetti-ivy1:29629] Failing at address: 
0x7f239ded6d68 [drossetti-ivy1:29629] [ 0] 
/lib64/libpthread.so.0(+0xf500)[0x7f23a0329500]
[drossetti-ivy1:29629] [ 1] 
/ivylogin/home/rvandevaart/ompi-repos/ompi-trunk-original/64-dbg-nocuda/lib/libopen-pal.so.0(opal_class_finalize+0x4a)[0x7f239f9c7145]
[drossetti-ivy1:29629] [ 2] 
/ivylogin/home/rvandevaart/ompi-repos/ompi-trunk-original/64-dbg-nocuda/lib/libopen-pal.so.0(opal_finalize_util+0xc3)[0x7f239f9cc253]
[drossetti-ivy1:29629] [ 3] 
/ivylogin/home/rvandevaart/ompi-repos/ompi-trunk-original/64-dbg-nocuda/lib/libmpi.so.0(ompi_mpi_finalize+0xc4c)[0x7f23a058d7e9]
[drossetti-ivy1:29629] [ 4] 
/ivylogin/home/rvandevaart/ompi-repos/ompi-trunk-original/64-dbg-nocuda/lib/libmpi.so.0(PMPI_Finalize+0x59)[0x7f23a05b65a9]
[drossetti-ivy1:29629] [ 5] isend[0x401687] [drossetti-ivy1:29629] [ 6] 
/lib64/libc.so.6(__libc_start_main+0xfd)[0x7f239ffa5cdd]
[drossetti-ivy1:29629] [ 7] isend[0x400f49] [drossetti-ivy1:29629] *** End of 
error message ***
--
mpirun noticed that process rank 0 with PID 0 on node drossetti-ivy0 exited on 
signal 11 (Segmentation fault).
--
[drossetti-ivy0:13070] *** Process received signal *** [drossetti-ivy0:13070] 
Signal: Segmentation fault (11) [drossetti-ivy0:13070] Signal code: Address not 
mapped (1) [drossetti-ivy0:13070] Failing at address: 0x7eff348fbd68 
[drossetti-ivy0:13070] [ 0] /lib64/libpthread.so.0(+0xf500)[0x7eff362d1500]
[drossetti-ivy0:13070] [ 1] 
/ivylogin/home/rvandevaart/ompi-repos/ompi-trunk-original/64-dbg-nocuda/lib/libopen-pal.so.0(opal_class_finalize+0x4a)[0x7eff36fb4145]
[drossetti-ivy0:13070] [ 2] 
/ivylogin/home/rvandevaart/ompi-repos/ompi-trunk-original/64-dbg-nocuda/lib/libopen-pal.so.0(opal_finalize_util+0xc3)[0x7eff36fb9253]
[drossetti-ivy0:13070] [ 3] 
/ivylogin/home/rvandevaart/ompi-repos/ompi-trunk-original/64-dbg-nocuda/lib/libopen-pal.so.0(opal_finalize+0x105)[0x7eff36fb935f]
[drossetti-ivy0:13070] [ 4] 
/ivylogin/home/rvandevaart/ompi-repos/ompi-trunk-original/64-dbg-nocuda/lib/libopen-rte.so.0(orte_finalize+0xd3)[0x7eff372b9f9f]
[drossetti-ivy0:13070] [ 5] mpirun(orterun+0x15b5)[0x40573e] 
[drossetti-ivy0:13070] [ 6] mpirun(main+0x20)[0x403a14] [drossetti-ivy0:13070] 
[ 7] 

[OMPI devel] Hangs on the trunk

2014-07-14 Thread Rolf vandeVaart
I have noticed that I am seeing some tests hang on the trunk.  For example:

$ mpirun --mca btl_tcp_if_include eth0 --host drossetti-ivy0,drossetti-ivy1 -np 
2 --mca pml ob1 --mca btl sm,tcp,self --mca coll_ml_disable_allgather 1 --mca 
btl_openib_warn_default_gid_prefix 0 send

It is not unusual for this test to take several minutes, particularly on slow 
networks.
Please be patient.
NOTICE: Using max message size: 10485760
Progress: [=

Is anyone else seeing this?  (This is really a hang in spite of the message 
saying it should take a few minutes)

This started with the changes Nathan did for renaming the descriptor fields - 
r32196 through r32202.
From what I can tell, it looks like it hangs the second time the rendezvous
protocol is used to send the data.

Rolf

---
This email message is for the sole use of the intended recipient(s) and may 
contain
confidential information.  Any unauthorized review, use, disclosure or 
distribution
is prohibited.  If you are not the intended recipient, please contact the 
sender by
reply email and destroy all copies of the original message.
---


Re: [OMPI devel] iallgather failures with coll ml

2014-06-11 Thread Rolf vandeVaart
Hearing no response, I assume this is not a known issue so I submitted 
https://svn.open-mpi.org/trac/ompi/ticket/4709
Nathan, is this something that you can look at?

Rolf

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Rolf vandeVaart
Sent: Friday, June 06, 2014 1:55 PM
To: de...@open-mpi.org
Subject: [OMPI devel] iallgather failures with coll ml

On the trunk, I am seeing failures of the ibm tests iallgather and 
iallgather_in_place.  Is this a known issue?

$ mpirun --mca btl self,sm,tcp --mca coll ml,basic,libnbc --host 
drossetti-ivy0,drossetti-ivy0,drossetti-ivy1,drossetti-ivy1 -np 4 iallgather
[**ERROR**]: MPI_COMM_WORLD rank 0, file iallgather.c:77:
bad answer (0) at index 1 of 4 (should be 1)
[**ERROR**]: MPI_COMM_WORLD rank 1, file iallgather.c:77:
bad answer (0) at index 1 of 4 (should be 1)

Interestingly, there is an MCA param to disable it in coll ml which allows the 
test to pass.

$ mpirun --mca coll_ml_disable_allgather 1 --mca btl self,sm,tcp --mca coll 
ml,basic,libnbc --host 
drossetti-ivy0,drossetti-ivy0,drossetti-ivy1,drossetti-ivy1 -np 4 iallgather
$ echo $?
0




This email message is for the sole use of the intended recipient(s) and may 
contain confidential information.  Any unauthorized review, use, disclosure or 
distribution is prohibited.  If you are not the intended recipient, please 
contact the sender by reply email and destroy all copies of the original 
message.



[OMPI devel] Open MPI Core Developer - Minutes June 10, 2014

2014-06-10 Thread Rolf vandeVaart
Minutes of June 10, 2014 Open MPI Core Developer Meeting


1.   Review 1.6 - Nothing new

2.   Review 1.8 - Most things are doing fine.  Still several tickets 
awaiting review.  If influx of bugs slows, then we will get 1.8.2 release 
ready.  Rolf was concerned about intermittent hangs, but needs to investigate.

3.   Some discussion of RFC process.  Keep it as is for now, but will 
discuss again at developers meeting

4.   Slurm is difficult to support because the slurm code is constantly 
changing.  This makes it very difficult to support different versions and 
creates lots of integration issues with ORTE.  Someone needs to step up and 
help Ralph with supporting slurm.  Joshua Ladd says Mellanox can perhaps help 
with this.

5.   Discussion about adding STCI component to OMPI/RTE framework.  Sounds 
like everyone is good with it going in.

6.   UDCM bug - problem is with user system, not with UDCM.  But maybe we 
still need to do something so that UDCM fails in a better way.  Nathan 
investigating.

7.   Jeff opened up a blocker bug - missing fortran APIs

8.   Ralph talked about the idea that he will be bringing back a 
"minimizing the modex" feature.  Used to have something that would check 
whether we needed endpoint information, and if it was not needed, we would just 
fall through and skip the barrier.  This functionality has atrophied, but Ralph 
is hoping to restore it.

9.   As much as we would like each message sent to the user or development lists 
to include a link to its archived copy, it is not easy to implement.  So, it will not be 
happening.


---
This email message is for the sole use of the intended recipient(s) and may 
contain
confidential information.  Any unauthorized review, use, disclosure or 
distribution
is prohibited.  If you are not the intended recipient, please contact the 
sender by
reply email and destroy all copies of the original message.
---


Re: [OMPI devel] Strange intercomm_create, spawn, spawn_multiple hang on trunk

2014-06-06 Thread Rolf vandeVaart
Thanks for trying, Ralph.  It looks like my issue has to do with coll ml 
interaction.  If I exclude coll ml, then all my tests pass.  Do you know if 
there is a bug for this issue?
If so, then I can run my nightly tests with coll ml disabled and wait for the 
bug to be fixed.

Also, where do simple_spawn and spawn_multiple live?  I was running "spawn" 
and "spawn_multiple" from the ibm/dynamic test suite.
Your output for spawn_multiple looks different than mine.

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Friday, June 06, 2014 3:19 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] Strange intercomm_create, spawn, spawn_multiple hang 
on trunk

Works fine for me:

[rhc@bend001 mpi]$ mpirun -n 3 --host bend001 ./simple_spawn
[pid 22777] starting up!
[pid 22778] starting up!
[pid 22779] starting up!
1 completed MPI_Init
Parent [pid 22778] about to spawn!
2 completed MPI_Init
Parent [pid 22779] about to spawn!
0 completed MPI_Init
Parent [pid 22777] about to spawn!
[pid 22783] starting up!
[pid 22784] starting up!
Parent done with spawn
Parent sending message to child
Parent done with spawn
Parent done with spawn
0 completed MPI_Init
Hello from the child 0 of 2 on host bend001 pid 22783
Child 0 received msg: 38
1 completed MPI_Init
Hello from the child 1 of 2 on host bend001 pid 22784
Child 1 disconnected
Parent disconnected
Parent disconnected
Parent disconnected
Child 0 disconnected
22784: exiting
22778: exiting
22779: exiting
22777: exiting
22783: exiting
[rhc@bend001 mpi]$ make spawn_multiple
mpicc -g --openmpi:linkallspawn_multiple.c   -o spawn_multiple
[rhc@bend001 mpi]$ mpirun -n 3 --host bend001 ./spawn_multiple
Parent [pid 22797] about to spawn!
Parent [pid 22798] about to spawn!
Parent [pid 22799] about to spawn!
Parent done with spawn
Parent done with spawn
Parent sending message to children
Parent done with spawn
Hello from the child 0 of 2 on host bend001 pid 22803: argv[1] = foo
Child 0 received msg: 38
Hello from the child 1 of 2 on host bend001 pid 22804: argv[1] = bar
Child 1 disconnected
Parent disconnected
Parent disconnected
Parent disconnected
Child 0 disconnected
[rhc@bend001 mpi]$ mpirun -n 3 --host bend001 -mca coll ^ml ./intercomm_create
b: MPI_Intercomm_create( intra, 0, intra, MPI_COMM_NULL, 201, ) [rank 3]
b: MPI_Intercomm_create( intra, 0, intra, MPI_COMM_NULL, 201, ) [rank 4]
b: MPI_Intercomm_create( intra, 0, intra, MPI_COMM_NULL, 201, ) [rank 5]
c: MPI_Intercomm_create( MPI_COMM_WORLD, 0, intra, 0, 201, ) [rank 3]
c: MPI_Intercomm_create( MPI_COMM_WORLD, 0, intra, 0, 201, ) [rank 4]
c: MPI_Intercomm_create( MPI_COMM_WORLD, 0, intra, 0, 201, ) [rank 5]
a: MPI_Intercomm_create( ab_intra, 0, ac_intra, 3, 201, ) (0)
a: MPI_Intercomm_create( ab_intra, 0, ac_intra, 3, 201, ) (0)
a: MPI_Intercomm_create( ab_intra, 0, ac_intra, 3, 201, ) (0)
b: intercomm_create (0)
b: barrier on inter-comm - before
b: barrier on inter-comm - after
b: intercomm_create (0)
b: barrier on inter-comm - before
b: barrier on inter-comm - after
c: intercomm_create (0)
c: barrier on inter-comm - before
c: barrier on inter-comm - after
c: intercomm_create (0)
c: barrier on inter-comm - before
c: barrier on inter-comm - after
a: intercomm_create (0)
a: barrier on inter-comm - before
a: barrier on inter-comm - after
c: intercomm_create (0)
c: barrier on inter-comm - before
c: barrier on inter-comm - after
a: intercomm_create (0)
a: barrier on inter-comm - before
a: barrier on inter-comm - after
a: intercomm_create (0)
a: barrier on inter-comm - before
a: barrier on inter-comm - after
b: intercomm_create (0)
b: barrier on inter-comm - before
b: barrier on inter-comm - after
a: intercomm_merge(0) (0) [rank 2]
c: intercomm_merge(0) (0) [rank 8]
a: intercomm_merge(0) (0) [rank 0]
a: intercomm_merge(0) (0) [rank 1]
c: intercomm_merge(0) (0) [rank 7]
b: intercomm_merge(1) (0) [rank 4]
b: intercomm_merge(1) (0) [rank 5]
c: intercomm_merge(0) (0) [rank 6]
b: intercomm_merge(1) (0) [rank 3]
a: barrier (0)
b: barrier (0)
c: barrier (0)
a: barrier (0)
c: barrier (0)
b: barrier (0)
a: barrier (0)
c: barrier (0)
b: barrier (0)
dpm_base_disconnect_init: error -12 in isend to process 3
dpm_base_disconnect_init: error -12 in isend to process 3
dpm_base_disconnect_init: error -12 in isend to process 3
dpm_base_disconnect_init: error -12 in isend to process 0
dpm_base_disconnect_init: error -12 in isend to process 3
dpm_base_disconnect_init: error -12 in isend to process 3
dpm_base_disconnect_init: error -12 in isend to process 3
dpm_base_disconnect_init: error -12 in isend to process 0
dpm_base_disconnect_init: error -12 in isend to process 3
dpm_base_disconnect_init: error -12 in isend to process 3
dpm_base_disconnect_init: error -12 in isend to process 3
dpm_base_disconnect_init: error -12 in isend to process 1
dpm_base_disconnect_init: error -12 in isend to process 3
[rhc@bend001 mpi]$



On Jun 6, 2014, at 11:26 AM, Rolf vandeVaart 
<rvandeva...@nvi

[OMPI devel] Strange intercomm_create, spawn, spawn_multiple hang on trunk

2014-06-06 Thread Rolf vandeVaart
I am seeing an interesting failure on trunk.  intercomm_create, spawn, and 
spawn_multiple from the IBM tests hang if I explicitly list the hostnames to 
run on.  For example:

Good:
$ mpirun -np 2 --mca btl self,sm,tcp spawn_multiple
Parent: 0 of 2, drossetti-ivy0.nvidia.com (0 in init)
Parent: 1 of 2, drossetti-ivy0.nvidia.com (0 in init)
Child: 0 of 4, drossetti-ivy0.nvidia.com (this is job 1) (1 in init)
Child: 1 of 4, drossetti-ivy0.nvidia.com (this is job 1) (1 in init)
Child: 2 of 4, drossetti-ivy0.nvidia.com (this is job 2) (1 in init)
Child: 3 of 4, drossetti-ivy0.nvidia.com (this is job 2) (1 in init)
$ 

Bad:
$ mpirun -np 2 --mca btl self,sm,tcp -host drossetti-ivy0,drossetti-ivy0 
spawn_multiple
Parent: 0 of 2, drossetti-ivy0.nvidia.com (1 in init)
Parent: 1 of 2, drossetti-ivy0.nvidia.com (1 in init)
Child: 0 of 4, drossetti-ivy0.nvidia.com (this is job 1) (1 in init)
Child: 1 of 4, drossetti-ivy0.nvidia.com (this is job 1) (1 in init)
Child: 2 of 4, drossetti-ivy0.nvidia.com (this is job 2) (1 in init)
Child: 3 of 4, drossetti-ivy0.nvidia.com (this is job 2) (1 in init)
[..and we are hung here...]

I see the exact same behavior for spawn and spawn_multiple.  Ralph, any 
thoughts?  Open MPI 1.8 is fine.  I can provide more information if needed, but 
I assume this is reproducible. 

Thanks,
Rolf
---
This email message is for the sole use of the intended recipient(s) and may 
contain
confidential information.  Any unauthorized review, use, disclosure or 
distribution
is prohibited.  If you are not the intended recipient, please contact the 
sender by
reply email and destroy all copies of the original message.
---


Re: [OMPI devel] regression with derived datatypes

2014-05-30 Thread Rolf vandeVaart
This fixed all of my issues.  Thanks.  I will add that comment to the ticket also.

>-Original Message-
>From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of George
>Bosilca
>Sent: Thursday, May 29, 2014 5:58 PM
>To: Open MPI Developers
>Subject: Re: [OMPI devel] regression with derived datatypes
>
>r31904 should fix this issue. Please test it thoughtfully and report all 
>issues.
>
>  George.
>
>
>On Fri, May 9, 2014 at 6:56 AM, Gilles Gouaillardet
><gilles.gouaillar...@iferc.org> wrote:
>> i opened #4610 https://svn.open-mpi.org/trac/ompi/ticket/4610
>> and attached a patch for the v1.8 branch
>>
>> i ran several tests from the intel_tests test suite and did not
>> observe any regression.
>>
>> please note there are still issues when running with --mca btl
>> scif,vader,self
>>
>> this might be an other issue, i will investigate more next week
>>
>> Gilles
>>
>> On 2014/05/09 18:08, Gilles Gouaillardet wrote:
>>> I ran some more investigations with --mca btl scif,self
>>>
>>> i found that the previous patch i posted was complete crap and i
>>> apologize for it.
>>>
>>> on a brighter side, and imho, the issue only occurs if fragments are
>>> received (and then processed) out of order.
>>> /* i did not observe this with the tcp btl, but i always see that
>>> with the scif btl, i guess this can be observed too with openib+RDMA
>>> */
>>>
>>> in this case only, opal_convertor_generic_simple_position(...) is
>>> invoked and does not set the pConvertor->pStack as expected by r31496
>>>
>>> i will run some more tests from now
>>>
>>> Gilles
>>>
>>> On 2014/05/08 2:23, George Bosilca wrote:
>>>> Strange. The outcome and the timing of this issue seems to highlight a link
>with the other datatype-related issue you reported earlier, and as suggested
>by Ralph with Gilles scif+vader issue.
>>>>
>>>> Generally speaking, the mechanism used to split the data in the case of
>multiple BTLs, is identical to the one used to split the data in fragments. 
>So, if
>the culprit is in the splitting logic, one might see some weirdness as soon as
>we force the exclusive usage of the send protocol, with an unconventional
>fragment size.
>>>>
>>>> In other words using the following flags "--mca btl tcp,self --mca
>btl_tcp_flags 3 --mca btl_tcp_rndv_eager_limit 23 --mca btl_tcp_eager_limit
>23 --mca btl_tcp_max_send_size 23" should always transfer wrong data,
>even when only one single BTL is in play.
>>>>
>>>>   George.
>>>>
>>>> On May 7, 2014, at 13:11 , Rolf vandeVaart <rvandeva...@nvidia.com>
>wrote:
>>>>
>>>>> OK.  So, I investigated a little more.  I only see the issue when I am
>running with multiple ports enabled such that I have two openib BTLs
>instantiated.  In addition, large message RDMA has to be enabled.  If those
>conditions are not met, then I do not see the problem.  For example:
>>>>> FAILS:
>>>>> mpirun -np 2 -host host1,host2 -mca btl_openib_if_include
>>>>> mlx5_0:1,mlx5_0:2 -mca btl_openib_flags 3 MPI_Isend_ator_c
>>>>> PASS:
>>>>> mpirun -np 2 -host host1,host2 -mca btl_openib_if_include
>>>>> mlx5_0:1 -mca btl_openib_flags 3 MPI_Isend_ator_c
>>>>> mpirun -np 2 -host host1,host2 -mca btl_openib_if_include
>>>>> mlx5_0:1,mlx5_0:2 -mca btl_openib_flags 1 MPI_Isend_ator_c
>>>>>
>>>>> So we must have some type of issue when we break up the message
>between the two openib BTLs.  Maybe someone else can confirm my
>observations?
>>>>> I was testing against the latest trunk.
>>>>>
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/05/14766.php
>___
>devel mailing list
>de...@open-mpi.org
>Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>Link to this post: http://www.open-
>mpi.org/community/lists/devel/2014/05/14910.php

---
This email message is for the sole use of the intended recipient(s) and may 
contain
confidential information.  Any unauthorized review, use, disclosure or 
distribution
is prohibited.  If you are not the intended recipient, please contact the 
sender by
reply email and destroy all copies of the original message.
---


[OMPI devel] Intermittent hangs when exiting with error

2014-05-29 Thread Rolf vandeVaart
Ralph:
I am seeing cases where mpirun seems to hang when one of the applications exits 
with non-zero.  For example, the intel test MPI_Cart_get_c will exit that way 
if there are not enough processes to run the test.  In most cases, mpirun seems 
to return fine with the error code, but sometimes it just hangs.   I first 
started noticing this in my mtt runs.  It seems (but not conclusive) that I see 
this when both the usnic and openib are built, even though I am only using the 
openib (as I have no usnic hardware).

Anyone else seeing something like this?  Note that I see this on both 1.8 and 
trunk, but I show trunk here.


PASS:
[rvandevaart@drossetti-ivy0 src]$ mpirun --mca btl self,sm,usnic,openib --host 
drossetti-ivy0,drossetti-ivy0,drossetti-ivy1,drossetti-ivy1 -np 4 --mca 
btl_openib_warn_default_gid_prefix 0 MPI_Cart_get_c
MPITEST skip (1): WARNING --  nodes =   4   Need   6 nodes to run test
MPITEST info  (0): Starting MPI_Cart_get  test
MPITEST skip (0): WARNING --  nodes =   4   Need   6 nodes to run test
MPITEST skip (3): WARNING --  nodes =   4   Need   6 nodes to run test
MPITEST skip (2): WARNING --  nodes =   4   Need   6 nodes to run test
---
Primary job  terminated normally, but 1 process returned a non-zero exit code.. 
Per user-direction, the job has been aborted.
---
--
mpirun detected that one or more processes exited with non-zero status, thus 
causing the job to be terminated. The first process to do so was:

  Process name: [[45854,1],1]
  Exit code:77
--

FAIL:
[rvandevaart@drossetti-ivy0 src]$ mpirun --mca btl self,sm,usnic,openib --host 
drossetti-ivy0,drossetti-ivy0,drossetti-ivy1,drossetti-ivy1 -np 4 --mca 
btl_openib_warn_default_gid_prefix 0 MPI_Cart_get_c
MPITEST skip (1): WARNING --  nodes =   4   Need   6 nodes to run test
MPITEST info  (0): Starting MPI_Cart_get  test
MPITEST skip (0): WARNING --  nodes =   4   Need   6 nodes to run test
MPITEST skip (3): WARNING --  nodes =   4   Need   6 nodes to run test
MPITEST skip (2): WARNING --  nodes =   4   Need   6 nodes to run test
---
Primary job  terminated normally, but 1 process returned a non-zero exit code.. 
Per user-direction, the job has been aborted.
---
[...now we are hung...]

LOCAL mpirun:
[rvandevaart@drossetti-ivy0 64-mtt-nocuda]$ pstack 27705 Thread 2 (Thread 
0x7fe0c8c47700 (LWP 27706)):
#0  0x7fe0ca578533 in select () from /lib64/libc.so.6
#1  0x7fe0c8c5591e in listen_thread () from 
/geppetto/home/rvandevaart/ompi/ompi-trunk-reduction-new/64-mtt-nocuda/lib/openmpi/mca_oob_tcp.so
#2  0x7fe0ca831851 in start_thread () from /lib64/libpthread.so.0
#3  0x7fe0ca57f94d in clone () from /lib64/libc.so.6 Thread 1 (Thread 
0x7fe0cbcdd700 (LWP 27705)):
#0  0x7fe0ca576293 in poll () from /lib64/libc.so.6
#1  0x7fe0cb589575 in poll_dispatch () from 
/geppetto/home/rvandevaart/ompi/ompi-trunk-reduction-new/64-mtt-nocuda/lib/libopen-pal.so.0
#2  0x7fe0cb57df8c in opal_libevent2021_event_base_loop () from 
/geppetto/home/rvandevaart/ompi/ompi-trunk-reduction-new/64-mtt-nocuda/lib/libopen-pal.so.0
#3  0x00405572 in orterun ()
#4  0x00403904 in main ()
[rvandevaart@drossetti-ivy0 64-mtt-nocuda]$

REMOTE ORTED:
[rvandevaart@drossetti-ivy1 ~]$ pstack 10241
#0  0x7fbdcba7c258 in poll () from /lib64/libc.so.6
#1  0x7fbdcca8f575 in poll_dispatch () from 
/geppetto/home/rvandevaart/ompi/ompi-trunk-reduction-new/64-mtt-nocuda/lib/libopen-pal.so.0
#2  0x7fbdcca83f8c in opal_libevent2021_event_base_loop () from 
/geppetto/home/rvandevaart/ompi/ompi-trunk-reduction-new/64-mtt-nocuda/lib/libopen-pal.so.0
#3  0x7fbdccd572cc in orte_daemon () from 
/geppetto/home/rvandevaart/ompi/ompi-trunk-reduction-new/64-mtt-nocuda/lib/libopen-rte.so.0
#4  0x0040094a in main ()
[rvandevaart@drossetti-ivy1 ~]$


---
This email message is for the sole use of the intended recipient(s) and may 
contain
confidential information.  Any unauthorized review, use, disclosure or 
distribution
is prohibited.  If you are not the intended recipient, please contact the 
sender by
reply email and destroy all copies of the original message.
---


[OMPI devel] Still problems with del_procs in trunkj

2014-05-23 Thread Rolf vandeVaart
I am still seeing problems with del_procs with openib.  Do we believe 
everything should be working?  This is with the latest trunk (updated 1 hour 
ago).

[rvandevaart@drossetti-ivy0 examples]$ mpirun --mca btl_openib_if_include mlx5_0:1 -np 2 -host drossetti-ivy0,drossetti-ivy1 connectivity_c
Connectivity test on 2 processes PASSED.
connectivity_c: ../../../../../ompi/mca/btl/openib/btl_openib.c:1151: 
mca_btl_openib_del_procs: Assertion 
`((opal_object_t*)endpoint)->obj_reference_count == 1' failed.
connectivity_c: ../../../../../ompi/mca/btl/openib/btl_openib.c:1151: 
mca_btl_openib_del_procs: Assertion 
`((opal_object_t*)endpoint)->obj_reference_count == 1' failed.
--
mpirun noticed that process rank 1 with PID 28443 on node drossetti-ivy1 exited 
on signal 11 (Segmentation fault).
--
[rvandevaart@drossetti-ivy0 examples]$ 
---
This email message is for the sole use of the intended recipient(s) and may 
contain
confidential information.  Any unauthorized review, use, disclosure or 
distribution
is prohibited.  If you are not the intended recipient, please contact the 
sender by
reply email and destroy all copies of the original message.
---


[OMPI devel] RFC: [UPDATE] Add some basic CUDA-aware support to reductions

2014-05-21 Thread Rolf vandeVaart
NOTE: This is an update to the RFC after review and help from George

WHAT: Add some basic support so that reduction functions can handle GPU 
buffers.  Create a new coll module that is only compiled in when CUDA-aware 
support is compiled in.  This patch moves the GPU data into a host buffer 
before the reduction call and moves it back to the GPU after the reduction call.  
The changes have no effect if CUDA-aware support is not compiled in. 

WHY: Users of CUDA-aware support expect reductions to work. 

WHEN: Friday, May 23, 2014 
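
For context, the user-visible effect is that something like the following sketch 
(assuming a CUDA-aware build; error checking omitted) is expected to work with 
device pointers passed straight into the reduction:

    int count = 1024;
    double *d_in, *d_out;
    cudaMalloc((void **)&d_in,  count * sizeof(double));
    cudaMalloc((void **)&d_out, count * sizeof(double));
    /* ... fill d_in on the GPU ... */
    MPI_Allreduce(d_in, d_out, count, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

The new coll component stages these device buffers through the host around the 
underlying reduction, as shown in the attached patch.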



---
This email message is for the sole use of the intended recipient(s) and may 
contain
confidential information.  Any unauthorized review, use, disclosure or 
distribution
is prohibited.  If you are not the intended recipient, please contact the 
sender by
reply email and destroy all copies of the original message.
---
Index: ompi/mca/coll/cuda/configure.m4
===
--- ompi/mca/coll/cuda/configure.m4	(revision 0)
+++ ompi/mca/coll/cuda/configure.m4	(revision 0)
@@ -0,0 +1,29 @@
+# -*- shell-script -*-
+#
+# Copyright (c) 2014  The University of Tennessee and The University
+# of Tennessee Research Foundation.  All rights
+# reserved.
+# Copyright (c) 2014  NVIDIA Corporation.  All rights reserved.
+# $COPYRIGHT$
+#
+# Additional copyrights may follow
+#
+# $HEADER$
+#
+
+# MCA_coll_cuda_CONFIG([action-if-can-compile],
+#  [action-if-cant-compile])
+# 
+AC_DEFUN([MCA_ompi_coll_cuda_CONFIG],[
+AC_CONFIG_FILES([ompi/mca/coll/cuda/Makefile])
+
+# make sure that CUDA-aware checks have been done
+AC_REQUIRE([OPAL_CHECK_CUDA])
+
+# Only build if CUDA support is available
+AS_IF([test "x$CUDA_SUPPORT" = "x1"],
+  [$1],
+  [$2])
+
+])dnl
+
Index: ompi/mca/coll/cuda/coll_cuda_allreduce.c
===
--- ompi/mca/coll/cuda/coll_cuda_allreduce.c	(revision 0)
+++ ompi/mca/coll/cuda/coll_cuda_allreduce.c	(revision 0)
@@ -0,0 +1,77 @@
+/*
+ * Copyright (c) 2014  The University of Tennessee and The University
+ * of Tennessee Research Foundation.  All rights
+ * reserved.
+ * Copyright (c) 2014  NVIDIA Corporation.  All rights reserved.
+ * $COPYRIGHT$
+ * 
+ * Additional copyrights may follow
+ * 
+ * $HEADER$
+ */
+
+#include "ompi_config.h"
+#include "coll_cuda.h"
+
+#include 
+
+#include "ompi/op/op.h"
+#include "opal/datatype/opal_convertor.h"
+#include "opal/datatype/opal_datatype_cuda.h"
+
+/*
+ *	allreduce_intra
+ *
+ *	Function:	- allreduce using other MPI collectives
+ *	Accepts:	- same as MPI_Allreduce()
+ *	Returns:	- MPI_SUCCESS or error code
+ */
+int
+mca_coll_cuda_allreduce(void *sbuf, void *rbuf, int count,
+struct ompi_datatype_t *dtype,
+struct ompi_op_t *op,
+struct ompi_communicator_t *comm,
+mca_coll_base_module_t *module)
+{
+mca_coll_cuda_module_t *s = (mca_coll_cuda_module_t*) module;
+ptrdiff_t true_lb, true_extent, lb, extent;
+char *rbuf1 = NULL, *sbuf1 = NULL, *rbuf2;
+const char *sbuf2;
+size_t bufsize;
+int rc;
+
+ompi_datatype_get_extent(dtype, , );
+ompi_datatype_get_true_extent(dtype, _lb, _extent);
+bufsize = true_extent + (ptrdiff_t)(count - 1) * extent;
+if ((MPI_IN_PLACE != sbuf) && (opal_cuda_check_bufs((char *)sbuf, NULL))) {
+sbuf1 = (char*)malloc(bufsize);
+if (NULL == sbuf1) {
+return OMPI_ERR_OUT_OF_RESOURCE;
+}
+opal_cuda_memcpy_sync(sbuf1, sbuf, bufsize);
+sbuf2 = sbuf; /* save away original buffer */
+sbuf = sbuf1 - lb;
+}
+
+if (opal_cuda_check_bufs(rbuf, NULL)) {
+rbuf1 = (char*)malloc(bufsize);
+if (NULL == rbuf1) {
+if (NULL != sbuf1) free(sbuf1);
+return OMPI_ERR_OUT_OF_RESOURCE;
+}
+opal_cuda_memcpy_sync(rbuf1, rbuf, bufsize);
+rbuf2 = rbuf; /* save away original buffer */
+rbuf = rbuf1 - lb;
+}
+rc = s->c_coll.coll_allreduce(sbuf, rbuf, count, dtype, op, comm, s->c_coll.coll_allreduce_module);
+if (NULL != sbuf1) {
+free(sbuf1);
+}
+if (NULL != rbuf1) {
+rbuf = rbuf2;
+opal_cuda_memcpy_sync(rbuf, rbuf1, bufsize);
+free(rbuf1);
+}
+return rc;
+}
+
Index: ompi/mca/coll/cuda/coll_cuda_exscan.c
===
--- ompi/mca/coll/cuda/coll_cuda_exscan.c	(revision 0)
+++ ompi/mca/coll/cuda/coll_cuda_exscan.c	(revision 0)
@@ -0,0 +1,70 @@
+/*
+ * Copyright (c) 2014  The University of 

[OMPI devel] Minutes of Open MPI ConCall Meeting - Tuesday, May 13, 2014

2014-05-13 Thread Rolf vandeVaart
Open MPI 1.6:

-  Release was waiting on 
https://svn.open-mpi.org/trac/ompi/ticket/3079 but during meeting we decided it 
was not necessary.  Therefore, Jeff will go ahead and roll Open MPI 1.6.6 RC1.
Open MPI 1.8:

-  Several tickets have been applied.  There was some discussion about other 
tickets, but the details are too numerous to capture here.

-  Still having issues with 0 sized messages and MPI_Alltoallw.  I 
think this is being tracked with ticket 
https://svn.open-mpi.org/trac/ompi/ticket/4506.  Jeff will poke a few folks to 
get things moving for a fix of that issue.

-  Still leaking in some component.  Nathan looking at that issue and 
hopes to have fix soon.

-  Encourage everyone to review their CMRs and change owner after 
review is done.

Other:

RFC: autogen.sh removal is approved.  Bye, bye autogen.sh

Round Table



---
This email message is for the sole use of the intended recipient(s) and may 
contain
confidential information.  Any unauthorized review, use, disclosure or 
distribution
is prohibited.  If you are not the intended recipient, please contact the 
sender by
reply email and destroy all copies of the original message.
---


Re: [OMPI devel] regression with derived datatypes

2014-05-07 Thread Rolf vandeVaart
I tried this.  However, 23 bytes is too small, so I added the 23 to the 56 bytes 
required for the PML header (giving 79).  I do not get the error.

mpirun -host host0,host1 -np 2 --mca btl self,tcp --mca btl_tcp_flags 3 --mca 
btl_tcp_rndv_eager_limit 23 --mca btl_tcp_eager_limit 23 --mca 
btl_tcp_max_send_size 23 MPI_Isend_ator_c
*** An error occurred in MPI_Init
The "eager limit" MCA parameter in the tcp BTL was set to a value which
is too low for Open MPI to function properly.  Please re-run your job
with a higher eager limit value for this BTL; the exact MCA parameter
name and its corresponding minimum value is shown below.

  Local host:  host0
  BTL name:tcp
  BTL eager limit value:   23 (set via btl_tcp_eager_limit)
  BTL eager limit minimum: 56
  MCA parameter name:  btl_tcp_eager_limit 
--

mpirun -host host0,host1 -np 2 --mca btl self,tcp --mca btl_tcp_flags 3 --mca 
btl_tcp_rndv_eager_limit 79 --mca btl_tcp_eager_limit 79 --mca 
btl_tcp_max_send_size 79 MPI_Isend_ator_c
MPITEST info  (0): Starting MPI_Isend_ator: All Isend TO Root test
MPITEST info  (0): Node spec MPITEST_comm_sizes[6]=2 too large, using 1
MPITEST info  (0): Node spec MPITEST_comm_sizes[22]=2 too large, using 1
MPITEST info  (0): Node spec MPITEST_comm_sizes[32]=2 too large, using 1
MPITEST_results: MPI_Isend_ator: All Isend TO Root all tests PASSED (3744)


From: devel [devel-boun...@open-mpi.org] On Behalf Of George Bosilca 
[bosi...@icl.utk.edu]
Sent: Wednesday, May 07, 2014 1:23 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] regression with derived datatypes

Strange. The outcome and the timing of this issue seems to highlight a link 
with the other datatype-related issue you reported earlier, and as suggested by 
Ralph with Gilles scif+vader issue.

Generally speaking, the mechanism used to split the data in the case of 
multiple BTLs, is identical to the one used to split the data in fragments. So, 
if the culprit is in the splitting logic, one might see some weirdness as soon 
as we force the exclusive usage of the send protocol, with an unconventional 
fragment size.

In other words using the following flags "--mca btl tcp,self --mca btl_tcp_flags 3 
--mca btl_tcp_rndv_eager_limit 23 --mca btl_tcp_eager_limit 23 --mca 
btl_tcp_max_send_size 23" should always transfer wrong data, even when only one 
single BTL is in play.

  George.

On May 7, 2014, at 13:11 , Rolf vandeVaart 
<rvandeva...@nvidia.com> wrote:

OK.  So, I investigated a little more.  I only see the issue when I am running 
with multiple ports enabled such that I have two openib BTLs instantiated.  In 
addition, large message RDMA has to be enabled.  If those conditions are not 
met, then I do not see the problem.  For example:
FAILS:
>  mpirun -np 2 -host host1,host2 -mca btl_openib_if_include mlx5_0:1,mlx5_0:2 -mca btl_openib_flags 3 MPI_Isend_ator_c
PASS:
>  mpirun -np 2 -host host1,host2 -mca btl_openib_if_include mlx5_0:1 -mca btl_openib_flags 3 MPI_Isend_ator_c
>  mpirun -np 2 -host host1,host2 -mca btl_openib_if_include mlx5_0:1,mlx5_0:2 -mca btl_openib_flags 1 MPI_Isend_ator_c

So we must have some type of issue when we break up the message between the two 
openib BTLs.  Maybe someone else can confirm my observations?
I was testing against the latest trunk.

Rolf

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Joshua Ladd
Sent: Wednesday, May 07, 2014 10:48 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] regression with derived datatypes

Rolf,
This was run on a Sandy Bridge system with ConnectX-3 cards.
Josh

On Wed, May 7, 2014 at 10:46 AM, Joshua Ladd 
<jladd.m...@gmail.com> wrote:
Elena, can you run your reproducer on the trunk, please, and see if the problem 
persists?
Josh

On Wed, May 7, 2014 at 10:26 AM, Jeff Squyres (jsquyres) 
<jsquy...@cisco.com> wrote:
On May 7, 2014, at 10:03 AM, Elena Elkina 
<elena.elk...@itseez.com> wrote:

> Yes, this commit is also in the trunk.
Yes, I understand that -- my question is: is this same *behavior* happening on 
the trunk.  I.e., is there some other effect on the trunk that is causing the 
bad behavior to not occur?

> Best,
> Elena
>
>
> On Wed, May 7, 2014 at 5:45 PM, Jeff Squyres (jsquyres) 
> <jsquy...@cisco.com> wrote:
> Is this also happening on the trunk?
>
>
> Sent from my phone. No type good.
>
> On May 7, 2014, at 9:44 AM, "Elena Elkina" 
>> <elena.elk...@itseez.com> wrote:
>
>> Sorry,
>>
>> Fixes #4501: Datatype unpack code produces incorrect results in some case
>>
>> ---svn-p

Re: [OMPI devel] regression with derived datatypes

2014-05-07 Thread Rolf vandeVaart
OK.  So, I investigated a little more.  I only see the issue when I am running 
with multiple ports enabled such that I have two openib BTLs instantiated.  In 
addition, large message RDMA has to be enabled.  If those conditions are not 
met, then I do not see the problem.  For example:
FAILS:

>  mpirun -np 2 -host host1,host2 -mca btl_openib_if_include mlx5_0:1,mlx5_0:2 -mca btl_openib_flags 3 MPI_Isend_ator_c
PASS:
>  mpirun -np 2 -host host1,host2 -mca btl_openib_if_include mlx5_0:1 -mca btl_openib_flags 3 MPI_Isend_ator_c
>  mpirun -np 2 -host host1,host2 -mca btl_openib_if_include mlx5_0:1,mlx5_0:2 -mca btl_openib_flags 1 MPI_Isend_ator_c

So we must have some type of issue when we break up the message between the two 
openib BTLs.  Maybe someone else can confirm my observations?
I was testing against the latest trunk.

Rolf

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Joshua Ladd
Sent: Wednesday, May 07, 2014 10:48 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] regression with derived datatypes

Rolf,
This was run on a Sandy Bridge system with ConnectX-3 cards.
Josh

On Wed, May 7, 2014 at 10:46 AM, Joshua Ladd 
> wrote:
Elena, can you run your reproducer on the trunk, please, and see if the problem 
persists?
Josh

On Wed, May 7, 2014 at 10:26 AM, Jeff Squyres (jsquyres) 
> wrote:
On May 7, 2014, at 10:03 AM, Elena Elkina 
> wrote:

> Yes, this commit is also in the trunk.
Yes, I understand that -- my question is: is this same *behavior* happening on 
the trunk.  I.e., is there some other effect on the trunk that is causing the 
bad behavior to not occur?

> Best,
> Elena
>
>
> On Wed, May 7, 2014 at 5:45 PM, Jeff Squyres (jsquyres) 
> > wrote:
> Is this also happening on the trunk?
>
>
> Sent from my phone. No type good.
>
> On May 7, 2014, at 9:44 AM, "Elena Elkina" 
> > wrote:
>
>> Sorry,
>>
>> Fixes #4501: Datatype unpack code produces incorrect results in some case
>>
>> ---svn-pre-commit-ignore-below---
>>
>> r31370 [[BR]]
>> Reshape all the packing/unpacking functions to use the same skeleton. 
>> Rewrite the
>> generic_unpacking to take advantage of the same capabilitites.
>>
>> r31380 [[BR]]
>> Remove a non-necessary label.
>>
>> r31387 [[BR]]
>> Correctly save the displacement for the case where the convertor is not
>> completed. As we need to have the right displacement at the beginning
>> of the next call, we should save the position relative to the beginning
>> of the buffer and not to the last loop.
>>
>> Best regards,
>> Elena
>>
>>
>> On Wed, May 7, 2014 at 5:43 PM, Jeff Squyres (jsquyres) 
>> > wrote:
>> Can you cite the branch and SVN r number?
>>
>> Sent from my phone. No type good.
>>
>> > On May 7, 2014, at 9:24 AM, "Elena Elkina" 
>> > > wrote:
>> >
>> > b531973419a056696e6f88d813769aa4f1f1aee6
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/05/14701.php
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/05/14702.php
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/05/14703.php
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/05/14704.php


--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

___
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: 
http://www.open-mpi.org/community/lists/devel/2014/05/14706.php



---
This email message is for the sole use of the intended recipient(s) and may 
contain
confidential information.  Any unauthorized review, use, disclosure or 
distribution
is prohibited.  If you are not the intended 

Re: [OMPI devel] regression with derived datatypes

2014-05-07 Thread Rolf vandeVaart
This seems similar to what I reported on a different thread.

http://www.open-mpi.org/community/lists/devel/2014/05/14688.php

I need to try and reproduce it again.  Elena, what kind of cluster were you 
running on?

Rolf

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Elena Elkina
Sent: Wednesday, May 07, 2014 10:04 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] regression with derived datatypes

Yes, this commit is also in the trunk.

Best,
Elena

On Wed, May 7, 2014 at 5:45 PM, Jeff Squyres (jsquyres) 
> wrote:
Is this also happening on the trunk?


Sent from my phone. No type good.

On May 7, 2014, at 9:44 AM, "Elena Elkina" 
> wrote:
Sorry,

Fixes #4501: Datatype unpack code produces incorrect results in some case

---svn-pre-commit-ignore-below---

r31370 [[BR]]
Reshape all the packing/unpacking functions to use the same skeleton. Rewrite 
the
generic_unpacking to take advantage of the same capabilitites.

r31380 [[BR]]
Remove a non-necessary label.

r31387 [[BR]]
Correctly save the displacement for the case where the convertor is not
completed. As we need to have the right displacement at the beginning
of the next call, we should save the position relative to the beginning
of the buffer and not to the last loop.

Best regards,
Elena

On Wed, May 7, 2014 at 5:43 PM, Jeff Squyres (jsquyres) 
> wrote:
Can you cite the branch and SVN r number?

Sent from my phone. No type good.

> On May 7, 2014, at 9:24 AM, "Elena Elkina" 
> > wrote:
>
> b531973419a056696e6f88d813769aa4f1f1aee6
___
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: 
http://www.open-mpi.org/community/lists/devel/2014/05/14701.php

___
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: 
http://www.open-mpi.org/community/lists/devel/2014/05/14702.php

___
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: 
http://www.open-mpi.org/community/lists/devel/2014/05/14703.php


---
This email message is for the sole use of the intended recipient(s) and may 
contain
confidential information.  Any unauthorized review, use, disclosure or 
distribution
is prohibited.  If you are not the intended recipient, please contact the 
sender by
reply email and destroy all copies of the original message.
---


Re: [OMPI devel] Possible bug with derived datatypes and openib BTL in trunk

2014-04-17 Thread Rolf vandeVaart
I sent this information to George off the mailing list since the attachment was 
somewhat large.
It is still strange that I seem to be the only one who sees this.

>-Original Message-
>From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of George
>Bosilca
>Sent: Wednesday, April 16, 2014 4:24 PM
>To: Open MPI Developers
>Subject: Re: [OMPI devel] Possible bug with derived datatypes and openib
>BTL in trunk
>
>Rolf,
>
>I didn't see these on my check run. Can you run the MPI_Isend_ator test with
>mpi_ddt_pack_debug and mpi_ddt_unpack_debug set to 1. I would be
>interested in the output you get on your machine.
>
>George.
>
>
>On Apr 16, 2014, at 14:34 , Rolf vandeVaart <rvandeva...@nvidia.com> wrote:
>
>> I have seen errors when running the intel test suite using the openib BTL
>when transferring derived datatypes.  I do not see the error with sm or tcp
>BTLs.  The errors begin after this checkin.
>>
>> https://svn.open-mpi.org/trac/ompi/changeset/31370
>> Timestamp: 04/11/14 16:06:56 (5 days ago)
>> Author: bosilca
>> Message: Reshape all the packing/unpacking functions to use the same
>> skeleton. Rewrite the generic_unpacking to take advantage of the same
>capabilitites.
>>
>> Does anyone else see errors?  Here is an example running with r31370:
>>
>> [rvandevaart@drossetti-ivy1 src]$ mpirun --mca btl self,openib -np 2
>> -host drossetti-ivy0,drossetti-ivy1 --mca
>> btl_openib_warn_default_gid_prefix 0 MPI_Isend_ator_c MPITEST error
>> (1): libmpitest.c:1608 i=117, int32_t value=-1, expected 117 MPITEST
>> error (1): libmpitest.c:1578 i=195, char value=-1, expected -61
>> MPITEST error (1): 2 errors in buffer (17,0,12) len 273 commsize 2
>> commtype -10 data_type 13 root 1 MPITEST error (1): libmpitest.c:1608
>> i=117, int32_t value=-1, expected 117 MPITEST error (1):
>> libmpitest.c:1578 i=195, char value=-1, expected -61 MPITEST error
>> (1): 2 errors in buffer (17,2,12) len 273 commsize 2 commtype -16
>> data_type 13 root 1 MPITEST info  (0): Starting MPI_Isend_ator: All
>> Isend TO Root test MPITEST info  (0): Node spec
>> MPITEST_comm_sizes[6]=2 too large, using 1 MPITEST info  (0): Node
>> spec MPITEST_comm_sizes[22]=2 too large, using 1 MPITEST info  (0):
>> Node spec MPITEST_comm_sizes[32]=2 too large, using 1 MPITEST error
>> (0): libmpitest.c:1608 i=117, int32_t value=-1, expected 118 MPITEST
>> error (0): libmpitest.c:1578 i=195, char value=-1, expected -60
>> MPITEST error (0): 2 errors in buffer (17,0,12) len 273 commsize 2
>> commtype -10 data_type 13 root 0 MPITEST error (0): libmpitest.c:1608
>> i=117, int32_t value=-1, expected 118 MPITEST error (0):
>> libmpitest.c:1578 i=195, char value=-1, expected -60 MPITEST error
>> (0): 2 errors in buffer (17,2,12) len 273 commsize 2 commtype -16
>> data_type 13 root 0 MPITEST error (1): libmpitest.c:1608 i=117,
>> int32_t value=-1, expected 117 MPITEST error (1): libmpitest.c:1578
>> i=195, char value=-1, expected -61 MPITEST error (1): 2 errors in
>> buffer (17,4,12) len 273 commsize 2 commtype -13 data_type 13 root 1
>> MPITEST error (0): libmpitest.c:1608 i=117, int32_t value=-1, expected
>> 118 MPITEST error (0): libmpitest.c:1578 i=195, char value=-1,
>> expected -60 MPITEST error (0): 2 errors in buffer (17,4,12) len 273
>> commsize 2 commtype -13 data_type 13 root 0 MPITEST error (1):
>> libmpitest.c:1608 i=117, int32_t value=-1, expected 117 MPITEST error
>> (1): libmpitest.c:1578 i=195, char value=-1, expected -61 MPITEST
>> error (1): 2 errors in buffer (17,6,12) len 273 commsize 2 commtype
>> -15 data_type 13 root 0 MPITEST error (0): libmpitest.c:1608 i=117,
>> int32_t value=-1, expected 117 MPITEST error (0): libmpitest.c:1578
>> i=195, char value=-1, expected -61 MPITEST error (0): 2 errors in
>> buffer (17,6,12) len 273 commsize 2 commtype -15 data_type 13 root 0
>> MPITEST_results: MPI_Isend_ator: All Isend TO Root 8 tests FAILED (of
>> 3744)
>> ---
>> Primary job  terminated normally, but 1 process returned a non-zero
>> exit code.. Per user-direction, the job has been aborted.
>> ---
>> --
>>  mpirun detected that one or more processes exited with non-zero
>> status, thus causing the job to be terminated. The first process to do
>> so was:
>>
>>  Process name: [[12363,1],0]
>>  Exit code:4
>> --
>> 
>> [rvandevaart@drossetti-ivy1 

[OMPI devel] Possible bug with derived datatypes and openib BTL in trunk

2014-04-16 Thread Rolf vandeVaart
I have seen errors when running the intel test suite using the openib BTL when 
transferring derived datatypes.  I do not see the error with sm or tcp BTLs.  
The errors begin after this checkin.

https://svn.open-mpi.org/trac/ompi/changeset/31370
Timestamp: 04/11/14 16:06:56 (5 days ago)
Author: bosilca
Message: Reshape all the packing/unpacking functions to use the same skeleton. 
Rewrite the
generic_unpacking to take advantage of the same capabilitites.

Does anyone else see errors?  Here is an example running with r31370:

[rvandevaart@drossetti-ivy1 src]$ mpirun --mca btl self,openib -np 2 -host 
drossetti-ivy0,drossetti-ivy1 --mca btl_openib_warn_default_gid_prefix 0 
MPI_Isend_ator_c
MPITEST error (1): libmpitest.c:1608 i=117, int32_t value=-1, expected 117
MPITEST error (1): libmpitest.c:1578 i=195, char value=-1, expected -61
MPITEST error (1): 2 errors in buffer (17,0,12) len 273 commsize 2 commtype -10 
data_type 13 root 1
MPITEST error (1): libmpitest.c:1608 i=117, int32_t value=-1, expected 117
MPITEST error (1): libmpitest.c:1578 i=195, char value=-1, expected -61
MPITEST error (1): 2 errors in buffer (17,2,12) len 273 commsize 2 commtype -16 
data_type 13 root 1
MPITEST info  (0): Starting MPI_Isend_ator: All Isend TO Root test
MPITEST info  (0): Node spec MPITEST_comm_sizes[6]=2 too large, using 1
MPITEST info  (0): Node spec MPITEST_comm_sizes[22]=2 too large, using 1
MPITEST info  (0): Node spec MPITEST_comm_sizes[32]=2 too large, using 1
MPITEST error (0): libmpitest.c:1608 i=117, int32_t value=-1, expected 118
MPITEST error (0): libmpitest.c:1578 i=195, char value=-1, expected -60
MPITEST error (0): 2 errors in buffer (17,0,12) len 273 commsize 2 commtype -10 
data_type 13 root 0
MPITEST error (0): libmpitest.c:1608 i=117, int32_t value=-1, expected 118
MPITEST error (0): libmpitest.c:1578 i=195, char value=-1, expected -60
MPITEST error (0): 2 errors in buffer (17,2,12) len 273 commsize 2 commtype -16 
data_type 13 root 0
MPITEST error (1): libmpitest.c:1608 i=117, int32_t value=-1, expected 117
MPITEST error (1): libmpitest.c:1578 i=195, char value=-1, expected -61
MPITEST error (1): 2 errors in buffer (17,4,12) len 273 commsize 2 commtype -13 
data_type 13 root 1
MPITEST error (0): libmpitest.c:1608 i=117, int32_t value=-1, expected 118
MPITEST error (0): libmpitest.c:1578 i=195, char value=-1, expected -60
MPITEST error (0): 2 errors in buffer (17,4,12) len 273 commsize 2 commtype -13 
data_type 13 root 0
MPITEST error (1): libmpitest.c:1608 i=117, int32_t value=-1, expected 117
MPITEST error (1): libmpitest.c:1578 i=195, char value=-1, expected -61
MPITEST error (1): 2 errors in buffer (17,6,12) len 273 commsize 2 commtype -15 
data_type 13 root 0
MPITEST error (0): libmpitest.c:1608 i=117, int32_t value=-1, expected 117
MPITEST error (0): libmpitest.c:1578 i=195, char value=-1, expected -61
MPITEST error (0): 2 errors in buffer (17,6,12) len 273 commsize 2 commtype -15 
data_type 13 root 0
MPITEST_results: MPI_Isend_ator: All Isend TO Root 8 tests FAILED (of 3744)
---
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
---
--
mpirun detected that one or more processes exited with non-zero status, thus 
causing
the job to be terminated. The first process to do so was:

  Process name: [[12363,1],0]
  Exit code:4
--
[rvandevaart@drossetti-ivy1 src]$ 


Here is an error with the trunk which is slightly different.
[rvandevaart@drossetti-ivy1 src]$ mpirun --mca btl self,openib -np 2 -host 
drossetti-ivy0,drossetti-ivy1 --mca btl_openib_warn_default_gid_prefix 0 
MPI_Isend_ator_c
[drossetti-ivy1.nvidia.com:22875] 
../../../opal/datatype/opal_datatype_position.c:72
Pointer 0x1ad414c size 4 is outside [0x1ac1d20,0x1ad1d08] for
base ptr 0x1ac1d20 count 273 and data 
[drossetti-ivy1.nvidia.com:22875] Datatype 0x1ac0220[] size 104 align 16 id 0 
length 22 used 21
true_lb 0 true_ub 232 (true_extent 232) lb 0 ub 240 (extent 240)
nbElems 21 loops 0 flags 1C4 (commited )-c--lu-GD--[---][---]
   contain lb ub OPAL_LB OPAL_UB OPAL_INT1 OPAL_INT2 OPAL_INT4 OPAL_INT8 
OPAL_UINT1 OPAL_UINT2 OPAL_UINT4 OPAL_UINT8 OPAL_FLOAT4 OPAL_FLOAT8 
OPAL_FLOAT16 
--C---P-D--[---][---]  OPAL_INT4 count 1 disp 0x0 (0) extent 4 (size 4)
--C---P-D--[---][---]  OPAL_INT2 count 1 disp 0x8 (8) extent 2 (size 2)
--C---P-D--[---][---]  OPAL_INT8 count 1 disp 0x10 (16) extent 8 (size 8)
--C---P-D--[---][---] OPAL_UINT2 count 1 disp 0x20 (32) extent 2 (size 2)
--C---P-D--[---][---] OPAL_UINT4 count 1 disp 0x24 (36) extent 4 (size 4)
--C---P-D--[---][---] OPAL_UINT8 count 1 disp 0x30 (48) extent 8 (size 8)
--C---P-D--[---][---]OPAL_FLOAT4 count 1 disp 0x40 (64) 

Re: [OMPI devel] 1-question developer poll

2014-04-16 Thread Rolf vandeVaart
SVN

>-Original Message-
>From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan
>Hjelm
>Sent: Wednesday, April 16, 2014 10:35 AM
>To: Open MPI Developers
>Subject: Re: [OMPI devel] 1-question developer poll
>
>* PGP Signed by an unknown key
>
>Git
>
>On Wed, Apr 16, 2014 at 10:32:10AM +, Jeff Squyres (jsquyres) wrote:
>> What source code repository technology(ies) do you use for Open MPI
>development? (indicate all that apply)
>>
>> - SVN
>> - Mercurial
>> - Git
>>
>> I ask this question because there's serious discussions afoot to switch
>OMPI's main SVN repo to Git, and I want to get a feel for the current
>landscape out there.
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>http://www.cisco.com/web/about/doing_business/legal/cri/
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: http://www.open-
>mpi.org/community/lists/devel/2014/04/14537.php
>
>* Unknown Key
>* 0x9AC22B15


Re: [OMPI devel] -mca coll "ml" cause segv or hangs with different command lines.

2014-03-04 Thread Rolf vandeVaart
I am still seeing the same issue where I get some type of segv unless I disable 
the coll ml component.  This may be an issue at my end, but just thought I 
would double check that we are sure this is fixed.
Thanks,
Rolf

>-Original Message-
>From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Hjelm,
>Nathan T
>Sent: Tuesday, March 04, 2014 2:34 PM
>To: Open MPI Developers
>Subject: Re: [OMPI devel] -mca coll "ml" cause segv or hangs with different
>command lines.
>
>There was a rounding issue in basesmuma. If the control data happened to be
>less than a page then we were trying to allocate 0 bytes. It should be fixed on
>the trunk and has been CMR'ed to 1.7.5
>
>-Nathan
>
>Please excuse the horrible Outlook-style quoting. OWA sucks.
>
>
>From: devel [devel-boun...@open-mpi.org] on behalf of Mike Dubman
>[mi...@dev.mellanox.co.il]
>Sent: Tuesday, March 04, 2014 7:04 AM
>To: Open MPI Developers
>Subject: Re: [OMPI devel] -mca coll "ml" cause segv or hangs with different
>command lines.
>
>Hi,
>
>coll/hcoll is Mellanox driven collective package.
>coll/ml is managed/supported/developed by ORNL folks.
>
>
>On Tue, Mar 4, 2014 at 1:06 PM, Ralph Castain mpi.org> wrote:
>Ummm...the "ml" stands for Mellanox. This is a component you folks
>contributed at some time. IIRC, the hcoll and/or bcol are meant to replace it,
>but you folks would know best what to do with it.
>
>
>
>On Tue, Mar 4, 2014 at 12:12 AM, Elena Elkina
>> wrote:
>Hi,
>
>Recently I often meet hangs and seg faults with different command lines and
>there are "ml" functions in the stack trace.
>When I just turn "ml" off by do -mca coll ^ml, problems disappear.
>For example,
>oshrun -np 4 --map-by node --display-map  ./ring_oshmem fails with seg fault
>while oshrun -np 4 --map-by node --display-map -mca coll ^ml ./ring_oshmem
>passes.
>
>The "ml" priority is low (27), but it could have issues during comm_query (it
>does all initialization staff there).
>
>"Ml" is unreliable component. So It may be reasonable do not to build this
>component by default to avoid such problems.
>
>What do you think?
>
>Best regards,
>Elena
>
>___
>devel mailing list
>de...@open-mpi.org
>Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>Searchable archives: http://www.open-
>mpi.org/community/lists/devel/2014/03/date.php
>
>
>___
>devel mailing list
>de...@open-mpi.org
>Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>Searchable archives: http://www.open-
>mpi.org/community/lists/devel/2014/03/date.php
>
>___
>devel mailing list
>de...@open-mpi.org
>Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>Searchable archives: http://www.open-
>mpi.org/community/lists/devel/2014/03/index.php


[OMPI devel] RFC: Add two new verbose outputs to BML layer

2014-03-03 Thread Rolf vandeVaart
WHAT: Add two new verbose outputs to BML layer

WHY: There are times that I really want to know which BTLs are being used.  
These verbose outputs can help with that.

WHERE: ompi/mca/bml/r2/bml_r2.c

TIMEOUT: COB Friday, 7 March 2014

MORE DETAIL: I have run into some cases where I have had to add some 
opal_outputs to figure out what is going on with respect to which BTLs are 
selected.  I thought it would be nice to make that part of the verbose output.  
The entire change is below.

Index: ompi/mca/bml/r2/bml_r2.c
===
--- ompi/mca/bml/r2/bml_r2.c  (revision 30911)
+++ ompi/mca/bml/r2/bml_r2.c   (working copy)
@@ -14,6 +14,7 @@
  * reserved.
  * Copyright (c) 2008-2009 Cisco Systems, Inc.  All rights reserved.
  * Copyright (c) 2013  Intel, Inc. All rights reserved
+ * Copyright (c) 2014  NVIDIA Corporation.  All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -250,10 +251,24 @@
                 /* skip this btl if the exclusivity is less than the previous */
                 if(bml_btl->btl->btl_exclusivity > btl->btl_exclusivity) {
                     btl->btl_del_procs(btl, 1, &proc, &bml_endpoints[p]);
+                    opal_output_verbose(20, ompi_bml_base_framework.framework_output,
+                                        "mca: bml: Not using %s btl to %s on node %s "
+                                        "because %s btl has higher exclusivity (%d > %d)",
+                                        btl->btl_component->btl_version.mca_component_name,
+                                        OMPI_NAME_PRINT(&proc->proc_name), proc->proc_hostname,
+                                        bml_btl->btl->btl_component->btl_version.mca_component_name,
+                                        bml_btl->btl->btl_exclusivity,
+                                        btl->btl_exclusivity);
                     continue;
                 }
             }

+            opal_output_verbose(5, ompi_bml_base_framework.framework_output,
+                                "mca: bml: Using %s btl to %s on node %s",
+                                btl->btl_component->btl_version.mca_component_name,
+                                OMPI_NAME_PRINT(&proc->proc_name),
+                                proc->proc_hostname);
+
             /* cache the endpoint on the proc */
             bml_btl = mca_bml_base_btl_array_insert(&bml_endpoint->btl_send);
             bml_btl->btl = btl;
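
For anyone who wants to see these messages once the change is in: they are emitted at 
verbosity levels 5 and 20 of the bml framework output, so something along these lines 
should surface them (the parameter name is assumed from the usual 
<framework>_base_verbose convention rather than taken from the patch):

mpirun --mca bml_base_verbose 20 -np 2 -host nodeA,nodeB ./my_app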







Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r30860 - in trunk/ompi/mca: btl/usnic rte

2014-02-27 Thread Rolf vandeVaart
It could.  I added that argument 4 years ago to support my failover work 
with the BFO.  It was a way for a BTL to pass some type of string back to the 
PML, telling the PML who it was, so the verbose output could show what was 
happening. 

>-Original Message-
>From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff Squyres
>(jsquyres)
>Sent: Thursday, February 27, 2014 4:22 PM
>To: Open MPI Developers
>Subject: Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r30860 - in
>trunk/ompi/mca: btl/usnic rte
>
>Speaking of which, shouldn't the OB1 error handler send the error message
>string that it received as the 4th param to ompi_rte_abort() so that it can be
>printed out?
>
>
>Index: ompi/mca/pml/ob1/pml_ob1.c
>===
>
>--- ompi/mca/pml/ob1/pml_ob1.c (revision 30877)
>+++ ompi/mca/pml/ob1/pml_ob1.c (working copy)
>@@ -780,7 +780,7 @@
> return;
> }
> #endif /* OPAL_CUDA_SUPPORT */
>-ompi_rte_abort(-1, NULL);
>+ompi_rte_abort(-1, btlinfo);
> }
>
> #if OPAL_ENABLE_FT_CR== 0
>
>
>
>On Feb 27, 2014, at 1:12 PM, Jeff Squyres (jsquyres) 
>wrote:
>
>> FWIW, the following BTLs all have calls to abort() or ompi_rte_abort() within
>them:
>>
>> - usnic
>> - openib
>> - portals4
>> - the btl base itself
>>
>>
>> On Feb 27, 2014, at 7:16 AM, Ralph Castain  wrote:
>>
 The majority of places we call abort in this commit is actually down in a
>progress thread.  We didn't think it was safe to call the PML error function 
>in a
>progress thread -- is that incorrect?
>>>
>>> If not, then we probably should create some mechanism for doing so. I
>agree with George that we shouldn't call abort inside a library
>>
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>http://www.cisco.com/web/about/doing_business/legal/cri/
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>
>--
>Jeff Squyres
>jsquy...@cisco.com
>For corporate legal information go to:
>http://www.cisco.com/web/about/doing_business/legal/cri/
>
>___
>devel mailing list
>de...@open-mpi.org
>http://www.open-mpi.org/mailman/listinfo.cgi/devel


Re: [OMPI devel] 1.7.5 fails on simple test

2014-02-10 Thread Rolf vandeVaart
I have tracked this down.  There is a missing commit that affects 
ompi_mpi_init.c causing it to initialize bml twice.
Ralph, can you apply r30310 to 1.7?

Thanks,
Rolf

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Rolf vandeVaart
Sent: Monday, February 10, 2014 12:29 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] 1.7.5 fails on simple test

I have seen this same issue although my core dump is a little bit different.  I 
am running with tcp,self.  The first entry in the list of BTLs is garbage, but 
then there is tcp and self in the list.   Strange.  This is my core dump.  Line 
208 in bml_r2.c is where I get the SEGV.

Program terminated with signal 11, Segmentation fault.
#0  0x7fb6dec981d0 in ?? ()
Missing separate debuginfos, use: debuginfo-install 
glibc-2.12-1.107.el6_4.5.x86_64
(gdb) where
#0  0x7fb6dec981d0 in ?? ()
#1  
#2  0x7fb6e82fff38 in main_arena () from /lib64/libc.so.6
#3  0x7fb6e4103de2 in mca_bml_r2_add_procs (nprocs=2, procs=0x2061440, 
reachable=0x7fff80487b40)
at ../../../../../ompi/mca/bml/r2/bml_r2.c:208
#4  0x7fb6df50a751 in mca_pml_ob1_add_procs (procs=0x2060bc0, nprocs=2)
at ../../../../../ompi/mca/pml/ob1/pml_ob1.c:332
#5  0x7fb6e8570dca in ompi_mpi_init (argc=1, argv=0x7fff80488158, 
requested=0, provided=0x7fff80487cc8)
at ../../ompi/runtime/ompi_mpi_init.c:776
#6  0x7fb6e85a3606 in PMPI_Init (argc=0x7fff80487d8c, argv=0x7fff80487d80) 
at pinit.c:84
#7  0x00401c56 in main (argc=1, argv=0x7fff80488158) at 
MPI_Isend_ator_c.c:143
(gdb)
#3  0x7fb6e4103de2 in mca_bml_r2_add_procs (nprocs=2, procs=0x2061440, 
reachable=0x7fff80487b40)
at ../../../../../ompi/mca/bml/r2/bml_r2.c:208
208 rc = btl->btl_add_procs(btl, n_new_procs, new_procs, 
btl_endpoints, reachable);
(gdb) print *btl
$1 = {btl_component = 0x7fb6e82ffee8, btl_eager_limit = 140423556234984, 
btl_rndv_eager_limit = 140423556235000,
  btl_max_send_size = 140423556235000, btl_rdma_pipeline_send_length = 
140423556235016,
  btl_rdma_pipeline_frag_size = 140423556235016, btl_min_rdma_pipeline_size = 
140423556235032,
  btl_exclusivity = 3895459608, btl_latency = 32694, btl_bandwidth = 
3895459624, btl_flags = 32694,
  btl_seg_size = 140423556235048, btl_add_procs = 0x7fb6e82fff38 
<main_arena+184>,
  btl_del_procs = 0x7fb6e82fff38 <main_arena+184>, btl_register = 
0x7fb6e82fff48 <main_arena+200>,
  btl_finalize = 0x7fb6e82fff48 <main_arena+200>, btl_alloc = 0x7fb6e82fff58 
<main_arena+216>,
  btl_free = 0x7fb6e82fff58 <main_arena+216>, btl_prepare_src = 0x7fb6e82fff68 
<main_arena+232>,
  btl_prepare_dst = 0x7fb6e82fff68 <main_arena+232>, btl_send = 0x7fb6e82fff78 
<main_arena+248>,
  btl_sendi = 0x7fb6e82fff78 <main_arena+248>, btl_put = 0x7fb6e82fff88 
<main_arena+264>,
  btl_get = 0x7fb6e82fff88 <main_arena+264>, btl_dump = 0x7fb6e82fff98 
<main_arena+280>,
  btl_mpool = 0x7fb6e82fff98, btl_register_error = 0x7fb6e82fffa8 
<main_arena+296>,
  btl_ft_event = 0x7fb6e82fffa8 <main_arena+296>}
(gdb)


From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Mike Dubman
Sent: Monday, February 10, 2014 4:23 AM
To: Open MPI Developers
Subject: [OMPI devel] 1.7.5 fails on simple test






$/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/bin/mpirun
 -np 8 -mca pml ob1 -mca btl self,tcp 
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi

[vegas12:12724] *** Process received signal ***

[vegas12:12724] Signal: Segmentation fault (11)

[vegas12:12724] Signal code:  (128)

[vegas12:12724] Failing at address: (nil)

[vegas12:12724] [ 0] /lib64/libpthread.so.0[0x3937c0f500]

[vegas12:12724] [ 1] 
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_btl_tcp.so(mca_btl_tcp_component_init+0x583)[0x7395f813]

[vegas12:12724] [ 2] 
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(mca_btl_base_select+0x117)[0x778e14a7]

[vegas12:12724] [ 3] 
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_bml_r2.so(mca_bml_r2_component_init+0x12)[0x73ded6f2]

[vegas12:12724] [ 4] 
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(mca_bml_base_init+0x99)[0x778e0cc9]

[vegas12:12724] [ 5] 
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_pml_ob1.so(+0x51d8)[0x737481d8]

[vegas12:12724] [ 6] 
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(mca_pml_base_select+0x1e0)[0x778f31e0]

[vegas12:12724] [ 7] 
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(ompi_mpi_init+0x52b)[0x778bffdb]

[vegas12:12724] [ 8] 
/scrap/jenkins/scrap/workspace/hp

Re: [OMPI devel] 1.7.5 fails on simple test

2014-02-10 Thread Rolf vandeVaart
I have seen this same issue although my core dump is a little bit different.  I 
am running with tcp,self.  The first entry in the list of BTLs is garbage, but 
then there is tcp and self in the list.   Strange.  This is my core dump.  Line 
208 in bml_r2.c is where I get the SEGV.

Program terminated with signal 11, Segmentation fault.
#0  0x7fb6dec981d0 in ?? ()
Missing separate debuginfos, use: debuginfo-install 
glibc-2.12-1.107.el6_4.5.x86_64
(gdb) where
#0  0x7fb6dec981d0 in ?? ()
#1  
#2  0x7fb6e82fff38 in main_arena () from /lib64/libc.so.6
#3  0x7fb6e4103de2 in mca_bml_r2_add_procs (nprocs=2, procs=0x2061440, 
reachable=0x7fff80487b40)
at ../../../../../ompi/mca/bml/r2/bml_r2.c:208
#4  0x7fb6df50a751 in mca_pml_ob1_add_procs (procs=0x2060bc0, nprocs=2)
at ../../../../../ompi/mca/pml/ob1/pml_ob1.c:332
#5  0x7fb6e8570dca in ompi_mpi_init (argc=1, argv=0x7fff80488158, 
requested=0, provided=0x7fff80487cc8)
at ../../ompi/runtime/ompi_mpi_init.c:776
#6  0x7fb6e85a3606 in PMPI_Init (argc=0x7fff80487d8c, argv=0x7fff80487d80) 
at pinit.c:84
#7  0x00401c56 in main (argc=1, argv=0x7fff80488158) at 
MPI_Isend_ator_c.c:143
(gdb)
#3  0x7fb6e4103de2 in mca_bml_r2_add_procs (nprocs=2, procs=0x2061440, 
reachable=0x7fff80487b40)
at ../../../../../ompi/mca/bml/r2/bml_r2.c:208
208 rc = btl->btl_add_procs(btl, n_new_procs, new_procs, 
btl_endpoints, reachable);
(gdb) print *btl
$1 = {btl_component = 0x7fb6e82ffee8, btl_eager_limit = 140423556234984, 
btl_rndv_eager_limit = 140423556235000,
  btl_max_send_size = 140423556235000, btl_rdma_pipeline_send_length = 
140423556235016,
  btl_rdma_pipeline_frag_size = 140423556235016, btl_min_rdma_pipeline_size = 
140423556235032,
  btl_exclusivity = 3895459608, btl_latency = 32694, btl_bandwidth = 
3895459624, btl_flags = 32694,
  btl_seg_size = 140423556235048, btl_add_procs = 0x7fb6e82fff38 <main_arena+184>,
  btl_del_procs = 0x7fb6e82fff38 <main_arena+184>, btl_register = 0x7fb6e82fff48 <main_arena+200>,
  btl_finalize = 0x7fb6e82fff48 <main_arena+200>, btl_alloc = 0x7fb6e82fff58 <main_arena+216>,
  btl_free = 0x7fb6e82fff58 <main_arena+216>, btl_prepare_src = 0x7fb6e82fff68 <main_arena+232>,
  btl_prepare_dst = 0x7fb6e82fff68 <main_arena+232>, btl_send = 0x7fb6e82fff78 <main_arena+248>,
  btl_sendi = 0x7fb6e82fff78 <main_arena+248>, btl_put = 0x7fb6e82fff88 <main_arena+264>,
  btl_get = 0x7fb6e82fff88 <main_arena+264>, btl_dump = 0x7fb6e82fff98 <main_arena+280>,
  btl_mpool = 0x7fb6e82fff98, btl_register_error = 0x7fb6e82fffa8 <main_arena+296>,
  btl_ft_event = 0x7fb6e82fffa8 <main_arena+296>}
(gdb)


From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Mike Dubman
Sent: Monday, February 10, 2014 4:23 AM
To: Open MPI Developers
Subject: [OMPI devel] 1.7.5 fails on simple test






$/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/bin/mpirun
 -np 8 -mca pml ob1 -mca btl self,tcp 
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi

[vegas12:12724] *** Process received signal ***

[vegas12:12724] Signal: Segmentation fault (11)

[vegas12:12724] Signal code:  (128)

[vegas12:12724] Failing at address: (nil)

[vegas12:12724] [ 0] /lib64/libpthread.so.0[0x3937c0f500]

[vegas12:12724] [ 1] 
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_btl_tcp.so(mca_btl_tcp_component_init+0x583)[0x7395f813]

[vegas12:12724] [ 2] 
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(mca_btl_base_select+0x117)[0x778e14a7]

[vegas12:12724] [ 3] 
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_bml_r2.so(mca_bml_r2_component_init+0x12)[0x73ded6f2]

[vegas12:12724] [ 4] 
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(mca_bml_base_init+0x99)[0x778e0cc9]

[vegas12:12724] [ 5] 
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_pml_ob1.so(+0x51d8)[0x737481d8]

[vegas12:12724] [ 6] 
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(mca_pml_base_select+0x1e0)[0x778f31e0]

[vegas12:12724] [ 7] 
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(ompi_mpi_init+0x52b)[0x778bffdb]

[vegas12:12724] [ 8] 
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(MPI_Init+0x170)[0x778d4210]

[vegas12:12724] [ 9] 
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi_mpifh.so.2(PMPI_Init_f08+0x25)[0x77b71c25]

[vegas12:12724] [10] 
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi[0x400c0b]

[vegas12:12724] [11] 

Re: [OMPI devel] Intermittent mpirun crash?

2014-01-30 Thread Rolf vandeVaart
I ran mpirun through valgrind and I got some strange complaints about an issue 
with thread 2.  I hunted around mpirun code and I see that we start a thread, 
but we never have it finish during shutdown.  Therefore, I added this snippet 
of code (probably in the wrong place) and I no longer see my intermittent 
crashes.

Ralph, what do you think?  Does this seem reasonable?

Rolf

[rvandevaart@drossetti-ivy0 ompi-v1.7]$ svn diff
Index: orte/mca/oob/tcp/oob_tcp_component.c
===
--- orte/mca/oob/tcp/oob_tcp_component.c(revision 30500)
+++ orte/mca/oob/tcp/oob_tcp_component.c(working copy)
@@ -631,6 +631,10 @@
 opal_output_verbose(2, orte_oob_base_framework.framework_output,
 "%s TCP SHUTDOWN",
 ORTE_NAME_PRINT(ORTE_PROC_MY_NAME));
+if (ORTE_PROC_IS_HNP) {
+mca_oob_tcp_component.listen_thread_active = 0;
+opal_thread_join(&mca_oob_tcp_component.listen_thread, NULL);
+}
 
 while (NULL != (item = opal_list_remove_first(&mca_oob_tcp_component.listeners))) {
 OBJ_RELEASE(item);


>-Original Message-
>From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph
>Castain
>Sent: Thursday, January 30, 2014 12:35 PM
>To: Open MPI Developers
>Subject: Re: [OMPI devel] Intermittent mpirun crash?
>
>That option might explain why your test process is failing (which segfaulted as
>well), but obviously wouldn't have anything to do with mpirun
>
>On Jan 30, 2014, at 9:29 AM, Rolf vandeVaart <rvandeva...@nvidia.com>
>wrote:
>
>> I just retested with --mca mpi_leave_pinned 0 and that made no difference.
>I still see the mpirun crash.
>>
>>> -Original Message-
>>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of George
>>> Bosilca
>>> Sent: Thursday, January 30, 2014 11:59 AM
>>> To: Open MPI Developers
>>> Subject: Re: [OMPI devel] Intermittent mpirun crash?
>>>
>>> I got something similar 2 days ago, with a large software package
>>> abusing of MPI_Waitany/MPI_Waitsome (that was working seamlessly a
>>> month ago). I had to find a quick fix. Upon figuring out that turning
>>> the leave_pinned off fixes the problem, I did not investigate any further.
>>>
>>> Do you see a similar behavior?
>>>
>>> George.
>>>
>>> On Jan 30, 2014, at 17:26 , Rolf vandeVaart <rvandeva...@nvidia.com>
>wrote:
>>>
>>>> I am seeing this happening to me very intermittently.  Looks like
>>>> mpirun is
>>> getting a SEGV.  Is anyone else seeing this?
>>>> This is 1.7.4 built yesterday.  (Note that I added some stuff to
>>>> what is being printed out so the message is slightly different than
>>>> 1.7.4
>>>> output)
>>>>
>>>> mpirun - -np 6 -host
>>>> drossetti-ivy0,drossetti-ivy1,drossetti-ivy2,drossetti-ivy3 --mca
>>>> btl_openib_warn_default_gid_prefix 0  --  `pwd`/src/MPI_Waitsome_p_c
>>>> MPITEST info  (0): Starting:  MPI_Waitsome_p:  Persistent Waitsome
>>>> using two nodes
>>>> MPITEST_results: MPI_Waitsome_p:  Persistent Waitsome using two
>>>> nodes all tests PASSED (742) [drossetti-ivy0:10353] *** Process
>>>> (mpirun)received signal *** [drossetti-ivy0:10353] Signal:
>>>> Segmentation fault (11) [drossetti-ivy0:10353] Signal code: Address
>>>> not mapped (1) [drossetti-ivy0:10353] Failing at address:
>>>> 0x7fd31e5f208d [drossetti-ivy0:10353] End of signal information -
>>>> not sleeping
>>>> gmake[1]: *** [MPI_Waitsome_p_c] Segmentation fault (core dumped)
>>>> gmake[1]: Leaving directory `/geppetto/home/rvandevaart/public/ompi-
>>> tests/trunk/intel_tests'
>>>>
>>>> (gdb) where
>>>> #0  0x7fd31f620807 in ?? () from /lib64/libgcc_s.so.1
>>>> #1  0x7fd31f6210b9 in _Unwind_Backtrace () from
>>>> /lib64/libgcc_s.so.1
>>>> #2  0x7fd31fb2893e in backtrace () from /lib64/libc.so.6
>>>> #3  0x7fd320b0d622 in opal_backtrace_buffer
>>> (message_out=0x7fd31e5e33a0, len_out=0x7fd31e5e33ac)
>>>>   at
>>>> ../../../../../opal/mca/backtrace/execinfo/backtrace_execinfo.c:57
>>>> #4  0x7fd320b0a794 in show_stackframe (signo=11,
>>>> info=0x7fd31e5e3930, p=0x7fd31e5e3800) at
>>>> ../../../opal/util/stacktrace.c:354
>>>> #5  
>>>> #6  0x7fd31e5f208d in ?? ()
>>>> #7  0x7fd31e5e46d8 in ?? ()
>>>> #8  0xc2a8 in ?? ()
>>>&

Re: [OMPI devel] Intermittent mpirun crash?

2014-01-30 Thread Rolf vandeVaart
I just retested with --mca mpi_leave_pinned 0 and that made no difference.  I 
still see the mpirun crash.

>-Original Message-
>From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of George
>Bosilca
>Sent: Thursday, January 30, 2014 11:59 AM
>To: Open MPI Developers
>Subject: Re: [OMPI devel] Intermittent mpirun crash?
>
>I got something similar 2 days ago, with a large software package abusing of
>MPI_Waitany/MPI_Waitsome (that was working seamlessly a month ago). I
>had to find a quick fix. Upon figuring out that turning the leave_pinned off
>fixes the problem, I did not investigate any further.
>
>Do you see a similar behavior?
>
>  George.
>
>On Jan 30, 2014, at 17:26 , Rolf vandeVaart <rvandeva...@nvidia.com> wrote:
>
>> I am seeing this happening to me very intermittently.  Looks like mpirun is
>getting a SEGV.  Is anyone else seeing this?
>> This is 1.7.4 built yesterday.  (Note that I added some stuff to what
>> is being printed out so the message is slightly different than 1.7.4
>> output)
>>
>> mpirun - -np 6 -host
>> drossetti-ivy0,drossetti-ivy1,drossetti-ivy2,drossetti-ivy3 --mca
>> btl_openib_warn_default_gid_prefix 0  --  `pwd`/src/MPI_Waitsome_p_c
>> MPITEST info  (0): Starting:  MPI_Waitsome_p:  Persistent Waitsome
>> using two nodes
>> MPITEST_results: MPI_Waitsome_p:  Persistent Waitsome using two nodes
>> all tests PASSED (742) [drossetti-ivy0:10353] *** Process
>> (mpirun)received signal *** [drossetti-ivy0:10353] Signal:
>> Segmentation fault (11) [drossetti-ivy0:10353] Signal code: Address
>> not mapped (1) [drossetti-ivy0:10353] Failing at address:
>> 0x7fd31e5f208d [drossetti-ivy0:10353] End of signal information - not
>> sleeping
>> gmake[1]: *** [MPI_Waitsome_p_c] Segmentation fault (core dumped)
>> gmake[1]: Leaving directory `/geppetto/home/rvandevaart/public/ompi-
>tests/trunk/intel_tests'
>>
>> (gdb) where
>> #0  0x7fd31f620807 in ?? () from /lib64/libgcc_s.so.1
>> #1  0x7fd31f6210b9 in _Unwind_Backtrace () from
>> /lib64/libgcc_s.so.1
>> #2  0x7fd31fb2893e in backtrace () from /lib64/libc.so.6
>> #3  0x7fd320b0d622 in opal_backtrace_buffer
>(message_out=0x7fd31e5e33a0, len_out=0x7fd31e5e33ac)
>>at
>> ../../../../../opal/mca/backtrace/execinfo/backtrace_execinfo.c:57
>> #4  0x7fd320b0a794 in show_stackframe (signo=11,
>> info=0x7fd31e5e3930, p=0x7fd31e5e3800) at
>> ../../../opal/util/stacktrace.c:354
>> #5  
>> #6  0x7fd31e5f208d in ?? ()
>> #7  0x7fd31e5e46d8 in ?? ()
>> #8  0xc2a8 in ?? ()
>> #9  0x in ?? ()
>>
>>
>> --
>> - This email message is for the sole use of the intended
>> recipient(s) and may contain confidential information.  Any
>> unauthorized review, use, disclosure or distribution is prohibited.
>> If you are not the intended recipient, please contact the sender by
>> reply email and destroy all copies of the original message.
>> --
>> - ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>___
>devel mailing list
>de...@open-mpi.org
>http://www.open-mpi.org/mailman/listinfo.cgi/devel


[OMPI devel] Intermittent mpirun crash?

2014-01-30 Thread Rolf vandeVaart
I am seeing this happening to me very intermittently.  Looks like mpirun is 
getting a SEGV.  Is anyone else seeing this?
This is 1.7.4 built yesterday.  (Note that I added some stuff to what is being 
printed out so the message is slightly different than 1.7.4 output)

mpirun - -np 6 -host 
drossetti-ivy0,drossetti-ivy1,drossetti-ivy2,drossetti-ivy3 --mca 
btl_openib_warn_default_gid_prefix 0  --  `pwd`/src/MPI_Waitsome_p_c
MPITEST info  (0): Starting:  MPI_Waitsome_p:  Persistent Waitsome using two 
nodes
MPITEST_results: MPI_Waitsome_p:  Persistent Waitsome using two nodes all tests 
PASSED (742)
[drossetti-ivy0:10353] *** Process (mpirun)received signal ***
[drossetti-ivy0:10353] Signal: Segmentation fault (11)
[drossetti-ivy0:10353] Signal code: Address not mapped (1)
[drossetti-ivy0:10353] Failing at address: 0x7fd31e5f208d
[drossetti-ivy0:10353] End of signal information - not sleeping
gmake[1]: *** [MPI_Waitsome_p_c] Segmentation fault (core dumped)
gmake[1]: Leaving directory 
`/geppetto/home/rvandevaart/public/ompi-tests/trunk/intel_tests'

(gdb) where
#0  0x7fd31f620807 in ?? () from /lib64/libgcc_s.so.1
#1  0x7fd31f6210b9 in _Unwind_Backtrace () from /lib64/libgcc_s.so.1
#2  0x7fd31fb2893e in backtrace () from /lib64/libc.so.6
#3  0x7fd320b0d622 in opal_backtrace_buffer (message_out=0x7fd31e5e33a0, 
len_out=0x7fd31e5e33ac)
at ../../../../../opal/mca/backtrace/execinfo/backtrace_execinfo.c:57
#4  0x7fd320b0a794 in show_stackframe (signo=11, info=0x7fd31e5e3930, 
p=0x7fd31e5e3800) at ../../../opal/util/stacktrace.c:354
#5  
#6  0x7fd31e5f208d in ?? ()
#7  0x7fd31e5e46d8 in ?? ()
#8  0xc2a8 in ?? ()
#9  0x in ?? ()




Re: [OMPI devel] 1.7.4 status update

2014-01-22 Thread Rolf vandeVaart
Hi Ralph:
In my opinion, we still try to get to a stable 1.7.4.  I think we can just keep 
the bar high (as you said in the meeting) about what types of fixes need to get 
into 1.7.4.  I have been telling folks 1.7.4 would be ready "really soon" so 
the idea of folding in 1.7.5 CMRs and delaying it is less desirable to me.

Can you remind me again about why the 1.8.0 by mid-March is a requirement?

Thanks,
Rolf

>-Original Message-
>From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph
>Castain
>Sent: Tuesday, January 21, 2014 6:41 PM
>To: Open MPI Developers
>Subject: [OMPI devel] 1.7.4 status update
>
>Hi folks
>
>I think it is safe to say that we are not going to get a release candidate out
>tonight - more Fortran problems have surfaced, along with the need to
>complete the ROMIO review. I have therefore concluded we cannot release
>1.7.4 this week. This leaves us with a couple of options:
>
>1. continue down this path, hopefully releasing 1.7.4 sometime next week,
>followed by 1.7.5 in the latter half of Feb. The risk here is that any further
>slippage in 1.7.4/5 means that we will not release it as we must roll 1.8.0 by
>mid-March. I'm not too concerned about most of those cmr's as they could be
>considered minor bug fixes and pushed to the 1.8 series, but it leaves
>oshmem potentially pushed into 1.9.0.
>
>2. "promote" all the 1.7.5 cmr's into 1.7.4 and just do a single release before
>ending the series. This eases the immediate schedule crunch, but means we
>will have to deal with all the bugs that surface when we destabilize the 1.7
>branch again.
>
>
>I'm open to suggestions. Please be prepared to discuss at next Tues telecon.
>Ralph
>
>___
>devel mailing list
>de...@open-mpi.org
>http://www.open-mpi.org/mailman/listinfo.cgi/devel


[OMPI devel] NUMA bug in openib BTL device selection

2014-01-10 Thread Rolf vandeVaart
I believe I found a bug in the openib BTL and just want to see if folks agree with 
this.  When we are running on a NUMA node and we are bound to a CPU, we only 
want to use the IB device that is closest to us.  However, I observed that we 
always used both devices regardless.  I believe there is a bug in computing the 
distances, and the change below fixes it.  This was introduced with r26391 when 
we switched to using hwloc to determine distances.  It is a simple error where 
we are supposed to be accessing the array with i+j*size.

With this change, we will only use the IB devices that are close to us.

Any comments?  Otherwise, I will commit.

Rolf

Index: ompi/mca/btl/openib/btl_openib_component.c
===
--- ompi/mca/btl/openib/btl_openib_component.c  (revision 30175)
+++ ompi/mca/btl/openib/btl_openib_component.c  (working copy)
@@ -2202,10 +2202,10 @@
 if (NULL != my_obj) {
 /* Distance may be asymetrical, so calculate both of them
and take the max */
-a = hwloc_distances->latency[my_obj->logical_index *
+a = hwloc_distances->latency[my_obj->logical_index +
  (ibv_obj->logical_index * 
   hwloc_distances->nbobjs)];
-b = hwloc_distances->latency[ibv_obj->logical_index *
+b = hwloc_distances->latency[ibv_obj->logical_index +
  (my_obj->logical_index * 
   hwloc_distances->nbobjs)];
 distance = (a > b) ? a : b;
@@ -2224,10 +2224,10 @@
 ibv_obj->cpuset, 
 HWLOC_OBJ_NODE, 
++i)) {
 
-a = hwloc_distances->latency[node_obj->logical_index *
+a = hwloc_distances->latency[node_obj->logical_index +
  (ibv_obj->logical_index * 
   hwloc_distances->nbobjs)];
-b = hwloc_distances->latency[ibv_obj->logical_index *
+b = hwloc_distances->latency[ibv_obj->logical_index +
  (node_obj->logical_index * 
   hwloc_distances->nbobjs)];
 a = (a > b) ? a : b;
[rvandevaart@drossetti-ivy0 ompi-trunk-gpu-topo]$ 
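
As a side note on the indexing: a tiny standalone sketch (not OMPI code; illustrative 
names only) of why the old expression went wrong.  Whenever either logical index is 0, 
i*(j*nbobjs) collapses to 0, so the lookup always reads the self-distance entry, which 
would explain why every HCA looked equally close and both devices got used.

#include <stdio.h>

/* The hwloc latency matrix is a flattened nbobjs x nbobjs array, so entry
 * (i,j) lives at i + j*nbobjs.  The old code multiplied instead of adding. */
static float dist_correct(const float *latency, unsigned nbobjs, unsigned i, unsigned j)
{
    return latency[i + j * nbobjs];
}

static float dist_buggy(const float *latency, unsigned nbobjs, unsigned i, unsigned j)
{
    return latency[i * (j * nbobjs)];
}

int main(void)
{
    /* two NUMA nodes: 1.0 on the diagonal, 2.0 to the remote node */
    const float latency[4] = { 1.0f, 2.0f,
                               2.0f, 1.0f };
    printf("correct (0,1): %.1f   buggy (0,1): %.1f\n",
           dist_correct(latency, 2, 0, 1),   /* 2.0 */
           dist_buggy(latency, 2, 0, 1));    /* 1.0 - index collapses to 0 */
    return 0;
}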


Re: [OMPI devel] CUDA support not working?

2013-11-25 Thread Rolf vandeVaart
Let me know of any other issues you are seeing.  Ralph fixed the issue with ob1 
and we will move that into Open MPI 1.7.4.  
Not sure why I never saw that issue.  Will investigate some more.

>-Original Message-
>From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Jörg
>Bornschein
>Sent: Monday, November 25, 2013 7:41 AM
>To: Open MPI Developers
>Subject: Re: [OMPI devel] CUDA support not working?
>
>On 25.11.2013, at 07:34, Solibakke Per Bjarte 
>wrote:
>
>> For version 1.7.3 and 1.7.4a1r29747 and CUDA-support..
>>
>> --with-cuda -with-hwloc -enable-shared and the use of -VT ] Running
>> without VT:
>> Error: /usr/local/lib/openmpi/mca_pml_obl.so: undefined
>> symbol: progress_one_cuda_htod_event
>>
>..
>> Suggestions for option specifications. I have followed the e-mail
>> correspondence between Jörg Bornschein, Ralph Castain
>>
>> I have changed the Makefile.am before ./configure according to
>> attached diff.pml However, nothing helps for neither 1.7.3 nor
>> 1.7.4a1r29747
>
>
>After patching Makefile.am, did you regenerate the automake/autoconf by
>running ./autogen.sh at the top-level?
>
>   j
>
>___
>devel mailing list
>de...@open-mpi.org
>http://www.open-mpi.org/mailman/listinfo.cgi/devel


Re: [OMPI devel] MPIRUN error message after ./configure and sudo make all install...

2013-11-07 Thread Rolf vandeVaart
Solibakke:
I have not reproduced the issue, but I think I have an idea of what is 
happening.  What type of interconnect are you running over in this cluster?
Note that in the Open MPI 1.7.3 series, CUDA-aware support is only available 
within a node and between nodes using the verbs interface over Infiniband.

Rolf

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Thursday, November 07, 2013 10:00 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] MPIRUN error message after ./configure and sudo make 
all install...

FWIW: I can never recall seeing someone use --enable-mca-dso...though I don't 
know if that is the source of the problem.

On Nov 7, 2013, at 6:00 AM, Rolf vandeVaart 
<rvandeva...@nvidia.com> wrote:


Hello Solibakke:
Let me try and reproduce with your configure options.

Rolf

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Solibakke Per 
Bjarte
Sent: Thursday, November 07, 2013 8:40 AM
To: 'de...@open-mpi.org'
Subject: [OMPI devel] MPIRUN error message after ./configure and sudo make all 
install...

Hello
System with:
Cuda 5.5 and OpenMPI-1.7.3 with system: quadro K5000 and 8 CPUs each with 192 
GPUs =1536 cores)

./configure -with-cuda -with-hwloc -enable-dlopen -enable-mca-dso 
-enable-shared -enable-vt -with-threads=posix -enable-mpi-thread-multiple 
-prefix=/usr/local

Works fine under installation:  ./configure and make, make install

Error message during mpirun -hostfile ./snp_mpi:

/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol 
lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: 
progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol 
lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: 
progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol 
lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: 
progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol 
lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: 
progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol 
lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: 
progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol 
lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: 
progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol 
lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: 
progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol 
lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: 
progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol 
lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: 
progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol 
lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: 
progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol 
lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: 
progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol 
lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: 
progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol 
lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: 
progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol 
lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: 
progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol 
lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: 
progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol 
lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: 
progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol 
lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: 
progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol 
lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: 
progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol 
lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: 
progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elrepror

Re: [OMPI devel] MPIRUN error message after ./configure and sudo make all install...

2013-11-07 Thread Rolf vandeVaart
Hello Solibakke:
Let me try and reproduce with your configure options.

Rolf

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Solibakke Per 
Bjarte
Sent: Thursday, November 07, 2013 8:40 AM
To: 'de...@open-mpi.org'
Subject: [OMPI devel] MPIRUN error message after ./configure and sudo make all 
install...

Hello
System with:
Cuda 5.5 and OpenMPI-1.7.3 with system: quadro K5000 and 8 CPUs each with 192 
GPUs =1536 cores)

./configure -with-cuda -with-hwloc -enable-dlopen -enable-mca-dso 
-enable-shared -enable-vt -with-threads=posix -enable-mpi-thread-multiple 
-prefix=/usr/local

Works fine under installation:  ./configure and make, make install

Error message during mpirun -hostfile ./snp_mpi:

/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol 
lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: 
progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol 
lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: 
progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol 
lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: 
progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol 
lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: 
progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol 
lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: 
progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol 
lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: 
progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol 
lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: 
progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol 
lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: 
progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol 
lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: 
progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol 
lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: 
progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol 
lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: 
progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol 
lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: 
progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol 
lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: 
progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol 
lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: 
progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol 
lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: 
progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol 
lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: 
progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol 
lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: 
progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol 
lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: 
progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol 
lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: 
progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol 
lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: 
progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol 
lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: 
progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol 
lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: 
progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol 
lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: 
progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol 
lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: 
progress_one_cuda_htod_event

Re: [OMPI devel] oshmem and CFLAGS removal

2013-10-31 Thread Rolf vandeVaart


>-Original Message-
>From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff Squyres
>(jsquyres)
>Sent: Thursday, October 31, 2013 4:12 PM
>To: Open MPI Developers
>Subject: Re: [OMPI devel] oshmem and CFLAGS removal
>
>On Oct 31, 2013, at 3:46 PM, Rolf vandeVaart <rvandeva...@nvidia.com>
>wrote:
>
>> # Strip off problematic arguments
>> oshmem_CFLAGS="`echo $oshmem_CFLAGS | sed 's/-pedantic//g'`"
>> oshmem_CFLAGS="`echo $oshmem_CFLAGS | sed 's/-Wundef//g'`"
>> oshmem_CFLAGS="`echo $oshmem_CFLAGS | sed 's/-Wno-long-
>double//g'`"
>
>I think the solution is simple -- delete this line:
>
>> CFLAGS="$oshmem_CFLAGS"
>

Nope, it was not that simple.   With that change, the -pedantic and -Wundef end 
up in the CFLAGS for oshmem and I see all the warnings.
I will submit a ticket and give it to Joshua Ladd.  


[OMPI devel] oshmem and CFLAGS removal

2013-10-31 Thread Rolf vandeVaart
I noticed that some CFLAGS were no longer being set when configuring 
with --enable-picky for gcc.  Specifically, -Wundef and -pedantic were no 
longer set.
This is not a problem for Open MPI 1.7.

I believe this is happening because of some code in the 
config/oshmem_configure_options.m4 file that is supposed to be oshmem specific, 
but seems to be bleeding into everything that gets compiled.

oshmem_CFLAGS="$CFLAGS"

# Strip off problematic arguments
oshmem_CFLAGS="`echo $oshmem_CFLAGS | sed 's/-pedantic//g'`"
oshmem_CFLAGS="`echo $oshmem_CFLAGS | sed 's/-Wundef//g'`"
oshmem_CFLAGS="`echo $oshmem_CFLAGS | sed 's/-Wno-long-double//g'`"
CFLAGS="$oshmem_CFLAGS"

Does anyone know an easy fix for this?  This is why I think some warnings 
appeared in Open MPI 1.7 that we did not see for the same change in the trunk.

Thanks,
Rolf


Re: [OMPI devel] Warnings in v1.7.4: rcache

2013-10-23 Thread Rolf vandeVaart
Yes, that is from one of my CMRs.  I always configure with --enable-picky, but 
that did not pick up this warning.
I will fix this in the trunk in the morning (watching the Red Sox right now :)) 
and then file a CMR to bring it over.
Rolf
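
For anyone wondering what the fix looks like: the warning comes from doing arithmetic 
on a void *, which is a GNU extension, and the usual way to quiet it under -Wpedantic 
is to cast to unsigned char * before the arithmetic.  A minimal sketch, not necessarily 
the exact change that will go into rcache_vma.c:

#include <stddef.h>

/* cast first, then do the pointer arithmetic in well-defined C */
static void *end_of_region(void *addr, size_t size)
{
    return (unsigned char *) addr + size - 1;
}

int main(void)
{
    char buf[16];
    return end_of_region(buf, sizeof buf) == buf + 15 ? 0 : 1;
}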

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Wednesday, October 23, 2013 7:19 PM
To: Open MPI Developers
Subject: [OMPI devel] Warnings in v1.7.4: rcache

One of the recent CMRs has created new warnings in v1.7.4:

rcache_vma.c: In function 'mca_rcache_vma_find':
rcache_vma.c:58:23: warning: pointer of type 'void *' used in arithmetic 
[-Wpedantic]
 bound_addr = addr + size - 1;
   ^
rcache_vma.c:58:30: warning: pointer of type 'void *' used in arithmetic 
[-Wpedantic]
 bound_addr = addr + size - 1;
  ^
rcache_vma.c: In function 'mca_rcache_vma_find_all':
rcache_vma.c:84:23: warning: pointer of type 'void *' used in arithmetic 
[-Wpedantic]
 bound_addr = addr + size - 1;
   ^
rcache_vma.c:84:30: warning: pointer of type 'void *' used in arithmetic 
[-Wpedantic]
 bound_addr = addr + size - 1;
  ^

Does someone know where these came from, and how to correct them?
Ralph



[OMPI devel] RFC: Add GPU Direct RDMA support to openib btl

2013-10-08 Thread Rolf vandeVaart
WHAT: Add GPU Direct RDMA support to openib btl
WHY: Better latency for small GPU message transfers
WHERE: Several files, see ticket for list
WHEN: Friday,  October 18, 2013 COB
More detail:
This RFC looks to make use of GPU Direct RDMA support that is coming in future 
Mellanox libraries.  With GPU Direct RDMA, we can register GPU memory 
with the ibv_reg_mr() calls.  Therefore, we are simply piggybacking on the 
large message RDMA support (RGET) that exists in the PML and openib BTL.  For 
best performance, we want to use the RGET protocol for small messages and 
then switch to a pipeline protocol for larger messages.

To make use of this, we add some extra code paths that are followed when moving 
GPU buffers.   If we have the support compiled in, then when we detect we have 
a GPU buffer, we use the RGET protocol even for small messages.   When the 
messages get larger, we switch to using the regular pipeline protocol.  There 
is some other support code that is added as well.  We add a flag to any GPU 
memory that is registered so we can check for cuMemAlloc/cuMemFree/cuMemAlloc 
issues.  Each GPU buffer has an ID associated with it, so we can ensure that any 
registrations in the rcache are still valid.
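
As a rough illustration of the protocol switch described above, here is a minimal 
standalone sketch (not the actual ob1/openib code; gdr_rget_limit, is_gpu_buffer, and 
the PROTO_* names are invented for this example):

#include <stdbool.h>
#include <stddef.h>

enum proto { PROTO_DEFAULT, PROTO_RGET, PROTO_PIPELINE };

/* GPU buffers use RGET (the large-message RDMA protocol) even when small,
 * then switch to the pipeline protocol past a size threshold; host buffers
 * keep the normal eager/rendezvous selection. */
static enum proto choose_protocol(size_t msg_size, bool is_gpu_buffer, size_t gdr_rget_limit)
{
    if (!is_gpu_buffer) {
        return PROTO_DEFAULT;
    }
    return (msg_size <= gdr_rget_limit) ? PROTO_RGET : PROTO_PIPELINE;
}

int main(void)
{
    /* e.g. with a 30 KB threshold: a 4 KB GPU message uses RGET, a 1 MB one pipelines */
    return (choose_protocol(4096, true, 30720) == PROTO_RGET &&
            choose_protocol(1 << 20, true, 30720) == PROTO_PIPELINE) ? 0 : 1;
}

For the registration-validity check mentioned above, one way to implement it is with the 
CUDA driver call cuPointerGetAttribute(&id, CU_POINTER_ATTRIBUTE_BUFFER_ID, ptr): the ID 
changes when memory is freed and reallocated at the same address, so a cached 
registration whose stored ID no longer matches can be invalidated.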

To view the changes, go to https://svn.open-mpi.org/trac/ompi/ticket/3836 and 
click on 
gdr.diff.





Re: [OMPI devel] RFC: Remove alignment code from rcache

2013-09-18 Thread Rolf vandeVaart
I will wait another week on this since I know a lot of folks were traveling.  
Any input welcome.

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Rolf vandeVaart
Sent: Tuesday, September 10, 2013 2:46 PM
To: de...@open-mpi.org
Subject: [OMPI devel] RFC: Remove alignment code from rcache

WHAT: Remove alignment code from ompi/mca/rcache/vma module
WHY: Because it is redundant and causing problems for memory pools that want 
different alignment
WHERE: ompi/mca/rcache/vma/rcache_vma.c, 
ompi/mca/mpool/grdma/mpool_grdma_module.c (Detailed changes attached)
WHEN: Tuesday,  September 17, 2013 COB
More detail:
This RFC looks to remove the alignment code from the rcache as it seems 
unnecessary.  In all use cases in the library, alignment requirements are 
handled in the memory pool layer (or in the case of the vader btl, in the btl 
layer).  It seems more logical that the alignment is in the upper layer as that 
code is also where any registration restrictions would be known.  The rcache 
alignment code causes problems for me where I want to have different alignment 
requirements than the rcache is forcing on me.  (The rcache defaults to an 
alignment of mca_mpool_base_page_size_log=4K on my machine)  Therefore, I would 
like to make the change as attached to this email.

I have run through some tests and all seems OK.  Is there anything I am missing 
such that we need this  code in the rcache?
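
For illustration, the alignment would simply move to the caller, roughly like
this (a sketch only, reusing the helpers and the grdma variable names that
appear in the diff below):

    unsigned char *base, *bound;

    /* page-align the request before the rcache lookup (was done inside rcache) */
    base  = (unsigned char *) down_align_addr(addr, mca_mpool_base_page_size_log);
    bound = (unsigned char *) up_align_addr((void *)((unsigned long) addr + size - 1),
                                            mca_mpool_base_page_size_log);
    mpool->rcache->rcache_find(mpool->rcache, base, bound - base + 1, reg);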

Thanks,
Rolf

[rvandevaart@sm064 ompi-trunk-tuesday]$ svn diff
Index: ompi/mca/rcache/vma/rcache_vma.c
===
--- ompi/mca/rcache/vma/rcache_vma.c (revision 29155)
+++ ompi/mca/rcache/vma/rcache_vma.c  (working copy)
@@ -48,15 +48,13 @@
 void* addr, size_t size, mca_mpool_base_registration_t **reg)  {
 int rc;
-void* base_addr;
-void* bound_addr;
+unsigned char* bound_addr;

 if(size == 0) {
 return OMPI_ERROR;
 }

-base_addr = down_align_addr(addr, mca_mpool_base_page_size_log);
-bound_addr = up_align_addr((void*) ((unsigned long) addr + size - 1), 
mca_mpool_base_page_size_log);
+bound_addr = addr + size - 1;

 /* Check to ensure that the cache is valid */
 if (OPAL_UNLIKELY(opal_memory_changed() && @@ -65,8 +63,8 @@
 return rc;
 }

-*reg = mca_rcache_vma_tree_find((mca_rcache_vma_module_t*)rcache, 
(unsigned char*)base_addr,
-(unsigned char*)bound_addr);
+*reg = mca_rcache_vma_tree_find((mca_rcache_vma_module_t*)rcache, 
(unsigned char*)addr,
+bound_addr);

 return OMPI_SUCCESS;
}
@@ -76,14 +74,13 @@
 int reg_cnt)
{
 int rc;
-void *base_addr, *bound_addr;
+unsigned char *bound_addr;

 if(size == 0) {
 return OMPI_ERROR;
 }

-base_addr = down_align_addr(addr, mca_mpool_base_page_size_log);
-bound_addr = up_align_addr((void*) ((unsigned long) addr + size - 1), 
mca_mpool_base_page_size_log);
+bound_addr = addr + size - 1;

 /* Check to ensure that the cache is valid */
 if (OPAL_UNLIKELY(opal_memory_changed() && @@ -93,7 +90,7 @@
 }

 return mca_rcache_vma_tree_find_all((mca_rcache_vma_module_t*)rcache,
-(unsigned char*)base_addr, (unsigned char*)bound_addr, regs,
+(unsigned char*)addr, bound_addr, regs,
 reg_cnt);
}

Index: ompi/mca/mpool/grdma/mpool_grdma_module.c
===
--- ompi/mca/mpool/grdma/mpool_grdma_module.c   (revision 29155)
+++ ompi/mca/mpool/grdma/mpool_grdma_module.c(working copy)
@@ -233,7 +233,7 @@
  * Persistent registration are always registered and placed in the cache */
 if(!(bypass_cache || persist)) {
 /* check to see if memory is registered */
-mpool->rcache->rcache_find(mpool->rcache, addr, size, reg);
+mpool->rcache->rcache_find(mpool->rcache, base, bound - base +
+ 1, reg);
 if (*reg && !(flags & MCA_MPOOL_FLAGS_INVALID)) {
 if (0 == (*reg)->ref_count) {
 /* Leave pinned must be set for this to still be in the 
rcache. */ @@ -346,7 +346,7 @@

 OPAL_THREAD_LOCK(&mpool->rcache->lock);

-rc = mpool->rcache->rcache_find(mpool->rcache, addr, size, reg);
+rc = mpool->rcache->rcache_find(mpool->rcache, base, bound - base +
+ 1, reg);
 if(NULL != *reg &&
 (mca_mpool_grdma_component.leave_pinned ||
  ((*reg)->flags & MCA_MPOOL_FLAGS_PERSIST) ||
[rvandevaart@sm064 ompi-trunk-tuesday]$



Re: [OMPI devel] Nearly unlimited growth of pml free list

2013-09-13 Thread Rolf vandeVaart
Yes, it appears the send_requests list is the one that is growing.  This list 
holds the send request structures that are in use.  After a send is completed, 
a send request is supposed to be returned to this list and then get re-used.

With 7 processes, it had reached a size of 16,324 send requests in use.  With 
8 processes, it had reached a size of 16,708.  Each send request is 720 bytes 
(872 in a debug build), and if we do the math we have consumed about 12 Mbytes.
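
For reference, the arithmetic behind that estimate:

    16,324 requests x 720 bytes = 11,753,280 bytes  (about 11.2 MBytes)
    16,708 requests x 720 bytes = 12,029,760 bytes  (about 11.5 MBytes)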

Setting some type of bound will not fix this issue.  There is something else 
going on here that is causing this problem.  I know you described the problem 
earlier on, but maybe you can explain it again?  How many processes?  What type 
of cluster?  One other thought is to try Open MPI 1.7.2 and see if you still 
see the problem.  Maybe someone else has suggestions too.

Rolf

PS: For those who missed a private email, I had Max add some instrumentation so 
we could see which list was growing.  We now know it is the 
mca_pml_base_send_requests list.

>-Original Message-
>From: Max Staufer [mailto:max.stau...@gmx.net]
>Sent: Friday, September 13, 2013 7:06 AM
>To: Rolf vandeVaart; de...@open-mpi.org
>Subject: Re: [OMPI devel] Nearly unlimited growth of pml free list
>
>Hi Rolf,
>
>I applied your patch, the full output is rather big, even gzip > 10Mb, 
> which is
>not good for the mailinglist, but the head and tail are below for a 7 and 8
>processor run.
>Seem that the send requests are growing fast 4000 times in just 10 min.
>
>Do you now of a method to bound the list such that it is not growing excessivly
>?
>
>thanks
>
>Max
>
>7 Processor run
>--
>[gpu207.dev-env.lan:11236] Iteration = 0 sleeping [gpu207.dev-env.lan:11236]
>Freelist=rdma_frags, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11236]
>Freelist=recv_frags, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11236]
>Freelist=pending_pckts, numAlloc=4, maxAlloc=-1 [gpu207.dev-
>env.lan:11236] Freelist=send_ranges_pckts, numAlloc=4,
>maxAlloc=-1
>[gpu207.dev-env.lan:11236] Freelist=send_requests, numAlloc=4, maxAlloc=-
>1 [gpu207.dev-env.lan:11236] Freelist=recv_requests, numAlloc=4,
>maxAlloc=-1 [gpu207.dev-env.lan:11236] rdma_pending=0, pckt_pending=0,
>recv_pending=0, send_pending=0, comm_pending=0 [gpu207.dev-
>env.lan:11236] [gpu207.dev-env.lan:11236] Iteration = 0 sleeping
>[gpu207.dev-env.lan:11236] Freelist=rdma_frags, numAlloc=4, maxAlloc=-1
>[gpu207.dev-env.lan:11236] Freelist=recv_frags, numAlloc=4, maxAlloc=-1
>[gpu207.dev-env.lan:11236] Freelist=pending_pckts, numAlloc=4, maxAlloc=-
>1 [gpu207.dev-env.lan:11236] Freelist=send_ranges_pckts, numAlloc=4,
>maxAlloc=-1
>[gpu207.dev-env.lan:11236] Freelist=send_requests, numAlloc=4, maxAlloc=-
>1 [gpu207.dev-env.lan:11236] Freelist=recv_requests, numAlloc=4,
>maxAlloc=-1 [gpu207.dev-env.lan:11236] rdma_pending=0, pckt_pending=0,
>recv_pending=0, send_pending=0, comm_pending=0 [gpu207.dev-
>env.lan:11236] [gpu207.dev-env.lan:11236] Iteration = 0 sleeping
>[gpu207.dev-env.lan:11236] Freelist=rdma_frags, numAlloc=4, maxAlloc=-1
>[gpu207.dev-env.lan:11236] Freelist=recv_frags, numAlloc=4, maxAlloc=-1
>[gpu207.dev-env.lan:11236] Freelist=pending_pckts, numAlloc=4, maxAlloc=-
>1 [gpu207.dev-env.lan:11236] Freelist=send_ranges_pckts, numAlloc=4,
>maxAlloc=-1
>[gpu207.dev-env.lan:11236] Freelist=send_requests, numAlloc=4, maxAlloc=-
>1 [gpu207.dev-env.lan:11236] Freelist=recv_requests, numAlloc=4,
>maxAlloc=-1 [gpu207.dev-env.lan:11236] rdma_pending=0, pckt_pending=0,
>recv_pending=0, send_pending=0, comm_pending=0 [gpu207.dev-
>env.lan:11236] [gpu207.dev-env.lan:11236] Iteration = 0 sleeping
>[gpu207.dev-env.lan:11236] Freelist=rdma_frags, numAlloc=4, maxAlloc=-1
>[gpu207.dev-env.lan:11236] Freelist=recv_frags, numAlloc=4, maxAlloc=-1
>[gpu207.dev-env.lan:11236] Freelist=pending_pckts, numAlloc=4, maxAlloc=-
>1 [gpu207.dev-env.lan:11236] Freelist=send_ranges_pckts, numAlloc=4,
>maxAlloc=-1
>[gpu207.dev-env.lan:11236] Freelist=send_requests, numAlloc=4, maxAlloc=-
>1 [gpu207.dev-env.lan:11236] Freelist=recv_requests, numAlloc=4,
>maxAlloc=-1 [gpu207.dev-env.lan:11236] rdma_pending=0, pckt_pending=0,
>recv_pending=0, send_pending=0, comm_pending=0 [gpu207.dev-
>env.lan:11236] [gpu207.dev-env.lan:11236] Iteration = 0 sleeping
>[gpu207.dev-env.lan:11236] Freelist=rdma_frags, numAlloc=4, maxAlloc=-1
>[gpu207.dev-env.lan:11236] Freelist=recv_frags, numAlloc=4, maxAlloc=-1
>[gpu207.dev-env.lan:11236] Freelist=pending_pckts, numAlloc=4, maxAlloc=-
>1 [gpu207.dev-env.lan:11236] Freelist=send_ranges_pckts, numAlloc=4,
>maxAlloc=-1
>[gpu207.dev-env.lan:11236] Freelist=send_requests, numAlloc=4, maxAlloc=-
>1 [gpu207.dev-env.lan:11236] Freelist=recv_requests, numAlloc=4,
>maxAlloc=-1 [gp

[OMPI devel] RFC: Remove alignment code from rcache

2013-09-10 Thread Rolf vandeVaart
WHAT: Remove alignment code from ompi/mca/rcache/vma module
WHY: Because it is redundant and causing problems for memory pools that want 
different alignment
WHERE: ompi/mca/rcache/vma/rcache_vma.c, 
ompi/mca/mpool/grdma/mpool_grdma_module.c (Detailed changes attached)
WHEN: Tuesday,  September 17, 2013 COB
More detail:
This RFC looks to remove the alignment code from the rcache as it seems 
unnecessary.  In all use cases in the library, alignment requirements are 
handled in the memory pool layer (or in the case of the vader btl, in the btl 
layer).  It seems more logical that the alignment is in the upper layer as that 
code is also where any registration restrictions would be known.  The rcache 
alignment code causes problems for me where I want to have different alignment 
requirements than the rcache is forcing on me.  (The rcache defaults to an 
alignment of mca_mpool_base_page_size_log=4K on my machine)  Therefore, I would 
like to make the change as attached to this email.

I have run through some tests and all seems OK.  Is there anything I am missing 
such that we need this  code in the rcache?

Thanks,
Rolf

[rvandevaart@sm064 ompi-trunk-tuesday]$ svn diff
Index: ompi/mca/rcache/vma/rcache_vma.c
===
--- ompi/mca/rcache/vma/rcache_vma.c (revision 29155)
+++ ompi/mca/rcache/vma/rcache_vma.c  (working copy)
@@ -48,15 +48,13 @@
 void* addr, size_t size, mca_mpool_base_registration_t **reg)  {
 int rc;
-void* base_addr;
-void* bound_addr;
+unsigned char* bound_addr;
 if(size == 0) {
 return OMPI_ERROR;
 }
-base_addr = down_align_addr(addr, mca_mpool_base_page_size_log);
-bound_addr = up_align_addr((void*) ((unsigned long) addr + size - 1), 
mca_mpool_base_page_size_log);
+bound_addr = addr + size - 1;

 /* Check to ensure that the cache is valid */
 if (OPAL_UNLIKELY(opal_memory_changed() && @@ -65,8 +63,8 @@
 return rc;
 }
-*reg = mca_rcache_vma_tree_find((mca_rcache_vma_module_t*)rcache, 
(unsigned char*)base_addr,
-(unsigned char*)bound_addr);
+*reg = mca_rcache_vma_tree_find((mca_rcache_vma_module_t*)rcache, 
(unsigned char*)addr,
+bound_addr);
 return OMPI_SUCCESS;
}
@@ -76,14 +74,13 @@
 int reg_cnt)
{
 int rc;
-void *base_addr, *bound_addr;
+unsigned char *bound_addr;
 if(size == 0) {
 return OMPI_ERROR;
 }
-base_addr = down_align_addr(addr, mca_mpool_base_page_size_log);
-bound_addr = up_align_addr((void*) ((unsigned long) addr + size - 1), 
mca_mpool_base_page_size_log);
+bound_addr = addr + size - 1;
 /* Check to ensure that the cache is valid */
 if (OPAL_UNLIKELY(opal_memory_changed() && @@ -93,7 +90,7 @@
 }
 return mca_rcache_vma_tree_find_all((mca_rcache_vma_module_t*)rcache,
-(unsigned char*)base_addr, (unsigned char*)bound_addr, regs,
+(unsigned char*)addr, bound_addr, regs,
 reg_cnt);
}
Index: ompi/mca/mpool/grdma/mpool_grdma_module.c
===
--- ompi/mca/mpool/grdma/mpool_grdma_module.c   (revision 29155)
+++ ompi/mca/mpool/grdma/mpool_grdma_module.c(working copy)
@@ -233,7 +233,7 @@
  * Persistent registration are always registered and placed in the cache */
 if(!(bypass_cache || persist)) {
 /* check to see if memory is registered */
-mpool->rcache->rcache_find(mpool->rcache, addr, size, reg);
+mpool->rcache->rcache_find(mpool->rcache, base, bound - base +
+ 1, reg);
 if (*reg && !(flags & MCA_MPOOL_FLAGS_INVALID)) {
 if (0 == (*reg)->ref_count) {
 /* Leave pinned must be set for this to still be in the 
rcache. */ @@ -346,7 +346,7 @@
 OPAL_THREAD_LOCK(&mpool->rcache->lock);
-rc = mpool->rcache->rcache_find(mpool->rcache, addr, size, reg);
+rc = mpool->rcache->rcache_find(mpool->rcache, base, bound - base +
+ 1, reg);
 if(NULL != *reg &&
 (mca_mpool_grdma_component.leave_pinned ||
  ((*reg)->flags & MCA_MPOOL_FLAGS_PERSIST) ||
[rvandevaart@sm064 ompi-trunk-tuesday]$




Re: [OMPI devel] openib BTL problems with ORTE async changes

2013-09-03 Thread Rolf vandeVaart
Through a few off-list emails, Ralph was able to reproduce this problem on odin 
when he forced the use of the oob connection code in the openib BTL.
I have created a ticket to track this issue; not sure yet how we will resolve it.

https://svn.open-mpi.org/trac/ompi/ticket/3746


From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Rolf vandeVaart
Sent: Tuesday, September 03, 2013 4:52 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] openib BTL problems with ORTE async changes

Correction: That line below should be:
gmake run FILE=p2p_c

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Rolf vandeVaart
Sent: Tuesday, September 03, 2013 4:50 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] openib BTL problems with ORTE async changes

I just retried and I still get errors with the latest trunk. (29112).  If I 
back up to r29057, then everything is fine.  In addition, I can reproduce this 
on two different clusters.
Can you try running the entire intel test suite and see if that works?  Maybe a 
different test will fail for you.

   cd ompi-tests/trunk/intel_tests/src
gmake run FILE=cuda_c

You need to modify Makefile in intel_tests to make it do the right thing.  
Trying to figure out what I should do next.  As I said, I get a variety of 
different failures.  Maybe I should collect them up and see what it means.  
This failure has me dead in the water with the trunk.



From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Tuesday, September 03, 2013 3:41 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] openib BTL problems with ORTE async changes

Sigh - I cannot get it to fail. I've tried up to np=16 without getting a single 
hiccup.

Try a fresh checkout - let's make sure you don't have some old cruft laying 
around.

On Sep 3, 2013, at 12:26 PM, Rolf vandeVaart 
<rvandeva...@nvidia.com<mailto:rvandeva...@nvidia.com>> wrote:

I am running a debug build.  Here is my configure line:

../configure --enable-debug --enable-shared --disable-static 
--prefix=/home/rolf/ompi-trunk-29061/64 --with- 
wrapper-ldflags='-Wl,-rpath,${prefix}/lib' --disable-vt 
--enable-orterun-prefix-by-default -disable-io-romio  --enable-picky

The test program is from the intel test suite in our ompi-tests repository.
http://svn.open-mpi.org/svn/ompi-tests/trunk/intel_tests/src/MPI_Irecv_comm_c.c<http://svn.open-mpi.org/svn/ompi-tests/trunk/intel/src/MPI_Irecv_comm_c.c>

Run with at least np=4.  The more np, the better.


From: devel [mailto:devel-boun...@open-mpi.org<mailto:boun...@open-mpi.org>] On 
Behalf Of Ralph Castain
Sent: Tuesday, September 03, 2013 3:22 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] openib BTL problems with ORTE async changes

Also, send me your test code - maybe that is required to trigger it

On Sep 3, 2013, at 12:19 PM, Ralph Castain 
<r...@open-mpi.org<mailto:r...@open-mpi.org>> wrote:

Dang - I just finished running it on odin without a problem. Are you seeing 
this with a debug or optimized build?


On Sep 3, 2013, at 12:16 PM, Rolf vandeVaart 
<rvandeva...@nvidia.com<mailto:rvandeva...@nvidia.com>> wrote:

Yes, it fails on the current trunk (r29112).  That is what started me on the 
journey to figure out when things went wrong.  It was working up until r29058.

From: devel [mailto:devel-boun...@open-mpi.org<mailto:boun...@open-mpi.org>] On 
Behalf Of Ralph Castain
Sent: Tuesday, September 03, 2013 2:49 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] openib BTL problems with ORTE async changes

Are you all the way up to the current trunk? There have been a few typo fixes 
since the original commit.

I'm not familiar with the OOB connect code in openib. The OOB itself isn't 
using free list, so I suspect it is something up in the OOB connect code 
itself. I'll take a look and see if something leaps out at me - it seems to be 
working fine on IU's odin cluster, which is the only IB-based system I can 
access


On Sep 3, 2013, at 11:34 AM, Rolf vandeVaart 
<rvandeva...@nvidia.com<mailto:rvandeva...@nvidia.com>> wrote:


As mentioned in the weekly conference call, I am seeing some strange errors 
when using the openib BTL.  I have narrowed down the changeset that broke 
things to the ORTE async code.

https://svn.open-mpi.org/trac/ompi/changeset/29058  (and 
https://svn.open-mpi.org/trac/ompi/changeset/29061 which was needed to fix 
compile errors)

Changeset 29057 does not have these issues.  I do not have a very good 
characterization of the failures.  The failures are not consistent.  Sometimes 
they can pass.  Sometimes the stack trace can be different.  They seem to 
happen more with larger np, like np=4 and more.

The first failure mode is a segmentation violation and it always seems to be 
that we are trying to pop something off a free list.  But the upper parts of the 
stack trace can vary.  This is with the trunk version 29061.
Ralph, any thoughts on where we go 

Re: [OMPI devel] openib BTL problems with ORTE async changes

2013-09-03 Thread Rolf vandeVaart
Correction: That line below should be:
gmake run FILE=p2p_c

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Rolf vandeVaart
Sent: Tuesday, September 03, 2013 4:50 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] openib BTL problems with ORTE async changes

I just retried and I still get errors with the latest trunk. (29112).  If I 
back up to r29057, then everything is fine.  In addition, I can reproduce this 
on two different clusters.
Can you try running the entire intel test suite and see if that works?  Maybe a 
different test will fail for you.

   cd ompi-tests/trunk/intel_tests/src
gmake run FILE=cuda_c

You need to modify Makefile in intel_tests to make it do the right thing.  
Trying to figure out what I should do next.  As I said, I get a variety of 
different failures.  Maybe I should collect them up and see what it means.  
This failure has me dead in the water with the trunk.



From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Tuesday, September 03, 2013 3:41 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] openib BTL problems with ORTE async changes

Sigh - I cannot get it to fail. I've tried up to np=16 without getting a single 
hiccup.

Try a fresh checkout - let's make sure you don't have some old cruft laying 
around.

On Sep 3, 2013, at 12:26 PM, Rolf vandeVaart 
<rvandeva...@nvidia.com<mailto:rvandeva...@nvidia.com>> wrote:

I am running a debug build.  Here is my configure line:

../configure --enable-debug --enable-shared --disable-static 
--prefix=/home/rolf/ompi-trunk-29061/64 --with- 
wrapper-ldflags='-Wl,-rpath,${prefix}/lib' --disable-vt 
--enable-orterun-prefix-by-default -disable-io-romio  --enable-picky

The test program is from the intel test suite in our ompi-tests repository.
http://svn.open-mpi.org/svn/ompi-tests/trunk/intel_tests/src/MPI_Irecv_comm_c.c<http://svn.open-mpi.org/svn/ompi-tests/trunk/intel/src/MPI_Irecv_comm_c.c>

Run with at least np=4.  The more np, the better.


From: devel [mailto:devel-boun...@open-mpi.org<mailto:boun...@open-mpi.org>] On 
Behalf Of Ralph Castain
Sent: Tuesday, September 03, 2013 3:22 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] openib BTL problems with ORTE async changes

Also, send me your test code - maybe that is required to trigger it

On Sep 3, 2013, at 12:19 PM, Ralph Castain 
<r...@open-mpi.org<mailto:r...@open-mpi.org>> wrote:


Dang - I just finished running it on odin without a problem. Are you seeing 
this with a debug or optimized build?


On Sep 3, 2013, at 12:16 PM, Rolf vandeVaart 
<rvandeva...@nvidia.com<mailto:rvandeva...@nvidia.com>> wrote:


Yes, it fails on the current trunk (r29112).  That is what started me on the 
journey to figure out when things went wrong.  It was working up until r29058.

From: devel [mailto:devel-boun...@open-mpi.org<mailto:boun...@open-mpi.org>] On 
Behalf Of Ralph Castain
Sent: Tuesday, September 03, 2013 2:49 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] openib BTL problems with ORTE async changes

Are you all the way up to the current trunk? There have been a few typo fixes 
since the original commit.

I'm not familiar with the OOB connect code in openib. The OOB itself isn't 
using free list, so I suspect it is something up in the OOB connect code 
itself. I'll take a look and see if something leaps out at me - it seems to be 
working fine on IU's odin cluster, which is the only IB-based system I can 
access


On Sep 3, 2013, at 11:34 AM, Rolf vandeVaart 
<rvandeva...@nvidia.com<mailto:rvandeva...@nvidia.com>> wrote:



As mentioned in the weekly conference call, I am seeing some strange errors 
when using the openib BTL.  I have narrowed down the changeset that broke 
things to the ORTE async code.

https://svn.open-mpi.org/trac/ompi/changeset/29058  (and 
https://svn.open-mpi.org/trac/ompi/changeset/29061 which was needed to fix 
compile errors)

Changeset 29057 does not have these issues.  I do not have a very good 
characterization of the failures.  The failures are not consistent.  Sometimes 
they can pass.  Sometimes the stack trace can be different.  They seem to 
happen more with larger np, like np=4 and more.

The first failure mode is a segmentation violation and it always seems to be 
that we are trying to pop something off a free list.  But the upper parts of the 
stack trace can vary.  This is with the trunk version 29061.
Ralph, any thoughts on where we go from here?

[rolf@Fermi-Cluster src]$ mpirun -np 4 -host c0-0,c0-1,c0-3,c0-4 
MPI_Irecv_comm_c
MPITEST info  (0): Starting:  MPI_Irecv_comm:
[compute-0-4:04752] *** Process received signal *** [compute-0-4:04752] Signal: 
Segmentation fault (11) [compute-0-4:04752] Signal code: Address not mapped (1) 
[compute-0-4:04752] Failing at address: 0x28
--
mpirun noticed that process rank 3 w

Re: [OMPI devel] openib BTL problems with ORTE async changes

2013-09-03 Thread Rolf vandeVaart
I just retried and I still get errors with the latest trunk. (29112).  If I 
back up to r29057, then everything is fine.  In addition, I can reproduce this 
on two different clusters.
Can you try running the entire intel test suite and see if that works?  Maybe a 
different test will fail for you.

   cd ompi-tests/trunk/intel_tests/src
gmake run FILE=cuda_c

You need to modify Makefile in intel_tests to make it do the right thing.  
Trying to figure out what I should do next.  As I said, I get a variety of 
different failures.  Maybe I should collect them up and see what it means.  
This failure has me dead in the water with the trunk.



From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Tuesday, September 03, 2013 3:41 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] openib BTL problems with ORTE async changes

Sigh - I cannot get it to fail. I've tried up to np=16 without getting a single 
hiccup.

Try a fresh checkout - let's make sure you don't have some old cruft laying 
around.

On Sep 3, 2013, at 12:26 PM, Rolf vandeVaart 
<rvandeva...@nvidia.com<mailto:rvandeva...@nvidia.com>> wrote:


I am running a debug build.  Here is my configure line:

../configure --enable-debug --enable-shared --disable-static 
--prefix=/home/rolf/ompi-trunk-29061/64 --with- 
wrapper-ldflags='-Wl,-rpath,${prefix}/lib' --disable-vt 
--enable-orterun-prefix-by-default -disable-io-romio  --enable-picky

The test program is from the intel test suite in our ompi-tests repository.
http://svn.open-mpi.org/svn/ompi-tests/trunk/intel_tests/src/MPI_Irecv_comm_c.c<http://svn.open-mpi.org/svn/ompi-tests/trunk/intel/src/MPI_Irecv_comm_c.c>

Run with at least np=4.  The more np, the better.


From: devel [mailto:devel-boun...@open-mpi.org<mailto:boun...@open-mpi.org>] On 
Behalf Of Ralph Castain
Sent: Tuesday, September 03, 2013 3:22 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] openib BTL problems with ORTE async changes

Also, send me your test code - maybe that is required to trigger it

On Sep 3, 2013, at 12:19 PM, Ralph Castain 
<r...@open-mpi.org<mailto:r...@open-mpi.org>> wrote:



Dang - I just finished running it on odin without a problem. Are you seeing 
this with a debug or optimized build?


On Sep 3, 2013, at 12:16 PM, Rolf vandeVaart 
<rvandeva...@nvidia.com<mailto:rvandeva...@nvidia.com>> wrote:



Yes, it fails on the current trunk (r29112).  That is what started me on the 
journey to figure out when things went wrong.  It was working up until r29058.

From: devel [mailto:devel-boun...@open-mpi.org<mailto:boun...@open-mpi.org>] On 
Behalf Of Ralph Castain
Sent: Tuesday, September 03, 2013 2:49 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] openib BTL problems with ORTE async changes

Are you all the way up to the current trunk? There have been a few typo fixes 
since the original commit.

I'm not familiar with the OOB connect code in openib. The OOB itself isn't 
using free list, so I suspect it is something up in the OOB connect code 
itself. I'll take a look and see if something leaps out at me - it seems to be 
working fine on IU's odin cluster, which is the only IB-based system I can 
access


On Sep 3, 2013, at 11:34 AM, Rolf vandeVaart 
<rvandeva...@nvidia.com<mailto:rvandeva...@nvidia.com>> wrote:




As mentioned in the weekly conference call, I am seeing some strange errors 
when using the openib BTL.  I have narrowed down the changeset that broke 
things to the ORTE async code.

https://svn.open-mpi.org/trac/ompi/changeset/29058  (and 
https://svn.open-mpi.org/trac/ompi/changeset/29061 which was needed to fix 
compile errors)

Changeset 29057 does not have these issues.  I do not have a very good 
characterization of the failures.  The failures are not consistent.  Sometimes 
they can pass.  Sometimes the stack trace can be different.  They seem to 
happen more with larger np, like np=4 and more.

The first failure mode is a segmentation violation and it always seems to be 
that we are trying to pop something off a free list.  But the upper parts of the 
stack trace can vary.  This is with the trunk version 29061.
Ralph, any thoughts on where we go from here?

[rolf@Fermi-Cluster src]$ mpirun -np 4 -host c0-0,c0-1,c0-3,c0-4 
MPI_Irecv_comm_c
MPITEST info  (0): Starting:  MPI_Irecv_comm:
[compute-0-4:04752] *** Process received signal *** [compute-0-4:04752] Signal: 
Segmentation fault (11) [compute-0-4:04752] Signal code: Address not mapped (1) 
[compute-0-4:04752] Failing at address: 0x28
--
mpirun noticed that process rank 3 with PID 4752 on node c0-4 exited on signal 
11 (Segmentation fault).
--
[rolf@Fermi-Cluster src]$ gdb MPI_Irecv_comm_c core.4752 GNU gdb Fedora 
(6.8-27.el5) Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL versio

Re: [OMPI devel] openib BTL problems with ORTE async changes

2013-09-03 Thread Rolf vandeVaart
I am running a debug build.  Here is my configure line:

../configure --enable-debug --enable-shared --disable-static 
--prefix=/home/rolf/ompi-trunk-29061/64 --with- 
wrapper-ldflags='-Wl,-rpath,${prefix}/lib' --disable-vt 
--enable-orterun-prefix-by-default -disable-io-romio  --enable-picky

The test program is from the intel test suite in our ompi-tests repository.
http://svn.open-mpi.org/svn/ompi-tests/trunk/intel_tests/src/MPI_Irecv_comm_c.c<http://svn.open-mpi.org/svn/ompi-tests/trunk/intel/src/MPI_Irecv_comm_c.c>

Run with at least np=4.  The more np, the better.


From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Tuesday, September 03, 2013 3:22 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] openib BTL problems with ORTE async changes

Also, send me your test code - maybe that is required to trigger it

On Sep 3, 2013, at 12:19 PM, Ralph Castain 
<r...@open-mpi.org<mailto:r...@open-mpi.org>> wrote:


Dang - I just finished running it on odin without a problem. Are you seeing 
this with a debug or optimized build?


On Sep 3, 2013, at 12:16 PM, Rolf vandeVaart 
<rvandeva...@nvidia.com<mailto:rvandeva...@nvidia.com>> wrote:


Yes, it fails on the current trunk (r29112).  That is what started me on the 
journey to figure out when things went wrong.  It was working up until r29058.

From: devel [mailto:devel-boun...@open-mpi.org<mailto:boun...@open-mpi.org>] On 
Behalf Of Ralph Castain
Sent: Tuesday, September 03, 2013 2:49 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] openib BTL problems with ORTE async changes

Are you all the way up to the current trunk? There have been a few typo fixes 
since the original commit.

I'm not familiar with the OOB connect code in openib. The OOB itself isn't 
using free list, so I suspect it is something up in the OOB connect code 
itself. I'll take a look and see if something leaps out at me - it seems to be 
working fine on IU's odin cluster, which is the only IB-based system I can 
access


On Sep 3, 2013, at 11:34 AM, Rolf vandeVaart 
<rvandeva...@nvidia.com<mailto:rvandeva...@nvidia.com>> wrote:



As mentioned in the weekly conference call, I am seeing some strange errors 
when using the openib BTL.  I have narrowed down the changeset that broke 
things to the ORTE async code.

https://svn.open-mpi.org/trac/ompi/changeset/29058  (and 
https://svn.open-mpi.org/trac/ompi/changeset/29061 which was needed to fix 
compile errors)

Changeset 29057 does not have these issues.  I do not have a very good 
characterization of the failures.  The failures are not consistent.  Sometimes 
they can pass.  Sometimes the stack trace can be different.  They seem to 
happen more with larger np, like np=4 and more.

The first failure mode is a segmentation violation and it always seems to be 
that we are trying to pop something off a free list.  But the upper parts of the 
stack trace can vary.  This is with the trunk version 29061.
Ralph, any thoughts on where we go from here?

[rolf@Fermi-Cluster src]$ mpirun -np 4 -host c0-0,c0-1,c0-3,c0-4 
MPI_Irecv_comm_c
MPITEST info  (0): Starting:  MPI_Irecv_comm:
[compute-0-4:04752] *** Process received signal *** [compute-0-4:04752] Signal: 
Segmentation fault (11) [compute-0-4:04752] Signal code: Address not mapped (1) 
[compute-0-4:04752] Failing at address: 0x28
--
mpirun noticed that process rank 3 with PID 4752 on node c0-4 exited on signal 
11 (Segmentation fault).
--
[rolf@Fermi-Cluster src]$ gdb MPI_Irecv_comm_c core.4752 GNU gdb Fedora 
(6.8-27.el5) Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
Core was generated by `MPI_Irecv_comm_c'.
Program terminated with signal 11, Segmentation fault.
[New process 4753]
[New process 4756]
[New process 4755]
[New process 4754]
[New process 4752]
#0  0x2d6ecad6 in opal_atomic_lifo_pop (lifo=0x5996940) at 
../../../../../opal/class/opal_atomic_lifo.h:111
111 lifo->opal_lifo_head = (opal_list_item_t*)item->opal_list_next;
(gdb) where
#0  0x2d6ecad6 in opal_atomic_lifo_pop (lifo=0x5996940) at 
../../../../../opal/class/opal_atomic_lifo.h:111
#1  0x2d6ec5b4 in __ompi_free_list_wait_mt (fl=0x5996940, 
item=0x40ea8d50) at ../../../../../ompi/class/ompi_free_list.h:228
#2  0x2d6ec3f8 in post_recvs (ep=0x59f3120, qp=0, num_post=256) at 
../../../../../ompi/mca/btl/openib/btl_openib_endpoint.h:361
#3  0x2d6ec1ae in mca_btl_openib_endpoint_post_rr_nolock (ep=0x59f3120, 
qp=0)
at ../../../../../ompi/mca/btl/openib/btl_openib_endpoint.h:405
#4  0x2d6ebfad in mca_btl_openib_endpoint_post_recvs 
(endpoint=0x59f3120)
at ../../../../../ompi/mca/btl/openib/btl_openib_endpoint.c:494
#5  0x2d6fe71c in qp_create_all (endpoi

Re: [OMPI devel] openib BTL problems with ORTE async changes

2013-09-03 Thread Rolf vandeVaart
Yes, it fails on the current trunk (r29112).  That is what started me on the 
journey to figure out when things went wrong.  It was working up until r29058.

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Tuesday, September 03, 2013 2:49 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] openib BTL problems with ORTE async changes

Are you all the way up to the current trunk? There have been a few typo fixes 
since the original commit.

I'm not familiar with the OOB connect code in openib. The OOB itself isn't 
using free list, so I suspect it is something up in the OOB connect code 
itself. I'll take a look and see if something leaps out at me - it seems to be 
working fine on IU's odin cluster, which is the only IB-based system I can 
access


On Sep 3, 2013, at 11:34 AM, Rolf vandeVaart 
<rvandeva...@nvidia.com<mailto:rvandeva...@nvidia.com>> wrote:


As mentioned in the weekly conference call, I am seeing some strange errors 
when using the openib BTL.  I have narrowed down the changeset that broke 
things to the ORTE async code.

https://svn.open-mpi.org/trac/ompi/changeset/29058  (and 
https://svn.open-mpi.org/trac/ompi/changeset/29061 which was needed to fix 
compile errors)

Changeset 29057 does not have these issues.  I do not have a very good 
characterization of the failures.  The failures are not consistent.  Sometimes 
they can pass.  Sometimes the stack trace can be different.  They seem to 
happen more with larger np, like np=4 and more.

The first failure mode is a segmentation violation and it always seems to be 
that we are trying to pop something off a free list.  But the upper parts of the 
stack trace can vary.  This is with the trunk version 29061.
Ralph, any thoughts on where we go from here?

[rolf@Fermi-Cluster src]$ mpirun -np 4 -host c0-0,c0-1,c0-3,c0-4 
MPI_Irecv_comm_c
MPITEST info  (0): Starting:  MPI_Irecv_comm:
[compute-0-4:04752] *** Process received signal *** [compute-0-4:04752] Signal: 
Segmentation fault (11) [compute-0-4:04752] Signal code: Address not mapped (1) 
[compute-0-4:04752] Failing at address: 0x28
--
mpirun noticed that process rank 3 with PID 4752 on node c0-4 exited on signal 
11 (Segmentation fault).
--
[rolf@Fermi-Cluster src]$ gdb MPI_Irecv_comm_c core.4752 GNU gdb Fedora 
(6.8-27.el5) Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
Core was generated by `MPI_Irecv_comm_c'.
Program terminated with signal 11, Segmentation fault.
[New process 4753]
[New process 4756]
[New process 4755]
[New process 4754]
[New process 4752]
#0  0x2d6ecad6 in opal_atomic_lifo_pop (lifo=0x5996940) at 
../../../../../opal/class/opal_atomic_lifo.h:111
111 lifo->opal_lifo_head = (opal_list_item_t*)item->opal_list_next;
(gdb) where
#0  0x2d6ecad6 in opal_atomic_lifo_pop (lifo=0x5996940) at 
../../../../../opal/class/opal_atomic_lifo.h:111
#1  0x2d6ec5b4 in __ompi_free_list_wait_mt (fl=0x5996940, 
item=0x40ea8d50) at ../../../../../ompi/class/ompi_free_list.h:228
#2  0x2d6ec3f8 in post_recvs (ep=0x59f3120, qp=0, num_post=256) at 
../../../../../ompi/mca/btl/openib/btl_openib_endpoint.h:361
#3  0x2d6ec1ae in mca_btl_openib_endpoint_post_rr_nolock (ep=0x59f3120, 
qp=0)
at ../../../../../ompi/mca/btl/openib/btl_openib_endpoint.h:405
#4  0x2d6ebfad in mca_btl_openib_endpoint_post_recvs 
(endpoint=0x59f3120)
at ../../../../../ompi/mca/btl/openib/btl_openib_endpoint.c:494
#5  0x2d6fe71c in qp_create_all (endpoint=0x59f3120) at 
../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_oob.c:432
#6  0x2d6fde2b in reply_start_connect (endpoint=0x59f3120, 
rem_info=0x40ea8ed0)
at ../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_oob.c:245
#7  0x2d7006ae in rml_recv_cb (status=0, process_name=0x5b0bb90, 
buffer=0x40ea8f80, tag=102, cbdata=0x0)
at ../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_oob.c:858
#8  0x2ae802454601 in orte_rml_base_process_msg (fd=-1, flags=4, 
cbdata=0x5b0bac0)
at ../../../../orte/mca/rml/base/rml_base_msg_handlers.c:172
#9  0x2ae8027164a1 in event_process_active_single_queue (base=0x58ac620, 
activeq=0x58aa5b0)
at ../../../../../../opal/mca/event/libevent2021/libevent/event.c:1367
#10 0x2ae802716b24 in event_process_active (base=0x58ac620) at 
../../../../../../opal/mca/event/libevent2021/libevent/event.c:1437
#11 0x2ae80271715c in opal_libevent2021_event_base_loop (base=0x58ac620, 
flags=1)
at ../../../../../../opal/mca/event/libevent2021/libevent/event.c:1645
#12 0x2ae8023e7465 in orte_progress_thread_engine (obj=0x2ae8026902c0) at 
../../orte/runtime/orte_init.c:180
#13 0x003ab1e06367 in start_thread () from /l

[OMPI devel] openib BTL problems with ORTE async changes

2013-09-03 Thread Rolf vandeVaart
As mentioned in the weekly conference call, I am seeing some strange errors 
when using the openib BTL.  I have narrowed down the changeset that broke 
things to the ORTE async code.

https://svn.open-mpi.org/trac/ompi/changeset/29058  (and 
https://svn.open-mpi.org/trac/ompi/changeset/29061 which was needed to fix 
compile errors)

Changeset 29057 does not have these issues.  I do not have a very good 
characterization of the failures.  The failures are not consistent.  Sometimes 
they can pass.  Sometimes the stack trace can be different.  They seem to 
happen more with larger np, like np=4 and more.

The first failure mode is a segmentation violation and it always seems to be 
that we are trying to pop something off a free list.  But the upper parts of the 
stack trace can vary.  This is with the trunk version 29061.
Ralph, any thoughts on where we go from here?

[rolf@Fermi-Cluster src]$ mpirun -np 4 -host c0-0,c0-1,c0-3,c0-4 
MPI_Irecv_comm_c
MPITEST info  (0): Starting:  MPI_Irecv_comm:
[compute-0-4:04752] *** Process received signal *** [compute-0-4:04752] Signal: 
Segmentation fault (11) [compute-0-4:04752] Signal code: Address not mapped (1) 
[compute-0-4:04752] Failing at address: 0x28
--
mpirun noticed that process rank 3 with PID 4752 on node c0-4 exited on signal 
11 (Segmentation fault).
--
[rolf@Fermi-Cluster src]$ gdb MPI_Irecv_comm_c core.4752 GNU gdb Fedora 
(6.8-27.el5) Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
Core was generated by `MPI_Irecv_comm_c'.
Program terminated with signal 11, Segmentation fault.
[New process 4753]
[New process 4756]
[New process 4755]
[New process 4754]
[New process 4752]
#0  0x2d6ecad6 in opal_atomic_lifo_pop (lifo=0x5996940) at 
../../../../../opal/class/opal_atomic_lifo.h:111
111 lifo->opal_lifo_head = (opal_list_item_t*)item->opal_list_next;
(gdb) where
#0  0x2d6ecad6 in opal_atomic_lifo_pop (lifo=0x5996940) at 
../../../../../opal/class/opal_atomic_lifo.h:111
#1  0x2d6ec5b4 in __ompi_free_list_wait_mt (fl=0x5996940, 
item=0x40ea8d50) at ../../../../../ompi/class/ompi_free_list.h:228
#2  0x2d6ec3f8 in post_recvs (ep=0x59f3120, qp=0, num_post=256) at 
../../../../../ompi/mca/btl/openib/btl_openib_endpoint.h:361
#3  0x2d6ec1ae in mca_btl_openib_endpoint_post_rr_nolock (ep=0x59f3120, 
qp=0)
at ../../../../../ompi/mca/btl/openib/btl_openib_endpoint.h:405
#4  0x2d6ebfad in mca_btl_openib_endpoint_post_recvs 
(endpoint=0x59f3120)
at ../../../../../ompi/mca/btl/openib/btl_openib_endpoint.c:494
#5  0x2d6fe71c in qp_create_all (endpoint=0x59f3120) at 
../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_oob.c:432
#6  0x2d6fde2b in reply_start_connect (endpoint=0x59f3120, 
rem_info=0x40ea8ed0)
at ../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_oob.c:245
#7  0x2d7006ae in rml_recv_cb (status=0, process_name=0x5b0bb90, 
buffer=0x40ea8f80, tag=102, cbdata=0x0)
at ../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_oob.c:858
#8  0x2ae802454601 in orte_rml_base_process_msg (fd=-1, flags=4, 
cbdata=0x5b0bac0)
at ../../../../orte/mca/rml/base/rml_base_msg_handlers.c:172
#9  0x2ae8027164a1 in event_process_active_single_queue (base=0x58ac620, 
activeq=0x58aa5b0)
at ../../../../../../opal/mca/event/libevent2021/libevent/event.c:1367
#10 0x2ae802716b24 in event_process_active (base=0x58ac620) at 
../../../../../../opal/mca/event/libevent2021/libevent/event.c:1437
#11 0x2ae80271715c in opal_libevent2021_event_base_loop (base=0x58ac620, 
flags=1)
at ../../../../../../opal/mca/event/libevent2021/libevent/event.c:1645
#12 0x2ae8023e7465 in orte_progress_thread_engine (obj=0x2ae8026902c0) at 
../../orte/runtime/orte_init.c:180
#13 0x003ab1e06367 in start_thread () from /lib64/libpthread.so.0
#14 0x003ab16d2f7d in clone () from /lib64/libc.so.6
(gdb)




[OMPI devel] Quick observation - component ignored for 7 years

2013-08-27 Thread Rolf vandeVaart
The ompi/mca/rcache/rb component has been .ompi_ignored for almost 7 years.  
Should we delete it?






Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r29055 - in trunk/ompi/mca: btl btl/smcuda common/cuda pml/ob1

2013-08-23 Thread Rolf vandeVaart
Yes, I agree that the CUDA support is more intrusive and ends up in different 
areas.  The problem is that the changes could not be simply isolated in a BTL.

1. To support the direct movement of GPU buffers, we often utilize copying into 
host memory and then out of host memory.   These copies have to be done 
utilizing the cuMemcpy() functions rather than the memcpy() function.  This is 
why some changes ended up in the opal datatype area.  Most of that copying is 
driven by the convertor (a rough sketch of this dispatch appears below).
2. I added support for doing an asynchronous copy into and out of host buffers. 
 This ended up touching datatype, PML, and BTL layers.
3. A GPU buffer may utilize a different protocol than a HOST buffer within a 
BTL.  This required me to find different ways to direct which PML protocol to 
use.  In addition, it is assumed that a BTL either supports RDMA or not; there 
is no notion of supporting it based on the type of buffer being sent.

Therefore, to leverage much of the existing datatype and PML support, I had to 
make changes in there.  Overall, I agree it is not ideal, but the best I could 
come up with. 
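
To make point 1 above concrete, the kind of dispatch the convertor needs looks
roughly like this (the function name is made up; the CUDA driver calls are the
real ones, and a current CUDA context is assumed):

    #include <cuda.h>
    #include <stdint.h>
    #include <string.h>

    /* Copy that is safe for both host and GPU buffers: use memcpy() only when
     * the source is not device memory.  Sketch only. */
    static void buffer_copy(void *dst, const void *src, size_t len)
    {
        unsigned int mem_type = 0;
        CUresult res = cuPointerGetAttribute(&mem_type,
                                             CU_POINTER_ATTRIBUTE_MEMORY_TYPE,
                                             (CUdeviceptr)(uintptr_t) src);
        if (CUDA_SUCCESS == res && CU_MEMORYTYPE_DEVICE == mem_type) {
            /* GPU memory: must go through the CUDA driver */
            cuMemcpy((CUdeviceptr)(uintptr_t) dst, (CUdeviceptr)(uintptr_t) src, len);
        } else {
            memcpy(dst, src, len);
        }
    }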

>-Original Message-
>From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of George
>Bosilca
>Sent: Friday, August 23, 2013 7:36 AM
>To: Open MPI Developers
>Subject: Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r29055 - in
>trunk/ompi/mca: btl btl/smcuda common/cuda pml/ob1
>
>Rolf,
>
>On Aug 22, 2013, at 19:24 , Rolf vandeVaart <rvandeva...@nvidia.com> wrote:
>
>> Hi George:
>>
>> The reason it tainted the PML is because the CUDA IPC support makes use
>of the large message RDMA protocol of the PML layer.  The smcuda btl starts
>up, but does not initially support any large message RDMA (RGET,RPUT)
>protocols.  Then when a GPU buffer is first accessed, the smcuda btl starts an
>exchange of some control messages with its peer.  If they determine that
>they can support CUDA IPC, then the smcuda calls up into the PML layer and
>says it is OK to start using the large message RDMA.  This all happens in code
>that is only compiled in if the user asks for CUDA-aware support.
>
>The issue is not that is compiled in only when CUDA support is enabled. The
>problem is that it is a breakage of the separation between PML and BTL.
>Almost all BTL did manage to implement highly optimized protocols under this
>design without having to taint the PML. Very similarly to CUDA I can cite CMA
>and KNEM support in the SM BTL. So I really wonder why the CUDA support is
>so different, at the point where it had to go all over the place (convertor,
>memory pool and PML)?
>
>> The key requirement was I wanted to dynamically add the support for CUDA
>IPC when the user first started accessing GPU buffers rather than during
>MPI_Init.
>
>Moving from BTL flags to endpoint based flags is indeed a good thing. This is
>something that should be done everywhere in the PML code, as it will allow
>the BTL to support different behaviors based on the  peer.
>
>George.
>
>> This is the best way I could figure out to accomplish it, but I am open to
>other ideas.
>
>
>
>>
>> Thanks,
>> Rolf
>>
>>> -Original Message-
>>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of George
>>> Bosilca
>>> Sent: Thursday, August 22, 2013 11:32 AM
>>> To: de...@open-mpi.org
>>> Subject: Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r29055 - in
>>> trunk/ompi/mca: btl btl/smcuda common/cuda pml/ob1
>>>
>>> I'm not very keen of seeing BTL modification tainting the PML. I
>>> would have expected support for IPC between GPU must be a BTL-level
>>> decision, no a special path in the PML.
>>>
>>> Is there a reason IPC support cannot be hidden down in the SMCUDA BTL?
>>>
>>> Thanks,
>>>   George.
>>>
>>> On Aug 21, 2013, at 23:00 , svn-commit-mai...@open-mpi.org wrote:
>>>
>>>> Author: rolfv (Rolf Vandevaart)
>>>> Date: 2013-08-21 17:00:09 EDT (Wed, 21 Aug 2013) New Revision: 29055
>>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/29055
>>>>
>>>> Log:
>>>> Fix support in smcuda btl so it does not blow up when there is no
>>>> CUDA IPC
>>> support between two GPUs. Also make it so CUDA IPC support is added
>>> dynamically.
>>>> Fixes ticket 3531.
>>>>
>>>> Added:
>>>>  trunk/ompi/mca/btl/smcuda/README
>>>> Text files modified:
>>>>  trunk/ompi/mca/btl/btl.h | 2
>>>>  trunk/ompi/mca/btl/smcuda/README |   113
>>> ++
>>>>  tru

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r29055 - in trunk/ompi/mca: btl btl/smcuda common/cuda pml/ob1

2013-08-22 Thread Rolf vandeVaart
Hi George:

The reason it tainted the PML is because the CUDA IPC support makes use of the 
large message RDMA protocol of the PML layer.  The smcuda btl starts up, but 
does not initially support any large message RDMA (RGET,RPUT) protocols.  Then 
when a GPU buffer is first accessed, the smcuda btl starts an exchange of some 
control messages with its peer.  If they determine that they can support CUDA 
IPC, then the smcuda calls up into the PML layer and says it is OK to start 
using the large message RDMA.  This all happens in code that is only compiled 
in if the user asks for CUDA-aware support.

The key requirement was I wanted to dynamically add the support for CUDA IPC 
when the user first started accessing GPU buffers rather than during MPI_Init.
This is the best way I could figure out to accomplish it, but I am open to 
other ideas.
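
As a rough illustration of that flow (every name below is made up for the
example; the real code is spread across the smcuda BTL and the PML):

    #include <stdio.h>

    /* Per-peer CUDA IPC state kept on the BTL endpoint (illustrative only). */
    enum ipc_state { IPC_UNKNOWN, IPC_REQUEST_SENT, IPC_READY, IPC_UNSUPPORTED };

    struct endpoint {
        enum ipc_state ipc;   /* starts as IPC_UNKNOWN for every peer           */
        int rdma_enabled;     /* PML uses RGET/RPUT to this peer only when set  */
    };

    /* First time a GPU buffer heads to this peer: start the control-message
     * exchange instead of deciding anything during MPI_Init. */
    static void first_gpu_send(struct endpoint *ep)
    {
        if (IPC_UNKNOWN == ep->ipc) {
            ep->ipc = IPC_REQUEST_SENT;
            printf("send CUDA IPC request control message to peer\n");
        }
    }

    /* Peer has answered whether CUDA IPC works between the two GPUs. */
    static void ipc_reply_received(struct endpoint *ep, int ipc_ok)
    {
        if (ipc_ok) {
            ep->ipc = IPC_READY;
            ep->rdma_enabled = 1;       /* tell the PML large message RDMA is OK */
        } else {
            ep->ipc = IPC_UNSUPPORTED;  /* keep staging GPU data through host    */
        }
    }

    int main(void)
    {
        struct endpoint ep = { IPC_UNKNOWN, 0 };
        first_gpu_send(&ep);
        ipc_reply_received(&ep, 1);
        printf("rdma_enabled=%d\n", ep.rdma_enabled);
        return 0;
    }

The point is that the decision ends up being per endpoint rather than per BTL,
which is why the PML has to be told after the fact.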

Thanks,
Rolf

>-Original Message-
>From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of George
>Bosilca
>Sent: Thursday, August 22, 2013 11:32 AM
>To: de...@open-mpi.org
>Subject: Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r29055 - in
>trunk/ompi/mca: btl btl/smcuda common/cuda pml/ob1
>
>I'm not very keen of seeing BTL modification tainting the PML. I would have
>expected support for IPC between GPU must be a BTL-level decision, no a
>special path in the PML.
>
>Is there a reason IPC support cannot be hidden down in the SMCUDA BTL?
>
>  Thanks,
>George.
>
>On Aug 21, 2013, at 23:00 , svn-commit-mai...@open-mpi.org wrote:
>
>> Author: rolfv (Rolf Vandevaart)
>> Date: 2013-08-21 17:00:09 EDT (Wed, 21 Aug 2013) New Revision: 29055
>> URL: https://svn.open-mpi.org/trac/ompi/changeset/29055
>>
>> Log:
>> Fix support in smcuda btl so it does not blow up when there is no CUDA IPC
>support between two GPUs. Also make it so CUDA IPC support is added
>dynamically.
>> Fixes ticket 3531.
>>
>> Added:
>>   trunk/ompi/mca/btl/smcuda/README
>> Text files modified:
>>   trunk/ompi/mca/btl/btl.h | 2
>>   trunk/ompi/mca/btl/smcuda/README |   113
>++
>>   trunk/ompi/mca/btl/smcuda/btl_smcuda.c   |   104
>
>>   trunk/ompi/mca/btl/smcuda/btl_smcuda.h   |28 +
>>   trunk/ompi/mca/btl/smcuda/btl_smcuda_component.c |   200
>+++
>>   trunk/ompi/mca/btl/smcuda/btl_smcuda_endpoint.h  | 5 +
>>   trunk/ompi/mca/common/cuda/common_cuda.c |29 +
>>   trunk/ompi/mca/common/cuda/common_cuda.h | 3
>>   trunk/ompi/mca/pml/ob1/pml_ob1.c |11 ++
>>   trunk/ompi/mca/pml/ob1/pml_ob1_cuda.c|42 
>>   trunk/ompi/mca/pml/ob1/pml_ob1_recvreq.c | 6
>>   11 files changed, 535 insertions(+), 8 deletions(-)
>
>___
>devel mailing list
>de...@open-mpi.org
>http://www.open-mpi.org/mailman/listinfo.cgi/devel


[OMPI devel] RGET issue when send is less than receive

2013-06-21 Thread Rolf vandeVaart
I ran into a hang in a test in which the sender sends less data than the 
receiver is expecting.  For example, the following shows the receiver expecting 
twice what the sender is sending.

Rank 0: MPI_Send(buf, BUFSIZE, MPI_INT, 1, 99, MPI_COMM_WORLD)
Rank 1: MPI_Recv(buf, BUFSIZE*2, MPI_INT, 0, 99, MPI_COMM_WORLD, &status)
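
A self-contained version of that reproducer (the BUFSIZE value is arbitrary):

    #include <mpi.h>
    #include <stdio.h>
    #define BUFSIZE 4096

    int main(int argc, char **argv)
    {
        static int buf[2 * BUFSIZE];   /* zero-initialized payload */
        int rank, count = 0;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (0 == rank) {
            MPI_Send(buf, BUFSIZE, MPI_INT, 1, 99, MPI_COMM_WORLD);
        } else if (1 == rank) {
            MPI_Recv(buf, 2 * BUFSIZE, MPI_INT, 0, 99, MPI_COMM_WORLD, &status);
            MPI_Get_count(&status, MPI_INT, &count);
            printf("received %d ints\n", count);  /* BUFSIZE, not 2*BUFSIZE */
        }
        MPI_Finalize();
        return 0;
    }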

This is also reproducible using one of the intel tests and adjusting the eager 
value for the openib BTL.

    mpirun -np 2 -host frick,frack -mca btl_openib_eager_limit 56 MPI_Send_overtake_c

In most cases, this works just fine.  However, when the PML protocol used is 
the RGET protocol, the test hangs.   Below is a proposed fix for this issue.
I believe we want to be checking against req_bytes_packed rather than 
req_bytes_expected: req_bytes_expected is the size the user posted for the 
receive, while req_bytes_packed reflects what was actually sent.  Otherwise, 
with the current code, req_bytes_received can never reach req_bytes_expected 
and we never send a FIN message back to the sender.

Any thoughts?

[rvandevaart@sm065 ompi-trunk]$ svn diff ompi/mca/pml/ob1/pml_ob1_recvreq.c
Index: ompi/mca/pml/ob1/pml_ob1_recvreq.c
===
--- ompi/mca/pml/ob1/pml_ob1_recvreq.c(revision 28633)
+++ ompi/mca/pml/ob1/pml_ob1_recvreq.c (working copy)
@@ -335,7 +335,7 @@
 /* is receive request complete */
 OPAL_THREAD_ADD_SIZE_T(&recvreq->req_bytes_received, frag->rdma_length);
-if (recvreq->req_bytes_expected <= recvreq->req_bytes_received) {
+if (recvreq->req_recv.req_bytes_packed <= recvreq->req_bytes_received) {
 mca_pml_ob1_send_fin(recvreq->req_recv.req_base.req_proc,
   bml_btl,
  frag->rdma_hdr.hdr_rget.hdr_des,





[OMPI devel] Build warnings in trunk

2013-05-14 Thread Rolf vandeVaart
I have noticed several warnings while building the trunk.   Feel free to fix 
anything that you are familiar with.  

CC sys_limits.lo
../../../opal/util/sys_limits.c: In function 'opal_util_init_sys_limits':
../../../opal/util/sys_limits.c:107:20: warning: 'lim' may be used 
uninitialized in this function

  CC mca_base_param.lo
../../../../opal/mca/base/mca_base_param.c: In function 'register_param':
../../../../opal/mca/base/mca_base_param.c:113:25: warning: 'var_type' may be 
used uninitialized in this function

  CC mca_base_var.lo
../../../../opal/mca/base/mca_base_var.c: In function 'var_set_from_string':
../../../../opal/mca/base/mca_base_var.c:797:14: warning: 'int_value' may be 
used uninitialized in this function
../../../../opal/mca/base/mca_base_var.c: In function 'mca_base_var_dump':
../../../../opal/mca/base/mca_base_var.c:2016:27: warning: 'original' may be 
used uninitialized in this function
../../../../opal/mca/base/mca_base_var.c:2015:30: warning: 'synonyms' may be 
used uninitialized in this function
../../../../opal/mca/base/mca_base_var.c:2018:17: warning: 'type_string' may be 
used uninitialized in this function

  CC runtime/opal_info_support.lo
../../opal/runtime/opal_info_support.c: In function 
'opal_info_register_project_frameworks':
../../opal/runtime/opal_info_support.c:241:12: warning: 'rc' may be used 
uninitialized in this function

  CC base/oob_base_init.lo
../../../../orte/mca/oob/base/oob_base_init.c: In function 'mca_oob_base_init':
../../../../orte/mca/oob/base/oob_base_init.c:55:43: warning: 's_component' may 
be used uninitialized in this function

  CC ras_slurm_module.lo
../../../../../orte/mca/ras/slurm/ras_slurm_module.c: In function 'init':
../../../../../orte/mca/ras/slurm/ras_slurm_module.c:143:11: warning: 
'slurm_host' may be used uninitialized in this function
../../../../../orte/mca/ras/slurm/ras_slurm_module.c:144:14: warning: 'port' 
may be used uninitialized in this function
../../../../../orte/mca/ras/slurm/ras_slurm_module.c: In function 'recv_data':
../../../../../orte/mca/ras/slurm/ras_slurm_module.c:742:31: warning: 'jtrk' 
may be used uninitialized in this function
../../../../../orte/mca/ras/slurm/ras_slurm_module.c:740:17: warning: 'idx' may 
be used uninitialized in this function
../../../../../orte/mca/ras/slurm/ras_slurm_module.c:740:22: warning: 'sjob' 
may be used uninitialized in this function
../../../../../orte/mca/ras/slurm/ras_slurm_module.c:741:20: warning: 
'nodelist' may be used uninitialized in this function

  CC rmaps_lama_params.lo
../../../../../orte/mca/rmaps/lama/rmaps_lama_params.c: In function 
'rmaps_lama_ok_to_prune_level':
../../../../../orte/mca/rmaps/lama/rmaps_lama_params.c:789:19: warning: 
comparison between 'rmaps_lama_order_type_t' and 'rmaps_lama_level_type_t'
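
Most of these are the usual "may be used uninitialized" pattern, where a
variable is only assigned on some branches.  The typical cleanup is simply to
give it a defined value at the declaration, e.g. (illustrative only, not the
actual fix for any particular file above):

    -    int rc;
    +    int rc = OPAL_SUCCESS;   /* defined on every path, silences the warning */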



Re: [OMPI devel] mpirun -host does not work from r27879 and forward on trunk

2013-01-31 Thread Rolf vandeVaart
Ralph and I talked off-list about the issue.  He figured it out and fixed it 
with changeset r27955.
See that changeset for the details.

>-Original Message-
>From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org]
>On Behalf Of Ralph Castain
>Sent: Thursday, January 31, 2013 11:51 AM
>To: Open MPI Developers
>Subject: Re: [OMPI devel] mpirun -host does not work from r27879 and
>forward on trunk
>
>Yes - no hostfile and no RM allocation, just -host.
>
>What is your setup?
>
>On Jan 31, 2013, at 8:44 AM, Rolf vandeVaart <rvandeva...@nvidia.com>
>wrote:
>
>> Interesting.  Yes, I was saying that the latest trunk does not work for me.  
>> I
>just retested the trunk also, and no luck.
>> Are you launching the MPI processes on remote nodes from the HNP?
>>
>>> -Original Message-
>>> From: devel-boun...@open-mpi.org [mailto:devel-bounces@open-
>mpi.org]
>>> On Behalf Of Ralph Castain
>>> Sent: Thursday, January 31, 2013 11:40 AM
>>> To: Open MPI Developers
>>> Subject: Re: [OMPI devel] mpirun -host does not work from r27879 and
>>> forward on trunk
>>>
>>> FWIW: I just tried it on the trunk head and it worked fine
>>>
>>> On Jan 31, 2013, at 8:20 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>
>>>> Ummm...that was fixed a long time ago. You might try a later version.
>>>>
>>>> Or are you saying the head of the trunk doesn't work too?
>>>>
>>>> On Jan 31, 2013, at 7:31 AM, Rolf vandeVaart
>>>> <rvandeva...@nvidia.com>
>>> wrote:
>>>>
>>>>> I have stumbled into a problem with the -host argument.  This
>>>>> problem
>>> appears to be introduced with changeset r27879 on 1/19/2013 by rhc.
>>>>>
>>>>> With r27877, things work:
>>>>> [rolf@node]$ which mpirun
>>>>> /home/rolf/ompi-trunk-r27877/64/bin/mpirun
>>>>> [rolf@node]$ mpirun -np 2 -host c0-0,c0-3 hostname
>>>>> c0-3
>>>>> c0-0
>>>>>
>>>>> With r27879, things are broken:
>>>>> [rolf@node]$ setenv PATH
>>>>> /home/rolf/ompi-trunk-r27879/64/bin:${PATH}
>>>>> [rolf@node]$ mpirun -np 2 -host c0-0,c0-3 hostname
>>>>> --------------------------------------------------------------------------
>>>>> All nodes which are allocated for this job are already filled.
>>>>> --------------------------------------------------------------------------
>>>>> [rolf@Fermi-Cluster nv]$
>>>>>
>>>>> Note: Could not compile r27878 so did not test that.
>>>>>
>>>>> I can only run processes on the same node as mpirun.
>>>>>
>>>>>
>>>>> ___
>>>>> devel mailing list
>>>>> de...@open-mpi.org
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>
>>>
>>>
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>
>___
>devel mailing list
>de...@open-mpi.org
>http://www.open-mpi.org/mailman/listinfo.cgi/devel



Re: [OMPI devel] mpirun -host does not work from r27879 and forward on trunk

2013-01-31 Thread Rolf vandeVaart
Interesting.  Yes, I was saying that the latest trunk does not work for me.  I 
just retested the trunk also, and no luck.
Are you launching the MPI processes on remote nodes from the HNP?

>-Original Message-
>From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org]
>On Behalf Of Ralph Castain
>Sent: Thursday, January 31, 2013 11:40 AM
>To: Open MPI Developers
>Subject: Re: [OMPI devel] mpirun -host does not work from r27879 and
>forward on trunk
>
>FWIW: I just tried it on the trunk head and it worked fine
>
>On Jan 31, 2013, at 8:20 AM, Ralph Castain <r...@open-mpi.org> wrote:
>
>> Ummm...that was fixed a long time ago. You might try a later version.
>>
>> Or are you saying the head of the trunk doesn't work too?
>>
>> On Jan 31, 2013, at 7:31 AM, Rolf vandeVaart <rvandeva...@nvidia.com>
>wrote:
>>
>>> I have stumbled into a problem with the -host argument.  This problem
>appears to be introduced with changeset r27879 on 1/19/2013 by rhc.
>>>
>>> With r27877, things work:
>>> [rolf@node]$ which mpirun
>>> /home/rolf/ompi-trunk-r27877/64/bin/mpirun
>>> [rolf@node]$ mpirun -np 2 -host c0-0,c0-3 hostname
>>> c0-3
>>> c0-0
>>>
>>> With r27879, things are broken:
>>> [rolf@node]$ setenv PATH /home/rolf/ompi-trunk-r27879/64/bin:${PATH}
>>> [rolf@node]$ mpirun -np 2 -host c0-0,c0-3 hostname
>>> --------------------------------------------------------------------------
>>> All nodes which are allocated for this job are already filled.
>>> --------------------------------------------------------------------------
>>> [rolf@Fermi-Cluster nv]$
>>>
>>> Note: Could not compile r27878 so did not test that.
>>>
>>> I can only run processes on the same node as mpirun.
>>>
>>>
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>
>
>___
>devel mailing list
>de...@open-mpi.org
>http://www.open-mpi.org/mailman/listinfo.cgi/devel



[OMPI devel] mpirun -host does not work from r27879 and forward on trunk

2013-01-31 Thread Rolf vandeVaart
I have stumbled into a problem with the -host argument.  This problem appears 
to be introduced with changeset r27879 on 1/19/2013 by rhc.  

With r27877, things work:
[rolf@node]$ which mpirun
/home/rolf/ompi-trunk-r27877/64/bin/mpirun
[rolf@node]$ mpirun -np 2 -host c0-0,c0-3 hostname
c0-3
c0-0

With r27879, things are broken:
[rolf@node]$ setenv PATH /home/rolf/ompi-trunk-r27879/64/bin:${PATH}
[rolf@node]$ mpirun -np 2 -host c0-0,c0-3 hostname
--
All nodes which are allocated for this job are already filled.
--
[rolf@Fermi-Cluster nv]$ 

Note: Could not compile r27878 so did not test that.

I can only run processes on the same node as mpirun.




Re: [OMPI devel] CUDA support doesn't work starting from 1.9a1r27862

2013-01-24 Thread Rolf vandeVaart
Thanks for this report.  I will look into this.  Can you tell me what your 
mpirun command looked like and do you know what transport you are running over?
Specifically, is this on a single node or multiple nodes?

Rolf

From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On Behalf 
Of Alessandro Fanfarillo
Sent: Thursday, January 24, 2013 4:11 AM
To: de...@open-mpi.org
Subject: [OMPI devel] CUDA support doesn't work starting from 1.9a1r27862

Dear all,
I would like to report a bug for the CUDA support on the last 5 trunk versions.
The attached code is a simply send/receive test case which correctly works with 
version 1.9a1r27844.
Starting from version 1.9a1r27862 up to 1.9a1r27897 I get the following message:

./test: symbol lookup error: /usr/local/openmpi/lib/openmpi/mca_pml_ob1.so: 
undefined symbol: progress_one_cuda_htod_event
./test: symbol lookup error: /usr/local/openmpi/lib/openmpi/mca_pml_ob1.so: 
undefined symbol: progress_one_cuda_htod_event
--
mpirun has exited due to process rank 0 with PID 21641 on
node ip-10-16-24-100 exiting improperly. There are three reasons this could 
occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

3. this process called "MPI_Abort" or "orte_abort" and the mca parameter
orte_create_session_dirs is set to false. In this case, the run-time cannot
detect that the abort call was an abnormal termination. Hence, the only
error message you will receive is this one.

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).

You can avoid this message by specifying -quiet on the mpirun command line.

-
I'm using gcc-4.7.2 and CUDA 4.2. The test fails also with CUDA 4.1.
Thanks in advance.

Best regards.

Alessandro Fanfarillo





[OMPI devel] RFC: Support for asynchronous copies of GPU buffers over IB

2012-12-17 Thread Rolf vandeVaart
[I sent this out in June, but did not commit it.  So resending.  Timeout of Jan 
5, 2012.  Note that this does not use the GPU Direct RDMA]
WHAT: Add support for doing asynchronous copies of GPU memory with larger 
messages.
WHY: Improve performance for sending/receiving of larger GPU messages over IB
WHERE: ob1, openib, and convertor code. All is protected by compiler directives
   so no effect on non-CUDA builds.
REFERENCE BRANCH: https://bitbucket.org/rolfv/ompi-trunk-cuda-async-2
DETAILS:
When sending/receiving GPU memory through IB, all data first passes into host
memory.  The copy of GPU memory into and out of the host memory can be done
asynchronously to improve performance.  This RFC adds that feature for the
fragments of larger messages.

On the sending side, the completion function is essentially broken in two.  The
first function is called when the copy completes, which then initiates the
send.  When the send completes, the second function is called.

Likewise, on the receiving side, a callback is called when the fragment arrives,
which initiates the copy of the data out of the buffer.  When the copy
completes, a second function is called which also calls back into the BTL so it
can free resources that were being used.
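To make the "completion function broken in two" idea concrete, here is a minimal
stand-alone sketch of the send side (not the actual ob1/openib code; frag_t,
btl_send, and the callback names are invented for illustration).  The
device-to-host copy is started asynchronously, an event is recorded behind it,
the network send is only initiated from the copy-complete callback, and a second
callback fires when the send itself finishes.  The receive side mirrors this:
the fragment callback starts the copy out of the staging buffer, and a second
function lets the BTL free its resources once that copy completes.

#include <cuda_runtime.h>
#include <stdio.h>

/* Hypothetical fragment descriptor -- names are made up for illustration. */
typedef struct frag {
    void        *gpu_buf;      /* source GPU memory                      */
    void        *host_buf;     /* pinned staging buffer in host memory   */
    size_t       len;
    cudaEvent_t  copy_done;    /* signals that the D2H copy has finished */
} frag_t;

/* Second half of the completion: called when the network send finishes. */
static void send_complete_cb(frag_t *f)
{
    /* return host_buf to the free list, mark the MPI request complete, ... */
    printf("send of %zu bytes complete\n", f->len);
}

/* First half of the completion: called once the async copy has finished;
 * only now is it safe to hand host_buf to the BTL for sending.           */
static void copy_complete_cb(frag_t *f)
{
    /* btl_send(f->host_buf, f->len, send_complete_cb, f);  <- placeholder */
    send_complete_cb(f);
}

/* Start the asynchronous GPU->host copy and record an event behind it.   */
static int start_frag_send(frag_t *f, cudaStream_t stream)
{
    if (cudaMemcpyAsync(f->host_buf, f->gpu_buf, f->len,
                        cudaMemcpyDeviceToHost, stream) != cudaSuccess) {
        return -1;
    }
    return (cudaEventRecord(f->copy_done, stream) == cudaSuccess) ? 0 : -1;
}

int main(void)
{
    frag_t f;
    cudaStream_t stream;

    f.len = 1 << 20;
    if (cudaMalloc(&f.gpu_buf, f.len)      != cudaSuccess) return 1;
    if (cudaMallocHost(&f.host_buf, f.len) != cudaSuccess) return 1;
    cudaEventCreateWithFlags(&f.copy_done, cudaEventDisableTiming);
    cudaStreamCreate(&stream);

    if (start_frag_send(&f, stream) == 0) {
        /* A real progress loop would poll this instead of spinning. */
        while (cudaEventQuery(f.copy_done) == cudaErrorNotReady) { }
        copy_complete_cb(&f);
    }
    return 0;
}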
M opal/datatype/opal_datatype_copy.c
M opal/datatype/opal_convertor.c
M opal/datatype/opal_convertor.h
M opal/datatype/opal_datatype_cuda.c
M opal/datatype/opal_datatype_cuda.h
M opal/datatype/opal_datatype_unpack.c
M opal/datatype/opal_datatype_pack.h
M opal/datatype/opal_datatype_unpack.h
M ompi/mca/btl/btl.h
M ompi/mca/btl/openib/btl_openib_component.c
M ompi/mca/btl/openib/btl_openib.c
M ompi/mca/btl/openib/btl_openib.h
M ompi/mca/btl/openib/btl_openib_mca.c
M ompi/mca/pml/ob1/pml_ob1_recvfrag.c
M ompi/mca/pml/ob1/pml_ob1_sendreq.c
M ompi/mca/pml/ob1/pml_ob1_progress.c
M ompi/mca/pml/ob1/pml_ob1_recvreq.c
M ompi/mca/pml/ob1/pml_ob1_cuda.c
M ompi/mca/pml/ob1/pml_ob1_recvreq.h



Re: [OMPI devel] OpenMPI CUDA 5 readiness?

2012-09-04 Thread Rolf vandeVaart
Hello Dimitry:

Thanks for the information.  I am hoping the VampirTrace folks can comment on 
this since the issue seems to be in areas of the code they work in.


>-Original Message-
>From: Dmitry N. Mikushin [mailto:maemar...@gmail.com]
>Sent: Monday, September 03, 2012 1:37 PM
>To: Rolf vandeVaart
>Cc: de...@open-mpi.org
>Subject: Re: OpenMPI CUDA 5 readiness?
>
>CUDA 5 basically changes char* to void* in some functions. Attached is a small
>patch which changes prototypes, depending on used CUDA version. Tested
>with CUDA 5 preview and 4.2.
>
>- D.
>
>2012/9/2 Dmitry N. Mikushin <maemar...@gmail.com>:
>> Dear Rolf,
>>
>> FYI, looks like with CUDA 5 preview OpenMPI trunk fails to build due
>> to the following errors:
>>
>> $ svn info
>> Path: .
>> URL: http://svn.open-mpi.org/svn/ompi/trunk
>> Repository Root: http://svn.open-mpi.org/svn/ompi Repository UUID:
>> 63e3feb5-37d5-0310-a306-e8a459e722fe
>> Revision: 27216
>> Node Kind: directory
>> Schedule: normal
>> Last Changed Author: alekseys
>> Last Changed Rev: 27216
>> Last Changed Date: 2012-09-02 15:17:49 +0200 (Sun, 02 Sep 2012)
>>
>> $ ../configure --prefix=$RPM_BUILD_ROOT/opt/kernelgen
>> --disable-mpi-interface-warning --with-cuda=/opt/cuda
>> --with-cuda-libdir=/usr/lib
>>
>> $ make -j8
>> ...
>> ../../../../../../ompi/contrib/vt/vt/vtlib/vt_cudartwrap.c:145:14:
>> error: conflicting types for 'cudaGetSymbolAddress'
>> /usr/local/cuda/include/cuda_runtime_api.h:4261:39: note: previous
>> declaration of 'cudaGetSymbolAddress' was here
>> ../../../../../../ompi/contrib/vt/vt/vtlib/vt_cudartwrap.c:164:14:
>> error: conflicting types for 'cudaGetSymbolSize'
>> /usr/local/cuda/include/cuda_runtime_api.h:4283:39: note: previous
>> declaration of 'cudaGetSymbolSize' was here
>> ../../../../../../ompi/contrib/vt/vt/vtlib/vt_cudartwrap.c:392:14:
>> error: conflicting types for 'cudaGetTextureReference'
>> /usr/local/cuda/include/cuda_runtime_api.h:5060:39: note: previous
>> declaration of 'cudaGetTextureReference' was here
>> ../../../../../../ompi/contrib/vt/vt/vtlib/vt_cudartwrap.c:501:14:
>> error: conflicting types for 'cudaFuncGetAttributes'
>> /usr/local/cuda/include/cuda_runtime_api.h:2242:58: note: previous
>> declaration of 'cudaFuncGetAttributes' was here
>> ../../../../../../ompi/contrib/vt/vt/vtlib/vt_cudartwrap.c:969:14:
>> error: conflicting types for 'cudaGetSurfaceReference'
>> /usr/local/cuda/include/cuda_runtime_api.h:5112:39: note: previous
>> declaration of 'cudaGetSurfaceReference' was here
>> ../../../../../../ompi/contrib/vt/vt/vtlib/vt_cudartwrap.c:1565:14:
>> error: conflicting types for 'cudaFuncSetSharedMemConfig'
>> /usr/local/cuda/include/cuda_runtime_api.h:2173:39: note: previous
>> declaration of 'cudaFuncSetSharedMemConfig' was here
>> ../../../../../../ompi/contrib/vt/vt/vtlib/vt_cudart.c:2294:14: error:
>> conflicting types for 'cudaMemcpyToSymbol'
>> /usr/local/cuda/include/cuda_runtime_api.h:3608:39: note: previous
>> declaration of 'cudaMemcpyToSymbol' was here
>> ../../../../../../ompi/contrib/vt/vt/vtlib/vt_cudart.c:2310:14: error:
>> conflicting types for 'cudaMemcpyFromSymbol'
>> /usr/local/cuda/include/cuda_runtime_api.h:3643:39: note: previous
>> declaration of 'cudaMemcpyFromSymbol' was here
>> ../../../../../../ompi/contrib/vt/vt/vtlib/vt_cudart.c:2423:14: error:
>> conflicting types for 'cudaMemcpyToSymbolAsync'
>> /usr/local/cuda/include/cuda_runtime_api.h:3990:39: note: previous
>> declaration of 'cudaMemcpyToSymbolAsync' was here
>> ../../../../../../ompi/contrib/vt/vt/vtlib/vt_cudart.c:2439:14: error:
>> conflicting types for 'cudaMemcpyFromSymbolAsync'
>> /usr/local/cuda/include/cuda_runtime_api.h:4032:39: note: previous
>> declaration of 'cudaMemcpyFromSymbolAsync' was here
>> ../../../../../../ompi/contrib/vt/vt/vtlib/vt_cudart.c:2534:14: error:
>> conflicting types for 'cudaLaunch'
>> /usr/local/cuda/include/cuda_runtime_api.h:2209:39: note: previous
>> declaration of 'cudaLaunch' was here
>>
>> Best regards,
>> - Dima.



Re: [OMPI devel] The hostfile option

2012-07-31 Thread Rolf vandeVaart
>-Original Message-
>From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org]
>On Behalf Of Ralph Castain
>Sent: Monday, July 30, 2012 9:29 AM
>To: Open MPI Developers
>Subject: Re: [OMPI devel] The hostfile option
>
>
>On Jul 30, 2012, at 2:37 AM, George Bosilca wrote:
>
>> I think that as long as there is a single home area per cluster the 
>> difference
>between the different approaches might seem irrelevant to most of the
>people.
>
>Yeah, I agree - after thinking about it, it probably didn't accomplish much.
>
>>
>> My problem is twofold. First, I have a common home area across several
>different development clusters. Thus I have direct access through ssh to any
>machine. If I create a single large machinefile, it turns out that every mpirun
>will spawn a daemon on every single node, even if I only run a ping-pong test.
>
>That shouldn't happen if you specify the hosts you want to use, either via -
>host or -hostfile. I assume you are specifying nothing and so you get that
>behavior?
>
>> Second, while I usually run my apps on the same set of resources I need on
>a regular base to switch my nodes for few tests.
>>
>> What I was hoping to achieve is a machinefile containing the "default"
>development cluster (aka. the cluster where I'm almost alone so my deamons
>have minimal chances to disturb other people experiences), and then use a
>machinefile to sporadicly change the cluster where I run for smaller tests.
>Unfortunately, this doesn't work due to the filtering behavior described in my
>original email.
>
>Why not just set the default hostfile to point to the new machinefile via the 
>"-
>-default-hostfile foo" option to mpirun, or you can use the corresponding
>MCA param?
>
>I'm not trying to re-open the hostfile discussion, but I would be interested to
>hear how you feel -hostfile should work. I kinda gather you feel it should
>override the default hostfile instead of filter it, yes? My point being that I
>don't particularly know if anyone would disagree with that behavior, so we
>might decide to modify things if you want to propose it.
>
>Ralph
>

I wrote up the whole description in the wiki a long while ago because there was
a lot of confusion about how things should behave with a resource manager.  The
general consensus was that folks thought of hostfile and host as filters when
running under a resource manager.

I never wrote anything about the case you are describing, where the given
hostfile filters the default hostfile.  I would have assumed that the hostfile
precedence you want would be the way things work.  Therefore, I am fine with
changing the behavior with respect to the default hostfile and hostfile.

The wiki reference is here: https://svn.open-mpi.org/trac/ompi/wiki/HostFilePlan


>>
>>
>> On Jul 28, 2012, at 19:24 , Ralph Castain wrote:
>>
>>> It's been awhile, but I vaguely remember the discussion. IIRC, the rationale
>was that the default hostfile was equivalent to an RM allocation and should be
>treated the same. So hostfile and -host become filters in that case.
>>>
>>> FWIW, I believe the discussion was split on that question. I added a "none"
>option to the default hostfile MCA param so it would be ignored in the case
>where (a) the sys admin has given a default hostfile, but (b) someone wants
>to use hosts outside of it.
>>>
>>>   MCA orte: parameter "orte_default_hostfile" (current value:
>, data source: default value)
>>> Name of the default hostfile (relative or absolute 
>>> path, "none"
>to ignore environmental or default MCA param setting)
>>>
>>> That said, I can see a use-case argument for behaving somewhat
>differently. We've even had cases where users have gotten an allocation from
>an RM, but want to add hosts that are external to the cluster to the job.
>>>
>>> It would be rather trivial to modify the logic:
>>>
>>> 1. read the default hostfile or RM allocation for our baseline
>>>
>>> 2. remove any hosts on that list that are *not* in the given hostfile
>>>
>>> 3. add any hosts that are in the given hostfile, but weren't in the default
>hostfile
>>>
>>> And subsequently do the same for -host. I think that would retain the spirit
>of the discussion, but provide more flexibility and provide a tad more
>"expected" behavior.
>>>
>>> I don't have an iron in this fire as I don't use hostfiles, so I'm happy to
>implement whatever the community would like to see.
>>> Ralph
>>>
>>> On Jul 27, 2012, at 6:30 PM, George Bosilca wrote:
>>>
 I'm somewhat puzzled by the behavior of the -hostfile in Open MPI.
>Based on the FAQ it is supposed to provide a list of resources to be used by
>the launcher (in my case ssh) to start the processes. Make sense so far.

 However, if the configuration file contain a value for
>orte_default_hostfile, then the behavior of the hostfile option change
>drastically, and the option become a filter (the machines must be on the
>original list or a cryptic error message is displayed).

[OMPI devel] FW: add asynchronous copies for large GPU buffers

2012-07-10 Thread Rolf vandeVaart
Adding a timeout to this RFC.

TIMEOUT: July 17, 2012

rvandeva...@nvidia.com
781-275-5358

-Original Message-
From: Rolf vandeVaart 
Sent: Wednesday, June 27, 2012 6:13 PM
To: de...@open-mpi.org
Subject: RFC: add asynchronous copies for large GPU buffers

WHAT: Add support for doing asynchronous copies of GPU memory with larger 
messages.
WHY: Improve performance for sending/receiving of larger GPU messages over IB
WHERE: ob1, openib, and convertor code.  All is protected by compiler directives
   so no effect on non-CUDA builds.
REFERENCE BRANCH: https://bitbucket.org/rolfv/ompi-trunk-cuda-async

DETAILS:
When sending/receiving GPU memory through IB, all data first passes into host 
memory.
The copy of GPU memory into and out of the host memory can be done 
asynchronously to improve performance.  This RFC adds that feature for the 
fragments of larger messages.

On the sending side, the completion function is essentially broken in two.  The 
first function is called when the copy completes which then initiates the send. 
 When the send completes, the second function is called.

Likewise, on the receiving side, a callback is called when the fragment arrives 
which initiates the copy of the data out of the buffer.  When the copy 
completes, a second function is called which also calls back into the BTL so it 
can free resources that were being used.

M   opal/datatype/opal_datatype_copy.c
M   opal/datatype/opal_convertor.c
M   opal/datatype/opal_convertor.h
M   opal/datatype/opal_datatype_cuda.c
M   opal/datatype/opal_datatype_cuda.h
M   opal/datatype/opal_datatype_unpack.c
M   opal/datatype/opal_datatype_pack.h
M   opal/datatype/opal_datatype_unpack.h
M   ompi/mca/btl/btl.h
M   ompi/mca/btl/openib/btl_openib_component.c
M   ompi/mca/btl/openib/btl_openib.c
M   ompi/mca/btl/openib/btl_openib.h
M   ompi/mca/btl/openib/btl_openib_mca.c
M   ompi/mca/pml/ob1/pml_ob1_recvfrag.c
M   ompi/mca/pml/ob1/pml_ob1_sendreq.c
M   ompi/mca/pml/ob1/pml_ob1_progress.c
M   ompi/mca/pml/ob1/pml_ob1_recvreq.c
M   ompi/mca/pml/ob1/pml_ob1_cuda.c
M   ompi/mca/pml/ob1/pml_ob1_recvreq.h



Re: [OMPI devel] RFC: add asynchronous copies for large GPU buffers

2012-06-27 Thread Rolf vandeVaart
Whoops.  Fixed.

Rolf

>-Original Message-
>From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org]
>On Behalf Of Nathan Hjelm
>Sent: Wednesday, June 27, 2012 6:20 PM
>To: Open MPI Developers
>Subject: Re: [OMPI devel] RFC: add asynchronous copies for large GPU
>buffers
>
>Can you make your repository public or add me to the access list?
>
>-Nathan
>
>On Wed, Jun 27, 2012 at 03:12:34PM -0700, Rolf vandeVaart wrote:
>> WHAT: Add support for doing asynchronous copies of GPU memory with
>larger messages.
>> WHY: Improve performance for sending/receiving of larger GPU messages
>> over IB
>> WHERE: ob1, openib, and convertor code.  All is protected by compiler
>directives
>>so no effect on non-CUDA builds.
>> REFERENCE BRANCH: https://bitbucket.org/rolfv/ompi-trunk-cuda-async
>>
>> DETAILS:
>> When sending/receiving GPU memory through IB, all data first passes into
>host memory.
>> The copy of GPU memory into and out of the host memory can be done
>> asynchronously to improve performance.  This RFC adds that feature for the
>fragments of larger messages.
>>
>> On the sending side, the completion function is essentially broken in
>> two.  The first function is called when the copy completes which then
>> initiates the send.  When the send completes, the second function is called.
>>
>> Likewise, on the receiving side, a callback is called when the
>> fragment arrives which initiates the copy of the data out of the
>> buffer.  When the copy completes, a second function is called which
>> also calls back into the BTL so it can free resources that were being used.
>>
>> M   opal/datatype/opal_datatype_copy.c
>> M   opal/datatype/opal_convertor.c
>> M   opal/datatype/opal_convertor.h
>> M   opal/datatype/opal_datatype_cuda.c
>> M   opal/datatype/opal_datatype_cuda.h
>> M   opal/datatype/opal_datatype_unpack.c
>> M   opal/datatype/opal_datatype_pack.h
>> M   opal/datatype/opal_datatype_unpack.h
>> M   ompi/mca/btl/btl.h
>> M   ompi/mca/btl/openib/btl_openib_component.c
>> M   ompi/mca/btl/openib/btl_openib.c
>> M   ompi/mca/btl/openib/btl_openib.h
>> M   ompi/mca/btl/openib/btl_openib_mca.c
>> M   ompi/mca/pml/ob1/pml_ob1_recvfrag.c
>> M   ompi/mca/pml/ob1/pml_ob1_sendreq.c
>> M   ompi/mca/pml/ob1/pml_ob1_progress.c
>> M   ompi/mca/pml/ob1/pml_ob1_recvreq.c
>> M   ompi/mca/pml/ob1/pml_ob1_cuda.c
>> M   ompi/mca/pml/ob1/pml_ob1_recvreq.h
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>___
>devel mailing list
>de...@open-mpi.org
>http://www.open-mpi.org/mailman/listinfo.cgi/devel



[OMPI devel] RFC: add asynchronous copies for large GPU buffers

2012-06-27 Thread Rolf vandeVaart
WHAT: Add support for doing asynchronous copies of GPU memory with larger 
messages.
WHY: Improve performance for sending/receiving of larger GPU messages over IB
WHERE: ob1, openib, and convertor code.  All is protected by compiler directives
   so no effect on non-CUDA builds.
REFERENCE BRANCH: https://bitbucket.org/rolfv/ompi-trunk-cuda-async

DETAILS:
When sending/receiving GPU memory through IB, all data first passes into host 
memory.
The copy of GPU memory into and out of the host memory can be done 
asynchronously
to improve performance.  This RFC adds that feature for the fragments of larger 
messages.

On the sending side, the completion function is essentially broken in two.  The 
first function
is called when the copy completes which then initiates the send.  When the send 
completes,
the second function is called.

Likewise, on the receiving side, a callback is called when the fragment arrives 
which 
initiates the copy of the data out of the buffer.  When the copy completes, a 
second
function is called which also calls back into the BTL so it can free resources 
that
were being used.

M   opal/datatype/opal_datatype_copy.c
M   opal/datatype/opal_convertor.c
M   opal/datatype/opal_convertor.h
M   opal/datatype/opal_datatype_cuda.c
M   opal/datatype/opal_datatype_cuda.h
M   opal/datatype/opal_datatype_unpack.c
M   opal/datatype/opal_datatype_pack.h
M   opal/datatype/opal_datatype_unpack.h
M   ompi/mca/btl/btl.h
M   ompi/mca/btl/openib/btl_openib_component.c
M   ompi/mca/btl/openib/btl_openib.c
M   ompi/mca/btl/openib/btl_openib.h
M   ompi/mca/btl/openib/btl_openib_mca.c
M   ompi/mca/pml/ob1/pml_ob1_recvfrag.c
M   ompi/mca/pml/ob1/pml_ob1_sendreq.c
M   ompi/mca/pml/ob1/pml_ob1_progress.c
M   ompi/mca/pml/ob1/pml_ob1_recvreq.c
M   ompi/mca/pml/ob1/pml_ob1_cuda.c
M   ompi/mca/pml/ob1/pml_ob1_recvreq.h



Re: [OMPI devel] RFC: hide btl segment keys within btl

2012-06-18 Thread Rolf vandeVaart
Hi Nathan:
I downloaded and tried it out.  There were a few issues that I had to work 
through, but finally got things working.
Can you apply this patch to your changes prior to checking things in?

I also would suggest configuring with --enable-picky as there are something 
like 10 warnings generated due to your changes.  And check for tabs.

Otherwise, I think it is good.

Rolf

>-Original Message-
>From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org]
>On Behalf Of George Bosilca
>Sent: Saturday, June 16, 2012 12:49 PM
>To: Open MPI Developers
>Subject: Re: [OMPI devel] RFC: hide btl segment keys within btl
>
>Looks good to me. I would add some checks regarding the number and size of
>the segments and the allocated space (MCA_BTL_SEG_MAX_SIZE) to make
>sure we never hit the corner case where there are too many segments
>compared with the available space. And add a huge comment in the btl.h
>about the fact that mca_btl_base_segment_t should be used with extreme
>care.
>
>  george.
>
>On Jun 14, 2012, at 18:42 , Jeff Squyres wrote:
>
>> This sounds like a good thing to me.  +1
>>
>> On Jun 13, 2012, at 12:58 PM, Nathan Hjelm wrote:
>>
>>> What: hide btl segment keys from PML/OSC code.
>>>
>>> Why: As it stands new BTLs with larger segment keys (smcuda for example)
>require changes in both OSC/rdma as well as the PMLs. This RFC makes will
>make changes in segment keys transparent to all btl users.
>>>
>>> When: The changes are very straight-forward so I am setting the timeout
>for this to June 22, 2012
>>>
>>> Where: See the attached patch or check out the bitbucket
>http://bitbucket.org/hjelmn/ompi-btl-interface-update
>>>
>>> All the relevant PMLs/BTLs + OSC/rdma have been updated with the
>exception of btl/wv. I have also tested the following components:
>>> - ob1
>>> - csum
>>> - bfo
>>> - ugni (now works with MPI one-sides)
>>> - sm
>>> - vader
>>> - openib (in progress)
>>>
>>> Brian and Rolf, please take a look at your components and let me know if I
>screwed anything up.
>>>
>>> -Nathan Hjelm
>>> HPC-3, LANL
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>http://www.cisco.com/web/about/doing_business/legal/cri/
>>
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>
>___
>devel mailing list
>de...@open-mpi.org
>http://www.open-mpi.org/mailman/listinfo.cgi/devel



cuda-fixes.diff
Description: cuda-fixes.diff


[OMPI devel] Modified files after autogen

2012-05-23 Thread Rolf vandeVaart
After doing a fresh checkout of the trunk, and then running autogen, I see this:

M   opal/mca/event/libevent2019/libevent/Makefile.in
M   opal/mca/event/libevent2019/libevent/depcomp
M   opal/mca/event/libevent2019/libevent/include/Makefile.in
M   opal/mca/event/libevent2019/libevent/configure
M   opal/mca/event/libevent2019/libevent/config.guess
M   opal/mca/event/libevent2019/libevent/config.sub
M   opal/mca/event/libevent2019/libevent/missing
M   opal/mca/event/libevent2019/libevent/aclocal.m4
M   opal/mca/event/libevent2019/libevent/install-sh
?   orte/mca/common/Makefile.in
?   orte/mca/common/pmi/Makefile.in

It looks like the autogen that gets run in libevent makes a variety of 
modifications to these files so they showed up as modified under svn.

I am just curious if this is the expected behavior as this seems somewhat new 
to me.

Rolf




Re: [OMPI devel] mca_btl_tcp_alloc

2012-04-04 Thread Rolf vandeVaart
Here is my explanation.  The call to MCA_BTL_TCP_FRAG_ALLOC_EAGER or 
MCA_BTL_TCP_FRAG_ALLOC_MAX allocates a chunk of memory that has space for both 
the fragment as well as any payload.  So, when we do the frag+1, we are setting 
the pointer in the frag to point where the payload of the message lives.  This 
payload contains the PML header information and potentially the user's buffer.  
 So, that allocation is actually returning something like 64K for the eager 
allocation and 128K for the max allocation.  If you look at btl_tcp_component.c 
in the function mca_btl_tcp_component_init() you can see where the eager and 
max free lists are initialized.

In the case of TCP, there are two segments.  The first segment will contain the 
PML header information.   If the buffer being sent (or received) is contiguous, 
then the rest of the space allocated is not used.  Rather, the second segment 
will point to the user's buffer as there is no need to first copy it into a 
buffer.  If the buffer being sent (or received) is non-contiguous, then the 
data is first copied into the allocated space as it needs to be packed.

Does that make sense?
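To make the frag+1 layout concrete, here is a tiny stand-alone sketch of the
same trick (my_frag_t and frag_alloc are invented names, not the OMPI macros):
the struct and its payload come from a single allocation, and frag + 1 is simply
the first byte after the struct.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for mca_btl_tcp_frag_t -- for illustration only. */
typedef struct my_frag {
    size_t seg_len;
    void  *seg_addr;
} my_frag_t;

#define EAGER_LIMIT (64 * 1024)

/* Allocate the fragment struct and its payload in one chunk, the way the
 * free-list items are sized.  "frag + 1" is the first byte *after* the
 * struct, i.e. the start of the payload region.                          */
static my_frag_t *frag_alloc(size_t payload_size)
{
    my_frag_t *frag = malloc(sizeof(*frag) + payload_size);
    if (NULL == frag) {
        return NULL;
    }
    frag->seg_len  = payload_size;
    frag->seg_addr = frag + 1;    /* payload lives right behind the struct */
    return frag;
}

int main(void)
{
    my_frag_t *frag = frag_alloc(EAGER_LIMIT);
    if (NULL == frag) {
        return 1;
    }
    memset(frag->seg_addr, 0, frag->seg_len);  /* safe: space allocated above */
    printf("struct at %p, payload at %p\n", (void *)frag, frag->seg_addr);
    free(frag);
    return 0;
}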


>-Original Message-
>From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org]
>On Behalf Of Alex Margolin
>Sent: Wednesday, April 04, 2012 9:23 AM
>To: Open MPI Developers
>Subject: [OMPI devel] mca_btl_tcp_alloc
>
>Hi,
>
>As I'm working out the bugs in my component I used TCP as reference and
>came across the following:
>In mca_btl_tcp_alloc (openmpi-trunk/ompi/mca/btl/tcp/btl_tcp.c:188) the
>first segment is initialized to point to "frag + 1".
>I don't get it... how/when is this location allocated? Isn't it just after the
>mca_btl_tcp_frag_t structure ends?
>
>Thanks,
>Alex
>
>mca_btl_base_descriptor_t* mca_btl_tcp_alloc(
> struct mca_btl_base_module_t* btl,
> struct mca_btl_base_endpoint_t* endpoint,
> uint8_t order,
> size_t size,
> uint32_t flags)
>{
> mca_btl_tcp_frag_t* frag = NULL;
> int rc;
>
> if(size <= btl->btl_eager_limit) {
> MCA_BTL_TCP_FRAG_ALLOC_EAGER(frag, rc);
> } else if (size <= btl->btl_max_send_size) {
> MCA_BTL_TCP_FRAG_ALLOC_MAX(frag, rc);
> }
> if( OPAL_UNLIKELY(NULL == frag) ) {
> return NULL;
> }
>
> frag->segments[0].seg_len = size;
> frag->segments[0].seg_addr.pval = frag+1;
>
> frag->base.des_src = frag->segments;
> frag->base.des_src_cnt = 1;
> frag->base.des_dst = NULL;
> frag->base.des_dst_cnt = 0;
> frag->base.des_flags = flags;
> frag->base.order = MCA_BTL_NO_ORDER;
> frag->btl = (mca_btl_tcp_module_t*)btl;
> return (mca_btl_base_descriptor_t*)frag; }
>___
>devel mailing list
>de...@open-mpi.org
>http://www.open-mpi.org/mailman/listinfo.cgi/devel



[OMPI devel] memory bind warning with -bind-to-core and -bind-to-socket

2012-03-14 Thread Rolf vandeVaart
I am running a simple test and using the -bind-to-core or -bind-to-socket 
options.  I think the CPU binding is working fine, but I see these warnings 
about not being able to bind to memory.   Is this expected?  This is trunk code 
(266128)

[dt]$ mpirun --report-bindings -np 2 -bind-to-core connectivity_c
--
WARNING: a request was made to bind a process. While the system supports 
binding the process itself, at least one node does NOT support binding memory 
to the process location.

  Node:  dt

This is a warning only; your job will continue, though performance may be 
degraded.
--
[dt:03600] [[52612,0],0] odls:default binding child [[52612,1],1] to cpus 1,5
[dt:03600] [[52612,0],0] odls:default binding child [[52612,1],0] to cpus 0,4
[dt:03601] [[52612,1],0] is bound to cpus 0,4 
[dt:03602] [[52612,1],1] is bound to cpus 1,5
Connectivity test on 2 processes PASSED.

I see this on two different clusters.




Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r26106

2012-03-09 Thread Rolf vandeVaart
[Comment at bottom]
>-Original Message-
>From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org]
>On Behalf Of Nathan Hjelm
>Sent: Friday, March 09, 2012 2:23 PM
>To: Open MPI Developers
>Subject: Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r26106
>
>
>
>On Fri, 9 Mar 2012, Jeffrey Squyres wrote:
>
>> On Mar 9, 2012, at 1:32 PM, Nathan Hjelm wrote:
>>
>>> An mpool that is aware of local processes lru's will solve the problem in
>most cases (all that I have seen)
>>
>> I agree -- don't let words in my emails make you think otherwise.  I think 
>> this
>will fix "most" problems, but undoubtedly, some will still occur.
>>
>> What's your timeline for having this ready -- should it go to 1.5.5, or 1.6?
>>
>> More specifically: if it's immanent, and can go to v1.5, then the openib
>message is irrelevant and should not be used (and backed out of the trunk).  If
>it's going to take a little bit, I'm ok leaving the message in v1.5.5 for now.
>
>I wrote the prototype yesterday (after finding that limiting the lru doesn't
>work for uGNI-- @256 pes we could only register ~1400 item instead of the
>3600 max we saw @128). I should have a version ready for review next week
>and a final version by the end of the month.
>
>
>BTW, can anyone tell me why each mpool defines
>mca_mpool_base_resources_t instead of defining
>mca_mpool_blah_resources_t. The current design makes it impossible to
>support more than one mpool in a btl. I can delete a bunch of code if I can
>make a btl fall back on the rdma mpool if leave_pinned is not set.
>
>-Nathan

I ran into this same issue when I wanted to use more than one mpool in a BTL.
I expected that there might be a base resource structure that each mpool
extends.  I talked with Jeff and he told me (if I recall correctly) that the
reason was that there was no common information in any of the
mca_mpool_base_resources_t structures, so there was no need for a base
structure.  I do not think there is any reason we cannot do it as you suggest.

[The one other place I have seen it done like this in the library is the 
mca_btl_base_endpoint_t which is defined differently for each BTL]
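For what it is worth, the alternative being discussed would look roughly like
the sketch below (hypothetical field names, not a patch): a tiny common base
that each mpool extends with its own resources type, so a single BTL can fill in
resources for more than one mpool.

#include <stddef.h>

/* Today each mpool header defines the *same* struct tag with different
 * contents, so one translation unit cannot use two of them.  A possible
 * alternative -- not current OMPI code -- is a shared base:              */
typedef struct mca_mpool_base_resources_t {
    size_t size;                        /* size of the extended structure  */
} mca_mpool_base_resources_t;

typedef struct mca_mpool_rdma_resources_t {
    mca_mpool_base_resources_t super;   /* base comes first                 */
    void *reg_data;                     /* illustrative rdma-specific field */
} mca_mpool_rdma_resources_t;

typedef struct mca_mpool_cuda_resources_t {
    mca_mpool_base_resources_t super;
    int device_id;                      /* illustrative cuda-specific field */
} mca_mpool_cuda_resources_t;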





Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r26039

2012-02-24 Thread Rolf vandeVaart
Hi Jeff:

It is set in opal/config/opal_configure_options.m4



>-Original Message-
>From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org]
>On Behalf Of Jeffrey Squyres
>Sent: Friday, February 24, 2012 6:07 AM
>To: de...@open-mpi.org
>Subject: Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r26039
>
>Rolf --
>
>In looking at configure.m4, where does $CUDA_SUPPORT_41 get set?
>
>AS_IF([test "x$CUDA_SUPPORT_41" = "x1"]
>
>
>On Feb 23, 2012, at 9:13 PM, ro...@osl.iu.edu wrote:
>
>> Author: rolfv
>> Date: 2012-02-23 21:13:33 EST (Thu, 23 Feb 2012) New Revision: 26039
>> URL: https://svn.open-mpi.org/trac/ompi/changeset/26039
>>
>> Log:
>> New btl that extends sm btl to support GPU transfers within a node.
>> Uses new CUDA IPC support.  Also, a few minor changes in PML to take
>> advantage of it.
>>
>> This code has no effect unless user asks for it explicitly via
>> configure arguments.  Otherwise, it is either #ifdef'ed out or not
>> compiled.
>>
>>
>> Added:
>>   trunk/contrib/check-btl-sm-diffs.pl   (contents, props changed)
>>   trunk/ompi/mca/btl/smcuda/   (props changed)
>>   trunk/ompi/mca/btl/smcuda/Makefile.am
>>   trunk/ompi/mca/btl/smcuda/btl_smcuda.c
>>   trunk/ompi/mca/btl/smcuda/btl_smcuda.h
>>   trunk/ompi/mca/btl/smcuda/btl_smcuda_component.c
>>   trunk/ompi/mca/btl/smcuda/btl_smcuda_endpoint.h
>>   trunk/ompi/mca/btl/smcuda/btl_smcuda_fifo.h
>>   trunk/ompi/mca/btl/smcuda/btl_smcuda_frag.c
>>   trunk/ompi/mca/btl/smcuda/btl_smcuda_frag.h
>>   trunk/ompi/mca/btl/smcuda/configure.m4
>>   trunk/ompi/mca/btl/smcuda/help-mpi-btl-smcuda.txt
>>   trunk/ompi/mca/pml/ob1/pml_ob1_cuda.c
>> Text files modified:
>>   trunk/ompi/mca/btl/btl.h |14 ++
>>   trunk/ompi/mca/pml/ob1/Makefile.am   | 7 +++
>>   trunk/ompi/mca/pml/ob1/pml_ob1_recvreq.c |32
>
>>   trunk/ompi/mca/pml/ob1/pml_ob1_sendreq.c |10 +-
>>   trunk/ompi/mca/pml/ob1/pml_ob1_sendreq.h |17 +++--
>>   5 files changed, 73 insertions(+), 7 deletions(-)
>
>
>--
>Jeff Squyres
>jsquy...@cisco.com
>For corporate legal information go to:
>http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
>___
>devel mailing list
>de...@open-mpi.org
>http://www.open-mpi.org/mailman/listinfo.cgi/devel



Re: [OMPI devel] RFC: Allocate free list payload if free list isn't specified

2012-02-21 Thread Rolf vandeVaart
I think I am OK with this.  

Alternatively, could you have done something like what is done in the TCP BTL,
where the payload and header sizes are added together for the frag size?  To
state it more clearly: you could do something similar to what is done at line
1015 in btl_tcp_component.c and end up with the same result?

This change just makes the payload buffer a separate chunk of memory from the
headers, correct?

I am just trying to understand the motivation for the change.

I think the way you have it is more correct, since it supports the case where
someone specifies the header size and the payload size differently and expects
the free list code to do the right thing.

Rolf

>-Original Message-
>From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org]
>On Behalf Of Nathan Hjelm
>Sent: Tuesday, February 21, 2012 3:59 PM
>To: Open MPI Developers
>Subject: Re: [OMPI devel] RFC: Allocate free list payload if free list isn't
>specified
>
>Opps, screwed up the title. Should be: RFC: Allocate requested free list
>payload even if an mpool isn't specified.
>
>-Nathan
>
>On Tue, 21 Feb 2012, Nathan Hjelm wrote:
>
>> What: Allocate free list payload even if a payload size is specified
>> even if no mpool is specified.
>>
>> When: Thursday, Feb 23, 2012
>>
>> Why: The current behavior is to ignore the payload size if no mpool is
>> specified. I see no reason why a payload buffer should't be allocated
>> in the no mpool case. Thoughts?
>>
>> Patch is attached.
>>
>> -Nathan Hjelm
>> HPC-3, LANL
>>
>___
>devel mailing list
>de...@open-mpi.org
>http://www.open-mpi.org/mailman/listinfo.cgi/devel



Re: [OMPI devel] MVAPICH2 vs Open-MPI

2012-02-14 Thread Rolf vandeVaart
There are several things going on here that make their library perform better.

With respect to inter-node performance, both MVAPICH2 and Open MPI copy the GPU
memory into host memory first.  However, MVAPICH2 uses special host buffers and
a code path that allow it to copy the data asynchronously, so it does a better
job of pipelining than Open MPI.  I believe their host buffers are also bigger,
which works better for larger messages.  Open MPI just piggybacks on the
existing host buffers in the openib BTL and uses synchronous copies.  (There is
hope to improve that.)

Secondly, with respect to intra-node performance, they are using the Inter 
Process Communication feature of CUDA which means that within a node, one can 
move GPU memory directly from one GPU to another.  We have an RFC from December 
to add this into Open MPI as well, but do not have approval yet.  Hopefully 
sometime soon.
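As a rough illustration of the pipelining idea (a minimal sketch with made-up
buffer sizes, not either library's actual code), the device-to-host copy of
chunk i+1 is started before the send of chunk i, so copies and sends overlap
instead of running back to back:

#include <cuda_runtime.h>
#include <mpi.h>
#include <stddef.h>

#define CHUNK (512 * 1024)  /* staging-buffer size -- made up for the example */
#define TAG   99

/* Send 'len' bytes of GPU memory by staging it through two pinned host
 * buffers, overlapping the device-to-host copy of the next chunk with the
 * MPI send of the current one.  Call between MPI_Init/MPI_Finalize with a
 * receiver posting matching MPI_Recv calls of up to CHUNK bytes each.      */
static void pipelined_gpu_send(const char *gpu_buf, size_t len,
                               int dst, MPI_Comm comm, cudaStream_t stream)
{
    char       *stage[2];
    cudaEvent_t done[2];
    size_t      off = 0;
    int         cur = 0;

    for (int i = 0; i < 2; i++) {
        cudaMallocHost((void **)&stage[i], CHUNK);
        cudaEventCreateWithFlags(&done[i], cudaEventDisableTiming);
    }

    /* prime the pipeline with the copy of the first chunk */
    size_t first = (len < CHUNK) ? len : CHUNK;
    cudaMemcpyAsync(stage[cur], gpu_buf, first, cudaMemcpyDeviceToHost, stream);
    cudaEventRecord(done[cur], stream);

    while (off < len) {
        size_t this_len = (len - off < CHUNK) ? (len - off) : CHUNK;
        size_t next_off = off + this_len;

        /* start copying the next chunk while the current one is sent */
        if (next_off < len) {
            size_t next_len = (len - next_off < CHUNK) ? (len - next_off) : CHUNK;
            cudaMemcpyAsync(stage[cur ^ 1], gpu_buf + next_off, next_len,
                            cudaMemcpyDeviceToHost, stream);
            cudaEventRecord(done[cur ^ 1], stream);
        }

        cudaEventSynchronize(done[cur]);     /* wait for this chunk's copy */
        MPI_Send(stage[cur], (int)this_len, MPI_BYTE, dst, TAG, comm);

        off = next_off;
        cur ^= 1;
    }

    for (int i = 0; i < 2; i++) {
        cudaFreeHost(stage[i]);
        cudaEventDestroy(done[i]);
    }
}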

Rolf

>-Original Message-
>From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org]
>On Behalf Of Rayson Ho
>Sent: Tuesday, February 14, 2012 4:16 PM.
>To: Open MPI Developers
>Subject: [OMPI devel] MVAPICH2 vs Open-MPI
>
>See P. 38 - 40, MVAPICH2 outperforms Open-MPI for each test, so is it
>something that they are doing to optimize for CUDA & GPUs and those
>optimizations are not in OMPI, or did they specifically tune MVAPICH2 to
>make it shine??
>
>http://hpcadvisorycouncil.com/events/2012/Israel-
>Workshop/Presentations/7_OSU.pdf
>
>The benchmark package: http://mvapich.cse.ohio-state.edu/benchmarks/
>
>Rayson
>
>=
>Open Grid Scheduler / Grid Engine
>http://gridscheduler.sourceforge.net/
>
>Scalable Grid Engine Support Program
>http://www.scalablelogic.com/
>___
>devel mailing list
>de...@open-mpi.org
>http://www.open-mpi.org/mailman/listinfo.cgi/devel



Re: [OMPI devel] GPUDirect v1 issues

2012-01-17 Thread Rolf vandeVaart
Yes, the step outlined in your second bullet is no longer necessary.

Rolf


From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On Behalf 
Of Sebastian Rinke
Sent: Tuesday, January 17, 2012 5:22 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] GPUDirect v1 issues

Thank you very much. I will try setting the environment variable and if 
required also use the 4.1 RC2 version.

To clarify things a little bit for me, to set up my machine with GPUDirect v1 I 
did the following:

* Install RHEL 5.4
* Use the kernel with GPUDirect support
* Use the MLNX OFED stack with GPUDirect support
* Install the CUDA developer driver

Does using CUDA  >= 4.0  make one of the above steps  redundant?

I.e., RHEL or different kernel or MLNX OFED stack with GPUDirect support is  
not needed any more?

Sebastian.

Rolf vandeVaart wrote:

I ran your test case against Open MPI 1.4.2 and CUDA 4.1 RC2 and it worked 
fine.  I do not have a machine right now where I can load CUDA 4.0 drivers.

Any chance you can try CUDA 4.1 RC2?  There were some improvements in the 
support (you do not need to set an environment variable for one)

 http://developer.nvidia.com/cuda-toolkit-41



There is also a chance that setting the environment variable as outlined in 
this link may help you.

http://forums.nvidia.com/index.php?showtopic=200629



However, I cannot explain why MVAPICH would work and Open MPI would not.



Rolf

-Original Message-
From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org]
On Behalf Of Sebastian Rinke
Sent: Tuesday, January 17, 2012 12:08 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] GPUDirect v1 issues

I use CUDA 4.0 with MVAPICH2 1.5.1p1 and Open MPI 1.4.2.

Attached you find a little test case which is based on the GPUDirect v1 test
case (mpi_pinned.c).
In that program the sender splits a message into chunks and sends them
separately to the receiver which posts the corresponding recvs. It is a kind of
pipelining.

In mpi_pinned.c:141 the offsets into the recv buffer are set.
For the correct offsets, i.e. increasing them, it blocks with Open MPI.

Using line 142 instead (offset = 0) works.

The tarball attached contains a Makefile where you will have to adjust

* CUDA_INC_DIR
* CUDA_LIB_DIR

Sebastian

On Jan 17, 2012, at 4:16 PM, Kenneth A. Lloyd wrote:

Also, which version of MVAPICH2 did you use?

I've been pouring over Rolf's OpenMPI CUDA RDMA 3 (using CUDA 4.1 r2)
vis MVAPICH-GPU on a small 3 node cluster. These are wickedly interesting.

Ken
-Original Message-
From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org]
On Behalf Of Rolf vandeVaart
Sent: Tuesday, January 17, 2012 7:54 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] GPUDirect v1 issues

I am not aware of any issues.  Can you send me a test program and I
can try it out?
Which version of CUDA are you using?

Rolf

-Original Message-
From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org]
On Behalf Of Sebastian Rinke
Sent: Tuesday, January 17, 2012 8:50 AM
To: Open MPI Developers
Subject: [OMPI devel] GPUDirect v1 issues

Dear all,

I'm using GPUDirect v1 with Open MPI 1.4.3 and experience blocking
MPI_SEND/RECV to block forever.

For two subsequent MPI_RECV, it hangs if the recv buffer pointer of
the second recv points to somewhere, i.e. not at the beginning, in
the recv buffer (previously allocated with cudaMallocHost()).

I tried the same with MVAPICH2 and did not see the problem.

Does anybody know about issues with GPUDirect v1 using Open MPI?

Thanks for your help,
Sebastian
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel






___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel





Re: [OMPI devel] GPUDirect v1 issues

2012-01-17 Thread Rolf vandeVaart
I ran your test case against Open MPI 1.4.2 and CUDA 4.1 RC2 and it worked 
fine.  I do not have a machine right now where I can load CUDA 4.0 drivers.
Any chance you can try CUDA 4.1 RC2?  There were some improvements in the 
support (you do not need to set an environment variable for one)
 http://developer.nvidia.com/cuda-toolkit-41

There is also a chance that setting the environment variable as outlined in 
this link may help you.
http://forums.nvidia.com/index.php?showtopic=200629

However, I cannot explain why MVAPICH would work and Open MPI would not.  

Rolf

>-Original Message-
>From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org]
>On Behalf Of Sebastian Rinke
>Sent: Tuesday, January 17, 2012 12:08 PM
>To: Open MPI Developers
>Subject: Re: [OMPI devel] GPUDirect v1 issues
>
>I use CUDA 4.0 with MVAPICH2 1.5.1p1 and Open MPI 1.4.2.
>
>Attached you find a little test case which is based on the GPUDirect v1 test
>case (mpi_pinned.c).
>In that program the sender splits a message into chunks and sends them
>separately to the receiver which posts the corresponding recvs. It is a kind of
>pipelining.
>
>In mpi_pinned.c:141 the offsets into the recv buffer are set.
>For the correct offsets, i.e. increasing them, it blocks with Open MPI.
>
>Using line 142 instead (offset = 0) works.
>
>The tarball attached contains a Makefile where you will have to adjust
>
>* CUDA_INC_DIR
>* CUDA_LIB_DIR
>
>Sebastian
>
>On Jan 17, 2012, at 4:16 PM, Kenneth A. Lloyd wrote:
>
>> Also, which version of MVAPICH2 did you use?
>>
>> I've been pouring over Rolf's OpenMPI CUDA RDMA 3 (using CUDA 4.1 r2)
>> vis MVAPICH-GPU on a small 3 node cluster. These are wickedly interesting.
>>
>> Ken
>> -----Original Message-
>> From: devel-boun...@open-mpi.org [mailto:devel-bounces@open-
>mpi.org]
>> On Behalf Of Rolf vandeVaart
>> Sent: Tuesday, January 17, 2012 7:54 AM
>> To: Open MPI Developers
>> Subject: Re: [OMPI devel] GPUDirect v1 issues
>>
>> I am not aware of any issues.  Can you send me a test program and I
>> can try it out?
>> Which version of CUDA are you using?
>>
>> Rolf
>>
>>> -Original Message-
>>> From: devel-boun...@open-mpi.org [mailto:devel-bounces@open-
>mpi.org]
>>> On Behalf Of Sebastian Rinke
>>> Sent: Tuesday, January 17, 2012 8:50 AM
>>> To: Open MPI Developers
>>> Subject: [OMPI devel] GPUDirect v1 issues
>>>
>>> Dear all,
>>>
>>> I'm using GPUDirect v1 with Open MPI 1.4.3 and experience blocking
>>> MPI_SEND/RECV to block forever.
>>>
>>> For two subsequent MPI_RECV, it hangs if the recv buffer pointer of
>>> the second recv points to somewhere, i.e. not at the beginning, in
>>> the recv buffer (previously allocated with cudaMallocHost()).
>>>
>>> I tried the same with MVAPICH2 and did not see the problem.
>>>
>>> Does anybody know about issues with GPUDirect v1 using Open MPI?
>>>
>>> Thanks for your help,
>>> Sebastian
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel



Re: [OMPI devel] GPUDirect v1 issues

2012-01-17 Thread Rolf vandeVaart
I am not aware of any issues.  Can you send me a test program and I can try it 
out?
Which version of CUDA are you using?

Rolf

>-Original Message-
>From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org]
>On Behalf Of Sebastian Rinke
>Sent: Tuesday, January 17, 2012 8:50 AM
>To: Open MPI Developers
>Subject: [OMPI devel] GPUDirect v1 issues
>
>Dear all,
>
>I'm using GPUDirect v1 with Open MPI 1.4.3 and experience blocking
>MPI_SEND/RECV to block forever.
>
>For two subsequent MPI_RECV, it hangs if the recv buffer pointer of the
>second recv points to somewhere, i.e. not at the beginning, in the recv buffer
>(previously allocated with cudaMallocHost()).
>
>I tried the same with MVAPICH2 and did not see the problem.
>
>Does anybody know about issues with GPUDirect v1 using Open MPI?
>
>Thanks for your help,
>Sebastian
>___
>devel mailing list
>de...@open-mpi.org
>http://www.open-mpi.org/mailman/listinfo.cgi/devel



Re: [OMPI devel] RDMA with non-contiguous payload

2012-01-04 Thread Rolf vandeVaart
Your observations are correct.  If the payload is non-contiguous, then RDMA is 
not used.  The data has to be copied first into an intermediate buffer and then 
sent.
This has not changed in later versions of Open MPI.
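For anyone trying to reproduce the two cases from an application, a contiguous
send (eligible for the RDMA path) versus a non-contiguous one (packed through an
intermediate buffer) can be triggered roughly like this (illustrative snippet,
not tied to a particular BTL):

#include <mpi.h>

/* Contiguous case: a plain buffer of doubles.  Non-contiguous case: every
 * other double, described with MPI_Type_vector -- the convertor has to pack
 * it into an intermediate buffer first, so the RDMA path is not used.       */
static void send_both(double *buf, int n, int dst, MPI_Comm comm)
{
    MPI_Datatype strided;

    MPI_Send(buf, n, MPI_DOUBLE, dst, 0, comm);           /* contiguous     */

    MPI_Type_vector(n / 2, 1, 2, MPI_DOUBLE, &strided);   /* stride of 2    */
    MPI_Type_commit(&strided);
    MPI_Send(buf, 1, strided, dst, 1, comm);              /* non-contiguous */
    MPI_Type_free(&strided);
}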

Rolf  

>-Original Message-
>From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org]
>On Behalf Of Sebastian Rinke
>Sent: Wednesday, January 04, 2012 10:01 AM
>To: Open MPI Developers
>Subject: [OMPI devel] RDMA with non-contiguous payload
>
>Dear all,
>
>Playing around with GPUDirect v1 and Infiniband I noticed that once the
>payload is non-contiguous no RDMA is used at all.
>Can anybody confirm this?
>
>I'm using Open MPI  1.4.3. If the above is true, has this behavior changed with
>later versions of Open MPI?
>
>Thanks a lot.
>
>Best,
>Sebastian
>___
>devel mailing list
>de...@open-mpi.org
>http://www.open-mpi.org/mailman/listinfo.cgi/devel



[OMPI devel] New smcuda BTL that optimizes intra-node GPU to GPU memory transfers

2011-12-09 Thread Rolf vandeVaart
WHAT: Add new sm BTL, and supporting mpools, that can also support CUDA RDMA.

WHY: With CUDA 4.1, there is some GPU IPC support available that we can take 
advantage of to move data efficiently between GPUs within a node.

WHERE: new--> ompi/mca/btl/smcuda, ompi/mca/mpool/cuda, ompi/mca/mpool/rcuda 
Along with a few minor changes in ob1.  These new components are only built if 
explicitly asked for by configure.  Otherwise, new components are not built, 
and there are no changes within normal code paths.
(Jeff's rule: Do no harm)

WHEN: Two weeks from now, December 23, 2011

DETAILS: The transfer of GPU memory between GPUs within a node can be improved
by making use of the IPC support that will soon be available with CUDA 4.1.
These changes take advantage of that support to implement an RDMA GET protocol
for GPU memory.
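
For background, the CUDA 4.1 IPC support works roughly as sketched below: the
owning process exports a handle for a device allocation, and a peer process on
the same node opens that handle to obtain a device pointer it can copy from
directly.  This is only an outline of the underlying CUDA driver calls, not the
smcuda implementation itself; the handle exchange would ride on the BTL's
control messages, and error handling is omitted.

/* Owner side: export an IPC handle for a device allocation.  The handle is
 * a small opaque blob that can be shipped to the peer process. */
#include <cuda.h>
#include <stddef.h>

void export_gpu_buffer(CUdeviceptr src, CUipcMemHandle *handle_out)
{
    cuIpcGetMemHandle(handle_out, src);
}

/* Peer side: open the handle and pull the data with a device-to-device
 * copy (the "GET" direction). */
void get_from_peer(CUipcMemHandle handle, CUdeviceptr dst, size_t nbytes)
{
    CUdeviceptr remote;

    cuIpcOpenMemHandle(&remote, handle, CU_IPC_MEM_LAZY_ENABLE_PEER_ACCESS);
    cuMemcpyDtoD(dst, remote, nbytes);
    cuIpcCloseMemHandle(remote);
}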

To avoid polluting the existing sm BTL, a new one has been created that adds
the RDMA GET support.  In addition, two new memory pools are needed and are
being added: one is very simple, whereas the second is patterned after the
rdma memory pool.

Changes can be viewed at:
https://bitbucket.org/rolfv/ompi-trunk-cuda-rdma-3/changeset/29f3255cd2b8

M   ompi/mca/btl/btl.h
A   ompi/mca/btl/smcuda
A   ompi/mca/btl/smcuda/btl_smcuda_component.c
A   ompi/mca/btl/smcuda/configure.m4
A   ompi/mca/btl/smcuda/btl_smcuda_frag.h
A   ompi/mca/btl/smcuda/help-mpi-btl-smcuda.txt
A   ompi/mca/btl/smcuda/btl_smcuda_endpoint.h
A   ompi/mca/btl/smcuda/btl_smcuda.h
A   ompi/mca/btl/smcuda/btl_smcuda_fifo.h
A   ompi/mca/btl/smcuda/Makefile.am
A   ompi/mca/btl/smcuda/btl_smcuda_frag.c
A   ompi/mca/btl/smcuda/btl_smcuda.c
A   ompi/mca/mpool/cuda
A   ompi/mca/mpool/cuda/configure.m4
A   ompi/mca/mpool/cuda/mpool_cuda_component.c
A   ompi/mca/mpool/cuda/mpool_cuda_module.c
A   ompi/mca/mpool/cuda/mpool_cuda.h
A   ompi/mca/mpool/cuda/Makefile.am
A   ompi/mca/mpool/rcuda
A   ompi/mca/mpool/rcuda/configure.m4
A   ompi/mca/mpool/rcuda/mpool_rcuda_component.c
A   ompi/mca/mpool/rcuda/Makefile.am
A   ompi/mca/mpool/rcuda/mpool_rcuda_module.c
A   ompi/mca/mpool/rcuda/mpool_rcuda.h
M   ompi/mca/common/cuda/configure.m4
M   ompi/mca/common/cuda/common_cuda.c
M   ompi/mca/common/cuda/help-mpi-common-cuda.txt
M   ompi/mca/common/cuda/common_cuda.h
M   ompi/mca/pml/ob1/pml_ob1_sendreq.c
M   ompi/mca/pml/ob1/pml_ob1_sendreq.h
M   ompi/mca/pml/ob1/pml_ob1_recvreq.c
A   ompi/mca/pml/ob1/pml_ob1_cuda.c
M   ompi/mca/pml/ob1/Makefile.am

Rolf

rvandeva...@nvidia.com
781-275-5358




Re: [OMPI devel] RFC: new btl descriptor flags

2011-11-29 Thread Rolf vandeVaart
This may seem trivial, but should we name them:

#define MCA_BTL_DES_FLAGS_PUT 0x0010
#define MCA_BTL_DES_FLAGS_GET 0x0020

Although I see there is some inconsistency in how these flags are named, two of 
the three original ones have "BTL_DES_FLAGS" in them.

Rolf 

rvandeva...@nvidia.com
781-275-5358

>-Original Message-
>From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org]
>On Behalf Of Nathan Hjelm
>Sent: Tuesday, November 29, 2011 12:43 PM
>To: Open MPI Developers
>Subject: [OMPI devel] RFC: new btl descriptor flags
>
>We need an accurate way to detect if prepare_src/prepare_dst are being
>called for a get or a put operation. I propose adding two new flags to the btl
>descriptor (and passing them from ob1/csum/etc):
>
>#define MCA_BTL_DES_PUT 0x0010
>#define MCA_BTL_DES_GET 0x0020
>
>Comments? Suggestions? Objections?
>
>-Nathan
>___
>devel mailing list
>de...@open-mpi.org
>http://www.open-mpi.org/mailman/listinfo.cgi/devel
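
Whatever the final names, the intent is simply that a BTL can tell at prepare
time which RDMA direction a descriptor is for.  A hedged sketch (the descriptor
plumbing is simplified, and the names follow the proposals above rather than
whatever is finally committed):

#define MCA_BTL_DES_FLAGS_PUT 0x0010
#define MCA_BTL_DES_FLAGS_GET 0x0020

static void prepare_for_rdma(unsigned int des_flags)
{
    if (des_flags & MCA_BTL_DES_FLAGS_GET) {
        /* set up registration/fragments for a get-style transfer */
    } else if (des_flags & MCA_BTL_DES_FLAGS_PUT) {
        /* set up registration/fragments for a put-style transfer */
    } else {
        /* plain send path */
    }
}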



[OMPI devel] Remote key sizes

2011-11-08 Thread Rolf vandeVaart
>  george.
>
>PS: Regarding the hand-copy instead of the memcpy, we tried to avoid using
>memcpy in performance critical codes, especially when we know the size of
>the data and the alignment. This relieves the compiler of adding ugly 
>intrinsics,
>allowing it to nicely pipeline to load/stores. Anyway, with both approaches
>you will copy more data than needed for all BTLs except uGNI.

I was looking at a case in a BTL I am working on where I actually need 64
bytes (yes, bytes) for the remote key, as opposed to the current 16 bytes
(128 bits).
I am not sure how to handle that yet.  (I assume configure is my friend, but
even in that case, all headers will need to carry the extra data around.)

Rolf
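
A rough sketch of the configure-time approach hinted at here: size the key
storage from a generated constant, so a BTL that needs a 64-byte rkey can be
accommodated without hard-coding that size for everyone.  The names below are
hypothetical, not the actual OMPI seg_key definition, and the downside noted
above remains: every header carries the larger key.

#include <stdint.h>

#ifndef EXAMPLE_BTL_MAX_SEG_KEY_BYTES
#define EXAMPLE_BTL_MAX_SEG_KEY_BYTES 16   /* would be set by configure */
#endif

union example_seg_key {
    uint8_t  key8[EXAMPLE_BTL_MAX_SEG_KEY_BYTES];
    uint32_t key32[EXAMPLE_BTL_MAX_SEG_KEY_BYTES / 4];
    uint64_t key64[EXAMPLE_BTL_MAX_SEG_KEY_BYTES / 8];
};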

>
>On Nov 7, 2011, at 21:48 , Nathan T. Hjelm wrote:
>
>>
>>
>> On Mon, 7 Nov 2011 17:18:42 -0500, George Bosilca
>> 
>> wrote:
>>> A little bit of history:
>>>
>>> 1. r25305: added 2 atomic operations to OPAL. However, they only
>>> exists
>> on
>>> amd64 and are only used in the vader BTL, which I assume only
>>> supports amd64.
>>
>> Two things:
>> - The atomic is a new feature that has no impact on existing code. It
>> can also be implemented on Intel but we have not tested it (yet).
>> - The atomic was pushed to support lock-free queues in the Vader BTL.
>> Vader does not need the atomics and can use an atomic lock instead, but I
>> see higher latencies when using locks.
>>
>> Why would this change (that has no impact on any other code) need an
>RFC?
>>
>>> 2. r25334: The seg_key union got a new member ptr. This member is
>>> solely used in the vader BTL, as all other BTL use a compiler trick
>>> to convert a pointer to a 64 bits.
>>
>> I am actually going to remove that member. I prefer the use of
>> uintptr_t over casting to a uint64_t but it has no real benefit and
>> possibly a pitfall due to its platform dependent size.
>>
>> But the member has, like the atomic, no impact on any exiting code. It
>> does not change the size of the seg_key and was only used by Vader.
>> Why would this change have required an RFC?
>>
>>> 3. r25445: All members of the seg_key union got friends, because Cray
>> dare
>>> to set their keys at 128 bits long. However a quick  find . -name
>>> "*.[ch]" -exec grep -Hn seg_key {} \; | grep "\[1\]"
>>> indicates that no BTL is using 128 bits keys. Code has been added to
>>> all PMLs, but I guess they just copy empty data.
>>
>> For now they copy empty data but in the near future (as I have said)
>> we will need the bits for the ugni btl (Cray XE Gemini). I pushed this
>> code to prepare for pushing ugni.
>>
>> Also, you might be a good person to ask: Why do we copy each member of
>> a segment individually in the PMLs? Wouldn't it be faster to do a
>> memcpy? If we were using a memcpy I would not have had to make any
>> change to the pmls.
>>
>>> What I see is a pattern of commits that can have been dealt with
>>> differently. None had an RFC, and most of them are not even used.
>>
>> I think you are reaching a little here. I pushed several changes over
>> a period of a month. The first two are not related to the third which
>> is the only one that could have any impact to existing code and might
>> require an RFC.
>>
>> In retrospect I should have done a RFC for the 3rd change with a short
>> timeout. At the time (operating on little sleep) it seemed like the
>> commits would have minimal impact. Please let me know if the commits
>> have any negative impact.
>>
>> -Nathan
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>
>___
>devel mailing list
>de...@open-mpi.org
>http://www.open-mpi.org/mailman/listinfo.cgi/devel



Re: [OMPI devel] Bull Vendor ID disappeared from IB ini file

2011-09-07 Thread Rolf vandeVaart

Actually, I think you are mistaken about which commit undid the change.  It was
this one, and its commit message does suggest it might have caused problems.

https://svn.open-mpi.org/trac/ompi/changeset/23764
Timestamp:
09/17/10 19:04:06 (12 months ago) 
Author:
rhc
Message:
WARNING: Work on the temp branch being merged here encountered problems 
with bugs in subversion. Considerable effort has gone into validating the 
branch. However, not all conditions can be checked, so users are cautioned that 
it may be advisable to not update from the trunk for a few days to allow MTT to 
identify platform-specific issues.
   This merges the branch containing the revamped build system based around 
converting autogen from a bash script to a Perl program. Jeff has provided 
emails explaining the features contained in the change.
Please note that configure requirements on components HAVE CHANGED. For 
example, a configure.params file is no longer required in each component 
directory. See Jeff's emails for an explanation.




From: devel-boun...@open-mpi.org [devel-boun...@open-mpi.org] On Behalf Of 
Sylvain Jeaugey [sylvain.jeau...@bull.net]
Sent: Wednesday, September 07, 2011 8:56 AM
To: Open MPI Developers
Subject: [OMPI devel] Bull Vendor ID disappeared from IB ini file

Hi All,

I just realized that the Bull vendor IDs for Infiniband cards disappeared from
the trunk. Actually, they were removed shortly after we included them last
September.

The original commit was :
r23715 | derbeyn | 2010-09-03 16:13:19 +0200 (Fri, 03 Sep 2010) | 1 line
Added Bull vendor id for ConnectX card

An here is the commit that undid Nadia's patch :
r23791 | swise | 2010-09-22 20:16:53 +0200 (Wed, 22 Sep 2010) | 2 lines
Add T4 device IDs to openib btl params ini file.

It does indeed add some T4 device IDs, but it also removes our vendor ID. The
other thing that bugs me is that, contrary to what the commit message suggests,
this patch does a lot more than add T4 device IDs. So it looks like something
went wrong with this commit (something like: I forgot to update and forced
the commit), and it may be worth checking that nothing else was reverted with
it ...

Sylvain
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



Re: [OMPI devel] RFC: CUDA register sm and openib host memory

2011-08-02 Thread Rolf vandeVaart
Hi George:

In the current implementation, to send CUDA device memory, we move it through
internal host buffers.  In other words, we force the use of the send protocols.
If those host buffers are registered with CUDA, the cuMemcpy that moves the
data into them performs better, because the memory is pinned and cannot be
paged out.
This may also allow future optimizations, for example asynchronous copies,
which require the host memory we are copying into to be CUDA registered.

Rolf
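
The mechanism itself looks roughly like the sketch below (driver API, error
handling omitted): an existing host buffer -- e.g. an sm or pre-posted openib
buffer -- is pinned via the CUDA driver, and the staging copy from device
memory into it then avoids a bounce through pageable memory.  Note the RFC
registers the sm and pre-posted openib memory up front; the per-call
registration here is only an illustration, not the actual mpool changes.

#include <cuda.h>
#include <stddef.h>

void stage_from_gpu(void *host_buf, CUdeviceptr dev_buf, size_t nbytes)
{
    /* Register (pin) the existing host allocation with the CUDA driver. */
    cuMemHostRegister(host_buf, nbytes, 0);

    /* Staging copy used by the send protocol. */
    cuMemcpyDtoH(host_buf, dev_buf, nbytes);

    /* ...hand host_buf to the BTL for the actual send... */

    cuMemHostUnregister(host_buf);
}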

>-Original Message-
>From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org]
>On Behalf Of George Bosilca
>Sent: Tuesday, August 02, 2011 10:50 AM
>To: Open MPI Developers
>Subject: Re: [OMPI devel] RFC: CUDA register sm and openib host memory
>
>Rolf,
>
>Can we have some more details on how this will improve performance of
>sending GPU device memory? I fail to see how registering the backend shared
>memory file with CUDA is supposed to do anything at all, as this memory is
>internal to Open MPI and not supposed to be visible at any other level.
>
>  Thanks,
>george.
>
>On Jul 28, 2011, at 23:52 , Rolf vandeVaart wrote:
>
>> DETAILS: In order to improve performance of sending GPU device memory,
>> we need to register the host memory with the CUDA framework.  These
>> changes allow that to happen.  These changes are somewhat different
>> from what I proposed a while ago and I think a lot cleaner.  There is
>> a new memory pool flag that indicates whether a piece of memory should
>> be registered.  This allows us to register the sm memory and the
>> pre-posted openib memory.
>
>___
>devel mailing list
>de...@open-mpi.org
>http://www.open-mpi.org/mailman/listinfo.cgi/devel


