Re: [OMPI users] what was the rationale behind rank mapping by socket?

2016-10-28 Thread r...@open-mpi.org
Yes, I’ve been hearing a growing number of complaints about cgroups for that 
reason. Our mapping/ranking/binding options will work with the cgroup envelope, 
but it generally winds up with a result that isn’t what the user wanted or 
expected.

We always post the OMPI BoF slides on our web site, and we'll do the same this 
year. I may try to record a webcast on it and post that as well, since I know it 
can be confusing given all the flexibility we expose.

In case you haven’t read it yet, here is the relevant section from “man mpirun”:

 Mapping, Ranking, and Binding: Oh My!
   Open MPI employs a three-phase procedure for assigning process
   locations and ranks:

   mapping   Assigns a default location to each process

   ranking   Assigns an MPI_COMM_WORLD rank value to each process

   binding   Constrains each process to run on specific processors

   The mapping step is used to assign a default location to each process
   based on the mapper being employed. Mapping by slot, node, and
   sequentially results in the assignment of the processes to the node
   level. In contrast, mapping by object allows the mapper to assign the
   process to an actual object on each node.

   Note: the location assigned to the process is independent of where it
   will be bound - the assignment is used solely as input to the binding
   algorithm.

   The mapping of processes to nodes can be defined not just with general
   policies but also, if necessary, using arbitrary mappings that cannot
   be described by a simple policy. One can use the "sequential mapper,"
   which reads the hostfile line by line, assigning processes to nodes in
   whatever order the hostfile specifies. Use the -mca rmaps seq option.
   For example, using the same hostfile as before:

   mpirun -hostfile myhostfile -mca rmaps seq ./a.out

   will launch three processes, one on each of nodes aa, bb, and cc,
   respectively. The slot counts don't matter; one process is launched
   per line on whatever node is listed on the line.
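
   The hostfile from the earlier example is not reproduced in this
   excerpt; a plausible form, whose slot counts the sequential mapper
   ignores, would be:

       aa slots=2
       bb slots=2
       cc slots=2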

   Another way to specify arbitrary mappings is with a rankfile, which 
gives you detailed control over process binding as well.  Rankfiles are 
discussed below.

   The second phase focuses on the ranking of the process within the
   job's MPI_COMM_WORLD. Open MPI separates this from the mapping
   procedure to allow more flexibility in the relative placement of MPI
   processes. This is best illustrated by considering the following cases
   where we used the --map-by ppr:2:socket option:

                             node aa        node bb

       rank-by core          0 1 ! 2 3      4 5 ! 6 7

       rank-by socket        0 2 ! 1 3      4 6 ! 5 7

       rank-by socket:span   0 4 ! 1 5      2 6 ! 3 7

   Ranking by core and by slot provide the identical result - a simple
   progression of MPI_COMM_WORLD ranks across each node. Ranking by
   socket does a round-robin ranking within each node until all processes
   have been assigned an MCW rank, and then progresses to the next node.
   Adding the span modifier to the ranking directive causes the ranking
   algorithm to treat the entire allocation as a single entity - thus,
   the MCW ranks are assigned across all sockets before circling back
   around to the beginning.

   The binding phase actually binds each process to a given set of
   processors. This can improve performance if the operating system is
   placing processes suboptimally. For example, it might oversubscribe
   some multi-core processor sockets, leaving other sockets idle; this
   can lead processes to contend unnecessarily for common resources. Or,
   it might spread processes out too widely; this can be suboptimal if
   application performance is sensitive to interprocess communication
   costs. Binding can also keep the operating system from migrating
   processes excessively, regardless of how optimally those processes
   were placed to begin with.

   The processors to be used for binding can be identified in terms of
   topological groupings - e.g., binding to an l3cache will bind each
   process to all processors within the scope of a single L3 cache within
   their assigned location. Thus, if a process is assigned by the mapper
   to a certain socket, then a --bind-to l3cache directive will cause the
   process to be bound to the processors that share a single L3 cache
   within that socket.

   To help balance loads, the binding directive uses a round-robin method
   when binding to levels lower than used in the mapper. For example,
   consider the case where a job is mapped to the socket level, and then
   bound to core. Each socket will have multiple cores, so if multiple
   processes are mapped to a given socket, the binding algorithm will
   assign each process located on a socket to a unique core in a
   round-robin manner.
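
   For example, the following hypothetical invocation maps by socket,
   binds each process to a core within its assigned socket, and prints
   the result:

       mpirun -np 4 --map-by socket --bind-to core --report-bindings ./a.out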

   Alternatively,  

Re: [OMPI users] what was the rationale behind rank mapping by socket?

2016-10-28 Thread Bennet Fauber
Ralph,

Alas, I will not be at SC16.  I would like to hear and/or see what you
present, so if it gets made available in an alternate format, I'd
appreciate knowing where and how to get it.

I am more and more coming to think that our cluster configuration is
essentially designed to frustrate MPI developers, because we use the
scheduler to create cgroups (once upon a time, cpusets) for subsets of
cores on multisocket machines, and I think that invalidates a lot of
the assumptions being made by people who want to bind to
particular patterns.

It's our foot, and we have been doing a good job of shooting it.  ;-)

-- bennet




On Fri, Oct 28, 2016 at 7:18 PM, r...@open-mpi.org  wrote:
> FWIW: I’ll be presenting “Mapping, Ranking, and Binding - Oh My!” at the
> OMPI BoF meeting at SC’16, for those who can attend. Will try to explain the
> rationale as well as the mechanics of the options
>
> On Oct 11, 2016, at 8:09 AM, Dave Love  wrote:
>
> Gilles Gouaillardet  writes:
>
> Bennet,
>
>
> my guess is mapping/binding to sockets was deemed the best compromise
> from an
>
> "out of the box" performance point of view.
>
>
> iirc, we did fix some bugs that occurred when running under asymmetric
> cpusets/cgroups.
>
> if you still have some issues with the latest Open MPI version (2.0.1)
> and the default policy,
>
> could you please describe them ?
>
>
> I also don't understand why binding to sockets is the right thing to do.
> Binding to cores seems the right default to me, and I set that locally,
> with instructions about running OpenMP.  (Isn't that what other
> implementations do, which makes them look better?)
>
> I think at least numa should be used, rather than socket.  Knights
> Landing, for instance, is single-socket, so one gets no actual binding by
> default.

Re: [OMPI users] what was the rationale behind rank mapping by socket?

2016-10-28 Thread r...@open-mpi.org
FWIW: I’ll be presenting “Mapping, Ranking, and Binding - Oh My!” at the OMPI 
BoF meeting at SC’16, for those who can attend. Will try to explain the 
rationale as well as the mechanics of the options

> On Oct 11, 2016, at 8:09 AM, Dave Love  wrote:
> 
> Gilles Gouaillardet writes:
> 
>> Bennet,
>> 
>> 
>> my guess is mapping/binding to sockets was deemed the best compromise
>> from an
>> 
>> "out of the box" performance point of view.
>> 
>> 
>> iirc, we did fix some bugs that occurred when running under asymmetric
>> cpusets/cgroups.
>> 
>> if you still have some issues with the latest Open MPI version (2.0.1)
>> and the default policy,
>> 
>> could you please describe them ?
> 
> I also don't understand why binding to sockets is the right thing to do.
> Binding to cores seems the right default to me, and I set that locally,
> with instructions about running OpenMP.  (Isn't that what other
> implementations do, which makes them look better?)
> 
> I think at least numa should be used, rather than socket.  Knights
> Landing, for instance, is single-socket, so one gets no actual binding by
> default.

Re: [OMPI users] Launching hybrid MPI/OpenMP jobs on a cluster: correct OpenMPI flags?

2016-10-28 Thread r...@open-mpi.org
FWIW: I’ll be presenting “Mapping, Ranking, and Binding - Oh My!” at the OMPI 
BoF meeting at SC’16, for those who can attend


> On Oct 11, 2016, at 8:16 AM, Dave Love  wrote:
> 
> Wirawan Purwanto  writes:
> 
>> Instead of the scenario above, I was trying to get the MPI processes
>> side-by-side (more like "fill_up" policy in SGE scheduler), i.e. fill
>> node 0 first, then fill node 1, and so on. How do I do this properly?
>> 
>> I tried a few attempts that fail:
>> 
>> $ export OMP_NUM_THREADS=2
>> $ mpirun -np 16 -map-by core:PE=2 ./EXECUTABLE
> 
> ...
> 
>> Clearly I am not understanding how this map-by works. Could somebody
>> help me? There was a wiki article partially written:
>> 
>> https://github.com/open-mpi/ompi/wiki/ProcessPlacement
>> 
>> but unfortunately it is also not clear to me.
> 
> Me neither; this stuff has traditionally been quite unclear and really
> needs documenting/explaining properly.
> 
> This sort of thing from my local instructions for OMPI 1.8 probably does
> what you want for OMP_NUM_THREADS=2 (where the qrsh options just get me
> a couple of small nodes):
> 
>  $ qrsh -pe mpi 24 -l num_proc=12 \
> mpirun -n 12 --map-by slot:PE=2 --bind-to core --report-bindings true |&
> sort -k 4 -n
>  [comp544:03093] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 
> 1[hwt 0]]: [B/B/./././.][./././././.]
>  [comp544:03093] MCW rank 1 bound to socket 0[core 2[hwt 0]], socket 0[core 
> 3[hwt 0]]: [././B/B/./.][./././././.]
>  [comp544:03093] MCW rank 2 bound to socket 0[core 4[hwt 0]], socket 0[core 
> 5[hwt 0]]: [././././B/B][./././././.]
>  [comp544:03093] MCW rank 3 bound to socket 1[core 6[hwt 0]], socket 1[core 
> 7[hwt 0]]: [./././././.][B/B/./././.]
>  [comp544:03093] MCW rank 4 bound to socket 1[core 8[hwt 0]], socket 1[core 
> 9[hwt 0]]: [./././././.][././B/B/./.]
>  [comp544:03093] MCW rank 5 bound to socket 1[core 10[hwt 0]], socket 1[core 
> 11[hwt 0]]: [./././././.][././././B/B]
>  [comp527:03056] MCW rank 6 bound to socket 0[core 0[hwt 0]], socket 0[core 
> 1[hwt 0]]: [B/B/./././.][./././././.]
>  [comp527:03056] MCW rank 7 bound to socket 0[core 2[hwt 0]], socket 0[core 
> 3[hwt 0]]: [././B/B/./.][./././././.]
>  [comp527:03056] MCW rank 8 bound to socket 0[core 4[hwt 0]], socket 0[core 
> 5[hwt 0]]: [././././B/B][./././././.]
>  [comp527:03056] MCW rank 9 bound to socket 1[core 6[hwt 0]], socket 1[core 
> 7[hwt 0]]: [./././././.][B/B/./././.]
>  [comp527:03056] MCW rank 10 bound to socket 1[core 8[hwt 0]], socket 1[core 
> 9[hwt 0]]: [./././././.][././B/B/./.]
>  [comp527:03056] MCW rank 11 bound to socket 1[core 10[hwt 0]], socket 1[core 
> 11[hwt 0]]: [./././././.][././././B/B]
> 
> I don't remember how I found that out.

Re: [OMPI users] MCA compilation later

2016-10-28 Thread r...@open-mpi.org
You don’t need any of the hardware - you just need the headers. Things like 
libfabric and libibverbs are all publicly available, and so you can build all 
that support even if you cannot run it on your machine.

Once your customer installs the binary, the various plugins will check for 
their required library and hardware, and disqualify themselves if those aren't found.
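
As a hedged sketch of that workflow (the paths and the component name are
illustrative):

  # on a build box that has the IB headers, build the same OMPI version,
  # then copy the new plugin into the deployed tree:
  cp ib-build/lib/openmpi/mca_btl_openib.so /opt/app/openmpi/lib/openmpi/
  # confirm the plugin is visible; on hosts without IB hardware it will
  # simply disqualify itself at run time:
  ompi_info | grep openib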

> On Oct 28, 2016, at 12:33 PM, Sean Ahern  wrote:
> 
> There's been discussion on the OpenMPI list recently about static linking of 
> OpenMPI with all of the desired MCAs in it. I've got the opposite question. 
> I'd like to add MCAs later on to an already-compiled version of OpenMPI and 
> am not quite sure how to do it.
> 
> Let me summarize. We've got a commercial code that we deploy on customer 
> machines in binary form. We're working to integrate OpenMPI into the 
> installer, and things seem to be progressing well. (Note: because we're a 
> commercial code, making the customer compile something doesn't work for us 
> like it can for open source or research codes.)
> 
> Now, we want to take advantage of OpenMPI's ability to find MCAs at runtime, 
> pointing to the various plugins that might apply to a deployed system. I've 
> configured and compiled OpenMPI on one of our build machines, one that 
> doesn't have any special interconnect hardware or software installed. We take 
> this compiled version of OpenMPI and use it on all of our machines. (Yes, 
> I've read Building FAQ #39 
>  about 
> relocating OpenMPI. Useful, that.) I'd like to take our pre-compiled version 
> of OpenMPI and add MCA libraries to it, giving OpenMPI the ability to 
> communicate via transport mechanisms that weren't available on the original 
> build machine. Things like InfiniBand, OmniPath, or one of Cray's 
> interconnects.
> 
> How would I go about doing this? And what are the limitations?
> 
> I'm guessing that I need to go configure and compile the same version of 
> OpenMPI on a machine that has the desired interconnect installation (headers 
> and libraries), then go grab the corresponding lib/openmpi/mca_*{la,so} 
> files. Take those files and drop them in our pre-built OpenMPI from our build 
> machine in the same relative plugin location (lib/openmpi). If I stick with 
> the same compiler (gcc, in this case), I'm hoping that symbols will all 
> resolve themselves at runtime. (I probably will have to do some 
> LD_LIBRARY_PATH games to be sure to find the appropriate underlying libraries 
> unless OpenMPI's process for building MCAs links them in statically somehow.)
> 
> Am I even on the right track here? (The various system-level FAQs (here 
> , here 
> , and especially here 
> ) seem to suggest that I am.)
> 
> Our first test platform will be getting OpenMPI via IB working on our 
> cluster, where we have IB (and TCP/IP) functional and not OpenMPI. This will 
> be a great stand-in for a customer that has an IB cluster and wants to just 
> run our binary installation.
> 
> Thanks.
> 
> -Sean
> 
> --
> Sean Ahern
> Computational Engineering International
> 919-363-0883

[OMPI users] MCA compilation later

2016-10-28 Thread Sean Ahern
There's been discussion on the OpenMPI list recently about static linking
of OpenMPI with all of the desired MCAs in it. I've got the opposite
question. I'd like to add MCAs later on to an already-compiled version of
OpenMPI and am not quite sure how to do it.

Let me summarize. We've got a commercial code that we deploy on customer
machines in binary form. We're working to integrate OpenMPI into the
installer, and things seem to be progressing well. (Note: because we're a
commercial code, making the customer compile something doesn't work for us
like it can for open source or research codes.)

Now, we want to take advantage of OpenMPI's ability to find MCAs at
runtime, pointing to the various plugins that might apply to a deployed
system. I've configured and compiled OpenMPI on one of our build machines,
one that doesn't have any special interconnect hardware or software
installed. We take this compiled version of OpenMPI and use it on all of
our machines. (Yes, I've read Building FAQ #39
 about
relocating OpenMPI. Useful, that.) I'd like to take our pre-compiled
version of OpenMPI and add MCA libraries to it, giving OpenMPI the ability
to communicate via transport mechanisms that weren't available on the
original build machine. Things like InfiniBand, OmniPath, or one of Cray's
interconnects.

How would I go about doing this? And what are the limitations?

I'm guessing that I need to go configure and compile the same version of
OpenMPI on a machine that has the desired interconnect installation
(headers and libraries), then go grab the corresponding
lib/openmpi/mca_*{la,so} files. Take those files and drop them in our
pre-built OpenMPI from our build machine in the same relative plugin
location (lib/openmpi). If I stick with the same compiler (gcc, in this
case), I'm hoping that symbols will all resolve themselves at runtime. (I
probably will have to do some LD_LIBRARY_PATH games to be sure to find the
appropriate underlying libraries unless OpenMPI's process for building MCAs
links them in statically somehow.)

Am I even on the right track here? (The various system-level FAQs (here
, here
, and especially here
) seem to suggest that I
am.)

Our first test platform will be getting OpenMPI via IB working on our
cluster, where we have IB (and TCP/IP) functional and not OpenMPI. This
will be a great stand-in for a customer that has an IB cluster and wants to
just run our binary installation.

Thanks.

-Sean

--
Sean Ahern
Computational Engineering International
919-363-0883

Re: [OMPI users] Problem with double shared library

2016-10-28 Thread Sean Ahern
Gilles,

You described the problem exactly. I think we were able to nail down a
solution to this one through judicious use of the -rpath $MPI_DIR/lib
linker flag, allowing the runtime linker to properly find OpenMPI symbols
at runtime. We're operational. Thanks for your help.
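
For the archives, the working link line had this general shape (a sketch
only; the object and library names are illustrative):

  # embed an rpath so the runtime linker can resolve Open MPI's libraries
  # without relying on LD_LIBRARY_PATH:
  gcc -shared -o libtransport_mpi.so transport_mpi.o \
      -L$MPI_DIR/lib -lmpi -Wl,-rpath,$MPI_DIR/lib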

-Sean

--
Sean Ahern
Computational Engineering International
919-363-0883

On Mon, Oct 17, 2016 at 9:45 PM, Gilles Gouaillardet 
wrote:

> Sean,
>
>
> if i understand correctly, you built a libtransport_mpi.so library that
> depends on Open MPI, and your main program dlopens libtransport_mpi.so.
>
> in this case, and at least for the time being,  you need to use
> RTLD_GLOBAL in your dlopen flags.
>
>
> Cheers,
>
>
> Gilles
>
> On 10/18/2016 4:53 AM, Sean Ahern wrote:
>
> Folks,
>
> For our code, we have a communication layer that abstracts the code that
> does the actual transfer of data. We call these "transports", and we link
> them as shared libraries. We have created an MPI transport that
> compiles/links against OpenMPI 2.0.1 using the compiler wrappers. When I
> compile OpenMPI with the--disable-dlopen option (thus cramming all of
> OpenMPI's plugins into the MPI library directly), things work great with
> our transport shared library. But when I have a "normal" OpenMPI (without
> --disable-dlopen) and create the same transport shared library, things
> fail. Upon launch, it appears that OpenMPI is unable to find the
> appropriate plugins:
>
> [hyperion.ceintl.com:25595] mca_base_component_repository_open: unable to
> open mca_patcher_overwrite: /home/sean/work/ceisvn/apex/
> branches/OpenMPI/apex32/machines/linux_2.6_64/openmpi-
> 2.0.1/lib/openmpi/mca_patcher_overwrite.so: undefined symbol:
> *mca_patcher_base_patch_t_class* (ignored)
> [hyperion.ceintl.com:25595] mca_base_component_repository_open: unable to
> open mca_shmem_mmap: /home/sean/work/ceisvn/apex/branches/OpenMPI/apex32/
> machines/linux_2.6_64/openmpi-2.0.1/lib/openmpi/mca_shmem_mmap.so:
> undefined symbol: *opal_show_help* (ignored)
> [hyperion.ceintl.com:25595] mca_base_component_repository_open: unable to
> open mca_shmem_posix: /home/sean/work/ceisvn/apex/branches/OpenMPI/apex32/
> machines/linux_2.6_64/openmpi-2.0.1/lib/openmpi/mca_shmem_posix.so:
> undefined symbol: *opal_show_help* (ignored)
> [hyperion.ceintl.com:25595] mca_base_component_repository_open: unable to
> open mca_shmem_sysv: /home/sean/work/ceisvn/apex/branches/OpenMPI/apex32/
> machines/linux_2.6_64/openmpi-2.0.1/lib/openmpi/mca_shmem_sysv.so:
> undefined symbol: *opal_show_help* (ignored)
> --
> It looks like opal_init failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during opal_init; some of which are due to configuration or
> environment problems.  This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
>   opal_shmem_base_select failed
>   --> Returned value -1 instead of OPAL_SUCCESS
> --
> --
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems.  This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
>   opal_init failed
>   --> Returned value Error (-1) instead of ORTE_SUCCESS
> --
> --
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems.  This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
>   ompi_mpi_init: ompi_rte_init failed
>   --> Returned "Error" (-1) instead of "Success" (0)
>
>
> If I skip our shared libraries and instead write a standard MPI-based
> "hello, world" program that links against MPI directly (without
> --disable-dlopen), everything is again fine.
>
> It seems that having the double dlopen is causing problems for OpenMPI
> finding its own shared libraries.
>
> Note: I do have LD_LIBRARY_PATH pointing to …"openmpi-2.0.1/lib", as well
> as OPAL_PREFIX pointing to …"openmpi-2.0.1".
>
> Any thoughts about how I can try to tease out what's going wrong here?
>
> -Sean
>
> --
> Sean Ahern
> Computational Engineering International
> 919-363-0883
>
>

Re: [OMPI users] Unable to compile OpenMPI 1.10.3 with CUDA

2016-10-28 Thread Sylvain Jeaugey

On 10/28/2016 10:33 AM, Craig tierney wrote:

Sylvain,

If I do not set --with-cuda, I get:

configure:9964: result: no
configure:10023: checking whether CU_POINTER_ATTRIBUTE_SYNC_MEMOPS is 
declared

configure:10023: gcc -c -DNDEBUG   conftest.c >&5
conftest.c:83:19: fatal error: /cuda.h: No such file or directory
 #include </cuda.h>
          ^
It looks like your environment has variables that the configure tries to 
use. You should look at the output of:

 env | grep CUDA

and unset them.
Or you can specify --with-cuda=/usr/local/cuda to be sure.


If I specify the path to cuda, the same results as before. In the 
configure process, the first time cuda.h is tested it works.


configure:9843: checking if --with-cuda is set
configure:9897: result: found (/usr/local/cuda/include/cuda.h)
configure:9964: checking for struct CUipcMemHandle_st.reserved


Good.

But the next time the compile command doesn't add an include to the 
compile line and the compile fails:


configure:74312: checking for CL/cl_ext.h
configure:74312: result: no
configure:74425: checking cuda.h usability
configure:74425: gcc -std=gnu99 -c -O3 -DNDEBUG  conftest.c >&5
conftest.c:648:18: fatal error: cuda.h: No such file or directory
 #include <cuda.h>
          ^
compilation terminated.
configure:74425: $? = 1
Is the Open MPI configure explicitly failing? If not, is the Open MPI 
compilation failing? If it works, you should see that CUDA support has been 
compiled in (in ompi_info).
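
A quick way to confirm is to query ompi_info for the CUDA build flag
(parameter name as documented for recent releases):

 ompi_info --parsable --all | grep mpi_built_with_cuda_support:value
 # expect ...:value:true when CUDA support was compiled in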


It seems you are fooled by the hwloc configure here: the hwloc configure 
includes checks for CUDA, but we don't need them in Open MPI, so they are 
failing, yet you still get CUDA support.


In the latest version of Open MPI, there should be a report at the end 
of configure explicitly stating whether CUDA support has been enabled or not.


Sylvain


Re: [OMPI users] Unable to compile OpenMPI 1.10.3 with CUDA

2016-10-28 Thread Craig tierney
Sylvain,

If I do not set --with-cuda, I get:

configure:9964: result: no
configure:10023: checking whether CU_POINTER_ATTRIBUTE_SYNC_MEMOPS is
declared
configure:10023: gcc -c -DNDEBUG   conftest.c >&5
conftest.c:83:19: fatal error: /cuda.h: No such file or directory
 #include </cuda.h>
          ^

If I specify the path to cuda, the same results as before.  In the
configure process, the first time cuda.h is tested it works.

configure:9843: checking if --with-cuda is set
configure:9897: result: found (/usr/local/cuda/include/cuda.h)
configure:9964: checking for struct CUipcMemHandle_st.reserved

But the next time the compile command doesn't add an include to the compile
line and the compile fails:

configure:74312: checking for CL/cl_ext.h
configure:74312: result: no
configure:74425: checking cuda.h usability
configure:74425: gcc -std=gnu99 -c -O3 -DNDEBUG  conftest.c >&5
conftest.c:648:18: fatal error: cuda.h: No such file or directory
 #include <cuda.h>
          ^
compilation terminated.
configure:74425: $? = 1

Craig


On Thu, Oct 27, 2016 at 4:47 PM, Sylvain Jeaugey 
wrote:

> I guess --with-cuda is disabling the default CUDA path which is
> /usr/local/cuda. So you should either not set --with-cuda or set
> --with-cuda $CUDA_HOME (no include).
>
> Sylvain
> On 10/27/2016 03:23 PM, Craig tierney wrote:
>
> Hello,
>
> I am trying to build OpenMPI 1.10.3 with CUDA but I am unable to build the
> library that will allow me to use IPC on a node or GDR between nodes.   I
> have tried with with 1.10.4 and 2.0.1 and have the same problems.  Here is
> my build script:
>
> ---
> #!/bin/bash
>
> export OPENMPI_VERSION=1.10.3
> export BASEDIR=/tmp/mpi_testing/
> export CUDA_HOME=/usr/local/cuda
> export PATH=$CUDA_HOME/bin/:$PATH
> export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
> export MPI_HOME=$BASEDIR/openmpi-$OPENMPI_VERSION
>
> which nvcc
> nvcc --version
>
> tar -zxf openmpi-$OPENMPI_VERSION.tar.gz
> cd openmpi-$OPENMPI_VERSION
>
> ./configure --prefix=$MPI_HOME --with-cuda=$CUDA_HOME/include > config.out
> 2>&1
>
> make -j > build.out 2>&1
> make install >> build.out 2>&1
> ---
>
> From the docs, it appears that I should not have to set anything but
> --with-cuda since my CUDA is in /usr/local/cuda.  However, I appended
> /usr/local/cuda/include just in case when the first way didn't work.
>
> From the output in config.log, I see that cuda.h is not found.  When the
> tests are called there is no extra include flag added to specify the
> /usr/local/cuda/include path.
>
> With the resulting build, I test for CUDA and GDR with ompi_info.  Results
> are:
>
> testuser@dgx-1:~/temp$ /tmp/mpi_testing/openmpi-1.10.3/bin/ompi_info  |
> grep cuda
>  MCA btl: smcuda (MCA v2.0.0, API v2.0.0, Component
> v1.10.3)
> MCA coll: cuda (MCA v2.0.0, API v2.0.0, Component v1.10.3)
> testuser@dgx-1:~/temp$ /tmp/mpi_testing/openmpi-1.10.3/bin/ompi_info  |
> grep gdr
> testuser@dgx-1:~/temp$
>
> Configure and build logs are attached.
>
>
> Thanks,
> Craig
>
>
>
>
>

Re: [OMPI users] Fortran and MPI-3 shared memory

2016-10-28 Thread Tom Rosmond

Gilles,

Thanks!  With my very rudimentary understanding of C pointers and C 
programming in general, I missed that translation subtlety.  The revised 
program runs fine with a variety of optimization and debug options on 
my test system.

Tom R.



On 10/27/2016 10:23 PM, Gilles Gouaillardet wrote:


Tom,


regardless of the (lack of a) memory model in Fortran, there is an error in 
testmpi3.f90:


shar_mem is declared as an integer, and hence is not in shared memory.

i attached my version of testmpi3.f90, which behaves just like the C 
version,


at least when compiled with -g -O0 and with Open MPI master

/* i replaced shar_mem with fptr_mem */
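
for comparison, the core MPI-3 pattern both test programs follow is
sketched below in C (a minimal sketch, not the original testmpi3.c;
error handling omitted and variable names illustrative):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Comm shmcomm;
    MPI_Win  win;
    MPI_Aint qsize;
    int      rank, disp_unit;
    double  *base, *peer;

    MPI_Init(&argc, &argv);
    /* group the ranks that can share memory on this node */
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &shmcomm);
    MPI_Comm_rank(shmcomm, &rank);

    /* each rank contributes one double to the shared window */
    MPI_Win_allocate_shared(sizeof(double), sizeof(double),
                            MPI_INFO_NULL, shmcomm, &base, &win);
    *base = (double)rank;

    /* get a direct pointer to rank 0's segment */
    MPI_Win_shared_query(win, 0, &qsize, &disp_unit, &peer);

    MPI_Win_fence(0, win);  /* one way to order the stores and loads */
    printf("rank %d sees rank 0's value: %f\n", rank, peer[0]);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}

the Fortran equivalent must receive baseptr as a type(c_ptr) and convert
it with c_f_pointer, which is the step testmpi3.f90 got wrong by
declaring shar_mem as an integer.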


Cheers,


Gilles




On 10/26/2016 3:29 AM, Tom Rosmond wrote:

All:

I am trying to understand the use of the shared memory features of 
MPI-3 that allow direct sharing of the memory space of on-node 
processes.  Attached are 2 small test programs, one written in C 
(testmpi3.c), the other F95 (testmpi3.f90) .  They are solving the 
identical 'halo' exchange problem.  'testmpi3.c' is a simplified 
version of an example program from a presentation by Mark Lubin of 
Intel.  I wrote 'testmpi3.f90' to mimic the C version.


 Also attached are 2 text files of the compile, execution, and output 
of the respective programs:


CC_testmpi3.txt
F95_testmpi3.txt

Note: All 4 files are contained in the attached 'testmpi3.tar.gz'.

Comparing the outputs of each version, it is clear that the shared 
memory copies in 'testmpi3.c' are working correctly, but not in 
'testmpi3.f90'.  As far as I can tell, the 2 programs are equivalent 
up to line 134 of 'testmpi3.c' and lines 97-101 of 'testmpi3.f90'. I 
thought the calls to 'c_f_pointer' would produce Fortran pointers 
that would access the correct shared memory addresses as the 
C-pointers do in 'testmpi3.c', but clearly that isn't happening. Can 
anyone explain why not, and what is needed to make it happen? Any 
suggestions are welcome.


My environment:
 Scientific Linux 6.8
 INTEL FORTRAN and ICC version 15.0.2.164
 OPEN-MPI 2.0.1


T. Rosmond



Re: [OMPI users] Reducing libmpi.so size....

2016-10-28 Thread Jeff Squyres (jsquyres)
On Oct 28, 2016, at 8:12 AM, Mahesh Nanavalla  
wrote:
> 
> i have configured as below for arm   
> 
> ./configure --enable-orterun-prefix-by-default  
> --prefix="/home/nmahesh/Workspace/ARM_MPI/openmpi" 
> CC=arm-openwrt-linux-muslgnueabi-gcc CXX=arm-openwrt-linux-muslgnueabi-g++ 
> --host=arm-openwrt-linux-muslgnueabi --enable-script-wrapper-compilers 
> --disable-mpi-fortran --enable-dlopen --enable-shared --disable-vt 
> --disable-java --disable-libompitrace --disable-static

Note that there is a tradeoff here: --enable-dlopen will reduce the size of 
libmpi.so by splitting out all the plugins into separate DSOs (dynamic shared 
objects -- i.e., individual .so plugin files).  But note that some of the plugins 
are quite small in terms of code.  I mention this because when you dlopen a 
DSO, it is loaded in units of pages.  So even if a DSO only has 1KB of 
code, it will occupy a whole page in your running process (e.g., 4KB -- or 
whatever the page size is on your system).

On the other hand, if you --disable-dlopen, then all of Open MPI's plugins are 
slurped into libmpi.so (and friends).  Meaning: no DSOs, no dlopen, no 
page-boundary-loading behavior.  This allows the compiler/linker to pack in all 
the plugins into memory more efficiently (because they'll be compiled as part 
of libmpi.so, and all the code is packed in there -- just like any other 
library).  Your total memory usage in the process may be smaller.

Sidenote: if you run more than one MPI process per node, then libmpi.so (and 
friends) will be shared between processes.  You're assumedly running in an 
embedded environment, so I don't know if this factor matters (i.e., I don't 
know if you'll run with ppn>1), but I thought I'd mention it anyway.

On the other hand (that's your third hand, for those at home counting...), you 
may not want to include *all* the plugins.  I.e., there may be a bunch of 
plugins that you're not actually using, and therefore if they are compiled in 
as part of libmpi.so (and friends), they're consuming space that you don't 
want/need.  So the dlopen mechanism might actually be better -- because Open 
MPI may dlopen a plugin at run time, determine that it won't be used, and then 
dlclose it (i.e., release the memory that would have been used for it).

On the other (fourth!) hand, you can actually tell Open MPI to *not* build 
specific plugins with the --enable-mca-no-build=LIST configure option.  I.e., 
if you know exactly what plugins you want to use, you can negate the ones that 
you *don't* want to use on the configure line, use --disable-static and 
--disable-dlopen, and you'll likely use the least amount of memory.  This is 
admittedly a bit clunky, but Open MPI's configure process was (obviously) not 
optimized for this use case -- it's much more optimized to the "build 
everything possible, and figure out which to use at run time" use case.

If you really want to hit rock bottom on MPI process size in your embedded 
environment, you can do some experimentation to figure out exactly which 
components you need.  You can use repeated runs with "mpirun --mca 
ABC_base_verbose 100 ...", where "ABC" is each of Open MPI's framework names 
("framework" = collection of plugins of the same type).  This verbose output 
will show you exactly which components are opened, which ones are used, and 
which ones are discarded.  You can build up a list of all the discarded 
components and --enable-mca-no-build them.
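
A hedged sketch of that loop (the component names at the end are purely 
illustrative):

  # see which btl plugins get opened, selected, or discarded
  mpirun --mca btl_base_verbose 100 -np 2 ./hello 2>&1 | grep -i btl
  # repeat for the other frameworks (pml, coll, shmem, ...), then rebuild
  # without the components that were never used:
  ./configure --disable-dlopen --disable-static \
      --enable-mca-no-build=btl-openib,coll-ml ...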

> While i am running the using mpirun 
> am getting following errror..
> root@OpenWrt:~# /usr/bin/mpirun --allow-run-as-root -np 1 
> /usr/bin/openmpiWiFiBulb
> --
> Sorry!  You were supposed to get help about:
> opal_init:startup:internal-failure
> But I couldn't open the help file:
> 
> /home/nmahesh/Workspace/ARM_MPI/openmpi/share/openmpi/help-opal-runtime.txt: 
> No such file or directory.  Sorry!

So this is really two errors:

1. The help message file is not being found.
2. Something is obviously going wrong during opal_init() (which is one of Open 
MPI's startup functions).

For #1, when I do a default build of Open MPI 1.10.3, that file *is* installed. 
 Are you trimming the installation tree, perchance?  If so, if you can put at 
least that one file back in its installation location (it's in the Open MPI 
source tarball), it might reveal more information on exactly what is failing.

Additionally, I wonder if shared memory is not getting setup right.  Try 
running with "mpirun --mca shmem_base_verbose 100 ..." and see if it's 
reporting an error.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI users] Reducing libmpi.so size....

2016-10-28 Thread Mahesh Nanavalla
Hi Gilles,

Thanks for reply

i have configured as below for arm

./configure --enable-orterun-prefix-by-default
--prefix="/home/nmahesh/Workspace/ARM_MPI/openmpi"
CC=arm-openwrt-linux-muslgnueabi-gcc CXX=arm-openwrt-linux-muslgnueabi-g++
--host=arm-openwrt-linux-muslgnueabi --enable-script-wrapper-compilers
--disable-mpi-fortran --enable-dlopen --enable-shared --disable-vt
--disable-java --disable-libompitrace --disable-static

While i am running the using mpirun
am getting following errror..
root@OpenWrt:~# /usr/bin/mpirun --allow-run-as-root -np 1
/usr/bin/openmpiWiFiBulb
--
Sorry!  You were supposed to get help about:
opal_init:startup:internal-failure
But I couldn't open the help file:
/home/nmahesh/Workspace/ARM_MPI/openmpi/share/openmpi/help-opal-runtime.txt:
No such file or directory.  Sorry!


kindly guide me...

On Fri, Oct 28, 2016 at 5:34 PM, Mahesh Nanavalla <
mahesh.nanavalla...@gmail.com> wrote:

> Hi Gilles,
>
> Thanks for reply
>
> i have configured as below for arm
>
> ./configure --enable-orterun-prefix-by-default  
> --prefix="/home/nmahesh/Workspace/ARM_MPI/openmpi"
> CC=arm-openwrt-linux-muslgnueabi-gcc CXX=arm-openwrt-linux-muslgnueabi-g++
> --host=arm-openwrt-linux-muslgnueabi --enable-script-wrapper-compilers
> --disable-mpi-fortran --enable-dlopen --enable-shared --disable-vt
> --disable-java --disable-libompitrace --disable-static
>
> While i am running the using mpirun
> am getting following errror..
> root@OpenWrt:~# /usr/bin/mpirun --allow-run-as-root -np 1
> /usr/bin/openmpiWiFiBulb
> --
> Sorry!  You were supposed to get help about:
> opal_init:startup:internal-failure
> But I couldn't open the help file:
> 
> /home/nmahesh/Workspace/ARM_MPI/openmpi/share/openmpi/help-opal-runtime.txt:
> No such file or directory.  Sorry!
>
>
> kindly guide me...
>
> On Fri, Oct 28, 2016 at 4:36 PM, Gilles Gouaillardet <
> gilles.gouaillar...@gmail.com> wrote:
>
>> Hi,
>>
>> i do not know if you can expect same lib size on x86_64 and arm.
>> x86_64 uses variable length instructions, and since arm is RISC, i
>> assume instructions are fixed length, and more instructions are
>> required to achieve the same result.
>> also, 2.4 MB does not seem huge to me.
>>
>> anyway, make sure you did not compile with -g, and you use similar
>> optimization levels on both arch.
>> you also have to be consistent with respect to the --disable-dlopen option
>> (by default, it is off, so all components are in
>> /.../lib/openmpi/mca_*.so. if you configure with --disable-dlopen, all
>> components are slurped into lib{open_pal,open_rte,mpi}.so,
>> and this obviously increases lib size.
>> depending on your compiler, you might be able to optimize for code
>> size (vs performance) with the appropriate flags.
>>
>> last but not least, strip your libs before you compare their sizes.
>>
>> Cheers,
>>
>> Gilles
>>
>> On Fri, Oct 28, 2016 at 3:17 PM, Mahesh Nanavalla
>>  wrote:
>> > Hi all,
>> >
>> > I am using openmpi-1.10.3.
>> >
>> > openmpi-1.10.3 compiled for  arm(cross compiled on X86_64 for openWRT
>> linux)
>> > libmpi.so.12.0.3 size is 2.4MB,but if i compiled on X86_64 (linux)
>> > libmpi.so.12.0.3 size is 990.2KB.
>> >
>> > can anyone tell how to reduce the size of libmpi.so.12.0.3 compiled for
>> > arm.
>> >
>> > Thanks,
>> > Mahesh.N
>> >
>>
>
>

Re: [OMPI users] OpenMPI + InfiniBand

2016-10-28 Thread Gilles Gouaillardet
Sergei,

is there any reason why you configure with --with-verbs-libdir=/usr/lib ?
as far as i understand, --with-verbs should be enough, and /usr/lib
nor /usr/local/lib should ever be used in the configure command line
(and btw, are you running on a 32 bits system ? should the 64 bits
libs be in /usr/lib64 ?)

make sure you
ulimit -l unlimited
before you invoke mpirun, and this value is correctly propagated to
the remote nodes
/* the failure could be a side effect of a low ulimit -l */
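
one quick way to verify the value the remote ranks actually see
(hostnames illustrative):

  ulimit -l unlimited
  mpirun --host node1,node2 -np 2 sh -c 'ulimit -l'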

Cheers,

Gilles


On Fri, Oct 28, 2016 at 6:48 PM, Sergei Hrushev  wrote:
> Hello, All !
>
> We have a problem with OpenMPI version 1.10.2 on a cluster with newly
> installed Mellanox InfiniBand adapters.
> OpenMPI was re-configured and re-compiled using: --with-verbs
> --with-verbs-libdir=/usr/lib
>
> And our test MPI task returns proper results but it seems OpenMPI continues
> to use existing 1Gbit Ethernet network instead of InfiniBand.
>
> An output file contains these lines:
> --
> No OpenFabrics connection schemes reported that they were able to be
> used on a specific port.  As such, the openib BTL (OpenFabrics
> support) will be disabled for this port.
>
>   Local host:   node1
>   Local device: mlx4_0
>   Local port:   1
>   CPCs attempted:   rdmacm, udcm
> --
>
> InfiniBand network itself seems to be working:
>
> $ ibstat mlx4_0 shows:
>
> CA 'mlx4_0'
> CA type: MT4099
> Number of ports: 1
> Firmware version: 2.35.5100
> Hardware version: 0
> Node GUID: 0x7cfe900300bddec0
> System image GUID: 0x7cfe900300bddec3
> Port 1:
> State: Active
> Physical state: LinkUp
> Rate: 56
> Base lid: 3
> LMC: 0
> SM lid: 3
> Capability mask: 0x0251486a
> Port GUID: 0x7cfe900300bddec1
> Link layer: InfiniBand
>
> ibping also works.
> ibnetdiscover shows the correct topology of the IB network.
>
> Cluster works under Ubuntu 16.04 and we use drivers from OS (OFED is not
> installed).
>
> Is it enough for OpenMPI to have RDMA only, or should IPoIB also be
> installed?
> What else can be checked?
>
> Thanks a lot for any help!
>
>


Re: [OMPI users] OpenMPI + InfiniBand

2016-10-28 Thread John Hearns via users
Sorry - shoot down my idea. Over to someone else (me hides head in shame)

On 28 October 2016 at 11:28, Sergei Hrushev  wrote:

> Sergei,   what does the command  "ibv_devinfo" return please?
>>
>> I had a recent case like this, but on Qlogic hardware.
>> Sorry if I am mixing things up.
>>
>>
> An output of ibv_devinfo from cluster's 1st node is:
>
> $ ibv_devinfo -d mlx4_0
> hca_id: mlx4_0
> transport:  InfiniBand (0)
> fw_ver: 2.35.5100
> node_guid:  7cfe:9003:00bd:dec0
> sys_image_guid: 7cfe:9003:00bd:dec3
> vendor_id:  0x02c9
> vendor_part_id: 4099
> hw_ver: 0x0
> board_id:   MT_1100120019
> phys_port_cnt:  1
> port:   1
> state:  PORT_ACTIVE (4)
> max_mtu:4096 (5)
> active_mtu: 4096 (5)
> sm_lid: 3
> port_lid:   3
> port_lmc:   0x00
> link_layer: InfiniBand
>
>

Re: [OMPI users] OpenMPI + InfiniBand

2016-10-28 Thread Sergei Hrushev
>
> Sergei,   what does the command  "ibv_devinfo" return please?
>
> I had a recent case like this, but on Qlogic hardware.
> Sorry if I am mixing things up.
>
>
An output of ibv_devinfo from cluster's 1st node is:

$ ibv_devinfo -d mlx4_0
hca_id: mlx4_0
transport:  InfiniBand (0)
fw_ver: 2.35.5100
node_guid:  7cfe:9003:00bd:dec0
sys_image_guid: 7cfe:9003:00bd:dec3
vendor_id:  0x02c9
vendor_part_id: 4099
hw_ver: 0x0
board_id:   MT_1100120019
phys_port_cnt:  1
port:   1
state:  PORT_ACTIVE (4)
max_mtu:4096 (5)
active_mtu: 4096 (5)
sm_lid: 3
port_lid:   3
port_lmc:   0x00
link_layer: InfiniBand

Re: [OMPI users] OpenMPI + InfiniBand

2016-10-28 Thread John Hearns via users
Sergei,   what does the command  "ibv_devinfo" return please?

I had a recent case like this, but on Qlogic hardware.
Sorry if I am mixing things up.

On 28 October 2016 at 10:48, Sergei Hrushev  wrote:

> Hello, All !
>
> We have a problem with OpenMPI version 1.10.2 on a cluster with newly
> installed Mellanox InfiniBand adapters.
> OpenMPI was re-configured and re-compiled using: --with-verbs
> --with-verbs-libdir=/usr/lib
>
> And our test MPI task returns proper results but it seems OpenMPI
> continues to use existing 1Gbit Ethernet network instead of InfiniBand.
>
> An output file contains these lines:
> --
> No OpenFabrics connection schemes reported that they were able to be
> used on a specific port.  As such, the openib BTL (OpenFabrics
> support) will be disabled for this port.
>
>   Local host:   node1
>   Local device: mlx4_0
>   Local port:   1
>   CPCs attempted:   rdmacm, udcm
> --
>
> InfiniBand network itself seems to be working:
>
> $ ibstat mlx4_0 shows:
>
> CA 'mlx4_0'
> CA type: MT4099
> Number of ports: 1
> Firmware version: 2.35.5100
> Hardware version: 0
> Node GUID: 0x7cfe900300bddec0
> System image GUID: 0x7cfe900300bddec3
> Port 1:
> State: Active
> Physical state: LinkUp
> Rate: 56
> Base lid: 3
> LMC: 0
> SM lid: 3
> Capability mask: 0x0251486a
> Port GUID: 0x7cfe900300bddec1
> Link layer: InfiniBand
>
> ibping also works.
> ibnetdiscover shows the correct topology of the IB network.
>
> Cluster works under Ubuntu 16.04 and we use drivers from OS (OFED is not
> installed).
>
> Is it enough for OpenMPI to have RDMA only, or should IPoIB also be
> installed?
> What else can be checked?
>
> Thanks a lot for any help!
>
>

[OMPI users] OpenMPI + InfiniBand

2016-10-28 Thread Sergei Hrushev
Hello, All !

We have a problem with OpenMPI version 1.10.2 on a cluster with newly
installed Mellanox InfiniBand adapters.
OpenMPI was re-configured and re-compiled using: --with-verbs
--with-verbs-libdir=/usr/lib

And our test MPI task returns proper results but it seems OpenMPI continues
to use existing 1Gbit Ethernet network instead of InfiniBand.

An output file contains these lines:
--
No OpenFabrics connection schemes reported that they were able to be
used on a specific port.  As such, the openib BTL (OpenFabrics
support) will be disabled for this port.

  Local host:   node1
  Local device: mlx4_0
  Local port:   1
  CPCs attempted:   rdmacm, udcm
--

InfiniBand network itself seems to be working:

$ ibstat mlx4_0 shows:

CA 'mlx4_0'
CA type: MT4099
Number of ports: 1
Firmware version: 2.35.5100
Hardware version: 0
Node GUID: 0x7cfe900300bddec0
System image GUID: 0x7cfe900300bddec3
Port 1:
State: Active
Physical state: LinkUp
Rate: 56
Base lid: 3
LMC: 0
SM lid: 3
Capability mask: 0x0251486a
Port GUID: 0x7cfe900300bddec1
Link layer: InfiniBand

ibping also works.
ibnetdiscover shows the correct topology of the IB network.

Cluster works under Ubuntu 16.04 and we use drivers from OS (OFED is not
installed).

Is it enough for OpenMPI to have RDMA only, or should IPoIB also be
installed?
What else can be checked?

Thanks a lot for any help!

[OMPI users] Reducing libmpi.so size....

2016-10-28 Thread Mahesh Nanavalla
Hi all,

I am using openmpi-1.10.3.

openmpi-1.10.3 compiled for arm (cross-compiled on X86_64 for OpenWrt
Linux) gives a libmpi.so.12.0.3 of 2.4MB, but if I compile on X86_64
(Linux), libmpi.so.12.0.3 is 990.2KB.

Can anyone tell me how to reduce the size of libmpi.so.12.0.3 compiled
for arm?

Thanks,
Mahesh.N