Re: [OMPI users] (no subject)

2016-02-22 Thread Mike Dubman
Hi, it seems that your ompi was compiled with ofed ver X but is running on ofed ver Y. X and Y are incompatible. On Mon, Feb 22, 2016 at 8:18 PM, Mark Potter wrote: > I am usually able to find the answer to my problems by searching the > archive but I've run up against one

Re: [OMPI users] hcoll dependency on mxm configure error

2015-10-21 Thread Mike Dubman
> Thanks! > David > > > On 10/21/2015 09:59 AM, Mike Dubman wrote: > > Hi David, > what linux distro do you use? (and mofed version)? > Do you have an /etc/ld.so.conf.d/mxm.conf file? > Can you please try adding LD_LIBRARY_PATH=/opt/mellanox/mxm/lib ./configure >

Re: [OMPI users] hcoll dependency on mxm configure error

2015-10-21 Thread Mike Dubman
Hi David, what linux distro do you use? (and mofed version)? Do you have an /etc/ld.so.conf.d/mxm.conf file? Can you please try adding LD_LIBRARY_PATH=/opt/mellanox/mxm/lib ./configure ? Thanks On Wed, Oct 21, 2015 at 6:40 PM, David Shrader wrote: > I should probably point out

Re: [OMPI users] Using OpenMPI (1.8, 1.10) with Mellanox MXM, ulimits ?

2015-10-01 Thread Mike Dubman
as well. (there is a reason that any MPI has hundreds of knobs) On Thu, Oct 1, 2015 at 1:50 PM, Dave Love <d.l...@liverpool.ac.uk> wrote: > Mike Dubman <mi...@dev.mellanox.co.il> writes: > > > we did not get to the bottom of "why". > > Tried d

Re: [OMPI users] Using OpenMPI (1.8, 1.10) with Mellanox MXM, ulimits ?

2015-10-01 Thread Mike Dubman
performance implications. Please set the heap size to the default value > (10240) > > Should say stack not heap. > > -Nathan > > On Wed, Sep 30, 2015 at 06:52:46PM +0300, Mike Dubman wrote: > >mxm comes with mxm_dump_config utility which provides and explains all

Re: [OMPI users] Using OpenMPI (1.8, 1.10) with Mellanox MXM, ulimits ?

2015-09-30 Thread Mike Dubman
mxm comes with the mxm_dump_config utility, which provides and explains all tunables. Please check the HPCX/README file for details. On Wed, Sep 30, 2015 at 1:21 PM, Dave Love <d.l...@liverpool.ac.uk> wrote: > Mike Dubman <mi...@dev.mellanox.co.il> writes: > > > unfortunately, t
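For reference, this is roughly how the utility is invoked (a sketch; the /opt/mellanox/mxm prefix is the usual MOFED install location and may differ on your system):
    /opt/mellanox/mxm/bin/mxm_dump_config
Run without arguments it prints the MXM tunables with their current values; see the HPCX README for the exact options.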

Re: [OMPI users] Using OpenMPI (1.8, 1.10) with Mellanox MXM, ulimits ?

2015-09-30 Thread Mike Dubman
we did not get to the bottom of "why". Tried different mpi packages (mvapich, intel mpi) and the observation held true. It could be many factors affected by huge heap size (cpu cache misses? swappiness?). On Wed, Sep 30, 2015 at 1:12 PM, Dave Love <d.l...@liverpool.ac.uk> wrote:

Re: [OMPI users] worse latency in 1.8 c.f. 1.6

2015-09-29 Thread Mike Dubman
what is your command line and setup? (ofed version, distro) This is what was just measured w/ fdr on haswell with v1.8.8 and mxm and UD + mpirun -np 2 -bind-to core -display-map -mca rmaps_base_mapping_policy dist:span -x MXM_RDMA_PORTS=mlx5_3:1 -mca rmaps_dist_device mlx5_3:1 -x

Re: [OMPI users] Using OpenMPI (1.8, 1.10) with Mellanox MXM, ulimits ?

2015-09-29 Thread Mike Dubman
hje...@lanl.gov> wrote: > > > > >I would like to add that you may want to play with the value and see > >what works for your applications. Most applications should be using > >malloc or similar functions to allocate large memory regions in the heap > >and not on the s

Re: [OMPI users] Using OpenMPI (1.8, 1.10) with Mellanox MXM, ulimits ?

2015-09-28 Thread Mike Dubman
Hello Grigory, We observed ~10% performance degradation with heap size set to unlimited for CFD applications. You can measure your application performance with default and unlimited "limits" and select the best setting. Kind Regards. M On Mon, Sep 28, 2015 at 7:36 PM, Grigory Shamov
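A minimal way to compare the two settings (a sketch; ./app and the process count are placeholders, and per the later correction in this thread the limit in question is the stack size):
    ulimit -s            # show the current default limit
    mpirun -np 16 ./app  # baseline run with default limits
    ulimit -s unlimited  # lift the limit in the launching shell
    mpirun -np 16 ./app  # repeat and compare timings
Note that with some launchers the limit must also be raised on the remote nodes (e.g. via the resource manager), not just in the local shell.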

Re: [OMPI users] No suitable active ports warning and -mca btl_openib_if_include option

2015-06-17 Thread Mike Dubman
Hi, the message in question belongs to MXM and it is a warning (silenced in later releases of MXM). To select a specific device in MXM, please pass: mpirun -x MXM_IB_PORTS=mlx4_0:2 ... M On Wed, Jun 17, 2015 at 9:38 PM, Na Zhang wrote: > Hi all, > > I am trying to
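A complete command line built around that variable might look like this (a sketch; ./a.out is a placeholder and the device:port string must match your ibv_devinfo output):
    mpirun -np 4 -x MXM_IB_PORTS=mlx4_0:2 ./a.out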

Re: [OMPI users] MXM problem

2015-05-28 Thread Mike Dubman
lla in order to use it and attach the output? > "-x LD_PRELOAD=$HPCX_MXM_DIR/debug/lib/libmxm.so -x MXM_LOG_LEVEL=data" > > Also, could you please attach the entire output of > "$HPCX_MPI_DIR/bin/ompi_info -a" > > Thank you, > Alina. > > On Tue, May 26, 2

Re: [OMPI users] 1.8.5, mxm, and a spurious '-L' flag

2015-05-27 Thread Mike Dubman
e is empty, and > you just end up appending "-L" instead of "-L/something". So why not just > check to ensure that the variable is not empty? > > > > > On May 26, 2015, at 3:27 PM, Mike Dubman <mi...@dev.mellanox.co.il> > wrote: > > > > i

Re: [OMPI users] 1.8.5, mxm, and a spurious '-L' flag

2015-05-26 Thread Mike Dubman
l be empty. > > Right? > > > > On May 26, 2015, at 1:28 PM, Mike Dubman <mi...@dev.mellanox.co.il> > wrote: > > > > Thanks Jeff! > > > > but in this line: > > > > > https://github.com/open-mpi/ompi/blob/master/config/ompi_check_mxm.

Re: [OMPI users] 1.8.5, mxm, and a spurious '-L' flag

2015-05-26 Thread Mike Dubman
automatically). Thus, ompi_check_mxm_libdir never gets assigned which > results in just "-L" getting used on line 41. The same behavior could be > found by using '--with-mxm=yes'. > > Thanks, > David > > > On 05/26/2015 11:28 AM, Mike Dubman wrote: > > Tha

Re: [OMPI users] 1.8.5, mxm, and a spurious '-L' flag

2015-05-26 Thread Mike Dubman
David, Could you please send me your config.log file? Looking into config/ompi_check_mxm.m4 macro I don't understand how it could happen. Thanks a lot. On Tue, May 26, 2015 at 6:41 PM, Mike Dubman <mi...@dev.mellanox.co.il> wrote: > Hello David, > Thanks for info and patch - w

Re: [OMPI users] 1.8.5, mxm, and a spurious '-L' flag

2015-05-26 Thread Mike Dubman
the linking commands and make > completed fine. > > So, it looks like there are two solutions: move the install location of > mxm to not be in system-space or modify configure. Which one would be the > better one for me to pursue? > > Thanks, > David > > > On

Re: [OMPI users] Error: "all nodes which are allocated for this job are already filled"

2015-05-26 Thread Mike Dubman
btw, what is the rationale to run in a chroot env? is it a docker-like env? does "ibv_devinfo -v" work for you from the chroot env? On Tue, May 26, 2015 at 7:08 AM, Rahul Yadav wrote: > Yes Ralph, MXM cards are on the node. Command runs fine if I run it out of > the chroot

Re: [OMPI users] MXM problem

2015-05-25 Thread Mike Dubman
active_mtu: 4096 (5) > sm_lid: 0 > port_lid: 0 > port_lmc: 0x00 > > Best regards, > Timur. > > > Monday, May 25, 2015, 19:39 +03

Re: [OMPI users] MXM problem

2015-05-25 Thread Mike Dubman
Hi Timur, seems that the yalla component was not found in your OMPI tree. Can it be that your mpirun is not from hpcx? Can you please check LD_LIBRARY_PATH, PATH, LD_PRELOAD and OPAL_PREFIX to make sure they point to the right mpirun? Also, could you please check that yalla is present in the ompi_info -l
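A quick way to check all of that from a shell (a sketch; the exact paths depend on where HPCX was extracted):
    which mpirun                       # should resolve inside the HPCX install
    echo $LD_LIBRARY_PATH $OPAL_PREFIX
    ompi_info | grep -i yalla          # should list the yalla pml if it was built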

Re: [OMPI users] 1.8.5, mxm, and a spurious '-L' flag

2015-05-23 Thread Mike Dubman
Hi, how was mxm installed? By copying? The rpm-based installation places mxm into /opt/mellanox/mxm and not into /usr/lib64/libmxm.so. Do you use HPCX (a pack of OMPI, MXM and FCA)? You can download HPCX, extract it anywhere and compile OMPI pointing to the mxm location under HPCX. Also, HPCX
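The usual flow looks roughly like this (a sketch; $HPCX_HOME stands for wherever the HPCX tarball was extracted, and the install prefix is a placeholder):
    ./configure --with-mxm=$HPCX_HOME/mxm --prefix=$HOME/ompi
    make -j && make install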

Re: [OMPI users] MPI_THREAD_MULTIPLE and openib btl

2015-04-28 Thread Mike Dubman
or trace or other) to sanity check that I am > indeed using #2 ? > > Subhra > > On Fri, Apr 24, 2015 at 12:55 AM, Mike Dubman <mi...@dev.mellanox.co.il> > wrote: > >> yes >> >> #1 - ob1 as pml, openib as btl (default: rc) >> #2 - yalla as pml

Re: [OMPI users] MPI_Finalize not behaving correctly, orphaned processes

2015-04-26 Thread Mike Dubman
when I paid attention to verbs (which was admittedly a long > time ago), the sample I pasted would segv... > > > > On Apr 24, 2015, at 9:40 AM, Mike Dubman <mi...@dev.mellanox.co.il> > wrote: > > > > ibv_fork_init() will set special flag for madvise() > (IB

Re: [OMPI users] MPI_Finalize not behaving correctly, orphaned processes

2015-04-24 Thread Mike Dubman
) { > // in the child > *buffer = 3; > // ... > } > ---- > > > > > On Apr 24, 2015, at 2:54 AM, Mike Dubman <mi...@dev.mellanox.co.il> > wrote: > > > > btw, ompi master now calls ibv_fork_init() before initializing > btl/mtl/oob f

Re: [OMPI users] MPI_THREAD_MULTIPLE and openib btl

2015-04-24 Thread Mike Dubman
all of the above using infiniband but in different ways? > > Thanks, > Subhra. > > > > On Thu, Apr 23, 2015 at 11:57 PM, Mike Dubman <mi...@dev.mellanox.co.il> > wrote: > >> HPCX package uses pml "yalla" by default (part of ompi master branch, not

Re: [OMPI users] MPI_THREAD_MULTIPLE and openib btl

2015-04-24 Thread Mike Dubman
3) mpirun --allow-run-as-root --mca mtl ^mxm -n 1 /root/backend > localhost : -x LD_PRELOAD=/root/libci.so -n 1 /root/app2 > > Seems like it doesn't matter if I use mxm, not use mxm or use it with > reliable connection (RC). How can I be sure I am indeed using mxm over

Re: [OMPI users] MPI_Finalize not behaving correctly, orphaned processes

2015-04-24 Thread Mike Dubman
btw, ompi master now calls ibv_fork_init() before initializing btl/mtl/oob frameworks and all fork fears should be addressed. On Fri, Apr 24, 2015 at 4:37 AM, Jeff Squyres (jsquyres) wrote: > Disable the memory manager / don't use leave pinned. Then you can > fork/exec

Re: [OMPI users] MPI_THREAD_MULTIPLE and openib btl

2015-04-23 Thread Mike Dubman
but didn't find > it. > > Subhra. > > On Tue, Apr 21, 2015 at 10:43 PM, Mike Dubman <mi...@dev.mellanox.co.il> > wrote: > >> cool, progress! >> >> >> [1429676565.124664] sys.c:719 MXM WARN Conflicting CPU >> frequencies detected, using: 2601.00

Re: [OMPI users] MPI_THREAD_MULTIPLE and openib btl

2015-04-22 Thread Mike Dubman
iv_env(): Function not implemented > [1429676565.126821] [JARVICE:14768:0] ib_dev.c:456 MXM ERROR > ibv_query_device() returned 38: Function not implemented > -- > Initialization of MXM library failed. > > Error: Input/output error > >

Re: [OMPI users] MPI_THREAD_MULTIPLE and openib btl

2015-04-18 Thread Mike Dubman
??:0 > > === > > -- > > mpirun noticed that process rank 1 with PID 450 on node JARVICE exited on > signal 11 (Segmentation fault). > > ---

Re: [OMPI users] Select a card in a multi card system

2015-04-15 Thread Mike Dubman
Hi, with MXM you can specify the list of devices to use for communication: -x MXM_IB_PORTS="mlx5_1:1,mlx4_1:1" and also select specific or all transports: -x MXM_TLS=shm,self,ud To change the port rate one can use *ibportstate*
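Both variables can go on the same command line (a sketch; ./a.out is a placeholder):
    mpirun -np 8 -x MXM_IB_PORTS=mlx5_1:1 -x MXM_TLS=shm,self,ud ./a.out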

Re: [OMPI users] MPI_THREAD_MULTIPLE and openib btl

2015-04-14 Thread Mike Dubman
e JARVICE exited on > signal 11 (Segmentation fault). > -- > [JARVICE:00562] 1 more process has sent help message help-mca-base.txt / > find-available:not-valid > [JARVICE:00562] Set MCA parameter "orte_base_

Re: [OMPI users] MPI_THREAD_MULTIPLE and openib btl

2015-04-13 Thread Mike Dubman
mxm > -- > -- > mpirun noticed that process rank 0 with PID 8398 on node JARVICE exited on > signal 11 (Segmentation fault). > ---

Re: [OMPI users] MPI_THREAD_MULTIPLE and openib btl

2015-04-10 Thread Mike Dubman
t from UD/RC/DC? What is the default? > > Thanks, > Subhra. > > > On Tue, Mar 31, 2015 at 9:46 AM, Mike Dubman <mi...@dev.mellanox.co.il> > wrote: > >> Hi, >> mxm uses IB rdma/roce technologies. One can select UD/RC/DC transports >> to be use

Re: [OMPI users] MPI_THREAD_MULTIPLE and openib btl

2015-03-31 Thread Mike Dubman
> Does the mxm mtl use infiniband rdma? Also from programming perspective, > do I need to use anything else other than MPI_Send/MPI_Recv? > > Thanks, > Subhra. > > > On Sun, Mar 29, 2015 at 11:14 PM, Mike Dubman <mi...@dev.mellanox.co.il> > wrote: > >> Hi,

Re: [OMPI users] MPI_THREAD_MULTIPLE and openib btl

2015-03-30 Thread Mike Dubman
Hi, the openib btl does not support this thread model. You can use OMPI w/ mxm (-mca mtl mxm) and the multiple-thread model in the 1.8.x series, or (-mca pml yalla) in the master branch. M On Mon, Mar 30, 2015 at 9:09 AM, Subhra Mazumdar wrote: > Hi, > > Can MPI_THREAD_MULTIPLE
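For example (sketches only; ./a.out is a placeholder):
    mpirun -np 2 --mca mtl mxm ./a.out     # 1.8.x series: mxm via the MTL framework
    mpirun -np 2 --mca pml yalla ./a.out   # master branch: mxm via the yalla pml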

Re: [OMPI users] Determine IB transport type of OpenMPI job

2015-01-11 Thread Mike Dubman
Hi, also - you can use the mxm library, which supports RC, UD, DC and mixes, and comes as part of Mellanox OFED. The version for community OFED is also available from http://mellanox.com/products/hpcx On Fri, Jan 9, 2015 at 4:03 PM, Sasso, John (GE Power & Water, Non-GE) < john1.sa...@ge.com> wrote: >

Re: [OMPI users] ERROR: C_FUNLOC function

2014-12-18 Thread Mike Dubman
Hi Siegmar, Could you please check the /etc/mtab file for real FS type for the following mount points: get_mounts: dirs[16]:/misc fs:autofs nfs:No get_mounts: dirs[17]:/net fs:autofs nfs:No get_mounts: dirs[18]:/home fs:autofs nfs:No could you please check if mntent.h and paths.h were detected

Re: [OMPI users] shmalloc error with >=512 mb

2014-11-17 Thread Mike Dubman
Hi, the default memheap size is 256MB; you can override it with oshrun -x SHMEM_SYMMETRIC_HEAP_SIZE=512M ... On Mon, Nov 17, 2014 at 3:38 PM, Timur Ismagilov wrote: > Hello! > Why does shmalloc return NULL when I try to allocate 512MB? > When I try to allocate 256MB - all

Re: [OMPI users] Building on a host with a shoddy OpenFabrics installation

2014-10-11 Thread Mike Dubman
Hi, yep - you can compile OFED/MOFED in the $HOME/ofed dir and point OMPI configure to it with "--with-verbs=/path/to/ofed/install". You can download and compile only the "libibverbs","libibumad","libibmad","librdmacm","opensm","infiniband-diags" packages, with a custom prefix. M On Fri, Oct 10, 2014
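A sketch of that flow (all paths are placeholders):
    ./configure --prefix=$HOME/ofed && make && make install    # repeat for each verbs package
    ./configure --with-verbs=$HOME/ofed --prefix=$HOME/ompi    # then build OMPI against it
    make -j && make install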

Re: [OMPI users] long initialization

2014-08-22 Thread Mike Dubman
> On Aug 21, 2014, at 12:02 AM, Timur Ismagilov <tismagi...@mail.ru > > wrote: > > Have I any opportunity to run mpi jobs? > > > Wed, 20 Aug 2014 10:48:38 -0700 from Ralph Castain <r...@open-mpi.org > >: > > yes, i know - it is cmr'd > > On Aug 20, 2

Re: [OMPI users] Clarification about OpenMPI, slurm and PMI interface

2014-08-21 Thread Mike Dubman
Hi Filippo, I think you can use the SLURM_LOCALID var (at least with slurm v14.03.4-2) $srun -N2 --ntasks-per-node 3 env |grep SLURM_LOCALID SLURM_LOCALID=1 SLURM_LOCALID=2 SLURM_LOCALID=0 SLURM_LOCALID=0 SLURM_LOCALID=1 SLURM_LOCALID=2 $ Kind Regards, M On Thu, Aug 21, 2014 at 9:27 PM, Ralph

Re: [OMPI users] ORTE daemon has unexpectedly failed after launch

2014-08-20 Thread Mike Dubman
btw, we get same error in v1.8 branch as well. On Wed, Aug 20, 2014 at 8:06 PM, Ralph Castain wrote: > It was not yet fixed - but should be now. > > On Aug 20, 2014, at 6:39 AM, Timur Ismagilov wrote: > > Hello! > > As i can see, the bug is fixed, but in

Re: [OMPI users] No log_num_mtt in Ubuntu 14.04

2014-08-19 Thread Mike Dubman
so, it seems you have an old ofed w/o this parameter. Can you install the latest Mellanox OFED, or check which community OFED has it? On Tue, Aug 19, 2014 at 9:34 AM, Rio Yokota wrote: > Here is what "modinfo mlx4_core" gives > > filename: > >

Re: [OMPI users] No log_num_mtt in Ubuntu 14.04

2014-08-18 Thread Mike Dubman
most likely you are installing an old OFED which does not have this parameter. Try: #modinfo mlx4_core and see if it is there. I would suggest installing the latest OFED or Mellanox OFED. On Mon, Aug 18, 2014 at 9:53 PM, Rio Yokota wrote: > I get "ofed_info: command not found". Note
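For example, narrowing the output to the parameter from the subject line:
    modinfo mlx4_core | grep log_num_mtt
If nothing is printed, the installed driver predates that parameter.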

Re: [OMPI users] mpi+openshmem hybrid

2014-08-14 Thread Mike Dubman
You can use hybrid mode. The following code works for me with ompi 1.8.2: #include <stdio.h> #include <stdlib.h> #include "shmem.h" #include "mpi.h" int main(int argc, char *argv[]) { MPI_Init(&argc, &argv); start_pes(0); { int version = 0; int subversion = 0; int num_proc = 0; int

Re: [OMPI users] openib component not available

2014-07-24 Thread Mike Dubman
Hi, The openib btl is not compatible with the "thread multiple" paradigm. You need to use mxm (a lib on top of verbs) for ompi with threads. mxm is part of MOFED, or you can download the HPCX package (a tarball of ompi + mxm) from http://mellanox.com/products/hpcx M On Thu, Jul 24, 2014 at 1:06 PM,

Re: [OMPI users] Salloc and mpirun problem

2014-07-16 Thread Mike Dubman
please add following flags to mpirun "--mca plm_base_verbose 10 --debug-daemons" and attach output. Thx On Wed, Jul 16, 2014 at 11:12 AM, Timur Ismagilov wrote: > Hello! > I have Open MPI v1.9a1r32142 and slurm 2.5.6. > > I can not use mpirun after salloc: > > $salloc -N2

Re: [OMPI users] poor performance using the openib btl

2014-06-25 Thread Mike Dubman
Hi what ofed/mofed are you using? what HCA, distro and command line? M On Wed, Jun 25, 2014 at 1:40 AM, Maxime Boissonneault < maxime.boissonnea...@calculquebec.ca> wrote: > What are your threading options for OpenMPI (when it was built) ? > > I have seen OpenIB BTL completely lock when some

Re: [OMPI users] [warn] Epoll ADD(1) on fd 0 failed

2014-06-10 Thread Mike Dubman
btw, the output comes from ompi's libevent and not from slurm itself (sorry about confusion and thanks to Yossi for catching this) opal/mca/event/libevent2021/libevent/epoll.c: event_warn("Epoll %s(%d) on fd %d failed. Old events were %d; read change was %d (%s); write change was %d (%s)",

Re: [OMPI users] OPENIB unknown transport errors

2014-06-07 Thread Mike Dubman
could you please attach output of "ibv_devinfo -v" and "ofed_info -s" Thx On Sat, Jun 7, 2014 at 12:53 AM, Tim Miller wrote: > Hi Josh, > > I asked one of our more advanced users to add the "-mca btl_openib_if_include > mlx4_0:1" argument to his job script. Unfortunately,

Re: [OMPI users] spml_ikrit_np random values

2014-06-06 Thread Mike Dubman
fixed here: https://svn.open-mpi.org/trac/ompi/changeset/31962 Thanks for report. On Thu, Jun 5, 2014 at 7:45 PM, Mike Dubman <mi...@dev.mellanox.co.il> wrote: > seems oshmem_info uses uninitialized value. > we will check it, thanks for report. > > > On Thu, Jun 5, 2

Re: [OMPI users] Problem with yoda component in oshmem.

2014-06-06 Thread Mike Dubman
could you please provide command line ? On Fri, Jun 6, 2014 at 10:56 AM, Timur Ismagilov wrote: > Hello! > > I am using Open MPI v1.8.1 in > example program hello_oshmem.cpp. > > When I put spml_ikrit_np = 1000 (more than 4) and run task on 4 (2,1) > nodes, I get an: > in

Re: [OMPI users] spml_ikrit_np random values

2014-06-05 Thread Mike Dubman
seems oshmem_info uses uninitialized value. we will check it, thanks for report. On Thu, Jun 5, 2014 at 6:56 PM, Timur Ismagilov wrote: > Hello! > > I am using Open MPI v1.8.1. > > $oshmem_info -a --parsable | grep spml_ikrit_np > >

Re: [OMPI users] Deadly warning "Epoll ADD(4) on fd 2 failed." ?

2014-05-28 Thread Mike Dubman
I think it comes from the PMI API used by OMPI/SLURM. SLURM's libpmi is trying to control stdout/stdin which is already controlled by OMPI. On Tue, May 27, 2014 at 8:31 PM, Ralph Castain wrote: > I'm unaware of any OMPI error message like that - might be caused by > something in

Re: [OMPI users] no ikrit component of in oshmem

2014-04-23 Thread Mike Dubman
Hi Timur, What "configure" line did you use? ikrit may not be compiled in if no "--with-mxm=/opt/mellanox/mxm" was provided. Can you please attach your config.log? Thanks On Wed, Apr 23, 2014 at 3:10 PM, Тимур Исмагилов wrote: > Hi! > I am trying to build openmpi 1.8 with Open

Re: [OMPI users] probable bug in 1.9a1r31409

2014-04-16 Thread Mike Dubman
Hi, I committed your patch to the trunk. thanks M On Wed, Apr 16, 2014 at 6:49 PM, Mike Dubman <mi...@dev.mellanox.co.il>wrote: > +1 > looks good. > > > On Wed, Apr 16, 2014 at 4:35 PM, Åke Sandgren > <ake.sandg...@hpc2n.umu.se>wrote: > >> On 04/16/2014

Re: [OMPI users] probable bug in 1.9a1r31409

2014-04-16 Thread Mike Dubman
+1 looks good. On Wed, Apr 16, 2014 at 4:35 PM, Åke Sandgren wrote: > On 04/16/2014 02:25 PM, Åke Sandgren wrote: > >> Hi! >> >> Found this problem when building r31409 with Pathscale 5.0 >> >> pshmem_barrier.c:81:6: error: redeclaration of 'pshmem_barrier_all' must

Re: [OMPI users] one more finding in openmpi-1.7.5a1

2014-02-14 Thread Mike Dubman
Thanks for the prompt help. Could you please resend the patch as an attachment which can be applied with the "patch" command; my mail client messes up long lines. On Fri, Feb 14, 2014 at 7:40 AM, wrote: > > > Thanks. I'm not familiar with mindist mapper. But obviously > checking

Re: [OMPI users] one more finding in openmpi-1.7.5a1

2014-02-14 Thread Mike Dubman
Hi, after this patch we get this in jenkins: *07:03:15* [vegas12.mtr.labs.mlnx:01646] [[26922,0],0] ORTE_ERROR_LOG: Not implemented in file rmaps_mindist_module.c at line 391 *07:03:15* [vegas12.mtr.labs.mlnx:01646] [[26922,0],0] ORTE_ERROR_LOG: Not implemented in file base/rmaps_base_map_job.c at

Re: [OMPI users] Get your Open MPI schwag!

2013-10-25 Thread Mike Dubman
t picture is copyrighted, Mike. While I enjoy the > enthusiasm, I actually suspect we would get into trouble using Chuck > Norris' name without first obtaining his permission. > > On Oct 25, 2013, at 2:28 AM, Mike Dubman <mi...@dev.mellanox.co.il> wrote: > > ok, so - here is a final propo

Re: [OMPI users] Get your Open MPI schwag!

2013-10-25 Thread Mike Dubman
Exascale. Twice. > > :-) > > Damien > > > On 23/10/2013 4:26 PM, Shamis, Pavel wrote: > >> +1 for Chuck Norris >> >> Pavel (Pasha) Shamis >> --- >> Computer Science Research Group >> Computer Science and Math Division >> Oak Ridge Natio

Re: [OMPI users] Get your Open MPI schwag!

2013-10-23 Thread Mike Dubman
maybe add some nice/funny slogan on the front under the logo, and a cool picture on the back. Some of the community members are still in their early twenties (and counting). :) Shall we open a contest for a good slogan to put, and a mid-size picture to put on the back side? - living the parallel world -

Re: [OMPI users] Big job, InfiniBand, MPI_Alltoallv and ibv_create_qp failed

2013-07-31 Thread Mike Dubman
Hi, What OFED vendor and version do you use? Regards M On Tue, Jul 30, 2013 at 8:42 PM, Paul Kapinos wrote: > Dear Open MPI experts, > > An user at our cluster has a problem running a kinda of big job: > (- the job using 3024 processes (12 per node, 252 nodes) runs

Re: [OMPI users] max. message size

2013-07-17 Thread Mike Dubman
do you use IB as a transport? The max message size in IB/RDMA is limited to 2G, but OMPI 1.7 splits large buffers into 2G chunks during RDMA. On Wed, Jul 17, 2013 at 11:51 AM, mohammad assadsolimani < m.assadsolim...@jesus.ch> wrote: > > Dear all, > > I do my PhD in physics and use a program,

Re: [OMPI users] using the xrc queues

2013-07-09 Thread Mike Dubman
Hi, I would suggest using MXM (part of mofed; it can be downloaded as a standalone rpm from http://mellanox.com/products/mxm for ofed). It uses UD (constant memory footprint) and should provide good performance. The next MXM v2.0 will support RC and DC (reliable UD) as well. Once mxm is installed from

Re: [OMPI users] EXTERNAL: Re: Need advice on performance problem

2013-06-12 Thread Mike Dubman
Also, what ofed version (ofed_info -s) and mxm version (rpm -qi mxm) do you use? On Wed, Jun 12, 2013 at 3:30 AM, Ralph Castain wrote: > Great! Would you mind showing the revised table? I'm curious as to the > relative performance. > > > On Jun 11, 2013, at 4:53 PM,

Re: [OMPI users] Using Service Levels (SLs) with OpenMPI 1.6.4 + MLNX_OFED 2.0

2013-06-11 Thread Mike Dubman
The --mca btl_openib_ib_path_record_service_level 1 flag controls the openib btl; you need to remove --mca mtl mxm from the command line. Have you compiled OpenMPI with the rhel6.4 inbox ofed driver? AFAIK, MOFED 2.x does not have XRC and you mentioned the "--enable-openib-connectx-xrc" flag in configure.
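So the run would use the openib btl alone, along the lines of (a sketch; ./a.out is a placeholder):
    mpirun -np 2 --mca btl openib,self --mca btl_openib_ib_path_record_service_level 1 ./a.out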

Re: [OMPI users] CentOS 6.3 & OpenMPI 1.6.3

2012-12-03 Thread Mike Dubman
Please download http://mellanox.com/downloads/hpc/mxm/v1.1/mxm-latest.tar, it contains mxm.rpm for mofed 1.5.4.1 On Mon, Dec 3, 2012 at 8:18 AM, Mike Dubman <mike.o...@gmail.com> wrote: > ohh.. you have MOFED 1.5.4.1, thought it was 1.5.3-3.1.0 > will provide you a link to mxm pack

Re: [OMPI users] CentOS 6.3 & OpenMPI 1.6.3

2012-12-03 Thread Mike Dubman
ohh.. you have MOFED 1.5.4.1, thought it was 1.5.3-3.1.0 will provide you a link to mxm package compiled with this MOFED version (thanks to no ABI in OFED). On Sun, Dec 2, 2012 at 10:04 PM, Joseph Farran wrote: > 1.5.4.1

Re: [OMPI users] CentOS 6.3 & OpenMPI 1.6.3

2012-12-02 Thread Mike Dubman
please redownload from http://mellanox.com/downloads/hpc/mxm/v1.1/mxm-latest.tar it contains binaries compiled with mofed 1.5.3-3.1.0 M On Sun, Dec 2, 2012 at 12:13 PM, Mike Dubman <mike.o...@gmail.com> wrote: > > It seems that your active mofed is 1.5.3-3.1.0, while installed mxm wa

Re: [OMPI users] CentOS 6.3 & OpenMPI 1.6.3

2012-12-02 Thread Mike Dubman
ecursive] Error 1 > make[1]: Leaving directory `/data/apps/sources/openmpi-1.6.3/ompi' > make: *** [all-recursive] Error 1 > > > > On 12/2/2012 1:37 AM, Mike Dubman wrote: > > please change "--with-openib" to "--with-openib=/usr" and retry > configure

Re: [OMPI users] CentOS 6.3 & OpenMPI 1.6.3

2012-12-02 Thread Mike Dubman
--with-fca=/opt/mellanox/fca\ > --with-mxm-libdir=/opt/mellanox/mxm/lib \ > --with-mxm=/opt/mellanox/mxm \ > --prefix=/data/openmpi-1-6.3 > > Please advise, > Joseph > > > > > > > On 12/1/2012 11:39 PM, Mike Dubman wrote

Re: [OMPI users] OpenMPI-1.6.3 & MXM

2012-12-02 Thread Mike Dubman
Hi, The mxm which is part of MOFED 1.5.3 supports OMPI 1.6.0. An mxm upgrade is needed to work with OMPI 1.6.3+. Please remove mxm from your cluster nodes (rpm -e mxm), install the latest from http://mellanox.com/products/mxm/, compile ompi 1.6.3, and add the following to its configure line: ./configure
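Such a configure line typically ends up looking like the one quoted elsewhere in this thread (a sketch; the prefix is a placeholder):
    ./configure --with-mxm=/opt/mellanox/mxm --with-mxm-libdir=/opt/mellanox/mxm/lib --prefix=/opt/openmpi-1.6.3
    make -j && make install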

Re: [OMPI users] CentOS 6.3 & OpenMPI 1.6.3

2012-12-02 Thread Mike Dubman
Hi Joseph, I guess you installed MOFED under /usr, is that right? Could you please specify the "--with-openib=/usr" parameter during the ompi "configure" stage? 10x M On Fri, Nov 30, 2012 at 1:11 AM, Joseph Farran wrote: > Hi YK: > > Yes, I have those installed but they are newer

Re: [OMPI users] CentOS 6.3 & OpenMPI 1.6.3

2012-11-28 Thread Mike Dubman
You need mxm-1.1.3a5e745-1.x86_64-rhel6u3.rpm On Wed, Nov 28, 2012 at 7:44 PM, Joseph Farran wrote: > mxm-1.1.3a5e745-1.x86_64-rhel6u3.rpm >

Re: [OMPI users] application with mxm hangs on startup

2012-08-24 Thread Mike Dubman
Hi, Could you please download latest mxm from http://www.mellanox.com/products/mxm/ and retry? The mxm version which comes with OFED 1.5.3 was tested with OMPI 1.6.0. Regards M On Wed, Aug 22, 2012 at 2:22 PM, Pavel Mezentsev wrote: > I've tried to launch the

Re: [OMPI users] ompi mca mxm version

2012-05-11 Thread Mike Dubman
On May 9, 2012, at 7:41 PM, Mike Dubman wrote: > > > you need latest OMPI 1.6.x and latest MXM ( > ftp://bgate.mellanox.com/hpc/mxm/v1.1/mxm_1.1.1067.tar) > > Excellent! Thanks for the quick response! Using the MXM v1.1.1067 > against OMPI v1.6.x did the trick. Pleas

Re: [OMPI users] ompi mca mxm version

2012-05-09 Thread Mike Dubman
you need latest OMPI 1.6.x and latest MXM ( ftp://bgate.mellanox.com/hpc/mxm/v1.1/mxm_1.1.1067.tar) On Wed, May 9, 2012 at 6:02 AM, Derek Gerstmann wrote: > What versions of OpenMPI and the Mellanox MXM libraries have been tested > and verified to work? > > We are

Re: [OMPI users] Possible bug in finalize, OpenMPI v1.5, head revision

2012-01-26 Thread Mike Dubman
so far it has not happened yet - will report if it does. On Tue, Jan 24, 2012 at 5:10 PM, Jeff Squyres <jsquy...@cisco.com> wrote: > Ralph's fix has now been committed to the v1.5 trunk (yesterday). > > Did that fix it? > > > On Jan 22, 2012, at 3:40 PM, Mike Dubman wrot

Re: [OMPI users] Possible bug in finalize, OpenMPI v1.5, head revision

2012-01-22 Thread Mike Dubman
it was compiled with the same ompi. We see it occasionally on different clusters with different ompi folders. (all v1.5) On Thu, Jan 19, 2012 at 5:44 PM, Ralph Castain wrote: > I didn't commit anything to the v1.5 branch yesterday - just the trunk. > > As I told Mike

Re: [OMPI users] Possible bug in finalize, OpenMPI v1.5, head revision

2012-01-17 Thread Mike Dubman
It happens for us on RHEL 6.0 On Tue, Jan 17, 2012 at 3:46 AM, Ralph Castain wrote: > Well, I'm afraid I can't replicate your report. It runs fine for me. > > Sent from my iPad > > On Jan 16, 2012, at 4:25 PM, Ralph Castain wrote: > > >

Re: [OMPI users] MPI_Send doesn't work if the data >= 2GB

2010-12-06 Thread Mike Dubman
Hi, What interconnect and command line do you use? For the InfiniBand openib component there is a known issue with large (>= 2GB) transfers: https://svn.open-mpi.org/trac/ompi/ticket/2623 Try disabling memory pinning: http://www.open-mpi.org/faq/?category=openfabrics#large-message-leave-pinned regards
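Per the FAQ entry above, leave-pinned behaviour can be switched off on the command line (a sketch; ./a.out is a placeholder):
    mpirun -np 2 --mca mpi_leave_pinned 0 ./a.out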

Re: [OMPI users] openib issues

2010-08-10 Thread Mike Dubman
Hey Eloi, What HCA card do you have? Can you post code/instructions on how to reproduce it? 10x Mike On Mon, Aug 9, 2010 at 5:22 PM, Eloi Gaudry wrote: > Hi, > > Could someone have a look on these two different error messages ? I'd like > to know the reason(s) why they were displayed

[OMPI users] Error: system limit exceeded on number of pipes that can be open

2009-08-11 Thread Mike Dubman
Hello guys, When executing following command with mtt and ompi 1.3.3: mpirun --host witch15,witch15,witch15,witch15,witch16,witch16,witch16,witch16,witch17,witch17,witch17,witch17,witch18,witch18,witch18,witch18,witch19,witch19,witch19,witch19 -np 20 --mca btl_openib_use_srq 1 --mca btl

Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??

2009-07-16 Thread Mike Dubman
Hello Ralph, It seems that Option2 is preferred, because it is more intuitive for the end-user to create a rankfile for an mpi job which is described by the -app cmd line. All host definitions used inside -app will be treated like a single global hostlist combined from all hosts appearing inside "-app

Re: [OMPI users] Can 2 IB HCAs give twice the bandwidth?

2008-10-22 Thread Mike Dubman
using 2 HCAs on the same PCI-Exp bus (as well as 2 ports from the same HCA) will not improve performance; PCI-Exp is the bottleneck. On Mon, Oct 20, 2008 at 2:28 AM, Mostyn Lewis wrote: > Well, here's what I see with the IMB PingPong test using two ConnectX DDR > cards >