[OMPI users] OpenMPI fails with np > 65

2014-08-10 Thread Lenny Verkhovsky
Ideas? Thanks. Lenny Verkhovsky SW Engineer, Mellanox Technologies www.mellanox.com Office: +972 74 712 9244 Mobile: +972 54 554 0233 Fax: +972 72 257 9400

Re: [OMPI users] OpenMPI fails with np > 65

2014-08-11 Thread Lenny Verkhovsky
I don't think so. It's always the 66th node, even if I swap the 65th and 66th. I also get the same error when setting np=66 while having only 65 hosts in the hostfile (I am using only the tcp btl). Lenny Verkhovsky SW Engineer, Mellanox Technologies www.mellanox.com

Re: [OMPI users] OpenMPI fails with np > 65

2014-08-12 Thread Lenny Verkhovsky
[node-119.ssauniversal.ssa.kodiak.nx:02996] [[56978,0],65] ORTE_ERROR_LOG: Error in file base/ess_base_std_orted.c at line 288 Lenny Verkhovsky SW Engineer, Mellanox Technologies www.mellanox.com Office: +972 74 712 9244 Mobile: +972 54 554 0233 Fax: +972 72 257 9

Re: [OMPI users] OpenMPI fails with np > 65

2014-08-13 Thread Lenny Verkhovsky
OMPI from trunk [node-119.ssauniversal.ssa.kodiak.nx:02996] [[56978,0],65] ORTE_ERROR_LOG: Error in file base/ess_base_std_orted.c at line 288 Thanks. Lenny Verkhovsky SW Engineer, Mellanox Technologies www.mellanox.com Office: +972 74 712 9244 Mobile: +972

Re: [OMPI users] OpenMPI vs Intel MPI

2009-07-02 Thread Lenny Verkhovsky
Hi, I am not an HPL expert, but this might help. 1. The rankfile mapper is available only from Open MPI 1.3 onward; if you are using Open MPI 1.2.8, try -mca mpi_paffinity_alone 1. 2. If you are using Open MPI 1.3 you don't have to set mpi_leave_pinned 1, since it's the default value. Lenny. On Thu,
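For illustration only, the two suggestions above might look roughly like this (the process count, hostfile name, and HPL binary name are placeholders, not taken from the original mail):

    # Open MPI 1.2.x: no rankfile mapper, so pin processes with paffinity
    mpirun -np 8 -hostfile hosts -mca mpi_paffinity_alone 1 ./xhpl
    # Open MPI 1.3.x: mpi_leave_pinned is already on by default
    mpirun -np 8 -hostfile hosts ./xhpl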

Re: [OMPI users] enable-mpi-threads

2009-07-09 Thread Lenny Verkhovsky
BTW, what kind of threads does Open MPI support? I found in https://svn.open-mpi.org/trac/ompi/browser/trunk/README that we support MPI_THREAD_MULTIPLE, and found a few unclear mails about MPI_THREAD_FUNNELED and MPI_THREAD_SERIALIZED. Also found nothing in the FAQ :(. Thanks, Lenny. On Thu, Jul 2,

Re: [OMPI users] enable-mpi-threads

2009-07-09 Thread Lenny Verkhovsky
I guess this question was already asked before: https://svn.open-mpi.org/trac/ompi/ticket/1367 On Thu, Jul 9, 2009 at 10:35 AM, Lenny Verkhovsky < lenny.verkhov...@gmail.com> wrote: > BTW, What kind of threads Open MPI supports ? > I found in the https://svn.open-mpi.org/trac/ompi/b

Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??

2009-07-14 Thread Lenny Verkhovsky
> but "mpirun -np 3 ./something" will work though. It works when you ask > for 1 CPU less. And the same behavior in any case (shared nodes, non-shared > nodes, multi-node). > > If you switch off rmaps_base_no_o

Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??

2009-07-14 Thread Lenny Verkhovsky
dellix7 dellix7 Thanks Lenny. On Tue, Jul 14, 2009 at 4:59 PM, Ralph Castain <r...@open-mpi.org> wrote: > Strange - let me have a look at it later today. Probably something simple > that another pair of eyes might spot. > On Jul 14, 2009, at 7:43 AM, Lenny Verkhovsky wrote: > >

Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??

2009-07-14 Thread Lenny Verkhovsky
, Jul 14, 2009 at 7:08 PM, Ralph Castain <r...@open-mpi.org> wrote: > Run it without the appfile, just putting the apps on the cmd line - does it > work right then? > > On Jul 14, 2009, at 10:04 AM, Lenny Verkhovsky wrote: > > additional info > I am running mpirun on ho

Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??

2009-07-15 Thread Lenny Verkhovsky
following: > > -np 1 -H witch1 hostname > -np 1 -H witch2 hostname > > That should get you what you want. > Ralph > > On Jul 14, 2009, at 10:29 AM, Lenny Verkhovsky wrote: > > No, it's not working as I expect, unless I expect something wrong. > (sorry for the lo

Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??

2009-07-15 Thread Lenny Verkhovsky
Sorry to have to keep asking you to try things - I don't have a setup here > where I can test this as everything is RM managed. > > > On Jul 15, 2009, at 12:09 AM, Lenny Verkhovsky wrote: > > > Thanks Ralph, after playing with prefixes it worked, > > I still have a pro

Re: [OMPI users] selectively bind MPI to one HCA out of available ones

2009-07-15 Thread Lenny Verkhovsky
Make sure you have the Open MPI 1.3 series; I don't think the if_include param is available in the 1.2 series. max btls controls fragmentation and load balancing over similar BTLs (for example when using LMC > 0, or 2 ports connected to 1 network); you need the if_include param. On Wed, Jul 15, 2009 at 4:20 PM,
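A hedged sketch of binding to a single HCA (the device name mthca0, hostfile, and application name are placeholders; use the device name your system reports, e.g. via ibv_devinfo):

    mpirun -np 4 -hostfile hosts \
        -mca btl openib,sm,self \
        -mca btl_openib_if_include mthca0 ./app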

Re: [OMPI users] Help: Processors Binding

2009-08-03 Thread Lenny Verkhovsky
Hi, you can find a lot of useful information under the FAQ section http://www.open-mpi.org/faq/ http://www.open-mpi.org/faq/?category=tuning#paffinity-defs Lenny. On Mon, Aug 3, 2009 at 11:55 AM, Lee Amy wrote: > Hi, > > Does OpenMPI have processor binding like

Re: [OMPI users] Tuned collectives: How to choose them dynamically? (-mca coll_tuned_dynamic_rules_filename dyn_rules)"

2009-08-04 Thread Lenny Verkhovsky
Hi, I am also looking for an example file of rules for dynamic collectives. Has anybody tried it? Where can I find the proper syntax for it? Thanks. Lenny. On Thu, Jul 23, 2009 at 3:08 PM, Igor Kozin wrote: > Hi Gus, > I played with collectives a few months ago.

Re: [OMPI users] bin/orted: Command not found.

2009-08-10 Thread Lenny Verkhovsky
Try specifying -prefix on the command line, e.g.: mpirun -np 4 -prefix $MPIHOME ./app Lenny. On Sat, Aug 8, 2009 at 5:04 PM, Kenneth Yoshimoto wrote: > > I don't own these nodes, so I have to use them with > whatever path setups they came with. In particular, > my home

Re: [OMPI users] compile mpi program on Cell BE

2009-08-10 Thread Lenny Verkhovsky
Can this be related? http://www.open-mpi.org/faq/?category=building#build-qs22 On Sun, Aug 9, 2009 at 12:22 PM, Attila Börcs wrote: > Hi Everyone, > > What is the regular method of compiling and running MPI code on Cell Broadband with > ppu-gcc and spu-gcc? > > > Regards, >

Re: [OMPI users] Failure trying to use tuned collectives

2009-08-10 Thread Lenny Verkhovsky
By default the coll framework scans all available modules and selects the available functions with the highest priorities. So, to use tuned collectives explicitly you can raise its priority: -mca coll_tuned_priority 100. P.S. Collective modules can have only a partial set of available functions, for
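For instance, raising the tuned component's priority for a single run might look like this (the process count, hostfile, and application name are placeholders; the priority value 100 comes from the message above):

    mpirun -np 16 -hostfile hosts -mca coll_tuned_priority 100 ./app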

Re: [OMPI users] OpenMPI 1.3 Infiniband Hang

2009-08-13 Thread Lenny Verkhovsky
Hi, 1. Mellanox has newer fw for those HCAs: http://www.mellanox.com/content/pages.php?pg=firmware_table_IH3Lx I am not sure if it will help, but newer fw usually has some bug fixes. 2. Try disabling leave_pinned during the run; it's on by default in 1.3.3. Lenny. On Thu, Aug 13, 2009 at

Re: [OMPI users] Help: How to accomplish processors affinity

2009-08-17 Thread Lenny Verkhovsky
Hi, see http://www.open-mpi.org/faq/?category=tuning#using-paffinity I am not familiar with this cluster, but in the FAQ (see link above) you can find an example of the rankfile. Another simple example is the following (laid out line by line below): $cat rankfile rank 0=host1 slot=0 rank 1=host2 slot=0 rank 2=host3 slot=0 rank
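Laid out line by line, that rankfile sketch would look roughly like this (host names and slot numbers are placeholders; depending on your setup the hosts may also need to appear in a hostfile or -H list):

    $ cat rankfile
    rank 0=host1 slot=0
    rank 1=host2 slot=0
    rank 2=host3 slot=0
    $ mpirun -np 3 -rf rankfile ./app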

Re: [OMPI users] rank file error: Rankfile claimed...

2009-08-17 Thread Lenny Verkhovsky
Hi, this message means that you are trying to use host "plankton", which was not allocated via the hostfile or hostlist. But according to the files and the command line, everything seems fine. Can you try using the "plankton.uzh.ch" hostname instead of "plankton"? Thanks, Lenny. On Mon, Aug 17, 2009 at 10:36

Re: [OMPI users] rank file error: Rankfile claimed...

2009-08-17 Thread Lenny Verkhovsky
the full names make it work! > Is there a reason why the rankfile option treats > host names differently than the hostfile option? > > Thanks > Jody > > > > On Mon, Aug 17, 2009 at 11:20 AM, Lenny > Verkhovsky<lenny.verkhov...@gmail.com> wrote: > > Hi &

Re: [OMPI users] rank file error: Rankfile claimed...

2009-08-17 Thread Lenny Verkhovsky
.ch slots=1 max-slots=1 >> >> Then this works fine: >> [jody@aim-plankton neander]$ mpirun -np 4 -hostfile th_021 -rf rf_02 >> ./HelloMPI >> >> Is there an explanation for this? >> >> Thank You >> Jody >> >> Lenny. >>>

Re: [OMPI users] problem with LD_LIBRARY_PATH???

2009-08-19 Thread Lenny Verkhovsky
Sounds like environment problems. Try running $mpirun -prefix /home/jean/openmpisof/ .. Lenny. On Wed, Aug 19, 2009 at 5:36 PM, Jean Potsam wrote: > Hi All, > I'm trying to install openmpi with self. However, I am > experiencing some problems with openmpi

Re: [OMPI users] Program runs successfully...but with error messages displayed

2009-08-27 Thread Lenny Verkhovsky
Most likely you compiled Open MPI with the --with-openib flag, but since there are no openib devices available on the n06 machine, you got an error. You can "disable" this message either by recompiling Open MPI without the openib flag, or by disabling the openib btl: -mca btl ^openib or -mca btl sm,self,tcp. Lenny.
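Either variant should silence the warning; a minimal sketch (process count, hostfile, and application name are placeholders):

    # exclude the openib btl everywhere
    mpirun -np 8 -hostfile hosts -mca btl ^openib ./app
    # or list only the btls you want
    mpirun -np 8 -hostfile hosts -mca btl sm,self,tcp ./app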

[OMPI users] VMware and OpenMPI

2009-08-27 Thread Lenny Verkhovsky
Hi all, Does OpenMPI support VMware? I am trying to run OpenMPI 1.3.3 on VMware and it got stuck during the OSU benchmarks and IMB. Looks like a random deadlock; I wonder if anyone has ever tried it? Thanks, Lenny.

Re: [OMPI users] Help: OFED Version problem

2009-08-31 Thread Lenny Verkhovsky
You need to check the release notes and compare the differences. Also check the Open MPI version in both of them. In general it's not such a good idea to run different versions of the software for performance comparison, or at all. Since both of them are open source, backward compatibility is not

Re: [OMPI users] rankfile error on openmpi/1.3.3

2009-09-01 Thread Lenny Verkhovsky
Please try using the full hostname (drdb0235.en.desres.deshaw.com) in the hostfile/rankfile. It should help. Lenny. On Mon, Aug 31, 2009 at 7:43 PM, Ralph Castain wrote: > I'm afraid the rank-file mapper in 1.3.3 has several known problems that > have been described on the list

Re: [OMPI users] rankfile error on openmpi/1.3.3

2009-09-01 Thread Lenny Verkhovsky
I changed the error message; I hope it will be clearer now. r21919. On Tue, Sep 1, 2009 at 2:13 PM, Lenny Verkhovsky <lenny.verkhov...@gmail.com > wrote: > please try using full ( drdb0235.en.desres.deshaw.com ) hostname > in the hostfile/rankfile. > It should help. > Lenny.

Re: [OMPI users] Help: Infiniband interface hang

2009-09-02 Thread Lenny Verkhovsky
Have you tried running hostname? $mpirun -np 2 --mca btl openib,self --host node1,node2 hostname If it hangs, it's not an Open MPI problem; check your setup, especially your firewall settings, and disable the firewall. On Wed, Sep 2, 2009 at 2:06 PM, Lee Amy wrote: > Hi, > > I

Re: [OMPI users] [OMPI devel] Error message improvement

2009-09-09 Thread Lenny Verkhovsky
__FUNCTION__ is not portable. > __func__ is, but it needs a C99-compliant compiler. > > --Nysal > > On Tue, Sep 8, 2009 at 9:06 PM, Lenny Verkhovsky < > lenny.verkhov...@gmail.com> wrote: > >> fixed in r21952 >> thanks. >> >> On Tue, Sep 8, 2009 at 5:08 PM, Art

Re: [OMPI users] unable to access or execute

2009-09-15 Thread Lenny Verkhovsky
You can use a shared (e.g. NFS) folder for this app, or provide a full PATH to it, ex: $mpirun -np 2 -hostfile hostfile /home/user/app 2009/9/15 Dominik Táborský > So I have to manually copy the compiled hello world program to all of > the nodes so that they can be

Re: [OMPI users] cartofile

2009-09-21 Thread Lenny Verkhovsky
Hi Eugene, a carto file is a file with a static graph topology of your node. In opal/mca/carto/file/carto_file.h you can see an example. (Yes, I know it should be in the help/man list :) ) Basically it describes a map of your node and its internal interconnection. Hopefully it will be discovered

Re: [OMPI users] mpirun failure

2009-10-18 Thread Lenny Verkhovsky
You can use the full path to mpirun; you can also set the prefix: $mpirun -prefix path/to/mpi/home -np . Lenny. On Sun, Oct 18, 2009 at 12:03 PM, Oswald Benedikt wrote: > Hi, thanks, that's what puzzled me when I saw the reference to 1.3, but the > LD_LIBRARY_PATH was set to

Re: [OMPI users] mpirun not working on more than one node

2009-11-17 Thread Lenny Verkhovsky
I noticed that you also have different versions of OMPI: 1.3.2 on node1 and 1.3 on node2. Can you try to put the same version of OMPI on both nodes? Can you also try running np 16 on node1 when running separately? Lenny. On Tue, Nov 17, 2009 at 5:45 PM, Laurin Müller

Re: [OMPI users] Strange problem with 1.2.6

2008-07-14 Thread Lenny Verkhovsky
Maybe it's related to #1378, "PML ob1 deadlock for ping/ping"? On 7/14/08, Jeff Squyres wrote: > > What application is it? The majority of the message passing engine did not > change in the 1.2 series; we did add a new option into 1.2.6 for disabling > early completion: > >

Re: [OMPI users] OpenMPI 1.4 nightly

2008-07-31 Thread Lenny Verkhovsky
Try to use only openib; make sure you use a nightly after r19092. On 7/31/08, Gabriele Fatigati wrote: > > Mm, I've tried to disable shared memory but the problem remains. Is it > normal? > > 2008/7/31 Jeff Squyres > >> There is very definitely a shared

Re: [OMPI users] Fail to install openmpi 1.2.5 on bladecenter with OFED 1.3

2008-08-13 Thread Lenny Verkhovsky
Hi, check in /usr/lib; it's usually the folder for 32-bit libraries. I think OFED 1.3 already comes with Open MPI, so it should be installed by default. BTW, OFED 1.3.1 comes with Open MPI 1.2.6. Lenny. On 8/12/08, Mohd Radzi Nurul Azri wrote: > > Hi, > > > Thanks for the

Re: [OMPI users] OpenMPI with openib partitions

2008-10-07 Thread Lenny Verkhovsky
./mpi_p1_4_TRUNK -t lt LT (2) (size min max avg) 1 3.443480 3.443480 3.443480 Best regards Lenny. On 10/6/08, Jeff Squyres <jsquy...@cisco.com> wrote: > > On Oct 5, 2008, at 1:22 PM, Lenny Verkhovsky wrote: > > you should probably use -mca tcp,self -mca btl_openib_if_include ib0.8

Re: [OMPI users] Working with a CellBlade cluster

2008-10-23 Thread Lenny Verkhovsky
Hi, if I understand you correctly the most suitable way to do it is with the paffinity support that we have in Open MPI 1.3 and the trunk. However, usually the OS distributes processes evenly between sockets by itself. There is still no formal FAQ due to multiple reasons, but you can read how to use it in the

Re: [OMPI users] Working with a CellBlade cluster

2008-10-23 Thread Lenny Verkhovsky
s_rank_file_". > Do you have idea when OpenMPI 1.3 will be available? OpenMPI 1.3 has quite > a few features I'm looking for. > > Thanks, > Mi > [image: Inactive hide details for "Lenny Verkhovsky" > <lenny.verkhov...@gmail.com>]"Lenny Verkhovsky" <

Re: [OMPI users] Working with a CellBlade cluster

2008-10-27 Thread Lenny Verkhovsky
current kernel. > > Mi > "Lenny Verkhovsky" <lenny.verkhov...@gmail.com>

Re: [OMPI users] dual cores

2008-11-10 Thread Lenny Verkhovsky
You can also press "f" while "top" is running and choose option "j"; this way you will see which CPU is chosen, under column P. Lenny. On Mon, Nov 10, 2008 at 7:38 AM, Hodgess, Erin wrote: > great! > > Thanks, > Erin > > > Erin M. Hodgess, PhD > Associate Professor > Department

Re: [OMPI users] Hybrid program

2008-11-23 Thread Lenny Verkhovsky
Hi, sorry for not answering sooner. In Open MPI 1.3 we added a paffinity mapping module. The syntax is quite simple and flexible: rank N=hostA slot=socket:core_range rank M=hostB slot=cpu See the following example: ex: #mpirun -rf rankfile_name ./app #cat rankfile_name rank 0=host1
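A slightly fuller sketch of that socket:core syntax (host names, sockets, and core ranges are placeholders, not taken from the original mail):

    $ cat rankfile_name
    rank 0=host1 slot=0:0-1   # rank 0 on host1, socket 0, cores 0-1
    rank 1=host2 slot=1:2     # rank 1 on host2, socket 1, core 2
    rank 2=host3 slot=4       # rank 2 on host3, cpu 4
    $ mpirun -rf rankfile_name ./app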

Re: [OMPI users] Deadlock on large numbers of processors

2008-12-09 Thread Lenny Verkhovsky
maybe it's related to https://svn.open-mpi.org/trac/ompi/ticket/1378 ?? On 12/5/08, Justin wrote: > > The reason i'd like to disable these eager buffers is to help detect the > deadlock better. I would not run with this for a normal run but it would be > useful for

Re: [OMPI users] Deadlock on large numbers of processors

2008-12-09 Thread Lenny Verkhovsky
also see https://svn.open-mpi.org/trac/ompi/ticket/1449 On 12/9/08, Lenny Verkhovsky <lenny.verkhov...@gmail.com> wrote: > > maybe it's related to https://svn.open-mpi.org/trac/ompi/ticket/1378 ?? > > On 12/5/08, Justin <luitj...@cs.utah.edu> wrote: >> >

Re: [OMPI users] Processor/core selection/affinity for large shared memory systems

2008-12-09 Thread Lenny Verkhovsky
Hi, 1. Please provide #cat /proc/cpuinfo 2. See http://www.open-mpi.org/faq/?category=tuning#paffinity-defs. Best regards, Lenny.

Re: [OMPI users] Bug in 1.3 nightly

2008-12-16 Thread Lenny Verkhovsky
I didn't see any errors on 1.3rc3r20130. I am running MTT nightly and it seems to be fine on x86-64 CentOS 5. On Tue, Dec 16, 2008 at 10:27 AM, Gabriele Fatigati wrote: > Dear OpenMPI developers, > trying to compile 1.3 nightly version , i get the follow error: > >

Re: [OMPI users] Problem with openmpi and infiniband

2009-01-04 Thread Lenny Verkhovsky
Hi, just to make sure: you wrote in the previous mail that you tested IMB-MPI1 and it "reports for the last test", and the results are for "processes=6". Since you have 4- and 8-core machines, this test could have run on the same 8-core machine over shared memory and not over InfiniBand, as

Re: [OMPI users] OpenMPI hangs across multiple nodes.

2009-02-04 Thread Lenny Verkhovsky
What kind of communication do you have between nodes - tcp or openib (IB/iWARP)? You can try: mpirun -np 4 -host node1,node2 -mca btl tcp,self random On Wed, Feb 4, 2009 at 1:21 AM, Ralph Castain wrote: > Could you tell us which version of OpenMPI you are using, and how it was

Re: [OMPI users] OpenMPI 1.3.1 rpm build error

2009-03-01 Thread Lenny Verkhovsky
We saw the same problem with compilation; the workaround for us was configuring without VT (see ./configure --help). I hope the VT guys will fix it at some point. Lenny. On Mon, Feb 23, 2009 at 11:48 PM, Jeff Squyres wrote: > It would be interesting to see what happens with the 1.3

Re: [OMPI users] Problem with MPI_Comm_spawn_multiple & MPI_Info_fre

2009-03-10 Thread Lenny Verkhovsky
Can you try Open MPI 1.3? Lenny. On 3/10/09, Tee Wen Kai wrote: > > Hi, > > I am using version 1.2.8. > > Thank you. > > Regards, > Wenkai > > --- On Mon, 9/3/09, Ralph Castain wrote: > > > From: Ralph Castain > Subject: Re: [OMPI users]

Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??

2009-04-12 Thread Lenny Verkhovsky
Hi, the first "crash" is OK, since your rankfile has ranks 0 and 1 defined while np=1, which means only rank 0 is present and can be allocated. NP must be >= the number of ranks in the rankfile. What exactly are you trying to do? I tried to recreate your segv but all I got was
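A minimal illustration of that constraint (host and slot values are placeholders): with ranks 0 and 1 defined in the rankfile, mpirun needs at least -np 2; with -np 1 only rank 0 can be placed and the run fails.

    $ cat rankfile
    rank 0=host1 slot=0
    rank 1=host1 slot=1
    $ mpirun -np 2 -rf rankfile ./app   # ok: both ranks can be placed
    $ mpirun -np 1 -rf rankfile ./app   # fails: rank 1 is defined but np=1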

Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??

2009-04-20 Thread Lenny Verkhovsky
> orterun: clean termination accomplished > Message: 4 > Date: Tue, 14 Apr 2009 06:55:58 -0600 > From: Ralph Castain <r...@lanl.gov> > Subject: Re: [OMPI users] 1.3.1 -

Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??

2009-05-05 Thread Lenny Verkhovsky
> Message-ID: <f6290ada-a196-43f0-a853-cbcb802d8...@lanl.gov> > Content-Type: text/plain; charset="us-ascii"; Format="flowed"; DelSp="yes" > > The rankfile cuts across

Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??

2009-05-05 Thread Lenny Verkhovsky
the > additional procs either byslot (default) or bynode (if you specify that > option). So the rankfile doesn't need to contain an entry for every proc. > > Just don't want to confuse folks. > Ralph > > On Tue, May 5, 2009 at 5:59 AM, Lenny Verkhovsky < > len

Re: [OMPI users] mpirun fails on remote applications

2009-05-12 Thread Lenny Verkhovsky
Sounds like firewall problems to or from anfield04. Lenny. On Tue, May 12, 2009 at 8:18 AM, feng chen wrote: > hi all, > > First of all, I'm new to openmpi, so I don't know much about MPI settings. > That's why I'm following manual and FAQ suggestions from the beginning. >

[MTT users] mtt IBM reduce_scatter_in_place test failure

2008-09-17 Thread Lenny Verkhovsky
I am running the MTT tests on our cluster and I found an error in the IBM reduce_scatter_in_place test for np>8: /home/USERS/lenny/OMPI_1_3_TRUNK/bin/mpirun -np 10 -H witch2 ./reduce_scatter_in_place **WARNING**]: MPI_COMM_WORLD rank 4, file reduce_scatter_in_place.c:80: bad answer (0) at index 0 of 1000