ideas?
Thanks.
Lenny Verkhovsky
SW Engineer, Mellanox Technologies
www.mellanox.com
Office: +972 74 712 9244
Mobile: +972 54 554 0233
Fax: +972 72 257 9400
I don't think so.
It's always the 66th node, even if I swap the 65th and 66th hosts.
I also get the same error when setting np=66 while having only 65 hosts in
the hostfile.
(I am using only the tcp BTL.)
[node-119.ssauniversal.ssa.kodiak.nx:02996] [[56978,0],65] ORTE_ERROR_LOG:
Error in file base/ess_base_std_orted.c at line 288
OMPI from trunk
[node-119.ssauniversal.ssa.kodiak.nx:02996] [[56978,0],65] ORTE_ERROR_LOG:
Error in file base/ess_base_std_orted.c at line 288
Thanks.
Hi,
I am not an HPL expert, but this might help.
1. The rankfile mapper is available only from Open MPI 1.3; if you are
using Open MPI 1.2.8, try -mca mpi_paffinity_alone 1
2. If you are using Open MPI 1.3, you don't have to set mpi_leave_pinned 1,
since it is already the default value
Lenny.
On Thu,
BTW, what level of thread support does Open MPI provide?
I found in the https://svn.open-mpi.org/trac/ompi/browser/trunk/README that
we support MPI_THREAD_MULTIPLE,
and found a few unclear mails about MPI_THREAD_FUNNELED and
MPI_THREAD_SERIALIZED.
Also found nothing in the FAQ :(.
Thanks, Lenny.
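For reference, the requested thread level is negotiated at startup with MPI_Init_thread, and the library reports back what it actually supports. A minimal sketch (the program itself is illustrative, not from the thread):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided;

    /* Ask for the highest level; 'provided' reports what the library
       actually supports, which may be lower than requested. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    switch (provided) {
    case MPI_THREAD_SINGLE:     printf("MPI_THREAD_SINGLE\n");     break;
    case MPI_THREAD_FUNNELED:   printf("MPI_THREAD_FUNNELED\n");   break;
    case MPI_THREAD_SERIALIZED: printf("MPI_THREAD_SERIALIZED\n"); break;
    case MPI_THREAD_MULTIPLE:   printf("MPI_THREAD_MULTIPLE\n");   break;
    }

    MPI_Finalize();
    return 0;
}
```

Note that support for MPI_THREAD_MULTIPLE may also depend on how Open MPI was configured at build time.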
On Thu, Jul 2,
I guess this question came up before:
https://svn.open-mpi.org/trac/ompi/ticket/1367
On Thu, Jul 9, 2009 at 10:35 AM, Lenny Verkhovsky <
lenny.verkhov...@gmail.com> wrote:
> BTW, What kind of threads Open MPI supports ?
> I found in the https://svn.open-mpi.org/trac/ompi/b
>>>>
>>>> but "mpirun -np 3 ./something" will work. It works when you ask
>>>> for one CPU less, and the behavior is the same in any case (shared nodes, non-shared
>>>> nodes, multi-node).
>>>>
>>>> If you switch off rmaps_base_no_o
dellix7
dellix7
Thanks
Lenny.
On Tue, Jul 14, 2009 at 4:59 PM, Ralph Castain <r...@open-mpi.org> wrote:
> Strange - let me have a look at it later today. Probably something simple
> that another pair of eyes might spot.
> On Jul 14, 2009, at 7:43 AM, Lenny Verkhovsky wrote:
>
>
, Jul 14, 2009 at 7:08 PM, Ralph Castain <r...@open-mpi.org> wrote:
> Run it without the appfile, just putting the apps on the cmd line - does it
> work right then?
>
> On Jul 14, 2009, at 10:04 AM, Lenny Verkhovsky wrote:
>
> additional info
> I am running mpirun on ho
following:
>
> -np 1 -H witch1 hostname
> -np 1 -H witch2 hostname
>
> That should get you what you want.
> Ralph
>
> On Jul 14, 2009, at 10:29 AM, Lenny Verkhovsky wrote:
>
> No, it's not working as I expect, unless I expect something wrong.
> ( sorry for the lo
orry to have to keep asking you to try things - I don't have a setup here
> where I can test this as everything is RM managed.
>
>
> On Jul 15, 2009, at 12:09 AM, Lenny Verkhovsky wrote:
>
>
> Thanks Ralph, after playing with prefixes it worked,
>
> I still have a pro
Make sure you have the Open MPI 1.3 series;
I don't think the if_include param is available in the 1.2 series.
The max btls param controls fragmentation and load balancing over similar BTLs
( for example when using LMC > 0, or 2 ports connected to 1 network );
for selecting an interface you need the if_include param.
On Wed, Jul 15, 2009 at 4:20 PM,
Hi,
you can find a lot of useful information under FAQ section
http://www.open-mpi.org/faq/
http://www.open-mpi.org/faq/?category=tuning#paffinity-defs
Lenny.
On Mon, Aug 3, 2009 at 11:55 AM, Lee Amy wrote:
> Hi,
>
> Does OpenMPI have the processor binding like
Hi,
I am also looking for an example file of rules for the dynamic collectives.
Has anybody tried it? Where can I find the proper syntax for it?
Thanks.
Lenny.
On Thu, Jul 23, 2009 at 3:08 PM, Igor Kozin wrote:
> Hi Gus,
> I played with collectives a few months ago.
try specifying -prefix on the command line
ex: mpirun -np 4 -prefix $MPIHOME ./app
Lenny.
On Sat, Aug 8, 2009 at 5:04 PM, Kenneth Yoshimoto wrote:
>
> I don't own these nodes, so I have to use them with
> whatever path setups they came with. In particular,
> my home
Can this be related?
http://www.open-mpi.org/faq/?category=building#build-qs22
On Sun, Aug 9, 2009 at 12:22 PM, Attila Börcs wrote:
> Hi Everyone,
>
> What is the regular method of compiling and running MPI code on Cell Broadband
> with ppu-gcc and spu-gcc?
>
>
> Regards,
>
>
By default the coll framework scans all available modules and selects the
available functions with the highest priorities.
So, to use the tuned collectives explicitly you can raise its priority:
-mca coll_tuned_priority 100
p.s. Collective modules can have only a partial set of available functions,
for
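If you prefer to set the priority persistently rather than on each command line, it can go in an MCA parameter file. A sketch (the path is the usual per-user default location; the value is just an example):

```
# $HOME/.openmpi/mca-params.conf
coll_tuned_priority = 100
```

Parameters set here are picked up by every mpirun invocation for that user, so the command-line flag is no longer needed.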
Hi,
1.
Mellanox has newer firmware for those HCAs:
http://www.mellanox.com/content/pages.php?pg=firmware_table_IH3Lx
I am not sure if it will help, but newer fw usually have some bug fixes.
2.
try to disable leave_pinned during the run. It's on by default in 1.3.3
Lenny.
On Thu, Aug 13, 2009 at
Hi
http://www.open-mpi.org/faq/?category=tuning#using-paffinity
I am not familiar with this cluster, but in the FAQ ( see link above ) you
can find an example of the rankfile.
another simple example is the following:
$cat rankfile
rank 0=host1 slot=0
rank 1=host2 slot=0
rank 2=host3 slot=0
rank
Hi
This message means
that you are trying to use the host "plankton", which was not allocated via
the hostfile or host list.
But according to the files and command line, everything seems fine.
Can you try using the "plankton.uzh.ch" hostname instead of "plankton"?
thanks
Lenny.
On Mon, Aug 17, 2009 at 10:36
the full names make it work!
> Is there a reason why the rankfile option treats
> host names differently than the hostfile option?
>
> Thanks
> Jody
>
>
>
> On Mon, Aug 17, 2009 at 11:20 AM, Lenny
> Verkhovsky<lenny.verkhov...@gmail.com> wrote:
> > Hi
&
.ch slots=1 max-slots=1
>>
>> Then this works fine:
>> [jody@aim-plankton neander]$ mpirun -np 4 -hostfile th_021 -rf rf_02
>> ./HelloMPI
>>
>> Is there an explanation for this?
>>
>> Thank You
>> Jody
>>
>> Lenny.
>>>
Sounds like an environment problem.
Try running
$mpirun -prefix /home/jean/openmpisof/ ..
Lenny.
On Wed, Aug 19, 2009 at 5:36 PM, Jean Potsam wrote:
> Hi All,
> I'm a trying to install openmpi with self. However, I am
> experiencing some problems with openmpi
Most likely you compiled Open MPI with the --with-openib flag, but since there
are no openib devices available on the n06 machine, you got an error.
You can "disable" this message by either recompiling Open MPI without the
openib flag, or by disabling the openib btl:
-mca btl ^openib
or
-mca btl sm,self,tcp
Lenny.
Hi all,
Does OpenMPI support VMware ?
I am trying to run OpenMPI 1.3.3 on VMware and it got stuck during the OSU
benchmarks and IMB.
It looks like a random deadlock; I wonder if anyone has ever tried it?
thanks,
Lenny.
you need to check the release notes, and compare the differences.
also check the Open MPI version in both of them.
In general it's not a good idea to run different versions of the software
for performance comparison, or at all.
Since both of them are open source, backward compatibility is not
Please try using the full hostname ( drdb0235.en.desres.deshaw.com )
in the hostfile/rankfile.
It should help.
Lenny.
On Mon, Aug 31, 2009 at 7:43 PM, Ralph Castain wrote:
> I'm afraid the rank-file mapper in 1.3.3 has several known problems that
> have been described on the list
I changed the error message; I hope it will be clearer now.
r21919.
On Tue, Sep 1, 2009 at 2:13 PM, Lenny Verkhovsky <lenny.verkhov...@gmail.com
> wrote:
> please try using full ( drdb0235.en.desres.deshaw.com ) hostname
> in the hostfile/rankfile.
> It should help.
> Lenny.
have you tried running hostname
$mpirun -np 2 --mca btl openib,self --host node1,node2 hostname
If it hangs, it's not an Open MPI problem; check your setup,
especially your firewall settings, and try disabling the firewall.
On Wed, Sep 2, 2009 at 2:06 PM, Lee Amy wrote:
> Hi,
>
> I
UNCTION__ is not portable.
> __func__ is but it needs a C99 compliant compiler.
>
> --Nysal
>
> On Tue, Sep 8, 2009 at 9:06 PM, Lenny Verkhovsky <
> lenny.verkhov...@gmail.com> wrote:
>
>> fixed in r21952
>> thanks.
>>
>> On Tue, Sep 8, 2009 at 5:08 PM, Art
You can use a shared ( e.g. NFS ) folder for this app, or provide a full
PATH to it.
ex:
$mpirun -np 2 -hostfile hostfile /home/user/app
2009/9/15 Dominik Táborský
> So I have to manually copy the compiled hello world program to all of
> the nodes so that they can be
Hi Eugene,
The carto file is a file with a static graph topology of your node.
You can see an example in opal/mca/carto/file/carto_file.h.
( yes, I know, it should be in a help/man page :) )
Basically it describes a map of your node and its internal interconnect.
Hopefully it will be discovered
You can use the full path to mpirun, or you can set the prefix:
$mpirun -prefix path/to/mpi/home -np .
Lenny.
On Sun, Oct 18, 2009 at 12:03 PM, Oswald Benedikt wrote:
> Hi, thanks, that's what puzzled when I saw the reference to 1.3, but the
> LD_LIBRARY_PATH was set to
I noticed that you also have different versions of OMPI: you have 1.3.2 on
node1 and 1.3 on node2.
Can you try to put the same version of OMPI on both nodes?
Can you also try running np 16 on node1 when you run separately?
Lenny.
On Tue, Nov 17, 2009 at 5:45 PM, Laurin Müller
maybe it's related to #1378 PML ob1 deadlock for ping/ping ?
On 7/14/08, Jeff Squyres wrote:
>
> What application is it? The majority of the message passing engine did not
> change in the 1.2 series; we did add a new option into 1.2.6 for disabling
> early completion:
>
>
Try to use only openib.
Make sure you use a nightly build after r19092.
On 7/31/08, Gabriele Fatigati wrote:
>
> Mm, i've tried to disable shared memory but the problem remains. Is it
> normal?
>
> 2008/7/31 Jeff Squyres
>
>> There is very definitely a shared
Hi,
Check in /usr/lib; it's usually the folder for 32-bit libraries.
I think OFED 1.3 already comes with Open MPI, so it should be installed by
default.
BTW, OFED 1.3.1 comes with Open MPI 1.2.6.
Lenny.
On 8/12/08, Mohd Radzi Nurul Azri wrote:
>
> Hi,
>
>
> Thanks for the
./mpi_p1_4_TRUNK
-t lt
LT (2) (size min max avg) 1 3.443480 3.443480 3.443480
Best regards
Lenny.
On 10/6/08, Jeff Squyres <jsquy...@cisco.com> wrote:
>
> On Oct 5, 2008, at 1:22 PM, Lenny Verkhovsky wrote:
>
> you should probably use -mca tcp,self -mca btl_openib_if_include ib0.8
Hi,
If I understand you correctly, the most suitable way to do it is by the
paffinity that we have in Open MPI 1.3 and the trunk.
However, the OS usually distributes processes evenly between sockets by
itself.
There is still no formal FAQ, due to multiple reasons, but you can read how to
use it in the
s_rank_file_".
> Do you have idea when OpenMPI 1.3 will be available? OpenMPI 1.3 has quite
> a few features I'm looking for.
>
> Thanks,
> Mi
> "Lenny Verkhovsky" <
urrent kernel.
>
> Mi
> "Lenny Verkhovsky" <
> lenny.verkhov...@gmail.com>
>
>
>
> *"Lenny Verkhovsky" <lenny.verkhov...@gmail.com>*
>
You can also press "f" while "top" is running and choose option "j";
this way you will see which CPU is used, under column P.
Lenny.
On Mon, Nov 10, 2008 at 7:38 AM, Hodgess, Erin wrote:
> great!
>
> Thanks,
> Erin
>
>
> Erin M. Hodgess, PhD
> Associate Professor
> Department
Hi,
Sorry for not answering sooner,
In Open MPI 1.3 we added a paffinity mapping module.
The syntax is quite simple and flexible:
rank N=hostA slot=socket:core_range
rank M=hostB slot=cpu
see the following example:
ex:
#mpirun -rf rankfile_name ./app
#cat rankfile_name
rank 0=host1
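A fuller sketch using the syntax described above (host names, socket and core numbers are made up for illustration):

```
# cat rankfile_name
rank 0=host1 slot=0:0-1   # rank 0 bound to socket 0, cores 0-1 of host1
rank 1=host1 slot=1:0     # rank 1 bound to socket 1, core 0 of host1
rank 2=host2 slot=2       # rank 2 bound to cpu 2 of host2
```

The socket:core_range form and the plain cpu form can be mixed freely in the same rankfile.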
maybe it's related to https://svn.open-mpi.org/trac/ompi/ticket/1378 ??
On 12/5/08, Justin wrote:
>
> The reason i'd like to disable these eager buffers is to help detect the
> deadlock better. I would not run with this for a normal run but it would be
> useful for
also see https://svn.open-mpi.org/trac/ompi/ticket/1449
On 12/9/08, Lenny Verkhovsky <lenny.verkhov...@gmail.com> wrote:
>
> maybe it's related to https://svn.open-mpi.org/trac/ompi/ticket/1378 ??
>
> On 12/5/08, Justin <luitj...@cs.utah.edu> wrote:
>>
>
Hi,
1. Please provide the output of: cat /proc/cpuinfo
2. see http://www.open-mpi.org/faq/?category=tuning#paffinity-defs.
Best regards
Lenny.
I didn't see any errors on 1.3rc3r20130; I am running mtt nightly
and it seems to be fine on x86-64 CentOS 5.
On Tue, Dec 16, 2008 at 10:27 AM, Gabriele Fatigati
wrote:
> Dear OpenMPI developers,
> trying to compile 1.3 nightly version , i get the follow error:
>
>
Hi, just to make sure:
you wrote in the previous mail that you tested IMB-MPI1 and it
"reports for the last test", and the results are for
"processes=6". Since you have 4- and 8-core machines, this test could
be run on the same 8-core machine over shared memory and not over
InfiniBand, as
what kind of communication between nodes do you have - tcp, openib (
IB/IWARP ) ?
you can try
mpirun -np 4 -host node1,node2 -mca btl tcp,self random
On Wed, Feb 4, 2009 at 1:21 AM, Ralph Castain wrote:
> Could you tell us which version of OpenMPI you are using, and how it was
We saw the same problem with compilation;
the workaround for us was configuring without VT ( see ./configure --help ).
I hope the VT guys will fix it sometime.
Lenny.
On Mon, Feb 23, 2009 at 11:48 PM, Jeff Squyres wrote:
> It would be interesting to see what happens with the 1.3
Can you try Open MPI 1.3?
Lenny.
On 3/10/09, Tee Wen Kai wrote:
>
> Hi,
>
> I am using version 1.2.8.
>
> Thank you.
>
> Regards,
> Wenkai
>
> --- On *Mon, 9/3/09, Ralph Castain * wrote:
>
>
> From: Ralph Castain
> Subject: Re: [OMPI users]
Hi,
The first "crash" is OK: since your rankfile has ranks 0 and 1 defined,
while n=1, only rank 0 is present and can be allocated.
NP must be >= the largest rank in the rankfile.
What exactly are you trying to do?
I tried to recreate your segv, but all I got was
> orterun: clean termination accomplished
>> > >
>> > >
>> > >
>> > > Message: 4
>> > > Date: Tue, 14 Apr 2009 06:55:58 -0600
>> > > From: Ralph Castain <r...@lanl.gov>
>> > > Subject: Re: [OMPI users] 1.3.1 -
i.org>
>>> Message-ID: <f6290ada-a196-43f0-a853-cbcb802d8...@lanl.gov>
>>> Content-Type: text/plain; charset="us-ascii"; Format="flowed";
>>> DelSp="yes"
>>>
>>> The rankfile cuts across
he
> additional procs either byslot (default) or bynode (if you specify that
> option). So the rankfile doesn't need to contain an entry for every proc.
>
> Just don't want to confuse folks.
> Ralph
>
>
>
> On Tue, May 5, 2009 at 5:59 AM, Lenny Verkhovsky <
> len
Sounds like firewall problems to or from anfield04.
Lenny.
On Tue, May 12, 2009 at 8:18 AM, feng chen wrote:
> hi all,
>
> First of all,i'm new to openmpi. So i don't know much about mpi setting.
> That's why i'm following manual and FAQ suggestions from the beginning.
>
I am running the mtt test on our cluster, and I found an error in the
IBM reduce_scatter_in_place test for np>8:
/home/USERS/lenny/OMPI_1_3_TRUNK/bin/mpirun -np 10 -H witch2
./reduce_scatter_in_place
**WARNING**]: MPI_COMM_WORLD rank 4, file reduce_scatter_in_place.c:80:
bad answer (0) at index 0 of 1000