Re: [OMPI users] Cannot suppress openib error message

2007-10-24 Thread Dirk Eddelbuettel

On 24 October 2007 at 21:31, Jeff Squyres wrote:
| On Oct 24, 2007, at 9:23 PM, Dirk Eddelbuettel wrote:
| 
| > | If I had to guess, the systems where you don't see the warning are
| > | systems that have OFED loaded.
| >
| > I am pretty sure that none of the systems (at work) have IB  
| > hardware.  I am
| > very sure that my home systems do not, and there the 'btl = ^openib'
| > successfully suppresses the warning --- whereas at work it doesn't.
| 
| Note that you don't need to have IB hardware -- all you need is the  
| OFED software loaded.  I don't know if Debian ships the OFED  
| libraries by default...?  In particular, look for libibverbs:
| 
| [18:28] svbu-mpi:~/svn/ompi % ldd $bogus/lib/openmpi/mca_btl_openib.so
|  libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x002a956c2000)
|  libnsl.so.1 => /lib64/libnsl.so.1 (0x002a957cd000)
|  libutil.so.1 => /lib64/libutil.so.1 (0x002a958e4000)
|  libm.so.6 => /lib64/tls/libm.so.6 (0x002a959e8000)
|  libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x002a95b6e000)
|  libc.so.6 => /lib64/tls/libc.so.6 (0x002a95c83000)
|  libdl.so.2 => /lib64/libdl.so.2 (0x002a95eb8000)
|  /lib64/ld-linux-x86-64.so.2 (0x00552000)

Good point.  However, I use the .deb packages that I build for Debian,
and they use libibverbs where available:

Build-Depends: [...], libibverbs-dev [!kfreebsd-i386 !kfreebsd-amd64 \
!hurd-i386], gfortran, libsysfs-dev, automake, gcc (>= 4:4.1.2)

in particular on i386. Consequently, the binary package ends up with a
Depends on the run-time package 'libibverbs1' -- and this will hence always
be present, as all my systems use the .deb packages (either from Debian or
locally rebuilt) that pull libibverbs1 in via this Depends.

At work, I rebuild these same packages under Ubuntu on my "head node".  And
on the head node, no warning is seen -- whereas my compute nodes issue the
warning.

Could this be another one of the dlopen issues where basically
dlopen("libibverbs.so")
is executed?   Because the compute nodes do NOT have libibverbs.so (from the
-dev package) but only libibverbs.so.1.0.0 and its matching symlink
libibverbs.so.1.
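As a quick illustration of the soname distinction this hypothesis rests on (using /bin/ls as a stand-in, since the openib plugin path varies by install):

```shell
# The runtime linker resolves the versioned soname recorded in the binary
# (libc.so.6, libibverbs.so.1), never the bare .so symlink that only the
# -dev package ships.  /bin/ls stands in for any dynamically linked object.
ldd /bin/ls | grep 'libc.so.6'
```

The same check against mca_btl_openib.so shows which soname the plugin actually needs at run time.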

I just tested that hypothesis and installed libibverbs-dev, but no luck.  I
still get the warning.

| However, I note something in your last reply that I may have missed  
| before -- can you clarify a point for me: are you saying that on your  
| home machine, this generates the openib "file not found" warning:
| 
|  mpirun -np 2 hello
| 
| but this does not:
| 
|  mpirun -np 2 --mca btl ^openib hello

More or less, but I use /etc/openmpi/openmpi-mca-params.conf to toggle
^openib.  Adding it again as --mca btl ^openib changes nothing, unfortunately.
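For reference, a minimal sketch of that parameter file; the btl line uses the same ^ negation syntax as the command-line option:

```
# /etc/openmpi/openmpi-mca-params.conf
# Exclude the openib BTL (same effect as "--mca btl ^openib" on the command line).
btl = ^openib
```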

| If so, can you confirm which version of Open MPI you are running?   
| The only reason that I can think that that would happen is if you are  
| running a trunk nightly download of Open MPI...  If not, then there's  
| something else going on that would be worth understanding.

No, plain 1.2.4 from the original tarballs.

Still puzzled.  To recap, the head node and the compute nodes all use the same
Ubuntu release and the same binary .deb packages of Open MPI 1.2.4 that I
rebuilt there.  The 'sole' difference is that the 'head node' has more
development packages and tools installed -- but that should not matter.  I
just re-checked, and the compute node does not have any LAM or MPICH
parts remaining.

Dirk

-- 
Three out of two people have difficulties with fractions.


Re: [OMPI users] Cannot suppress openib error message

2007-10-24 Thread Jeff Squyres

On Oct 24, 2007, at 9:23 PM, Dirk Eddelbuettel wrote:


| If I had to guess, the systems where you don't see the warning are
| systems that have OFED loaded.

I am pretty sure that none of the systems (at work) have IB hardware.  I am
very sure that my home systems do not, and there the 'btl = ^openib'
successfully suppresses the warning --- whereas at work it doesn't.


Note that you don't need to have IB hardware -- all you need is the  
OFED software loaded.  I don't know if Debian ships the OFED  
libraries by default...?  In particular, look for libibverbs:


[18:28] svbu-mpi:~/svn/ompi % ldd $bogus/lib/openmpi/mca_btl_openib.so
libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x002a956c2000)
libnsl.so.1 => /lib64/libnsl.so.1 (0x002a957cd000)
libutil.so.1 => /lib64/libutil.so.1 (0x002a958e4000)
libm.so.6 => /lib64/tls/libm.so.6 (0x002a959e8000)
libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x002a95b6e000)
libc.so.6 => /lib64/tls/libc.so.6 (0x002a95c83000)
libdl.so.2 => /lib64/libdl.so.2 (0x002a95eb8000)
/lib64/ld-linux-x86-64.so.2 (0x00552000)

However, I note something in your last reply that I may have missed  
before -- can you clarify a point for me: are you saying that on your  
home machine, this generates the openib "file not found" warning:


mpirun -np 2 hello

but this does not:

mpirun -np 2 --mca btl ^openib hello

If so, can you confirm which version of Open MPI you are running?   
The only reason that I can think that that would happen is if you are  
running a trunk nightly download of Open MPI...  If not, then there's  
something else going on that would be worth understanding.


--
Jeff Squyres
Cisco Systems



Re: [OMPI users] Cannot suppress openib error message

2007-10-24 Thread Dirk Eddelbuettel

On 24 October 2007 at 16:22, Jeff Squyres wrote:
| On Oct 24, 2007, at 4:16 PM, Dirk Eddelbuettel wrote:
| 
| > I buy that explanation any day, but what is funny is that the
| > btl = ^openib
| > does suppress the warning on some of my systems (all running 1.2.4)  
| > but not
| > others (also running 1.2.4).
| 
| If I had to guess, the systems where you don't see the warning are  
| systems that have OFED loaded.

I am pretty sure that none of the systems (at work) have IB hardware.  I am
very sure that my home systems do not, and there the 'btl = ^openib'
successfully suppresses the warning --- whereas at work it doesn't.

Must be a side-effect of something else.  I made sure no LAM libs were
left around.

Dirk


-- 
Three out of two people have difficulties with fractions.


Re: [OMPI users] xcode and ompi

2007-10-24 Thread Jeff Squyres
Those are the three libraries that are typically required.  I don't
know anything about Xcode, so I don't know if there's any other
secret sauce that you need to use.


Warner -- can you shed any light here?

To verify your Open MPI installation, you might want to try compiling  
a trivial MPI application outside of xcode with the simple "mpicc"  
wrapper compiler, such as:


mpicc mpi_hello_world.c -o mpi_hello_world

You can also see what underlying command mpicc is invoking with:

mpicc mpi_hello_world.c -o mpi_hello_world --showme

But I will pretty much guarantee that if you have mixed multiple MPI
implementations (LAM and Open MPI) in the same directory tree, things
won't work.  It would be best to fully uninstall one (e.g., LAM) and
then re-install the other (e.g., Open MPI).  If you've lost the build
directory for LAM, you can download a new source tarball from
www.lam-mpi.org.





On Oct 21, 2007, at 11:13 PM, Tony Sheh wrote:


Hi all,

I'm working in Xcode and I'm trying to build an application that
links against the OMPI libraries. So far I've included the following
files in the build:

libmpi.dylib
libopen-pal.dylib
libopen-rte.dylib

and the errors I get are

Undefined symbols:
 all the MPI functions you can think of..


as well as a warning: "suggest use of -bind_at_load, as lazy binding
may result in errors or different symbols being used".

I've compiled and linked to the static libraries (using ./configure
--enable-static) and I get the same errors. Also, I previously had the
latest version of LAM/MPI installed. I didn't uninstall it, since I
lost the original directory as well as the make and configure
settings. If that is the conflict, then any information about how to
resolve it would be good.

Thanks!
Tony
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Jeff Squyres
Cisco Systems



Re: [OMPI users] Number of processes and number of the cores.

2007-10-24 Thread Jeff Squyres
We don't really have this kind of fine-grained processor affinity  
control in Open MPI yet.


Is there a reason you want to oversubscribe cores this way?  Open MPI  
assumes that each process should be as aggressive as possible in  
terms of performance -- spinning heavily until progress can be made  
on message passing, etc.
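For what it's worth, the closest knob in the 1.2 series (as far as I know) is the coarse mpi_paffinity_alone switch, which binds local ranks to processors one-to-one -- it cannot express an N-processes-per-core mapping:

```
# In an MCA parameter file, or as "--mca mpi_paffinity_alone 1" on the
# mpirun command line: bind each local rank to its own processor, in order.
mpi_paffinity_alone = 1
```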



On Oct 23, 2007, at 3:15 PM, Siamak Riahi wrote:


I have a question about using Open MPI.

I want to tie "N" processes to one core and "M" processes to
another core.  I want to know if Open MPI is capable of
doing that.


Thanks,

Siamak
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Jeff Squyres
Cisco Systems



Re: [OMPI users] problem with 'orted'

2007-10-24 Thread Jeff Squyres
By default, I believe that orte assumes that the orted is in the same  
location on all nodes.  If it's not, you should be able to use the  
following:


1. Make a sym link such that /usr/local/bin/orted appears on all of  
your nodes.  You implied that you tried this, but I find it hard to  
believe that that didn't work -- the error message you show clearly  
indicates that it's looking for /usr/local/bin/orted.  If it's there  
(and executable), it should work.


2. I assume you're using the rsh/ssh launcher.  If this is the case,
set the pls_rsh_orted MCA parameter to /usr/bin/orted.  E.g.:


mpirun --mca pls_rsh_orted /usr/bin/orted ...
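A sketch of option 1, staged under a scratch prefix so it can be tried safely; on the real node the two paths would be /usr/bin/orted and /usr/local/bin/orted:

```shell
# Stand-in orted under a scratch prefix; the final ln -s is the actual fix.
prefix=$(mktemp -d)
mkdir -p "$prefix/usr/bin" "$prefix/usr/local/bin"
printf '#!/bin/sh\necho orted-stub\n' > "$prefix/usr/bin/orted"
chmod +x "$prefix/usr/bin/orted"
ln -s "$prefix/usr/bin/orted" "$prefix/usr/local/bin/orted"
"$prefix/usr/local/bin/orted"   # the daemon is now reachable at the expected path
```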



On Oct 1, 2007, at 8:26 AM, Amit Kumar Saha wrote:


hello,

I am using Open MPI 1.2.3 to run a task on 4 hosts as follows:

amit@ubuntu-desktop-1:~/mpi-exec$ mpirun --np 4 --hostfile
mpi-host-file ParallelSearch
bash: /usr/local/bin/orted: No such file or directory

The problem is that 'orted' is not found on one of the 4 hosts. I
investigated the problem and found out that whereas 'orted' is stored
in /usr/local/bin on all the other 3 hosts, it is in /usr/bin/orted on
the erroneous host. I tried to create a soft link to solve the problem
but sadly it is not so simple, it seems.

It would be nice to know how to get around this problem.

Thanks,
Amit
--
Amit Kumar Saha
*NetBeans Community Docs Coordinator*
me blogs@ http://amitksaha.blogspot.com
URL:http://amitsaha.in.googlepages.com
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Jeff Squyres
Cisco Systems



Re: [OMPI users] Merging Intracommunicators

2007-10-24 Thread Jeff Squyres
I believe that the second scenario that Sriram described is  
incorrect: you cannot merge independent intercommunicators into a  
single communicator (either intra or inter).



On Oct 18, 2007, at 4:36 PM, Murat Knecht wrote:


Hi,
I have a question regarding merging intracommunicators.
Using MPI_Spawn, I create on designated machines child processes,
retrieving an intercommunicator each time.
With MPI_Intercomm_Merge it is possible to get an intracommunicator
containing the master process(es) and the newly spawned child process.
The problem is to merge the intracommunicators into a single one.

I understand there is the possibility to use the intracommunicator so
created from the first try in order to spawn the second child, merge this
one into the intracomm, and continue like this.
This brings some considerable administrative overhead with it, as all
already-spawned children must (be informed to) participate in the spawn
call.
I would rather merge all intercommunicators together in the end using
only the master process for spawning.
Both these possibilites have been mentioned in the following post.

http://www.lam-mpi.org/MailArchives/lam/2003/06/6226.php

While I understand the first one, I do not follow the second - I cannot
seem to find any method to merge multiple inter- or intracomms into a
single intracomm.
Groups cannot be used either, to collect the children and retrieve the
intracomm, because they are only used for subgrouping within an already
existing intracommunicator's group.
Is there a way to merge them the easy way, or did I misread the
post above?


Thanks & best regards,
Murat
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Jeff Squyres
Cisco Systems



Re: [OMPI users] Syntax error in remote rsh execution

2007-10-24 Thread Tim Prins

Glad you found the problem.

Don't worry about the '--num_proc 3'. This does not refer to the number 
of application processes, but rather the number of 'daemon' processes 
plus 1 for mpirun. However, this is an internal interface which changes 
on different versions of Open MPI, so this explanation is subject to 
change :)


Tim

Jorge Parra wrote:

Hi Tim,

Thank you for your reply.

You are right, my Open MPI version is rather old. However, I am stuck with 
it until I can get v1.2.4 to compile. I have had some problems with it (I already 
opened a case on Oct 15th).


You were also right about my hostname. uname -n reports (none) and the 
"hostname" command did not exist in the nodes of my cluster. I already 
added it to the nodes and modified the /etc/hosts file. The error went 
away and now I can see that orted runs in the remote node. It is strange 
to me that orted runs with --num_proc 3 when mpirun was executed with -np 
2. Does this sound correct to you? I might open a new case for it 
though...



Thank you for your help,

Jorge

On Mon, 22 Oct 2007, Tim Prins wrote:


Sorry to reply to my own mail.

Just browsing through the logs you sent, and I see that 'hostname' should be
working fine. However, you are using v1.1.5 which is very old. I would
strongly suggest upgrading to v1.2.4. It is a huge improvement over the old
v1.1 series (which is not being maintained anymore).

Tim

On Monday 22 October 2007 08:41:30 pm Tim Prins wrote:

Hi Jorge,

This is interesting. The problem is the universe name:
root@(none):default-universe

The "(none)" part is supposed to be the hostname where mpirun is executed.
Try running:
hostname

and:
uname -n

These should both return valid hostnames for your machine.

Open MPI pretty much assumes that all nodes have a valid (preferably
unique) hostname. If the above commands don't work, you probably need to
fix your cluster.
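The two checks above, as one script; both should print the same non-empty name, and neither should print "(none)":

```shell
# Both commands report the kernel's nodename; "(none)" means it was never
# set (normally done at boot, e.g. from /etc/hostname).
hostname
uname -n
```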

Let me know if this does not work.

Thanks,

Tim

On Thursday 18 October 2007 09:22:09 pm Jorge Parra wrote:

Hi,

When trying to execute an application that spawns to another node, I
obtain the following message:

# ./mpirun --hostfile /root/hostfile -np 2 greetings
Syntax error: "(" unexpected (expecting ")")
--------------------------------------------------------------------------
Could not execute the executable "/opt/OpenMPI/OpenMPI-1.1.5b/exec/bin/greetings": Exec format error

This could mean that your PATH or executable name is wrong, or that you do not
have the necessary permissions.  Please ensure that the executable is able to be
found and executed.
--------------------------------------------------------------------------

and in the remote node:

# pam_rhosts_auth[183]: user root has a `+' user entry
pam_rhosts_auth[183]: allowed to root@192.168.1.102 as root
PAM_unix[183]: (rsh) session opened for user root by (uid=0)
in.rshd[184]: root@192.168.1.102 as root: cmd='( ! [ -e ./.profile ] || . ./.profile; orted --bootproxy 1 --name 0.0.1 --num_procs 3 --vpid_start 0 --nodename 192.168.1.103 --universe root@(none):default-universe --nsreplica "0.0.0;tcp://192.168.1.102:32774" --gprreplica "0.0.0;tcp://192.168.1.102:32774" --mpi-call-yield 0 )'
PAM_unix[183]: (rsh) session closed for user root

I suspect the command that rsh is trying to execute on the remote node
fails. It seems to me that the first parenthesis in cmd='( ! is not well
interpreted, thus causing the syntax error. This might prevent .profile
from running and correctly setting PATH. Therefore, "greetings" is not found.

I am attaching to this email the appropriate configuration files of my
system and of Open MPI on it. This is a system on an isolated network, so I
don't care too much about security. Therefore I am using rsh on it.

I would really appreciate any suggestions to correct this problem.

Thank you,

Jorge



Re: [OMPI users] 1.2.4 cross-compilation problem

2007-10-24 Thread Jeff Squyres
Well that's fun; I'm not sure why that would happen.  Can you send  
all the information listed here:


http://www.open-mpi.org/community/help/


On Oct 15, 2007, at 5:36 PM, Jorge Parra wrote:


Hi,

I am trying to cross-compile Open MPI 1.2.4 for an embedded system.
The development system is an i686 Linux machine and the target system
is ppc 405-based. When trying "make all" I get the following error:


/bin/sh ../../../libtool --tag=CC --mode=link /opt/powerpc-405-linux/bin/powerpc-405-linux-gnu-gcc -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -pthread -export-dynamic -o opal_wrapper opal_wrapper.o ../../../opal/libopen-pal.la -lnsl -lutil -lm
libtool: link: /opt/powerpc-405-linux/bin/powerpc-405-linux-gnu-gcc -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -pthread -o opal_wrapper opal_wrapper.o -Wl,--export-dynamic ../../../opal/.libs/libopen-pal.a -ldl -lnsl -lutil -lm -pthread
../../../opal/.libs/libopen-pal.a(libltdlc_la-ltdl.o)(.text+0xbe): In function `lt_dlinit': undefined reference to `lt_libltdlc_LTX_preloaded_symbols'
../../../opal/.libs/libopen-pal.a(libltdlc_la-ltdl.o)(.text+0xc2): In function `lt_dlinit': undefined reference to `lt_libltdlc_LTX_preloaded_symbols'
collect2: ld returned 1 exit status
make[2]: *** [opal_wrapper] Error 1
make[2]: Leaving directory `/opt/openmpi-1.2.4/opal/tools/wrappers'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/opt/openmpi-1.2.4/opal'
make: *** [all-recursive] Error 1

Older versions of Open MPI have been successfully compiled on the
same development system.  I am attaching to this email all the
output and the configuration information.


Any help will greatly appreciated.

Thank you,

Jorge




--
Jeff Squyres
Cisco Systems



Re: [OMPI users] Tuning Openmpi with IB Interconnect

2007-10-24 Thread Jeff Squyres
Sorry I missed this message before... it got lost in the deluge that  
is my inbox.


Are you using the mpi_leave_pinned MCA parameter?  That will make a  
big difference on the typical ping-pong benchmarks:


mpirun --mca mpi_leave_pinned 1 
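To avoid retyping it, the same parameter can be set once in an MCA parameter file (per-user path shown; a system-wide file works too):

```
# $HOME/.openmpi/mca-params.conf
# Leave registered memory pinned between messages; helps ping-pong
# benchmarks that reuse the same buffers over IB.
mpi_leave_pinned = 1
```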



On Oct 11, 2007, at 11:44 AM, Matteo Cicuttin wrote:



Il giorno 11/ott/07, alle ore 07:16, Neeraj Chourasia ha scritto:


Dear All,

Could anyone tell me the important tuning parameters in
Open MPI with an IB interconnect?  I tried setting the eager_rdma,
min_rdma_size, and mpi_leave_pinned parameters from the mpirun command
line on a 38-node cluster (38*2 processors), but in vain.  I found a
simple mpirun with no MCA parameters performed better.  I
conducted a point-to-point send/receive test with a data size of 8MB.


Similarly, I patched the HPL Linpack code with libNBC (non-blocking
collectives) and found no performance benefit.  I went through its
patch and found that it probably does not overlap computation
with communication.


Any help in this direction would be appreciated.
-Neeraj



Hi!

I'm Matteo, and I work for a company that produces HPC systems in
Italy.
I'm new at that company and I'm looking for some help, and this
thread seems to be a good fit :)
In the last few days we have been benchmarking a system, and I'm
interested in some performance scores of the InfiniBand interconnect.
The nodes are dual dual-core Opteron machines and we use the Mellanox
Cougar Cub PCI-X IB interfaces.
Machines have the 8111 system controller and the 8131 PCI-X bridge.
We reach a rate of about 600 MB/s in the point-to-point tests.
This rate (more or less) is reported both by the ib_*_bw benchmarks
and the IMB-MPI (sendrecv) benchmarks, version 3.
The MPI implementation is, of course, Open MPI.
I've read in a few places that a similar setup can reach about 800
MB/s on machines similar to those described above.
Can someone confirm this?  Does anyone have similar hardware where the
measured bandwidth is better than 600 MB/s?

Hints? Comments?

Thank you in advance,
Best regards,

---
Cicuttin Matteo
http://www.matteocicuttin.it
Black holes are where god divided by zero






--
Jeff Squyres
Cisco Systems



Re: [OMPI users] MPI::BOTTOM vs MPI_BOTTOM

2007-10-24 Thread Jeff Squyres
Wow -- that has survived since LAM/MPI -- you're the first person to  
have ever noticed it.  :-)


I *think* it's just a wrong type, but I'd prefer to file a ticket so  
that someone gives it a bit more than a cursory examination before  
making the change.


Thanks for pointing it out!


On Oct 10, 2007, at 9:19 PM, Stephen Guzik wrote:


Hi,

To the Devs. I just noticed that MPI::BOTTOM requires a cast. Not sure
if that was intended.

Compiling 'MPI::COMM_WORLD.Bcast(MPI::BOTTOM, 1, someDataType, 0);'
results in:
error: invalid conversion from ‘const void*’ to ‘void*’
error: initializing argument 1 of ‘virtual void MPI::Comm::Bcast(void*, int, const MPI::Datatype&, int) const’

MPI_BOTTOM, on the other hand, works without a cast.

Stephen




--
Jeff Squyres
Cisco Systems




Re: [OMPI users] Syntax error in remote rsh execution

2007-10-24 Thread Jorge Parra

Hi Tim,

Thank you for your reply.

You are right, my Open MPI version is rather old. However, I am stuck with 
it until I can get v1.2.4 to compile. I have had some problems with it (I already 
opened a case on Oct 15th).


You were also right about my hostname. uname -n reports (none) and the 
"hostname" command did not exist in the nodes of my cluster. I already 
added it to the nodes and modified the /etc/hosts file. The error went 
away and now I can see that orted runs in the remote node. It is strange 
to me that orted runs with --num_proc 3 when mpirun was executed with -np 
2. Does this sound correct to you? I might open a new case for it 
though...



Thank you for your help,

Jorge

On Mon, 22 Oct 2007, Tim Prins wrote:


Sorry to reply to my own mail.

Just browsing through the logs you sent, and I see that 'hostname' should be
working fine. However, you are using v1.1.5 which is very old. I would
strongly suggest upgrading to v1.2.4. It is a huge improvement over the old
v1.1 series (which is not being maintained anymore).

Tim

On Monday 22 October 2007 08:41:30 pm Tim Prins wrote:

Hi Jorge,

This is interesting. The problem is the universe name:
root@(none):default-universe

The "(none)" part is supposed to be the hostname where mpirun is executed.
Try running:
hostname

and:
uname -n

These should both return valid hostnames for your machine.

Open MPI pretty much assumes that all nodes have a valid (preferably
unique) hostname. If the above commands don't work, you probably need to
fix your cluster.

Let me know if this does not work.

Thanks,

Tim

On Thursday 18 October 2007 09:22:09 pm Jorge Parra wrote:

Hi,

When trying to execute an application that spawns to another node, I
obtain the following message:

# ./mpirun --hostfile /root/hostfile -np 2 greetings
Syntax error: "(" unexpected (expecting ")")
--------------------------------------------------------------------------
Could not execute the executable "/opt/OpenMPI/OpenMPI-1.1.5b/exec/bin/greetings": Exec format error

This could mean that your PATH or executable name is wrong, or that you do not
have the necessary permissions.  Please ensure that the executable is able to be
found and executed.
--------------------------------------------------------------------------

and in the remote node:

# pam_rhosts_auth[183]: user root has a `+' user entry
pam_rhosts_auth[183]: allowed to root@192.168.1.102 as root
PAM_unix[183]: (rsh) session opened for user root by (uid=0)
in.rshd[184]: root@192.168.1.102 as root: cmd='( ! [ -e ./.profile ] || . ./.profile; orted --bootproxy 1 --name 0.0.1 --num_procs 3 --vpid_start 0 --nodename 192.168.1.103 --universe root@(none):default-universe --nsreplica "0.0.0;tcp://192.168.1.102:32774" --gprreplica "0.0.0;tcp://192.168.1.102:32774" --mpi-call-yield 0 )'
PAM_unix[183]: (rsh) session closed for user root

I suspect the command that rsh is trying to execute on the remote node
fails. It seems to me that the first parenthesis in cmd='( ! is not well
interpreted, thus causing the syntax error. This might prevent .profile
from running and correctly setting PATH. Therefore, "greetings" is not found.

I am attaching to this email the appropriate configuration files of my
system and of Open MPI on it. This is a system on an isolated network, so I
don't care too much about security. Therefore I am using rsh on it.

I would really appreciate any suggestions to correct this problem.

Thank you,

Jorge





Re: [OMPI users] Cannot suppress openib error message

2007-10-24 Thread Jeff Squyres
This is quite likely because of a "feature" in how the OMPI v1.2  
series handles its plugins.  In OMPI <=v1.2.x, Open MPI opens all  
plugins that it can find and *then* applies the filter that you  
provide (e.g., via the "btl" MCA param) to close / ignore certain  
plugins.


In OMPI >=v1.3, we [effectively] apply the filter *before* opening  
plugins.  So "--mca btl ^openib" will actually prevent the openib BTL  
plugin from being loaded.


I'm guessing that what you're seeing today is because we're opening  
the openib BTL on a system where the OpenFabrics support libraries  
are not available, and therefore the dlopen() fails.  The error  
string that we get back from libltdl is the somewhat-misleading "file  
not found (ignored)", and that's what we print (note that ltdl is  
referring to the fact that a dependent library is not found).
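If the goal is only to silence the message on the 1.2 series, one workaround (parameter name as I recall it from the MCA base of that era -- verify with ompi_info before relying on it) is to turn off component load-error reporting:

```
# /etc/openmpi/openmpi-mca-params.conf
# Suppress "component_find: unable to open ... (ignored)" messages.
mca_component_show_load_errors = 0
```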




On Oct 24, 2007, at 9:51 AM, Dirk Eddelbuettel wrote:



I've been scratching my head over this:

lnx01:/usr/lib> orterun -n 2  --mca btl ^openib  ~/c++/tests/mpitest
[lnx01:14417] mca: base: component_find: unable to open btl openib: file not found (ignored)
[lnx01:14418] mca: base: component_find: unable to open btl openib: file not found (ignored)
Hello world, I'm process 0
Hello world, I'm process 1
lnx01:/usr/lib> grep openib /etc/openmpi/openmpi-mca-params.conf
#   btl = ^openib
btl = ^openib
lnx01:/usr/lib> orterun -n 2   ~/c++/tests/mpitest
[lnx01:14429] mca: base: component_find: unable to open btl openib: file not found (ignored)
[lnx01:14430] mca: base: component_find: unable to open btl openib: file not found (ignored)
Hello world, I'm process 0
Hello world, I'm process 1

and when I strace it, I get

uname({sys="Linux", node="lnx01", ...}) = 0
open("/etc/openmpi/openmpi-mca-params.conf", O_RDONLY) = 3
ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0xbf820698) = -1 ENOTTY (Inappropriate ioctl for device)
fstat64(3, {st_mode=S_IFREG|0644, st_size=2877, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f72000
read(3, "#\n# Copyright (c) 2004-2005 The "..., 8192) = 2877
read(3, "", 4096)   = 0
read(3, "", 8192)   = 0
ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0xbf8205f8) = -1 ENOTTY (Inappropriate ioctl for device)
close(3)= 0
munmap(0xb7f72000, 4096)= 0

Why can't I suppress the dreaded InfiniBand message?

System is Ubuntu 7.04 with 'ported' (i.e., locally recompiled) current
Open MPI packages from Debian.

Dirk

--
Three out of two people have difficulties with fractions.



--
Jeff Squyres
Cisco Systems



Re: [OMPI users] OpenMPI 1.2.4 vs 1.2

2007-10-24 Thread Jeff Squyres

The changes in the 1.2 series are listed here:

http://svn.open-mpi.org/svn/ompi/branches/v1.2/NEWS

I'm surprised that your performance went down from v1.2 to v1.2.4.   
What networks were you testing, and how exactly did you test?



On Oct 24, 2007, at 12:14 AM, Neeraj Chourasia wrote:


Hello Guys,

I had Open MPI v1.2 installed on my cluster. A couple of days
back, I thought to upgrade it to v1.2.4 (the latest release, I suppose).
Since I didn't want to take risks, I first installed it in a temporary
location and ran the bandwidth and bidirectional bandwidth tests
provided by the OSU guys, and to my surprise, the old version performs
better in both scenarios.


Could anyone give me the reason for the same?

I repeated the above point-to-point tests between all sets of
nodes, but the results were the same :(


-Neeraj





--
Jeff Squyres
Cisco Systems



Re: [OMPI users] orterun "by hand"

2007-10-24 Thread Brock Palen

If they are OS X machines, adding password-less SSH is easy.

Then you can make a nodefile with all the unique IPs; if you can
do that, you can avoid putting a full resource manager on them.


Brock Palen
Center for Advanced Computing
bro...@umich.edu
(734)936-1985


On Oct 24, 2007, at 1:39 PM, Reuti wrote:


Hi,

Am 24.10.2007 um 19:21 schrieb George Bosilca:


There is no way to run Open MPI by hand, or at least no simple
way. How about Xgrid on your OS X cluster? Anyway, without a way
to start processes remotely it is really difficult to start up any
kind of parallel job.


just to note: with PVM it's possible, but rarely used I think.

-- Reuti



  george.

On Oct 24, 2007, at 12:06 PM, Dean Dauger, Ph. D. wrote:


Hello,

I'd like to run Open MPI "by hand".  I have a few ordinary
workstations I'd like to run a code using Open MPI on.  They're in
the same LAN, have unique IP addresses and hostnames, and I've
installed the default Open MPI package, and I've compiled an MPI app
against the Open MPI libraries and copied the executable to each
machine, but let's assume these machines do not have BProc, Torque,
PBS, SLURM, rsh or ssh access to each other, or NFS.  I'm looking at
the shell of each node: what do I type in to make Open MPI go?

If it matters, they're OS X Macs. I am welcome to be enlightened if
I've missed the documentation for this scenario.

Thanks,
 Dean







Re: [OMPI users] orterun "by hand"

2007-10-24 Thread Reuti

Hi,

Am 24.10.2007 um 19:21 schrieb George Bosilca:

There is no way to run Open MPI by hand, or at least no simple
way. How about Xgrid on your OS X cluster? Anyway, without a way
to start processes remotely it is really difficult to start up any
kind of parallel job.


just to note: with PVM it's possible, but rarely used I think.

-- Reuti



  george.

On Oct 24, 2007, at 12:06 PM, Dean Dauger, Ph. D. wrote:


Hello,

I'd like to run Open MPI "by hand".  I have a few ordinary
workstations I'd like to run a code using Open MPI on.  They're in
the same LAN, have unique IP addresses and hostnames, and I've
installed the default Open MPI package, and I've compiled an MPI app
against the Open MPI libraries and copied the executable to each
machine, but let's assume these machines do not have BProc, Torque,
PBS, SLURM, rsh or ssh access to each other, or NFS.  I'm looking at
the shell of each node: what do I type in to make Open MPI go?

If it matters, they're OS X Macs. I am welcome to be enlightened if
I've missed the documentation for this scenario.

Thanks,
 Dean





Re: [OMPI users] orterun "by hand"

2007-10-24 Thread George Bosilca

Dean,

There is no way to run Open MPI by hand, or at least no simple way.  
How about Xgrid on your OS X cluster? Anyway, without a way to start  
processes remotely it is really difficult to start up any kind of  
parallel job.


  george.

On Oct 24, 2007, at 12:06 PM, Dean Dauger, Ph. D. wrote:


Hello,

I'd like to run Open MPI "by hand".  I have a few ordinary
workstations I'd like to run a code using Open MPI on.  They're in
the same LAN, have unique IP addresses and hostnames, and I've
installed the default Open MPI package, and I've compiled an MPI app
against the Open MPI libraries and copied the executable to each
machine, but let's assume these machines do not have BProc, Torque,
PBS, SLURM, rsh or ssh access to each other, or NFS.  I'm looking at
the shell of each node: what do I type in to make Open MPI go?

If it matters, they're OS X Macs. I am welcome to be enlightened if
I've missed the documentation for this scenario.

Thanks,
 Dean



Re: [OMPI users] orterun "by hand"

2007-10-24 Thread Gurhan
On 10/24/07, Dean Dauger, Ph. D.  wrote:
> Hello,
>
> I'd like to run Open MPI "by hand".  I have a few ordinary
> workstations I'd like to run a code using Open MPI on.  They're in
> the same LAN, have unique IP addresses and hostnames, and I've
> installed the default Open MPI package, and I've compiled an MPI app
> against the Open MPI libraries and copied the executable to each
> machine, but let's assume these machines do not have BProc, Torque,
> PBS, SLURM, rsh or ssh access to each other, or NFS.  I'm looking at
> the shell of each node: what do I type in to make Open MPI go?
>

   If I understand your question correctly, you need:

   mpirun /path/to/executable

(depending on the program, you may have to give the -np N argument,
where N is the number of instances you'd like to run)
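For instance, a hypothetical invocation (the hostfile name and program path are placeholders) could look like:

```shell
# Run 4 instances of the program; "myhosts" lists one hostname per line.
# Note: without some remote-start mechanism (rsh/ssh or a resource
# manager), mpirun has no way to actually reach the remote nodes.
mpirun -np 4 --hostfile myhosts /path/to/executable
```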

 and also read:
   http://www.open-mpi.org/faq/?category=running

  Hope this helps.
  Thanks.
  Gurhan

> If it matters, they're OS X Macs. I am welcome to be enlightened if
> I've missed the documentation for this scenario.
>
> Thanks,
>  Dean
>
>


[OMPI users] orterun "by hand"

2007-10-24 Thread Dean Dauger, Ph. D.

Hello,

I'd like to run Open MPI "by hand".  I have a few ordinary  
workstations I'd like to run a code using Open MPI on.  They're in  
the same LAN, have unique IP addresses and hostnames, and I've  
installed the default Open MPI package, and I've compiled an MPI app  
against the Open MPI libraries and copied the executable to each  
machine, but let's assume these machines do not have BProc, Torque,  
PBS, SLURM, rsh or ssh access to each other, or NFS.  I'm looking at  
the shell of each node: what do I type in to make Open MPI go?


If it matters, they're OS X Macs. I am welcome to be enlightened if  
I've missed the documentation for this scenario.


Thanks,
Dean



[OMPI users] Cannot suppress openib error message

2007-10-24 Thread Dirk Eddelbuettel

I've been scratching my head over this:

lnx01:/usr/lib> orterun -n 2  --mca btl ^openib  ~/c++/tests/mpitest
[lnx01:14417] mca: base: component_find: unable to open btl openib: file not found (ignored)
[lnx01:14418] mca: base: component_find: unable to open btl openib: file not found (ignored)
Hello world, I'm process 0
Hello world, I'm process 1
lnx01:/usr/lib> grep openib /etc/openmpi/openmpi-mca-params.conf
#   btl = ^openib
btl = ^openib
lnx01:/usr/lib> orterun -n 2   ~/c++/tests/mpitest
[lnx01:14429] mca: base: component_find: unable to open btl openib: file not found (ignored)
[lnx01:14430] mca: base: component_find: unable to open btl openib: file not found (ignored)
Hello world, I'm process 0
Hello world, I'm process 1

and when I strace it, I get

uname({sys="Linux", node="lnx01", ...}) = 0
open("/etc/openmpi/openmpi-mca-params.conf", O_RDONLY) = 3
ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0xbf820698) = -1 ENOTTY (Inappropriate ioctl for device)
fstat64(3, {st_mode=S_IFREG|0644, st_size=2877, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f72000
read(3, "#\n# Copyright (c) 2004-2005 The "..., 8192) = 2877
read(3, "", 4096)   = 0
read(3, "", 8192)   = 0
ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0xbf8205f8) = -1 ENOTTY (Inappropriate ioctl for device)
close(3)= 0
munmap(0xb7f72000, 4096)= 0

Why can't I suppress the dreaded InfiniBand message?
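(For reference: the "unable to open" warning is emitted while the component is being loaded, before BTL selection runs, so `btl = ^openib` never gets a chance to suppress it. Assuming the MCA parameter `mca_component_show_load_errors` is available in this Open MPI build, the load-time warning can be silenced instead:)

```shell
# In /etc/openmpi/openmpi-mca-params.conf:
#   mca_component_show_load_errors = 0
# or, equivalently, on the command line (paths as in the transcript above):
orterun -n 2 --mca mca_component_show_load_errors 0 ~/c++/tests/mpitest
```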

System is Ubuntu 7.04 with 'ported' (i.e. locally recompiled) current Open MPI
packages from Debian.

Dirk

-- 
Three out of two people have difficulties with fractions.


Re: [OMPI users] Bug in common_mx.c (1.2.5a0r16522)

2007-10-24 Thread George Bosilca
You're absolutely right. Thanks for the patch; I applied it on the  
trunk (revision 16560).


  Thanks,
george.

On Oct 24, 2007, at 8:17 AM, Åke Sandgren wrote:


On Wed, 2007-10-24 at 09:00 +0200, Åke Sandgren wrote:

Hi!

In common_mx.c the following looks wrong.
ompi_common_mx_finalize(void)
{
    mx_return_t mx_return;
    ompi_common_mx_initialize_ref_cnt--;
    if(ompi_common_mx_initialize == 0) {

That should be
if(ompi_common_mx_initialize_ref_cnt == 0)
right?



And there was a missing return too.
Complete ompi_common_mx_finalize should be
int
ompi_common_mx_finalize(void)
{
    mx_return_t mx_return;
    ompi_common_mx_initialize_ref_cnt--;
    if(ompi_common_mx_initialize_ref_cnt == 0) {
        mx_return = mx_finalize();
        if(mx_return != MX_SUCCESS) {
            opal_output(0, "Error in mx_finalize (error %s)\n",
                        mx_strerror(mx_return));
            return OMPI_ERROR;
        }
    }
    return OMPI_SUCCESS;
}




Re: [OMPI users] Bug in common_mx.c (1.2.5a0r16522)

2007-10-24 Thread Åke Sandgren
On Wed, 2007-10-24 at 09:00 +0200, Åke Sandgren wrote:
> Hi!
> 
> In common_mx.c the following looks wrong.
> ompi_common_mx_finalize(void)
> {
>     mx_return_t mx_return;
>     ompi_common_mx_initialize_ref_cnt--;
>     if(ompi_common_mx_initialize == 0) {
> 
> That should be
> if(ompi_common_mx_initialize_ref_cnt == 0)
> right?
> 

And there was a missing return too.
Complete ompi_common_mx_finalize should be
int
ompi_common_mx_finalize(void)
{
    mx_return_t mx_return;
    ompi_common_mx_initialize_ref_cnt--;
    if(ompi_common_mx_initialize_ref_cnt == 0) {
        mx_return = mx_finalize();
        if(mx_return != MX_SUCCESS) {
            opal_output(0, "Error in mx_finalize (error %s)\n",
                        mx_strerror(mx_return));
            return OMPI_ERROR;
        }
    }
    return OMPI_SUCCESS;
}




Re: [OMPI users] Parallel Genetic Algorithms - Open MPI Implementation

2007-10-24 Thread Amit Kumar Saha
Hi Dirk,

On 10/24/07, Dirk Eddelbuettel  wrote:
>
>
> On 24 October 2007 at 01:01, Amit Kumar Saha wrote:
> | Hello all!
> |
> | After some background research, I am soon going to start working on
> | "Parallel Genetic Algorithms". When I reach the point of practical
> | implementation, I am going to use Open MPI for the purpose.
> |
> | Has anyone here worked on similar things? It would be nice if you could
> | share some views/comments.
>
> Yes.  PGAPACK, developed in the mid-1990s by David Levine while at
> Argonne, works perfectly well in parallel under various MPI implementations.
>
> I have been in contact with David and Argonne to coordinate a re-release
> under a newer license [1], but we're not quite there yet, and I have
> been the one holding this up. Hopefully more news 'soon' but I've been
> mumbling that all summer while I kept busy...
>
> You may want to look at PGAPACK and study it for possible extensions and
> refactorings, rather than to start again from scratch.


I had come across PGAPack some time back, though I did not spend much time
with it. But once I am through with some of the theoretical aspects of both
genetic algorithms and parallel genetic algorithms, I shall definitely start
off with PGAPack.

By the way, if time permits, could you kindly point me to some relevant
resources you may know of? I shall turn to Google soon, though.

Will get back to you after I have started looking at PGAPack.

Thanks,
Amit
-- 
Amit Kumar Saha
*NetBeans Community Docs
Contribution Coordinator*
me blogs@ http://amitksaha.blogspot.com
URL:http://amitsaha.in.googlepages.com


[OMPI users] Bug in common_mx.c (1.2.5a0r16522)

2007-10-24 Thread Åke Sandgren
Hi!

In common_mx.c the following looks wrong.
ompi_common_mx_finalize(void)
{
    mx_return_t mx_return;
    ompi_common_mx_initialize_ref_cnt--;
    if(ompi_common_mx_initialize == 0) {

That should be
if(ompi_common_mx_initialize_ref_cnt == 0)
right?

-- 
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se   Phone: +46 90 7866134 Fax: +46 90 7866126
Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se



[OMPI users] OpenMPI 1.2.4 vs 1.2

2007-10-24 Thread Neeraj Chourasia
Hello guys,

I had Open MPI v1.2 installed on my cluster. A couple of days back, I thought
to upgrade it to v1.2.4 (the latest release, I suppose). Since I didn't want
to take any risk, I first installed it in a temporary location and ran the
bandwidth and bidirectional bandwidth tests provided by the OSU guys, and to
my surprise, the old version performs better in both scenarios. Could anyone
give me the reason for this? I repeated the above point-to-point tests
between all sets of nodes, but the results were the same :(

-Neeraj