Re: [OMPI users] ssh MPI and program tests

2009-04-08 Thread Francesco Pietra
With amd64 etch, Intel compilers 10, and OpenMPI 1.2.6 I had no
problem compiling amber10.

Having changed to amd64 lenny, amber10 no longer passed the
installation tests, and I was unable to recompile it.

I upgraded OpenMPI to 1.3.1, which passed all its tests, but again I
was unable to recompile amber10.

Now I have purged everything Intel (compilers and MKL) and I am
installing version 11. I'll recompile OpenMPI 1.3.1 against these. If
amber10 still refuses to compile, I'll abandon Intel for the GNU
compilers and math libraries.

I'll come to your questions. There was no misprint in what I wrote,
but at the moment I am unable to do better. All issues about ssh were
resolved. Actually, there was no real issue: I created the issues
myself, and I apologize for that.

thanks
francesco

On Wed, Apr 8, 2009 at 10:28 AM, Marco  wrote:
> * Francesco Pietra  [2009 04 06, 16:51]:
>> cd cytosine && ./Run.cytosine
>> The authenticity of host deb64 (which is the hostname) (127.0.1.1)
>> can't be established.
>> RSA fingerprint .
>> connecting ?
>
>  This is a warning from ssh, not from OpenMPI; probably it is the first
> time the system tries to connect to itself, and it is asking you for
> confirmation to continue.
>
>  Please note that 127.0.1.1 seems quite strange to me, since the
> 'standard' IP for localhost is '127.0.0.1'. You may want to check your
> /etc/hosts.
>
>> I stopped the ssh daemon, whereby tests were interrupted because deb64
>> (i.e., itself) could no longer be accessed.
>
>  I'm afraid it wasn't a great idea... the ssh daemon is required to
> receive connections to localhost; and since MPI wants to do just that,
> stopping sshd won't really fix the issue ;)
>



Re: [OMPI users] ssh MPI and program tests

2009-04-08 Thread Marco
* Francesco Pietra  [2009 04 06, 16:51]:
> cd cytosine && ./Run.cytosine
> The authenticity of host deb64 (which is the hostname) (127.0.1.1)
> can't be established.
> RSA fingerprint .
> connecting ?

 This is a warning from ssh, not from OpenMPI; probably it is the first
time the system tries to connect to itself, and it is asking you for
confirmation to continue.
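
Answering "yes" once stores the key in ~/.ssh/known_hosts. One way to
pre-populate that file non-interactively (a sketch using the deb64 host
name from this thread) is:

  ssh-keyscan -t rsa deb64 >> ~/.ssh/known_hosts
  ssh deb64 date    # should no longer ask about authenticity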

 Please note that 127.0.1.1 seems quite strange to me, since the
'standard' IP for localhost is '127.0.0.1'. You may want to check your
/etc/hosts.
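
For what it's worth, a default Debian install maps the machine's hostname
to 127.0.1.1 in /etc/hosts when no static address is configured; a typical
file looks roughly like this (a sketch only, the real entries will differ):

  127.0.0.1   localhost
  127.0.1.1   deb64

If deb64 should resolve to its LAN address instead, that second line can
be replaced with something like "192.168.0.10  deb64" (the address here is
invented; use the real one). "getent hosts deb64" shows what the name
currently resolves to.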

> I stopped the ssh daemon, whereby tests were interrupted because deb64
> (i.e., itself) could no longer be accessed.

 I'm afraid it wasn't a great idea... the ssh daemon is required to
receive connections to localhost; and since MPI wants to do just that,
stopping sshd won't really fix the issue ;)
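
If sshd was stopped by hand, on Debian it can normally be restarted with
(run as root; a sketch, the init script name may differ on other systems):

  /etc/init.d/ssh start
  ssh deb64 date    # quick check that logging into deb64 works again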



Re: [OMPI users] ssh MPI and program tests

2009-04-07 Thread Francesco Pietra
Hi Jody:
I should only blame myself. Gustavo's indications were clear. Still, I
misunderstood them.

Since I am testing on one node (where everything is in place):

mpirun -host deb64 -n 4 connectivity_c

Connectivity test on 4 processes PASSED

thanks
francesco

On Tue, Apr 7, 2009 at 12:27 PM, jody  wrote:
> Hi
>
> What are the options "-deb64" and "-1" you are passing to mpirun:
>> /usr/local/bin/mpirun -deb64 -1 connectivity_c 2>&1 | tee 
>> n=1.connectivity.out
>
> I don't think these are legal options for mpirun (at least they don't
> show up in `man mpirun`).
> And I think you should add a "-n 4" (for 4 processors)
> Furthermore, if you want to specify a host, you have to add "-host hostname1"
> if you want to specify several hosts you have to add "-host
> hostname1,hostname2,hostname3"  (no spaces around the commas)
>
> Jody
>
>
> On Tue, Apr 7, 2009 at 11:39 AM, Francesco Pietra  
> wrote:
>> Hi Gus:
>> I should have made clear at the beginning that on the Zyxel router
>> (connected to the Internet with a dynamic IP from the provider) there
>> are three computers. Their host names:
>>
>> deb32 (desktop debian i386)
>>
>> deb64 (multisocket debian amd 64 lenny)
>>
>> tya64 (multisocket debian amd 64 lenny)
>>
>> The three are interconnected by passwordless ssh for the same user
>> (myself). I never established connections as root user because I have
>> direct access to all three computers. So, if I slogin as user,
>> passwordless connection is established. If I try to slogin as root
>> user, it says that the authenticity of the host to which I intended to
>> connect can't be established, RSA key fingerprint .. Connect?
>>
>> Moreover, I appended to the public keys known to deb64 those that deb64
>> had sent to either deb32 or tya64. Whereby, when I run
>>
>> ssh 192.168.#.## date (where the numbers stand for the host's address)
>>
>> the date is returned passwordless. With certain programs (conceived for
>> batch runs), the execution on deb64 is launched from deb32.
>>
>>
>> I copied the examples/ directory to my deb64 home, chowned it to me,
>> compiled as user, and ran "connectivity" as user. (I have not compiled
>> in the OpenMPI directory, as that belongs to the root user, while ssh
>> has been adjusted for me as user.)
>>
>> Running as user in my home
>>
>> /usr/local/bin/mpirun -deb64 -1 connectivity_c 2>&1 | tee 
>> n=1.connectivity.out
>>
>> it asked to add the host (itself) to the list of known hosts (on
>> repeating the command, that was no longer asked). The unabridged output:
>>
>> ===
>> [deb64:03575] procdir: /tmp/openmpi-sessions-francesco@deb64_0/38647/0/0
>> [deb64:03575] jobdir: /tmp/openmpi-sessions-francesco@deb64_0/38647/0
>> [deb64:03575] top: openmpi-sessions-francesco@deb64_0
>> [deb64:03575] tmp: /tmp
>> [deb64:03575] mpirun: reset PATH:
>> /usr/local/bin:/usr/local/mcce/bin:/opt/intel/cce/10.1.015/bin:/opt/intel/fce/10.1.015/bin:/home/francesco/gmmx06:/usr/local/bin:/usr/bin:/bin:/usr/games:/usr/local/amber10/exe:/usr/local/dock6/bin
>> [deb64:03575] mpirun: reset LD_LIBRARY_PATH:
>> /usr/local/lib:/opt/intel/mkl/10.0.1.014/lib/em64t:/opt/intel/cce/10.1.015/lib:/opt/intel/fce/10.1.015/lib:/usr/local/lib:/opt/acml4.1.0/gfortran64_mp_int64/lib
>> [deb64:03583] procdir: /tmp/openmpi-sessions-francesco@deb64_0/38647/0/1
>> [deb64:03583] jobdir: /tmp/openmpi-sessions-francesco@deb64_0/38647/0
>> [deb64:03583] top: openmpi-sessions-francesco@deb64_0
>> [deb64:03583] tmp: /tmp
>> [deb64:03575] [[38647,0],0] node[0].name deb64 daemon 0 arch ffc91200
>> [deb64:03575] [[38647,0],0] node[1].name deb64 daemon 1 arch ffc91200
>> [deb64:03583] [[38647,0],1] node[0].name deb64 daemon 0 arch ffc91200
>> [deb64:03583] [[38647,0],1] node[1].name deb64 daemon 1 arch ffc91200
>> --
>> mpirun was unable to launch the specified application as it could not
>> find an executable:
>>
>> Executable: -e
>> Node: deb64
>>
>> while attempting to start process rank 0.
>> --
>> [deb64:03575] sess_dir_finalize: job session dir not empty - leaving
>> [deb64:03575] sess_dir_finalize: proc session dir not empty - leaving
>> orterun: exiting with status -123
>> [deb64:03583] sess_dir_finalize: job session dir not empty - leaving
>> =
>>
>> I have changed the command, setting n to 4 and giving the full path
>> to the executable "connectivity_c", to no avail. I do not understand
>> the message "Executable: -e" in the output file, and I feel rather
>> stupid in this circumstance.
>>
>> ssh is working: slogin and "ssh deb64 date" give the date passwordless,
>> both before and after the "connectivity" run; i.e., deb64 knew, and
>> knows, itself.
>>
>> The output of ompi_info between xx should probably clarify
>> your other questions.
>>
>> xxx
>>                 Package: Open MPI root@deb64 Distribution
>>                Open MPI: 1.3.1
>>   

Re: [OMPI users] ssh MPI and program tests

2009-04-07 Thread jody
Hi

What are the options "-deb64" and "-1" you are passing to mpirun:
> /usr/local/bin/mpirun -deb64 -1 connectivity_c 2>&1 | tee n=1.connectivity.out

I don't think these are legal options for mpirun (at least they don't
show up in `man mpirun`).
And I think you should add a "-n 4" (for 4 processors)
Furthermore, if you want to specify a host, you have to add "-host hostname1"
if you want to specify several hosts you have to add "-host
hostname1,hostname2,hostname3"  (no spaces around the commas)
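
For example (hosts and executable taken from this thread; adjust the
process count to your machines):

  mpirun -n 4 -host deb64 ./connectivity_c
  mpirun -n 8 -host deb64,tya64 ./connectivity_c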

Jody


On Tue, Apr 7, 2009 at 11:39 AM, Francesco Pietra  wrote:
> Hi Gus:
> I should have made clear at the beginning that on the Zyxel router
> (connected to the Internet with a dynamic IP from the provider) there
> are three computers. Their host names:
>
> deb32 (desktop debian i386)
>
> deb64 (multisocket debian amd 64 lenny)
>
> tya64 (multisocket debian amd 64 lenny)
>
> The three are interconnected by passwordless ssh for the same user
> (myself). I never established connections as root user because I have
> direct access to all three computers. So, if I slogin as user,
> passwordless connection is established. If I try to slogin as root
> user, it says that the authenticity of the host to which I intended to
> connect can't be established, RSA key fingerprint .. Connect?
>
> Moreover, I appended to the public keys known to deb64 those that deb64
> had sent to either deb32 or tya64. Whereby, when I run
>
> ssh 192.168.#.## date (where the numbers stand for the host's address)
>
> the date is returned passwordless. With certain programs (conceived for
> batch runs), the execution on deb64 is launched from deb32.
>
>
> I copied the examples/ directory to my deb64 home, chowned it to me,
> compiled as user, and ran "connectivity" as user. (I have not compiled
> in the OpenMPI directory, as that belongs to the root user, while ssh
> has been adjusted for me as user.)
>
> Running as user in my home
>
> /usr/local/bin/mpirun -deb64 -1 connectivity_c 2>&1 | tee n=1.connectivity.out
>
> it asked to add the host (itself) to the list of known hosts (on
> repeating the command, that was no longer asked). The unabridged output:
>
> ===
> [deb64:03575] procdir: /tmp/openmpi-sessions-francesco@deb64_0/38647/0/0
> [deb64:03575] jobdir: /tmp/openmpi-sessions-francesco@deb64_0/38647/0
> [deb64:03575] top: openmpi-sessions-francesco@deb64_0
> [deb64:03575] tmp: /tmp
> [deb64:03575] mpirun: reset PATH:
> /usr/local/bin:/usr/local/mcce/bin:/opt/intel/cce/10.1.015/bin:/opt/intel/fce/10.1.015/bin:/home/francesco/gmmx06:/usr/local/bin:/usr/bin:/bin:/usr/games:/usr/local/amber10/exe:/usr/local/dock6/bin
> [deb64:03575] mpirun: reset LD_LIBRARY_PATH:
> /usr/local/lib:/opt/intel/mkl/10.0.1.014/lib/em64t:/opt/intel/cce/10.1.015/lib:/opt/intel/fce/10.1.015/lib:/usr/local/lib:/opt/acml4.1.0/gfortran64_mp_int64/lib
> [deb64:03583] procdir: /tmp/openmpi-sessions-francesco@deb64_0/38647/0/1
> [deb64:03583] jobdir: /tmp/openmpi-sessions-francesco@deb64_0/38647/0
> [deb64:03583] top: openmpi-sessions-francesco@deb64_0
> [deb64:03583] tmp: /tmp
> [deb64:03575] [[38647,0],0] node[0].name deb64 daemon 0 arch ffc91200
> [deb64:03575] [[38647,0],0] node[1].name deb64 daemon 1 arch ffc91200
> [deb64:03583] [[38647,0],1] node[0].name deb64 daemon 0 arch ffc91200
> [deb64:03583] [[38647,0],1] node[1].name deb64 daemon 1 arch ffc91200
> --
> mpirun was unable to launch the specified application as it could not
> find an executable:
>
> Executable: -e
> Node: deb64
>
> while attempting to start process rank 0.
> --
> [deb64:03575] sess_dir_finalize: job session dir not empty - leaving
> [deb64:03575] sess_dir_finalize: proc session dir not empty - leaving
> orterun: exiting with status -123
> [deb64:03583] sess_dir_finalize: job session dir not empty - leaving
> =
>
> I have changed the command, setting n to 4 and giving the full path
> to the executable "connectivity_c", to no avail. I do not understand
> the message "Executable: -e" in the output file, and I feel rather
> stupid in this circumstance.
>
> ssh is working: slogin and "ssh deb64 date" give the date passwordless,
> both before and after the "connectivity" run; i.e., deb64 knew, and
> knows, itself.
>
> The output of ompi_info between xx should probably clarify
> your other questions.
>
> xxx
>                 Package: Open MPI root@deb64 Distribution
>                Open MPI: 1.3.1
>   Open MPI SVN revision: r20826
>   Open MPI release date: Mar 18, 2009
>                Open RTE: 1.3.1
>   Open RTE SVN revision: r20826
>   Open RTE release date: Mar 18, 2009
>                    OPAL: 1.3.1
>       OPAL SVN revision: r20826
>       OPAL release date: Mar 18, 2009
>            Ident string: 1.3.1
>                  Prefix: /usr/local
>  Configured architecture: x86_64-unknown-linux-gnu
>          Configure host: deb64
>           

Re: [OMPI users] ssh MPI and program tests

2009-04-07 Thread Terry Frankcombe
On Tue, 2009-04-07 at 11:39 +0200, Francesco Pietra wrote:
> Hi Gus:
> I should have made clear at the beginning that on the Zyxel router
> (connected to the Internet with a dynamic IP from the provider) there
> are three computers. Their host names:
> 
> deb32 (desktop debian i386)
> 
> deb64 (multisocket debian amd 64 lenny)
> 
> tya64 (multisocket debian amd 64 lenny)
> 
> The three are interconnected by passwordless ssh for the same user
> (myself). I never established connections as root user because I have
> direct access to all three computers. So, if I slogin as user,
> passwordless connection is established. If I try to slogin as root
> user, it says that the authenticity of the host to which I intended to
> connect can't be established, RSA key fingerprint .. Connect?
> 
> Moreover, I appended to the public keys known to deb64 those that deb64
> had sent to either deb32 or tya64. Whereby, when I run
>
> ssh 192.168.#.## date (where the numbers stand for the host's address)
>
> the date is returned passwordless. With certain programs (conceived for
> batch runs), the execution on deb64 is launched from deb32.
> 
> 
> I copied the examples/ directory to my deb64 home, chowned it to me,
> compiled as user, and ran "connectivity" as user. (I have not compiled
> in the OpenMPI directory, as that belongs to the root user, while ssh
> has been adjusted for me as user.)
> 
> Running as user in my home
> 
> /usr/local/bin/mpirun -deb64 -1 connectivity_c 2>&1 | tee n=1.connectivity.out
> 
> it asked to add the host (itself) to the list of known hosts (on
> repeating the command, that was no longer asked). The unabridged output:

The easiest setup is for the executable to be accessible on all nodes,
either copied or on a shared filesystem.  Is that the case here?
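
For instance, if there is no shared filesystem, the binary can simply be
copied to the same path on each machine (hostnames from this thread; the
path is only an example):

  scp ~/connectivity_c tya64:~/connectivity_c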

(I haven't read the whole thread, so apologies if this has already been
covered.)




Re: [OMPI users] ssh MPI and program tests

2009-04-07 Thread Francesco Pietra
Hi Gus:
I should have made clear at the beginning that on the Zyxel router
(connected to the Internet with a dynamic IP from the provider) there
are three computers. Their host names:

deb32 (desktop debian i386)

deb64 (multisocket debian amd 64 lenny)

tya64 (multisocket debian amd 64 lenny)

The three are interconnected by passwordless ssh for the same user
(myself). I never established connections as root user because I have
direct access to all three computers. So, if I slogin as user,
passwordless connection is established. If I try to slogin as root
user, it says that the authenticity of the host to which I intended to
connect can't be established, RSA key fingerprint .. Connect?

Moreover, I appended to the public keys known to deb64 those that deb64
had sent to either deb32 or tya64. Whereby, when I run

ssh 192.168.#.## date (where the numbers stand for the host's address)

the date is returned passwordless. With certain programs (conceived for
batch runs), the execution on deb64 is launched from deb32.


I copied the examples/ directory to my deb64 home, chowned it to me,
compiled as user, and ran "connectivity" as user. (I have not compiled
in the OpenMPI directory, as that belongs to the root user, while ssh
has been adjusted for me as user.)

Running as user in my home

/usr/local/bin/mpirun -deb64 -1 connectivity_c 2>&1 | tee n=1.connectivity.out

it asked to add the host (itself) to the list of known hosts (on
repeating the command, that was no longer asked). The unabridged output:

===
[deb64:03575] procdir: /tmp/openmpi-sessions-francesco@deb64_0/38647/0/0
[deb64:03575] jobdir: /tmp/openmpi-sessions-francesco@deb64_0/38647/0
[deb64:03575] top: openmpi-sessions-francesco@deb64_0
[deb64:03575] tmp: /tmp
[deb64:03575] mpirun: reset PATH:
/usr/local/bin:/usr/local/mcce/bin:/opt/intel/cce/10.1.015/bin:/opt/intel/fce/10.1.015/bin:/home/francesco/gmmx06:/usr/local/bin:/usr/bin:/bin:/usr/games:/usr/local/amber10/exe:/usr/local/dock6/bin
[deb64:03575] mpirun: reset LD_LIBRARY_PATH:
/usr/local/lib:/opt/intel/mkl/10.0.1.014/lib/em64t:/opt/intel/cce/10.1.015/lib:/opt/intel/fce/10.1.015/lib:/usr/local/lib:/opt/acml4.1.0/gfortran64_mp_int64/lib
[deb64:03583] procdir: /tmp/openmpi-sessions-francesco@deb64_0/38647/0/1
[deb64:03583] jobdir: /tmp/openmpi-sessions-francesco@deb64_0/38647/0
[deb64:03583] top: openmpi-sessions-francesco@deb64_0
[deb64:03583] tmp: /tmp
[deb64:03575] [[38647,0],0] node[0].name deb64 daemon 0 arch ffc91200
[deb64:03575] [[38647,0],0] node[1].name deb64 daemon 1 arch ffc91200
[deb64:03583] [[38647,0],1] node[0].name deb64 daemon 0 arch ffc91200
[deb64:03583] [[38647,0],1] node[1].name deb64 daemon 1 arch ffc91200
--
mpirun was unable to launch the specified application as it could not
find an executable:

Executable: -e
Node: deb64

while attempting to start process rank 0.
--
[deb64:03575] sess_dir_finalize: job session dir not empty - leaving
[deb64:03575] sess_dir_finalize: proc session dir not empty - leaving
orterun: exiting with status -123
[deb64:03583] sess_dir_finalize: job session dir not empty - leaving
=

I have changed the command, setting n to 4 and giving the full path
to the executable "connectivity_c", to no avail. I do not understand
the message "Executable: -e" in the output file, and I feel rather
stupid in this circumstance.

ssh is working: slogin and "ssh deb64 date" give the date passwordless,
both before and after the "connectivity" run; i.e., deb64 knew, and
knows, itself.

The output of ompi_info between xx should probably clarify
your other questions.

xxx
 Package: Open MPI root@deb64 Distribution
Open MPI: 1.3.1
   Open MPI SVN revision: r20826
   Open MPI release date: Mar 18, 2009
Open RTE: 1.3.1
   Open RTE SVN revision: r20826
   Open RTE release date: Mar 18, 2009
OPAL: 1.3.1
   OPAL SVN revision: r20826
   OPAL release date: Mar 18, 2009
Ident string: 1.3.1
  Prefix: /usr/local
 Configured architecture: x86_64-unknown-linux-gnu
  Configure host: deb64
   Configured by: root
   Configured on: Fri Apr  3 23:03:30 CEST 2009
  Configure host: deb64
Built by: root
Built on: Fri Apr  3 23:12:28 CEST 2009
  Built host: deb64
  C bindings: yes
C++ bindings: yes
  Fortran77 bindings: yes (all)
  Fortran90 bindings: yes
 Fortran90 bindings size: small
  C compiler: gcc
 C compiler absolute: /usr/bin/gcc
C++ compiler: g++
   C++ compiler absolute: /usr/bin/g++
  Fortran77 compiler: /opt/intel/fce/10.1.015/bin/ifort
  Fortran77 compiler abs:
  Fortran90 compiler: /opt/intel/fce/10.1.015/bin/ifort
  Fortran90 compiler abs:
 C profiling: yes
   C++ profiling: yes
  

Re: [OMPI users] ssh MPI and program tests

2009-04-06 Thread Francesco Pietra
Hi Gus:
Partial quick answers below. I have reestablished the ssh connection
so that tomorrow I'll run the tests. Everything that relates to
running amber is on the "parallel computer", where I have access to
everything.

On Mon, Apr 6, 2009 at 7:53 PM, Gus Correa  wrote:
> Hi Francesco, list
>
> Francesco Pietra wrote:
>>
>> On Mon, Apr 6, 2009 at 5:21 PM, Gus Correa  wrote:
>>>
>>> Hi Francesco
>>>
>>> Did you try to run examples/connectivity_c.c,
>>> or examples/hello_c.c before trying amber?
>>> They are in the directory where you untarred the OpenMPI tarball.
>>> It is easier to troubleshoot
>>> possible network and host problems
>>> with these simpler programs.
>>
>> I have found the "examples". Should they be compiled? how? This is my
>> only question here.
>
> cd examples/
> /full/path/to/openmpi/bin/mpicc -o connectivity_c connectivity_c.c
>
> Then run it with, say:
>
> /full/path/to/openmpi/bin/mpirun -host {whatever_hosts_you_want}
> -n {as_many_processes_you_want} connectivity_c
>
> Likewise for hello_c.c
>
>> What's below is info. Although parallel amber
>> would not have compiled with a faulty OpenMPI, I'll run the OpenMPI
>> tests as soon as I understand how.
>>
>>> Also, to avoid confusion,
>>> you may use a full path name to mpirun,
>>> in case you have other MPI flavors in your system.
>>> Often times the mpirun your path is pointing to is not what you
>>> may think it is.
>>
>>
>> which mpirun
>> /usr/local/bin/mpirun
>
> Did you install OpenMPI on /usr/local ?
> When you do "mpirun -help", do you see "mpirun (Open MPI) 1.3"?

mpirun -help
mpirun (Open MPI) 1.3.1
on the 1st line, followed by the options


> How about the output of "orte_info" ?
orte_info was not installed. See below what has been installed.


> Does it show your Intel compilers, etc?

I guess so, otherwise amber would not have compiled, but I don't
know the commands to prove it. The Intel compilers are on the path:
/opt/intel/cce/10.1.015/bin:/opt/intel/fce/10.1.015/bin and the MKL
is sourced in .bashrc.

>
> I ask because many Linux distributions come with one or more flavors
> of MPI (OpenMPI, MPICH, LAM, etc), some compilers also do (PGI for
> instance), some tools (Intel MKL?) may also have their MPI,
> and you end up with a bunch of MPI commands
> on your path that may produce a big mixup.
> This is a pretty common problem that affects new users on this list,
> on the MPICH list, on clustering lists, etc.
> The error messages often don't help find the source of the problem,
> and people spend a lot of time trying to troubleshoot the network,
> etc., when it is often just a path problem.
>
> So, this is why when you begin, you may want to use full path
> names, to avoid confusion.
> After the basic MPI functionality is working,
> then you can go and fix your path chain,
> and rely on your path chain.
>
>>
>> there is no other accessible MPI (one application, DOT2, has MPICH, but
>> it is a static compilation). DOT2 parallelization requires that the
>> computer knows itself, i.e. "ssh hostname date" should return the date
>> passwordless. The reported issues in testing amber have destroyed this
>> situation: now deb64 has port 22 closed, even to itself.
>>
>
> Have you tried to reboot the master node, to see if it comes back
> to the original ssh setup?
> You need ssh to be functional to run OpenMPI code,
> including the tests above.
>
>>
>>> I don't know if you want to run on amd64 alone (master node?)
>>> or on a cluster.
>>> In any case, you may use a list of hosts
>>> or a hostfile on the mpirun command line,
>>> to specify where you want to run.
>>
>> With amber I use the parallel computer directly, and the amber
>> installation is chowned to me. The ssh connection, in this case, only
>> serves to get files from, or send files to, my desktop.
>>
>
> It is unclear to me what you mean by "the parallel computer directly".
> Can you explain better which computers are in this game?
> Your desktop and a cluster perhaps?
> Are they both Debian 64 Linux?
> Where do you compile the programs?
> Where do you want to run the programs?
>
>> In my .bashrc:
>>
>> (for amber)
>> MPI_HOME=/usr/local
>> export MPI_HOME
>>
>> (for openmpi)
>> if [ "$LD_LIBRARY_PATH" ] ; then
>>  export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/lib"
>> else
>>  export LD_LIBRARY_PATH="/usr/local/lib"
>> fi
>>
>
> Is this on your desktop or on the "parallel computer"?


On both "parallel computers" (there is my desktop, ssh to two uma-type
dual-opteron "parallel computers". Only one was active when the "test"
problems arose. While the (ten years old) destop is i386, both other
machines are amd64, i.e., all debian lenny. I prepare the input files
on the i386 and use it also as storage for backups. The "parallel
computer" has only the X server and a minimal window for a
two-dimensional graphics of amber. The other parallel computer has a
GeForce 6600 card with GLSL support, which I use to elaborate
graphically the 

Re: [OMPI users] ssh MPI and program tests

2009-04-06 Thread Gus Correa

Hi Francesco, list

Francesco Pietra wrote:

On Mon, Apr 6, 2009 at 5:21 PM, Gus Correa  wrote:

Hi Francesco

Did you try to run examples/connectivity_c.c,
or examples/hello_c.c before trying amber?
They are in the directory where you untarred the OpenMPI tarball.
It is easier to troubleshoot
possible network and host problems
with these simpler programs.


I have found the "examples". Should they be compiled? how? This is my
only question here. 


cd examples/
/full/path/to/openmpi/bin/mpicc -o connectivity_c connectivity_c.c

Then run it with, say:

/full/path/to/openmpi/bin/mpirun -host {whatever_hosts_you_want}
-n {as_many_processes_you_want} connectivity_c

Likewise for hello_c.c
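
Concretely, with the paths that appear later in this thread (assuming
OpenMPI is installed under /usr/local), that would be something like:

  cd examples/
  /usr/local/bin/mpicc -o connectivity_c connectivity_c.c
  /usr/local/bin/mpirun -host deb64 -n 4 ./connectivity_c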


What's below is info. Although parallel amber
would not have compiled with a faulty OpenMPI, I'll run the OpenMPI
tests as soon as I understand how.


Also, to avoid confusion,
you may use a full path name to mpirun,
in case you have other MPI flavors in your system.
Often times the mpirun your path is pointing to is not what you
may think it is.



which mpirun
/usr/local/bin/mpirun


Did you install OpenMPI on /usr/local ?
When you do "mpirun -help", do you see "mpirun (Open MPI) 1.3"?
How about the output of "orte_info" ?
Does it show your Intel compilers, etc?

I ask because many Linux distributions come with one or more flavors
of MPI (OpenMPI, MPICH, LAM, etc), some compilers also do (PGI for 
instance), some tools (Intel MKL?) may also have their MPI,

and you end up with a bunch of MPI commands
on your path that may produce a big mixup.
This is a pretty common problem that affects new users on this list,
on the MPICH list, on clustering lists, etc.
The error messages often don't help find the source of the problem,
and people spend a lot of time trying to troubleshoot the network,
etc., when it is often just a path problem.

So, this is why when you begin, you may want to use full path
names, to avoid confusion.
After the basic MPI functionality is working,
then you can go and fix your path chain,
and rely on your path chain.



there is no other accessible MPI (one application, DOT2, has MPICH, but
it is a static compilation). DOT2 parallelization requires that the
computer knows itself, i.e. "ssh hostname date" should return the date
passwordless. The reported issues in testing amber have destroyed this
situation: now deb64 has port 22 closed, even to itself.



Have you tried to reboot the master node, to see if it comes back
to the original ssh setup?
You need ssh to be functional to run OpenMPI code,
including the tests above.




I don't know if you want to run on amd64 alone (master node?)
or on a cluster.
In any case, you may use a list of hosts
or a hostfile on the mpirun command line,
to specify where you want to run.


With amber I use the parallel computer directly, and the amber
installation is chowned to me. The ssh connection, in this case, only
serves to get files from, or send files to, my desktop.



It is unclear to me what you mean by "the parallel computer directly".
Can you explain better which computers are in this game?
Your desktop and a cluster perhaps?
Are they both Debian 64 Linux?
Where do you compile the programs?
Where do you want to run the programs?


In my .bashrc:

(for amber)
MPI_HOME=/usr/local
export MPI_HOME

(for openmpi)
if [ "$LD_LIBRARY_PATH" ] ; then
  export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/lib"
else
  export LD_LIBRARY_PATH="/usr/local/lib"
fi



Is this on your desktop or on the "parallel computer"?



There is also

MPICH_HOME=/usr/local
export MPICH_HOME

this is for DOCK, which, with this env variable, accepts OpenMPI (at
least it was so with v 1.2.6)



Oh, well, it looks like there is MPICH already installed on /usr/local.
So, this may be part of the confusion, the path confusion I referred to.

I would suggest installing OpenMPI on a different directory,
using the --prefix option of the OpenMPI configure script.
Do configure --help for details about all configuration options.
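
A sketch of what that could look like (the prefix and compiler variables
here are only illustrative, not what was actually used on deb64):

  ./configure --prefix=/usr/local/openmpi-1.3.1 CC=icc CXX=icpc F77=ifort FC=ifort
  make all install

and then invoke /usr/local/openmpi-1.3.1/bin/mpicc and .../bin/mpirun by
full path.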



The Intel compilers (ifort and icc) are sourced in both my
.bashrc and root's .bashrc.

Thanks, and apologies for my low level in these affairs. It is the
first time I have faced such problems; with amd64 etch, the same Intel
compilers, and OpenMPI 1.2.6, everything was in order.



To me it doesn't look like the problem is related to the new version
of OpenMPI.

Try the test programs with full path names first.
It may not solve the problem, but it may clarify things a bit.
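
A few quick sanity checks in the same spirit (suggestions only, not part
of the original exchange):

  which mpirun mpicc               # which installation comes first in PATH
  /usr/local/bin/ompi_info | head  # version and build details of that install
  ldd /usr/local/bin/mpirun        # which runtime libraries it actually loads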

Gus Correa
-
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
-


francesco




Do "/full/path/to/openmpi/bin/mpirun --help" for details.

I am not familiar with amber, but how does it find your OpenMPI
libraries and compiler wrappers?
Don't you need to give it the paths during configuration,
say,
/configure_amber 

Re: [OMPI users] ssh MPI and program tests

2009-04-06 Thread Francesco Pietra
On Mon, Apr 6, 2009 at 5:21 PM, Gus Correa  wrote:
> Hi Francesco
>
> Did you try to run examples/connectivity_c.c,
> or examples/hello_c.c before trying amber?
> They are in the directory where you untarred the OpenMPI tarball.
> It is easier to troubleshoot
> possible network and host problems
> with these simpler programs.

I have found the "examples". Should they be compiled? how? This is my
only question here. What's below is info. Although parallel amber
would not have compiled with a faulty OpenMPI, I'll run the OpenMPI
tests as soon as I understand how.

>
> Also, to avoid confusion,
> you may use a full path name to mpirun,
> in case you have other MPI flavors in your system.
> Often times the mpirun your path is pointing to is not what you
> may think it is.


which mpirun
/usr/local/bin/mpirun

there is no other accessible MPI (one application, DOT2, has MPICH, but
it is a static compilation). DOT2 parallelization requires that the
computer knows itself, i.e. "ssh hostname date" should return the date
passwordless. The reported issues in testing amber have destroyed this
situation: now deb64 has port 22 closed, even to itself.


>
> I don't know if you want to run on amd64 alone (master node?)
> or on a cluster.
> In any case, you may use a list of hosts
> or a hostfile on the mpirun command line,
> to specify where you want to run.

With amber I use the parallel computer directly, and the amber
installation is chowned to me. The ssh connection, in this case, only
serves to get files from, or send files to, my desktop.

In my .bashrc:

(for amber)
MPI_HOME=/usr/local
export MPI_HOME

(for openmpi)
if [ "$LD_LIBRARY_PATH" ] ; then
  export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/lib"
else
  export LD_LIBRARY_PATH="/usr/local/lib"
fi


There is also

MPICH_HOME=/usr/local
export MPICH_HOME

this is for DOCK, which, with this env variable, accepts OpenMPI (at
least it was so with v 1.2.6)

The Intel compilers (ifort and icc) are sourced in both my
.bashrc and root's .bashrc.

Thanks, and apologies for my low level in these affairs. It is the
first time I have faced such problems; with amd64 etch, the same Intel
compilers, and OpenMPI 1.2.6, everything was in order.

francesco



>
> Do "/full/path/to/openmpi/bin/mpirun --help" for details.
>
> I am not familiar with amber, but how does it find your OpenMPI
> libraries and compiler wrappers?
> Don't you need to give it the paths during configuration,
> say,
> /configure_amber -openmpi=/full/path/to/openmpi
> or similar?
>
> I hope this helps.
> Gus Correa
> -
> Gustavo Correa
> Lamont-Doherty Earth Observatory - Columbia University
> Palisades, NY, 10964-8000 - USA
> -
>
>
> Francesco Pietra wrote:
>>
>> I have compiled openmpi 1.3.1 on debian amd64 lenny with icc/ifort
>> (10.1.015) and libnuma. Tests passed:
>>
>> ompi_info | grep libnuma
>>  MCA affinity: libnuma (MCA v 2.0, API 2.0)
>>
>> ompi_info | grep maffinity
>>  MCA affinity: first use (MCA as above)
>>  MCA affinity: libnuma as above.
>>
>> Then I compiled a molecular dynamics package, amber10, in parallel,
>> without error messages, but I am having problems testing the parallel
>> amber installation.
>>
>> amber10 configure was set as:
>>
>> ./configure_amber -openmpi -nobintray ifort
>>
>> just as I used before with openmpi 1.2.6. Could you say if the
>> -openmpi should be changed?
>>
>> cd tests
>>
>> export DO_PARALLEL='mpirun -np 4'
>>
>> make test.parallel.MM  < /dev/null
>>
>> cd cytosine && ./Run.cytosine
>> The authenticity of host deb64 (which is the hostname) (127.0.1.1)
>> can't be established.
>> RSA fingerprint .
>> connecting ?
>>
>> I stopped the ssh daemon, whereby tests were interrupted because deb64
>> (i.e., itself) could no longer be accessed. Further attempts under these
>> conditions failed for the same reason. Now, sshing to deb64 is no longer
>> possible: port 22 is closed. In contrast, sshing from deb64 to other
>> computers works passwordless. No such problems arose at the time of
>> amd64 etch with the same ssh configuration, same compilers, and
>> OpenMPI 1.2.6.
>>
>> I am here because the warning from the amber site is that I should
>> learn how to use my installation of MPI. Therefore, if there is any
>> clue...
>>
>> thanks
>> francesco pietra



Re: [OMPI users] ssh MPI and program tests

2009-04-06 Thread Gus Correa

Hi Francesco

Did you try to run examples/connectivity_c.c,
or examples/hello_c.c before trying amber?
They are in the directory where you untarred the OpenMPI tarball.
It is easier to troubleshoot
possible network and host problems
with these simpler programs.

Also, to avoid confusion,
you may use a full path name to mpirun,
in case you have other MPI flavors in your system.
Often times the mpirun your path is pointing to is not what you
may think it is.

I don't know if you want to run on amd64 alone (master node?)
or on a cluster.
In any case, you may use a list of hosts
or a hostfile on the mpirun command line,
to specify where you want to run.
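
For example, a minimal hostfile (hostnames from this thread; the slot
counts are made up) might contain:

  deb64 slots=4
  tya64 slots=4

and be used as:

  /full/path/to/openmpi/bin/mpirun --hostfile myhosts -n 8 ./connectivity_c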

Do "/full/path/to/openmpi/bin/mpirun --help" for details.

I am not familiar with amber, but how does it find your OpenMPI
libraries and compiler wrappers?
Don't you need to give it the paths during configuration,
say,
/configure_amber -openmpi=/full/path/to/openmpi
or similar?

I hope this helps.
Gus Correa
-
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
-


Francesco Pietra wrote:

I have compiled openmpi 1.3.1 on debian amd64 lenny with icc/ifort
(10.1.015) and libnuma. Tests passed:

ompi_info | grep libnuma
 MCA affinity: libnuma (MCA v 2.0, API 2.0)

ompi_info | grep maffinity
 MCA affinity: first use (MCA as above)
 MCA affinity: libnuma as above.

Then I compiled a molecular dynamics package, amber10, in parallel,
without error messages, but I am having problems testing the parallel
amber installation.

amber10 configure was set as:

./configure_amber -openmpi -nobintray ifort

just as I used before with openmpi 1.2.6. Could you say if the
-openmpi should be changed?

cd tests

export DO_PARALLEL='mpirun -np 4'

make test.parallel.MM  < /dev/null

cd cytosine && ./Run.cytosine
The authenticity of host deb64 (which is the hostname) (127.0.1.1)
can't be established.
RSA fingerprint .
connecting ?

I stopped the ssh daemon, whereby tests were interrupted because deb64
(i.e., itself) could no longer be accessed. Further attempts under these
conditions failed for the same reason. Now, sshing to deb64 is no longer
possible: port 22 is closed. In contrast, sshing from deb64 to other
computers works passwordless. No such problems arose at the time of
amd64 etch with the same ssh configuration, same compilers, and
OpenMPI 1.2.6.

I am here because the warning from the amber site is that I should
learn how to use my installation of MPI. Therefore, if there is any
clue...

thanks
francesco pietra




Re: [OMPI users] ssh MPI and program tests

2009-04-06 Thread Ralph Castain
You might first try and see if you can run something other than amber  
with your new installation. Make sure you have the PATH and  
LD_LIBRARY_PATH set correctly on the remote node, or add --prefix to  
your mpirun cmd line.
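
For example (a sketch, using the /usr/local prefix mentioned elsewhere in
this thread):

  mpirun --prefix /usr/local -host deb64 -n 4 ./connectivity_c

--prefix tells the remotely started daemons where the Open MPI installation
lives, so PATH and LD_LIBRARY_PATH need not be set in the remote shell
startup files.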


Also, did you remember to install the OMPI 1.3 libraries on the remote  
nodes?


One thing I see below is that host deb64 was resolved to the loopback  
interface - was that correct? Seems unusual - even if you are on that  
host, it usually would resolve to some public IP address.



On Apr 6, 2009, at 8:51 AM, Francesco Pietra wrote:


I have compiled openmpi 1.3.1 on debian amd64 lenny with icc/ifort
(10.1.015) and libnuma. Tests passed:

ompi_info | grep libnuma
MCA affinity: libnuma (MCA v 2.0, API 2.0)

ompi_info | grep maffinity
MCA affinity: first use (MCA as above)
MCA affinity: libnuma as above.

Then I compiled a molecular dynamics package, amber10, in parallel,
without error messages, but I am having problems testing the parallel
amber installation.

amber10 configure was set as:

./configure_amber -openmpi -nobintray ifort

just as I used before with openmpi 1.2.6. Could you say if the
-openmpi should be changed?

cd tests

export DO_PARALLEL='mpirun -np 4'

make test.parallel.MM  < /dev/null

cd cytosine && ./Run.cytosine
The authenticity of host deb64 (which is the hostname) (127.0.1.1)
can't be established.
RSA fingerprint .
connecting ?

I stopped the ssh daemon, whereby tests were interrupted because deb64
(i.e., itself) could no longer be accessed. Further attempts under these
conditions failed for the same reason. Now, sshing to deb64 is no longer
possible: port 22 is closed. In contrast, sshing from deb64 to other
computers works passwordless. No such problems arose at the time of
amd64 etch with the same ssh configuration, same compilers, and
OpenMPI 1.2.6.

I am here because the warning from the amber site is that I should
learn how to use my installation of MPI. Therefore, if there is any
clue...

thanks
francesco pietra




[OMPI users] ssh MPI and program tests

2009-04-06 Thread Francesco Pietra
I have compiled openmpi 1.3.1 on debian amd64 lenny with icc/ifort
(10.1.015) and libnuma. Tests passed:

ompi_info | grep libnuma
 MCA affinity: libnuma (MCA v 2.0, API 2.0)

ompi_info | grep maffinity
 MCA affinity: first use (MCA as above)
 MCA affinity: libnuma as above.

Then I compiled a molecular dynamics package, amber10, in parallel,
without error messages, but I am having problems testing the parallel
amber installation.

amber10 configure was set as:

./configure_amber -openmpi -nobintray ifort

just as I used before with openmpi 1.2.6. Could you say if the
-openmpi should be changed?

cd tests

export DO_PARALLEL='mpirun -np 4'

make test.parallel.MM  < /dev/null

cd cytosine && ./Run.cytosine
The authenticity of host deb64 (which is the hostname) (127.0.1.1)
can't be established.
RSA fingerprint .
connecting ?

I stopped the ssh daemon, whereby tests were interrupted because deb64
(i.e., itself) could no longer be accessed. Further attempts under these
conditions failed for the same reason. Now, sshing to deb64 is no longer
possible: port 22 is closed. In contrast, sshing from deb64 to other
computers works passwordless. No such problems arose at the time of
amd64 etch with the same ssh configuration, same compilers, and
OpenMPI 1.2.6.

I am here because the warning from the amber site is that I should
learn how to use my installation of MPI. Therefore, if there is any
clue...

thanks
francesco pietra