Re: [OMPI users] OMPI & uDAPL

2007-10-22 Thread Don Kerr

A couple of things.
On Linux, I believe you need the interface instance in the 7th field of 
the /etc/dat.conf file.

   example:

InfiniHost0 u1.1 nonthreadsafe default /usr/lib64/libdapl.so ri.1.1 " " " "
should be
InfiniHost0 u1.1 nonthreadsafe default /usr/lib64/libdapl.so ri.1.1 "ib0 0 " " "
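
A quick way to check an entry for this is sketched below. It is only an illustration, not part of Open MPI or uDAPL: the helper names are made up, and it assumes dat.conf fields are whitespace-separated with double-quoted strings, as in the two lines above.

```python
import shlex


def instance_data(line):
    """Return the 7th field of a dat.conf entry (hypothetical helper)."""
    # shlex keeps a quoted string such as "ib0 0 " together as one field.
    fields = shlex.split(line, comments=True)
    if len(fields) < 7:
        raise ValueError("expected at least 7 fields, got %d" % len(fields))
    return fields[6]


def has_interface_instance(line):
    """True if the 7th field holds something other than blanks."""
    return bool(instance_data(line).strip())


if __name__ == "__main__":
    broken = 'InfiniHost0 u1.1 nonthreadsafe default /usr/lib64/libdapl.so ri.1.1 " " " "'
    fixed = 'InfiniHost0 u1.1 nonthreadsafe default /usr/lib64/libdapl.so ri.1.1 "ib0 0 " " "'
    for entry in (broken, fixed):
        print(has_interface_instance(entry), repr(instance_data(entry)))
```

This only inspects the text of the entry; it does not tell you whether the provider library can actually open the interface.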


Also, I did see a problem when running with OFED older than 1.2, which I 
did not pursue because v1.2 worked. Last, it appears that you are 
running uDAPL 1.1; I have only ever run on 1.2, so I don't know what to 
expect.


-DON

Troy Telford wrote:

OK, I've got a system set up so that it can use uDAPL over IB (! OFED, ! 
Mellanox, though) on Linux.


Running simple dapl test programs (shamelessly pulled from the OFED tree) 
seems to verify that DAPL is in fact operating properly.


After searching through the mail archives, I found a small test code by Donald 
Kerr (dat_reg.c), and compiled and ran that successfully.  When run, it 
returns the DAT name (ib0).


I've also been able to run programs using uDAPL with Intel MPI, for example.  
I'm fairly sure uDAPL is working.


However, when I attempt to run an MPI program over uDAPL (--mca btl 
udapl,sm,self), I receive the following error:


WARNING: Failed to open "ib0" 
[DAT_PROVIDER_NOT_FOUND:DAT_NAME_NOT_REGISTERED].

This may be a real error or it may be an invalid entry in the uDAPL
Registry which is contained in the dat.conf file. Contact your local
System Administrator to confirm the availability of the interfaces in
the dat.conf file.

[0,1,0]: uDAPL on host n02 was unable to find any NICs.


I've also tried using --mca btl_udapl_if_include ib0, but that doesn't seem to 
have any effect.


Interestingly enough, when I don't specify a DAT provider, and I play with the 
name in /etc/dat.conf, Open MPI seems aware of the name change; it will 
list 'failed to open "newname"'



my /etc/dat.conf looks like this:
InfiniHost0 u1.1 nonthreadsafe default /usr/lib64/libdapl.so ri.1.1 " " " "


Any ideas on why I'm not able to get Open MPI to use uDAPL?
 



Re: [OMPI users] Pascal Interface for OpenMPI

2007-10-22 Thread Lourival Mendes
Thanks a lot for your comments Jeff..

I will try some of your advice, and I will let you know how it goes. In the
meantime we can at least try to convince the old school of MPI'ers to
include the Pascal interface... :)

Best regards

Lourival

2007/10/22, Jeff Squyres :
>
> On Oct 22, 2007, at 6:44 PM, Lourival Mendes wrote:
>
> >Hi everybody, I'm interested in using MPI in the Pascal
> > environment. I tried the MPICH2 list but had no success. On the Free
> > Pascal Compiler list, Daniël invited me to subscribe to this list and
> > open a discussion on a Pascal interface for Open MPI.
> >
> >As Daniël probably knows, there is almost no reference material on an
> > MPI interface for Pascal, only a very few attempts, one of them in
> > Russian.
> >
> >I would like to know if there is someone working on a Pascal
> > interface for Open MPI?
>
> There was a mail or two about it a while ago; you might want to dig
> through the OMPI list archives.  The short version is that none of
> the current Open MPI members have a desire to add Pascal bindings to
> MPI.  It also might be somewhat of an uphill battle to convince the
> old-school MPI'ers to include a Pascal interface in Open MPI, even if
> it was developed by a 3rd party and contributed to the project.
>
> However, that should not deter you from pursuing a Pascal interface
> if you want one.  Traditionally, extensions to MPI have been
> implemented in an MPI-neutral fashion and released into the wild as
> 3rd party libraries (such as the C++ bindings for MPI several years
> ago).  The Pascal bindings likely don't need to know anything about
> the internals of an MPI implementation -- they can just call the C
> bindings.  So it's possible/likely that you would write up a Pascal
> interface that would work with both Open MPI and MPICH (and any other
> MPI's out there).
>
> As I typed out the above, I have a dim recollection of the Pascal
> interface needing to obtain the values of the C constants during its
> setup/compilation phase (note that these values are going to be
> different between different MPI implementations).  You have a few
> options here; you could write a parser for mpi.h to extract the
> values you need (e.g., in perl or somesuch) or write a short C
> program to extract them and printf the values that you capture into a
> Pascal header file or something (I'm not sure if you need the literal
> or symbolic values -- I remember very little of Pascal).  Either way,
> with a little diligence and creativity, it could probably be done.
>
> >Also, a very newbie question: what is the main
> > difference between Open MPI and MPICH?
>
> Short version: we're different projects implementing the same API
> standard.
>
> --
> Jeff Squyres
> Cisco Systems
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>



-- 
Lourival J. Mendes Neto


Re: [OMPI users] Recursive use of "orterun"

2007-10-22 Thread Tim Prins
Hi Ides,

Thanks for the report and reminder. I have filed a ticket on this 
(https://svn.open-mpi.org/trac/ompi/ticket/1173) and you should receive email 
as it is updated.

I do not know of any more elegant way to work around this at the moment.

Thanks,

Tim

On Friday 19 October 2007 06:31:53 am idesbald van den bosch wrote:
> Hi,
>
> I've run into the same problem as discussed in the thread Lev Gelb: "Re:
> [OMPI users] Recursive use of "orterun" (Ralph H
> Castain)"
>
> I am running a parallel python code, then from python I launch a C++
> parallel program using the python os.system command, then I come back in
> python and keep going.
>
> With LAM/MPI there is no problem with this.
>
> But Open MPI systematically crashes, because the python os.system command
> launches the C++ program with the same OMPI_* environment variables as for
> the Python program. As discussed in the thread, I have tried filtering the
> OMPI_* variables prior to launching the C++ program with an
> os.execve command, but then it fails to return control to python and
> instead simply terminates when the C++ program ends.
>
> There is a workaround (
> http://thread.gmane.org/gmane.comp.clustering.open-mpi.user/986): create a
> *.sh file with the following lines:
>
> 
> for i in $(env | grep OMPI_MCA | sed 's/=/ /' | awk '{print $1}')
> do
>     unset $i
> done
>
> # now the C++ call
> mpirun -np 2  ./MoM/communicateMeshArrays
> --
>
> and then call the *.sh program through the python os.system command.
>
> What I would like to know is whether this "problem" will get fixed in
> Open MPI. Is there another, more elegant way to solve this issue? Meanwhile,
> I will stick to the ugly *.sh hack listed above.
>
> Cheers
>
> Ides
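
For reference, the same filtering can be sketched directly in Python, without the wrapper script. This is only an illustration of the idea in the quoted message and is untested against any particular Open MPI release; the function names are made up, and the command in the usage note is the one from the *.sh file above. Unlike os.execve, subprocess.call starts a child process and returns its exit code, so control comes back to the calling Python program.

```python
import os
import subprocess


def env_without_ompi(environ=None, prefix="OMPI_"):
    """Copy an environment, dropping variables that start with prefix."""
    source = os.environ if environ is None else environ
    return {key: value for key, value in source.items()
            if not key.startswith(prefix)}


def run_nested(cmd, environ=None):
    """Run cmd in a child process with the filtered environment.

    Returns the child's exit code; the caller keeps running afterwards.
    """
    return subprocess.call(cmd, env=env_without_ompi(environ))
```

Usage would look like run_nested(["mpirun", "-np", "2", "./MoM/communicateMeshArrays"]). The shell loop above filters only OMPI_MCA variables; pass prefix="OMPI_MCA" to env_without_ompi to match it exactly.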




Re: [OMPI users] Syntax error in remote rsh execution

2007-10-22 Thread Tim Prins
Sorry to reply to my own mail. 

Just browsing through the logs you sent, and I see that 'hostname' should be 
working fine. However, you are using v1.1.5 which is very old. I would 
strongly suggest upgrading to v1.2.4. It is a huge improvement over the old 
v1.1 series (which is not being maintained anymore).

Tim

On Monday 22 October 2007 08:41:30 pm Tim Prins wrote:
> Hi Jorge,
>
> This is interesting. The problem is the universe name:
> root@(none):default-universe
>
> The "(none)" part is supposed to be the hostname where mpirun is executed.
> Try running:
> hostname
>
> and:
> uname -n
>
> These should both return valid hostnames for your machine.
>
> Open MPI pretty much assumes that all nodes have a valid (preferably
> unique) hostname. If the above commands don't work, you probably need to
> fix your cluster.
>
> Let me know if this does not work.
>
> Thanks,
>
> Tim
>
> On Thursday 18 October 2007 09:22:09 pm Jorge Parra wrote:
> > Hi,
> >
> > When trying to execute an application that spawns to another node, I
> > obtain the following message:
> >
> > # ./mpirun --hostfile /root/hostfile -np 2 greetings
> > Syntax error: "(" unexpected (expecting ")")
> > --
> > Could not execute the executable
> > "/opt/OpenMPI/OpenMPI-1.1.5b/exec/bin/greetings": Exec format error
> >
> > This could mean that your PATH or executable name is wrong, or that you
> > do not have the necessary permissions.  Please ensure that the executable
> > is able to be found and executed.
> > --
> >
> > and in the remote node:
> >
> > # pam_rhosts_auth[183]: user root has a `+' user entry
> > pam_rhosts_auth[183]: allowed to root@192.168.1.102 as root
> > PAM_unix[183]: (rsh) session opened for user root by (uid=0)
> > in.rshd[184]: root@192.168.1.102 as root: cmd='( ! [ -e ./.profile ] || . ./.profile; orted --bootproxy 1 --name 0.0.1 --num_procs 3 --vpid_start 0 --nodename 192.168.1.103 --universe root@(none):default-universe --nsreplica "0.0.0;tcp://192.168.1.102:32774" --gprreplica "0.0.0;tcp://192.168.1.102:32774" --mpi-call-yield 0 )'
> > PAM_unix[183]: (rsh) session closed for user root
> >
> > I suspect the command that rsh is trying to execute on the remote node
> > fails. It seems to me that the first parenthesis in cmd='( ! is not
> > interpreted correctly, thus causing the syntax error. This might prevent
> > .profile from running and correctly setting PATH. Therefore, "greetings"
> > is not found.
> >
> > I am attaching to this email the appropriate configuration files of my
> > system and openmpi on it. This is a system in an isolated network, so I
> > don't care too much for security. Therefore I am using rsh on it.
> >
> > I would really appreciate any suggestions to correct this problem.
> >
> > Thank you,
> >
> > Jorge
>




Re: [OMPI users] Syntax error in remote rsh execution

2007-10-22 Thread Tim Prins
Hi Jorge,

This is interesting. The problem is the universe name:
root@(none):default-universe

The "(none)" part is supposed to be the hostname where mpirun is executed. Try 
running:
hostname

and:
uname -n

These should both return valid hostnames for your machine.

Open MPI pretty much assumes that all nodes have a valid (preferably unique) 
hostname. If the above commands don't work, you probably need to fix your 
cluster.
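
The two checks above can also be scripted, for example to run on every node. The sketch below is an illustration only: the function names are made up, and it assumes a Unix-like system where an unset hostname shows up as an empty string or the literal "(none)" seen in the universe name.

```python
import os
import socket


def looks_like_valid_hostname(name):
    """Reject empty names and the "(none)" placeholder."""
    return name.strip() not in {"", "(none)"}


def report_hostnames():
    """Collect what `hostname` and `uname -n` would report for this node."""
    names = {
        "hostname": socket.gethostname(),
        "uname -n": os.uname().nodename,
    }
    return {source: (name, looks_like_valid_hostname(name))
            for source, name in names.items()}


if __name__ == "__main__":
    for source, (name, ok) in report_hostnames().items():
        print("%s: %r %s" % (source, name, "ok" if ok else "INVALID"))
```

It does not check uniqueness across nodes; for that, compare the output collected from each machine.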

Let me know if this does not work.

Thanks,

Tim

On Thursday 18 October 2007 09:22:09 pm Jorge Parra wrote:
> Hi,
>
> When trying to execute an application that spawns to another node, I
> obtain the following message:
>
> # ./mpirun --hostfile /root/hostfile -np 2 greetings
> Syntax error: "(" unexpected (expecting ")")
> --
> Could not execute the executable
> "/opt/OpenMPI/OpenMPI-1.1.5b/exec/bin/greetings": Exec format error
>
> This could mean that your PATH or executable name is wrong, or that you do
> not have the necessary permissions.  Please ensure that the executable is
> able to be found and executed.
> --
>
> and in the remote node:
>
> # pam_rhosts_auth[183]: user root has a `+' user entry
> pam_rhosts_auth[183]: allowed to root@192.168.1.102 as root
> PAM_unix[183]: (rsh) session opened for user root by (uid=0)
> in.rshd[184]: root@192.168.1.102 as root: cmd='( ! [ -e ./.profile ] || . ./.profile; orted --bootproxy 1 --name 0.0.1 --num_procs 3 --vpid_start 0 --nodename 192.168.1.103 --universe root@(none):default-universe --nsreplica "0.0.0;tcp://192.168.1.102:32774" --gprreplica "0.0.0;tcp://192.168.1.102:32774" --mpi-call-yield 0 )'
> PAM_unix[183]: (rsh) session closed for user root
>
> I suspect the command that rsh is trying to execute on the remote node
> fails. It seems to me that the first parenthesis in cmd='( ! is not
> interpreted correctly, thus causing the syntax error. This might prevent
> .profile from running and correctly setting PATH. Therefore, "greetings" is
> not found.
>
> I am attaching to this email the appropriate configuration files of my
> system and openmpi on it. This is a system in an isolated network, so I
> don't care too much for security. Therefore I am using rsh on it.
>
> I would really appreciate any suggestions to correct this problem.
>
> Thank you,
>
> Jorge




Re: [OMPI users] Pascal Interface for OpenMPI

2007-10-22 Thread Jeff Squyres

On Oct 22, 2007, at 6:44 PM, Lourival Mendes wrote:


   Hi everybody, I'm interested in using MPI in the Pascal
environment. I tried the MPICH2 list but had no success. On the Free
Pascal Compiler list, Daniël invited me to subscribe to this list and
open a discussion on a Pascal interface for Open MPI.

   As Daniël probably knows, there is almost no reference material on an
MPI interface for Pascal, only a very few attempts, one of them in
Russian.

   I would like to know if there is someone working on a Pascal
interface for Open MPI?


There was a mail or two about it a while ago; you might want to dig  
through the OMPI list archives.  The short version is that none of  
the current Open MPI members have a desire to add Pascal bindings to  
MPI.  It also might be somewhat of an uphill battle to convince the  
old-school MPI'ers to include a Pascal interface in Open MPI, even if  
it was developed by a 3rd party and contributed to the project.


However, that should not deter you from pursuing a Pascal interface  
if you want one.  Traditionally, extensions to MPI have been  
implemented in an MPI-neutral fashion and released into the wild as  
3rd party libraries (such as the C++ bindings for MPI several years  
ago).  The Pascal bindings likely don't need to know anything about  
the internals of an MPI implementation -- they can just call the C  
bindings.  So it's possible/likely that you would write up a Pascal  
interface that would work with both Open MPI and MPICH (and any other  
MPI's out there).


As I typed out the above, I have a dim recollection of the Pascal  
interface needing to obtain the values of the C constants during its  
setup/compilation phase (note that these values are going to be  
different between different MPI implementations).  You have a few  
options here; you could write a parser for mpi.h to extract the  
values you need (e.g., in perl or somesuch) or write a short C  
program to extract them and printf the values that you capture into a  
Pascal header file or something (I'm not sure if you need the literal  
or symbolic values -- I remember very little of Pascal).  Either way,  
with a little diligence and creativity, it could probably be done.
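
A minimal sketch of the parser option follows, written in Python rather than perl. It is an illustration only: the function names are made up, and it picks up only MPI_* constants that are plain integer #defines. Constants defined some other way (for example handles that are pointers or casts in a given implementation) are skipped and would need the short-C-program approach instead.

```python
import re

# Matches lines such as "#define MPI_SUCCESS 0", with an optional
# trailing /* comment */. Only decimal and hex integer literals are accepted.
DEFINE_RE = re.compile(
    r"^\s*#\s*define\s+(MPI_[A-Z0-9_]+)\s+"
    r"(-?(?:0[xX][0-9a-fA-F]+|[1-9]\d*|0))\s*(?:/\*.*)?$"
)


def extract_int_constants(header_text):
    """Return {name: value} for integer MPI_* #defines in header_text."""
    constants = {}
    for line in header_text.splitlines():
        match = DEFINE_RE.match(line)
        if match:
            # Base 0 lets int() handle both decimal and 0x-prefixed literals.
            constants[match.group(1)] = int(match.group(2), 0)
    return constants


def to_pascal_consts(constants):
    """Format the constants as a Pascal const section, sorted by name."""
    lines = ["const"]
    for name, value in sorted(constants.items()):
        lines.append("  %s = %d;" % (name, value))
    return "\n".join(lines)
```

Typical use would be to read the mpi.h of the installed implementation, pass its text to extract_int_constants, and write the result of to_pascal_consts into an include file. The values then have to be regenerated for each MPI implementation, since they differ between them.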



   Also, a very newbie question: what is the main
difference between Open MPI and MPICH?


Short version: we're different projects implementing the same API  
standard.


--
Jeff Squyres
Cisco Systems




[OMPI users] Pascal Interface for OpenMPI

2007-10-22 Thread Lourival Mendes
   Hi everybody, I'm interested in using MPI in the Pascal
environment. I tried the MPICH2 list but had no success. On the Free
Pascal Compiler list, Daniël invited me to subscribe to this list and
open a discussion on a Pascal interface for Open MPI.

   As Daniël probably knows, there is almost no reference material on an
MPI interface for Pascal, only a very few attempts, one of them in
Russian.

   I would like to know if there is someone working on a Pascal
interface for Open MPI?

   Also, a very newbie question: what is the main
difference between Open MPI and MPICH?

Thanks in advance..

Lourival Mendes



Re: [OMPI users] OMPI & uDAPL

2007-10-22 Thread Troy Telford
On Monday 22 October 2007, Troy Telford wrote:
> WARNING: Failed to open "ib0"

Whoops; I typed in the wrong text here.  The failure was 'Failed to 
open "InfiniHost0"', i.e. the name listed in the warning matches the name 
in /etc/dat.conf.

-- 
Troy Telford


[OMPI users] OMPI & uDAPL

2007-10-22 Thread Troy Telford
OK, I've got a system set up so that it can use uDAPL over IB (! OFED, ! 
Mellanox, though) on Linux.

Running simple dapl test programs (shamelessly pulled from the OFED tree) 
seems to verify that DAPL is in fact operating properly.

After searching through the mail archives, I found a small test code by Donald 
Kerr (dat_reg.c), and compiled and ran that successfully.  When run, it 
returns the DAT name (ib0).

I've also been able to run programs using uDAPL with Intel MPI, for example.  
I'm fairly sure uDAPL is working.

However, when I attempt to run an MPI program over uDAPL (--mca btl 
udapl,sm,self), I receive the following error:

WARNING: Failed to open "ib0" 
[DAT_PROVIDER_NOT_FOUND:DAT_NAME_NOT_REGISTERED].
This may be a real error or it may be an invalid entry in the uDAPL
Registry which is contained in the dat.conf file. Contact your local
System Administrator to confirm the availability of the interfaces in
the dat.conf file.

[0,1,0]: uDAPL on host n02 was unable to find any NICs.


I've also tried using --mca btl_udapl_if_include ib0, but that doesn't seem to 
have any effect.

Interestingly enough, when I don't specify a DAT provider, and I play with the 
name in /etc/dat.conf, Open MPI seems aware of the name change; it will 
list 'failed to open "newname"'


my /etc/dat.conf looks like this:
InfiniHost0 u1.1 nonthreadsafe default /usr/lib64/libdapl.so ri.1.1 " " " "


Any ideas on why I'm not able to get Open MPI to use uDAPL?
-- 
Troy Telford


Re: [OMPI users] SLURM vs. Torque? [OT]

2007-10-22 Thread Peter Kjellstrom
On Monday 22 October 2007, Bill Johnstone wrote:
> Hello All.
>
> We are starting to need resource/scheduling management for our small
> cluster, and I was wondering if any of you could provide comments on
> what you think about Torque vs. SLURM?  On the basis of the appearance
> of active development as well as the documentation, SLURM seems to be
> superior, but can anyone shed light on how they compare in use?

I won't attempt a full analysis but here are two small (random) crumbs of 
information.

1) Slurm keeps the name of stuff separate from the contact address 
(ControlMachine=hostname, ControlAddr=IP/whatever). This alone wins my heart 
any day of the week.

2) The scheduler can be a weak point for Slurm. If you can live with the 
built-in trivial one then great. If you need more and happen to find something 
that is free and works (or write one yourself) then let me know ;-)

/Peter




Re: [OMPI users] SLURM vs. Torque?

2007-10-22 Thread Jeff Squyres
IMNSHO: SLURM, Torque, and N1GE are all fine products.  They work  
well in production environments, both small and large.  They all have  
default trivial FIFO schedulers but can also be used with more  
complex schedulers (e.g., Maui/Moab).


FWIW: I tend to like SLURM simply out of personal preference; that's  
what I use on my Cisco development and testing clusters.  I'll echo  
Jeff P's sentiments that the SLURM developers are very responsive to  
questions and fixing problems.



On Oct 22, 2007, at 2:24 PM, Jeff Pummill wrote:

SLURM was really easy to build and install, plus it's a project of  
LLNL and I love stuff that the Nat'l Labs architect.


The SLURM message board is also very active and quick to respond to  
questions and problems.



Jeff F. Pummill



Bill Johnstone wrote:
Hello All.

We are starting to need resource/scheduling management for our small
cluster, and I was wondering if any of you could provide comments on
what you think about Torque vs. SLURM?  On the basis of the appearance
of active development as well as the documentation, SLURM seems to be
superior, but can anyone shed light on how they compare in use?

I realize the truth in the stock answer of "it depends on what you
need/want," but as of yet we are not experienced enough with this kind
of thing to have a set of firm requirements.  At this point, we can
probably adapt our workflow/usage a little bit to accommodate the way
the resource manager works.  And of course we'll be using OpenMPI with
whatever resource manager we go with.

Anyway, enough from me -- I'm looking to hear others' experiences and
viewpoints.

Thanks for any input!




--
Jeff Squyres
Cisco Systems



Re: [OMPI users] SLURM vs. Torque?

2007-10-22 Thread Jeff Pummill
SLURM was really easy to build and install, plus it's a project of LLNL 
and I love stuff that the Nat'l Labs architect.


The SLURM message board is also very active and quick to respond to 
questions and problems.



Jeff F. Pummill



Bill Johnstone wrote:

Hello All.

We are starting to need resource/scheduling management for our small
cluster, and I was wondering if any of you could provide comments on
what you think about Torque vs. SLURM?  On the basis of the appearance
of active development as well as the documentation, SLURM seems to be
superior, but can anyone shed light on how they compare in use?

I realize the truth in the stock answer of "it depends on what you
need/want," but as of yet we are not experienced enough with this kind
of thing to have a set of firm requirements.  At this point, we can
probably adapt our workflow/usage a little bit to accommodate the way
the resource manager works.  And of course we'll be using OpenMPI with
whatever resource manager we go with.

Anyway, enough from me -- I'm looking to hear others' experiences and
viewpoints.

Thanks for any input!

  


Re: [OMPI users] SLURM vs. Torque?

2007-10-22 Thread McCalla, Mac
Hi Bill,

Have you taken a peek at http://gridengine.sunsource.net ? 

 Regards,

Mac McCalla

-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
Behalf Of Bill Johnstone
Sent: Monday, October 22, 2007 12:29 PM
To: us...@open-mpi.org
Subject: [OMPI users] SLURM vs. Torque?

Hello All.

We are starting to need resource/scheduling management for our small
cluster, and I was wondering if any of you could provide comments on
what you think about Torque vs. SLURM?  On the basis of the appearance
of active development as well as the documentation, SLURM seems to be
superior, but can anyone shed light on how they compare in use?

I realize the truth in the stock answer of "it depends on what you
need/want," but as of yet we are not experienced enough with this kind
of thing to have a set of firm requirements.  At this point, we can
probably adapt our workflow/usage a little bit to accommodate the way the
resource manager works.  And of course we'll be using OpenMPI with
whatever resource manager we go with.

Anyway, enough from me -- I'm looking to hear others' experiences and
viewpoints.

Thanks for any input!




[OMPI users] SLURM vs. Torque?

2007-10-22 Thread Bill Johnstone
Hello All.

We are starting to need resource/scheduling management for our small
cluster, and I was wondering if any of you could provide comments on
what you think about Torque vs. SLURM?  On the basis of the appearance
of active development as well as the documentation, SLURM seems to be
superior, but can anyone shed light on how they compare in use?

I realize the truth in the stock answer of "it depends on what you
need/want," but as of yet we are not experienced enough with this kind
of thing to have a set of firm requirements.  At this point, we can
probably adapt our workflow/usage a little bit to accommodate the way
the resource manager works.  And of course we'll be using OpenMPI with
whatever resource manager we go with.

Anyway, enough from me -- I'm looking to hear others' experiences and
viewpoints.

Thanks for any input!



[OMPI users] xcode and ompi

2007-10-22 Thread Tony Sheh

Hi all,

I'm working in Xcode and I'm trying to build an application that  
links against the OMPI libraries. So far I've included the following  
files in the build:


libmpi.dylib
libopen-pal.dylib
libopen-rte.dylib

and the errors I get are

Undefined symbols:
 all the MPI functions you can think of..


as well as a warning: "suggest use of -bind_at_load, as lazy binding  
may result in errors or different symbols being used"


I've compiled and linked to the static libraries (using ./configure  
--enable-static) and I get the same errors. Also, I previously had the  
latest version of LAM/MPI installed. I didn't uninstall it since I  
lost the original directory as well as the make and configure  
settings. If that is the conflict, then any information about how to  
resolve it would be good.


Thanks!
Tony