Re: [OMPI users] quadrics support?

2009-07-08 Thread Ashley Pittman
On Wed, 2009-07-08 at 15:43 -0400, Michael Di Domenico wrote:
> On Wed, Jul 8, 2009 at 3:33 PM, Ashley Pittman wrote:
> >> When i run tping i get:
> >> ELAN_EXCEOPTIOn @ --: 6 (Initialization error)
> >> elan_init: Can't get capability from environment
> >>
> >> I am not using slurm or RMS at all, just trying to get openmpi to run
> >> between two nodes.
> >
> > To attach to the Elan, a process has to have a "capability", which is a
> > kernel attribute describing the size (number of nodes/ranks) of the job;
> > without this you'll get errors like the one from tping.  The only way to
> > generate these capabilities is by using RMS, Slurm or (I believe) pdsh,
> > which can generate one and push it into the kernel before calling fork()
> > to create the user application.
> 
> I didn't realize it was an MPI-type program, so I ran it using the
> QSNet version of mpirun and OpenMPI.  The process does start and runs
> through 0: and 2:, which I assume are packet sizes, but freezes at
> that point.
> 
> We have an existing XC cluster from HP that we're trying to convert
> from XC to standard RHEL5.3 w/ Slurm and OpenMPI.  All I want to be
> able to do is load RHEL5 and the Quadrics NIC drivers, and run OpenMPI
> jobs between these two nodes I yanked from the cluster before we
> switch the whole thing over.

My advice would be to get OpenMPI working on the (presumably functional)
XC cluster first and then migrate from there to RHEL5.3.  I don't recall
Slurm being hard to get working, but problems will be a lot easier to
diagnose if you get OpenMPI and the resource manager working separately
before putting them together.

Ashley,

-- 

Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk



Re: [OMPI users] quadrics support?

2009-07-08 Thread Michael Di Domenico
On Wed, Jul 8, 2009 at 3:33 PM, Ashley Pittman wrote:
>> When i run tping i get:
>> ELAN_EXCEOPTIOn @ --: 6 (Initialization error)
>> elan_init: Can't get capability from environment
>>
>> I am not using slurm or RMS at all, just trying to get openmpi to run
>> between two nodes.
>
> To attach to the Elan, a process has to have a "capability", which is a
> kernel attribute describing the size (number of nodes/ranks) of the job;
> without this you'll get errors like the one from tping.  The only way to
> generate these capabilities is by using RMS, Slurm or (I believe) pdsh,
> which can generate one and push it into the kernel before calling fork()
> to create the user application.

I didn't realize it was an MPI-type program, so I ran it using the
QSNet version of mpirun and OpenMPI.  The process does start and runs
through 0: and 2:, which I assume are packet sizes, but freezes at
that point.

We have an existing XC cluster from HP that we're trying to convert
from XC to standard RHEL5.3 w/ Slurm and OpenMPI.  All I want to be
able to do is load RHEL5 and the Quadrics NIC drivers, and run OpenMPI
jobs between these two nodes I yanked from the cluster before we
switch the whole thing over.


Re: [OMPI users] quadrics support?

2009-07-08 Thread Ashley Pittman
On Wed, 2009-07-08 at 15:09 -0400, Michael Di Domenico wrote:
> On Wed, Jul 8, 2009 at 12:33 PM, Ashley Pittman wrote:
> > Is the machine configured correctly to allow non-OpenMPI QsNet programs
> > to run, for example tping?
> >
> > Which resource manager are you running?  I think Slurm compiled for RMS
> > is essential.
> 
> I can ping via TCP/IP using the eip0 ports.
> 
> When i run tping i get:
> ELAN_EXCEOPTIOn @ --: 6 (Initialization error)
> elan_init: Can't get capability from environment
> 
> I am not using slurm or RMS at all, just trying to get openmpi to run
> between two nodes.

To attach to the Elan, a process has to have a "capability", which is a
kernel attribute describing the size (number of nodes/ranks) of the job;
without this you'll get errors like the one from tping.  The only way to
generate these capabilities is by using RMS, Slurm or (I believe) pdsh,
which can generate one and push it into the kernel before calling fork()
to create the user application.

Ashley,




Re: [OMPI users] quadrics support?

2009-07-08 Thread Michael Di Domenico
On Wed, Jul 8, 2009 at 12:33 PM, Ashley Pittman wrote:
> Is the machine configured correctly to allow non-OpenMPI QsNet programs
> to run, for example tping?
>
> Which resource manager are you running?  I think Slurm compiled for RMS
> is essential.

I can ping via TCP/IP using the eip0 ports.

When i run tping i get:
ELAN_EXCEOPTIOn @ --: 6 (Initialization error)
elan_init: Can't get capability from environment

I am not using slurm or RMS at all, just trying to get openmpi to run
between two nodes.

Using -mca btl self,tcp -mca btl_tcp_if_include eip0, I can run the
jobs with no problem using sockets over the Elan interface, but if I run
the job with -mca btl self,elan,tcp, I get the following (short snippet
of the output):

Signal: Segmentation fault (11)
Signal code: Invalid permissions (2)
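The two runs above differ only in their MCA arguments. A minimal sketch that keeps both argument sets side by side, so the working TCP-over-eip0 run and the failing elan run can be swapped with one word; btl_args is a hypothetical helper name, and node1, node2 and ./hello are placeholders.

```shell
#!/bin/sh
# Print the mpirun MCA arguments for the two configurations described above.
#   tcp  : sockets only, restricted to the eip0 (IP-over-Elan) interface
#   elan : the native elan BTL together with self and tcp
btl_args() {
    case "$1" in
        tcp)  printf '%s\n' "--mca btl self,tcp --mca btl_tcp_if_include eip0" ;;
        elan) printf '%s\n' "--mca btl self,elan,tcp" ;;
        *)    printf 'usage: btl_args tcp|elan\n' >&2; return 1 ;;
    esac
}

# Example (node1, node2 and ./hello are placeholders). The $(...) is left
# unquoted on purpose so the shell splits it into separate mpirun arguments:
#   mpirun -np 2 -host node1,node2 $(btl_args tcp) ./hello
```

Keeping the TCP variant at hand gives a known-good baseline while the elan BTL failure is being chased down.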


Re: [OMPI users] quadrics support?

2009-07-08 Thread Ashley Pittman
On Tue, 2009-07-07 at 17:18 -0400, Michael Di Domenico wrote:
> So, first run, I seem to have run into a bit of an issue.  All the
> Quadrics modules are compiled and loaded.  I can ping between nodes
> over the Quadrics interfaces.  But when I try to run one of the hello
> MPI examples from OpenMPI, I get:
>
> First run: the process hung, so I killed it with Ctrl-C,
> though it doesn't seem to actually die, and kill -9 doesn't work.
>
> Second run: the process fails with
>   failed elan4_attach  Device or resource busy
>
>   elan_allocSleepDesc  Failed to allocate IRQ cookie 2a: 22 Invalid argument
> All subsequent runs fail the same way, and I have to reboot the box to
> get the processes to go away.
>
> I'm not sure if this is a Quadrics or OpenMPI issue at this point, but
> I figured since there are Quadrics people on the list it's a good place
> to start.

Is the machine configured correctly to allow non-OpenMPI QsNet programs
to run, for example tping?

Which resource manager are you running?  I think Slurm compiled for RMS
is essential.

Ashley,




Re: [OMPI users] quadrics support?

2009-07-08 Thread Ashley Pittman
On Tue, 2009-07-07 at 15:30 -0400, Michael Di Domenico wrote:
> Does OpenMPI/Quadrics require the Quadrics kernel patches in order to
> operate, or to operate at full speed?  Or are the Quadrics modules
> sufficient?

In theory you can run without them, although you'll find it easier, and
the code faster, if you run with the patches.

Ashley,




Re: [OMPI users] quadrics support?

2009-07-07 Thread Michael Di Domenico
So, first run, I seem to have run into a bit of an issue.  All the
Quadrics modules are compiled and loaded.  I can ping between nodes
over the Quadrics interfaces.  But when I try to run one of the hello
MPI examples from OpenMPI, I get:

First run: the process hung, so I killed it with Ctrl-C,
though it doesn't seem to actually die, and kill -9 doesn't work.

Second run: the process fails with
  failed elan4_attach  Device or resource busy

  elan_allocSleepDesc  Failed to allocate IRQ cookie 2a: 22 Invalid argument
All subsequent runs fail the same way, and I have to reboot the box to
get the processes to go away.

I'm not sure if this is a Quadrics or OpenMPI issue at this point, but
I figured since there are Quadrics people on the list it's a good place
to start.



Re: [OMPI users] quadrics support?

2009-07-07 Thread Michael Di Domenico
Does OpenMPI/Quadrics require the Quadrics kernel patches in order to
operate, or to operate at full speed?  Or are the Quadrics modules
sufficient?

On Thu, Jul 2, 2009 at 1:52 PM, Ashley Pittman wrote:
> On Thu, 2009-07-02 at 09:34 -0400, Michael Di Domenico wrote:
>> Jeff,
>>
>> Okay, thanks.  I'll give it a shot and report back.  I can't
>> contribute any code, but I can certainly do testing...
>
> I'm from the Quadrics stable, so I could certainly support a port should
> you require it, but I don't currently have access to hardware either.
>
> Ashley,


Re: [OMPI users] quadrics support?

2009-07-02 Thread Ashley Pittman
On Thu, 2009-07-02 at 09:34 -0400, Michael Di Domenico wrote:
> Jeff,
> 
> Okay, thanks.  I'll give it a shot and report back.  I can't
> contribute any code, but I can certainly do testing...

I'm from the Quadrics stable, so I could certainly support a port should
you require it, but I don't currently have access to hardware either.

Ashley,




Re: [OMPI users] quadrics support?

2009-07-02 Thread Michael Di Domenico
Jeff,

Okay, thanks.  I'll give it a shot and report back.  I can't
contribute any code, but I can certainly do testing...



Re: [OMPI users] quadrics support?

2009-07-02 Thread Jeff Squyres
I see ompi/mca/btl/elan in the OMPI SVN development trunk and in the  
1.3 tree (where elan = the quadrics interface).


So actually, looking at the 1.3.x README, I see configure switches
like "--with-elan" that specify where the Elan (Quadrics) headers
and libraries live.  I have no Quadrics networks and didn't pay
attention to this development at all (obviously ;-) ) -- you might  
want to give it a shot and see how well it performs.  Meaning: I'm  
sure it works or UT wouldn't have pushed this stuff upstream, but I  
have no idea how well tuned it is.


If you build OMPI properly, you should be able to tell if Quadrics  
support is included via


ompi_info | grep elan

You should see a BTL line for elan (i.e., a BTL plugin for "elan" is  
installed and functional).  Although OMPI should automatically pick  
elan for MPI communications, you can force OMPI to pick it via:


mpirun --mca btl elan,self ...
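The ompi_info check above can be scripted so that elan is only forced when it was actually built. A minimal sketch, assuming ompi_info prints one "MCA btl: <name> (...)" line per installed BTL component (its usual layout); list_btls and has_btl are hypothetical helper names, not part of Open MPI, and ./hello is a placeholder.

```shell
#!/bin/sh
# Print the BTL component names found in ompi_info output, one per line.
# Reading from stdin keeps the function testable without a cluster; on a
# node with Open MPI installed, run:  ompi_info | list_btls
list_btls() {
    # Assumed line layout (version fields vary):
    #   MCA btl: elan (MCA v2.0, API v2.0, Component v1.3.3)
    # so the component name is the third whitespace-separated field.
    awk '$1 == "MCA" && $2 == "btl:" { print $3 }'
}

# Exit with status 0 only if the named BTL appears in the ompi_info output.
has_btl() {
    list_btls | grep -qx "$1"
}

# Example: force the elan BTL only when it is present.
#   if ompi_info | has_btl elan; then
#       mpirun --mca btl elan,self ./hello
#   fi
```

Matching the whole name with grep -x avoids false positives from components whose names merely contain "elan".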

Quadrics networks should also qualify for Open MPI's "other" type of  
network support (the MTL, instead of the BTL).  MTL-level support can
typically give slightly better performance on some types of networks,  
but it doesn't look like anyone did any work in this area.   
Contributions are always welcome, of course!  :-)




--
Jeff Squyres
Cisco Systems



Re: [OMPI users] quadrics support?

2009-07-02 Thread Michael Di Domenico
Jeff,

Thanks.  Honestly, though, if the patches haven't been pulled into
mainline, we are not likely to bring them in internally.  I was hoping
that Quadrics support was in mainline and the documentation was just
out of date.

On Thu, Jul 2, 2009 at 8:08 AM, Jeff Squyres wrote:
> George --
>
> I know that U. Tennessee did some work in this area; did it ever
> materialize?
>
>
> On Jul 1, 2009, at 4:49 PM, Michael Di Domenico wrote:
>
>> Did the Quadrics support for OpenMPI ever materialize?  I can't find
>> any documentation on the web about it, and the few mailing list
>> messages I came across showed some hints that it might be in progress,
>> but that was way back in 2007.
>>
>> Thanks
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>
>
> --
> Jeff Squyres
> Cisco Systems
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>


[OMPI users] quadrics support?

2009-07-01 Thread Michael Di Domenico
Did the Quadrics support for OpenMPI ever materialize?  I can't find
any documentation on the web about it, and the few mailing list
messages I came across showed some hints that it might be in progress,
but that was way back in 2007.

Thanks