Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-29 Thread Roland Dreier
 > Is it possible that the /sys/class/infiniband directory exists and is
 > empty? In which cases?

Do "modprobe ib_core" on a system with no hardware drivers loaded (or with no
RDMA hardware installed).
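To make the three possible states concrete, here is a minimal Python sketch (illustrative only, not Open MPI or libibverbs code). It classifies /sys/class/infiniband as absent, present but empty (as after "modprobe ib_core" with no hardware driver), or populated. The helper name, the state names, and the `sysfs_root` parameter are assumptions; the parameter exists only so the check can be pointed at a test directory.

```python
import os
from enum import Enum


class IbSysfsState(Enum):
    ABSENT = "absent"        # ib_core not loaded: the class directory does not exist
    EMPTY = "empty"          # ib_core loaded, but no hardware driver registered a device
    POPULATED = "populated"  # at least one RDMA device is registered (e.g. mlx4_0)


def ib_sysfs_state(sysfs_root="/sys"):
    """Classify <sysfs_root>/class/infiniband (hypothetical helper)."""
    class_dir = os.path.join(sysfs_root, "class", "infiniband")
    if not os.path.isdir(class_dir):
        return IbSysfsState.ABSENT
    if not os.listdir(class_dir):
        return IbSysfsState.EMPTY
    return IbSysfsState.POPULATED
```

On a live system, `ib_sysfs_state()` with the default root inspects the real /sys.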


Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-29 Thread Jeff Squyres

On May 29, 2008, at 3:27 AM, Pavel Shamis (Pasha) wrote:

> > I got some more feedback from Roland off-list explaining that if
> > /sys/class/infiniband does exist and is non-empty and
> > /sys/class/infiniband_verbs/abi_version does not exist, then this is
> > definitely a case where we want to warn because it implies that config
> > is screwed up -- RDMA devices are present but not usable.
>
>
> Is it possible that the /sys/class/infiniband directory exists and is
> empty? In which cases?


Roland consistently said "...and not empty" in e-mails to me, so  
that's what I assumed.


However, Pasha just did a test: on a machine with a ConnectX HCA, he
manually removed the mlx4 driver and started the openibd service.
/sys/class/infiniband was created, but it was empty.


I guess this is a situation that we want to warn about -- we can  
simplify the whole deal by making the overriding assumption: if the  
drivers are loaded at all (such that /sys/class/infiniband/ exists at  
all), OMPI should expect to be able to find some RDMA devices.  If it  
doesn't find any, it should issue a warning.


--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-29 Thread Pavel Shamis (Pasha)


> I got some more feedback from Roland off-list explaining that if
> /sys/class/infiniband does exist and is non-empty and
> /sys/class/infiniband_verbs/abi_version does not exist, then this is
> definitely a case where we want to warn because it implies that config
> is screwed up -- RDMA devices are present but not usable.

Is it possible that the /sys/class/infiniband directory exists and is
empty? In which cases?


Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-28 Thread Jeff Squyres

On May 28, 2008, at 8:02 AM, Jeff Squyres wrote:


> Note that the two /sys checks may be redundant; I'm not entirely sure
> how the two files relate to each other.  libibverbs will complain
> about the first if it is not present; the second is used to indicate
> that the kernel drivers are loaded.


I got some more feedback from Roland off-list explaining that if
/sys/class/infiniband does exist and is non-empty and
/sys/class/infiniband_verbs/abi_version does not exist, then this is
definitely a case where we want to warn because it implies that config
is screwed up -- RDMA devices are present but not usable.


In this case, I think the warning that libibverbs itself prints is  
suitable ("Fatal: couldn't read...").  So let's just eliminate that  
check in OMPI and go with something like the following (pretty much  
exactly what was proposed a while ago by Pasha :-) ):


  $sysfsdir = ibv_get_sysfs_path();

  # If sysfs/class/infiniband does not exist, the driver was not
  # started.  Therefore: assume that the user does not want RDMA
  # hardware support -- do *not* print a warning message.
  if (! -d "$sysfsdir/class/infiniband") {
      if ($always_want_to_see_warnings) {
          print "Warning: $sysfsdir/class/infiniband does not exist\n";
      }
      return SKIP_THIS_BTL;
  }

  # If we get to this point, the drivers are loaded and therefore we
  # will assume that there is supposed to be at least one RDMA device
  # present.  Warn if we don't find any.
  $list = ibv_get_device_list();
  if (empty($list)) {
      print "Warning: couldn't find any RDMA devices -- if you have " .
            "no RDMA devices, stop the driver to avoid this warning message\n";
      return SKIP_THIS_BTL;
  }

  # ...continue with initialization; warnings and errors are
  # *always* displayed after this point
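The pseudocode above maps fairly directly onto a small runnable model. In this Python sketch, `get_device_list` is an injected callable standing in for ibv_get_device_list(), and `openib_precheck`, `SKIP_THIS_BTL`, and `CONTINUE` are hypothetical names. It models the proposed decision logic only; it is not the openib BTL's actual code.

```python
import os

SKIP_THIS_BTL = "skip"
CONTINUE = "continue"


def openib_precheck(sysfs_root, get_device_list, verbose=False, log=print):
    """Model of the proposed openib pre-check (hypothetical helper).

    get_device_list stands in for ibv_get_device_list() and must return a
    list of devices (empty if none were found).
    """
    class_dir = os.path.join(sysfs_root, "class", "infiniband")
    # Driver not started: assume the user does not want RDMA support, so
    # stay silent unless verbose warnings were requested.
    if not os.path.isdir(class_dir):
        if verbose:
            log(f"Warning: {class_dir} does not exist")
        return SKIP_THIS_BTL
    # Drivers are loaded, so at least one RDMA device is expected.
    if not get_device_list():
        log("Warning: couldn't find any RDMA devices -- if you have no RDMA "
            "devices, stop the driver to avoid this warning message")
        return SKIP_THIS_BTL
    # Past this point, warnings and errors are always displayed.
    return CONTINUE
```

Injecting the device-list function keeps the decision testable on machines that have neither RDMA hardware nor libibverbs installed.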

--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-28 Thread Jeff Squyres
Ok.  With lots more off-list discussion, how's this pseudocode for a  
proposal:


  # Main assumption: if the kernel drivers are loaded, the user wants
  # RDMA hardware support in OMPI.

  $sysfsdir = ibv_get_sysfs_path();
  # Avoid printing "Fatal: couldn't read uverbs ABI version" message.
  if (! -r "$sysfsdir/class/infiniband_verbs/abi_version") {
      if ($always_want_to_see_warnings) {
          print "Warning: verbs ABI version unreadable\n";
      }
      return SKIP_THIS_BTL;
  }

  # If sysfs/class/infiniband does not exist, the driver was not
  # started.  Therefore: assume that the user does not want RDMA
  # hardware support -- do *not* print a warning message.
  if (! -d "$sysfsdir/class/infiniband") {
      if ($always_want_to_see_warnings) {
          print "Warning: $sysfsdir/class/infiniband does not exist\n";
      }
      return SKIP_THIS_BTL;
  }

  # If we get to this point, the drivers are loaded and therefore we
  # will assume that there is supposed to be at least one RDMA device
  # present.  Warn if we don't find any.
  $list = ibv_get_device_list();
  if (empty($list)) {
      print "Warning: couldn't find any RDMA devices -- if you have " .
            "no RDMA devices, stop the driver to avoid this warning message\n";
      return SKIP_THIS_BTL;
  }

  # ...continue with initialization; warnings and errors are
  # *always* displayed after this point

An overriding assumption here is that if the user requested *only* the  
openib BTL in OMPI and it fails to find any devices, OMPI will always  
print an error that it was unable to reach remote MPI peers  
(regardless of whether the default warning was previously printed or  
not).
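One way to read that assumption: the "unreachable peers" error depends only on whether any requested BTL ended up usable, never on whether the earlier warning was shown. A hypothetical Python model (the function name and message strings are invented for illustration, and on-node BTLs such as self and sm are ignored):

```python
def startup_messages(requested_btls, usable_btls, default_warning_printed):
    """Return the messages a user would see (hypothetical model).

    requested_btls: BTLs the user asked for (e.g. {"openib"}).
    usable_btls: BTLs that actually initialized.
    default_warning_printed: whether the "no RDMA devices" warning fired.
    """
    messages = []
    if default_warning_printed:
        messages.append("Warning: couldn't find any RDMA devices")
    # The error is unconditional: it never consults the warning flag.
    if not set(requested_btls) & set(usable_btls):
        messages.append("Error: unable to reach remote MPI peers")
    return messages
```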


Note that the two /sys checks may be redundant; I'm not entirely sure  
how the two files relate to each other.  libibverbs will complain  
about the first if it is not present; the second is used to indicate  
that the kernel drivers are loaded.
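Per Roland's explanation elsewhere in this thread, the combination that signals a real misconfiguration is a non-empty /sys/class/infiniband with no readable /sys/class/infiniband_verbs/abi_version (RDMA devices present, but the userspace verbs interface, normally exported by the ib_uverbs module, missing). A minimal Python sketch of reading the ABI version quietly and flagging that combination; the helper names and the `sysfs_root` parameter are assumptions:

```python
import os


def read_uverbs_abi_version(sysfs_root="/sys"):
    """Return the uverbs ABI version as an int, or None if it cannot be read.

    Unlike libibverbs' "Fatal: couldn't read uverbs ABI version" path, this
    prints nothing; the caller decides whether a warning is warranted.
    """
    path = os.path.join(sysfs_root, "class", "infiniband_verbs", "abi_version")
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def rdma_misconfigured(ib_class_nonempty, abi_version):
    """Devices are registered, but there is no usable verbs interface."""
    return ib_class_nonempty and abi_version is None
```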




On May 26, 2008, at 5:10 AM, Manuel Prinz wrote:


> On Saturday, 24.05.2008, at 17:30 +0200, Manuel Prinz wrote:
>
> > On Thursday, 22.05.2008, at 17:18 -0400, Jeff Squyres wrote:
> >
> > > Could you check with some of your other Debian maintainers?
> >
> >
> > I'm sorry that I can't check that before Monday! I'll let you know
> > then, but I'm not aware of that.
>
>
> I just checked on a box with no InfiniBand hardware: /dev/infiniband
> *does not* exist. Loading the IB kernel modules *does not* create the
> device. It seems like it only exists if the hardware is present.
>
> Best regards
> Manuel




--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-24 Thread Manuel Prinz
On Thursday, 22.05.2008, at 17:18 -0400, Jeff Squyres wrote:
> Could you check with some of your other Debian maintainers?

I'm sorry that I can't check that before Monday! I'll let you know then,
but I'm not aware of that.

I never thought about the impact of my initial request to Jeff. I
personally do not think that it's a problem that warnings are issued; I
just think there should be a simple way to help out those users who want
to get rid of them and/or help them to understand where the warning
comes from in the first place. If you do find a way to suppress them for
those who do not have the required hardware, this would indeed be nice,
but they should not be suppressed by default if that complicates
debugging or system administration. Just my two cents.

The reasoning that libibverbs should only be installed if the hardware
exists is somewhat valid but practically impossible to guarantee in a
distribution. We (the Debian Open MPI maintainers) could build two OMPI
packages (with and without the libibverbs dependency), but this increases
the maintenance burden and does not work well with other MPI-using
packages. I guess the common case is that libibverbs is pulled in from
"somewhere" even if no hardware is present.

Best regards
Manuel



Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-23 Thread Roland Dreier
 > Either that or udev is not configured properly.

Debian has a correct udev configuration, modulo

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=449081

 > ib_core/mthca/mlx4 should be loaded automatically by hotplug if HW is
 > present. No need for any additional configuration.

Yes (although only mlx4_core and not mlx4_ib will be loaded based on PCI
IDs), but nothing loads ib_uverbs automatically, and systems that have
no RDMA hardware will obviously not have any RDMA drivers autoloaded.
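Which of these modules are actually loaded can be read from /proc/modules, where the first whitespace-separated field of each line is the module name. Below is a minimal Python sketch that parses that text (passed in as a string so it can be tested). The module list is illustrative, and modules compiled directly into the kernel (e.g. on a hand-built kernel) do not appear in /proc/modules at all, so absence there is a hint, not proof.

```python
# Illustrative selection; an mthca-based system would look for ib_mthca instead.
IB_MODULES = ("ib_core", "ib_uverbs", "mlx4_core", "mlx4_ib")


def loaded_modules(proc_modules_text):
    """Return the set of module names listed in /proc/modules-style text."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def ib_module_report(proc_modules_text, modules=IB_MODULES):
    """Map each module of interest to whether it appears as loaded."""
    loaded = loaded_modules(proc_modules_text)
    return {name: (name in loaded) for name in modules}
```

On a real system, pass it the contents of the file: `with open("/proc/modules") as f: print(ib_module_report(f.read()))`.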

 - R.


Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-23 Thread Roland Dreier
 > OFED is one distribution of the OpenFabrics software.  It can be  
 > bundled up and packaged differently, too.  I suspect that Debian does  
 > not include OFED directly, because OFED is pretty heavily dependent  
 > upon RPM.  So the OpenFabrics kernel bits must be there somewhere  
 > (libibverbs would be useless, otherwise); it would be nice to  
 > understand how they are activated: either manually or automatically.

"OpenFabrics kernel bits" doesn't really make sense.  Debian just ships
a Linux kernel, which has InfiniBand/RDMA drivers.

Debian doesn't load the ib_uverbs module by default, nor should it,
since the vast majority of users don't have RDMA hardware.  So
libibverbs and Open MPI should act sanely when no kernel drivers are
loaded, /sys/class/infiniband_verbs doesn't exist, etc.

There is already a Debian bug open about this for libibverbs:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=418014

I've been meaning to work on this but sadly I have not been able to put
much time into it.

 - R.


Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-23 Thread Dirk Eddelbuettel
On Fri, May 23, 2008 at 09:56:44AM +0300, Gleb Natapov wrote:
> On Thu, May 22, 2008 at 08:30:52PM +, Dirk Eddelbuettel wrote:
> > > > Also, if this test depends on the Debian kernel packages, then we're
> > > > back to square one as some folks (like myself) run binary kernels,
> > > > others may just hand-compile and this test may not work as we may miss
> > > > the 'Debian trigger' in those cases.
> > > 
> > > 
> > > The OpenFabrics kernel drivers are implemented as kernel modules, so
> > > it's mainly just a question of loading them to start them running.
> > > For example, in the official OFED distribution, it comes with /etc/
> > 
> > Do you have any information whether OFED is in fact packaged for
> > Debian?  It may not be, and hence no file ...
> > 
> AFAIK OFED is not packaged for Debian. Roland packages IB for Debian.

Correct, that is my understanding too.

Good point also re udev. 

Dirk

-- 
Three out of two people have difficulties with fractions.


Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-23 Thread Gleb Natapov
On Thu, May 22, 2008 at 08:30:52PM +, Dirk Eddelbuettel wrote:
> > > Also, if this test depends on the Debian kernel packages, then we're
> > > back to square one as some folks (like myself) run binary kernels,
> > > others may just hand-compile and this test may not work as we may miss
> > > the 'Debian trigger' in those cases.
> > 
> > 
> > The OpenFabrics kernel drivers are implemented as kernel modules, so
> > it's mainly just a question of loading them to start them running.
> > For example, in the official OFED distribution, it comes with /etc/
> 
> Do you have any information whether OFED is in fact packaged for
> Debian?  It may not be, and hence no file ...
> 
AFAIK OFED is not packaged for Debian. Roland packages IB for Debian.

--
Gleb.


Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-23 Thread Gleb Natapov
On Thu, May 22, 2008 at 04:19:05PM -0400, Jeff Squyres wrote:
> On May 22, 2008, at 4:07 PM, Dirk Eddelbuettel wrote:
> 
> > Is there a test I could run for you?
> 
> Can you see if /dev/infiniband exists?  If it does, the OpenFabrics  
> kernel drivers are running.  If not, they aren't.
Either that or udev is not configured properly.

> 
> > Also, if this test depends on the Debian kernel packages, then we're
> > back to square one as some folks (like myself) run binary kernels,
> > others may just hand-compile and this test may not work as we may miss
> > the 'Debian trigger' in those cases.
> 
> 
> The OpenFabrics kernel drivers are implemented as kernel modules, so
> it's mainly just a question of loading them to start them running.
> For example, in the official OFED distribution, it comes with /etc/ 
> init.d/openibd -- "start" loads the kernel modules and does all the  
> necessary initialization, "stop" unloads everything, etc.
> 
ib_core/mthca/mlx4 should be loaded automatically by hotplug if HW is
present. No need for any additional configuration.

--
Gleb.


Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-22 Thread Jeff Squyres

On May 22, 2008, at 4:30 PM, Dirk Eddelbuettel wrote:


> > Can you see if /dev/infiniband exists?  If it does, the OpenFabrics
> > kernel drivers are running.  If not, they aren't.
>
> Negative -- I have no /dev/infiniband.  So his test idea seems
> feasible which is nice!


Good!


> Do you have any information whether OFED is in fact packaged for
> Debian?  It may not be, and hence no file ...


OFED is one distribution of the OpenFabrics software.  It can be
bundled up and packaged differently, too.  I suspect that Debian does
not include OFED directly, because OFED is pretty heavily dependent
upon RPM.  So the OpenFabrics kernel bits must be there somewhere
(libibverbs would be useless, otherwise); it would be nice to
understand how they are activated: either manually or automatically.


> > So if you have this init.d file, perhaps it's a question of checking
> > the chkconfig levels upon installation...?
>
> Don't have it, but then again, my personal installation is in no way
> authoritative for all of Debian.



Could you check with some of your other Debian maintainers?

--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-22 Thread Dirk Eddelbuettel
On Thu, May 22, 2008 at 04:19:05PM -0400, Jeff Squyres wrote:
> On May 22, 2008, at 4:07 PM, Dirk Eddelbuettel wrote:
> 
> > Is there a test I could run for you?
> 
> Can you see if /dev/infiniband exists?  If it does, the OpenFabrics  
> kernel drivers are running.  If not, they aren't.

Negative -- I have no /dev/infiniband.  So his test idea seems
feasible which is nice!

> > Also, if this test depends on the Debian kernel packages, then we're
> > back to square one as some folks (like myself) run binary kernels,
> > others may just hand-compile and this test may not work as we may miss
> > the 'Debian trigger' in those cases.
> 
> 
> The OpenFabrics kernel drivers are implemented as kernel modules, so
> it's mainly just a question of loading them to start them running.
> For example, in the official OFED distribution, it comes with /etc/

Do you have any information whether OFED is in fact packaged for
Debian?  It may not be, and hence no file ...

> init.d/openibd -- "start" loads the kernel modules and does all the
> necessary initialization, "stop" unloads everything, etc.
> 
> So if you have this init.d file, perhaps it's a question of checking
> the chkconfig levels upon installation...?

Don't have it, but then again, my personal installation is in no way
authoritative for all of Debian.

Dirk

-- 
Three out of two people have difficulties with fractions.


Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-22 Thread Jeff Squyres

On May 22, 2008, at 4:07 PM, Dirk Eddelbuettel wrote:


> Is there a test I could run for you?


Can you see if /dev/infiniband exists?  If it does, the OpenFabrics  
kernel drivers are running.  If not, they aren't.
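As a rough illustration of that test, here is a Python sketch that lists the uverbs character devices under /dev/infiniband. The helper name and the `dev_root` parameter are assumptions. As Gleb notes in this thread, a missing /dev/infiniband can also mean udev is misconfigured, so the result is a hint rather than proof that the drivers are stopped.

```python
import os


def uverbs_device_nodes(dev_root="/dev"):
    """Return sorted uverbsN entries under <dev_root>/infiniband ([] if absent)."""
    ib_dir = os.path.join(dev_root, "infiniband")
    if not os.path.isdir(ib_dir):
        return []
    # Other entries (e.g. umad0, rdma_cm) may exist; only uverbs nodes matter for verbs.
    return sorted(name for name in os.listdir(ib_dir) if name.startswith("uverbs"))
```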



> Also, if this test depends on the Debian kernel packages, then we're
> back to square one as some folks (like myself) run binary kernels,
> others may just hand-compile and this test may not work as we may miss
> the 'Debian trigger' in those cases.



The OpenFabrics kernel drivers are implemented as kernel modules, so  
it's mainly just a question of loading them to start them running.
For example, in the official OFED distribution, it comes with /etc/ 
init.d/openibd -- "start" loads the kernel modules and does all the  
necessary initialization, "stop" unloads everything, etc.


So if you have this init.d file, perhaps it's a question of checking  
the chkconfig levels upon installation...?


--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-22 Thread Dirk Eddelbuettel
On Thu, May 22, 2008 at 03:45:36PM -0400, Jeff Squyres wrote:
> On May 22, 2008, at 3:42 PM, Dirk Eddelbuettel wrote:
> 
> >> When you install binary OMPI (which pulls in libibverbs and all the
> >> rest), do you set the OpenFabrics kernel drivers to start upon boot?
> >> Or does the user have to do that manually?
> >
> > I think so. To the best of my knowledge, we don't do anything  
> > explicitly.
> 
> Can you check?

How?

Among our approximately 20,000 binary packages, I only see

   edd@ron:~$ apt-cache search fabrics
   libmthca-dev - Development files for the libmthca driver
   libmthca1 - A userspace driver for Mellanox InfiniBand HCAs
   libmthca1-dbg - Debugging symbols for the libmthca driver
   edd@ron:~$

I am not aware of anything kernel-specific happening.  That said, I could 
simply be unaware.  

Is there a test I could run for you?

Also, if this test depends on the Debian kernel packages, then we're
back to square one as some folks (like myself) run binary kernels,
others may just hand-compile and this test may not work as we may miss
the 'Debian trigger' in those cases.

> If you (or something else in Debian) start the OpenFabrics drivers  
> automatically (regardless of whether there are any verbs-capable  
> devices), then that kinda defeats the point of Pasha's proposed check...

Yes, but here I can only offer a meek 'dunno, really'.  Sorry!

Dirk

-- 
Three out of two people have difficulties with fractions.


Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-22 Thread Jeff Squyres

On May 22, 2008, at 3:42 PM, Dirk Eddelbuettel wrote:


> > When you install binary OMPI (which pulls in libibverbs and all the
> > rest), do you set the OpenFabrics kernel drivers to start upon boot?
> > Or does the user have to do that manually?
>
> I think so. To the best of my knowledge, we don't do anything
> explicitly.


Can you check?

If you (or something else in Debian) start the OpenFabrics drivers  
automatically (regardless of whether there are any verbs-capable  
devices), then that kinda defeats the point of Pasha's proposed check...


--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-22 Thread Dirk Eddelbuettel
On Thu, May 22, 2008 at 03:35:03PM -0400, Jeff Squyres wrote:
> Dirk / Debian guys --
> 
> When you install binary OMPI (which pulls in libibverbs and all the  
> rest), do you set the OpenFabrics kernel drivers to start upon boot?   
> Or does the user have to do that manually?

I think so. To the best of my knowledge, we don't do anything explicitly. 

There is really just a Depends: on whatever is needed to run the
code. E.g. because we build against libibverbs, the libopenmpi1
library package ends up with

   Depends: libc6 (>= 2.7-1), libgcc1 (>= 1:4.1.1-21), libgfortran3 (>= 4.3), \
libibverbs1 (>= 1.1), libstdc++6 (>= 4.1.1-21)

which is rather standard.  

> I ask because of the check Pasha proposes: if the user has started the  
> OpenFabrics kernel drivers, it's ok for OMPI to print warning messages  
> (this is better than the current: if libibverbs exists, it's ok for  
> OMPI to print warning messages).

Yes, that sounds fine.

Hth, Dirk 

-- 
Three out of two people have difficulties with fractions.


Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-22 Thread Jeff Squyres

Dirk / Debian guys --

When you install binary OMPI (which pulls in libibverbs and all the  
rest), do you set the OpenFabrics kernel drivers to start upon boot?   
Or does the user have to do that manually?


I ask because of the check Pasha proposes: if the user has started the  
OpenFabrics kernel drivers, it's ok for OMPI to print warning messages  
(this is better than the current: if libibverbs exists, it's ok for  
OMPI to print warning messages).




On May 22, 2008, at 3:25 PM, Pavel Shamis (Pasha) wrote:





> > > > > 1. Driver doesn't support the HCA - If I remember correctly, RH40
> > > > > by default doesn't support the ConnectX HCA. The device_list will be
> > > > > empty. It is a very exotic case.
> > > > >
> > > > > 2. Driver version doesn't correspond with fw version
> > > > > 3. FW was broken
> > > > > 4. Driver was broken and failed to start - it is not a very exotic
> > > > > case either. Sometimes users make some modification - upgrade/install/
> > > > > etc. - and it breaks the driver.
> > > >
> > > > In such cases, the ibv_devinfo(1) and ibv_devices(1) commands
> > > > would show the same error.
> > >
> > > Yep these utilities will show the same error.
> > >
> > > Cases 1-2-3 we may cover pretty simply. The OPENIB driver creates
> > > "/dev/infiniband" during its startup. So if /dev/infiniband exists
> > > and _get_device_list() is empty we may print a warning.
> >
> > Ok, that seems reasonable.
> >
> > > I don't know how we can cover case 4 :-(
> >
> > If the user makes modifications to the driver and breaks it, I
> > don't think we can be held responsible for that -- prudence
> > declares that you should verify that your [self-modified] driver is
> > not broken first before blaming Open MPI.  I'm not that concerned
> > about #4; most of my customers do not modify the drivers.
>
> Agree about #4.
>
> The check for /dev/infiniband should be simple and I think we can
> add it to 1.3.
>
> Pasha.



--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-22 Thread Patrick Geoffray

Brian W. Barrett wrote:
> With MX, it's one initialization call (mx_init), and it's not clear from
> the errors it can return that you can differentiate between the two cases.


If you run mx_init() on a machine without the MX driver loaded or with no NIC
detected by the driver, you get a specific error code (MX_NO_DEV) and
the default error handler prints something like:


MX:asterix:mx_init:querying driver:error 5(errno=2):No MX device entry in /dev.


You can override the default error handler to not see the message.

Patrick


Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-22 Thread Jeff Squyres

On May 22, 2008, at 11:53 AM, Pavel Shamis (Pasha) wrote:

> > > 1. Driver doesn't support the HCA - If I remember correctly, RH40 by
> > > default doesn't support the ConnectX HCA. The device_list will be
> > > empty. It is a very exotic case.
> > >
> > > 2. Driver version doesn't correspond with fw version
> > > 3. FW was broken
> > > 4. Driver was broken and failed to start - it is not a very exotic
> > > case either. Sometimes users make some modification - upgrade/install/
> > > etc. - and it breaks the driver.
> >
> > In such cases, the ibv_devinfo(1) and ibv_devices(1) commands
> > would show the same error.
>
> Yep these utilities will show the same error.
>
> Cases 1-2-3 we may cover pretty simply. The OPENIB driver creates
> "/dev/infiniband" during its startup. So if /dev/infiniband exists and
> _get_device_list() is empty we may print a warning.


Ok, that seems reasonable.


> I don't know how we can cover case 4 :-(


If the user makes modifications to the driver and breaks it, I don't
think we can be held responsible for that -- prudence declares that
you should verify that your [self-modified] driver is not broken first
before blaming Open MPI.  I'm not that concerned about #4; most of my
customers do not modify the drivers.


> BTW I think that problem is relevant for all BTLs and not only
> openib, and maybe we need to look for some global solution.


Brian's solution was reasonable; perhaps just adding a flag to the
existing no_nics function.


--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-22 Thread Dirk Eddelbuettel
On Thu, May 22, 2008 at 06:53:38PM +0300, Pavel Shamis (Pasha) wrote:
> If the user decides to upgrade his ompi + libibverbs rpm/deb package
> install, he will need to do a lot of other "annoying" steps, like:
> source code download, installing all required *-dev.rpm, compilation.

No -- Running "apt-get dist-upgrade", or any of the graphical or
console alternatives, will give you _already compiled_ and _already
configured_ packages.  That's the point of a distribution.

This whole discussion is about how to set up the default configuration
(with respect to warnings about IB hardware, which many folks do not have).

Also, step back for a second.  Many higher-level MPI-using solutions
'hide' all this plumbing.  So someone advocating a Python-based or
R-based MPI solution will just tell people 'install the package'.
These users won't know the difference between Open MPI, LAM or MPICH
and wouldn't even know where to start.

That said, the task is indeed delicate at the Open MPI level, as we do
of course want genuine warnings for genuine failures.  It is encouraging
how hard everybody is trying to handle this 'the right way'.

Hth, Dirk

-- 
Three out of two people have difficulties with fractions.


Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-22 Thread Brian W. Barrett

On Thu, 22 May 2008, Terry Dontje wrote:


> > The major difference here is that libmyriexpress is not being included
> > in mainline Linux distributions.  Specifically: if you can find/use
> > libmyriexpress, it's likely because you have that hardware.  The same
> > *used* to be true for libibverbs, but is no longer true because Linux
> > distros are now shipping it (e.g., the Debian distribution pulls in
> > libibverbs when you install Open MPI).
>
> Ok, but there are distributions that do include the myrinet BTL/MTL (ie
> CT).  Though I agree for the most part that in the case of myrinet, if
> you have libmyriexpress you will probably have an operable interface.
> I guess I am curious how many other BTLs a distribution might end up
> delivering that could run into this reporting issue.  I guess my point
> is: could this be worth something more general instead of a one-off
> for IB?
>
>
> From my point of view the btl_warn_unused_components coupled with "-mca
> btl ^mlfbtl" works for me.  However the fact that the IB
> vendors/community (ie CISCO) is solving this for their favorite
> interface makes me pause for a moment.


There's actually a second (in my mind more important) reason why this is
IB only, as I shared similar concerns (hence yesterday's e-mail barrage).
InfiniBand has a two-stage initialization -- you get the list of HCAs,
then you initialize the HCA you want.  So it's possible to distinguish
"there are no HCAs in the system" from "the system couldn't initialize
the HCA properly" (as that would happen in step 2, according to Jeff).
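The two-stage pattern described here can be modelled in a few lines. In this Python sketch, `list_devices` stands in for stage 1 (cf. ibv_get_device_list()) and `open_device` for stage 2 (cf. ibv_open_device(), which returns NULL on failure, modelled here as None). The names `probe` and `ProbeResult` are hypothetical. A single-call interface such as mx_init collapses both stages, which is why it cannot report the same distinction.

```python
from enum import Enum


class ProbeResult(Enum):
    NO_DEVICES = "no devices present"
    INIT_FAILED = "devices present, but initialization failed"
    OK = "at least one device initialized"


def probe(list_devices, open_device):
    """Model of a two-stage device probe (hypothetical helper)."""
    devices = list_devices()                 # stage 1: enumerate
    if not devices:
        return ProbeResult.NO_DEVICES, []
    # Stage 2: try to open each device; None models a failed open.
    opened = [ctx for ctx in (open_device(d) for d in devices) if ctx is not None]
    if not opened:
        return ProbeResult.INIT_FAILED, []
    return ProbeResult.OK, opened
```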


With MX, it's one initialization call (mx_init), and it's not clear from 
the errors it can return that you can differentiate between the two cases. 
I haven't tried it, but it's possible that mx_init would succeed in the no 
nic case, but then have a NIC count of 0.


Anyway, the short answer is that (in my opinion) we should have a btl base 
param similar to warn_unused for whether to warn when no NICs/HCAs are 
found, hopefully with a nice error function similar to today's no_nics 
(which probably needs to be renamed in that case).  That way, if BTL 
authors other than OpenIB want to do some extra work and return better 
error messages, they can.
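A sketch of that proposed shape: one base-level switch for the plain "no NICs found" message, plus an optional richer path for BTLs that, like openib, can tell the two cases apart. Everything here is hypothetical. `warn_no_nics` is an invented stand-in for the proposed parameter, not an existing Open MPI MCA parameter, and the message strings are illustrative.

```python
def no_nics_message(component, warn_no_nics, error=None):
    """Return the message to print, or None (hypothetical helper).

    warn_no_nics models the proposed BTL base parameter.  error is set only
    by BTLs that can distinguish "no devices" from "devices present, but
    initialization failed".
    """
    if error is not None:
        # A failure on hardware that is actually present is always reported.
        return f"{component}: devices found, but initialization failed: {error}"
    if warn_no_nics:
        return f"{component}: no network interfaces found"
    return None
```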



> > > FWIW, our distribution actually turns off btl_base_warn_component_unused
> > > because it seemed the majority of our cases would be users seeing
> > > false-positive instances of the message.
> >
> >
> > Is the UDAPL library shipped in Solaris by default?  If so, then
> > you're likely in exactly the same kind of situation that I'm
> > describing.  The same will be true if Solaris ends up shipping
> > libibverbs by default.
>
>
> Yes, the UDAPL library is shipped in Solaris by default, which is why we
> turn off btl_warn_unused_components.  Yes, and I suspect once Solaris
> starts delivering libibverbs we (Sun) will need to figure out how to
> handle having both the udapl and openib btls available.


There is some evil configure hackery that could be done to make this work 
in a more general way (don't you love it when I say that). 
Autogen/configure makes no guarantees about the order in which the 
configure.m4 macros for components in the same framework are run, other 
than all components of priority X are run before those of priority Y, iff 
X > Y.  So you could set the priority of all the components except udapl 
to (say) 10 and udapl's to 0.  Then have the udapl configure only build if 
1) it was specifically requested or 2) ompi_check_openib_happy = no.  No 
more Linux-specific stuff, works when Solaris gets OFED, and works on old 
Solaris that has uDAPL but not OFED.
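The ordering guarantee and the proposed udapl rule can be modelled as below (Python, purely illustrative; the real change would live in configure.params and configure.m4, which are not shown). `configure_order` and `build_udapl` are hypothetical names; only `ompi_check_openib_happy` comes from the text, and it is represented here as a boolean.

```python
def configure_order(priorities):
    """Order components so that higher priority runs first.

    Order among equal priorities is unspecified by configure; sorting by
    name here only makes the model deterministic.
    """
    return sorted(priorities, key=lambda name: (-priorities[name], name))


def build_udapl(udapl_requested, openib_happy):
    """Build udapl only if explicitly requested, or if the openib checks
    (which already ran, thanks to their higher priority) failed."""
    return udapl_requested or not openib_happy
```

The point of the priority split is that by the time udapl's check runs, the openib result is already known, so no Linux-specific special case is needed.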


As a matter of fact, it's so trivial to do that I'd recommend doing it for 
1.3.  Really, you could do it minimally by only changing OpenIB's 
configure.params to set its priority to 10, uDAPL's configure.params to 
set its priority to 0, and uDAPL's configure.m4 to remove the Linux stuff 
and look for ompi_check_openib_happy.



Brian


Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-22 Thread Terry Dontje

Jeff Squyres wrote:

> On May 22, 2008, at 6:50 AM, Terry Dontje wrote:
>
> > > Brian and I chatted a bit about this off-list, and I think we're in
> > > agreement now:
> > >
> > > - do not change the default value or meaning of
> > > btl_base_warn_component_unused.
> > >
> > > - major point of confusion: the openib BTL is actually fairly unique
> > > in that it can (and does) tell the difference between "there are no
> > > devices present" and "there are devices, but something went wrong".
> > > Other BTLs have network interfaces that can't tell the difference
> > > and can *only* call the no_nics function, regardless of whether
> > > there are no relevant network interfaces or some error occurred
> > > during initialization.
> > >
> > > - so a reasonable solution would be an openib-BTL-specific mechanism
> > > that doesn't call the no_nics function (to display that
> > > btl_base_warn_component_unused) if there are no verbs-capable
> > > devices found because of the fact that mainline Linuxes are starting
> > > to ship libibverbs.  Specific mechanism TBD; likely to be an openib
> > > MCA param.
> >
> > So, if you are delivering something similar to a BTL for myrinet you
> > will see the message, but the belief is this is necessary since there
> > isn't enough granularity in the error reporting of the device to feel
> > comfortable enough as to whether the user wants the device to be used?



> The major difference here is that libmyriexpress is not being included
> in mainline Linux distributions.  Specifically: if you can find/use
> libmyriexpress, it's likely because you have that hardware.  The same
> *used* to be true for libibverbs, but is no longer true because Linux
> distros are now shipping it (e.g., the Debian distribution pulls in
> libibverbs when you install Open MPI).


  
Ok, but there are distributions that do include the myrinet BTL/MTL (ie
CT).  Though I agree for the most part that in the case of myrinet, if
you have libmyriexpress you will probably have an operable interface.
I guess I am curious how many other BTLs a distribution might end up
delivering that could run into this reporting issue.  I guess my point
is: could this be worth something more general instead of a one-off
for IB?

From my point of view the btl_warn_unused_components coupled with "-mca
btl ^mlfbtl" works for me.  However the fact that the IB vendors/community
(ie CISCO) is solving this for their favorite interface makes me pause
for a moment.

> > Won't udapl have a similar issue here or does it not get built by
> > default when OFED is built?
>
>
>
> We decided that under Linux, the udapl BTL does not get built by
> default (even if it could) because then an "mpirun a.out" by default
> would use both UDAPL and verbs, which is undesirable for several
> reasons.  There's Linux-specific logic to this effect in
> config/ompi_check_udapl.m4.


  

Ok, that makes sense.

> > FWIW, our distribution actually turns off btl_base_warn_component_unused
> > because it seemed the majority of our cases would be users seeing
> > false-positive instances of the message.
>
>
>
> Is the UDAPL library shipped in Solaris by default?  If so, then
> you're likely in exactly the same kind of situation that I'm
> describing.  The same will be true if Solaris ends up shipping
> libibverbs by default.


  
Yes, the UDAPL library is shipped in Solaris by default, which is why
we turn off btl_base_warn_component_unused.  And yes, I suspect that
once Solaris starts delivering libibverbs, we (Sun) will need to figure
out how to handle having both the udapl and openib BTLs available.


--td




Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-22 Thread Jeff Squyres

On May 22, 2008, at 6:50 AM, Terry Dontje wrote:


Brian and I chatted a bit about this off-list, and I think we're in
agreement now:

- do not change the default value or meaning of
btl_base_warn_component_unused.

- major point of confusion: the openib BTL is actually fairly unique
in that it can (and does) tell the difference between "there are no
devices present" and "there are devices, but something went wrong".
Other BTLs have network interfaces that can't tell the difference and
can *only* call the no_nics function, regardless of whether there are
no relevant network interfaces or some error occurred during
initialization.

- so a reasonable solution would be an openib-BTL-specific mechanism
that doesn't call the no_nics function (which displays the warning
controlled by btl_base_warn_component_unused) if no verbs-capable
devices are found, because mainline Linuxes are starting to ship
libibverbs.  Specific mechanism TBD; likely to be an openib MCA param.



So, if you are delivering something similar to a BTL for Myrinet, you
will see the message, but the belief is that this is necessary since
there isn't enough granularity in the error reporting of the device to
feel comfortable about whether the user wants the device to be used?


The major difference here is that libmyriexpress is not being included  
in mainline Linux distributions.  Specifically: if you can find/use  
libmyriexpress, it's likely because you have that hardware.  The same  
*used* to be true for libibverbs, but is no longer true because Linux  
distros are now shipping (e.g., the Debian distribution pulls in  
libibverbs when you install Open MPI).



Won't udapl have a similar issue here or does it not get built by
default when OFED is built?


We decided that under Linux, the udapl BTL does not get built by  
default (even if it could) because then an "mpirun a.out" by default  
would use both UDAPL and verbs, which is undesirable for several  
reasons.  There's Linux-specific logic to this effect in
config/ompi_check_udapl.m4.


FWIW, our distribution actually turns off
btl_base_warn_component_unused because it seemed that, in the majority
of our cases, users would otherwise see the message as a false
positive.


Is the UDAPL library shipped in Solaris by default?  If so, then  
you're likely in exactly the same kind of situation that I'm  
describing.  The same will be true if Solaris ends up shipping  
libibverbs by default.


--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-22 Thread Terry Dontje

Jeff Squyres wrote:
Brian and I chatted a bit about this off-list, and I think we're in  
agreement now:


- do not change the default value or meaning of
btl_base_warn_component_unused.

- major point of confusion: the openib BTL is actually fairly unique
in that it can (and does) tell the difference between "there are no
devices present" and "there are devices, but something went wrong".
Other BTLs have network interfaces that can't tell the difference and
can *only* call the no_nics function, regardless of whether there are
no relevant network interfaces or some error occurred during
initialization.

- so a reasonable solution would be an openib-BTL-specific mechanism
that doesn't call the no_nics function (which displays the warning
controlled by btl_base_warn_component_unused) if no verbs-capable
devices are found, because mainline Linuxes are starting to ship
libibverbs.  Specific mechanism TBD; likely to be an openib MCA param.


  
So, if you are delivering something similar to a BTL for Myrinet, you
will see the message, but the belief is that this is necessary since
there isn't enough granularity in the error reporting of the device to
feel comfortable about whether the user wants the device to be used?


Won't udapl have a similar issue here or does it not get built by 
default when OFED is built?


FWIW, our distribution actually turns off
btl_base_warn_component_unused because it seemed that, in the majority
of our cases, users would otherwise see the message as a false
positive.


--td


On May 21, 2008, at 9:56 PM, Jeff Squyres wrote:

  

On May 21, 2008, at 5:02 PM, Brian W. Barrett wrote:



If this is true (for some reason I thought it wasn't), then I think
we'd actually be ok with your proposal, but you're right, you'd need
something new in the IB btl.  I'm not concerned about the dual rail
issue -- if you're smart enough to configure dual rail IB, you're smart
enough to figure out OMPI mca params.  I'm not sure the same is true
for a simple IB setup delivered from a white box vendor that barely
works on a good day (and unfortunately, there seems to be evidence that
these exist).
  

I'm not sure I understand what you're saying -- you agree, but what
"new" do you think we need in the openib BTL?  The MCA params saying
which ports you expect to be ACTIVE?

--
Jeff Squyres
Cisco Systems

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




  




Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-22 Thread Pavel Shamis (Pasha)


Brian and I chatted a bit about this off-list, and I think we're in  
agreement now:


- do not change the default value or meaning of
btl_base_warn_component_unused.

- major point of confusion: the openib BTL is actually fairly unique
in that it can (and does) tell the difference between "there are no
devices present" and "there are devices, but something went wrong".
Other BTLs have network interfaces that can't tell the difference and
can *only* call the no_nics function, regardless of whether there are
no relevant network interfaces or some error occurred during
initialization.

- so a reasonable solution would be an openib-BTL-specific mechanism
that doesn't call the no_nics function (which displays the warning
controlled by btl_base_warn_component_unused) if no verbs-capable
devices are found, because mainline Linuxes are starting to ship
libibverbs.  Specific mechanism TBD; likely to be an openib MCA param.
  
OK, we will have our own warning mechanism.  But we still have an open
question: will we show an error message (by default) when libibverbs
exists but there is no HCA in the hca_list?
I think we should show the error.  The problem of libibverbs being
installed by default is relevant only for binary distributions that
install all OMPI dependencies along with the OMPI package.  In that
case, the distribution can set an openib MCA parameter that disables
the warning message by default during OMPI package install (or build).
I guess that most people still install OMPI from source, and in that
case it sounds reasonable to me to print this "no HCA" warning if the
openib BTL was built.

Pasha



On May 21, 2008, at 9:56 PM, Jeff Squyres wrote:

  

On May 21, 2008, at 5:02 PM, Brian W. Barrett wrote:



If this is true (for some reason I thought it wasn't), then I think
we'd actually be ok with your proposal, but you're right, you'd need
something new in the IB btl.  I'm not concerned about the dual rail
issue -- if you're smart enough to configure dual rail IB, you're smart
enough to figure out OMPI mca params.  I'm not sure the same is true
for a simple IB setup delivered from a white box vendor that barely
works on a good day (and unfortunately, there seems to be evidence that
these exist).
  

I'm not sure I understand what you're saying -- you agree, but what
"new" do you think we need in the openib BTL?  The MCA params saying
which ports you expect to be ACTIVE?

--
Jeff Squyres
Cisco Systems





  




Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-21 Thread Jeff Squyres
Brian and I chatted a bit about this off-list, and I think we're in  
agreement now:


- do not change the default value or meaning of
btl_base_warn_component_unused.

- major point of confusion: the openib BTL is actually fairly unique
in that it can (and does) tell the difference between "there are no
devices present" and "there are devices, but something went wrong".
Other BTLs have network interfaces that can't tell the difference and
can *only* call the no_nics function, regardless of whether there are
no relevant network interfaces or some error occurred during
initialization.

- so a reasonable solution would be an openib-BTL-specific mechanism
that doesn't call the no_nics function (which displays the warning
controlled by btl_base_warn_component_unused) if no verbs-capable
devices are found, because mainline Linuxes are starting to ship
libibverbs.  Specific mechanism TBD; likely to be an openib MCA param.
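A minimal sketch of the warning policy proposed above, in C.  The enum,
the function name, and the warn_if_no_devices flag (a stand-in for the
still-TBD openib MCA param) are hypothetical; warn_component_unused
models the existing btl_base_warn_component_unused.  The idea: stay
quiet only when the verbs stack reports no devices at all, and keep the
current behavior for every other failure.

```c
#include <stdbool.h>

/* Hypothetical outcomes of openib device discovery; these names are
 * illustrative, not Open MPI's internal identifiers. */
typedef enum {
    DISCOVERY_NO_DEVICES,   /* verbs stack reports zero devices */
    DISCOVERY_DEVICE_ERROR, /* devices exist, but something went wrong */
    DISCOVERY_OK            /* at least one usable device/port */
} discovery_result_t;

/* Returns true if the "component unused" style warning should print.
 * warn_component_unused: models btl_base_warn_component_unused.
 * warn_if_no_devices:    hypothetical openib-specific param (default off). */
bool should_warn(discovery_result_t r,
                 bool warn_component_unused,
                 bool warn_if_no_devices)
{
    switch (r) {
    case DISCOVERY_OK:
        return false;
    case DISCOVERY_NO_DEVICES:
        /* Common case on distros that ship libibverbs: only warn if the
         * site explicitly opted in via the openib-specific param. */
        return warn_component_unused && warn_if_no_devices;
    case DISCOVERY_DEVICE_ERROR:
        /* Hardware is present but unusable: keep warning as before. */
        return warn_component_unused;
    }
    return warn_component_unused;
}
```

With the defaults (warn_component_unused true, warn_if_no_devices
false), a host without verbs hardware stays silent, while a host with a
broken HCA still gets the warning.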



On May 21, 2008, at 9:56 PM, Jeff Squyres wrote:


On May 21, 2008, at 5:02 PM, Brian W. Barrett wrote:


If this is true (for some reason I thought it wasn't), then I think
we'd actually be ok with your proposal, but you're right, you'd need
something new in the IB btl.  I'm not concerned about the dual rail
issue -- if you're smart enough to configure dual rail IB, you're smart
enough to figure out OMPI mca params.  I'm not sure the same is true
for a simple IB setup delivered from a white box vendor that barely
works on a good day (and unfortunately, there seems to be evidence that
these exist).



I'm not sure I understand what you're saying -- you agree, but what
"new" do you think we need in the openib BTL?  The MCA params saying
which ports you expect to be ACTIVE?

--
Jeff Squyres
Cisco Systems




--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-21 Thread Jeff Squyres

On May 21, 2008, at 5:02 PM, Brian W. Barrett wrote:

If this is true (for some reason I thought it wasn't), then I think
we'd actually be ok with your proposal, but you're right, you'd need
something new in the IB btl.  I'm not concerned about the dual rail
issue -- if you're smart enough to configure dual rail IB, you're smart
enough to figure out OMPI mca params.  I'm not sure the same is true
for a simple IB setup delivered from a white box vendor that barely
works on a good day (and unfortunately, there seems to be evidence that
these exist).



I'm not sure I understand what you're saying -- you agree, but what  
"new" do you think we need in the openib BTL?  The MCA params saying  
which ports you expect to be ACTIVE?


--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-21 Thread Brian W. Barrett

On Wed, 21 May 2008, Jeff Squyres wrote:


I'm only concerned about the case where there's an IB card, the user
expects the IB card to be used, and the IB card isn't used.


Can you put in a site-wide

btl = ^tcp

to avoid the problem?  If the IB card fails, then you'll get
unreachable MPI errors.


And how many users are going to figure that one out before complaining 
loudly?  That's what LANL did (probably still does) and it worked great 
there, but that doesn't mean that others will figure that out (after all, 
not everyone has an OMPI developer on staff...).



If the changes don't silence a warning in that situation, I'm fine with
whatever you do.  But does ibv_get_device_list return an HCA when the
port is down (because the SM failed and the machine rebooted since that
time)?


Yes.


If this is true (for some reason I thought it wasn't), then I think we'd 
actually be ok with your proposal, but you're right, you'd need something 
new in the IB btl.  I'm not concerned about the dual rail issue -- if 
you're smart enough to configure dual rail IB, you're smart enough to 
figure out OMPI mca params.  I'm not sure the same is true for a simple
IB setup delivered from a white box vendor that barely works on a good
day (and unfortunately, there seems to be evidence that these exist).



Brian


Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-21 Thread Jeff Squyres

On May 21, 2008, at 12:21 PM, Terry Dontje wrote:


So are you proposing to set btl_base_warn_component_unused to 0 or
something more BTL specific?


Probably something more btl-specific.  The
libibverbs-being-shipped-in-mainline-Linux-distros issue doesn't really
affect other BTLs, so I don't think it's appropriate to make this a
global-to-all-of-OMPI issue.


But the question of when exactly the openib BTL should call the  
function that displays that message is up in the air.  Brian and I  
seem to disagree.  :-)


--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-21 Thread Brian W. Barrett

On Wed, 21 May 2008, Jeff Squyres wrote:


On May 21, 2008, at 3:38 PM, Jeff Squyres wrote:


It would be great if libibverbs could return two different error
messages - one for "there's no IB card in this machine" and one for
"there's an IB card here, but we can't initialize it".  I think that
would make this argument go away.  Open MPI could probably mimic that
behavior by parsing the PCI tables, but that sounds ... painful.



Thinking about this a bit more -- I think it depends on what kind of
errors you are worried about seeing.  IBV does separate the discovery
of devices (ibv_get_device_list) from trying to open a device
(ibv_open_device).  So hypothetically, we *can* distinguish between
these kinds of errors already.

Do you see devices that are so broken that they don't show up in the
list returned from ibv_get_device_list?

FWIW: the *only* case I'm talking about changing the default for is
when ibv_get_device_list returns an empty list (meaning that according
to the verbs stack, there are no devices in the host).  I think that
we should *always* warn for any kinds of errors that occur after that
(e.g., we find a device but can't open it, we find one or more devices
but no active ports, etc.).
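The discovery-versus-open distinction described above can be sketched
as a classifier over three counts.  This assumes the counts were
gathered with the real libibverbs calls ibv_get_device_list(),
ibv_open_device(), and ibv_query_port() (checking for
IBV_PORT_ACTIVE); the struct, enum, and function names are
hypothetical, and the probing itself is left out so the sketch compiles
without libibverbs.  Per this thread, an HCA whose port is down (e.g.,
after an SM failure) is still listed by ibv_get_device_list(), so it
lands in PROBE_NO_ACTIVE_PORTS and is still reported.

```c
/* Counts a real probe would collect (hypothetical struct):
 *   num_devices      - length of the list from ibv_get_device_list()
 *   num_opened       - devices for which ibv_open_device() succeeded
 *   num_active_ports - ports whose ibv_query_port() state is
 *                      IBV_PORT_ACTIVE */
typedef struct {
    int num_devices;
    int num_opened;
    int num_active_ports;
} verbs_probe_t;

typedef enum {
    PROBE_NO_DEVICES,      /* empty list: verbs stack sees no hardware */
    PROBE_OPEN_FAILED,     /* devices listed, but none could be opened */
    PROBE_NO_ACTIVE_PORTS, /* devices opened, but no port is ACTIVE */
    PROBE_USABLE           /* at least one active port */
} probe_status_t;

probe_status_t classify_probe(const verbs_probe_t *p)
{
    if (p->num_devices == 0)
        return PROBE_NO_DEVICES;
    if (p->num_opened == 0)
        return PROBE_OPEN_FAILED;
    if (p->num_active_ports == 0)
        return PROBE_NO_ACTIVE_PORTS;
    return PROBE_USABLE;
}

/* Only the empty-list case is a candidate for a quiet default; every
 * failure after discovery is always reported. */
int probe_always_warns(probe_status_t s)
{
    return s == PROBE_OPEN_FAILED || s == PROBE_NO_ACTIVE_PORTS;
}
```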


Previously, there has not been such a distinction, so I really have no
idea which caused the openib BTL to throw its error (and never really
cared, as it was always somebody else's problem at that point).


I'm only concerned about the case where there's an IB card, the user 
expects the IB card to be used, and the IB card isn't used.  If the 
changes don't silence a warning in that situation, I'm fine with whatever 
you do.  But does ibv_get_device_list return an HCA when the port is down 
(because the SM failed and the machine rebooted since that time)?  If not,
we still have a (fairly common, unfortunately) error case that we need to
report (in my opinion).



Brian


Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-21 Thread Brian W. Barrett
Then we disagree on a core point.  I believe that users should never have 
something silently unexpected happen (like falling back to TCP from a high 
speed interconnect because of a NIC reset / software issue).  You clearly
don't feel this way.  I don't really work on the project, but do have lots
of experience being yelled at by users when something unexpected happens.


I guarantee you we'll see a report of poor IB / application performance 
because of the silent fallback to TCP.  There's a reason that error 
message was put in.  I don't get a vote anymore, so do whatever you think 
is best.


Brian


On Wed, 21 May 2008, Jeff Squyres wrote:


One thing I should clarify -- the ibverbs error message from my
previous mail is a red herring.  libibverbs prints that message on
systems where the kernel portions of the OFED stack are not installed
(such as the quick-n-dirty test that I did before -- all I did was
install libibverbs without the corresponding kernel stuff).  I
installed the whole OFED stack on a machine with no verbs-capable
hardware and verified that the libibverbs message does *not* appear
when the kernel bits are properly installed and running.

So we're only talking about the Open MPI warning message here.  More
below.



On May 21, 2008, at 12:17 PM, Brian W. Barrett wrote:


2. An out-of-the-box "mpirun a.out" will print warning messages in
perfectly valid/good configurations (no verbs-capable hardware, but
just happen to have libibverbs installed).  This is a Big Deal.


Which is easily solved with a better error message, as Pasha
suggested.


I guess this is where we disagree: I don't believe that the issue is
solved by making a "better" message.  Specifically: this is the first
case where we're saying "if you run with a valid configuration, you're
going to get a warning message and you have to do something extra to
turn it off."

That just seems darn weird to me, especially when other MPI's don't do
the same thing.  Come to think of it, I can't think of many other
software packages that do that.


In short: I think it's no longer safe to assume that machines with
libibverbs installed must also have verbs-capable hardware.


But here's the real problem -- with our current selection logic, a user
with libibverbs but no IB cards gets an error message saying "hey, we
need you to set this flag to make this error go away" (or would, per
Pasha's suggestion).  A user with a busted IB stack on a node (which we
still saw pretty often at LANL) starts using TCP and their application
runs like a dog.

I guess it's a matter of how often you see errors in the IB stack that
cause NIC initialization to fail.  The machines I tend to use still
exhibit this problem pretty often, but it's possible I just work on bad
hardware more often than is usual in the wild.


I guess this is the central issue: what *is* the common case?  Which
set of users should be forced to do something different?

I'm claiming that now that the Linux distros are shipping libibverbs,
the number of users who have the openib BTL installed but do not have
verbs-capable hardware will be *much* larger than those with verbs-
capable hardware.  Hence, I think the pain point should be for the
smaller group (those with verbs-capable hardware): set an MCA param if
you want to see the warning message.

(we can debate the default value for the BTL-wide base param later --
let's first just debate the *concept* as specific to the openib BTL)


It would be great if libibverbs could return two different error
messages - one for "there's no IB card in this machine" and one for
"there's an IB card here, but we can't initialize it".  I think that
would make this argument go away.  Open MPI could probably mimic that
behavior by parsing the PCI tables, but that sounds ... painful.


Yes, this capability in libibverbs would be good.  Parsing the PCI
tables doesn't sound like our role.

I'll ask the libibverbs authors about it...


I guess the root of my concern is that unexpected behavior with no
explanation is (in my mind) the most dangerous case and the one we
should address by default.  And turning this error message off is going
to cause unexpected behavior without explanation.



But more information is available, and subject to normal
troubleshooting techniques.  And if you're in an environment where you
*do* want to use verbs-capable hardware, then setting the MCA param
seems perfectly acceptable to me.  IIRC, LANL sets a whole pile of MCA
params in the top-level openmpi-mca-params.conf file that are specific
to their environment (right?).  If that's true, what's one more param?

Heck, the OMPI installed by OFED can set an MCA param in
openmpi-mca-params.conf by default (which is what most
verbs-capable-hardware users utilize).  That would solve the issue for
98% of the IB/iWARP users out there.  Those who compile from source
would need to do it manually.

I agree that this is less than perfect.  My main point is that I
really don't like the idea that "mpirun a.out" will result in warning
messages for perfectly valid configurations.

Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-21 Thread Jeff Squyres
One thing I should clarify -- the ibverbs error message from my  
previous mail is a red herring.  libibverbs prints that message on  
systems where the kernel portions of the OFED stack are not installed  
(such as the quick-n-dirty test that I did before -- all I did was  
install libibverbs without the corresponding kernel stuff).  I  
installed the whole OFED stack on a machine with no verbs-capable  
hardware and verified that the libibverbs message does *not* appear  
when the kernel bits are properly installed and running.


So we're only talking about the Open MPI warning message here.  More  
below.




On May 21, 2008, at 12:17 PM, Brian W. Barrett wrote:


2. An out-of-the-box "mpirun a.out" will print warning messages in
perfectly valid/good configurations (no verbs-capable hardware, but
just happen to have libibverbs installed).  This is a Big Deal.


Which is easily solved with a better error message, as Pasha  
suggested.


I guess this is where we disagree: I don't believe that the issue is  
solved by making a "better" message.  Specifically: this is the first  
case where we're saying "if you run with a valid configuration, you're  
going to get a warning message and you have to do something extra to  
turn it off."


That just seems darn weird to me, especially when other MPI's don't do  
the same thing.  Come to think of it, I can't think of many other  
software packages that do that.



In short: I think it's no longer safe to assume that machines with
libibverbs installed must also have verbs-capable hardware.


But here's the real problem -- with our current selection logic, a user
with libibverbs but no IB cards gets an error message saying "hey, we
need you to set this flag to make this error go away" (or would, per
Pasha's suggestion).  A user with a busted IB stack on a node (which we
still saw pretty often at LANL) starts using TCP and their application
runs like a dog.

I guess it's a matter of how often you see errors in the IB stack that
cause NIC initialization to fail.  The machines I tend to use still
exhibit this problem pretty often, but it's possible I just work on bad
hardware more often than is usual in the wild.


I guess this is the central issue: what *is* the common case?  Which  
set of users should be forced to do something different?


I'm claiming that now that the Linux distros are shipping libibverbs,  
the number of users who have the openib BTL installed but do not have  
verbs-capable hardware will be *much* larger than those with verbs- 
capable hardware.  Hence, I think the pain point should be for the  
smaller group (those with verbs-capable hardware): set an MCA param if  
you want to see the warning message.


(we can debate the default value for the BTL-wide base param later --  
let's first just debate the *concept* as specific to the openib BTL)


It would be great if libibverbs could return two different error
messages - one for "there's no IB card in this machine" and one for
"there's an IB card here, but we can't initialize it".  I think that
would make this argument go away.  Open MPI could probably mimic that
behavior by parsing the PCI tables, but that sounds ... painful.


Yes, this capability in libibverbs would be good.  Parsing the PCI
tables doesn't sound like our role.


I'll ask the libibverbs authors about it...


I guess the root of my concern is that unexpected behavior with no
explanation is (in my mind) the most dangerous case and the one we
should address by default.  And turning this error message off is going
to cause unexpected behavior without explanation.



But more information is available, and subject to normal  
troubleshooting techniques.  And if you're in an environment where you  
*do* want to use verbs-capable hardware, then setting the MCA param  
seems perfectly acceptable to me.  IIRC, LANL sets a whole pile of MCA  
params in the top-level openmpi-mca-params.conf file that are specific  
to their environment (right?).  If that's true, what's one more param?


Heck, the OMPI installed by OFED can set an MCA param in
openmpi-mca-params.conf by default (which is what most
verbs-capable-hardware users utilize).  That would solve the issue for
98% of the IB/iWARP users out there.  Those who compile from source
would need to do it manually.


I agree that this is less than perfect.  My main point is that I
really don't like the idea that "mpirun a.out" will result in warning
messages for perfectly valid configurations.


--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-21 Thread Pavel Shamis (Pasha)
As far as I know, only the OpenIB kernel drivers are installed by
default with the distribution.  The user-level pieces (libibverbs and
other OpenIB stuff) are not installed by default; the user needs to go
to the package manager and explicitly select libibverbs.  So if a user
decided to install libibverbs, they had reasons for it, and I think it
will be OK to show this warning.


Pasha.

Jeff Squyres wrote:

On May 21, 2008, at 11:14 AM, Brian W. Barrett wrote:

  
I think having a parameter to turn off the warning is a great idea.  So
great in fact, that it already exists in the trunk and v1.2 :)!  Setting
the default value for the btl_base_warn_component_unused flag from 1 to
0 will have the desired effect.



Ah, ok.  I either didn't know about this flag or forgot about it.  :-)

I just tested this myself and see that there are actually *two* error  
messages (on a machine where I installed libibverbs, but with no  
OpenFabrics hardware, with OMPI 1.2.6):


% mpirun -np 1 hello
libibverbs: Fatal: couldn't read uverbs ABI version.
--
[0,1,0]: OpenIB on host eddie.osl.iu.edu was unable to find any HCAs.
Another transport will be used instead, although this may result in
lower performance.
--

So the MCA param takes care of the OMPI message; I'll contact the  
libibverbs authors about their message.


  
I'm not sure I agree with setting the default to 0, however.  The
warning has proven extremely useful for diagnosing that IB (or less
often GM or MX) isn't properly configured on a compute node due to some
random error.

It's trivially easy for any packaging group to have the line

  btl_base_warn_component_unused = 0

added to $prefix/etc/openmpi-mca-params.conf during the install phase
of the package build (indeed, our simple build scripts at LANL used to
do this on a regular basis due to our need to tweak the OOB to keep
IPoIB happier at scale).

I think keeping the Debian guys happy is a good thing.  Giving them an
easy way to turn off silly warnings is a good thing.  Removing a known
useful warning to help them doesn't seem like a good thing.
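For reference, a sketch of reading one line of the
openmpi-mca-params.conf format discussed here ("name = value" entries,
with '#' starting a comment line).  The function name is hypothetical,
and this illustrates the file format rather than Open MPI's own parser;
it does not handle inline trailing comments or quoting.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: parse one "name = value" line.
 * Returns 1 and fills name/value on success; returns 0 for blank lines,
 * comment lines, lines without '=', or names/values too long to fit. */
int parse_mca_line(const char *line, char *name, size_t name_sz,
                   char *value, size_t value_sz)
{
    const char *eq, *start, *end;
    size_t len;

    while (isspace((unsigned char)*line))
        line++;
    if (*line == '\0' || *line == '#')
        return 0;                       /* blank line or comment */

    eq = strchr(line, '=');
    if (eq == NULL)
        return 0;                       /* not a "name = value" pair */

    /* Name: text before '=', trailing whitespace trimmed. */
    end = eq;
    while (end > line && isspace((unsigned char)end[-1]))
        end--;
    len = (size_t)(end - line);
    if (len == 0 || len >= name_sz)
        return 0;
    memcpy(name, line, len);
    name[len] = '\0';

    /* Value: text after '=', surrounding whitespace (incl. '\n') trimmed. */
    start = eq + 1;
    while (isspace((unsigned char)*start))
        start++;
    end = start + strlen(start);
    while (end > start && isspace((unsigned char)end[-1]))
        end--;
    len = (size_t)(end - start);
    if (len >= value_sz)
        return 0;
    memcpy(value, start, len);
    value[len] = '\0';
    return 1;
}
```

Fed the line "btl_base_warn_component_unused = 0", it yields the name
btl_base_warn_component_unused and the value 0; a commented-out line
such as "#   btl = ^openib" is skipped.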



I guess that this is what I am torn about.  Yes, it's a useful message
-- in some cases.  But now that libibverbs is shipping in Debian and
other Linuxes, the number of machines out there with verbs-capable
hardware is far, far smaller than the number of machines without
verbs-capable hardware.  Specifically:


1. The number of cases where seeing the message by default is *not*  
useful is now potentially [much] larger than the number of cases where  
the default message is useful.


2. An out-of-the-box "mpirun a.out" will print warning messages in  
perfectly valid/good configurations (no verbs-capable hardware, but  
just happen to have libibverbs installed).  This is a Big Deal.


3. Problems with HCA hardware and/or verbs stack are uncommon  
(nowadays).  I'd be ok asking someone to enable a debug flag to get  
more information on configuration problems or hardware faults.


Shouldn't we be optimizing for the common case?

In short: I think it's no longer safe to assume that machines with  
libibverbs installed must also have verbs-capable hardware.


  




Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-21 Thread Terry Dontje
So are you proposing to set btl_base_warn_component_unused to 0 or 
something more BTL specific?


--td
Jeff Squyres wrote:
What: Change default in openib BTL to not complain if no OpenFabrics  
devices are found


Why: Many linuxes are shipping libibverbs these days, but most users  
still don't have OpenFabrics hardware


Where: btl_openib_component.c

When: For v1.3

Timeout: Teleconf, 27 May 2008

Short version
=

Many major Linuxes are shipping libibverbs by default these days.
OMPI will therefore build the openib BTL by default, but it then
complains at run time when there's no OpenFabrics hardware.


We should change the default in v1.3 to not complain if no OpenFabrics
devices are found (perhaps have an MCA param to enable the warning if
desired).


Longer version
==

I just got a request from the Debian Open MPI package maintainers to  
include the following in the default openmpi-mca-params.conf for the  
OMPI v1.2 package:


# Disable the use of InfiniBand
#   btl = ^openib

Having this in the openmpi-mca-params.conf gives Debian an easy  
documentation path for users to shut up these warnings when they build  
on machines with libibverbs present but no OpenFabrics hardware.
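The "^" in "btl = ^openib" negates the whole comma-separated list: the
named components are excluded and everything else stays eligible,
whereas a list without "^" names the only components to use.  A hedged
sketch of that selection rule follows; the helper name is hypothetical,
it is not Open MPI's implementation, and it ignores details such as
whitespace around commas.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: is `component` selected by an MCA selection
 * string such as "^openib" (exclude list) or "tcp,self" (include list)?
 * An empty or NULL spec places no restriction. */
bool component_selected(const char *spec, const char *component)
{
    bool exclude = false;
    size_t clen = strlen(component);

    if (spec == NULL || *spec == '\0')
        return true;                /* no restriction given */
    if (*spec == '^') {             /* leading '^' negates the whole list */
        exclude = true;
        spec++;
    }

    /* Walk the comma-separated list looking for an exact name match. */
    while (*spec != '\0') {
        const char *comma = strchr(spec, ',');
        size_t len = comma ? (size_t)(comma - spec) : strlen(spec);
        if (len == clen && strncmp(spec, component, len) == 0)
            return !exclude;        /* listed: included unless negated */
        if (comma == NULL)
            break;
        spec = comma + 1;
    }
    return exclude;                 /* not listed: only kept if negated */
}
```

Under this rule, "^openib" rejects openib but keeps tcp, while
"tcp,self" keeps only those two components.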


I think that this is fine for the v1.2 series (and will file a CMR for  
it).  But for v1.3, I think we should change the default.


The vast majority of users will not have OpenFabrics devices, and we  
should therefore not complain if we can't find any at run-time.  We  
can/should still complain if we find OpenFabrics devices but no active  
ports (i.e., don't change this behavior).


But for optimizing the common case: I think we should (by default) not  
print a warning if no OpenFabrics devices are found.  We can also  
[easily] have an MCA parameter that *will* display a warning if no  
OpenFabrics devices are found.


  




Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-21 Thread Brian W. Barrett

On Wed, 21 May 2008, Jeff Squyres wrote:


2. An out-of-the-box "mpirun a.out" will print warning messages in
perfectly valid/good configurations (no verbs-capable hardware, but
just happen to have libibverbs installed).  This is a Big Deal.


Which is easily solved with a better error message, as Pasha suggested.


3. Problems with HCA hardware and/or verbs stack are uncommon
(nowadays).  I'd be ok asking someone to enable a debug flag to get
more information on configuration problems or hardware faults.

Shouldn't we be optimizing for the common case?

In short: I think it's no longer safe to assume that machines with
libibverbs installed must also have verbs-capable hardware.


But here's the real problem -- with our current selection logic, a user 
with libibverbs but no IB cards gets an error message saying "hey, we need 
you to set this flag to make this error go away" (or would, per Pasha's 
suggestion).  A user with a busted IB stack on a node (which we still saw 
pretty often at LANL) starts using TCP and their application runs like a 
dog.


I guess it's a matter of how often you see errors in the IB stack that
cause NIC initialization to fail.  The machines I tend to use still
exhibit this problem pretty often, but it's possible I just work on bad
hardware more often than is usual in the wild.


It would be great if libibverbs could return two different error messages 
- one for "there's no IB card in this machine" and one for "there's an IB 
card here, but we can't initialize it".  I think that would make this 
argument go away.  Open MPI could probably mimic that behavior by parsing 
the PCI tables, but that sounds ... painful.


I guess the root of my concern is that unexpected behavior with no 
explanation is (in my mind) the most dangerous case and the one we should 
address by default.  And turning this error message off is going to cause 
unexpected behavior without explanation.


Just my $0.02.


Brian


Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-21 Thread Jeff Squyres

On May 21, 2008, at 11:14 AM, Brian W. Barrett wrote:

I think having a parameter to turn off the warning is a great idea.  So
great in fact, that it already exists in the trunk and v1.2 :)!  Setting
the default value for the btl_base_warn_component_unused flag from 0 to 1
will have the desired effect.


Ah, ok.  I either didn't know about this flag or forgot about it.  :-)

I just tested this myself and see that there are actually *two* error  
messages (on a machine where I installed libibverbs, but with no  
OpenFabrics hardware, with OMPI 1.2.6):


% mpirun -np 1 hello
libibverbs: Fatal: couldn't read uverbs ABI version.
--
[0,1,0]: OpenIB on host eddie.osl.iu.edu was unable to find any HCAs.
Another transport will be used instead, although this may result in
lower performance.
--

So the MCA param takes care of the OMPI message; I'll contact the  
libibverbs authors about their message.


I'm not sure I agree with setting the default to 0, however.  The warning
has proven extremely useful for diagnosing that IB (or less often GM or
MX) isn't properly configured on a compute node due to some random error.

It's trivially easy for any packaging group to have the line

  btl_base_warn_component_unused = 0

added to $prefix/etc/openmpi-mca-params.conf during the install phase of
the package build (indeed, our simple build scripts at LANL used to do
this on a regular basis due to our need to tweak the OOB to keep IPoIB
happier at scale).
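The packaging step above amounts to adding one "name = value" line to the params file. As a rough illustration of how such a line is picked up, here is a minimal C sketch that scans text in that format (with # comments) and returns the value for one parameter. It is not Open MPI's actual parser; the function name is hypothetical, and letting later lines override earlier ones is an assumption of this sketch.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Trim leading and trailing whitespace in place; returns the trimmed start. */
static char *trim(char *s)
{
    while (isspace((unsigned char)*s)) s++;
    char *end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1])) end--;
    *end = '\0';
    return s;
}

/* Look up `name` in a buffer of "name = value" lines.  Copies the value
 * into `out` (size `outlen`) and returns 1 if found, 0 otherwise.
 * In this sketch, a later line for the same name overrides an earlier one. */
static int lookup_param(const char *text, const char *name,
                        char *out, size_t outlen)
{
    char line[256];
    int found = 0;
    const char *p = text;

    while (*p) {
        size_t len = strcspn(p, "\n");
        if (len >= sizeof(line)) len = sizeof(line) - 1; /* truncate long lines */
        memcpy(line, p, len);
        line[len] = '\0';
        p += strcspn(p, "\n");          /* advance past the whole line */
        if (*p == '\n') p++;

        char *hash = strchr(line, '#');
        if (hash) *hash = '\0';         /* strip comments */
        char *eq = strchr(line, '=');
        if (!eq) continue;              /* not a name = value line */
        *eq = '\0';
        if (strcmp(trim(line), name) == 0) {
            snprintf(out, outlen, "%s", trim(eq + 1));
            found = 1;
        }
    }
    return found;
}
```

Note that a commented-out line such as "#   btl = ^openib" is ignored entirely, which is why a package can ship it as documentation without changing behavior.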

I think keeping the Debian guys happy is a good thing.  Giving them an
easy way to turn off silly warnings is a good thing.  Removing a known
useful warning to help them doesn't seem like a good thing.


I guess this is what I'm torn about.  Yes, it's a useful message
-- in some cases.  But now that libibverbs is shipping in Debian and
other Linuxes, the number of machines out there with verbs-capable
hardware is far, far smaller than the number of machines without
verbs-capable hardware.  Specifically:


1. The number of cases where seeing the message by default is *not*  
useful is now potentially [much] larger than the number of cases where  
the default message is useful.


2. An out-of-the-box "mpirun a.out" will print warning messages in  
perfectly valid/good configurations (no verbs-capable hardware, but  
just happen to have libibverbs installed).  This is a Big Deal.


3. Problems with HCA hardware and/or verbs stack are uncommon  
(nowadays).  I'd be ok asking someone to enable a debug flag to get  
more information on configuration problems or hardware faults.


Shouldn't we be optimizing for the common case?

In short: I think it's no longer safe to assume that machines with  
libibverbs installed must also have verbs-capable hardware.


--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-21 Thread Brian W. Barrett
And there's a typo in my first paragraph.  The flag currently defaults to 
1 (print the warning).  It should be switched to 0 to turn off the 
warning.  Sorry for any confusion I might have caused -- I blame the lack 
of caffeine in the morning.


Brian

On Wed, 21 May 2008, Pavel Shamis (Pasha) wrote:


I agree with Brian. We could extend the warning message with a detailed
description of how to disable it.

Pasha

Brian W. Barrett wrote:

I think having a parameter to turn off the warning is a great idea.  So
great in fact, that it already exists in the trunk and v1.2 :)!  Setting
the default value for the btl_base_warn_component_unused flag from 0 to 1
will have the desired effect.

I'm not sure I agree with setting the default to 0, however.  The warning
has proven extremely useful for diagnosing that IB (or less often GM or
MX) isn't properly configured on a compute node due to some random error.
It's trivially easy for any packaging group to have the line

   btl_base_warn_component_unused = 0

added to $prefix/etc/openmpi-mca-params.conf during the install phase of
the package build (indeed, our simple build scripts at LANL used to do
this on a regular basis due to our need to tweak the OOB to keep IPoIB
happier at scale).

I think keeping the Debian guys happy is a good thing.  Giving them an
easy way to turn off silly warnings is a good thing.  Removing a known
useful warning to help them doesn't seem like a good thing.


Brian


On Wed, 21 May 2008, Jeff Squyres wrote:



What: Change the default in the openib BTL to not complain if no OpenFabrics
devices are found

Why: Many Linux distributions are shipping libibverbs these days, but most
users still don't have OpenFabrics hardware

Where: btl_openib_component.c

When: For v1.3

Timeout: Teleconf, 27 May 2008

Short version
=============

Many major Linux distributions are shipping libibverbs by default these
days.  OMPI will therefore build the openib BTL by default, but it then
complains at run time when there's no OpenFabrics hardware.

We should change the default in v1.3 to not complain if no OpenFabrics
devices are found (perhaps with an MCA param to enable the warning if
desired).

Longer version
==============

I just got a request from the Debian Open MPI package maintainers to
include the following in the default openmpi-mca-params.conf for the
OMPI v1.2 package:

# Disable the use of InfiniBand
#   btl = ^openib
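The leading ^ in "btl = ^openib" tells Open MPI to use every BTL component except the listed ones. A minimal C sketch of that include/exclude check follows, assuming a comma-separated list with no embedded whitespace. The function name is hypothetical and this is not Open MPI's own component selection code.

```c
#include <stdbool.h>
#include <string.h>

/* Return true if `component` is selected by `spec`.
 *   NULL or ""  : every component is allowed.
 *   "^a,b"      : every component except a and b.
 *   "a,b"       : only a and b. */
static bool component_selected(const char *spec, const char *component)
{
    if (spec == NULL || *spec == '\0') {
        return true;
    }
    bool exclude = (spec[0] == '^');    /* ^ negates the whole list */
    const char *p = exclude ? spec + 1 : spec;
    size_t clen = strlen(component);

    bool listed = false;
    while (*p) {
        size_t len = strcspn(p, ",");   /* length of the current entry */
        if (len == clen && strncmp(p, component, len) == 0) {
            listed = true;
            break;
        }
        p += len;
        if (*p == ',') p++;             /* skip the separator */
    }
    return exclude ? !listed : listed;
}
```

Under this reading, "^openib" keeps tcp, self, and any other BTL available while dropping only openib, which is what silences the warning on machines without OpenFabrics hardware.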

Having this in the openmpi-mca-params.conf gives Debian an easy
documentation path for users to shut up these warnings when they build
on machines with libibverbs present but no OpenFabrics hardware.

I think that this is fine for the v1.2 series (and will file a CMR for
it).  But for v1.3, I think we should change the default.

The vast majority of users will not have OpenFabrics devices, and we
should therefore not complain if we can't find any at run-time.  We
can/should still complain if we find OpenFabrics devices but no active
ports (i.e., don't change this behavior).

But for optimizing the common case: I think we should (by default) not
print a warning if no OpenFabrics devices are found.  We can also
[easily] have an MCA parameter that *will* display a warning if no
OpenFabrics devices are found.
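The proposed v1.3 default can be summarized as a small decision function. This is a C sketch under stated assumptions: warn_no_devices stands in for the proposed (not yet named) MCA parameter that re-enables the no-device warning, and openib_should_warn is a hypothetical name, not code from btl_openib_component.c.

```c
#include <stdbool.h>

/* Decide whether the openib BTL should print a startup warning.
 *   num_devices:      OpenFabrics devices found at run time.
 *   num_active_ports: active ports across those devices.
 *   warn_no_devices:  hypothetical MCA parameter, false by default
 *                     under this proposal. */
static bool openib_should_warn(int num_devices, int num_active_ports,
                               bool warn_no_devices)
{
    if (num_devices == 0) {
        /* Common case: libibverbs installed but no hardware.  Stay quiet
         * unless the user explicitly asked for the warning. */
        return warn_no_devices;
    }
    /* Devices exist but no port is active: keep warning (unchanged). */
    return num_active_ports == 0;
}
```

The design keeps the existing "devices but no active ports" warning untouched and only changes the no-device case, which matches the split described in the RFC.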




___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



