Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-29 Thread Roland Dreier
> Is it possible that the /sys/class/infiniband directory exists and is empty? In which cases?
Do "modprobe ib_core" on a system with no hardware drivers loaded (or no RDMA hardware installed)

Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-29 Thread Jeff Squyres
On May 29, 2008, at 3:27 AM, Pavel Shamis (Pasha) wrote: I got some more feedback from Roland off-list explaining that if /sys/class/infiniband does exist and is non-empty and /sys/class/infiniband_verbs/abi_version does not exist, then this is definitely a case where we want to warn

Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-29 Thread Pavel Shamis (Pasha)
I got some more feedback from Roland off-list explaining that if /sys/class/infiniband does exist and is non-empty and /sys/class/infiniband_verbs/abi_version does not exist, then this is definitely a case where we want to warn because it implies that config is screwed up -- RDMA devices
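
The condition described above can be sketched as a small check. This is only an illustration in Python, not Open MPI's actual code: the function name `classify_rdma_sysfs`, the state strings, and the `sysfs_root` parameter (present so the check can be tried against a scratch directory) are all assumptions. Only the two sysfs paths and the "exists, non-empty, no abi_version means warn" rule come from the thread. Treating a missing /sys/class/infiniband as "kernel drivers not loaded" is likewise an assumption.

```python
import os


def classify_rdma_sysfs(sysfs_root="/sys"):
    """Classify the RDMA sysfs state (hypothetical helper, not an Open MPI API)."""
    ib_class = os.path.join(sysfs_root, "class", "infiniband")
    abi_file = os.path.join(sysfs_root, "class", "infiniband_verbs", "abi_version")

    # Assumption: no class directory means the kernel side is not loaded.
    if not os.path.isdir(ib_class):
        return "no-kernel-drivers"
    # Directory exists but is empty, e.g. ib_core loaded with no hardware drivers.
    if not os.listdir(ib_class):
        return "no-devices"
    # Devices are listed but the uverbs ABI file is missing: the case to warn about.
    if not os.path.exists(abi_file):
        return "misconfigured"
    return "ok"


if __name__ == "__main__":
    print(classify_rdma_sysfs())
```

Only the "misconfigured" result corresponds to the case where the thread says a warning is wanted; the other states would stay silent under this reading.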

Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-28 Thread Jeff Squyres
On May 28, 2008, at 8:02 AM, Jeff Squyres wrote: Note that the two /sys checks may be redundant; I'm not entirely sure how the two files relate to each other. libibverbs will complain about the first if it is not present; the second is used to indicate that the kernel drivers are loaded. I

Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-28 Thread Jeff Squyres
Ok. With lots more off-list discussion, how's this pseudocode for a proposal:

    # Main assumption: if the kernel drivers are loaded, the user wants RDMA
    # hardware support in OMPI.
    $sysfsdir = ibv_get_sysfs_path();
    # Avoid printing "Fatal: couldn't read uverbs ABI version" message.

Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-24 Thread Manuel Prinz
On Thursday, 22.05.2008, at 17:18 -0400, Jeff Squyres wrote:
> Could you check with some of your other Debian maintainers?
I'm sorry that I can't check that before Monday! I'll let you know then but I'm not aware of that. Never thought about the impact of my initial request to Jeff. I

Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-23 Thread Roland Dreier
> Either that or udev is not configured properly.
Debian has a correct udev configuration, modulo http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=449081
> ib_core/mthca/mlx4 should be loaded automatically by hotplug if HW is present. No need for any additional configuration.
Yes

Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-23 Thread Roland Dreier
> OFED is one distribution of the OpenFabrics software. It can be bundled up and packaged differently, too. I suspect that Debian does not include OFED directly, because OFED is pretty heavily dependent upon RPM. So the OpenFabrics kernel bits must be there somewhere

Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-23 Thread Dirk Eddelbuettel
On Fri, May 23, 2008 at 09:56:44AM +0300, Gleb Natapov wrote:
> On Thu, May 22, 2008 at 08:30:52PM +, Dirk Eddelbuettel wrote:
> > > > Also, if this test depends on the Debian kernel packages, then we're back to square one as some folks (like myself) run binary kernels, other

Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-23 Thread Gleb Natapov
On Thu, May 22, 2008 at 08:30:52PM +, Dirk Eddelbuettel wrote:
> > > Also, if this test depends on the Debian kernel packages, then we're back to square one as some folks (like myself) run binary kernels, other may just hand-compile and this test may not work as we may miss

Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-23 Thread Gleb Natapov
On Thu, May 22, 2008 at 04:19:05PM -0400, Jeff Squyres wrote:
> On May 22, 2008, at 4:07 PM, Dirk Eddelbuettel wrote:
> > Is there a test I could run for you?
> Can you see if /dev/infiniband exists? If it does, the OpenFabrics kernel drivers are running. If not, they aren't.
Either

Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-22 Thread Jeff Squyres
On May 22, 2008, at 4:30 PM, Dirk Eddelbuettel wrote: Can you see if /dev/infiniband exists? If it does, the OpenFabrics kernel drivers are running. If not, they aren't. Negative -- I have no /dev/infiniband. So his test idea seems feasible which is nice! Good! Do you have any

Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-22 Thread Dirk Eddelbuettel
On Thu, May 22, 2008 at 04:19:05PM -0400, Jeff Squyres wrote:
> On May 22, 2008, at 4:07 PM, Dirk Eddelbuettel wrote:
> > Is there a test I could run for you?
> Can you see if /dev/infiniband exists? If it does, the OpenFabrics kernel drivers are running. If not, they aren't.
Negative

Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-22 Thread Jeff Squyres
On May 22, 2008, at 4:07 PM, Dirk Eddelbuettel wrote: Is there a test I could run for you? Can you see if /dev/infiniband exists? If it does, the OpenFabrics kernel drivers are running. If not, they aren't. Also, if this test depends on the Debian kernel packages, then we're back to
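
The test suggested here amounts to a single directory check. A minimal sketch in Python follows; the function name `openfabrics_device_dir_present` and the `dev_root` parameter are hypothetical, introduced only so the check can be exercised against a scratch directory. Note that elsewhere in the thread it is pointed out that a missing /dev/infiniband can also mean udev is not configured properly, so a negative result is a hint rather than proof.

```python
import os


def openfabrics_device_dir_present(dev_root="/dev"):
    """Return True if <dev_root>/infiniband exists as a directory.

    Hypothetical helper: per the thread, presence suggests the OpenFabrics
    kernel drivers are running; absence suggests they are not (or that
    udev did not create the device nodes).
    """
    return os.path.isdir(os.path.join(dev_root, "infiniband"))


if __name__ == "__main__":
    if openfabrics_device_dir_present():
        print("/dev/infiniband exists")
    else:
        print("no /dev/infiniband")
```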

Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-22 Thread Dirk Eddelbuettel
On Thu, May 22, 2008 at 03:45:36PM -0400, Jeff Squyres wrote:
> On May 22, 2008, at 3:42 PM, Dirk Eddelbuettel wrote:
> >> When you install binary OMPI (which pulls in libibverbs and all the rest), do you set the OpenFabrics kernel drivers to start upon boot? Or does the user have to

Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-22 Thread Jeff Squyres
On May 22, 2008, at 3:42 PM, Dirk Eddelbuettel wrote: When you install binary OMPI (which pulls in libibverbs and all the rest), do you set the OpenFabrics kernel drivers to start upon boot? Or does the user have to do that manually? I think so. To the best of my knowledge, we don't do

Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-22 Thread Dirk Eddelbuettel
On Thu, May 22, 2008 at 03:35:03PM -0400, Jeff Squyres wrote:
> Dirk / Debian guys --
> When you install binary OMPI (which pulls in libibverbs and all the rest), do you set the OpenFabrics kernel drivers to start upon boot? Or does the user have to do that manually?
I think so. To

Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-22 Thread Jeff Squyres
Dirk / Debian guys -- When you install binary OMPI (which pulls in libibverbs and all the rest), do you set the OpenFabrics kernel drivers to start upon boot? Or does the user have to do that manually? I ask because of the check Pasha proposes: if the user has started the OpenFabrics

Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-22 Thread Patrick Geoffray
Brian W. Barrett wrote: With MX, it's one initialization call (mx_init), and it's not clear from the errors it can return that you can differentiate between the two cases. If you run mx_init() on a machine without the MX driver loaded or no NIC detected by the driver, you get a specific error

Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-22 Thread Jeff Squyres
On May 22, 2008, at 11:53 AM, Pavel Shamis (Pasha) wrote:
1. Driver doesn't support the HCA - If I remember correctly, RH40 by default doesn't support the ConnectX HCA. The device_list will be empty. It is a very exotic case.
2. Driver version doesn't correspond with the FW version
3. FW was broken

Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-22 Thread Dirk Eddelbuettel
On Thu, May 22, 2008 at 06:53:38PM +0300, Pavel Shamis (Pasha) wrote:
> If the user decides to upgrade his ompi + libibverbs rpm/deb package install, he will need to do a lot of other "annoying" steps, like: source code download, installing all required *-dev.rpm, compilation.
No --

Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-22 Thread Brian W. Barrett
On Thu, 22 May 2008, Terry Dontje wrote: The major difference here is that libmyriexpress is not being included in mainline Linux distributions. Specifically: if you can find/use libmyriexpress, it's likely because you have that hardware. The same *used* to be true for libibverbs, but is no

Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-22 Thread Terry Dontje
Jeff Squyres wrote: On May 22, 2008, at 6:50 AM, Terry Dontje wrote: Brian and I chatted a bit about this off-list, and I think we're in agreement now:
- do not change the default value or meaning of btl_base_warn_component_unused.
- major point of confusion: the openib BTL is actually

Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-22 Thread Jeff Squyres
On May 22, 2008, at 6:50 AM, Terry Dontje wrote: Brian and I chatted a bit about this off-list, and I think we're in agreement now:
- do not change the default value or meaning of btl_base_warn_component_unused.
- major point of confusion: the openib BTL is actually fairly unique in that it

Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-22 Thread Terry Dontje
Jeff Squyres wrote: Brian and I chatted a bit about this off-list, and I think we're in agreement now:
- do not change the default value or meaning of btl_base_warn_component_unused.
- major point of confusion: the openib BTL is actually fairly unique in that it can (and does) tell the

Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-22 Thread Pavel Shamis (Pasha)
Brian and I chatted a bit about this off-list, and I think we're in agreement now:
- do not change the default value or meaning of btl_base_warn_component_unused.
- major point of confusion: the openib BTL is actually fairly unique in that it can (and does) tell the difference between

Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-21 Thread Jeff Squyres
Brian and I chatted a bit about this off-list, and I think we're in agreement now:
- do not change the default value or meaning of btl_base_warn_component_unused.
- major point of confusion: the openib BTL is actually fairly unique in that it can (and does) tell the difference between

Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-21 Thread Jeff Squyres
On May 21, 2008, at 5:02 PM, Brian W. Barrett wrote: If this is true (for some reason I thought it wasn't), then I think we'd actually be ok with your proposal, but you're right, you'd need something new in the IB btl. I'm not concerned about the dual rail issue -- if you're smart enough

Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-21 Thread Brian W. Barrett
On Wed, 21 May 2008, Jeff Squyres wrote: I'm only concerned about the case where there's an IB card, the user expects the IB card to be used, and the IB card isn't used. Can you put in a site wide btl = ^tcp to avoid the problem? If the IB card fails, then you'll get unreachable MPI
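
For readers unfamiliar with the `^` syntax mentioned above: a leading caret turns the BTL list into an exclusion list, so `btl = ^tcp` means "any BTL except tcp". The thread suggests setting this site-wide; the same selection can be passed per run with `mpirun --mca btl ^tcp`. The sketch below only builds such a command line in Python; the helper name `mpirun_argv` and its parameters are hypothetical and exist purely for illustration.

```python
def mpirun_argv(program, nprocs, excluded_btls):
    """Build an mpirun command line that excludes the given BTL components.

    Hypothetical helper for illustration; it only assembles arguments and
    does not launch anything.
    """
    argv = ["mpirun", "-np", str(nprocs)]
    if excluded_btls:
        # A leading "^" makes the comma-separated list an exclusion list.
        argv += ["--mca", "btl", "^" + ",".join(excluded_btls)]
    argv.append(program)
    return argv


if __name__ == "__main__":
    # prints: mpirun -np 4 --mca btl ^tcp ./a.out
    print(" ".join(mpirun_argv("./a.out", 4, ["tcp"])))
```

With tcp excluded there is no fallback transport, which is the point of the suggestion: a failed IB card produces an error instead of a silent slowdown.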

Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-21 Thread Jeff Squyres
On May 21, 2008, at 12:21 PM, Terry Dontje wrote: So are you proposing to set btl_base_warn_component_unused to 0 or something more BTL specific? Probably something more btl-specific. The libibverbs-being-shipped-in-main-line-Linux-distros issue doesn't really affect other BTLs, so I

Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-21 Thread Brian W. Barrett
On Wed, 21 May 2008, Jeff Squyres wrote: On May 21, 2008, at 3:38 PM, Jeff Squyres wrote: It would be great if libibverbs could return two different error messages - one for "there's no IB card in this machine" and one for "there's an IB card here, but we can't initialize it". I think that

Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-21 Thread Brian W. Barrett
Then we disagree on a core point. I believe that users should never have something silently unexpected happen (like falling back to TCP from a high speed interconnect because of a NIC reset / software issue). You clearly don't feel this way. I don't really work on the project, but do have

Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-21 Thread Jeff Squyres
One thing I should clarify -- the ibverbs error message from my previous mail is a red herring. libibverbs prints that message on systems where the kernel portions of the OFED stack are not installed (such as the quick-n-dirty test that I did before -- all I did was install libibverbs

Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-21 Thread Pavel Shamis (Pasha)
As far as I know, only the OpenIB kernel drivers are installed by default with the distribution. The user-level pieces (libibverbs and other OpenIB stuff) are not installed by default; the user needs to go to the package manager and explicitly select libibverbs. So if the user decided to install libibverbs, he had reasons for

Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-21 Thread Terry Dontje
So are you proposing to set btl_base_warn_component_unused to 0 or something more BTL specific?
--td
Jeff Squyres wrote:
What: Change default in openib BTL to not complain if no OpenFabrics devices are found
Why: Many linuxes are shipping libibverbs these days, but most users still don't

Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-21 Thread Brian W. Barrett
On Wed, 21 May 2008, Jeff Squyres wrote: 2. An out-of-the-box "mpirun a.out" will print warning messages in perfectly valid/good configurations (no verbs-capable hardware, but just happen to have libibverbs installed). This is a Big Deal. Which is easily solved with a better error message,

Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-21 Thread Jeff Squyres
On May 21, 2008, at 11:14 AM, Brian W. Barrett wrote: I think having a parameter to turn off the warning is a great idea. So great in fact, that it already exists in the trunk and v1.2 :)! Setting the default value for the btl_base_warn_component_unused flag from 0 to 1 will have the

Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-21 Thread Brian W. Barrett
And there's a typo in my first paragraph. The flag currently defaults to 1 (print the warning). It should be switched to 0 to turn off the warning. Sorry for any confusion I might have caused -- I blame the lack of caffeine in the morning.
Brian
On Wed, 21 May 2008, Pavel Shamis (Pasha)
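
Per the correction above, a value of 0 turns the warning off. Independent of what the built-in default is, a user can set an MCA parameter for a single run through an environment variable named `OMPI_MCA_<parameter>`. The sketch below builds such an environment in Python; the helper name `mca_environment` and its `base_env` parameter are hypothetical, and nothing is launched.

```python
import os


def mca_environment(params, base_env=None):
    """Return a copy of the environment with MCA parameters set.

    Hypothetical helper: each key in `params` is exported as
    OMPI_MCA_<key>, the environment-variable form of an MCA parameter.
    """
    env = dict(os.environ if base_env is None else base_env)
    for name, value in params.items():
        env["OMPI_MCA_" + name] = str(value)
    return env


if __name__ == "__main__":
    # 0 disables the "component unused" warning, per the message above.
    env = mca_environment({"btl_base_warn_component_unused": 0}, base_env={})
    # prints: {'OMPI_MCA_btl_base_warn_component_unused': '0'}
    print(env)
```

The resulting dictionary could be handed to a process launcher such as `subprocess.run(..., env=env)` when starting `mpirun`.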