> Is it possible that the /sys/class/infiniband directory exists and is
> empty? In which cases?
Do "modprobe ib_core" on a system with no hardware drivers loaded (or no
RDMA hardware installed)
On May 29, 2008, at 3:27 AM, Pavel Shamis (Pasha) wrote:
I got some more feedback from Roland off-list explaining that if
/sys/class/infiniband does exist and is non-empty and
/sys/class/infiniband_verbs/abi_version does not exist, then this is
definitely a case where we want to warn
I got some more feedback from Roland off-list explaining that if
/sys/class/infiniband does exist and is non-empty and
/sys/class/infiniband_verbs/abi_version does not exist, then this is
definitely a case where we want to warn because it implies that config
is screwed up -- RDMA devices
On May 28, 2008, at 8:02 AM, Jeff Squyres wrote:
Note that the two /sys checks may be redundant; I'm not entirely sure
how the two files relate to each other. libibverbs will complain
about the first if it is not present; the second is used to indicate
that the kernel drivers are loaded.
I
Ok. With lots more off-list discussion, how's this pseudocode for a
proposal:
# Main assumption: if the kernel drivers are loaded, the user wants
# RDMA hardware support in OMPI.
$sysfsdir = ibv_get_sysfs_path();
# Avoid printing "Fatal: couldn't read uverbs ABI version" message.
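
For illustration only, here is a minimal Python sketch of the sysfs check Roland described (devices listed under class/infiniband but no class/infiniband_verbs/abi_version). It is not the code that went into OMPI. The function name, the return labels, and the sysfs_root parameter are assumptions made for this sketch; sysfs_root stands in for whatever ibv_get_sysfs_path() would return.

```python
import os


def classify_rdma_sysfs(sysfs_root="/sys"):
    """Classify the RDMA sysfs state under sysfs_root.

    Returns one of three hypothetical labels:
      "no-devices": class/infiniband is missing or empty, so stay quiet
      "warn":       devices are listed but the uverbs ABI file is missing
      "ok":         devices are listed and the uverbs ABI file exists
    """
    ib_dir = os.path.join(sysfs_root, "class", "infiniband")
    abi_file = os.path.join(
        sysfs_root, "class", "infiniband_verbs", "abi_version"
    )

    try:
        has_devices = len(os.listdir(ib_dir)) > 0
    except OSError:
        # Directory missing or unreadable: treat as "no RDMA devices".
        has_devices = False

    if not has_devices:
        return "no-devices"
    if not os.path.exists(abi_file):
        # Devices are present but the userspace verbs interface is not:
        # the case the thread says deserves a warning.
        return "warn"
    return "ok"
```

Calling classify_rdma_sysfs() with no argument inspects /sys on the local machine; passing a different root makes the logic easy to try against a scratch directory.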
On Thursday, 22.05.2008, at 17:18 -0400, Jeff Squyres wrote:
> Could you check with some of your other Debian maintainers?
I'm sorry that I can't check that before Monday! I'll let you know then
but I'm not aware of that.
Never thought about the impact of my initial request to Jeff. I
> Either that or udev is not configured properly.
Debian has a correct udev configuration, modulo
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=449081
> ib_core/mthca/mlx4 should be loaded automatically by hotplug if HW is
> present. No need for any additional configuration.
Yes
> OFED is one distribution of the OpenFabrics software. It can be
> bundled up and packaged differently, too. I suspect that Debian does
> not include OFED directly, because OFED is pretty heavily dependent
> upon RPM. So the OpenFabrics kernel bits must be there somewhere
>
On Fri, May 23, 2008 at 09:56:44AM +0300, Gleb Natapov wrote:
> On Thu, May 22, 2008 at 08:30:52PM +, Dirk Eddelbuettel wrote:
> > > > Also, if this test depends on the Debian kernel packages, then we're
> > > > back to square one as some folks (like myself) run binary kernels,
> > > > other
On Thu, May 22, 2008 at 08:30:52PM +, Dirk Eddelbuettel wrote:
> > > Also, if this test depends on the Debian kernel packages, then we're
> > > back to square one as some folks (like myself) run binary kernels,
> > > other may just hand-compile and this test may not work as we may miss
> > >
On Thu, May 22, 2008 at 04:19:05PM -0400, Jeff Squyres wrote:
> On May 22, 2008, at 4:07 PM, Dirk Eddelbuettel wrote:
>
> > Is there a test I could run for you?
>
> Can you see if /dev/infiniband exists? If it does, the OpenFabrics
> kernel drivers are running. If not, they aren't.
Either
On May 22, 2008, at 4:30 PM, Dirk Eddelbuettel wrote:
Can you see if /dev/infiniband exists? If it does, the OpenFabrics
kernel drivers are running. If not, they aren't.
Negative -- I have no /dev/infiniband. So his test idea seems
feasible which is nice!
Good!
Do you have any
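
As a rough illustration, the /dev/infiniband test discussed above could be sketched in Python as follows. This only encodes the heuristic from the thread (directory present means the OpenFabrics kernel drivers are running); the function name and the dev_root parameter are hypothetical and exist only to make the sketch easy to try.

```python
import os


def openfabrics_drivers_running(dev_root="/dev"):
    """Heuristic from the thread: True if <dev_root>/infiniband exists.

    dev_root is a hypothetical parameter for this sketch; on a real
    system it would simply be "/dev".
    """
    return os.path.isdir(os.path.join(dev_root, "infiniband"))
```

On a machine like Dirk's, with no /dev/infiniband, the function returns False.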
On Thu, May 22, 2008 at 04:19:05PM -0400, Jeff Squyres wrote:
> On May 22, 2008, at 4:07 PM, Dirk Eddelbuettel wrote:
>
> > Is there a test I could run for you?
>
> Can you see if /dev/infiniband exists? If it does, the OpenFabrics
> kernel drivers are running. If not, they aren't.
Negative
On May 22, 2008, at 4:07 PM, Dirk Eddelbuettel wrote:
Is there a test I could run for you?
Can you see if /dev/infiniband exists? If it does, the OpenFabrics
kernel drivers are running. If not, they aren't.
Also, if this test depends on the Debian kernel packages, then we're
back to
On Thu, May 22, 2008 at 03:45:36PM -0400, Jeff Squyres wrote:
> On May 22, 2008, at 3:42 PM, Dirk Eddelbuettel wrote:
>
> >> When you install binary OMPI (which pulls in libibverbs and all the
> >> rest), do you set the OpenFabrics kernel drivers to start upon boot?
> >> Or does the user have to
On May 22, 2008, at 3:42 PM, Dirk Eddelbuettel wrote:
When you install binary OMPI (which pulls in libibverbs and all the
rest), do you set the OpenFabrics kernel drivers to start upon boot?
Or does the user have to do that manually?
I think so. To the best of my knowledge, we don't do
On Thu, May 22, 2008 at 03:35:03PM -0400, Jeff Squyres wrote:
> Dirk / Debian guys --
>
> When you install binary OMPI (which pulls in libibverbs and all the
> rest), do you set the OpenFabrics kernel drivers to start upon boot?
> Or does the user have to do that manually?
I think so. To
Dirk / Debian guys --
When you install binary OMPI (which pulls in libibverbs and all the
rest), do you set the OpenFabrics kernel drivers to start upon boot?
Or does the user have to do that manually?
I ask because of the check Pasha proposes: if the user has started the
OpenFabrics
Brian W. Barrett wrote:
With MX, it's one initialization call (mx_init), and it's not clear from
the errors it can return that you can differentiate between the two cases.
If you run mx_init() on a machine without the MX driver loaded or no NIC
detected by the driver, you get a specific error
On May 22, 2008, at 11:53 AM, Pavel Shamis (Pasha) wrote:
1. Driver doesn't support the HCA - If I remember correctly, RH40 by
default doesn't support the ConnectX HCA. The device_list will be
empty. It is a very exotic case.
2. Driver version doesn't correspond with fw version
3. FW was broken
On Thu, May 22, 2008 at 06:53:38PM +0300, Pavel Shamis (Pasha) wrote:
> If a user decides to upgrade his ompi + libibverbs rpm/deb package
> install, he will need to do a lot of other "annoying" steps, like:
> source code download, installing all required *-dev.rpm, compilation.
No --
On Thu, 22 May 2008, Terry Dontje wrote:
The major difference here is that libmyriexpress is not being included
in mainline Linux distributions. Specifically: if you can find/use
libmyriexpress, it's likely because you have that hardware. The same
*used* to be true for libibverbs, but is no
Jeff Squyres wrote:
On May 22, 2008, at 6:50 AM, Terry Dontje wrote:
Brian and I chatted a bit about this off-list, and I think we're in
agreement now:
- do not change the default value or meaning of
btl_base_warn_component_unused.
- major point of confusion: the openib BTL is actually
On May 22, 2008, at 6:50 AM, Terry Dontje wrote:
Brian and I chatted a bit about this off-list, and I think we're in
agreement now:
- do not change the default value or meaning of
btl_base_warn_component_unused.
- major point of confusion: the openib BTL is actually fairly unique
in that it
Jeff Squyres wrote:
Brian and I chatted a bit about this off-list, and I think we're in
agreement now:
- do not change the default value or meaning of
btl_base_warn_component_unused.
- major point of confusion: the openib BTL is actually fairly unique
in that it can (and does) tell the
Brian and I chatted a bit about this off-list, and I think we're in
agreement now:
- do not change the default value or meaning of
btl_base_warn_component_unused.
- major point of confusion: the openib BTL is actually fairly unique
in that it can (and does) tell the difference between
On May 21, 2008, at 5:02 PM, Brian W. Barrett wrote:
If this is true (for some reason I thought it wasn't), then I think
we'd actually be ok with your proposal, but you're right, you'd need
something new in the IB btl. I'm not concerned about the dual rail
issue -- if you're smart enough
On Wed, 21 May 2008, Jeff Squyres wrote:
I'm only concerned about the case where there's an IB card, the user
expects the IB card to be used, and the IB card isn't used.
Can you put in a site wide
btl = ^tcp
to avoid the problem? If the IB card fails, then you'll get
unreachable MPI
On May 21, 2008, at 12:21 PM, Terry Dontje wrote:
So are you proposing to set btl_base_warn_component_unused to 0 or
something more BTL specific?
Probably something more btl-specific. The
libibverbs-being-shipped-in-main-line-Linux-distros issue doesn't
really affect other BTLs, so I
On Wed, 21 May 2008, Jeff Squyres wrote:
On May 21, 2008, at 3:38 PM, Jeff Squyres wrote:
It would be great if libibverbs could return two different error
messages - one for "there's no IB card in this machine" and one for
"there's an IB card here, but we can't initialize it". I think that
Then we disagree on a core point. I believe that users should never have
something silently unexpected happen (like falling back to TCP from a high
speed interconnect because of a NIC reset / software issue). You clearly
don't feel this way. I don't really work on the project, but do have
One thing I should clarify -- the ibverbs error message from my
previous mail is a red herring. libibverbs prints that message on
systems where the kernel portions of the OFED stack are not installed
(such as the quick-n-dirty test that I did before -- all I did was
install libibverbs
As far as I know, only the OpenIB kernel drivers are installed by
default with the distribution.
But the user-level pieces (libibverbs and other openib stuff) are not
installed by default. The user needs to go to the package manager and
explicitly select libibverbs. So if the user decided to install
libibverbs, he had reasons for
So are you proposing to set btl_base_warn_component_unused to 0 or
something more BTL specific?
--td
Jeff Squyres wrote:
What: Change default in openib BTL to not complain if no OpenFabrics
devices are found
Why: Many linuxes are shipping libibverbs these days, but most users
still don't
On Wed, 21 May 2008, Jeff Squyres wrote:
2. An out-of-the-box "mpirun a.out" will print warning messages in
perfectly valid/good configurations (no verbs-capable hardware, but
just happen to have libibverbs installed). This is a Big Deal.
Which is easily solved with a better error message,
On May 21, 2008, at 11:14 AM, Brian W. Barrett wrote:
I think having a parameter to turn off the warning is a great idea.
So great in fact, that it already exists in the trunk and v1.2 :)!
Setting the default value for the btl_base_warn_component_unused flag
from 0 to 1 will have the
And there's a typo in my first paragraph. The flag currently defaults to
1 (print the warning). It should be switched to 0 to turn off the
warning. Sorry for any confusion I might have caused -- I blame the lack
of caffeine in the morning.
Brian
On Wed, 21 May 2008, Pavel Shamis (Pasha)