[Lustre-discuss] lo2iblnd and Mellanox IB question

2012-11-20 Thread Ms. Megan Larko
Hello to Everyone!

I have a question to which I think I know the answer, but I am seeking
confirmation (re-assurance?).

I have built a RHEL 6.2 system with Lustre 2.1.2. I am using the
RPMs from the Whamcloud site for Linux kernel
2.6.32-220.17.1.el6_lustre.x86_64, along with the version-matching
lustre, lustre-modules, lustre-ldiskfs, and kernel-devel packages. I
also have, from the Whamcloud site,
kernel-ib-1.8.5-2.6.32-220.17.1.el6_lustre.x86_64 and the related
kernel-ib-devel package for the same kernel.

The Lustre file system works properly over TCP.

I would like to use InfiniBand. The system has a new Mellanox card,
for which mlxn1 firmware and drivers were installed. Since then (I
cannot speak to before), the IB network comes up on boot, and I can
copy files and ping over it as on a traditional IP network.

Hard part: I would like to run the Lustre file system over IB (ib0).
I re-created the Lustre network by changing /etc/modprobe.d/lustre.conf
to point to o2ib in place of tcp0. I rebuilt the MGS/MDT and all
OSTs to use the IB network (the MGS/MDS with --failnode=[new_IB_addr],
and the OSTs pointing to the MGS on the IB net). When I run modprobe
lustre to start the system, I get Input/Output errors on the Lustre
modules fld.ko, fid.ko, mdc.ko, osc.ko, and lov.ko, and lustre.ko
cannot be loaded. /var/log/messages shows many "Unknown symbol" and
"disagrees about version of symbol" messages from the ko2iblnd module.
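A small Python sketch that pulls those loader complaints out of the log and groups them by module can make it easier to see exactly which IB symbols ko2iblnd fails to resolve. It assumes the kernel prints them as `<module>: Unknown symbol <name>` and `<module>: disagrees about version of symbol <name>` (the wording quoted above); the function name `summarize_symbol_errors` is hypothetical.

```python
import re
from collections import defaultdict

# Matches the two kernel module-loader complaints, e.g.
#   "ko2iblnd: Unknown symbol ib_create_cq"
#   "ko2iblnd: disagrees about version of symbol ib_create_cq"
# Anything after the symbol name (such as an "(err -22)" suffix
# printed by some kernels) is ignored.
_PATTERN = re.compile(
    r"(?P<module>[\w-]+): "
    r"(?P<kind>Unknown symbol|disagrees about version of symbol) "
    r"(?P<symbol>\S+)"
)


def summarize_symbol_errors(lines):
    """Group loader errors by module, then by kind ("unknown" or "version")."""
    summary = defaultdict(lambda: defaultdict(set))
    for line in lines:
        match = _PATTERN.search(line)
        if match:
            kind = "unknown" if match["kind"].startswith("Unknown") else "version"
            summary[match["module"]][kind].add(match["symbol"])
    # Convert to plain dicts with sorted symbol lists for stable output.
    return {
        module: {kind: sorted(symbols) for kind, symbols in kinds.items()}
        for module, kinds in summary.items()
    }
```

For example, `summarize_symbol_errors(open("/var/log/messages", errors="replace"))` returns a dict keyed by module name; a long "version" list for ko2iblnd points at a CRC mismatch between ko2iblnd and the IB modules actually loaded, rather than at missing modules.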

Running modprobe --dump-modversions /path/to/kernel/ko2iblnd.ko shows
it pointing to the Module.symvers of the Lustre kernel.
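To see which IB stack ko2iblnd.ko was actually built against, one rough approach is to compare the symbol CRCs recorded in the module (the `modprobe --dump-modversions` output) against each candidate Module.symvers, e.g. the Lustre kernel's and /usr/src/ofa_kernel/Module.symvers. The sketch below assumes both inputs consist of lines starting with `0x<crc>` followed by the symbol name (extra Module.symvers columns are ignored); all function names are hypothetical.

```python
def parse_crcs(text):
    """Parse "0xCRC<whitespace>symbol ..." lines into {symbol: crc}.

    Works for `modprobe --dump-modversions` output (two columns) and for
    Module.symvers (the module path and export-type columns are ignored).
    """
    crcs = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].startswith("0x"):
            crcs[fields[1]] = int(fields[0], 16)
    return crcs


def compare_to_symvers(module_crcs, symvers_crcs):
    """Classify each symbol the module expects against one Module.symvers."""
    result = {"match": [], "mismatch": [], "missing": []}
    for symbol, crc in sorted(module_crcs.items()):
        if symbol not in symvers_crcs:
            result["missing"].append(symbol)
        elif symvers_crcs[symbol] == crc:
            result["match"].append(symbol)
        else:
            result["mismatch"].append(symbol)
    return result


def best_match(module_crcs, trees):
    """Return the name of the symvers tree with the most matching CRCs."""
    return max(
        trees,
        key=lambda name: len(compare_to_symvers(module_crcs, trees[name])["match"]),
    )
```

With the captured `--dump-modversions` text and the two Module.symvers files as input, IB symbols (ib_*, rdma_*) that match the kernel tree but mismatch the ofa_kernel tree would be consistent with ko2iblnd having been built against kernel-ib while the Mellanox modules are the ones loaded.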

Am I correct in thinking that, because of the specific Mellanox IB
hardware I have (with its own /usr/src/ofa_kernel/Module.symvers
file), I have to build Lustre 2.1.2 from the tarball using
configure --with-o2ib=/usr/src/ofa_kernel, so that this system uses
the ofa_kernel-1.8.5 modules and not the OFED 1.8.5 from the kernel-ib
RPMs, which Lustre uses by default with this kernel?
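If a rebuild is needed, the important detail is that `--with-o2ib` names the OFED tree whose Module.symvers matches the IB modules actually loaded. A minimal sketch that assembles the configure command line and refuses to proceed when that file is missing; `--with-linux` and `--with-o2ib` are Lustre configure options, while the helper name `lustre_configure_argv` and any kernel-devel path you pass it are assumptions.

```python
import os


def lustre_configure_argv(kernel_src, ofed_src, check=True):
    """Assemble ./configure arguments for building Lustre against an OFED tree.

    kernel_src: kernel-devel tree of the running Lustre kernel (assumed path).
    ofed_src:   OFED source tree, e.g. /usr/src/ofa_kernel, whose
                Module.symvers should match the IB modules actually loaded.
    """
    symvers = os.path.join(ofed_src, "Module.symvers")
    if check and not os.path.isfile(symvers):
        # Without this file modpost cannot see the OFED symbol versions, so
        # the built ko2iblnd.ko would likely not carry matching CRCs.
        raise FileNotFoundError(f"{symvers} not found")
    return [
        "./configure",
        f"--with-linux={kernel_src}",
        f"--with-o2ib={ofed_src}",
    ]
```

The returned list could then be run from the unpacked Lustre source directory with `subprocess.run(argv, cwd=..., check=True)` before building as usual.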

Is a rebuild of Lustre from source mandatory, or is there a way to
point ko2iblnd.ko at the symbols it needs?

Enjoy the Thanksgiving holiday, U.S. readers. To everyone else in the
world, have a great weekend!

Megan Larko
Hewlett-Packard
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] lo2iblnd and Mellanox IB question

2012-11-20 Thread Colin Faber
Hi Megan,

One thing to check is whether the existing (stock) IB drivers are also
installed on your system; they will conflict with the MLX ones. I'm not
sure how Intel is building against IB these days, but if they're using
stock and you're trying to use MLX, you're going to run into these
symbol errors. If that's the case, recompiling against the correct
driver set is the fix.
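One way to act on this check is to look for IB modules installed more than once under the running kernel's module directory, for example a stock ib_core.ko alongside one from the Mellanox or kernel-ib packages. A sketch, assuming the modules live under /lib/modules/<release>; the module list and function names are illustrative assumptions, not an exhaustive inventory.

```python
import os
from collections import defaultdict

# Core IB / Mellanox modules that both the stock kernel and an add-on
# OFED stack may provide (illustrative, not exhaustive).
IB_MODULES = ("ib_core.ko", "ib_cm.ko", "rdma_cm.ko", "mlx4_core.ko", "mlx4_ib.ko")


def find_module_copies(modules_root, names=IB_MODULES):
    """Map each module file name to every path under modules_root holding it."""
    copies = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(modules_root):
        for filename in filenames:
            if filename in names:
                copies[filename].append(os.path.join(dirpath, filename))
    return {name: sorted(paths) for name, paths in copies.items()}


def conflicting(copies):
    """Keep only modules that are installed in more than one place."""
    return {name: paths for name, paths in copies.items() if len(paths) > 1}
```

For example, `conflicting(find_module_copies(f"/lib/modules/{os.uname().release}"))` lists the duplicated modules. Which copy modprobe actually loads depends on depmod's search order, so duplicates are a hint that the loaded IB modules may not be the ones ko2iblnd was built against.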

-cf

On 11/20/2012 02:20 PM, Ms. Megan Larko wrote:
