Re: [Lustre-discuss] lo2iblnd and Mellanox IB question

2012-11-26 Thread Jerome, Ron
I've had to rebuild against the Mellanox OFED every time I change Lustre or 
OFED versions.  It's a bit of a catch-22 because you have to build the 
Mellanox OFED against the Lustre kernel, install the Mellanox OFED, then 
rebuild the Lustre modules against the Mellanox OFED.  The procedure I use is 
as follows...

* install upgraded Lustre kernel and kernel-devel rpms
* rebuild Mellanox OFED against Lustre kernel
    - mount -o loop MLNX_OFED.iso /root/mnt
    - /root/mnt/docs/mlnx_add_kernel_support.sh -i /root/MLNX_OFED.iso
* install Mellanox OFED from rebuilt MLNX_OFED.iso
* install kernel-ib-devel from rebuilt MLNX_OFED.iso

Now rebuild the lustre-modules RPM to get a ko2iblnd.ko that is compatible 
with the Mellanox kernel-ib drivers...

* cd /usr/src/lustre-x.x.x
* ./configure --with-o2ib=/usr/src/openib
* make rpms
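
The steps above could be collected into a small dry-run helper that only 
prints the commands, so the order can be reviewed before anything is run.  
This is a sketch, not a tested installer: plan_rebuild and its three 
arguments are hypothetical names, the default paths are the ones used in the 
steps above, and the two OFED install steps are left as a comment because 
the exact install command is not given here.

```shell
#!/bin/sh
# Print (do not run) the rebuild steps in the order described above.
# Arguments (all optional, names are illustrative only):
#   $1  path to the Mellanox OFED ISO
#   $2  mount point for the ISO
#   $3  Lustre source tree
plan_rebuild() {
    pr_iso=${1:-/root/MLNX_OFED.iso}
    pr_mnt=${2:-/root/mnt}
    pr_src=${3:-/usr/src/lustre-x.x.x}
    cat <<EOF
mount -o loop $pr_iso $pr_mnt
$pr_mnt/docs/mlnx_add_kernel_support.sh -i $pr_iso
# install Mellanox OFED and kernel-ib-devel from the rebuilt ISO here
cd $pr_src
./configure --with-o2ib=/usr/src/openib
make rpms
EOF
}

# With no arguments this prints the plan using the default paths.
plan_rebuild "$@"
```

The first two steps only make sense once the Lustre kernel and kernel-devel 
rpms are already installed, which is why they are not part of the printed 
plan.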


Ron. 
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] lo2iblnd and Mellanox IB question

2012-11-21 Thread Jeff Johnson
Megan,

You will have to rebuild Lustre from source. Furthermore, you will have 
to have the Mellanox IB driver source installed so the Lustre build 
process can grab the necessary bits from the Mellanox source.

The issue you are seeing is exactly what you think it is. The WC builds 
use the RHEL in-kernel IB driver. I have even had issues with MDS/OSS 
boxes running RHEL in-kernel IB and clients running Mellanox or OFED IB 
drivers. Even though IB is a standard, you really need to have 
everything, from core to edge, talking the same driver.

I recently did nearly the same config you have: RHEL 6.2 x86_64, MLX 
OFED, Lustre 2.1.3.

You could opt to run your Mellanox IB HCA using the RHEL in-kernel IB 
drivers and not have to recompile anything.

--Jeff



-- 
--
Jeff Johnson
Co-Founder
Aeon Computing




[Lustre-discuss] lo2iblnd and Mellanox IB question

2012-11-21 Thread Ms. Megan Larko
Thanks, especially to Colin and to Jeff.

Yup.  I suspected that I would have to rebuild the Lustre 2.1.2 I have
to make use of the Mellanox IB.   Colin,  I appreciate the check; I
did not have conflicting IB drivers.  Jeff, I will heed your advice
and I will start my rebuild after the (U.S.) holiday weekend.

An enjoyable weekend to one and all!
megan


[Lustre-discuss] lo2iblnd and Mellanox IB question

2012-11-20 Thread Ms. Megan Larko
Hello to Everyone!

I have a question to which I think I know the answer, but I am seeking
confirmation (re-assurance?).

I have built a RHEL 6.2 system with lustre-2.1.2.   I am using the
rpms from the Whamcloud site for linux kernel
2.6.32-220.17.1.el6_lustre.x86_64 along with the version-matching
lustre, lustre-modules, lustre-ldiskfs, and kernel-devel.   I also
have from the Whamcloud site
kernel-ib-1.8.5-2.6.32-220.17.1.el6_lustre.x86_64 and the related
kernel-ib-devel for same.

The lustre file system works properly for TCP.

I would like to use InfiniBand.   The system has a new Mellanox card
for which mlxn1 firmware and drivers were installed.   After this was
done (I cannot speak to before) the IB network will come up on boot
and copy and ping in a traditional network fashion.

Hard Part:  I would like to run the lustre file system on the IB (ib0).
I re-created the lustre network to use /etc/modprobe.d/lustre.conf
pointing to o2ib in place of tcp0.   I rebuilt the mgs/mdt and all
osts to use the IB network (the mgs/mds --failnode=[new_IB_addr] and
the osts point to mgs on IB net).   When I modprobe lustre to start
the system I receive error messages stating that there are
Input/Output errors on lustre modules fld.ko, fid.ko, mdc.ko, osc.ko
and lov.ko.   The lustre.ko cannot be started.   A look in
/var/log/messages reveals many "Unknown symbol" and "disagrees about
version of symbol" messages from the ko2iblnd module.
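
To see which symbols are affected, the log lines can be reduced to a sorted 
list of symbol names.  The following is a minimal sketch: bad_symbols is a 
hypothetical helper name, and it assumes the kernel writes these lines as 
"ko2iblnd: Unknown symbol NAME" and "ko2iblnd: disagrees about version of 
symbol NAME", so the patterns may need adjusting for other kernels.

```shell
#!/bin/sh
# bad_symbols [LOGFILE]
# Print the distinct symbol names that ko2iblnd could not resolve or whose
# version did not match, one per line. LOGFILE defaults to /var/log/messages.
bad_symbols() {
    bs_log=${1:-/var/log/messages}
    # Each sed expression keeps only the symbol name from a matching line;
    # lines that match neither pattern are not printed.
    sed -n \
        -e 's/.*ko2iblnd: Unknown symbol \([A-Za-z0-9_]*\).*/\1/p' \
        -e 's/.*ko2iblnd: disagrees about version of symbol \([A-Za-z0-9_]*\).*/\1/p' \
        "$bs_log" | sort -u
}
```

If every name in the list belongs to the InfiniBand stack (ib_*, rdma_*), 
that points at ko2iblnd having been built against a different IB driver set 
than the one currently loaded.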

A modprobe --dump-modversions /path/to/kernel/ko2iblnd.ko  shows it
pointing to the Module.symvers of the lustre kernel.
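
One way to make that comparison explicit is to check the CRCs recorded in 
the module against a given Module.symvers file.  This is a sketch under 
assumptions: check_symvers is a hypothetical helper name, the dump is 
assumed to have one "CRC<TAB>symbol" pair per line, and Module.symvers is 
assumed to use the usual "CRC<TAB>symbol<TAB>module<TAB>export" layout.

```shell
#!/bin/sh
# check_symvers MODVERSIONS SYMVERS
#   MODVERSIONS  saved output of "modprobe --dump-modversions <module>.ko"
#   SYMVERS      a Module.symvers file to compare against
# Prints one MISMATCH line per symbol whose CRC differs and returns 1 if
# any differ. Symbols that do not appear in SYMVERS are skipped, since they
# may be exported by the base kernel rather than by the IB stack.
check_symvers() {
    awk '
        # First file (SYMVERS): remember the expected CRC for each symbol.
        FILENAME == ARGV[1] { want[$2] = tolower($1); next }
        # Second file (MODVERSIONS): report symbols whose CRC differs.
        ($2 in want) && tolower($1) != want[$2] {
            printf "MISMATCH %s module=%s symvers=%s\n", $2, $1, want[$2]
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$2" "$1"
}
```

A possible use, with the paths from this message: save the output of 
modprobe --dump-modversions /path/to/kernel/ko2iblnd.ko to a file, then run 
check_symvers on that file and /usr/src/ofa_kernel/Module.symvers.  Any 
MISMATCH lines would mean the module was built against a different IB stack.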

Am I correct in thinking that, because of the specific Mellanox IB
hardware I have (with its own /usr/src/ofa_kernel/Module.symvers
file), I have to build Lustre-2.1.2 from tarball using
configure --with-o2ib=/usr/src/ofa_kernel, so that this
system uses the ofa_kernel-1.8.5 modules and not the OFED 1.8.5 from
the kernel-ib rpms to which Lustre defaults in the Linux kernel?

Is a rebuild of lustre from source mandatory, or is there a way in
which I may point to the appropriate symbols needed by
ko2iblnd.ko?

Enjoy the Thanksgiving holiday for those U.S. readers.  To everyone
else in the world, have a great weekend!

Megan Larko
Hewlett-Packard


Re: [Lustre-discuss] lo2iblnd and Mellanox IB question

2012-11-20 Thread Colin Faber
Hi Megan,

One thing to check is whether the existing IB drivers are installed on your 
system. They will conflict with the MLX ones. Not sure how Intel is 
building against IB these days, but if they're using stock and you're 
trying to use MLX, you're going to run into these symbol errors. If 
that's the case, then recompiling against the correct driver set is the fix 
here.

-cf

