[Lustre-discuss] problem with installing lustre and OFED

2013-01-02 Thread Ms. Megan Larko
Greetings Jason,

As you have most likely discovered, lustre has to be built against the
Mellanox (MLNX) drivers on the lustre linux kernel to use InfiniBand
with them.

I worked on such an issue recently.  The Whamcloud linux kernel
2.1.2-2.6.32_220.17.1.el6_lustre would not work optimally with our
Mellanox InfiniBand (IB) drivers.  We got MLNX version 1.8.5 to match
our Mellanox hardware and had to do the dance already described to you
on this list:
1.   download all of the appropriate (Whamcloud) lustre linux kernel,
headers and devel rpms
2.   boot into the lustre kernel
3.   in our /usr/src/lustre-2.1.2 directory, build lustre against the
Mellanox Module.symvers information.  (Lustre modules that were not
built against it are why you see the Input/Output errors on fid.ko,
mdc.ko, osc.ko, lov.ko and, because those fail to load, on lustre.ko.)
The MLNX version 1.8.5 that we needed was in the /usr/src/ofa_kernel
directory (with the Module.symvers etc.).  We used the defaults other
than o2ib, so our command in the /usr/src/lustre-2.1.2 directory
looked like
./configure --with-o2ib=/usr/src/ofa_kernel
4.   next, run make
5.   next, as we chose to, run make rpms so that we could have rpms
for our system for cluster re-building
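
Steps 3 to 5 above can be sketched as a short shell script.  This is a
minimal sketch under stated assumptions, not the exact commands we ran:
the two directories are the ones named above, while the DRY_RUN switch
and the run helper are hypothetical additions so the sequence can be
reviewed before anything is executed.

```shell
#!/bin/sh
# Sketch: build Lustre against an external OFED tree.
# Paths are the ones named in this message; adjust them for your site.
LUSTRE_SRC=${LUSTRE_SRC:-/usr/src/lustre-2.1.2}
OFED_SRC=${OFED_SRC:-/usr/src/ofa_kernel}
DRY_RUN=${DRY_RUN:-1}   # hypothetical switch: 1 = only print the commands

# run: print the command, and execute it only when DRY_RUN=0
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

build_lustre() {
    run cd "$LUSTRE_SRC" || return 1
    run ./configure --with-o2ib="$OFED_SRC" || return 1
    run make || return 1
    run make rpms
}

build_lustre
```

With the default DRY_RUN=1 the script only prints the four commands;
setting DRY_RUN=0 runs them in order and stops at the first failure.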

We had to do this for *both* our lustre servers and lustre clients
(using the lustre-client Whamcloud kernel, headers, ...).  After that
we had the servers and the clients communicating properly over the
MLNX IB fabric.

In /etc/modprobe.d  we used a lustre.conf file to explicitly direct
the system to use the o2ib network when starting lustre at boot.
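
A minimal sketch of what such a file can look like.  The
"options lnet networks=" line is the usual LNet module-option syntax,
but the network name o2ib0 and the interface ib0 are placeholder
assumptions, not our actual values, and the script writes to a scratch
path (a hypothetical CONF variable) rather than directly into
/etc/modprobe.d.

```shell
#!/bin/sh
# Write an example lustre.conf that points LNet at the o2ib network.
# ASSUMPTIONS: network name "o2ib0" and IB interface "ib0" are placeholders.
CONF=${CONF:-${TMPDIR:-/tmp}/lustre.conf.example}
LNET_NET=${LNET_NET:-o2ib0}
IB_IF=${IB_IF:-ib0}

write_lustre_conf() {
    printf 'options lnet networks=%s(%s)\n' "$LNET_NET" "$IB_IF" > "$CONF"
}

write_lustre_conf
cat "$CONF"
```

Once reviewed, the generated file would be copied to
/etc/modprobe.d/lustre.conf on each server and client.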

Without the above actions the ko2iblnd would not load.

Just confirming that you need to build against Mellanox on both servers
and clients to use MLNX IB with the Lustre cluster file system.

megan
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] problem with installing lustre and ofed

2012-12-31 Thread Brian J. Murrell
On Fri, 2012-12-28 at 15:54 -0800, Jason Brooks wrote:
> Hello,

Hi,

> I am having trouble installing the server modules for  lustre 2.1.4
> and use mellanox's OFED distribution

Is there a particular need for the Mellanox OFED distribution?  The
Red Hat EL 6 kernel comes stock with the infiniband drivers and stack
already baked in and we leverage that and build our Lustre modules RPM
against it.

So unless there is something particular that you need that is only in
the Mellanox OFED distribution and is not already in EL6's kernels, you
should be able to just use the binary kernel and lustre-modules RPMs
that we supply and have working infiniband support.

Cheers,
b.





Re: [Lustre-discuss] problem with installing lustre and ofed

2012-12-31 Thread Michael Shuey
Red Hat's OFED tends to lag Mellanox's.  They're pretty current on
bugfixes, but support for the latest hardware is usually 3-6 months
behind - it took about 4 months to bring in drivers for our most
recent FDR system.  Also, support for Mellanox's advanced features
(e.g., MXM, FCA) is often missing.

--
Mike Shuey




[Lustre-discuss] problem with installing lustre and ofed

2012-12-28 Thread Jason Brooks
Hello,

I am having trouble installing the server modules for lustre 2.1.4 together
with mellanox's OFED distribution so we may use infiniband.  Would you folks
look at my procedure and results below and let me know what you think?  Thanks
very much!

The mellanox ofed installation builds and installs some kernel modules too, so 
I used this method to ensure OFED compiled against the correct kernel.  This is 
on centos 6.3.

 1.  download all lustre rpms from whamcloud
 2.  install kernel, kernel-firmware, kernel-headers, and kernel-devel
     *   in this case, it's the rpm files with
         2.6.32-279.14.1.el6_lustre.x86_64 in their name
 3.  reboot into this lustre kernel
 4.  install the remaining rpms
 5.  download ofed from mellanox:
     MLNX_OFED_LINUX-1.5.3-3.1.0-rhel6.3-x86_64.iso
     *   build mellanox ofed bits using the lustre kernel and kernel-devel info
     *   install mellanox ofed
 6.  reboot
 7.  upon reboot, if I do NOT have o2ib3 in my lnet networks parameters, I can
     modprobe lnet and lustre.
 8.  if I DO have o2ib3 present in the lnet parameters, running modprobe lustre
     gets me:
ib/modules/2.6.32-279.14.1.el6_lustre.x86_64/updates/kernel/fs/lustre/fld.ko): Input/output error
WARNING: Error inserting fid (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/updates/kernel/fs/lustre/fid.ko): Input/output error
WARNING: Error inserting mdc (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/updates/kernel/fs/lustre/mdc.ko): Input/output error
WARNING: Error inserting osc (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/updates/kernel/fs/lustre/osc.ko): Input/output error
WARNING: Error inserting lov (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/updates/kernel/fs/lustre/lov.ko): Input/output error
FATAL: Error inserting lustre (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/updates/kernel/fs/lustre/lustre.ko): Input/output error


dmesg shows:
ko2iblnd: disagrees about version of symbol ib_fmr_pool_unmap
ko2iblnd: Unknown symbol ib_fmr_pool_unmap
ko2iblnd: disagrees about version of symbol ib_create_cq
ko2iblnd: Unknown symbol ib_create_cq
…
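
A "disagrees about version of symbol" message like these can be
checked by comparing the symbol CRC that the module expects with the
CRC that the installed OFED exports.  A minimal sketch, assuming both
inputs are Module.symvers-style text files (CRC in column 1, symbol
name in column 2); the function names are hypothetical.

```shell
#!/bin/sh
# symbol_crc FILE SYMBOL: print the CRC recorded for SYMBOL in FILE.
# FILE is expected to have the CRC in column 1 and the symbol in column 2,
# as in a kernel Module.symvers file.
symbol_crc() {
    awk -v sym="$2" '$2 == sym { print $1; exit }' "$1"
}

# compare_symbol WANTED PROVIDED SYMBOL: report whether both files agree.
compare_symbol() {
    want=$(symbol_crc "$1" "$3")
    have=$(symbol_crc "$2" "$3")
    if [ -n "$want" ] && [ "$want" = "$have" ]; then
        echo "$3: match ($want)"
    else
        echo "$3: MISMATCH (module wants '$want', OFED provides '$have')"
        return 1
    fi
}
```

The first file can be produced with "modprobe --dump-modversions" run
on the ko2iblnd.ko file, and the second is the Module.symvers from the
OFED build tree; compare_symbol is then called once per symbol named in
dmesg, for example ib_create_cq.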







Re: [Lustre-discuss] problem with installing lustre and ofed

2012-12-28 Thread Jeff Johnson
Jason,

The prebuilt server-side Lustre packages from Whamcloud are built 
against RHEL/CentOS kernel sources with kernel-ib active in them. This 
means that any of the Lustre prebuilt server packages are already tied 
to RHEL's kernel-ib.

To accomplish your stated goal you'll have to start with a stock,
non-Whamcloud kernel (plus headers, devel, etc). Then compile/install
the OFED version of your choice. Once you have that you can build Lustre
from source where it will compile against OFED and the installed kernel.

--Jeff

---
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x101   f: 858-412-3845

4170 Morena Boulevard, Suite D - San Diego, CA 92117

/* Follow us on Twitter - @AeonComputing */






Re: [Lustre-discuss] problem with installing lustre and ofed

2012-12-28 Thread Jason Brooks
Hello,

That's good to know kernel-ib comes with the lustre stock install.

What about the rest of the OFED tools?  I mean things like ibdiagnet,
ibstatus, etc?  (I will look at the contents of the other rpms and see
what I can learn)



Re: [Lustre-discuss] problem with installing lustre and ofed

2012-12-28 Thread Ken Hornstein
> That's good to know kernel-ib comes with the lustre stock install.
>
> What about the rest of the OFED tools?  I mean things like ibdiagnet,
> ibstatus, etc?  (I will look at the contents of the other rpms and see
> what I can learn)

I think Jeff missed a few steps.  If you want the _server-side_ packages,
what you need to do is:

- Install a Lustre-patched kernel, including devel packages (you can use
  the ones from Whamcloud if they're suitable).
- Build your OFED against that kernel and install it.
- Compile Lustre against the Lustre-patched kernel and the OFED.  This
  is the tricky part; you need to make sure to tell Lustre to link against
  the right OFED package.
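
Before running configure, it can help to check that the OFED tree you
are about to point Lustre at actually contains the symbol information.
A minimal sketch: the function name is hypothetical, and it only checks
that a non-empty Module.symvers exists; it does not verify which kernel
that tree was built against.

```shell
#!/bin/sh
# check_ofed_tree DIR: succeed only if DIR contains a non-empty
# Module.symvers file for the Lustre build to use.
check_ofed_tree() {
    if [ -s "$1/Module.symvers" ]; then
        echo "ok: $1/Module.symvers found"
    else
        echo "error: no Module.symvers in $1" >&2
        return 1
    fi
}
```

It would be used as a guard in front of the configure step, e.g.
check_ofed_tree "$OFED_SRC" && ./configure --with-o2ib="$OFED_SRC",
where OFED_SRC is whatever directory your OFED build left its kernel
tree in.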

There are Lustre build scripts that actually automate all of this; last
time I checked, they were only available in the git tree, NOT in the
source tarball.  Those build scripts are a bit of a pain to use, and I
find that I always have to tweak them a bit.  But once you figure them all
out it makes things easier.

Now as for the userspace utilities ... well, you need to make sure they're
not too far off from the kernel.  How far is too far?  Good question.
I don't think they're guaranteed to work when they don't match, but in my
limited experience minor version differences are ok.

--Ken