[Lustre-discuss] problem with installing lustre and OFED
Greetings Jason,

As you have most likely discovered, Mellanox (MLNX) needs to be built into the lustre linux kernel to use InfiniBand. I worked on such an issue recently. The Whamcloud linux kernel 2.1.2-2.6.32_220.17.1.el6_lustre would not work optimally with our Mellanox InfiniBand (IB) drivers. We got the MLNX version 1.8.5 to match our Mellanox hardware and had to do the dance already described to you in this list of...

1. downloading all of the appropriate (Whamcloud) lustre linux kernel, header and devel rpms
2. booting into the lustre kernel
3. in our /usr/src/lustre-2.1.2 directory, building lustre against the Mellanox Module.symvers information (not doing so is why you see the Input/Output errors on fid.ko, mdc.ko, osc.ko, lov.ko and, because of the aforementioned items, lustre.ko). The MLNX version 1.8.5 that we needed was in the /usr/src/ofa_kernel directory (with the Module.symvers etc). We used the defaults other than the o2ib, so our command in the /usr/src/lustre-2.1.2 directory looked like
   ./configure --with-o2ib=/usr/src/ofa_kernel
4. next we issued make
5. next we chose to run a make rpms command so that we could have rpms for our system for cluster re-building

We had to do this for *both* our lustre servers and lustre clients (using the lustre-client Whamcloud kernel, headers, ...). So we had the servers and the clients communicating properly over the MLNX ib fabric.

In /etc/modprobe.d we used a lustre.conf file to explicitly direct the system to use the o2ib network when starting lustre at boot.

Without the above actions the ko2iblnd would not load.

Just confirming that you need to build Mellanox on servers and clients to use MLNX IB with the Lustre cluster file system.

megan
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
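For anyone following along, the rebuild sequence in megan's message can be sketched roughly as below. The source and OFED paths are the ones from the message; the `run` helper, the `DRY_RUN` switch and the function names are hypothetical conveniences, and the `o2ib0(ib0)` network/interface pair is an assumption, since the message does not show the contents of lustre.conf. By default the sketch only prints the commands.

```shell
#!/bin/sh
# Sketch only: rebuild Lustre against an out-of-tree OFED, as described above.
# Paths come from the message; override them for your own system.
LUSTRE_SRC=${LUSTRE_SRC:-/usr/src/lustre-2.1.2}
OFED_SRC=${OFED_SRC:-/usr/src/ofa_kernel}
DRY_RUN=${DRY_RUN:-1}   # hypothetical switch: 1 = print commands, 0 = run them

# Print a command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Steps 3-5: configure against the OFED tree, build, then package.
build_lustre_o2ib() {
    run cd "$LUSTRE_SRC" &&
    run ./configure --with-o2ib="$OFED_SRC" &&
    run make &&
    run make rpms
}

# Write an LNet options file. The network name and interface (o2ib0, ib0)
# are assumptions; the message does not show the real file.
write_lnet_conf() {
    conf=${1:-/etc/modprobe.d/lustre.conf}
    printf 'options lnet networks=%s\n' 'o2ib0(ib0)' > "$conf"
}
```

Calling `build_lustre_o2ib` with the default `DRY_RUN=1` just lists the four commands; on a real build host you would set `DRY_RUN=0` and repeat the build for servers and clients.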
Re: [Lustre-discuss] problem with installing lustre and ofed
On Fri, 2012-12-28 at 15:54 -0800, Jason Brooks wrote:
> Hello,

Hi,

> I am having trouble installing the server modules for lustre 2.1.4 and
> use mellanox's OFED distribution

Is there a particular need for the Mellanox OFED distribution? The Red Hat EL 6 kernel comes stock with the infiniband drivers and stack already baked in, and we leverage that and build our Lustre modules RPM against it.

So unless there is something particular that you need that is only in the Mellanox OFED distribution and is not already in EL6's kernels, you should be able to just use the binary kernel and lustre-modules RPMs that we supply and have working infiniband support.

Cheers,
b.
Re: [Lustre-discuss] problem with installing lustre and ofed
RedHat's OFED tends to lag Mellanox's. They're pretty current on bugfixes, but support for the latest hardware is usually 3-6 months behind - it took about 4 months to bring in drivers for our most recent FDR system. Also, support for Mellanox's advanced features (e.g., MXM, FCA) is often missing.

--
Mike Shuey

On Mon, Dec 31, 2012 at 11:32 AM, Brian J. Murrell brian.murr...@linux.intel.com wrote:
> On Fri, 2012-12-28 at 15:54 -0800, Jason Brooks wrote:
> > Hello,
>
> Hi,
>
> > I am having trouble installing the server modules for lustre 2.1.4 and
> > use mellanox's OFED distribution
>
> Is there a particular need for the Mellanox OFED distribution? The
> Red Hat EL 6 kernel comes stock with the infiniband drivers and stack
> already baked in, and we leverage that and build our Lustre modules RPM
> against it.
>
> So unless there is something particular that you need that is only in
> the Mellanox OFED distribution and is not already in EL6's kernels, you
> should be able to just use the binary kernel and lustre-modules RPMs
> that we supply and have working infiniband support.
>
> Cheers,
> b.
[Lustre-discuss] problem with installing lustre and ofed
Hello,

I am having trouble installing the server modules for lustre 2.1.4 and use mellanox's OFED distribution so we may use infiniband. Would you folks look at my procedure and results below and let me know what you think? Thanks very much!

The mellanox ofed installation builds and installs some kernel modules too, so I used this method to ensure OFED compiled against the correct kernel. This is on centos 6.3.

1. download all lustre rpms from whamcloud
2. install kernel, kernel-firmware, kernel-headers, and kernel-devel
   * in this case, it's the rpm files with 2.6.32-279.14.1.el6_lustre.x86_64 in their name
3. reboot into this lustre kernel
4. install the remaining rpms
5. download ofed from mellanox: MLNX_OFED_LINUX-1.5.3-3.1.0-rhel6.3-x86_64.iso
   * build mellanox ofed bits using the lustre kernel and kernel-devel info
   * install mellanox ofed
6. reboot
7. upon reboot, if I do NOT have o2ib3 in my lnet networks parameters, I can modprobe lnet and lustre.
8. if I DO have o2ib3 present in the lnet parameters, running modprobe lustre gets me:

   ib/modules/2.6.32-279.14.1.el6_lustre.x86_64/updates/kernel/fs/lustre/fld.ko): Input/output error
   WARNING: Error inserting fid (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/updates/kernel/fs/lustre/fid.ko): Input/output error
   WARNING: Error inserting mdc (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/updates/kernel/fs/lustre/mdc.ko): Input/output error
   WARNING: Error inserting osc (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/updates/kernel/fs/lustre/osc.ko): Input/output error
   WARNING: Error inserting lov (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/updates/kernel/fs/lustre/lov.ko): Input/output error
   FATAL: Error inserting lustre (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/updates/kernel/fs/lustre/lustre.ko): Input/output error

dmesg shows:

   ko2iblnd: disagrees about version of symbol ib_fmr_pool_unmap
   ko2iblnd: Unknown symbol ib_fmr_pool_unmap
   ko2iblnd: disagrees about version of symbol ib_create_cq
   ko2iblnd: Unknown symbol ib_create_cq
   …
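The "disagrees about version of symbol" lines mean that the version checksum (CRC) ko2iblnd was built with for that symbol differs from the one exported by the ib modules now installed. One rough way to see this is to compare the CRC recorded for a symbol in two Module.symvers files. This is only a sketch: the function names are made up for illustration, and it assumes both the kernel-devel tree and the OFED tree ship a Module.symvers in the usual "crc, symbol, module, export type" layout.

```shell
#!/bin/sh
# Sketch: compare the CRC recorded for a symbol in two Module.symvers files.
# Module.symvers lines are: <crc> <symbol> <module> <export type>.

# Print the CRC of symbol $2 as recorded in symvers file $1 (empty if absent).
symbol_crc() {
    awk -v sym="$2" '$2 == sym { print $1; exit }' "$1"
}

# Report whether two symvers files agree on the version (CRC) of a symbol.
compare_symbol() {
    a=$(symbol_crc "$1" "$3")
    b=$(symbol_crc "$2" "$3")
    if [ -z "$a" ] || [ -z "$b" ]; then
        echo "$3: missing"
    elif [ "$a" = "$b" ]; then
        echo "$3: match"
    else
        echo "$3: MISMATCH ($a vs $b)"
    fi
}
```

For example, `compare_symbol /usr/src/kernels/$(uname -r)/Module.symvers /usr/src/ofa_kernel/Module.symvers ib_create_cq` would compare the in-kernel and OFED versions of ib_create_cq; both paths are assumptions about where those trees live on a given system.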
Re: [Lustre-discuss] problem with installing lustre and ofed
Jason,

The prebuilt server-side Lustre packages from Whamcloud are built against RHEL/CentOS kernel sources with kernel-ib active in them. This means that any of the Lustre prebuilt server packages are already tied to RHEL's kernel-ib.

To accomplish your stated goal you'll have to start with a non-Whamcloud, stock kernel (plus headers, devel, etc). Then compile/install the OFED version of your choice. Once you have that you can build Lustre from source, where it will compile against OFED and the installed kernel.

--Jeff

---
Jeff Johnson
Co-Founder
Aeon Computing
jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x101 f: 858-412-3845
4170 Morena Boulevard, Suite D - San Diego, CA 92117
/* Follow us on Twitter - @AeonComputing */
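Since the prebuilt packages are tied to the distribution's IB stack, it can help to check which stack is actually installed for a given kernel. The sketch below is only a rough check: `ib_stack_origin` is a made-up name, and it assumes that an out-of-tree OFED installs its modules under updates/ (as the Lustre modules in Jason's error log do) while the distribution's own stack sits under kernel/drivers/infiniband.

```shell
#!/bin/sh
# Sketch: report where ib_core.ko lives under a kernel's module tree.
# Assumption: out-of-tree OFED modules are installed under updates/, and the
# distribution's own IB stack under kernel/drivers/infiniband.

ib_stack_origin() {
    moddir=${1:-/lib/modules/$(uname -r)}
    if [ -n "$(find "$moddir/updates" -name 'ib_core.ko*' 2>/dev/null)" ]; then
        echo "out-of-tree"
    elif [ -n "$(find "$moddir/kernel/drivers/infiniband" -name 'ib_core.ko*' 2>/dev/null)" ]; then
        echo "in-kernel"
    else
        echo "none"
    fi
}
```

If this prints "out-of-tree" on a node that runs the prebuilt lustre-modules RPM, that combination is the kind of mismatch described in this thread.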
Re: [Lustre-discuss] problem with installing lustre and ofed
Hello,

That's good to know kernel-ib comes with the lustre stock install. What about the rest of the OFED tools? I mean things like ibdiagnet, ibstatus, etc.? (I will look at the contents of the other rpms and see what I can learn.)

On 12/28/12 4:45 PM, Jeff Johnson jeff.john...@aeoncomputing.com wrote:
> Jason,
>
> The prebuilt server-side Lustre packages from Whamcloud are built
> against RHEL/CentOS kernel sources with kernel-ib active in them. This
> means that any of the Lustre prebuilt server packages are already tied
> to RHEL's kernel-ib.
>
> To accomplish your stated goal you'll have to start with a
> non-Whamcloud, stock kernel (plus headers, devel, etc). Then
> compile/install the OFED version of your choice. Once you have that you
> can build Lustre from source, where it will compile against OFED and the
> installed kernel.
>
> --Jeff
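On the question of the userspace tools (ibdiagnet, ibstatus and so on), a quick way to see what is already installed is to look each one up on PATH and, on an RPM-based system, ask which package owns it. A minimal sketch; `report_tools` is a made-up helper name, and it relies only on `command -v` and `rpm -qf`.

```shell
#!/bin/sh
# Sketch: for each named tool, report whether it is on PATH and, where rpm
# is available, which installed package owns it.

report_tools() {
    for tool in "$@"; do
        path=$(command -v "$tool" 2>/dev/null) || path=
        if [ -z "$path" ]; then
            echo "$tool: not found"
        elif command -v rpm >/dev/null 2>&1; then
            # rpm -qf fails for files that no package owns.
            owner=$(rpm -qf "$path" 2>/dev/null) || owner=unowned
            echo "$tool: $path ($owner)"
        else
            echo "$tool: $path"
        fi
    done
}
```

Running `report_tools ibdiagnet ibstatus` on the node in question would show which of the two are present and where they came from.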
Re: [Lustre-discuss] problem with installing lustre and ofed
> That's good to know kernel-ib comes with the lustre stock install.
> What about the rest of the OFED tools? I mean things like ibdiagnet,
> ibstatus, etc? (I will look at the contents of the other rpms and see
> what I can learn)

I think Jeff missed a few steps. If you want the _server-side_ packages, what you need to do is:

- Install a Lustre-patched kernel, including devel packages (you can use the ones from Whamcloud if they're suitable).
- Build your OFED against that kernel and install it.
- Compile Lustre against the Lustre-patched kernel and the OFED. This is the tricky part; you need to make sure to tell Lustre to link against the right OFED package.

There are Lustre build scripts that actually automate all of this; last time I checked, they were only available in the git tree, NOT in the source tarball. Those build scripts are a bit of a pain to use, and I find that I always have to tweak them a bit. But once you figure them all out it makes things easier.

Now as for the userspace utilities ... well, you need to make sure they're not too far off from the kernel. How far is too far? Good question. I don't think they're guaranteed to work when they don't match, but in my limited experience minor version differences are ok.

--Ken
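Before attempting the third step, it may be worth checking that the first two really are in place. The sketch below is a hypothetical pre-build check, not part of any Lustre tooling. It assumes that Lustre-patched kernels carry "_lustre" in their release string (as in 2.6.32-279.14.1.el6_lustre.x86_64), that kernel-devel installs under /usr/src/kernels/<release>, and that the OFED tree is /usr/src/ofa_kernel as mentioned elsewhere in this thread.

```shell
#!/bin/sh
# Sketch: pre-build checks for the three steps above.
# All default paths and the "_lustre" naming test are assumptions.

preflight() {
    krel=${1:-$(uname -r)}
    kdevel=${2:-/usr/src/kernels/$krel}
    ofed=${3:-/usr/src/ofa_kernel}
    status=0

    # Step 1: are we looking at a Lustre-patched kernel with its devel tree?
    case $krel in
        *_lustre*) echo "ok: kernel $krel looks Lustre-patched" ;;
        *) echo "FAIL: kernel $krel has no _lustre tag"; status=1 ;;
    esac
    if [ -d "$kdevel" ]; then
        echo "ok: kernel devel tree $kdevel"
    else
        echo "FAIL: missing kernel devel tree $kdevel"; status=1
    fi

    # Step 2: has OFED been built, leaving symbol versions to build against?
    if [ -f "$ofed/Module.symvers" ]; then
        echo "ok: OFED symbols $ofed/Module.symvers"
    else
        echo "FAIL: missing $ofed/Module.symvers"; status=1
    fi
    return $status
}
```

With no arguments, `preflight` checks the running kernel and the default paths and returns non-zero if any check fails.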