Re: [lustre-discuss] Lustre and OFED

2017-07-31 Thread Arman Khalatyan
To find the RDMA devices you can use:
ibv_devices

You can benchmark your RDMA connection with qperf:
yum install qperf -y

On the client machine, start the qperf listener: qperf
On the server or OST machine, run the tests against it: qperf clienthostname ud_lat ud_bw


On Mon, Jul 31, 2017 at 4:08 PM, Ben Evans <bev...@cray.com> wrote:

>
>
> From: "E.S. Rosenberg" <esr+lus...@mail.hebrew.edu>
> Date: Sunday, July 30, 2017 at 7:53 AM
> To: Ben Evans <bev...@cray.com>
> Cc: Harald van Pee <p...@hiskp.uni-bonn.de>,
> "lustre-discuss@lists.lustre.org" <lustre-discuss@lists.lustre.org>
> Subject: Re: [lustre-discuss] Lustre and OFED
>
>
>
> On Fri, Jul 28, 2017 at 6:22 PM, Ben Evans <bev...@cray.com> wrote:
>
>>
>>
>> On 7/28/17, 11:12 AM, "Harald van Pee" <p...@hiskp.uni-bonn.de> wrote:
>>
>> >Hello
>> >
>> >On Friday 28 July 2017 15:48:12 Ben Evans wrote:
>> >> Eli, just to clarify are you talking about using the in-kernel OFED vs.
>> >> a vendor (Mellanox) OFED, or
>> >
>> >In our case we are using the OFED of the debian distribution used.
>>
> I am using the IB support that ships with Debian/CentOS/mainline kernel
> and did not install any OFED/Mellanox OFED package.
> As far as I can tell RDMA does work. (using the various test tools
> suggested here https://community.mellanox.com/docs/DOC-2086)
>
>> >
>> >> are you talking about using the ConnectX-3
>> >> hardware in IPoIB mode and just using it as a faster Ethernet?
>> >
>> >Is that possible? How would one do this?
>>
>> You'd configure the lustre LNET to use it like any other ethernet device.
>> The downside of this is that it's slower due to a lack of RDMA and other
>> features that IB has.  I'm not sure if there's a real upside to it.
>>
>
> Now you've made me unsure whether or not my Lustre install is using
> RDMA. How can I tell (we are definitely using IPoIB/o2ib)?
>
> If you are mounting Lustre with a string that looks like
> 192.168.1.10@o2ib,192.168.0.11@o2ib:/lustre then you're using OFED and RDMA.
>
> Thanks,
> Eli
>
>>
>> >
>> >Harald
>> >
>> >
>> >>
>> >> -Ben Evans
>> >>
>> >> From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org>
>> >> on behalf of "E.S. Rosenberg" <esr+lus...@mail.hebrew.edu>
>> >> Date: Thursday, July 27, 2017 at 4:55 PM
>> >> To: "lustre-discuss@lists.lustre.org" <lustre-discuss@lists.lustre.org>
>> >> Subject: [lustre-discuss] Lustre and OFED
>> >>
>> >> Hi all,
>> >>
>> >> How 'needed' is OFED for Lustre? In the LUG talks it is mentioned every
>> >> once in a while and that got me thinking a bit.
>> >>
>> >> What things are gained by installing OFED? Performance? Accurate traffic
>> >> reports?
>> >>
>> >> Currently I am using a lustre system without OFED but our IB hardware is
>> >> from the FDR generation so not bleeding edge and probably doesn't need
>> >> OFED because of that
>> >>
>> >> Thanks,
>> >> Eli
>> >>
>> >> Tech specs:
>> >> Servers: CentOS 6.8 + Lustre 2.8 (kernel from Lustre RPMs)
>> >> Clients: Debian + kernel 4.2 + Lustre 2.8
>> >> IB: ConnectX-3 FDR
>> >
>>
>>
>
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
>
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Lustre and OFED

2017-07-31 Thread Ben Evans


From: "E.S. Rosenberg" <esr+lus...@mail.hebrew.edu>
Date: Sunday, July 30, 2017 at 7:53 AM
To: Ben Evans <bev...@cray.com>
Cc: Harald van Pee <p...@hiskp.uni-bonn.de>,
"lustre-discuss@lists.lustre.org" <lustre-discuss@lists.lustre.org>
Subject: Re: [lustre-discuss] Lustre and OFED


On Fri, Jul 28, 2017 at 6:22 PM, Ben Evans <bev...@cray.com> wrote:


On 7/28/17, 11:12 AM, "Harald van Pee" <p...@hiskp.uni-bonn.de> wrote:

>Hello
>
>On Friday 28 July 2017 15:48:12 Ben Evans wrote:
>> Eli, just to clarify are you talking about using the in-kernel OFED vs.
>> a vendor (Mellanox) OFED, or
>
>In our case we are using the OFED of the debian distribution used.
I am using the IB support that ships with Debian/CentOS/mainline kernel and did 
not install any OFED/Mellanox OFED package.
As far as I can tell RDMA does work. (using the various test tools suggested 
here https://community.mellanox.com/docs/DOC-2086)
>
>> are you talking about using the ConnectX-3
>> hardware in IPoIB mode and just using it as a faster Ethernet?
>
>Is that possible? How would one do this?

You'd configure the lustre LNET to use it like any other ethernet device.
The downside of this is that it's slower due to a lack of RDMA and other
features that IB has.  I'm not sure if there's a real upside to it.

Now you've made me unsure whether or not my Lustre install is using
RDMA. How can I tell (we are definitely using IPoIB/o2ib)?

If you are mounting Lustre with a string that looks like
192.168.1.10@o2ib,192.168.0.11@o2ib:/lustre then you're using OFED and RDMA.
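
One way to apply that rule of thumb is to look at the network suffix of
each NID in the mount source. The sketch below is illustrative only: the
function name nid_types is made up for this example, and it merely inspects
the string rather than querying LNet. On a running node, lctl list_nids
reports the NIDs that are actually configured.

```shell
# Print the LNet network type of each server NID in a Lustre mount source
# such as "192.168.1.10@o2ib,192.168.0.11@o2ib:/lustre".
# nid_types is a hypothetical helper name, not a Lustre tool.
nid_types() {
    # Strip the ":/fsname" suffix, then split the NID list on "," and ":".
    printf '%s\n' "${1%:/*}" | tr ',:' '\n\n' | while IFS= read -r nid; do
        net=${nid#*@}                 # network part, e.g. "o2ib" or "tcp0"
        case "$net" in
            o2ib*) echo "$nid verbs/RDMA (o2ib)" ;;
            tcp*)  echo "$nid TCP/IP (tcp)" ;;
            *)     echo "$nid other ($net)" ;;
        esac
    done
}

nid_types "192.168.1.10@o2ib,192.168.0.11@o2ib:/lustre"
```

A NID ending in @tcp on an ib0 address would instead indicate IPoIB
carrying ordinary TCP traffic.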

Thanks,
Eli

>
>Harald
>
>
>>
>> -Ben Evans
>>
>> From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org>
>> on behalf of "E.S. Rosenberg" <esr+lus...@mail.hebrew.edu>
>> Date: Thursday, July 27, 2017 at 4:55 PM
>> To: "lustre-discuss@lists.lustre.org" <lustre-discuss@lists.lustre.org>
>> Subject: [lustre-discuss] Lustre and OFED
>>
>> Hi all,
>>
>> How 'needed' is OFED for Lustre? In the LUG talks it is mentioned every
>> once in a while and that got me thinking a bit.
>>
>> What things are gained by installing OFED? Performance? Accurate traffic
>> reports?
>>
>> Currently I am using a lustre system without OFED but our IB hardware is
>> from the FDR generation so not bleeding edge and probably doesn't need
>> OFED because of that
>>
>> Thanks,
>> Eli
>>
>> Tech specs:
>> Servers: CentOS 6.8 + Lustre 2.8 (kernel from Lustre RPMs)
>> Clients: Debian + kernel 4.2 + Lustre 2.8
>> IB: ConnectX-3 FDR
>




Re: [lustre-discuss] Lustre and OFED

2017-07-30 Thread E.S. Rosenberg
On Fri, Jul 28, 2017 at 6:22 PM, Ben Evans <bev...@cray.com> wrote:

>
>
> On 7/28/17, 11:12 AM, "Harald van Pee" <p...@hiskp.uni-bonn.de> wrote:
>
> >Hello
> >
> >On Friday 28 July 2017 15:48:12 Ben Evans wrote:
> >> Eli, just to clarify are you talking about using the in-kernel OFED vs.
> >> a vendor (Mellanox) OFED, or
> >
> >In our case we are using the OFED of the debian distribution used.
>
I am using the IB support that ships with Debian/CentOS/mainline kernel and
did not install any OFED/Mellanox OFED package.
As far as I can tell RDMA does work. (using the various test tools
suggested here https://community.mellanox.com/docs/DOC-2086)

> >
> >> are you talking about using the ConnectX-3
> >> hardware in IPoIB mode and just using it as a faster Ethernet?
> >
> >Is that possible? How would one do this?
>
> You'd configure the lustre LNET to use it like any other ethernet device.
> The downside of this is that it's slower due to a lack of RDMA and other
> features that IB has.  I'm not sure if there's a real upside to it.
>

Now you've made me unsure whether or not my Lustre install is using
RDMA. How can I tell (we are definitely using IPoIB/o2ib)?
Thanks,
Eli

>
> >
> >Harald
> >
> >
> >>
> >> -Ben Evans
> >>
> >> From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org>
> >> on behalf of "E.S. Rosenberg" <esr+lus...@mail.hebrew.edu>
> >> Date: Thursday, July 27, 2017 at 4:55 PM
> >> To: "lustre-discuss@lists.lustre.org" <lustre-discuss@lists.lustre.org>
> >> Subject: [lustre-discuss] Lustre and OFED
> >>
> >> Hi all,
> >>
> >> How 'needed' is OFED for Lustre? In the LUG talks it is mentioned every
> >> once in a while and that got me thinking a bit.
> >>
> >> What things are gained by installing OFED? Performance? Accurate traffic
> >> reports?
> >>
> >> Currently I am using a lustre system without OFED but our IB hardware is
> >> from the FDR generation so not bleeding edge and probably doesn't need
> >> OFED because of that
> >>
> >> Thanks,
> >> Eli
> >>
> >> Tech specs:
> >> Servers: CentOS 6.8 + Lustre 2.8 (kernel from Lustre RPMs)
> >> Clients: Debian + kernel 4.2 + Lustre 2.8
> >> IB: ConnectX-3 FDR
> >
>
>


Re: [lustre-discuss] Lustre and OFED

2017-07-28 Thread Ben Evans


On 7/28/17, 11:12 AM, "Harald van Pee" <p...@hiskp.uni-bonn.de> wrote:

>Hello
>
>On Friday 28 July 2017 15:48:12 Ben Evans wrote:
>> Eli, just to clarify are you talking about using the in-kernel OFED vs.
>> a vendor (Mellanox) OFED, or
>
>In our case we are using the OFED of the debian distribution used.
>
>> are you talking about using the ConnectX-3
>> hardware in IPoIB mode and just using it as a faster Ethernet?
>
>Is that possible? How would one do this?

You'd configure the lustre LNET to use it like any other ethernet device.
The downside of this is that it's slower due to a lack of RDMA and other
features that IB has.  I'm not sure if there's a real upside to it.
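
As a rough illustration of the difference, the two setups come down to
which LNet network type is bound to the IB interface in the lnet module
options. The sketch below only prints the two candidate lines; the helper
name lnet_options and the interface name ib0 are assumptions, and where
the line goes (typically a file under /etc/modprobe.d/) depends on the
distribution.

```shell
# Emit an "options lnet" line for a modprobe configuration file.
# $1 = LNet network name (tcp0 or o2ib0), $2 = interface (ib0 assumed here).
# lnet_options is a hypothetical helper, used only to show the two variants.
lnet_options() {
    printf 'options lnet networks=%s(%s)\n' "$1" "$2"
}

lnet_options tcp0 ib0     # IPoIB used as plain TCP/IP, no RDMA
lnet_options o2ib0 ib0    # native InfiniBand verbs with RDMA
```

Only one of the two lines would be used for a given interface.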

>
>Harald
>
>
>> 
>> -Ben Evans
>> 
>> From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org>
>> on behalf of "E.S. Rosenberg" <esr+lus...@mail.hebrew.edu>
>> Date: Thursday, July 27, 2017 at 4:55 PM
>> To: "lustre-discuss@lists.lustre.org" <lustre-discuss@lists.lustre.org>
>> Subject: [lustre-discuss] Lustre and OFED
>> 
>> Hi all,
>> 
>> How 'needed' is OFED for Lustre? In the LUG talks it is mentioned every
>> once in a while and that got me thinking a bit.
>> 
>> What things are gained by installing OFED? Performance? Accurate traffic
>> reports?
>> 
>> Currently I am using a lustre system without OFED but our IB hardware is
>> from the FDR generation so not bleeding edge and probably doesn't need
>> OFED because of that
>> 
>> Thanks,
>> Eli
>> 
>> Tech specs:
>> Servers: CentOS 6.8 + Lustre 2.8 (kernel from Lustre RPMs)
>> Clients: Debian + kernel 4.2 + Lustre 2.8
>> IB: ConnectX-3 FDR
>



Re: [lustre-discuss] Lustre and OFED

2017-07-28 Thread Harald van Pee
Hello

On Friday 28 July 2017 15:48:12 Ben Evans wrote:
> Eli, just to clarify are you talking about using the in-kernel OFED vs. a
> vendor (Mellanox) OFED, or 

In our case we are using the OFED that ships with our Debian distribution.

> are you talking about using the ConnectX-3
> hardware in IPoIB mode and just using it as a faster Ethernet?

Is that possible? How would one do this?

Harald


> 
> -Ben Evans
> 
> From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org>
> on behalf of "E.S. Rosenberg" <esr+lus...@mail.hebrew.edu>
> Date: Thursday, July 27, 2017 at 4:55 PM
> To: "lustre-discuss@lists.lustre.org" <lustre-discuss@lists.lustre.org>
> Subject: [lustre-discuss] Lustre and OFED
> 
> Hi all,
> 
> How 'needed' is OFED for Lustre? In the LUG talks it is mentioned every
> once in a while and that got me thinking a bit.
> 
> What things are gained by installing OFED? Performance? Accurate traffic
> reports?
> 
> Currently I am using a lustre system without OFED but our IB hardware is
> from the FDR generation so not bleeding edge and probably doesn't need
> OFED because of that
> 
> Thanks,
> Eli
> 
> Tech specs:
> Servers: CentOS 6.8 + Lustre 2.8 (kernel from Lustre RPMs)
> Clients: Debian + kernel 4.2 + Lustre 2.8
> IB: ConnectX-3 FDR



Re: [lustre-discuss] Lustre and OFED

2017-07-28 Thread Ben Evans
Eli, just to clarify: are you talking about using the in-kernel OFED vs. a
vendor (Mellanox) OFED, or are you talking about using the ConnectX-3 hardware
in IPoIB mode and just using it as a faster Ethernet?

-Ben Evans

From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org>
on behalf of "E.S. Rosenberg" <esr+lus...@mail.hebrew.edu>
Date: Thursday, July 27, 2017 at 4:55 PM
To: "lustre-discuss@lists.lustre.org" <lustre-discuss@lists.lustre.org>
Subject: [lustre-discuss] Lustre and OFED

Hi all,

How 'needed' is OFED for Lustre? In the LUG talks it is mentioned every once in 
a while and that got me thinking a bit.

What things are gained by installing OFED? Performance? Accurate traffic 
reports?

Currently I am using a lustre system without OFED but our IB hardware is from 
the FDR generation so not bleeding edge and probably doesn't need OFED because 
of that

Thanks,
Eli

Tech specs:
Servers: CentOS 6.8 + Lustre 2.8 (kernel from Lustre RPMs)
Clients: Debian + kernel 4.2 + Lustre 2.8
IB: ConnectX-3 FDR


Re: [lustre-discuss] Lustre and OFED

2017-07-27 Thread Harald van Pee
Hi Eli,

we are running Lustre without OFED on Debian clients and servers.
With Lustre 2.4.0 on clients and servers we have had no problems at all for years.
With Lustre 2.5.3 on servers and 2.6.92 on clients, no problems for months at least.
With Lustre 2.5.3 on servers and 2.7 on clients we always get IB connection loss.
Here I'm wondering if a more recent OFED version could help?

We are mostly interested in a rock-solid Lustre version. Lustre 2.6 is fast
enough for us but has a memory leak caused by cache usage; Lustre 2.7 was
perfect for us in tests with a small number of machines, but fails completely
for the full cluster and/or certain tasks.

Best
Harald


On Thursday, 27 July 2017 22:55:33 CEST E.S. Rosenberg wrote:
> Hi all,
> 
> How 'needed' is OFED for Lustre? In the LUG talks it is mentioned every
> once in a while and that got me thinking a bit.
> 
> What things are gained by installing OFED? Performance? Accurate traffic
> reports?
> 
> Currently I am using a lustre system without OFED but our IB hardware is
> from the FDR generation so not bleeding edge and probably doesn't need OFED
> because of that
> 
> Thanks,
> Eli
> 
> Tech specs:
> Servers: CentOS 6.8 + Lustre 2.8 (kernel from Lustre RPMs)
> Clients: Debian + kernel 4.2 + Lustre 2.8
> IB: ConnectX-3 FDR




Re: [lustre-discuss] Lustre and OFED

2017-07-27 Thread E.S. Rosenberg
Jeff (and Grigory - offlist),

Thanks for your fast replies!
On Fri, Jul 28, 2017 at 12:09 AM, Jeff Johnson <jeff.john...@aeoncomputing.com> wrote:

> Eli,
>
> The biggest driver is usually the drivers. Newer Mellanox hardware not yet
> supported, or supported well, by kernel IB. Way back in the days of old
> there were some interoperability issues where everything (clients and
> servers) needed to be the same drivers and libraries but much of that was
> cleaned up. There could be situations where OFED is needed on the server
> side to support something under the Lustre layer like OST or MDT block
> devices via iSER, SRP, NVMeF, etc.
>
> There may be other reasons but those are off the top of my head.
>
So currently everything seems to be working just fine without OFED; my only
complaint is that the normal Linux interface counters don't report traffic
properly, which means I have to write my own perfquery wrappers for tools
like Zabbix.
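
A minimal sketch of such a wrapper, assuming perfquery prints its counters
as "Name:....value" lines and that the port data counters are in units of
4 bytes, as the InfiniBand counter definitions state. The function name
ib_counter_bytes and the sample values are made up for illustration; on a
real host the input would come from something like perfquery -x.

```shell
# Read perfquery-style "Name:....value" lines on stdin and print the named
# counter converted to bytes (the data counters count 4-byte words).
# ib_counter_bytes is a hypothetical helper name.
ib_counter_bytes() {
    awk -v name="$1" -F'[:.]+' '$1 == name { printf "%.0f\n", $2 * 4 }'
}

# Hypothetical sample input standing in for real perfquery output.
sample='PortXmitData:....................250
PortRcvData:.....................1000'

printf '%s\n' "$sample" | ib_counter_bytes PortXmitData   # prints 1000
```

The printed number can then be handed to the monitoring tool as a plain
byte counter. Very large 64-bit counter values may lose precision in awk.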

I may try adding OFED if I have time at some point but I hope by then to at
least have moved our servers to CentOS 7.3 + Lustre 2.9/10.

Has anyone ever run benchmarks of vanilla vs. OFED?
Thanks again,
Eli

>
> --Jeff
>
> On Thu, Jul 27, 2017 at 4:55 PM, E.S. Rosenberg <esr+lus...@mail.hebrew.edu> wrote:
>
>> Hi all,
>>
>> How 'needed' is OFED for Lustre? In the LUG talks it is mentioned every
>> once in a while and that got me thinking a bit.
>>
>> What things are gained by installing OFED? Performance? Accurate traffic
>> reports?
>>
>> Currently I am using a lustre system without OFED but our IB hardware is
>> from the FDR generation so not bleeding edge and probably doesn't need OFED
>> because of that
>>
>> Thanks,
>> Eli
>>
>> Tech specs:
>> Servers: CentOS 6.8 + Lustre 2.8 (kernel from Lustre RPMs)
>> Clients: Debian + kernel 4.2 + Lustre 2.8
>> IB: ConnectX-3 FDR
>>
>> ___
>> lustre-discuss mailing list
>> lustre-discuss@lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>
>>
>
>
> --
> --
> Jeff Johnson
> Co-Founder
> Aeon Computing
>
> jeff.john...@aeoncomputing.com
> www.aeoncomputing.com
> t: 858-412-3810 x1001   f: 858-412-3845
> m: 619-204-9061
>
> 4170 Morena Boulevard, Suite D - San Diego, CA 92117
>
> High-Performance Computing / Lustre Filesystems / Scale-out Storage
>


Re: [lustre-discuss] Lustre and OFED

2017-07-27 Thread Jeff Johnson
Eli,

The biggest driver is usually the drivers. Newer Mellanox hardware may not yet
be supported, or supported well, by kernel IB. Way back in the days of old
there were some interoperability issues where everything (clients and
servers) needed to run the same drivers and libraries, but much of that has
been cleaned up. There could be situations where OFED is needed on the server
side to support something under the Lustre layer like OST or MDT block
devices via iSER, SRP, NVMeF, etc.

There may be other reasons but those are off the top of my head.

--Jeff

On Thu, Jul 27, 2017 at 4:55 PM, E.S. Rosenberg wrote:

> Hi all,
>
> How 'needed' is OFED for Lustre? In the LUG talks it is mentioned every
> once in a while and that got me thinking a bit.
>
> What things are gained by installing OFED? Performance? Accurate traffic
> reports?
>
> Currently I am using a lustre system without OFED but our IB hardware is
> from the FDR generation so not bleeding edge and probably doesn't need OFED
> because of that
>
> Thanks,
> Eli
>
> Tech specs:
> Servers: CentOS 6.8 + Lustre 2.8 (kernel from Lustre RPMs)
> Clients: Debian + kernel 4.2 + Lustre 2.8
> IB: ConnectX-3 FDR
>
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
>


-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite D - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage


[lustre-discuss] Lustre and OFED

2017-07-27 Thread E.S. Rosenberg
Hi all,

How 'needed' is OFED for Lustre? In the LUG talks it is mentioned every
once in a while and that got me thinking a bit.

What things are gained by installing OFED? Performance? Accurate traffic
reports?

Currently I am using a Lustre system without OFED, but our IB hardware is
from the FDR generation, so it is not bleeding edge and probably doesn't need
OFED.

Thanks,
Eli

Tech specs:
Servers: CentOS 6.8 + Lustre 2.8 (kernel from Lustre RPMs)
Clients: Debian + kernel 4.2 + Lustre 2.8
IB: ConnectX-3 FDR


Re: [Lustre-discuss] Lustre 1.6.4.3 + OFED 1.2.5.5 + RHEL 4u4 AS

2008-07-01 Thread cxmtnbike
It doesn't appear as if you've come up with a solution to this
problem.  I too have run into the same set of issues as you appear to
have here, but I think I have resolved them.  I am running SLES 10 SP1
but this should also work for you.

First patch lnet/klnds/o2iblnd/o2iblnd.h.  You'll need to add the
following line somewhere near the top:

#include <linux/pci.h>

Next you'll need to patch lnet/klnds/o2iblnd/o2iblnd.c.  ib_create_cq
takes 6 args in OFED-1.2.5.5 but o2iblnd.c only passes 5 in the call.  The
new line should look like this:

cq = ib_create_cq(cmid->device,
                  kiblnd_cq_completion, kiblnd_cq_event, conn,
                  IBLND_CQ_ENTRIES(), 0);


After applying the two changes mentioned above I have been able to
build. 'make rpms' finishes, though it is a tad chatty with warning
messages.


On May 15, 12:37 pm, Malcolm Cowe [EMAIL PROTECTED] wrote:
 Hi Folks,

 Having some trouble with building Lustre 1.6.4.3 with OFED 1.2.5.5 on a
 RHEL 4u4 AS server. Could somebody please help me to understand where
 I've gone wrong? Here's what I have done so far:

 1. Install RHEL 4u4 AS (full installation).

 2. Download the Lustre RPMs from sun.com:

e2fsprogs-1.40.4.cfs1-0redhat.x86_64.rpm
e2fsprogs-devel-1.40.4.cfs1-0redhat.x86_64.rpm
kernel-lustre-smp-2.6.9-67.0.4.EL_lustre.1.6.4.3.x86_64.rpm
kernel-lustre-source-2.6.9-67.0.4.EL_lustre.1.6.4.3.x86_64.rpm
lustre-1.6.4.3-2.6.9_67.0.4.EL_lustre.1.6.4.3smp.x86_64.rpm
lustre-modules-1.6.4.3-2.6.9_67.0.4.EL_lustre.1.6.4.3smp.x86_64.rpm
lustre-source-1.6.4.3-2.6.9_67.0.4.EL_lustre.1.6.4.3smp.x86_64.rpm
plus: lustre-1.6.4.3.tar.gz

 3. Install kernel-lustre-smp and kernel-lustre-source rpms.
- Change grub to boot from lustre patched kernel by default.
- Reboot.

 4. Download OFED distribution from openib.org:

OFED-1.2.5.5.tgz

 5. Extract OFED distribution.

 6. Install OFED:

cd OFED-1.2.5.5/
./install.sh
  2) Install OFED Software
  3) All packages (all of Basic, HPC)
[accept defaults for everything, configure IPoIB IP address].

 7. Reboot.

 8. Modify Module.symvers, removing all references to Infiniband
modules supplied with the kernel distribution. N.B. Could not find this
file in the lustre kernel source tree, only in the -obj tree.

vi /usr/src/linux-2.6.9-67.0.4.EL_lustre.1.6.4.3-obj/x86_64/smp/Module.symvers

 9. Run /usr/share/doc/ofed-docs-1.2.5.5/create_Module.symvers.sh and
append the resulting file to the existing Module.symvers file:

/usr/share/doc/ofed-docs-1.2.5.5/create_Module.symvers.sh
cat Module.symvers >> /usr/src/linux-2.6.9-67.0.4.EL_lustre.1.6.4.3-obj/x86_64/smp/Module.symvers

 10. Change into the lustre kernel source and edit the Makefile. Change
custom suffix to smp in the variable EXTRAVERSION.

 11.  Change into the lustre kernel source and run the setup commands:

cd /usr/src/linux-2.6.9-67.0.4.EL_lustre.1.6.4.3
[linux]$ cp /boot/config-`uname -r` .config
[linux]$ make oldconfig || make menuconfig
# For 2.6 kernels
[linux]$ make include/asm
[linux]$ make include/linux/version.h
[linux]$ make SUBDIRS=scripts

 12. Extract the lustre source distribution (using lustre-1.6.4.3.tar.gz
rather than the RPM):

tar zxf lustre-1.6.4.3.tar.gz

 13. Run the configure script:

cd lustre-1.6.4.3/
./configure --with-linux=/usr/src/linux-2.6.9-67.0.4.EL_lustre.1.6.4.3 --with-o2ib=/usr/src/ofa_kernel

N.B. Cannot include --with-linux-obj= option as the configure script
exits with an error and a recommendation to run make config in the
linux src tree:

  checking for 
 /usr/src/linux-2.6.9-67.0.4.EL_lustre.1.6.4.3-obj/include/linux/autoconf.h... 
 no
  configure: error: Run make config in 
 /usr/src/linux-2.6.9-67.0.4.EL_lustre.1.6.4.3.

 If I do this, the build fails very early on:

 [EMAIL PROTECTED] lustre-1.6.4.3]# make
 test -d CVS || exit 0; \
 list=; for mod in $list; do \
perl ./build/kabi -v archive $HOME/nonfree $mod || exit $?; \
 done
 make  all-recursive
 make[1]: Entering directory `/root/HPC/build/lustre-1.6.4.3'
 Making all in ldiskfs
 make[2]: Entering directory `/root/HPC/build/lustre-1.6.4.3/ldiskfs'
 test -d CVS || exit 0; \
 list=; for mod in $list; do \
perl ./build/kabi -v archive $HOME/nonfree $mod || exit $?; \
 done
 make  all-recursive
 make[3]: Entering directory `/root/HPC/build/lustre-1.6.4.3/ldiskfs'
 Making all in .
 make[4]: Entering directory `/root/HPC/build/lustre-1.6.4.3/ldiskfs'
 for dir in ldiskfs ; do \
  make sources -C $dir || exit $? ; \
 done
 make[5]: Entering directory `/root/HPC/build/lustre-1.6.4.3/ldiskfs/ldiskfs'
 rm -rf linux-stage linux sources
 mkdir -p linux-stage/fs/ext3 linux-stage/include/linux
 cp /usr/src/linux-2.6.9-67.0.4.EL_lustre.1.6.4.3/fs/ext3/acl.c 
 /usr/src/linux-2.6.9-67.0.4.EL_lustre.1.6.4.3/fs/ext3/balloc.c 
 

[Lustre-discuss] Lustre 1.6.4.3 + OFED 1.2.5.5 + RHEL 4u4 AS

2008-05-15 Thread Malcolm Cowe
Hi Folks,

Having some trouble with building Lustre 1.6.4.3 with OFED 1.2.5.5 on a
RHEL 4u4 AS server. Could somebody please help me to understand where
I've gone wrong? Here's what I have done so far:


1. Install RHEL 4u4 AS (full installation).

2. Download the Lustre RPMs from sun.com:

   e2fsprogs-1.40.4.cfs1-0redhat.x86_64.rpm
   e2fsprogs-devel-1.40.4.cfs1-0redhat.x86_64.rpm
   kernel-lustre-smp-2.6.9-67.0.4.EL_lustre.1.6.4.3.x86_64.rpm
   kernel-lustre-source-2.6.9-67.0.4.EL_lustre.1.6.4.3.x86_64.rpm
   lustre-1.6.4.3-2.6.9_67.0.4.EL_lustre.1.6.4.3smp.x86_64.rpm
   lustre-modules-1.6.4.3-2.6.9_67.0.4.EL_lustre.1.6.4.3smp.x86_64.rpm
   lustre-source-1.6.4.3-2.6.9_67.0.4.EL_lustre.1.6.4.3smp.x86_64.rpm
   plus: lustre-1.6.4.3.tar.gz

3. Install kernel-lustre-smp and kernel-lustre-source rpms.
   - Change grub to boot from lustre patched kernel by default.
   - Reboot.

4. Download OFED distribution from openib.org:

   OFED-1.2.5.5.tgz

5. Extract OFED distribution.

6. Install OFED:

   cd OFED-1.2.5.5/
   ./install.sh
 2) Install OFED Software
 3) All packages (all of Basic, HPC)
   [accept defaults for everything, configure IPoIB IP address].

7. Reboot.

8. Modify Module.symvers, removing all references to Infiniband
   modules supplied with the kernel distribution. N.B. Could not find this
   file in the lustre kernel source tree, only in the -obj tree.

   vi /usr/src/linux-2.6.9-67.0.4.EL_lustre.1.6.4.3-obj/x86_64/smp/Module.symvers

9. Run /usr/share/doc/ofed-docs-1.2.5.5/create_Module.symvers.sh and
   append the resulting file to the existing Module.symvers file:

   /usr/share/doc/ofed-docs-1.2.5.5/create_Module.symvers.sh
   cat Module.symvers >> /usr/src/linux-2.6.9-67.0.4.EL_lustre.1.6.4.3-obj/x86_64/smp/Module.symvers

10. Change into the lustre kernel source and edit the Makefile. Change
   custom suffix to smp in the variable EXTRAVERSION.

11.  Change into the lustre kernel source and run the setup commands:

   cd /usr/src/linux-2.6.9-67.0.4.EL_lustre.1.6.4.3
   [linux]$ cp /boot/config-`uname -r` .config
   [linux]$ make oldconfig || make menuconfig
   # For 2.6 kernels
   [linux]$ make include/asm
   [linux]$ make include/linux/version.h
   [linux]$ make SUBDIRS=scripts

12. Extract the lustre source distribution (using lustre-1.6.4.3.tar.gz
   rather than the RPM):

   tar zxf lustre-1.6.4.3.tar.gz

13. Run the configure script:

   cd lustre-1.6.4.3/
   ./configure --with-linux=/usr/src/linux-2.6.9-67.0.4.EL_lustre.1.6.4.3 --with-o2ib=/usr/src/ofa_kernel

   N.B. Cannot include --with-linux-obj= option as the configure script
   exits with an error and a recommendation to run make config in the
   linux src tree:

 checking for /usr/src/linux-2.6.9-67.0.4.EL_lustre.1.6.4.3-obj/include/linux/autoconf.h... no
 configure: error: Run make config in /usr/src/linux-2.6.9-67.0.4.EL_lustre.1.6.4.3.


If I do this, the build fails very early on:


[EMAIL PROTECTED] lustre-1.6.4.3]# make
test -d CVS || exit 0; \
list=; for mod in $list; do \
   perl ./build/kabi -v archive $HOME/nonfree $mod || exit $?; \
done
make  all-recursive
make[1]: Entering directory `/root/HPC/build/lustre-1.6.4.3'
Making all in ldiskfs
make[2]: Entering directory `/root/HPC/build/lustre-1.6.4.3/ldiskfs'
test -d CVS || exit 0; \
list=; for mod in $list; do \
   perl ./build/kabi -v archive $HOME/nonfree $mod || exit $?; \
done
make  all-recursive
make[3]: Entering directory `/root/HPC/build/lustre-1.6.4.3/ldiskfs'
Making all in .
make[4]: Entering directory `/root/HPC/build/lustre-1.6.4.3/ldiskfs'
for dir in ldiskfs ; do \
 make sources -C $dir || exit $? ; \
done
make[5]: Entering directory `/root/HPC/build/lustre-1.6.4.3/ldiskfs/ldiskfs'
rm -rf linux-stage linux sources
mkdir -p linux-stage/fs/ext3 linux-stage/include/linux
cp /usr/src/linux-2.6.9-67.0.4.EL_lustre.1.6.4.3/fs/ext3/acl.c 
/usr/src/linux-2.6.9-67.0.4.EL_lustre.1.6.4.3/fs/ext3/balloc.c 
/usr/src/linux-2.6.9-67.0.4.EL_lustre.1.6.4.3/fs/ext3/bitmap.c 
/usr/src/linux-2.6.9-67.0.4.EL_lustre.1.6.4.3/fs/ext3/dir.c 
/usr/src/linux-2.6.9-67.0.4.EL_lustre.1.6.4.3/fs/ext3/file.c 
/usr/src/linux-2.6.9-67.0.4.EL_lustre.1.6.4.3/fs/ext3/fsync.c 
/usr/src/linux-2.6.9-67.0.4.EL_lustre.1.6.4.3/fs/ext3/hash.c 
/usr/src/linux-2.6.9-67.0.4.EL_lustre.1.6.4.3/fs/ext3/ialloc.c 
/usr/src/linux-2.6.9-67.0.4.EL_lustre.1.6.4.3/fs/ext3/inode.c 
/usr/src/linux-2.6.9-67.0.4.EL_lustre.1.6.4.3/fs/ext3/ioctl.c 
/usr/src/linux-2.6.9-67.0.4.EL_lustre.1.6.4.3/fs/ext3/namei.c 
/usr/src/linux-2.6.9-67.0.4.EL_lustre.1.6.4.3/fs/ext3/resize.c 
/usr/src/linux-2.6.9-67.0.4.EL_lustre.1.6.4.3/fs/ext3/super.c 
/usr/src/linux-2.6.9-67.0.4.EL_lustre.1.6.4.3/fs/ext3/symlink.c 
/usr/src/linux-2.6.9-67.0.4.EL_lustre.1.6.4.3/fs/ext3/xattr.c 
/usr/src/linux-2.6.9-67.0.4.EL_lustre.1.6.4.3/fs/ext3/xattr_security.c 
/usr/src/linux-2.6.9-67.0.4.EL_lustre.1.6.4.3/fs/ext3/xattr_trusted.c 
/usr/src/linux-2.6.9-67.0.4.EL_lustre.1.6.4.3/fs/ext3/xattr_user.c 
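
Steps 8 and 9 above (dropping the kernel's own InfiniBand entries from
Module.symvers, then appending the OFED-generated ones) could be scripted
roughly as follows instead of editing by hand. This is only a sketch: the
function name merge_symvers is made up, and it assumes the in-kernel
entries can be recognised by a module path containing drivers/infiniband,
which should be checked against the actual file first. The sample entries
are invented.

```shell
# Write a merged symbol table to stdout: the kernel's Module.symvers with
# every line mentioning drivers/infiniband removed, followed by the
# OFED-generated entries.
# $1 = kernel Module.symvers, $2 = OFED Module.symvers
# merge_symvers is a hypothetical helper name.
merge_symvers() {
    grep -v 'drivers/infiniband' "$1"
    cat "$2"
}

# Demonstration on made-up entries in a scratch directory.
tmp=$(mktemp -d)
printf '0x11111111\tib_create_cq\tdrivers/infiniband/core/ib_core\n' > "$tmp/kernel.symvers"
printf '0x22222222\tprintk\tvmlinux\n' >> "$tmp/kernel.symvers"
printf '0x33333333\tib_create_cq\tdrivers/infiniband/core/ib_core\n' > "$tmp/ofed.symvers"
merge_symvers "$tmp/kernel.symvers" "$tmp/ofed.symvers"
rm -rf "$tmp"
```

Writing the result to a new file and comparing it with the original before
replacing it keeps the step reversible.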

Re: [Lustre-discuss] Lustre 1.6.4.3 + OFED 1.2.5.5 + RHEL 4u4 AS

2008-05-15 Thread Brian J. Murrell
A search of bugzilla yields bug 15315 which identifies bug 15030 as
well.  Please read through those two bugs.

b.



___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Lustre 1.4.6.3 + OFED 1.2.5.5: Error loading ko2iblnd

2008-05-09 Thread David_Kewley
> From: [EMAIL PROTECTED] [mailto:lustre-discuss-
> [EMAIL PROTECTED] On Behalf Of Kumaran Rajaram
> Sent: Tuesday, May 06, 2008 1:12 PM
>
> Hardware Config: x86_64
> Software Config: SLES10.1,
>  2.6.16.46-0.12-lustre (Stock SP1 Kernel + Lustre patches),
>  OFEDv1.2.5.5,
>  Lustre-1.6.4.3
> Status: Lustre + TCP builds and loads fine
> Lustre + o2ib builds but ko2iblnd does not load :-(
>
> Applied the Bugzilla patch 12276 to get Lustre compiled with
> OFEDv1.2.5.5.  Configured as follows (see config.out attached):
>
> ./configure --with-linux=/usr/src/linux
> --with-o2ib=/usr/src/ofa_kernel-1.2.5.5

Kums,

Try '--with-o2ib=/usr/src/ofa_kernel' (leaving out the '-1.2.5.5').  This 
worked for me in different circumstances (Lustre 1.4.12, RHEL 4 kernel 
2.6.9-67.0.4).
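
As a rough pre-flight check before re-running configure, the sketch below tests whether a candidate OFED kernel tree exists and carries the RDMA headers. It is only an illustration: the two header names in the checklist and the helper names (missing_headers, usable_trees) are assumptions, not something the Lustre build is known to check in this form, and finding the headers does not prove the tree matches the ib_core module that is actually loaded.

```python
import os

# Headers a usable OFED kernel tree is assumed to ship (illustrative checklist).
REQUIRED_HEADERS = ("include/rdma/ib_verbs.h", "include/rdma/rdma_cm.h")


def missing_headers(tree):
    """Return the entries of REQUIRED_HEADERS that are not files under `tree`."""
    return [h for h in REQUIRED_HEADERS
            if not os.path.isfile(os.path.join(tree, h))]


def usable_trees(candidates):
    """Keep only candidate paths that exist and carry every required header."""
    return [t for t in candidates
            if os.path.isdir(t) and not missing_headers(t)]


if __name__ == "__main__":
    # The two paths discussed in this thread.
    for tree in ("/usr/src/ofa_kernel", "/usr/src/ofa_kernel-1.2.5.5"):
        print(tree, "missing:", missing_headers(tree) or "nothing")
```

A path that passes this check could then be given to --with-o2ib; whether the resulting module loads still depends on the symbol versions matching.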

David

--
David Kewley
Dell Infrastructure Consulting Services
Cell: 602-460-7617
[EMAIL PROTECTED]

My views may not reflect Dell's views.

Dell Services: http://www.dell.com/services/
How am I doing? Email my manager [EMAIL PROTECTED] with any feedback.



[Lustre-discuss] Lustre 1.4.6.3 + OFED 1.2.5.5: Error loading ko2iblnd

2008-05-06 Thread Kumaran Rajaram
Hi,

My cluster configuration is as follows:

Hardware Config: x86_64
Software Config: SLES10.1,
 2.6.16.46-0.12-lustre (Stock SP1 Kernel + Lustre patches),
 OFEDv1.2.5.5,
 Lustre-1.6.4.3
Status: Lustre + TCP builds and loads fine
Lustre + o2ib builds but ko2iblnd does not load :-(

Applied the Bugzilla patch 12276 to get Lustre compiled with
OFEDv1.2.5.5.  Configured as follows (see config.out attached):

./configure --with-linux=/usr/src/linux --with-o2ib=/usr/src/ofa_kernel-1.2.5.5

I get the following warnings when building the RPMs:
-
WARNING: rdma_accept [/usr/src/packages/BUILD/lustre-1.6.4.3/lnet/klnds/o2iblnd/ko2iblnd.ko] undefined!
WARNING: rdma_destroy_id [/usr/src/packages/BUILD/lustre-1.6.4.3/lnet/klnds/o2iblnd/ko2iblnd.ko] undefined!
WARNING: rdma_connect [/usr/src/packages/BUILD/lustre-1.6.4.3/lnet/klnds/o2iblnd/ko2iblnd.ko] undefined!
-

I get the following error when I try to load ko2iblnd.ko (modprobe).
-
May  6 16:42:47 storagehost kernel: ko2iblnd: disagrees about version of symbol ib_create_cq
May  6 16:42:47 storagehost kernel: ko2iblnd: Unknown symbol ib_create_cq
May  6 16:42:47 storagehost kernel: ko2iblnd: disagrees about version of symbol ib_dereg_mr
May  6 16:42:47 storagehost kernel: ko2iblnd: Unknown symbol ib_dereg_mr
May  6 16:42:47 storagehost kernel: ko2iblnd: disagrees about version of symbol ib_destroy_cq
May  6 16:42:47 storagehost kernel: ko2iblnd: Unknown symbol ib_destroy_cq
May  6 16:42:47 storagehost kernel: ko2iblnd: disagrees about version of symbol ib_get_dma_mr


In addition to the IB source, I made the following symbolic links:
i) /usr/src/linux/drivers/infiniband to /usr/src/ofa_kernel-1.2.5.5
ii) /usr/src/linux/include/rdma to /usr/src/ofa_kernel-1.2.5.5/include/rdma
iii) /usr/include/infiniband/verbs.h is from OFED-1.2.5.5

storagehost[3] root~$ modinfo ib_core
filename:       /lib/modules/2.6.16.46-0.12-lustre/updates/kernel/drivers/infiniband/core/ib_core.ko
license:        Dual BSD/GPL
description:    core kernel InfiniBand API
author:         Roland Dreier
srcversion:     4429863EA75C0750E651039
depends:
vermagic:       2.6.16.46-0.12-lustre SMP gcc-4.1

storagehost[5] root/lib/modules/2.6.16.46-0.12-lustre/updates/kernel/drivers/infiniband/core$ nm ib_core.ko | grep ib_create_cq
66d8cf93 A __crc_ib_create_cq
55140081 A __crc_ib_create_cq_mod
0c61 T ib_create_cq
0cb7 T ib_create_cq_mod
00a0 r __kcrctab_ib_create_cq
0098 r __kcrctab_ib_create_cq_mod
0161 r __kstrtab_ib_create_cq
0150 r __kstrtab_ib_create_cq_mod
0140 r __ksymtab_ib_create_cq
0130 r __ksymtab_ib_create_cq_mod
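
The "disagrees about version of symbol" messages mean that the CRC recorded in ko2iblnd.ko for a symbol differs from the CRC carried by the loaded module that exports it. Below is a minimal sketch of comparing the two by hand, assuming the input text comes from `nm ib_core.ko` and `modprobe --dump-modversions ko2iblnd.ko`. The function names are made up for illustration, and the ko2iblnd CRC in the sample is a hypothetical value, not one taken from the system above.

```python
import re

# Matches nm lines such as "66d8cf93 A __crc_ib_create_cq".
CRC_LINE = re.compile(r"^([0-9a-fA-F]+)\s+A\s+__crc_(\S+)$")


def parse_nm_crcs(nm_output):
    """Map symbol -> CRC for every __crc_* entry in `nm module.ko` output."""
    crcs = {}
    for line in nm_output.splitlines():
        m = CRC_LINE.match(line.strip())
        if m:
            crcs[m.group(2)] = int(m.group(1), 16)
    return crcs


def parse_modversions(dump_output):
    """Map symbol -> CRC from `modprobe --dump-modversions` output,
    whose lines look like "0x66d8cf93<TAB>ib_create_cq"."""
    wanted = {}
    for line in dump_output.splitlines():
        fields = line.split()
        if len(fields) == 2:
            wanted[fields[1]] = int(fields[0], 16)
    return wanted


def find_mismatches(wanted, exported):
    """Return {symbol: (wanted_crc, exported_crc_or_None)} for each symbol
    whose CRC differs from, or is absent in, the exporting module."""
    return {sym: (crc, exported.get(sym))
            for sym, crc in wanted.items()
            if exported.get(sym) != crc}


# CRC lines copied from the nm output above.
NM_SAMPLE = """66d8cf93 A __crc_ib_create_cq
55140081 A __crc_ib_create_cq_mod
0c61 T ib_create_cq"""

# Hypothetical value standing in for what ko2iblnd.ko was built against.
WANTED_SAMPLE = "0xdeadbeef\tib_create_cq"

if __name__ == "__main__":
    exported = parse_nm_crcs(NM_SAMPLE)
    wanted = parse_modversions(WANTED_SAMPLE)
    for sym, (want, have) in find_mismatches(wanted, exported).items():
        print(sym, "wanted", hex(want), "exported", have and hex(have))
```

Any symbol reported here was compiled against a different definition than the one in the installed ib_core.ko, which usually points at a mismatch between the tree used at build time and the modules installed under /lib/modules.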

Any ideas as to what I may be doing wrong in getting ko2iblnd.ko to load
properly?

Thanks in Advance,
-Kums

config.out:
checking build system type... x86_64-unknown-linux-gnu
checking host system type... x86_64-unknown-linux-gnu
checking target system type... x86_64-unknown-linux-gnu
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking for gcc... gcc
checking for C compiler default output file name... a.out
checking whether the C compiler works... yes
checking whether we are cross compiling... no
checking for suffix of executables... 
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ANSI C... none needed
checking for style of include used by make... GNU
checking dependency style of gcc... gcc3
checking how to run the C preprocessor... gcc -E
checking for egrep... grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking whether to build Cray XT3 features... no
checking whether to build BGL features... no
checking for ranlib... ranlib
checking for buggy compiler... no known problems
checking for unsigned long long... yes
checking size of unsigned long long... 8
--- size SIZEOF 
--- size SIZEOF 8
checking whether __i386__ is declared... no
checking if gcc accepts -m64... yes

checking whether to posix osd... no
checking whether to build docs... no
checking whether to build utilities... yes
checking whether to install init scripts... no
checking whether to build Lustre tests... yes
checking whether to build Lustre server support... yes
checking whether to build Lustre client support... yes
./configure: line 4461: LC_CONFIG_SPLIT: command not found
./configure: line 4462: LC_CONFIG_LDISKFS: command not found
checking whether to enable