Re: [lustre-discuss] mlx4 and mxl5 mix environment

2020-07-03 Thread Andreas Dilger
There is a "Request Account" link at the top of every page: 
http://wiki.lustre.org/Special:RequestAccount



On Jul 1, 2020, at 08:39, Ms. Megan Larko <dobsonu...@gmail.com> wrote:

Awesome, thanks!   Unfortunately the password reset site is not finding my UID. 
  Maybe I never had access to the Lustre wiki.  (I have so many accounts that 
sometimes my head spins.)   I'm still willing to help.  Is there a request 
password site?

Cheers,
megan

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

Cheers, Andreas
--
Andreas Dilger
Principal Lustre Architect
Whamcloud







Re: [lustre-discuss] mlx4 and mxl5 mix environment

2020-07-01 Thread Ms. Megan Larko
Awesome, thanks!   Unfortunately the password reset site is not finding my
UID.   Maybe I never had access to the Lustre wiki.  (I have so many
accounts that sometimes my head spins.)   I'm still willing to help.  Is
there a request password site?

Cheers,
megan

On Fri, Jun 26, 2020 at 8:54 PM Spitz, Cory James 
wrote:

> Megan,
>
> You wrote:
> PS. [I am willing to add/contribute to the
> http://wiki.lustre.org/Infiniband_Configuration_Howto but I think my
> account for wiki editing has expired (at least the one I thought I had did
> not work).
>
> Thank you for your offer!  Did you try
> http://wiki.lustre.org/Special:PasswordReset?  If that didn’t work then I
> think that you could email lustre@lists.opensfs.org.
>
> -Cory
>


Re: [lustre-discuss] mlx4 and mxl5 mix environment

2020-06-26 Thread Spitz, Cory James
Megan,

You wrote:
PS. [I am willing to add/contribute to the
http://wiki.lustre.org/Infiniband_Configuration_Howto but I think my account
for wiki editing has expired (at least the one I thought I had did not work).

Thank you for your offer!  Did you try
http://wiki.lustre.org/Special:PasswordReset?  If that didn’t work then I think
that you could email lustre@lists.opensfs.org.

-Cory





[lustre-discuss] mlx4 and mxl5 mix environment

2020-06-24 Thread Ms. Megan Larko
On 22 Jun 2020 "guru.novice" wrote:
Hi, all
We set up a cluster with a mix of mlx4 and mlx5 drivers, and everything works
well.
Later I found something on the wiki,
http://wiki.lustre.org/Infiniband_Configuration_Howto and
http://lists.onebuilding.org/pipermail/lustre-devel-lustre.org/2016-May/003842.html,
which was last edited in 2016.
So do I need to change the LNet configuration as described on that page?
Or has the problem been resolved in newer versions (like 2.12.x)?
Also, where can I find more details?

Any suggestions would be appreciated.
Thanks!

Hello guru.novice,
Lustre 2.12.x has some nice LNet configuration abilities.  The old
/etc/modprobe.d/ config files have been superseded by /etc/lnet.conf.  An
install of Lustre 2.12.x provides a sample of this file (with the lines
commented out).  Our experience has shown that not all lines are necessary;
edit to suit.

Lustre 2.12.x has Multi-Rail (MR) on by default, so Lustre will attempt
to automatically find active and viable LNet paths to use.  This should
work without issue in your mixed mlx4/mlx5 environment; we have some mixed
IB and Ethernet setups that work.  To explicitly use MR, one may set
"Multi-Rail: true" in the "peer" NID section of the /etc/lnet.conf file, but
that was not necessary for us.  We used a simple /etc/lnet.conf for MR systems:
File stub: /etc/lnet.conf
net:
    - net type: o2ib0
      local NI(s):
        - interfaces:
              0: ib0
    - net type: o2ib777
      local NI(s):
        - interfaces:
              0: ib0:1
This allowed LNet to use any NID on o2ib0 and o2ib777.
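
If the same stub has to be produced for several nodes, the nesting of the "net:" section can be sketched in Python. This is only an illustration under stated assumptions: the helper name render_net_stanza and the four-space indentation are hypothetical choices, not something shipped with Lustre; the key names are the ones used in the stub.

```python
def render_net_stanza(nets):
    """Render an lnet.conf 'net:' section as text.

    nets: list of (net_type, interface) pairs, e.g. ("o2ib0", "ib0").
    Hypothetical helper for illustration only; not part of Lustre.
    """
    lines = ["net:"]
    for net_type, interface in nets:
        # One list entry per LNet network, each with a single interface.
        lines.append(f"    - net type: {net_type}")
        lines.append("      local NI(s):")
        lines.append("        - interfaces:")
        lines.append(f"              0: {interface}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_net_stanza([("o2ib0", "ib0"), ("o2ib777", "ib0:1")]), end="")
```

The generated text would still need to be reviewed before being written to /etc/lnet.conf.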

Whatever is placed in the /etc/lnet.conf file is loaded into the kernel
modules by the Lustre startup mechanism (on CentOS, the systemd unit files
under /usr/lib/systemd/system).  Because we chose _not_ to use MR on a
different box, we explicitly defined the available routes in /etc/lnet.conf
using the lines:
route:
    - net: tcp
      gateway: 10.10.10.101@o2ib1
    - net: tcp
      gateway: 10.10.10.102@o2ib
And so on up to 10.10.10.116@o2ib
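
Writing out sixteen near-identical route entries by hand is error-prone, so here is a minimal Python sketch that generates them. The function name render_route_stanza is hypothetical, and the gateway network is passed in as a parameter because the lines in this message show both @o2ib1 and @o2ib; using o2ib1 in the example call is an assumption.

```python
def render_route_stanza(remote_net, gateway_prefix, first, last, gateway_net):
    """Render an lnet.conf 'route:' section with one entry per gateway.

    Hypothetical helper for illustration only; not part of Lustre.
    gateway_net is an assumption the caller must supply (e.g. "o2ib1").
    """
    lines = ["route:"]
    for host in range(first, last + 1):
        # Each entry routes traffic for remote_net through one gateway NID.
        lines.append(f"    - net: {remote_net}")
        lines.append(f"      gateway: {gateway_prefix}.{host}@{gateway_net}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_route_stanza("tcp", "10.10.10", 101, 116, "o2ib1"), end="")
```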

On CentOS 7, the /usr/lib/systemd/system/lnet.service file is reproduced
below, with our notes as comment lines.  (Details: lustre-2.12.4-1 with
Mellanox OFED version 4.7-1.0.0.1 and kernel 3.10.0-957.27.2.el7.)
File lnet.service:
[Unit]
Description=lnet management
Requires=network-online.target
After=network-online.target openibd.service rdma.service opa.service
ConditionPathExists=!/proc/sys/lnet/

[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/sbin/modprobe lnet
ExecStart=/usr/sbin/lnetctl lnet configure
# Do NOT use the next line if you want MR function
ExecStart=/usr/sbin/lnetctl set discover 0
# The file with router, credit and similar info
ExecStart=/usr/sbin/lnetctl import /etc/lnet.conf
# Omit --non_mr if you want to use MR
ExecStart=/usr/sbin/lnetctl peer add --nid 10.10.10.[101-116]@o2ib1 --non_mr
ExecStop=/usr/sbin/lustre_rmmod ptlrpc
ExecStop=/usr/sbin/lnetctl lnet unconfigure
ExecStop=/usr/sbin/lustre_rmmod libcfs ldiskfs

[Install]
WantedBy=multi-user.target
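
For readers unfamiliar with the bracket shorthand in the "peer add" line, the following Python sketch shows which NIDs an expression such as 10.10.10.[101-116]@o2ib1 stands for. It is an illustration only: expand_nid_range is a hypothetical helper that handles a single [lo-hi] range, and lnetctl does its own parsing, which may accept more syntax than this.

```python
import re


def expand_nid_range(expr):
    """Expand a single [lo-hi] range in a NID expression into explicit NIDs.

    Hypothetical helper for illustration only; not how lnetctl parses NIDs.
    An expression without a bracket range is returned unchanged.
    """
    match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", expr)
    if match is None:
        return [expr]
    prefix, lo, hi, suffix = match.groups()
    return [f"{prefix}{n}{suffix}" for n in range(int(lo), int(hi) + 1)]


if __name__ == "__main__":
    for nid in expand_nid_range("10.10.10.[101-116]@o2ib1"):
        print(nid)
```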

I hope this info points you in the right direction.

Cheers,
megan
PS. [I am willing to add/contribute to the
http://wiki.lustre.org/Infiniband_Configuration_Howto but I think my
account for wiki editing has expired (at least the one I thought I had did
not work).
Our site had issues with Multi-Rail "not socially distancing appropriately"
from other LNet networks so in our particular case we disabled MR.  (An
entirely different experience.) ]


Re: [lustre-discuss] mlx4 and mxl5 mix environment

2020-06-22 Thread Andreas Dilger
On Jun 22, 2020, at 2:13 AM, 肖正刚  wrote:
> We set up a cluster with a mix of mlx4 and mlx5 drivers, and everything
> works well.
> Later I found something on the wiki,
> http://wiki.lustre.org/Infiniband_Configuration_Howto and
> http://lists.onebuilding.org/pipermail/lustre-devel-lustre.org/2016-May/003842.html,
> which was last edited in 2016.
> So do I need to change the LNet configuration as described on that page?

One of the benefits of being a wiki page is that you can also update it
yourself, after registering for an account.

> Or has the problem been resolved in newer versions (like 2.12.x)?
> Also, where can I find more details?
> 
> Any suggestions would be appreciated.
> Thanks!

Cheers, Andreas









[lustre-discuss] mlx4 and mxl5 mix environment

2020-06-22 Thread 肖正刚
Hi, all
We set up a cluster with a mix of mlx4 and mlx5 drivers, and everything works
well.
Later I found something on the wiki,
http://wiki.lustre.org/Infiniband_Configuration_Howto and
http://lists.onebuilding.org/pipermail/lustre-devel-lustre.org/2016-May/003842.html,
which was last edited in 2016.
So do I need to change the LNet configuration as described on that page?
Or has the problem been resolved in newer versions (like 2.12.x)?
Also, where can I find more details?

Any suggestions would be appreciated.
Thanks!