Re: [lustre-discuss] New client mounts fail after deactivating OSTs

2023-11-17 Thread Nehring, Shane R [ITS] via lustre-discuss
Little late to the party here, but I just ran into this myself.

I worked around it without having to regenerate everything with --writeconf,
which I realize isn't helpful 4 months after the fact, but I figured I'd post
here to help anyone else who runs into this issue in the future.

In my case I had removed all the llog entries for the OSTs except the conf_param
entries setting osc.active=0, assuming for whatever reason that those should be
retained. That was incorrect; you'll want to remove those entries too for each
relevant OST.
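
For anyone who hits this later, the cleanup was along these lines on the MGS
(a rough sketch, not a verbatim transcript: I'm borrowing Brad's fsname
"hydra" as a stand-in, and the record index is made up):

  # Print the client config log and note the index of every record that
  # still references a removed OST, *including* the conf_param records
  # that set osc.active=0.
  mgs# lctl --device MGS llog_print hydra-client

  # Cancel each of those records by its index (123 is illustrative).
  mgs# lctl --device MGS llog_cancel hydra-client --log_idx=123

Once the osc.active=0 records were gone as well, new client mounts worked
again without a writeconf.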

I've opened an issue in LUDOC with some suggestions about how phrasing might be
improved.


On Tue, 2023-07-18 at 23:55 +, Andreas Dilger via lustre-discuss wrote:
> Brian,
> Please file a ticket in LUDOC with details of how the manual should be
> updated. Ideally, including a patch. :-)
> 
> Cheers, Andreas
> 
> > [...]





Re: [lustre-discuss] New client mounts fail after deactivating OSTs

2023-07-18 Thread Andreas Dilger via lustre-discuss
Brian,
Please file a ticket in LUDOC with details of how the manual should be updated. 
Ideally, including a patch. :-)

Cheers, Andreas

On Jul 11, 2023, at 15:39, Brad Merchant  
wrote:


[...]


Re: [lustre-discuss] New client mounts fail after deactivating OSTs

2023-07-11 Thread Brad Merchant
We recreated the issue in a test cluster, and it was definitely the
llog_cancel steps that caused it. Clients couldn't process the llog
properly on new mounts and would fail. We had to completely clear the
llogs and run --writeconf on every target to regenerate them from scratch.

The cluster is up and running now, but I would certainly recommend at least
revising that section of the manual.
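
For the record, the regeneration itself was the standard writeconf
procedure (an outline only; the mount points and device names below are
illustrative, not our actual ones):

  # Stop everything: unmount all clients, then all OSTs, then the MDT.
  client# umount /mnt/hydra
  oss# umount /mnt/lustre/ost0010
  mds# umount /mnt/lustre/mdt

  # Erase and regenerate the configuration logs on every target.
  mds# tunefs.lustre --writeconf /dev/mdt_device
  oss# tunefs.lustre --writeconf /dev/ost_device    # repeat per OST

  # Bring it back in order: MGS/MDT first, then OSTs, then clients.
  mds# mount -t lustre /dev/mdt_device /mnt/lustre/mdt
  oss# mount -t lustre /dev/ost_device /mnt/lustre/ost0010
  client# mount -t lustre 172.16.100.101@o2ib:/hydra /mnt/hydra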

On Mon, Jul 10, 2023 at 5:22 PM Brad Merchant <
bmerch...@cambridgecomputer.com> wrote:

> [...]


[lustre-discuss] New client mounts fail after deactivating OSTs

2023-07-10 Thread Brad Merchant
We deactivated half of 32 OSTs after draining them. We followed the steps
in section 14.9.3 of the Lustre manual:

https://doc.lustre.org/lustre_manual.xhtml#lustremaint.remove_ost

After running the steps under the subhead "3. Deactivate the OST." on
OST0010-OST001f, new client mounts fail with the log messages below.
Existing client mounts seem to function correctly, but they are a bit of a
ticking time bomb because they are configured with autofs.

The llog_cancel steps are new to me, and the issues seemed to appear after
those commands were issued (though I can't say that 100% definitively).
Servers are running 2.12.5 and clients are on 2.14.x.
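
For context, what we ran was essentially the manual's deactivation
sequence, e.g. for the first of the sixteen OSTs (the llog index here is
illustrative, not the one we actually cancelled):

  # Permanently mark the OST inactive on the MGS.
  mgs# lctl conf_param hydra-OST0010.osc.active=0

  # Find and cancel that OST's records in the client config log.
  mgs# lctl --device MGS llog_print hydra-client
  mgs# lctl --device MGS llog_cancel hydra-client --log_idx=42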


Jul 10 15:22:40 adm-sup1 kernel: LustreError:
26814:0:(obd_config.c:1514:class_process_config()) no device for:
hydra-OST0010-osc-8be5340c2000
Jul 10 15:22:40 adm-sup1 kernel: LustreError:
26814:0:(obd_config.c:2038:class_config_llog_handler())
MGC172.16.100.101@o2ib: cfg command failed: rc = -22
Jul 10 15:22:40 adm-sup1 kernel: Lustre: cmd=cf00f 0:hydra-OST0010-osc  1:osc.active=0
Jul 10 15:22:40 adm-sup1 kernel: LustreError: 15b-f: MGC172.16.100.101@o2ib:
Configuration from log hydra-client failed from MGS -22. Check client and
MGS are on compatible version.
Jul 10 15:22:40 adm-sup1 kernel: Lustre: hydra: root_squash is set to 99:99
Jul 10 15:22:40 adm-sup1 systemd-udevd[26823]: Process '/usr/sbin/lctl
set_param 'llite.hydra-8be5340c2000.nosquash_nids=192.168.80.84@tcp
192.168.80.122@tcp 192.168.80.21@tcp 172.16.90.11@o2ib 172.16.100.211@o2ib
172.16.100.212@o2ib 172.16.100.213@o2ib 172.16.100.214@o2ib
172.16.100.215@o2ib 172.16.90.51@o2ib'' failed with exit code 2.
Jul 10 15:22:40 adm-sup1 kernel: Lustre: Unmounted hydra-client
Jul 10 15:22:40 adm-sup1 kernel: LustreError:
26803:0:(obd_mount.c:1680:lustre_fill_super()) Unable to mount  (-22)
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org