On Sep 26, 2023, at 13:44, Audet, Martin via lustre-discuss 
<lustre-discuss@lists.lustre.org> wrote:

Hello all,

I would appreciate it if the community would give more attention to this issue, 
because upgrading from 2.12.x to 2.15.x, two LTS versions, is something that we 
can expect many cluster admins to try in the next few months...

Who, in particular, is "the community"?

That term implies a collective effort, and I'd welcome feedback from your 
testing of the upgrade process.  It is definitely possible for an individual to 
install Lustre 2.12.9 on one or more VMs (or better, use a clone of your 
current server OS image), format a small test filesystem with the current 
configuration, copy some data into it, and then follow your planned process to 
upgrade to 2.15.3 (which should mostly just be "unmount everything, install new 
RPMs, mount").  That is prudent system administration to test the process in 
advance of changing your production system.
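
For example, a rough sketch of such a test might look like the following.  This 
assumes a CentOS/RHEL test VM with the MGS/MDT on /dev/vdb and an OST on 
/dev/vdc, the fsname "testfs", and the usual Whamcloud server RPMs; the device 
paths, fsname, NID, and mount points are placeholders for your own setup:

  # start from the release running in production: install the 2.12.9 server
  # RPMs (lustre, kmod-lustre, kmod-lustre-osd-ldiskfs, lustre-osd-ldiskfs-mount)
  # and the matching e2fsprogs

  # format a small test filesystem mirroring your production configuration
  mkfs.lustre --fsname=testfs --mgs --mdt --index=0 /dev/vdb
  mkfs.lustre --fsname=testfs --ost --index=0 --mgsnode=10.0.0.1@tcp /dev/vdc  # placeholder MGS NID
  mount -t lustre /dev/vdb /mnt/mdt
  mount -t lustre /dev/vdc /mnt/ost0
  # mount a client, copy some representative data in, then unmount the client

  # the upgrade under test: unmount everything, install new RPMs, mount
  umount /mnt/ost0
  umount /mnt/mdt
  # install the 2.15.3 server RPMs and e2fsprogs-1.47.0
  mount -t lustre /dev/vdb /mnt/mdt
  mount -t lustre /dev/vdc /mnt/ost0

If the targets mount cleanly and the test data is still readable from a client, 
you have a much better idea of what to expect on the production system.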

We ourselves plan to upgrade a small Lustre (production) system from 2.12.9 to 
2.15.3 in the next couple of weeks...

After seeing problem reports like this, we are starting to feel a bit nervous...

The documentation for this major update does not appear to me to be very 
specific...

Patches with improvements to the process described in the manual are welcome.  
Please see https://wiki.lustre.org/Lustre_Manual_Changes for details on how to 
submit your contributions.

In this document, for example, 
https://doc.lustre.org/lustre_manual.xhtml#upgradinglustre , the update process 
does not appear very difficult, and there is no mention of using "tunefs.lustre 
--writeconf" for this kind of update.

Or am I missing something?

I think you are answering your own question here...  The documented upgrade 
process has no mention of running "writeconf", but it was run for an unknown 
reason. This introduced an unknown problem with the configuration files that 
prevented the target from mounting.

Then, rather than re-running writeconf to fix the configuration files, the 
entire MDT was copied to a new storage device (a large no-op IMHO, since any 
issue with the MDT config files would be copied along with it) and writeconf 
was run again to regenerate the configs, which could have been done just as 
easily on the original MDT.
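
For reference, a minimal sketch of that configuration-log regeneration on the 
original targets, along the lines of the procedure in the manual, assuming the 
MGS is co-located with the MDT on /dev/md0 as in the report below (device names 
and mount points are placeholders):

  # with all clients unmounted, stop the OSTs, then the MDT
  umount /mnt/ost0                      # on each OSS, for each OST
  umount /mnt/mdt                       # on the MDS

  # regenerate the configuration logs: MDT first, then every OST
  tunefs.lustre --writeconf /dev/md0    # on the MDS
  tunefs.lustre --writeconf /dev/sdX    # on each OSS, for each OST device

  # remount in order: MGS/MDT first, then the OSTs, then the clients
  mount -t lustre /dev/md0 /mnt/mdt
  mount -t lustre /dev/sdX /mnt/ost0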

So the relatively straightforward upgrade process was turned into a complicated 
one for no apparent reason.

There have been 2.12->2.15 upgrades done already in production without issues, 
and this is also tested continuously during development.  Of course there are a 
wide variety of different configurations, features, and hardware on which 
Lustre is run, and it isn't possible to test even a fraction of all 
configurations.  I don't think one problem report on the mailing list is an 
indication that there are fundamental issues with the upgrade process.

Cheers, Andreas

Thanks in advance for providing more tips for this kind of update.

Martin Audet
________________________________
From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org>
 on behalf of Tung-Han Hsieh via lustre-discuss 
<lustre-discuss@lists.lustre.org>
Sent: September 23, 2023 2:20 PM
To: lustre-discuss@lists.lustre.org
Subject: [lustre-discuss] Cannot mount MDT after upgrading from Lustre 2.12.6 
to 2.15.3

Dear All,

Today we tried to upgrade our Lustre file system from version 2.12.6 to 2.15.3, 
but after the work we could not mount the MDT successfully. Our MDT has an 
ldiskfs backend. The upgrade procedure was:

1. Install the new version of e2fsprogs-1.47.0
2. Install Lustre-2.15.3
3. After reboot, run: tunefs.lustre --writeconf /dev/md0

Then, when mounting the MDT, we got the following error messages in dmesg:

===========================================================
[11662.434724] LDISKFS-fs (md0): mounted filesystem with ordered data mode. 
Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
[11662.584593] Lustre: 3440:0:(scrub.c:189:scrub_file_load()) chome-MDT0000: 
reset scrub OI count for format change (LU-16655)
[11666.036253] Lustre: MGS: Logs for fs chome were removed by user request.  
All servers must be restarted in order to regenerate the logs: rc = 0
[11666.523144] Lustre: chome-MDT0000: Imperative Recovery not enabled, recovery 
window 300-900
[11666.594098] LustreError: 3440:0:(mdd_device.c:1355:mdd_prepare()) 
chome-MDD0000: get default LMV of root failed: rc = -2
[11666.594291] LustreError: 
3440:0:(obd_mount_server.c:2027:server_fill_super()) Unable to start targets: -2
[11666.594951] Lustre: Failing over chome-MDT0000
[11672.868438] Lustre: 3440:0:(client.c:2295:ptlrpc_expire_one_request()) @@@ 
Request sent has timed out for slow reply: [sent 1695492248/real 1695492248]  
req@000000005dfd9b53 x1777852464760768/t0(0) 
o251->MGC192.168.32.240@o2ib@0@lo:26/25 lens 224/224 e 0 to 1 dl 1695492254 ref 
2 fl Rpc:XNQr/0/ffffffff rc 0/-1 job:''
[11672.925905] Lustre: server umount chome-MDT0000 complete
[11672.926036] LustreError: 3440:0:(super25.c:183:lustre_fill_super()) llite: 
Unable to mount <unknown>: rc = -2
[11872.893970] LDISKFS-fs (md0): mounted filesystem with ordered data mode. 
Opts: (null)
============================================================

Could anyone help solve this problem? Sorry, it is really urgent.

Thank you very much.

T.H.Hsieh
_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

--
Andreas Dilger
Lustre Principal Architect
Whamcloud






