Re: [lustre-discuss] OST still has inodes and size after deleting all files
On Jan 19, 2024, at 13:48, Pavlo Khmel via lustre-discuss <lustre-discuss@lists.lustre.org> wrote:

> Hi,
>
> I'm trying to remove 4 OSTs.
>
> # lfs osts
> OBDS:
> 0: cluster-OST0000_UUID ACTIVE
> 1: cluster-OST0001_UUID ACTIVE
> 2: cluster-OST0002_UUID ACTIVE
> 3: cluster-OST0003_UUID ACTIVE
> . . .
>
> I moved all files to other OSTs. "lfs find" cannot find any files on these 4 OSTs.
>
> # time lfs find --ost 0 --ost 1 --ost 2 --ost 3 /cluster
> real    936m8.528s
> user    13m48.298s
> sys     210m1.245s
>
> But still: 2624 inodes are in use and 14.5G of space remains allocated.
>
> # lfs df -i | grep -e OST0000 -e OST0001 -e OST0002 -e OST0003
> cluster-OST0000_UUID  4293438576  644  4293437932  1% /cluster[OST:0]
> cluster-OST0001_UUID  4293438576  640  4293437936  1% /cluster[OST:1]
> cluster-OST0002_UUID  4293438576  671  4293437905  1% /cluster[OST:2]
> cluster-OST0003_UUID  4293438576  669  4293437907  1% /cluster[OST:3]
>
> # lfs df -h | grep -e OST0000 -e OST0001 -e OST0002 -e OST0003
> cluster-OST0000_UUID  29.2T  3.8G  27.6T  1% /cluster[OST:0]
> cluster-OST0001_UUID  29.2T  3.7G  27.6T  1% /cluster[OST:1]
> cluster-OST0002_UUID  29.2T  3.3G  27.6T  1% /cluster[OST:2]
> cluster-OST0003_UUID  29.2T  3.7G  27.6T  1% /cluster[OST:3]
>
> I tried to check the file system for errors:
>
> # umount /lustre/ost01
> # e2fsck -fy /dev/mapper/ost01
>
> and
>
> # lctl lfsck_start --device cluster-OST0001
> # lctl get_param -n osd-ldiskfs.cluster-OST0001.oi_scrub
> . . .
> status: completed
>
> I tried to mount the OST as ldiskfs, and there are still several files in /O/0/d*/:
>
> # umount /lustre/ost01
> # mount -t ldiskfs /dev/mapper/ost01 /mnt/
> # ls -Rhl /mnt/O/0/d*/
> . . .
> /mnt/O/0/d11/:
> -rw-rw-rw- 1 user1 group1 603K Nov  8 21:37 450605003
> /mnt/O/0/d12/:
> -rw-rw-rw- 1 user1 group1 110K Jun 16  2023 450322028
> -rw-rw-rw- 1 user1 group1  21M Nov  8 22:17 450605484
> . . .
>
> Is this expected behavior? Is it safe to delete the OST even with those files?

You can run the debugfs "stat" command to print the "fid" xattr, which contains the MDT parent FID, and then use that FID with "lfs fid2path" on a client to see whether any files are still related to these objects. You could also run "ll_decode_filter_fid" to do the same thing on the mounted ldiskfs filesystem. It is likely that there are a few stray objects from deleted files, but it is hard to say for sure.

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Whamcloud
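For concreteness, a minimal sketch of that lookup, using one of the leftover object names from the listing above; the device path comes from the original post, and the FID value shown in the comments is purely illustrative:

# On the OSS, with the OST unmounted, print the object's "fid" xattr
# (debugfs -c opens the device read-only):
debugfs -c -R 'stat O/0/d11/450605003' /dev/mapper/ost01

# Or, with the OST mounted as ldiskfs, decode the parent FID directly:
ll_decode_filter_fid /mnt/O/0/d11/450605003
# example output (illustrative): .../450605003: parent=[0x200000bd1:0x4b2:0x0] stripe=0

# On a client, map the parent FID back to a pathname, if one still exists:
lfs fid2path /cluster "[0x200000bd1:0x4b2:0x0]"

If fid2path reports that the FID does not exist, the object is most likely an orphan left over from a deleted file.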
[lustre-discuss] Lustre errors asking for help
Dear Andreas, All,

Thanks for your response. Just to give you some more information.

Yes, we have run e2fsck on all OSTs, MDTs, and the MGS, and all come back completely clean. We have also removed and recreated the quota files on them, again without any issues. The underlying storage upon which the OSTs and MDTs are built is fine: we have run verifies on all LUNs, and they come back clean. All the spinning disks are fine and the controllers report no errors or failures. Similarly, the InfiniBand fabric connecting all storage servers has been checked, and no errors or issues are present.

You are correct that no files are being created on those OSTs. However, Lustre is behaving poorly and sometimes goes into a hung state. Today it was reporting the following:

pdsh -g storage uptime
mds2: 14:39:27 up 4 days, 21:17, 0 users, load average: 0.00, 0.00, 0.00
mds1: 14:39:27 up 4 days, 21:17, 0 users, load average: 0.00, 0.00, 0.00
oss2: 14:39:27 up 4 days, 21:07, 0 users, load average: 0.00, 0.00, 0.00
oss1: 14:39:27 up 4 days, 21:07, 0 users, load average: 0.00, 0.00, 0.00
oss4: 14:39:27 up 4 days, 21:06, 0 users, load average: 0.00, 0.00, 0.00
oss6: 14:39:27 up 4 days, 21:06, 0 users, load average: 0.00, 0.00, 0.00
oss5: 14:39:27 up 4 days, 21:06, 0 users, load average: 0.00, 0.00, 0.00
oss3: 14:39:27 up 4 days, 21:07, 0 users, load average: 0.06, 0.03, 0.00

The load on the storage servers is effectively zero, yet the following messages are being produced on mds2 (the MDS serving the problematic OSTs) and on the OSSs, from our lustre.log:

Jan 19 14:36:57 oss5 kernel: : LustreError: 13751:0:(ofd_obd.c:1348:ofd_create()) scratch-OST0004: unable to precreate: rc = -116
Jan 19 14:36:57 oss5 kernel: : LustreError: 13751:0:(ofd_obd.c:1348:ofd_create()) Skipped 76 previous similar messages
Jan 19 14:39:35 mds2 kernel: : Lustre: 24647:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-150), not sending early reply
Jan 19 14:39:35 mds2 kernel: : Lustre: 24647:0:(service.c:1339:ptlrpc_at_send_early_reply()) Skipped 133 previous similar messages
Jan 19 14:40:57 oss3 kernel: : LustreError: 13903:0:(ofd_obd.c:1348:ofd_create()) scratch-OST0014: unable to precreate: rc = -116
Jan 19 14:40:57 oss3 kernel: : LustreError: 13903:0:(ofd_obd.c:1348:ofd_create()) Skipped 80 previous similar messages
Jan 19 14:43:31 mds2 kernel: : Lustre: scratch-MDT: Client e258850e-e603-8c7b-843f-66886cc67347 (at 192.168.113.1@o2ib) reconnecting
Jan 19 14:43:31 mds2 kernel: : Lustre: Skipped 4160 previous similar messages
Jan 19 14:43:31 mds2 kernel: : Lustre: scratch-MDT: Client e258850e-e603-8c7b-843f-66886cc67347 (at 192.168.113.1@o2ib) refused reconnection, still busy with 1 active RPCs
Jan 19 14:43:31 mds2 kernel: : Lustre: Skipped 4094 previous similar messages
Jan 19 14:44:30 mds2 kernel: : Lustre: lock timed out (enqueued at 1705703070, 1200s ago)
Jan 19 14:44:30 mds2 kernel: : LustreError: dumping log to /tmp/lustre-log.1705704270.4999
Jan 19 14:44:30 mds2 kernel: : Lustre: Skipped 10 previous similar messages
Jan 19 14:44:32 mds2 kernel: : LustreError: 0:0:(ldlm_lockd.c:391:waiting_locks_callback()) ### lock callback timer expired after 4307s: evicting client at 192.168.122.11@o2ib ns: mdt-scratch-MDT_UUID lock: 8804455dc480/0x280a58b66b7affa7 lrc: 3/0,0 mode: PR/PR res: [0x2000733da:0x14d:0x0].0 bits 0x13 rrc: 267 type: IBT flags: 0x20040020 nid: 192.168.122.11@o2ib remote: 0xa9053ebb6b8f23c1 expref: 7 pid: 28304 timeout: 4717246067 lvb_type: 0
Jan 19 14:44:32 mds2 kernel: : LustreError: 0:0:(ldlm_lockd.c:391:waiting_locks_callback()) Skipped 1 previous similar message
Jan 19 14:44:32 mds2 kernel: : Lustre: 26575:0:(service.c:2031:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (755:4253s); client may timeout. req@880127936800 x1788463595089220/t609893172938(0) o101->03cbede6-68a6-36ae-866e-74e858cf47f1@192.168.114.2@o2ib:0/0 lens 584/600 e 0 to 0 dl 1705700019 ref 1 fl Complete:/0/0 rc 0/0
Jan 19 14:44:32 mds2 kernel: : LustreError: 29019:0:(service.c:1999:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-192.168.128.8@o2ib: deadline 100:599s ago
Jan 19 14:44:32 mds2 kernel: : LustreError: 29019:0:(service.c:1999:ptlrpc_server_handle_request()) Skipped 73 previous similar messages
Jan 19 14:44:32 mds2 kernel: : LustreError: 24604:0:(ldlm_lockd.c:1309:ldlm_handle_enqueue0()) ### lock on
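As a small aside for readers of the thread, the "rc = -116" in the ofd_create precreate errors above is just a negated Linux errno value. On Linux, errno 116 is ESTALE ("Stale file handle"), and assuming python3 is available on a node, it can be decoded like this:

# decode the errno from "unable to precreate: rc = -116"
python3 -c 'import errno, os; print(errno.errorcode[116], "-", os.strerror(116))'
# prints: ESTALE - Stale file handle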
[lustre-discuss] OST still has inodes and size after deleting all files
Hi,

I'm trying to remove 4 OSTs.

# lfs osts
OBDS:
0: cluster-OST0000_UUID ACTIVE
1: cluster-OST0001_UUID ACTIVE
2: cluster-OST0002_UUID ACTIVE
3: cluster-OST0003_UUID ACTIVE
. . .

I moved all files to other OSTs. "lfs find" cannot find any files on these 4 OSTs.

# time lfs find --ost 0 --ost 1 --ost 2 --ost 3 /cluster
real    936m8.528s
user    13m48.298s
sys     210m1.245s

But still: 2624 inodes are in use and 14.5G of space remains allocated.

# lfs df -i | grep -e OST0000 -e OST0001 -e OST0002 -e OST0003
cluster-OST0000_UUID  4293438576  644  4293437932  1% /cluster[OST:0]
cluster-OST0001_UUID  4293438576  640  4293437936  1% /cluster[OST:1]
cluster-OST0002_UUID  4293438576  671  4293437905  1% /cluster[OST:2]
cluster-OST0003_UUID  4293438576  669  4293437907  1% /cluster[OST:3]

# lfs df -h | grep -e OST0000 -e OST0001 -e OST0002 -e OST0003
cluster-OST0000_UUID  29.2T  3.8G  27.6T  1% /cluster[OST:0]
cluster-OST0001_UUID  29.2T  3.7G  27.6T  1% /cluster[OST:1]
cluster-OST0002_UUID  29.2T  3.3G  27.6T  1% /cluster[OST:2]
cluster-OST0003_UUID  29.2T  3.7G  27.6T  1% /cluster[OST:3]

I tried to check the file system for errors:

# umount /lustre/ost01
# e2fsck -fy /dev/mapper/ost01

and

# lctl lfsck_start --device cluster-OST0001
# lctl get_param -n osd-ldiskfs.cluster-OST0001.oi_scrub
. . .
status: completed

I tried to mount the OST as ldiskfs, and there are still several files in /O/0/d*/:

# umount /lustre/ost01
# mount -t ldiskfs /dev/mapper/ost01 /mnt/
# ls -Rhl /mnt/O/0/d*/
. . .
/mnt/O/0/d11/:
-rw-rw-rw- 1 user1 group1 603K Nov  8 21:37 450605003
/mnt/O/0/d12/:
-rw-rw-rw- 1 user1 group1 110K Jun 16  2023 450322028
-rw-rw-rw- 1 user1 group1  21M Nov  8 22:17 450605484
. . .

Is this expected behavior? Is it safe to delete the OST even with those files?

Best regards,
Pavlo Khmel
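For reference, the usual sequence for emptying and permanently retiring an OST looks roughly like the following sketch. The MDT index (MDT0000) and OST index (OST0000) below are assumptions for illustration; repeat for each OST being removed, and check the Lustre Operations Manual for the exact parameter names in your version:

# 1. On the MDS: stop new object allocation on the OST being retired
lctl set_param osp.cluster-OST0000-osc-MDT0000.max_create_count=0

# 2. On a client: migrate remaining files off the OST, then verify nothing is left
lfs find --ost 0 /cluster | lfs_migrate -y
lfs find --ost 0 /cluster        # should print nothing

# 3. On the MGS: permanently deactivate the OST so clients stop using it
lctl conf_param cluster-OST0000.osc.active=0

The leftover objects discussed in the reply above can then be checked with debugfs/ll_decode_filter_fid and "lfs fid2path" before the backing device is finally reused.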
Re: [lustre-discuss] lustre-client-dkms-2.15.4 is still checking for python2
For lustre-client-dkms, maybe you can use --skip-broken to force the installation; it will then automatically compile the source code, during which you can check for error messages. Use the dkms command, or go directly to the source folder there to check. If there is no error message, then you should be fine.

It is likely one of the test tools Andreas mentioned, or simply a bad rpmbuild configuration when the binary package was built. I ran into an rpmbuild issue before; it is not directly related, but maybe helpful:
https://github.com/prod-feng/Luste-KMOD-2.12.9-with-ZFS-0.7.13-on-Centos-7.9

Best,
Feng

On Fri, Jan 19, 2024 at 2:01 PM Andreas Dilger via lustre-discuss <lustre-discuss@lists.lustre.org> wrote:

> It looks like there may be a couple of test tools that are referencing python2, but it definitely isn't needed for normal operation. Are you using the lustre-client binary or the lustre-client-dkms? Only one is needed.
>
> For the short term it would be possible to override this dependency, but it would be good to understand why this dependency is actually being generated.
>
> On Jan 19, 2024, at 04:06, BALVERS Martin via lustre-discuss <lustre-discuss@lists.lustre.org> wrote:
>
> FYI
> It seems that lustre-client-dkms-2.15.4 is still checking for python2 and does not install on AlmaLinux 9.3
>
> # dnf --enablerepo=lustre-client install lustre-client lustre-client-dkms
> Last metadata expiration check: 0:04:50 ago on Fri Jan 19 11:43:54 2024.
> Error:
>  Problem: conflicting requests
>   - nothing provides /usr/bin/python2 needed by lustre-client-dkms-2.15.4-1.el9.noarch from lustre-client
> (try to add '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)
>
> According to the changelog this should have been fixed (https://wiki.lustre.org/Lustre_2.15.4_Changelog).
>
> Regards,
> Martin Balvers
>
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Principal Architect
> Whamcloud
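To expand on that suggestion, a possible sequence is sketched below. The dkms module name and version ("lustre-client/2.15.4") are assumptions; check the output of `dkms status` for the actual name registered on your system:

# attempt the install as suggested in the original error message
dnf --enablerepo=lustre-client install --skip-broken lustre-client lustre-client-dkms

# see whether the dkms module was registered and built for the running kernel
dkms status

# rebuild by hand to surface any compile errors (module name/version assumed)
dkms build lustre-client/2.15.4 -k "$(uname -r)"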
Re: [lustre-discuss] lustre-client-dkms-2.15.4 is still checking for python2
It looks like there may be a couple of test tools that are referencing python2, but it definitely isn't needed for normal operation. Are you using the lustre-client binary or the lustre-client-dkms? Only one is needed.

For the short term it would be possible to override this dependency, but it would be good to understand why this dependency is actually being generated.

On Jan 19, 2024, at 04:06, BALVERS Martin via lustre-discuss <lustre-discuss@lists.lustre.org> wrote:

> FYI
> It seems that lustre-client-dkms-2.15.4 is still checking for python2 and does not install on AlmaLinux 9.3
>
> # dnf --enablerepo=lustre-client install lustre-client lustre-client-dkms
> Last metadata expiration check: 0:04:50 ago on Fri Jan 19 11:43:54 2024.
> Error:
>  Problem: conflicting requests
>   - nothing provides /usr/bin/python2 needed by lustre-client-dkms-2.15.4-1.el9.noarch from lustre-client
> (try to add '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)
>
> According to the changelog this should have been fixed (https://wiki.lustre.org/Lustre_2.15.4_Changelog).
>
> Regards,
> Martin Balvers

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Whamcloud
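One way to inspect the generated dependency and, as a short-term workaround, override it; the package file name below is taken from the error message, and `dnf download` assumes the dnf-plugins-core package is installed:

# download the package and inspect its declared requirements
dnf download --enablerepo=lustre-client lustre-client-dkms
rpm -qp --requires lustre-client-dkms-2.15.4-1.el9.noarch.rpm | grep -i python

# short-term workaround: install while skipping the dependency check
rpm -ivh --nodeps lustre-client-dkms-2.15.4-1.el9.noarch.rpm

Bypassing the dependency check only hides the symptom, so it is still worth reporting which requirement turns up in the `--requires` output.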
[lustre-discuss] lustre-client-dkms-2.15.4 is still checking for python2
Hi,

FYI
It seems that lustre-client-dkms-2.15.4 is still checking for python2 and does not install on AlmaLinux 9.3.

# dnf --enablerepo=lustre-client install lustre-client lustre-client-dkms
Last metadata expiration check: 0:04:50 ago on Fri Jan 19 11:43:54 2024.
Error:
 Problem: conflicting requests
  - nothing provides /usr/bin/python2 needed by lustre-client-dkms-2.15.4-1.el9.noarch from lustre-client
(try to add '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)

According to the changelog this should have been fixed (https://wiki.lustre.org/Lustre_2.15.4_Changelog).

Regards,
Martin Balvers
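A quick way to confirm which requirement is pulling in python2, straight from the repository metadata; this assumes the dnf repoquery command (from dnf-plugins-core) is available and uses the repo id from the command above:

# list the dependencies recorded in the repo metadata for the dkms package
dnf repoquery --enablerepo=lustre-client --requires lustre-client-dkms | grep -i python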