Re: [lustre-discuss] OST still has inodes and size after deleting all files

2024-01-19 Thread Andreas Dilger via lustre-discuss


On Jan 19, 2024, at 13:48, Pavlo Khmel via lustre-discuss <lustre-discuss@lists.lustre.org> wrote:

Hi,

I'm trying to remove 4 OSTs.

# lfs osts
OBDS:
0: cluster-OST0000_UUID ACTIVE
1: cluster-OST0001_UUID ACTIVE
2: cluster-OST0002_UUID ACTIVE
3: cluster-OST0003_UUID ACTIVE
. . .

I moved all files to other OSTs. "lfs find" cannot find any files on these 4 
OSTs.

# time lfs find --ost 0 --ost 1 --ost 2 --ost 3 /cluster

real 936m8.528s
user 13m48.298s
sys 210m1.245s

But still: 2624 inodes are in use and 14.5G total size.

# lfs df -i | grep -e OST0000 -e OST0001 -e OST0002 -e OST0003
cluster-OST0000_UUID  4293438576 644  4293437932   1% /cluster[OST:0]
cluster-OST0001_UUID  4293438576 640  4293437936   1% /cluster[OST:1]
cluster-OST0002_UUID  4293438576 671  4293437905   1% /cluster[OST:2]
cluster-OST0003_UUID  4293438576 669  4293437907   1% /cluster[OST:3]

# lfs df -h | grep -e OST0000 -e OST0001 -e OST0002 -e OST0003
cluster-OST0000_UUID   29.2T   3.8G   27.6T   1% /cluster[OST:0]
cluster-OST0001_UUID   29.2T   3.7G   27.6T   1% /cluster[OST:1]
cluster-OST0002_UUID   29.2T   3.3G   27.6T   1% /cluster[OST:2]
cluster-OST0003_UUID   29.2T   3.7G   27.6T   1% /cluster[OST:3]

I tried to check the file-system for errors:

# umount /lustre/ost01
# e2fsck -fy /dev/mapper/ost01

and

# lctl lfsck_start --device cluster-OST0001
# lctl get_param -n osd-ldiskfs.cluster-OST0001.oi_scrub
. . .
status: completed

I tried mounting the OST as ldiskfs, and there are several files in /O/0/d*/:

# umount /lustre/ost01
# mount -t ldiskfs /dev/mapper/ost01 /mnt/
# ls -Rhl /mnt/O/0/d*/
. . .
/mnt/O/0/d11/:
-rw-rw-rw- 1 user1 group1 603K Nov  8 21:37 450605003
/mnt/O/0/d12/:
-rw-rw-rw- 1 user1 group1 110K Jun 16  2023 450322028
-rw-rw-rw- 1 user1 group1  21M Nov  8 22:17 450605484
. . .

Is this expected behavior? Is it safe to delete the OST even with those files?

You can run the debugfs "stat" command to print the "fid" xattr, which contains
the MDT parent FID; that FID can be used with "lfs fid2path" on the client to see
whether any files are still related to these objects.  You could also run
"ll_decode_filter_fid" to do the same thing on the mounted ldiskfs filesystem.
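
For example (a sketch only; the object name below is taken from the listing
above, and the FID passed to fid2path is purely illustrative):

# debugfs -c -R "stat O/0/d11/450605003" /dev/mapper/ost01
# ll_decode_filter_fid /mnt/O/0/d11/450605003
# lfs fid2path /cluster "[0x200000bd1:0x1:0x0]"

The first two commands print the parent FID stored in the object's "fid" xattr;
the last one, run on a client, maps that FID back to a pathname if the file
still exists.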

It is likely that there are a few stray objects left over from deleted files,
but it is hard to say for sure.

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Whamcloud







___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Lustre errors asking for help

2024-01-19 Thread Baranowski, Roman via lustre-discuss
Dear Andreas, All,

Thanks for your response.  Just to give you some more information.  Yes, we have
run e2fsck on all OSTs, MDTs, and MGS and all come back completely clean.  We
have also removed and recreated the quota files on them, again without any
issues.  The underlying storage upon which the OSTs and MDTs are built is fine.
We have run verifies on all LUNs, and they come back clean. All the spinning
disks are fine and the controllers report no errors or failures. Similarly,
the Infiniband fabric connecting all storage servers has also been checked and
no errors or issues are present.

You are correct that no files are being created on those OSTs.  However, Lustre
is behaving poorly, and sometimes goes into a hung state.  Today, it was
reporting the following:

pdsh -g storage uptime
mds2:  14:39:27 up 4 days, 21:17,  0 users,  load average: 0.00, 0.00, 0.00
mds1:  14:39:27 up 4 days, 21:17,  0 users,  load average: 0.00, 0.00, 0.00
oss2:  14:39:27 up 4 days, 21:07,  0 users,  load average: 0.00, 0.00, 0.00
oss1:  14:39:27 up 4 days, 21:07,  0 users,  load average: 0.00, 0.00, 0.00
oss4:  14:39:27 up 4 days, 21:06,  0 users,  load average: 0.00, 0.00, 0.00
oss6:  14:39:27 up 4 days, 21:06,  0 users,  load average: 0.00, 0.00, 0.00
oss5:  14:39:27 up 4 days, 21:06,  0 users,  load average: 0.00, 0.00, 0.00
oss3:  14:39:27 up 4 days, 21:07,  0 users,  load average: 0.06, 0.03, 0.00

The load on the storage servers is effectively zero, yet the following messages
are being produced on mds2 (the MDS serving the problematic OSTs) and on the
OSSs, as seen in our lustre.log:

Jan 19 14:36:57 oss5 kernel: : LustreError: 
13751:0:(ofd_obd.c:1348:ofd_create()) scratch-OST0004: unable to precreate: rc 
= -116
Jan 19 14:36:57 oss5 kernel: : LustreError: 
13751:0:(ofd_obd.c:1348:ofd_create()) Skipped 76 previous similar messages
Jan 19 14:39:35 mds2 kernel: : Lustre: 
24647:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time 
(5/-150), not sending early reply
Jan 19 14:39:35 mds2 kernel: : Lustre: 
24647:0:(service.c:1339:ptlrpc_at_send_early_reply()) Skipped 133 previous 
similar messages
Jan 19 14:40:57 oss3 kernel: : LustreError: 
13903:0:(ofd_obd.c:1348:ofd_create()) scratch-OST0014: unable to precreate: rc 
= -116
Jan 19 14:40:57 oss3 kernel: : LustreError: 
13903:0:(ofd_obd.c:1348:ofd_create()) Skipped 80 previous similar messages
Jan 19 14:43:31 mds2 kernel: : Lustre: scratch-MDT: Client 
e258850e-e603-8c7b-843f-66886cc67347 (at 192.168.113.1@o2ib) reconnecting
Jan 19 14:43:31 mds2 kernel: : Lustre: Skipped 4160 previous similar messages
Jan 19 14:43:31 mds2 kernel: : Lustre: scratch-MDT: Client e258850e-e603-8c7b-843f-66886cc67347 (at 192.168.113.1@o2ib) refused reconnection, still busy with 1 active RPCs
Jan 19 14:43:31 mds2 kernel: : Lustre: Skipped 4094 previous similar messages
Jan 19 14:44:30 mds2 kernel: : Lustre: lock timed out (enqueued at 1705703070, 1200s ago)
Jan 19 14:44:30 mds2 kernel: : LustreError: dumping log to /tmp/lustre-log.1705704270.4999
Jan 19 14:44:30 mds2 kernel: : Lustre: Skipped 10 previous similar messages
Jan 19 14:44:32 mds2 kernel: : LustreError: 
0:0:(ldlm_lockd.c:391:waiting_locks_callback()) ### lock callback timer expired 
after 4307s: evicting client at 192.168.122.11@o2ib  ns: 
mdt-scratch-MDT_UUID lock: 8804455dc480/0x280a58b66b7affa7 lrc: 3/0,0 
mode: PR/PR res: [0x2000733da:0x14d:0x0].0 bits 0x13 rrc: 267 type: IBT flags: 
0x20040020 nid: 192.168.122.11@o2ib remote: 0xa9053ebb6b8f23c1 expref: 7 
pid: 28304 timeout: 4717246067 lvb_type: 0
Jan 19 14:44:32 mds2 kernel: : LustreError: 
0:0:(ldlm_lockd.c:391:waiting_locks_callback()) Skipped 1 previous similar 
message
Jan 19 14:44:32 mds2 kernel: : Lustre: 
26575:0:(service.c:2031:ptlrpc_server_handle_request()) @@@ Request took longer 
than estimated (755:4253s); client may timeout.  req@880127936800 
x1788463595089220/t609893172938(0) 
o101->03cbede6-68a6-36ae-866e-74e858cf47f1@192.168.114.2@o2ib:0/0 lens 584/600 
e 0 to 0 dl 1705700019 ref 1 fl Complete:/0/0 rc 0/0
Jan 19 14:44:32 mds2 kernel: : LustreError: 
29019:0:(service.c:1999:ptlrpc_server_handle_request()) @@@ Dropping timed-out 
request from 12345-192.168.128.8@o2ib: deadline 100:599s ago
Jan 19 14:44:32 mds2 kernel: : LustreError: 
29019:0:(service.c:1999:ptlrpc_server_handle_request()) Skipped 73 previous 
similar messages
Jan 19 14:44:32 mds2 kernel: : LustreError: 
24604:0:(ldlm_lockd.c:1309:ldlm_handle_enqueue0()) ### lock on 

[lustre-discuss] OST still has inodes and size after deleting all files

2024-01-19 Thread Pavlo Khmel via lustre-discuss
Hi,

I'm trying to remove 4 OSTs. 

# lfs osts
OBDS:
0: cluster-OST0000_UUID ACTIVE
1: cluster-OST0001_UUID ACTIVE
2: cluster-OST0002_UUID ACTIVE
3: cluster-OST0003_UUID ACTIVE
. . .

I moved all files to other OSTs. "lfs find" cannot find any files on these 4 
OSTs.

# time lfs find --ost 0 --ost 1 --ost 2 --ost 3 /cluster

real 936m8.528s
user 13m48.298s
sys 210m1.245s

But still: 2624 inodes are in use and 14.5G total size.

# lfs df -i | grep -e OST0000 -e OST0001 -e OST0002 -e OST0003
cluster-OST0000_UUID  4293438576 644  4293437932   1% /cluster[OST:0]
cluster-OST0001_UUID  4293438576 640  4293437936   1% /cluster[OST:1]
cluster-OST0002_UUID  4293438576 671  4293437905   1% /cluster[OST:2]
cluster-OST0003_UUID  4293438576 669  4293437907   1% /cluster[OST:3]

# lfs df -h | grep -e OST0000 -e OST0001 -e OST0002 -e OST0003
cluster-OST0000_UUID   29.2T   3.8G   27.6T   1% /cluster[OST:0]
cluster-OST0001_UUID   29.2T   3.7G   27.6T   1% /cluster[OST:1]
cluster-OST0002_UUID   29.2T   3.3G   27.6T   1% /cluster[OST:2]
cluster-OST0003_UUID   29.2T   3.7G   27.6T   1% /cluster[OST:3]

I tried to check the file-system for errors:

# umount /lustre/ost01
# e2fsck -fy /dev/mapper/ost01

and

# lctl lfsck_start --device cluster-OST0001
# lctl get_param -n osd-ldiskfs.cluster-OST0001.oi_scrub
. . .
status: completed
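
For reference, the layout LFSCK status can also be queried separately from
oi_scrub (a sketch; the exact parameter path may differ between Lustre versions,
so a wildcard is used here):

# lctl lfsck_start --device cluster-OST0001 --type layout
# lctl get_param "*.cluster-OST0001.lfsck_layout"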

I tried mounting the OST as ldiskfs, and there are several files in /O/0/d*/:

# umount /lustre/ost01
# mount -t ldiskfs /dev/mapper/ost01 /mnt/
# ls -Rhl /mnt/O/0/d*/
. . .
/mnt/O/0/d11/:
-rw-rw-rw- 1 user1 group1 603K Nov  8 21:37 450605003
/mnt/O/0/d12/:
-rw-rw-rw- 1 user1 group1 110K Jun 16  2023 450322028
-rw-rw-rw- 1 user1 group1  21M Nov  8 22:17 450605484
. . .

Is this expected behavior? Is it safe to delete the OST even with those files?

Best regards,
Pavlo Khmel
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] lustre-client-dkms-2.15.4 is still checking for python2

2024-01-19 Thread Feng Zhang via lustre-discuss
For lustre-client-dkms, maybe you can use --skip-broken to force it to install;
it will automatically compile the source code during installation, and you can
check the error messages then. Use the dkms command, or go directly to the
source folder, to check. If there is no error message there, then you should be
fine. It is likely one of the test tools Andreas mentioned, or simply a bad
rpmbuild configuration when the binary package was built.
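
A sketch of what that check could look like (the dkms module name
"lustre-client" and the exact version are assumptions based on the package in
this thread):

# dnf --enablerepo=lustre-client install --skip-broken lustre-client lustre-client-dkms
# dkms status
# dkms build lustre-client/2.15.4
# less /var/lib/dkms/lustre-client/2.15.4/build/make.log

If the build in make.log completes without errors, the python2 dependency is
probably only a packaging leftover.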

I ran into some rpmbuild issues before; not directly related, but maybe helpful:
https://github.com/prod-feng/Luste-KMOD-2.12.9-with-ZFS-0.7.13-on-Centos-7.9

Best,

Feng


On Fri, Jan 19, 2024 at 2:01 PM Andreas Dilger via lustre-discuss <
lustre-discuss@lists.lustre.org> wrote:

> It looks like there may be a couple of test tools that are referencing
> python2, but it definitely isn't needed
> for normal operation.  Are you using the lustre-client binary or the
> lustre-client-dkms?  Only one is needed.
>
> For the short term it would be possible to override this dependency, but
> it would be good to understand
> why this dependency is actually being generated.
>
> On Jan 19, 2024, at 04:06, BALVERS Martin via lustre-discuss <
> lustre-discuss@lists.lustre.org> wrote:
>
>
> FYI
> It seems that lustre-client-dkms-2.15.4 is still checking for python2 and
> does not install on AlmaLinux 9.3
>
> # dnf --enablerepo=lustre-client install lustre-client lustre-client-dkms
> Last metadata expiration check: 0:04:50 ago on Fri Jan 19 11:43:54 2024.
> Error:
> Problem: conflicting requests
>   - nothing provides /usr/bin/python2 needed by
> lustre-client-dkms-2.15.4-1.el9.noarch from lustre-client
> (try to add '--skip-broken' to skip uninstallable packages or '--nobest'
> to use not only best candidate packages)
>
> According to the changelog this should have been fixed (
> https://wiki.lustre.org/Lustre_2.15.4_Changelog).
>
> Regards,
> Martin Balvers
>
>
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Principal Architect
> Whamcloud
>
>
>
>
>
>
>
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] lustre-client-dkms-2.15.4 is still checking for python2

2024-01-19 Thread Andreas Dilger via lustre-discuss
It looks like there may be a couple of test tools that are referencing python2, 
but it definitely isn't needed
for normal operation.  Are you using the lustre-client binary or the 
lustre-client-dkms?  Only one is needed.

For the short term it would be possible to override this dependency, but it 
would be good to understand
why this dependency is actually being generated.
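
One possible way to do that for now (a sketch only; assumes the dnf download
plugin is available, and the exact rpm filename may differ):

# dnf --enablerepo=lustre-client download lustre-client-dkms
# rpm -ivh --nodeps lustre-client-dkms-2.15.4-1.el9.noarch.rpm

This installs the package without checking its declared dependencies, so the
python2 requirement is simply ignored.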

On Jan 19, 2024, at 04:06, BALVERS Martin via lustre-discuss <lustre-discuss@lists.lustre.org> wrote:

FYI
It seems that lustre-client-dkms-2.15.4 is still checking for python2 and does 
not install on AlmaLinux 9.3

# dnf --enablerepo=lustre-client install lustre-client lustre-client-dkms
Last metadata expiration check: 0:04:50 ago on Fri Jan 19 11:43:54 2024.
Error:
Problem: conflicting requests
  - nothing provides /usr/bin/python2 needed by 
lustre-client-dkms-2.15.4-1.el9.noarch from lustre-client
(try to add '--skip-broken' to skip uninstallable packages or '--nobest' to use 
not only best candidate packages)

According to the changelog this should have been fixed 
(https://wiki.lustre.org/Lustre_2.15.4_Changelog).

Regards,
Martin Balvers

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Whamcloud







___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] lustre-client-dkms-2.15.4 is still checking for python2

2024-01-19 Thread BALVERS Martin via lustre-discuss
Hi,

FYI
It seems that lustre-client-dkms-2.15.4 is still checking for python2 and does 
not install on AlmaLinux 9.3

# dnf --enablerepo=lustre-client install lustre-client lustre-client-dkms
Last metadata expiration check: 0:04:50 ago on Fri Jan 19 11:43:54 2024.
Error:
Problem: conflicting requests
  - nothing provides /usr/bin/python2 needed by 
lustre-client-dkms-2.15.4-1.el9.noarch from lustre-client
(try to add '--skip-broken' to skip uninstallable packages or '--nobest' to use 
not only best candidate packages)
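
For reference, the dependency recorded in the package metadata can be inspected
directly (a sketch; assumes the rpm has been downloaded locally, e.g. with
"dnf download"):

# rpm -qp --requires lustre-client-dkms-2.15.4-1.el9.noarch.rpm | grep -i python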

According to the changelog this should have been fixed 
(https://wiki.lustre.org/Lustre_2.15.4_Changelog).

Regards,
Martin Balvers

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org