[ceph-users] Re: Troubleshooting "N slow requests are blocked > 30 secs" on Pacific

2023-05-29 Thread Milind Changire
An MDS-wide lock is acquired before the cache dump starts and is released once the dump is complete. So a temporary freeze of the MDS during the cache dump is expected.
On Fri, May 26, 2023 at 12:51 PM Emmanuel Jaep wrote:
> Hi Milind,
>
> I finally managed to dump the cache and find

[ceph-users] Re: Troubleshooting "N slow requests are blocked > 30 secs" on Pacific

2023-05-26 Thread Emmanuel Jaep
Hi Milind,
I finally managed to dump the cache and find the file. It generated a 1.5 GB file with about 7 million lines. It's hard to know what is out of the ordinary…
Furthermore, I noticed that dumping the cache actually stopped the MDS. Is this normal behavior?
Best,
Emmanuel
On Th
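
With a dump of that size, a first pass that only counts lines by kind can show where the bulk of the cache sits before anyone reads individual entries. What follows is a rough sketch in Python, not a definitive tool. It assumes the text dump holds one cache object per line and that the first whitespace-separated token (for example `[inode`, `[dir` or `[dentry`) identifies the kind of object; that layout is an assumption, so check a few lines of the real file first. The helper names `count_by_first_token` and `summarize_file` are hypothetical.

```python
from collections import Counter


def count_by_first_token(lines):
    """Count lines by their first whitespace-separated token, skipping blank lines."""
    counts = Counter()
    for line in lines:
        tokens = line.split()
        if tokens:
            counts[tokens[0]] += 1
    return counts


def summarize_file(path, top=10):
    """Stream the dump line by line so a multi-GB file is never held in memory."""
    with open(path, errors="replace") as handle:
        return count_by_first_token(handle).most_common(top)


# Made-up sample lines, only to show the shape of the result.
# The assumed line layout must be verified against a real dump.
sample_lines = [
    "[inode 0x1 example",
    " [dir 0x1 example",
    "",
    "[inode 0x2 example",
]

for token, count in count_by_first_token(sample_lines).most_common():
    print(count, token)
```

On the MDS host, `summarize_file("/tmp/dump.txt")` would apply the same count to the real dump. A kind of object that dominates the totals is a reasonable place to start looking.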

[ceph-users] Re: Troubleshooting "N slow requests are blocked > 30 secs" on Pacific

2023-05-25 Thread Milind Changire
Try the command with the --id argument:
# ceph --id admin --cluster floki daemon mds.icadmin011 dump cache /tmp/dump.txt
I presume that your keyring has an appropriate entry for the client.admin user.
On Wed, May 24, 2023 at 5:10 PM Emmanuel Jaep wrote:
> Absolutely! :-)
>
> root@icadmin011:/t

[ceph-users] Re: Troubleshooting "N slow requests are blocked > 30 secs" on Pacific

2023-05-24 Thread Emmanuel Jaep
Absolutely! :-)
root@icadmin011:/tmp# ceph --cluster floki daemon mds.icadmin011 dump cache /tmp/dump.txt
root@icadmin011:/tmp# ll
total 48
drwxrwxrwt 12 root root 4096 May 24 13:23 ./
drwxr-xr-x 18 root root 4096 Jun 9 2022 ../
drwxrwxrwt 2 root root 4096 May 4 12:43 .ICE-unix/
drwxrwxrwt

[ceph-users] Re: Troubleshooting "N slow requests are blocked > 30 secs" on Pacific

2023-05-24 Thread Milind Changire
I hope the daemon mds.icadmin011 is running on the same machine where you are looking for /tmp/dump.txt, since the file is created on the system that runs the daemon.
On Wed, May 24, 2023 at 2:16 PM Emmanuel Jaep wrote:
> Hi Milind,
>
> you are absolutely right.
>
> The dump_ops_in_flig

[ceph-users] Re: Troubleshooting "N slow requests are blocked > 30 secs" on Pacific

2023-05-24 Thread Emmanuel Jaep
Hi Milind,
you are absolutely right. The dump_ops_in_flight is giving a good hint about what's happening:
{
    "ops": [
        {
            "description": "internal op exportdir:mds.5:975673",
            "initiated_at": "2023-05-23T17:49:53.030611+0200",
            "age": 60596.355186077999,
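
Each op in the dump_ops_in_flight output carries an `age` in seconds, so the long-running ones can be picked out mechanically instead of by eye. Here is a minimal sketch in Python, assuming the JSON has been saved or captured as a string. The function name `slow_ops` is hypothetical, and the 30-second default is taken from the warning text in the subject line. In the sample input, the first op uses only the three fields quoted above, and the second op is invented for illustration.

```python
import json


def slow_ops(dump_text, min_age=30.0):
    """Return (age, description) pairs for ops older than min_age seconds, oldest first."""
    ops = json.loads(dump_text).get("ops", [])
    slow = [
        (op["age"], op.get("description", ""))
        for op in ops
        if op.get("age", 0) > min_age
    ]
    return sorted(slow, reverse=True)


# Sample input: the first op repeats the fields quoted in the message;
# the second op is hypothetical and only shows that young ops are filtered out.
sample = json.dumps({
    "ops": [
        {
            "description": "internal op exportdir:mds.5:975673",
            "initiated_at": "2023-05-23T17:49:53.030611+0200",
            "age": 60596.355186077999,
        },
        {
            "description": "hypothetical short-lived op",
            "initiated_at": "2023-05-24T10:39:00.000000+0200",
            "age": 0.5,
        },
    ]
})

for age, description in slow_ops(sample):
    print(f"{age / 3600:.1f} h  {description}")
```

An op that has been in flight for hours, like the exportdir op here, stands out immediately once the list is sorted by age.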

[ceph-users] Re: Troubleshooting "N slow requests are blocked > 30 secs" on Pacific

2023-05-24 Thread Milind Changire
Emmanuel,
You probably missed the "daemon" keyword after the "ceph" command name. Here are the docs for Pacific:
https://docs.ceph.com/en/pacific/cephfs/troubleshooting/
So, your command should've been:
# ceph daemon mds.icadmin011 dump cache /tmp/dump.txt
You could also dump the ops in flight with