>>> Gang He <g...@suse.com> wrote on 02.06.2021 at 08:34 in message
<am6pr04mb6488de7d2da906bad73fa3a1cf...@am6pr04mb6488.eurprd04.prod.outlook.com>:
> Hi Ulrich,
>
> The hang problem looks like a fix (90bd070aae6c4fb5d302f9c4b9c88be60c8197ec
> ocfs2: fix deadlock between setattr and dio_end_io_write), but it is not
> 100% sure.
> If possible, could you help to report a bug to SUSE, then we can work on
> that further.

Hi!

Actually a service request for the issue is already open at SUSE. However, I
don't know which L3 engineer is working on it.

I have some "funny" effects, like these:
On one node "ls" hangs, but can be interrupted with ^C; on another node "ls"
also hangs, but cannot be stopped with ^C or ^Z (most processes cannot even
be killed with "kill -9").
"ls" on the directory also hangs, just like an "rm" for a non-existent file.

What I really wonder is what triggered the effect and, more importantly, how
to recover from it.
Initially I had suspected the rather full (95%) filesystem, but even at 95%
there are still 24 GB available.
The other suspect was the concurrent creation of reflink snapshots while the
file being snapshotted changed (e.g. allocating a hole in a sparse file).
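To make that last suspicion a bit more concrete, below is a rough sketch of
the kind of concurrency I have in mind: one thread keeps allocating space in
a sparse file with fallocate() (which matches the writer stack further down),
while another keeps reflinking that file. The paths and sizes are made up,
and I use the generic FICLONE ioctl here purely for illustration instead of
ocfs2's own reflink ioctl; this is not a verified reproducer for the hang.

/* Hypothetical sketch only: concurrent fallocate() vs. reflink on one file.
 * Paths and sizes are invented; build with: cc -pthread sketch.c
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>                 /* FICLONE */

static const char *src = "/srv/ocfs2/sparse.img";   /* assumed path */
static const char *dst = "/srv/ocfs2/snapshot.img"; /* assumed path */

static void *writer(void *arg)        /* grows the sparse file */
{
    (void)arg;
    int fd = open(src, O_RDWR | O_CREAT, 0644);
    if (fd < 0) { perror("open src"); return NULL; }
    for (off_t off = 0; off < (off_t)1 << 30; off += 1 << 20)
        if (fallocate(fd, 0, off, 1 << 20) < 0) {   /* allocate 1 MiB */
            perror("fallocate");
            break;
        }
    close(fd);
    return NULL;
}

static void *snapshotter(void *arg)   /* repeatedly reflinks the file */
{
    (void)arg;
    for (int i = 0; i < 100; i++) {
        int in  = open(src, O_RDONLY);
        int out = open(dst, O_RDWR | O_CREAT | O_TRUNC, 0644);
        if (in < 0 || out < 0) {
            perror("open");
            if (in >= 0) close(in);
            if (out >= 0) close(out);
            return NULL;
        }
        if (ioctl(out, FICLONE, in) < 0)            /* share extents */
            perror("FICLONE");
        close(in);
        close(out);
    }
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, writer, NULL);
    pthread_create(&b, NULL, snapshotter, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
}

If the hang really involves the inode rwsem taken on the fallocate() path
versus the lookups done by stat(), something along these lines, possibly run
from two nodes at once, might eventually produce the same "D" state
processes; I have not tried that yet.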
Regards,
Ulrich

>
> Thanks
> Gang
>
> ________________________________________
> From: Users <users-boun...@clusterlabs.org> on behalf of Ulrich Windl
> <ulrich.wi...@rz.uni-regensburg.de>
> Sent: Tuesday, June 1, 2021 15:14
> To: users@clusterlabs.org
> Subject: [ClusterLabs] Antw: Hanging OCFS2 Filesystem any one else?
>
>>>> Ulrich Windl wrote on 31.05.2021 at 12:11 in message
>>>> <60B4B65A.A8F : 161 : 60728>:
>> Hi!
>>
>> We have an OCFS2 filesystem shared between three cluster nodes (SLES 15 SP2,
>> Kernel 5.3.18-24.64-default). The filesystem is filled up to about 95%, and
>> we have an odd effect:
>> A stat() system call on some of the files hangs indefinitely (state "D").
>> ("ls -l" and "rm" also hang, but I suspect those call stat() internally,
>> too.)
>> My main suspect is that the effect might be related to the 95% being used.
>> The other suspect is that concurrent reflink calls may trigger the effect.
>>
>> Did anyone else experience something similar?
>
> Hi!
>
> I have some details:
> It seems there is a reader/writer deadlock while trying to allocate
> additional blocks for a file.
> The stack trace looks like this:
> Jun 01 07:56:31 h16 kernel: rwsem_down_write_slowpath+0x251/0x620
> Jun 01 07:56:31 h16 kernel: ? __ocfs2_change_file_space+0xb3/0x620 [ocfs2]
> Jun 01 07:56:31 h16 kernel: __ocfs2_change_file_space+0xb3/0x620 [ocfs2]
> Jun 01 07:56:31 h16 kernel: ocfs2_fallocate+0x82/0xa0 [ocfs2]
> Jun 01 07:56:31 h16 kernel: vfs_fallocate+0x13f/0x2a0
> Jun 01 07:56:31 h16 kernel: ksys_fallocate+0x3c/0x70
> Jun 01 07:56:31 h16 kernel: __x64_sys_fallocate+0x1a/0x20
> Jun 01 07:56:31 h16 kernel: do_syscall_64+0x5b/0x1e0
>
> That is the only writer (on that host), but there are multiple readers like
> this:
> Jun 01 07:56:31 h16 kernel: rwsem_down_read_slowpath+0x172/0x300
> Jun 01 07:56:31 h16 kernel: ? dput+0x2c/0x2f0
> Jun 01 07:56:31 h16 kernel: ? lookup_slow+0x27/0x50
> Jun 01 07:56:31 h16 kernel: lookup_slow+0x27/0x50
> Jun 01 07:56:31 h16 kernel: walk_component+0x1c4/0x300
> Jun 01 07:56:31 h16 kernel: ? path_init+0x192/0x320
> Jun 01 07:56:31 h16 kernel: path_lookupat+0x6e/0x210
> Jun 01 07:56:31 h16 kernel: ? __put_lkb+0x45/0xd0 [dlm]
> Jun 01 07:56:31 h16 kernel: filename_lookup+0xb6/0x190
> Jun 01 07:56:31 h16 kernel: ? kmem_cache_alloc+0x3d/0x250
> Jun 01 07:56:31 h16 kernel: ? getname_flags+0x66/0x1d0
> Jun 01 07:56:31 h16 kernel: ? vfs_statx+0x73/0xe0
> Jun 01 07:56:31 h16 kernel: vfs_statx+0x73/0xe0
> Jun 01 07:56:31 h16 kernel: ? fsnotify_grab_connector+0x46/0x80
> Jun 01 07:56:31 h16 kernel: __do_sys_newstat+0x39/0x70
> Jun 01 07:56:31 h16 kernel: ? do_unlinkat+0x92/0x320
> Jun 01 07:56:31 h16 kernel: do_syscall_64+0x5b/0x1e0
>
> So that matches the hanging stat() quite nicely!
>
> However, the PID displayed as holding the writer does not exist in the
> system (on that node).
>
> Regards,
> Ulrich
>
>>
>> Regards,
>> Ulrich

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/