Hi,

Since yesterday we have been seeing frequent MDS crashes, all of them in ldlm_flock_deadlock. The servers run Lustre 2.15.4 on AlmaLinux 8.9. The MDT and MGT are on LDISKFS and the OSTs are on ZFS. Clients are mostly CentOS 7.9 with the Lustre 2.15.4 client.
We have a complete core dump from one of these crashes in case someone wants to look at it.

Thanks,
Lixin

[15817.464501] LustreError: 22687:0:(ldlm_flock.c:230:ldlm_flock_deadlock()) ASSERTION( req != lock ) failed:
[15817.474247] LustreError: 22687:0:(ldlm_flock.c:230:ldlm_flock_deadlock()) LBUG
[15817.481497] Pid: 22687, comm: mdt01_003 4.18.0-513.9.1.el8_lustre.x86_64 #1 SMP Sat Dec 23 05:23:32 UTC 2023
[15817.491318] Call Trace TBD:
[15817.494137] [<0>] libcfs_call_trace+0x6f/0xa0 [libcfs]
[15817.499297] [<0>] lbug_with_loc+0x3f/0x70 [libcfs]
[15817.504097] [<0>] ldlm_flock_deadlock.isra.10+0x1fb/0x240 [ptlrpc]
[15817.510398] [<0>] ldlm_process_flock_lock+0x289/0x1f90 [ptlrpc]
[15817.516402] [<0>] ldlm_lock_enqueue+0x2a5/0xaa0 [ptlrpc]
[15817.521813] [<0>] ldlm_handle_enqueue0+0x634/0x1520 [ptlrpc]
[15817.527562] [<0>] tgt_enqueue+0xa4/0x220 [ptlrpc]
[15817.532368] [<0>] tgt_request_handle+0xccd/0x1a20 [ptlrpc]
[15817.537949] [<0>] ptlrpc_server_handle_request+0x323/0xbe0 [ptlrpc]
[15817.544311] [<0>] ptlrpc_main+0xbec/0x1530 [ptlrpc]
[15817.549294] [<0>] kthread+0x134/0x150
[15817.552966] [<0>] ret_from_fork+0x1f/0x40
[15817.556980] Kernel panic - not syncing: LBUG
[15817.561248] CPU: 23 PID: 22687 Comm: mdt01_003 Kdump: loaded Tainted: G OE --------- - - 4.18.0-513.9.1.el8_lustre.x86_64 #1
[15817.573669] Hardware name: Dell Inc. PowerEdge R640/0CRT1G, BIOS 2.19.1 06/04/2023
[15817.581235] Call Trace:
[15817.583687] dump_stack+0x41/0x60
[15817.587007] panic+0xe7/0x2ac
[15817.589979] ? ret_from_fork+0x1f/0x40
[15817.593733] lbug_with_loc.cold.8+0x18/0x18 [libcfs]
[15817.598714] ldlm_flock_deadlock.isra.10+0x1fb/0x240 [ptlrpc]
[15817.604557] ldlm_process_flock_lock+0x289/0x1f90 [ptlrpc]
[15817.610121] ? lustre_msg_get_flags+0x2a/0x90 [ptlrpc]
[15817.615346] ? lustre_msg_add_version+0x21/0xa0 [ptlrpc]
[15817.620745] ldlm_lock_enqueue+0x2a5/0xaa0 [ptlrpc]
[15817.625702] ldlm_handle_enqueue0+0x634/0x1520 [ptlrpc]
[15817.631007] tgt_enqueue+0xa4/0x220 [ptlrpc]
[15817.635365] tgt_request_handle+0xccd/0x1a20 [ptlrpc]
[15817.640503] ? ptlrpc_nrs_req_get_nolock0+0xff/0x1f0 [ptlrpc]
[15817.646337] ptlrpc_server_handle_request+0x323/0xbe0 [ptlrpc]
[15817.652256] ptlrpc_main+0xbec/0x1530 [ptlrpc]
[15817.656791] ? ptlrpc_wait_event+0x590/0x590 [ptlrpc]
[15817.661928] kthread+0x134/0x150
[15817.665161] ? set_kthread_struct+0x50/0x50
[15817.669346] ret_from_fork+0x1f/0x40

_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org