The comment for dlm_dispatch_work reads:
/* Worker function used during recovery. */

But dlm_worker is actually used by four types of dlm message workers:
        dlm_assert_master_worker
        dlm_deref_lockres_worker
        dlm_request_all_locks_worker
        dlm_mig_lockres_worker

The first two are not related to dlm recovery. Moreover,
dlm_assert_master_worker sends DLM_ASSERT_MASTER_MSG to all other
nodes, and a large number of assert masters may be queued during
recovery. In our scenario it is tens of thousands.
This delays recovery, because dlm_worker is a single-threaded
workqueue and the cluster hangs while dlm recovery is in progress.
So I wonder whether we can move the assert master work to a new
workqueue, or just use a system workqueue.
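
To make the delay concrete, here is a rough userspace model of the
queueing behaviour. It is not kernel code and not the ocfs2
implementation: work_item, first_done() and the abstract "cost" units
are hypothetical names made up for this illustration. It only shows
how long a recovery item waits when it shares one FIFO worker with
assert master items, compared with having its own worker.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical work kinds, loosely named after the handlers above. */
enum work_kind { WORK_ASSERT_MASTER, WORK_RECOVERY };

struct work_item {
    enum work_kind kind;
    unsigned long cost;   /* abstract time units spent on the worker */
};

/*
 * Return the time at which the first item of `kind` finishes when a
 * single worker drains items[] in FIFO order.
 *
 * If only_kind is nonzero, items of other kinds are skipped, which
 * models them running on a separate worker. Returns 0 if items[]
 * contains no item of `kind`.
 */
unsigned long first_done(const struct work_item *items, size_t n,
                         enum work_kind kind, int only_kind)
{
    unsigned long now = 0;

    assert(items != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (only_kind && items[i].kind != kind)
            continue;             /* handled by the other worker */
        now += items[i].cost;     /* the worker is busy for this long */
        if (items[i].kind == kind)
            return now;
    }
    return 0;
}
```

With 1000 queued assert masters of cost 1 followed by one recovery item
of cost 5, first_done() gives 1005 on the shared queue and 5 when the
recovery item has its own worker.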
Any suggestions?


_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel