From: levin li <[email protected]>

During recovery, a VDI creation request may wait for recovery to
complete. VDI creation is a cluster request, which prevents other
cluster requests from being processed. When recovery reaches
notify_recovery_completion_work, it issues another cluster request,
SD_OP_COMPLETE_RECOVERY, which is blocked by the pending VDI creation.
As a result, notify_recovery_completion_work blocks recovery_wqueue.
If a new recovery then starts, it is blocked as well, while a VDI
creation request may be waiting for that same recovery to complete,
so the result is a deadlock.

Fix this by queueing the completion notification on a separate work
queue, recovery_notify_wqueue, so that it can no longer block
recovery_wqueue.

Signed-off-by: levin li <[email protected]>
---
 sheep/recovery.c   |    2 +-
 sheep/sheep.c      |    1 +
 sheep/sheep_priv.h |    1 +
 3 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/sheep/recovery.c b/sheep/recovery.c
index 2232110..59ac9d6 100644
--- a/sheep/recovery.c
+++ b/sheep/recovery.c
@@ -373,7 +373,7 @@ static inline void finish_recovery(struct recovery_work *rw)
 	/* notify recovery completion to other nodes */
 	rw->work.fn = notify_recovery_completion_work;
 	rw->work.done = notify_recovery_completion_main;
-	queue_work(sys->recovery_wqueue, &rw->work);
+	queue_work(sys->recovery_notify_wqueue, &rw->work);
 
 	dprintf("recovery complete: new epoch %"PRIu32"\n",
 		sys->recovered_epoch);
diff --git a/sheep/sheep.c b/sheep/sheep.c
index 31af42c..10c0501 100644
--- a/sheep/sheep.c
+++ b/sheep/sheep.c
@@ -370,6 +370,7 @@ int main(int argc, char **argv)
 	sys->gateway_wqueue = init_work_queue("gateway", false);
 	sys->io_wqueue = init_work_queue("io", false);
 	sys->recovery_wqueue = init_work_queue("recovery", true);
+	sys->recovery_notify_wqueue = init_work_queue("recovery notify", true);
 	sys->deletion_wqueue = init_work_queue("deletion", true);
 	sys->block_wqueue = init_work_queue("block", true);
 	sys->sockfd_wqueue = init_work_queue("sockfd", true);
diff --git a/sheep/sheep_priv.h b/sheep/sheep_priv.h
index 1f5a1bd..90006f6 100644
--- a/sheep/sheep_priv.h
+++ b/sheep/sheep_priv.h
@@ -115,6 +115,7 @@ struct cluster_info {
 	struct work_queue *io_wqueue;
 	struct work_queue *deletion_wqueue;
 	struct work_queue *recovery_wqueue;
+	struct work_queue *recovery_notify_wqueue;
 	struct work_queue *block_wqueue;
 	struct work_queue *sockfd_wqueue;
 	struct work_queue *reclaim_wqueue;
-- 
1.7.1
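
For readers who want to see the head-of-line blocking in isolation, here is a toy model of the scenario described in the commit message. It is only a sketch and not sheepdog code: every name in it (toy_queue, toy_sys, toy_scenario, and so on) is hypothetical. It assumes an ordered queue runs only its head item, that a notify item cannot run while a VDI creation holds the cluster request slot, and that the VDI creation finishes as soon as the new recovery runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical work kinds for this toy model (not sheepdog types). */
enum work_kind {
	WORK_RECOVERY,	/* recovers objects; never needs the cluster slot */
	WORK_NOTIFY,	/* sends a cluster request; needs the slot free */
};

#define MAX_WORK 8

/* An ordered queue: only the head item may run. */
struct toy_queue {
	enum work_kind items[MAX_WORK];
	size_t head, tail;
};

struct toy_sys {
	struct toy_queue recovery;
	struct toy_queue notify;
	/* true while a VDI creation holds the cluster slot,
	 * waiting for the new recovery to complete (assumption) */
	bool cluster_busy;
};

static void toy_queue_work(struct toy_queue *q, enum work_kind kind)
{
	assert(q->tail < MAX_WORK);
	q->items[q->tail++] = kind;
}

/* Try to run the head of q; return true if an item was consumed. */
static bool toy_run_head(struct toy_sys *sys, struct toy_queue *q)
{
	if (q->head == q->tail)
		return false;
	if (q->items[q->head] == WORK_NOTIFY && sys->cluster_busy)
		return false;	/* blocked: everything behind it waits too */
	if (q->items[q->head] == WORK_RECOVERY)
		sys->cluster_busy = false;	/* waiting VDI creation finishes */
	q->head++;
	return true;
}

/* Run both queues until no progress; return the number of stuck items. */
static size_t toy_run(struct toy_sys *sys)
{
	bool progress;

	do {
		progress = toy_run_head(sys, &sys->recovery);
		progress = toy_run_head(sys, &sys->notify) || progress;
	} while (progress);

	return (sys->recovery.tail - sys->recovery.head) +
	       (sys->notify.tail - sys->notify.head);
}

/* Build the scenario from the commit message and count stuck items. */
static size_t toy_scenario(bool separate_notify_queue)
{
	struct toy_sys sys = { .cluster_busy = true };

	/* the finished recovery queues its completion notification ... */
	toy_queue_work(separate_notify_queue ? &sys.notify : &sys.recovery,
		       WORK_NOTIFY);
	/* ... and then a new recovery arrives */
	toy_queue_work(&sys.recovery, WORK_RECOVERY);

	return toy_run(&sys);
}
```

In this model, toy_scenario(false) corresponds to the layout before the patch and leaves both items stuck behind the blocked notification, while toy_scenario(true) corresponds to the patched layout and drains both queues.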
