On 07/16/2012 01:07 PM, Yunkai Zhang wrote: > From: Yunkai Zhang <[email protected]> > > V2: > - fix zk_queue_pop() when it's called by zk_unblock(): > continue to process block event when is_zk_unblock is True > -------------------------------------------------------- >8 > > As cluster request may retry infinitely when some sheeps left, than > cluster_op_done could not to be called forever, so it will cause cluster > hang problem. > > By giving priority to process LEAVE event when there is unfinished BLOCK > event, we can fix this issue, but also comply with the rule which is very > important for distributed system I think: > > All sheeps should process all events in the same order. >
This actually relax the order though not as aggressive as the patch in corosync does. > Signed-off-by: Yunkai Zhang <[email protected]> > --- > sheep/cluster/zookeeper.c | 17 +++++++++++------ > 1 file changed, 11 insertions(+), 6 deletions(-) > > diff --git a/sheep/cluster/zookeeper.c b/sheep/cluster/zookeeper.c > index 7bd20bd..e03fd22 100644 > --- a/sheep/cluster/zookeeper.c > +++ b/sheep/cluster/zookeeper.c > @@ -71,6 +71,7 @@ static struct zk_event zk_levents[SD_MAX_NODES]; > static int nr_zk_levents; > static unsigned zk_levent_head; > static unsigned zk_levent_tail; > +static bool is_zk_unblock; > > static void *zk_node_btroot; > static struct zk_node *zk_master; > @@ -239,9 +240,11 @@ static int zk_queue_pop(zhandle_t *zh, struct zk_event > *ev) > struct zk_event *lev; > eventfd_t value = 1; > > - /* process leave event */ > - if (uatomic_read(&zk_notify_blocked) <= 0 && > - uatomic_read(&nr_zk_levents)) { > + /* > + * Continue to process LEAVE event even if > + * we have an unfinished BLOCK event. > + */ > + if (!is_zk_unblock && uatomic_read(&nr_zk_levents)) { > nr_levents = uatomic_sub_return(&nr_zk_levents, 1) + 1; > dprintf("nr_zk_levents:%d, head:%u\n", nr_levents, > zk_levent_head); > > @@ -282,6 +285,9 @@ static int zk_queue_pop(zhandle_t *zh, struct zk_event > *ev) > return 0; > } > > + if (!is_zk_unblock && uatomic_read(&zk_notify_blocked) > 0) > + return -1; > + both is_zk_unblock and zk_notify_blocked are blocking related, this need at least a documentation what each variables means, if you can't manage to fold these two into one. Also, for the variable itself, better to rename is_zk_unblock as zk_blocked or similar. Having one var zk_notify_blocked and the other is_zk_unblock will cause a lot of confusion. Thanks, Yuan -- sheepdog mailing list [email protected] http://lists.wpkg.org/mailman/listinfo/sheepdog
