Flavio Paiva Junqueira commented on ZOOKEEPER-230:
This patch is meant as an improvement, not a fix for a critical problem. I
have seen misconfigured applications that keep running even though the
servers are not performing any useful work. In such cases, servers wake up
frequently to send new notifications (also generating log messages), and the
amount of time they wait until the next check never increases. So all I'm
trying to do is reduce the amount of work a server performs in such cases by
reducing the number of times it wakes up to check the queue of notifications
over time. It doesn't hurt the performance of leader election, though,
because if a server is partitioned, then messages will remain queued until
the partition heals. You're right that the amount of time increases without
bound, and that might matter for long partitions, so I'm ok with having an
upper bound.
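
To make the upper bound concrete, here is a minimal sketch of a capped
exponential wait. It is not the code in the attached patch: the class name
NotificationBackoff, its methods, and the values used below are all
hypothetical and chosen only for illustration.

```java
// Hypothetical helper, not part of ZOOKEEPER-230.patch: tracks how long a
// server waits for a notification before sending a new set.
final class NotificationBackoff {
    private final long initialMs; // first wait, e.g. 200 ms (assumed value)
    private final long maxMs;     // upper bound on the wait (assumed value)
    private long currentMs;

    NotificationBackoff(long initialMs, long maxMs) {
        if (initialMs <= 0 || maxMs < initialMs) {
            throw new IllegalArgumentException("need 0 < initialMs <= maxMs");
        }
        this.initialMs = initialMs;
        this.maxMs = maxMs;
        this.currentMs = initialMs;
    }

    // Returns the wait to use now, then doubles it for the next call,
    // never exceeding maxMs. The comparison against maxMs / 2 avoids
    // overflow when doubling.
    long nextTimeoutMs() {
        long timeout = currentMs;
        currentMs = (currentMs > maxMs / 2) ? maxMs : currentMs * 2;
        return timeout;
    }

    // Call when a notification arrives, so the next wait starts small again.
    void reset() {
        currentMs = initialMs;
    }
}
```

A receive loop could pass nextTimeoutMs() as the timeout of a blocking
queue's poll(timeout, TimeUnit.MILLISECONDS) and call reset() whenever the
poll returns a notification. With a bound in place, a server in a long
partition still wakes up at least once every maxMs.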
> Improvements to FLE
> Key: ZOOKEEPER-230
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-230
> Project: Zookeeper
> Issue Type: Improvement
> Components: leaderElection
> Affects Versions: 3.0.0
> Reporter: Flavio Paiva Junqueira
> Assignee: Flavio Paiva Junqueira
> Fix For: 3.1.0
> Attachments: ZOOKEEPER-230.patch
> I'm about to attach a patch that implements the following modifications:
> . Currently, if a server is in leader election and doesn't receive a
> notification for some amount of time t, then it sends a new set of
> notifications if at least one server has delivered a message from the
> previous set. With this patch, the amount of time a server waits for a
> notification before sending a new set increases exponentially;
> . I have separated connecting to servers and queuing new notification
> messages. Before, they were all in the same message. The advantage is that now
> I can tell an instance of QuorumCnxManager to try to connect to other
> servers without generating new notification messages;
> . I have changed the logging level of several messages in QuorumCnxManager.
> They were "warn", but they should really be either "info" or "debug". I've
> changed them to "info".