Hi folks,

We just saw an issue where a large resource (8K partitions) was taken offline using enableResource(.., .., false). First the messages were sent out, and then each node took its respective partitions offline.
As this happened, the current states node in Helix was updated and the controller started processing the notifications. During this time the ZooKeeper bandwidth peaked at up to 150 Mbps and came back down once the processing was completed.

I recall that a batch messaging mode exists in Helix. Would it help alleviate this problem? Is the batching both ways, i.e. does the controller batch messages, and are the updates to CURRENTSTATES also batched to minimize the number of ZooKeeper updates and hence the notifications seen by the controller?

Thanks,
Varun
