Hi everyone,

I just want to note that I observed similar behaviour with a somewhat larger cluster of 10 or so nodes. I first noticed that handoff activity after a node join (or leave, for that matter) involved a lot more partitions than I would have expected. By comparing the old and the new ring files, I found that more than 80 percent of partitions had to be moved to another node. My naive expectation was that joining a node to a cluster of size X would result in roughly ring_creation_size/(X+1) partitions being handed off, which would also be the minimum if one expects a balanced cluster afterwards. Furthermore, it would in theory be possible to move partitions in such a way that at least one partition from each preflist stays on the same node. Maybe for X>N it should even be possible to guarantee this for a basic quorum of each preflist, eliminating the notfound problem completely, but I am not sure about that.
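To make the comparison above concrete, here is a minimal sketch in Python of how the numbers could be checked. It rests on several assumptions: a ring is modelled as a plain list of owner names indexed by partition, a preflist is simplified to N consecutive partitions with wrap-around, and the function names are made up for illustration. It does not read real ring files and does not reproduce Riak's actual claim algorithm.

```python
def min_expected_moves(ring_size, old_node_count):
    """Fewest partitions that must move so the joining node gets a fair share."""
    return ring_size // (old_node_count + 1)


def moved_partitions(old_owners, new_owners):
    """Indices of partitions whose owner differs between the two rings."""
    if len(old_owners) != len(new_owners):
        raise ValueError("rings must have the same number of partitions")
    return [i for i, (old, new) in enumerate(zip(old_owners, new_owners))
            if old != new]


def preflists_without_survivor(old_owners, new_owners, n_val):
    """Start indices of preflists in which every partition changed owner.

    Assumption: a preflist is n_val consecutive partitions, wrapping around.
    """
    size = len(old_owners)
    moved = set(moved_partitions(old_owners, new_owners))
    return [start for start in range(size)
            if all((start + k) % size in moved for k in range(n_val))]


# Toy ring: 12 partitions, 3 nodes, then a 4th node "d" joins.
old_ring = ["a", "b", "c"] * 4

# Hypothetical minimal reassignment: "d" takes one partition from each node.
minimal_ring = list(old_ring)
for index in (3, 7, 11):
    minimal_ring[index] = "d"

# Hypothetical naive reassignment: redo the round-robin over 4 nodes.
round_robin_ring = ["a", "b", "c", "d"] * 3

for name, ring in (("minimal", minimal_ring), ("round-robin", round_robin_ring)):
    moved = moved_partitions(old_ring, ring)
    lost = preflists_without_survivor(old_ring, ring, n_val=3)
    print(name, "moved:", len(moved), "of", len(old_ring),
          "| preflists with no surviving replica:", len(lost))

print("minimum for a balanced ring:", min_expected_moves(len(old_ring), 3))
```

In this toy case the minimal reassignment moves only as many partitions as the lower bound requires and leaves every preflist with at least one untouched partition, while the naive round-robin moves most of the ring. Whether a similar assignment can always be found under Riak's other constraints (such as target_n_val) is not something this sketch answers.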
I may be able to provide some ring files to analyze this behaviour if someone from Basho is interested.

Cheers,
Nico

On Monday, 02.05.2011, at 23:14 -0400, Ryan Zezeski wrote:
> Greg,
>
> Your expectations are fair; just because you added a node doesn't mean
> Riak should return notfounds. Unfortunately, we aren't quite there
> yet. This is a side effect of how Riak currently implements handoff,
> in that it immediately updates/gossips the ring, causing
> many partitions to hand off immediately. If a request comes in that
> relies on these partitions, then it will get a notfound and perform
> read repair. Your situation is multiplied by the fact that you are
> going from 3 nodes to 4. More vnode shuffling occurs because of the
> small cluster size.
>
> We're well aware of this and have it on our radar for improvement in a
> future release.
>
> All this said, your data will be eventually consistent. That is, all
> your data will eventually be handed off and things will work as
> normal. It's only during the handoff that you _may_ encounter
> notfounds. In this case it would be best to add a new node to your
> cluster at lowest load times, and if you can spare additional hardware,
> a few more nodes to start with is an even easier option.
>
> -Ryan
>
> On Mon, May 2, 2011 at 9:48 PM, Greg Nelson <[email protected]>
> wrote:
>     Hello riak users!
>
>     I have a 4 node cluster that started out as 3 nodes.
>     ring_creation_size = 2048, target_n_val is default (4), and
>     all buckets have n_val = 3.
>
>     When I joined the 4th node, for a few minutes some GETs were
>     returning 'not found' for data that was already in riak.
>     Eventually the data was returned, due to read repair I would
>     assume. Is this expected? It seems that 'not found' and read
>     repairs should only happen when something goes wrong, like a
>     node goes down. Not when adding a node to the cluster, which
>     is supposed to be part of normal operation!
>
>     Any help or insight is appreciated!
>
>     Greg

_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
