Thanks for the breakdown. Does this still hold when the coordinating node servicing the request is not itself one of the nodes that own the key in question? Assume a 5-node cluster: node 1 receives the request, but the key ultimately belongs to nodes 2, 3, and 4. Will node 1 write the key to its own disk and then return to the client (with w=1 and dw=1)?

Thank you,
Alexander

@siculars
http://siculars.posterous.com

Sent from my iRotaryPhone

On May 8, 2013, at 18:29, John Daily <[email protected]> wrote:

> As you may have noticed, this week on the blog I've been tackling the deeper
> meanings of various behavioral configuration parameters, ranging from ye olde
> r/w parameters to the much more obscure basic_quorum.
>
> After posting today's missive, the esteemed Andrew Thompson noticed that
> something I documented was no longer true in v1.3.1, and after some
> discussions we realized that this change had implications that needed to be
> shared with the community.
>
> For those unfamiliar: dw is short for durable write, so setting dw values for
> a bucket or request indicates how many nodes should have the data saved to
> the backend (typically bitcask or leveldb) before the client is sent a
> response.
>
>
> tl;dr
> ==
> If you set w=1 for performance reasons, make sure you also set dw=1.
>
>
> Slightly longer version
> ==
> Until 1.3.1, dw (durable write) would be implicitly demoted to have the same
> value as w when w was smaller. This is a reasonable optimization for the w=1
> case (dw defaults to quorum, despite what you may have read on
> docs.basho.com) but a very unreasonable behavior when someone explicitly
> asked for dw=3 without also asking for w=3.
>
> Now in 1.3.1 dw will be 1 (at a minimum), 2 (by default), and 3 (if
> requested) no matter what value is set for w.
>
>
> Cross-referenced version
> ==
> Read http://basho.com/understanding-riaks-configurable-behaviors-part-1/ and
> http://basho.com/riaks-config-behaviors-part-2/; the latter should be updated
> today to reflect the 1.3.1 behavior.
>
> Also check back on the blog (http://basho.com/blog) later this week for 2
> more posts in the series. I think you'll enjoy them.
>
>
> The really long version
> ==
> (Actually, this is somewhat tangential to the original point and shorter than
> my blog posts, so it's really the longish pedantic version.)
>
> This explanation assumes default behaviors, such as vnode_vclocks=true and
> n_val=3. Vnode-based vector clocks are the defining behavioral characteristic
> that makes this flow what it is.
>
>
> When a write request arrives at the coordinating node, contrary to what one
> might expect it is not immediately sent to the other 2 nodes with
> responsibility over the key.
>
> Instead, the request is handed to the local vnode mapped to that key, and
> until the vnode replies back with a new vector clock, nothing else happens.
>
> So, the approximate sequence of events:
>
> 1 Coordinating node receives request
> 2 Request is forwarded to local vnode
> 3 Local vnode replies with "w" message to the coordinating node indicating
>   it has received the request
> 4 Local vnode creates a new vector clock based on the vclock received with
>   the request, if any, and possibly impacted by any existing object with the
>   same key
> 5 Local vnode sends the new object to the backend
> 6 Local vnode replies with "dw" message and new object to the coordinating
>   node
> 7 If w=1 and dw=1, /now/ the coordinating node replies to the client, with
>   the new vclock if requested by the client
> 8 The coordinating node sends the new object with new vclock to the remote
>   vnodes that also own the key
> 9 Each vnode will reply with a "w" message upon receipt
> 10 Each vnode will reply with a "dw" message upon sending the object to its
>    backend
> 11 If w>1 or dw>1, the coordinating node replies to the client once it has
>    received enough successful replies from the remote vnodes to meet those values
>
> (This is why it's not meaningful, with vnode_vclocks=true, to set dw=0. It
> has a minimum effective value of 1, regardless of what the client or operator
> wishes, because the first vnode must construct a new vector clock and store
> the object to disk before the client can ever receive a response.)
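
To make the ordering in the quoted sequence concrete, here is a rough Python sketch of it. This is only a toy model of the steps as listed above, not Riak's actual code: the function name simulate_put and the event strings are made up for illustration, and the remote acks are assumed to arrive in a fixed order (in a real cluster they would arrive concurrently).

```python
def simulate_put(w, dw, n_val=3):
    """Toy model of the quoted write sequence; returns the ordered event list.

    Hypothetical helper, not part of Riak. Assumes vnode_vclocks=true, so
    dw has a minimum effective value of 1.
    """
    effective_dw = max(dw, 1)
    events = []
    acks = {"w": 0, "dw": 0}
    replied = False

    def ack(kind, source):
        # Record one "w" or "dw" ack, then reply as soon as both counts are met.
        nonlocal replied
        acks[kind] += 1
        events.append(f'{source} vnode sends "{kind}" ack')
        if not replied and acks["w"] >= w and acks["dw"] >= effective_dw:
            events.append("coordinator replies to client")
            replied = True

    events.append("coordinator receives request")            # step 1
    events.append("request forwarded to local vnode")        # step 2
    ack("w", "local")                                        # step 3
    events.append("local vnode creates new vector clock")    # step 4
    events.append("local vnode stores object in backend")    # step 5
    ack("dw", "local")                                       # steps 6-7
    events.append("coordinator sends object to remote vnodes")  # step 8
    for i in range(1, n_val):
        # Assumed ordering: each remote acks receipt, then durability.
        ack("w", f"remote {i}")                              # step 9
        ack("dw", f"remote {i}")                             # steps 10-11
    return events


if __name__ == "__main__":
    for line in simulate_put(w=1, dw=1):
        print(line)
```

With w=1 and dw=1 the "coordinator replies to client" event appears before the object is sent to the remote vnodes; with dw=2 it only appears after a remote vnode has acknowledged the durable write.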
>
> And as you can see, all activity before the reply to the client is local to
> the coordinating node when w=1 and dw=1, and the response can be sent back to
> the client before the request is forwarded to other nodes.
>
> Prior to 1.3.1, dw would be effectively 1 if w was set to 1. Now, with 1.3.1,
> both w and dw must be set to 1 before that optimal response time can be
> achieved.
>
> -John
>
>
> _______________________________________________
> riak-users mailing list
> [email protected]
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
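
For reference, the dw change described in the quoted message can be written down as a small Python function. This is a sketch of the rule as stated there, under some assumptions: the name effective_dw and the version-tuple argument are hypothetical, the default is taken to be a majority quorum (n_val // 2 + 1, which gives 2 for n_val=3), and vnode_vclocks=true is assumed so the floor is 1.

```python
def effective_dw(w, dw=None, riak_version=(1, 3, 1), n_val=3):
    """Return the dw value that actually governs the reply to the client.

    Hypothetical helper for illustration only; it is not a Riak API.
    dw=None means "use the default", assumed here to be a majority quorum.
    """
    quorum = n_val // 2 + 1
    requested = quorum if dw is None else dw
    if riak_version < (1, 3, 1):
        # Before 1.3.1, dw was implicitly demoted to w when w was smaller.
        requested = min(requested, w)
    # With vnode_vclocks=true, dw can never be effectively lower than 1.
    return max(requested, 1)


if __name__ == "__main__":
    print(effective_dw(w=1, riak_version=(1, 3, 0)))  # default dw demoted to w
    print(effective_dw(w=1))                          # 1.3.1: default dw kept
    print(effective_dw(w=1, dw=1))                    # 1.3.1: explicit dw=1
```

The practical upshot matches the tl;dr: on 1.3.1, a request with w=1 and no explicit dw waits on the default dw, so dw=1 has to be set explicitly to get the fast local-only reply.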
