(many profuse apologies to Dan - hit "reply" instead of "reply all")
Alrighty, I've done a little more digging. When I throttle the writes heavily (2/sec) and set R and W to 1 all around, the cluster works just fine for about 15-20 seconds after I restart the node. Then the read request hangs for about a minute, until node D disappears from connected_nodes in riak-admin status, at which point it returns the desired value (although sometimes I get a 503):

--2010-11-23 13:*01:28*--  http://<node A>:8098/riak/<bucket>/<key>?r=1
Resolving <node A>... <ip addr>
Connecting to <node A>|<ip addr>|:8098... connected.
HTTP request sent, awaiting response... *<hang...> *200 OK
Length: 3684 (3.6K) [image/jpeg]
Saving to: `<key>?r=1'

100%[======================================>] 3,684       --.-K/s   in 0s

2010-11-23 13:*02:21* (49.5 MB/s) - `<key>?r=1' saved [3684/3684]

--2010-11-23 13:02:23--  http://<node A>:8098/riak/<bucket>/<key>?r=1
Resolving <node A>... <ip addr>
Connecting to <node A>|<ip addr>|:8098... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3684 (3.6K) [image/jpeg]
Saving to: `<key>?r=1'

100%[======================================>] 3,684       --.-K/s   in 0s

2010-11-23 13:02:23 (220 MB/s) - `<key>?r=1' saved [3684/3684]

Afterwards, node D comes back up and re-joins the cluster seamlessly.

Any insights?

--Jay

On Mon, Nov 22, 2010 at 5:59 PM, Jay Adkisson <[email protected]> wrote:

> Hey Dan,
>
> Thanks for the response! I tried it again while watching `riak-admin
> status` - basically, it takes about 30 seconds of node C being down before
> riak realizes it's gone. During that time, if I'm writing to the cluster at
> all (I throttled it to 2 writes per second for testing), both writes and
> reads hang indefinitely, and sometimes time out.
>
> I'm using Ripple to do the writes, and wget to test reads, all on node A
> for now, since I know it'll be up. I'm using the default R and W options
> for now.
>
> Thanks for the help and clarification around ringready.
>
> --Jay
>
>
> On Mon, Nov 22, 2010 at 5:15 PM, Dan Reverri <[email protected]> wrote:
>
>> Your HTTP calls should not be timing out. Are you sending requests
>> directly to the Riak node or are you using a load balancer? How much load
>> are you placing on node A? Is it a write only load or are there reads as
>> well? Can you confirm "all" requests time out or is it a large subset of the
>> requests? How large are the objects being written? Are you setting R and W
>> in the request? Are you using a particular client (Ruby, Python, etc.)? Can
>> you provide the output of "riak-admin status" from node A?
>>
>> Regarding the ringready command; that is behaving as I would expect
>> considering a node is down.
>>
>> Thanks,
>> Dan
>>
>> Daniel Reverri
>> Developer Advocate
>> Basho Technologies, Inc.
>> [email protected]
>>
>>
>> On Mon, Nov 22, 2010 at 4:55 PM, Jay Adkisson <[email protected]> wrote:
>>
>>> Hey all,
>>>
>>> Here's what I'm seeing: I have four nodes A, B, C, and D. I'm loading
>>> lots of data into node A, which is being distributed evenly across the
>>> nodes. If I physically reboot node D, all my HTTP calls time out, and
>>> `riak-admin ringready` complains that not all nodes are up. Is this
>>> intended behavior? Is there a configuration option I can set so it fails
>>> more gracefully?
>>>
>>> --Jay
>>>
>>> _______________________________________________
>>> riak-users mailing list
>>> [email protected]
>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>
>>>
>>
>
