If I continuously read from the node that I am rebooting, the request made to that node hangs until the client times out; subsequent requests receive a "Failed to connect" error.
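For reference, the read loop I'm running is roughly the following sketch (the node address, timeout values, and key are placeholders, not my exact setup; the script echoes the commands rather than executing them, since there's no live cluster here):

```shell
#!/bin/sh
# Sketch of the continuous-read test against the node being rebooted.
# --connect-timeout bounds the TCP connect; --max-time bounds the whole
# request, so a dead/rebooting node fails fast instead of hanging the client.
NODE="http://node-a.example.com:8098"   # placeholder address
for i in 1 2 3; do
  # Echo the command instead of running it (placeholder cluster).
  echo curl -s --connect-timeout 2 --max-time 5 "$NODE/riak/test/1?r=1"
  # sleep 0.5   # enable when running against a live cluster
done
```

With the timeouts enabled, curl returns exit code 28 on a hang rather than blocking indefinitely, which makes the "hangs until the client times out" phase above much easier to observe.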
I am using curl for my tests.

Thanks,
Dan

Daniel Reverri
Developer Advocate
Basho Technologies, Inc.
[email protected]

On Mon, Nov 29, 2010 at 10:27 AM, Jay Adkisson <[email protected]> wrote:
> Hm, that's curious. Are you rebooting the physical machine? When you
> reboot one of the nodes, what happens to HTTP calls to that node? Do they
> immediately error, or do they hang indefinitely?
>
> In the meanwhile, I'll add some logging so I can see whether I'm timing
> out on the writes as well, and I'll see what happens with different keys.
>
> Thanks,
> --Jay
>
> On Mon, Nov 29, 2010 at 10:02 AM, Dan Reverri <[email protected]> wrote:
>> Hi Jay,
>>
>> I'm not able to reproduce the behavior you are seeing. Here is what I am
>> doing to try to reproduce the issue:
>> 1. Set up a 4-node cluster
>> 2. Continuously write a new object to Riak every 0.5 seconds
>> 3. Continuously read a known object (GET riak/test/1) from Riak every
>>    0.5 seconds
>> 4. Reboot one of the nodes
>>
>> The reads and writes continue working normally while the node is
>> rebooting.
>>
>> Do you see timeouts while writing objects to Riak?
>> Can you try reading other objects from Riak during the reboot (i.e.
>> different keys)?
>>
>> Thanks,
>> Dan
>>
>> Daniel Reverri
>> Developer Advocate
>> Basho Technologies, Inc.
>> [email protected]
>>
>> On Mon, Nov 29, 2010 at 9:39 AM, Jay Adkisson <[email protected]> wrote:
>>> Hey Dan/Sean,
>>>
>>> Thanks for the response. sasl-error.log on node A is completely empty,
>>> and I see this pattern in erlang.log:
>>>
>>> ===== ALIVE Tue Nov 23 12:46:57 PST 2010
>>>
>>> ===== Tue Nov 23 12:57:36 PST 2010
>>>
>>> =ERROR REPORT==== 23-Nov-2010::12:57:36 ===
>>> ** Node 'riak@<node D>' not responding **
>>> ** Removing (timedout) connection **
>>>
>>> =INFO REPORT==== 23-Nov-2010::12:58:41 ===
>>> Starting handoff of partition riak_kv_vnode
>>> 251195593916248939066258330623111144003363405824 to 'riak@<node D>'
>>>
>>> =INFO REPORT==== 23-Nov-2010::12:58:41 ===
>>> Handoff of partition riak_kv_vnode
>>> 251195593916248939066258330623111144003363405824 to 'riak@<node D>'
>>> completed: sent 1 objects in 0.02 seconds
>>>
>>> =INFO REPORT==== 23-Nov-2010::12:59:18 ===
>>> Starting handoff of partition riak_kv_vnode
>>> 707914855582156101004909840846949587645842325504 to 'riak@<node D>'
>>>
>>> =INFO REPORT==== 23-Nov-2010::12:59:18 ===
>>> Handoff of partition riak_kv_vnode
>>> 707914855582156101004909840846949587645842325504 to 'riak@<node D>'
>>> completed: sent 5 objects in 0.03 seconds
>>>
>>> =INFO REPORT==== 23-Nov-2010::12:59:20 ===
>>> Starting handoff of partition riak_kv_vnode
>>> 525227150915793236229449236757414210188850757632 to 'riak@<node D>'
>>>
>>> <handoffs, etc...>
>>>
>>> This is my testing process: I'm doing an initial load into Riak of
>>> small image files between 1 and 150K, throttled to two images per
>>> second, with W=1. In a different terminal, I'm running a wget every
>>> second against node A for one particular image I already know to be in
>>> the cluster, again with R=1. I'm using R=1 and W=1 because I figured
>>> that would reduce the chance of timing out, and with my data pattern,
>>> nothing I write to the cluster will ever change, so I really don't
>>> need to wait for a quorum.
>>>
>>> In response to Sean:
>>>
>>>> 1) Riak detects a node outage the same way any Erlang system does -
>>>> when a message fails to deliver, or the heartbeat maintained by epmd
>>>> fails. The default timeout in epmd is 1 minute, which is probably why
>>>> you're seeing it take 1 minute to be detected.
>>>
>>> Thanks, this is enlightening.
>>>
>>>> 2) If it takes too long (the vnode is overloaded, perhaps, or is just
>>>> starting up as a hint partition) to retrieve from any node, the
>>>> request can time out.
>>>
>>> That makes sense, but I still wonder why this happens even when the
>>> quorum is already met by the machines that are responding normally.
>>>
>>>> 3) You could probably configure epmd to time out sooner, but then you
>>>> become more vulnerable to temporary partitions. YMMV
>>>
>>> I may try that - it might be a good fit with my data pattern.
>>>
>>> Thanks again,
>>> --Jay
>>>
>>> On Mon, Nov 29, 2010 at 4:44 AM, David Smith <[email protected]> wrote:
>>>> On Tue, Nov 23, 2010 at 3:33 PM, Jay Adkisson <[email protected]>
>>>> wrote:
>>>> > (many profuse apologies to Dan - hit "reply" instead of "reply all")
>>>> > Alrighty, I've done a little more digging. When I throttle the
>>>> > writes heavily (2/sec) and set R and W to 1 all around, the cluster
>>>> > works just fine after I restart the node for about 15-20 seconds.
>>>> > Then the read request hangs for about a minute, until node D
>>>> > disappears from connected_nodes in riak-admin status, at which
>>>> > point it returns the desired value (although sometimes I get a 503):
>>>>
>>>> Are you seeing any error messages in log/erlang.log.* or
>>>> log/sasl-error.log?
>>>>
>>>> Can you expound on your use case a little -- are you doing a large
>>>> insert, or just a random read/write mix? Did you pre-populate the
>>>> dataset? Why are you using r=1, instead of relying on quorum for
>>>> reads?
>>>>
>>>> How are you running riak-admin status to measure the 15-20 seconds?
>>>>
>>>> Thanks.
>>>>
>>>> D.
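One practical note on point 3 above: the roughly one-minute detection window comes from the Erlang distribution heartbeat, which is controlled by the kernel parameter net_ticktime (60 seconds by default) and can be lowered via Riak's vm.args. A sketch, assuming a default Riak install layout and an example value of 10 seconds; as Sean warned, lower values make transient network partitions more likely to be treated as node failures:

```
## etc/vm.args -- shorten node-down detection from ~60s to ~10s
## (10 is an illustrative value, not a recommendation)
-kernel net_ticktime 10
```

This requires restarting the node to take effect, and the same value should be used on every node in the cluster.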
_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
