Re: What happens when a node goes down?

Jeff Peck Mon, 24 Mar 2014 11:24:27 -0700

Thank you, this has been very helpful. I appreciate all of the information and 
the quick responses.


- Jeff


On Mar 24, 2014, at 2:20 PM, Seth Thomas <[email protected]> wrote:

> To further elaborate:
> 
> When a node fails (or is simply too slow) during a PUT request, that data
> will be placed on the first fallback in the preflist as per [1]. For a GET,
> the request only needs to fulfill the R[2] value, which for an N of 3 is 2.
> So the GET request would succeed by simply taking the response from the
> other two active primaries and then read-repair[3] the value to the fallback.
> 
> Hopefully that makes the moving pieces in a failure scenario a little bit
> clearer.
> 
> [1] 
> http://docs.basho.com/riak/latest/theory/concepts/Replication/#Processing-partition-requests
> [2] 
> http://docs.basho.com/riak/latest/theory/concepts/Eventual-Consistency/#Replication-properties-and-request-tuning
> [3] http://docs.basho.com/riak/latest/theory/concepts/Replication/#Read-Repair
> 
> 
> 
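As a rough illustration of the GET path described above, here is a minimal Python sketch. It is a toy model, not Riak's code or client API: the `Replica` class and `quorum_get` function are hypothetical names, there is no versioning or vector clock handling, and only replicas that actually hold the key are counted toward R.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy stand-in for a vnode: a name, an up/down flag, and a key/value store."""
    name: str
    up: bool = True
    store: dict = field(default_factory=dict)


def quorum_get(replicas, key, r=2):
    """Read `key` from the live replicas and succeed if at least `r` hold it.

    Afterwards, copy the value to any live replica that lacks it
    (a very simplified form of read repair).
    Returns (value, names_of_repaired_replicas).
    """
    live = [rep for rep in replicas if rep.up]
    answers = [rep.store[key] for rep in live if key in rep.store]
    if len(answers) < r:
        raise RuntimeError(f"only {len(answers)} replies, need R={r}")

    value = answers[0]  # assumption: all replies agree; no conflict resolution
    repaired = []
    for rep in live:
        if key not in rep.store:
            rep.store[key] = value
            repaired.append(rep.name)
    return value, repaired


if __name__ == "__main__":
    # N=3 primaries a, b, c; c is down, so fallback d stands in for it.
    a = Replica("a", store={"k": "v1"})
    b = Replica("b", store={"k": "v1"})
    c = Replica("c", up=False, store={"k": "v1"})
    d = Replica("d")  # fallback, has not seen the key yet
    print(quorum_get([a, b, c, d], "k", r=2))
```

With two live primaries holding the key, the read meets R=2 even though the third primary is down, and the empty fallback receives a copy as a side effect.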
> On March 24, 2014 at 10:57:02, Jeff Peck ([email protected]) wrote:
> 
>> Aha, so if a node is detected as down, but the cluster is not currently
>> receiving any requests (e.g. on a development cluster that is used to store
>> data that is only requested periodically or in batches) then there would
>> not be any increased I/O unless it is manually ("administratively")
>> removed?
>> 
>> - Jeff
>> 
>> 
>> 
>> On Mar 24, 2014, at 1:53 PM, Seth Thomas <[email protected]> wrote:
>> 
>>> As soon as the cluster can detect that the node is no longer serving
>>> requests[1]. There will likely be increased network and IO among the
>>> remaining nodes as they will be picking up the slack. That said, the data
>>> is not permanently reshuffled at that point; that happens only when the
>>> node is administratively removed. The degree to which you’d see a spike
>>> depends on the volume of objects, # of physical nodes, and # of
>>> partitions/vnodes.
>>> 
>>> [1] 
>>> http://docs.basho.com/riak/latest/theory/concepts/Replication/#Processing-partition-requests
>>> 
>>> On March 24, 2014 at 10:43:36, Jeff Peck ([email protected]) wrote:
>>> 
>>>> Does that happen immediately? I am basically trying to understand: When a 
>>>> physical node goes down (let's say it is temporarily restarted, or down 
>>>> for even a couple hours due to some sort of failure), will that cause an 
>>>> increase in disk and network bandwidth at the moment that it goes down as 
>>>> data is re-shuffled across the cluster?
>>>> 
>>>> Thanks,
>>>> Jeff
>>>> 
>>>> 
>>>> On Mar 24, 2014, at 1:38 PM, Seth Thomas <[email protected]> wrote:
>>>> 
>>>>> Data is redistributed temporarily (for as long as it takes) until the
>>>>> primary node comes back online. So primary ownership of data would not
>>>>> be changed, but your keys could be living on another physical node if
>>>>> any of the primary replicas were down.
>>>>> 
>>>>> So to answer your question directly: Yes (in the narrowest definition)
>>>>> 
>>>>> 
>>>>> On March 24, 2014 at 10:34:22, Jeff Peck ([email protected]) wrote:
>>>>> 
>>>>>> Thank you. So, does that mean that no redistribution of data would occur 
>>>>>> unless the node is manually removed?
>>>>>> 
>>>>>> - Jeff
>>>>>> 
>>>>>> 
>>>>>> On Mar 24, 2014, at 1:31 PM, Seth Thomas <[email protected]> wrote:
>>>>>> 
>>>>>>> Jeff,
>>>>>>> 
>>>>>>> When a node is no longer responding, a process called hinted handoff[1]
>>>>>>> takes over and ensures that your N (replication) value is met by
>>>>>>> allowing other nodes to temporarily take responsibility for the vnodes
>>>>>>> of the downed node. This node can return to the cluster and will resume
>>>>>>> operations for the vnodes it’s primarily responsible for, or you could
>>>>>>> remove the node[2] from the cluster, which would redistribute the
>>>>>>> primary responsibility among the remaining nodes. I’d also give our
>>>>>>> docs on replication[3] a look for more information.
>>>>>>> 
>>>>>>> Seth Thomas
>>>>>>> 
>>>>>>> [1] 
>>>>>>> http://docs.basho.com/riak/latest/theory/concepts/glossary/#Hinted-Handoff
>>>>>>> [2] 
>>>>>>> http://docs.basho.com/riak/latest/ops/running/nodes/adding-removing/#Removing-a-Node-From-a-Cluster
>>>>>>> [3] http://docs.basho.com/riak/latest/theory/concepts/Replication/
>>>>>>> 
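The idea of other nodes temporarily standing in for a downed node's vnodes can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Riak's implementation: the function names, the round-robin ring layout, and the use of SHA-1 modulo the partition count are all illustrative choices.

```python
import hashlib


def partition_for(key, num_partitions):
    """Map a key to a starting partition index (illustrative hashing only)."""
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions


def preflist(start, ring, down, n=3):
    """Build a toy preference list of n (node, role) pairs.

    ring  -- list of node names, one entry per partition, in ring order
    down  -- set of node names currently unreachable
    start -- index of the first partition responsible for the key

    The owners of the next n partitions are the primaries. A primary that
    is down is replaced by the next live node further around the ring,
    marked as a fallback. Ownership in `ring` is never modified.
    """
    order = [ring[(start + i) % len(ring)] for i in range(len(ring))]
    primaries = order[:n]
    fallbacks = [node for node in order[n:] if node not in down]

    result = []
    for node in primaries:
        if node not in down:
            result.append((node, "primary"))
        elif fallbacks:
            result.append((fallbacks.pop(0), "fallback"))
        else:
            raise RuntimeError("no live node available as fallback")
    return result


if __name__ == "__main__":
    ring = ["a", "b", "c", "d"] * 2  # 8 partitions spread over 4 nodes
    start = partition_for("some-key", len(ring))
    print(preflist(start, ring, down=set()))
    print(preflist(start, ring, down={"c"}))
```

Because `ring` itself is left untouched, the downed node simply reappears as a primary once it is removed from `down`, which mirrors the temporary nature of the takeover; permanently reassigning ownership would correspond to removing the node from the cluster.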
>>>>>>> 
>>>>>>> On March 24, 2014 at 9:33:17, Jeff Peck ([email protected]) wrote:
>>>>>>> 
>>>>>>>> Is there a description of what happens internally when a node goes
>>>>>>>> down? I am curious whether there would be any sort of reshuffling or
>>>>>>>> redistribution of data in the remaining vnodes. Or would the node
>>>>>>>> simply be unavailable until restarted?
>>>>>>>> 
>>>>>>>> Thanks, 
>>>>>>>> Jeff 
>>>>>>>> _______________________________________________ 
>>>>>>>> riak-users mailing list 
>>>>>>>> [email protected] 
>>>>>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
