Hello Wellington,

That sounds wonderful! I appreciate everyone's help.
Best Regards,
Andrew Touchet

On Thu, Jul 24, 2014 at 12:01 PM, Wellington Chevreuil <[email protected]> wrote:

> You should not face any data loss. The replicas were just moved away from
> that node to other nodes in the cluster during decommission. Once you
> recommission the node and re-balance your cluster, HDFS will re-distribute
> replicas between the nodes evenly, and the recommissioned node will receive
> replicas from other nodes, but there is no guarantee that exactly the same
> replicas that were stored on this node before it was decommissioned will be
> assigned to this node again after recommission and rebalance.
>
> Cheers,
> Wellington.
>
>
> On 24 Jul 2014, at 17:55, andrew touchet <[email protected]> wrote:
>
> Hi Mirko,
>
> Thanks for the reply!
>
> "...it will not bring in exactly the same blocks as before"
> Is that what usually happens when adding nodes back in? Should I expect
> any data loss due to starting the data node process before running the
> balancing tool?
>
> Best Regards,
> Andrew Touchet
>
>
> On Thu, Jul 24, 2014 at 11:37 AM, Mirko Kämpf <[email protected]> wrote:
>
>> After you add the nodes back to your cluster, you run the balancer tool,
>> but it will not bring in exactly the same blocks as before.
>>
>> Cheers,
>> Mirko
>>
>>
>> 2014-07-24 17:34 GMT+01:00 andrew touchet <[email protected]>:
>>
>>> Thanks for the reply,
>>>
>>> I am using Hadoop-0.20. We installed from Apache, not Cloudera, if that
>>> makes a difference.
>>>
>>> Currently, I really need to know how to get the data that was replicated
>>> away during decommissioning back onto my two data nodes.
>>>
>>>
>>> On Thursday, July 24, 2014, Stanley Shi <[email protected]> wrote:
>>>
>>>> Which distribution are you using?
>>>>
>>>> Regards,
>>>> Stanley Shi
>>>>
>>>>
>>>> On Thu, Jul 24, 2014 at 4:38 AM, andrew touchet <[email protected]> wrote:
>>>>
>>>>> I should have added this in my first email, but I do get an error in
>>>>> the data node's log file:
>>>>>
>>>>> '2014-07-12 19:39:58,027 INFO
>>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 0 blocks
>>>>> got processed in 1 msecs'
>>>>>
>>>>>
>>>>> On Wed, Jul 23, 2014 at 3:18 PM, andrew touchet <[email protected]> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I am decommissioning data nodes for an OS upgrade on an HPC cluster.
>>>>>> Currently, users can run jobs that use data stored on /hdfs. They are
>>>>>> able to access all datanodes/compute nodes except the one being
>>>>>> decommissioned.
>>>>>>
>>>>>> Is this safe to do? Will edited files affect the decommissioning node?
>>>>>>
>>>>>> I've been adding the nodes to /usr/lib/hadoop-0.20/conf/hosts_exclude
>>>>>> and running 'hadoop dfsadmin -refreshNodes' on the name node. Then I
>>>>>> wait for the log files to report completion. After the upgrade, I simply
>>>>>> remove the node from hosts_exclude and start Hadoop again on the
>>>>>> datanode.
>>>>>>
>>>>>> Also: under the namenode web interface, I just noticed that the node I
>>>>>> previously decommissioned now shows 0 for Configured Capacity, Used,
>>>>>> and Remaining, and is now 100% Used.
>>>>>>
>>>>>> I used the same /etc/sysconfig/hadoop file from before the upgrade,
>>>>>> removed the node from hosts_exclude, and ran '-refreshNodes' afterwards.
>>>>>>
>>>>>> What steps have I missed in the decommissioning process or while
>>>>>> bringing the data node back online?
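For reference, a minimal sketch of the decommission/recommission cycle discussed in this thread, assuming the Hadoop 0.20 layout and the hosts_exclude path mentioned above; the hostname is a placeholder:

  # On the namenode: list the node in the exclude file, then have the
  # namenode re-read its include/exclude lists
  echo "dn01.example.com" >> /usr/lib/hadoop-0.20/conf/hosts_exclude  # placeholder hostname
  hadoop dfsadmin -refreshNodes

  # Watch the node's status; it is not fully decommissioned until all of
  # its replicas have been re-replicated to other datanodes
  hadoop dfsadmin -report

  # After the OS upgrade: delete the hostname from hosts_exclude, refresh
  # again, then start the datanode process on the upgraded node
  hadoop dfsadmin -refreshNodes
  hadoop-daemon.sh start datanode  # run on the datanode itself, or use your init script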
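The rebalancing step Mirko and Wellington mention is a single command; a sketch assuming the stock balancer, where -threshold is the permitted deviation, in percent, of each datanode from the average cluster utilization:

  # Moves blocks between datanodes until every node is within 10% of the
  # cluster-wide average utilization; safe to run against a live cluster
  hadoop balancer -threshold 10

Afterwards, 'hadoop fsck /' can confirm that no blocks are missing or under-replicated.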
