Which distribution are you using?

Regards,
Stanley Shi



On Thu, Jul 24, 2014 at 4:38 AM, andrew touchet <adt...@latech.edu> wrote:

> I should have added this in my first email, but I do see this message in
> the datanode's log file:
>
> '2014-07-12 19:39:58,027 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 0 blocks
> got processed in 1 msecs'
>
>
>
> On Wed, Jul 23, 2014 at 3:18 PM, andrew touchet <adt...@latech.edu> wrote:
>
>> Hello,
>>
>> I am decommissioning datanodes for an OS upgrade on an HPC cluster.
>> Currently, users can run jobs that use data stored on /hdfs. They are able
>> to access all datanodes/compute nodes except the one being decommissioned.
>>
>> Is this safe to do? Will edited files affect the decommissioning node?
>>
>> I've been adding the nodes to /usr/lib/hadoop-0.20/conf/hosts_exclude and
>> running 'hadoop dfsadmin -refreshNodes' on the namenode. Then I wait for
>> the log files to report completion. After the upgrade, I remove the node
>> from hosts_exclude and start hadoop again on the datanode.
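>>
>> For concreteness, those steps as commands (the hostname is a placeholder,
>> and this assumes dfs.hosts.exclude in hdfs-site.xml already points at the
>> hosts_exclude file):
>>
>>   # On the namenode: exclude the node, then tell the namenode to re-read
>>   # its include/exclude lists ("datanode01.example.com" is a placeholder).
>>   echo "datanode01.example.com" >> /usr/lib/hadoop-0.20/conf/hosts_exclude
>>   hadoop dfsadmin -refreshNodes
>>
>>   # Watch progress; the node is reported as "Decommission in progress"
>>   # until its blocks have been re-replicated elsewhere.
>>   hadoop dfsadmin -report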
>>
>> Also: under the namenode web interface, I just noticed that the node I
>> previously decommissioned now shows 0 for Configured Capacity, Used, and
>> Remaining, and is listed as 100% used.
>>
>> I used the same /etc/sysconfig/hadoop file from before the upgrade,
>> removed the node from hosts_exclude, and ran '-refreshNodes' afterwards.
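>>
>> A sketch of that recommission sequence (the hostname and init-script name
>> are assumptions; the service name varies by packaging):
>>
>>   # On the namenode: drop the node from the exclude list and refresh.
>>   sed -i '/datanode01.example.com/d' /usr/lib/hadoop-0.20/conf/hosts_exclude
>>   hadoop dfsadmin -refreshNodes
>>
>>   # On the datanode: start the daemon again (service name assumed).
>>   service hadoop-0.20-datanode start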
>>
>> What steps have I missed in the decommissioning process or while bringing
>> the data node back online?
>>
