Hi Chathuri!

Technically, there is a rollback option during the upgrade. I don't know how
well it has been tested, but the idea is that the old metadata is not deleted
until the cluster administrator runs "hdfs dfsadmin -finalizeUpgrade". I'm
fairly confident that the HDFS upgrade will work smoothly. We have upgraded
quite a few Hadoop-2.4.1 clusters to Hadoop-2.7.1 successfully (never
having to roll back). It's your applications that run on top of HDFS and
YARN that I'd be concerned about.
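
For a regular (non-rolling) upgrade, the sequence described above can be
sketched roughly as follows. This is only an outline, assuming the stock
start-dfs.sh and hdfs scripts are on the PATH and following the "Upgrade and
Rollback" steps in the HDFS user guide. The run wrapper, the DRY_RUN variable
and the function names are illustrative and not part of Hadoop. With
DRY_RUN=1 (the default here) the commands are printed rather than executed.

```shell
#!/bin/sh
# Illustrative sketch of a non-rolling HDFS upgrade, rollback and finalize.
# DRY_RUN=1 (default) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Start HDFS on the NEW version. The pre-upgrade metadata is kept on disk
# so that a rollback remains possible until the upgrade is finalized.
start_upgrade() {
    run start-dfs.sh -upgrade
}

# Return to the pre-upgrade state. Assumes the cluster has been stopped and
# the OLD Hadoop version has been redistributed first.
rollback_upgrade() {
    run hdfs namenode -rollback
    run start-dfs.sh -rollback
}

# Discard the saved pre-upgrade metadata. After this, rollback is not possible.
finalize_upgrade() {
    run hdfs dfsadmin -finalizeUpgrade
}
```

Sourcing the file and calling finalize_upgrade only prints the command. Set
DRY_RUN=0 once the applications have been verified on the new version, since
finalizing is the step that removes the old metadata.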

HTH
Ravi

On Wed, Mar 23, 2016 at 2:22 PM, Chathuri Wimalasena <[email protected]>
wrote:

> Thanks for the information, Ravi. Is there a way that I can back up the data
> before the update? I was thinking about this approach:
>
> Copy the current hadoop directories to a new set of directories.
> Point hadoop to this new set
> Start the migration with the backup set
>
> Please let me know if people have done this upgrade successfully. I
> believe many things can go wrong in a lengthy upgrade like this. The data
> in the cluster is very important.
> Thanks,
> Chathuri
>
> On Wed, Mar 23, 2016 at 4:37 PM, Ravi Prakash <[email protected]>
> wrote:
>
>> Hi Chathuri!
>>
>>    - When we upgrade, does it change the namenode data structures and
>>    data nodes? I assume it only changes the name node...
>>
>> It changes the NN as well as the DN layout. As a matter of fact, this upgrade
>> will take a long time on the DataNodes as well, because of the block ID-based
>> block layout introduced by
>> https://issues.apache.org/jira/browse/HDFS-6482
>>
>>    - What are the risks with this upgrade ?
>>
>> What Hadoop applications do you run on top of your cluster? The hope is
>> that everything continues working smoothly for the most part, but
>> inevitably some backward incompatible changes creep in.
>>
>>    - Is there a place where I can review the changes made to file system
>>    from 2.5.1 to 2.7.2?
>>
>> See the release notes at http://hadoop.apache.org/releases.html. You'd have
>> to accumulate the changes across all the versions in between.
>>
>> Practically, I'd try running your applications on an upgraded test cluster.
>>
>> HTH
>>
>> Ravi
>>
>> On Wed, Mar 23, 2016 at 12:17 PM, Chathuri Wimalasena <
>> [email protected]> wrote:
>>
>>> Hi,
>>>
>>> We have a Hadoop production deployment with 1 name node and 10 data
>>> nodes, which holds more than 20TB of data in HDFS. We are currently using
>>> Hadoop 2.5.1 and we want to update it to the latest Hadoop version, 2.7.2.
>>>
>>> I followed the guide at
>>> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html
>>> and updated a single-node system running in pseudo-distributed mode, and it
>>> went without any issues. But this system did not have as much data as the
>>> production system.
>>>
>>> Since this is a production system, I'm reluctant to do this update. I
>>> would like to hear what other people have done in these cases and what their
>>> experiences were. Here are a few questions I have:
>>>
>>>    - When we upgrade, does it change the namenode data structures and
>>>    data nodes? I assume it only changes the name node...
>>>    - What are the risks with this upgrade ?
>>>    - Is there a place where I can review the changes made to file
>>>    system from 2.5.1 to 2.7.2?
>>>
>>> I would really appreciate it if you could share your experiences.
>>>
>>> Thanks in advance,
>>> Chathuri
>>>
>>
>>
>
