Hi Chathuri!

Technically there is a rollback option during the upgrade. I don't know how well it has been tested, but the idea is that the old metadata is not deleted until the cluster administrator runs:

$ hdfs dfsadmin -finalizeUpgrade

I'm fairly confident that the HDFS upgrade itself will go smoothly. We have upgraded quite a few Hadoop-2.4.1 clusters to Hadoop-2.7.1 successfully (never having to roll back). It's the applications running on top of HDFS and YARN that I'd be more concerned about.
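The upgrade / finalize / rollback cycle described above can be sketched as a small shell script. This is a minimal sketch that follows the non-rolling upgrade steps in the Hadoop 2.x HDFS User Guide. It assumes that `start-dfs.sh`, `stop-dfs.sh` and `hdfs` from the appropriate release are on PATH. The function names and the `DRY_RUN` switch are hypothetical helpers, not part of Hadoop. The rolling-upgrade procedure on the HdfsRollingUpgrade page uses `hdfs dfsadmin -rollingUpgrade finalize` instead of `-finalizeUpgrade`, so check which procedure you are actually following.

```shell
#!/usr/bin/env bash
# Sketch of the non-rolling HDFS upgrade flow (Hadoop 2.x HDFS User Guide).
# Assumption: sbin/ and bin/ of the release you are switching to are on PATH.
# Hypothetical helper: DRY_RUN=1 prints the commands instead of running them.
set -euo pipefail

DRY_RUN="${DRY_RUN:-0}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Step 1 (new release installed, cluster stopped): start HDFS in upgrade
# mode. NameNode and DataNodes keep the pre-upgrade state in "previous".
upgrade_start() {
  run start-dfs.sh -upgrade
}

# Step 2a: once your applications check out, make the upgrade permanent.
# This discards the pre-upgrade state; rollback is impossible afterwards.
upgrade_finalize() {
  run hdfs dfsadmin -finalizeUpgrade
}

# Step 2b: instead of finalizing, go back. Run with the OLD release on PATH.
# "namenode -rollback" restores the NameNode metadata; "start-dfs.sh
# -rollback" then starts the DataNodes with their rollback option.
upgrade_rollback() {
  run stop-dfs.sh
  run hdfs namenode -rollback
  run start-dfs.sh -rollback
}
```

Running `DRY_RUN=1 upgrade_rollback` prints the three rollback commands without touching the cluster, which makes it easy to rehearse the sequence on a test cluster first.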
HTH
Ravi

On Wed, Mar 23, 2016 at 2:22 PM, Chathuri Wimalasena <[email protected]> wrote:
> Thanks for the information, Ravi. Is there a way I can back up the data
> before the update? I was thinking about this approach:
>
> Copy the current Hadoop directories to a new set of directories.
> Point Hadoop to this new set.
> Start the migration with the backup set.
>
> Please let me know if people have done this upgrade successfully. I
> believe many things can go wrong in a lengthy upgrade like this, and the
> data in the cluster is very important.
> Thanks,
> Chathuri
>
> On Wed, Mar 23, 2016 at 4:37 PM, Ravi Prakash <[email protected]>
> wrote:
>
>> Hi Chathuri!
>>
>> - When we upgrade, does it change the namenode data structures and
>> data nodes? I assume it only changes the name node...
>>
>> It changes the NN as well as the DN layout. In fact, this upgrade will
>> also take a long time on the DataNodes because of
>> https://issues.apache.org/jira/browse/HDFS-6482
>>
>> - What are the risks with this upgrade?
>>
>> What Hadoop applications do you run on top of your cluster? The hope is
>> that everything continues working smoothly for the most part, but
>> inevitably some backward-incompatible changes creep in.
>>
>> - Is there a place where I can review the changes made to the file
>> system from 2.5.1 to 2.7.2?
>>
>> The release notes: http://hadoop.apache.org/releases.html . You'd have to
>> accumulate the changes from every release between those two versions.
>>
>> Practically, I'd try running your applications on an upgraded test cluster.
>>
>> HTH
>>
>> Ravi
>>
>> On Wed, Mar 23, 2016 at 12:17 PM, Chathuri Wimalasena <
>> [email protected]> wrote:
>>
>>> Hi,
>>>
>>> We have a Hadoop production deployment with 1 name node and 10 data
>>> nodes holding more than 20TB of data in HDFS. We are currently using
>>> Hadoop 2.5.1 and want to upgrade to the latest Hadoop version, 2.7.2.
>>>
>>> I followed this guide (
>>> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html)
>>> and upgraded a single-node system running in pseudo-distributed mode
>>> without any issues. However, that system had much less data than the
>>> production system.
>>>
>>> Since this is a production system, I'm reluctant to do this upgrade. I
>>> would like to hear what other people have done in these cases and what
>>> their experiences were. Here are a few questions I have:
>>>
>>> - When we upgrade, does it change the namenode data structures and
>>> data nodes? I assume it only changes the name node...
>>> - What are the risks with this upgrade?
>>> - Is there a place where I can review the changes made to the file
>>> system from 2.5.1 to 2.7.2?
>>>
>>> I would really appreciate it if you could share your experiences.
>>>
>>> Thanks in advance,
>>> Chathuri
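On the backup question in the thread: copying every Hadoop directory means duplicating the DataNode block directories, which hold the 20TB of actual data. One lighter option is to archive only the NameNode metadata before starting the upgrade. The script below is a minimal sketch of that idea. It assumes it runs on the NameNode host as the HDFS superuser. The `BACKUP_ROOT` default path, the helper function names and the `DRY_RUN` switch are hypothetical. It uses the real `hdfs dfsadmin -safemode`, `hdfs dfsadmin -saveNamespace` and `hdfs getconf -confKey` commands. It does not back up DataNode data.

```shell
#!/usr/bin/env bash
# Sketch: archive NameNode metadata before an HDFS upgrade.
# Assumptions: run on the NameNode host as the HDFS superuser; the
# BACKUP_ROOT default is a hypothetical path. DRY_RUN=1 only prints commands.
set -euo pipefail

BACKUP_ROOT="${BACKUP_ROOT:-/var/backups/hdfs-nn}"
DRY_RUN="${DRY_RUN:-0}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# dfs.namenode.name.dir is a comma-separated list, often of file:// URIs;
# print one local path per line (plain paths pass through unchanged).
name_dirs_to_paths() {
  local IFS=','
  local d
  for d in $1; do
    echo "${d#file://}"
  done
}

backup_namenode_metadata() {
  # Use the list given as $1, or ask the local configuration for it.
  local dirs="${1:-$(hdfs getconf -confKey dfs.namenode.name.dir)}"
  local stamp path i=0
  stamp=$(date +%Y%m%d-%H%M%S)

  # Block namespace changes, then write a fresh fsimage checkpoint
  # (saveNamespace requires safe mode) so the archive is self-consistent.
  run hdfs dfsadmin -safemode enter
  run hdfs dfsadmin -saveNamespace

  run mkdir -p "$BACKUP_ROOT"
  while IFS= read -r path; do
    # One tarball per configured name directory.
    run tar -czf "$BACKUP_ROOT/namedir$i-$stamp.tar.gz" -C "$path" .
    i=$((i + 1))
  done < <(name_dirs_to_paths "$dirs")

  run hdfs dfsadmin -safemode leave
}
```

After the archive is written you would stop the cluster and continue with the upgrade steps. Restoring from such an archive is a manual, last-resort step that sits outside the built-in `-rollback` mechanism.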
