On Thu, Mar 24, 2016 at 5:16 AM Chathuri Wimalasena <[email protected]> wrote:
> Hi Ravi,
>
> Thank you for all the information. Our application is indexing Twitter
> data to HBase and then doing some data analytics on top of that. That's why
> HDFS data is very important to us. We cannot tolerate any data loss with
> the update. Do you remember how long it took you to upgrade from
> 2.4.1 to 2.7.1?
>
> Thanks,
> Chathuri
>
> On Wed, Mar 23, 2016 at 7:09 PM, Ravi Prakash <[email protected]>
> wrote:
>
>> Hi Chathuri!
>>
>> Technically there is a rollback option during upgrade. I don't know how
>> well it has been tested, but the idea is that old metadata is not deleted
>> until the cluster administrator says $ hdfs dfsadmin -finalizeUpgrade . I'm
>> fairly confident that the HDFS upgrade will work smoothly. We have upgraded
>> quite a few Hadoop-2.4.1 clusters to Hadoop-2.7.1 successfully (never
>> having to roll back). It's your applications that work on top of HDFS and
>> YARN that I'd be concerned about.
>>
>> HTH
>> Ravi
>>
>> On Wed, Mar 23, 2016 at 2:22 PM, Chathuri Wimalasena <
>> [email protected]> wrote:
>>
>>> Thanks for the information, Ravi. Is there a way that I can back up data
>>> before the update? I was thinking about this approach:
>>>
>>> Copy the current hadoop directories to a new set of directories.
>>> Point hadoop to this new set
>>> Start the migration with the backup set
>>>
>>> Please let me know if people have done this upgrade successfully. I
>>> believe many things can go wrong in a lengthy upgrade like this. The data
>>> in the cluster is very important.
>>> Thanks,
>>> Chathuri
>>>
>>> On Wed, Mar 23, 2016 at 4:37 PM, Ravi Prakash <[email protected]>
>>> wrote:
>>>
>>>> Hi Chathuri!
>>>>
>>>> - When we upgrade, does it change the namenode data structures and
>>>> data nodes? I assume it only changes the name node...
>>>>
>>>> It changes the NN as well as DN layout. As a matter of fact, this
>>>> upgrade will take a long time on Datanodes as well because of
>>>> https://issues.apache.org/jira/browse/HDFS-6482
>>>>
>>>> - What are the risks with this upgrade?
>>>>
>>>> What Hadoop applications do you run on top of your cluster? The hope is
>>>> that everything continues working smoothly for the most part, but
>>>> inevitably some backward incompatible changes creep in.
>>>>
>>>> - Is there a place where I can review the changes made to file
>>>> system from 2.5.1 to 2.7.2?
>>>>
>>>> The release notes. http://hadoop.apache.org/releases.html . You'd have
>>>> to accumulate all the changes in the versions.
>>>>
>>>> Practically, I'd try to run my application on your upgraded test
>>>> cluster.
>>>>
>>>> HTH
>>>>
>>>> Ravi
>>>>
>>>> On Wed, Mar 23, 2016 at 12:17 PM, Chathuri Wimalasena <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> We have a hadoop production deployment with 1 name node and 10 data
>>>>> nodes which has more than 20TB of data in HDFS. We are currently using
>>>>> Hadoop 2.5.1 and we want to update it to the latest Hadoop version, 2.7.2.
>>>>>
>>>>> I followed the following link (
>>>>> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html)
>>>>> and updated a single node system running in pseudo distributed mode and it
>>>>> went without any issues. But this system did not have as much data as
>>>>> the production system.
>>>>>
>>>>> Since this is a production system, I'm reluctant to do this update. I
>>>>> would like to see what other people have done in these cases and their
>>>>> experiences... Here are a few questions I have:
>>>>>
>>>>> - When we upgrade, does it change the namenode data structures and
>>>>> data nodes? I assume it only changes the name node...
>>>>> - What are the risks with this upgrade?
>>>>> - Is there a place where I can review the changes made to file
>>>>> system from 2.5.1 to 2.7.2?
>>>>>
>>>>> I would really appreciate if you can share your experiences.
>>>>>
>>>>> Thanks in advance,
>>>>> Chathuri
>>>>>
>>>>
>>>>
>>>
>>
>
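For the backup question raised in the thread, one possible pre-upgrade checklist can be sketched as a dry run. This is only a sketch under assumptions, not a procedure anyone in the thread described: the helper names (`pre_upgrade_commands`, `render`) and the `/backup/namenode` path are hypothetical, and the script only prints the `hdfs` commands for review rather than running them. Note that `-fetchImage` saves NameNode metadata only; it does not copy block data off the DataNodes, so it complements the rollback window mentioned above (old metadata kept until `-finalizeUpgrade`) rather than replacing a full data backup.

```python
import shlex

# Run only after the upgraded cluster has been validated; per the thread,
# old metadata is kept until the administrator issues this command.
FINALIZE_COMMAND = ["hdfs", "dfsadmin", "-finalizeUpgrade"]


def pre_upgrade_commands(backup_dir):
    """Return, in order, the commands for a pre-upgrade metadata backup.

    backup_dir is a hypothetical local directory on the admin host.
    """
    return [
        # Record cluster and block state to compare against after the upgrade.
        ["hdfs", "dfsadmin", "-report"],
        ["hdfs", "fsck", "/", "-files", "-blocks", "-locations"],
        # Stop namespace changes, then checkpoint the namespace to disk.
        # The cluster stays in safe mode until it is shut down for the upgrade.
        ["hdfs", "dfsadmin", "-safemode", "enter"],
        ["hdfs", "dfsadmin", "-saveNamespace"],
        # Download the most recent fsimage from the NameNode into backup_dir.
        ["hdfs", "dfsadmin", "-fetchImage", backup_dir],
    ]


def render(commands):
    """Format commands as shell lines for review; nothing is executed."""
    return "\n".join(
        "$ " + " ".join(shlex.quote(arg) for arg in cmd) for cmd in commands
    )


if __name__ == "__main__":
    print(render(pre_upgrade_commands("/backup/namenode")))
    print(render([FINALIZE_COMMAND]))
```

Keeping the plan as data and printing it first makes it easy to rehearse the same sequence on a test cluster before touching production, which is what the thread recommends for the applications as well.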
