Moving to [email protected] as your question is CDH-related. My answers are inline:
On Tue, Jan 22, 2013 at 4:35 AM, Dheeren bebortha <[email protected]> wrote:

> I am trying to upgrade a Hadoop cluster with 0.20.X and MRv1 to a Hadoop
> cluster with CDH412 with HA+QJM+YARN (aka Hadoop 2.0.3) without any data
> loss and minimal downtime. The documentation on the Cloudera site is OK, but
> very confusing. BTW I do not plan on using Cloudera Manager. Has anyone
> attempted a clean upgrade using hadoop native commands?

The upgrade process for any 0.20/1.x/CDH3 release to CDH4 is documented at https://ccp.cloudera.com/display/CDH4DOC/Upgrading+from+CDH3+to+CDH4. The only difference you may see is in the packaging used (tarballs or RPMs/DEBs?) and, therefore, in the usernames used in the guide.

The basic process is to stop the older HDFS, carefully remove the older installation and all its traces, and then start the newer HDFS with the -upgrade flag. This takes care of the HDFS metadata upgrade. Once that is done and you've verified that your files and the cluster state are all perfectly intact and readable, you can run dfsadmin -finalizeUpgrade to commit the upgrade permanently.

QJM is documented in a separate guide, found on the same portal mentioned above, and can be set up in a second step after the upgrade to achieve full HA.

On the MR side, all your MR1 jobs will need to be recompiled before they can run on the newer YARN+MR2 cluster, due to some binary-incompatible changes made between the versions you're upgrading across. Other than a recompile, you mostly won't need to do anything else.

May we also know your reason for not using CM, when it's aimed at making all of this much easier to do and manage? We appreciate any form of feedback, thanks!

-- 
Harsh J
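For reference, the stop/upgrade/verify/finalize sequence described above could look roughly like the sketch below. This is a hedged outline only, not the official procedure: the paths, script names, and daemon-start style assume a tarball-based install, and your cluster's service management (init scripts, usernames, CDH package layout) may differ — always follow the linked CDH4 guide for the authoritative steps.

```shell
# 1. Quiesce and stop the old 0.20/MR1 cluster (script names assume a
#    tarball install; packaged installs use service/init scripts instead).
stop-mapred.sh            # old JobTracker / TaskTrackers
stop-dfs.sh               # old NameNode / DataNodes

# 2. Back up the NameNode metadata directory before touching anything.
#    (/data/dfs/name is a placeholder path — use your dfs.name.dir.)
cp -r /data/dfs/name /data/dfs/name.backup-pre-cdh4

# 3. After installing the CDH4 binaries and configs, start the new
#    NameNode with the -upgrade flag so it converts the HDFS metadata.
hadoop-daemon.sh start namenode -upgrade
hadoop-daemons.sh start datanode

# 4. Verify the filesystem looks healthy and files are readable, then
#    commit the upgrade permanently (this discards the rollback state).
hdfs fsck /
hdfs dfsadmin -finalizeUpgrade
```

Until -finalizeUpgrade is run, the old metadata is retained so a rollback remains possible; finalizing is the point of no return, which is why the verification step comes first.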
