DistCp is the simplest approach you can use. It copies the data in parallel by running as a MapReduce job (map tasks only, there are no reducers).
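Something along these lines might serve as a starting point. The NameNode hostnames, the port, the /data path and the map count are placeholders I made up for illustration, not values from your clusters, so treat it as a rough sketch:

```shell
# Hypothetical source and destination NameNodes and path; substitute your own.
SRC="hdfs://nn-a.example.com:8020/data"
DST="hdfs://nn-b.example.com:8020/data"

# -update : skip files that already exist at the destination with the same size,
#           so the job can be re-run later to pick up new or changed files
# -p      : preserve file status (replication, block size, user, group, permission)
# -m 20   : cap the number of simultaneous map tasks (20 is an arbitrary choice)
DISTCP_CMD="hadoop distcp -update -p -m 20 $SRC $DST"

# Print the command for review before running anything.
echo "$DISTCP_CMD"

# Uncomment to actually start the copy:
# $DISTCP_CMD
```

Because of -update, you could run the bulk copy while both clusters stay up, and then run it again shortly before the switch so that only the files that changed in between need to be copied.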
On Thu, Mar 14, 2013 at 12:16 PM, Vinod Kumar Vavilapalli <[email protected]> wrote:

> Copy data into one of the clusters using distcp *without* downtime
> (assuming you have enough capacity) and then merge the clusters?
>
> Thanks,
> +Vinod Kumar Vavilapalli
> Hortonworks Inc.
> http://hortonworks.com/
>
> On Mar 13, 2013, at 9:38 PM, Shashank Agarwal wrote:
>
> Hey Guys,
>
> I have two different hadoop clusters in production. One cluster is used as
> backing for HBase and the other for other things. Both hadoop clusters are
> using the same version 1.0 and I want to merge them and make them one. I
> know, one possible solution is to copy the data across, but the data is
> really huge on these clusters and it will hard for me to compromise with
> huge downtime.
> Is there any optimal way to merge two hadoop clusters.
>
> ~Shashank

--
Thanks and Regards,
VIVEK KOUL
