Note that a long-running MR service is not a requirement; MR can be used just as a speedy facilitator for the copy. Nothing's gonna go wrong if you shut down your MR services right after your parallel copy (via distcp, etc.) has completed.
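
For instance, a rough sketch of that sequence, assuming a Hadoop 1.x layout where start-mapred.sh and stop-mapred.sh are on the PATH. The NameNode addresses below are made-up placeholders, and the run/migrate_hbase_dir helpers are names invented for this example. It defaults to a dry run that only prints the commands:

```shell
#!/bin/sh
# Hypothetical NameNode addresses and path; replace with your own.
SRC_NN="${SRC_NN:-hdfs://old-nn:8020}"
DST_NN="${DST_NN:-hdfs://new-nn:8020}"
HBASE_DIR="${HBASE_DIR:-/hbase}"
# DRY_RUN=1 (the default) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Bring MR up only for the duration of the parallel copy.
migrate_hbase_dir() {
    run start-mapred.sh || return 1
    run hadoop distcp "$SRC_NN$HBASE_DIR" "$DST_NN$HBASE_DIR"
    status=$?
    # Stop MR again whether or not the copy succeeded.
    run stop-mapred.sh
    return "$status"
}

migrate_hbase_dir
```

Set DRY_RUN=0 to actually execute the commands. This sketch assumes HBase is stopped on both sides while the copy runs; on Hadoop 2.x the MR start/stop scripts are different, so adjust accordingly.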

On Sat, Feb 15, 2014 at 9:39 AM, divye sheth wrote:
> You could try the hadoop distcp command to transfer the hbase directory
> from one cluster to the other. This does not require you to set up
> MapReduce; it will start a mapred job in local mode, i.e. with a single
> mapper. When copying from one cluster to another, remember not to copy
> -ROOT- and .META.
> I have used this method without facing any data loss. After the copy is
> complete, start your new HBase; it should be able to read the contents
> and build region information from the new directory.
>
> Thanks
> D
> On Feb 14, 2014 5:45 PM, "Samir Ahmic" wrote:
>
>> Well, that depends on the size of your dataset. You can use
>> hadoop fs -copyToLocal to copy the /hbase directory to local disk or
>> some other storage device that is mounted on your original cluster.
>> Then you can copy the /hbase dir to the second cluster with
>> hadoop fs -copyFromLocal. Of course, this will require that the source
>> and destination HBase clusters are offline. I have never used this
>> approach, but it should work.
>>
>> Regards
>>
>> On Fri, Feb 14, 2014 at 11:15 AM, Vimal Jain wrote:
>>
>> > Hi Samir,
>> > As far as I know, all these techniques require the MapReduce daemons
>> > to be up on the source and destination clusters.
>> > Is there any other solution which does not require MapReduce at all?
>> >
>> > On Fri, Feb 14, 2014 at 2:41 PM, Samir Ahmic wrote:
>> >
>> > > Hi Vimal,
>> > >
>> > > There are a few options for moving data from one HBase cluster to
>> > > another:
>> > >
>> > > 1. You can use the org.apache.hadoop.hbase.mapreduce.Export tool
>> > >    to export tables to HDFS, and then use hadoop distcp to move
>> > >    the data to the other cluster. Once the data is in place on the
>> > >    second cluster, you can use the
>> > >    org.apache.hadoop.hbase.mapreduce.Import tool to import the
>> > >    tables. Please look at http://hbase.apache.org/book.html#export.
>> > > 2. The second option is to use the CopyTable tool; please look at
>> > >    http://hbase.apache.org/book.html#copytable
>> > > 3. The third option is to enable HBase snapshots, create table
>> > >    snapshots, and then use the ExportSnapshot tool to move them to
>> > >    the second cluster. Once the snapshots are on the second
>> > >    cluster, you can clone tables from them. Please look at
>> > >    http://hbase.apache.org/book.html#ops.snapshots
>> > >
>> > > I have used 1 and 3 for moving data between clusters, and in my
>> > > case 3 was the better solution.
>> > >
>> > > Regards
>> > > Samir
>> > >
>> > > On Fri, Feb 14, 2014 at 8:33 AM, Vimal Jain wrote:
>> > >
>> > > > Hi,
>> > > > I have HBase and Hadoop set up in pseudo-distributed mode in
>> > > > production.
>> > > > Now I am planning to move from pseudo-distributed mode to fully
>> > > > distributed mode (2-node cluster).
>> > > > My existing HBase and Hadoop versions are 0.94.7 and 1.1.2.
>> > > > I am planning to have fully distributed mode with HBase version
>> > > > 0.94.16 and Hadoop version (either 1.X or 2.X, not yet decided).
>> > > >
>> > > > What are the different ways to copy data from the existing setup
>> > > > (pseudo-distributed mode) to this new setup (2-node fully
>> > > > distributed mode)?
>> > > >
>> > > > Please help.
>> > > >
>> > > > --
>> > > > Thanks and Regards,
>> > > > Vimal Jain
>> >
>> > --
>> > Thanks and Regards,
>> > Vimal Jain

--
Harsh J
