Hi Stack,

finally the migration worked. We copied the table data from the 0.20.4 HBase cluster
to the cdh3u1 cluster (HBase 0.90.3) using the Mozilla approach of copying the
files at the HDFS level. (
http://blog.mozilla.com/data/2011/02/04/migrating-hbase-in-the-trenches/)

As described in the Mozilla post, this approach minimizes downtime. We copied
the data (3 TB, 10 tables) three times while the old cluster was still running.
Then we disabled all writes to the old cluster and stopped HBase. The final
copy pass took only 10 minutes.
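For anyone following the same route, the repeated bulk copies can be done with
distcp. The hostnames, ports, and paths below are placeholders, not our actual
setup; reading over hftp is the usual trick for letting a newer Hadoop client
copy from an older HDFS without hitting RPC version mismatches:

```shell
# Hypothetical NameNode hostnames: "old-nn" (0.20 cluster), "new-nn" (cdh3u1).
# Run from the NEW cluster; hftp is read-only and version-agnostic,
# so the newer client can read the older HDFS.
hadoop distcp -i \
  hftp://old-nn:50070/hbase \
  hdfs://new-nn:8020/hbase

# Repeat while the old cluster is still live. After writes are disabled
# and HBase is stopped, a final pass with -update copies only files
# that differ, which is why the last run finishes quickly.
hadoop distcp -update \
  hftp://old-nn:50070/hbase \
  hdfs://new-nn:8020/hbase
```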

Then we deleted .META. on the new cluster and rebuilt it with the add_table.rb
script (this might not be necessary, see the comments on the Mozilla blog). Then we
ran hbase hbck. Most reported errors were related to (old) empty
table dirs in HDFS (no data, only oldlogs present); these we deleted from
HDFS. The remaining errors we fixed by hand (25 errors out of 27000 regions).
Most of these were cases where a parent region and both of its child regions
were present.
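Roughly, the rebuild and cleanup steps were along these lines. Table names and
install paths here are illustrative, not our actual ones:

```shell
# Rebuild the .META. entries for one table from its HDFS directory.
# add_table.rb ships in the bin/ directory of the 0.90.x HBase install;
# /usr/lib/hbase and "mytable" are placeholders.
hbase org.jruby.Main /usr/lib/hbase/bin/add_table.rb /hbase/mytable

# Check table/region consistency; rerun after each manual fix.
hbase hbck

# Empty table dirs (only oldlogs present, no data) can be removed
# from HDFS before rerunning hbck. "-rmr" is the recursive delete
# on 0.20-era Hadoop.
hadoop fs -rmr /hbase/some_empty_table
```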

For the next larger migration we hope to use the new replication.

Cheers Matthias


On Mon, Jul 11, 2011 at 9:31 PM, Stack <[email protected]> wrote:

> On Mon, Jul 11, 2011 at 7:39 AM, Mat Hofschen <[email protected]> wrote:
> > Hi Stack,
> > the scan of META does not contain any 'offline' or 'split' attributes.
>
> OK.  So the daughters and parents are all 'online'.
>
> > After executing add_table I restart hbase. Have not used disable/enable.
> >
>
> add_table.rb seems to be picking up parent and daughters?  Remove the
> daughter from the 0.90.x copy of the data.  They should not have been
> taking writes if parent was online.
>

We actually ended up deleting the parents, because they were no longer
taking any writes on the old cluster
(checked by looking at the file dates in DFS).

>
> > What actually happens when I copy the META from old cluster to new
> cluster?
> > META table contains references to old cluster machines. Because of that
> we
> > are using add_table. Is there another way to reuse the META table between
> > the two clusters?
> >
>
> Well, are the old machines online?  The new cluster is trying to
> contact them and failing?  Can you block the new cluster talking to
> the old (IIRC, if the old cluster is reachable, we'll try and talk to
> it and fail because of mismatch in rpc versions... I don't think we
> assume it down unless we get a socket timeout or some such failure).
>
> St.Ack
>
