Thanks for the response.
It's working now. It turns out the issue was a race condition between the
HBaseAdmin.unassign() call and the deletion of the associated records in META.
Inserting a "sleep 1" between the two calls cleared it up.
Just a hunch, but it seems that maybe the behavior of the 'force' parameter in
the unassign() call has changed a bit? I noticed that parameter disappeared
from the corresponding assign() method. At any rate, without the sleep, some of
the rows in META don't get deleted and old directories on disk don't get
cleaned up, presumably because they're still locked by the master.
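A fixed "sleep 1" only hides the race as long as the unassign finishes within a second. A minimal sketch of the alternative, polling until some condition holds or a timeout expires, is below. The helper name wait_until and the region_offline stand-in are made up for illustration; which cluster call would actually confirm that the region is offline is left open, since I have not checked what 0.92.1 offers for that.

```ruby
# Poll a condition instead of sleeping for a fixed interval.
# Returns true once the block yields a truthy value, false on timeout.
# (wait_until is a hypothetical helper, not part of HBase or the script.)
def wait_until(timeout = 10.0, interval = 0.1)
  deadline = Time.now + timeout
  loop do
    return true if yield
    return false if Time.now >= deadline
    sleep interval
  end
end

# Stand-in for "has the master finished unassigning the region?".
# In the real script this would be a check against the cluster.
checks = 0
region_offline = lambda do
  checks += 1
  checks >= 3 # pretend the region goes offline on the third check
end

if wait_until(5.0, 0.01) { region_offline.call }
  puts "region offline after #{checks} checks, safe to delete META rows"
else
  puts "timed out waiting for unassign, leaving META rows alone"
end
```

On timeout the script can skip the META delete instead of leaving half-deleted rows behind.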
The diff of my changes follows in case anybody finds it useful. I just pass
nil for the server info when closing regions so I don't have to deal with
serverstartcodes. That works, but I'm not sure whether looking up the server
from META again at that point in the script could open up issues.
clust01:~> diff online_merge.rb online_merge_updated.rb
133c133,134
< server = HServerAddress.new(String.from_java_bytes(value))
---
> serverInfo = String.from_java_bytes(value).split(':')
> server = HServerAddress.new(serverInfo[0],Integer(serverInfo[1]))
221c222
< admin.closeRegion(row , server.getHostname + ":" + server.getPort.to_s)
---
> admin.closeRegion(row, nil)
257c258
< newHRI = HRegionInfo.new(tableDesc, firstHRI.getStartKey, lastHRI.getEndKey)
---
> newHRI = HRegionInfo.new(tableDesc.getName, firstHRI.getStartKey, lastHRI.getEndKey)
269c270
< HRegion.makeColumnFamilyDirs(fs, tableDir, newHRI, family)
---
> HRegion.makeColumnFamilyDirs(fs, tableDir, newHRI, family) if normalRun
300c301,302
< admin.unassign(row, true)
---
> admin.unassign(row,true)
> sleep 1
312c314
< admin.assign(newHRI.getRegionName, true)
---
> admin.assign(newHRI.getRegionName)
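For anyone adapting the first hunk: the info:server value is a "host:port" string, and the split can be tried out in plain Ruby without a cluster. This is only a sketch. parse_server is a made-up helper name, and the hostname and port below are example values, not taken from my setup.

```ruby
# Split a "host:port" string, as stored in info:server, into its parts.
# (parse_server is a hypothetical helper used only for illustration.)
def parse_server(value)
  host, port = value.split(':', 2)
  if host.nil? || host.empty? || port.nil?
    raise ArgumentError, "expected host:port, got #{value.inspect}"
  end
  # Integer() raises ArgumentError on a non-numeric port rather than
  # silently returning 0 the way String#to_i would.
  [host, Integer(port)]
end

host, port = parse_server("rs1.example.com:60020")
puts "#{host} #{port}"
```

The resulting pair corresponds to the two arguments passed to HServerAddress.new in the updated script.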
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Stack
Sent: Wednesday, May 02, 2012 11:51 PM
To: [email protected]
Subject: Re: online_merge.rb from HBASE-1621 on 0.92.1
On Wed, May 2, 2012 at 1:31 PM, Karl Kuntz <[email protected]> wrote:
> After looking at this again, the data is intact (I can count/scan all rows),
> and the new regions are loaded on the different region servers, but the web
> UI doesn't show any regions for the table and warnings appear in the log:
>
> 2012-05-02 14:51:43,847 WARN org.apache.hadoop.hbase.master.CatalogJanitor:
> REGIONINFO_QUALIFIER is empty in
> keyvalues={test,,1335982638210.66ebbe65667be38836cfb9ee809b6b48./info:server/1335982868378/Put/vlen=29,
>
> test,,1335982638210.66ebbe65667be38836cfb9ee809b6b48./info:serverstartcode/1335982868378/Put/vlen=8}
>
> At this point I'm wondering what's keeping the web UI from showing the
> regions for the table. Is it just the web UI that is out of sync, or are
> there other, potentially bigger issues as well?
>
The web UI's view of the cluster is produced via a scan of the .META. table, so
it's odd that it shows the table as w/o regions (but you can
scan/count the table anyway -- the client uses the .META. table to figure
out what's where).
Pastebin more of the master log?
St.Ack