Hi,
After upgrading Hadoop and HBase (to 0.90.4-cdh3u2, from HBase 0.90
and Hadoop 0.20-append) on a single-node test installation, everything
worked fine initially.
Then there were some DNS and host name changes, which resulted in
a lot of "hostname <oldname> cannot be resolved" problems in the logs, and
the master web interface would only show a stack trace from a bad lookup
("hostname can't be null").
So I changed the host name back to its old name. All configuration in
hbase/hadoop points to localhost (i.e. in the *-site.xml, slaves,
masters, regionservers). We are running distributed mode (but on one
machine).
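
To rule out name resolution on the box itself, a quick check along these lines could be used. This is only a minimal sketch in Python, not anything HBase-specific: the helper name check_resolution is made up for illustration, and it only tests a forward lookup of "localhost" and of whatever the OS reports as its own host name.

```python
import socket

def check_resolution(name):
    """Try a forward (name -> IPv4 address) lookup and report the outcome.

    Returns (True, address) on success, (False, error text) on failure.
    Hypothetical helper for illustration only.
    """
    try:
        return True, socket.gethostbyname(name)
    except OSError as exc:  # socket.gaierror / socket.herror are subclasses
        return False, str(exc)

if __name__ == "__main__":
    # Check the name used in the configuration and the machine's own name.
    for name in ("localhost", socket.gethostname()):
        ok, detail = check_resolution(name)
        status = "resolves to " + detail if ok else "FAILED: " + detail
        print(name + ": " + status)
```

If the machine's own name fails here, the daemons would presumably hit the same problem, whatever the *-site.xml files say.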
Now, the HMaster process does come up again somewhat and I get the web
interface, but it stays in
Currently running tasks:
Master Startup, Assigning ROOT region, 10050s
i.e. it never continues.
The hadoop data directory was not touched other than during the upgrade
which completed successfully and data was available after the upgrade.
Now, of course, no data is available anymore.
This is really scary, as we plan to do similar upgrades in production
environments and I would like to understand what could possibly screw
things up so badly.
The HMaster log shows things like:
2012-01-17 12:48:39,712 INFO
org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition
timed out: -ROOT-,,0.70236052 state=OPEN, ts=1326707373329,
server=application1,60020,1326707331441
2012-01-17 12:48:39,713 ERROR
org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN
for too long, we don't know where region was opened so can't do anything
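
For reference, the ts= value and the last field of the server name look like epoch milliseconds (I am assuming the third field is the region server start code). A small Python sketch to decode them, with a made-up helper name millis_to_utc:

```python
from datetime import datetime, timezone

def millis_to_utc(ms):
    """Convert epoch milliseconds to a timezone-aware UTC datetime.

    Integer arithmetic is used so the millisecond part is kept exactly.
    """
    seconds, millis = divmod(ms, 1000)
    dt = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return dt.replace(microsecond=millis * 1000)

# Values copied from the master log line above.
region_ts = 1326707373329         # ts= of the -ROOT- region state
server_startcode = 1326707331441  # assumed: start code in the server name

print("region state ts :", millis_to_utc(region_ts).isoformat())
print("server startcode:", millis_to_utc(server_startcode).isoformat())
```

Both values decode to the morning of 2012-01-16 (UTC), i.e. well before the 2012-01-17 log lines above (whose time zone is not stated).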
The region server log has a lot of these:
2012-01-17 13:03:36,774 DEBUG
org.apache.hadoop.hbase.regionserver.HRegionServer:
NotServingRegionException; Region is not online: -ROOT-,,0
Help would be great!!
Thanks,
Henning