Hi,

After upgrading Hadoop and HBase on a single-node test installation (from HBase 0.90 and Hadoop 0.20-append to 0.90.4-cdh3u2), everything initially worked fine.

Then there were some DNS and host name changes, which resulted in a lot of "hostname <oldname> cannot be resolved" errors in the logs, and the master web interface would only show a stack trace from a failed lookup ("hostname can't be null").

So I changed the host name back to its old name. All configuration in HBase/Hadoop points to localhost (i.e. in the *-site.xml, slaves, masters, and regionservers files). We are running in distributed mode (but on one machine).

Now the HMaster process does come up again, more or less, and I get the web interface, but it stays at

Currently running tasks:

Master Startup, Assigning ROOT region, 10050s

i.e. it never continues.

The Hadoop data directory was not touched other than during the upgrade, which completed successfully, and data was available after the upgrade. Now, of course, no data is available anymore.

This is really scary, as we plan to do similar upgrades in production environments, and I would like to understand what could possibly screw things up so badly.

The HMaster log shows things like:


2012-01-17 12:48:39,712 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: -ROOT-,,0.70236052 state=OPEN, ts=1326707373329, server=application1,60020,1326707331441
2012-01-17 12:48:39,713 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything

The region server log has a lot of these:

2012-01-17 13:03:36,774 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException; Region is not online: -ROOT-,,0

Help would be great!!

Thanks,
  Henning
