Arun, I had a very similar issue with my cluster when the regionserver with the .META. table on it crashed. It crippled the cluster for a while, but after shutting various things down and restarting them again, it seemed to work itself out eventually. I had to do this a few times and unfortunately I didn't keep a record of the order in which I shut things down and restarted them.
The problem seemed to stem from the master thinking that META was stored on a node and that node having no knowledge of ever having held it. I tried a few major_compact runs on META, hoping that would fix it, but each failed with the same exception as below. The weird thing was that I could see (through the web UI on master) that META was now being held on a different regionserver.

I wouldn't necessarily follow my lead in randomly shutting things down and hoping for the best, as it may well have been something entirely different that fixed the issue in the end. If all else fails, try restarting the master and the regionservers a few times and see if that works out the kinks.

Thanks
Jamie

On 10 July 2010 04:48, Ryan Rawson <[email protected]> wrote:
> Others will have to chime in for details, but typically this means you
> are having DNS issues. That is the hostname is resolving to an ip and
> not resolving back to the same name or vice versa or any other combo
> of non-roundtripping involving ip and dns names.
>
> -ryan
>
> On Fri, Jul 9, 2010 at 6:41 PM, Arun Ramakrishnan
> <[email protected]> wrote:
>> I shutdown hbase. Added some new nodes to hdfs, rebalanced. Also added those
>> nodes to hbase regionservers.
>> Then started hbase.
>>
>> I am having this strange problem where the new nodes let's say host1 thru
>> host4 gets repeatedly reported/added to the regionservers list.
>>
>> Initially when I did a "report 'simple'" from the shell, it showed me 10
>> unique hosts. Then within a matter of minutes it grew to 17 ( with the newly
>> added hosts repeating multiple times).
>>
>> Also, the web UI failed with the following error.
>>
>> ##############
>> HTTP ERROR: 500
>> Trying to contact region server 192.168.130.63:60020 for region .META.,,1,
>> row '', but failed after 3 attempts.
>> Exceptions:
>> org.apache.hadoop.hbase.NotServingRegionException:
>> org.apache.hadoop.hbase.NotServingRegionException: .META.,,1
>>     at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2266)
>>     at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1845)
>>     at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>     at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
>>     at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
>> ###############
>>
>>
>> Any insight into why the regions get repeated multiple times. I did a
>> hadoop fsck / and it reports that all the blocks have been replicated 3
>> times ( the configured value ).
>>
>>
>> Thanks
>> Arun
>>
>
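
P.S. On the DNS point Ryan raises in the quoted text: one way to test for a hostname that does not round-trip is to resolve the name to an IP, resolve that IP back to a name, and compare. The following is only a rough sketch, assuming Python is available on the nodes. The function name dns_round_trip and its resolve/reverse parameters are made up for illustration and are not part of HBase or Hadoop.

```python
import socket


def dns_round_trip(hostname, resolve=socket.gethostbyname,
                   reverse=socket.gethostbyaddr):
    """Forward-resolve hostname, reverse-resolve the IP, and compare.

    Returns (ip, reverse_names, ok). ok is True only when the reverse
    lookup yields the original hostname, as primary name or alias.
    resolve/reverse are injectable (hypothetical parameters) so the
    comparison logic can be exercised without real DNS.
    """
    try:
        ip = resolve(hostname)
    except OSError:
        return None, [], False      # forward lookup failed
    try:
        primary, aliases, _ = reverse(ip)
    except OSError:
        return ip, [], False        # no reverse record for this IP
    names = [primary] + list(aliases)
    wanted = hostname.lower().rstrip(".")
    ok = any(n.lower().rstrip(".") == wanted for n in names)
    return ip, names, ok
```

Running something like dns_round_trip("host1") on the master and on each regionserver, using the exact hostnames listed in the regionservers file, should show whether any of the new nodes fail the comparison. A result with ok set to False would be consistent with the non-roundtripping Ryan describes, though it would not prove that DNS is the cause.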
