This problem is "normal", or at least "expected", since Hadoop doesn't support fsSync, so the last edits that went to a region server are lost. Fortunately, this is finally fixed in the upcoming release.
J-D

On Sat, Jul 10, 2010 at 3:15 AM, Jamie Cockrill <[email protected]> wrote:
> Arun,
>
> I had a very similar issue with my cluster when the regionserver with
> the .META. table on it crashed. It crippled the cluster for a while,
> but after shutting various things down and restarting them again, it
> seemed to work itself out eventually. I had to do this a few times and
> unfortunately I didn't keep a record of the order in which I shut
> things down and restarted them.
>
> The problem seemed to stem from the master thinking that META was
> stored on a node and that node having no knowledge of ever having held
> it. I tried a few major_compact of META, hoping that would fix it, but
> each failed with the same exception as below. The weird thing was that
> I could see (through the web UI on master) that META was now being
> held on a different regionserver.
>
> I wouldn't necessarily follow my lead in randomly shutting things down
> and hoping for the best as it may well have been something entirely
> different that fixed the issue in the end. If all else fails, try
> restarting the master and the regionservers a few times and see if
> that works out the kinks.
>
> thanks
>
> Jamie
>
> On 10 July 2010 04:48, Ryan Rawson <[email protected]> wrote:
>> Others will have to chime in for details, but typically this means you
>> are having DNS issues. That is the hostname is resolving to an ip and
>> not resolving back to the same name or vice versa or any other combo
>> of non-roundtripping involving ip and dns names.
>>
>> -ryan
>>
>> On Fri, Jul 9, 2010 at 6:41 PM, Arun Ramakrishnan
>> <[email protected]> wrote:
>>> I shutdown hbase. Added some new nodes to hdfs, rebalanced. Also added
>>> those nodes to hbase regionservers.
>>> Then started hbase.
>>>
>>> I am having this strange problem where the new nodes let's say host1 thru
>>> host4 gets repeatedly reported/added to the regionservers list.
>>>
>>> Initially when I did a "report 'simple'" from the shell, it showed me 10
>>> unique hosts. Then within a matter of minutes it grew to 17 ( with the
>>> newly added hosts repeating multiple times).
>>>
>>> Also, the web UI failed with the following error.
>>>
>>> ##############
>>> HTTP ERROR: 500
>>> Trying to contact region server 192.168.130.63:60020 for region .META.,,1,
>>> row '', but failed after 3 attempts.
>>> Exceptions:
>>> org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: .META.,,1
>>>     at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2266)
>>>     at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1845)
>>>     at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>>     at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
>>>     at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
>>> ###############
>>>
>>> Any insight into why the regions get repeated multiple times. I did a
>>> hadoop fsck / and it reports that all the blocks have been replicated 3
>>> times ( the configured value ).
>>>
>>> Thanks
>>> Arun
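
The DNS round-trip condition Ryan describes (a hostname resolves to an IP, and that IP resolves back to the same name) can be checked with a small script. The following is only a rough sketch and not something taken from the thread: the function name `check_round_trip` and its injectable `resolve`/`reverse` parameters are made up for illustration, and it relies only on Python's standard `socket.gethostbyname` and `socket.gethostbyaddr`. It reports what the resolver on the local machine sees, so it would likely need to be run on the master and on each regionserver, and a passing result does not rule out other causes.

```python
import socket


def check_round_trip(hostname,
                     resolve=socket.gethostbyname,
                     reverse=socket.gethostbyaddr):
    """Check that hostname -> IP -> name(s) leads back to hostname.

    Returns (ok, message). The resolve/reverse parameters are hypothetical
    hooks so the check can be exercised without touching real DNS.
    """
    try:
        ip = resolve(hostname)  # forward lookup: name -> IPv4 address
    except OSError as exc:
        return False, f"{hostname}: forward lookup failed ({exc})"

    try:
        primary, aliases, _addrs = reverse(ip)  # reverse lookup: IP -> names
    except OSError as exc:
        return False, f"{hostname} -> {ip}: reverse lookup failed ({exc})"

    # Compare case-insensitively and ignore a trailing dot on FQDNs.
    names = {n.lower().rstrip(".") for n in [primary, *aliases]}
    if hostname.lower().rstrip(".") in names:
        return True, f"{hostname} -> {ip} -> {primary}: round-trips"
    return False, (f"{hostname} -> {ip} -> {sorted(names)}: "
                   f"does not resolve back to {hostname}")
```

Calling `check_round_trip("host1")` for each newly added node (the thread's placeholder names host1 thru host4) and printing the returned message would show whether any of them fails to round-trip. Note that a short name and a fully qualified name are treated as different unless the reverse lookup lists both.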
