Hey Vidhya,

What version are you on, again? If you're on 0.89, the "hbase hbck" utility might be of use here.
Any logs in that server that pertain to the given region name? Any exceptions there?

What if you run the shell with HBASE_ROOT_LOGGER=DEBUG,console set so that you see the debug output as it retries?

-Todd

On Thu, Jul 29, 2010 at 12:31 PM, Vidhyashankar Venkataraman <[email protected]> wrote:

> I have an MR job that sends streams of updates (puts and deletes) to an
> existing db and all the tasks are crashing complaining of the exceptions
> similar to the following:
>
> Exception in thread "main"
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact
> region server Some server, retryOnlyOne=true, index=0, islastrow=false,
> tries=9, numtries=10, i=78, listsize=390,
> region=DocData,0000001013071992,1279835733117 for region
> DocData,0000001013071992,1279835733117, row '0000001013115520', but failed
> after 10 attempts.
>
> I ran this job on 180 nodes with a max of 6 tasks per node; I thought this
> was possibly due to overload so I ran it with just 2 tasks per node but
> again got similar exceptions..
>
> Then I tried issuing a put on the hbase shell: And it complained of the
> same issue..
>
> I checked the meta table entry and it seems fine.. I checked the
> corresponding region server (web ui) and it is indeed hosting the region.
>
> DocData,0000001013071992,1279835733117
>   column=info:regioninfo, timestamp=1280305164242,
>   value=REGION => {NAME => 'DocData,0000001013071992,1279835733117',
>   STARTKEY => '0000001013071992', ENDKEY => '0000001013205991',
>   ENCODED => 1962005300, TABLE => {{NAME => 'DocData',
>   MAX_FILESIZE => '4402341480', FAMILIES => [{NAME => 'bigColumn',
>   VERSIONS => '1', COMPRESSION => 'NONE', TTL => '2147483647',
>   BLOCKSIZE => '1048576', IN_MEMORY => 'false', BLOCKCACHE => 'false'}]}}
>
> DocData,0000001013071992,1279835733117
>   column=info:server, timestamp=1280317959911, value=63.250.207.87:60020
>
> DocData,0000001013071992,1279835733117
>   column=info:serverstartcode, timestamp=1280317959911, value=1279926520261
>
> Can you see what is wrong here?
>
> Thank you
> Vidhya

--
Todd Lipcon
Software Engineer, Cloudera
