Hi,

Sorry for the multiple e-mails; it seems Gmail didn't send my whole message last time. Anyway, here it goes again...

Whilst loading data into HBase via a MapReduce job, I have started getting this error:

org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server Some server, retryOnlyOne=true, index=0, islastrow=false, tries=9, numtries=10, i=0, listsize=19, region=source_documents,ipubmed\x219915054,1274525958679 for region source_documents,ipubmed\x219915054,1274525958679, row 'u1012913162', but failed after 10 attempts.
Exceptions:
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$Batch.process(HConnectionManager.java:1166)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1247)
    at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:609)

The master lists the following three regions:

source_documents,ipubmed\x219859228,1274701893687  hadoop1  1825870642  ipubmed\x219859228  ipubmed\x219915054
source_documents,ipubmed\x219915054,1274525958679  hadoop4  193393334   ipubmed\x219915054  u102193588
source_documents,u102193588,1274486550122          hadoop4  2141795358  u102193588          u105043522

On one of our 5 nodes I found a region which starts with ipubmed\x219915054 and ends with u102002564, and on another node I found the other half of the split, which starts with u102002564 and ends with u102193588.

So it seems that the middle region listed on the master was split, but news of the split never reached the master. Over the last few days we have had a few problems with HDFS nodes failing due to lack of memory. That has now been fixed, but it could have been a cause of this problem.

My questions are:

- In what ways can a split fail to be received by the master, and how long would it take for HBase to fix this?
- I've read that HBase periodically scans the META table to find problems like this, but what I read didn't say how often. How often does that happen?
- It has been about 12 hours here and our cluster does not appear to have fixed this missing split. Is there a way to force the master to rescan the META table?
- Will it fix problems like this given time?
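
To make the mismatch concrete, the boundaries quoted above can be checked with a small standalone script. This is only an illustrative sketch: it does not talk to HBase, the helper names `tiles` and `find_region` are made up for this message, and the keys are typed in by hand exactly as they are displayed in the listings above (plain string comparison is assumed to match HBase's byte ordering for these ASCII keys).

```python
# Illustrative only: plain Python, no HBase client involved.
# Keys are copied by hand from the listings in this message.

def tiles(parent, daughters):
    """Return True if the daughters cover the parent key range exactly,
    with no gap and no overlap."""
    start, end = parent
    cursor = start
    for d_start, d_end in sorted(daughters):
        if d_start != cursor:
            return False  # gap or overlap before this daughter
        cursor = d_end
    return cursor == end


def find_region(row, regions):
    """Return the (start, end) range that contains row, or None."""
    for start, end in regions:
        if start <= row < end:
            return (start, end)
    return None


# The middle region as the master still lists it.
parent = (r"ipubmed\x219915054", "u102193588")

# The two halves found on the region servers.
daughters = [
    ("u102002564", "u102193588"),
    (r"ipubmed\x219915054", "u102002564"),
]

failing_row = "u1012913162"  # row key from the exception above

print(tiles(parent, daughters))
print(find_region(failing_row, daughters))
```

If the first check holds, the two halves together cover exactly the range that the master still attributes to the single unsplit region, and the second call shows which half the failing row from the exception would fall into.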

Thanks,

--
Dan Harvey | Datamining Engineer
www.mendeley.com/profiles/dan-harvey
Mendeley Limited | London, UK | www.mendeley.com
Registered in England and Wales | Company Number 6419015
