Hi Keywal,

My HBase version is 0.94. My query just gets a limited set of columns from a single row. I run it in a callable task with a 1.5-second timeout, so perhaps the request did not fail but was cancelled by my process, which would explain why the cached region location was not cleared even after many requests. My question is: why does it take so long for the request to fail? The behavior also differs between my servers, and there is no problem with the network.
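To illustrate the pattern I suspect, here is a minimal sketch in plain Java with no HBase classes. The class name `TimeoutCancelSketch`, the method `runWithTimeout`, and the `cleanedUp` flag are hypothetical. The sleep stands in for a slow call against a stale region server, and the flag stands in for evicting the stale cache entry. Whether the 0.94 client really behaves this way is only my assumption.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

class TimeoutCancelSketch {
    /** Returns true if the (hypothetical) cleanup step ran before the caller gave up. */
    static boolean runWithTimeout(long workMillis, long timeoutMillis) throws Exception {
        AtomicBoolean cleanedUp = new AtomicBoolean(false);
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<String> future = pool.submit(() -> {
                Thread.sleep(workMillis); // stands in for a slow call to a stale server
                cleanedUp.set(true);      // stands in for evicting the stale cache entry
                return "done";
            });
            try {
                future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                // The caller gives up first: the worker is interrupted during its
                // sleep, so the cleanup line above is never reached.
                future.cancel(true);
            }
        } finally {
            pool.shutdownNow();
            pool.awaitTermination(1, TimeUnit.SECONDS);
        }
        return cleanedUp.get();
    }
}
```

If the work takes longer than the timeout, the method returns false because the cleanup step never runs. With a generous timeout it returns true.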
2012/8/10 N Keywal <[email protected]>
> Hi,
>
> What are your queries exactly? What's the HBase version?
>
> The mechanism is:
> - There is a location cache, per HConnection, on the client
> - The client first tries the region server in its cache
> - if it fails, the client removes this entry from the cache and enters
>   the retry loop
> - there is a limited amount of retries and a sleep between the retries
> - most of the times, the client will connect to meta to get the new
>   location
>
> When there are multiple queries, before HBASE-5924, the errors will be
> analyzed after the other regions servers has returned as well. It
> could be an explanation. HBASE-5877 exists as well, but only for
> moves, not for splits...
>
> Cheers,
>
> N.
>
>
> On Fri, Aug 10, 2012 at 11:26 AM, deanforwever2010
> <[email protected]> wrote:
> > on the region server's log :2012-08-10 11:49:50,796 DEBUG
> > org.apache.hadoop.hbase.regionserver.HRegionServer:
> > NotServingRegionException; Region is not online:
> > test_list,zWPpyme,1342510667492.91486e7fa0ac39048276848a2618479b.
> >
> > after region split, client didnt get result after timeout setting(1.5
> > second),then the task is canceled by my program, so the HConnectionManager
> > didnt delete the cachedLocation;
> > the client still query the old region id which is no more exists
> >
> > And more, part of my processes updated the region location info, part
> > not.I'm sure the network is fine;
> >
> > how to fix the problem?why does it need so long time to detect the new
> > regions?
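
The mechanism quoted above (try the cached location, evict it on failure, sleep, then look the region up again) can be sketched as a toy model in plain Java. This is not HBase code: the class `LocationCacheSketch`, the `meta` and `cache` maps, and the server names are hypothetical stand-ins, and a mismatch between the cached server and `meta` is used here to model a NotServingRegionException.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a client-side region location cache; not HBase code. */
class LocationCacheSketch {
    // Hypothetical stand-in for the meta table: row key -> server hosting it now.
    final Map<String, String> meta = new HashMap<>();
    // Client-side cache, which may be stale after a split or a move.
    final Map<String, String> cache = new HashMap<>();
    int metaLookups = 0;

    /** Returns the cached location, falling back to a meta lookup on a miss. */
    String locate(String row) {
        String server = cache.get(row);
        if (server == null) {
            metaLookups++;
            server = meta.get(row);
            cache.put(row, server);
        }
        return server;
    }

    /** Returns the server that answered, or null once the retries are used up. */
    String get(String row, int maxRetries, long sleepMillis) throws InterruptedException {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            String server = locate(row);
            if (server != null && server.equals(meta.get(row))) {
                return server; // the server really hosts the region
            }
            // Models a failed call: drop the stale entry, sleep, then retry.
            cache.remove(row);
            Thread.sleep(sleepMillis);
        }
        return null;
    }
}
```

In this model a stale entry costs one failed attempt plus one sleep before the fresh location is read from `meta`, so the total delay depends on how long the failed attempt takes and on the sleep between retries.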
