So it is very weird that on some of my servers I did not get the error, and so the cache was not cleaned.

2012/8/10 N Keywal <[email protected]>

> If it's a single row, I would expect the server to return the error
> immediately. Then you will have the sleep I was mentioning previously,
> but the cache should be cleaned before the sleep...
>
> On Fri, Aug 10, 2012 at 1:32 PM, deanforwever2010
> <[email protected]> wrote:
> > Hi Keywal,
> > My HBase version is 0.94.
> > My query just gets a limited set of columns from one row.
> > I run it as a callable task with a 1.5-second timeout, so maybe it did
> > not fail but was canceled by my process, and so the region cache was not
> > cleared even after many requests.
> > My question is: why should it take so long to fail? It also behaves
> > differently between my servers, and there is no problem with the network.
> >
> > 2012/8/10 N Keywal <[email protected]>
> >
> >> Hi,
> >>
> >> What are your queries exactly? What's the HBase version?
> >>
> >> The mechanism is:
> >> - There is a location cache, per HConnection, on the client
> >> - The client first tries the region server in its cache
> >> - If it fails, the client removes this entry from the cache and enters
> >>   the retry loop
> >> - There is a limited number of retries and a sleep between the retries
> >> - Most of the time, the client will connect to meta to get the new
> >>   location
> >>
> >> When there are multiple queries, before HBASE-5924, the errors will be
> >> analyzed after the other region servers have returned as well. It
> >> could be an explanation. HBASE-5877 exists as well, but only for
> >> moves, not for splits...
> >>
> >> Cheers,
> >>
> >> N.
> >>
> >>
> >> On Fri, Aug 10, 2012 at 11:26 AM, deanforwever2010
> >> <[email protected]> wrote:
> >> > On the region server's log:
> >> > 2012-08-10 11:49:50,796 DEBUG
> >> > org.apache.hadoop.hbase.regionserver.HRegionServer:
> >> > NotServingRegionException; Region is not online:
> >> > test_list,zWPpyme,1342510667492.91486e7fa0ac39048276848a2618479b.
> >> >
> >> > After the region split, the client did not get a result within the
> >> > timeout (1.5 seconds), and the task was then canceled by my program,
> >> > so HConnectionManager did not delete the cachedLocation;
> >> > the client still queries the old region id, which no longer exists.
> >> >
> >> > Moreover, some of my processes updated the region location info and
> >> > some did not. I'm sure the network is fine.
> >> >
> >> > How can I fix the problem? Why does it take so long to detect the new
> >> > regions?
> >> >
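
For readers following the thread, the mechanism listed in the quoted message (client-side location cache, eviction on failure, sleep, retry against meta) can be sketched as a toy model. This is not HBase code: the class `LocationCacheSketch`, the maps `meta` and `cache`, and the methods `callServer` and `get` are all hypothetical names made up for illustration, and the "servers" are plain strings.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a client-side region location cache. Hypothetical, not the HBase API. */
class LocationCacheSketch {
    // Stand-in for the meta table: row key -> server currently hosting the row.
    static final Map<String, String> meta = new HashMap<>();
    // Stand-in for the client-side location cache.
    static final Map<String, String> cache = new HashMap<>();

    /** Simulated remote call: succeeds only if `server` currently hosts `row`. */
    static String callServer(String server, String row) {
        if (server == null || !server.equals(meta.get(row))) {
            throw new IllegalStateException("not serving " + row + " on " + server);
        }
        return "value-of-" + row + "@" + server;
    }

    /** Try the cached location first; on failure evict it, sleep, then retry via meta. */
    static String get(String row, int maxRetries, long sleepMs) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            String server = cache.get(row);
            if (server == null) {
                server = meta.get(row);   // cache miss: ask "meta" for the location
                cache.put(row, server);
            }
            try {
                return callServer(server, row);
            } catch (IllegalStateException e) {
                cache.remove(row);        // eviction happens before the sleep
                try {
                    Thread.sleep(sleepMs);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new RuntimeException("interrupted while waiting to retry", ie);
                }
            }
        }
        throw new IllegalStateException("retries exhausted for " + row);
    }

    public static void main(String[] args) {
        meta.put("row1", "rs-new");   // the region now lives on a different server
        cache.put("row1", "rs-old");  // the client still holds the stale location
        System.out.println(get("row1", 3, 10));
        System.out.println(cache);
    }
}
```

In this model the stale entry is removed as soon as the error reaches the client, even if the caller is interrupted during the sleep. The entry would survive only if the caller gave up before the error came back at all, which is the situation the original question suspects.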
