Duo Zhang created HBASE-21885:
---------------------------------

             Summary: Cancel remote procedure call if the remote procedure is 
succeeded
                 Key: HBASE-21885
                 URL: https://issues.apache.org/jira/browse/HBASE-21885
             Project: HBase
          Issue Type: Improvement
          Components: proc-v2
            Reporter: Duo Zhang


I used to think it would very rarely happen that a region server can report 
back to the master while the master cannot get the response from the region 
server, and only if there are strange network errors. But when implementing 
HBASE-21875, I found a way to reproduce the problem without any strange 
network issues.

The first time, we send the request to the region server and it accepts the 
request, but before it returns, a network error breaks the connection, so the 
master will try to send the request to the region server again. But then the 
region server gets too busy and always returns CallQueueTooBigException, so 
the master will retry forever, even though the region has already been opened 
on the region server.

And this does not only waste resources: later we may close the region on the 
region server, and once the region server is responsive again, it will receive 
an open region request and a close region request at the same time. Not sure 
if this will cause any problems, but at least we haven't thought about this 
condition yet.
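
A minimal sketch of the cancellation idea from the summary, not the actual 
proc-v2 code. All names here (RemoteCallRetrier, reportSucceeded, send, the 
two enums) are hypothetical, and the maxAttempts bound is an assumption added 
only so the sketch terminates. The retry loop checks, before every attempt, 
whether the region server has already reported the procedure as succeeded, 
and stops retrying if so.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

/** Hypothetical sketch; none of these names are HBase APIs. */
final class RemoteCallRetrier {
  /** Outcome of one attempt to deliver the request to the region server. */
  enum Attempt { ACCEPTED, CONNECTION_BROKEN, QUEUE_TOO_BIG }

  /** Final outcome of the retry loop on the master side. */
  enum Result { DELIVERED, CANCELLED_ALREADY_SUCCEEDED, GAVE_UP }

  // Set when the region server reports back that the remote procedure finished.
  private final AtomicBoolean remoteSucceeded = new AtomicBoolean(false);

  /** Called when the success report from the region server arrives. */
  void reportSucceeded() {
    remoteSucceeded.set(true);
  }

  /** Retries the call, but cancels as soon as success has been reported. */
  Result send(Supplier<Attempt> rpc, int maxAttempts) {
    for (int i = 0; i < maxAttempts; i++) {
      // The report may arrive even though our own call never got a response.
      if (remoteSucceeded.get()) {
        return Result.CANCELLED_ALREADY_SUCCEEDED;
      }
      if (rpc.get() == Attempt.ACCEPTED) {
        return Result.DELIVERED;
      }
      // CONNECTION_BROKEN or QUEUE_TOO_BIG: fall through and retry.
    }
    return Result.GAVE_UP;
  }
}
```

With such a check, the scenario above (first call accepted but the response 
lost, later calls rejected with CallQueueTooBigException) would end after the 
success report arrives instead of retrying forever.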



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
