Tommy Li created HBASE-21775:
--------------------------------

             Summary: The BufferedMutator doesn't ever refresh region location 
cache
                 Key: HBASE-21775
                 URL: https://issues.apache.org/jira/browse/HBASE-21775
             Project: HBase
          Issue Type: Bug
          Components: Client
            Reporter: Tommy Li
            Assignee: Tommy Li
             Fix For: 3.0.0


{color:#222222}I noticed in some of my writing jobs that the BufferedMutator 
would get stuck retrying writes against a dead server.{color}
{code:java}
19/01/18 15:15:47 INFO [Executor task launch worker for task 0] 
client.AsyncRequestFutureImpl: #2, waiting for 1  actions to finish on table: 
dummy_table
19/01/18 15:15:54 WARN [htable-pool3-t56] client.AsyncRequestFutureImpl: id=2, 
table=dummy_table, attempt=15/21, failureCount=1ops, last 
exception=org.apache.hadoop.hbase.DoNotRetryIOException: Operation rpcTimeout 
on <SERVER>,17020,1547848193782, tracking started Fri Jan 18 14:55:37 PST 2019; 
NOT retrying, failed=1 -- final attempt!
19/01/18 15:15:54 ERROR [Executor task launch worker for task 0] 
IngestRawData.map(): [B@258bc2c7: 
org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 
action: Operation rpcTimeout: 1 time, servers with issues: 
<SERVER>,17020,1547848193782
{code}
 

After the single remaining action permanently failed, it would resume progress 
only to get stuck again retrying against the same dead server:
{code:java}
19/01/18 15:21:18 INFO [Executor task launch worker for task 0] 
client.AsyncRequestFutureImpl: #2, waiting for 1  actions to finish on table: 
dummy_table
19/01/18 15:21:18 INFO [Executor task launch worker for task 0] 
client.AsyncRequestFutureImpl: #2, waiting for 1  actions to finish on table: 
dummy_table
19/01/18 15:21:20 INFO [htable-pool3-t55] client.AsyncRequestFutureImpl: id=2, 
table=dummy_table, attempt=6/21, failureCount=1ops, last 
exception=java.net.ConnectException: Call to <SERVER> failed on connection 
exception: 
org.apache.hbase.thirdparty.io.netty.channel.ConnectTimeoutException: 
connection timed out: <SERVER> on <SERVER>,17020,1547848193782, tracking 
started null, retrying after=20089ms, operationsToReplay=1
{code}
 

Only restarting the client process to generate a new BufferedMutator instance 
would fix the issue, at least until the next regionserver crash

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to