Andrew Purtell created HBASE-18116:
--------------------------------------

             Summary: Replication buffer quota accounting should not include 
bulk transfer hfiles
                 Key: HBASE-18116
                 URL: https://issues.apache.org/jira/browse/HBASE-18116
             Project: HBase
          Issue Type: Bug
          Components: Replication
            Reporter: Andrew Purtell


In ReplicationSourceWALReaderThread we maintain a global quota on enqueued 
replication work to prevent OOM from queuing up too many edits on heap. When 
calculating the size of a given replication queue entry, if it has associated 
hfiles (that is, it is a bulk load to be replicated as a batch of hfiles), we 
get the file sizes and include their sum. We then apply that result to the 
quota. This isn't quite right. The sink transfers those hfiles as a file copy 
and then bulk loads them. The cells in those files are never queued in memory 
at the source and therefore shouldn't be counted against the quota.
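
To make the distinction concrete, here is a minimal, self-contained sketch of 
the two ways of sizing an entry. It is not the HBase code: QuotaSketch, Entry, 
sizeIncludingHFiles, sizeForQuota and the 256 MB quota used in the usage note 
are all hypothetical names and values chosen for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified model of the accounting. None of these names are HBase APIs.
final class QuotaSketch {

    /** One replication queue entry: edits held on heap plus any bulk load hfiles it references. */
    static final class Entry {
        final long editHeapBytes; // bytes of cells actually queued in memory at the source
        final long hfileBytes;    // summed on-disk size of referenced hfiles (0 if none)

        Entry(long editHeapBytes, long hfileBytes) {
            this.editHeapBytes = editHeapBytes;
            this.hfileBytes = hfileBytes;
        }
    }

    private final long quotaBytes;
    private final AtomicLong usedBytes = new AtomicLong();

    QuotaSketch(long quotaBytes) {
        this.quotaBytes = quotaBytes;
    }

    /** Sizing as described in this report: heap bytes plus the sum of hfile sizes. */
    static long sizeIncludingHFiles(Entry e) {
        return e.editHeapBytes + e.hfileBytes;
    }

    /** Sizing suggested for the quota: only what occupies heap at the source. */
    static long sizeForQuota(Entry e) {
        return e.editHeapBytes;
    }

    /** Charges bytes against the global quota; returns true if the quota is now reached. */
    boolean acquire(long bytes) {
        return usedBytes.addAndGet(bytes) >= quotaBytes;
    }

    /** Returns bytes to the quota once the entry has been shipped. */
    void release(long bytes) {
        usedBytes.addAndGet(-bytes);
    }

    long used() {
        return usedBytes.get();
    }
}
```

With an assumed 256 MB quota and an entry that holds 2 KB of edits and 
references 512 MB of hfiles, acquire(sizeIncludingHFiles(entry)) reports the 
quota as reached, while acquire(sizeForQuota(entry)) charges only the 2 KB 
that is really on heap.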

Relatedly, the sum of the hfile sizes is also included when checking whether 
queued work exceeds the configured replication queue capacity, which is 64 MB 
by default. HFiles are commonly much larger than this. 

As a result, when we encounter a bulk load replication entry, typically both 
the quota and capacity limits are exceeded, so we break out of the loops and 
send right away. What is actually transferred on the wire via HBase RPC, 
though, has only a partial relationship to the calculation. 
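
The early break can be illustrated with a small sketch of a batching loop. 
This is an assumption about the general shape of such a loop, not the actual 
reader thread code: BatchSketch and entriesInNextBatch are hypothetical names, 
and only the 64 MB default capacity comes from the description above.

```java
import java.util.List;

// Hypothetical sketch of a capacity-bounded batching loop. Not HBase code.
final class BatchSketch {

    // Default replication queue capacity mentioned in this report.
    static final long CAPACITY_BYTES = 64L * 1024 * 1024;

    /**
     * Walks entry sizes in order, adding each to the batch, and stops as soon
     * as the running total reaches the capacity. Returns how many entries end
     * up in the batch.
     */
    static int entriesInNextBatch(List<Long> entrySizes, long capacityBytes) {
        long batchBytes = 0;
        int count = 0;
        for (long size : entrySizes) {
            batchBytes += size;
            count++;
            if (batchBytes >= capacityBytes) {
                break; // capacity reached: ship what we have so far
            }
        }
        return count;
    }
}
```

If four entries each carry 1 KB of edits, all four fit in one batch. If the 
third also references 512 MB of hfiles and that size is counted, the loop 
stops after the third entry, even though only about 3 KB of edits would go 
over RPC.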

Depending on how you look at it, it makes sense to count hfile sizes against 
replication queue capacity limits: the sink will be occupied transferring 
those files at the HDFS level. In any case, this is how we have been doing it 
and it is too late to change now. I do not, however, think it is correct to 
apply hfile sizes against a quota for in-memory state on the source. The 
source doesn't queue or even transfer those bytes. 

Something I noticed while working on HBASE-18027.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
