[jira] [Assigned] (HBASE-18116) Replication source in-memory accounting should not include bulk transfer hfiles

2018-05-15 Thread Xu Cang (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-18116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xu Cang reassigned HBASE-18116:
---

Assignee: Xu Cang

> Replication source in-memory accounting should not include bulk transfer 
> hfiles
> ---
>
> Key: HBASE-18116
> URL: https://issues.apache.org/jira/browse/HBASE-18116
> Project: HBase
> Issue Type: Bug
> Components: Replication
> Reporter: Andrew Purtell
> Assignee: Xu Cang
> Priority: Major
>
> In ReplicationSourceWALReaderThread we maintain a global quota on enqueued 
> replication work to prevent OOMs caused by queuing up too many edits into 
> on-heap queues. When calculating the size of a given replication queue entry, 
> if it has associated hfiles (i.e. it is a bulk load to be replicated as a 
> batch of hfiles), we get the file sizes, include their sum, and apply the 
> result to the quota. This isn't quite right. Those hfiles will be pulled by 
> the sink as a file copy, not pushed by the source. The cells in those files 
> are never queued in memory at the source and therefore shouldn't be counted 
> against the quota.
> Relatedly, the sum of the hfile sizes is also included when checking whether 
> queued work exceeds the configured replication queue capacity, which is 64 MB 
> by default. HFiles are commonly much larger than this. 
> So what happens when we encounter a bulk load replication entry is that 
> typically both the quota and capacity limits are exceeded, we break out of 
> the loops, and we send right away. What is transferred on the wire via HBase 
> RPC, though, bears only a partial relationship to the calculation. 
> Depending on how you look at it, it makes sense to count hfile file sizes 
> against the replication queue capacity limit: the sink will be occupied 
> transferring those files at the HDFS level. In any case, this is how we have 
> been doing it and it is too late to change now. I do not, however, think it 
> is correct to apply hfile file sizes against a quota for in-memory state on 
> the source. The source neither queues nor transfers those bytes. 
> Something I noticed while working on HBASE-18027.
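The proposed separation can be sketched as follows. This is a hypothetical illustration, not the actual ReplicationSourceWALReaderThread code: the class and method names are invented to show how the two size calculations would diverge, with hfile bytes counted for the batch capacity check but excluded from the in-memory quota.

```java
// Hypothetical sketch of the accounting split described in HBASE-18116.
// Names are illustrative only; they do not exist in the HBase codebase.
class ReplicationEntrySizes {
    final long editHeapSize; // bytes of edits the source actually holds on heap
    final long hfileBytes;   // total size of bulk-loaded hfiles, copied by the sink

    ReplicationEntrySizes(long editHeapSize, long hfileBytes) {
        this.editHeapSize = editHeapSize;
        this.hfileBytes = hfileBytes;
    }

    // In-memory quota accounting: the sink pulls hfiles as a file copy, so
    // only the edit heap size counts against the source's quota (the fix).
    long quotaUsage() {
        return editHeapSize;
    }

    // Capacity check: the sink is still occupied transferring the hfiles at
    // the HDFS level, so their sizes remain part of this calculation.
    long capacityUsage() {
        return editHeapSize + hfileBytes;
    }
}
```

With this split, a 128 MB bulk load entry still trips the capacity limit (so it is shipped immediately), but it no longer consumes the source's in-memory quota.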



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HBASE-18116) Replication source in-memory accounting should not include bulk transfer hfiles

2018-04-13 Thread Andrew Purtell (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-18116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell reassigned HBASE-18116:
--

Assignee: (was: Geoffrey Jacoby)



[jira] [Assigned] (HBASE-18116) Replication source in-memory accounting should not include bulk transfer hfiles

2018-02-06 Thread Geoffrey Jacoby (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-18116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Geoffrey Jacoby reassigned HBASE-18116:
---

Assignee: Geoffrey Jacoby
