[jira] [Commented] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15395517#comment-15395517 ] Guanghao Zhang commented on HBASE-9899: --- [~enis] Are you still working on this? If not, I can take this issue. > for idempotent operation dups, return the result instead of throwing conflict > exception > --- > > Key: HBASE-9899 > URL: https://issues.apache.org/jira/browse/HBASE-9899 > Project: HBase > Issue Type: Improvement > Reporter: Sergey Shelukhin > Assignee: Enis Soztutar > > After HBASE-3787, we could store mvcc in operation context, and use it to > convert the modification request into read on dups instead of throwing > OperationConflictException. > MVCC tracking will have to be aware of such MVCC numbers present. Given that > scanners are usually relatively short-lived, that would prevent low watermark > from advancing for quite a bit more time -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-9899: -- Attachment: HBASE-9899-v1.patch
[jira] [Updated] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-9899: -- Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400331#comment-15400331 ] Guanghao Zhang commented on HBASE-9899: --- There are some situations where we use these non-idempotent operations (increment/append/checkAndPut/...). On 0.94 we disabled retries for non-idempotent operations. After upgrading our cluster to 0.98, we found that it uses a nonce to solve this, but it may throw OperationConflictException even when the increment/append succeeded. An example (client rpc retries set to 3): 1. The first increment rpc request succeeds. 2. The client times out and sends a second rpc request, but the nonce is the same and is saved on the server. The server finds the operation already succeeded, so it returns an OperationConflictException to make sure the increment is applied only once on the server. This patch solves the problem by reading the previous result when a duplicate rpc request is received: 1. Store the mvcc in OperationContext. When the first rpc request succeeds, store the mvcc for the operation's nonce. 2. When a duplicate rpc request arrives, convert it to a read of the result via the mvcc.
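The two steps above can be sketched as a tiny nonce-to-mvcc map; this is a minimal illustration of the idea, not HBase's actual NonceManager API, and all names here are hypothetical:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: remember the mvcc number of a completed nonce so a duplicate
// request can be answered by a read at that mvcc instead of throwing
// OperationConflictException. Names are illustrative only.
public class NonceMvccTracker {
    // nonce -> mvcc write number of the operation that already completed
    private final Map<Long, Long> completed = new ConcurrentHashMap<>();

    /** Record the mvcc number once the first request for this nonce succeeds. */
    public void recordSuccess(long nonce, long mvcc) {
        completed.put(nonce, mvcc);
    }

    /**
     * For a duplicate request, return the mvcc to read at, or null if the
     * nonce is unknown and the operation should proceed normally.
     */
    public Long mvccForDuplicate(long nonce) {
        return completed.get(nonce);
    }
}
```

A duplicate increment would first consult `mvccForDuplicate`; a non-null answer means "serve a get at this mvcc" rather than re-applying the mutation.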
[jira] [Comment Edited] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400331#comment-15400331 ] Guanghao Zhang edited comment on HBASE-9899 at 7/30/16 12:58 AM: - There are some situations where we use these non-idempotent operations (increment/append/checkAndPut/...). On 0.94 we disabled retries for non-idempotent operations. After upgrading our cluster to 0.98, we found that it uses a nonce to solve this, but it may throw OperationConflictException even when the increment/append succeeded. An example (client rpc retries set to 3): 1. The first increment rpc request succeeds. 2. The client times out and sends a second rpc request, but the nonce is the same and is saved on the server. The server finds the operation already succeeded, so it returns an OperationConflictException to make sure the increment is applied only once on the server. This patch solves the problem by reading the previous result when a duplicate rpc request is received: 1. Store the mvcc in OperationContext. When the first rpc request succeeds, store the mvcc for the operation's nonce. 2. When a duplicate rpc request arrives, convert it to a read of the result via the mvcc.
[jira] [Commented] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400927#comment-15400927 ] Guanghao Zhang commented on HBASE-9899: --- Thanks [~stack]. The failed unit test TestResettingCounters seems related to this patch; it failed on my local machine too. I will try to fix it and upload a new patch.
[jira] [Assigned] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang reassigned HBASE-9899: - Assignee: Guanghao Zhang (was: Enis Soztutar)
[jira] [Updated] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-9899: -- Attachment: HBASE-9899-v2.patch Attach a v2 patch.
[jira] [Updated] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-9899: -- Attachment: HBASE-9899-v3.patch Attach a v3 patch. Fix the failed unit tests TestScannerHeartbeatMessages and TestMultiParallel.
[jira] [Updated] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-9899: -- Attachment: (was: HBASE-9899-v3.patch)
[jira] [Updated] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-9899: -- Attachment: HBASE-9899-v3.patch Attach v3 again to trigger another Hadoop QA run.
[jira] [Commented] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15401588#comment-15401588 ] Guanghao Zhang commented on HBASE-9899: --- It seems unrelated. This patch just reads the mvcc number from the WriteEntry; it doesn't change the read/write point directly. The failed unit test passes on my local machine and I can't reproduce the failure. Let Hadoop QA run again to see whether it still fails.
[jira] [Updated] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-9899: -- Attachment: HBASE-9899-v3.patch
[jira] [Commented] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15401802#comment-15401802 ] Guanghao Zhang commented on HBASE-9899: --- Before this patch, a duplicate non-idempotent operation could not proceed; an OperationConflictException was thrown, so the previous state in the NonceManager was always WAIT. After this patch, a duplicate non-idempotent operation can proceed by converting it to a get operation, so the previous state may be PROCEED (first rpc request succeeded) when ending the new get operation.
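The WAIT/PROCEED behavior described above can be illustrated with a small decision function; this is a loose sketch of the described semantics under stated assumptions, not HBase's real NonceManager internals, and all names are hypothetical:

```java
// Sketch of the duplicate-request decision described in the comment.
// Before the patch a completed nonce always led to DONT_PROCEED (a
// conflict exception); after the patch it can lead to PROCEED so the
// duplicate is served as a read. Names are illustrative only.
public class NonceState {
    public enum State { WAIT, PROCEED, DONT_PROCEED }

    public static State onDuplicate(boolean firstRequestCompleted, boolean returnResultOnDup) {
        if (!firstRequestCompleted) {
            return State.WAIT; // first attempt still running: wait for it
        }
        return returnResultOnDup ? State.PROCEED : State.DONT_PROCEED;
    }
}
```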
[jira] [Commented] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15401828#comment-15401828 ] Guanghao Zhang commented on HBASE-9899: --- I found that endNonceOperation doesn't need to be called for the get operation (converted from the duplicate non-idempotent operation). The get is not a nonce operation, so the assertion doesn't need to be commented out. I will attach a new patch and add a unit test for Append. Thanks for your review.
[jira] [Updated] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-9899: -- Attachment: HBASE-9899-v4.patch Fixed per the review comments and added a unit test for Append.
[jira] [Updated] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-9899: -- Attachment: HBASE-9899-v4.patch Attach v4 again.
[jira] [Commented] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15403690#comment-15403690 ] Guanghao Zhang commented on HBASE-9899: --- [~yuzhih...@gmail.com] [~stack] Please help review the new v4 patch. Thanks.
[jira] [Commented] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15407795#comment-15407795 ] Guanghao Zhang commented on HBASE-9899: --- [~stack] ping..
[jira] [Commented] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15408729#comment-15408729 ] Guanghao Zhang commented on HBASE-9899: --- Thanks. I will upload a patch for branch-1 today.
[jira] [Updated] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-9899: -- Attachment: HBASE-9899-branch-1.patch Add patch for branch-1.
[jira] [Updated] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-9899: -- Attachment: HBASE-9899-branch-1.patch Attach the branch-1 patch again to trigger Hadoop QA.
[jira] [Updated] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-9899: -- Attachment: HBASE-9899-addendum.patch Attach an addendum for master. When getting the nonce from a mutation, we should first check whether the mutation has a nonce at all.
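The addendum's check can be sketched as follows; this assumes a protobuf-style `hasNonce()`/`getNonce()` pair and a `NO_NONCE` sentinel, and the names here are hypothetical rather than the real HBase Mutation/protobuf API:

```java
// Sketch of the addendum's fix: check for a nonce before reading it,
// so mutations without a nonce fall back to a sentinel instead of
// reading an unset field. Names are illustrative only.
public class NonceReader {
    public static final long NO_NONCE = 0L;

    // Stand-in for a protobuf message with an optional nonce field.
    interface MutationProto {
        boolean hasNonce();
        long getNonce();
    }

    /** Return the mutation's nonce, or NO_NONCE when none was set. */
    public static long nonceOf(MutationProto m) {
        return m.hasNonce() ? m.getNonce() : NO_NONCE;
    }
}
```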
[jira] [Created] (HBASE-16368) test*WhenRegionMove in TestPartialResultsFromClientSide is flaky
Guanghao Zhang created HBASE-16368: -- Summary: test*WhenRegionMove in TestPartialResultsFromClientSide is flaky Key: HBASE-16368 URL: https://issues.apache.org/jira/browse/HBASE-16368 Project: HBase Issue Type: Bug Components: Scanners Affects Versions: 1.4.0 Reporter: Guanghao Zhang This test failed when Hadoop QA ran preCommit: https://builds.apache.org/job/PreCommit-HBASE-Build/2971/testReport/org.apache.hadoop.hbase/TestPartialResultsFromClientSide/testReversedCompleteResultWhenRegionMove/. I also found it on the Flaky Tests Dashboard: http://hbase.x10host.com/flaky-tests/. It can fail on my local machine as well. The test results show that the region location is not updated when the scanner callable gets a NotServingRegionException or RegionMovedException. {code} org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions: Sat Aug 06 05:55:52 UTC 2016, null, java.net.SocketTimeoutException: callTimeout=2000, callDuration=2157: org.apache.hadoop.hbase.NotServingRegionException: testReversedCompleteResultWhenRegionMove,,1470462949504.5069bd63bf6eda5108acec4fcc087b0e. 
is closing at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:8233) at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2634) at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2629) at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2623) at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2490) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:34950) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2264) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:118) at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:189) at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:169) row '' on table 'testReversedCompleteResultWhenRegionMove' at region=testReversedCompleteResultWhenRegionMove,,1470462949504.5069bd63bf6eda5108acec4fcc087b0e., hostname=asf907.gq1.ygridcore.net,38914,1470462943053, seqNum=2 at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:281) at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:213) at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:61) at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:212) at org.apache.hadoop.hbase.client.ReversedClientScanner.nextScanner(ReversedClientScanner.java:118) at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:166) at org.apache.hadoop.hbase.client.ClientScanner.(ClientScanner.java:161) at org.apache.hadoop.hbase.client.ReversedClientScanner.(ReversedClientScanner.java:56) at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:785) at 
org.apache.hadoop.hbase.TestPartialResultsFromClientSide.testReversedCompleteResultWhenRegionMove(TestPartialResultsFromClientSide.java:986) {code} {code} org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions: Sat Aug 06 16:27:22 CST 2016, null, java.net.SocketTimeoutException: callTimeout=2000, callDuration=3035: Region moved to: hostname=localhost port=58351 startCode=1470472007714. As of locationSeqNum=6. row 'testRow0' on table 'testPartialResultWhenRegionMove' at region=testPartialResultWhenRegionMove,,1470472035048.977faf05c1d6d9990b5559b17aa18913., hostname=localhost,40425,1470472007646, seqNum=2 at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:281) at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:213) at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:61) at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:212) at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:326) at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:301) at org.apache.hadoop.hbase.client.ClientScanner.possiblyNextScanner(ClientScanner.java:247) at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:541) at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:370) at org.apache.hadoop
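The stack traces above point at a stale client-side location cache: the scanner callable keeps retrying against the old server because the cached region location is never invalidated after a NotServingRegionException or RegionMovedException. A minimal standalone sketch of that mechanism (class and method names here are hypothetical, not the actual HBase client API):

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of a client-side region location cache. On a
// NotServingRegionException-style failure the stale entry must be
// evicted so the next retry re-resolves the region's new server.
public class LocationCacheSketch {
    private final Map<String, String> cache = new HashMap<>(); // row -> cached server
    private final Map<String, String> meta = new HashMap<>();  // authoritative locations

    public LocationCacheSketch(Map<String, String> meta) {
        this.meta.putAll(meta);
    }

    public String locate(String row) {
        // Resolve from "meta" only when there is no cached entry.
        return cache.computeIfAbsent(row, meta::get);
    }

    // What the retry path must call when the server answers
    // "region moved / not serving".
    public void clearCachedLocation(String row) {
        cache.remove(row);
    }

    public void regionMoved(String row, String newServer) {
        meta.put(row, newServer); // server-side move; client cache is now stale
    }

    public static void main(String[] args) {
        Map<String, String> meta = new HashMap<>();
        meta.put("row1", "rs1");
        LocationCacheSketch c = new LocationCacheSketch(meta);
        System.out.println(c.locate("row1")); // rs1
        c.regionMoved("row1", "rs2");
        System.out.println(c.locate("row1")); // still rs1: stale, retries keep failing
        c.clearCachedLocation("row1");        // the missing invalidation step
        System.out.println(c.locate("row1")); // rs2: retry now succeeds
    }
}
```

If the reversed/partial-result scanner paths skip the invalidation step, every retry hits the old server until the retry budget is exhausted, which matches the RetriesExhaustedException seen in the test report.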
[jira] [Commented] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15410565#comment-15410565 ] Guanghao Zhang commented on HBASE-9899: --- TestPartialResultsFromClientSide is a flaky test and it is not related to this patch. I created a new issue, HBASE-16368, for it. > for idempotent operation dups, return the result instead of throwing conflict > exception > --- > > Key: HBASE-9899 > URL: https://issues.apache.org/jira/browse/HBASE-9899 > Project: HBase > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Guanghao Zhang > Fix For: 2.0.0 > > Attachments: HBASE-9899-addendum.patch, HBASE-9899-branch-1.patch, > HBASE-9899-branch-1.patch, HBASE-9899-v1.patch, HBASE-9899-v2.patch, > HBASE-9899-v3.patch, HBASE-9899-v3.patch, HBASE-9899-v4.patch, > HBASE-9899-v4.patch > > > After HBASE-3787, we could store mvcc in operation context, and use it to > convert the modification request into read on dups instead of throwing > OperationConflictException. > MVCC tracking will have to be aware of such MVCC numbers present. Given that > scanners are usually relatively short-lived, that would prevent low watermark > from advancing for quite a bit more time -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-9899: -- Attachment: (was: HBASE-9899-branch-1.patch) > for idempotent operation dups, return the result instead of throwing conflict > exception > --- > > Key: HBASE-9899 > URL: https://issues.apache.org/jira/browse/HBASE-9899 > Project: HBase > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Guanghao Zhang > Fix For: 2.0.0 > > Attachments: HBASE-9899-addendum.patch, HBASE-9899-branch-1.patch, > HBASE-9899-branch-1.patch, HBASE-9899-v1.patch, HBASE-9899-v2.patch, > HBASE-9899-v3.patch, HBASE-9899-v3.patch, HBASE-9899-v4.patch, > HBASE-9899-v4.patch > > > After HBASE-3787, we could store mvcc in operation context, and use it to > convert the modification request into read on dups instead of throwing > OperationConflictException. > MVCC tracking will have to be aware of such MVCC numbers present. Given that > scanners are usually relatively short-lived, that would prevent low watermark > from advancing for quite a bit more time -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-9899: -- Attachment: HBASE-9899-branch-1.patch Retry. > for idempotent operation dups, return the result instead of throwing conflict > exception > --- > > Key: HBASE-9899 > URL: https://issues.apache.org/jira/browse/HBASE-9899 > Project: HBase > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Guanghao Zhang > Fix For: 2.0.0 > > Attachments: HBASE-9899-addendum.patch, HBASE-9899-branch-1.patch, > HBASE-9899-branch-1.patch, HBASE-9899-v1.patch, HBASE-9899-v2.patch, > HBASE-9899-v3.patch, HBASE-9899-v3.patch, HBASE-9899-v4.patch, > HBASE-9899-v4.patch > > > After HBASE-3787, we could store mvcc in operation context, and use it to > convert the modification request into read on dups instead of throwing > OperationConflictException. > MVCC tracking will have to be aware of such MVCC numbers present. Given that > scanners are usually relatively short-lived, that would prevent low watermark > from advancing for quite a bit more time -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15411214#comment-15411214 ] Guanghao Zhang commented on HBASE-9899: --- I ran the unit test TestClusterId on branch-1 and it failed too. The build history at https://builds.apache.org/job/HBase-1.4/ shows that TestClusterId always fails. > for idempotent operation dups, return the result instead of throwing conflict > exception > --- > > Key: HBASE-9899 > URL: https://issues.apache.org/jira/browse/HBASE-9899 > Project: HBase > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Guanghao Zhang > Fix For: 2.0.0 > > Attachments: HBASE-9899-addendum.patch, HBASE-9899-branch-1.patch, > HBASE-9899-branch-1.patch, HBASE-9899-branch-1.patch, HBASE-9899-v1.patch, > HBASE-9899-v2.patch, HBASE-9899-v3.patch, HBASE-9899-v3.patch, > HBASE-9899-v4.patch, HBASE-9899-v4.patch > > > After HBASE-3787, we could store mvcc in operation context, and use it to > convert the modification request into read on dups instead of throwing > OperationConflictException. > MVCC tracking will have to be aware of such MVCC numbers present. Given that > scanners are usually relatively short-lived, that would prevent low watermark > from advancing for quite a bit more time -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HBASE-15588) Use nonce for checkAndMutate operation
[ https://issues.apache.org/jira/browse/HBASE-15588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang reassigned HBASE-15588: -- Assignee: Guanghao Zhang > Use nonce for checkAndMutate operation > -- > > Key: HBASE-15588 > URL: https://issues.apache.org/jira/browse/HBASE-15588 > Project: HBase > Issue Type: Bug > Components: Client >Affects Versions: 2.0.0 >Reporter: Jianwei Cui >Assignee: Guanghao Zhang > > Like {{increment}}/{{append}}, the {{checkAndPut}}/{{checkAndDelete}} > operation is non-idempotent, so that the client may get incorrect result if > there are retries, and such incorrect result may lead the application enter > an error state. A possible solution is using nonce for checkAndMutate > operations, discussions and suggestions are welcomed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
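checkAndPut/checkAndDelete retries can be made safe with the same nonce mechanism that HBASE-3787 introduced for increment/append: the server records the outcome per nonce and replays it when a duplicate arrives. A minimal sketch of that idea, with hypothetical names and a toy checkAndIncrement standing in for checkAndMutate:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of nonce-based dedup for a non-idempotent operation. The
// server keeps the outcome keyed by nonce; a retried duplicate gets
// the stored outcome instead of re-executing (or instead of an
// OperationConflictException).
public class NonceSketch {
    private final Map<Long, Boolean> results = new HashMap<>(); // nonce -> stored outcome
    private int value = 0;

    // Toy "checkAndIncrement": succeeds only when the current value
    // matches the expected one, like a checkAndMutate condition.
    public boolean checkAndIncrement(int expected, long nonce) {
        Boolean prev = results.get(nonce);
        if (prev != null) {
            return prev;        // duplicate: replay the stored result
        }
        boolean ok = (value == expected);
        if (ok) {
            value++;            // apply the mutation exactly once
        }
        results.put(nonce, ok);
        return ok;
    }

    public int value() {
        return value;
    }

    public static void main(String[] args) {
        NonceSketch s = new NonceSketch();
        System.out.println(s.checkAndIncrement(0, 42L)); // true: first attempt applies
        System.out.println(s.checkAndIncrement(0, 42L)); // true: same nonce, replayed
        System.out.println(s.value());                   // 1: not applied twice
    }
}
```

Without the nonce, a client whose first attempt succeeded but whose response was lost would retry, see the condition fail, and wrongly conclude the operation did not happen.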
[jira] [Commented] (HBASE-16308) Contain protobuf references
[ https://issues.apache.org/jira/browse/HBASE-16308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15411599#comment-15411599 ] Guanghao Zhang commented on HBASE-16308: I will pick up HBASE-15588 to add nonce for checkAnd* operations. > Contain protobuf references > --- > > Key: HBASE-16308 > URL: https://issues.apache.org/jira/browse/HBASE-16308 > Project: HBase > Issue Type: Sub-task > Components: Protobufs >Reporter: stack >Assignee: stack > Fix For: 2.0.0 > > Attachments: HBASE-16308.master.001.patch, > HBASE-16308.master.002.patch, HBASE-16308.master.003.patch, > HBASE-16308.master.004.patch, HBASE-16308.master.005.patch, > HBASE-16308.master.006.patch, HBASE-16308.master.006.patch, > HBASE-16308.master.007.patch > > > Clean up our protobuf references so contained to just a few classes rather > than being spread about the codebase. Doing this work will make it easier > landing the parent issue and will make it more clear where the division > between shaded protobuf and unshaded protobuf lies (we need to continue with > unshaded protobuf for HDFS references by AsyncWAL and probably EndPoint > Coprocessors) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15412948#comment-15412948 ] Guanghao Zhang commented on HBASE-9899: --- Thanks [~stack]. But the master branch needs the addendum patch, too. It has been included in the patch for branch-1. > for idempotent operation dups, return the result instead of throwing conflict > exception > --- > > Key: HBASE-9899 > URL: https://issues.apache.org/jira/browse/HBASE-9899 > Project: HBase > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Guanghao Zhang > Fix For: 2.0.0, 1.3.0, 1.4.0 > > Attachments: HBASE-9899-addendum.patch, HBASE-9899-branch-1.patch, > HBASE-9899-branch-1.patch, HBASE-9899-branch-1.patch, HBASE-9899-v1.patch, > HBASE-9899-v2.patch, HBASE-9899-v3.patch, HBASE-9899-v3.patch, > HBASE-9899-v4.patch, HBASE-9899-v4.patch > > > After HBASE-3787, we could store mvcc in operation context, and use it to > convert the modification request into read on dups instead of throwing > OperationConflictException. > MVCC tracking will have to be aware of such MVCC numbers present. Given that > scanners are usually relatively short-lived, that would prevent low watermark > from advancing for quite a bit more time -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-10338) Region server fails to start with AccessController coprocessor if installed into RegionServerCoprocessorHost
[ https://issues.apache.org/jira/browse/HBASE-10338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414626#comment-15414626 ] Guanghao Zhang commented on HBASE-10338: We found an NPE in our 0.98 production cluster because RegionServerCoprocessorHost is not initialized before the RpcServer starts service. {code} // Try and register with the Master; tell it we are here. Break if // server is stopped or the clusterup flag is down or hdfs went wacky. while (keepLooping()) { RegionServerStartupResponse w = reportForDuty(); if (w == null) { LOG.warn("reportForDuty failed; sleeping and then retrying."); this.sleeper.sleep(); } else { handleReportForDutyResponse(w); break; } } // Initialize the RegionServerCoprocessorHost now that our ephemeral // node was created by reportForDuty, in case any coprocessors want // to use ZooKeeper this.rsHost = new RegionServerCoprocessorHost(this, this.conf); {code} The RpcServer starts service in handleReportForDutyResponse(), after which it can serve the replicateWALEntry() RPC call. But RegionServerCoprocessorHost is not yet initialized when replicateWALEntry uses it, so it throws an NPE. > Region server fails to start with AccessController coprocessor if installed > into RegionServerCoprocessorHost > > > Key: HBASE-10338 > URL: https://issues.apache.org/jira/browse/HBASE-10338 > Project: HBase > Issue Type: Bug > Components: Coprocessors, regionserver >Affects Versions: 0.98.0 >Reporter: Vandana Ayyalasomayajula >Assignee: Vandana Ayyalasomayajula >Priority: Minor > Fix For: 0.98.0, 0.96.2, 0.99.0 > > Attachments: 10338.1-0.96.patch, 10338.1-0.98.patch, 10338.1.patch, > 10338.1.patch, HBASE-10338.0.patch, HBASE-10338_addendum.patch > > > Runtime exception is being thrown when AccessController CP is used with > region server. This is happening as region server co processor host is > created before zookeeper is initialized in region server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
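The bug described above is purely an ordering problem: the RPC server begins serving inside handleReportForDutyResponse(), before this.rsHost is assigned. A toy reproduction of the race window and of the fix (simplified, hypothetical classes — not the real HRegionServer code):

```java
// Sketch of the initialization-order bug. The fix is simply to
// construct the coprocessor host before the RPC server starts
// taking calls, closing the window in which a handler can observe
// a null host.
public class InitOrderSketch {
    static Object rsHost;          // stands in for RegionServerCoprocessorHost
    static boolean rpcServing;     // stands in for the RpcServer serving state

    static void startRpcService() {
        rpcServing = true;
    }

    // An RPC handler that, like replicateWALEntry, dereferences rsHost.
    static void replicateWALEntry() {
        if (!rpcServing) {
            throw new IllegalStateException("not serving");
        }
        rsHost.toString();         // NPE here if rsHost is not yet initialized
    }

    public static void main(String[] args) {
        // Buggy order would be: startRpcService() first, assign rsHost
        // later -> any replicateWALEntry in between throws NPE.
        // Fixed order:
        rsHost = new Object();     // initialize the coprocessor host first
        startRpcService();         // only then accept RPCs
        replicateWALEntry();       // safe now
        System.out.println("ok");
    }
}
```

The same reasoning explains why a replication sink peer can crash a freshly started region server: replicateWALEntry can arrive as soon as the server reports for duty, well before the rest of startup finishes.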
[jira] [Updated] (HBASE-10338) Region server fails to start with AccessController coprocessor if installed into RegionServerCoprocessorHost
[ https://issues.apache.org/jira/browse/HBASE-10338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-10338: --- Attachment: HBASE-10338-0.98-addendum.patch Attach an addendum for the 0.98 branch. > Region server fails to start with AccessController coprocessor if installed > into RegionServerCoprocessorHost > > > Key: HBASE-10338 > URL: https://issues.apache.org/jira/browse/HBASE-10338 > Project: HBase > Issue Type: Bug > Components: Coprocessors, regionserver >Affects Versions: 0.98.0 >Reporter: Vandana Ayyalasomayajula >Assignee: Vandana Ayyalasomayajula >Priority: Minor > Fix For: 0.98.0, 0.96.2, 0.99.0 > > Attachments: 10338.1-0.96.patch, 10338.1-0.98.patch, 10338.1.patch, > 10338.1.patch, HBASE-10338-0.98-addendum.patch, HBASE-10338.0.patch, > HBASE-10338_addendum.patch > > > Runtime exception is being thrown when AccessController CP is used with > region server. This is happening as region server co processor host is > created before zookeeper is initialized in region server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16393) Improve computeHDFSBlocksDistribution
[ https://issues.apache.org/jira/browse/HBASE-16393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416327#comment-15416327 ] Guanghao Zhang commented on HBASE-16393: +1 on this idea. We found this in our production cluster, too. The balancer is too slow when there are a lot of regions. And some default balancer configs are too small for a big cluster. Maybe we can make the default config values depend on the number of regions. > Improve computeHDFSBlocksDistribution > - > > Key: HBASE-16393 > URL: https://issues.apache.org/jira/browse/HBASE-16393 > Project: HBase > Issue Type: Improvement >Reporter: binlijin > Attachments: HBASE-16393.patch > > > With our cluster is big, i can see the balancer is slow from time to time. > And the balancer will be called on master startup, so we can see the startup > is slow also. > The first thing i think whether if we can parallel compute different region's > HDFSBlocksDistribution. > The second i think we can improve compute single region's > HDFSBlocksDistribution. > When to compute a storefile's HDFSBlocksDistribution first we call > FileSystem#getFileStatus(path) and then > FileSystem#getFileBlockLocations(status, start, length), so two namenode rpc > call for every storefile. Instead we can use FileSystem#listLocatedStatus to > get a LocatedFileStatus for the information we need, so reduce the namenode > rpc call to one. This can speed the computeHDFSBlocksDistribution, but also > send out less rpc call to namenode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
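The two-RPCs-versus-one argument in the quoted description can be illustrated with a fake namenode that just counts calls (all names here are hypothetical stand-ins; the real APIs are FileSystem#getFileStatus, FileSystem#getFileBlockLocations, and FileSystem#listLocatedStatus):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the RPC-count argument: getFileStatus followed by
// getFileBlockLocations costs two namenode round trips per store
// file, while a single listLocatedStatus over the store directory
// returns status and block locations for all files in one call.
public class BlockDistSketch {
    static int rpcCalls = 0;
    static final Map<String, List<String>> blocks = new HashMap<>(); // file -> replica hosts

    // Old path: two RPCs for every store file.
    static List<String> getFileStatusThenLocations(String file) {
        rpcCalls++;                     // stands in for getFileStatus
        rpcCalls++;                     // stands in for getFileBlockLocations
        return blocks.get(file);
    }

    // New path: one RPC for the whole directory.
    static Map<String, List<String>> listLocatedStatus(String dir) {
        rpcCalls++;
        return blocks;
    }

    public static void main(String[] args) {
        blocks.put("f1", List.of("h1"));
        blocks.put("f2", List.of("h2"));
        rpcCalls = 0;
        getFileStatusThenLocations("f1");
        getFileStatusThenLocations("f2");
        System.out.println("old path RPCs: " + rpcCalls); // 4 for 2 files
        rpcCalls = 0;
        listLocatedStatus("/table/region/cf");
        System.out.println("new path RPCs: " + rpcCalls); // 1 for the directory
    }
}
```

For a region with many store files, halving (or better) the per-file namenode traffic both speeds up computeHDFSBlocksDistribution and reduces load on the namenode.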
[jira] [Commented] (HBASE-16402) RegionServerCoprocessorHost should be initialized before RpcServer starts
[ https://issues.apache.org/jira/browse/HBASE-16402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15418446#comment-15418446 ] Guanghao Zhang commented on HBASE-16402: Thanks [~apurtell]. > RegionServerCoprocessorHost should be initialized before RpcServer starts > - > > Key: HBASE-16402 > URL: https://issues.apache.org/jira/browse/HBASE-16402 > Project: HBase > Issue Type: Bug >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Fix For: 0.98.22 > > Attachments: HBASE-16402-0.98.patch > > > We found NPE in our 0.98 production cluster because > RegionServerCoprocessorHost is not initialized before RpcServer start service. > {code} > // Try and register with the Master; tell it we are here. Break if > // server is stopped or the clusterup flag is down or hdfs went wacky. > while (keepLooping()) { > RegionServerStartupResponse w = reportForDuty(); > if (w == null) { > LOG.warn("reportForDuty failed; sleeping and then retrying."); > this.sleeper.sleep(); > } else { > handleReportForDutyResponse(w); > break; > } > } > // Initialize the RegionServerCoprocessorHost now that our ephemeral > // node was created by reportForDuty, in case any coprocessors want > // to use ZooKeeper > this.rsHost = new RegionServerCoprocessorHost(this, this.conf); > {code} > RpcServer start service in handleReportForDutyResponse(), then it can serve > rpc call replicateWALEntry(). But the RegionServerCoprocessorHost is not > initialized and it is used in replicateWALEntry, so it will throw a NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-16416) Make NoncedRegionServerCallable extends RegionServerCallable
Guanghao Zhang created HBASE-16416: -- Summary: Make NoncedRegionServerCallable extends RegionServerCallable Key: HBASE-16416 URL: https://issues.apache.org/jira/browse/HBASE-16416 Project: HBase Issue Type: Improvement Components: Client Affects Versions: 2.0.0 Reporter: Guanghao Zhang Priority: Minor After HBASE-16308, there is a new class NoncedRegionServerCallable, which extends AbstractRegionServerCallable. But it has some duplicate methods with RegionServerCallable. So we can make NoncedRegionServerCallable extend RegionServerCallable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16416) Make NoncedRegionServerCallable extends RegionServerCallable
[ https://issues.apache.org/jira/browse/HBASE-16416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-16416: --- Status: Patch Available (was: Open) > Make NoncedRegionServerCallable extends RegionServerCallable > > > Key: HBASE-16416 > URL: https://issues.apache.org/jira/browse/HBASE-16416 > Project: HBase > Issue Type: Improvement > Components: Client >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Priority: Minor > Attachments: HBASE-16416.patch > > > After HBASE-16308, there are a new class NoncedRegionServerCallable which > extends AbstractRegionServerCallable. But it have some duplicate methods with > RegionServerCallable. So we can make NoncedRegionServerCallable extends > RegionServerCallable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16416) Make NoncedRegionServerCallable extends RegionServerCallable
[ https://issues.apache.org/jira/browse/HBASE-16416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-16416: --- Attachment: HBASE-16416.patch > Make NoncedRegionServerCallable extends RegionServerCallable > > > Key: HBASE-16416 > URL: https://issues.apache.org/jira/browse/HBASE-16416 > Project: HBase > Issue Type: Improvement > Components: Client >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Priority: Minor > Attachments: HBASE-16416.patch > > > After HBASE-16308, there are a new class NoncedRegionServerCallable which > extends AbstractRegionServerCallable. But it have some duplicate methods with > RegionServerCallable. So we can make NoncedRegionServerCallable extends > RegionServerCallable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HBASE-16416) Make NoncedRegionServerCallable extends RegionServerCallable
[ https://issues.apache.org/jira/browse/HBASE-16416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang reassigned HBASE-16416: -- Assignee: Guanghao Zhang > Make NoncedRegionServerCallable extends RegionServerCallable > > > Key: HBASE-16416 > URL: https://issues.apache.org/jira/browse/HBASE-16416 > Project: HBase > Issue Type: Improvement > Components: Client >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Minor > Attachments: HBASE-16416.patch > > > After HBASE-16308, there are a new class NoncedRegionServerCallable which > extends AbstractRegionServerCallable. But it have some duplicate methods with > RegionServerCallable. So we can make NoncedRegionServerCallable extends > RegionServerCallable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16416) Make NoncedRegionServerCallable extends RegionServerCallable
[ https://issues.apache.org/jira/browse/HBASE-16416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15421961#comment-15421961 ] Guanghao Zhang commented on HBASE-16416: [~stack] Any ideas? > Make NoncedRegionServerCallable extends RegionServerCallable > > > Key: HBASE-16416 > URL: https://issues.apache.org/jira/browse/HBASE-16416 > Project: HBase > Issue Type: Improvement > Components: Client >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Minor > Attachments: HBASE-16416.patch > > > After HBASE-16308, there are a new class NoncedRegionServerCallable which > extends AbstractRegionServerCallable. But it have some duplicate methods with > RegionServerCallable. So we can make NoncedRegionServerCallable extends > RegionServerCallable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16416) Make NoncedRegionServerCallable extends RegionServerCallable
[ https://issues.apache.org/jira/browse/HBASE-16416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-16416: --- Attachment: (was: HBASE-16416.patch) > Make NoncedRegionServerCallable extends RegionServerCallable > > > Key: HBASE-16416 > URL: https://issues.apache.org/jira/browse/HBASE-16416 > Project: HBase > Issue Type: Improvement > Components: Client >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Minor > > After HBASE-16308, there are a new class NoncedRegionServerCallable which > extends AbstractRegionServerCallable. But it have some duplicate methods with > RegionServerCallable. So we can make NoncedRegionServerCallable extends > RegionServerCallable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16416) Make NoncedRegionServerCallable extends RegionServerCallable
[ https://issues.apache.org/jira/browse/HBASE-16416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-16416: --- Attachment: HBASE-16416.patch Retry to trigger ut. > Make NoncedRegionServerCallable extends RegionServerCallable > > > Key: HBASE-16416 > URL: https://issues.apache.org/jira/browse/HBASE-16416 > Project: HBase > Issue Type: Improvement > Components: Client >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Minor > Attachments: HBASE-16416.patch > > > After HBASE-16308, there are a new class NoncedRegionServerCallable which > extends AbstractRegionServerCallable. But it have some duplicate methods with > RegionServerCallable. So we can make NoncedRegionServerCallable extends > RegionServerCallable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-16446) append_peer_tableCFs failed when there already have this table's partial cfs in the peer
Guanghao Zhang created HBASE-16446: -- Summary: append_peer_tableCFs failed when there already have this table's partial cfs in the peer Key: HBASE-16446 URL: https://issues.apache.org/jira/browse/HBASE-16446 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.98.21, 2.0.0 Reporter: Guanghao Zhang Assignee: Guanghao Zhang Priority: Minor {code} hbase(main):011:0> list_peers PEER_ID CLUSTER_KEY STATE TABLE_CFS PROTOCOL BANDWIDTH 20 hbase://c3tst-pressure98 ENABLED default.test_replication:A NATIVE 0 1 row(s) in 0.0080 seconds hbase(main):012:0> append_peer_tableCFs '20', {"test_replication" => []} 0 row(s) in 0.0060 seconds hbase(main):013:0> list_peers PEER_ID CLUSTER_KEY STATE TABLE_CFS PROTOCOL BANDWIDTH 20 hbase://c3tst-pressure98 ENABLED default.test_replication:A NATIVE 0 1 row(s) in 0.0030 seconds {code} "test_replication" => [] means replicating all column families of this table, so the result is not right. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
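In the table-CFs map an empty column-family list means "replicate every CF of the table", so appending {"test_replication" => []} must widen the existing partial entry rather than leave it at cf A. A sketch of merge logic with that semantics (a hypothetical helper, not the actual ReplicationAdmin code):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of append semantics for a peer's table-CFs map, where an
// empty CF set means "all column families of the table".
public class AppendTableCFsSketch {
    static Map<String, Set<String>> append(Map<String, Set<String>> current,
                                           Map<String, Set<String>> toAppend) {
        Map<String, Set<String>> merged = new HashMap<>(current);
        for (Map.Entry<String, Set<String>> e : toAppend.entrySet()) {
            Set<String> existing = merged.get(e.getKey());
            if (e.getValue().isEmpty() || existing == null) {
                // Appending "all CFs" widens any partial entry; a new
                // table is added as-is.
                merged.put(e.getKey(), new HashSet<>(e.getValue()));
            } else if (existing.isEmpty()) {
                // Entry is already "all CFs"; a subset adds nothing.
            } else {
                // Partial entry plus partial append: take the union.
                Set<String> union = new HashSet<>(existing);
                union.addAll(e.getValue());
                merged.put(e.getKey(), union);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> cur = new HashMap<>();
        cur.put("test_replication", Set.of("A"));
        Map<String, Set<String>> out =
            append(cur, Map.of("test_replication", Set.<String>of()));
        // The entry is now the empty set, i.e. "all CFs", not just A.
        System.out.println(out);
    }
}
```

The buggy behavior in the shell transcript above corresponds to treating the empty list as "nothing to add" and leaving the peer at default.test_replication:A.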
[jira] [Updated] (HBASE-16446) append_peer_tableCFs failed when there already have this table's partial cfs in the peer
[ https://issues.apache.org/jira/browse/HBASE-16446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-16446: --- Description: {code} hbase(main):011:0> list_peers PEER_ID CLUSTER_KEY STATE TABLE_CFS PROTOCOL BANDWIDTH 20 hbase://c3tst-pressure98 ENABLED default.test_replication:A NATIVE 0 1 row(s) in 0.0080 seconds hbase(main):012:0> append_peer_tableCFs '20', {"test_replication" => []} 0 row(s) in 0.0060 seconds hbase(main):013:0> list_peers PEER_ID CLUSTER_KEY STATE TABLE_CFS PROTOCOL BANDWIDTH 20 hbase://c3tst-pressure98 ENABLED default.test_replication:A NATIVE 0 1 row(s) in 0.0030 seconds {code} "test_replication" => [] means replication all cf of this table,so the result is not right. It should not just contain cf A append_peer_tableCFs. was: {code} hbase(main):011:0> list_peers PEER_ID CLUSTER_KEY STATE TABLE_CFS PROTOCOL BANDWIDTH 20 hbase://c3tst-pressure98 ENABLED default.test_replication:A NATIVE 0 1 row(s) in 0.0080 seconds hbase(main):012:0> append_peer_tableCFs '20', {"test_replication" => []} 0 row(s) in 0.0060 seconds hbase(main):013:0> list_peers PEER_ID CLUSTER_KEY STATE TABLE_CFS PROTOCOL BANDWIDTH 20 hbase://c3tst-pressure98 ENABLED default.test_replication:A NATIVE 0 1 row(s) in 0.0030 seconds {code} "test_replication" => [] means replication all cf of this table,so the result is not right. It should not just contain cf A after append. 
> append_peer_tableCFs failed when there already have this table's partial cfs > in the peer > > > Key: HBASE-16446 > URL: https://issues.apache.org/jira/browse/HBASE-16446 > Project: HBase > Issue Type: Bug > Components: Replication >Affects Versions: 2.0.0, 0.98.21 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Minor > > {code} > hbase(main):011:0> list_peers > PEER_ID CLUSTER_KEY STATE TABLE_CFS PROTOCOL BANDWIDTH > 20 hbase://c3tst-pressure98 ENABLED default.test_replication:A NATIVE 0 > 1 row(s) in 0.0080 seconds > hbase(main):012:0> append_peer_tableCFs '20', {"test_replication" => []} > 0 row(s) in 0.0060 seconds > hbase(main):013:0> list_peers > PEER_ID CLUSTER_KEY STATE TABLE_CFS PROTOCOL BANDWIDTH > 20 hbase://c3tst-pressure98 ENABLED default.test_replication:A NATIVE 0 > 1 row(s) in 0.0030 seconds > {code} > "test_replication" => [] means replication all cf of this table,so the result > is not right. It should not just contain cf A append_peer_tableCFs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16446) append_peer_tableCFs failed when there already have this table's partial cfs in the peer
[ https://issues.apache.org/jira/browse/HBASE-16446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-16446: --- Description: {code} hbase(main):011:0> list_peers PEER_ID CLUSTER_KEY STATE TABLE_CFS PROTOCOL BANDWIDTH 20 hbase://c3tst-pressure98 ENABLED default.test_replication:A NATIVE 0 1 row(s) in 0.0080 seconds hbase(main):012:0> append_peer_tableCFs '20', {"test_replication" => []} 0 row(s) in 0.0060 seconds hbase(main):013:0> list_peers PEER_ID CLUSTER_KEY STATE TABLE_CFS PROTOCOL BANDWIDTH 20 hbase://c3tst-pressure98 ENABLED default.test_replication:A NATIVE 0 1 row(s) in 0.0030 seconds {code} "test_replication" => [] means replication all cf of this table,so the result is not right. It should not just contain cf A after append. was: {code} hbase(main):011:0> list_peers PEER_ID CLUSTER_KEY STATE TABLE_CFS PROTOCOL BANDWIDTH 20 hbase://c3tst-pressure98 ENABLED default.test_replication:A NATIVE 0 1 row(s) in 0.0080 seconds hbase(main):012:0> append_peer_tableCFs '20', {"test_replication" => []} 0 row(s) in 0.0060 seconds hbase(main):013:0> list_peers PEER_ID CLUSTER_KEY STATE TABLE_CFS PROTOCOL BANDWIDTH 20 hbase://c3tst-pressure98 ENABLED default.test_replication:A NATIVE 0 1 row(s) in 0.0030 seconds {code} "test_replication" => [] means replication all cf of this table,so the result doesn't right. 
> append_peer_tableCFs failed when there already have this table's partial cfs > in the peer > > > Key: HBASE-16446 > URL: https://issues.apache.org/jira/browse/HBASE-16446 > Project: HBase > Issue Type: Bug > Components: Replication >Affects Versions: 2.0.0, 0.98.21 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Minor > > {code} > hbase(main):011:0> list_peers > PEER_ID CLUSTER_KEY STATE TABLE_CFS PROTOCOL BANDWIDTH > 20 hbase://c3tst-pressure98 ENABLED default.test_replication:A NATIVE 0 > 1 row(s) in 0.0080 seconds > hbase(main):012:0> append_peer_tableCFs '20', {"test_replication" => []} > 0 row(s) in 0.0060 seconds > hbase(main):013:0> list_peers > PEER_ID CLUSTER_KEY STATE TABLE_CFS PROTOCOL BANDWIDTH > 20 hbase://c3tst-pressure98 ENABLED default.test_replication:A NATIVE 0 > 1 row(s) in 0.0030 seconds > {code} > "test_replication" => [] means replication all cf of this table,so the result > is not right. It should not just contain cf A after append. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16446) append_peer_tableCFs failed when there already have this table's partial cfs in the peer
[ https://issues.apache.org/jira/browse/HBASE-16446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-16446: --- Description: {code} hbase(main):011:0> list_peers PEER_ID CLUSTER_KEY STATE TABLE_CFS PROTOCOL BANDWIDTH 20 hbase://c3tst-pressure98 ENABLED default.test_replication:A NATIVE 0 1 row(s) in 0.0080 seconds hbase(main):012:0> append_peer_tableCFs '20', {"test_replication" => []} 0 row(s) in 0.0060 seconds hbase(main):013:0> list_peers PEER_ID CLUSTER_KEY STATE TABLE_CFS PROTOCOL BANDWIDTH 20 hbase://c3tst-pressure98 ENABLED default.test_replication:A NATIVE 0 1 row(s) in 0.0030 seconds {code} "test_replication" => [] means replication all cf of this table,so the result is not right. It should not just contain cf A after append_peer_tableCFs. was: {code} hbase(main):011:0> list_peers PEER_ID CLUSTER_KEY STATE TABLE_CFS PROTOCOL BANDWIDTH 20 hbase://c3tst-pressure98 ENABLED default.test_replication:A NATIVE 0 1 row(s) in 0.0080 seconds hbase(main):012:0> append_peer_tableCFs '20', {"test_replication" => []} 0 row(s) in 0.0060 seconds hbase(main):013:0> list_peers PEER_ID CLUSTER_KEY STATE TABLE_CFS PROTOCOL BANDWIDTH 20 hbase://c3tst-pressure98 ENABLED default.test_replication:A NATIVE 0 1 row(s) in 0.0030 seconds {code} "test_replication" => [] means replication all cf of this table,so the result is not right. It should not just contain cf A append_peer_tableCFs. 
> append_peer_tableCFs failed when there already have this table's partial cfs > in the peer > > > Key: HBASE-16446 > URL: https://issues.apache.org/jira/browse/HBASE-16446 > Project: HBase > Issue Type: Bug > Components: Replication >Affects Versions: 2.0.0, 0.98.21 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Minor > > {code} > hbase(main):011:0> list_peers > PEER_ID CLUSTER_KEY STATE TABLE_CFS PROTOCOL BANDWIDTH > 20 hbase://c3tst-pressure98 ENABLED default.test_replication:A NATIVE 0 > 1 row(s) in 0.0080 seconds > hbase(main):012:0> append_peer_tableCFs '20', {"test_replication" => []} > 0 row(s) in 0.0060 seconds > hbase(main):013:0> list_peers > PEER_ID CLUSTER_KEY STATE TABLE_CFS PROTOCOL BANDWIDTH > 20 hbase://c3tst-pressure98 ENABLED default.test_replication:A NATIVE 0 > 1 row(s) in 0.0030 seconds > {code} > "test_replication" => [] means replication all cf of this table,so the result > is not right. It should not just contain cf A after append_peer_tableCFs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16446) append_peer_tableCFs failed when there already have this table's partial cfs in the peer
[ https://issues.apache.org/jira/browse/HBASE-16446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-16446: --- Attachment: HBASE-16446.patch Uploaded a small fix for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16446) append_peer_tableCFs failed when there already have this table's partial cfs in the peer
[ https://issues.apache.org/jira/browse/HBASE-16446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-16446: --- Status: Patch Available (was: Open) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16446) append_peer_tableCFs failed when there already have this table's partial cfs in the peer
[ https://issues.apache.org/jira/browse/HBASE-16446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-16446: --- Affects Version/s: 1.2.3 1.3.1 1.1.6 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-16447) Replication by namespace in peer
Guanghao Zhang created HBASE-16447: -- Summary: Replication by namespace in peer Key: HBASE-16447 URL: https://issues.apache.org/jira/browse/HBASE-16447 Project: HBase Issue Type: New Feature Components: Replication Reporter: Guanghao Zhang Currently we can only configure table CFs in a peer. But in our production cluster there are a dozen namespaces, and every namespace has dozens of tables, so configuring all table CFs in a peer is complicated. Some namespaces need to replicate all of their tables to another slave cluster. Configuration would be much easier if we supported replication by namespace. Suggestions and discussion are welcome. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
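A namespace-level peer config could reduce this to a simple membership check. The sketch below is purely illustrative (the class and method names are hypothetical, not from the HBase codebase): a table replicates if its namespace is listed on the peer.

```java
import java.util.Set;

// Hypothetical sketch of namespace-based peer matching (illustrative
// names, not actual HBase code): instead of listing every table/CF,
// a table replicates if its namespace is configured on the peer.
public class NamespacePeerSketch {
    // e.g. the fully qualified name "ns1:table1" lives in namespace "ns1";
    // tables without a namespace prefix live in the "default" namespace.
    static String namespaceOf(String tableName) {
        int idx = tableName.indexOf(':');
        return idx < 0 ? "default" : tableName.substring(0, idx);
    }

    public static boolean shouldReplicate(Set<String> peerNamespaces, String tableName) {
        return peerNamespaces.contains(namespaceOf(tableName));
    }

    public static void main(String[] args) {
        System.out.println(shouldReplicate(Set.of("ns1"), "ns1:table1")); // true
        System.out.println(shouldReplicate(Set.of("ns1"), "table1"));     // false
    }
}
```

With such a rule, adding a new table under a replicated namespace would require no peer config change at all.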
[jira] [Commented] (HBASE-16416) Make NoncedRegionServerCallable extends RegionServerCallable
[ https://issues.apache.org/jira/browse/HBASE-16416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15426304#comment-15426304 ] Guanghao Zhang commented on HBASE-16416: [~stack] Could you take a look at this? Thanks. One more question: how can I trigger the hbase-client unit tests? > Make NoncedRegionServerCallable extends RegionServerCallable > > > Key: HBASE-16416 > URL: https://issues.apache.org/jira/browse/HBASE-16416 > Project: HBase > Issue Type: Improvement > Components: Client >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Minor > Attachments: HBASE-16416.patch > > > After HBASE-16308, there is a new class NoncedRegionServerCallable which > extends AbstractRegionServerCallable. But it has some duplicate methods with > RegionServerCallable. So we can make NoncedRegionServerCallable extend > RegionServerCallable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16446) append_peer_tableCFs failed when there already have this table's partial cfs in the peer
[ https://issues.apache.org/jira/browse/HBASE-16446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-16446: --- Attachment: (was: HBASE-16446.patch) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16446) append_peer_tableCFs failed when there already have this table's partial cfs in the peer
[ https://issues.apache.org/jira/browse/HBASE-16446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-16446: --- Attachment: HBASE-16446.patch Added a unit test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16446) append_peer_tableCFs failed when there already have this table's partial cfs in the peer
[ https://issues.apache.org/jira/browse/HBASE-16446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15428321#comment-15428321 ] Guanghao Zhang commented on HBASE-16446: It is not a bug. If a table is already in the peer config, append_peer_tableCFs means appending more CFs for that table. After appending table => [], the result should be null, because a null or empty CF set for a table in the peer config means replicating all CFs of that table. If we then append a CF f1 of this table, the result should still be null, because the peer config already covers all CFs of the table. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
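The intended append semantics can be sketched as a small merge rule. This is a hypothetical model of the behaviour described above, not the actual HBase implementation; the class, method, and map shapes are illustrative (a null or empty CF list stands for "all CFs of the table").

```java
import java.util.*;

// Hypothetical sketch of the append_peer_tableCFs merge rule (not the
// real HBase code): appending an empty CF list must widen the entry to
// "all CFs" (represented as null) instead of keeping the partial set.
public class AppendTableCFsSketch {
    // Both maps go from table name to CF list; null/empty list = all CFs.
    public static Map<String, List<String>> append(
            Map<String, List<String>> existing, Map<String, List<String>> toAppend) {
        Map<String, List<String>> result = new HashMap<>(existing);
        for (Map.Entry<String, List<String>> e : toAppend.entrySet()) {
            String table = e.getKey();
            List<String> newCfs = e.getValue();
            if (!result.containsKey(table)) {
                // Table not configured yet: take the appended CFs as-is.
                result.put(table,
                    (newCfs == null || newCfs.isEmpty()) ? null : new ArrayList<>(newCfs));
            } else if (newCfs == null || newCfs.isEmpty()
                    || result.get(table) == null || result.get(table).isEmpty()) {
                // Either side already means "all CFs" -> the union is all CFs.
                result.put(table, null);
            } else {
                // Both sides are partial CF sets -> union them.
                Set<String> merged = new LinkedHashSet<>(result.get(table));
                merged.addAll(newCfs);
                result.put(table, new ArrayList<>(merged));
            }
        }
        return result;
    }
}
```

Under this rule, appending `{"test_replication" => []}` to an entry `default.test_replication:A` yields a null CF set, i.e. replicate all CFs, which matches the expectation in the issue description.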
[jira] [Created] (HBASE-16460) Can't rebuild the BucketAllocator's data structures when BucketCache use FileIOEngine
Guanghao Zhang created HBASE-16460: -- Summary: Can't rebuild the BucketAllocator's data structures when BucketCache use FileIOEngine Key: HBASE-16460 URL: https://issues.apache.org/jira/browse/HBASE-16460 Project: HBase Issue Type: Bug Components: BucketCache Affects Versions: 2.0.0 Reporter: Guanghao Zhang Assignee: Guanghao Zhang When the bucket cache uses FileIOEngine, it rebuilds the bucket allocator's data structures from a persisted map, so it should first read the map from the persistence file and then use that map to construct a new BucketAllocator. But the code currently runs these steps in the wrong order in the retrieveFromFile() method of BucketCache.java: {code} BucketAllocator allocator = new BucketAllocator(cacheCapacity, bucketSizes, backingMap, realCacheSize); backingMap = (ConcurrentHashMap) ois.readObject(); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
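The effect of the ordering can be modelled with plain Java serialization. The class below is a hypothetical stand-in for BucketCache/BucketAllocator, not the real HBase code; it only illustrates why readObject() must run before the allocator is rebuilt.

```java
import java.io.*;
import java.util.concurrent.ConcurrentHashMap;

// Minimal model of the corrected retrieveFromFile() ordering (hypothetical
// stand-ins, not the real HBase classes): the persisted backing map must be
// deserialized BEFORE the allocator is rebuilt, otherwise the allocator is
// constructed from the still-empty field and rebuilds nothing.
public class RetrieveOrderSketch {
    // Stand-in for BucketAllocator's rebuild constructor: it can only
    // account for blocks already present in the map it is given.
    static int rebuildAllocator(ConcurrentHashMap<String, Long> backingMap) {
        return backingMap.size();
    }

    // Persist a map, then read it back in the corrected order and report
    // how many blocks the allocator accounts for.
    public static int roundTrip() {
        try {
            ConcurrentHashMap<String, Long> persisted = new ConcurrentHashMap<>();
            persisted.put("block-1", 0L);
            persisted.put("block-2", 16384L);
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(persisted);
            }
            try (ObjectInputStream ois =
                    new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
                // Fixed order: readObject() first ...
                @SuppressWarnings("unchecked")
                ConcurrentHashMap<String, Long> backingMap =
                    (ConcurrentHashMap<String, Long>) ois.readObject();
                // ... then rebuild the allocator from the populated map.
                return rebuildAllocator(backingMap);
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip()); // 2; the buggy order would see 0 entries
    }
}
```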
[jira] [Updated] (HBASE-16460) Can't rebuild the BucketAllocator's data structures when BucketCache use FileIOEngine
[ https://issues.apache.org/jira/browse/HBASE-16460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-16460: --- Attachment: HBASE-16460.patch Uploaded a patch to fix this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16460) Can't rebuild the BucketAllocator's data structures when BucketCache use FileIOEngine
[ https://issues.apache.org/jira/browse/HBASE-16460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-16460: --- Status: Patch Available (was: Open) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16460) Can't rebuild the BucketAllocator's data structures when BucketCache use FileIOEngine
[ https://issues.apache.org/jira/browse/HBASE-16460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-16460: --- Affects Version/s: 0.98.22 1.2.3 1.3.1 1.1.6 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16446) append_peer_tableCFs failed when there already have this table's partial cfs in the peer
[ https://issues.apache.org/jira/browse/HBASE-16446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15429342#comment-15429342 ] Guanghao Zhang commented on HBASE-16446: Thanks for the review. We can use the set_peer_tableCFs command to change full-table replication to restricted CF replication. Another way is to first remove the full-table replication and then append a restricted CF replication. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16446) append_peer_tableCFs failed when there already have this table's partial cfs in the peer
[ https://issues.apache.org/jira/browse/HBASE-16446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15429346#comment-15429346 ] Guanghao Zhang commented on HBASE-16446: I took a look at the code of remove_peer_tableCFs and I think it has the same bug... {code} hbase> remove_peer_tableCFs '2', { "ns1:table1" => []} {code} What does an empty CF set mean in the remove_peer_tableCFs command: remove all CFs of this table, or remove nothing? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16446) append_peer_tableCFs failed when there already have this table's partial cfs in the peer
[ https://issues.apache.org/jira/browse/HBASE-16446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15429349#comment-15429349 ] Guanghao Zhang commented on HBASE-16446: If we keep it consistent with append_peer_tableCFs, an empty CF set in remove_peer_tableCFs should mean removing all CFs of this table too. [~tedyu] [~ashish singhi] What do you think? If you agree, I will upload a small fix for remove_peer_tableCFs as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
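Mirroring the append discussion, the proposed remove semantics might look like the following sketch (hypothetical names, not the actual HBase code): an empty CF set drops the whole table from the peer config.

```java
import java.util.*;

// Hypothetical sketch of the proposed remove_peer_tableCFs behaviour
// (illustrative, not actual HBase code): an empty CF list in the removal
// argument means "remove every CF", i.e. drop the table from the config.
public class RemoveTableCFsSketch {
    // Both maps go from table name to CF list; null/empty list = all CFs.
    public static Map<String, List<String>> remove(
            Map<String, List<String>> existing, Map<String, List<String>> toRemove) {
        Map<String, List<String>> result = new HashMap<>(existing);
        for (Map.Entry<String, List<String>> e : toRemove.entrySet()) {
            String table = e.getKey();
            if (!result.containsKey(table)) {
                continue; // nothing configured for this table
            }
            List<String> cfsToRemove = e.getValue();
            List<String> oldCfs = result.get(table);
            if (cfsToRemove == null || cfsToRemove.isEmpty()) {
                result.remove(table); // empty set = remove all CFs of the table
            } else if (oldCfs == null) {
                continue; // "all CFs" entry: subtracting specific CFs is ambiguous
            } else {
                List<String> remaining = new ArrayList<>(oldCfs);
                remaining.removeAll(cfsToRemove);
                if (remaining.isEmpty()) {
                    result.remove(table); // no CFs left -> drop the table
                } else {
                    result.put(table, remaining);
                }
            }
        }
        return result;
    }
}
```

With this rule, `remove_peer_tableCFs '2', { "ns1:table1" => []}` would drop `ns1:table1` entirely, the counterpart of the append case above.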
[jira] [Updated] (HBASE-16446) append_peer_tableCFs failed when there already have this table's partial cfs in the peer
[ https://issues.apache.org/jira/browse/HBASE-16446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-16446: --- Affects Version/s: (was: 1.2.3) (was: 0.98.21) (was: 1.3.1) (was: 1.1.6) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16460) Can't rebuild the BucketAllocator's data structures when BucketCache use FileIOEngine
[ https://issues.apache.org/jira/browse/HBASE-16460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-16460: --- Attachment: HBASE-16460-v1.patch Uploaded v1 patch: added @VisibleForTesting and deleted the cache file after the unit test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16460) Can't rebuild the BucketAllocator's data structures when BucketCache use FileIOEngine
[ https://issues.apache.org/jira/browse/HBASE-16460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15430208#comment-15430208 ] Guanghao Zhang commented on HBASE-16460: bq. So how was it previously before this fix? Before this fix, that code in the constructor was never executed, because the backing map was empty... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16460) Can't rebuild the BucketAllocator's data structures when BucketCache use FileIOEngine
[ https://issues.apache.org/jira/browse/HBASE-16460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15430212#comment-15430212 ] Guanghao Zhang commented on HBASE-16460: bq. If so there is a chance that we will not see a suitable bucket for a block and so we will fail this L2 cache init. We do throw an exception! But if no suitable bucket can be found for a block, that block could not have been cached before the regionserver restart. So after the restart, no exception will be thrown as long as the user didn't change the bucket sizes config. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16446) append_peer_tableCFs failed when there already have this table's partial cfs in the peer
[ https://issues.apache.org/jira/browse/HBASE-16446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-16446: --- Attachment: HBASE-16446-v1.patch Attached v1 patch: fixed the same bug in removing table CFs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16460) Can't rebuild the BucketAllocator's data structures when BucketCache use FileIOEngine
[ https://issues.apache.org/jira/browse/HBASE-16460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15430295#comment-15430295 ] Guanghao Zhang commented on HBASE-16460: OK. I took a look at the previous code: it can cache blocks, but it can't read a block from the cache (or reads wrong data) when the backing map is inconsistent with the bucket allocator. After this fix, the backing map is still inconsistent with the bucket allocator if we get an exception. Maybe we should remove the bucket entry from the backing map when we get an exception? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16460) Can't rebuild the BucketAllocator's data structures when BucketCache use FileIOEngine
[ https://issues.apache.org/jira/browse/HBASE-16460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-16460: --- Attachment: HBASE-16460-v2.patch Uploaded v2 patch: bucket entries that fail to rebuild are now removed from the backing map when an exception occurs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
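The v2 idea of dropping entries that fail to rebuild can be sketched as follows (illustrative stand-ins, not the HBase code): any persisted entry with no suitable bucket is removed so the backing map and the allocator stay consistent.

```java
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the v2 approach (illustrative names, not HBase
// code): while rebuilding allocator state from the persisted backing map,
// drop any entry that cannot be rebuilt instead of leaving the map
// inconsistent with the allocator.
public class DropFailedEntriesSketch {
    // Stand-in for "rebuild one bucket entry": here an entry fails if no
    // configured bucket size can hold a block of its size.
    static boolean rebuildEntry(long blockSize, int[] bucketSizes) {
        for (int s : bucketSizes) {
            if (blockSize <= s) {
                return true;
            }
        }
        return false;
    }

    public static Map<String, Long> rebuild(Map<String, Long> persisted, int[] bucketSizes) {
        ConcurrentHashMap<String, Long> backingMap = new ConcurrentHashMap<>(persisted);
        // Remove entries that fail to rebuild, keeping map and allocator in sync.
        backingMap.entrySet().removeIf(e -> !rebuildEntry(e.getValue(), bucketSizes));
        return backingMap;
    }
}
```

For example, rebuilding `{"a": 4096, "b": 1000000}` against a single 64 KB bucket size keeps only `"a"`; the oversized entry is discarded rather than poisoning later reads.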
[jira] [Updated] (HBASE-16460) Can't rebuild the BucketAllocator's data structures when BucketCache use FileIOEngine
[ https://issues.apache.org/jira/browse/HBASE-16460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-16460: --- Attachment: HBASE-16460-v2.patch Retry. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16466) HBase snapshots support in VerifyReplication tool to reduce load on live HBase cluster with large tables
[ https://issues.apache.org/jira/browse/HBASE-16466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15431886#comment-15431886 ] Guanghao Zhang commented on HBASE-16466: Do you mean using TableSnapshotScanner to read data from HDFS directly? > HBase snapshots support in VerifyReplication tool to reduce load on live > HBase cluster with large tables > > > Key: HBASE-16466 > URL: https://issues.apache.org/jira/browse/HBASE-16466 > Project: HBase > Issue Type: Improvement > Components: hbase >Affects Versions: 0.98.21 >Reporter: Sukumar Maddineni > > As of now VerifyReplicatin tool is running using normal HBase scanners. If > you want to run VerifyReplication multiple times on a production live > cluster with large tables then it creates extra load on HBase layer. So if we > implement snapshot based support then both in source and target we can read > data from snapshots which reduces load on HBase -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16466) HBase snapshots support in VerifyReplication tool to reduce load on live HBase cluster with large tables
[ https://issues.apache.org/jira/browse/HBASE-16466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15431893#comment-15431893 ] Guanghao Zhang commented on HBASE-16466: You are not in the list of contributors for the project, so you can't see the "Assign to me" button. > HBase snapshots support in VerifyReplication tool to reduce load on live > HBase cluster with large tables > > > Key: HBASE-16466 > URL: https://issues.apache.org/jira/browse/HBASE-16466 > Project: HBase > Issue Type: Improvement > Components: hbase >Affects Versions: 0.98.21 >Reporter: Sukumar Maddineni > > As of now VerifyReplicatin tool is running using normal HBase scanners. If > you want to run VerifyReplication multiple times on a production live > cluster with large tables then it creates extra load on HBase layer. So if we > implement snapshot based support then both in source and target we can read > data from snapshots which reduces load on HBase -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16446) append_peer_tableCFs failed when there already have this table's partial cfs in the peer
[ https://issues.apache.org/jira/browse/HBASE-16446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15432460#comment-15432460 ] Guanghao Zhang commented on HBASE-16446: After HBASE-11393, the appended tableCFs can be empty. But HBASE-11393 was only pushed to master, so this bug doesn't exist in other branches. Thanks. > append_peer_tableCFs failed when there already have this table's partial cfs > in the peer > > > Key: HBASE-16446 > URL: https://issues.apache.org/jira/browse/HBASE-16446 > Project: HBase > Issue Type: Bug > Components: Replication >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Minor > Attachments: HBASE-16446-v1.patch, HBASE-16446.patch > > > {code} > hbase(main):011:0> list_peers > PEER_ID CLUSTER_KEY STATE TABLE_CFS PROTOCOL BANDWIDTH > 20 hbase://c3tst-pressure98 ENABLED default.test_replication:A NATIVE 0 > 1 row(s) in 0.0080 seconds > hbase(main):012:0> append_peer_tableCFs '20', {"test_replication" => []} > 0 row(s) in 0.0060 seconds > hbase(main):013:0> list_peers > PEER_ID CLUSTER_KEY STATE TABLE_CFS PROTOCOL BANDWIDTH > 20 hbase://c3tst-pressure98 ENABLED default.test_replication:A NATIVE 0 > 1 row(s) in 0.0030 seconds > {code} > "test_replication" => [] means replication all cf of this table,so the result > is not right. It should not just contain cf A after append_peer_tableCFs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
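The merge semantics the shell session above expects can be sketched as follows (a simplified, hypothetical model of the append logic — `append` is not the real HBase method). An empty CF list means "replicate every column family of this table", so appending an empty list must clear any existing per-CF restriction rather than leave the old CF set in place.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class AppendTableCfsSketch {

    // Merge `toAppend` into `current`. An empty (or null) CF list for a
    // table means "all CFs", which must win over any previous partial list.
    static Map<String, List<String>> append(Map<String, List<String>> current,
                                            Map<String, List<String>> toAppend) {
        Map<String, List<String>> result = new HashMap<>(current);
        for (Map.Entry<String, List<String>> e : toAppend.entrySet()) {
            List<String> newCfs = e.getValue();
            List<String> oldCfs = result.get(e.getKey());
            if (newCfs == null || newCfs.isEmpty() || oldCfs == null) {
                // "all CFs" requested, or the table was not configured yet
                result.put(e.getKey(), (newCfs == null || newCfs.isEmpty())
                    ? Collections.emptyList() : new ArrayList<>(newCfs));
            } else {
                // union of the old and new partial CF lists
                Set<String> merged = new LinkedHashSet<>(oldCfs);
                merged.addAll(newCfs);
                result.put(e.getKey(), new ArrayList<>(merged));
            }
        }
        return result;
    }
}
```

In the buggy behavior shown in the shell output, the empty list was effectively ignored, so `default.test_replication:A` survived the append.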
[jira] [Assigned] (HBASE-16447) Replication by namespace in peer
[ https://issues.apache.org/jira/browse/HBASE-16447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang reassigned HBASE-16447: -- Assignee: Guanghao Zhang > Replication by namespace in peer > > > Key: HBASE-16447 > URL: https://issues.apache.org/jira/browse/HBASE-16447 > Project: HBase > Issue Type: New Feature > Components: Replication >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > > Now we only config table CFs in a peer. But in our production cluster, there > are a dozen namespaces and every namespace has dozens of tables. It is > complicated to config all table CFs in the peer. For some namespaces, we need > to replicate all tables to another slave cluster. It would be easy to config if > we supported replication by namespace. Suggestions and discussions are welcome. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16460) Can't rebuild the BucketAllocator's data structures when BucketCache use FileIOEngine
[ https://issues.apache.org/jira/browse/HBASE-16460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-16460: --- Attachment: HBASE-16460-v3.patch v3 patch. Add ut for reconfig bucket sizes. > Can't rebuild the BucketAllocator's data structures when BucketCache use > FileIOEngine > - > > Key: HBASE-16460 > URL: https://issues.apache.org/jira/browse/HBASE-16460 > Project: HBase > Issue Type: Bug > Components: BucketCache >Affects Versions: 2.0.0, 1.1.6, 1.3.1, 1.2.3, 0.98.22 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Attachments: HBASE-16460-v1.patch, HBASE-16460-v2.patch, > HBASE-16460-v2.patch, HBASE-16460-v3.patch, HBASE-16460.patch > > > When bucket cache use FileIOEngine, it will rebuild the bucket allocator's > data structures from a persisted map. So it should first read the map from > persistence file then use the map to new a BucketAllocator. But now the code > has wrong sequence in retrieveFromFile() method of BucketCache.java. > {code} > BucketAllocator allocator = new BucketAllocator(cacheCapacity, > bucketSizes, backingMap, realCacheSize); > backingMap = (ConcurrentHashMap) > ois.readObject(); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16460) Can't rebuild the BucketAllocator's data structures when BucketCache use FileIOEngine
[ https://issues.apache.org/jira/browse/HBASE-16460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15434463#comment-15434463 ] Guanghao Zhang commented on HBASE-16460: Thanks for your review. bq. we can collect all such failed keys into a List and log them at once at the end But then we would lose the failure reason. How about using a map to collect the failed keys together with their exceptions? > Can't rebuild the BucketAllocator's data structures when BucketCache use > FileIOEngine > - > > Key: HBASE-16460 > URL: https://issues.apache.org/jira/browse/HBASE-16460 > Project: HBase > Issue Type: Bug > Components: BucketCache >Affects Versions: 2.0.0, 1.1.6, 1.3.1, 1.2.3, 0.98.22 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Attachments: HBASE-16460-v1.patch, HBASE-16460-v2.patch, > HBASE-16460-v2.patch, HBASE-16460-v3.patch, HBASE-16460.patch > > > When bucket cache use FileIOEngine, it will rebuild the bucket allocator's > data structures from a persisted map. So it should first read the map from > persistence file then use the map to new a BucketAllocator. But now the code > has wrong sequence in retrieveFromFile() method of BucketCache.java. > {code} > BucketAllocator allocator = new BucketAllocator(cacheCapacity, > bucketSizes, backingMap, realCacheSize); > backingMap = (ConcurrentHashMap) > ois.readObject(); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
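The suggestion in the comment above — collect failed keys in a map keyed by the exception that caused each failure, then log once at the end — can be sketched like this (a hypothetical helper, not HBase code; `String` stands in for the real `BlockCacheKey`):

```java
import java.util.Map;

public class FailedKeyCollector {

    // Build one summary line from all rebuild failures, preserving the
    // failure reason for each key instead of losing it in a plain List.
    static String summarize(Map<String, Exception> failures) {
        if (failures.isEmpty()) {
            return "all entries rebuilt";
        }
        StringBuilder sb = new StringBuilder(
            "failed to rebuild " + failures.size() + " entries: ");
        for (Map.Entry<String, Exception> e : failures.entrySet()) {
            sb.append(e.getKey())
              .append(" (").append(e.getValue().getMessage()).append("); ");
        }
        return sb.toString();
    }
}
```

The caller would populate the map inside the per-entry catch block during `retrieveFromFile()` and emit `summarize(...)` in a single log statement afterwards.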
[jira] [Commented] (HBASE-16466) HBase snapshots support in VerifyReplication tool to reduce load on live HBase cluster with large tables
[ https://issues.apache.org/jira/browse/HBASE-16466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15434476#comment-15434476 ] Guanghao Zhang commented on HBASE-16466: If the table always has new data, how do we make sure the snapshot is the same in both clusters? > HBase snapshots support in VerifyReplication tool to reduce load on live > HBase cluster with large tables > > > Key: HBASE-16466 > URL: https://issues.apache.org/jira/browse/HBASE-16466 > Project: HBase > Issue Type: Improvement > Components: hbase >Affects Versions: 0.98.21 >Reporter: Sukumar Maddineni > > As of now VerifyReplicatin tool is running using normal HBase scanners. If > you want to run VerifyReplication multiple times on a production live > cluster with large tables then it creates extra load on HBase layer. So if we > implement snapshot based support then both in source and target we can read > data from snapshots which reduces load on HBase -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HBASE-16466) HBase snapshots support in VerifyReplication tool to reduce load on live HBase cluster with large tables
[ https://issues.apache.org/jira/browse/HBASE-16466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15434476#comment-15434476 ] Guanghao Zhang edited comment on HBASE-16466 at 8/24/16 8:26 AM: - If the table always has new data, how do we make sure the snapshot is the same in both clusters? was (Author: zghaobac): If the table always have new data, how to make sure the snapshot is same in both clusters? > HBase snapshots support in VerifyReplication tool to reduce load on live > HBase cluster with large tables > > > Key: HBASE-16466 > URL: https://issues.apache.org/jira/browse/HBASE-16466 > Project: HBase > Issue Type: Improvement > Components: hbase >Affects Versions: 0.98.21 >Reporter: Sukumar Maddineni > > As of now VerifyReplicatin tool is running using normal HBase scanners. If > you want to run VerifyReplication multiple times on a production live > cluster with large tables then it creates extra load on HBase layer. So if we > implement snapshot based support then both in source and target we can read > data from snapshots which reduces load on HBase -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-15496) Throw RowTooBigException only for user scan/get
[ https://issues.apache.org/jira/browse/HBASE-15496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-15496: --- Description: When config hbase.table.max.rowsize, RowTooBigException may be thrown by StoreScanner. But region flush/compact should catch it or throw it only for user scan. Exceptions: org.apache.hadoop.hbase.regionserver.RowTooBigException: Max row size allowed: 10485760, but row is bigger than that at org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:355) at org.apache.hadoop.hbase.regionserver.StoreScanner.(StoreScanner.java:276) at org.apache.hadoop.hbase.regionserver.StoreScanner.(StoreScanner.java:238) at org.apache.hadoop.hbase.regionserver.compactions.Compactor.createScanner(Compactor.java:403) at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:95) at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:131) at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1211) at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1952) at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1774) or org.apache.hadoop.hbase.regionserver.RowTooBigException: Max row size allowed: 10485760, but the row is bigger than that. 
at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:576) at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:132) at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75) at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:880) at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2155) at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2454) at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2193) at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2162) at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2053) at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:1979) at org.apache.hadoop.hbase.regionserver.TestRowTooBig.testScannersSeekOnFewLargeCells(TestRowTooBig.java:101) was: When config hbase.table.max.rowsize, RowTooBigException may be thrown by StoreScanner. But region flush/compact should catch it or throw it only for user scan. 
org.apache.hadoop.hbase.regionserver.RowTooBigException: Max row size allowed: 10485760, but row is bigger than that at org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:355) at org.apache.hadoop.hbase.regionserver.StoreScanner.(StoreScanner.java:276) at org.apache.hadoop.hbase.regionserver.StoreScanner.(StoreScanner.java:238) at org.apache.hadoop.hbase.regionserver.compactions.Compactor.createScanner(Compactor.java:403) at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:95) at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:131) at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1211) at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1952) at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1774) or org.apache.hadoop.hbase.regionserver.RowTooBigException: Max row size allowed: 10485760, but the row is bigger than that. 
at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:576) at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:132) at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75) at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:880) at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2155) at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2454) at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2193) at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2162) at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2053) at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:1979) at org.apache.hadoop.hbase.regionserver.TestRowTooBig.testScannersSeekOnFewLargeCells(TestRowTooBig.java:101) > Throw RowTooBigException only for user scan/get > --- > > Key: HBASE-15496 > URL: https://issues.apache.org/jira/browse/HBASE-15496 > Project: HBase > Issue Type: Improvement > Components: Scanners >Repor
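The behavior this issue asks for can be sketched in a simplified, self-contained model (hypothetical names — `ScanType`, `RowTooBigException`, and `checkRowSize` are stand-ins, not the real `StoreScanner` code): the row-size limit is enforced only for user-issued scans/gets, while flush and compaction scans tolerate oversized rows instead of failing the region operation.

```java
public class RowSizeCheckSketch {

    // Who issued the scan: a client read, or an internal region operation.
    enum ScanType { USER_SCAN, FLUSH, COMPACTION }

    static class RowTooBigException extends RuntimeException {
        RowTooBigException(String msg) { super(msg); }
    }

    // Returns true when scanning may proceed; throws only for user scans
    // over the configured hbase.table.max.rowsize limit.
    static boolean checkRowSize(long rowSize, long maxRowSize, ScanType type) {
        if (rowSize <= maxRowSize) {
            return true;
        }
        if (type == ScanType.USER_SCAN) {
            throw new RowTooBigException("Max row size allowed: " + maxRowSize
                + ", but row is bigger than that");
        }
        // flush/compaction must keep working even on oversized rows,
        // otherwise the region can never flush or compact that row away
        return true;
    }
}
```

The stack traces in the description show exactly the two internal paths (`DefaultCompactor.compact` and `StoreFlusher.performFlush`) that would hit the `FLUSH`/`COMPACTION` branch here.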
[jira] [Created] (HBASE-15496) Throw RowTooBigException only for user scan/get
Guanghao Zhang created HBASE-15496: -- Summary: Throw RowTooBigException only for user scan/get Key: HBASE-15496 URL: https://issues.apache.org/jira/browse/HBASE-15496 Project: HBase Issue Type: Improvement Components: Scanners Reporter: Guanghao Zhang Priority: Minor Fix For: 2.0.0 When config hbase.table.max.rowsize, RowTooBigException may be thrown by StoreScanner. But region flush/compact should catch it or throw it only for user scan. org.apache.hadoop.hbase.regionserver.RowTooBigException: Max row size allowed: 10485760, but row is bigger than that at org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:355) at org.apache.hadoop.hbase.regionserver.StoreScanner.(StoreScanner.java:276) at org.apache.hadoop.hbase.regionserver.StoreScanner.(StoreScanner.java:238) at org.apache.hadoop.hbase.regionserver.compactions.Compactor.createScanner(Compactor.java:403) at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:95) at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:131) at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1211) at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1952) at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1774) or org.apache.hadoop.hbase.regionserver.RowTooBigException: Max row size allowed: 10485760, but the row is bigger than that. 
at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:576) at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:132) at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75) at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:880) at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2155) at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2454) at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2193) at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2162) at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2053) at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:1979) at org.apache.hadoop.hbase.regionserver.TestRowTooBig.testScannersSeekOnFewLargeCells(TestRowTooBig.java:101) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-15496) Throw RowTooBigException only for user scan/get
[ https://issues.apache.org/jira/browse/HBASE-15496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-15496: --- Status: Patch Available (was: Open) > Throw RowTooBigException only for user scan/get > --- > > Key: HBASE-15496 > URL: https://issues.apache.org/jira/browse/HBASE-15496 > Project: HBase > Issue Type: Improvement > Components: Scanners >Reporter: Guanghao Zhang >Priority: Minor > Fix For: 2.0.0 > > > When config hbase.table.max.rowsize, RowTooBigException may be thrown by > StoreScanner. But region flush/compact should catch it or throw it only for > user scan. > Exceptions: > org.apache.hadoop.hbase.regionserver.RowTooBigException: Max row size > allowed: 10485760, but row is bigger than that > at > org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:355) > at > org.apache.hadoop.hbase.regionserver.StoreScanner.(StoreScanner.java:276) > at > org.apache.hadoop.hbase.regionserver.StoreScanner.(StoreScanner.java:238) > at > org.apache.hadoop.hbase.regionserver.compactions.Compactor.createScanner(Compactor.java:403) > at > org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:95) > at > org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:131) > at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1211) > at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1952) > at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1774) > or > org.apache.hadoop.hbase.regionserver.RowTooBigException: Max row size > allowed: 10485760, but the row is bigger than that. 
> at > org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:576) > at > org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:132) > at > org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75) > at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:880) > at > org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2155) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2454) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2193) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2162) > at > org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2053) > at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:1979) > at > org.apache.hadoop.hbase.regionserver.TestRowTooBig.testScannersSeekOnFewLargeCells(TestRowTooBig.java:101) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-15496) Throw RowTooBigException only for user scan/get
[ https://issues.apache.org/jira/browse/HBASE-15496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-15496: --- Attachment: HBASE-15496.patch > Throw RowTooBigException only for user scan/get > --- > > Key: HBASE-15496 > URL: https://issues.apache.org/jira/browse/HBASE-15496 > Project: HBase > Issue Type: Improvement > Components: Scanners >Reporter: Guanghao Zhang >Priority: Minor > Fix For: 2.0.0 > > Attachments: HBASE-15496.patch > > > When config hbase.table.max.rowsize, RowTooBigException may be thrown by > StoreScanner. But region flush/compact should catch it or throw it only for > user scan. > Exceptions: > org.apache.hadoop.hbase.regionserver.RowTooBigException: Max row size > allowed: 10485760, but row is bigger than that > at > org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:355) > at > org.apache.hadoop.hbase.regionserver.StoreScanner.(StoreScanner.java:276) > at > org.apache.hadoop.hbase.regionserver.StoreScanner.(StoreScanner.java:238) > at > org.apache.hadoop.hbase.regionserver.compactions.Compactor.createScanner(Compactor.java:403) > at > org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:95) > at > org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:131) > at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1211) > at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1952) > at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1774) > or > org.apache.hadoop.hbase.regionserver.RowTooBigException: Max row size > allowed: 10485760, but the row is bigger than that. 
> at > org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:576) > at > org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:132) > at > org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75) > at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:880) > at > org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2155) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2454) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2193) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2162) > at > org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2053) > at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:1979) > at > org.apache.hadoop.hbase.regionserver.TestRowTooBig.testScannersSeekOnFewLargeCells(TestRowTooBig.java:101) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15504) Fix Balancer in 1.3 not moving regions off overloaded regionserver
[ https://issues.apache.org/jira/browse/HBASE-15504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15205597#comment-15205597 ] Guanghao Zhang commented on HBASE-15504: HBASE-14604 increases the move cost when the cluster's region count is bigger than maxMoves. Before HBASE-14604, the move cost was scaled over [0, cluster.numRegions], independent of maxMoves. But in our use case we configure a small maxMoves because we don't want to move too many regions on our online serving cluster. HBASE-14604 scales the move cost over [0, Math.min(cluster.numRegions, maxMoves)]. Do you mind sharing your maxMoves config and cluster region count? > Fix Balancer in 1.3 not moving regions off overloaded regionserver > -- > > Key: HBASE-15504 > URL: https://issues.apache.org/jira/browse/HBASE-15504 > Project: HBase > Issue Type: Bug >Affects Versions: 1.3.0 >Reporter: Elliott Clark > Fix For: 1.3.0 > > > We pushed 1.3 to a couple of clusters. In some cases the regions were > assigned VERY un-evenly and the regions would not move after that. > We ended up with one rs getting thousands of regions and most servers getting > 0. Running balancer would do nothing. The balancer would say that it couldn't > find a solution with less than the current cost. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
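The scaling change discussed in the comment above can be illustrated numerically (a deliberate simplification of StochasticLoadBalancer's move-cost math, not the actual HBase code): shrinking the denominator from numRegions to min(numRegions, maxMoves) makes the same number of moves look much more expensive.

```java
public class MoveCostSketch {

    // Simplified move-cost scaling: cost is the move count normalized
    // to the denominator, capped at 1.0. Before HBASE-14604 the
    // denominator was numRegions; after, min(numRegions, maxMoves).
    static double scaledCost(int moves, int numRegions, int maxMoves,
                             boolean afterHBASE14604) {
        int denom = afterHBASE14604 ? Math.min(numRegions, maxMoves) : numRegions;
        return Math.min(moves, denom) / (double) denom;
    }
}
```

With 10,000 regions and maxMoves = 600, a plan with 300 moves costs 0.03 under the old scaling but 0.5 under the new one, which is how a small configured maxMoves can make the balancer reject plans it previously accepted (the numbers here are illustrative, not from the reported cluster).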
[jira] [Commented] (HBASE-15504) Fix Balancer in 1.3 not moving regions off overloaded regionserver
[ https://issues.apache.org/jira/browse/HBASE-15504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15205628#comment-15205628 ] Guanghao Zhang commented on HBASE-15504: And when using StochasticLoadBalancer in our test cluster, we found some other problems that need to be fixed. 1. LocalityBasedCandidateGenerator.getLowestLocalityRegionOnServer should skip regions that contain nothing. 2. When LocalityBasedCandidateGenerator is used to generate a Cluster.Action, it should fall back to a random operation instead of pickLowestLocalityServer(cluster), because the search function may get stuck if it always generates the same Cluster.Action. 3. getLeastLoadedTopServerForRegion(int region) should pick the least loaded server that has better locality than the current server. > Fix Balancer in 1.3 not moving regions off overloaded regionserver > -- > > Key: HBASE-15504 > URL: https://issues.apache.org/jira/browse/HBASE-15504 > Project: HBase > Issue Type: Bug >Affects Versions: 1.3.0 >Reporter: Elliott Clark > Fix For: 1.3.0 > > > We pushed 1.3 to a couple of clusters. In some cases the regions were > assigned VERY un-evenly and the regions would not move after that. > We ended up with one rs getting thousands of regions and most servers getting > 0. Running balancer would do nothing. The balancer would say that it couldn't > find a solution with less than the current cost. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-15515) Improve LocalityBasedCandidateGenerator in Balancer
Guanghao Zhang created HBASE-15515: -- Summary: Improve LocalityBasedCandidateGenerator in Balancer Key: HBASE-15515 URL: https://issues.apache.org/jira/browse/HBASE-15515 Project: HBase Issue Type: Bug Affects Versions: 2.0.0, 1.3.0 Reporter: Guanghao Zhang Assignee: Guanghao Zhang Priority: Minor Fix For: 2.0.0 There are some problems that need to be fixed. 1. LocalityBasedCandidateGenerator.getLowestLocalityRegionOnServer should skip empty regions. 2. When LocalityBasedCandidateGenerator is used to generate a Cluster.Action, it should fall back to a random operation instead of pickLowestLocalityServer(cluster), because the search function may get stuck if it always generates the same Cluster.Action. 3. getLeastLoadedTopServerForRegion should pick the least loaded server that has better locality than the current server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
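Fix (1) from the issue above can be sketched with a simplified, hypothetical model (the `Region` record and `lowestLocalityRegion` helper are stand-ins, not the real balancer code): an empty region has no HDFS blocks, so its "locality" is meaningless and it should never be picked as the lowest-locality candidate.

```java
import java.util.List;

public class LowestLocalitySketch {

    static class Region {
        final String name;
        final long storefileSizeBytes;
        final float locality; // fraction of blocks local to the hosting server

        Region(String name, long storefileSizeBytes, float locality) {
            this.name = name;
            this.storefileSizeBytes = storefileSizeBytes;
            this.locality = locality;
        }
    }

    // Pick the lowest-locality region on a server, skipping empty regions
    // so the generator doesn't repeatedly propose moving a region that
    // has no data (and hence no locality to improve).
    static Region lowestLocalityRegion(List<Region> regionsOnServer) {
        Region lowest = null;
        for (Region r : regionsOnServer) {
            if (r.storefileSizeBytes == 0) {
                continue; // empty region: nothing to gain by moving it
            }
            if (lowest == null || r.locality < lowest.locality) {
                lowest = r;
            }
        }
        return lowest;
    }
}
```

Without the skip, an empty region with locality 0.0 would always win, and the candidate generator would keep producing the same useless Cluster.Action — the stuck-search symptom described in point (2).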
[jira] [Updated] (HBASE-15515) Improve LocalityBasedCandidateGenerator in Balancer
[ https://issues.apache.org/jira/browse/HBASE-15515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-15515: --- Attachment: HBASE-15515.patch > Improve LocalityBasedCandidateGenerator in Balancer > --- > > Key: HBASE-15515 > URL: https://issues.apache.org/jira/browse/HBASE-15515 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0, 1.3.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Minor > Fix For: 2.0.0 > > Attachments: HBASE-15515.patch > > > There are some problems which need to fix. > 1. LocalityBasedCandidateGenerator.getLowestLocalityRegionOnServer should > skip empty region. > 2. When use LocalityBasedCandidateGenerator to generate Cluster.Action, it > should add random operation instead of pickLowestLocalityServer(cluster). > Because the search function may stuck here if it always generate the same > Cluster.Action. > 3. getLeastLoadedTopServerForRegion should get least loaded server which have > better locality than current server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15496) Throw RowTooBigException only for user scan/get
[ https://issues.apache.org/jira/browse/HBASE-15496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15205741#comment-15205741 ] Guanghao Zhang commented on HBASE-15496: For the huge KeyValue case, compaction may produce an OOME on the server side. > Throw RowTooBigException only for user scan/get > --- > > Key: HBASE-15496 > URL: https://issues.apache.org/jira/browse/HBASE-15496 > Project: HBase > Issue Type: Improvement > Components: Scanners >Reporter: Guanghao Zhang >Priority: Minor > Fix For: 2.0.0 > > Attachments: HBASE-15496.patch > > > When config hbase.table.max.rowsize, RowTooBigException may be thrown by > StoreScanner. But region flush/compact should catch it or throw it only for > user scan. > Exceptions: > org.apache.hadoop.hbase.regionserver.RowTooBigException: Max row size > allowed: 10485760, but row is bigger than that > at > org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:355) > at > org.apache.hadoop.hbase.regionserver.StoreScanner.(StoreScanner.java:276) > at > org.apache.hadoop.hbase.regionserver.StoreScanner.(StoreScanner.java:238) > at > org.apache.hadoop.hbase.regionserver.compactions.Compactor.createScanner(Compactor.java:403) > at > org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:95) > at > org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:131) > at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1211) > at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1952) > at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1774) > or > org.apache.hadoop.hbase.regionserver.RowTooBigException: Max row size > allowed: 10485760, but the row is bigger than that. 
> at > org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:576) > at > org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:132) > at > org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75) > at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:880) > at > org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2155) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2454) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2193) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2162) > at > org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2053) > at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:1979) > at > org.apache.hadoop.hbase.regionserver.TestRowTooBig.testScannersSeekOnFewLargeCells(TestRowTooBig.java:101) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-15515) Improve LocalityBasedCandidateGenerator in Balancer
[ https://issues.apache.org/jira/browse/HBASE-15515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-15515: --- Attachment: HBASE-15515-v1.patch > Improve LocalityBasedCandidateGenerator in Balancer > --- > > Key: HBASE-15515 > URL: https://issues.apache.org/jira/browse/HBASE-15515 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0, 1.3.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Minor > Fix For: 2.0.0 > > Attachments: HBASE-15515-v1.patch, HBASE-15515.patch > > > There are some problems which need to fix. > 1. LocalityBasedCandidateGenerator.getLowestLocalityRegionOnServer should > skip empty region. > 2. When use LocalityBasedCandidateGenerator to generate Cluster.Action, it > should add random operation instead of pickLowestLocalityServer(cluster). > Because the search function may stuck here if it always generate the same > Cluster.Action. > 3. getLeastLoadedTopServerForRegion should get least loaded server which have > better locality than current server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15504) Fix Balancer in 1.3 not moving regions off overloaded regionserver
[ https://issues.apache.org/jira/browse/HBASE-15504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15207753#comment-15207753 ] Guanghao Zhang commented on HBASE-15504: Uploaded a patch to fix it in HBASE-15515. [~eclark] Can you help review? > Fix Balancer in 1.3 not moving regions off overloaded regionserver > -- > > Key: HBASE-15504 > URL: https://issues.apache.org/jira/browse/HBASE-15504 > Project: HBase > Issue Type: Bug > Affects Versions: 1.3.0 > Reporter: Elliott Clark > Fix For: 1.3.0 > > > We pushed 1.3 to a couple of clusters. In some cases the regions were assigned VERY unevenly and the regions would not move after that. > We ended up with one rs getting thousands of regions and most servers getting 0. Running the balancer would do nothing. The balancer would say that it couldn't find a solution with less than the current cost.
[jira] [Commented] (HBASE-15515) Improve LocalityBasedCandidateGenerator in Balancer
[ https://issues.apache.org/jira/browse/HBASE-15515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15211268#comment-15211268 ] Guanghao Zhang commented on HBASE-15515: Unit tests in TestStochasticLoadBalancer and TestStochasticLoadBalancer2 always set hbase.master.balancer.stochastic.localityCost to 0, so they don't take region locality into account when balancing the cluster. Maybe we should add some balancer unit tests about locality first. > Improve LocalityBasedCandidateGenerator in Balancer > --- > > Key: HBASE-15515 > URL: https://issues.apache.org/jira/browse/HBASE-15515 > Project: HBase > Issue Type: Bug > Affects Versions: 2.0.0, 1.3.0 > Reporter: Guanghao Zhang > Assignee: Guanghao Zhang > Priority: Minor > Fix For: 2.0.0, 1.3.0, 1.4.0 > > Attachments: HBASE-15515-v1.patch, HBASE-15515.patch > > > There are some problems which need to be fixed. > 1. LocalityBasedCandidateGenerator.getLowestLocalityRegionOnServer should skip empty regions. > 2. When using LocalityBasedCandidateGenerator to generate a Cluster.Action, it should add a random operation instead of pickLowestLocalityServer(cluster), because the search function may get stuck if it always generates the same Cluster.Action. > 3. getLeastLoadedTopServerForRegion should get the least loaded server which has better locality than the current server.
[jira] [Created] (HBASE-15529) Override needBalance in StochasticLoadBalancer
Guanghao Zhang created HBASE-15529: -- Summary: Override needBalance in StochasticLoadBalancer Key: HBASE-15529 URL: https://issues.apache.org/jira/browse/HBASE-15529 Project: HBase Issue Type: Improvement Reporter: Guanghao Zhang Priority: Minor StochasticLoadBalancer includes cost functions to compute the cost of region count, r/w qps, table load, region locality, memstore size, and storefile size. Every cost function returns a number between 0 and 1 inclusive and the computed costs are scaled by their respective multipliers. A bigger multiplier means that the respective cost function has a bigger weight. But needBalance decides whether to balance only by region count and doesn't consider r/w qps or locality even if you configure these cost functions with bigger multipliers. StochasticLoadBalancer should override needBalance and decide whether to balance by its configured cost functions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-15529) Override needBalance in StochasticLoadBalancer
[ https://issues.apache.org/jira/browse/HBASE-15529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-15529: --- Description: StochasticLoadBalancer includes cost functions to compute the cost of region count, r/w qps, table load, region locality, memstore size, and storefile size. Every cost function returns a number between 0 and 1 inclusive and the computed costs are scaled by their respective multipliers. A bigger multiplier means that the respective cost function has a bigger weight. But needBalance decides whether to balance only by region count and doesn't consider r/w qps or locality even if you configure these cost functions with bigger multipliers. StochasticLoadBalancer should override needBalance and decide whether to balance by its configured cost functions. (was: StochasticLoadBalancer includes cost functions to compute the cost of region count, r/w qps, table load, region locality, memstore size, and storefile size. Every cost function returns a number between 0 and 1 inclusive and the computed costs are scaled by their respective multipliers. A bigger multiplier means that the respective cost function has a bigger weight. But needBalance decides whether to balance only by region count and doesn't consider r/w qps or locality even if you configure these cost functions with bigger multipliers. StochasticLoadBalancer should override needBalance and decide whether to balance by its configured cost function.)
> Override needBalance in StochasticLoadBalancer > -- > > Key: HBASE-15529 > URL: https://issues.apache.org/jira/browse/HBASE-15529 > Project: HBase > Issue Type: Improvement > Reporter: Guanghao Zhang > Priority: Minor > > > StochasticLoadBalancer includes cost functions to compute the cost of region count, r/w qps, table load, region locality, memstore size, and storefile size. Every cost function returns a number between 0 and 1 inclusive and the computed costs are scaled by their respective multipliers. A bigger multiplier means that the respective cost function has a bigger weight. But needBalance decides whether to balance only by region count and doesn't consider r/w qps or locality even if you configure these cost functions with bigger multipliers. StochasticLoadBalancer should override needBalance and decide whether to balance by its configured cost functions.
[jira] [Commented] (HBASE-15515) Improve LocalityBasedCandidateGenerator in Balancer
[ https://issues.apache.org/jira/browse/HBASE-15515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15212751#comment-15212751 ] Guanghao Zhang commented on HBASE-15515: [~yuzhih...@gmail.com] [~eclark] Thanks for your review. > Improve LocalityBasedCandidateGenerator in Balancer > --- > > Key: HBASE-15515 > URL: https://issues.apache.org/jira/browse/HBASE-15515 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0, 1.3.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Minor > Fix For: 2.0.0, 1.3.0, 1.4.0 > > Attachments: HBASE-15515-v1.patch, HBASE-15515.patch > > > There are some problems which need to fix. > 1. LocalityBasedCandidateGenerator.getLowestLocalityRegionOnServer should > skip empty region. > 2. When use LocalityBasedCandidateGenerator to generate Cluster.Action, it > should add random operation instead of pickLowestLocalityServer(cluster). > Because the search function may stuck here if it always generate the same > Cluster.Action. > 3. getLeastLoadedTopServerForRegion should get least loaded server which have > better locality than current server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-15529) Override needBalance in StochasticLoadBalancer
[ https://issues.apache.org/jira/browse/HBASE-15529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-15529: --- Attachment: HBASE-15529.patch > Override needBalance in StochasticLoadBalancer > -- > > Key: HBASE-15529 > URL: https://issues.apache.org/jira/browse/HBASE-15529 > Project: HBase > Issue Type: Improvement > Reporter: Guanghao Zhang > Priority: Minor > Attachments: HBASE-15529.patch > > > StochasticLoadBalancer includes cost functions to compute the cost of region count, r/w qps, table load, region locality, memstore size, and storefile size. Every cost function returns a number between 0 and 1 inclusive and the computed costs are scaled by their respective multipliers. A bigger multiplier means that the respective cost function has a bigger weight. But needBalance decides whether to balance only by region count and doesn't consider r/w qps or locality even if you configure these cost functions with bigger multipliers. StochasticLoadBalancer should override needBalance and decide whether to balance by its configured cost functions.
[jira] [Updated] (HBASE-15529) Override needBalance in StochasticLoadBalancer
[ https://issues.apache.org/jira/browse/HBASE-15529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-15529: --- Description: StochasticLoadBalancer includes cost functions to compute the cost of region count, r/w qps, table load, region locality, memstore size, and storefile size. Every cost function returns a number between 0 and 1 inclusive and the computed costs are scaled by their respective multipliers. A bigger multiplier means that the respective cost function has a bigger weight. But needBalance decides whether to balance only by region count and doesn't consider r/w qps or locality even if you configure these cost functions with bigger multipliers. StochasticLoadBalancer should override needBalance and decide whether to balance by its configured cost functions. Add one new config hbase.master.balancer.stochastic.minCostNeedBalance; the cluster needs balancing when (total cost / sum of multipliers) > minCostNeedBalance. (was: StochasticLoadBalancer includes cost functions to compute the cost of region count, r/w qps, table load, region locality, memstore size, and storefile size. Every cost function returns a number between 0 and 1 inclusive and the computed costs are scaled by their respective multipliers. A bigger multiplier means that the respective cost function has a bigger weight. But needBalance decides whether to balance only by region count and doesn't consider r/w qps or locality even if you configure these cost functions with bigger multipliers. StochasticLoadBalancer should override needBalance and decide whether to balance by its configured cost functions.)
> Override needBalance in StochasticLoadBalancer > -- > > Key: HBASE-15529 > URL: https://issues.apache.org/jira/browse/HBASE-15529 > Project: HBase > Issue Type: Improvement > Reporter: Guanghao Zhang > Priority: Minor > Attachments: HBASE-15529.patch > > > StochasticLoadBalancer includes cost functions to compute the cost of region count, r/w qps, table load, region locality, memstore size, and storefile size. Every cost function returns a number between 0 and 1 inclusive and the computed costs are scaled by their respective multipliers. A bigger multiplier means that the respective cost function has a bigger weight. But needBalance decides whether to balance only by region count and doesn't consider r/w qps or locality even if you configure these cost functions with bigger multipliers. StochasticLoadBalancer should override needBalance and decide whether to balance by its configured cost functions. > Add one new config hbase.master.balancer.stochastic.minCostNeedBalance; the cluster needs balancing when (total cost / sum of multipliers) > minCostNeedBalance.
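The proposed check can be sketched as follows. This is a minimal standalone illustration of the (total cost / sum of multipliers) > minCostNeedBalance formula from the issue; the class name, method shape, and threshold value are assumptions, not HBase's actual Configuration plumbing:

```java
/**
 * Minimal sketch of the proposed needsBalance override: instead of looking only
 * at region counts, compute the weighted average of all cost functions and
 * compare it against minCostNeedBalance. The threshold default is illustrative.
 */
public class NeedBalanceCheck {
  // Illustrative value for hbase.master.balancer.stochastic.minCostNeedBalance
  public static final double MIN_COST_NEED_BALANCE = 0.05;

  /**
   * Each cost is in [0, 1] and is scaled by its multiplier; balancing is
   * needed when (total cost / sum of multipliers) exceeds the threshold.
   */
  public static boolean needsBalance(double[] costs, double[] multipliers) {
    double total = 0, sumMultiplier = 0;
    for (int i = 0; i < costs.length; i++) {
      if (multipliers[i] <= 0) continue; // a zero multiplier disables the function
      total += costs[i] * multipliers[i];
      sumMultiplier += multipliers[i];
    }
    if (sumMultiplier <= 0) return false;
    return total / sumMultiplier > MIN_COST_NEED_BALANCE;
  }

  public static void main(String[] args) {
    // A high locality cost with a big multiplier triggers balancing even though
    // the region-count cost alone is tiny: (0.02*500 + 0.6*100) / 600 ≈ 0.117.
    System.out.println(needsBalance(new double[]{0.02, 0.6}, new double[]{500, 100})); // prints true
  }
}
```

This is why the weighted form matters: with region count as the only signal, the 0.6 locality cost above would never trigger a balance run no matter how large its multiplier is.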
[jira] [Assigned] (HBASE-15529) Override needBalance in StochasticLoadBalancer
[ https://issues.apache.org/jira/browse/HBASE-15529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang reassigned HBASE-15529: -- Assignee: Guanghao Zhang > Override needBalance in StochasticLoadBalancer > -- > > Key: HBASE-15529 > URL: https://issues.apache.org/jira/browse/HBASE-15529 > Project: HBase > Issue Type: Improvement > Reporter: Guanghao Zhang > Assignee: Guanghao Zhang > Priority: Minor > Attachments: HBASE-15529.patch > > > StochasticLoadBalancer includes cost functions to compute the cost of region count, r/w qps, table load, region locality, memstore size, and storefile size. Every cost function returns a number between 0 and 1 inclusive and the computed costs are scaled by their respective multipliers. A bigger multiplier means that the respective cost function has a bigger weight. But needBalance decides whether to balance only by region count and doesn't consider r/w qps or locality even if you configure these cost functions with bigger multipliers. StochasticLoadBalancer should override needBalance and decide whether to balance by its configured cost functions. > Add one new config hbase.master.balancer.stochastic.minCostNeedBalance; the cluster needs balancing when (total cost / sum of multipliers) > minCostNeedBalance.
[jira] [Updated] (HBASE-15529) Override needBalance in StochasticLoadBalancer
[ https://issues.apache.org/jira/browse/HBASE-15529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-15529: --- Status: Patch Available (was: Open) > Override needBalance in StochasticLoadBalancer > -- > > Key: HBASE-15529 > URL: https://issues.apache.org/jira/browse/HBASE-15529 > Project: HBase > Issue Type: Improvement > Reporter: Guanghao Zhang > Assignee: Guanghao Zhang > Priority: Minor > Attachments: HBASE-15529.patch > > > StochasticLoadBalancer includes cost functions to compute the cost of region count, r/w qps, table load, region locality, memstore size, and storefile size. Every cost function returns a number between 0 and 1 inclusive and the computed costs are scaled by their respective multipliers. A bigger multiplier means that the respective cost function has a bigger weight. But needBalance decides whether to balance only by region count and doesn't consider r/w qps or locality even if you configure these cost functions with bigger multipliers. StochasticLoadBalancer should override needBalance and decide whether to balance by its configured cost functions. > Add one new config hbase.master.balancer.stochastic.minCostNeedBalance; the cluster needs balancing when (total cost / sum of multipliers) > minCostNeedBalance.
[jira] [Updated] (HBASE-15529) Override needBalance in StochasticLoadBalancer
[ https://issues.apache.org/jira/browse/HBASE-15529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-15529: --- Attachment: HBASE-15529-v1.patch > Override needBalance in StochasticLoadBalancer > -- > > Key: HBASE-15529 > URL: https://issues.apache.org/jira/browse/HBASE-15529 > Project: HBase > Issue Type: Improvement > Reporter: Guanghao Zhang > Assignee: Guanghao Zhang > Priority: Minor > Attachments: HBASE-15529-v1.patch, HBASE-15529.patch > > > StochasticLoadBalancer includes cost functions to compute the cost of region count, r/w qps, table load, region locality, memstore size, and storefile size. Every cost function returns a number between 0 and 1 inclusive and the computed costs are scaled by their respective multipliers. A bigger multiplier means that the respective cost function has a bigger weight. But needBalance decides whether to balance only by region count and doesn't consider r/w qps or locality even if you configure these cost functions with bigger multipliers. StochasticLoadBalancer should override needBalance and decide whether to balance by its configured cost functions. > Add one new config hbase.master.balancer.stochastic.minCostNeedBalance; the cluster needs balancing when (total cost / sum of multipliers) > minCostNeedBalance.
[jira] [Commented] (HBASE-15529) Override needBalance in StochasticLoadBalancer
[ https://issues.apache.org/jira/browse/HBASE-15529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15218004#comment-15218004 ] Guanghao Zhang commented on HBASE-15529: Fixed the failing unit tests. > Override needBalance in StochasticLoadBalancer > -- > > Key: HBASE-15529 > URL: https://issues.apache.org/jira/browse/HBASE-15529 > Project: HBase > Issue Type: Improvement > Reporter: Guanghao Zhang > Assignee: Guanghao Zhang > Priority: Minor > Attachments: HBASE-15529-v1.patch, HBASE-15529.patch > > > StochasticLoadBalancer includes cost functions to compute the cost of region count, r/w qps, table load, region locality, memstore size, and storefile size. Every cost function returns a number between 0 and 1 inclusive and the computed costs are scaled by their respective multipliers. A bigger multiplier means that the respective cost function has a bigger weight. But needBalance decides whether to balance only by region count and doesn't consider r/w qps or locality even if you configure these cost functions with bigger multipliers. StochasticLoadBalancer should override needBalance and decide whether to balance by its configured cost functions. > Add one new config hbase.master.balancer.stochastic.minCostNeedBalance; the cluster needs balancing when (total cost / sum of multipliers) > minCostNeedBalance.
[jira] [Updated] (HBASE-16447) Replication by namespace in peer
[ https://issues.apache.org/jira/browse/HBASE-16447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-16447: --- Attachment: HBASE-16447-v1.patch Uploaded a v1 patch. > Replication by namespace in peer > > > Key: HBASE-16447 > URL: https://issues.apache.org/jira/browse/HBASE-16447 > Project: HBase > Issue Type: New Feature > Components: Replication > Reporter: Guanghao Zhang > Assignee: Guanghao Zhang > Attachments: HBASE-16447-v1.patch > > > Now we can only config table cfs in a peer. But in our production cluster, there are a dozen namespaces and every namespace has dozens of tables. It is complicated to config all table cfs in a peer. Some namespaces need to replicate all their tables to the slave cluster. It would be easy to config if we supported replication by namespace. Suggestions and discussions are welcomed.
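The namespace-level decision described above might look like the following sketch. NamespaceReplicationFilter and shouldReplicate are hypothetical names for illustration only, not the actual HBase replication API:

```java
import java.util.Map;
import java.util.Set;

/**
 * Illustrative sketch of a namespace-aware replication decision: a table
 * replicates if its whole namespace is configured on the peer, or if the
 * table appears in the existing per-table table-cfs map.
 */
public class NamespaceReplicationFilter {
  private final Set<String> peerNamespaces;        // namespaces replicated wholesale
  private final Map<String, Set<String>> tableCfs; // "ns:table" -> column families

  public NamespaceReplicationFilter(Set<String> peerNamespaces,
                                    Map<String, Set<String>> tableCfs) {
    this.peerNamespaces = peerNamespaces;
    this.tableCfs = tableCfs;
  }

  public boolean shouldReplicate(String namespace, String table) {
    if (peerNamespaces.contains(namespace)) {
      return true; // whole namespace replicates; no per-table config needed
    }
    return tableCfs.containsKey(namespace + ":" + table);
  }

  public static void main(String[] args) {
    NamespaceReplicationFilter filter = new NamespaceReplicationFilter(
        Set.of("ns1"), Map.of("ns2:t1", Set.of("cf1")));
    System.out.println(filter.shouldReplicate("ns1", "anyTable")); // prints true
    System.out.println(filter.shouldReplicate("ns2", "t2"));       // prints false
  }
}
```

The appeal of the feature is visible in the sketch: one namespace entry replaces dozens of per-table entries, and newly created tables in that namespace replicate without any peer-config change.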
[jira] [Updated] (HBASE-16447) Replication by namespace in peer
[ https://issues.apache.org/jira/browse/HBASE-16447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-16447: --- Affects Version/s: 2.0.0 Status: Patch Available (was: Open) > Replication by namespace in peer > > > Key: HBASE-16447 > URL: https://issues.apache.org/jira/browse/HBASE-16447 > Project: HBase > Issue Type: New Feature > Components: Replication > Affects Versions: 2.0.0 > Reporter: Guanghao Zhang > Assignee: Guanghao Zhang > Attachments: HBASE-16447-v1.patch > > > Now we can only config table cfs in a peer. But in our production cluster, there are a dozen namespaces and every namespace has dozens of tables. It is complicated to config all table cfs in a peer. Some namespaces need to replicate all their tables to the slave cluster. It would be easy to config if we supported replication by namespace. Suggestions and discussions are welcomed.