[jira] [Commented] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15395517#comment-15395517 ] Guanghao Zhang commented on HBASE-9899: --- [~enis] Are you still working on this? If not, I can take this issue. > for idempotent operation dups, return the result instead of throwing conflict > exception > --- > > Key: HBASE-9899 > URL: https://issues.apache.org/jira/browse/HBASE-9899 > Project: HBase > Issue Type: Improvement > Reporter: Sergey Shelukhin > Assignee: Enis Soztutar > > After HBASE-3787, we could store mvcc in operation context, and use it to > convert the modification request into read on dups instead of throwing > OperationConflictException. > MVCC tracking will have to be aware of such MVCC numbers present. Given that > scanners are usually relatively short-lived, that would prevent low watermark > from advancing for quite a bit more time -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-9899: -- Attachment: HBASE-9899-v1.patch
[jira] [Updated] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-9899: -- Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400331#comment-15400331 ] Guanghao Zhang commented on HBASE-9899: --- There are some situations where we use these non-idempotent operations (increment/append/checkAndPut/...). On 0.94 we disabled retries for non-idempotent operations. After upgrading our cluster to 0.98, we found that it uses a nonce to solve this, but it may throw OperationConflictException even when the increment/append succeeded. An example (client rpc retries set to 3): 1. The first increment rpc request succeeds. 2. The client times out and sends a second rpc request, but the nonce is the same and is saved on the server. The server finds the operation already succeeded, so it returns an OperationConflictException to make sure the increment is applied only once on the server. This patch solves the problem by reading the previous result when a duplicate rpc request is received: 1. Store the mvcc in OperationContext. When the first rpc request succeeds, store the mvcc for the operation's nonce. 2. When a duplicate rpc request arrives, convert it to a read of the result via the mvcc.
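The two steps above can be sketched as a tiny nonce-to-mvcc map; this is a minimal illustration of the idea, not HBase's actual NonceManager API, and all names here are hypothetical:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: remember the mvcc number of a completed nonce so a duplicate
// request can be answered by a read at that mvcc instead of throwing
// OperationConflictException. Names are illustrative only.
public class NonceMvccTracker {
    // nonce -> mvcc write number of the operation that already completed
    private final Map<Long, Long> completed = new ConcurrentHashMap<>();

    /** Record the mvcc number once the first request for this nonce succeeds. */
    public void recordSuccess(long nonce, long mvcc) {
        completed.put(nonce, mvcc);
    }

    /**
     * For a duplicate request, return the mvcc to read at, or null if the
     * nonce is unknown and the operation should proceed normally.
     */
    public Long mvccForDuplicate(long nonce) {
        return completed.get(nonce);
    }
}
```

A duplicate increment would first consult `mvccForDuplicate`; a non-null answer means "serve a get at this mvcc" rather than re-applying the mutation.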
[jira] [Comment Edited] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400331#comment-15400331 ] Guanghao Zhang edited comment on HBASE-9899 at 7/30/16 12:58 AM: - There are some situations where we use these non-idempotent operations (increment/append/checkAndPut/...). On 0.94 we disabled retries for non-idempotent operations. After upgrading our cluster to 0.98, we found that it uses a nonce to solve this, but it may throw OperationConflictException even when the increment/append succeeded. An example (client rpc retries set to 3): 1. The first increment rpc request succeeds. 2. The client times out and sends a second rpc request, but the nonce is the same and is saved on the server. The server finds the operation already succeeded, so it returns an OperationConflictException to make sure the increment is applied only once on the server. This patch solves the problem by reading the previous result when a duplicate rpc request is received: 1. Store the mvcc in OperationContext. When the first rpc request succeeds, store the mvcc for the operation's nonce. 2. When a duplicate rpc request arrives, convert it to a read of the result via the mvcc.
[jira] [Commented] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400927#comment-15400927 ] Guanghao Zhang commented on HBASE-9899: --- Thanks [~stack]. The failed unit test TestResettingCounters seems related to this patch; it failed on my local machine too. I will try to fix it and upload a new patch.
[jira] [Assigned] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang reassigned HBASE-9899: - Assignee: Guanghao Zhang (was: Enis Soztutar)
[jira] [Updated] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-9899: -- Attachment: HBASE-9899-v2.patch Attach a v2 patch.
[jira] [Updated] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-9899: -- Attachment: HBASE-9899-v3.patch Attach a v3 patch. Fix the failed unit tests TestScannerHeartbeatMessages and TestMultiParallel.
[jira] [Updated] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-9899: -- Attachment: (was: HBASE-9899-v3.patch)
[jira] [Updated] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-9899: -- Attachment: HBASE-9899-v3.patch Attach v3 again to trigger another Hadoop QA run.
[jira] [Commented] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15401588#comment-15401588 ] Guanghao Zhang commented on HBASE-9899: --- It seems unrelated. This patch just reads the mvcc number from the WriteEntry; it doesn't change the read/write point directly. The failed unit test passes on my local machine and I can't reproduce the failure. Let Hadoop QA run again to see whether it still fails.
[jira] [Updated] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-9899: -- Attachment: HBASE-9899-v3.patch
[jira] [Commented] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15401802#comment-15401802 ] Guanghao Zhang commented on HBASE-9899: --- Before this patch, a duplicate non-idempotent operation could not proceed; an OperationConflictException was thrown, so the previous state in the NonceManager was always WAIT. After this patch, a duplicate non-idempotent operation can proceed by converting it to a get operation, so the previous state may be PROCEED (first rpc request succeeded) when ending the new get operation.
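The WAIT/PROCEED behavior described above can be illustrated with a small decision function; this is a loose sketch of the described semantics under stated assumptions, not HBase's real NonceManager internals, and all names are hypothetical:

```java
// Sketch of the duplicate-request decision described in the comment.
// Before the patch a completed nonce always led to DONT_PROCEED (a
// conflict exception); after the patch it can lead to PROCEED so the
// duplicate is served as a read. Names are illustrative only.
public class NonceState {
    public enum State { WAIT, PROCEED, DONT_PROCEED }

    public static State onDuplicate(boolean firstRequestCompleted, boolean returnResultOnDup) {
        if (!firstRequestCompleted) {
            return State.WAIT; // first attempt still running: wait for it
        }
        return returnResultOnDup ? State.PROCEED : State.DONT_PROCEED;
    }
}
```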
[jira] [Commented] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15401828#comment-15401828 ] Guanghao Zhang commented on HBASE-9899: --- I found that endNonceOperation doesn't need to be called for the get operation (converted from the duplicate non-idempotent operation). The get is not a nonce operation, so the assertion doesn't need to be commented out. I will attach a new patch and add a unit test for Append. Thanks for your review.
[jira] [Updated] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-9899: -- Attachment: HBASE-9899-v4.patch Fixed per the review comments and added a unit test for Append.
[jira] [Updated] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-9899: -- Attachment: HBASE-9899-v4.patch Attach v4 again.
[jira] [Commented] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15403690#comment-15403690 ] Guanghao Zhang commented on HBASE-9899: --- [~yuzhih...@gmail.com] [~stack] Please help review the new v4 patch. Thanks.
[jira] [Commented] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15407795#comment-15407795 ] Guanghao Zhang commented on HBASE-9899: --- [~stack] ping..
[jira] [Commented] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15408729#comment-15408729 ] Guanghao Zhang commented on HBASE-9899: --- Thanks. I will upload a patch for branch-1 today.
[jira] [Updated] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-9899: -- Attachment: HBASE-9899-branch-1.patch Add patch for branch-1.
[jira] [Updated] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-9899: -- Attachment: HBASE-9899-branch-1.patch Attach the branch-1 patch again to trigger Hadoop QA.
[jira] [Updated] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-9899: -- Attachment: HBASE-9899-addendum.patch Attach an addendum for master. When getting the nonce from a mutation, we should first check whether the mutation has a nonce at all.
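The addendum's check can be sketched as follows; this assumes a protobuf-style `hasNonce()`/`getNonce()` pair and a `NO_NONCE` sentinel, and the names here are hypothetical rather than the real HBase Mutation/protobuf API:

```java
// Sketch of the addendum's fix: check for a nonce before reading it,
// so mutations without a nonce fall back to a sentinel instead of
// reading an unset field. Names are illustrative only.
public class NonceReader {
    public static final long NO_NONCE = 0L;

    // Stand-in for a protobuf message with an optional nonce field.
    interface MutationProto {
        boolean hasNonce();
        long getNonce();
    }

    /** Return the mutation's nonce, or NO_NONCE when none was set. */
    public static long nonceOf(MutationProto m) {
        return m.hasNonce() ? m.getNonce() : NO_NONCE;
    }
}
```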
[jira] [Created] (HBASE-16368) test*WhenRegionMove in TestPartialResultsFromClientSide is flaky
Guanghao Zhang created HBASE-16368: -- Summary: test*WhenRegionMove in TestPartialResultsFromClientSide is flaky Key: HBASE-16368 URL: https://issues.apache.org/jira/browse/HBASE-16368 Project: HBase Issue Type: Bug Components: Scanners Affects Versions: 1.4.0 Reporter: Guanghao Zhang This test failed when Hadoop QA ran preCommit: https://builds.apache.org/job/PreCommit-HBASE-Build/2971/testReport/org.apache.hadoop.hbase/TestPartialResultsFromClientSide/testReversedCompleteResultWhenRegionMove/. I also found it on the Flaky Tests Dashboard: http://hbase.x10host.com/flaky-tests/. It can fail on my local machine as well. The test results show that the region location is not updated when the scanner callable gets a NotServingRegionException or RegionMovedException. {code} org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions: Sat Aug 06 05:55:52 UTC 2016, null, java.net.SocketTimeoutException: callTimeout=2000, callDuration=2157: org.apache.hadoop.hbase.NotServingRegionException: testReversedCompleteResultWhenRegionMove,,1470462949504.5069bd63bf6eda5108acec4fcc087b0e. 
is closing at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:8233) at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2634) at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2629) at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2623) at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2490) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:34950) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2264) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:118) at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:189) at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:169) row '' on table 'testReversedCompleteResultWhenRegionMove' at region=testReversedCompleteResultWhenRegionMove,,1470462949504.5069bd63bf6eda5108acec4fcc087b0e., hostname=asf907.gq1.ygridcore.net,38914,1470462943053, seqNum=2 at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:281) at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:213) at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:61) at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:212) at org.apache.hadoop.hbase.client.ReversedClientScanner.nextScanner(ReversedClientScanner.java:118) at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:166) at org.apache.hadoop.hbase.client.ClientScanner.(ClientScanner.java:161) at org.apache.hadoop.hbase.client.ReversedClientScanner.(ReversedClientScanner.java:56) at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:785) at 
org.apache.hadoop.hbase.TestPartialResultsFromClientSide.testReversedCompleteResultWhenRegionMove(TestPartialResultsFromClientSide.java:986) {code} {code} org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions: Sat Aug 06 16:27:22 CST 2016, null, java.net.SocketTimeoutException: callTimeout=2000, callDuration=3035: Region moved to: hostname=localhost port=58351 startCode=1470472007714. As of locationSeqNum=6. row 'testRow0' on table 'testPartialResultWhenRegionMove' at region=testPartialResultWhenRegionMove,,1470472035048.977faf05c1d6d9990b5559b17aa18913., hostname=localhost,40425,1470472007646, seqNum=2 at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:281) at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:213) at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:61) at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:212) at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:326) at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:301) at org.apache.hadoop.hbase.client.ClientScanner.possiblyNextScanner(ClientScanner.java:247) at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:541) at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:370) at org.apache.hadoop
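The stack traces above point at a stale client-side location cache: the scanner callable keeps retrying against the old server because the cached region location is never invalidated after a NotServingRegionException or RegionMovedException. A minimal standalone sketch of that mechanism (class and method names here are hypothetical, not the actual HBase client API):

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of a client-side region location cache. On a
// NotServingRegionException-style failure the stale entry must be
// evicted so the next retry re-resolves the region's new server.
public class LocationCacheSketch {
    private final Map<String, String> cache = new HashMap<>(); // row -> cached server
    private final Map<String, String> meta = new HashMap<>();  // authoritative locations

    public LocationCacheSketch(Map<String, String> meta) {
        this.meta.putAll(meta);
    }

    public String locate(String row) {
        // Resolve from "meta" only when there is no cached entry.
        return cache.computeIfAbsent(row, meta::get);
    }

    // What the retry path must call when the server answers
    // "region moved / not serving".
    public void clearCachedLocation(String row) {
        cache.remove(row);
    }

    public void regionMoved(String row, String newServer) {
        meta.put(row, newServer); // server-side move; client cache is now stale
    }

    public static void main(String[] args) {
        Map<String, String> meta = new HashMap<>();
        meta.put("row1", "rs1");
        LocationCacheSketch c = new LocationCacheSketch(meta);
        System.out.println(c.locate("row1")); // rs1
        c.regionMoved("row1", "rs2");
        System.out.println(c.locate("row1")); // still rs1: stale, retries keep failing
        c.clearCachedLocation("row1");        // the missing invalidation step
        System.out.println(c.locate("row1")); // rs2: retry now succeeds
    }
}
```

If the reversed/partial-result scanner paths skip the invalidation step, every retry hits the old server until the retry budget is exhausted, which matches the RetriesExhaustedException seen in the test report.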
[jira] [Commented] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15410565#comment-15410565 ] Guanghao Zhang commented on HBASE-9899: --- TestPartialResultsFromClientSide is a flaky test and it is not related to this patch. I created a new issue, HBASE-16368, for it. > for idempotent operation dups, return the result instead of throwing conflict > exception > --- > > Key: HBASE-9899 > URL: https://issues.apache.org/jira/browse/HBASE-9899 > Project: HBase > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Guanghao Zhang > Fix For: 2.0.0 > > Attachments: HBASE-9899-addendum.patch, HBASE-9899-branch-1.patch, > HBASE-9899-branch-1.patch, HBASE-9899-v1.patch, HBASE-9899-v2.patch, > HBASE-9899-v3.patch, HBASE-9899-v3.patch, HBASE-9899-v4.patch, > HBASE-9899-v4.patch > > > After HBASE-3787, we could store mvcc in operation context, and use it to > convert the modification request into read on dups instead of throwing > OperationConflictException. > MVCC tracking will have to be aware of such MVCC numbers present. Given that > scanners are usually relatively short-lived, that would prevent low watermark > from advancing for quite a bit more time -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-9899: -- Attachment: (was: HBASE-9899-branch-1.patch) > for idempotent operation dups, return the result instead of throwing conflict > exception > --- > > Key: HBASE-9899 > URL: https://issues.apache.org/jira/browse/HBASE-9899 > Project: HBase > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Guanghao Zhang > Fix For: 2.0.0 > > Attachments: HBASE-9899-addendum.patch, HBASE-9899-branch-1.patch, > HBASE-9899-branch-1.patch, HBASE-9899-v1.patch, HBASE-9899-v2.patch, > HBASE-9899-v3.patch, HBASE-9899-v3.patch, HBASE-9899-v4.patch, > HBASE-9899-v4.patch > > > After HBASE-3787, we could store mvcc in operation context, and use it to > convert the modification request into read on dups instead of throwing > OperationConflictException. > MVCC tracking will have to be aware of such MVCC numbers present. Given that > scanners are usually relatively short-lived, that would prevent low watermark > from advancing for quite a bit more time -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-9899: -- Attachment: HBASE-9899-branch-1.patch Retry. > for idempotent operation dups, return the result instead of throwing conflict > exception > --- > > Key: HBASE-9899 > URL: https://issues.apache.org/jira/browse/HBASE-9899 > Project: HBase > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Guanghao Zhang > Fix For: 2.0.0 > > Attachments: HBASE-9899-addendum.patch, HBASE-9899-branch-1.patch, > HBASE-9899-branch-1.patch, HBASE-9899-v1.patch, HBASE-9899-v2.patch, > HBASE-9899-v3.patch, HBASE-9899-v3.patch, HBASE-9899-v4.patch, > HBASE-9899-v4.patch > > > After HBASE-3787, we could store mvcc in operation context, and use it to > convert the modification request into read on dups instead of throwing > OperationConflictException. > MVCC tracking will have to be aware of such MVCC numbers present. Given that > scanners are usually relatively short-lived, that would prevent low watermark > from advancing for quite a bit more time -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15411214#comment-15411214 ] Guanghao Zhang commented on HBASE-9899: --- I ran the unit test TestClusterId on branch-1 and it failed too. The build history at https://builds.apache.org/job/HBase-1.4/ shows that TestClusterId always fails. > for idempotent operation dups, return the result instead of throwing conflict > exception > --- > > Key: HBASE-9899 > URL: https://issues.apache.org/jira/browse/HBASE-9899 > Project: HBase > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Guanghao Zhang > Fix For: 2.0.0 > > Attachments: HBASE-9899-addendum.patch, HBASE-9899-branch-1.patch, > HBASE-9899-branch-1.patch, HBASE-9899-branch-1.patch, HBASE-9899-v1.patch, > HBASE-9899-v2.patch, HBASE-9899-v3.patch, HBASE-9899-v3.patch, > HBASE-9899-v4.patch, HBASE-9899-v4.patch > > > After HBASE-3787, we could store mvcc in operation context, and use it to > convert the modification request into read on dups instead of throwing > OperationConflictException. > MVCC tracking will have to be aware of such MVCC numbers present. Given that > scanners are usually relatively short-lived, that would prevent low watermark > from advancing for quite a bit more time -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HBASE-15588) Use nonce for checkAndMutate operation
[ https://issues.apache.org/jira/browse/HBASE-15588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang reassigned HBASE-15588: -- Assignee: Guanghao Zhang > Use nonce for checkAndMutate operation > -- > > Key: HBASE-15588 > URL: https://issues.apache.org/jira/browse/HBASE-15588 > Project: HBase > Issue Type: Bug > Components: Client >Affects Versions: 2.0.0 >Reporter: Jianwei Cui >Assignee: Guanghao Zhang > > Like {{increment}}/{{append}}, the {{checkAndPut}}/{{checkAndDelete}} > operation is non-idempotent, so that the client may get incorrect result if > there are retries, and such incorrect result may lead the application enter > an error state. A possible solution is using nonce for checkAndMutate > operations, discussions and suggestions are welcomed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
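checkAndPut/checkAndDelete retries can be made safe with the same nonce mechanism that HBASE-3787 introduced for increment/append: the server records the outcome per nonce and replays it when a duplicate arrives. A minimal sketch of that idea, with hypothetical names and a toy checkAndIncrement standing in for checkAndMutate:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of nonce-based dedup for a non-idempotent operation. The
// server keeps the outcome keyed by nonce; a retried duplicate gets
// the stored outcome instead of re-executing (or instead of an
// OperationConflictException).
public class NonceSketch {
    private final Map<Long, Boolean> results = new HashMap<>(); // nonce -> stored outcome
    private int value = 0;

    // Toy "checkAndIncrement": succeeds only when the current value
    // matches the expected one, like a checkAndMutate condition.
    public boolean checkAndIncrement(int expected, long nonce) {
        Boolean prev = results.get(nonce);
        if (prev != null) {
            return prev;        // duplicate: replay the stored result
        }
        boolean ok = (value == expected);
        if (ok) {
            value++;            // apply the mutation exactly once
        }
        results.put(nonce, ok);
        return ok;
    }

    public int value() {
        return value;
    }

    public static void main(String[] args) {
        NonceSketch s = new NonceSketch();
        System.out.println(s.checkAndIncrement(0, 42L)); // true: first attempt applies
        System.out.println(s.checkAndIncrement(0, 42L)); // true: same nonce, replayed
        System.out.println(s.value());                   // 1: not applied twice
    }
}
```

Without the nonce, a client whose first attempt succeeded but whose response was lost would retry, see the condition fail, and wrongly conclude the operation did not happen.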
[jira] [Commented] (HBASE-16308) Contain protobuf references
[ https://issues.apache.org/jira/browse/HBASE-16308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15411599#comment-15411599 ] Guanghao Zhang commented on HBASE-16308: I will pick up HBASE-15588 to add nonce for checkAnd* operations. > Contain protobuf references > --- > > Key: HBASE-16308 > URL: https://issues.apache.org/jira/browse/HBASE-16308 > Project: HBase > Issue Type: Sub-task > Components: Protobufs >Reporter: stack >Assignee: stack > Fix For: 2.0.0 > > Attachments: HBASE-16308.master.001.patch, > HBASE-16308.master.002.patch, HBASE-16308.master.003.patch, > HBASE-16308.master.004.patch, HBASE-16308.master.005.patch, > HBASE-16308.master.006.patch, HBASE-16308.master.006.patch, > HBASE-16308.master.007.patch > > > Clean up our protobuf references so contained to just a few classes rather > than being spread about the codebase. Doing this work will make it easier > landing the parent issue and will make it more clear where the division > between shaded protobuf and unshaded protobuf lies (we need to continue with > unshaded protobuf for HDFS references by AsyncWAL and probably EndPoint > Coprocessors) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15412948#comment-15412948 ] Guanghao Zhang commented on HBASE-9899: --- Thanks [~stack]. But the master branch needs the addendum patch, too. It has been included in the patch for branch-1. > for idempotent operation dups, return the result instead of throwing conflict > exception > --- > > Key: HBASE-9899 > URL: https://issues.apache.org/jira/browse/HBASE-9899 > Project: HBase > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Guanghao Zhang > Fix For: 2.0.0, 1.3.0, 1.4.0 > > Attachments: HBASE-9899-addendum.patch, HBASE-9899-branch-1.patch, > HBASE-9899-branch-1.patch, HBASE-9899-branch-1.patch, HBASE-9899-v1.patch, > HBASE-9899-v2.patch, HBASE-9899-v3.patch, HBASE-9899-v3.patch, > HBASE-9899-v4.patch, HBASE-9899-v4.patch > > > After HBASE-3787, we could store mvcc in operation context, and use it to > convert the modification request into read on dups instead of throwing > OperationConflictException. > MVCC tracking will have to be aware of such MVCC numbers present. Given that > scanners are usually relatively short-lived, that would prevent low watermark > from advancing for quite a bit more time -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-10338) Region server fails to start with AccessController coprocessor if installed into RegionServerCoprocessorHost
[ https://issues.apache.org/jira/browse/HBASE-10338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414626#comment-15414626 ] Guanghao Zhang commented on HBASE-10338: We found an NPE in our 0.98 production cluster because RegionServerCoprocessorHost is not initialized before the RpcServer starts service. {code} // Try and register with the Master; tell it we are here. Break if // server is stopped or the clusterup flag is down or hdfs went wacky. while (keepLooping()) { RegionServerStartupResponse w = reportForDuty(); if (w == null) { LOG.warn("reportForDuty failed; sleeping and then retrying."); this.sleeper.sleep(); } else { handleReportForDutyResponse(w); break; } } // Initialize the RegionServerCoprocessorHost now that our ephemeral // node was created by reportForDuty, in case any coprocessors want // to use ZooKeeper this.rsHost = new RegionServerCoprocessorHost(this, this.conf); {code} The RpcServer starts service in handleReportForDutyResponse(), after which it can serve the replicateWALEntry() RPC call. But RegionServerCoprocessorHost is not yet initialized when replicateWALEntry uses it, so it throws an NPE. > Region server fails to start with AccessController coprocessor if installed > into RegionServerCoprocessorHost > > > Key: HBASE-10338 > URL: https://issues.apache.org/jira/browse/HBASE-10338 > Project: HBase > Issue Type: Bug > Components: Coprocessors, regionserver >Affects Versions: 0.98.0 >Reporter: Vandana Ayyalasomayajula >Assignee: Vandana Ayyalasomayajula >Priority: Minor > Fix For: 0.98.0, 0.96.2, 0.99.0 > > Attachments: 10338.1-0.96.patch, 10338.1-0.98.patch, 10338.1.patch, > 10338.1.patch, HBASE-10338.0.patch, HBASE-10338_addendum.patch > > > Runtime exception is being thrown when AccessController CP is used with > region server. This is happening as region server co processor host is > created before zookeeper is initialized in region server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
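The bug described above is purely an ordering problem: the RPC server begins serving inside handleReportForDutyResponse(), before this.rsHost is assigned. A toy reproduction of the race window and of the fix (simplified, hypothetical classes — not the real HRegionServer code):

```java
// Sketch of the initialization-order bug. The fix is simply to
// construct the coprocessor host before the RPC server starts
// taking calls, closing the window in which a handler can observe
// a null host.
public class InitOrderSketch {
    static Object rsHost;          // stands in for RegionServerCoprocessorHost
    static boolean rpcServing;     // stands in for the RpcServer serving state

    static void startRpcService() {
        rpcServing = true;
    }

    // An RPC handler that, like replicateWALEntry, dereferences rsHost.
    static void replicateWALEntry() {
        if (!rpcServing) {
            throw new IllegalStateException("not serving");
        }
        rsHost.toString();         // NPE here if rsHost is not yet initialized
    }

    public static void main(String[] args) {
        // Buggy order would be: startRpcService() first, assign rsHost
        // later -> any replicateWALEntry in between throws NPE.
        // Fixed order:
        rsHost = new Object();     // initialize the coprocessor host first
        startRpcService();         // only then accept RPCs
        replicateWALEntry();       // safe now
        System.out.println("ok");
    }
}
```

The same reasoning explains why a replication sink peer can crash a freshly started region server: replicateWALEntry can arrive as soon as the server reports for duty, well before the rest of startup finishes.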
[jira] [Updated] (HBASE-10338) Region server fails to start with AccessController coprocessor if installed into RegionServerCoprocessorHost
[ https://issues.apache.org/jira/browse/HBASE-10338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-10338: --- Attachment: HBASE-10338-0.98-addendum.patch Attach an addendum for the 0.98 branch. > Region server fails to start with AccessController coprocessor if installed > into RegionServerCoprocessorHost > > > Key: HBASE-10338 > URL: https://issues.apache.org/jira/browse/HBASE-10338 > Project: HBase > Issue Type: Bug > Components: Coprocessors, regionserver >Affects Versions: 0.98.0 >Reporter: Vandana Ayyalasomayajula >Assignee: Vandana Ayyalasomayajula >Priority: Minor > Fix For: 0.98.0, 0.96.2, 0.99.0 > > Attachments: 10338.1-0.96.patch, 10338.1-0.98.patch, 10338.1.patch, > 10338.1.patch, HBASE-10338-0.98-addendum.patch, HBASE-10338.0.patch, > HBASE-10338_addendum.patch > > > Runtime exception is being thrown when AccessController CP is used with > region server. This is happening as region server co processor host is > created before zookeeper is initialized in region server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16393) Improve computeHDFSBlocksDistribution
[ https://issues.apache.org/jira/browse/HBASE-16393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416327#comment-15416327 ] Guanghao Zhang commented on HBASE-16393: +1 on this idea. We found this in our production cluster, too. The balancer is too slow when there are a lot of regions. And some default balancer configs are too small for a big cluster. Maybe we can make the default config values depend on the number of regions. > Improve computeHDFSBlocksDistribution > - > > Key: HBASE-16393 > URL: https://issues.apache.org/jira/browse/HBASE-16393 > Project: HBase > Issue Type: Improvement >Reporter: binlijin > Attachments: HBASE-16393.patch > > > With our cluster is big, i can see the balancer is slow from time to time. > And the balancer will be called on master startup, so we can see the startup > is slow also. > The first thing i think whether if we can parallel compute different region's > HDFSBlocksDistribution. > The second i think we can improve compute single region's > HDFSBlocksDistribution. > When to compute a storefile's HDFSBlocksDistribution first we call > FileSystem#getFileStatus(path) and then > FileSystem#getFileBlockLocations(status, start, length), so two namenode rpc > call for every storefile. Instead we can use FileSystem#listLocatedStatus to > get a LocatedFileStatus for the information we need, so reduce the namenode > rpc call to one. This can speed the computeHDFSBlocksDistribution, but also > send out less rpc call to namenode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
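The two-RPCs-versus-one argument in the quoted description can be illustrated with a fake namenode that just counts calls (all names here are hypothetical stand-ins; the real APIs are FileSystem#getFileStatus, FileSystem#getFileBlockLocations, and FileSystem#listLocatedStatus):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the RPC-count argument: getFileStatus followed by
// getFileBlockLocations costs two namenode round trips per store
// file, while a single listLocatedStatus over the store directory
// returns status and block locations for all files in one call.
public class BlockDistSketch {
    static int rpcCalls = 0;
    static final Map<String, List<String>> blocks = new HashMap<>(); // file -> replica hosts

    // Old path: two RPCs for every store file.
    static List<String> getFileStatusThenLocations(String file) {
        rpcCalls++;                     // stands in for getFileStatus
        rpcCalls++;                     // stands in for getFileBlockLocations
        return blocks.get(file);
    }

    // New path: one RPC for the whole directory.
    static Map<String, List<String>> listLocatedStatus(String dir) {
        rpcCalls++;
        return blocks;
    }

    public static void main(String[] args) {
        blocks.put("f1", List.of("h1"));
        blocks.put("f2", List.of("h2"));
        rpcCalls = 0;
        getFileStatusThenLocations("f1");
        getFileStatusThenLocations("f2");
        System.out.println("old path RPCs: " + rpcCalls); // 4 for 2 files
        rpcCalls = 0;
        listLocatedStatus("/table/region/cf");
        System.out.println("new path RPCs: " + rpcCalls); // 1 for the directory
    }
}
```

For a region with many store files, halving (or better) the per-file namenode traffic both speeds up computeHDFSBlocksDistribution and reduces load on the namenode.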
[jira] [Commented] (HBASE-16402) RegionServerCoprocessorHost should be initialized before RpcServer starts
[ https://issues.apache.org/jira/browse/HBASE-16402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15418446#comment-15418446 ] Guanghao Zhang commented on HBASE-16402: Thanks [~apurtell]. > RegionServerCoprocessorHost should be initialized before RpcServer starts > - > > Key: HBASE-16402 > URL: https://issues.apache.org/jira/browse/HBASE-16402 > Project: HBase > Issue Type: Bug >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Fix For: 0.98.22 > > Attachments: HBASE-16402-0.98.patch > > > We found NPE in our 0.98 production cluster because > RegionServerCoprocessorHost is not initialized before RpcServer start service. > {code} > // Try and register with the Master; tell it we are here. Break if > // server is stopped or the clusterup flag is down or hdfs went wacky. > while (keepLooping()) { > RegionServerStartupResponse w = reportForDuty(); > if (w == null) { > LOG.warn("reportForDuty failed; sleeping and then retrying."); > this.sleeper.sleep(); > } else { > handleReportForDutyResponse(w); > break; > } > } > // Initialize the RegionServerCoprocessorHost now that our ephemeral > // node was created by reportForDuty, in case any coprocessors want > // to use ZooKeeper > this.rsHost = new RegionServerCoprocessorHost(this, this.conf); > {code} > RpcServer start service in handleReportForDutyResponse(), then it can serve > rpc call replicateWALEntry(). But the RegionServerCoprocessorHost is not > initialized and it is used in replicateWALEntry, so it will throw a NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-16416) Make NoncedRegionServerCallable extends RegionServerCallable
Guanghao Zhang created HBASE-16416: -- Summary: Make NoncedRegionServerCallable extends RegionServerCallable Key: HBASE-16416 URL: https://issues.apache.org/jira/browse/HBASE-16416 Project: HBase Issue Type: Improvement Components: Client Affects Versions: 2.0.0 Reporter: Guanghao Zhang Priority: Minor After HBASE-16308, there is a new class NoncedRegionServerCallable, which extends AbstractRegionServerCallable. But it has some duplicate methods with RegionServerCallable. So we can make NoncedRegionServerCallable extend RegionServerCallable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16416) Make NoncedRegionServerCallable extends RegionServerCallable
[ https://issues.apache.org/jira/browse/HBASE-16416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-16416: --- Status: Patch Available (was: Open) > Make NoncedRegionServerCallable extends RegionServerCallable > > > Key: HBASE-16416 > URL: https://issues.apache.org/jira/browse/HBASE-16416 > Project: HBase > Issue Type: Improvement > Components: Client >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Priority: Minor > Attachments: HBASE-16416.patch > > > After HBASE-16308, there are a new class NoncedRegionServerCallable which > extends AbstractRegionServerCallable. But it have some duplicate methods with > RegionServerCallable. So we can make NoncedRegionServerCallable extends > RegionServerCallable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16416) Make NoncedRegionServerCallable extends RegionServerCallable
[ https://issues.apache.org/jira/browse/HBASE-16416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-16416: --- Attachment: HBASE-16416.patch > Make NoncedRegionServerCallable extends RegionServerCallable > > > Key: HBASE-16416 > URL: https://issues.apache.org/jira/browse/HBASE-16416 > Project: HBase > Issue Type: Improvement > Components: Client >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Priority: Minor > Attachments: HBASE-16416.patch > > > After HBASE-16308, there are a new class NoncedRegionServerCallable which > extends AbstractRegionServerCallable. But it have some duplicate methods with > RegionServerCallable. So we can make NoncedRegionServerCallable extends > RegionServerCallable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HBASE-16416) Make NoncedRegionServerCallable extends RegionServerCallable
[ https://issues.apache.org/jira/browse/HBASE-16416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang reassigned HBASE-16416: -- Assignee: Guanghao Zhang > Make NoncedRegionServerCallable extends RegionServerCallable > > > Key: HBASE-16416 > URL: https://issues.apache.org/jira/browse/HBASE-16416 > Project: HBase > Issue Type: Improvement > Components: Client >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Minor > Attachments: HBASE-16416.patch > > > After HBASE-16308, there are a new class NoncedRegionServerCallable which > extends AbstractRegionServerCallable. But it have some duplicate methods with > RegionServerCallable. So we can make NoncedRegionServerCallable extends > RegionServerCallable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16416) Make NoncedRegionServerCallable extends RegionServerCallable
[ https://issues.apache.org/jira/browse/HBASE-16416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15421961#comment-15421961 ] Guanghao Zhang commented on HBASE-16416: [~stack] Any ideas? > Make NoncedRegionServerCallable extends RegionServerCallable > > > Key: HBASE-16416 > URL: https://issues.apache.org/jira/browse/HBASE-16416 > Project: HBase > Issue Type: Improvement > Components: Client >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Minor > Attachments: HBASE-16416.patch > > > After HBASE-16308, there are a new class NoncedRegionServerCallable which > extends AbstractRegionServerCallable. But it have some duplicate methods with > RegionServerCallable. So we can make NoncedRegionServerCallable extends > RegionServerCallable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16416) Make NoncedRegionServerCallable extends RegionServerCallable
[ https://issues.apache.org/jira/browse/HBASE-16416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-16416: --- Attachment: (was: HBASE-16416.patch) > Make NoncedRegionServerCallable extends RegionServerCallable > > > Key: HBASE-16416 > URL: https://issues.apache.org/jira/browse/HBASE-16416 > Project: HBase > Issue Type: Improvement > Components: Client >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Minor > > After HBASE-16308, there are a new class NoncedRegionServerCallable which > extends AbstractRegionServerCallable. But it have some duplicate methods with > RegionServerCallable. So we can make NoncedRegionServerCallable extends > RegionServerCallable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16416) Make NoncedRegionServerCallable extends RegionServerCallable
[ https://issues.apache.org/jira/browse/HBASE-16416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-16416: --- Attachment: HBASE-16416.patch Retry to trigger ut. > Make NoncedRegionServerCallable extends RegionServerCallable > > > Key: HBASE-16416 > URL: https://issues.apache.org/jira/browse/HBASE-16416 > Project: HBase > Issue Type: Improvement > Components: Client >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Minor > Attachments: HBASE-16416.patch > > > After HBASE-16308, there are a new class NoncedRegionServerCallable which > extends AbstractRegionServerCallable. But it have some duplicate methods with > RegionServerCallable. So we can make NoncedRegionServerCallable extends > RegionServerCallable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-16446) append_peer_tableCFs failed when there already have this table's partial cfs in the peer
Guanghao Zhang created HBASE-16446: -- Summary: append_peer_tableCFs failed when there already have this table's partial cfs in the peer Key: HBASE-16446 URL: https://issues.apache.org/jira/browse/HBASE-16446 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.98.21, 2.0.0 Reporter: Guanghao Zhang Assignee: Guanghao Zhang Priority: Minor {code} hbase(main):011:0> list_peers PEER_ID CLUSTER_KEY STATE TABLE_CFS PROTOCOL BANDWIDTH 20 hbase://c3tst-pressure98 ENABLED default.test_replication:A NATIVE 0 1 row(s) in 0.0080 seconds hbase(main):012:0> append_peer_tableCFs '20', {"test_replication" => []} 0 row(s) in 0.0060 seconds hbase(main):013:0> list_peers PEER_ID CLUSTER_KEY STATE TABLE_CFS PROTOCOL BANDWIDTH 20 hbase://c3tst-pressure98 ENABLED default.test_replication:A NATIVE 0 1 row(s) in 0.0030 seconds {code} "test_replication" => [] means replicating all column families of this table, so the result is not right. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
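In the table-CFs map an empty column-family list means "replicate every CF of the table", so appending {"test_replication" => []} must widen the existing partial entry rather than leave it at cf A. A sketch of merge logic with that semantics (a hypothetical helper, not the actual ReplicationAdmin code):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of append semantics for a peer's table-CFs map, where an
// empty CF set means "all column families of the table".
public class AppendTableCFsSketch {
    static Map<String, Set<String>> append(Map<String, Set<String>> current,
                                           Map<String, Set<String>> toAppend) {
        Map<String, Set<String>> merged = new HashMap<>(current);
        for (Map.Entry<String, Set<String>> e : toAppend.entrySet()) {
            Set<String> existing = merged.get(e.getKey());
            if (e.getValue().isEmpty() || existing == null) {
                // Appending "all CFs" widens any partial entry; a new
                // table is added as-is.
                merged.put(e.getKey(), new HashSet<>(e.getValue()));
            } else if (existing.isEmpty()) {
                // Entry is already "all CFs"; a subset adds nothing.
            } else {
                // Partial entry plus partial append: take the union.
                Set<String> union = new HashSet<>(existing);
                union.addAll(e.getValue());
                merged.put(e.getKey(), union);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> cur = new HashMap<>();
        cur.put("test_replication", Set.of("A"));
        Map<String, Set<String>> out =
            append(cur, Map.of("test_replication", Set.<String>of()));
        // The entry is now the empty set, i.e. "all CFs", not just A.
        System.out.println(out);
    }
}
```

The buggy behavior in the shell transcript above corresponds to treating the empty list as "nothing to add" and leaving the peer at default.test_replication:A.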
[jira] [Updated] (HBASE-16446) append_peer_tableCFs failed when there already have this table's partial cfs in the peer
[ https://issues.apache.org/jira/browse/HBASE-16446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-16446: --- Description: {code} hbase(main):011:0> list_peers PEER_ID CLUSTER_KEY STATE TABLE_CFS PROTOCOL BANDWIDTH 20 hbase://c3tst-pressure98 ENABLED default.test_replication:A NATIVE 0 1 row(s) in 0.0080 seconds hbase(main):012:0> append_peer_tableCFs '20', {"test_replication" => []} 0 row(s) in 0.0060 seconds hbase(main):013:0> list_peers PEER_ID CLUSTER_KEY STATE TABLE_CFS PROTOCOL BANDWIDTH 20 hbase://c3tst-pressure98 ENABLED default.test_replication:A NATIVE 0 1 row(s) in 0.0030 seconds {code} "test_replication" => [] means replication all cf of this table,so the result is not right. It should not just contain cf A append_peer_tableCFs. was: {code} hbase(main):011:0> list_peers PEER_ID CLUSTER_KEY STATE TABLE_CFS PROTOCOL BANDWIDTH 20 hbase://c3tst-pressure98 ENABLED default.test_replication:A NATIVE 0 1 row(s) in 0.0080 seconds hbase(main):012:0> append_peer_tableCFs '20', {"test_replication" => []} 0 row(s) in 0.0060 seconds hbase(main):013:0> list_peers PEER_ID CLUSTER_KEY STATE TABLE_CFS PROTOCOL BANDWIDTH 20 hbase://c3tst-pressure98 ENABLED default.test_replication:A NATIVE 0 1 row(s) in 0.0030 seconds {code} "test_replication" => [] means replication all cf of this table,so the result is not right. It should not just contain cf A after append. 
> append_peer_tableCFs failed when there already have this table's partial cfs > in the peer > > > Key: HBASE-16446 > URL: https://issues.apache.org/jira/browse/HBASE-16446 > Project: HBase > Issue Type: Bug > Components: Replication >Affects Versions: 2.0.0, 0.98.21 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Minor > > {code} > hbase(main):011:0> list_peers > PEER_ID CLUSTER_KEY STATE TABLE_CFS PROTOCOL BANDWIDTH > 20 hbase://c3tst-pressure98 ENABLED default.test_replication:A NATIVE 0 > 1 row(s) in 0.0080 seconds > hbase(main):012:0> append_peer_tableCFs '20', {"test_replication" => []} > 0 row(s) in 0.0060 seconds > hbase(main):013:0> list_peers > PEER_ID CLUSTER_KEY STATE TABLE_CFS PROTOCOL BANDWIDTH > 20 hbase://c3tst-pressure98 ENABLED default.test_replication:A NATIVE 0 > 1 row(s) in 0.0030 seconds > {code} > "test_replication" => [] means replication all cf of this table,so the result > is not right. It should not just contain cf A append_peer_tableCFs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16446) append_peer_tableCFs failed when there already have this table's partial cfs in the peer
[ https://issues.apache.org/jira/browse/HBASE-16446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-16446: --- Description: {code} hbase(main):011:0> list_peers PEER_ID CLUSTER_KEY STATE TABLE_CFS PROTOCOL BANDWIDTH 20 hbase://c3tst-pressure98 ENABLED default.test_replication:A NATIVE 0 1 row(s) in 0.0080 seconds hbase(main):012:0> append_peer_tableCFs '20', {"test_replication" => []} 0 row(s) in 0.0060 seconds hbase(main):013:0> list_peers PEER_ID CLUSTER_KEY STATE TABLE_CFS PROTOCOL BANDWIDTH 20 hbase://c3tst-pressure98 ENABLED default.test_replication:A NATIVE 0 1 row(s) in 0.0030 seconds {code} "test_replication" => [] means replication all cf of this table,so the result is not right. It should not just contain cf A after append. was: {code} hbase(main):011:0> list_peers PEER_ID CLUSTER_KEY STATE TABLE_CFS PROTOCOL BANDWIDTH 20 hbase://c3tst-pressure98 ENABLED default.test_replication:A NATIVE 0 1 row(s) in 0.0080 seconds hbase(main):012:0> append_peer_tableCFs '20', {"test_replication" => []} 0 row(s) in 0.0060 seconds hbase(main):013:0> list_peers PEER_ID CLUSTER_KEY STATE TABLE_CFS PROTOCOL BANDWIDTH 20 hbase://c3tst-pressure98 ENABLED default.test_replication:A NATIVE 0 1 row(s) in 0.0030 seconds {code} "test_replication" => [] means replication all cf of this table,so the result doesn't right. 
> append_peer_tableCFs failed when there already have this table's partial cfs > in the peer > > > Key: HBASE-16446 > URL: https://issues.apache.org/jira/browse/HBASE-16446 > Project: HBase > Issue Type: Bug > Components: Replication >Affects Versions: 2.0.0, 0.98.21 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Minor > > {code} > hbase(main):011:0> list_peers > PEER_ID CLUSTER_KEY STATE TABLE_CFS PROTOCOL BANDWIDTH > 20 hbase://c3tst-pressure98 ENABLED default.test_replication:A NATIVE 0 > 1 row(s) in 0.0080 seconds > hbase(main):012:0> append_peer_tableCFs '20', {"test_replication" => []} > 0 row(s) in 0.0060 seconds > hbase(main):013:0> list_peers > PEER_ID CLUSTER_KEY STATE TABLE_CFS PROTOCOL BANDWIDTH > 20 hbase://c3tst-pressure98 ENABLED default.test_replication:A NATIVE 0 > 1 row(s) in 0.0030 seconds > {code} > "test_replication" => [] means replication all cf of this table,so the result > is not right. It should not just contain cf A after append. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16446) append_peer_tableCFs failed when there already have this table's partial cfs in the peer
[ https://issues.apache.org/jira/browse/HBASE-16446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-16446: --- Description: {code} hbase(main):011:0> list_peers PEER_ID CLUSTER_KEY STATE TABLE_CFS PROTOCOL BANDWIDTH 20 hbase://c3tst-pressure98 ENABLED default.test_replication:A NATIVE 0 1 row(s) in 0.0080 seconds hbase(main):012:0> append_peer_tableCFs '20', {"test_replication" => []} 0 row(s) in 0.0060 seconds hbase(main):013:0> list_peers PEER_ID CLUSTER_KEY STATE TABLE_CFS PROTOCOL BANDWIDTH 20 hbase://c3tst-pressure98 ENABLED default.test_replication:A NATIVE 0 1 row(s) in 0.0030 seconds {code} "test_replication" => [] means replication all cf of this table,so the result is not right. It should not just contain cf A after append_peer_tableCFs. was: {code} hbase(main):011:0> list_peers PEER_ID CLUSTER_KEY STATE TABLE_CFS PROTOCOL BANDWIDTH 20 hbase://c3tst-pressure98 ENABLED default.test_replication:A NATIVE 0 1 row(s) in 0.0080 seconds hbase(main):012:0> append_peer_tableCFs '20', {"test_replication" => []} 0 row(s) in 0.0060 seconds hbase(main):013:0> list_peers PEER_ID CLUSTER_KEY STATE TABLE_CFS PROTOCOL BANDWIDTH 20 hbase://c3tst-pressure98 ENABLED default.test_replication:A NATIVE 0 1 row(s) in 0.0030 seconds {code} "test_replication" => [] means replication all cf of this table,so the result is not right. It should not just contain cf A append_peer_tableCFs. 
> append_peer_tableCFs failed when there already have this table's partial cfs > in the peer > > > Key: HBASE-16446 > URL: https://issues.apache.org/jira/browse/HBASE-16446 > Project: HBase > Issue Type: Bug > Components: Replication >Affects Versions: 2.0.0, 0.98.21 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Minor > > {code} > hbase(main):011:0> list_peers > PEER_ID CLUSTER_KEY STATE TABLE_CFS PROTOCOL BANDWIDTH > 20 hbase://c3tst-pressure98 ENABLED default.test_replication:A NATIVE 0 > 1 row(s) in 0.0080 seconds > hbase(main):012:0> append_peer_tableCFs '20', {"test_replication" => []} > 0 row(s) in 0.0060 seconds > hbase(main):013:0> list_peers > PEER_ID CLUSTER_KEY STATE TABLE_CFS PROTOCOL BANDWIDTH > 20 hbase://c3tst-pressure98 ENABLED default.test_replication:A NATIVE 0 > 1 row(s) in 0.0030 seconds > {code} > "test_replication" => [] means replication all cf of this table,so the result > is not right. It should not just contain cf A after append_peer_tableCFs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16446) append_peer_tableCFs failed when there already have this table's partial cfs in the peer
[ https://issues.apache.org/jira/browse/HBASE-16446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-16446: --- Attachment: HBASE-16446.patch Uploaded a small fix for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16446) append_peer_tableCFs failed when there already have this table's partial cfs in the peer
[ https://issues.apache.org/jira/browse/HBASE-16446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-16446: --- Status: Patch Available (was: Open) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16446) append_peer_tableCFs failed when there already have this table's partial cfs in the peer
[ https://issues.apache.org/jira/browse/HBASE-16446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-16446: --- Affects Version/s: 1.2.3 1.3.1 1.1.6 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-16447) Replication by namespace in peer
Guanghao Zhang created HBASE-16447: -- Summary: Replication by namespace in peer Key: HBASE-16447 URL: https://issues.apache.org/jira/browse/HBASE-16447 Project: HBase Issue Type: New Feature Components: Replication Reporter: Guanghao Zhang Currently we can only configure table CFs in a peer. But in our production cluster there are a dozen namespaces, and every namespace has dozens of tables, so configuring all table CFs in a peer is complicated. Some namespaces need to replicate all of their tables to another slave cluster. Configuration would be much easier if we supported replication by namespace. Suggestions and discussion are welcome. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
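A namespace-level peer config could reduce this to a simple membership check. The sketch below is purely illustrative (the class and method names are hypothetical, not from the HBase codebase): a table replicates if its namespace is listed on the peer.

```java
import java.util.Set;

// Hypothetical sketch of namespace-based peer matching (illustrative
// names, not actual HBase code): instead of listing every table/CF,
// a table replicates if its namespace is configured on the peer.
public class NamespacePeerSketch {
    // e.g. the fully qualified name "ns1:table1" lives in namespace "ns1";
    // tables without a namespace prefix live in the "default" namespace.
    static String namespaceOf(String tableName) {
        int idx = tableName.indexOf(':');
        return idx < 0 ? "default" : tableName.substring(0, idx);
    }

    public static boolean shouldReplicate(Set<String> peerNamespaces, String tableName) {
        return peerNamespaces.contains(namespaceOf(tableName));
    }

    public static void main(String[] args) {
        System.out.println(shouldReplicate(Set.of("ns1"), "ns1:table1")); // true
        System.out.println(shouldReplicate(Set.of("ns1"), "table1"));     // false
    }
}
```

With such a rule, adding a new table under a replicated namespace would require no peer config change at all.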
[jira] [Commented] (HBASE-16416) Make NoncedRegionServerCallable extends RegionServerCallable
[ https://issues.apache.org/jira/browse/HBASE-16416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15426304#comment-15426304 ] Guanghao Zhang commented on HBASE-16416: [~stack] Could you take a look at this? Thanks. One more question: how can I trigger the hbase-client unit tests? > Make NoncedRegionServerCallable extends RegionServerCallable > > > Key: HBASE-16416 > URL: https://issues.apache.org/jira/browse/HBASE-16416 > Project: HBase > Issue Type: Improvement > Components: Client >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Minor > Attachments: HBASE-16416.patch > > > After HBASE-16308, there is a new class NoncedRegionServerCallable which > extends AbstractRegionServerCallable. But it has some duplicate methods with > RegionServerCallable. So we can make NoncedRegionServerCallable extend > RegionServerCallable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16446) append_peer_tableCFs failed when there already have this table's partial cfs in the peer
[ https://issues.apache.org/jira/browse/HBASE-16446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-16446: --- Attachment: (was: HBASE-16446.patch) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16446) append_peer_tableCFs failed when there already have this table's partial cfs in the peer
[ https://issues.apache.org/jira/browse/HBASE-16446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-16446: --- Attachment: HBASE-16446.patch Added a unit test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16446) append_peer_tableCFs failed when there already have this table's partial cfs in the peer
[ https://issues.apache.org/jira/browse/HBASE-16446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15428321#comment-15428321 ] Guanghao Zhang commented on HBASE-16446: It is not a bug. If a table is already in the peer config, append_peer_tableCFs means appending more CFs for that table. After appending table => [], the result should be null, because a null or empty CF set for a table in the peer config means replicating all CFs of that table. If we then append a CF f1 of this table, the result should still be null, because the peer config already covers all CFs of the table. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
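The intended append semantics can be sketched as a small merge rule. This is a hypothetical model of the behaviour described above, not the actual HBase implementation; the class, method, and map shapes are illustrative (a null or empty CF list stands for "all CFs of the table").

```java
import java.util.*;

// Hypothetical sketch of the append_peer_tableCFs merge rule (not the
// real HBase code): appending an empty CF list must widen the entry to
// "all CFs" (represented as null) instead of keeping the partial set.
public class AppendTableCFsSketch {
    // Both maps go from table name to CF list; null/empty list = all CFs.
    public static Map<String, List<String>> append(
            Map<String, List<String>> existing, Map<String, List<String>> toAppend) {
        Map<String, List<String>> result = new HashMap<>(existing);
        for (Map.Entry<String, List<String>> e : toAppend.entrySet()) {
            String table = e.getKey();
            List<String> newCfs = e.getValue();
            if (!result.containsKey(table)) {
                // Table not configured yet: take the appended CFs as-is.
                result.put(table,
                    (newCfs == null || newCfs.isEmpty()) ? null : new ArrayList<>(newCfs));
            } else if (newCfs == null || newCfs.isEmpty()
                    || result.get(table) == null || result.get(table).isEmpty()) {
                // Either side already means "all CFs" -> the union is all CFs.
                result.put(table, null);
            } else {
                // Both sides are partial CF sets -> union them.
                Set<String> merged = new LinkedHashSet<>(result.get(table));
                merged.addAll(newCfs);
                result.put(table, new ArrayList<>(merged));
            }
        }
        return result;
    }
}
```

Under this rule, appending `{"test_replication" => []}` to an entry `default.test_replication:A` yields a null CF set, i.e. replicate all CFs, which matches the expectation in the issue description.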
[jira] [Created] (HBASE-16460) Can't rebuild the BucketAllocator's data structures when BucketCache use FileIOEngine
Guanghao Zhang created HBASE-16460: -- Summary: Can't rebuild the BucketAllocator's data structures when BucketCache use FileIOEngine Key: HBASE-16460 URL: https://issues.apache.org/jira/browse/HBASE-16460 Project: HBase Issue Type: Bug Components: BucketCache Affects Versions: 2.0.0 Reporter: Guanghao Zhang Assignee: Guanghao Zhang When the bucket cache uses FileIOEngine, it rebuilds the bucket allocator's data structures from a persisted map, so it should first read the map from the persistence file and then use that map to construct a new BucketAllocator. But the code currently runs these steps in the wrong order in the retrieveFromFile() method of BucketCache.java: {code} BucketAllocator allocator = new BucketAllocator(cacheCapacity, bucketSizes, backingMap, realCacheSize); backingMap = (ConcurrentHashMap) ois.readObject(); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
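The effect of the ordering can be modelled with plain Java serialization. The class below is a hypothetical stand-in for BucketCache/BucketAllocator, not the real HBase code; it only illustrates why readObject() must run before the allocator is rebuilt.

```java
import java.io.*;
import java.util.concurrent.ConcurrentHashMap;

// Minimal model of the corrected retrieveFromFile() ordering (hypothetical
// stand-ins, not the real HBase classes): the persisted backing map must be
// deserialized BEFORE the allocator is rebuilt, otherwise the allocator is
// constructed from the still-empty field and rebuilds nothing.
public class RetrieveOrderSketch {
    // Stand-in for BucketAllocator's rebuild constructor: it can only
    // account for blocks already present in the map it is given.
    static int rebuildAllocator(ConcurrentHashMap<String, Long> backingMap) {
        return backingMap.size();
    }

    // Persist a map, then read it back in the corrected order and report
    // how many blocks the allocator accounts for.
    public static int roundTrip() {
        try {
            ConcurrentHashMap<String, Long> persisted = new ConcurrentHashMap<>();
            persisted.put("block-1", 0L);
            persisted.put("block-2", 16384L);
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(persisted);
            }
            try (ObjectInputStream ois =
                    new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
                // Fixed order: readObject() first ...
                @SuppressWarnings("unchecked")
                ConcurrentHashMap<String, Long> backingMap =
                    (ConcurrentHashMap<String, Long>) ois.readObject();
                // ... then rebuild the allocator from the populated map.
                return rebuildAllocator(backingMap);
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip()); // 2; the buggy order would see 0 entries
    }
}
```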
[jira] [Updated] (HBASE-16460) Can't rebuild the BucketAllocator's data structures when BucketCache use FileIOEngine
[ https://issues.apache.org/jira/browse/HBASE-16460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-16460: --- Attachment: HBASE-16460.patch Uploaded a patch to fix this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16460) Can't rebuild the BucketAllocator's data structures when BucketCache use FileIOEngine
[ https://issues.apache.org/jira/browse/HBASE-16460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-16460: --- Status: Patch Available (was: Open) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16460) Can't rebuild the BucketAllocator's data structures when BucketCache use FileIOEngine
[ https://issues.apache.org/jira/browse/HBASE-16460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-16460: --- Affects Version/s: 0.98.22 1.2.3 1.3.1 1.1.6 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16446) append_peer_tableCFs failed when there already have this table's partial cfs in the peer
[ https://issues.apache.org/jira/browse/HBASE-16446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15429342#comment-15429342 ] Guanghao Zhang commented on HBASE-16446: Thanks for the review. We can use the set_peer_tableCFs command to change full-table replication to restricted CF replication. Another way is to first remove the full-table replication and then append a restricted CF replication. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16446) append_peer_tableCFs failed when there already have this table's partial cfs in the peer
[ https://issues.apache.org/jira/browse/HBASE-16446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15429346#comment-15429346 ] Guanghao Zhang commented on HBASE-16446: I took a look at the code of remove_peer_tableCFs and I think it has the same bug... {code} hbase> remove_peer_tableCFs '2', { "ns1:table1" => []} {code} What does an empty CF set mean in the remove_peer_tableCFs command: remove all CFs of this table, or remove nothing? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16446) append_peer_tableCFs failed when there already have this table's partial cfs in the peer
[ https://issues.apache.org/jira/browse/HBASE-16446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15429349#comment-15429349 ] Guanghao Zhang commented on HBASE-16446: If we keep it consistent with append_peer_tableCFs, an empty CF set in remove_peer_tableCFs should mean removing all CFs of this table too. [~tedyu] [~ashish singhi] What do you think? If you agree, I will upload a small fix for remove_peer_tableCFs as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
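Mirroring the append discussion, the proposed remove semantics might look like the following sketch (hypothetical names, not the actual HBase code): an empty CF set drops the whole table from the peer config.

```java
import java.util.*;

// Hypothetical sketch of the proposed remove_peer_tableCFs behaviour
// (illustrative, not actual HBase code): an empty CF list in the removal
// argument means "remove every CF", i.e. drop the table from the config.
public class RemoveTableCFsSketch {
    // Both maps go from table name to CF list; null/empty list = all CFs.
    public static Map<String, List<String>> remove(
            Map<String, List<String>> existing, Map<String, List<String>> toRemove) {
        Map<String, List<String>> result = new HashMap<>(existing);
        for (Map.Entry<String, List<String>> e : toRemove.entrySet()) {
            String table = e.getKey();
            if (!result.containsKey(table)) {
                continue; // nothing configured for this table
            }
            List<String> cfsToRemove = e.getValue();
            List<String> oldCfs = result.get(table);
            if (cfsToRemove == null || cfsToRemove.isEmpty()) {
                result.remove(table); // empty set = remove all CFs of the table
            } else if (oldCfs == null) {
                continue; // "all CFs" entry: subtracting specific CFs is ambiguous
            } else {
                List<String> remaining = new ArrayList<>(oldCfs);
                remaining.removeAll(cfsToRemove);
                if (remaining.isEmpty()) {
                    result.remove(table); // no CFs left -> drop the table
                } else {
                    result.put(table, remaining);
                }
            }
        }
        return result;
    }
}
```

With this rule, `remove_peer_tableCFs '2', { "ns1:table1" => []}` would drop `ns1:table1` entirely, the counterpart of the append case above.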
[jira] [Updated] (HBASE-16446) append_peer_tableCFs failed when there already have this table's partial cfs in the peer
[ https://issues.apache.org/jira/browse/HBASE-16446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-16446: --- Affects Version/s: (was: 1.2.3) (was: 0.98.21) (was: 1.3.1) (was: 1.1.6) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16460) Can't rebuild the BucketAllocator's data structures when BucketCache use FileIOEngine
[ https://issues.apache.org/jira/browse/HBASE-16460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-16460: --- Attachment: HBASE-16460-v1.patch Uploaded v1 patch: added @VisibleForTesting and deleted the cache file after the unit test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16460) Can't rebuild the BucketAllocator's data structures when BucketCache use FileIOEngine
[ https://issues.apache.org/jira/browse/HBASE-16460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15430208#comment-15430208 ] Guanghao Zhang commented on HBASE-16460: bq. So how was it previously before this fix? Before this fix, that code in the constructor was never executed, because the backing map was empty... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16460) Can't rebuild the BucketAllocator's data structures when BucketCache use FileIOEngine
[ https://issues.apache.org/jira/browse/HBASE-16460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15430212#comment-15430212 ] Guanghao Zhang commented on HBASE-16460: bq. If so there is a chance that we will not see a suitable bucket for a block and so we will fail this L2 cache init. We do throw an exception! But if no suitable bucket can be found for a block, that block could not have been cached before the regionserver restart. So after the restart, no exception will be thrown as long as the user didn't change the bucket sizes config. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16446) append_peer_tableCFs failed when there already have this table's partial cfs in the peer
[ https://issues.apache.org/jira/browse/HBASE-16446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-16446: --- Attachment: HBASE-16446-v1.patch Attached v1 patch: fixed the same bug in removing table CFs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16460) Can't rebuild the BucketAllocator's data structures when BucketCache use FileIOEngine
[ https://issues.apache.org/jira/browse/HBASE-16460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15430295#comment-15430295 ] Guanghao Zhang commented on HBASE-16460: OK. I took a look at the previous code: it can cache blocks, but it can't read a block from the cache (or reads wrong data) when the backing map is inconsistent with the bucket allocator. After this fix, the backing map is still inconsistent with the bucket allocator if we get an exception. Maybe we should remove the bucket entry from the backing map when we get an exception? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16460) Can't rebuild the BucketAllocator's data structures when BucketCache use FileIOEngine
[ https://issues.apache.org/jira/browse/HBASE-16460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-16460: --- Attachment: HBASE-16460-v2.patch Uploaded v2 patch: bucket entries that fail to rebuild are now removed from the backing map when an exception occurs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
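The v2 idea of dropping entries that fail to rebuild can be sketched as follows (illustrative stand-ins, not the HBase code): any persisted entry with no suitable bucket is removed so the backing map and the allocator stay consistent.

```java
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the v2 approach (illustrative names, not HBase
// code): while rebuilding allocator state from the persisted backing map,
// drop any entry that cannot be rebuilt instead of leaving the map
// inconsistent with the allocator.
public class DropFailedEntriesSketch {
    // Stand-in for "rebuild one bucket entry": here an entry fails if no
    // configured bucket size can hold a block of its size.
    static boolean rebuildEntry(long blockSize, int[] bucketSizes) {
        for (int s : bucketSizes) {
            if (blockSize <= s) {
                return true;
            }
        }
        return false;
    }

    public static Map<String, Long> rebuild(Map<String, Long> persisted, int[] bucketSizes) {
        ConcurrentHashMap<String, Long> backingMap = new ConcurrentHashMap<>(persisted);
        // Remove entries that fail to rebuild, keeping map and allocator in sync.
        backingMap.entrySet().removeIf(e -> !rebuildEntry(e.getValue(), bucketSizes));
        return backingMap;
    }
}
```

For example, rebuilding `{"a": 4096, "b": 1000000}` against a single 64 KB bucket size keeps only `"a"`; the oversized entry is discarded rather than poisoning later reads.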
[jira] [Updated] (HBASE-16460) Can't rebuild the BucketAllocator's data structures when BucketCache use FileIOEngine
[ https://issues.apache.org/jira/browse/HBASE-16460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-16460: --- Attachment: HBASE-16460-v2.patch Retry. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16466) HBase snapshots support in VerifyReplication tool to reduce load on live HBase cluster with large tables
[ https://issues.apache.org/jira/browse/HBASE-16466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15431886#comment-15431886 ] Guanghao Zhang commented on HBASE-16466: Do you mean using TableSnapshotScanner to read data from HDFS directly? > HBase snapshots support in VerifyReplication tool to reduce load on live > HBase cluster with large tables > > > Key: HBASE-16466 > URL: https://issues.apache.org/jira/browse/HBASE-16466 > Project: HBase > Issue Type: Improvement > Components: hbase >Affects Versions: 0.98.21 >Reporter: Sukumar Maddineni > > As of now VerifyReplicatin tool is running using normal HBase scanners. If > you want to run VerifyReplication multiple times on a production live > cluster with large tables then it creates extra load on HBase layer. So if we > implement snapshot based support then both in source and target we can read > data from snapshots which reduces load on HBase -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16466) HBase snapshots support in VerifyReplication tool to reduce load on live HBase cluster with large tables
[ https://issues.apache.org/jira/browse/HBASE-16466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15431893#comment-15431893 ] Guanghao Zhang commented on HBASE-16466: You are not in the list of contributors for the project, so you can't see the "Assign to me" button. > HBase snapshots support in VerifyReplication tool to reduce load on live > HBase cluster with large tables > > > Key: HBASE-16466 > URL: https://issues.apache.org/jira/browse/HBASE-16466 > Project: HBase > Issue Type: Improvement > Components: hbase >Affects Versions: 0.98.21 >Reporter: Sukumar Maddineni > > As of now VerifyReplicatin tool is running using normal HBase scanners. If > you want to run VerifyReplication multiple times on a production live > cluster with large tables then it creates extra load on HBase layer. So if we > implement snapshot based support then both in source and target we can read > data from snapshots which reduces load on HBase -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16446) append_peer_tableCFs failed when there already have this table's partial cfs in the peer
[ https://issues.apache.org/jira/browse/HBASE-16446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15432460#comment-15432460 ] Guanghao Zhang commented on HBASE-16446: After HBASE-11393, the appended tableCFs can be empty. But HBASE-11393 was only pushed to master, so this bug doesn't exist in other branches. Thanks. > append_peer_tableCFs failed when there already have this table's partial cfs > in the peer > > > Key: HBASE-16446 > URL: https://issues.apache.org/jira/browse/HBASE-16446 > Project: HBase > Issue Type: Bug > Components: Replication >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Minor > Attachments: HBASE-16446-v1.patch, HBASE-16446.patch > > > {code} > hbase(main):011:0> list_peers > PEER_ID CLUSTER_KEY STATE TABLE_CFS PROTOCOL BANDWIDTH > 20 hbase://c3tst-pressure98 ENABLED default.test_replication:A NATIVE 0 > 1 row(s) in 0.0080 seconds > hbase(main):012:0> append_peer_tableCFs '20', {"test_replication" => []} > 0 row(s) in 0.0060 seconds > hbase(main):013:0> list_peers > PEER_ID CLUSTER_KEY STATE TABLE_CFS PROTOCOL BANDWIDTH > 20 hbase://c3tst-pressure98 ENABLED default.test_replication:A NATIVE 0 > 1 row(s) in 0.0030 seconds > {code} > "test_replication" => [] means replication all cf of this table,so the result > is not right. It should not just contain cf A after append_peer_tableCFs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
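The merge semantics the shell session above expects can be sketched as follows (a simplified, hypothetical model of the append logic — `append` is not the real HBase method). An empty CF list means "replicate every column family of this table", so appending an empty list must clear any existing per-CF restriction rather than leave the old CF set in place.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class AppendTableCfsSketch {

    // Merge `toAppend` into `current`. An empty (or null) CF list for a
    // table means "all CFs", which must win over any previous partial list.
    static Map<String, List<String>> append(Map<String, List<String>> current,
                                            Map<String, List<String>> toAppend) {
        Map<String, List<String>> result = new HashMap<>(current);
        for (Map.Entry<String, List<String>> e : toAppend.entrySet()) {
            List<String> newCfs = e.getValue();
            List<String> oldCfs = result.get(e.getKey());
            if (newCfs == null || newCfs.isEmpty() || oldCfs == null) {
                // "all CFs" requested, or the table was not configured yet
                result.put(e.getKey(), (newCfs == null || newCfs.isEmpty())
                    ? Collections.emptyList() : new ArrayList<>(newCfs));
            } else {
                // union of the old and new partial CF lists
                Set<String> merged = new LinkedHashSet<>(oldCfs);
                merged.addAll(newCfs);
                result.put(e.getKey(), new ArrayList<>(merged));
            }
        }
        return result;
    }
}
```

In the buggy behavior shown in the shell output, the empty list was effectively ignored, so `default.test_replication:A` survived the append.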
[jira] [Assigned] (HBASE-16447) Replication by namespace in peer
[ https://issues.apache.org/jira/browse/HBASE-16447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang reassigned HBASE-16447: -- Assignee: Guanghao Zhang > Replication by namespace in peer > > > Key: HBASE-16447 > URL: https://issues.apache.org/jira/browse/HBASE-16447 > Project: HBase > Issue Type: New Feature > Components: Replication >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > > Now we only config table CFs in a peer. But in our production cluster, there > are a dozen namespaces and every namespace has dozens of tables. It is > complicated to config all table CFs in the peer. For some namespaces, we need > to replicate all tables to another slave cluster. It would be easy to config if > we supported replication by namespace. Suggestions and discussions are welcome. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16460) Can't rebuild the BucketAllocator's data structures when BucketCache use FileIOEngine
[ https://issues.apache.org/jira/browse/HBASE-16460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-16460: --- Attachment: HBASE-16460-v3.patch v3 patch. Add ut for reconfig bucket sizes. > Can't rebuild the BucketAllocator's data structures when BucketCache use > FileIOEngine > - > > Key: HBASE-16460 > URL: https://issues.apache.org/jira/browse/HBASE-16460 > Project: HBase > Issue Type: Bug > Components: BucketCache >Affects Versions: 2.0.0, 1.1.6, 1.3.1, 1.2.3, 0.98.22 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Attachments: HBASE-16460-v1.patch, HBASE-16460-v2.patch, > HBASE-16460-v2.patch, HBASE-16460-v3.patch, HBASE-16460.patch > > > When bucket cache use FileIOEngine, it will rebuild the bucket allocator's > data structures from a persisted map. So it should first read the map from > persistence file then use the map to new a BucketAllocator. But now the code > has wrong sequence in retrieveFromFile() method of BucketCache.java. > {code} > BucketAllocator allocator = new BucketAllocator(cacheCapacity, > bucketSizes, backingMap, realCacheSize); > backingMap = (ConcurrentHashMap) > ois.readObject(); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16460) Can't rebuild the BucketAllocator's data structures when BucketCache use FileIOEngine
[ https://issues.apache.org/jira/browse/HBASE-16460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15434463#comment-15434463 ] Guanghao Zhang commented on HBASE-16460: Thanks for your review. bq. we can collect all such failed keys into a List and log them at once at the end But then we would lose the failure reason. How about using a map to collect the failed keys together with their exceptions? > Can't rebuild the BucketAllocator's data structures when BucketCache use > FileIOEngine > - > > Key: HBASE-16460 > URL: https://issues.apache.org/jira/browse/HBASE-16460 > Project: HBase > Issue Type: Bug > Components: BucketCache >Affects Versions: 2.0.0, 1.1.6, 1.3.1, 1.2.3, 0.98.22 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Attachments: HBASE-16460-v1.patch, HBASE-16460-v2.patch, > HBASE-16460-v2.patch, HBASE-16460-v3.patch, HBASE-16460.patch > > > When bucket cache use FileIOEngine, it will rebuild the bucket allocator's > data structures from a persisted map. So it should first read the map from > persistence file then use the map to new a BucketAllocator. But now the code > has wrong sequence in retrieveFromFile() method of BucketCache.java. > {code} > BucketAllocator allocator = new BucketAllocator(cacheCapacity, > bucketSizes, backingMap, realCacheSize); > backingMap = (ConcurrentHashMap) > ois.readObject(); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
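The suggestion in the comment above — collect failed keys in a map keyed by the exception that caused each failure, then log once at the end — can be sketched like this (a hypothetical helper, not HBase code; `String` stands in for the real `BlockCacheKey`):

```java
import java.util.Map;

public class FailedKeyCollector {

    // Build one summary line from all rebuild failures, preserving the
    // failure reason for each key instead of losing it in a plain List.
    static String summarize(Map<String, Exception> failures) {
        if (failures.isEmpty()) {
            return "all entries rebuilt";
        }
        StringBuilder sb = new StringBuilder(
            "failed to rebuild " + failures.size() + " entries: ");
        for (Map.Entry<String, Exception> e : failures.entrySet()) {
            sb.append(e.getKey())
              .append(" (").append(e.getValue().getMessage()).append("); ");
        }
        return sb.toString();
    }
}
```

The caller would populate the map inside the per-entry catch block during `retrieveFromFile()` and emit `summarize(...)` in a single log statement afterwards.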
[jira] [Commented] (HBASE-16466) HBase snapshots support in VerifyReplication tool to reduce load on live HBase cluster with large tables
[ https://issues.apache.org/jira/browse/HBASE-16466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15434476#comment-15434476 ] Guanghao Zhang commented on HBASE-16466: If the table always has new data, how do we make sure the snapshot is the same in both clusters? > HBase snapshots support in VerifyReplication tool to reduce load on live > HBase cluster with large tables > > > Key: HBASE-16466 > URL: https://issues.apache.org/jira/browse/HBASE-16466 > Project: HBase > Issue Type: Improvement > Components: hbase >Affects Versions: 0.98.21 >Reporter: Sukumar Maddineni > > As of now VerifyReplicatin tool is running using normal HBase scanners. If > you want to run VerifyReplication multiple times on a production live > cluster with large tables then it creates extra load on HBase layer. So if we > implement snapshot based support then both in source and target we can read > data from snapshots which reduces load on HBase -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HBASE-16466) HBase snapshots support in VerifyReplication tool to reduce load on live HBase cluster with large tables
[ https://issues.apache.org/jira/browse/HBASE-16466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15434476#comment-15434476 ] Guanghao Zhang edited comment on HBASE-16466 at 8/24/16 8:26 AM: - If the table always has new data, how do we make sure the snapshot is the same in both clusters? was (Author: zghaobac): If the table always have new data, how to make sure the snapshot is same in both clusters? > HBase snapshots support in VerifyReplication tool to reduce load on live > HBase cluster with large tables > > > Key: HBASE-16466 > URL: https://issues.apache.org/jira/browse/HBASE-16466 > Project: HBase > Issue Type: Improvement > Components: hbase >Affects Versions: 0.98.21 >Reporter: Sukumar Maddineni > > As of now VerifyReplicatin tool is running using normal HBase scanners. If > you want to run VerifyReplication multiple times on a production live > cluster with large tables then it creates extra load on HBase layer. So if we > implement snapshot based support then both in source and target we can read > data from snapshots which reduces load on HBase -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-15496) Throw RowTooBigException only for user scan/get
[ https://issues.apache.org/jira/browse/HBASE-15496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-15496: --- Description: When config hbase.table.max.rowsize, RowTooBigException may be thrown by StoreScanner. But region flush/compact should catch it or throw it only for user scan. Exceptions: org.apache.hadoop.hbase.regionserver.RowTooBigException: Max row size allowed: 10485760, but row is bigger than that at org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:355) at org.apache.hadoop.hbase.regionserver.StoreScanner.(StoreScanner.java:276) at org.apache.hadoop.hbase.regionserver.StoreScanner.(StoreScanner.java:238) at org.apache.hadoop.hbase.regionserver.compactions.Compactor.createScanner(Compactor.java:403) at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:95) at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:131) at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1211) at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1952) at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1774) or org.apache.hadoop.hbase.regionserver.RowTooBigException: Max row size allowed: 10485760, but the row is bigger than that. 
at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:576) at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:132) at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75) at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:880) at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2155) at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2454) at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2193) at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2162) at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2053) at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:1979) at org.apache.hadoop.hbase.regionserver.TestRowTooBig.testScannersSeekOnFewLargeCells(TestRowTooBig.java:101) was: When config hbase.table.max.rowsize, RowTooBigException may be thrown by StoreScanner. But region flush/compact should catch it or throw it only for user scan. 
org.apache.hadoop.hbase.regionserver.RowTooBigException: Max row size allowed: 10485760, but row is bigger than that at org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:355) at org.apache.hadoop.hbase.regionserver.StoreScanner.(StoreScanner.java:276) at org.apache.hadoop.hbase.regionserver.StoreScanner.(StoreScanner.java:238) at org.apache.hadoop.hbase.regionserver.compactions.Compactor.createScanner(Compactor.java:403) at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:95) at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:131) at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1211) at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1952) at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1774) or org.apache.hadoop.hbase.regionserver.RowTooBigException: Max row size allowed: 10485760, but the row is bigger than that. 
at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:576) at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:132) at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75) at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:880) at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2155) at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2454) at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2193) at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2162) at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2053) at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:1979) at org.apache.hadoop.hbase.regionserver.TestRowTooBig.testScannersSeekOnFewLargeCells(TestRowTooBig.java:101) > Throw RowTooBigException only for user scan/get > --- > > Key: HBASE-15496 > URL: https://issues.apache.org/jira/browse/HBASE-15496 > Project: HBase > Issue Type: Improvement > Components: Scanners >Repor
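The behavior this issue asks for can be sketched in a simplified, self-contained model (hypothetical names — `ScanType`, `RowTooBigException`, and `checkRowSize` are stand-ins, not the real `StoreScanner` code): the row-size limit is enforced only for user-issued scans/gets, while flush and compaction scans tolerate oversized rows instead of failing the region operation.

```java
public class RowSizeCheckSketch {

    // Who issued the scan: a client read, or an internal region operation.
    enum ScanType { USER_SCAN, FLUSH, COMPACTION }

    static class RowTooBigException extends RuntimeException {
        RowTooBigException(String msg) { super(msg); }
    }

    // Returns true when scanning may proceed; throws only for user scans
    // over the configured hbase.table.max.rowsize limit.
    static boolean checkRowSize(long rowSize, long maxRowSize, ScanType type) {
        if (rowSize <= maxRowSize) {
            return true;
        }
        if (type == ScanType.USER_SCAN) {
            throw new RowTooBigException("Max row size allowed: " + maxRowSize
                + ", but row is bigger than that");
        }
        // flush/compaction must keep working even on oversized rows,
        // otherwise the region can never flush or compact that row away
        return true;
    }
}
```

The stack traces in the description show exactly the two internal paths (`DefaultCompactor.compact` and `StoreFlusher.performFlush`) that would hit the `FLUSH`/`COMPACTION` branch here.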
[jira] [Created] (HBASE-15496) Throw RowTooBigException only for user scan/get
Guanghao Zhang created HBASE-15496: -- Summary: Throw RowTooBigException only for user scan/get Key: HBASE-15496 URL: https://issues.apache.org/jira/browse/HBASE-15496 Project: HBase Issue Type: Improvement Components: Scanners Reporter: Guanghao Zhang Priority: Minor Fix For: 2.0.0 When config hbase.table.max.rowsize, RowTooBigException may be thrown by StoreScanner. But region flush/compact should catch it or throw it only for user scan. org.apache.hadoop.hbase.regionserver.RowTooBigException: Max row size allowed: 10485760, but row is bigger than that at org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:355) at org.apache.hadoop.hbase.regionserver.StoreScanner.(StoreScanner.java:276) at org.apache.hadoop.hbase.regionserver.StoreScanner.(StoreScanner.java:238) at org.apache.hadoop.hbase.regionserver.compactions.Compactor.createScanner(Compactor.java:403) at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:95) at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:131) at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1211) at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1952) at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1774) or org.apache.hadoop.hbase.regionserver.RowTooBigException: Max row size allowed: 10485760, but the row is bigger than that. 
at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:576) at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:132) at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75) at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:880) at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2155) at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2454) at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2193) at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2162) at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2053) at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:1979) at org.apache.hadoop.hbase.regionserver.TestRowTooBig.testScannersSeekOnFewLargeCells(TestRowTooBig.java:101) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-15496) Throw RowTooBigException only for user scan/get
[ https://issues.apache.org/jira/browse/HBASE-15496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-15496: --- Status: Patch Available (was: Open) > Throw RowTooBigException only for user scan/get > --- > > Key: HBASE-15496 > URL: https://issues.apache.org/jira/browse/HBASE-15496 > Project: HBase > Issue Type: Improvement > Components: Scanners >Reporter: Guanghao Zhang >Priority: Minor > Fix For: 2.0.0 > > > When config hbase.table.max.rowsize, RowTooBigException may be thrown by > StoreScanner. But region flush/compact should catch it or throw it only for > user scan. > Exceptions: > org.apache.hadoop.hbase.regionserver.RowTooBigException: Max row size > allowed: 10485760, but row is bigger than that > at > org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:355) > at > org.apache.hadoop.hbase.regionserver.StoreScanner.(StoreScanner.java:276) > at > org.apache.hadoop.hbase.regionserver.StoreScanner.(StoreScanner.java:238) > at > org.apache.hadoop.hbase.regionserver.compactions.Compactor.createScanner(Compactor.java:403) > at > org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:95) > at > org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:131) > at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1211) > at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1952) > at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1774) > or > org.apache.hadoop.hbase.regionserver.RowTooBigException: Max row size > allowed: 10485760, but the row is bigger than that. 
> at > org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:576) > at > org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:132) > at > org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75) > at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:880) > at > org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2155) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2454) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2193) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2162) > at > org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2053) > at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:1979) > at > org.apache.hadoop.hbase.regionserver.TestRowTooBig.testScannersSeekOnFewLargeCells(TestRowTooBig.java:101) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-15496) Throw RowTooBigException only for user scan/get
[ https://issues.apache.org/jira/browse/HBASE-15496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-15496: --- Attachment: HBASE-15496.patch > Throw RowTooBigException only for user scan/get > --- > > Key: HBASE-15496 > URL: https://issues.apache.org/jira/browse/HBASE-15496 > Project: HBase > Issue Type: Improvement > Components: Scanners >Reporter: Guanghao Zhang >Priority: Minor > Fix For: 2.0.0 > > Attachments: HBASE-15496.patch > > > When config hbase.table.max.rowsize, RowTooBigException may be thrown by > StoreScanner. But region flush/compact should catch it or throw it only for > user scan. > Exceptions: > org.apache.hadoop.hbase.regionserver.RowTooBigException: Max row size > allowed: 10485760, but row is bigger than that > at > org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:355) > at > org.apache.hadoop.hbase.regionserver.StoreScanner.(StoreScanner.java:276) > at > org.apache.hadoop.hbase.regionserver.StoreScanner.(StoreScanner.java:238) > at > org.apache.hadoop.hbase.regionserver.compactions.Compactor.createScanner(Compactor.java:403) > at > org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:95) > at > org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:131) > at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1211) > at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1952) > at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1774) > or > org.apache.hadoop.hbase.regionserver.RowTooBigException: Max row size > allowed: 10485760, but the row is bigger than that. 
> at > org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:576) > at > org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:132) > at > org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75) > at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:880) > at > org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2155) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2454) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2193) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2162) > at > org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2053) > at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:1979) > at > org.apache.hadoop.hbase.regionserver.TestRowTooBig.testScannersSeekOnFewLargeCells(TestRowTooBig.java:101) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15504) Fix Balancer in 1.3 not moving regions off overloaded regionserver
[ https://issues.apache.org/jira/browse/HBASE-15504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15205597#comment-15205597 ] Guanghao Zhang commented on HBASE-15504: HBASE-14604 increases the move cost when the cluster's region count is bigger than maxMoves. Before HBASE-14604, the move cost was scaled over [0, cluster.numRegions], independent of maxMoves. But in our use case we configure a small maxMoves because we don't want to move too many regions on our online serving cluster. HBASE-14604 scales the move cost over [0, Math.min(cluster.numRegions, maxMoves)]. Do you mind sharing your maxMoves config and cluster region count? > Fix Balancer in 1.3 not moving regions off overloaded regionserver > -- > > Key: HBASE-15504 > URL: https://issues.apache.org/jira/browse/HBASE-15504 > Project: HBase > Issue Type: Bug >Affects Versions: 1.3.0 >Reporter: Elliott Clark > Fix For: 1.3.0 > > > We pushed 1.3 to a couple of clusters. In some cases the regions were > assigned VERY un-evenly and the regions would not move after that. > We ended up with one rs getting thousands of regions and most servers getting > 0. Running balancer would do nothing. The balancer would say that it couldn't > find a solution with less than the current cost. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
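The scaling change discussed in the comment above can be illustrated numerically (a deliberate simplification of StochasticLoadBalancer's move-cost math, not the actual HBase code): shrinking the denominator from numRegions to min(numRegions, maxMoves) makes the same number of moves look much more expensive.

```java
public class MoveCostSketch {

    // Simplified move-cost scaling: cost is the move count normalized
    // to the denominator, capped at 1.0. Before HBASE-14604 the
    // denominator was numRegions; after, min(numRegions, maxMoves).
    static double scaledCost(int moves, int numRegions, int maxMoves,
                             boolean afterHBASE14604) {
        int denom = afterHBASE14604 ? Math.min(numRegions, maxMoves) : numRegions;
        return Math.min(moves, denom) / (double) denom;
    }
}
```

With 10,000 regions and maxMoves = 600, a plan with 300 moves costs 0.03 under the old scaling but 0.5 under the new one, which is how a small configured maxMoves can make the balancer reject plans it previously accepted (the numbers here are illustrative, not from the reported cluster).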
[jira] [Commented] (HBASE-15504) Fix Balancer in 1.3 not moving regions off overloaded regionserver
[ https://issues.apache.org/jira/browse/HBASE-15504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15205628#comment-15205628 ] Guanghao Zhang commented on HBASE-15504: And when using StochasticLoadBalancer in our test cluster, we found some other problems that need to be fixed. 1. LocalityBasedCandidateGenerator.getLowestLocalityRegionOnServer should skip regions that contain nothing. 2. When LocalityBasedCandidateGenerator is used to generate a Cluster.Action, it should fall back to a random operation instead of pickLowestLocalityServer(cluster), because the search function may get stuck if it always generates the same Cluster.Action. 3. getLeastLoadedTopServerForRegion(int region) should pick the least loaded server that has better locality than the current server. > Fix Balancer in 1.3 not moving regions off overloaded regionserver > -- > > Key: HBASE-15504 > URL: https://issues.apache.org/jira/browse/HBASE-15504 > Project: HBase > Issue Type: Bug >Affects Versions: 1.3.0 >Reporter: Elliott Clark > Fix For: 1.3.0 > > > We pushed 1.3 to a couple of clusters. In some cases the regions were > assigned VERY un-evenly and the regions would not move after that. > We ended up with one rs getting thousands of regions and most servers getting > 0. Running balancer would do nothing. The balancer would say that it couldn't > find a solution with less than the current cost. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-15515) Improve LocalityBasedCandidateGenerator in Balancer
Guanghao Zhang created HBASE-15515: -- Summary: Improve LocalityBasedCandidateGenerator in Balancer Key: HBASE-15515 URL: https://issues.apache.org/jira/browse/HBASE-15515 Project: HBase Issue Type: Bug Affects Versions: 2.0.0, 1.3.0 Reporter: Guanghao Zhang Assignee: Guanghao Zhang Priority: Minor Fix For: 2.0.0 There are some problems that need to be fixed. 1. LocalityBasedCandidateGenerator.getLowestLocalityRegionOnServer should skip empty regions. 2. When LocalityBasedCandidateGenerator is used to generate a Cluster.Action, it should fall back to a random operation instead of pickLowestLocalityServer(cluster), because the search function may get stuck if it always generates the same Cluster.Action. 3. getLeastLoadedTopServerForRegion should pick the least loaded server that has better locality than the current server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
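Fix (1) from the issue above can be sketched with a simplified, hypothetical model (the `Region` record and `lowestLocalityRegion` helper are stand-ins, not the real balancer code): an empty region has no HDFS blocks, so its "locality" is meaningless and it should never be picked as the lowest-locality candidate.

```java
import java.util.List;

public class LowestLocalitySketch {

    static class Region {
        final String name;
        final long storefileSizeBytes;
        final float locality; // fraction of blocks local to the hosting server

        Region(String name, long storefileSizeBytes, float locality) {
            this.name = name;
            this.storefileSizeBytes = storefileSizeBytes;
            this.locality = locality;
        }
    }

    // Pick the lowest-locality region on a server, skipping empty regions
    // so the generator doesn't repeatedly propose moving a region that
    // has no data (and hence no locality to improve).
    static Region lowestLocalityRegion(List<Region> regionsOnServer) {
        Region lowest = null;
        for (Region r : regionsOnServer) {
            if (r.storefileSizeBytes == 0) {
                continue; // empty region: nothing to gain by moving it
            }
            if (lowest == null || r.locality < lowest.locality) {
                lowest = r;
            }
        }
        return lowest;
    }
}
```

Without the skip, an empty region with locality 0.0 would always win, and the candidate generator would keep producing the same useless Cluster.Action — the stuck-search symptom described in point (2).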
[jira] [Updated] (HBASE-15515) Improve LocalityBasedCandidateGenerator in Balancer
[ https://issues.apache.org/jira/browse/HBASE-15515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-15515: --- Attachment: HBASE-15515.patch > Improve LocalityBasedCandidateGenerator in Balancer > --- > > Key: HBASE-15515 > URL: https://issues.apache.org/jira/browse/HBASE-15515 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0, 1.3.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Minor > Fix For: 2.0.0 > > Attachments: HBASE-15515.patch > > > There are some problems which need to fix. > 1. LocalityBasedCandidateGenerator.getLowestLocalityRegionOnServer should > skip empty region. > 2. When use LocalityBasedCandidateGenerator to generate Cluster.Action, it > should add random operation instead of pickLowestLocalityServer(cluster). > Because the search function may stuck here if it always generate the same > Cluster.Action. > 3. getLeastLoadedTopServerForRegion should get least loaded server which have > better locality than current server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15496) Throw RowTooBigException only for user scan/get
[ https://issues.apache.org/jira/browse/HBASE-15496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15205741#comment-15205741 ] Guanghao Zhang commented on HBASE-15496: For the huge KeyValue case, compaction may produce an OOME on the server side. > Throw RowTooBigException only for user scan/get > --- > > Key: HBASE-15496 > URL: https://issues.apache.org/jira/browse/HBASE-15496 > Project: HBase > Issue Type: Improvement > Components: Scanners >Reporter: Guanghao Zhang >Priority: Minor > Fix For: 2.0.0 > > Attachments: HBASE-15496.patch > > > When config hbase.table.max.rowsize, RowTooBigException may be thrown by > StoreScanner. But region flush/compact should catch it or throw it only for > user scan. > Exceptions: > org.apache.hadoop.hbase.regionserver.RowTooBigException: Max row size > allowed: 10485760, but row is bigger than that > at > org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:355) > at > org.apache.hadoop.hbase.regionserver.StoreScanner.(StoreScanner.java:276) > at > org.apache.hadoop.hbase.regionserver.StoreScanner.(StoreScanner.java:238) > at > org.apache.hadoop.hbase.regionserver.compactions.Compactor.createScanner(Compactor.java:403) > at > org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:95) > at > org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:131) > at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1211) > at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1952) > at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1774) > or > org.apache.hadoop.hbase.regionserver.RowTooBigException: Max row size > allowed: 10485760, but the row is bigger than that. 
> at > org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:576) > at > org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:132) > at > org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75) > at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:880) > at > org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2155) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2454) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2193) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2162) > at > org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2053) > at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:1979) > at > org.apache.hadoop.hbase.regionserver.TestRowTooBig.testScannersSeekOnFewLargeCells(TestRowTooBig.java:101) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-15515) Improve LocalityBasedCandidateGenerator in Balancer
[ https://issues.apache.org/jira/browse/HBASE-15515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-15515: --- Attachment: HBASE-15515-v1.patch > Improve LocalityBasedCandidateGenerator in Balancer > --- > > Key: HBASE-15515 > URL: https://issues.apache.org/jira/browse/HBASE-15515 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0, 1.3.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Minor > Fix For: 2.0.0 > > Attachments: HBASE-15515-v1.patch, HBASE-15515.patch > > > There are some problems which need to fix. > 1. LocalityBasedCandidateGenerator.getLowestLocalityRegionOnServer should > skip empty region. > 2. When use LocalityBasedCandidateGenerator to generate Cluster.Action, it > should add random operation instead of pickLowestLocalityServer(cluster). > Because the search function may stuck here if it always generate the same > Cluster.Action. > 3. getLeastLoadedTopServerForRegion should get least loaded server which have > better locality than current server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15504) Fix Balancer in 1.3 not moving regions off overloaded regionserver
[ https://issues.apache.org/jira/browse/HBASE-15504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15207753#comment-15207753 ] Guanghao Zhang commented on HBASE-15504: Uploaded a patch to fix it in HBASE-15515. [~eclark] Can you help review? > Fix Balancer in 1.3 not moving regions off overloaded regionserver > -- > > Key: HBASE-15504 > URL: https://issues.apache.org/jira/browse/HBASE-15504 > Project: HBase > Issue Type: Bug > Affects Versions: 1.3.0 > Reporter: Elliott Clark > Fix For: 1.3.0 > > > We pushed 1.3 to a couple of clusters. In some cases the regions were assigned VERY unevenly and the regions would not move after that. > We ended up with one rs getting thousands of regions and most servers getting 0. Running the balancer would do nothing. The balancer would say that it couldn't find a solution with less than the current cost.
[jira] [Commented] (HBASE-15515) Improve LocalityBasedCandidateGenerator in Balancer
[ https://issues.apache.org/jira/browse/HBASE-15515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15211268#comment-15211268 ] Guanghao Zhang commented on HBASE-15515: Unit tests in TestStochasticLoadBalancer and TestStochasticLoadBalancer2 always set hbase.master.balancer.stochastic.localityCost to 0, so they don't take region locality into account when balancing the cluster. Maybe we should add some balancer unit tests about locality first. > Improve LocalityBasedCandidateGenerator in Balancer > --- > > Key: HBASE-15515 > URL: https://issues.apache.org/jira/browse/HBASE-15515 > Project: HBase > Issue Type: Bug > Affects Versions: 2.0.0, 1.3.0 > Reporter: Guanghao Zhang > Assignee: Guanghao Zhang > Priority: Minor > Fix For: 2.0.0, 1.3.0, 1.4.0 > > Attachments: HBASE-15515-v1.patch, HBASE-15515.patch > > > There are some problems which need to be fixed. > 1. LocalityBasedCandidateGenerator.getLowestLocalityRegionOnServer should skip empty regions. > 2. When using LocalityBasedCandidateGenerator to generate a Cluster.Action, it should add a random operation instead of pickLowestLocalityServer(cluster), because the search function may get stuck if it always generates the same Cluster.Action. > 3. getLeastLoadedTopServerForRegion should get the least loaded server which has better locality than the current server.
[jira] [Created] (HBASE-15529) Override needBalance in StochasticLoadBalancer
Guanghao Zhang created HBASE-15529: -- Summary: Override needBalance in StochasticLoadBalancer Key: HBASE-15529 URL: https://issues.apache.org/jira/browse/HBASE-15529 Project: HBase Issue Type: Improvement Reporter: Guanghao Zhang Priority: Minor StochasticLoadBalancer includes cost functions to compute the cost of region count, r/w qps, table load, region locality, memstore size, and storefile size. Every cost function returns a number between 0 and 1 inclusive and the computed costs are scaled by their respective multipliers. A bigger multiplier means that the respective cost function has a bigger weight. But needBalance decides whether to balance only by region count and doesn't consider r/w qps or locality even if you configure these cost functions with bigger multipliers. StochasticLoadBalancer should override needBalance and decide whether to balance by its configured cost functions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-15529) Override needBalance in StochasticLoadBalancer
[ https://issues.apache.org/jira/browse/HBASE-15529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-15529: --- Description: StochasticLoadBalancer includes cost functions to compute the cost of region count, r/w qps, table load, region locality, memstore size, and storefile size. Every cost function returns a number between 0 and 1 inclusive and the computed costs are scaled by their respective multipliers. A bigger multiplier means that the respective cost function has a bigger weight. But needBalance decides whether to balance only by region count and doesn't consider r/w qps or locality even if you configure these cost functions with bigger multipliers. StochasticLoadBalancer should override needBalance and decide whether to balance by its configured cost functions. (was: StochasticLoadBalancer includes cost functions to compute the cost of region count, r/w qps, table load, region locality, memstore size, and storefile size. Every cost function returns a number between 0 and 1 inclusive and the computed costs are scaled by their respective multipliers. A bigger multiplier means that the respective cost function has a bigger weight. But needBalance decides whether to balance only by region count and doesn't consider r/w qps or locality even if you configure these cost functions with bigger multipliers. StochasticLoadBalancer should override needBalance and decide whether to balance by its configured cost function.)
> Override needBalance in StochasticLoadBalancer > -- > > Key: HBASE-15529 > URL: https://issues.apache.org/jira/browse/HBASE-15529 > Project: HBase > Issue Type: Improvement > Reporter: Guanghao Zhang > Priority: Minor > > > StochasticLoadBalancer includes cost functions to compute the cost of region count, r/w qps, table load, region locality, memstore size, and storefile size. Every cost function returns a number between 0 and 1 inclusive and the computed costs are scaled by their respective multipliers. A bigger multiplier means that the respective cost function has a bigger weight. But needBalance decides whether to balance only by region count and doesn't consider r/w qps or locality even if you configure these cost functions with bigger multipliers. StochasticLoadBalancer should override needBalance and decide whether to balance by its configured cost functions.
[jira] [Commented] (HBASE-15515) Improve LocalityBasedCandidateGenerator in Balancer
[ https://issues.apache.org/jira/browse/HBASE-15515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15212751#comment-15212751 ] Guanghao Zhang commented on HBASE-15515: [~yuzhih...@gmail.com] [~eclark] Thanks for your review. > Improve LocalityBasedCandidateGenerator in Balancer > --- > > Key: HBASE-15515 > URL: https://issues.apache.org/jira/browse/HBASE-15515 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0, 1.3.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Minor > Fix For: 2.0.0, 1.3.0, 1.4.0 > > Attachments: HBASE-15515-v1.patch, HBASE-15515.patch > > > There are some problems which need to fix. > 1. LocalityBasedCandidateGenerator.getLowestLocalityRegionOnServer should > skip empty region. > 2. When use LocalityBasedCandidateGenerator to generate Cluster.Action, it > should add random operation instead of pickLowestLocalityServer(cluster). > Because the search function may stuck here if it always generate the same > Cluster.Action. > 3. getLeastLoadedTopServerForRegion should get least loaded server which have > better locality than current server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-15529) Override needBalance in StochasticLoadBalancer
[ https://issues.apache.org/jira/browse/HBASE-15529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-15529: --- Attachment: HBASE-15529.patch > Override needBalance in StochasticLoadBalancer > -- > > Key: HBASE-15529 > URL: https://issues.apache.org/jira/browse/HBASE-15529 > Project: HBase > Issue Type: Improvement > Reporter: Guanghao Zhang > Priority: Minor > Attachments: HBASE-15529.patch > > > StochasticLoadBalancer includes cost functions to compute the cost of region count, r/w qps, table load, region locality, memstore size, and storefile size. Every cost function returns a number between 0 and 1 inclusive and the computed costs are scaled by their respective multipliers. A bigger multiplier means that the respective cost function has a bigger weight. But needBalance decides whether to balance only by region count and doesn't consider r/w qps or locality even if you configure these cost functions with bigger multipliers. StochasticLoadBalancer should override needBalance and decide whether to balance by its configured cost functions.
[jira] [Updated] (HBASE-15529) Override needBalance in StochasticLoadBalancer
[ https://issues.apache.org/jira/browse/HBASE-15529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-15529: --- Description: StochasticLoadBalancer includes cost functions to compute the cost of region count, r/w qps, table load, region locality, memstore size, and storefile size. Every cost function returns a number between 0 and 1 inclusive and the computed costs are scaled by their respective multipliers. A bigger multiplier means that the respective cost function has a bigger weight. But needBalance decides whether to balance only by region count and doesn't consider r/w qps or locality even if you configure these cost functions with bigger multipliers. StochasticLoadBalancer should override needBalance and decide whether to balance by its configured cost functions. Add one new config hbase.master.balancer.stochastic.minCostNeedBalance; the cluster needs balancing when (total cost / sum of multipliers) > minCostNeedBalance. (was: StochasticLoadBalancer includes cost functions to compute the cost of region count, r/w qps, table load, region locality, memstore size, and storefile size. Every cost function returns a number between 0 and 1 inclusive and the computed costs are scaled by their respective multipliers. A bigger multiplier means that the respective cost function has a bigger weight. But needBalance decides whether to balance only by region count and doesn't consider r/w qps or locality even if you configure these cost functions with bigger multipliers. StochasticLoadBalancer should override needBalance and decide whether to balance by its configured cost functions.)
> Override needBalance in StochasticLoadBalancer > -- > > Key: HBASE-15529 > URL: https://issues.apache.org/jira/browse/HBASE-15529 > Project: HBase > Issue Type: Improvement > Reporter: Guanghao Zhang > Priority: Minor > Attachments: HBASE-15529.patch > > > StochasticLoadBalancer includes cost functions to compute the cost of region count, r/w qps, table load, region locality, memstore size, and storefile size. Every cost function returns a number between 0 and 1 inclusive and the computed costs are scaled by their respective multipliers. A bigger multiplier means that the respective cost function has a bigger weight. But needBalance decides whether to balance only by region count and doesn't consider r/w qps or locality even if you configure these cost functions with bigger multipliers. StochasticLoadBalancer should override needBalance and decide whether to balance by its configured cost functions. > Add one new config hbase.master.balancer.stochastic.minCostNeedBalance; the cluster needs balancing when (total cost / sum of multipliers) > minCostNeedBalance.
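The proposed check can be sketched as follows. This is a minimal standalone illustration of the (total cost / sum of multipliers) > minCostNeedBalance formula from the issue; the class name, method shape, and threshold value are assumptions, not HBase's actual Configuration plumbing:

```java
/**
 * Minimal sketch of the proposed needsBalance override: instead of looking only
 * at region counts, compute the weighted average of all cost functions and
 * compare it against minCostNeedBalance. The threshold default is illustrative.
 */
public class NeedBalanceCheck {
  // Illustrative value for hbase.master.balancer.stochastic.minCostNeedBalance
  public static final double MIN_COST_NEED_BALANCE = 0.05;

  /**
   * Each cost is in [0, 1] and is scaled by its multiplier; balancing is
   * needed when (total cost / sum of multipliers) exceeds the threshold.
   */
  public static boolean needsBalance(double[] costs, double[] multipliers) {
    double total = 0, sumMultiplier = 0;
    for (int i = 0; i < costs.length; i++) {
      if (multipliers[i] <= 0) continue; // a zero multiplier disables the function
      total += costs[i] * multipliers[i];
      sumMultiplier += multipliers[i];
    }
    if (sumMultiplier <= 0) return false;
    return total / sumMultiplier > MIN_COST_NEED_BALANCE;
  }

  public static void main(String[] args) {
    // A high locality cost with a big multiplier triggers balancing even though
    // the region-count cost alone is tiny: (0.02*500 + 0.6*100) / 600 ≈ 0.117.
    System.out.println(needsBalance(new double[]{0.02, 0.6}, new double[]{500, 100})); // prints true
  }
}
```

This is why the weighted form matters: with region count as the only signal, the 0.6 locality cost above would never trigger a balance run no matter how large its multiplier is.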
[jira] [Assigned] (HBASE-15529) Override needBalance in StochasticLoadBalancer
[ https://issues.apache.org/jira/browse/HBASE-15529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang reassigned HBASE-15529: -- Assignee: Guanghao Zhang > Override needBalance in StochasticLoadBalancer > -- > > Key: HBASE-15529 > URL: https://issues.apache.org/jira/browse/HBASE-15529 > Project: HBase > Issue Type: Improvement > Reporter: Guanghao Zhang > Assignee: Guanghao Zhang > Priority: Minor > Attachments: HBASE-15529.patch > > > StochasticLoadBalancer includes cost functions to compute the cost of region count, r/w qps, table load, region locality, memstore size, and storefile size. Every cost function returns a number between 0 and 1 inclusive and the computed costs are scaled by their respective multipliers. A bigger multiplier means that the respective cost function has a bigger weight. But needBalance decides whether to balance only by region count and doesn't consider r/w qps or locality even if you configure these cost functions with bigger multipliers. StochasticLoadBalancer should override needBalance and decide whether to balance by its configured cost functions. > Add one new config hbase.master.balancer.stochastic.minCostNeedBalance; the cluster needs balancing when (total cost / sum of multipliers) > minCostNeedBalance.
[jira] [Updated] (HBASE-15529) Override needBalance in StochasticLoadBalancer
[ https://issues.apache.org/jira/browse/HBASE-15529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-15529: --- Status: Patch Available (was: Open) > Override needBalance in StochasticLoadBalancer > -- > > Key: HBASE-15529 > URL: https://issues.apache.org/jira/browse/HBASE-15529 > Project: HBase > Issue Type: Improvement > Reporter: Guanghao Zhang > Assignee: Guanghao Zhang > Priority: Minor > Attachments: HBASE-15529.patch > > > StochasticLoadBalancer includes cost functions to compute the cost of region count, r/w qps, table load, region locality, memstore size, and storefile size. Every cost function returns a number between 0 and 1 inclusive and the computed costs are scaled by their respective multipliers. A bigger multiplier means that the respective cost function has a bigger weight. But needBalance decides whether to balance only by region count and doesn't consider r/w qps or locality even if you configure these cost functions with bigger multipliers. StochasticLoadBalancer should override needBalance and decide whether to balance by its configured cost functions. > Add one new config hbase.master.balancer.stochastic.minCostNeedBalance; the cluster needs balancing when (total cost / sum of multipliers) > minCostNeedBalance.
[jira] [Updated] (HBASE-15529) Override needBalance in StochasticLoadBalancer
[ https://issues.apache.org/jira/browse/HBASE-15529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-15529: --- Attachment: HBASE-15529-v1.patch > Override needBalance in StochasticLoadBalancer > -- > > Key: HBASE-15529 > URL: https://issues.apache.org/jira/browse/HBASE-15529 > Project: HBase > Issue Type: Improvement > Reporter: Guanghao Zhang > Assignee: Guanghao Zhang > Priority: Minor > Attachments: HBASE-15529-v1.patch, HBASE-15529.patch > > > StochasticLoadBalancer includes cost functions to compute the cost of region count, r/w qps, table load, region locality, memstore size, and storefile size. Every cost function returns a number between 0 and 1 inclusive and the computed costs are scaled by their respective multipliers. A bigger multiplier means that the respective cost function has a bigger weight. But needBalance decides whether to balance only by region count and doesn't consider r/w qps or locality even if you configure these cost functions with bigger multipliers. StochasticLoadBalancer should override needBalance and decide whether to balance by its configured cost functions. > Add one new config hbase.master.balancer.stochastic.minCostNeedBalance; the cluster needs balancing when (total cost / sum of multipliers) > minCostNeedBalance.
[jira] [Commented] (HBASE-15529) Override needBalance in StochasticLoadBalancer
[ https://issues.apache.org/jira/browse/HBASE-15529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15218004#comment-15218004 ] Guanghao Zhang commented on HBASE-15529: Fixed the failing unit tests. > Override needBalance in StochasticLoadBalancer > -- > > Key: HBASE-15529 > URL: https://issues.apache.org/jira/browse/HBASE-15529 > Project: HBase > Issue Type: Improvement > Reporter: Guanghao Zhang > Assignee: Guanghao Zhang > Priority: Minor > Attachments: HBASE-15529-v1.patch, HBASE-15529.patch > > > StochasticLoadBalancer includes cost functions to compute the cost of region count, r/w qps, table load, region locality, memstore size, and storefile size. Every cost function returns a number between 0 and 1 inclusive and the computed costs are scaled by their respective multipliers. A bigger multiplier means that the respective cost function has a bigger weight. But needBalance decides whether to balance only by region count and doesn't consider r/w qps or locality even if you configure these cost functions with bigger multipliers. StochasticLoadBalancer should override needBalance and decide whether to balance by its configured cost functions. > Add one new config hbase.master.balancer.stochastic.minCostNeedBalance; the cluster needs balancing when (total cost / sum of multipliers) > minCostNeedBalance.
[jira] [Updated] (HBASE-16447) Replication by namespace in peer
[ https://issues.apache.org/jira/browse/HBASE-16447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-16447: --- Attachment: HBASE-16447-v1.patch Uploaded a v1 patch. > Replication by namespace in peer > > > Key: HBASE-16447 > URL: https://issues.apache.org/jira/browse/HBASE-16447 > Project: HBase > Issue Type: New Feature > Components: Replication > Reporter: Guanghao Zhang > Assignee: Guanghao Zhang > Attachments: HBASE-16447-v1.patch > > > Now we can only config table cfs in a peer. But in our production cluster, there are a dozen namespaces and every namespace has dozens of tables. It is complicated to config all table cfs in a peer. Some namespaces need to replicate all their tables to the slave cluster. It would be easy to config if we supported replication by namespace. Suggestions and discussions are welcomed.
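The namespace-level decision described above might look like the following sketch. NamespaceReplicationFilter and shouldReplicate are hypothetical names for illustration only, not the actual HBase replication API:

```java
import java.util.Map;
import java.util.Set;

/**
 * Illustrative sketch of a namespace-aware replication decision: a table
 * replicates if its whole namespace is configured on the peer, or if the
 * table appears in the existing per-table table-cfs map.
 */
public class NamespaceReplicationFilter {
  private final Set<String> peerNamespaces;        // namespaces replicated wholesale
  private final Map<String, Set<String>> tableCfs; // "ns:table" -> column families

  public NamespaceReplicationFilter(Set<String> peerNamespaces,
                                    Map<String, Set<String>> tableCfs) {
    this.peerNamespaces = peerNamespaces;
    this.tableCfs = tableCfs;
  }

  public boolean shouldReplicate(String namespace, String table) {
    if (peerNamespaces.contains(namespace)) {
      return true; // whole namespace replicates; no per-table config needed
    }
    return tableCfs.containsKey(namespace + ":" + table);
  }

  public static void main(String[] args) {
    NamespaceReplicationFilter filter = new NamespaceReplicationFilter(
        Set.of("ns1"), Map.of("ns2:t1", Set.of("cf1")));
    System.out.println(filter.shouldReplicate("ns1", "anyTable")); // prints true
    System.out.println(filter.shouldReplicate("ns2", "t2"));       // prints false
  }
}
```

The appeal of the feature is visible in the sketch: one namespace entry replaces dozens of per-table entries, and newly created tables in that namespace replicate without any peer-config change.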
[jira] [Updated] (HBASE-16447) Replication by namespace in peer
[ https://issues.apache.org/jira/browse/HBASE-16447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-16447: --- Affects Version/s: 2.0.0 Status: Patch Available (was: Open) > Replication by namespace in peer > > > Key: HBASE-16447 > URL: https://issues.apache.org/jira/browse/HBASE-16447 > Project: HBase > Issue Type: New Feature > Components: Replication > Affects Versions: 2.0.0 > Reporter: Guanghao Zhang > Assignee: Guanghao Zhang > Attachments: HBASE-16447-v1.patch > > > Now we can only config table cfs in a peer. But in our production cluster, there are a dozen namespaces and every namespace has dozens of tables. It is complicated to config all table cfs in a peer. Some namespaces need to replicate all their tables to the slave cluster. It would be easy to config if we supported replication by namespace. Suggestions and discussions are welcomed.