[jira] [Commented] (HBASE-20925) Canary test to expose table availability rate

2018-07-30 Thread Xu Cang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563152#comment-16563152
 ] 

Xu Cang commented on HBASE-20925:
-

[~ckulkarni] [~karanmehta93]

> Canary test to expose table availability rate 
> --
>
> Key: HBASE-20925
> URL: https://issues.apache.org/jira/browse/HBASE-20925
> Project: HBase
>  Issue Type: Improvement
>  Components: canary
>Affects Versions: 3.0.0, 2.0.0, 1.4.6
>Reporter: Xu Cang
>Assignee: Xu Cang
>Priority: Major
>  Labels: Canary
> Attachments: HBASE-20925.master.001.patch, 
> HBASE-20925.master.002.patch, HBASE-20925.master.003.patch, 
> HBASE-20925.master.004.patch
>
>
> Canary test to expose table availability rate.
>  
> It prints the read success rate for each table, as shown below. 
>  
>  
> *2018-07-27 17:11:06,823 INFO [CanaryMonitor-1532736665083] tool.Canary: *
> *2018-07-27 17:11:06,824 INFO [CanaryMonitor-1532736665083] tool.Canary: === Summary: ===*
> *2018-07-27 17:11:06,824 INFO [CanaryMonitor-1532736665083] tool.Canary: Read success rate for table : MyTable is: 1.0 .*
> *2018-07-27 17:11:06,824 INFO [CanaryMonitor-1532736665083] tool.Canary: Read success rate for table : mytable3 is: 0.9*
> *2018-07-27 17:11:06,824 INFO [CanaryMonitor-1532736665083] tool.Canary: Read success rate for table : mytable2 is: 0.8*
> *2018-07-27 17:11:06,824 INFO [CanaryMonitor-1532736665083] tool.Canary: Read success rate for table : mytable4 is: 1.0*
> *2018-07-27 17:11:06,824 INFO [CanaryMonitor-1532736665083] tool.Canary: ===END==*
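
A per-table read success rate like the one in the summary above could be computed roughly as follows. This is only a minimal sketch: the class and method names (TableReadStats, recordRead, successRate) are hypothetical and not taken from the patch, and the real Canary may track reads differently.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not the actual Canary code: counts read attempts and
// successes per table and reports successes / attempts.
class TableReadStats {
  // Value layout per table: index 0 = successful reads, index 1 = total reads.
  private final Map<String, long[]> counts = new TreeMap<>();

  void recordRead(String table, boolean success) {
    long[] c = counts.computeIfAbsent(table, t -> new long[2]);
    if (success) {
      c[0]++;
    }
    c[1]++;
  }

  // Returns the fraction of successful reads, or 0.0 if the table was never read.
  double successRate(String table) {
    long[] c = counts.get(table);
    return (c == null || c[1] == 0) ? 0.0 : (double) c[0] / c[1];
  }

  public static void main(String[] args) {
    TableReadStats stats = new TableReadStats();
    for (int i = 0; i < 10; i++) {
      stats.recordRead("mytable2", i < 8); // 8 of 10 simulated reads succeed
    }
    System.out.println("Read success rate for table : mytable2 is: "
        + stats.successRate("mytable2"));
  }
}
```

Keeping raw counters rather than a running ratio makes it easy to reset or aggregate the numbers per monitoring interval.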



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20896) Port HBASE-20866 to branch-1 and branch-1.4

2018-07-30 Thread Reid Chan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563146#comment-16563146
 ] 

Reid Chan commented on HBASE-20896:
---

ping [~apurtell], [~mdrob], [~tedyu], do you want to take a look?

> Port HBASE-20866 to branch-1 and branch-1.4 
> 
>
> Key: HBASE-20896
> URL: https://issues.apache.org/jira/browse/HBASE-20896
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Andrew Purtell
>Assignee: Vikas Vishwakarma
>Priority: Major
> Fix For: 1.5.0, 1.4.7
>
> Attachments: HBASE-20896.branch-1.4.001.patch, 
> HBASE-20896.branch-1.4.002.patch, HBASE-20896.branch-1.4.003.patch
>
>






[jira] [Commented] (HBASE-20896) Port HBASE-20866 to branch-1 and branch-1.4

2018-07-30 Thread Reid Chan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563145#comment-16563145
 ] 

Reid Chan commented on HBASE-20896:
---

Took a deeper look.
{quote}
make it protected
{quote}
Keeping them public is fine; I can see those methods being called from outside.
LGTM overall; please address the previous comments.

> Port HBASE-20866 to branch-1 and branch-1.4 
> 
>
> Key: HBASE-20896
> URL: https://issues.apache.org/jira/browse/HBASE-20896
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Andrew Purtell
>Assignee: Vikas Vishwakarma
>Priority: Major
> Fix For: 1.5.0, 1.4.7
>
> Attachments: HBASE-20896.branch-1.4.001.patch, 
> HBASE-20896.branch-1.4.002.patch, HBASE-20896.branch-1.4.003.patch
>
>






[jira] [Commented] (HBASE-20975) Lock may not be taken while rolling back procedure

2018-07-30 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563124#comment-16563124
 ] 

Allan Yang commented on HBASE-20975:


Included a trivial change in hbase-server to trigger the unit tests.

> Lock may not be taken while rolling back procedure
> --
>
> Key: HBASE-20975
> URL: https://issues.apache.org/jira/browse/HBASE-20975
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Affects Versions: 2.1.0, 2.0.1
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Attachments: HBASE-20975.branch-2.0.001.patch, 
> HBASE-20975.branch-2.0.002.patch, HBASE-20975.branch-2.0.003.patch
>
>
> Found this one while investigating HBASE-20921, too.
> Here is some code from executeRollback in ProcedureExecutor.java.
> {code}
> boolean reuseLock = false;
> while (stackTail-- > 0) {
>   final Procedure proc = subprocStack.get(stackTail);
>   LockState lockState;
>   // If reuseLock, then don't acquire the lock
>   if (!reuseLock && (lockState = acquireLock(proc)) != LockState.LOCK_ACQUIRED) {
>     return lockState;
>   }
>   lockState = executeRollback(proc);
>   boolean abortRollback = lockState != LockState.LOCK_ACQUIRED;
>   abortRollback |= !isRunning() || !store.isRunning();
>   // If the next procedure in the stack is the current one, then reuseLock = true
>   reuseLock = stackTail > 0 && (subprocStack.get(stackTail - 1) == proc) && !abortRollback;
>   // If reuseLock, don't releaseLock
>   if (!reuseLock) {
>     releaseLock(proc, false);
>   }
>   if (abortRollback) {
>     return lockState;
>   }
>   subprocStack.remove(stackTail);
>   if (proc.isYieldAfterExecutionStep(getEnvironment())) {
>     return LockState.LOCK_YIELD_WAIT;
>   }
>   // But here the lock is released regardless of whether reuseLock is true or false
>   if (proc != rootProc) {
>     execCompletionCleanup(proc);
>   }
> }
> {code}
> As the comments in the code above show, reuseLock can cause a procedure to 
> execute (roll back) without holding a lock. Though I haven't found any bugs 
> caused by this issue, it is a potential bug that needs to be fixed.
> I think we can just remove the reuseLock logic and acquire and release the lock 
> every time. 
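
The suggestion to acquire and release the lock on every iteration can be sketched with a toy model. Everything here (Proc, acquireLock, releaseLock, rollbackStep) is a simplified stand-in invented for illustration; the real ProcedureExecutor also handles lock states, yielding, and aborts, which this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these types and helpers are stand-ins, not HBase classes.
class RollbackLockSketch {
  static final class Proc {
    boolean locked;
    int stepsWithoutLock; // counts rollback steps executed while unlocked
  }

  static void acquireLock(Proc p) { p.locked = true; }

  static void releaseLock(Proc p) { p.locked = false; }

  static void rollbackStep(Proc p) {
    if (!p.locked) {
      p.stepsWithoutLock++;
    }
  }

  // Rolls back the stack from the tail, taking and releasing the lock around
  // every step instead of carrying it over between iterations.
  static void rollbackAll(List<Proc> subprocStack) {
    int stackTail = subprocStack.size();
    while (stackTail-- > 0) {
      Proc proc = subprocStack.get(stackTail);
      acquireLock(proc);
      try {
        rollbackStep(proc);
      } finally {
        releaseLock(proc);
      }
      subprocStack.remove(stackTail);
    }
  }

  public static void main(String[] args) {
    Proc p = new Proc();
    List<Proc> stack = new ArrayList<>();
    stack.add(p);
    stack.add(p); // the same procedure can appear on the stack more than once
    rollbackAll(stack);
    System.out.println("steps without lock: " + p.stepsWithoutLock);
  }
}
```

The cost is an extra acquire/release pair when the same procedure appears consecutively on the stack; in exchange, no code path can run a rollback step after the lock has been dropped.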





[jira] [Updated] (HBASE-20975) Lock may not be taken while rolling back procedure

2018-07-30 Thread Allan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Yang updated HBASE-20975:
---
Attachment: HBASE-20975.branch-2.0.003.patch

> Lock may not be taken while rolling back procedure
> --
>
> Key: HBASE-20975
> URL: https://issues.apache.org/jira/browse/HBASE-20975
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Affects Versions: 2.1.0, 2.0.1
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Attachments: HBASE-20975.branch-2.0.001.patch, 
> HBASE-20975.branch-2.0.002.patch, HBASE-20975.branch-2.0.003.patch
>
>
> Found this one while investigating HBASE-20921, too.
> Here is some code from executeRollback in ProcedureExecutor.java.
> {code}
> boolean reuseLock = false;
> while (stackTail-- > 0) {
>   final Procedure proc = subprocStack.get(stackTail);
>   LockState lockState;
>   // If reuseLock, then don't acquire the lock
>   if (!reuseLock && (lockState = acquireLock(proc)) != LockState.LOCK_ACQUIRED) {
>     return lockState;
>   }
>   lockState = executeRollback(proc);
>   boolean abortRollback = lockState != LockState.LOCK_ACQUIRED;
>   abortRollback |= !isRunning() || !store.isRunning();
>   // If the next procedure in the stack is the current one, then reuseLock = true
>   reuseLock = stackTail > 0 && (subprocStack.get(stackTail - 1) == proc) && !abortRollback;
>   // If reuseLock, don't releaseLock
>   if (!reuseLock) {
>     releaseLock(proc, false);
>   }
>   if (abortRollback) {
>     return lockState;
>   }
>   subprocStack.remove(stackTail);
>   if (proc.isYieldAfterExecutionStep(getEnvironment())) {
>     return LockState.LOCK_YIELD_WAIT;
>   }
>   // But here the lock is released regardless of whether reuseLock is true or false
>   if (proc != rootProc) {
>     execCompletionCleanup(proc);
>   }
> }
> {code}
> As the comments in the code above show, reuseLock can cause a procedure to 
> execute (roll back) without holding a lock. Though I haven't found any bugs 
> caused by this issue, it is a potential bug that needs to be fixed.
> I think we can just remove the reuseLock logic and acquire and release the lock 
> every time. 





[jira] [Commented] (HBASE-20976) SCP can be scheduled multiple times for the same RS

2018-07-30 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563120#comment-16563120
 ] 

Allan Yang commented on HBASE-20976:


Sorry, HBASE-20708 has not been back-ported, so the issue still exists.

> SCP can be scheduled multiple times for the same RS
> ---
>
> Key: HBASE-20976
> URL: https://issues.apache.org/jira/browse/HBASE-20976
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.1
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Attachments: HBASE-20976.branch-2.0.001.patch
>
>
> An SCP can be scheduled multiple times for the same RS:
> 1. An RS crashed and an SCP was submitted for it.
> 2. Before this SCP finished, the Master crashed.
> 3. The new master scans the meta table and finds some regions still open 
> on a dead server.
> 4. The new master submits an SCP for the dead server again.
> The two SCPs for the same RS can even execute concurrently without 
> HBASE-20846…
> Provided a test case to reproduce this issue and a fix in the patch.
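
One way to picture the duplicate-scheduling problem in the steps above is a guard that remembers which servers already have an unfinished SCP, seeded after failover from whatever the new master recovers. This is only an illustrative sketch and not the fix in the attached patch: ScpDedupSketch, submitScp, and scpFinished are hypothetical names, and the server name strings are made up.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical guard, not HBase code: at most one pending SCP per server.
class ScpDedupSketch {
  private final Set<String> serversWithPendingScp = ConcurrentHashMap.newKeySet();

  // A new master would seed the set with servers whose SCP was still
  // unfinished when the previous master crashed (assumed to be recoverable).
  ScpDedupSketch(Collection<String> recoveredUnfinishedScpServers) {
    serversWithPendingScp.addAll(recoveredUnfinishedScpServers);
  }

  // Returns true if a new SCP should be scheduled, false if one already exists.
  boolean submitScp(String serverName) {
    return serversWithPendingScp.add(serverName);
  }

  void scpFinished(String serverName) {
    serversWithPendingScp.remove(serverName);
  }

  public static void main(String[] args) {
    // Step 1-2: an SCP for rs1 was pending when the old master crashed.
    ScpDedupSketch newMaster =
        new ScpDedupSketch(Collections.singletonList("rs1.example.org,16020,1"));
    // Step 3-4: the meta scan finds regions on rs1 and tries to submit again.
    System.out.println("schedule second SCP? "
        + newMaster.submitScp("rs1.example.org,16020,1"));
  }
}
```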





[jira] [Reopened] (HBASE-20976) SCP can be scheduled multiple times for the same RS

2018-07-30 Thread Allan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Yang reopened HBASE-20976:


> SCP can be scheduled multiple times for the same RS
> ---
>
> Key: HBASE-20976
> URL: https://issues.apache.org/jira/browse/HBASE-20976
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.1
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Attachments: HBASE-20976.branch-2.0.001.patch
>
>
> An SCP can be scheduled multiple times for the same RS:
> 1. An RS crashed and an SCP was submitted for it.
> 2. Before this SCP finished, the Master crashed.
> 3. The new master scans the meta table and finds some regions still open 
> on a dead server.
> 4. The new master submits an SCP for the dead server again.
> The two SCPs for the same RS can even execute concurrently without 
> HBASE-20846…
> Provided a test case to reproduce this issue and a fix in the patch.





[jira] [Commented] (HBASE-20965) Separate region server report requests to new handlers

2018-07-30 Thread Guanghao Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563114#comment-16563114
 ] 

Guanghao Zhang commented on HBASE-20965:


Add a unit test for this? [~Yi Mei]

> Separate region server report requests to new handlers
> --
>
> Key: HBASE-20965
> URL: https://issues.apache.org/jira/browse/HBASE-20965
> Project: HBase
>  Issue Type: Improvement
>Reporter: Yi Mei
>Assignee: Yi Mei
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-20965.master.001.patch
>
>
> In master rpc scheduler, all rpc requests are executed in a thread pool. This 
> task separates rs report requests to new handlers.
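
The idea of giving region server reports their own handlers can be sketched with two thread pools and a dispatch step. This is a minimal illustration under assumed names (ReportHandlerSketch, Kind, handlerFor, the thread names); it does not reflect how the HBase RPC scheduler or the attached patch is actually structured.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

// Hypothetical sketch: region server reports run on a dedicated pool so
// they cannot starve, or be starved by, other master RPCs.
class ReportHandlerSketch {
  enum Kind { REGION_SERVER_REPORT, OTHER }

  private final ExecutorService reportHandlers;
  private final ExecutorService defaultHandlers;

  ReportHandlerSketch(int reportThreads, int defaultThreads) {
    reportHandlers = Executors.newFixedThreadPool(reportThreads, named("rs-report-handler"));
    defaultHandlers = Executors.newFixedThreadPool(defaultThreads, named("default-handler"));
  }

  private static ThreadFactory named(String name) {
    return r -> {
      Thread t = new Thread(r, name);
      t.setDaemon(true); // do not keep the JVM alive for idle handlers
      return t;
    };
  }

  // Runs a trivial task on the pool chosen for this request kind and returns
  // the name of the handler thread that executed it.
  String handlerFor(Kind kind) {
    ExecutorService pool =
        (kind == Kind.REGION_SERVER_REPORT) ? reportHandlers : defaultHandlers;
    Callable<String> task = () -> Thread.currentThread().getName();
    try {
      return pool.submit(task).get();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  void shutdown() {
    reportHandlers.shutdownNow();
    defaultHandlers.shutdownNow();
  }

  public static void main(String[] args) {
    ReportHandlerSketch scheduler = new ReportHandlerSketch(1, 2);
    System.out.println(scheduler.handlerFor(Kind.REGION_SERVER_REPORT));
    System.out.println(scheduler.handlerFor(Kind.OTHER));
    scheduler.shutdown();
  }
}
```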





[jira] [Updated] (HBASE-20965) Separate region server report requests to new handlers

2018-07-30 Thread Guanghao Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-20965:
---
Issue Type: Improvement  (was: Task)

> Separate region server report requests to new handlers
> --
>
> Key: HBASE-20965
> URL: https://issues.apache.org/jira/browse/HBASE-20965
> Project: HBase
>  Issue Type: Improvement
>Reporter: Yi Mei
>Assignee: Yi Mei
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-20965.master.001.patch
>
>
> In master rpc scheduler, all rpc requests are executed in a thread pool. This 
> task separates rs report requests to new handlers.





[jira] [Updated] (HBASE-20965) Separate region server report requests to new handlers

2018-07-30 Thread Guanghao Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-20965:
---
Status: Patch Available  (was: Open)

> Separate region server report requests to new handlers
> --
>
> Key: HBASE-20965
> URL: https://issues.apache.org/jira/browse/HBASE-20965
> Project: HBase
>  Issue Type: Task
>Reporter: Yi Mei
>Assignee: Yi Mei
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-20965.master.001.patch
>
>
> In master rpc scheduler, all rpc requests are executed in a thread pool. This 
> task separates rs report requests to new handlers.





[jira] [Commented] (HBASE-20981) Rollback stateCount accounting thrown-off when exception out of rollbackState

2018-07-30 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563111#comment-16563111
 ] 

Allan Yang commented on HBASE-20981:


+1 if all UT pass

> Rollback stateCount accounting thrown-off when exception out of rollbackState
> -
>
> Key: HBASE-20981
> URL: https://issues.apache.org/jira/browse/HBASE-20981
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Affects Versions: 2.0.1
>Reporter: stack
>Assignee: Jack Bearden
>Priority: Major
> Fix For: 2.0.2
>
> Attachments: HBASE-20981.branch-2.001.patch
>
>
> Found by the mighty [~allan163] over in HBASE-20893. Quoting Allan:
> {code}
> But, there is truly a bug here,
>   @Override
>   protected void rollback(final TEnvironment env)
>       throws IOException, InterruptedException {
>     if (isEofState()) stateCount--;
>     try {
>       updateTimestamp();
>       rollbackState(env, getCurrentState());
>       stateCount--;
>     } finally {
>       updateTimestamp();
>     }
>   }
> We need to decrease the stateCount when rolling back, so we can roll back 
> the previous state correctly. But since an exception is thrown, the decrease 
> of stateCount never happens. So ProcedureExecutor will keep rolling back 
> only one state (the one that throws an exception) until the end of the execution 
> stack.
> {code}
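
The accounting problem described in the quote can be reproduced with a small stand-alone model. The class below only mirrors the shape of the quoted rollback() method; StateCountSketch, rollbackStateFails, and tryRollback are invented for illustration, and the sketch demonstrates the symptom rather than the fix in the patch.

```java
import java.io.IOException;

// Toy model of the quoted rollback(): the decrement sits after
// rollbackState(), so an exception from rollbackState() skips it.
class StateCountSketch {
  int stateCount = 3;
  boolean rollbackStateFails;

  void rollbackState() throws IOException {
    if (rollbackStateFails) {
      throw new IOException("simulated rollbackState failure");
    }
  }

  void rollback() throws IOException {
    try {
      rollbackState();
      stateCount--; // never reached when rollbackState() throws
    } finally {
      // the quoted code calls updateTimestamp() here
    }
  }

  // Convenience wrapper: returns false if rollback() threw.
  boolean tryRollback() {
    try {
      rollback();
      return true;
    } catch (IOException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    StateCountSketch proc = new StateCountSketch();
    proc.rollbackStateFails = true;
    proc.tryRollback();
    System.out.println("stateCount after failed rollback: " + proc.stateCount);
    proc.rollbackStateFails = false;
    proc.tryRollback();
    System.out.println("stateCount after successful rollback: " + proc.stateCount);
  }
}
```

Because stateCount is unchanged after the failure, every retry targets the same state, which matches the behavior Allan describes.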





[jira] [Comment Edited] (HBASE-20657) Retrying RPC call for ModifyTableProcedure may get stuck

2018-07-30 Thread Guanghao Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563103#comment-16563103
 ] 

Guanghao Zhang edited comment on HBASE-20657 at 7/31/18 3:27 AM:
-

bq. I don't know a reason why this shouldn't also be applied to 2.x
[~elserj] I opened an issue, HBASE-20713, about this. The solution in the master 
branch is not the final solution, so I didn't apply it to the 2.* branches.


was (Author: zghaobac):
bq. I don't know a reason why this shouldn't also be applied to 2.x
[~elserj] I opened an issue, HBASE-20713, about this. The solution in the master 
branch is not the final solution, so I haven't applied it to the 2.* branches.

> Retrying RPC call for ModifyTableProcedure may get stuck
> 
>
> Key: HBASE-20657
> URL: https://issues.apache.org/jira/browse/HBASE-20657
> Project: HBase
>  Issue Type: Bug
>  Components: Client, proc-v2
>Affects Versions: 2.0.0
>Reporter: Sergey Soldatov
>Assignee: stack
>Priority: Critical
> Fix For: 3.0.0, 2.0.2, 2.1.1
>
> Attachments: HBASE-20657-1-branch-2.patch, 
> HBASE-20657-2-branch-2.patch, HBASE-20657-3-branch-2.patch, 
> HBASE-20657-4-master.patch, HBASE-20657-testcase-branch2.patch
>
>
> Env: 2 masters, 1 RS. 
> Steps to reproduce: Active master is killed while ModifyTableProcedure is 
> executing. 
> If the table has enough regions, some of them may be closed by the time the 
> secondary master becomes active, so once the client retries the call 
> to the new active master, a new ModifyTableProcedure is created and gets stuck 
> during MODIFY_TABLE_REOPEN_ALL_REGIONS state handling. That happens because:
> 1. When we retry from the client side, we call modifyTableAsync, which 
> creates a procedure with a new nonce key:
> {noformat}
>  ModifyTableRequest request = RequestConverter.buildModifyTableRequest(
>      td.getTableName(), td, ng.getNonceGroup(), ng.newNonce());
> {noformat}
>  So on the server side, it is considered a new procedure and starts 
> executing immediately.
> 2. When we process MODIFY_TABLE_REOPEN_ALL_REGIONS, we create a 
> MoveRegionProcedure for each region, but it checks whether the region is 
> online (and it's not), so it fails immediately, forcing the procedure to 
> restart.
> [~an...@apache.org] saw a similar case when two concurrent ModifyTable 
> procedures were running and got stuck in a similar way. 
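
Point 1 above hinges on how a nonce lets the server recognize a retried request. The following toy model shows why a retry that reuses its nonce maps to the existing procedure while a retry with a fresh nonce starts a new one. NonceSketch and submit are hypothetical, and a single long stands in for the real (nonce group, nonce) key.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model, not HBase code: the server remembers which procedure id was
// created for each nonce and returns it again for a repeated nonce.
class NonceSketch {
  private final Map<Long, Long> procIdByNonce = new HashMap<>();
  private long nextProcId = 1;

  // Returns the id of the procedure handling this request.
  long submit(long nonce) {
    return procIdByNonce.computeIfAbsent(nonce, n -> nextProcId++);
  }

  public static void main(String[] args) {
    NonceSketch server = new NonceSketch();
    long first = server.submit(42L);
    long retrySameNonce = server.submit(42L); // recognized as the same request
    long retryNewNonce = server.submit(43L);  // treated as a brand-new procedure
    System.out.println(first + " " + retrySameNonce + " " + retryNewNonce);
  }
}
```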





[jira] [Commented] (HBASE-20657) Retrying RPC call for ModifyTableProcedure may get stuck

2018-07-30 Thread Guanghao Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563103#comment-16563103
 ] 

Guanghao Zhang commented on HBASE-20657:


bq. I don't know a reason why this shouldn't also be applied to 2.x
[~elserj] I opened an issue, HBASE-20713, about this. The solution in the master 
branch is not the final solution, so I haven't applied it to the 2.* branches.

> Retrying RPC call for ModifyTableProcedure may get stuck
> 
>
> Key: HBASE-20657
> URL: https://issues.apache.org/jira/browse/HBASE-20657
> Project: HBase
>  Issue Type: Bug
>  Components: Client, proc-v2
>Affects Versions: 2.0.0
>Reporter: Sergey Soldatov
>Assignee: stack
>Priority: Critical
> Fix For: 3.0.0, 2.0.2, 2.1.1
>
> Attachments: HBASE-20657-1-branch-2.patch, 
> HBASE-20657-2-branch-2.patch, HBASE-20657-3-branch-2.patch, 
> HBASE-20657-4-master.patch, HBASE-20657-testcase-branch2.patch
>
>
> Env: 2 masters, 1 RS. 
> Steps to reproduce: Active master is killed while ModifyTableProcedure is 
> executing. 
> If the table has enough regions, some of them may be closed by the time the 
> secondary master becomes active, so once the client retries the call 
> to the new active master, a new ModifyTableProcedure is created and gets stuck 
> during MODIFY_TABLE_REOPEN_ALL_REGIONS state handling. That happens because:
> 1. When we retry from the client side, we call modifyTableAsync, which 
> creates a procedure with a new nonce key:
> {noformat}
>  ModifyTableRequest request = RequestConverter.buildModifyTableRequest(
>      td.getTableName(), td, ng.getNonceGroup(), ng.newNonce());
> {noformat}
>  So on the server side, it is considered a new procedure and starts 
> executing immediately.
> 2. When we process MODIFY_TABLE_REOPEN_ALL_REGIONS, we create a 
> MoveRegionProcedure for each region, but it checks whether the region is 
> online (and it's not), so it fails immediately, forcing the procedure to 
> restart.
> [~an...@apache.org] saw a similar case when two concurrent ModifyTable 
> procedures were running and got stuck in a similar way. 





[jira] [Updated] (HBASE-20976) SCP can be scheduled multiple times for the same RS

2018-07-30 Thread Allan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Yang updated HBASE-20976:
---
Resolution: Invalid
Status: Resolved  (was: Patch Available)

> SCP can be scheduled multiple times for the same RS
> ---
>
> Key: HBASE-20976
> URL: https://issues.apache.org/jira/browse/HBASE-20976
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.1
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Attachments: HBASE-20976.branch-2.0.001.patch
>
>
> An SCP can be scheduled multiple times for the same RS:
> 1. An RS crashed and an SCP was submitted for it.
> 2. Before this SCP finished, the Master crashed.
> 3. The new master scans the meta table and finds some regions still open 
> on a dead server.
> 4. The new master submits an SCP for the dead server again.
> The two SCPs for the same RS can even execute concurrently without 
> HBASE-20846…
> Provided a test case to reproduce this issue and a fix in the patch.





[jira] [Commented] (HBASE-20976) SCP can be scheduled multiple times for the same RS

2018-07-30 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563099#comment-16563099
 ] 

Allan Yang commented on HBASE-20976:


Seems like HBASE-20708 has been back-ported to branch-2.0. Resolving this one as 
fixed.

> SCP can be scheduled multiple times for the same RS
> ---
>
> Key: HBASE-20976
> URL: https://issues.apache.org/jira/browse/HBASE-20976
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.1
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Attachments: HBASE-20976.branch-2.0.001.patch
>
>
> An SCP can be scheduled multiple times for the same RS:
> 1. An RS crashed and an SCP was submitted for it.
> 2. Before this SCP finished, the Master crashed.
> 3. The new master scans the meta table and finds some regions still open 
> on a dead server.
> 4. The new master submits an SCP for the dead server again.
> The two SCPs for the same RS can even execute concurrently without 
> HBASE-20846…
> Provided a test case to reproduce this issue and a fix in the patch.





[jira] [Updated] (HBASE-20886) [Auth] Support keytab login in hbase client

2018-07-30 Thread Reid Chan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reid Chan updated HBASE-20886:
--
  Resolution: Resolved
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

> [Auth] Support keytab login in hbase client
> ---
>
> Key: HBASE-20886
> URL: https://issues.apache.org/jira/browse/HBASE-20886
> Project: HBase
>  Issue Type: New Feature
>  Components: asyncclient, Client, security
>Reporter: Reid Chan
>Assignee: Reid Chan
>Priority: Critical
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-20886.master.001.patch, 
> HBASE-20886.master.002.patch, HBASE-20886.master.003.patch, 
> HBASE-20886.master.004.patch, HBASE-20886.master.005.patch, 
> HBASE-20886.master.006.patch, HBASE-20886.master.007.patch, 
> HBASE-20886.master.008.patch
>
>
> There are lots of questions on the user mailing list and Slack channel about 
> how to connect to a kerberized hbase cluster through the hbase-client api.
> {{hbase.client.keytab.file}} and {{hbase.client.keytab.principal}} already 
> exist in the code base, but they are only used in {{Canary}}.
> This issue makes use of the two configs to support client-side keytab-based 
> login. After this issue is resolved, hbase-client should connect directly to 
> a kerberized cluster without any code changes, as long as 
> {{hbase.client.keytab.file}} and {{hbase.client.keytab.principal}} are 
> specified.





[jira] [Commented] (HBASE-20886) [Auth] Support keytab login in hbase client

2018-07-30 Thread Reid Chan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563081#comment-16563081
 ] 

Reid Chan commented on HBASE-20886:
---

Pushed to master and branch-2.

> [Auth] Support keytab login in hbase client
> ---
>
> Key: HBASE-20886
> URL: https://issues.apache.org/jira/browse/HBASE-20886
> Project: HBase
>  Issue Type: New Feature
>  Components: asyncclient, Client, security
>Reporter: Reid Chan
>Assignee: Reid Chan
>Priority: Critical
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-20886.master.001.patch, 
> HBASE-20886.master.002.patch, HBASE-20886.master.003.patch, 
> HBASE-20886.master.004.patch, HBASE-20886.master.005.patch, 
> HBASE-20886.master.006.patch, HBASE-20886.master.007.patch, 
> HBASE-20886.master.008.patch
>
>
> There are lots of questions on the user mailing list and Slack channel about 
> how to connect to a kerberized hbase cluster through the hbase-client api.
> {{hbase.client.keytab.file}} and {{hbase.client.keytab.principal}} already 
> exist in the code base, but they are only used in {{Canary}}.
> This issue makes use of the two configs to support client-side keytab-based 
> login. After this issue is resolved, hbase-client should connect directly to 
> a kerberized cluster without any code changes, as long as 
> {{hbase.client.keytab.file}} and {{hbase.client.keytab.principal}} are 
> specified.





[jira] [Commented] (HBASE-20886) [Auth] Support keytab login in hbase client

2018-07-30 Thread Reid Chan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563075#comment-16563075
 ] 

Reid Chan commented on HBASE-20886:
---

Thanks for pointing that out; yes, it's bad.
{quote}
 direct users of UGI should self-ensure and call the 
checkTGTAndReloginFromKeytab functionality themselves.
{quote}
That's what this patch does.

> [Auth] Support keytab login in hbase client
> ---
>
> Key: HBASE-20886
> URL: https://issues.apache.org/jira/browse/HBASE-20886
> Project: HBase
>  Issue Type: New Feature
>  Components: asyncclient, Client, security
>Reporter: Reid Chan
>Assignee: Reid Chan
>Priority: Critical
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-20886.master.001.patch, 
> HBASE-20886.master.002.patch, HBASE-20886.master.003.patch, 
> HBASE-20886.master.004.patch, HBASE-20886.master.005.patch, 
> HBASE-20886.master.006.patch, HBASE-20886.master.007.patch, 
> HBASE-20886.master.008.patch
>
>
> There are lots of questions on the user mailing list and Slack channel about 
> how to connect to a kerberized hbase cluster through the hbase-client api.
> {{hbase.client.keytab.file}} and {{hbase.client.keytab.principal}} already 
> exist in the code base, but they are only used in {{Canary}}.
> This issue makes use of the two configs to support client-side keytab-based 
> login. After this issue is resolved, hbase-client should connect directly to 
> a kerberized cluster without any code changes, as long as 
> {{hbase.client.keytab.file}} and {{hbase.client.keytab.principal}} are 
> specified.





[jira] [Commented] (HBASE-20981) Rollback stateCount accounting thrown-off when exception out of rollbackState

2018-07-30 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562991#comment-16562991
 ] 

Hadoop QA commented on HBASE-20981:
---

| (x) *-1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 21s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || branch-2 Compile Tests ||
| +1 | mvninstall | 6m 26s | branch-2 passed |
| +1 | compile | 0m 24s | branch-2 passed |
| +1 | checkstyle | 0m 16s | branch-2 passed |
| +1 | shadedjars | 3m 37s | branch has no errors when building our shaded downstream artifacts. |
| +1 | findbugs | 0m 24s | branch-2 passed |
| +1 | javadoc | 0m 22s | branch-2 passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 3m 33s | the patch passed |
| +1 | compile | 0m 20s | the patch passed |
| +1 | javac | 0m 20s | the patch passed |
| -1 | checkstyle | 0m 14s | hbase-procedure: The patch generated 1 new + 8 unchanged - 0 fixed = 9 total (was 8) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 3m 36s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 7m 43s | Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. |
| +1 | findbugs | 0m 33s | the patch passed |
| +1 | javadoc | 0m 13s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 1m 56s | hbase-procedure in the patch failed. |
| +1 | asflicense | 0m 14s | The patch does not generate ASF License warnings. |
| | | 30m 42s | |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.procedure2.TestYieldProcedures |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:42ca976 |
| JIRA Issue | HBASE-20981 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12933690/HBASE-20981.branch-2.001.patch |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 1c9329ea82c1 4.4.0-130-generic #156-Ubuntu SMP Thu Jun 14 08:53:28 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | branch-2 / 584093c23f |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC3 |
| checkstyle | https://builds.apache.org/job/PreCommit-HBASE-Build/13858/artifact/patchprocess/diff-checkstyle-hbase-procedure.txt |
| unit | https://builds.apache.org/job/PreCommit-HBASE-Build/13858/artifact/patchprocess/patch-unit-hbase-procedure.txt |

[jira] [Commented] (HBASE-20979) Flaky test reporting should specify what JSON it needs and handle HTTP errors

2018-07-30 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562979#comment-16562979
 ] 

Duo Zhang commented on HBASE-20979:
---

OK. got it. +1.

> Flaky test reporting should specify what JSON it needs and handle HTTP errors
> -
>
> Key: HBASE-20979
> URL: https://issues.apache.org/jira/browse/HBASE-20979
> Project: HBase
>  Issue Type: Improvement
>  Components: test
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Minor
> Attachments: HBASE-20979.0.txt
>
>
> Current flaky test report should be including the {{tree=}} parameter in its 
> Jenkins API calls (see 
> https://support.cloudbees.com/hc/en-us/articles/217911388-Best-Practice-For-Using-Jenkins-REST-API).
> Also should provide some info on failure so that when jobs change or go away 
> we don't get blank failures.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20979) Flaky test reporting should specify what JSON it needs and handle HTTP errors

2018-07-30 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562972#comment-16562972
 ] 

Sean Busbey commented on HBASE-20979:
-

> We do not need to check 200 for other two requests?

We could; it would help make things like transient errors more obvious. But my 
only concern thus far has been the job going away entirely, since that's 
the only failure mode that has happened so far.
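
For illustration only, a minimal Java sketch of the two ideas in this issue: asking Jenkins for just the JSON fields needed via the {{tree=}} parameter, and failing with a descriptive message on any non-200 response. The class name, method names, and the example tree value are assumptions made for this sketch and are not taken from the actual flaky test reporting script; no network call is made here.

```java
import java.io.IOException;

// Hypothetical helper, not part of the HBase dev-support scripts.
class JenkinsApiSketch {

  // Builds a Jenkins JSON API URL that requests only the fields named in `tree`.
  // The tree value is appended as-is; a real client may need to URL-encode it.
  static String apiUrl(String jobUrl, String tree) {
    String base = jobUrl.endsWith("/") ? jobUrl : jobUrl + "/";
    return base + "api/json?tree=" + tree;
  }

  // Turns any non-200 status into an error that names the URL and the status,
  // so a job that has gone away does not show up as a blank failure.
  static void requireOk(int status, String url) throws IOException {
    if (status != 200) {
      throw new IOException("Request to " + url + " failed with HTTP status " + status);
    }
  }

  public static void main(String[] args) throws IOException {
    String url = apiUrl("https://builds.apache.org/job/HBASE-Flaky-Tests",
        "builds[number,result]");
    System.out.println(url);
    requireOk(200, url); // passes silently
    try {
      requireOk(404, url); // simulated "job went away" response
    } catch (IOException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Applying the same status check to every request, not just the first, would also surface the transient errors mentioned above.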

> Flaky test reporting should specify what JSON it needs and handle HTTP errors
> -
>
> Key: HBASE-20979
> URL: https://issues.apache.org/jira/browse/HBASE-20979
> Project: HBase
>  Issue Type: Improvement
>  Components: test
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Minor
> Attachments: HBASE-20979.0.txt
>
>
> Current flaky test report should be including the {{tree=}} parameter in its 
> Jenkins API calls (see 
> https://support.cloudbees.com/hc/en-us/articles/217911388-Best-Practice-For-Using-Jenkins-REST-API).
> Also should provide some info on failure so that when jobs change or go away 
> we don't get blank failures.





[jira] [Updated] (HBASE-20979) Flaky test reporting should specify what JSON it needs and handle HTTP errors

2018-07-30 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-20979:

Status: Open  (was: Patch Available)

Missed testing with the {{--is-yetus}} flag from HBASE-19382, so moving out of 
Patch Available.

> Flaky test reporting should specify what JSON it needs and handle HTTP errors
> -
>
> Key: HBASE-20979
> URL: https://issues.apache.org/jira/browse/HBASE-20979
> Project: HBase
>  Issue Type: Improvement
>  Components: test
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Minor
> Attachments: HBASE-20979.0.txt
>
>
> Current flaky test report should be including the {{tree=}} parameter in its 
> Jenkins API calls (see 
> https://support.cloudbees.com/hc/en-us/articles/217911388-Best-Practice-For-Using-Jenkins-REST-API).
> Also should provide some info on failure so that when jobs change or go away 
> we don't get blank failures.





[jira] [Commented] (HBASE-20967) TestFromClientSide3 fails with NPE

2018-07-30 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562970#comment-16562970
 ] 

Duo Zhang commented on HBASE-20967:
---

The Jenkins node is shared with other jobs, so resources are limited and 
tests usually take more time to finish. A race is more likely to 
reproduce on Jenkins than in our local environment.

> TestFromClientSide3 fails with NPE
> --
>
> Key: HBASE-20967
> URL: https://issues.apache.org/jira/browse/HBASE-20967
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Reporter: Duo Zhang
>Priority: Major
> Attachments: 
> org.apache.hadoop.hbase.client.TestFromClientSide3-output.txt
>
>
> https://builds.apache.org/job/HBASE-Flaky-Tests/35375/testReport/junit/org.apache.hadoop.hbase.client/TestFromClientSide3/testLockLeakWithDelta/
> {noformat}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hbase.client.TestFromClientSide3.find(TestFromClientSide3.java:995)
>   at 
> org.apache.hadoop.hbase.client.TestFromClientSide3.find(TestFromClientSide3.java:1002)
>   at 
> org.apache.hadoop.hbase.client.TestFromClientSide3.testLockLeakWithDelta(TestFromClientSide3.java:783)
> {noformat}





[jira] [Updated] (HBASE-20981) Rollback stateCount accounting thrown-off when exception out of rollbackState

2018-07-30 Thread Jack Bearden (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jack Bearden updated HBASE-20981:
-
  Assignee: Jack Bearden
Attachment: HBASE-20981.branch-2.001.patch
Status: Patch Available  (was: Open)

Hi [~stack]. From what I can tell, it looks like this use-case may have been 
missed due to no testing around the stateCount field when rollback() is called 
on StateMachineProcedure. I uploaded patch #1 in an effort to improve on this.
 * Scope of changes
 ** Moved decrementing stateCount out of the try block and into finally block 
instead.
 ** Added two tests: one checks stateCount for normal rollbacks and 
one checks stateCount in a rollback() that throws an exception.
 * Places this patch could improve
 ** I had to change isEofState() and stateCount from private to protected so I 
was able to override it in the test class. I do not like that I had to 
sacrifice the encapsulation of the class for the test behavior. Maybe 
implementing a getter and setter for stateCount would be the better approach 
here?
 ** Code duplication for a new TestSMProcedure class that had the rollback() 
function with an exception in it. I was hoping to not impact the other tests, 
so I duplicated code to get the behavior for rollback() that I wanted. There 
may be a better way to do this.
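
To make the first change concrete, here is a minimal, self-contained Java sketch of the try/finally shape described above. It is not the patch itself: RollbackSketch is a hypothetical stand-in for StateMachineProcedure, rollbackState() is a placeholder that can be told to throw, and the isEofState() and updateTimestamp() calls from the real method are left out.

```java
import java.io.IOException;

// Hypothetical stand-in; only the stateCount accounting is modeled.
class RollbackSketch {
  int stateCount;
  private final boolean failOnRollback;

  RollbackSketch(int stateCount, boolean failOnRollback) {
    this.stateCount = stateCount;
    this.failOnRollback = failOnRollback;
  }

  // Placeholder for rollbackState(env, getCurrentState()).
  void rollbackState() throws IOException {
    if (failOnRollback) {
      throw new IOException("simulated rollbackState failure");
    }
  }

  void rollback() throws IOException {
    try {
      rollbackState();
    } finally {
      // Runs whether or not rollbackState() threw, so the counter
      // always moves back one state.
      stateCount--;
    }
  }

  // Helper for the demo: run one rollback and report the resulting count.
  static int countAfterRollback(int initial, boolean fail) {
    RollbackSketch p = new RollbackSketch(initial, fail);
    try {
      p.rollback();
    } catch (IOException e) {
      // Expected when fail is true; the counter is inspected below.
    }
    return p.stateCount;
  }

  public static void main(String[] args) {
    System.out.println(countAfterRollback(3, false));
    System.out.println(countAfterRollback(3, true));
  }
}
```

In this sketch both calls end with the same count, which is the property the two new tests are meant to check.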

> Rollback stateCount accounting thrown-off when exception out of rollbackState
> -
>
> Key: HBASE-20981
> URL: https://issues.apache.org/jira/browse/HBASE-20981
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Affects Versions: 2.0.1
>Reporter: stack
>Assignee: Jack Bearden
>Priority: Major
> Fix For: 2.0.2
>
> Attachments: HBASE-20981.branch-2.001.patch
>
>
> Found by mighty [~allan163] over in HBASE-20893. Quoting Allan:
> {code}
> But, there is truly a bug here,
>   @Override
>   protected void rollback(final TEnvironment env)
>       throws IOException, InterruptedException {
>     if (isEofState()) stateCount--;
>     try {
>       updateTimestamp();
>       rollbackState(env, getCurrentState());
>       stateCount--;
>     } finally {
>       updateTimestamp();
>     }
>   }
> We need to decrease the stateCount when rolling back, so we can roll back 
> the previous state correctly. But since an exception is thrown, the decrement 
> of stateCount never happens. So ProcedureExecutor will keep rolling back 
> only one state (the one that throws the exception) until the end of the 
> execution stack.
> {code}





[jira] [Resolved] (HBASE-20834) The jenkins on http://104.198.223.121:8080/job/HBASE-Flaky-Tests/ is broken

2018-07-30 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey resolved HBASE-20834.
-
Resolution: Fixed

I removed the URL for this job from the branch-2.0 flaky finder. Couldn't find 
any docs on it; it looked like someone had set up a Google box to run the tests 
that are marked as flaky more often.

> The jenkins on http://104.198.223.121:8080/job/HBASE-Flaky-Tests/ is broken
> ---
>
> Key: HBASE-20834
> URL: https://issues.apache.org/jira/browse/HBASE-20834
> Project: HBase
>  Issue Type: Bug
>  Components: community, test
>Reporter: Duo Zhang
>Assignee: Sean Busbey
>Priority: Major
>
> It is used by our flakey test finder to collect flakey tests.





[jira] [Assigned] (HBASE-20834) The jenkins on http://104.198.223.121:8080/job/HBASE-Flaky-Tests/ is broken

2018-07-30 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey reassigned HBASE-20834:
---

Assignee: Sean Busbey

> The jenkins on http://104.198.223.121:8080/job/HBASE-Flaky-Tests/ is broken
> ---
>
> Key: HBASE-20834
> URL: https://issues.apache.org/jira/browse/HBASE-20834
> Project: HBase
>  Issue Type: Bug
>  Components: community, test
>Reporter: Duo Zhang
>Assignee: Sean Busbey
>Priority: Major
>
> It is used by our flakey test finder to collect flakey tests.





[jira] [Work started] (HBASE-20834) The jenkins on http://104.198.223.121:8080/job/HBASE-Flaky-Tests/ is broken

2018-07-30 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-20834 started by Sean Busbey.
---
> The jenkins on http://104.198.223.121:8080/job/HBASE-Flaky-Tests/ is broken
> ---
>
> Key: HBASE-20834
> URL: https://issues.apache.org/jira/browse/HBASE-20834
> Project: HBase
>  Issue Type: Bug
>  Components: community, test
>Reporter: Duo Zhang
>Assignee: Sean Busbey
>Priority: Major
>
> It is used by our flakey test finder to collect flakey tests.





[jira] [Resolved] (HBASE-20980) Flaky test reporting should work with yetus driven builds

2018-07-30 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey resolved HBASE-20980.
-
Resolution: Duplicate

Looks like I missed HBASE-19382 when looking at the script this morning.

> Flaky test reporting should work with yetus driven builds
> -
>
> Key: HBASE-20980
> URL: https://issues.apache.org/jira/browse/HBASE-20980
> Project: HBase
>  Issue Type: New Feature
>  Components: test
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Critical
>
> Our current flaky test reporting can't consume our nightly builds because it 
> presumes surefire output will go to the console. We should update it to 
> recognize when a build used yetus and then get the data it needs out of 
> artifacts.





[jira] [Commented] (HBASE-20930) MetaScanner.metaScan should use passed variable for meta table name rather than TableName.META_TABLE_NAME

2018-07-30 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562665#comment-16562665
 ] 

stack commented on HBASE-20930:
---

+1 for branch-2.0. Thanks [~elserj]


> MetaScanner.metaScan should use passed variable for meta table name rather 
> than TableName.META_TABLE_NAME
> -
>
> Key: HBASE-20930
> URL: https://issues.apache.org/jira/browse/HBASE-20930
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.3.3
>Reporter: Vishal Khandelwal
>Assignee: Vishal Khandelwal
>Priority: Minor
> Fix For: 1.5.0, 1.2.7, 1.3.3, 1.4.7
>
> Attachments: HBASE-20930.branch-1.3.patch, 
> HBASE-20930.branch-1.3.v2.patch
>
>
> MetaScanner.metaScan 
>  try (Table metaTable = new HTable(TableName.META_TABLE_NAME, connection, 
> null)) {
> should be changed to 
> metaScan(connection, visitor, userTableName, null, Integer.MAX_VALUE, 
> metaTableName)





[jira] [Updated] (HBASE-20708) Remove the usage of RecoverMetaProcedure in master startup

2018-07-30 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-20708:
--
Fix Version/s: 2.0.2

> Remove the usage of RecoverMetaProcedure in master startup
> --
>
> Key: HBASE-20708
> URL: https://issues.apache.org/jira/browse/HBASE-20708
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2, Region Assignment
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Blocker
> Fix For: 3.0.0, 2.1.0, 2.0.2
>
> Attachments: HBASE-20708-v1.patch, HBASE-20708-v2.patch, 
> HBASE-20708-v3.patch, HBASE-20708-v4.patch, HBASE-20708-v5.patch, 
> HBASE-20708-v6.patch, HBASE-20708-v7.patch, HBASE-20708-v8.patch, 
> HBASE-20708-v9.patch, HBASE-20708-v9.patch, HBASE-20708.patch
>
>
> In HBASE-20700, we make RecoverMetaProcedure use a special lock which is only 
> used by RMP to avoid dead lock with MoveRegionProcedure. But we will always 
> schedule a RMP when master starting up, so we still need to make sure that 
> there is no race between this RMP and other RMPs and SCPs scheduled before 
> the master restarts.
> Please see the [accompanying design 
> document|https://docs.google.com/document/d/1_872oHzrhJq4ck7f6zmp1J--zMhsIFvXSZyX1Mxg5MA/edit#heading=h.xy1z4alsq7uy] 
> where we call out the problem being addressed by this issue in more detail 
> and in which we describe our new approach to Master startup.





[jira] [Commented] (HBASE-20885) Remove entry for RPC quota from hbase:quota when RPC quota is removed.

2018-07-30 Thread Sakthi (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562637#comment-16562637
 ] 

Sakthi commented on HBASE-20885:


Makes sense, [~elserj]. What would be the way to go about it? (Is there a way 
to trigger the QA manually, or is waiting for the automatic kick-off the only 
option?)

> Remove entry for RPC quota from hbase:quota when RPC quota is removed.
> --
>
> Key: HBASE-20885
> URL: https://issues.apache.org/jira/browse/HBASE-20885
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Sakthi
>Assignee: Sakthi
>Priority: Minor
> Attachments: hbase-20885.master.001.patch, 
> hbase-20885.master.002.patch, hbase-20885.master.003.patch, 
> hbase-20885.master.003.patch, hbase-20885.master.004.patch
>
>
> When a RPC quota is removed (using LIMIT => 'NONE'), the entry from 
> hbase:quota table is not completely removed. For e.g. see below:
> {noformat}
> hbase(main):005:0> create 't2','cf1'
> Created table t2
> Took 0.8000 seconds
> => Hbase::Table - t2
> hbase(main):006:0> set_quota TYPE => THROTTLE, TABLE => 't2', LIMIT => 
> '10M/sec'
> Took 0.1024 seconds
> hbase(main):007:0> list_quotas
> OWNER  QUOTAS
>  TABLE => t2   TYPE => THROTTLE, THROTTLE_TYPE => 
> REQUEST_SIZE, LIMIT => 10M/sec, SCOPE => MACHINE
> 1 row(s)
> Took 0.0622 seconds
> hbase(main):008:0> scan 'hbase:quota'
> ROW  COLUMN+CELL
>  t.t2  column=q:s, timestamp=1531513014463, 
> value=PBUF\x12\x0B\x12\x09\x08\x04\x10\x80\x80\x80
>\x05 \x02
> 1 row(s)
> Took 0.0453 seconds
> hbase(main):009:0> set_quota TYPE => THROTTLE, TABLE => 't2', LIMIT => 'NONE'
> Took 0.0097 seconds
> hbase(main):010:0> list_quotas
> OWNER  QUOTAS
> 0 row(s)
> Took 0.0338 seconds
> hbase(main):011:0> scan 'hbase:quota'
> ROW  COLUMN+CELL
>  t.t2  column=q:s, timestamp=1531513039505, 
> value=PBUF\x12\x00
> 1 row(s)
> Took 0.0066 seconds
> {noformat}





[jira] [Commented] (HBASE-20894) Move BucketCache from java serialization to protobuf

2018-07-30 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562636#comment-16562636
 ] 

stack commented on HBASE-20894:
---

bq. So the old code works because there's only one id ever registered?

This code has been in there since the original commit. Probably. When I look at 
it, it's like something that had a purpose once but the context was undone.

bq.  Serializing class name isn't going to help us because it will be some 
garbage anonymous name...

You are right. Class would have to be named for my nonsense to work.

> Move BucketCache from java serialization to protobuf
> 
>
> Key: HBASE-20894
> URL: https://issues.apache.org/jira/browse/HBASE-20894
> Project: HBase
>  Issue Type: Task
>  Components: BucketCache
>Affects Versions: 2.0.0
>Reporter: Mike Drob
>Assignee: Mike Drob
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: 
> 0001-Write-the-CacheableDeserializerIdManager-index-into-.patch, 
> HBASE-20894.WIP-2.patch, HBASE-20894.WIP.patch, HBASE-20894.master.001.patch, 
> HBASE-20894.master.002.patch, HBASE-20894.master.003.patch
>
>
> We should use a better serialization format instead of Java Serialization for 
> the BucketCache entry persistence.
> Suggested by Chris McCown, who does not appear to have a JIRA account.





[jira] [Commented] (HBASE-20741) Split of a region with replicas creates all daughter regions and its replica in same server

2018-07-30 Thread huaxiang sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562633#comment-16562633
 ] 

huaxiang sun commented on HBASE-20741:
--

Thanks [~ram_krish] for the patch. I looked at it and it looks good to me. One 
minor improvement: with the current patch, let's say there are two replicas; 
the replica regions of daughter1 and daughter2 are then always assigned to 
the same rs. Can the second 
{code}
serverIdx = 0;
{code}
be removed? With a big cluster, all daughter regions will be assigned to 4 
rses. A slightly better idea: pick a random startServerIdx so that each split 
is assigned to random region servers.

serverIdx = random(0, serverSize); serverIdx = (serverIdx + 1) % serverSize.
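
A minimal Java sketch of that suggestion, under the assumption that the candidate servers are addressed by index and serverSize is greater than zero. ReplicaPlacementSketch and its methods are hypothetical names for illustration and do not correspond to the code in the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical illustration of round-robin placement from a random start.
class ReplicaPlacementSketch {

  // Picks the starting server index uniformly from [0, serverSize).
  static int randomStart(Random rng, int serverSize) {
    return rng.nextInt(serverSize);
  }

  // Assigns each replica to the next server index, wrapping around at
  // serverSize, beginning at startIdx.
  static List<Integer> assign(int replicaCount, int serverSize, int startIdx) {
    List<Integer> targets = new ArrayList<>();
    int serverIdx = startIdx;
    for (int i = 0; i < replicaCount; i++) {
      targets.add(serverIdx);
      serverIdx = (serverIdx + 1) % serverSize;
    }
    return targets;
  }

  public static void main(String[] args) {
    int serverSize = 4;
    int start = randomStart(new Random(), serverSize);
    // Output varies from run to run because the start index is random.
    System.out.println(assign(3, serverSize, start));
  }
}
```

Because the start index differs per split, consecutive splits are spread over the cluster instead of always landing on the first few servers.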

> Split of a region with replicas creates all daughter regions and its replica 
> in same server
> ---
>
> Key: HBASE-20741
> URL: https://issues.apache.org/jira/browse/HBASE-20741
> Project: HBase
>  Issue Type: Bug
>  Components: read replicas
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Fix For: 2.2.0
>
> Attachments: HBASE-20741.patch
>
>
> Generally it is better that the parent region when split creates the daughter 
> region in the same target server. 
> But for replicas also we do the same and all the replica regions are created 
> in the same target server. We should ideally be doing a round robin and only 
> the primary daughter region should be opened in the intended target server 
> (where the parent was previously opened).
> [~huaxiang] FYI.





[jira] [Commented] (HBASE-20885) Remove entry for RPC quota from hbase:quota when RPC quota is removed.

2018-07-30 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562629#comment-16562629
 ] 

Josh Elser commented on HBASE-20885:


Seems like something is busted. 
[https://builds.apache.org/job/PreCommit-HBASE-Build/13857/console] was kicked 
off, but couldn't clone the git repo.

I don't think it's anything wrong with your change, but we'll want to get a QA 
run to make sure other quota stuff isn't broken.

> Remove entry for RPC quota from hbase:quota when RPC quota is removed.
> --
>
> Key: HBASE-20885
> URL: https://issues.apache.org/jira/browse/HBASE-20885
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Sakthi
>Assignee: Sakthi
>Priority: Minor
> Attachments: hbase-20885.master.001.patch, 
> hbase-20885.master.002.patch, hbase-20885.master.003.patch, 
> hbase-20885.master.003.patch, hbase-20885.master.004.patch
>
>
> When a RPC quota is removed (using LIMIT => 'NONE'), the entry from 
> hbase:quota table is not completely removed. For e.g. see below:
> {noformat}
> hbase(main):005:0> create 't2','cf1'
> Created table t2
> Took 0.8000 seconds
> => Hbase::Table - t2
> hbase(main):006:0> set_quota TYPE => THROTTLE, TABLE => 't2', LIMIT => 
> '10M/sec'
> Took 0.1024 seconds
> hbase(main):007:0> list_quotas
> OWNER  QUOTAS
>  TABLE => t2   TYPE => THROTTLE, THROTTLE_TYPE => 
> REQUEST_SIZE, LIMIT => 10M/sec, SCOPE => MACHINE
> 1 row(s)
> Took 0.0622 seconds
> hbase(main):008:0> scan 'hbase:quota'
> ROW  COLUMN+CELL
>  t.t2  column=q:s, timestamp=1531513014463, 
> value=PBUF\x12\x0B\x12\x09\x08\x04\x10\x80\x80\x80
>\x05 \x02
> 1 row(s)
> Took 0.0453 seconds
> hbase(main):009:0> set_quota TYPE => THROTTLE, TABLE => 't2', LIMIT => 'NONE'
> Took 0.0097 seconds
> hbase(main):010:0> list_quotas
> OWNER  QUOTAS
> 0 row(s)
> Took 0.0338 seconds
> hbase(main):011:0> scan 'hbase:quota'
> ROW  COLUMN+CELL
>  t.t2  column=q:s, timestamp=1531513039505, 
> value=PBUF\x12\x00
> 1 row(s)
> Took 0.0066 seconds
> {noformat}





[jira] [Commented] (HBASE-20978) [amv2] Worker terminating UNNATURALLY during MoveRegionProcedure

2018-07-30 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562627#comment-16562627
 ] 

stack commented on HBASE-20978:
---

bq. So the problem here is that, there are some holes in the wal logging, which 
means that we have finished the sub procedure but haven't woken up the parent 
procedure?

I think there is a 'hole' but it's more like if we finish the subprocedure and 
crash, it's possible we don't wake the parent for some reason. Maybe it's an 
issue on load of the master WAL procedures? I was going to try and repro in a 
test.

> [amv2] Worker terminating UNNATURALLY during MoveRegionProcedure
> 
>
> Key: HBASE-20978
> URL: https://issues.apache.org/jira/browse/HBASE-20978
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Affects Versions: 2.0.1
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 2.0.2
>
>
> Testing tip of branch-2.0, ran into this:
> {code}
> 2018-07-29 01:45:33,002 INFO  [master/ve0524:16000] master.HMaster: Master 
> has completed initialization 13.854sec
> 2018-07-29 01:45:33,003 INFO  [PEWorker-4] 
> procedure.MasterProcedureScheduler: pid=1820, 
> state=WAITING:MOVE_REGION_ASSIGN; MoveRegionProcedure 
> hri=533fb79ba23b27e9e0715b51daeb30c1, 
> source=ve0538.halxg.cloudera.com,16020,1532847421672, 
> destination=ve0540.halxg.cloudera.com,16020,1532853151031 checking lock on 
> 533fb79ba23b27e9e0715b51daeb30c1
> 2018-07-29 01:45:33,003 WARN  [PEWorker-4] procedure2.ProcedureExecutor: 
> Worker terminating UNNATURALLY null
> java.lang.IllegalArgumentException: pid=1820, 
> state=WAITING:MOVE_REGION_ASSIGN; MoveRegionProcedure 
> hri=533fb79ba23b27e9e0715b51daeb30c1, 
> source=ve0538.halxg.cloudera.com,16020,1532847421672, 
> destination=ve0540.halxg.cloudera.com,16020,1532853151031
>   at 
> org.apache.hbase.thirdparty.com.google.common.base.Preconditions.checkArgument(Preconditions.java:134)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1458)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1249)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:76)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1763)
> {code}
> It then shows as the below in the UI:
> {code}
> Id  Parent  State  Owner  Type  Start Time  Last Update  Errors  
> Parameters
> 1820  WAITING stack   MoveRegionProcedure 
> hri=533fb79ba23b27e9e0715b51daeb30c1, 
> source=ve0538.halxg.cloudera.com,16020,1532847421672, 
> destination=ve0540.halxg.cloudera.com,16020,1532853151031   Sun Jul 29 
> 01:33:37 PDT 2018Sun Jul 29 01:33:38 PDT 2018[ { state => [ 
> '1', '2' ] }, { regionId => '1532851768240', tableName => { namespace => 
> 'ZGVmYXVsdA==', qualifier => 'SW50ZWdyYXRpb25UZXN0QmlnTGlua2VkTGlzdA==' }, 
> startKey => 'VttDLvXHdcmzwqNdrNoUFg==', endKey => 'WGFV8k+hFqhcIJGiKZ8L4Q==', 
> offline => 'false', split => 'false', replicaId => '0' }, { sourceServer => { 
> hostName => 've0538.halxg.cloudera.com', port => '16020', startCode => 
> '1532847421672' }, destinationServer => { hostName => 
> 've0540.halxg.cloudera.com', port => '16020', startCode => '1532853151031' } 
> } ]
> {code}
> This is what we'd just read from hbase:meta:
> {code}
> 2018-07-29 01:45:32,802 INFO  [master/ve0524:16000] 
> assignment.RegionStateStore: Load hbase:meta entry 
> region=533fb79ba23b27e9e0715b51daeb30c1, regionState=CLOSED, 
> lastHost=ve0538.halxg.cloudera.com,16020,1532847421672, 
> regionLocation=ve0538.halxg.cloudera.com,16020,1532847421672, 
> openSeqNum=1544600
> {code}
> Before this, we'd just logged this:
> 2018-07-29 01:33:39,786 INFO  [PEWorker-14] assignment.RegionStateStore: 
> pid=1823 updating hbase:meta row=533fb79ba23b27e9e0715b51daeb30c1, 
> regionState=CLOSED
> Going back in history, we do the above each time the Master gets restarted so 
> the region is offlined and never brought back online.
> It is failing here:
> {code}
>   private void execProcedure(final RootProcedureState procStack,
>   final Procedure procedure) {
> Preconditions.checkArgument(procedure.getState() == 
> ProcedureState.RUNNABLE,
> procedure.toString());
> 

[jira] [Commented] (HBASE-20885) Remove entry for RPC quota from hbase:quota when RPC quota is removed.

2018-07-30 Thread Sakthi (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562626#comment-16562626
 ] 

Sakthi commented on HBASE-20885:


[~elserj], there seems to be an issue with the QA? It didn't trigger for any 
patch up to the first .003; I had to upload .003 again to trigger the above QA 
run. I didn't know how to handle the result (docker failure), so I tried 
uploading .004 with the same content to see if QA throws the same error, but it 
hasn't started yet.

> Remove entry for RPC quota from hbase:quota when RPC quota is removed.
> --
>
> Key: HBASE-20885
> URL: https://issues.apache.org/jira/browse/HBASE-20885
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Sakthi
>Assignee: Sakthi
>Priority: Minor
> Attachments: hbase-20885.master.001.patch, 
> hbase-20885.master.002.patch, hbase-20885.master.003.patch, 
> hbase-20885.master.003.patch, hbase-20885.master.004.patch
>
>
> When a RPC quota is removed (using LIMIT => 'NONE'), the entry from 
> hbase:quota table is not completely removed. For e.g. see below:
> {noformat}
> hbase(main):005:0> create 't2','cf1'
> Created table t2
> Took 0.8000 seconds
> => Hbase::Table - t2
> hbase(main):006:0> set_quota TYPE => THROTTLE, TABLE => 't2', LIMIT => 
> '10M/sec'
> Took 0.1024 seconds
> hbase(main):007:0> list_quotas
> OWNER  QUOTAS
>  TABLE => t2   TYPE => THROTTLE, THROTTLE_TYPE => 
> REQUEST_SIZE, LIMIT => 10M/sec, SCOPE => MACHINE
> 1 row(s)
> Took 0.0622 seconds
> hbase(main):008:0> scan 'hbase:quota'
> ROWCOLUMN+CELL
>  t.t2  column=q:s, timestamp=1531513014463, 
> value=PBUF\x12\x0B\x12\x09\x08\x04\x10\x80\x80\x80
>\x05 \x02
> 1 row(s)
> Took 0.0453 seconds
> hbase(main):009:0> set_quota TYPE => THROTTLE, TABLE => 't2', LIMIT => 'NONE'
> Took 0.0097 seconds
> hbase(main):010:0> list_quotas
> OWNER  QUOTAS
> 0 row(s)
> Took 0.0338 seconds
> hbase(main):011:0> scan 'hbase:quota'
> ROWCOLUMN+CELL
>  t.t2  column=q:s, timestamp=1531513039505, 
> value=PBUF\x12\x00
> 1 row(s)
> Took 0.0066 seconds
> {noformat}





[jira] [Commented] (HBASE-20885) Remove entry for RPC quota from hbase:quota when RPC quota is removed.

2018-07-30 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562623#comment-16562623
 ] 

Josh Elser commented on HBASE-20885:


I plan on committing .004 when qa comes back.

> Remove entry for RPC quota from hbase:quota when RPC quota is removed.
> --
>
> Key: HBASE-20885
> URL: https://issues.apache.org/jira/browse/HBASE-20885
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Sakthi
>Assignee: Sakthi
>Priority: Minor
> Attachments: hbase-20885.master.001.patch, 
> hbase-20885.master.002.patch, hbase-20885.master.003.patch, 
> hbase-20885.master.003.patch, hbase-20885.master.004.patch
>
>
> When a RPC quota is removed (using LIMIT => 'NONE'), the entry from 
> hbase:quota table is not completely removed. For e.g. see below:
> {noformat}
> hbase(main):005:0> create 't2','cf1'
> Created table t2
> Took 0.8000 seconds
> => Hbase::Table - t2
> hbase(main):006:0> set_quota TYPE => THROTTLE, TABLE => 't2', LIMIT => 
> '10M/sec'
> Took 0.1024 seconds
> hbase(main):007:0> list_quotas
> OWNER  QUOTAS
>  TABLE => t2   TYPE => THROTTLE, THROTTLE_TYPE => 
> REQUEST_SIZE, LIMIT => 10M/sec, SCOPE => MACHINE
> 1 row(s)
> Took 0.0622 seconds
> hbase(main):008:0> scan 'hbase:quota'
> ROW  COLUMN+CELL
>  t.t2  column=q:s, timestamp=1531513014463, 
> value=PBUF\x12\x0B\x12\x09\x08\x04\x10\x80\x80\x80
>\x05 \x02
> 1 row(s)
> Took 0.0453 seconds
> hbase(main):009:0> set_quota TYPE => THROTTLE, TABLE => 't2', LIMIT => 'NONE'
> Took 0.0097 seconds
> hbase(main):010:0> list_quotas
> OWNER  QUOTAS
> 0 row(s)
> Took 0.0338 seconds
> hbase(main):011:0> scan 'hbase:quota'
> ROW  COLUMN+CELL
>  t.t2  column=q:s, timestamp=1531513039505, 
> value=PBUF\x12\x00
> 1 row(s)
> Took 0.0066 seconds
> {noformat}





[jira] [Commented] (HBASE-20984) Add/Modify test case to check custom hbase.wal.dir outside hdfs filesystem

2018-07-30 Thread Sakthi (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562620#comment-16562620
 ] 

Sakthi commented on HBASE-20984:


Ping [~busbey].

> Add/Modify test case to check custom hbase.wal.dir outside hdfs filesystem
> --
>
> Key: HBASE-20984
> URL: https://issues.apache.org/jira/browse/HBASE-20984
> Project: HBase
>  Issue Type: Bug
>Reporter: Sakthi
>Assignee: Sakthi
>Priority: Minor
>
> The current setup in TestWALFactory tries to create custom WAL directory 
> outside hdfs but ends up creating a custom WAL directory inside hdfs. In 
> TestWALFactory.java:
> {code:java}
> public static void setUpBeforeClass() throws Exception {
>   // A local filesystem WAL is attempted
>   CommonFSUtils.setWALRootDir(TEST_UTIL.getConfiguration(),
>       new Path("file:///tmp/wal"));
>   ...
>   hbaseDir = TEST_UTIL.createRootDir();
>   // But a directory inside hdfs is created here using
>   // HBaseTestingUtility#getNewDataTestDirOnTestFS
>   hbaseWALDir = TEST_UTIL.createWALRootDir();
> }
> {code}
> The change was made in HBASE-20723
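
One way a test could catch this mismatch is to check the scheme of the WAL root directory that was actually created. The following is a minimal sketch under that assumption; it uses only java.net.URI, WalDirSchemeSketch and isLocalFs are hypothetical names, and the hdfs URI in the demo is an invented example rather than a value from the test.

```java
import java.net.URI;

// Hypothetical check, not part of TestWALFactory.
class WalDirSchemeSketch {

  // True when the directory URI uses the local filesystem scheme ("file").
  static boolean isLocalFs(String walDir) {
    return "file".equals(URI.create(walDir).getScheme());
  }

  public static void main(String[] args) {
    // The directory the test configures.
    System.out.println(isLocalFs("file:///tmp/wal"));
    // An invented example of a directory on the test HDFS instance.
    System.out.println(isLocalFs("hdfs://localhost:8020/user/test/wal"));
  }
}
```

Asserting on the scheme of the created directory, rather than on the configured value, is what would distinguish the two cases described above.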



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-19369) HBase Should use Builder Pattern to Create Log Files while using WAL on Erasure Coding

2018-07-30 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-19369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562617#comment-16562617
 ] 

Hudson commented on HBASE-19369:


Results for branch branch-2.0
[build #612 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/612/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/612//General_Nightly_Build_Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/612//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/612//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


> HBase Should use Builder Pattern to Create Log Files while using WAL on 
> Erasure Coding
> --
>
> Key: HBASE-19369
> URL: https://issues.apache.org/jira/browse/HBASE-19369
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Alex Leblang
>Assignee: Mike Drob
>Priority: Major
> Fix For: 3.0.0, 2.0.2, 2.2.0, 2.1.1
>
> Attachments: HBASE-19369.branch-2.0.001.patch, 
> HBASE-19369.master.001.patch, HBASE-19369.master.002.patch, 
> HBASE-19369.master.003.patch, HBASE-19369.master.004.patch, 
> HBASE-19369.v10.patch, HBASE-19369.v11.patch, HBASE-19369.v12.patch, 
> HBASE-19369.v13.patch, HBASE-19369.v5.patch, HBASE-19369.v6.patch, 
> HBASE-19369.v7.patch, HBASE-19369.v8.patch, HBASE-19369.v9.patch
>
>
> Right now an HBase instance using the WAL won't function properly in an 
> Erasure Coded environment. We should change the following line to use the 
> hdfs.DistributedFileSystem builder pattern 
> https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/ProtobufLogWriter.java#L92





[jira] [Updated] (HBASE-20984) Add/Modify test case to check custom hbase.wal.dir outside hdfs filesystem

2018-07-30 Thread Sakthi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sakthi updated HBASE-20984:
---
Description: 
The current setup in TestWALFactory tries to create custom WAL directory 
outside hdfs but ends up creating a custom WAL directory inside hdfs. In 
TestWALFactory.java:
{code:java}
public static void setUpBeforeClass() throws Exception {
  // A local filesystem WAL is attempted
  CommonFSUtils.setWALRootDir(TEST_UTIL.getConfiguration(), new Path("file:///tmp/wal"));
  ...
  hbaseDir = TEST_UTIL.createRootDir();
  // But a directory inside hdfs is created here using
  // HBaseTestingUtility#getNewDataTestDirOnTestFS
  hbaseWALDir = TEST_UTIL.createWALRootDir();
}
{code}
The change was made in HBASE-20723

  was:
The current setup in TestWALFactory tries to create custom WAL directory 
outside hdfs but ends up creating a custom WAL directory inside hdfs. In 
TestWALFactory.java:
{code:java}
public static void setUpBeforeClass() throws Exception {
CommonFSUtils.setWALRootDir(TEST_UTIL.getConfiguration(), new 
Path("file:///tmp/wal")); // A local filesytem WAL is attempted
...
hbaseDir = TEST_UTIL.createRootDir();
hbaseWALDir = TEST_UTIL.createWALRootDir(); // But a directory inside hdfs 
is created here using HBaseTestingUtility#getNewDataTestDirOnTestFS
}
{code}
The change was made in HBASE-20723


> Add/Modify test case to check custom hbase.wal.dir outside hdfs filesystem
> --
>
> Key: HBASE-20984
> URL: https://issues.apache.org/jira/browse/HBASE-20984
> Project: HBase
>  Issue Type: Bug
>Reporter: Sakthi
>Assignee: Sakthi
>Priority: Minor
>
> The current setup in TestWALFactory tries to create custom WAL directory 
> outside hdfs but ends up creating a custom WAL directory inside hdfs. In 
> TestWALFactory.java:
> {code:java}
> public static void setUpBeforeClass() throws Exception {
>   // A local filesystem WAL is attempted
>   CommonFSUtils.setWALRootDir(TEST_UTIL.getConfiguration(), new Path("file:///tmp/wal"));
>   ...
>   hbaseDir = TEST_UTIL.createRootDir();
>   // But a directory inside hdfs is created here using
>   // HBaseTestingUtility#getNewDataTestDirOnTestFS
>   hbaseWALDir = TEST_UTIL.createWALRootDir();
> }
> {code}
> The change was made in HBASE-20723





[jira] [Updated] (HBASE-20984) Add/Modify test case to check custom hbase.wal.dir outside hdfs filesystem

2018-07-30 Thread Sakthi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sakthi updated HBASE-20984:
---
Description: 
The current setup in TestWALFactory tries to create custom WAL directory 
outside hdfs but ends up creating a custom WAL directory inside hdfs. In 
TestWALFactory.java:
{code:java}
public static void setUpBeforeClass() throws Exception {
  // A local filesystem WAL is attempted
  CommonFSUtils.setWALRootDir(TEST_UTIL.getConfiguration(), new Path("file:///tmp/wal"));
  ...
  hbaseDir = TEST_UTIL.createRootDir();
  // But a directory inside hdfs is created here using
  // HBaseTestingUtility#getNewDataTestDirOnTestFS
  hbaseWALDir = TEST_UTIL.createWALRootDir();
}
{code}
The change was made in HBASE-20723

  was:
The current setup in TestWALFactory tries to create custom WAL directory 
outside hdfs but ends up creating a custom WAL directory inside hdfs.

{code:java}
public static void setUpBeforeClass() throws Exception {
CommonFSUtils.setWALRootDir(TEST_UTIL.getConfiguration(), new 
Path("file:///tmp/wal")); // A local filesytem WAL is attempted
...
hbaseDir = TEST_UTIL.createRootDir();
hbaseWALDir = TEST_UTIL.createWALRootDir(); // But a directory inside hdfs 
is created here using HBaseTestingUtility#getNewDataTestDirOnTestFS
}
{code}

The change was made in HBASE-20723




> Add/Modify test case to check custom hbase.wal.dir outside hdfs filesystem
> --
>
> Key: HBASE-20984
> URL: https://issues.apache.org/jira/browse/HBASE-20984
> Project: HBase
>  Issue Type: Bug
>Reporter: Sakthi
>Assignee: Sakthi
>Priority: Minor
>
> The current setup in TestWALFactory tries to create custom WAL directory 
> outside hdfs but ends up creating a custom WAL directory inside hdfs. In 
> TestWALFactory.java:
> {code:java}
> public static void setUpBeforeClass() throws Exception {
>   // A local filesystem WAL is attempted
>   CommonFSUtils.setWALRootDir(TEST_UTIL.getConfiguration(), new Path("file:///tmp/wal"));
>   ...
>   hbaseDir = TEST_UTIL.createRootDir();
>   // But a directory inside hdfs is created here using
>   // HBaseTestingUtility#getNewDataTestDirOnTestFS
>   hbaseWALDir = TEST_UTIL.createWALRootDir();
> }
> {code}
> The change was made in HBASE-20723





[jira] [Created] (HBASE-20984) Add/Modify test case to check custom hbase.wal.dir outside hdfs filesystem

2018-07-30 Thread Sakthi (JIRA)
Sakthi created HBASE-20984:
--

 Summary: Add/Modify test case to check custom hbase.wal.dir 
outside hdfs filesystem
 Key: HBASE-20984
 URL: https://issues.apache.org/jira/browse/HBASE-20984
 Project: HBase
  Issue Type: Bug
Reporter: Sakthi
Assignee: Sakthi


The current setup in TestWALFactory tries to create custom WAL directory 
outside hdfs but ends up creating a custom WAL directory inside hdfs.

{code:java}
public static void setUpBeforeClass() throws Exception {
  // A local filesystem WAL is attempted
  CommonFSUtils.setWALRootDir(TEST_UTIL.getConfiguration(), new Path("file:///tmp/wal"));
  ...
  hbaseDir = TEST_UTIL.createRootDir();
  // But a directory inside hdfs is created here using
  // HBaseTestingUtility#getNewDataTestDirOnTestFS
  hbaseWALDir = TEST_UTIL.createWALRootDir();
}
{code}

The change was made in HBASE-20723







[jira] [Commented] (HBASE-20978) [amv2] Worker terminating UNNATURALLY during MoveRegionProcedure

2018-07-30 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562612#comment-16562612
 ] 

Duo Zhang commented on HBASE-20978:
---

IIRC we do not add a procedure in WAITING state to the ProcedureScheduler 
when the master restarts? I believe the intention there is that a WAITING 
procedure will be woken up later and then pushed into the 
ProcedureScheduler. So the problem here is that there are some holes in the 
WAL logging, which means that we have finished the sub-procedure but haven't 
woken up the parent procedure?
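
The interaction described above can be sketched in isolation. This is a simplified model, not the ProcedureExecutor code: the class, the two-value State enum, and both method names are hypothetical. It only shows why a WAITING procedure that reaches the executor trips a RUNNABLE precondition like the one quoted in the issue.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

class ProcedureRestartSketch {
    enum State { RUNNABLE, WAITING }

    static final class Proc {
        final long pid;
        final State state;

        Proc(long pid, State state) {
            this.pid = pid;
            this.state = state;
        }
    }

    // Assumed restart behaviour: only RUNNABLE procedures are queued;
    // WAITING ones are expected to be woken up later by their children.
    static Queue<Proc> scheduleAfterRestart(List<Proc> recovered) {
        Queue<Proc> runQueue = new ArrayDeque<>();
        for (Proc p : recovered) {
            if (p.state == State.RUNNABLE) {
                runQueue.add(p);
            }
        }
        return runQueue;
    }

    // Stand-in for the precondition at the top of execProcedure.
    static void execProcedure(Proc p) {
        if (p.state != State.RUNNABLE) {
            throw new IllegalArgumentException("pid=" + p.pid + ", state=" + p.state);
        }
    }

    public static void main(String[] args) {
        List<Proc> recovered = Arrays.asList(
            new Proc(1820, State.WAITING), new Proc(1823, State.RUNNABLE));
        System.out.println(scheduleAfterRestart(recovered).size());
        execProcedure(recovered.get(0)); // throws, mirroring the worker failure
    }
}
```

In this model the WAITING parent is never queued, so if nothing wakes it up after the restart it stays stuck, and if it is handed to a worker anyway the precondition fails.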

> [amv2] Worker terminating UNNATURALLY during MoveRegionProcedure
> 
>
> Key: HBASE-20978
> URL: https://issues.apache.org/jira/browse/HBASE-20978
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Affects Versions: 2.0.1
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 2.0.2
>
>
> Testing tip of branch-2.0, ran into this:
> {code}
> 2018-07-29 01:45:33,002 INFO  [master/ve0524:16000] master.HMaster: Master has completed initialization 13.854sec
> 2018-07-29 01:45:33,003 INFO  [PEWorker-4] procedure.MasterProcedureScheduler: pid=1820, state=WAITING:MOVE_REGION_ASSIGN; MoveRegionProcedure hri=533fb79ba23b27e9e0715b51daeb30c1, source=ve0538.halxg.cloudera.com,16020,1532847421672, destination=ve0540.halxg.cloudera.com,16020,1532853151031 checking lock on 533fb79ba23b27e9e0715b51daeb30c1
> 2018-07-29 01:45:33,003 WARN  [PEWorker-4] procedure2.ProcedureExecutor: Worker terminating UNNATURALLY null
> java.lang.IllegalArgumentException: pid=1820, state=WAITING:MOVE_REGION_ASSIGN; MoveRegionProcedure hri=533fb79ba23b27e9e0715b51daeb30c1, source=ve0538.halxg.cloudera.com,16020,1532847421672, destination=ve0540.halxg.cloudera.com,16020,1532853151031
>   at org.apache.hbase.thirdparty.com.google.common.base.Preconditions.checkArgument(Preconditions.java:134)
>   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1458)
>   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1249)
>   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:76)
>   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1763)
> {code}
> It then shows as the below in the UI:
> {code}
> Id: 1820
> Parent:
> State: WAITING
> Owner: stack
> Type: MoveRegionProcedure hri=533fb79ba23b27e9e0715b51daeb30c1, source=ve0538.halxg.cloudera.com,16020,1532847421672, destination=ve0540.halxg.cloudera.com,16020,1532853151031
> Start Time: Sun Jul 29 01:33:37 PDT 2018
> Last Update: Sun Jul 29 01:33:38 PDT 2018
> Errors:
> Parameters: [ { state => [ '1', '2' ] }, { regionId => '1532851768240', tableName => { namespace => 'ZGVmYXVsdA==', qualifier => 'SW50ZWdyYXRpb25UZXN0QmlnTGlua2VkTGlzdA==' }, startKey => 'VttDLvXHdcmzwqNdrNoUFg==', endKey => 'WGFV8k+hFqhcIJGiKZ8L4Q==', offline => 'false', split => 'false', replicaId => '0' }, { sourceServer => { hostName => 've0538.halxg.cloudera.com', port => '16020', startCode => '1532847421672' }, destinationServer => { hostName => 've0540.halxg.cloudera.com', port => '16020', startCode => '1532853151031' } } ]
> {code}
> This is what we'd just read from hbase:meta:
> {code}
> 2018-07-29 01:45:32,802 INFO  [master/ve0524:16000] 
> assignment.RegionStateStore: Load hbase:meta entry 
> region=533fb79ba23b27e9e0715b51daeb30c1, regionState=CLOSED, 
> lastHost=ve0538.halxg.cloudera.com,16020,1532847421672, 
> regionLocation=ve0538.halxg.cloudera.com,16020,1532847421672, 
> openSeqNum=1544600
> {code}
> Before this, we'd just logged this:
> 2018-07-29 01:33:39,786 INFO  [PEWorker-14] assignment.RegionStateStore: 
> pid=1823 updating hbase:meta row=533fb79ba23b27e9e0715b51daeb30c1, 
> regionState=CLOSED
> Going back in history, we do the above each time the Master gets restarted so 
> the region is offlined and never brought back online.
> It is failing here:
> {code}
>   private void execProcedure(final RootProcedureState procStack,
>   final Procedure procedure) {
> Preconditions.checkArgument(procedure.getState() == 
> ProcedureState.RUNNABLE,
> procedure.toString());
> 

[jira] [Commented] (HBASE-20979) Flaky test reporting should specify what JSON it needs and handle HTTP errors

2018-07-30 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562605#comment-16562605
 ] 

Duo Zhang commented on HBASE-20979:
---

Do we not need to check for a 200 on the other two requests?

> Flaky test reporting should specify what JSON it needs and handle HTTP errors
> -
>
> Key: HBASE-20979
> URL: https://issues.apache.org/jira/browse/HBASE-20979
> Project: HBase
>  Issue Type: Improvement
>  Components: test
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Minor
> Attachments: HBASE-20979.0.txt
>
>
> The current flaky test report should include the {{tree=}} parameter in its 
> Jenkins API calls (see 
> https://support.cloudbees.com/hc/en-us/articles/217911388-Best-Practice-For-Using-Jenkins-REST-API).
> It should also provide some info on failure so that when jobs change or go 
> away we don't get blank failures.
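
Both requests in the description can be illustrated with a small sketch. It is written in Java only for illustration and is not the flaky-test reporting script itself; the class, both method names, the host, and the example tree expression are assumptions. It builds a Jenkins JSON API URL that carries a {{tree=}} filter and fails loudly on any status other than 200.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class JenkinsApiSketch {
    // Builds "<jobUrl>/api/json?tree=<encoded tree expression>".
    // Assumes jobUrl has no trailing slash.
    static String apiUrl(String jobUrl, String tree) {
        try {
            return jobUrl + "/api/json?tree=" + URLEncoder.encode(tree, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
    }

    // Turns a non-200 response into an error that names the URL,
    // so a renamed or deleted job does not produce a blank failure.
    static void requireOk(int status, String url) {
        if (status != 200) {
            throw new IllegalStateException("HTTP " + status + " from " + url);
        }
    }

    public static void main(String[] args) {
        // Hypothetical job URL and tree expression.
        String url = apiUrl("https://jenkins.example.org/job/demo", "builds[number,result]");
        System.out.println(url);
        requireOk(200, url);
    }
}
```

Applying the same status check to every request, not just the first, would cover the question raised in the comment above.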





[jira] [Updated] (HBASE-20977) Don't use the word "Snapshot" when defining "HBase Snapshots"

2018-07-30 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HBASE-20977:
---
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Thanks, Mike and Jack. All reviews are appreciated :)

> Don't use the word "Snapshot" when defining "HBase Snapshots"
> -
>
> Key: HBASE-20977
> URL: https://issues.apache.org/jira/browse/HBASE-20977
> Project: HBase
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Trivial
> Fix For: 3.0.0
>
> Attachments: HBASE-20977.001.patch
>
>
> [From 
> http://hbase.apache.org/book.html#ops.snapshots|http://hbase.apache.org/book.html#ops.snapshots]
> {quote}HBase Snapshots allow you to take a snapshot of a table without too 
> much impact on Region Servers
> {quote}
> We should change this to not use the word "snapshot" when defining what HBase 
> Snapshots are. It's confusing enough to English-as-a-first-language 
> individuals; I imagine it's even more cyclical to ESL individuals.





[jira] [Work started] (HBASE-18070) Enable memstore replication for meta replica

2018-07-30 Thread huaxiang sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-18070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-18070 started by huaxiang sun.

> Enable memstore replication for meta replica
> 
>
> Key: HBASE-18070
> URL: https://issues.apache.org/jira/browse/HBASE-18070
> Project: HBase
>  Issue Type: New Feature
>Reporter: huaxiang sun
>Assignee: huaxiang sun
>Priority: Major
>
> Based on the current doc, memstore replication is not enabled for meta 
> replica. Memstore replication will be a good improvement for meta replica. 
> Create jira to track this effort (feasibility, design, implementation, etc).





[jira] [Commented] (HBASE-20982) [branch-1] TestExportSnapshot and TestSecureExportSnapshot are flaky

2018-07-30 Thread Tak Lon (Stephen) Wu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562578#comment-16562578
 ] 

Tak Lon (Stephen) Wu commented on HBASE-20982:
--

Yeah. Meanwhile, I don't know whether any of you have seen this 
{{testExportRetry}} failure; is it related to the timeout?

>  [branch-1] TestExportSnapshot and TestSecureExportSnapshot are flaky
> -
>
> Key: HBASE-20982
> URL: https://issues.apache.org/jira/browse/HBASE-20982
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.4.6
>Reporter: Andrew Purtell
>Priority: Major
>
> Passes for me
> {noformat}
> [INFO] Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 390.02 
> s - in org.apache.hadoop.hbase.snapshot.TestExportSnapshot
> [INFO] Tests run: 9, Failures: 0, Errors: 0, Skipped: 0
> {noformat}
> but fails or times out for others. 





[jira] [Commented] (HBASE-20894) Move BucketCache from java serialization to protobuf

2018-07-30 Thread Mike Drob (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562573#comment-16562573
 ] 

Mike Drob commented on HBASE-20894:
---

Thanks for the additional insight, [~stack]. Doing a further review of how we 
use instances of {{CacheableDeserializer}}, I'm even more baffled about what's 
going on.

There is only one non-test usage of CacheableDeserializer to be registered in 
the IdManager, and it is a static instance that comes up in HFileBlock. So the 
old code works because there's only one id ever registered? Serializing class 
name isn't going to help us because it will be some garbage anonymous name. 
I'll try pulling that out into a separate class and see if it helps.

> Move BucketCache from java serialization to protobuf
> 
>
> Key: HBASE-20894
> URL: https://issues.apache.org/jira/browse/HBASE-20894
> Project: HBase
>  Issue Type: Task
>  Components: BucketCache
>Affects Versions: 2.0.0
>Reporter: Mike Drob
>Assignee: Mike Drob
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: 
> 0001-Write-the-CacheableDeserializerIdManager-index-into-.patch, 
> HBASE-20894.WIP-2.patch, HBASE-20894.WIP.patch, HBASE-20894.master.001.patch, 
> HBASE-20894.master.002.patch, HBASE-20894.master.003.patch
>
>
> We should use a better serialization format instead of Java Serialization for 
> the BucketCache entry persistence.
> Suggested by Chris McCown, who does not appear to have a JIRA account.





[jira] [Commented] (HBASE-20930) MetaScanner.metaScan should use passed variable for meta table name rather than TableName.META_TABLE_NAME

2018-07-30 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562561#comment-16562561
 ] 

Hudson commented on HBASE-20930:


SUCCESS: Integrated in Jenkins build HBase-1.3-IT #446 (See 
[https://builds.apache.org/job/HBase-1.3-IT/446/])
HBASE-20930 MetaScanner.metaScan should respect meta table name (Vishal 
(elserj: rev b7d2e98a6484dfd6d5e51199775aecb8d4e3f3c2)
* (edit) 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestMetaScanner.java


> MetaScanner.metaScan should use passed variable for meta table name rather 
> than TableName.META_TABLE_NAME
> -
>
> Key: HBASE-20930
> URL: https://issues.apache.org/jira/browse/HBASE-20930
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.3.3
>Reporter: Vishal Khandelwal
>Assignee: Vishal Khandelwal
>Priority: Minor
> Fix For: 1.5.0, 1.2.7, 1.3.3, 1.4.7
>
> Attachments: HBASE-20930.branch-1.3.patch, 
> HBASE-20930.branch-1.3.v2.patch
>
>
> MetaScanner.metaScan 
>  try (Table metaTable = new HTable(TableName.META_TABLE_NAME, connection, 
> null)) {
> should be changed to 
> metaScan(connection, visitor, userTableName, null, Integer.MAX_VALUE, 
> metaTableName)





[jira] [Updated] (HBASE-20885) Remove entry for RPC quota from hbase:quota when RPC quota is removed.

2018-07-30 Thread Sakthi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sakthi updated HBASE-20885:
---
Attachment: hbase-20885.master.004.patch

> Remove entry for RPC quota from hbase:quota when RPC quota is removed.
> --
>
> Key: HBASE-20885
> URL: https://issues.apache.org/jira/browse/HBASE-20885
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Sakthi
>Assignee: Sakthi
>Priority: Minor
> Attachments: hbase-20885.master.001.patch, 
> hbase-20885.master.002.patch, hbase-20885.master.003.patch, 
> hbase-20885.master.003.patch, hbase-20885.master.004.patch
>
>
> When an RPC quota is removed (using LIMIT => 'NONE'), the entry in the 
> hbase:quota table is not completely removed. For example, see below:
> {noformat}
> hbase(main):005:0> create 't2','cf1'
> Created table t2
> Took 0.8000 seconds
> => Hbase::Table - t2
> hbase(main):006:0> set_quota TYPE => THROTTLE, TABLE => 't2', LIMIT => '10M/sec'
> Took 0.1024 seconds
> hbase(main):007:0> list_quotas
> OWNER         QUOTAS
>  TABLE => t2  TYPE => THROTTLE, THROTTLE_TYPE => REQUEST_SIZE, LIMIT => 10M/sec, SCOPE => MACHINE
> 1 row(s)
> Took 0.0622 seconds
> hbase(main):008:0> scan 'hbase:quota'
> ROW           COLUMN+CELL
>  t.t2         column=q:s, timestamp=1531513014463, value=PBUF\x12\x0B\x12\x09\x08\x04\x10\x80\x80\x80\x05 \x02
> 1 row(s)
> Took 0.0453 seconds
> hbase(main):009:0> set_quota TYPE => THROTTLE, TABLE => 't2', LIMIT => 'NONE'
> Took 0.0097 seconds
> hbase(main):010:0> list_quotas
> OWNER         QUOTAS
> 0 row(s)
> Took 0.0338 seconds
> hbase(main):011:0> scan 'hbase:quota'
> ROW           COLUMN+CELL
>  t.t2         column=q:s, timestamp=1531513039505, value=PBUF\x12\x00
> 1 row(s)
> Took 0.0066 seconds
> {noformat}
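
The two cell values in the scan output can be read by hand as protobuf wire format following the PBUF prefix. In the first value, the bytes \x80\x80\x80\x05 form a base-128 varint. The sketch below decodes it; the class and method names are hypothetical, and mapping the surrounding bytes to named quota fields is an assumption about the quota message layout rather than something shown in the issue.

```java
class VarintSketch {
    // Decodes one protobuf base-128 varint starting at the given offset.
    static long decodeVarint(byte[] buf, int offset) {
        long result = 0;
        int shift = 0;
        for (int i = offset; i < buf.length; i++) {
            int b = buf[i] & 0xFF;
            result |= (long) (b & 0x7F) << shift; // low 7 bits carry payload
            if ((b & 0x80) == 0) {
                return result;                     // high bit clear: last byte
            }
            shift += 7;
        }
        throw new IllegalArgumentException("truncated varint");
    }

    public static void main(String[] args) {
        // The four bytes that follow the \x10 tag in the first scan result.
        byte[] limit = {(byte) 0x80, (byte) 0x80, (byte) 0x80, 0x05};
        System.out.println(decodeVarint(limit, 0)); // 10485760 = 10 * 1024 * 1024
    }
}
```

The decoded value matches the 10M/sec limit that was set. After the quota is removed the cell still holds \x12\x00, which reads as a length-delimited field 2 with zero length, that is, an empty nested message rather than a deleted row.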





[jira] [Commented] (HBASE-20974) Backport HBASE-20583 (SplitLogWorker should handle FileNotFoundException when split a wal) to branch-1

2018-07-30 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562544#comment-16562544
 ] 

Hudson commented on HBASE-20974:


FAILURE: Integrated in Jenkins build HBase-1.3-IT #445 (See 
[https://builds.apache.org/job/HBase-1.3-IT/445/])
HBASE-20974 Backport HBASE-20583 (SplitLogWorker should handle (apurtell: rev 
2252ed0eea4b9a0abd5cb8b75515532e651ef902)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/SplitLogWorker.java


> Backport HBASE-20583 (SplitLogWorker should handle FileNotFoundException when 
> split a wal) to branch-1
> --
>
> Key: HBASE-20974
> URL: https://issues.apache.org/jira/browse/HBASE-20974
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Major
> Fix For: 1.5.0, 1.2.7, 1.3.3, 1.4.7
>
> Attachments: HBASE-20974.branch-1.patch, HBASE-20974.branch-1.patch
>
>
> Backport HBASE-20583 to branch-1.x.





[jira] [Commented] (HBASE-20583) SplitLogWorker should handle FileNotFoundException when split a wal

2018-07-30 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562545#comment-16562545
 ] 

Hudson commented on HBASE-20583:


FAILURE: Integrated in Jenkins build HBase-1.3-IT #445 (See 
[https://builds.apache.org/job/HBase-1.3-IT/445/])
HBASE-20974 Backport HBASE-20583 (SplitLogWorker should handle (apurtell: rev 
2252ed0eea4b9a0abd5cb8b75515532e651ef902)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/SplitLogWorker.java


> SplitLogWorker should handle FileNotFoundException when split a wal
> ---
>
> Key: HBASE-20583
> URL: https://issues.apache.org/jira/browse/HBASE-20583
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 2.0.1
>
> Attachments: HBASE-20583.master.001.patch, 
> HBASE-20583.master.001.patch
>
>
> When a split task is finished, the master deletes the WAL first, then removes 
> the task's zk node. So if the master crashes after deleting the WAL, the zk 
> task node may be left on zk. When the master resubmits this task, the task 
> will fail with a FileNotFoundException.
> We already handle FileNotFoundException in WALSplitter, but not in 
> SplitLogWorker.
>  
> {code:java}
>   try {
> in = getReader(path, reporter);
>   } catch (EOFException e) {
> if (length <= 0) {
>   // TODO should we ignore an empty, not-last log file if skip.errors
>   // is false? Either way, the caller should decide what to do. E.g.
>   // ignore if this is the last log in sequence.
>   // TODO is this scenario still possible if the log has been
>   // recovered (i.e. closed)
>   LOG.warn("Could not open {} for reading. File is empty", path, e);
> }
> // EOFException being ignored
> return null;
>   }
> } catch (IOException e) {
>   if (e instanceof FileNotFoundException) {
> // A wal file may not exist anymore. Nothing can be recovered so move on
> LOG.warn("File {} does not exist anymore", path, e);
> return null;
>   }
> }{code}
> {code:java}
> // Here fs.getFileStatus may throw FileNotFoundException, too. We should
> // handle this exception the same way WALSplitter.getReader does.
> try {
>   if (!WALSplitter.splitLogFile(walDir, fs.getFileStatus(new Path(walDir, filename)),
>       fs, conf, p, sequenceIdChecker,
>       server.getCoordinatedStateManager().getSplitLogWorkerCoordination(), factory)) {
>     return Status.PREEMPTED;
>   }
> } 
> {code}
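
A minimal, self-contained sketch of the handling proposed above follows. It uses java.nio instead of the Hadoop FileSystem API, so the missing-file signal is NoSuchFileException rather than java.io.FileNotFoundException. The class, the method, and the choice to report the task as DONE are illustrative assumptions, not the committed SplitLogWorker change.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;

class SplitTaskSketch {
    enum Status { DONE, ERR }

    // Hypothetical stand-in for running one split task against a WAL file.
    static Status splitLog(Path wal) {
        try {
            long length = Files.size(wal); // throws NoSuchFileException if the file is gone
            // ... the actual split of `length` bytes would happen here ...
            return Status.DONE;
        } catch (NoSuchFileException e) {
            // The WAL was already deleted by an earlier, completed attempt.
            // Nothing is left to recover, so the resubmitted task is not retried.
            return Status.DONE;
        } catch (IOException e) {
            return Status.ERR;
        }
    }

    public static void main(String[] args) {
        System.out.println(splitLog(Paths.get("missing-wal-" + System.nanoTime())));
    }
}
```

Treating the missing file as a finished task lets the stale zk task node be cleaned up instead of failing on every resubmission.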
>  
>  





[jira] [Commented] (HBASE-20930) MetaScanner.metaScan should use passed variable for meta table name rather than TableName.META_TABLE_NAME

2018-07-30 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562538#comment-16562538
 ] 

Hudson commented on HBASE-20930:


SUCCESS: Integrated in Jenkins build HBase-1.2-IT #1143 (See 
[https://builds.apache.org/job/HBase-1.2-IT/1143/])
HBASE-20930 MetaScanner.metaScan should respect meta table name (Vishal 
(elserj: rev 83690b63b80935f5c782933ead0199e4719591a3)
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestMetaScanner.java
* (edit) 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java


> MetaScanner.metaScan should use passed variable for meta table name rather 
> than TableName.META_TABLE_NAME
> -
>
> Key: HBASE-20930
> URL: https://issues.apache.org/jira/browse/HBASE-20930
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.3.3
>Reporter: Vishal Khandelwal
>Assignee: Vishal Khandelwal
>Priority: Minor
> Fix For: 1.5.0, 1.2.7, 1.3.3, 1.4.7
>
> Attachments: HBASE-20930.branch-1.3.patch, 
> HBASE-20930.branch-1.3.v2.patch
>
>
> MetaScanner.metaScan 
>  try (Table metaTable = new HTable(TableName.META_TABLE_NAME, connection, 
> null)) {
> should be changed to 
> metaScan(connection, visitor, userTableName, null, Integer.MAX_VALUE, 
> metaTableName)





[jira] [Resolved] (HBASE-20983) Remove dependency on HBase interfaces of type InterfaceAudience.Private

2018-07-30 Thread Ankit Singhal (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Singhal resolved HBASE-20983.
---
Resolution: Invalid

Sorry, wrong project; this needs to be created in Phoenix.

> Remove dependency on HBase interfaces of type InterfaceAudience.Private
> ---
>
> Key: HBASE-20983
> URL: https://issues.apache.org/jira/browse/HBASE-20983
> Project: HBase
>  Issue Type: Task
>Reporter: Ankit Singhal
>Priority: Major
>
> Currently, the patch upgrades in HBase can break the compatibility of Phoenix 
> released for the corresponding minor version.
>  





[jira] [Created] (HBASE-20983) Remove dependency on HBase interfaces of type InterfaceAudience.Private

2018-07-30 Thread Ankit Singhal (JIRA)
Ankit Singhal created HBASE-20983:
-

 Summary: Remove dependency on HBase interfaces of type 
InterfaceAudience.Private
 Key: HBASE-20983
 URL: https://issues.apache.org/jira/browse/HBASE-20983
 Project: HBase
  Issue Type: Task
Reporter: Ankit Singhal


Currently, the patch upgrades in HBase can break the compatibility of Phoenix 
released for the corresponding minor version.

 





[jira] [Commented] (HBASE-20885) Remove entry for RPC quota from hbase:quota when RPC quota is removed.

2018-07-30 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562523#comment-16562523
 ] 

Hadoop QA commented on HBASE-20885:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  0m  3s{color} | {color:red} Docker failed to build yetus/hbase:b002b0b. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HBASE-20885 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12933665/hbase-20885.master.003.patch |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/13856/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.



> Remove entry for RPC quota from hbase:quota when RPC quota is removed.
> --
>
> Key: HBASE-20885
> URL: https://issues.apache.org/jira/browse/HBASE-20885
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Sakthi
>Assignee: Sakthi
>Priority: Minor
> Attachments: hbase-20885.master.001.patch, 
> hbase-20885.master.002.patch, hbase-20885.master.003.patch, 
> hbase-20885.master.003.patch
>
>
> When an RPC quota is removed (using LIMIT => 'NONE'), the entry in the 
> hbase:quota table is not completely removed. For example, see below:
> {noformat}
> hbase(main):005:0> create 't2','cf1'
> Created table t2
> Took 0.8000 seconds
> => Hbase::Table - t2
> hbase(main):006:0> set_quota TYPE => THROTTLE, TABLE => 't2', LIMIT => '10M/sec'
> Took 0.1024 seconds
> hbase(main):007:0> list_quotas
> OWNER         QUOTAS
>  TABLE => t2  TYPE => THROTTLE, THROTTLE_TYPE => REQUEST_SIZE, LIMIT => 10M/sec, SCOPE => MACHINE
> 1 row(s)
> Took 0.0622 seconds
> hbase(main):008:0> scan 'hbase:quota'
> ROW           COLUMN+CELL
>  t.t2         column=q:s, timestamp=1531513014463, value=PBUF\x12\x0B\x12\x09\x08\x04\x10\x80\x80\x80\x05 \x02
> 1 row(s)
> Took 0.0453 seconds
> hbase(main):009:0> set_quota TYPE => THROTTLE, TABLE => 't2', LIMIT => 'NONE'
> Took 0.0097 seconds
> hbase(main):010:0> list_quotas
> OWNER         QUOTAS
> 0 row(s)
> Took 0.0338 seconds
> hbase(main):011:0> scan 'hbase:quota'
> ROW           COLUMN+CELL
>  t.t2         column=q:s, timestamp=1531513039505, value=PBUF\x12\x00
> 1 row(s)
> Took 0.0066 seconds
> {noformat}





[jira] [Commented] (HBASE-20974) Backport HBASE-20583 (SplitLogWorker should handle FileNotFoundException when split a wal) to branch-1

2018-07-30 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562516#comment-16562516
 ] 

Hudson commented on HBASE-20974:


SUCCESS: Integrated in Jenkins build HBase-1.2-IT #1142 (See 
[https://builds.apache.org/job/HBase-1.2-IT/1142/])
HBASE-20974 Backport HBASE-20583 (SplitLogWorker should handle (apurtell: rev 
5eb4c5668841f501481bf85d54fde34bccaa4e96)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/SplitLogWorker.java


> Backport HBASE-20583 (SplitLogWorker should handle FileNotFoundException when 
> split a wal) to branch-1
> --
>
> Key: HBASE-20974
> URL: https://issues.apache.org/jira/browse/HBASE-20974
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Major
> Fix For: 1.5.0, 1.2.7, 1.3.3, 1.4.7
>
> Attachments: HBASE-20974.branch-1.patch, HBASE-20974.branch-1.patch
>
>
> Backport HBASE-20583 to branch-1.x.





[jira] [Commented] (HBASE-20583) SplitLogWorker should handle FileNotFoundException when split a wal

2018-07-30 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562517#comment-16562517
 ] 

Hudson commented on HBASE-20583:


SUCCESS: Integrated in Jenkins build HBase-1.2-IT #1142 (See 
[https://builds.apache.org/job/HBase-1.2-IT/1142/])
HBASE-20974 Backport HBASE-20583 (SplitLogWorker should handle (apurtell: rev 
5eb4c5668841f501481bf85d54fde34bccaa4e96)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/SplitLogWorker.java


> SplitLogWorker should handle FileNotFoundException when split a wal
> ---
>
> Key: HBASE-20583
> URL: https://issues.apache.org/jira/browse/HBASE-20583
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 2.0.1
>
> Attachments: HBASE-20583.master.001.patch, 
> HBASE-20583.master.001.patch
>
>
> When a split task finishes, the master deletes the WAL first and then removes 
> the task's ZK node. If the master crashes after deleting the WAL, the task's 
> ZK node may be left behind. When the master resubmits this task, it fails 
> with a FileNotFoundException.
> WALSplitter already handles FileNotFoundException, but SplitLogWorker does 
> not.
>  
> {code:java}
>   try {
>     in = getReader(path, reporter);
>   } catch (EOFException e) {
>     if (length <= 0) {
>       // TODO should we ignore an empty, not-last log file if skip.errors
>       // is false? Either way, the caller should decide what to do. E.g.
>       // ignore if this is the last log in sequence.
>       // TODO is this scenario still possible if the log has been
>       // recovered (i.e. closed)
>       LOG.warn("Could not open {} for reading. File is empty", path, e);
>     }
>     // EOFException being ignored
>     return null;
>   }
> } catch (IOException e) {
>   if (e instanceof FileNotFoundException) {
>     // A wal file may not exist anymore. Nothing can be recovered so move on
>     LOG.warn("File {} does not exist anymore", path, e);
>     return null;
>   }
> }
> {code}
> {code:java}
> // Here fs.getFileStatus may throw FileNotFoundException, too. We should
> // handle this exception the same way WALSplitter.getReader does.
> try {
>   if (!WALSplitter.splitLogFile(walDir, fs.getFileStatus(new Path(walDir, filename)),
>       fs, conf, p, sequenceIdChecker,
>       server.getCoordinatedStateManager().getSplitLogWorkerCoordination(), factory)) {
>     return Status.PREEMPTED;
>   }
> }
> {code}
>  
>  
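
The failure mode described above can be sketched outside HBase. This is only an illustration under stated assumptions: SplitTaskSketch, splitWal and the two-value Status enum are hypothetical names, not the SplitLogWorker API, and reporting a missing WAL as DONE is one plausible handling rather than a statement of what the committed patch does.

```java
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;

class SplitTaskSketch {
  /** Hypothetical task outcomes; not the real SplitLogWorker status enum. */
  enum Status { DONE, ERR }

  /**
   * Opens the WAL and consumes it. A missing file is treated as DONE, on the
   * assumption that an earlier attempt already split and deleted it.
   */
  static Status splitWal(String walPath) {
    try (InputStream in = new FileInputStream(walPath)) {
      byte[] buf = new byte[8192];
      while (in.read(buf) != -1) {
        // Stand-in for the real splitting work.
      }
      return Status.DONE;
    } catch (FileNotFoundException e) {
      // Nothing is left to recover, so a retry would fail the same way.
      return Status.DONE;
    } catch (IOException e) {
      // Any other I/O problem is still a genuine failure.
      return Status.ERR;
    }
  }

  public static void main(String[] args) {
    System.out.println(splitWal("/no/such/dir/wal.0001"));
  }
}
```

The point of the sketch is the ordering of the catch clauses: FileNotFoundException is caught before the broader IOException so that a resubmitted task for an already-deleted file is not retried indefinitely.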





[jira] [Updated] (HBASE-20885) Remove entry for RPC quota from hbase:quota when RPC quota is removed.

2018-07-30 Thread Sakthi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sakthi updated HBASE-20885:
---
Attachment: hbase-20885.master.003.patch

> Remove entry for RPC quota from hbase:quota when RPC quota is removed.
> --
>
> Key: HBASE-20885
> URL: https://issues.apache.org/jira/browse/HBASE-20885
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Sakthi
>Assignee: Sakthi
>Priority: Minor
> Attachments: hbase-20885.master.001.patch, 
> hbase-20885.master.002.patch, hbase-20885.master.003.patch, 
> hbase-20885.master.003.patch
>
>
> When an RPC quota is removed (using LIMIT => 'NONE'), its entry in the 
> hbase:quota table is not completely removed. For example:
> {noformat}
> hbase(main):005:0> create 't2','cf1'
> Created table t2
> Took 0.8000 seconds
> => Hbase::Table - t2
> hbase(main):006:0> set_quota TYPE => THROTTLE, TABLE => 't2', LIMIT => '10M/sec'
> Took 0.1024 seconds
> hbase(main):007:0> list_quotas
> OWNER  QUOTAS
>  TABLE => t2   TYPE => THROTTLE, THROTTLE_TYPE => REQUEST_SIZE, LIMIT => 10M/sec, SCOPE => MACHINE
> 1 row(s)
> Took 0.0622 seconds
> hbase(main):008:0> scan 'hbase:quota'
> ROW  COLUMN+CELL
>  t.t2  column=q:s, timestamp=1531513014463, value=PBUF\x12\x0B\x12\x09\x08\x04\x10\x80\x80\x80\x05 \x02
> 1 row(s)
> Took 0.0453 seconds
> hbase(main):009:0> set_quota TYPE => THROTTLE, TABLE => 't2', LIMIT => 'NONE'
> Took 0.0097 seconds
> hbase(main):010:0> list_quotas
> OWNER  QUOTAS
> 0 row(s)
> Took 0.0338 seconds
> hbase(main):011:0> scan 'hbase:quota'
> ROW  COLUMN+CELL
>  t.t2  column=q:s, timestamp=1531513039505, value=PBUF\x12\x00
> 1 row(s)
> Took 0.0066 seconds
> {noformat}
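
The behaviour the summary asks for can be sketched with an in-memory map standing in for the hbase:quota table. Everything here is an assumption made for illustration: QuotaTableSketch, setThrottle and rowCount are hypothetical names, and the sketch treats the throttle as the only setting stored in a row, which the real table (serialized protobuf values) does not guarantee.

```java
import java.util.HashMap;
import java.util.Map;

class QuotaTableSketch {
  // Row key -> throttle limit; a stand-in for the hbase:quota table.
  private final Map<String, String> rows = new HashMap<>();

  /** Sets a throttle for the table, or deletes the row when the limit is NONE. */
  void setThrottle(String table, String limit) {
    String rowKey = "t." + table;
    if ("NONE".equals(limit)) {
      // Delete the row instead of writing back an empty value.
      rows.remove(rowKey);
    } else {
      rows.put(rowKey, limit);
    }
  }

  /** Number of rows a scan of the stand-in table would return. */
  int rowCount() {
    return rows.size();
  }

  public static void main(String[] args) {
    QuotaTableSketch quota = new QuotaTableSketch();
    quota.setThrottle("t2", "10M/sec");
    quota.setThrottle("t2", "NONE");
    System.out.println(quota.rowCount() + " row(s)");
  }
}
```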





[jira] [Updated] (HBASE-20930) MetaScanner.metaScan should use passed variable for meta table name rather than TableName.META_TABLE_NAME

2018-07-30 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HBASE-20930:
---
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Thanks for the patch, Vishal.

> MetaScanner.metaScan should use passed variable for meta table name rather 
> than TableName.META_TABLE_NAME
> -
>
> Key: HBASE-20930
> URL: https://issues.apache.org/jira/browse/HBASE-20930
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.3.3
>Reporter: Vishal Khandelwal
>Assignee: Vishal Khandelwal
>Priority: Minor
> Fix For: 1.5.0, 1.2.7, 1.3.3, 1.4.7
>
> Attachments: HBASE-20930.branch-1.3.patch, 
> HBASE-20930.branch-1.3.v2.patch
>
>
> MetaScanner.metaScan 
>  try (Table metaTable = new HTable(TableName.META_TABLE_NAME, connection, 
> null)) {
> should be changed to 
> metaScan(connection, visitor, userTableName, null, Integer.MAX_VALUE, 
> metaTableName)





[jira] [Updated] (HBASE-20974) Backport HBASE-20583 (SplitLogWorker should handle FileNotFoundException when split a wal) to branch-1

2018-07-30 Thread Andrew Purtell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-20974:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 1.4.7
   1.3.3
   1.2.7
   Status: Resolved  (was: Patch Available)

> Backport HBASE-20583 (SplitLogWorker should handle FileNotFoundException when 
> split a wal) to branch-1
> --
>
> Key: HBASE-20974
> URL: https://issues.apache.org/jira/browse/HBASE-20974
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Major
> Fix For: 1.5.0, 1.2.7, 1.3.3, 1.4.7
>
> Attachments: HBASE-20974.branch-1.patch, HBASE-20974.branch-1.patch
>
>
> Backport HBASE-20583 to branch-1.x.





[jira] [Updated] (HBASE-20930) MetaScanner.metaScan should use passed variable for meta table name rather than TableName.META_TABLE_NAME

2018-07-30 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HBASE-20930:
---
Fix Version/s: 1.2.7

> MetaScanner.metaScan should use passed variable for meta table name rather 
> than TableName.META_TABLE_NAME
> -
>
> Key: HBASE-20930
> URL: https://issues.apache.org/jira/browse/HBASE-20930
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.3.3
>Reporter: Vishal Khandelwal
>Assignee: Vishal Khandelwal
>Priority: Minor
> Fix For: 1.5.0, 1.2.7, 1.3.3, 1.4.7
>
> Attachments: HBASE-20930.branch-1.3.patch, 
> HBASE-20930.branch-1.3.v2.patch
>
>
> MetaScanner.metaScan 
>  try (Table metaTable = new HTable(TableName.META_TABLE_NAME, connection, 
> null)) {
> should be changed to 
> metaScan(connection, visitor, userTableName, null, Integer.MAX_VALUE, 
> metaTableName)





[jira] [Updated] (HBASE-20930) MetaScanner.metaScan should use passed variable for meta table name rather than TableName.META_TABLE_NAME

2018-07-30 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HBASE-20930:
---
Fix Version/s: 1.4.7
   1.5.0

> MetaScanner.metaScan should use passed variable for meta table name rather 
> than TableName.META_TABLE_NAME
> -
>
> Key: HBASE-20930
> URL: https://issues.apache.org/jira/browse/HBASE-20930
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.3.3
>Reporter: Vishal Khandelwal
>Assignee: Vishal Khandelwal
>Priority: Minor
> Fix For: 1.5.0, 1.3.3, 1.4.7
>
> Attachments: HBASE-20930.branch-1.3.patch, 
> HBASE-20930.branch-1.3.v2.patch
>
>
> MetaScanner.metaScan 
>  try (Table metaTable = new HTable(TableName.META_TABLE_NAME, connection, 
> null)) {
> should be changed to 
> metaScan(connection, visitor, userTableName, null, Integer.MAX_VALUE, 
> metaTableName)





[jira] [Commented] (HBASE-20930) MetaScanner.metaScan should use passed variable for meta table name rather than TableName.META_TABLE_NAME

2018-07-30 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562494#comment-16562494
 ] 

Josh Elser commented on HBASE-20930:


Never mind, not applicable to 2.x after HBASE-12990. Sorry for the noise.

> MetaScanner.metaScan should use passed variable for meta table name rather 
> than TableName.META_TABLE_NAME
> -
>
> Key: HBASE-20930
> URL: https://issues.apache.org/jira/browse/HBASE-20930
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.3.3
>Reporter: Vishal Khandelwal
>Assignee: Vishal Khandelwal
>Priority: Minor
> Fix For: 1.5.0, 1.3.3, 1.4.7
>
> Attachments: HBASE-20930.branch-1.3.patch, 
> HBASE-20930.branch-1.3.v2.patch
>
>
> MetaScanner.metaScan 
>  try (Table metaTable = new HTable(TableName.META_TABLE_NAME, connection, 
> null)) {
> should be changed to 
> metaScan(connection, visitor, userTableName, null, Integer.MAX_VALUE, 
> metaTableName)





[jira] [Commented] (HBASE-20930) MetaScanner.metaScan should use passed variable for meta table name rather than TableName.META_TABLE_NAME

2018-07-30 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562489#comment-16562489
 ] 

Josh Elser commented on HBASE-20930:


 [~stack] very trivial change here. Would you like this for branch-2.0?

> MetaScanner.metaScan should use passed variable for meta table name rather 
> than TableName.META_TABLE_NAME
> -
>
> Key: HBASE-20930
> URL: https://issues.apache.org/jira/browse/HBASE-20930
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.3.3
>Reporter: Vishal Khandelwal
>Assignee: Vishal Khandelwal
>Priority: Minor
> Fix For: 1.3.3
>
> Attachments: HBASE-20930.branch-1.3.patch, 
> HBASE-20930.branch-1.3.v2.patch
>
>
> MetaScanner.metaScan 
>  try (Table metaTable = new HTable(TableName.META_TABLE_NAME, connection, 
> null)) {
> should be changed to 
> metaScan(connection, visitor, userTableName, null, Integer.MAX_VALUE, 
> metaTableName)





[jira] [Commented] (HBASE-18477) Umbrella JIRA for HBase Read Replica clusters

2018-07-30 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562487#comment-16562487
 ] 

Hudson commented on HBASE-18477:


Results for branch HBASE-18477
[build #280 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-18477/280/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-18477/280//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-18477/280//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-18477/280//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(x) {color:red}-1 client integration test{color}
--Failed when running client tests on top of Hadoop 2. [see log for 
details|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-18477/280//artifact/output-integration/hadoop-2.log].
 (note that this means we didn't run on Hadoop 3)


> Umbrella JIRA for HBase Read Replica clusters
> -
>
> Key: HBASE-18477
> URL: https://issues.apache.org/jira/browse/HBASE-18477
> Project: HBase
>  Issue Type: New Feature
>Reporter: Zach York
>Assignee: Zach York
>Priority: Major
> Attachments: HBase Read-Replica Clusters Scope doc.docx, HBase 
> Read-Replica Clusters Scope doc.pdf, HBase Read-Replica Clusters Scope 
> doc_v2.docx, HBase Read-Replica Clusters Scope doc_v2.pdf
>
>
> Recently, changes (such as HBASE-17437) have unblocked HBase to run with a 
> root directory external to the cluster (such as in Amazon S3). This means 
> that the data is stored outside of the cluster and can be accessible after 
> the cluster has been terminated. One use case that is often asked about is 
> pointing multiple clusters to one root directory (sharing the data) to have 
> read resiliency in the case of a cluster failure.
>  
> This JIRA is an umbrella JIRA to contain all the tasks necessary to create a 
> read-replica HBase cluster that is pointed at the same root directory.
>  
> This requires:
> * Making the read-replica cluster read-only (no metadata operations or data 
> operations).
> * Separating the hbase:meta table for each cluster (otherwise HBase gets 
> confused with multiple clusters trying to update the meta table with their IP 
> addresses).
> * Adding refresh functionality for the meta table to ensure new metadata is 
> picked up on the read-replica cluster.
> * Adding refresh functionality for HFiles for a given table to ensure new data 
> is picked up on the read-replica cluster.
>  
> This can be used with any existing cluster that is backed by an external 
> filesystem.
>  
> Please note that this feature is still quite manual (with the potential for 
> automation later).
>  
> More information on this particular feature can be found here: 
> https://aws.amazon.com/blogs/big-data/setting-up-read-replica-clusters-with-hbase-on-amazon-s3/





[jira] [Commented] (HBASE-20982) [branch-1] TestExportSnapshot and TestSecureExportSnapshot are flaky

2018-07-30 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562471#comment-16562471
 ] 

Andrew Purtell commented on HBASE-20982:


Thanks. I've seen in some environments, and while tracing on the console, that 
starting up the yarn minicluster can consume a lot of time, and also it seems 
the creation of the snapshot can be slow as evidenced by HDFS level logging. 
Should instrument this test to keep track of the amount of time taken at 
various steps and dump it at the end of the test. Can compare between 
environments and revisions.
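
A minimal sketch of that kind of instrumentation, assuming nothing about the existing test: StepTimer and its methods are hypothetical names, and the step labels in main are placeholders rather than the real steps of TestExportSnapshot.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class StepTimer {
  // Insertion-ordered so the report lists steps in the order they ran.
  private final Map<String, Long> elapsedMillis = new LinkedHashMap<>();

  /** Runs one step and records its wall-clock duration under the given name. */
  void time(String step, Runnable body) {
    long start = System.nanoTime();
    try {
      body.run();
    } finally {
      long millis = (System.nanoTime() - start) / 1_000_000L;
      // Repeated steps accumulate into a single total.
      elapsedMillis.merge(step, millis, Long::sum);
    }
  }

  /** One line per step, suitable for dumping at the end of the test. */
  String report() {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<String, Long> e : elapsedMillis.entrySet()) {
      sb.append(e.getKey()).append(": ").append(e.getValue()).append(" ms\n");
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    StepTimer timer = new StepTimer();
    timer.time("start minicluster", () -> { /* placeholder step */ });
    timer.time("create snapshot", () -> { /* placeholder step */ });
    System.out.print(timer.report());
  }
}
```

Recording in a finally block means a step that throws or times out still shows up in the report, which is the case of interest when comparing environments.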

>  [branch-1] TestExportSnapshot and TestSecureExportSnapshot are flaky
> -
>
> Key: HBASE-20982
> URL: https://issues.apache.org/jira/browse/HBASE-20982
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.4.6
>Reporter: Andrew Purtell
>Priority: Major
>
> Passes for me
> {noformat}
> [INFO] Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 390.02 
> s - in org.apache.hadoop.hbase.snapshot.TestExportSnapshot
> [INFO] Tests run: 9, Failures: 0, Errors: 0, Skipped: 0
> {noformat}
> but fails or times out for others. 





[jira] [Commented] (HBASE-20919) meta region can't be re-onlined when restarting cluster if opening rsgroup

2018-07-30 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562461#comment-16562461
 ] 

Josh Elser commented on HBASE-20919:


Sorry for the delay on the above responses, but I'd like to get some answers to 
these questions before I see this committed.

> meta region can't be re-onlined when restarting cluster if opening rsgroup
> --
>
> Key: HBASE-20919
> URL: https://issues.apache.org/jira/browse/HBASE-20919
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer, master, rsgroup
>Affects Versions: 2.0.1
>Reporter: chenyang
>Assignee: ChenYang
>Priority: Major
> Attachments: HBASE-20919-branch-2.0-01.patch, 
> HBASE-20919-branch-2.0-02.patch, HBASE-20919-branch-2.0-02.patch, bug2.png, 
> hbase-hbase-master-bjpg-rs4729.yz02.no_02patch.log, 
> hbase-hbase-master-bjpg-rs4729.yz02.with_02patch.log, 
> hbase-hbase-master-bjpg-rs4730.yz02.log.test
>
>
> If rsgroup is enabled, hbase-site.xml contains the configuration below.
> {code:java}
> <property>
>   <name>hbase.coprocessor.master.classes</name>
>   <value>org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint</value>
> </property>
> <property>
>   <name>hbase.master.loadbalancer.class</name>
>   <value>org.apache.hadoop.hbase.rsgroup.RSGroupBasedLoadBalancer</value>
> </property>
> {code}
> Shut down the whole HBase cluster in this order:
>  # shut down the region servers one by one
>  # shut down the master
> Then restart the whole cluster in this order:
>  # start the master
>  # start the region servers
> The hbase:meta region cannot be brought back online and the rsgroup cannot be 
> initialized successfully.
>  Master logs:
> {code:java}
> 2018-07-12 18:27:08,775 INFO [org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker-bjpg-rs4730.yz02,16000,1531389637409] rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker: Waiting for catalog tables to come online
> 2018-07-12 18:27:08,876 INFO [org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker-bjpg-rs4730.yz02,16000,1531389637409] zookeeper.MetaTableLocator: Failed verification of hbase:meta,,1 at address=bjpg-rs4732.yz02,60020,1531388712053, exception=org.apache.hadoop.hbase.NotServingRegionException: hbase:meta,,1 is not online on bjpg-rs4732.yz02,60020,1531389727928
> at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3249)
> at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3226)
> at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1414)
> at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegionInfo(RSRpcServices.java:1729)
> at org.apache.hadoop.hbase.shaded.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:28286)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> {code}
> The logs show that the hbase:meta region is not online and rsgroup keeps 
> retrying to initialize.
>   
>  But why is the hbase:meta region not online?
>  The info-level logs and jstack did not have enough information, so I added 
> some debug logs to the source code for testing. Then I checked the master's 
> and the region server's logs and found that the meta region assign procedure, 
> which holds the meta region lock, never completed and never released the 
> lock, so the recoverMetaProcedure could not be executed. 
>   
>  Why did the first procedure not complete and release the meta region lock?
>  In the test logs, I found that when the assignmentManager assigned the 
> region, it needed to call the rsgroup balancer, which had not been completely 
> initialized, so an NPE was thrown. As a result, the procedure never completed 
> and never released the lock.
> {code:java}
> java.lang.NullPointerException
> at org.apache.hadoop.hbase.rsgroup.RSGroupBasedLoadBalancer.generateGroupMaps(RSGroupBasedLoadBalancer.java:262)
> at org.apache.hadoop.hbase.rsgroup.RSGroupBasedLoadBalancer.roundRobinAssignment(RSGroupBasedLoadBalancer.java:162)
> at org.apache.hadoop.hbase.master.assignment.AssignmentManager.processAssignmentPlans(AssignmentManager.java:1864)
> at org.apache.hadoop.hbase.master.assignment.AssignmentManager.processAssignQueue(AssignmentManager.java:1809)
> at org.apache.hadoop.hbase.master.assignment.AssignmentManager.access$400(AssignmentManager.java:113)
> at org.apache.hadoop.hbase.master.assignment.AssignmentManager$2.run(AssignmentManager.java:1693)
> {code}
> !bug2.png!
> As shown in the figure named bug2.png listed in attachments, when we shutdown 
> the last region server, the master 

[jira] [Commented] (HBASE-20919) meta region can't be re-onlined when restarting cluster if opening rsgroup

2018-07-30 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562459#comment-16562459
 ] 

Josh Elser commented on HBASE-20919:


{quote}Because it need start, stop, and restart whole cluster to test the case, 
so i don`t know how to offer unit tests, do you or anyone have some suggestions?
{quote}
We have the ability to start/stop HBase services via the HBaseTestingUtility. 
Lots of examples in the codebase around this already. 
{{hbase-server/src/test/java/org/apache/hadoop/hbase/master/assignment/TestRegionMoveAndAbandon.java}}
 is one example that I did recently.
{quote}I debug the initialization of rsgroup and test some cases. The 
initialization process is executed in a independent Thread.
{quote}
I feel like my understanding is wrong here, then. The master must be able (in 
some case) to re-assign hbase:meta w/o consulting RSGroupLoadBalancer or the 
RSGroupLoadBalancer can get itself initialized without hbase:meta being 
available. Given my current understanding, I wouldn't know how this could ever 
work.

> meta region can't be re-onlined when restarting cluster if opening rsgroup
> --
>
> Key: HBASE-20919
> URL: https://issues.apache.org/jira/browse/HBASE-20919
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer, master, rsgroup
>Affects Versions: 2.0.1
>Reporter: chenyang
>Assignee: ChenYang
>Priority: Major
> Attachments: HBASE-20919-branch-2.0-01.patch, 
> HBASE-20919-branch-2.0-02.patch, HBASE-20919-branch-2.0-02.patch, bug2.png, 
> hbase-hbase-master-bjpg-rs4729.yz02.no_02patch.log, 
> hbase-hbase-master-bjpg-rs4729.yz02.with_02patch.log, 
> hbase-hbase-master-bjpg-rs4730.yz02.log.test
>
>
> If rsgroup is enabled, hbase-site.xml contains the configuration below.
> {code:java}
> <property>
>   <name>hbase.coprocessor.master.classes</name>
>   <value>org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint</value>
> </property>
> <property>
>   <name>hbase.master.loadbalancer.class</name>
>   <value>org.apache.hadoop.hbase.rsgroup.RSGroupBasedLoadBalancer</value>
> </property>
> {code}
> Shut down the whole HBase cluster in this order:
>  # shut down the region servers one by one
>  # shut down the master
> Then restart the whole cluster in this order:
>  # start the master
>  # start the region servers
> The hbase:meta region cannot be brought back online and the rsgroup cannot be 
> initialized successfully.
>  Master logs:
> {code:java}
> 2018-07-12 18:27:08,775 INFO [org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker-bjpg-rs4730.yz02,16000,1531389637409] rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker: Waiting for catalog tables to come online
> 2018-07-12 18:27:08,876 INFO [org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker-bjpg-rs4730.yz02,16000,1531389637409] zookeeper.MetaTableLocator: Failed verification of hbase:meta,,1 at address=bjpg-rs4732.yz02,60020,1531388712053, exception=org.apache.hadoop.hbase.NotServingRegionException: hbase:meta,,1 is not online on bjpg-rs4732.yz02,60020,1531389727928
> at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3249)
> at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3226)
> at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1414)
> at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegionInfo(RSRpcServices.java:1729)
> at org.apache.hadoop.hbase.shaded.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:28286)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> {code}
> The logs show that the hbase:meta region is not online and rsgroup keeps 
> retrying to initialize.
>   
>  But why is the hbase:meta region not online?
>  The info-level logs and jstack did not have enough information, so I added 
> some debug logs to the source code for testing. Then I checked the master's 
> and the region server's logs and found that the meta region assign procedure, 
> which holds the meta region lock, never completed and never released the 
> lock, so the recoverMetaProcedure could not be executed. 
>   
>  Why did the first procedure not complete and release the meta region lock?
>  In the test logs, I found that when the assignmentManager assigned the 
> region, it needed to call the rsgroup balancer, which had not been completely 
> initialized, so an NPE was thrown. As a result, the procedure never completed 
> and never released the lock.
> {code:java}
> java.lang.NullPointerException
> at 
> 

[jira] [Commented] (HBASE-20968) list_procedures_test fails due to no matching regex

2018-07-30 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562418#comment-16562418
 ] 

Ted Yu commented on HBASE-20968:


There are several tests written in ruby:
{code}
ls hbase-shell/src/test/ruby/shell/
commands_test.rb  converter_test.rb  formatter_test.rb  list_locks_test.rb  
list_procedures_test.rb  noninteractive_test.rb  rsgroup_shell_test.rb  
shell_test.rb
{code}
What I did when trying to reproduce test failure was to sideline all .rb files 
except for list_procedures_test.rb

This way TestShell would run just the test we're interested in.

> list_procedures_test fails due to no matching regex
> ---
>
> Key: HBASE-20968
> URL: https://issues.apache.org/jira/browse/HBASE-20968
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Jack Bearden
>Priority: Major
>
> From test output against hadoop3:
> {code}
> 2018-07-28 12:04:24,838 DEBUG [Time-limited test] 
> procedure2.ProcedureExecutor(948): Stored pid=12, state=RUNNABLE, 
> hasLock=false; org.apache.hadoop.hbase.client.procedure.  
> ShellTestProcedure
> 2018-07-28 12:04:24,864 INFO  [RS-EventLoopGroup-1-3] 
> ipc.ServerRpcConnection(556): Connection from 172.18.128.12:46918, 
> version=3.0.0-SNAPSHOT, sasl=false, ugi=hbase (auth: SIMPLE), 
> service=MasterService
> 2018-07-28 12:04:24,900 DEBUG [Thread-114] master.MasterRpcServices(1157): 
> Checking to see if procedure is done pid=11
> ^[[38;5;196mF^[[0m
> ===
> Failure: 
> ^[[48;5;124;38;5;231;1mtest_list_procedures(Hbase::ListProceduresTest)^[[0m
> src/test/ruby/shell/list_procedures_test.rb:65:in `block in 
> test_list_procedures'
>  62: end
>  63:   end
>  64:
> ^[[48;5;124;38;5;231;1m  => 65:   assert_equal(1, matching_lines)^[[0m
>  66: end
>  67:   end
>  68: end
> <^[[48;5;34;38;5;231;1m1^[[0m> expected but was
> <^[[48;5;124;38;5;231;1m0^[[0m>
> ===
> ...
> 2018-07-28 12:04:25,374 INFO  [PEWorker-9] 
> procedure2.ProcedureExecutor(1316): Finished pid=12, state=SUCCESS, 
> hasLock=false; org.apache.hadoop.hbase.client.procedure.   
> ShellTestProcedure in 336msec
> {code}
> The completion of the ShellTestProcedure was after the assertion was raised.
> {code}
> def create_procedure_regexp(table_name)
>   regexp_string = '[0-9]+ .*ShellTestProcedure SUCCESS.*' \
> {code}
> The regex used by the test isn't found in test output either.
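
Since the log shows the procedure finishing only after the assertion ran, one common way to make such a check less timing-sensitive is to poll for the condition with a timeout before asserting. The sketch below is in Java rather than the test's Ruby, and PollUntil.waitFor is a hypothetical helper written for this illustration; it is not a claim about how the test was eventually fixed.

```java
import java.util.function.BooleanSupplier;

class PollUntil {
  /**
   * Re-evaluates the condition every intervalMs until it holds or timeoutMs
   * elapses. Returns the final outcome so the caller can assert on it.
   */
  static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition) {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        // Preserve the interrupt and give up waiting.
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }

  public static void main(String[] args) {
    long readyAt = System.currentTimeMillis() + 50;
    // Stand-in for "the procedure listing now contains a SUCCESS line".
    boolean seen = waitFor(2000, 10, () -> System.currentTimeMillis() >= readyAt);
    System.out.println(seen);
  }
}
```

In the test's terms, the condition would be the count of lines matching the procedure regexp reaching 1, and the assertion would run on the value returned by the wait.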





[jira] [Commented] (HBASE-20968) list_procedures_test fails due to no matching regex

2018-07-30 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562413#comment-16562413
 ] 

Ted Yu commented on HBASE-20968:


The test failure happens for hadoop2 as well:

https://builds.apache.org/job/HBase%20Nightly/job/master/413//testReport/junit/org.apache.hadoop.hbase.client/TestShell/health_checks___yetus_jdk8_hadoop2_checks___testRunShellTests/

Can you see if you can find any clues in the test output?

> list_procedures_test fails due to no matching regex
> ---
>
> Key: HBASE-20968
> URL: https://issues.apache.org/jira/browse/HBASE-20968
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Jack Bearden
>Priority: Major
>
> From test output against hadoop3:
> {code}
> 2018-07-28 12:04:24,838 DEBUG [Time-limited test] 
> procedure2.ProcedureExecutor(948): Stored pid=12, state=RUNNABLE, 
> hasLock=false; org.apache.hadoop.hbase.client.procedure.  
> ShellTestProcedure
> 2018-07-28 12:04:24,864 INFO  [RS-EventLoopGroup-1-3] 
> ipc.ServerRpcConnection(556): Connection from 172.18.128.12:46918, 
> version=3.0.0-SNAPSHOT, sasl=false, ugi=hbase (auth: SIMPLE), 
> service=MasterService
> 2018-07-28 12:04:24,900 DEBUG [Thread-114] master.MasterRpcServices(1157): 
> Checking to see if procedure is done pid=11
> ^[[38;5;196mF^[[0m
> ===
> Failure: 
> ^[[48;5;124;38;5;231;1mtest_list_procedures(Hbase::ListProceduresTest)^[[0m
> src/test/ruby/shell/list_procedures_test.rb:65:in `block in 
> test_list_procedures'
>  62: end
>  63:   end
>  64:
> ^[[48;5;124;38;5;231;1m  => 65:   assert_equal(1, matching_lines)^[[0m
>  66: end
>  67:   end
>  68: end
> <^[[48;5;34;38;5;231;1m1^[[0m> expected but was
> <^[[48;5;124;38;5;231;1m0^[[0m>
> ===
> ...
> 2018-07-28 12:04:25,374 INFO  [PEWorker-9] 
> procedure2.ProcedureExecutor(1316): Finished pid=12, state=SUCCESS, 
> hasLock=false; org.apache.hadoop.hbase.client.procedure.   
> ShellTestProcedure in 336msec
> {code}
> The completion of the ShellTestProcedure was after the assertion was raised.
> {code}
> def create_procedure_regexp(table_name)
>   regexp_string = '[0-9]+ .*ShellTestProcedure SUCCESS.*' \
> {code}
> The regex used by the test isn't found in test output either.





[jira] [Commented] (HBASE-20982) [branch-1] TestExportSnapshot and TestSecureExportSnapshot are flaky

2018-07-30 Thread Tak Lon (Stephen) Wu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562414#comment-16562414
 ] 

Tak Lon (Stephen) Wu commented on HBASE-20982:
--

this is what I saw when building with JDK 8u122 for failed TestExportSnapshot
{noformat}
[ERROR] testSnapshotWithRefsExportFileSystemState(org.apache.hadoop.hbase.snapshot.TestExportSnapshot)  Time elapsed: 180.038 s  <<< ERROR!
org.junit.runners.model.TestTimedOutException: test timed out after 180 seconds
  at org.apache.hadoop.hbase.snapshot.TestExportSnapshot.testExportFileSystemState(TestExportSnapshot.java:291)
  at org.apache.hadoop.hbase.snapshot.TestExportSnapshot.testExportFileSystemState(TestExportSnapshot.java:265)
  at org.apache.hadoop.hbase.snapshot.TestExportSnapshot.testSnapshotWithRefsExportFileSystemState(TestExportSnapshot.java:259)
  at org.apache.hadoop.hbase.snapshot.TestExportSnapshot.testSnapshotWithRefsExportFileSystemState(TestExportSnapshot.java:243)

[ERROR] testExportFileSystemStateWithSkipTmp(org.apache.hadoop.hbase.snapshot.TestExportSnapshot)  Time elapsed: 180.023 s  <<< ERROR!
org.junit.runners.model.TestTimedOutException: test timed out after 180 seconds
  at org.apache.hadoop.hbase.snapshot.TestExportSnapshot.testExportFileSystemState(TestExportSnapshot.java:291)
  at org.apache.hadoop.hbase.snapshot.TestExportSnapshot.testExportFileSystemState(TestExportSnapshot.java:265)
  at org.apache.hadoop.hbase.snapshot.TestExportSnapshot.testExportFileSystemStateWithSkipTmp(TestExportSnapshot.java:203)

[ERROR] testExportRetry(org.apache.hadoop.hbase.snapshot.TestExportSnapshot)  Time elapsed: 5.505 s  <<< FAILURE!
java.lang.AssertionError: expected:<0> but was:<1>
  at org.apache.hadoop.hbase.snapshot.TestExportSnapshot.testExportRetry(TestExportSnapshot.java:328)

[ERROR] testExportFailure(org.apache.hadoop.hbase.snapshot.TestExportSnapshot)  Time elapsed: 180.024 s  <<< ERROR!
org.junit.runners.model.TestTimedOutException: test timed out after 180 seconds
  at org.apache.hadoop.hbase.snapshot.TestExportSnapshot.runExportAndInjectFailures(TestExportSnapshot.java:347)
  at org.apache.hadoop.hbase.snapshot.TestExportSnapshot.testExportFailure(TestExportSnapshot.java:320)
{noformat}

>  [branch-1] TestExportSnapshot and TestSecureExportSnapshot are flaky
> -
>
> Key: HBASE-20982
> URL: https://issues.apache.org/jira/browse/HBASE-20982
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.4.6
>Reporter: Andrew Purtell
>Priority: Major
>
> Passes for me
> {noformat}
> [INFO] Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 390.02 
> s - in org.apache.hadoop.hbase.snapshot.TestExportSnapshot
> [INFO] Tests run: 9, Failures: 0, Errors: 0, Skipped: 0
> {noformat}
> but fails or times out for others. 





[jira] [Commented] (HBASE-20935) HStore.removeCompactedfiles should log incase it unable to delete a file

2018-07-30 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562401#comment-16562401
 ] 

Andrew Purtell commented on HBASE-20935:


This looks ready to go. Committing today where relevant.

> HStore.removeCompactedfiles should log incase it unable to delete a file
> 
>
> Key: HBASE-20935
> URL: https://issues.apache.org/jira/browse/HBASE-20935
> Project: HBase
>  Issue Type: Improvement
>Reporter: Vishal Khandelwal
>Assignee: Vishal Khandelwal
>Priority: Minor
> Fix For: 1.3.3
>
> Attachments: HBASE-20935.branch-1.3.patch, 
> HBASE-20935.branch-1.3.v2.patch, HBASE-20935.patch, HBASE-20935.v2.patch
>
>
> if (r != null && r.isCompactedAway() && !r.isReferencedInReads())
> If the above check fails, there will be some files which are compacted but 
> are not getting cleaned up. It is good to log this, as it helps in debugging. 
> The log would tell us why a file is not getting cleaned up: either a reference 
> is still pending or compactedAway is not set.
> This will help debug issues like:
>  # HBASE-20933
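
A minimal sketch of the kind of logging the description asks for, not the attached patch. The `Reader` interface and `CompactedFileCheck` class are hypothetical stand-ins; only `isCompactedAway()` and `isReferencedInReads()` come from the description. The idea is to turn the single combined check into a reason string that can be logged when a file is skipped.

```java
// Hypothetical stand-in for a store file reader; only the two methods
// named in the issue description are modeled here.
interface Reader {
  boolean isCompactedAway();
  boolean isReferencedInReads();
}

final class CompactedFileCheck {
  // Returns null when the file is eligible for removal, otherwise a
  // message explaining why it was skipped (suitable for a log line).
  static String skipReason(String fileName, Reader r) {
    if (r == null) {
      return "Skipping " + fileName + ": reader is null";
    }
    if (!r.isCompactedAway()) {
      return "Skipping " + fileName + ": compactedAway is not set";
    }
    if (r.isReferencedInReads()) {
      return "Skipping " + fileName + ": still referenced by reads";
    }
    return null;
  }
}
```

A caller would log the returned message at an appropriate level whenever it is non-null, and delete the file otherwise.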





[jira] [Commented] (HBASE-20974) Backport HBASE-20583 (SplitLogWorker should handle FileNotFoundException when split a wal) to branch-1

2018-07-30 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562400#comment-16562400
 ] 

Andrew Purtell commented on HBASE-20974:


Thanks for the patch and the review. The precommit results are showing 
environmental problems. Let me try this out locally and commit if it looks 
good. Will pick back to all relevant 1.x branches. 

> Backport HBASE-20583 (SplitLogWorker should handle FileNotFoundException when 
> split a wal) to branch-1
> --
>
> Key: HBASE-20974
> URL: https://issues.apache.org/jira/browse/HBASE-20974
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Major
> Fix For: 1.5.0
>
> Attachments: HBASE-20974.branch-1.patch, HBASE-20974.branch-1.patch
>
>
> Backport HBASE-20583 to branch-1.x.





[jira] [Commented] (HBASE-19316) Direct invocation Client-Server short-circuit without having to pass through the eye of a protobuf stub

2018-07-30 Thread Sahil Aggarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-19316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562396#comment-16562396
 ] 

Sahil Aggarwal commented on HBASE-19316:


[~stack] the description you added finally led me to 
ShortCircuitingClusterConnection in ConnectionUtils. What I understood is 
that it still uses protobuf service stubs for the admin and client services, 
and also returns the same for the master. Here we probably don't need to go 
through protobuf and can bypass it. 

If I got that right, are there any off-the-top-of-your-head ways to do this? 
As I see it, this is tied to the ClusterConnection interface, which declares 
methods that return such stubs. Connection seems to be independent of such 
stubs and can probably be decoupled easily.

 

Thanks!

 

> Direct invocation Client-Server short-circuit without having to pass through 
> the eye of a protobuf stub
> ---
>
> Key: HBASE-19316
> URL: https://issues.apache.org/jira/browse/HBASE-19316
> Project: HBase
>  Issue Type: Improvement
>  Components: rpc
>Reporter: stack
>Priority: Major
>  Labels: beginner
>
> In hbase, on server-side, we have a short-circuit facility that bypasses RPC 
> by directly hooking the client and server protobuf Services up to each other.
> Passing through the Protobuf Service stub requires that the invocation be 
> cast as protobufs -- the invocation itself and all params are converted to 
> protobuf to pass through the eye of the protobuf Service stub. Can we do 
> better and make direct invocations w/o having to do the protobuf 
> marshalling/unmarshalling? (Can we do it in a way that is not brittle in need 
> of careful repair whenever a change is made?). It would make for some nice 
> savings.





[jira] [Updated] (HBASE-20982) [branch-1] TestExportSnapshot and TestSecureExportSnapshot are flaky

2018-07-30 Thread Andrew Purtell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-20982:
---
Summary:  [branch-1] TestExportSnapshot and TestSecureExportSnapshot are 
flaky  (was:  [branch-1] TestExportSnapshot is flaky)

>  [branch-1] TestExportSnapshot and TestSecureExportSnapshot are flaky
> -
>
> Key: HBASE-20982
> URL: https://issues.apache.org/jira/browse/HBASE-20982
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.4.6
>Reporter: Andrew Purtell
>Priority: Major
>
> Passes for me
> {noformat}
> [INFO] Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 390.02 
> s - in org.apache.hadoop.hbase.snapshot.TestExportSnapshot
> [INFO] Tests run: 9, Failures: 0, Errors: 0, Skipped: 0
> {noformat}
> but fails or times out for others. 





[jira] [Created] (HBASE-20982) [branch-1] TestExportSnapshot is flaky

2018-07-30 Thread Andrew Purtell (JIRA)
Andrew Purtell created HBASE-20982:
--

 Summary:  [branch-1] TestExportSnapshot is flaky
 Key: HBASE-20982
 URL: https://issues.apache.org/jira/browse/HBASE-20982
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 1.4.6
Reporter: Andrew Purtell


Passes for me
{noformat}
[INFO] Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 390.02 s 
- in org.apache.hadoop.hbase.snapshot.TestExportSnapshot
[INFO] Tests run: 9, Failures: 0, Errors: 0, Skipped: 0
{noformat}

but fails or times out for others. 





[jira] [Commented] (HBASE-20977) Don't use the word "Snapshot" when defining "HBase Snapshots"

2018-07-30 Thread Jack Bearden (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562383#comment-16562383
 ] 

Jack Bearden commented on HBASE-20977:
--

Okay, thanks [~mdrob]. I should clarify as well that I haven't actually checked 
that the code matches the new description. I simply find that the new 
documentation makes way more sense now than it did before.

> Don't use the word "Snapshot" when defining "HBase Snapshots"
> -
>
> Key: HBASE-20977
> URL: https://issues.apache.org/jira/browse/HBASE-20977
> Project: HBase
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Trivial
> Fix For: 3.0.0
>
> Attachments: HBASE-20977.001.patch
>
>
> [From 
> http://hbase.apache.org/book.html#ops.snapshots|http://hbase.apache.org/book.html#ops.snapshots]
> {quote}HBase Snapshots allow you to take a snapshot of a table without too 
> much impact on Region Servers
> {quote}
> We should change this to not use the word "snapshot" when defining what HBase 
> Snapshots are. It's confusing enough to English-as-a-first-language 
> individuals; I imagine it's even more cyclical to ESL individuals.





[jira] [Commented] (HBASE-20977) Don't use the word "Snapshot" when defining "HBase Snapshots"

2018-07-30 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562380#comment-16562380
 ] 

Hadoop QA commented on HBASE-20977:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  4m  
1s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
18s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} refguide {color} | {color:blue}  5m 
38s{color} | {color:blue} branch has no errors when building the reference 
guide. See footer for rendered docs, which you should manually inspect. {color} 
|
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:blue}0{color} | {color:blue} refguide {color} | {color:blue}  5m 
33s{color} | {color:blue} patch has no errors when building the reference 
guide. See footer for rendered docs, which you should manually inspect. {color} 
|
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
14s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 23m  3s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-20977 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12933628/HBASE-20977.001.patch 
|
| Optional Tests |  asflicense  refguide  |
| uname | Linux db9ecdc3f008 4.4.0-130-generic #156-Ubuntu SMP Thu Jun 14 
08:53:28 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / c075f33fc7 |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| refguide | 
https://builds.apache.org/job/PreCommit-HBASE-Build/13855/artifact/patchprocess/branch-site/book.html
 |
| refguide | 
https://builds.apache.org/job/PreCommit-HBASE-Build/13855/artifact/patchprocess/patch-site/book.html
 |
| Max. process+thread count | 93 (vs. ulimit of 1) |
| modules | C: . U: . |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/13855/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.



> Don't use the word "Snapshot" when defining "HBase Snapshots"
> -
>
> Key: HBASE-20977
> URL: https://issues.apache.org/jira/browse/HBASE-20977
> Project: HBase
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Trivial
> Fix For: 3.0.0
>
> Attachments: HBASE-20977.001.patch
>
>
> [From 
> http://hbase.apache.org/book.html#ops.snapshots|http://hbase.apache.org/book.html#ops.snapshots]
> {quote}HBase Snapshots allow you to take a snapshot of a table without too 
> much impact on Region Servers
> {quote}
> We should change this to not use the word "snapshot" when defining what HBase 
> Snapshots are. It's confusing enough to English-as-a-first-language 
> individuals; I imagine it's even more cyclical to ESL individuals.





[jira] [Comment Edited] (HBASE-20977) Don't use the word "Snapshot" when defining "HBase Snapshots"

2018-07-30 Thread Mike Drob (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562375#comment-16562375
 ] 

Mike Drob edited comment on HBASE-20977 at 7/30/18 7:12 PM:


+1 binding, if it's good enough for Jack it's good enough for me.

Edit: To clarify, I think in this particular case a non-binding vote is _more_ 
important than a binding vote because of the specific audience of the 
documentation provided.


was (Author: mdrob):
+1 binding, if it's good enough for Jack it's good enough for me.

> Don't use the word "Snapshot" when defining "HBase Snapshots"
> -
>
> Key: HBASE-20977
> URL: https://issues.apache.org/jira/browse/HBASE-20977
> Project: HBase
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Trivial
> Fix For: 3.0.0
>
> Attachments: HBASE-20977.001.patch
>
>
> [From 
> http://hbase.apache.org/book.html#ops.snapshots|http://hbase.apache.org/book.html#ops.snapshots]
> {quote}HBase Snapshots allow you to take a snapshot of a table without too 
> much impact on Region Servers
> {quote}
> We should change this to not use the word "snapshot" when defining what HBase 
> Snapshots are. It's confusing enough to English-as-a-first-language 
> individuals; I imagine it's even more cyclical to ESL individuals.





[jira] [Commented] (HBASE-20930) MetaScanner.metaScan should use passed variable for meta table name rather than TableName.META_TABLE_NAME

2018-07-30 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562378#comment-16562378
 ] 

Josh Elser commented on HBASE-20930:


I can't get TestSplitTransactionOnCluster to fail for me (with or without your 
patch). I'm guessing it's just flaky. Let me see if this applies to all 
branches.

> MetaScanner.metaScan should use passed variable for meta table name rather 
> than TableName.META_TABLE_NAME
> -
>
> Key: HBASE-20930
> URL: https://issues.apache.org/jira/browse/HBASE-20930
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.3.3
>Reporter: Vishal Khandelwal
>Assignee: Vishal Khandelwal
>Priority: Minor
> Fix For: 1.3.3
>
> Attachments: HBASE-20930.branch-1.3.patch, 
> HBASE-20930.branch-1.3.v2.patch
>
>
> MetaScanner.metaScan 
>  try (Table metaTable = new HTable(TableName.META_TABLE_NAME, connection, 
> null)) {
> should be changed to 
> metaScan(connection, visitor, userTableName, null, Integer.MAX_VALUE, 
> metaTableName)





[jira] [Commented] (HBASE-20977) Don't use the word "Snapshot" when defining "HBase Snapshots"

2018-07-30 Thread Mike Drob (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562375#comment-16562375
 ] 

Mike Drob commented on HBASE-20977:
---

+1 binding, if it's good enough for Jack it's good enough for me.

> Don't use the word "Snapshot" when defining "HBase Snapshots"
> -
>
> Key: HBASE-20977
> URL: https://issues.apache.org/jira/browse/HBASE-20977
> Project: HBase
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Trivial
> Fix For: 3.0.0
>
> Attachments: HBASE-20977.001.patch
>
>
> [From 
> http://hbase.apache.org/book.html#ops.snapshots|http://hbase.apache.org/book.html#ops.snapshots]
> {quote}HBase Snapshots allow you to take a snapshot of a table without too 
> much impact on Region Servers
> {quote}
> We should change this to not use the word "snapshot" when defining what HBase 
> Snapshots are. It's confusing enough to English-as-a-first-language 
> individuals; I imagine it's even more cyclical to ESL individuals.





[jira] [Commented] (HBASE-20977) Don't use the word "Snapshot" when defining "HBase Snapshots"

2018-07-30 Thread Jack Bearden (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562374#comment-16562374
 ] 

Jack Bearden commented on HBASE-20977:
--

+1 (non-binding)

Looks great! Thanks for this

> Don't use the word "Snapshot" when defining "HBase Snapshots"
> -
>
> Key: HBASE-20977
> URL: https://issues.apache.org/jira/browse/HBASE-20977
> Project: HBase
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Trivial
> Fix For: 3.0.0
>
> Attachments: HBASE-20977.001.patch
>
>
> [From 
> http://hbase.apache.org/book.html#ops.snapshots|http://hbase.apache.org/book.html#ops.snapshots]
> {quote}HBase Snapshots allow you to take a snapshot of a table without too 
> much impact on Region Servers
> {quote}
> We should change this to not use the word "snapshot" when defining what HBase 
> Snapshots are. It's confusing enough to English-as-a-first-language 
> individuals; I imagine it's even more cyclical to ESL individuals.





[jira] [Updated] (HBASE-20977) Don't use the word "Snapshot" when defining "HBase Snapshots"

2018-07-30 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HBASE-20977:
---
Attachment: HBASE-20977.001.patch

> Don't use the word "Snapshot" when defining "HBase Snapshots"
> -
>
> Key: HBASE-20977
> URL: https://issues.apache.org/jira/browse/HBASE-20977
> Project: HBase
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Trivial
> Fix For: 3.0.0
>
> Attachments: HBASE-20977.001.patch
>
>
> [From 
> http://hbase.apache.org/book.html#ops.snapshots|http://hbase.apache.org/book.html#ops.snapshots]
> {quote}HBase Snapshots allow you to take a snapshot of a table without too 
> much impact on Region Servers
> {quote}
> We should change this to not use the word "snapshot" when defining what HBase 
> Snapshots are. It's confusing enough to English-as-a-first-language 
> individuals; I imagine it's even more cyclical to ESL individuals.





[jira] [Updated] (HBASE-20977) Don't use the word "Snapshot" when defining "HBase Snapshots"

2018-07-30 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HBASE-20977:
---
Fix Version/s: 3.0.0

> Don't use the word "Snapshot" when defining "HBase Snapshots"
> -
>
> Key: HBASE-20977
> URL: https://issues.apache.org/jira/browse/HBASE-20977
> Project: HBase
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Trivial
> Fix For: 3.0.0
>
> Attachments: HBASE-20977.001.patch
>
>
> [From 
> http://hbase.apache.org/book.html#ops.snapshots|http://hbase.apache.org/book.html#ops.snapshots]
> {quote}HBase Snapshots allow you to take a snapshot of a table without too 
> much impact on Region Servers
> {quote}
> We should change this to not use the word "snapshot" when defining what HBase 
> Snapshots are. It's confusing enough to English-as-a-first-language 
> individuals; I imagine it's even more cyclical to ESL individuals.





[jira] [Updated] (HBASE-20977) Don't use the word "Snapshot" when defining "HBase Snapshots"

2018-07-30 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HBASE-20977:
---
Status: Patch Available  (was: Open)

> Don't use the word "Snapshot" when defining "HBase Snapshots"
> -
>
> Key: HBASE-20977
> URL: https://issues.apache.org/jira/browse/HBASE-20977
> Project: HBase
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Trivial
> Fix For: 3.0.0
>
> Attachments: HBASE-20977.001.patch
>
>
> [From 
> http://hbase.apache.org/book.html#ops.snapshots|http://hbase.apache.org/book.html#ops.snapshots]
> {quote}HBase Snapshots allow you to take a snapshot of a table without too 
> much impact on Region Servers
> {quote}
> We should change this to not use the word "snapshot" when defining what HBase 
> Snapshots are. It's confusing enough to English-as-a-first-language 
> individuals; I imagine it's even more cyclical to ESL individuals.





[jira] [Commented] (HBASE-20886) [Auth] Support keytab login in hbase client

2018-07-30 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562329#comment-16562329
 ] 

Wei-Chiu Chuang commented on HBASE-20886:
-

It's just too bad HADOOP-9567 never completed. User identity is a tricky & 
sensitive issue and it should ideally be handled within Hadoop.

> [Auth] Support keytab login in hbase client
> ---
>
> Key: HBASE-20886
> URL: https://issues.apache.org/jira/browse/HBASE-20886
> Project: HBase
>  Issue Type: New Feature
>  Components: asyncclient, Client, security
>Reporter: Reid Chan
>Assignee: Reid Chan
>Priority: Critical
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-20886.master.001.patch, 
> HBASE-20886.master.002.patch, HBASE-20886.master.003.patch, 
> HBASE-20886.master.004.patch, HBASE-20886.master.005.patch, 
> HBASE-20886.master.006.patch, HBASE-20886.master.007.patch, 
> HBASE-20886.master.008.patch
>
>
> There are lots of questions on the user mailing list and the Slack channel 
> about how to connect to a kerberized HBase cluster through the hbase-client API.
> {{hbase.client.keytab.file}} and {{hbase.client.keytab.principal}} already 
> exist in the code base, but they are only used in {{Canary}}.
> This issue is to make use of these two configs to support client-side 
> keytab-based login. After this issue is resolved, hbase-client should connect 
> directly to a kerberized cluster without any code changes, as long as 
> {{hbase.client.keytab.file}} and {{hbase.client.keytab.principal}} are 
> specified.





[jira] [Commented] (HBASE-20968) list_procedures_test fails due to no matching regex

2018-07-30 Thread Jack Bearden (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562325#comment-16562325
 ] 

Jack Bearden commented on HBASE-20968:
--

Yes, I tried the following:

1) Without profile, without version
2) With profile, without version

What does your install command look like?

> list_procedures_test fails due to no matching regex
> ---
>
> Key: HBASE-20968
> URL: https://issues.apache.org/jira/browse/HBASE-20968
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Jack Bearden
>Priority: Major
>
> From test output against hadoop3:
> {code}
> 2018-07-28 12:04:24,838 DEBUG [Time-limited test] 
> procedure2.ProcedureExecutor(948): Stored pid=12, state=RUNNABLE, 
> hasLock=false; org.apache.hadoop.hbase.client.procedure.  
> ShellTestProcedure
> 2018-07-28 12:04:24,864 INFO  [RS-EventLoopGroup-1-3] 
> ipc.ServerRpcConnection(556): Connection from 172.18.128.12:46918, 
> version=3.0.0-SNAPSHOT, sasl=false, ugi=hbase (auth: SIMPLE), 
> service=MasterService
> 2018-07-28 12:04:24,900 DEBUG [Thread-114] master.MasterRpcServices(1157): 
> Checking to see if procedure is done pid=11
> F
> ===
> Failure: test_list_procedures(Hbase::ListProceduresTest)
> src/test/ruby/shell/list_procedures_test.rb:65:in `block in test_list_procedures'
>  62: end
>  63:   end
>  64:
>   => 65:   assert_equal(1, matching_lines)
>  66: end
>  67:   end
>  68: end
> <1> expected but was <0>
> ===
> ...
> 2018-07-28 12:04:25,374 INFO  [PEWorker-9] 
> procedure2.ProcedureExecutor(1316): Finished pid=12, state=SUCCESS, 
> hasLock=false; org.apache.hadoop.hbase.client.procedure.   
> ShellTestProcedure in 336msec
> {code}
> The completion of the ShellTestProcedure was after the assertion was raised.
> {code}
> def create_procedure_regexp(table_name)
>   regexp_string = '[0-9]+ .*ShellTestProcedure SUCCESS.*' \
> {code}
> The regex used by the test isn't found in test output either.
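
The log above suggests a race: the assertion ran before the procedure reached SUCCESS. One common way to make such a check less timing-sensitive is to poll for the condition with a deadline instead of asserting once. The sketch below illustrates that pattern in Java under stated assumptions; `Poll` and `waitFor` are hypothetical names, not part of the HBase shell tests, and this is not a claim about how the issue was actually fixed.

```java
import java.util.function.BooleanSupplier;

final class Poll {
  // Re-evaluates the condition every intervalMs until it holds or
  // timeoutMs has elapsed. Returns true if the condition held in time.
  static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition) {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        // Preserve the interrupt for the caller and give up waiting.
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }
}
```

A test would wrap the "procedure is listed as SUCCESS" check in the supplier and assert on the boolean that `waitFor` returns.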





[jira] [Commented] (HBASE-20967) TestFromClientSide3 fails with NPE

2018-07-30 Thread Jack Bearden (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562315#comment-16562315
 ] 

Jack Bearden commented on HBASE-20967:
--

Ok, thank you for the clarification [~Apache9]. What is the difference between 
the two environments (local and jenkins) that would cause this kind of 
behavior? Namely, tests passing locally but not on Jenkins. What is Jenkins 
doing differently?

> TestFromClientSide3 fails with NPE
> --
>
> Key: HBASE-20967
> URL: https://issues.apache.org/jira/browse/HBASE-20967
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Reporter: Duo Zhang
>Priority: Major
> Attachments: 
> org.apache.hadoop.hbase.client.TestFromClientSide3-output.txt
>
>
> https://builds.apache.org/job/HBASE-Flaky-Tests/35375/testReport/junit/org.apache.hadoop.hbase.client/TestFromClientSide3/testLockLeakWithDelta/
> {noformat}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hbase.client.TestFromClientSide3.find(TestFromClientSide3.java:995)
>   at 
> org.apache.hadoop.hbase.client.TestFromClientSide3.find(TestFromClientSide3.java:1002)
>   at 
> org.apache.hadoop.hbase.client.TestFromClientSide3.testLockLeakWithDelta(TestFromClientSide3.java:783)
> {noformat}





[jira] [Commented] (HBASE-20657) Retrying RPC call for ModifyTableProcedure may get stuck

2018-07-30 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562289#comment-16562289
 ] 

Josh Elser commented on HBASE-20657:


Also, fyi [~stack] since you seem to have stumbled onto HBASE-20569 as a part 
of the out-of-order proc log.

> Retrying RPC call for ModifyTableProcedure may get stuck
> 
>
> Key: HBASE-20657
> URL: https://issues.apache.org/jira/browse/HBASE-20657
> Project: HBase
>  Issue Type: Bug
>  Components: Client, proc-v2
>Affects Versions: 2.0.0
>Reporter: Sergey Soldatov
>Assignee: stack
>Priority: Critical
> Fix For: 3.0.0, 2.0.2, 2.1.1
>
> Attachments: HBASE-20657-1-branch-2.patch, 
> HBASE-20657-2-branch-2.patch, HBASE-20657-3-branch-2.patch, 
> HBASE-20657-4-master.patch, HBASE-20657-testcase-branch2.patch
>
>
> Env: 2 masters, 1 RS. 
> Steps to reproduce: the active master is killed while ModifyTableProcedure is 
> executing. 
> If the table has enough regions, it may happen that some of the regions are 
> closed when the secondary master becomes active. Once the client retries the 
> call to the new active master, a new ModifyTableProcedure is created and gets 
> stuck during MODIFY_TABLE_REOPEN_ALL_REGIONS state handling. That happens because:
> 1. When we retry from the client side, we call modifyTableAsync, which 
> creates a procedure with a new nonce key:
> {noformat}
>  ModifyTableRequest request = 
> RequestConverter.buildModifyTableRequest(
> td.getTableName(), td, ng.getNonceGroup(), ng.newNonce());
> {noformat}
>  So on the server side, it is considered a new procedure and starts 
> executing immediately.
> 2. When we process MODIFY_TABLE_REOPEN_ALL_REGIONS, we create a 
> MoveRegionProcedure for each region, but it checks whether the region is 
> online (and it's not), so it fails immediately, forcing the procedure to 
> restart.
> [~an...@apache.org] saw a similar case when two concurrent ModifyTable 
> procedures were running and got stuck in a similar way. 
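
Point 1 above turns on how nonces deduplicate retried requests. The toy model below illustrates the idea only; `NonceRegistry` is a hypothetical class, not HBase's server-side nonce handling. A retry that reuses the same (nonce group, nonce) pair maps back to the existing procedure, while a retry that generates a fresh nonce is treated as a brand-new procedure.

```java
import java.util.HashMap;
import java.util.Map;

final class NonceRegistry {
  private final Map<String, Long> procIdByNonce = new HashMap<>();
  private long nextProcId = 1;

  // Returns the procedure id registered for (nonceGroup, nonce), creating
  // a new procedure id only when this key has not been seen before.
  synchronized long submit(long nonceGroup, long nonce) {
    String key = nonceGroup + ":" + nonce;
    Long existing = procIdByNonce.get(key);
    if (existing != null) {
      return existing;
    }
    long id = nextProcId++;
    procIdByNonce.put(key, id);
    return id;
  }
}
```

In this model, the behavior described in the issue corresponds to the second case: the retry carries a new nonce, so nothing links it to the procedure that is already running.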





[jira] [Updated] (HBASE-20657) Retrying RPC call for ModifyTableProcedure may get stuck

2018-07-30 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HBASE-20657:
---
Fix Version/s: 2.1.1

> Retrying RPC call for ModifyTableProcedure may get stuck
> 
>
> Key: HBASE-20657
> URL: https://issues.apache.org/jira/browse/HBASE-20657
> Project: HBase
>  Issue Type: Bug
>  Components: Client, proc-v2
>Affects Versions: 2.0.0
>Reporter: Sergey Soldatov
>Assignee: stack
>Priority: Critical
> Fix For: 3.0.0, 2.0.2, 2.1.1
>
> Attachments: HBASE-20657-1-branch-2.patch, 
> HBASE-20657-2-branch-2.patch, HBASE-20657-3-branch-2.patch, 
> HBASE-20657-4-master.patch, HBASE-20657-testcase-branch2.patch
>
>
> Env: 2 masters, 1 RS. 
> Steps to reproduce: the active master is killed while ModifyTableProcedure is 
> executing. 
> If the table has enough regions, it may happen that some of the regions are 
> closed when the secondary master becomes active. Once the client retries the 
> call to the new active master, a new ModifyTableProcedure is created and gets 
> stuck during MODIFY_TABLE_REOPEN_ALL_REGIONS state handling. That happens because:
> 1. When we retry from the client side, we call modifyTableAsync, which 
> creates a procedure with a new nonce key:
> {noformat}
>  ModifyTableRequest request = 
> RequestConverter.buildModifyTableRequest(
> td.getTableName(), td, ng.getNonceGroup(), ng.newNonce());
> {noformat}
>  So on the server side, it is considered a new procedure and starts 
> executing immediately.
> 2. When we process MODIFY_TABLE_REOPEN_ALL_REGIONS, we create a 
> MoveRegionProcedure for each region, but it checks whether the region is 
> online (and it's not), so it fails immediately, forcing the procedure to 
> restart.
> [~an...@apache.org] saw a similar case when two concurrent ModifyTable 
> procedures were running and got stuck in a similar way. 





[jira] [Commented] (HBASE-20657) Retrying RPC call for ModifyTableProcedure may get stuck

2018-07-30 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562239#comment-16562239
 ] 

Josh Elser commented on HBASE-20657:


{quote}To reproduce the problem, make MTP hold the lock (as the patch does) 
and run the provided test. Everything will end up with a number of regions 
stuck in RiT state forever.
{quote}
Nice!
{code:java}
diff --git a/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/MasterProcedureScheduler.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/MasterProcedureScheduler.java
index 69a6e8f..52217f1 100644
--- a/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/MasterProcedureScheduler.java
+++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/MasterProcedureScheduler.java
@@ -197,7 +197,8 @@ public class MasterProcedureScheduler extends AbstractProcedureScheduler {
   // check if the next procedure is still a child.
   // if not, remove the rq from the fairq and go back to the xlock state
   Procedure nextProc = rq.peek();
-  if (nextProc != null && !Procedure.haveSameParent(nextProc, pollResult)) {
+  if (nextProc != null && !Procedure.haveSameParent(nextProc, pollResult)
+  && nextProc.getRootProcId() != pollResult.getRootProcId()) {
 removeFromRunQueue(fairq, rq);
   }
 }{code}
I don't know a reason why this shouldn't also be applied to 2.x, but maybe 
[~zghaobac] or [~Apache9] know of a reason (after looking at HBASE-20569) that 
we don't want this change in 2.x?
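
To make the effect of the extra condition in the diff concrete, here is a toy model of the two predicates. It is only a sketch: the Proc class and the method names releasesQueueBefore/releasesQueueAfter are hypothetical stand-ins, not HBase classes, and haveSameParent is a simplified approximation of the real helper.

```java
/** Toy model of the check changed in the diff above. Not HBase code. */
final class SchedulerCheckSketch {

  /** Hypothetical stand-in for a procedure: only the ids the check needs. */
  static final class Proc {
    final long procId;
    final Long parentProcId; // null when the procedure has no parent
    final long rootProcId;

    Proc(long procId, Long parentProcId, long rootProcId) {
      this.procId = procId;
      this.parentProcId = parentProcId;
      this.rootProcId = rootProcId;
    }
  }

  /** Simplified: true only when both procedures have the same non-null parent. */
  static boolean haveSameParent(Proc a, Proc b) {
    return a.parentProcId != null && a.parentProcId.equals(b.parentProcId);
  }

  /** Condition before the patch: drop the queue when the next proc is not a sibling. */
  static boolean releasesQueueBefore(Proc next, Proc polled) {
    return next != null && !haveSameParent(next, polled);
  }

  /** Condition after the patch: additionally require a different root procedure. */
  static boolean releasesQueueAfter(Proc next, Proc polled) {
    return next != null && !haveSameParent(next, polled)
        && next.rootProcId != polled.rootProcId;
  }

  public static void main(String[] args) {
    Proc child = new Proc(2, 1L, 1);      // child of root procedure 1
    Proc grandchild = new Proc(3, 2L, 1); // child of 2, same root 1
    System.out.println("before: " + releasesQueueBefore(grandchild, child));
    System.out.println("after:  " + releasesQueueAfter(grandchild, child));
  }
}
```

In this model, a grandchild of the same root procedure is not a sibling of the polled child, so the old condition removes the queue while the patched one keeps it. An unrelated procedure with a different root is treated the same by both.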

> Retrying RPC call for ModifyTableProcedure may get stuck
> 
>
> Key: HBASE-20657
> URL: https://issues.apache.org/jira/browse/HBASE-20657
> Project: HBase
>  Issue Type: Bug
>  Components: Client, proc-v2
>Affects Versions: 2.0.0
>Reporter: Sergey Soldatov
>Assignee: stack
>Priority: Critical
> Fix For: 3.0.0, 2.0.2
>
> Attachments: HBASE-20657-1-branch-2.patch, 
> HBASE-20657-2-branch-2.patch, HBASE-20657-3-branch-2.patch, 
> HBASE-20657-4-master.patch, HBASE-20657-testcase-branch2.patch
>
>
> Env: 2 masters, 1 RS. 
> Steps to reproduce: Kill the active master while a ModifyTableProcedure is 
> executing. 
> If the table has enough regions, some of them may still be closed when the 
> secondary master becomes active. Once the client retries the call against 
> the new active master, a new ModifyTableProcedure is created and gets stuck 
> during MODIFY_TABLE_REOPEN_ALL_REGIONS state handling. That happens because:
> 1. When retrying from the client side, we call modifyTableAsync, which 
> creates a procedure with a new nonce key:
> {noformat}
>  ModifyTableRequest request = 
> RequestConverter.buildModifyTableRequest(
> td.getTableName(), td, ng.getNonceGroup(), ng.newNonce());
> {noformat}
>  So on the server side, it is treated as a new procedure and starts 
> executing immediately.
> 2. When processing MODIFY_TABLE_REOPEN_ALL_REGIONS, we create a 
> MoveRegionProcedure for each region, but it checks whether the region is 
> online (and it's not), so it fails immediately, forcing the procedure to 
> restart.
> [~an...@apache.org] saw a similar case when two concurrent ModifyTable 
> procedures were running and got stuck in a similar way. 
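
Point 1 of the description hinges on how a nonce key identifies a request. The following is a minimal sketch of nonce-based deduplication, assuming a simple in-memory map; NonceDedupSketch and its submit method are hypothetical and do not reflect how the HBase master actually stores nonces.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy nonce registry: a retry with the same nonce maps to the same procedure. */
final class NonceDedupSketch {
  private final Map<String, Long> procIdByNonce = new HashMap<>();
  private long nextProcId = 1;

  /**
   * Returns the id of the procedure for this (nonceGroup, nonce) pair,
   * creating a new procedure id only when the pair has not been seen before.
   */
  long submit(long nonceGroup, long nonce) {
    String key = nonceGroup + ":" + nonce;
    Long existing = procIdByNonce.get(key);
    if (existing != null) {
      return existing; // same nonce: treated as a retry of the same request
    }
    long procId = nextProcId++;
    procIdByNonce.put(key, procId);
    return procId; // unseen nonce: treated as a brand-new procedure
  }

  public static void main(String[] args) {
    NonceDedupSketch master = new NonceDedupSketch();
    long first = master.submit(7, 100);
    long retrySameNonce = master.submit(7, 100);
    long retryNewNonce = master.submit(7, 101);
    System.out.println(first + " " + retrySameNonce + " " + retryNewNonce);
  }
}
```

Under this model, a client retry that calls the equivalent of ng.newNonce() always lands in the "unseen nonce" branch, which matches the behaviour the description reports: the server starts a second procedure instead of recognising the retry.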





[jira] [Commented] (HBASE-20979) Flaky test reporting should specify what JSON it needs and handle HTTP errors

2018-07-30 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562229#comment-16562229
 ] 

Hadoop QA commented on HBASE-20979:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  9s{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue}  0m  1s{color} | {color:blue} The patch file was not named according to hbase's naming conventions. Please see https://yetus.apache.org/documentation/0.7.0/precommit-patchnames for instructions. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:red}-1{color} | {color:red} pylint {color} | {color:red}  0m  3s{color} | {color:red} The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 13s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}  0m 42s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-20979 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12933618/HBASE-20979.0.txt |
| Optional Tests |  asflicense  pylint  |
| uname | Linux 8bfe84bb207e 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / c075f33fc7 |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| pylint | v1.6.5 |
| pylint | https://builds.apache.org/job/PreCommit-HBASE-Build/13854/artifact/patchprocess/diff-patch-pylint.txt |
| Max. process+thread count | 43 (vs. ulimit of 1) |
| modules | C: . U: . |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/13854/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.



> Flaky test reporting should specify what JSON it needs and handle HTTP errors
> -
>
> Key: HBASE-20979
> URL: https://issues.apache.org/jira/browse/HBASE-20979
> Project: HBase
>  Issue Type: Improvement
>  Components: test
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Minor
> Attachments: HBASE-20979.0.txt
>
>
> The current flaky test report should include the {{tree=}} parameter in its 
> Jenkins API calls (see 
> https://support.cloudbees.com/hc/en-us/articles/217911388-Best-Practice-For-Using-Jenkins-REST-API).
> It should also provide some information on failure, so that when jobs change 
> or go away we don't get blank failures.





[jira] [Commented] (HBASE-20893) Data loss if splitting region while ServerCrashProcedure executing

2018-07-30 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562226#comment-16562226
 ] 

stack commented on HBASE-20893:
---

Thanks for the reminder [~allan163]. I filed HBASE-20981 quoting you from above.

> Data loss if splitting region while ServerCrashProcedure executing
> --
>
> Key: HBASE-20893
> URL: https://issues.apache.org/jira/browse/HBASE-20893
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 3.0.0, 2.1.0, 2.0.1
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Fix For: 3.0.0, 2.0.2, 2.2.0, 2.1.1
>
> Attachments: HBASE-20893-branch-2.0.addendum.patch, 
> HBASE-20893.branch-2.0.001.patch, HBASE-20893.branch-2.0.002.patch, 
> HBASE-20893.branch-2.0.003.patch, HBASE-20893.branch-2.0.004.patch, 
> HBASE-20893.branch-2.0.005.patch
>
>
> Similar case as HBASE-20878.





[jira] [Created] (HBASE-20981) Rollback stateCount accounting thrown-off when exception out of rollbackState

2018-07-30 Thread stack (JIRA)
stack created HBASE-20981:
-

 Summary: Rollback stateCount accounting thrown-off when exception 
out of rollbackState
 Key: HBASE-20981
 URL: https://issues.apache.org/jira/browse/HBASE-20981
 Project: HBase
  Issue Type: Bug
  Components: amv2
Affects Versions: 2.0.1
Reporter: stack
 Fix For: 2.0.2


Found by the mighty [~allan163] over in HBASE-20893. Quoting Allan:

{code}
But, there is truly a bug here,

  @Override
  protected void rollback(final TEnvironment env)
      throws IOException, InterruptedException {
    if (isEofState()) stateCount--;
    try {
      updateTimestamp();
      rollbackState(env, getCurrentState());
      stateCount--;
    } finally {
      updateTimestamp();
    }
  }

We need to decrease the stateCount when rolling back, so that we can roll back 
the previous state correctly. But since an exception is thrown, the decrease 
of stateCount never happens. So ProcedureExecutor will keep rolling back only 
one state (the one that throws the exception) until the end of the execution stack.
{code}
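
The accounting problem Allan describes can be reproduced with a stripped-down model. This is a sketch only: RollbackCountSketch, failingState and countAfterAttempts are hypothetical names, and the "always decrement" variant is one conceivable alternative shown for contrast, not the fix chosen for this issue.

```java
import java.io.IOException;

/** Toy state machine showing how a throwing rollbackState stalls stateCount. */
final class RollbackCountSketch {
  int stateCount;          // number of states executed so far
  final int failingState;  // index of the state whose rollback throws

  RollbackCountSketch(int stateCount, int failingState) {
    this.stateCount = stateCount;
    this.failingState = failingState;
  }

  private void rollbackState(int state) throws IOException {
    if (state == failingState) {
      throw new IOException("rollback failed for state " + state);
    }
  }

  /** Same shape as the quoted code: the decrement is skipped on an exception. */
  void rollbackAsQuoted() throws IOException {
    rollbackState(stateCount - 1);
    stateCount--;
  }

  /** Hypothetical variant: decrement even when rollbackState throws. */
  void rollbackAlwaysDecrement() throws IOException {
    try {
      rollbackState(stateCount - 1);
    } finally {
      stateCount--;
    }
  }

  /** Retries rollback like an executor would; returns the final stateCount. */
  static int countAfterAttempts(boolean alwaysDecrement, int attempts) {
    RollbackCountSketch proc = new RollbackCountSketch(3, 2);
    for (int i = 0; i < attempts && proc.stateCount > 0; i++) {
      try {
        if (alwaysDecrement) {
          proc.rollbackAlwaysDecrement();
        } else {
          proc.rollbackAsQuoted();
        }
      } catch (IOException e) {
        // the executor would call rollback again on the next pass
      }
    }
    return proc.stateCount;
  }

  public static void main(String[] args) {
    System.out.println("as quoted:        " + countAfterAttempts(false, 3));
    System.out.println("always decrement: " + countAfterAttempts(true, 3));
  }
}
```

With the quoted shape, every retry targets the same failing state and stateCount never moves. Whether decrementing in a finally block is actually the right behaviour for a failed rollback is a separate design question that this sketch does not settle.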





[jira] [Updated] (HBASE-20979) Flaky test reporting should specify what JSON it needs and handle HTTP errors

2018-07-30 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-20979:

Status: Patch Available  (was: In Progress)

-v0
  - enumerate which JSON fields we'll read when making the request
  - in our first pull for information about a given job url, look for 
non-HTTP-200 responses and error out (instead of erroring out on "can't get a 
JSON object")


Tested this with some of our jobs and an internal build system at my employer. 
Note that our current nightly jobs will just report "no tests" for every build 
due to HBASE-20980.
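
The two changes listed above can be sketched as follows. This is an illustration only, not the attached patch: JenkinsTreeSketch, apiUrl and requireOk are hypothetical names, the host is made up, and the field list builds[number,result] is just an example of a tree= expression rather than the fields the patch actually requests.

```java
import java.io.IOException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Sketch: ask Jenkins only for the JSON fields we read, and fail loudly on non-200. */
final class JenkinsTreeSketch {

  /** Builds the JSON API URL for a job, restricted by a tree= expression. */
  static String apiUrl(String jobUrl, String tree) {
    String base = jobUrl.endsWith("/") ? jobUrl : jobUrl + "/";
    // URLEncoder.encode(String, Charset) requires Java 10 or later.
    return base + "api/json?tree=" + URLEncoder.encode(tree, StandardCharsets.UTF_8);
  }

  /** Errors out with the status and URL instead of failing later on JSON parsing. */
  static void requireOk(int status, String url) throws IOException {
    if (status != 200) {
      throw new IOException("HTTP " + status + " fetching " + url);
    }
  }

  public static void main(String[] args) throws IOException {
    String url = apiUrl("https://ci.example.org/job/demo", "builds[number,result]");
    System.out.println(url);
    requireOk(200, url); // passes; a 404 here would throw with the URL in the message
  }
}
```

Checking the status code first means a renamed or deleted job produces an error that names the URL and status, rather than a bare JSON parsing failure.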

> Flaky test reporting should specify what JSON it needs and handle HTTP errors
> -
>
> Key: HBASE-20979
> URL: https://issues.apache.org/jira/browse/HBASE-20979
> Project: HBase
>  Issue Type: Improvement
>  Components: test
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Minor
> Attachments: HBASE-20979.0.txt
>
>
> The current flaky test report should include the {{tree=}} parameter in its 
> Jenkins API calls (see 
> https://support.cloudbees.com/hc/en-us/articles/217911388-Best-Practice-For-Using-Jenkins-REST-API).
> It should also provide some information on failure, so that when jobs change 
> or go away we don't get blank failures.





[jira] [Updated] (HBASE-20979) Flaky test reporting should specify what JSON it needs and handle HTTP errors

2018-07-30 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-20979:

Attachment: HBASE-20979.0.txt

> Flaky test reporting should specify what JSON it needs and handle HTTP errors
> -
>
> Key: HBASE-20979
> URL: https://issues.apache.org/jira/browse/HBASE-20979
> Project: HBase
>  Issue Type: Improvement
>  Components: test
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Minor
> Attachments: HBASE-20979.0.txt
>
>
> The current flaky test report should include the {{tree=}} parameter in its 
> Jenkins API calls (see 
> https://support.cloudbees.com/hc/en-us/articles/217911388-Best-Practice-For-Using-Jenkins-REST-API).
> It should also provide some information on failure, so that when jobs change 
> or go away we don't get blank failures.





[jira] [Created] (HBASE-20980) Flaky test reporting should work with yetus driven builds

2018-07-30 Thread Sean Busbey (JIRA)
Sean Busbey created HBASE-20980:
---

 Summary: Flaky test reporting should work with yetus driven builds
 Key: HBASE-20980
 URL: https://issues.apache.org/jira/browse/HBASE-20980
 Project: HBase
  Issue Type: New Feature
  Components: test
Reporter: Sean Busbey
Assignee: Sean Busbey


Our current flaky test reporting can't consume our nightly builds because it 
presumes surefire output will go to the console. We should update it to 
recognize when a build used yetus and then get the data it needs out of the 
build artifacts.





[jira] [Commented] (HBASE-20978) [amv2] Worker terminating UNNATURALLY during MoveRegionProcedure

2018-07-30 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562191#comment-16562191
 ] 

stack commented on HBASE-20978:
---

I should be able to manufacture this condition in a unit test... Will be back.

> [amv2] Worker terminating UNNATURALLY during MoveRegionProcedure
> 
>
> Key: HBASE-20978
> URL: https://issues.apache.org/jira/browse/HBASE-20978
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Affects Versions: 2.0.1
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 2.0.2
>
>
> Testing tip of branch-2.0, ran into this:
> {code}
> 2018-07-29 01:45:33,002 INFO  [master/ve0524:16000] master.HMaster: Master has completed initialization 13.854sec
> 2018-07-29 01:45:33,003 INFO  [PEWorker-4] procedure.MasterProcedureScheduler: pid=1820, state=WAITING:MOVE_REGION_ASSIGN; MoveRegionProcedure hri=533fb79ba23b27e9e0715b51daeb30c1, source=ve0538.halxg.cloudera.com,16020,1532847421672, destination=ve0540.halxg.cloudera.com,16020,1532853151031 checking lock on 533fb79ba23b27e9e0715b51daeb30c1
> 2018-07-29 01:45:33,003 WARN  [PEWorker-4] procedure2.ProcedureExecutor: Worker terminating UNNATURALLY null
> java.lang.IllegalArgumentException: pid=1820, state=WAITING:MOVE_REGION_ASSIGN; MoveRegionProcedure hri=533fb79ba23b27e9e0715b51daeb30c1, source=ve0538.halxg.cloudera.com,16020,1532847421672, destination=ve0540.halxg.cloudera.com,16020,1532853151031
>   at org.apache.hbase.thirdparty.com.google.common.base.Preconditions.checkArgument(Preconditions.java:134)
>   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1458)
>   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1249)
>   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:76)
>   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1763)
> {code}
> It then shows as the below in the UI:
> {code}
> Id: 1820
> Parent: (blank)
> State: WAITING
> Owner: stack
> Type: MoveRegionProcedure hri=533fb79ba23b27e9e0715b51daeb30c1, source=ve0538.halxg.cloudera.com,16020,1532847421672, destination=ve0540.halxg.cloudera.com,16020,1532853151031
> Start Time: Sun Jul 29 01:33:37 PDT 2018
> Last Update: Sun Jul 29 01:33:38 PDT 2018
> Errors: (blank)
> Parameters: [ { state => [ '1', '2' ] }, { regionId => '1532851768240', tableName => { namespace => 'ZGVmYXVsdA==', qualifier => 'SW50ZWdyYXRpb25UZXN0QmlnTGlua2VkTGlzdA==' }, startKey => 'VttDLvXHdcmzwqNdrNoUFg==', endKey => 'WGFV8k+hFqhcIJGiKZ8L4Q==', offline => 'false', split => 'false', replicaId => '0' }, { sourceServer => { hostName => 've0538.halxg.cloudera.com', port => '16020', startCode => '1532847421672' }, destinationServer => { hostName => 've0540.halxg.cloudera.com', port => '16020', startCode => '1532853151031' } } ]
> {code}
> This is what we'd just read from hbase:meta:
> {code}
> 2018-07-29 01:45:32,802 INFO  [master/ve0524:16000] assignment.RegionStateStore: Load hbase:meta entry region=533fb79ba23b27e9e0715b51daeb30c1, regionState=CLOSED, lastHost=ve0538.halxg.cloudera.com,16020,1532847421672, regionLocation=ve0538.halxg.cloudera.com,16020,1532847421672, openSeqNum=1544600
> {code}
> Before this, we'd just logged this:
> 2018-07-29 01:33:39,786 INFO  [PEWorker-14] assignment.RegionStateStore: pid=1823 updating hbase:meta row=533fb79ba23b27e9e0715b51daeb30c1, regionState=CLOSED
> Going back in history, we do the above each time the Master gets restarted so 
> the region is offlined and never brought back online.
> It is failing here:
> {code}
>   private void execProcedure(final RootProcedureState procStack,
>       final Procedure procedure) {
>     Preconditions.checkArgument(procedure.getState() == ProcedureState.RUNNABLE,
>         procedure.toString());
> {code}
> It's the parent move region procedure that is trying to run and failing. It 
> is not RUNNABLE? Because the subprocedure was 'done' but not fully?
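
The failure mode in the description (a precondition on the procedure state, with the procedure's string form as the exception message) can be shown in isolation. This is a sketch under stated assumptions: RunnableCheckSketch, its State enum and its local checkArgument helper are hypothetical and only imitate the shape of the quoted execProcedure guard.

```java
/** Toy version of the RUNNABLE guard that kills the worker in the log above. */
final class RunnableCheckSketch {
  enum State { RUNNABLE, WAITING }

  /** Local stand-in for a checkArgument-style precondition helper. */
  static void checkArgument(boolean ok, String message) {
    if (!ok) {
      throw new IllegalArgumentException(message);
    }
  }

  /** Refuses to execute anything that is not RUNNABLE, like the quoted guard. */
  static String exec(State state, String description) {
    checkArgument(state == State.RUNNABLE, description);
    return "executed " + description;
  }

  public static void main(String[] args) {
    System.out.println(exec(State.RUNNABLE, "pid=1"));
    try {
      exec(State.WAITING, "pid=1820, state=WAITING:MOVE_REGION_ASSIGN");
    } catch (IllegalArgumentException e) {
      // The message is just the procedure description, as in the stack trace.
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

This matches what the log shows: the exception carries only the procedure's description, so the reason the parent is still WAITING has to be found elsewhere.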





[jira] [Work started] (HBASE-20979) Flaky test reporting should specify what JSON it needs and handle HTTP errors

2018-07-30 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-20979 started by Sean Busbey.
---
> Flaky test reporting should specify what JSON it needs and handle HTTP errors
> -
>
> Key: HBASE-20979
> URL: https://issues.apache.org/jira/browse/HBASE-20979
> Project: HBase
>  Issue Type: Improvement
>  Components: test
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Minor
>
> The current flaky test report should include the {{tree=}} parameter in its 
> Jenkins API calls (see 
> https://support.cloudbees.com/hc/en-us/articles/217911388-Best-Practice-For-Using-Jenkins-REST-API).
> It should also provide some information on failure, so that when jobs change 
> or go away we don't get blank failures.




