[jira] [Assigned] (HBASE-21325) Add a max wait time for waitOnAllRegionsToClose

2018-10-17 Thread Guanghao Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang reassigned HBASE-21325:
--

Assignee: Guanghao Zhang

> Add a max wait time for waitOnAllRegionsToClose
> ---
>
> Key: HBASE-21325
> URL: https://issues.apache.org/jira/browse/HBASE-21325
> Project: HBase
>  Issue Type: Improvement
>Reporter: Duo Zhang
>Assignee: Guanghao Zhang
>Priority: Major
>
> When testing sync replication, I found that if I transition the remote cluster
> to DA while the local cluster is still in A, the region server hangs on
> shutdown. Because the fsOk flag only tests the local cluster (which is
> reasonable), we enter waitOnAllRegionsToClose, and since the WAL is
> broken (the remote WAL directory is gone), we never succeed. This
> leads to an infinite wait inside waitOnAllRegionsToClose.
> So I think we should have an upper bound on the wait time in the
> waitOnAllRegionsToClose method.
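A minimal sketch of the proposed upper bound, written as plain Java rather than the actual HRegionServer code. The class name, the IntSupplier standing in for the online-region count, and the maxWaitMillis/pollMillis parameters are all hypothetical. It polls until no regions remain or a deadline passes, and returns false on timeout so the caller can choose to abort instead of hanging.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.IntSupplier;

/** Hypothetical sketch: wait for regions to close, but never longer than maxWaitMillis. */
final class BoundedRegionCloseWait {

  private BoundedRegionCloseWait() {}

  /**
   * @param onlineRegions supplies the number of regions still online
   * @param maxWaitMillis upper bound on the total wait
   * @param pollMillis    sleep between checks
   * @return true if all regions closed, false on timeout or interrupt
   */
  static boolean waitOnAllRegionsToClose(IntSupplier onlineRegions,
      long maxWaitMillis, long pollMillis) {
    final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitMillis);
    while (onlineRegions.getAsInt() > 0) {
      // Overflow-safe comparison of System.nanoTime() values.
      if (System.nanoTime() - deadline >= 0) {
        return false; // give up so shutdown can continue (e.g. abort) instead of hanging
      }
      try {
        Thread.sleep(pollMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve interrupt status for the caller
        return false;
      }
    }
    return true;
  }
}
```

Returning a boolean rather than throwing leaves the policy (abort, log and continue) to the caller, which is the part the issue leaves open.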



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-19682) Use Collections.emptyList() For Empty List Values

2018-10-17 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-19682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654664#comment-16654664
 ] 

Sean Busbey commented on HBASE-19682:
-

Thanks for re-running the QA bot. It looks like my rerun of v5 got about the
same result.

Could you regenerate the patch using {{git format-patch}} instead of {{git
diff}}?

> Use Collections.emptyList() For Empty List Values
> -
>
> Key: HBASE-19682
> URL: https://issues.apache.org/jira/browse/HBASE-19682
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Attachments: HBASE-19682.1.patch, HBASE-19682.2.patch, 
> HBASE-19682.3.1.patch, HBASE-19682.4.patch, HBASE-19682.5.patch, 
> HBASE-19682.6.patch, example.patch
>
>
> Use {{Collections.emptyList()}} for returning an empty list instead of
> {{return new ArrayList<>()}}. An _ArrayList_ created with the default
> constructor has a default capacity of 10 (since Java 7u40 the backing array
> is allocated lazily on the first add), so returning the shared static
> instance instead saves an allocation per call, some memory and GC pressure,
> and the time spent instantiating a new list.
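A small illustration of the change, assuming a hypothetical helper `evens()`; it is not code from the patch. The empty case returns the shared `Collections.emptyList()` instance, which is immutable, so the swap is only safe where callers never add to the returned list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class EmptyListExample {

  private EmptyListExample() {}

  /** Hypothetical helper: returns the even values in input (callers must treat it as read-only). */
  static List<Integer> evens(List<Integer> input) {
    if (input == null || input.isEmpty()) {
      // Shared immutable singleton: nothing is allocated for the empty case.
      return Collections.emptyList();
    }
    List<Integer> out = new ArrayList<>();
    for (int v : input) {
      if (v % 2 == 0) {
        out.add(v);
      }
    }
    return out;
  }
}
```

Calling `add()` on the returned empty list throws `UnsupportedOperationException`, which is the main thing to audit when replacing `new ArrayList<>()` with it.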





[jira] [Commented] (HBASE-19682) Use Collections.emptyList() For Empty List Values

2018-10-17 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-19682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654657#comment-16654657
 ] 

Hadoop QA commented on HBASE-19682:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange} 0m 0s{color} | {color:orange} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 45s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 4s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 21s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 36s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 21s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 16s{color} | {color:green} hbase-server: The patch generated 0 new + 157 unchanged - 1 fixed = 157 total (was 158) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 27s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 11m 38s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 127m 26s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 173m 0s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-19682 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12944463/HBASE-19682.6.patch |
| Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 6d175940fdb8 4.4.0-134-generic #160~14.04.1-Ubuntu SMP Fri Aug 17 11:07:07 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 3a75505cf2 |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/14739/testReport/ |
| Max. process+thread count | 4647 (vs. ulimit of 1) |
| modules | C: hbase-server U: hbase-server |
| Console 

[jira] [Commented] (HBASE-21269) Forward-port to branch-2 " HBASE-21213 [hbck2] bypass leaves behind state in RegionStates when assign/unassign"

2018-10-17 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654648#comment-16654648
 ] 

stack commented on HBASE-21269:
---

[~tianjingyun] Will do. Let me rerun to see... 

> Forward-port to branch-2 " HBASE-21213 [hbck2] bypass leaves behind state 
> in RegionStates when assign/unassign"
> ---
>
> Key: HBASE-21269
> URL: https://issues.apache.org/jira/browse/HBASE-21269
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Reporter: stack
>Assignee: Jingyun Tian
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21269.branch-2.001.patch, 
> HBASE-21269.master.001.patch, HBASE-21269.master.003.patch, 
> HBASE-21269.master.004.patch, HBASE-21269.master.004.patch
>
>
> A bunch of this patch does not apply to branch-2 and master now that we no
> longer have AP or UP. Need to figure out whether we need override in branch-2
> and master. Let me upload the forward-port done so far. Can finish this as
> part of the move-to-branch-2.2 exercise. FYI [~Apache9]





[jira] [Updated] (HBASE-21269) Forward-port to branch-2 " HBASE-21213 [hbck2] bypass leaves behind state in RegionStates when assign/unassign"

2018-10-17 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-21269:
--
Attachment: HBASE-21269.master.004.patch






[jira] [Commented] (HBASE-21073) "Maintenance mode" master

2018-10-17 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654646#comment-16654646
 ] 

Hadoop QA commented on HBASE-21073:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 30s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 35s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 12s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 31s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 18s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 37s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 47s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 18s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 11m 11s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 46s{color} | {color:green} hbase-zookeeper in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 127m 37s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 50s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 176m 10s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21073 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12944456/HBASE-21073.master.009.patch |
| Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 8d76f0034950 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh |
| git revision | master / 3a75505cf2 |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |

[jira] [Commented] (HBASE-21322) Add a scheduleServerCrashProcedure() API to HbckService

2018-10-17 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654640#comment-16654640
 ] 

stack commented on HBASE-21322:
---

On startup, the Master tries to read content in HDFS. Interesting that it is not
finding the -splitting directory.

I do not object to our being able to schedule an SCP. It might be needed at 
some point.

> Add a scheduleServerCrashProcedure() API to HbckService
> ---
>
> Key: HBASE-21322
> URL: https://issues.apache.org/jira/browse/HBASE-21322
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Jingyun Tian
>Assignee: Jingyun Tian
>Priority: Major
> Attachments: Screenshot from 2018-10-17 13-35-58.png, Screenshot from 
> 2018-10-17 13-38-41.png, Screenshot from 2018-10-17 13-47-06.png
>
>
> According to my tests, if an RS is down and all procedure logs are deleted,
> no ServerCrashProcedure is scheduled, and restarting the master does not
> help. Thus we need to schedule a ServerCrashProcedure manually to solve the
> problem. I plan to add a scheduleServerCrashProcedure() API to HbckService,
> then add this API to HBCK2.





[jira] [Commented] (HBASE-18792) hbase-2 needs to defend against hbck operations

2018-10-17 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654639#comment-16654639
 ] 

stack commented on HBASE-18792:
---

If OPENING, try assigning. If CLOSING, try unassigning, and then assigning once 
CLOSED if meant to be open.
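The rule above, written out as a hypothetical Java sketch. The `RegionState` and `Action` enums are simplified stand-ins, not HBase's own region state types, and the method only plans the steps; actually performing the assign or unassign would still go through the Master.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of the repair rule for a region stuck in transition. */
final class StuckRegionRepair {

  private StuckRegionRepair() {}

  enum RegionState { OPENING, OPEN, CLOSING, CLOSED }

  enum Action { ASSIGN, UNASSIGN }

  /**
   * @param state         the state the region is stuck in
   * @param meantToBeOpen whether the region should end up online
   * @return the actions to try, in order
   */
  static List<Action> plan(RegionState state, boolean meantToBeOpen) {
    switch (state) {
      case OPENING:
        // Stuck opening: retry the assign.
        return Collections.singletonList(Action.ASSIGN);
      case CLOSING:
        // Stuck closing: finish the unassign; once CLOSED, assign again
        // only if the region is supposed to be open.
        return meantToBeOpen
            ? Arrays.asList(Action.UNASSIGN, Action.ASSIGN)
            : Collections.singletonList(Action.UNASSIGN);
      default:
        // Not stuck in a transition state: nothing to do.
        return Collections.emptyList();
    }
  }
}
```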

> hbase-2 needs to defend against hbck operations
> ---
>
> Key: HBASE-18792
> URL: https://issues.apache.org/jira/browse/HBASE-18792
> Project: HBase
>  Issue Type: Task
>  Components: hbck
>Reporter: stack
>Assignee: Umesh Agashe
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: hbase-18792.branch-1.001.patch, 
> hbase-18792.master.001.patch, hbase-18792.master.002.patch
>
>
> hbck needs updating to run against hbase2. Meanwhile, if an hbck from hbase1
> is run against hbase2, it may do damage. hbase2 should defend itself against
> hbck1 ops.





[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures

2018-10-17 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654628#comment-16654628
 ] 

stack commented on HBASE-21291:
---

So, if override is set, make the waitTime some nominal amount -- say 10ms? This 
way we wait on the lock for a little while but will proceed after 10ms even if 
we don't get the lock?
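A rough sketch of the "nominal wait, then proceed under override" idea. It uses `java.util.concurrent.locks.ReentrantLock.tryLock(long, TimeUnit)` as a stand-in for the executor's per-procedure `IdLock`; the class and method names are hypothetical, and the 10ms figure is just the value floated above.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch: wait briefly for the procedure lock, then proceed only if override is set. */
final class OverrideLockWait {

  private OverrideLockWait() {}

  /** @return true if the bypass should go ahead */
  static boolean tryBypass(ReentrantLock procLock, boolean override, long waitMillis) {
    boolean locked;
    try {
      locked = procLock.tryLock(waitMillis, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    try {
      // Holding the lock: safe to mark the procedure bypassed.
      // Not holding it: only go ahead when the operator asked to override.
      return locked || override;
    } finally {
      if (locked) {
        procLock.unlock();
      }
    }
  }
}
```

Proceeding without the lock means a worker thread may still be executing the procedure, which is why the sketch only does so when override is set.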

So, doing bypass on an AssignProcedure, I see that it reports this:

2424243 2341740 RUNNABLE(Bypass)

... but it still holds the exclusive lock.

{code}
REGION: 2651cb48574979f2dccc64e3c02ad5e0

Lock type: EXCLUSIVE

Owner procedure: { ID => '2424243', PARENT_ID => '2341740', STATE => 
'RUNNABLE', OWNER => 'hbase', TYPE => 'AssignProcedure 
table=IntegrationTestBigLinkedList_20180815104044, 
region=2651cb48574979f2dccc64e3c02ad5e0', START_TIME => 'Wed Oct 17 12:41:37 
PDT 2018', LAST_UPDATE => 'Wed Oct 17 15:29:53 PDT 2018', PARAMETERS => [ { 
transitionState => 'REGION_TRANSITION_DISPATCH', regionInfo => { regionId => 
'1534357375831', tableName => { namespace => 'ZGVmYXVsdA==', qualifier => 
'SW50ZWdyYXRpb25UZXN0QmlnTGlua2VkTGlzdF8yMDE4MDgxNTEwNDA0NA==' }, startKey => 
'Np+lMnWysqQ=', endKey => 'NqGeiPHP0F8=', offline => 'false', split => 'false', 
replicaId => '0' } } ] }
{code}

Maybe the lock is held because we are not running this Procedure... Is the PE
jammed up?

> Add a test for bypassing stuck state-machine procedures
> ---
>
> Key: HBASE-21291
> URL: https://issues.apache.org/jira/browse/HBASE-21291
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Jingyun Tian
>Assignee: Jingyun Tian
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21291.master.001.patch, 
> HBASE-21291.master.002.patch, HBASE-21291.master.003.patch, 
> HBASE-21291.master.004.patch, HBASE-21291.master.005.patch
>
>
> {code}
>   if (!procedure.isFailed()) {
>     if (subprocs != null) {
>       if (subprocs.length == 1 && subprocs[0] == procedure) {
>         // Procedure returned itself. Quick-shortcut for a state machine-like procedure;
>         // i.e. we go around this loop again rather than go back out on the scheduler queue.
>         subprocs = null;
>         reExecute = true;
>         LOG.trace("Short-circuit to next step on pid={}", procedure.getProcId());
>       } else {
>         // Yield the current procedure, and make the subprocedure runnable
>         // subprocs may come back 'null'.
>         subprocs = initializeChildren(procStack, procedure, subprocs);
>         LOG.info("Initialized subprocedures=" +
>           (subprocs == null? null:
>             Stream.of(subprocs).map(e -> "{" + e.toString() + "}").
>             collect(Collectors.toList()).toString()));
>       }
>     } else if (procedure.getState() == ProcedureState.WAITING_TIMEOUT) {
>       LOG.debug("Added to timeoutExecutor {}", procedure);
>       timeoutExecutor.add(procedure);
>     } else if (!suspended) {
>       // No subtask, so we are done
>       procedure.setState(ProcedureState.SUCCESS);
>     }
>   }
> {code}
> The current implementation of ProcedureExecutor sets reExecute to true for
> state-machine-like procedures. If such a procedure is stuck in a certain
> state, it loops forever.
> {code}
>   IdLock.Entry lockEntry = procExecutionLock.getLockEntry(proc.getProcId());
>   try {
>     executeProcedure(proc);
>   } catch (AssertionError e) {
>     LOG.info("ASSERT pid=" + proc.getProcId(), e);
>     throw e;
>   } finally {
>     procExecutionLock.releaseLockEntry(lockEntry);
>   }
> {code}
> Because a procedure takes the IdLock and releases it only after execution is
> done, a state-machine procedure never releases the IdLock until it finishes.
> bypassProcedure then does not work, because it tries to grab the IdLock
> first.
> {code}
> IdLock.Entry lockEntry = procExecutionLock.tryLockEntry(procedure.getProcId(), lockWait);
> {code}
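A minimal, hypothetical reproduction of the problem described above, with a `ReentrantLock` standing in for `procExecutionLock` and a spinning loop standing in for a state-machine procedure that keeps returning itself. Because the worker never leaves its `reExecute` loop, a bypass-style `tryLock` with a timeout cannot succeed; the `stop` flag exists only so the demo terminates.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical reproduction: a stuck state-machine procedure never releases its execution lock. */
final class StuckStateMachineDemo {

  private StuckStateMachineDemo() {}

  /** @return whether a bypass-style tryLock(lockWaitMillis) got the lock */
  static boolean bypassCanGetLock(long lockWaitMillis) {
    ReentrantLock procExecutionLock = new ReentrantLock();
    AtomicBoolean stop = new AtomicBoolean(false); // demo-only escape hatch
    Thread worker = new Thread(() -> {
      procExecutionLock.lock(); // like getLockEntry(procId) before executeProcedure(proc)
      try {
        boolean reExecute = true;
        // The procedure keeps returning itself from the same state, so reExecute
        // stays true and we never fall out of the loop to release the lock.
        while (reExecute && !stop.get()) {
          Thread.yield();
        }
      } finally {
        procExecutionLock.unlock(); // like releaseLockEntry(lockEntry)
      }
    });
    worker.setDaemon(true);
    worker.start();
    while (!procExecutionLock.isLocked()) {
      Thread.yield(); // wait until the worker holds the lock
    }
    boolean got = false;
    try {
      // Like tryLockEntry(procId, lockWait) at the start of bypassProcedure.
      got = procExecutionLock.tryLock(lockWaitMillis, TimeUnit.MILLISECONDS);
      if (got) {
        procExecutionLock.unlock();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    } finally {
      stop.set(true); // a real stuck procedure has no such exit
    }
    return got;
  }
}
```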





[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures

2018-10-17 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654621#comment-16654621
 ] 

stack commented on HBASE-21291:
---

Also interesting is that I must pass a non-zero waitTime even when trying to
bypass an AssignProcedure, even though it is not a state-machine procedure.






[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures

2018-10-17 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654619#comment-16654619
 ] 

stack commented on HBASE-21291:
---

Is it possible that bypass no longer works? IIRC, I used to be able to bypass a
stuck Assign; now it does the following but stays stuck. It says it is
bypassed, but the lock is still held:

{code}
2018-10-17 21:12:21,051 INFO 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Begin bypass pid=2424243, 
ppid=2341740, state=RUNNABLE:REGION_TRANSITION_DISPATCH, bypass=LOG-REDACTED 
AssignProcedure table=IntegrationTestBigLinkedList_20180815104044, 
region=2651cb48574979f2dccc64e3c02ad5e0 with lockWait=1000, override=true, 
recursive=true
2018-10-17 21:12:21,051 INFO 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Bypassing pid=2424243, 
ppid=2341740, state=RUNNABLE:REGION_TRANSITION_DISPATCH, bypass=LOG-REDACTED 
AssignProcedure table=IntegrationTestBigLinkedList_20180815104044, 
region=2651cb48574979f2dccc64e3c02ad5e0
2018-10-17 21:12:21,260 INFO 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Bypassing pid=2341740, 
state=WAITING:SERVER_CRASH_HANDLE_RIT2, bypass=LOG-REDACTED 
ServerCrashProcedure server=vb1406.halxg.cloudera.com,22101,1539750561781, 
splitWal=true, meta=false
2018-10-17 21:12:21,386 INFO 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Bypassing pid=2424243, 
ppid=2341740, state=RUNNABLE:REGION_TRANSITION_DISPATCH, bypass=LOG-REDACTED 
AssignProcedure table=IntegrationTestBigLinkedList_20180815104044, 
region=2651cb48574979f2dccc64e3c02ad5e0 and its ancestors successfully, adding 
to queue
{code}






[jira] [Commented] (HBASE-21198) Exclude dependency on net.minidev:json-smart

2018-10-17 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654618#comment-16654618
 ] 

Hudson commented on HBASE-21198:


Results for branch branch-2.0
[build #966 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/966/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/966//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/966//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/966//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


> Exclude dependency on net.minidev:json-smart
> 
>
> Key: HBASE-21198
> URL: https://issues.apache.org/jira/browse/HBASE-21198
> Project: HBase
>  Issue Type: Task
>Reporter: Ted Yu
>Assignee: Artem Ervits
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21198.v01.patch, HBASE-21198.v01.patch
>
>
> From 
> https://builds.apache.org/job/PreCommit-HBASE-Build/14414/artifact/patchprocess/patch-javac-3.0.0.txt
>  :
> {code}
> [ERROR] Failed to execute goal on project hbase-common: Could not resolve 
> dependencies for project org.apache.hbase:hbase-common:jar:3.0.0-SNAPSHOT: 
> Failed to collect dependencies at org.apache.hadoop:hadoop-common:jar:3.0.0 
> -> org.apache.hadoop:hadoop-auth:jar:3.0.0 -> 
> com.nimbusds:nimbus-jose-jwt:jar:4.41.1 -> 
> net.minidev:json-smart:jar:2.3-SNAPSHOT: Failed to read artifact descriptor 
> for net.minidev:json-smart:jar:2.3-SNAPSHOT: Could not transfer artifact 
> net.minidev:json-smart:pom:2.3-SNAPSHOT from/to dynamodb-local-oregon 
> (https://s3-us-west-2.amazonaws.com/dynamodb-local/release): Access denied 
> to: 
> https://s3-us-west-2.amazonaws.com/dynamodb-local/release/net/minidev/json-smart/2.3-SNAPSHOT/json-smart-2.3-SNAPSHOT.pom
>  , ReasonPhrase:Forbidden. -> [Help 1]
> {code}
> We should exclude the dependency on net.minidev:json-smart.
> hbase-common/bin/pom.xml has done so; the other pom.xml files should do the
> same.





[jira] [Commented] (HBASE-18792) hbase-2 needs to defend against hbck operations

2018-10-17 Thread StephenLu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654614#comment-16654614
 ] 

StephenLu commented on HBASE-18792:
---

Yes. In the HBase master UI, under regions in transition.

> hbase-2 needs to defend against hbck operations
> ---
>
> Key: HBASE-18792
> URL: https://issues.apache.org/jira/browse/HBASE-18792
> Project: HBase
>  Issue Type: Task
>  Components: hbck
>Reporter: stack
>Assignee: Umesh Agashe
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: hbase-18792.branch-1.001.patch, 
> hbase-18792.master.001.patch, hbase-18792.master.002.patch
>
>
> hbck needs updating to run against hbase2. Meantime, if an hbck from hbase1 
> is run against hbase2, it may do damage. hbase2 should defend itself against 
> hbck1 ops.
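The description does not say how hbase2 should defend itself. One simple ingredient is a major-version gate that refuses to run repairs when the tool and the cluster come from different major releases. The sketch below assumes both version strings are available to whoever performs the check; `HbckVersionGate`, `mayRepair` and `checkOrThrow` are hypothetical names, not HBase APIs, and where such a check would live (in hbck1 or in the hbase2 master) is left open.

```java
// Hypothetical version gate; class and method names are illustrative only.
final class HbckVersionGate {
  private HbckVersionGate() {}

  /** Parses the leading major number of a version string such as "2.1.0-SNAPSHOT". */
  static int major(String version) {
    int end = 0;
    while (end < version.length() && Character.isDigit(version.charAt(end))) {
      end++;
    }
    if (end == 0) {
      throw new IllegalArgumentException("no major version in: " + version);
    }
    return Integer.parseInt(version.substring(0, end));
  }

  /**
   * True only when the HBase version the hbck binary was built from and the
   * cluster's version share a major number (so an hbck from 1.x refuses 2.x).
   */
  static boolean mayRepair(String hbckBuildVersion, String clusterVersion) {
    return major(hbckBuildVersion) == major(clusterVersion);
  }

  /** Fails fast, before any repair is attempted, on a major-version mismatch. */
  static void checkOrThrow(String hbckBuildVersion, String clusterVersion) {
    if (!mayRepair(hbckBuildVersion, clusterVersion)) {
      throw new IllegalStateException("refusing to run hbck built for HBase "
          + hbckBuildVersion + " against a " + clusterVersion + " cluster");
    }
  }
}
```

The check runs before any mutation, so a mismatched tool stops without touching meta or HDFS.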





[jira] [Commented] (HBASE-18792) hbase-2 needs to defend against hbck operations

2018-10-17 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654612#comment-16654612
 ] 

stack commented on HBASE-18792:
---

hbck1 is unreliable reading hbase2 clusters. Are you seeing a problem in your 
region deploy?



[jira] [Comment Edited] (HBASE-18792) hbase-2 needs to defend against hbck operations

2018-10-17 Thread StephenLu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654610#comment-16654610
 ] 

StephenLu edited comment on HBASE-18792 at 10/18/18 4:05 AM:
-

I used hbase hbck to find this problem:
{code:java}
ERROR: There is a hole in the region chain between and . You need to create a 
new .regioninfo and region dir in hdfs to plug the hole. ERROR: Found 
inconsistency in table data_set_test
{code}
Some regions of the table are always stuck in opening.
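The "hole in the region chain" message refers to a gap in the table's key space. When the regions are sorted by start key, each one should begin exactly where the previous one ends, the first should start at the empty key, and the last should end at the empty key. The Java sketch below illustrates that check under simplifying assumptions: keys are modeled as `String`s compared lexicographically rather than as unsigned byte arrays, overlaps are not reported, and only the last region is assumed to have an empty end key. `RegionRange` and `RegionChainChecker` are hypothetical names, not hbck1's code. In this sketch, an empty region list produces a hole between two empty keys, which has the same shape as the message above. That is what one would expect if hbck saw no usable region for the table at all, but whether hbck1 reached its message the same way is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of a region's key range; "" stands for the unbounded
// start of the first region or the unbounded end of the last region.
final class RegionRange {
  final String startKey;
  final String endKey;

  RegionRange(String startKey, String endKey) {
    this.startKey = startKey;
    this.endKey = endKey;
  }
}

final class RegionChainChecker {
  private RegionChainChecker() {}

  /** Returns a description of every gap in the chain (overlaps are ignored). */
  static List<String> findHoles(List<RegionRange> regions) {
    List<String> holes = new ArrayList<>();
    if (regions.isEmpty()) {
      // No region covers the table at all: the whole key space is a hole.
      holes.add("hole between '' and ''");
      return holes;
    }
    List<RegionRange> sorted = new ArrayList<>(regions);
    sorted.sort(Comparator.comparing((RegionRange r) -> r.startKey));
    String expectedStart = "";  // the chain must begin at the empty key
    for (RegionRange r : sorted) {
      if (r.startKey.compareTo(expectedStart) > 0) {
        holes.add("hole between '" + expectedStart + "' and '" + r.startKey + "'");
      }
      expectedStart = r.endKey;
    }
    if (!expectedStart.isEmpty()) {
      // The last region must end at the empty key.
      holes.add("hole between '" + expectedStart + "' and ''");
    }
    return holes;
  }
}
```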





[jira] [Updated] (HBASE-20716) Unsafe access cleanup

2018-10-17 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-20716:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: (was: 2.0.3)
   Status: Resolved  (was: Patch Available)

Pushed to branch-2.1+. Nice one, [~awked06]. Thanks for the persistence.

> Unsafe access cleanup
> -
>
> Key: HBASE-20716
> URL: https://issues.apache.org/jira/browse/HBASE-20716
> Project: HBase
>  Issue Type: Sub-task
>  Components: Performance
>Reporter: stack
>Assignee: Sahil Aggarwal
>Priority: Critical
>  Labels: beginner
> Fix For: 3.0.0, 2.2.0, 2.1.1
>
> Attachments: HBASE-20716.master.001.patch, 
> HBASE-20716.master.002.patch, HBASE-20716.master.003.patch, 
> HBASE-20716.master.004.patch, HBASE-20716.master.005.patch, 
> HBASE-20716.master.006.patch, HBASE-20716.master.007.patch, 
> HBASE-20716.master.008.patch, Screen Shot 2018-06-26 at 11.37.49 AM.png
>
>
> We have two means of getting at Unsafe: UnsafeAccess, and the access internal
> to the Bytes class. They effectively do the same thing. We should have only
> one avenue to Unsafe.
> Many of our paths to Unsafe via UnsafeAccess traverse flags to check whether
> access is available, whether it is aligned, and the order in which words are
> written on the machine. Each check has a cost (especially when done millions of
> times a second) and on occasion adds bloat in hot code paths. The Unsafe
> access inside Bytes checks at startup what the machine is capable of and
> then statically assigns the appropriate class to use from then on.
> UnsafeAccess does not do this; it runs the checks every time. It would be good
> to make the Bytes behavior pervasive.
> The benefit of a single path to Unsafe is plain. The benefit of removing the
> checks will be harder to measure, though it should be evident when you
> disassemble a hot path; in (very) rare cases, the saved bytecodes could make
> the difference between inlining and not.
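The "check once, then assign statically" approach attributed to Bytes can be sketched in plain Java without touching Unsafe at all. A capability probe runs once during class initialization, and hot paths call through a `static final` field chosen from its result. `LongReader`, `LongReaders` and `probe()` are hypothetical names. The probe only stands in for the availability, alignment and byte-order checks mentioned above, and `ByteBuffer` stands in for the Unsafe-backed path.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

// Decodes 8 bytes starting at offset as a big-endian long.
interface LongReader {
  long readLong(byte[] buf, int offset);
}

final class LongReaders {
  private LongReaders() {}

  /** Counts how often the capability probe runs (for illustration only). */
  static final AtomicInteger PROBES = new AtomicInteger();

  /** Portable path: big-endian decode, one byte at a time. */
  static final LongReader PORTABLE = (buf, off) -> {
    long v = 0;
    for (int i = 0; i < 8; i++) {
      v = (v << 8) | (buf[off + i] & 0xFFL);
    }
    return v;
  };

  /** Alternative path; ByteBuffer reads big-endian by default, matching PORTABLE. */
  static final LongReader BUFFER = (buf, off) -> ByteBuffer.wrap(buf, off, 8).getLong();

  /** Hypothetical stand-in for "is Unsafe available, is unaligned access OK?". */
  static boolean probe() {
    PROBES.incrementAndGet();
    return true;
  }

  /** Decided once at class initialization; hot paths just call BEST.readLong(...). */
  static final LongReader BEST = probe() ? BUFFER : PORTABLE;

  /** The pattern the issue wants to avoid: re-probing on every call. */
  static long readLongCheckingEveryTime(byte[] buf, int off) {
    return (probe() ? BUFFER : PORTABLE).readLong(buf, off);
  }
}
```

Because `BEST` is a `static final` field, the JIT can typically treat it as a constant once the class is initialized, so the flag checks drop out of the hot path entirely instead of being repeated per call.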





[jira] [Commented] (HBASE-18792) hbase-2 needs to defend against hbck operations

2018-10-17 Thread StephenLu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654610#comment-16654610
 ] 

StephenLu commented on HBASE-18792:
---

I used hbase hbck to find this problem:
{code:java}
ERROR: There is a hole in the region chain between and . You need to create a 
new .regioninfo and region dir in hdfs to plug the hole. ERROR: Found 
inconsistency in table data_set_test
{code}



[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures

2018-10-17 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654608#comment-16654608
 ] 

stack commented on HBASE-21291:
---

0 means wait forever I suppose? But I don't want to wait at all (especially if 
I have 10k regions that need bypassing). What should we do in this case?

I also notice that hbck2 calls this param waitTime but the exception says lockWait. 
I need to make them match.

Thanks.
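One way to answer the "wait forever vs. don't wait at all" question is to give the wait parameter three distinct meanings. The sketch below uses `java.util.concurrent.locks.ReentrantLock` as a stand-in for HBase's IdLock. The convention (negative blocks, zero tries once, positive waits a bounded time) and the `LockWait.acquire` helper are hypothetical; they are not what hbck2 or the master currently do.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

final class LockWait {
  private LockWait() {}

  /**
   * Hypothetical convention for a lock-wait parameter:
   *   waitMs <  0 : block until the lock is free,
   *   waitMs == 0 : try once and return immediately,
   *   waitMs >  0 : wait at most waitMs milliseconds.
   * Returns true if the caller now holds the lock.
   */
  static boolean acquire(ReentrantLock lock, long waitMs) throws InterruptedException {
    if (waitMs < 0) {
      lock.lockInterruptibly();
      return true;
    }
    if (waitMs == 0) {
      return lock.tryLock();  // non-blocking single attempt
    }
    return lock.tryLock(waitMs, TimeUnit.MILLISECONDS);
  }
}
```

With a zero wait, bypassing 10k stuck regions costs one non-blocking attempt per region rather than one timeout per region.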

> Add a test for bypassing stuck state-machine procedures
> ---
>
> Key: HBASE-21291
> URL: https://issues.apache.org/jira/browse/HBASE-21291
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Jingyun Tian
>Assignee: Jingyun Tian
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21291.master.001.patch, 
> HBASE-21291.master.002.patch, HBASE-21291.master.003.patch, 
> HBASE-21291.master.004.patch, HBASE-21291.master.005.patch
>
>
> {code}
>   if (!procedure.isFailed()) {
>     if (subprocs != null) {
>       if (subprocs.length == 1 && subprocs[0] == procedure) {
>         // Procedure returned itself. Quick-shortcut for a state machine-like procedure;
>         // i.e. we go around this loop again rather than go back out on the scheduler queue.
>         subprocs = null;
>         reExecute = true;
>         LOG.trace("Short-circuit to next step on pid={}", procedure.getProcId());
>       } else {
>         // Yield the current procedure, and make the subprocedure runnable
>         // subprocs may come back 'null'.
>         subprocs = initializeChildren(procStack, procedure, subprocs);
>         LOG.info("Initialized subprocedures=" +
>           (subprocs == null? null:
>             Stream.of(subprocs).map(e -> "{" + e.toString() + "}").
>             collect(Collectors.toList()).toString()));
>       }
>     } else if (procedure.getState() == ProcedureState.WAITING_TIMEOUT) {
>       LOG.debug("Added to timeoutExecutor {}", procedure);
>       timeoutExecutor.add(procedure);
>     } else if (!suspended) {
>       // No subtask, so we are done
>       procedure.setState(ProcedureState.SUCCESS);
>     }
>   }
> {code}
> The current implementation of ProcedureExecutor sets reExecute to true for a
> state-machine-like procedure. If such a procedure is stuck in a certain
> state, it will loop forever.
> {code}
>   IdLock.Entry lockEntry = procExecutionLock.getLockEntry(proc.getProcId());
>   try {
>     executeProcedure(proc);
>   } catch (AssertionError e) {
>     LOG.info("ASSERT pid=" + proc.getProcId(), e);
>     throw e;
>   } finally {
>     procExecutionLock.releaseLockEntry(lockEntry);
>   }
> {code}
> Since a procedure acquires the IdLock and releases it only after execution is
> done, a state-machine procedure never releases the IdLock until it is
> finished. As a result, bypassProcedure doesn't work, because it first tries
> to grab the IdLock:
> {code}
> IdLock.Entry lockEntry = 
> procExecutionLock.tryLockEntry(procedure.getProcId(), lockWait);
> {code}
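The problem described above is that the IdLock is held across the whole re-execute loop, so a bypass request can never acquire it while the procedure is stuck. Below is a minimal sketch of the alternative: release the lock after every state-machine step, and re-check a bypass flag under the lock before the next step. `ReentrantLock` stands in for IdLock; `ReleasingStepLoop`, `run` and `bypass` are hypothetical names, and this is not necessarily how the attached patches address the issue.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.BooleanSupplier;

final class ReleasingStepLoop {
  private ReleasingStepLoop() {}

  /**
   * Runs one state-machine step per lock acquisition.
   *
   * @param step returns true if the procedure wants to be re-executed
   * @return steps executed before finishing, being bypassed, or reaching maxSteps
   */
  static int run(ReentrantLock lock, AtomicBoolean bypassed, BooleanSupplier step, int maxSteps) {
    int executed = 0;
    boolean reExecute = true;
    while (reExecute && executed < maxSteps) {
      lock.lock();
      try {
        if (bypassed.get()) {
          break;  // a bypass request took the lock between two steps
        }
        reExecute = step.getAsBoolean();
        executed++;
      } finally {
        lock.unlock();  // released after every step, not only when the procedure finishes
      }
    }
    return executed;
  }

  /** Bypass side: waits a bounded time for the lock, then flags the procedure. */
  static boolean bypass(ReentrantLock lock, AtomicBoolean bypassed, long waitMs)
      throws InterruptedException {
    if (!lock.tryLock(waitMs, TimeUnit.MILLISECONDS)) {
      return false;
    }
    try {
      bypassed.set(true);
      return true;
    } finally {
      lock.unlock();
    }
  }
}
```

With a fair lock (`new ReentrantLock(true)`), a bypass thread already waiting in `tryLock` is granted the lock at the worker's next release, ahead of the worker's next `lock()` call, so even a procedure that is stuck in one state can be stopped.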





[jira] [Comment Edited] (HBASE-21291) Add a test for bypassing stuck state-machine procedures

2018-10-17 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654608#comment-16654608
 ] 

stack edited comment on HBASE-21291 at 10/18/18 3:54 AM:
-

0 means wait forever I suppose? But I don't want to wait at all (especially if 
I have 10k regions that need bypassing). What should we do in this case?

I also notice that hbck2 calls this param waitTime but the exception says lockWait. 
I need to make them match.

Thanks.

[~tianjingyun]





[jira] [Commented] (HBASE-18792) hbase-2 needs to defend against hbck operations

2018-10-17 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654607#comment-16654607
 ] 

stack commented on HBASE-18792:
---

hbase hbck doesn't work against hbase2. It is for hbase1.

What problem are you seeing?



[jira] [Updated] (HBASE-21334) TestMergeTableRegionsProcedure is flakey

2018-10-17 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21334:
--
Attachment: 
org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure-output.txt

> TestMergeTableRegionsProcedure is flakey
> 
>
> Key: HBASE-21334
> URL: https://issues.apache.org/jira/browse/HBASE-21334
> Project: HBase
>  Issue Type: Bug
>  Components: amv2, proc-v2
>Reporter: Duo Zhang
>Priority: Major
> Attachments: 
> org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure-output.txt
>
>
> {noformat}
> Error Message
> found 5 corrupted procedure(s) on replay
> Stacktrace
> java.io.IOException: found 5 corrupted procedure(s) on replay
>   at 
> org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure.testMergeWithoutPONR(TestMergeTableRegionsProcedure.java:295)
> {noformat}





[jira] [Commented] (HBASE-21334) TestMergeTableRegionsProcedure is flakey

2018-10-17 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654606#comment-16654606
 ] 

Duo Zhang commented on HBASE-21334:
---

{noformat}
2018-10-17 21:47:44,999 ERROR [Time-limited test] 
procedure2.ProcedureExecutor$2(444): Corrupt pid=22, ppid=19, state=RUNNABLE, 
hasLock=false; org.apache.hadoop.hbase.master.assignment.CloseRegionProcedure
2018-10-17 21:47:45,007 ERROR [Time-limited test] 
procedure2.ProcedureExecutor$2(444): Corrupt pid=19, ppid=18, 
state=WAITING:REGION_STATE_TRANSITION_CONFIRM_CLOSED, hasLock=false; 
TransitRegionStateProcedure table=testMergeWithoutPONR, 
region=3b7371ecf932aa0f7fa0b9a03df56bf2, UNASSIGN
2018-10-17 21:47:45,008 ERROR [Time-limited test] 
procedure2.ProcedureExecutor$2(444): Corrupt pid=20, ppid=18, 
state=RUNNABLE:REGION_STATE_TRANSITION_CONFIRM_CLOSED, hasLock=false; 
TransitRegionStateProcedure table=testMergeWithoutPONR, 
region=55f7f23154a35661b02f91ff421b58d5, UNASSIGN
2018-10-17 21:47:45,009 ERROR [Time-limited test] 
procedure2.ProcedureExecutor$2(444): Corrupt pid=21, ppid=20, state=SUCCESS, 
hasLock=false; org.apache.hadoop.hbase.master.assignment.CloseRegionProcedure
2018-10-17 21:47:45,010 ERROR [Time-limited test] 
procedure2.ProcedureExecutor$2(444): Corrupt pid=18, 
state=WAITING:MERGE_TABLE_REGIONS_CHECK_CLOSED_REGIONS, hasLock=false; 
MergeTableRegionsProcedure table=testMergeWithoutPONR, 
regions=[3b7371ecf932aa0f7fa0b9a03df56bf2, 55f7f23154a35661b02f91ff421b58d5], 
forcibly=false
{noformat}

It is a bit strange: the root procedure (pid=18) is there. Let me dig 
further...
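Reading the five ERROR lines above as a parent map (22 -> 19, 19 -> 18, 20 -> 18, 21 -> 20, and 18 with no parent) backs up the observation: every chain ends at the root, pid=18. Missing parent links alone therefore do not explain why replay flags these procedures. The hypothetical helper below (`ProcTreeCheck.rootOf` is not HBase code, and not its replay logic) makes that check mechanical for a log excerpt.

```java
import java.util.Map;

final class ProcTreeCheck {
  private ProcTreeCheck() {}

  /**
   * Follows parent links from pid. parentOf maps each loaded pid to its parent
   * pid, or to null for a root procedure. Returns the root pid, or -1 if a
   * parent was never loaded or the links form a cycle.
   */
  static long rootOf(Map<Long, Long> parentOf, long pid) {
    long current = pid;
    for (int hops = 0; hops <= parentOf.size(); hops++) {
      if (!parentOf.containsKey(current)) {
        return -1;  // broken chain: this procedure was not loaded
      }
      Long parent = parentOf.get(current);
      if (parent == null) {
        return current;  // reached a root procedure
      }
      current = parent;
    }
    return -1;  // more hops than procedures means the links loop
  }
}
```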



[jira] [Commented] (HBASE-21323) Should not skip force updating for a sub procedure even if it has been finished

2018-10-17 Thread Guanghao Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654596#comment-16654596
 ] 

Guanghao Zhang commented on HBASE-21323:


+1.
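The log in the quoted description shows the symptom. On every WAL roll, the finished sub-procedures pid=991, 992, 994 and 995 (all with ppid=990) are skipped, while the procedure WAL count keeps climbing (340, 341, 342). A plausible reading, which is an assumption here, is that an old WAL cannot be deleted while it still holds records that replay of the running parent pid=990 depends on. The sketch contrasts the logged selection rule with the one the issue title asks for. `ForceUpdateSketch`, `Proc` and `toForceUpdate` are hypothetical names, and the assumption that pid=990 is still running is not confirmed by the log.

```java
import java.util.ArrayList;
import java.util.List;

final class ForceUpdateSketch {
  private ForceUpdateSketch() {}

  /** Hypothetical minimal view of a stored procedure. */
  static final class Proc {
    final long pid;
    final long parentPid;  // -1 for a root procedure
    final boolean finished;

    Proc(long pid, long parentPid, boolean finished) {
      this.pid = pid;
      this.parentPid = parentPid;
      this.finished = finished;
    }
  }

  /**
   * Chooses which procedures to rewrite into the newest WAL.
   * skipFinishedChildren=true mimics the logged behavior ("has already been
   * finished, skip force updating"); false keeps finished sub-procedures.
   */
  static List<Long> toForceUpdate(List<Proc> procs, boolean skipFinishedChildren) {
    List<Long> pids = new ArrayList<>();
    for (Proc p : procs) {
      boolean isChild = p.parentPid >= 0;
      if (p.finished && (skipFinishedChildren || !isChild)) {
        continue;  // not rewritten
      }
      pids.add(p.pid);
    }
    return pids;
  }
}
```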

> Should not skip force updating for a sub procedure even if it has been 
> finished
> ---
>
> Key: HBASE-21323
> URL: https://issues.apache.org/jira/browse/HBASE-21323
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21323.patch, HBASE-21323.patch
>
>
> Keep seeing this:
> {noformat}
> 2018-10-16,20:03:02,027 WARN [WALProcedureStoreSyncThread] 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore: procedure 
> WALs count=340 above the warning threshold 10. check running procedures to 
> see if something is stuck.
> 2018-10-16,20:03:02,027 INFO [WALProcedureStoreSyncThread] 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore: Rolled new 
> Procedure Store WAL, id=343
> 2018-10-16,20:03:02,027 DEBUG [Force-Update-PEWorker-0] 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Procedure pid=991, 
> ppid=990, state=SUCCESS; 
> org.apache.hadoop.hbase.master.replication.RefreshPeerProcedure has already 
> been finished, skip force updating.
> 2018-10-16,20:03:02,027 DEBUG [Force-Update-PEWorker-0] 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Procedure pid=992, 
> ppid=990, state=SUCCESS; 
> org.apache.hadoop.hbase.master.replication.RefreshPeerProcedure has already 
> been finished, skip force updating.
> 2018-10-16,20:03:02,027 DEBUG [Force-Update-PEWorker-0] 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Procedure pid=994, 
> ppid=990, state=SUCCESS; 
> org.apache.hadoop.hbase.master.replication.RefreshPeerProcedure has already 
> been finished, skip force updating.
> 2018-10-16,20:03:02,027 DEBUG [Force-Update-PEWorker-0] 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Procedure pid=995, 
> ppid=990, state=SUCCESS; 
> org.apache.hadoop.hbase.master.replication.RefreshPeerProcedure has already 
> been finished, skip force updating.
> 2018-10-16,20:03:02,870 WARN [WALProcedureStoreSyncThread] 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore: procedure 
> WALs count=341 above the warning threshold 10. check running procedures to 
> see if something is stuck.
> 2018-10-16,20:03:02,870 INFO [WALProcedureStoreSyncThread] 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore: Rolled new 
> Procedure Store WAL, id=344
> 2018-10-16,20:03:02,870 DEBUG [Force-Update-PEWorker-0] 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Procedure pid=991, 
> ppid=990, state=SUCCESS; 
> org.apache.hadoop.hbase.master.replication.RefreshPeerProcedure has already 
> been finished, skip force updating.
> 2018-10-16,20:03:02,870 DEBUG [Force-Update-PEWorker-0] 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Procedure pid=992, 
> ppid=990, state=SUCCESS; 
> org.apache.hadoop.hbase.master.replication.RefreshPeerProcedure has already 
> been finished, skip force updating.
> 2018-10-16,20:03:02,870 DEBUG [Force-Update-PEWorker-0] 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Procedure pid=994, 
> ppid=990, state=SUCCESS; 
> org.apache.hadoop.hbase.master.replication.RefreshPeerProcedure has already 
> been finished, skip force updating.
> 2018-10-16,20:03:02,870 DEBUG [Force-Update-PEWorker-0] 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Procedure pid=995, 
> ppid=990, state=SUCCESS; 
> org.apache.hadoop.hbase.master.replication.RefreshPeerProcedure has already 
> been finished, skip force updating.
> 2018-10-16,20:03:03,816 WARN [WALProcedureStoreSyncThread] 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore: procedure 
> WALs count=342 above the warning threshold 10. check running procedures to 
> see if something is stuck.
> 2018-10-16,20:03:03,816 INFO [WALProcedureStoreSyncThread] 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore: Rolled new 
> Procedure Store WAL, id=345
> 2018-10-16,20:03:03,816 DEBUG [Force-Update-PEWorker-0] 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Procedure pid=991, 
> ppid=990, state=SUCCESS; 
> org.apache.hadoop.hbase.master.replication.RefreshPeerProcedure has already 
> been finished, skip force updating.
> 2018-10-16,20:03:03,816 DEBUG [Force-Update-PEWorker-0] 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Procedure pid=992, 
> ppid=990, state=SUCCESS; 
> org.apache.hadoop.hbase.master.replication.RefreshPeerProcedure has already 
> been finished, skip force updating.
> 2018-10-16,20:03:03,816 DEBUG [Force-Update-PEWorker-0] 
> 

[jira] [Updated] (HBASE-21330) ReopenTableRegionsProcedure will enter an infinite loop if we schedule a TRSP at the same time

2018-10-17 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21330:
--
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Pushed to master and branch-2. Thanks [~zghaobac] for reviewing.

> ReopenTableRegionsProcedure will enter an infinite loop if we schedule a TRSP 
> at the same time
> --
>
> Key: HBASE-21330
> URL: https://issues.apache.org/jira/browse/HBASE-21330
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21330.patch
>
>
> The problem is that if there are regions which already have TRSPs attached, 
> we give up and go directly to REOPEN_TABLE_REGIONS_CONFIRM_REOPENED. 
> And in REOPEN_TABLE_REGIONS_CONFIRM_REOPENED, if there are still regions that 
> need to be reopened, we go directly back to REOPEN_TABLE_REGIONS_REOPEN_REGIONS. 
> Since ReopenTableRegionsProcedure holds the exclusive lock on the table, the 
> TRSPs never get a chance to execute, and as a result 
> ReopenTableRegionsProcedure just keeps looping between the two states.
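The livelock can be reproduced with a toy model of the two states. The model assumes that regions without a TRSP are reopened in REOPEN_TABLE_REGIONS_REOPEN_REGIONS and that regions with one are skipped. The `yieldWhenBlocked` branch is a hypothetical illustration of letting the existing TRSPs run, modeled crudely as their regions becoming free; it is not necessarily what the committed patch does. `ReopenLoopSketch` and its parameters are made-up names.

```java
import java.util.HashSet;
import java.util.Set;

final class ReopenLoopSketch {
  private ReopenLoopSketch() {}

  enum State { REOPEN_REGIONS, CONFIRM_REOPENED }

  /**
   * Simulates the two-state loop. Regions in busyRegions already have a TRSP,
   * which can only run if this procedure yields the table lock.
   * Returns the transition index at which all regions are confirmed reopened,
   * or -1 if maxTransitions is exhausted (the livelock).
   */
  static int simulate(Set<String> needReopen, Set<String> busyRegions,
      boolean yieldWhenBlocked, int maxTransitions) {
    Set<String> pending = new HashSet<>(needReopen);
    Set<String> busy = new HashSet<>(busyRegions);
    State state = State.REOPEN_REGIONS;
    for (int t = 0; t < maxTransitions; t++) {
      if (state == State.REOPEN_REGIONS) {
        // Reopen every region without its own TRSP; skip the busy ones.
        pending.removeIf(r -> !busy.contains(r));
        state = State.CONFIRM_REOPENED;
      } else {
        if (pending.isEmpty()) {
          return t;
        }
        if (yieldWhenBlocked) {
          // Hypothetical: releasing the table lock lets the existing TRSPs finish.
          busy.clear();
        }
        state = State.REOPEN_REGIONS;  // go straight back, still holding the lock
      }
    }
    return -1;
  }
}
```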





[jira] [Commented] (HBASE-21334) TestMergeTableRegionsProcedure is flakey

2018-10-17 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654589#comment-16654589
 ] 

Duo Zhang commented on HBASE-21334:
---

I've also seen the 'corrupted procedure(s)' error on our testing cluster. Let me dig.

> TestMergeTableRegionsProcedure is flakey
> 
>
> Key: HBASE-21334
> URL: https://issues.apache.org/jira/browse/HBASE-21334
> Project: HBase
>  Issue Type: Bug
>  Components: amv2, proc-v2
>Reporter: Duo Zhang
>Priority: Major
>
> {noformat}
> Error Message
> found 5 corrupted procedure(s) on replay
> Stacktrace
> java.io.IOException: found 5 corrupted procedure(s) on replay
>   at 
> org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure.testMergeWithoutPONR(TestMergeTableRegionsProcedure.java:295)
> {noformat}





[jira] [Created] (HBASE-21334) TestMergeTableRegionsProcedure is flakey

2018-10-17 Thread Duo Zhang (JIRA)
Duo Zhang created HBASE-21334:
-

 Summary: TestMergeTableRegionsProcedure is flakey
 Key: HBASE-21334
 URL: https://issues.apache.org/jira/browse/HBASE-21334
 Project: HBase
  Issue Type: Bug
  Components: amv2, proc-v2
Reporter: Duo Zhang


{noformat}
Error Message
found 5 corrupted procedure(s) on replay
Stacktrace
java.io.IOException: found 5 corrupted procedure(s) on replay
at 
org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure.testMergeWithoutPONR(TestMergeTableRegionsProcedure.java:295)
{noformat}





[jira] [Commented] (HBASE-21330) ReopenTableRegionsProcedure will enter an infinite loop if we schedule a TRSP at the same time

2018-10-17 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654582#comment-16654582
 ] 

Duo Zhang commented on HBASE-21330:
---

No, I've seen it fail several times in the past, and it passed when I tried it 
locally. There are probably other problems that cause it to fail. Let me commit.



[jira] [Commented] (HBASE-21330) ReopenTableRegionsProcedure will enter an infinite loop if we schedule a TRSP at the same time

2018-10-17 Thread Guanghao Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654580#comment-16654580
 ] 

Guanghao Zhang commented on HBASE-21330:


+1. Is the failed UT related?



[jira] [Comment Edited] (HBASE-18792) hbase-2 needs to defend against hbck operations

2018-10-17 Thread StephenLu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654576#comment-16654576
 ] 

StephenLu edited comment on HBASE-18792 at 10/18/18 3:27 AM:
-

When I use the command
{code:java}
hbase hbck
{code}
to check the HBase regions, it reports this problem:
{code:java}
ERROR: There is a hole in the region chain between and . You need to create a 
new .regioninfo and region dir in hdfs to plug the hole.
ERROR: Found inconsistency in table data_set_test
{code}
The {{.regioninfo}} file is already in the HDFS directory:
{code:java}
hdfs dfs -ls /hbase/data/default/data_set_test/69db2c240bb89b2131de1d15e3c71566 
Found 3 items
rw-rr- 2 hadoop supergroup 48 2018-10-11 16:51 /hbase/data/default/data_set_test/69db2c240bb89b2131de1d15e3c71566/.regioninfo
{code}
So I ran this command to fix the region:
{code:java}
hbase hbck -fixAssignments
{code}
but it failed with this error:
{code:java}
ERROR: option '-fixAssignments' is not supportted!.
{code}
How can I fix the problem of the region not being online? Thanks.





[jira] [Comment Edited] (HBASE-18792) hbase-2 needs to defend against hbck operations

2018-10-17 Thread StephenLu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654576#comment-16654576
 ] 

StephenLu edited comment on HBASE-18792 at 10/18/18 3:25 AM:
-

When I use the command
{code:java}
hbase hbck
{code}
to check the HBase regions, it reports this problem:
{code:java}
ERROR: There is a hole in the region chain between and . You need to create a 
new .regioninfo and region dir in hdfs to plug the hole.
ERROR: Found inconsistency in table data_set_test
{code}
The {{.regioninfo}} file is already in the HDFS directory:
{code:java}
hdfs dfs -ls /hbase/data/default/data_set_test/69db2c240bb89b2131de1d15e3c71566 
Found 3 items
rw-rr- 2 hadoop supergroup 48 2018-10-11 16:51 /hbase/data/default/data_set_test/69db2c240bb89b2131de1d15e3c71566/.regioninfo
{code}
So I ran this command to fix the region:
{code:java}
hbase hbck -fixAssignments
{code}
but it failed with this error:
{code:java}
ERROR: option '-fixAssignments' is not supportted!.
{code}
How can I fix the problem of the region not being online? Thanks.


was (Author: stephenlu):
{{when I use command }}

{{hbase hbck}}

{{to check hbase region,it prompt for problem}}


{code:java}
ERROR: There is a hole in the region chain between and . You need to create a 
new .regioninfo and region dir in hdfs to plug the hole.
ERROR: Found inconsistency in table data_set_test
{code}
{{.regioninfo}} is already in the HDFS region directory:
{code:java}
hdfs dfs -ls /hbase/data/default/data_set_test/69db2c240bb89b2131de1d15e3c71566
Found 3 items
-rw-r--r--   2 hadoop supergroup 48 2018-10-11 16:51 /hbase/data/default/data_set_test/69db2c240bb89b2131de1d15e3c71566/.regioninfo
{code}
So I used the following command to fix the region:
{code:java}
hbase hbck -fixAssignments
{code}
but it failed with this error:
{code:java}
ERROR: option '-fixAssignments' is not supportted!.
{code}

How can I fix this region-not-online problem? Thanks.

> hbase-2 needs to defend against hbck operations
> ---
>
> Key: HBASE-18792
> URL: https://issues.apache.org/jira/browse/HBASE-18792
> Project: HBase
>  Issue Type: Task
>  Components: hbck
>Reporter: stack
>Assignee: Umesh Agashe
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: hbase-18792.branch-1.001.patch, 
> hbase-18792.master.001.patch, hbase-18792.master.002.patch
>
>
> hbck needs updating to run against hbase2. Meantime, if an hbck from hbase1 
> is run against hbase2, it may do damage. hbase2 should defend itself against 
> hbck1 ops.





[jira] [Comment Edited] (HBASE-18792) hbase-2 needs to defend against hbck operations

2018-10-17 Thread StephenLu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654576#comment-16654576
 ] 

StephenLu edited comment on HBASE-18792 at 10/18/18 3:24 AM:
-

When I run the command {{hbase hbck}} to check the HBase regions, it reports a problem:


{code:java}
ERROR: There is a hole in the region chain between and . You need to create a 
new .regioninfo and region dir in hdfs to plug the hole.
ERROR: Found inconsistency in table data_set_test
{code}
{{.regioninfo}} is already in the HDFS region directory:
{code:java}
hdfs dfs -ls /hbase/data/default/data_set_test/69db2c240bb89b2131de1d15e3c71566
Found 3 items
-rw-r--r--   2 hadoop supergroup 48 2018-10-11 16:51 /hbase/data/default/data_set_test/69db2c240bb89b2131de1d15e3c71566/.regioninfo
{code}
So I used the following command to fix the region:
{code:java}
hbase hbck -fixAssignments
{code}
but it failed with this error:
{code:java}
ERROR: option '-fixAssignments' is not supportted!.
{code}

How can I fix this region-not-online problem? Thanks.


was (Author: stephenlu):
When I run the command ```hbase hbck``` to check the HBase regions, it reports a 
problem:
```
ERROR: There is a hole in the region chain between  and .  You need to create a 
new .regioninfo and region dir in hdfs to plug the hole.
ERROR: Found inconsistency in table data_set_test
```
```.regioninfo``` is already in the HDFS region directory:
```
hdfs dfs -ls /hbase/data/default/data_set_test/69db2c240bb89b2131de1d15e3c71566
Found 3 items
-rw-r--r--   2 hadoop supergroup 48 2018-10-11 16:51 
/hbase/data/default/data_set_test/69db2c240bb89b2131de1d15e3c71566/.regioninfo
```
So I ran ```hbase hbck -fixAssignments``` to fix the region, but it failed with 
the error ```ERROR: option '-fixAssignments' is not supportted!.```

How can I fix this region-not-online problem? Thanks.

> hbase-2 needs to defend against hbck operations
> ---
>
> Key: HBASE-18792
> URL: https://issues.apache.org/jira/browse/HBASE-18792
> Project: HBase
>  Issue Type: Task
>  Components: hbck
>Reporter: stack
>Assignee: Umesh Agashe
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: hbase-18792.branch-1.001.patch, 
> hbase-18792.master.001.patch, hbase-18792.master.002.patch
>
>
> hbck needs updating to run against hbase2. Meantime, if an hbck from hbase1 
> is run against hbase2, it may do damage. hbase2 should defend itself against 
> hbck1 ops.





[jira] [Commented] (HBASE-18792) hbase-2 needs to defend against hbck operations

2018-10-17 Thread StephenLu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654576#comment-16654576
 ] 

StephenLu commented on HBASE-18792:
---

When I run the command ```hbase hbck``` to check the HBase regions, it reports a 
problem:
```
ERROR: There is a hole in the region chain between  and .  You need to create a 
new .regioninfo and region dir in hdfs to plug the hole.
ERROR: Found inconsistency in table data_set_test
```
```.regioninfo``` is already in the HDFS region directory:
```
hdfs dfs -ls /hbase/data/default/data_set_test/69db2c240bb89b2131de1d15e3c71566
Found 3 items
-rw-r--r--   2 hadoop supergroup 48 2018-10-11 16:51 
/hbase/data/default/data_set_test/69db2c240bb89b2131de1d15e3c71566/.regioninfo
```
So I ran ```hbase hbck -fixAssignments``` to fix the region, but it failed with 
the error ```ERROR: option '-fixAssignments' is not supportted!.```

How can I fix this region-not-online problem? Thanks.

> hbase-2 needs to defend against hbck operations
> ---
>
> Key: HBASE-18792
> URL: https://issues.apache.org/jira/browse/HBASE-18792
> Project: HBase
>  Issue Type: Task
>  Components: hbck
>Reporter: stack
>Assignee: Umesh Agashe
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: hbase-18792.branch-1.001.patch, 
> hbase-18792.master.001.patch, hbase-18792.master.002.patch
>
>
> hbck needs updating to run against hbase2. Meantime, if an hbck from hbase1 
> is run against hbase2, it may do damage. hbase2 should defend itself against 
> hbck1 ops.





[jira] [Commented] (HBASE-20952) Re-visit the WAL API

2018-10-17 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654574#comment-16654574
 ] 

Hudson commented on HBASE-20952:


Results for branch HBASE-20952
[build #21 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20952/21/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20952/21//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20952/21//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20952/21//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Re-visit the WAL API
> 
>
> Key: HBASE-20952
> URL: https://issues.apache.org/jira/browse/HBASE-20952
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: Josh Elser
>Priority: Major
> Attachments: 20952.v1.txt
>
>
> Take a step back from the current WAL implementations and think about what an 
> HBase WAL API should look like. What are the primitive calls that we require 
> to guarantee durability of writes with a high degree of performance?
> The API needs to take the current implementations into consideration. We 
> should also have a mind for what is happening in the Ratis LogService (but 
> the LogService should not dictate what HBase's WAL API looks like RATIS-272).
> Other "systems" inside of HBase that use WALs are replication and 
> backup. Replication has the use-case for "tail"'ing the WAL, which we 
> should provide via our new API. Backup doesn't do anything fancy (IIRC). We 
> should make sure all consumers are generally going to be OK with the API we 
> create.
> The API may be "OK" (or OK in a part). We need to also consider other methods 
> which were "bolted" on such as {{AbstractFSWAL}} and 
> {{WALFileLengthProvider}}. Other corners of "WAL use" (like the 
> {{WALSplitter}}) should also be looked at, so that they use WAL APIs only.
> We also need to make sure that adequate interface audience and stability 
> annotations are chosen.
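One way to pin down the "primitive calls" the description asks for is a three-method interface: append, sync as the durability barrier, and a tailing read for consumers such as replication. The sketch below is hypothetical. `WriteAheadLog`, `InMemoryWal` and `tail` are illustrative names, not HBase APIs. The in-memory implementation only models one contract: readers never see an entry that has not been synced.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of a minimal WAL API; names are illustrative, not HBase's.
public class WalApiSketch {

  interface WriteAheadLog {
    /** Appends an entry and returns its sequence id; the entry is not yet durable. */
    long append(byte[] entry);

    /** Blocks until every entry up to and including txid is durable. */
    void sync(long txid);

    /** Returns durable entries with sequence id >= fromTxid (for "tail"ing consumers). */
    List<byte[]> tail(long fromTxid);
  }

  /** Toy implementation: "durable" simply means "synced". */
  static final class InMemoryWal implements WriteAheadLog {
    private final List<byte[]> entries = new ArrayList<>();
    private long syncedUpTo = -1; // highest txid made durable so far

    @Override
    public synchronized long append(byte[] entry) {
      entries.add(entry.clone());
      return entries.size() - 1; // txid == list index in this toy model
    }

    @Override
    public synchronized void sync(long txid) {
      if (txid < 0 || txid >= entries.size()) {
        throw new IllegalArgumentException("unknown txid " + txid);
      }
      syncedUpTo = Math.max(syncedUpTo, txid);
    }

    @Override
    public synchronized List<byte[]> tail(long fromTxid) {
      if (fromTxid < 0) {
        throw new IllegalArgumentException("negative txid " + fromTxid);
      }
      if (fromTxid > syncedUpTo) {
        return Collections.emptyList();
      }
      // Readers only ever see synced entries.
      return new ArrayList<>(entries.subList((int) fromTxid, (int) syncedUpTo + 1));
    }
  }

  public static void main(String[] args) {
    InMemoryWal wal = new InMemoryWal();
    long t0 = wal.append("put r1".getBytes(StandardCharsets.UTF_8));
    long t1 = wal.append("put r2".getBytes(StandardCharsets.UTF_8));
    wal.sync(t0);
    System.out.println(wal.tail(0).size()); // 1: only t0 is durable
    wal.sync(t1);
    System.out.println(wal.tail(0).size()); // 2
  }
}
```

Keeping `sync` separate from `append` lets a caller batch several appends behind one durability barrier. That is usually where WAL throughput comes from.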





[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures

2018-10-17 Thread Jingyun Tian (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654561#comment-16654561
 ] 

Jingyun Tian commented on HBASE-21291:
--

[~stack] hbase-operator-tools sets lockWait to 0 if we don't set any value 
for it. Should I change the default value of lockWait in hbase-operator-tools?
{code}
long waitTime = 0;
if (commandLine.hasOption(wait.getOpt())) {
  waitTime = Integer.valueOf(commandLine.getOptionValue(wait.getOpt()));
  waitTime *= 1000; // Because time is in seconds.
}
{code}
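Here is a sketch of the client-side alternative. It assumes hbase-operator-tools keeps reading the wait option in seconds. The tool falls back to a positive default when the option is absent and fails fast on non-positive values, mirroring the master's `lockWait should be positive` check. `LockWaitOption`, `resolveLockWaitMillis` and the 1000 ms default are hypothetical, not the tool's actual code.

```java
// Hypothetical sketch: resolve the wait option (seconds) to a positive lockWait in ms.
public class LockWaitOption {
  static final long DEFAULT_LOCK_WAIT_MS = 1000L; // hypothetical default, not the tool's own

  /** waitSeconds is the raw option value, or null if the option was not given. */
  static long resolveLockWaitMillis(String waitSeconds) {
    long waitMs = DEFAULT_LOCK_WAIT_MS;
    if (waitSeconds != null) {
      waitMs = Long.parseLong(waitSeconds.trim()) * 1000L; // option is in seconds
    }
    // Fail fast on the client instead of letting the master reject the RPC.
    if (waitMs <= 0) {
      throw new IllegalArgumentException("lockWait should be positive, got " + waitMs + "ms");
    }
    return waitMs;
  }

  public static void main(String[] args) {
    System.out.println(resolveLockWaitMillis(null)); // 1000
    System.out.println(resolveLockWaitMillis("5"));  // 5000
  }
}
```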

> Add a test for bypassing stuck state-machine procedures
> ---
>
> Key: HBASE-21291
> URL: https://issues.apache.org/jira/browse/HBASE-21291
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Jingyun Tian
>Assignee: Jingyun Tian
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21291.master.001.patch, 
> HBASE-21291.master.002.patch, HBASE-21291.master.003.patch, 
> HBASE-21291.master.004.patch, HBASE-21291.master.005.patch
>
>
> {code}
>   if (!procedure.isFailed()) {
>     if (subprocs != null) {
>       if (subprocs.length == 1 && subprocs[0] == procedure) {
>         // Procedure returned itself. Quick-shortcut for a state machine-like procedure;
>         // i.e. we go around this loop again rather than go back out on the scheduler queue.
>         subprocs = null;
>         reExecute = true;
>         LOG.trace("Short-circuit to next step on pid={}", procedure.getProcId());
>       } else {
>         // Yield the current procedure, and make the subprocedure runnable
>         // subprocs may come back 'null'.
>         subprocs = initializeChildren(procStack, procedure, subprocs);
>         LOG.info("Initialized subprocedures=" +
>           (subprocs == null ? null :
>             Stream.of(subprocs).map(e -> "{" + e.toString() + "}")
>               .collect(Collectors.toList()).toString()));
>       }
>     } else if (procedure.getState() == ProcedureState.WAITING_TIMEOUT) {
>       LOG.debug("Added to timeoutExecutor {}", procedure);
>       timeoutExecutor.add(procedure);
>     } else if (!suspended) {
>       // No subtask, so we are done
>       procedure.setState(ProcedureState.SUCCESS);
>     }
>   }
> {code}
> The current implementation of ProcedureExecutor sets reExecute to true for 
> state-machine-like procedures. So if such a procedure gets stuck in a 
> certain state, it will loop forever.
> {code}
>   IdLock.Entry lockEntry = procExecutionLock.getLockEntry(proc.getProcId());
>   try {
>     executeProcedure(proc);
>   } catch (AssertionError e) {
>     LOG.info("ASSERT pid=" + proc.getProcId(), e);
>     throw e;
>   } finally {
>     procExecutionLock.releaseLockEntry(lockEntry);
>   }
> {code}
> Since a procedure acquires the IdLock and releases it only after execution is done, 
> a state machine procedure will never release the IdLock until it is finished. 
> So bypassProcedure doesn't work, because it first tries to grab the IdLock.
> {code}
> IdLock.Entry lockEntry = procExecutionLock.tryLockEntry(procedure.getProcId(), lockWait);
> {code}
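The failure mode described above can be reproduced in miniature with a `java.util.concurrent.locks.ReentrantLock` standing in for `IdLock`. This is an assumption: `IdLock` is keyed by procedure id, which the sketch ignores. The worker thread plays the state-machine procedure that keeps re-executing the same step while holding the lock. `tryBypass` plays `bypassProcedure` calling `tryLockEntry(procId, lockWait)`. All class and method names here are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: ReentrantLock stands in for HBase's IdLock.
public class StuckProcedureLockDemo {

  /** Returns true if the "bypass" acquired the procedure lock within lockWaitMs. */
  static boolean tryBypass(ReentrantLock procLock, long lockWaitMs) throws InterruptedException {
    if (!procLock.tryLock(lockWaitMs, TimeUnit.MILLISECONDS)) {
      return false; // worker still holds the lock, so the bypass cannot proceed
    }
    try {
      return true; // a real bypass would mark the procedure as bypassed here
    } finally {
      procLock.unlock();
    }
  }

  static boolean runDemo() throws InterruptedException {
    ReentrantLock procLock = new ReentrantLock();
    CountDownLatch lockHeld = new CountDownLatch(1);
    CountDownLatch stop = new CountDownLatch(1); // only exists to end the demo

    Thread worker = new Thread(() -> {
      procLock.lock();
      try {
        lockHeld.countDown();
        // Stuck state machine: re-executes the same step (reExecute = true) and
        // never returns to the scheduler, so the lock is never released.
        while (!stop.await(10, TimeUnit.MILLISECONDS)) {
          // "execute" the stuck step again
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        procLock.unlock();
      }
    });
    worker.start();
    lockHeld.await(); // make sure the worker holds the lock first

    boolean bypassed = tryBypass(procLock, 200);
    stop.countDown();
    worker.join();
    return bypassed;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("bypass acquired lock: " + runDemo()); // false
  }
}
```

While the worker is stuck, `runDemo()` returns `false`: the bypass gives up after `lockWait` milliseconds, which matches the behavior the issue describes.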





[jira] [Commented] (HBASE-19682) Use Collections.emptyList() For Empty List Values

2018-10-17 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-19682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654549#comment-16654549
 ] 

Hadoop QA commented on HBASE-19682:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
26s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange}  
0m  0s{color} | {color:orange} The patch doesn't appear to include any new or 
modified tests. Please justify why no new tests are needed for this patch. Also 
please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
36s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m 
16s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
11s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
38s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
21s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
32s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
34s{color} | {color:green} hbase-server: The patch generated 0 new + 157 
unchanged - 1 fixed = 157 total (was 158) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  6m 
45s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
18m  6s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
15s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}201m 16s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  1m 
29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}265m 41s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaSameHosts |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-19682 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12916901/HBASE-19682.5.patch |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 38d8f3a2a5fe 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 
10:45:36 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 8cb28ce4b9 |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| unit | 

[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures

2018-10-17 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654551#comment-16654551
 ] 

stack commented on HBASE-21291:
---

i.e. previously, I did not have to pass a lockWait value...

> Add a test for bypassing stuck state-machine procedures
> ---
>
> Key: HBASE-21291
> URL: https://issues.apache.org/jira/browse/HBASE-21291
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Jingyun Tian
>Assignee: Jingyun Tian
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21291.master.001.patch, 
> HBASE-21291.master.002.patch, HBASE-21291.master.003.patch, 
> HBASE-21291.master.004.patch, HBASE-21291.master.005.patch
>
>
> {code}
>   if (!procedure.isFailed()) {
>     if (subprocs != null) {
>       if (subprocs.length == 1 && subprocs[0] == procedure) {
>         // Procedure returned itself. Quick-shortcut for a state machine-like procedure;
>         // i.e. we go around this loop again rather than go back out on the scheduler queue.
>         subprocs = null;
>         reExecute = true;
>         LOG.trace("Short-circuit to next step on pid={}", procedure.getProcId());
>       } else {
>         // Yield the current procedure, and make the subprocedure runnable
>         // subprocs may come back 'null'.
>         subprocs = initializeChildren(procStack, procedure, subprocs);
>         LOG.info("Initialized subprocedures=" +
>           (subprocs == null ? null :
>             Stream.of(subprocs).map(e -> "{" + e.toString() + "}")
>               .collect(Collectors.toList()).toString()));
>       }
>     } else if (procedure.getState() == ProcedureState.WAITING_TIMEOUT) {
>       LOG.debug("Added to timeoutExecutor {}", procedure);
>       timeoutExecutor.add(procedure);
>     } else if (!suspended) {
>       // No subtask, so we are done
>       procedure.setState(ProcedureState.SUCCESS);
>     }
>   }
> {code}
> The current implementation of ProcedureExecutor sets reExecute to true for 
> state-machine-like procedures. So if such a procedure gets stuck in a 
> certain state, it will loop forever.
> {code}
>   IdLock.Entry lockEntry = procExecutionLock.getLockEntry(proc.getProcId());
>   try {
>     executeProcedure(proc);
>   } catch (AssertionError e) {
>     LOG.info("ASSERT pid=" + proc.getProcId(), e);
>     throw e;
>   } finally {
>     procExecutionLock.releaseLockEntry(lockEntry);
>   }
> {code}
> Since a procedure acquires the IdLock and releases it only after execution is done, 
> a state machine procedure will never release the IdLock until it is finished. 
> So bypassProcedure doesn't work, because it first tries to grab the IdLock.
> {code}
> IdLock.Entry lockEntry = procExecutionLock.tryLockEntry(procedure.getProcId(), lockWait);
> {code}





[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures

2018-10-17 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654548#comment-16654548
 ] 

stack commented on HBASE-21291:
---

[~tianjingyun] With this patch applied, when I do a bypass I now get the 
following:

{code}
18/10/17 19:36:20 ERROR client.HBaseHbck: 2441732
org.apache.hbase.thirdparty.com.google.protobuf.ServiceException: 
org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(java.io.IOException): 
java.io.IOException: lockWait should be positive
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:472)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
Caused by: java.lang.IllegalArgumentException: lockWait should be positive
at 
org.apache.hbase.thirdparty.com.google.common.base.Preconditions.checkArgument(Preconditions.java:134)
at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.bypassProcedure(ProcedureExecutor.java:1050)
at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.bypassProcedure(ProcedureExecutor.java:1043)
at 
org.apache.hadoop.hbase.master.MasterRpcServices.bypassProcedure(MasterRpcServices.java:2421)
at 
org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$HbckService$2.callBlockingMethod(MasterProtos.java)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
... 3 more

at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:336)
at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$200(AbstractRpcClient.java:95)
at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:571)
at 
org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$HbckService$BlockingStub.bypassProcedure(MasterProtos.java)
at org.apache.hadoop.hbase.client.HBaseHbck$1.call(HBaseHbck.java:145)
at org.apache.hadoop.hbase.client.HBaseHbck$1.call(HBaseHbck.java:141)
at 
org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.call(ProtobufUtil.java:2945)
at 
org.apache.hadoop.hbase.client.HBaseHbck.bypassProcedure(HBaseHbck.java:140)
at org.apache.hbase.HBCK2.bypass(HBCK2.java:183)
at org.apache.hbase.HBCK2.run(HBCK2.java:342)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
at org.apache.hbase.HBCK2.main(HBCK2.java:389)
Caused by: 
org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(java.io.IOException): 
java.io.IOException: lockWait should be positive
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:472)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
Caused by: java.lang.IllegalArgumentException: lockWait should be positive
at 
org.apache.hbase.thirdparty.com.google.common.base.Preconditions.checkArgument(Preconditions.java:134)
at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.bypassProcedure(ProcedureExecutor.java:1050)
at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.bypassProcedure(ProcedureExecutor.java:1043)
at 
org.apache.hadoop.hbase.master.MasterRpcServices.bypassProcedure(MasterRpcServices.java:2421)
...
{code}

Is that what you expect, sir?

> Add a test for bypassing stuck state-machine procedures
> ---
>
> Key: HBASE-21291
> URL: https://issues.apache.org/jira/browse/HBASE-21291
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Jingyun Tian
>Assignee: Jingyun Tian
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21291.master.001.patch, 
> HBASE-21291.master.002.patch, HBASE-21291.master.003.patch, 
> HBASE-21291.master.004.patch, HBASE-21291.master.005.patch
>
>
> {code}
>   if (!procedure.isFailed()) {
>     if (subprocs != null) {
>       if (subprocs.length == 1 && subprocs[0] == procedure) {
>         // Procedure returned itself. Quick-shortcut for a state machine-like procedure;
>         // i.e. we go around this loop again rather than go back out on the scheduler queue.
>         subprocs = null;
>         reExecute = true;
>         LOG.trace("Short-circuit to next step on pid={}", procedure.getProcId());
>       } else {
>         // Yield the current procedure, and make the subprocedure runnable
>  

[jira] [Commented] (HBASE-21323) Should not skip force updating for a sub procedure even if it has been finished

2018-10-17 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654545#comment-16654545
 ] 

Hadoop QA commented on HBASE-21323:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
11s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
31s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
25s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
15s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
19s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
29s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
14s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
23s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
11m  2s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m  
4s{color} | {color:green} hbase-procedure in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
10s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 37m 13s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21323 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12944459/HBASE-21323.patch |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 0fe8986b67e5 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 3a75505cf2 |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14737/testReport/ |
| Max. process+thread count | 273 (vs. ulimit of 1) |
| modules | C: hbase-procedure U: hbase-procedure |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14737/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Should not skip force 

[jira] [Commented] (HBASE-20716) Unsafe access cleanup

2018-10-17 Thread Sahil Aggarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654544#comment-16654544
 ] 

Sahil Aggarwal commented on HBASE-20716:


[~stack] Good to go if you don't have any comments/suggestions. 

> Unsafe access cleanup
> -
>
> Key: HBASE-20716
> URL: https://issues.apache.org/jira/browse/HBASE-20716
> Project: HBase
>  Issue Type: Sub-task
>  Components: Performance
>Reporter: stack
>Assignee: Sahil Aggarwal
>Priority: Critical
>  Labels: beginner
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-20716.master.001.patch, 
> HBASE-20716.master.002.patch, HBASE-20716.master.003.patch, 
> HBASE-20716.master.004.patch, HBASE-20716.master.005.patch, 
> HBASE-20716.master.006.patch, HBASE-20716.master.007.patch, 
> HBASE-20716.master.008.patch, Screen Shot 2018-06-26 at 11.37.49 AM.png
>
>
> We have two means of getting at Unsafe: UnsafeAccess, and the code internal 
> to the Bytes class. They are effectively doing the same thing. We should have 
> only one avenue to Unsafe.
> Many of our paths to Unsafe via UnsafeAccess check flags to see whether 
> access is available, whether it is aligned, and the order in which words are 
> written on the machine. Each check costs (especially if done millions of 
> times a second) and on occasion adds bloat in hot code paths. The unsafe 
> access inside Bytes checks on startup what the machine is capable of and 
> then does a static assignment of the appropriate class to use from there on out. 
> UnsafeAccess does not do this; it runs the checks every time. It would be good 
> to make the Bytes behavior pervasive.
> The benefit of having only one access path to Unsafe is plain. The benefit of 
> removing the checks will be harder to measure, though it should be plain when you 
> disassemble a hot path; in a (very) rare case, the saved bytecodes could be 
> the difference between inlining or not.
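The "check once at startup, then statically assign the implementation" pattern described for Bytes can be sketched as follows. This is an illustration only. `java.nio.ByteBuffer` stands in for `sun.misc.Unsafe` so the example stays portable. The `sketch.disableFastPath` system property is a hypothetical capability switch, not an HBase setting, and the class names are made up.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of "decide once at class-load time"; ByteBuffer stands in for Unsafe.
public final class LongReaders {

  interface LongReader {
    /** Reads a big-endian long from buf starting at offset. */
    long readLong(byte[] buf, int offset);
  }

  /** "Fast path" stand-in: bulk read via ByteBuffer (big-endian by default). */
  static final class BufferReader implements LongReader {
    @Override
    public long readLong(byte[] buf, int offset) {
      return ByteBuffer.wrap(buf, offset, 8).getLong();
    }
  }

  /** Pure-Java fallback: assemble the long byte by byte. */
  static final class PureJavaReader implements LongReader {
    @Override
    public long readLong(byte[] buf, int offset) {
      long v = 0;
      for (int i = 0; i < 8; i++) {
        v = (v << 8) | (buf[offset + i] & 0xFFL);
      }
      return v;
    }
  }

  // Chosen once, during class initialization; no per-call capability checks afterwards.
  static final LongReader READER = pickReader();

  private static LongReader pickReader() {
    // Hypothetical capability check; a real one might probe for Unsafe and alignment.
    boolean disabled = Boolean.parseBoolean(System.getProperty("sketch.disableFastPath", "false"));
    return disabled ? new PureJavaReader() : new BufferReader();
  }

  public static void main(String[] args) {
    byte[] b = {0, 0, 0, 0, 0, 0, 1, 2};
    System.out.println(READER.readLong(b, 0)); // 258
  }
}
```

Because `READER` is a `static final` field, the JIT can typically treat the chosen implementation as a constant. Hot-path calls then go straight to one class with no flag checks.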





[jira] [Commented] (HBASE-21073) "Maintenance mode" master

2018-10-17 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654536#comment-16654536
 ] 

stack commented on HBASE-21073:
---

bq. ...but not sure where to add them in the ref guide. 

Operator Chapter? Its own little section?

> "Maintenance mode" master
> -
>
> Key: HBASE-21073
> URL: https://issues.apache.org/jira/browse/HBASE-21073
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2, hbck2, master
>Reporter: stack
>Assignee: Mike Drob
>Priority: Major
> Attachments: HBASE-21073.master.001.patch, 
> HBASE-21073.master.002.patch, HBASE-21073.master.003.patch, 
> HBASE-21073.master.004.patch, HBASE-21073.master.005.patch, 
> HBASE-21073.master.006.patch, HBASE-21073.master.007.patch, 
> HBASE-21073.master.008.patch, HBASE-21073.master.009.patch
>
>
> Make it so we can bring up a Master in "maintenance mode". This means parsing the 
> master WAL procs but not taking on regionservers. It would be in a state 
> where "repair" Procedures could run; e.g. a Procedure that could recover meta 
> by looking for meta WALs, splitting them, dropping recovered.edits, and even 
> making it so meta is readable. See parent issue for why needed (disaster 
> recovery).





[jira] [Updated] (HBASE-19682) Use Collections.emptyList() For Empty List Values

2018-10-17 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-19682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HBASE-19682:

Attachment: HBASE-19682.6.patch

> Use Collections.emptyList() For Empty List Values
> -
>
> Key: HBASE-19682
> URL: https://issues.apache.org/jira/browse/HBASE-19682
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Attachments: HBASE-19682.1.patch, HBASE-19682.2.patch, 
> HBASE-19682.3.1.patch, HBASE-19682.4.patch, HBASE-19682.5.patch, 
> HBASE-19682.6.patch, example.patch
>
>
> Use {{Collections.emptyList()}} for returning an empty list instead of 
> {{return new ArrayList<> ()}}. Every call to the _ArrayList_ default 
> constructor allocates a new object (older JDKs also eagerly allocated an 
> internal buffer of size 10; newer ones defer that until the first add), 
> whereas returning the shared static value saves memory, GC pressure, and 
> allocation time. The shared list is immutable, so this only applies where 
> callers do not modify the returned list.
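A small before/after sketch of the change proposed above. The `lookupBefore`/`lookupAfter`/`filter` methods are hypothetical stand-ins for HBase call sites; only `Collections.emptyList()` and `ArrayList` are real JDK APIs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public final class EmptyListExample {
  // Before: every "no results" call allocates a new ArrayList object.
  static List<String> lookupBefore(List<String> source, String prefix) {
    if (source.isEmpty()) {
      return new ArrayList<>();
    }
    return filter(source, prefix);
  }

  // After: the "no results" path returns the shared, immutable empty list.
  static List<String> lookupAfter(List<String> source, String prefix) {
    if (source.isEmpty()) {
      return Collections.emptyList();
    }
    return filter(source, prefix);
  }

  // Non-empty results still get a fresh, mutable list.
  private static List<String> filter(List<String> source, String prefix) {
    List<String> out = new ArrayList<>();
    for (String s : source) {
      if (s.startsWith(prefix)) {
        out.add(s);
      }
    }
    return out;
  }
}
```

The trade-off is that a caller which used to append to the returned list will now get an `UnsupportedOperationException` on the empty path, so each call site needs checking before switching.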





[jira] [Updated] (HBASE-19682) Use Collections.emptyList() For Empty List Values

2018-10-17 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-19682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HBASE-19682:

Status: Patch Available  (was: Open)

Re-test patch

> Use Collections.emptyList() For Empty List Values
> -
>
> Key: HBASE-19682
> URL: https://issues.apache.org/jira/browse/HBASE-19682
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Attachments: HBASE-19682.1.patch, HBASE-19682.2.patch, 
> HBASE-19682.3.1.patch, HBASE-19682.4.patch, HBASE-19682.5.patch, 
> HBASE-19682.6.patch, example.patch
>
>
> Use {{Collections.emptyList()}} for returning an empty list instead of 
> {{return new ArrayList<> ()}}. Every call to the _ArrayList_ default 
> constructor allocates a new object (older JDKs also eagerly allocated an 
> internal buffer of size 10; newer ones defer that until the first add), 
> whereas returning the shared static value saves memory, GC pressure, and 
> allocation time. The shared list is immutable, so this only applies where 
> callers do not modify the returned list.





[jira] [Updated] (HBASE-19682) Use Collections.emptyList() For Empty List Values

2018-10-17 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-19682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HBASE-19682:

Status: Open  (was: Patch Available)

> Use Collections.emptyList() For Empty List Values
> -
>
> Key: HBASE-19682
> URL: https://issues.apache.org/jira/browse/HBASE-19682
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Attachments: HBASE-19682.1.patch, HBASE-19682.2.patch, 
> HBASE-19682.3.1.patch, HBASE-19682.4.patch, HBASE-19682.5.patch, example.patch
>
>
> Use {{Collections.emptyList()}} for returning an empty list instead of 
> {{return new ArrayList<> ()}}. Every call to the _ArrayList_ default 
> constructor allocates a new object (older JDKs also eagerly allocated an 
> internal buffer of size 10; newer ones defer that until the first add), 
> whereas returning the shared static value saves memory, GC pressure, and 
> allocation time. The shared list is immutable, so this only applies where 
> callers do not modify the returned list.





[jira] [Commented] (HBASE-21246) Introduce WALIdentity interface

2018-10-17 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654526#comment-16654526
 ] 

Duo Zhang commented on HBASE-21246:
---

So maybe we should add more comments to say that this method is temporary and 
will be removed once the whole system refactoring is done?

> Introduce WALIdentity interface
> ---
>
> Key: HBASE-21246
> URL: https://issues.apache.org/jira/browse/HBASE-21246
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Fix For: HBASE-20952
>
> Attachments: 21246.003.patch, 21246.HBASE-20952.001.patch, 
> 21246.HBASE-20952.002.patch, 21246.HBASE-20952.004.patch, 
> 21246.HBASE-20952.005.patch
>
>
> We are introducing a WALIdentity interface so that the WAL representation can 
> be decoupled from the distributed filesystem.
> The interface provides a getName method whose return value can represent a 
> filename in a distributed filesystem environment, or the name of the stream 
> when the WAL is backed by a log stream.
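A minimal sketch of the decoupling described above, assuming nothing beyond a `getName()` method. The two implementing classes and the `describe` helper are hypothetical names chosen for illustration, not the classes in the attached patches.

```java
import java.util.Objects;

public final class WalIdentityExample {
  // Only getName() comes from the issue description.
  interface WALIdentity {
    String getName();
  }

  // Hypothetical: a WAL stored as a file in a distributed filesystem.
  static final class FileWALIdentity implements WALIdentity {
    private final String path;

    FileWALIdentity(String path) {
      this.path = Objects.requireNonNull(path, "path");
    }

    @Override
    public String getName() {
      // Use the last path component as the WAL name.
      int slash = path.lastIndexOf('/');
      return slash < 0 ? path : path.substring(slash + 1);
    }
  }

  // Hypothetical: a WAL backed by a named log stream.
  static final class StreamWALIdentity implements WALIdentity {
    private final String streamName;

    StreamWALIdentity(String streamName) {
      this.streamName = Objects.requireNonNull(streamName, "streamName");
    }

    @Override
    public String getName() {
      return streamName;
    }
  }

  // Code above the WAL layer depends only on the interface.
  static String describe(WALIdentity id) {
    return "WAL " + id.getName();
  }
}
```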





[jira] [Commented] (HBASE-21246) Introduce WALIdentity interface

2018-10-17 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654524#comment-16654524
 ] 

Josh Elser commented on HBASE-21246:


{quote}in the replication tracking system, we need to serialize the WALIdentity?
{quote}
Precisely.
{quote}we plan to let the WAL system provide a subscribe/consume style API, 
and maybe we could abstract concepts other than the wal file
{quote}
That's something we would want to enable, but I don't think we expect to build 
it out fully. That'd be a massive undertaking on top of an already massive 
undertaking. Having looked into this, I think we would want to make changes only 
to ReplicationSourceWALReader and leave the rest of the control flow the same. I 
know that Ted is thinking about this now.
{quote}For regionserver, it only needs the WAL instance? And for replaying 
recovered edits for a region, I think our decision is to provide a method in the 
WAL system API to get a stream of the recovered edits for a region?
{quote}
Right and right :)
{quote}I do not think the upper layer needs to know whether we have multiple 
wal files, or multiple topics or other such things...
{quote}
If I'm interpreting correctly, yes. Really, no part of HBase outside the WAL 
system should need to know about the physical storage details (much like 
Multi-WAL did compared to FSHLog).

> Introduce WALIdentity interface
> ---
>
> Key: HBASE-21246
> URL: https://issues.apache.org/jira/browse/HBASE-21246
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Fix For: HBASE-20952
>
> Attachments: 21246.003.patch, 21246.HBASE-20952.001.patch, 
> 21246.HBASE-20952.002.patch, 21246.HBASE-20952.004.patch, 
> 21246.HBASE-20952.005.patch
>
>
> We are introducing a WALIdentity interface so that the WAL representation can 
> be decoupled from the distributed filesystem.
> The interface provides a getName method whose return value can represent a 
> filename in a distributed filesystem environment, or the name of the stream 
> when the WAL is backed by a log stream.





[jira] [Commented] (HBASE-21322) Add a scheduleServerCrashProcedure() API to HbckService

2018-10-17 Thread Jingyun Tian (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654523#comment-16654523
 ] 

Jingyun Tian commented on HBASE-21322:
--

[~stack] Let me add a tool for this; we need a tool as our last resort.

> Add a scheduleServerCrashProcedure() API to HbckService
> ---
>
> Key: HBASE-21322
> URL: https://issues.apache.org/jira/browse/HBASE-21322
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Jingyun Tian
>Assignee: Jingyun Tian
>Priority: Major
> Attachments: Screenshot from 2018-10-17 13-35-58.png, Screenshot from 
> 2018-10-17 13-38-41.png, Screenshot from 2018-10-17 13-47-06.png
>
>
> According to my test, if one RS goes down and then all procedure logs are 
> deleted, no ServerCrashProcedure is scheduled, and restarting the master does 
> not help. Thus we need to schedule a ServerCrashProcedure manually to solve 
> the problem. I plan to add a scheduleServerCrashProcedure() API to 
> HbckService, then add this API to HBCK2.
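The situation above, where a dead server's WALs are still waiting to be split but no ServerCrashProcedure exists for it, amounts to a set difference. A hypothetical Java sketch: the `-splitting` directory suffix is taken from the discussion on this issue, while the class, method, and inputs are assumptions and not the HbckService API.

```java
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

public final class MissingScpFinder {
  static final String SPLITTING_SUFFIX = "-splitting";

  // Returns server names whose WAL directory is still marked "-splitting"
  // but for which no ServerCrashProcedure is known.
  static Set<String> serversNeedingScp(Collection<String> walDirNames,
      Set<String> serversWithScp) {
    Set<String> result = new TreeSet<>();
    for (String dir : walDirNames) {
      if (dir.endsWith(SPLITTING_SUFFIX)) {
        String server = dir.substring(0, dir.length() - SPLITTING_SUFFIX.length());
        if (!serversWithScp.contains(server)) {
          result.add(server);
        }
      }
    }
    return result;
  }
}
```

Each server returned would be a candidate for the proposed scheduleServerCrashProcedure() call from HBCK2.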





[jira] [Updated] (HBASE-21323) Should not skip force updating for a sub procedure even if it has been finished

2018-10-17 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21323:
--
Attachment: HBASE-21323.patch

> Should not skip force updating for a sub procedure even if it has been 
> finished
> ---
>
> Key: HBASE-21323
> URL: https://issues.apache.org/jira/browse/HBASE-21323
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21323.patch, HBASE-21323.patch
>
>
> Keep seeing this
> {noformat}
> 2018-10-16,20:03:02,027 WARN [WALProcedureStoreSyncThread] 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore: procedure 
> WALs count=340 above the warning threshold 10. check running procedures to 
> see if something is stuck.
> 2018-10-16,20:03:02,027 INFO [WALProcedureStoreSyncThread] 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore: Rolled new 
> Procedure Store WAL, id=343
> 2018-10-16,20:03:02,027 DEBUG [Force-Update-PEWorker-0] 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Procedure pid=991, 
> ppid=990, state=SUCCESS; 
> org.apache.hadoop.hbase.master.replication.RefreshPeerProcedure has already 
> been finished, skip force updating.
> 2018-10-16,20:03:02,027 DEBUG [Force-Update-PEWorker-0] 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Procedure pid=992, 
> ppid=990, state=SUCCESS; 
> org.apache.hadoop.hbase.master.replication.RefreshPeerProcedure has already 
> been finished, skip force updating.
> 2018-10-16,20:03:02,027 DEBUG [Force-Update-PEWorker-0] 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Procedure pid=994, 
> ppid=990, state=SUCCESS; 
> org.apache.hadoop.hbase.master.replication.RefreshPeerProcedure has already 
> been finished, skip force updating.
> 2018-10-16,20:03:02,027 DEBUG [Force-Update-PEWorker-0] 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Procedure pid=995, 
> ppid=990, state=SUCCESS; 
> org.apache.hadoop.hbase.master.replication.RefreshPeerProcedure has already 
> been finished, skip force updating.
> 2018-10-16,20:03:02,870 WARN [WALProcedureStoreSyncThread] 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore: procedure 
> WALs count=341 above the warning threshold 10. check running procedures to 
> see if something is stuck.
> 2018-10-16,20:03:02,870 INFO [WALProcedureStoreSyncThread] 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore: Rolled new 
> Procedure Store WAL, id=344
> 2018-10-16,20:03:02,870 DEBUG [Force-Update-PEWorker-0] 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Procedure pid=991, 
> ppid=990, state=SUCCESS; 
> org.apache.hadoop.hbase.master.replication.RefreshPeerProcedure has already 
> been finished, skip force updating.
> 2018-10-16,20:03:02,870 DEBUG [Force-Update-PEWorker-0] 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Procedure pid=992, 
> ppid=990, state=SUCCESS; 
> org.apache.hadoop.hbase.master.replication.RefreshPeerProcedure has already 
> been finished, skip force updating.
> 2018-10-16,20:03:02,870 DEBUG [Force-Update-PEWorker-0] 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Procedure pid=994, 
> ppid=990, state=SUCCESS; 
> org.apache.hadoop.hbase.master.replication.RefreshPeerProcedure has already 
> been finished, skip force updating.
> 2018-10-16,20:03:02,870 DEBUG [Force-Update-PEWorker-0] 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Procedure pid=995, 
> ppid=990, state=SUCCESS; 
> org.apache.hadoop.hbase.master.replication.RefreshPeerProcedure has already 
> been finished, skip force updating.
> 2018-10-16,20:03:03,816 WARN [WALProcedureStoreSyncThread] 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore: procedure 
> WALs count=342 above the warning threshold 10. check running procedures to 
> see if something is stuck.
> 2018-10-16,20:03:03,816 INFO [WALProcedureStoreSyncThread] 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore: Rolled new 
> Procedure Store WAL, id=345
> 2018-10-16,20:03:03,816 DEBUG [Force-Update-PEWorker-0] 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Procedure pid=991, 
> ppid=990, state=SUCCESS; 
> org.apache.hadoop.hbase.master.replication.RefreshPeerProcedure has already 
> been finished, skip force updating.
> 2018-10-16,20:03:03,816 DEBUG [Force-Update-PEWorker-0] 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Procedure pid=992, 
> ppid=990, state=SUCCESS; 
> org.apache.hadoop.hbase.master.replication.RefreshPeerProcedure has already 
> been finished, skip force updating.
> 2018-10-16,20:03:03,816 DEBUG [Force-Update-PEWorker-0] 
> 

[jira] [Commented] (HBASE-21073) "Maintenance mode" master

2018-10-17 Thread Mike Drob (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654513#comment-16654513
 ] 

Mike Drob commented on HBASE-21073:
---

v9: updated based on Stack's feedback on RB.

Going to write up some instructional notes, but not sure where to add them in 
the ref guide. Looked at the troubleshooting section and didn't see a good place 
to slot it in. Any other suggestions?

> "Maintenance mode" master
> -
>
> Key: HBASE-21073
> URL: https://issues.apache.org/jira/browse/HBASE-21073
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2, hbck2, master
>Reporter: stack
>Assignee: Mike Drob
>Priority: Major
> Attachments: HBASE-21073.master.001.patch, 
> HBASE-21073.master.002.patch, HBASE-21073.master.003.patch, 
> HBASE-21073.master.004.patch, HBASE-21073.master.005.patch, 
> HBASE-21073.master.006.patch, HBASE-21073.master.007.patch, 
> HBASE-21073.master.008.patch, HBASE-21073.master.009.patch
>
>
> Make it so we can bring up a Master in "maintenance mode". That is, it parses 
> the master WAL procedures but does not take on RegionServers. It would be in a 
> state where "repair" Procedures could run; e.g. a Procedure that could recover 
> meta by looking for meta WALs, splitting them, dropping recovered.edits, and 
> even making it so meta is readable. See the parent issue for why this is needed 
> (disaster recovery).





[jira] [Commented] (HBASE-21322) Add a scheduleServerCrashProcedure() API to HbckService

2018-10-17 Thread Jingyun Tian (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654508#comment-16654508
 ] 

Jingyun Tian commented on HBASE-21322:
--

{quote}
Do you delete while the Master is up?
{quote}
Yes.
bq. You mean, have Master fail over?

Yes. I restarted the master and no SCP was scheduled for the -splitting WAL 
directories. It seems the Master only checks the procedure logs.

> Add a scheduleServerCrashProcedure() API to HbckService
> ---
>
> Key: HBASE-21322
> URL: https://issues.apache.org/jira/browse/HBASE-21322
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Jingyun Tian
>Assignee: Jingyun Tian
>Priority: Major
> Attachments: Screenshot from 2018-10-17 13-35-58.png, Screenshot from 
> 2018-10-17 13-38-41.png, Screenshot from 2018-10-17 13-47-06.png
>
>
> According to my test, if one RS goes down and then all procedure logs are 
> deleted, no ServerCrashProcedure is scheduled, and restarting the master does 
> not help. Thus we need to schedule a ServerCrashProcedure manually to solve 
> the problem. I plan to add a scheduleServerCrashProcedure() API to 
> HbckService, then add this API to HBCK2.





[jira] [Commented] (HBASE-21269) Forward-port to branch-2 " HBASE-21213 [hbck2] bypass leaves behind state in RegionStates when assign/unassign"

2018-10-17 Thread Jingyun Tian (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654506#comment-16654506
 ] 

Jingyun Tian commented on HBASE-21269:
--

[~stack] Failed UTs are not related to my patch. Could you help commit this 
patch?

> Forward-port to branch-2 " HBASE-21213 [hbck2] bypass leaves behind state 
> in RegionStates when assign/unassign"
> ---
>
> Key: HBASE-21269
> URL: https://issues.apache.org/jira/browse/HBASE-21269
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Reporter: stack
>Assignee: Jingyun Tian
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21269.branch-2.001.patch, 
> HBASE-21269.master.001.patch, HBASE-21269.master.003.patch, 
> HBASE-21269.master.004.patch
>
>
> Much of this patch does not apply to branch-2 and master now that we no longer 
> have AP or UP. Need to figure out whether we need override in branch-2 and 
> master. Let me upload the forward-port done so far. Can finish this as part of 
> the move-to-branch-2.2 exercise. FYI [~Apache9]





[jira] [Commented] (HBASE-21307) [amv2] Deadlock when we move a Region from a not-online RegionServer

2018-10-17 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654500#comment-16654500
 ] 

Duo Zhang commented on HBASE-21307:
---

I was focusing on TRSP recently and I think the fencing for TRSP is fine. But 
for branch-2.1 and branch-2.0 it may be a different story, as the MRP cannot be 
interrupted by SCP but it will schedule UP and AP, so I'm afraid leaving the 
RTP there cannot always work... One simple solution is to just let it go, 
without any fencing, but I'm afraid there will be other problems if we keep 
scheduling MRP and SCP...

Anyway, I do not think there will be a perfect solution for this scenario, as 
in the normal code path we do not code for this scenario... Maybe we need to 
make use of the 'maintenance mode', where we do not load any existing 
procedures and only HBCK2 can schedule new procedures? This may be the safest 
way.
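The deadlock in this issue, where the MRP waits on an SCP while the SCP cannot take the region lock the MRP holds, is a cycle in a wait-for graph. A minimal Java sketch of spotting such a cycle, assuming each waiter blocks on at most one holder; the class, method, and node labels are hypothetical and do not correspond to ProcedureExecutor internals.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public final class WaitForGraph {
  // Follows "waiter -> holder" edges from start; returns true if the chain
  // runs into a node it has already visited, i.e. start is blocked by a cycle.
  static boolean hasCycleFrom(Map<String, String> waitsFor, String start) {
    Set<String> seen = new HashSet<>();
    String current = start;
    while (current != null) {
      if (!seen.add(current)) {
        return true;
      }
      current = waitsFor.get(current);
    }
    return false;
  }
}
```

Breaking any one edge (for example, by bypassing a procedure) removes the cycle in this model, but as the issue notes, the real SCP can still fail afterwards because the region remains 'owned' by the MRP.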

> [amv2] Deadlock when we move a Region from a not-online RegionServer
> 
>
> Key: HBASE-21307
> URL: https://issues.apache.org/jira/browse/HBASE-21307
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Affects Versions: 2.1.1
>Reporter: stack
>Assignee: stack
>Priority: Critical
> Fix For: 2.1.1
>
>
> Perhaps this doesn't happen in branch-2, but it's a problem in branch-2.1.
> At a high level: we go to move a region; its unassign subprocedure fails its 
> dispatch because the server is not online, so it queues an SCP and waits on it 
> to break the RPC. The SCP can't run, though, because the MRP holds the lock on 
> the region.
> I can bypass the MRP, but then the SCP fails because the Region is 'owned' by 
> the MRP. See below:
> {code}
> 2018-10-12 16:29:53,423 INFO 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Begin bypass 
> pid=411982, ppid=411981, state=RUNNABLE:REGION_TRANSITION_DISPATCH, 
> locked=true; UnassignProcedure 
> table=IntegrationTestBigLinkedList_20180709093726, 
> region=f5f9ff1e4b0f2d9555dabfcca71df568, override=true, 
> server=va1002.halxg.cloudera.com,22101,1539368318649 with lockWait=0, 
> override=true, recursive=true
> 2018-10-12 16:29:53,424 INFO 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Bypassing pid=411982, 
> ppid=411981, state=RUNNABLE:REGION_TRANSITION_DISPATCH, locked=true; 
> UnassignProcedure table=IntegrationTestBigLinkedList_20180709093726, 
> region=f5f9ff1e4b0f2d9555dabfcca71df568, override=true, 
> server=va1002.halxg.cloudera.com,22101,1539368318649
> 2018-10-12 16:29:53,712 INFO 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Bypassing pid=411981, 
> state=WAITING:MOVE_REGION_ASSIGN, locked=true; MoveRegionProcedure 
> hri=f5f9ff1e4b0f2d9555dabfcca71df568, 
> source=va1002.halxg.cloudera.com,22101,1539368318649, 
> destination=vd1021.halxg.cloudera.com,22101,1539368317897
> 2018-10-12 16:29:53,838 INFO 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Bypassing pid=411982, 
> ppid=411981, state=RUNNABLE:REGION_TRANSITION_DISPATCH, locked=true, 
> bypass=LOG-REDACTED UnassignProcedure 
> table=IntegrationTestBigLinkedList_20180709093726, 
> region=f5f9ff1e4b0f2d9555dabfcca71df568, override=true, 
> server=va1002.halxg.cloudera.com,22101,1539368318649 and its ancestors 
> successfully, adding to queue
> 2018-10-12 16:29:53,839 INFO org.apache.hadoop.hbase.procedure2.Procedure: 
> pid=411982, ppid=411981, state=RUNNABLE:REGION_TRANSITION_DISPATCH, 
> locked=true, bypass=LOG-REDACTED UnassignProcedure 
> table=IntegrationTestBigLinkedList_20180709093726, 
> region=f5f9ff1e4b0f2d9555dabfcca71df568, override=true, 
> server=va1002.halxg.cloudera.com,22101,1539368318649 bypassed, returning null 
> to finish it
> 2018-10-12 16:29:53,954 INFO 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished subprocedure 
> pid=411982, resume processing parent pid=411981, 
> state=RUNNABLE:MOVE_REGION_ASSIGN, locked=true, bypass=LOG-REDACTED 
> MoveRegionProcedure hri=f5f9ff1e4b0f2d9555dabfcca71df568, 
> source=va1002.halxg.cloudera.com,22101,1539368318649, 
> destination=vd1021.halxg.cloudera.com,22101,1539368317897
> 2018-10-12 16:29:53,954 INFO org.apache.hadoop.hbase.procedure2.Procedure: 
> pid=411981, state=RUNNABLE:MOVE_REGION_ASSIGN, locked=true, 
> bypass=LOG-REDACTED MoveRegionProcedure hri=f5f9ff1e4b0f2d9555dabfcca71df568, 
> source=va1002.halxg.cloudera.com,22101,1539368318649, 
> destination=vd1021.halxg.cloudera.com,22101,1539368317897 bypassed, returning 
> null to finish it
> 2018-10-12 16:29:53,956 INFO 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=411982, 
> ppid=411981, state=SUCCESS, bypass=LOG-REDACTED UnassignProcedure 
> table=IntegrationTestBigLinkedList_20180709093726, 
> region=f5f9ff1e4b0f2d9555dabfcca71df568, override=true, 
> 

[jira] [Updated] (HBASE-21073) "Maintenance mode" master

2018-10-17 Thread Mike Drob (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Drob updated HBASE-21073:
--
Attachment: HBASE-21073.master.009.patch

> "Maintenance mode" master
> -
>
> Key: HBASE-21073
> URL: https://issues.apache.org/jira/browse/HBASE-21073
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2, hbck2, master
>Reporter: stack
>Assignee: Mike Drob
>Priority: Major
> Attachments: HBASE-21073.master.001.patch, 
> HBASE-21073.master.002.patch, HBASE-21073.master.003.patch, 
> HBASE-21073.master.004.patch, HBASE-21073.master.005.patch, 
> HBASE-21073.master.006.patch, HBASE-21073.master.007.patch, 
> HBASE-21073.master.008.patch, HBASE-21073.master.009.patch
>
>
> Make it so we can bring up a Master in "maintenance mode". That is, it parses 
> the master WAL procedures but does not take on RegionServers. It would be in a 
> state where "repair" Procedures could run; e.g. a Procedure that could recover 
> meta by looking for meta WALs, splitting them, dropping recovered.edits, and 
> even making it so meta is readable. See the parent issue for why this is needed 
> (disaster recovery).





[jira] [Commented] (HBASE-21246) Introduce WALIdentity interface

2018-10-17 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654491#comment-16654491
 ] 

Duo Zhang commented on HBASE-21246:
---

So maybe another question is: where do we need the WALIdentity outside the 
WAL system? For regionserver, it only needs the WAL instance? And for replaying 
recovered edits for a region, I think our decision is to provide a method in the 
WAL system API to get a stream of the recovered edits for a region? FWIW, I do 
not think the upper layer needs to know whether we have multiple wal files, 
multiple topics, or other such things...

> Introduce WALIdentity interface
> ---
>
> Key: HBASE-21246
> URL: https://issues.apache.org/jira/browse/HBASE-21246
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Fix For: HBASE-20952
>
> Attachments: 21246.003.patch, 21246.HBASE-20952.001.patch, 
> 21246.HBASE-20952.002.patch, 21246.HBASE-20952.004.patch, 
> 21246.HBASE-20952.005.patch
>
>
> We are introducing a WALIdentity interface so that the WAL representation can 
> be decoupled from the distributed filesystem.
> The interface provides a getName method whose return value can represent a 
> filename in a distributed filesystem environment, or the name of the stream 
> when the WAL is backed by a log stream.





[jira] [Commented] (HBASE-21246) Introduce WALIdentity interface

2018-10-17 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654485#comment-16654485
 ] 

Duo Zhang commented on HBASE-21246:
---

So the point here is that, FWIW, in the replication tracking system we need to 
serialize the WALIdentity? IIRC, for replication, we plan to let the WAL system 
provide a subscribe/consume style API, and maybe we could abstract concepts 
other than the wal file. Maybe for the WALs of a region server we can have 
different queues, and for each queue we store the replicated offset. The 
identifier for the queue could simply be a String?
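A minimal sketch of the idea above: replication progress tracked per queue, keyed by a plain String identifier. The class and method names, the "queueId=offset" serialized form, and the -1 sentinel are all assumptions made for illustration, not an existing HBase API.

```java
import java.util.HashMap;
import java.util.Map;

public final class ReplicationOffsets {
  // Queue id (a plain String) -> highest replicated offset recorded so far.
  private final Map<String, Long> offsets = new HashMap<>();

  // Records progress; the stored offset for a queue never moves backwards.
  void markReplicated(String queueId, long offset) {
    offsets.merge(queueId, offset, Long::max);
  }

  // Returns -1 when no progress has been recorded (a convention of this sketch).
  long replicatedOffset(String queueId) {
    return offsets.getOrDefault(queueId, -1L);
  }

  // A String id serializes trivially, here as "queueId=offset".
  String serialize(String queueId) {
    return queueId + "=" + replicatedOffset(queueId);
  }
}
```

Because the key is only a String, the tracker never needs to know whether a queue is backed by WAL files or by a log stream.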

> Introduce WALIdentity interface
> ---
>
> Key: HBASE-21246
> URL: https://issues.apache.org/jira/browse/HBASE-21246
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Fix For: HBASE-20952
>
> Attachments: 21246.003.patch, 21246.HBASE-20952.001.patch, 
> 21246.HBASE-20952.002.patch, 21246.HBASE-20952.004.patch, 
> 21246.HBASE-20952.005.patch
>
>
> We are introducing a WALIdentity interface so that the WAL representation can 
> be decoupled from the distributed filesystem.
> The interface provides a getName method whose return value can represent a 
> filename in a distributed filesystem environment, or the name of the stream 
> when the WAL is backed by a log stream.





[jira] [Commented] (HBASE-21298) Improved scheduling ("Muzzled HBase")

2018-10-17 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654483#comment-16654483
 ] 

Andrew Purtell commented on HBASE-21298:


Looks interesting, but complex reading. I've only skimmed it at this point.

The modeling tool has not been released yet, but the authors claim it will be 
posted soon at http://research.cs.wisc.edu/adsl/Software/TAM. Should it 
become available, I would be interested in trying to reproduce these results, 
with a focus on branch-1, though.

No modified HBase source has been made available to my knowledge.

> Improved scheduling ("Muzzled HBase")
> -
>
> Key: HBASE-21298
> URL: https://issues.apache.org/jira/browse/HBASE-21298
> Project: HBase
>  Issue Type: Improvement
>Reporter: stack
>Priority: Major
>
> Interesting paper on thread scheduling that uses HBase for illustration. After 
> a study using interesting thread-analysis tooling, the authors were able to 
> improve HBase throughput ("Muzzled-HBase").
> https://www.usenix.org/system/files/osdi18-yang.pdf
> (Via [~ebort...@oath.com])





[jira] [Commented] (HBASE-21198) Exclude dependency on net.minidev:json-smart

2018-10-17 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654473#comment-16654473
 ] 

Hudson commented on HBASE-21198:


Results for branch branch-2.1
[build #480 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/480/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/480//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/480//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/480//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Exclude dependency on net.minidev:json-smart
> 
>
> Key: HBASE-21198
> URL: https://issues.apache.org/jira/browse/HBASE-21198
> Project: HBase
>  Issue Type: Task
>Reporter: Ted Yu
>Assignee: Artem Ervits
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21198.v01.patch, HBASE-21198.v01.patch
>
>
> From 
> https://builds.apache.org/job/PreCommit-HBASE-Build/14414/artifact/patchprocess/patch-javac-3.0.0.txt
>  :
> {code}
> [ERROR] Failed to execute goal on project hbase-common: Could not resolve 
> dependencies for project org.apache.hbase:hbase-common:jar:3.0.0-SNAPSHOT: 
> Failed to collect dependencies at org.apache.hadoop:hadoop-common:jar:3.0.0 
> -> org.apache.hadoop:hadoop-auth:jar:3.0.0 -> 
> com.nimbusds:nimbus-jose-jwt:jar:4.41.1 -> 
> net.minidev:json-smart:jar:2.3-SNAPSHOT: Failed to read artifact descriptor 
> for net.minidev:json-smart:jar:2.3-SNAPSHOT: Could not transfer artifact 
> net.minidev:json-smart:pom:2.3-SNAPSHOT from/to dynamodb-local-oregon 
> (https://s3-us-west-2.amazonaws.com/dynamodb-local/release): Access denied 
> to: 
> https://s3-us-west-2.amazonaws.com/dynamodb-local/release/net/minidev/json-smart/2.3-SNAPSHOT/json-smart-2.3-SNAPSHOT.pom
>  , ReasonPhrase:Forbidden. -> [Help 1]
> {code}
> We should exclude the dependency on net.minidev:json-smart.
> hbase-common/bin/pom.xml has done so.
> The other pom.xml files should do the same.





[jira] [Commented] (HBASE-21311) Split TestRestoreSnapshotFromClient

2018-10-17 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654472#comment-16654472
 ] 

Hudson commented on HBASE-21311:


Results for branch branch-2.1
[build #480 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/480/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/480//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/480//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/480//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Split TestRestoreSnapshotFromClient
> ---
>
> Key: HBASE-21311
> URL: https://issues.apache.org/jira/browse/HBASE-21311
> Project: HBase
>  Issue Type: Sub-task
>  Components: test
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21311.patch
>
>






[jira] [Commented] (HBASE-21327) Fix minor logging issue where we don't report servername if no associated SCP

2018-10-17 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654470#comment-16654470
 ] 

Hudson commented on HBASE-21327:


Results for branch branch-2.1
[build #480 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/480/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/480//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/480//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/480//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Fix minor logging issue where we don't report servername if no associated SCP
> -
>
> Key: HBASE-21327
> URL: https://issues.apache.org/jira/browse/HBASE-21327
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Reporter: stack
>Assignee: stack
>Priority: Trivial
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21327.branch-2.1.001.patch
>
>
> When reporting whether there is an associated SCP, this is what we log:
> {code}
> 2018-10-16 14:05:57,607 ERROR 
> org.apache.hadoop.hbase.master.RegionServerTracker: false has no matching 
> ServerCrashProcedure
> 2018-10-16 14:05:57,607 ERROR 
> org.apache.hadoop.hbase.master.RegionServerTracker: false has no matching 
> ServerCrashProcedure
> 2018-10-16 14:05:57,607 ERROR 
> org.apache.hadoop.hbase.master.RegionServerTracker: false has no matching 
> ServerCrashProcedure
> 2018-10-16 14:05:57,607 ERROR 
> org.apache.hadoop.hbase.master.RegionServerTracker: false has no matching 
> ServerCrashProcedure
> 2018-10-16 14:05:57,608 ERROR 
> org.apache.hadoop.hbase.master.RegionServerTracker: false has no matching 
> ServerCrashProcedure
> 2018-10-16 14:05:57,608 ERROR 
> org.apache.hadoop.hbase.master.RegionServerTracker: true has no matching 
> ServerCrashProcedure
> 2018-10-16 14:05:57,608 ERROR 
> org.apache.hadoop.hbase.master.RegionServerTracker: false has no matching 
> ServerCrashProcedure
> 2018-10-16 14:05:57,608 ERROR 
> org.apache.hadoop.hbase.master.RegionServerTracker: false has no matching 
> ServerCrashProcedure
> 2018-10-16 14:05:57,608 ERROR 
> org.apache.hadoop.hbase.master.RegionServerTracker: false has no matching 
> ServerCrashProcedure
> 2018-10-16 14:05:57,608 ERROR 
> org.apache.hadoop.hbase.master.RegionServerTracker: false has no matching 
> ServerCrashProcedure
> 2018-10-16 14:05:57,608 ERROR 
> org.apache.hadoop.hbase.master.RegionServerTracker: false has no matching 
> ServerCrashProcedure
> 2018-10-16 14:05:57,608 ERROR 
> org.apache.hadoop.hbase.master.RegionServerTracker: false has no matching 
> ServerCrashProcedure
> 2018-10-16 14:05:57,608 ERROR 
> org.apache.hadoop.hbase.master.RegionServerTracker: false has no matching 
> ServerCrashProcedure
> 2018-10-16 14:05:57,608 ERROR 
> org.apache.hadoop.hbase.master.RegionServerTracker: false has no matching 
> ServerCrashProcedure
> 2018-10-16 14:05:57,608 ERROR 
> org.apache.hadoop.hbase.master.RegionServerTracker: false has no matching 
> ServerCrashProcedure
> {code}
> i.e. we are supposed to log the server name, but we don't. Minor issue.
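The log above prints a boolean where the server name belongs. The following is a minimal, hypothetical Java sketch of that class of bug and its fix; the class `MissingScpLogSketch`, its methods, and the sample server names are invented for illustration and are not taken from `RegionServerTracker` or from the HBASE-21327 patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the actual RegionServerTracker code.
final class MissingScpLogSketch {
  private static final String TEMPLATE = "%s has no matching ServerCrashProcedure";

  // Buggy pattern: a boolean flag fills the placeholder, so the log reads
  // "false has no matching ServerCrashProcedure" and never names the server.
  static String buggyMessage(String serverName, boolean someFlag) {
    return String.format(TEMPLATE, someFlag);
  }

  // Fixed pattern: the server name itself fills the placeholder.
  static String fixedMessage(String serverName) {
    return String.format(TEMPLATE, serverName);
  }

  // Builds one message per server that has no matching SCP (hypothetical inputs).
  static List<String> reportMissing(List<String> deadServers, List<String> serversWithScp) {
    List<String> messages = new ArrayList<>();
    for (String server : deadServers) {
      if (!serversWithScp.contains(server)) {
        messages.add(fixedMessage(server));
      }
    }
    return messages;
  }

  public static void main(String[] args) {
    // Invented server names in HBase's "host,port,startcode" form.
    List<String> dead = Arrays.asList(
        "rs1.example.com,16020,1539700000000",
        "rs2.example.com,16020,1539700000001");
    List<String> withScp = Arrays.asList("rs1.example.com,16020,1539700000000");
    System.out.println(buggyMessage(dead.get(1), false));
    reportMissing(dead, withScp).forEach(System.out::println);
  }
}
```

The point of the sketch is only that the value bound to the placeholder must be the server identifier; how the real tracker decides which servers lack an SCP is not shown here.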





[jira] [Commented] (HBASE-21310) Split TestCloneSnapshotFromClient

2018-10-17 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654471#comment-16654471
 ] 

Hudson commented on HBASE-21310:


Results for branch branch-2.1
[build #480 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/480/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/480//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/480//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/480//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Split TestCloneSnapshotFromClient
> -
>
> Key: HBASE-21310
> URL: https://issues.apache.org/jira/browse/HBASE-21310
> Project: HBase
>  Issue Type: Sub-task
>  Components: test
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21310-branch-2.1-addendum.patch, 
> HBASE-21310-v1.patch, HBASE-21310.patch
>
>






[jira] [Updated] (HBASE-21279) Split TestAdminShell into several tests

2018-10-17 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21279:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Thanks for the patch, Artem.

> Split TestAdminShell into several tests
> ---
>
> Key: HBASE-21279
> URL: https://issues.apache.org/jira/browse/HBASE-21279
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Artem Ervits
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-21279.v01.patch, HBASE-21279.v02.patch, 
> HBASE-21279.v03.patch, HBASE-21279.v04.patch, HBASE-21279.v05.patch, 
> testAdminShell-output.tar.gz
>
>
> On the flaky-test dashboard, TestAdminShell often timed out 
> (https://builds.apache.org/job/HBASE-Find-Flaky-Tests/job/branch-2/lastSuccessfulBuild/artifact/dashboard.html).
> I ran the test on Linux with an SSD and reproduced the timeout (see the attached 
> test output).
> {code}
> 2018-10-08 02:36:09,146 DEBUG [main] hbase.HBaseTestingUtility(351): Setting 
> hbase.rootdir to 
> /mnt/disk2/a/2-hbase/hbase-shell/target/test-data/a103d8e4-695c-a5a9-6690-1ef2580050f9
> ...
> 2018-10-08 02:49:09,093 DEBUG 
> [RpcServer.default.FPBQ.Fifo.handler=27,queue=0,port=7] 
> master.MasterRpcServices(1171): Checking to see if procedure is done pid=871
> Took 0.7262 seconds2018-10-08 02:49:09,324 DEBUG [PEWorker-1] 
> util.FSTableDescriptors(684): Wrote into 
> hdfs://localhost:43859/user/hbase/test-data/cefc73d9-cc37-d2a6-b92b-   
> d935316c9241/.tmp/data/default/hbase_shell_tests_table/.tabledesc/.tableinfo.01
> 2018-10-08 02:49:09,328 INFO  
> [RegionOpenAndInitThread-hbase_shell_tests_table-1] 
> regionserver.HRegion(7004): creating HRegion hbase_shell_tests_table HTD ==   
>   'hbase_shell_tests_table', {NAME => 'x', VERSIONS => '5', 
> EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', 
> KEEP_DELETED_CELLS => 'FALSE',   CACHE_DATA_ON_WRITE => 
> 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => 
> '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 
> 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', 
> PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 
> 'true', BLOCKSIZE => '65536'},  {NAME => 'y', VERSIONS => '1', 
> EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', 
> KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false',  
> DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', 
> REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 
> 'false', IN_MEMORY => 'false',  CACHE_BLOOMS_ON_WRITE => 'false', 
> PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 
> 'true', BLOCKSIZE => '65536'} RootDir = hdfs://localhost:43859/
> user/hbase/test-data/cefc73d9-cc37-d2a6-b92b-d935316c9241/.tmp Table name == 
> hbase_shell_tests_table
> ^[[38;5;226mE^[[0m
> ===
> Error: ^[[48;5;16;38;5;226;1mtest_Get_simple_status(Hbase::StatusTest)^[[0m: 
> Java::JavaIo::InterruptedIOException: Interrupt while waiting on Operation: 
> CREATE, Table Name:  default:hbase_shell_tests_table, procId: 871
> 2018-10-08 02:49:09,361 INFO  [Block report processor] 
> blockmanagement.BlockManager(2645): BLOCK* addStoredBlock: blockMap updated: 
> 127.0.0.1:41338 is added to   
> blk_1073742193_1369{UCState=COMMITTED, truncateBlock=null, 
> primaryNodeIndex=-1, 
> replicas=[ReplicaUC[[DISK]DS-ecc89143-e0a5-4a1c-b552-120be2561334:NORMAL:127.0.0.1:
>41338|RBW]]} size 58
> > TEST TIMED OUT. PRINTING THREAD DUMP. <
> {code}
> We can see that procedure #871 wasn't stuck; the timeout kicked in and 
> stopped the test.
> We should split the current test into two (or more) test files (each with a 
> corresponding .rb file) so that the execution time consistently stays under 
> the limit.





[jira] [Commented] (HBASE-21198) Exclude dependency on net.minidev:json-smart

2018-10-17 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654362#comment-16654362
 ] 

Hudson commented on HBASE-21198:


Results for branch branch-2
[build #1404 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1404/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1404//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1404//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1404//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Exclude dependency on net.minidev:json-smart
> 
>
> Key: HBASE-21198
> URL: https://issues.apache.org/jira/browse/HBASE-21198
> Project: HBase
>  Issue Type: Task
>Reporter: Ted Yu
>Assignee: Artem Ervits
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21198.v01.patch, HBASE-21198.v01.patch
>
>
> From 
> https://builds.apache.org/job/PreCommit-HBASE-Build/14414/artifact/patchprocess/patch-javac-3.0.0.txt
>  :
> {code}
> [ERROR] Failed to execute goal on project hbase-common: Could not resolve 
> dependencies for project org.apache.hbase:hbase-common:jar:3.0.0-SNAPSHOT: 
> Failed to collect dependencies at org.apache.hadoop:hadoop-common:jar:3.0.0 
> -> org.apache.hadoop:hadoop-auth:jar:3.0.0 -> 
> com.nimbusds:nimbus-jose-jwt:jar:4.41.1 -> 
> net.minidev:json-smart:jar:2.3-SNAPSHOT: Failed to read artifact descriptor 
> for net.minidev:json-smart:jar:2.3-SNAPSHOT: Could not transfer artifact 
> net.minidev:json-smart:pom:2.3-SNAPSHOT from/to dynamodb-local-oregon 
> (https://s3-us-west-2.amazonaws.com/dynamodb-local/release): Access denied 
> to: 
> https://s3-us-west-2.amazonaws.com/dynamodb-local/release/net/minidev/json-smart/2.3-SNAPSHOT/json-smart-2.3-SNAPSHOT.pom
>  , ReasonPhrase:Forbidden. -> [Help 1]
> {code}
> We should exclude the dependency on net.minidev:json-smart.
> hbase-common/bin/pom.xml has done so.
> The other pom.xml files should do the same.





[jira] [Commented] (HBASE-21327) Fix minor logging issue where we don't report servername if no associated SCP

2018-10-17 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654361#comment-16654361
 ] 

Hudson commented on HBASE-21327:


Results for branch branch-2
[build #1404 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1404/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1404//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1404//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1404//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Fix minor logging issue where we don't report servername if no associated SCP
> -
>
> Key: HBASE-21327
> URL: https://issues.apache.org/jira/browse/HBASE-21327
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Reporter: stack
>Assignee: stack
>Priority: Trivial
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21327.branch-2.1.001.patch
>
>
> When reporting whether there is an associated SCP, this is what we log:
> {code}
> 2018-10-16 14:05:57,607 ERROR 
> org.apache.hadoop.hbase.master.RegionServerTracker: false has no matching 
> ServerCrashProcedure
> 2018-10-16 14:05:57,607 ERROR 
> org.apache.hadoop.hbase.master.RegionServerTracker: false has no matching 
> ServerCrashProcedure
> 2018-10-16 14:05:57,607 ERROR 
> org.apache.hadoop.hbase.master.RegionServerTracker: false has no matching 
> ServerCrashProcedure
> 2018-10-16 14:05:57,607 ERROR 
> org.apache.hadoop.hbase.master.RegionServerTracker: false has no matching 
> ServerCrashProcedure
> 2018-10-16 14:05:57,608 ERROR 
> org.apache.hadoop.hbase.master.RegionServerTracker: false has no matching 
> ServerCrashProcedure
> 2018-10-16 14:05:57,608 ERROR 
> org.apache.hadoop.hbase.master.RegionServerTracker: true has no matching 
> ServerCrashProcedure
> 2018-10-16 14:05:57,608 ERROR 
> org.apache.hadoop.hbase.master.RegionServerTracker: false has no matching 
> ServerCrashProcedure
> 2018-10-16 14:05:57,608 ERROR 
> org.apache.hadoop.hbase.master.RegionServerTracker: false has no matching 
> ServerCrashProcedure
> 2018-10-16 14:05:57,608 ERROR 
> org.apache.hadoop.hbase.master.RegionServerTracker: false has no matching 
> ServerCrashProcedure
> 2018-10-16 14:05:57,608 ERROR 
> org.apache.hadoop.hbase.master.RegionServerTracker: false has no matching 
> ServerCrashProcedure
> 2018-10-16 14:05:57,608 ERROR 
> org.apache.hadoop.hbase.master.RegionServerTracker: false has no matching 
> ServerCrashProcedure
> 2018-10-16 14:05:57,608 ERROR 
> org.apache.hadoop.hbase.master.RegionServerTracker: false has no matching 
> ServerCrashProcedure
> 2018-10-16 14:05:57,608 ERROR 
> org.apache.hadoop.hbase.master.RegionServerTracker: false has no matching 
> ServerCrashProcedure
> 2018-10-16 14:05:57,608 ERROR 
> org.apache.hadoop.hbase.master.RegionServerTracker: false has no matching 
> ServerCrashProcedure
> 2018-10-16 14:05:57,608 ERROR 
> org.apache.hadoop.hbase.master.RegionServerTracker: false has no matching 
> ServerCrashProcedure
> {code}
> i.e. we are supposed to log the server name, but we don't. Minor issue.





[jira] [Commented] (HBASE-21310) Split TestCloneSnapshotFromClient

2018-10-17 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654340#comment-16654340
 ] 

Hudson commented on HBASE-21310:


Results for branch branch-2.0
[build #965 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/965/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/965//General_Nightly_Build_Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/965//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/965//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


> Split TestCloneSnapshotFromClient
> -
>
> Key: HBASE-21310
> URL: https://issues.apache.org/jira/browse/HBASE-21310
> Project: HBase
>  Issue Type: Sub-task
>  Components: test
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21310-branch-2.1-addendum.patch, 
> HBASE-21310-v1.patch, HBASE-21310.patch
>
>






[jira] [Commented] (HBASE-21327) Fix minor logging issue where we don't report servername if no associated SCP

2018-10-17 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654339#comment-16654339
 ] 

Hudson commented on HBASE-21327:


Results for branch branch-2.0
[build #965 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/965/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/965//General_Nightly_Build_Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/965//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/965//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


> Fix minor logging issue where we don't report servername if no associated SCP
> -
>
> Key: HBASE-21327
> URL: https://issues.apache.org/jira/browse/HBASE-21327
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Reporter: stack
>Assignee: stack
>Priority: Trivial
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21327.branch-2.1.001.patch
>
>
> When reporting whether there is an associated SCP, this is what we log:
> {code}
> 2018-10-16 14:05:57,607 ERROR 
> org.apache.hadoop.hbase.master.RegionServerTracker: false has no matching 
> ServerCrashProcedure
> 2018-10-16 14:05:57,607 ERROR 
> org.apache.hadoop.hbase.master.RegionServerTracker: false has no matching 
> ServerCrashProcedure
> 2018-10-16 14:05:57,607 ERROR 
> org.apache.hadoop.hbase.master.RegionServerTracker: false has no matching 
> ServerCrashProcedure
> 2018-10-16 14:05:57,607 ERROR 
> org.apache.hadoop.hbase.master.RegionServerTracker: false has no matching 
> ServerCrashProcedure
> 2018-10-16 14:05:57,608 ERROR 
> org.apache.hadoop.hbase.master.RegionServerTracker: false has no matching 
> ServerCrashProcedure
> 2018-10-16 14:05:57,608 ERROR 
> org.apache.hadoop.hbase.master.RegionServerTracker: true has no matching 
> ServerCrashProcedure
> 2018-10-16 14:05:57,608 ERROR 
> org.apache.hadoop.hbase.master.RegionServerTracker: false has no matching 
> ServerCrashProcedure
> 2018-10-16 14:05:57,608 ERROR 
> org.apache.hadoop.hbase.master.RegionServerTracker: false has no matching 
> ServerCrashProcedure
> 2018-10-16 14:05:57,608 ERROR 
> org.apache.hadoop.hbase.master.RegionServerTracker: false has no matching 
> ServerCrashProcedure
> 2018-10-16 14:05:57,608 ERROR 
> org.apache.hadoop.hbase.master.RegionServerTracker: false has no matching 
> ServerCrashProcedure
> 2018-10-16 14:05:57,608 ERROR 
> org.apache.hadoop.hbase.master.RegionServerTracker: false has no matching 
> ServerCrashProcedure
> 2018-10-16 14:05:57,608 ERROR 
> org.apache.hadoop.hbase.master.RegionServerTracker: false has no matching 
> ServerCrashProcedure
> 2018-10-16 14:05:57,608 ERROR 
> org.apache.hadoop.hbase.master.RegionServerTracker: false has no matching 
> ServerCrashProcedure
> 2018-10-16 14:05:57,608 ERROR 
> org.apache.hadoop.hbase.master.RegionServerTracker: false has no matching 
> ServerCrashProcedure
> 2018-10-16 14:05:57,608 ERROR 
> org.apache.hadoop.hbase.master.RegionServerTracker: false has no matching 
> ServerCrashProcedure
> {code}
> i.e. we are supposed to log the server name, but we don't. Minor issue.





[jira] [Commented] (HBASE-21311) Split TestRestoreSnapshotFromClient

2018-10-17 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654341#comment-16654341
 ] 

Hudson commented on HBASE-21311:


Results for branch branch-2.0
[build #965 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/965/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/965//General_Nightly_Build_Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/965//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/965//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


> Split TestRestoreSnapshotFromClient
> ---
>
> Key: HBASE-21311
> URL: https://issues.apache.org/jira/browse/HBASE-21311
> Project: HBase
>  Issue Type: Sub-task
>  Components: test
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21311.patch
>
>






[jira] [Updated] (HBASE-21281) Update bouncycastle dependency.

2018-10-17 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21281:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Pushed the addendum.
Thanks, Josh, for the review.

> Update bouncycastle dependency.
> ---
>
> Key: HBASE-21281
> URL: https://issues.apache.org/jira/browse/HBASE-21281
> Project: HBase
>  Issue Type: Task
>  Components: dependencies, test
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: 21281.addendum.patch, HBASE-21281.001.branch-2.0.patch
>
>
> Looks like we still depend on bcprov-jdk16 for some x509 certificate 
> generation in our tests. Bouncycastle has moved beyond this in 1.47, changing 
> the artifact names.
> [http://www.bouncycastle.org/wiki/display/JA1/Porting+from+earlier+BC+releases+to+1.47+and+later]
> There are some API changes too, but it looks like we don't use any of these.
> It seems like we also have vestiges in the POMs from when we were depending 
> on a specific BC version that came in from Hadoop. We now have a 
> KeyStoreTestUtil class in HBase, which makes me think we can also clean up 
> some dependencies.





[jira] [Commented] (HBASE-21281) Update bouncycastle dependency.

2018-10-17 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654332#comment-16654332
 ] 

Josh Elser commented on HBASE-21281:


+1 on the addendum. I verified that hbase-server now has an explicit test 
dependency on bcprov-jdk15on. Do you want to push it, [~yuzhih...@gmail.com], 
or should I?

> Update bouncycastle dependency.
> ---
>
> Key: HBASE-21281
> URL: https://issues.apache.org/jira/browse/HBASE-21281
> Project: HBase
>  Issue Type: Task
>  Components: dependencies, test
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: 21281.addendum.patch, HBASE-21281.001.branch-2.0.patch
>
>
> Looks like we still depend on bcprov-jdk16 for some x509 certificate 
> generation in our tests. Bouncycastle has moved beyond this in 1.47, changing 
> the artifact names.
> [http://www.bouncycastle.org/wiki/display/JA1/Porting+from+earlier+BC+releases+to+1.47+and+later]
> There are some API changes too, but it looks like we don't use any of these.
> It seems like we also have vestiges in the POMs from when we were depending 
> on a specific BC version that came in from Hadoop. We now have a 
> KeyStoreTestUtil class in HBase, which makes me think we can also clean up 
> some dependencies.





[jira] [Commented] (HBASE-21281) Update bouncycastle dependency.

2018-10-17 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654330#comment-16654330
 ] 

Ted Yu commented on HBASE-21281:


Thanks for checking, Josh.

Will integrate the addendum later today.

> Update bouncycastle dependency.
> ---
>
> Key: HBASE-21281
> URL: https://issues.apache.org/jira/browse/HBASE-21281
> Project: HBase
>  Issue Type: Task
>  Components: dependencies, test
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: 21281.addendum.patch, HBASE-21281.001.branch-2.0.patch
>
>
> Looks like we still depend on bcprov-jdk16 for some x509 certificate 
> generation in our tests. Bouncycastle has moved beyond this in 1.47, changing 
> the artifact names.
> [http://www.bouncycastle.org/wiki/display/JA1/Porting+from+earlier+BC+releases+to+1.47+and+later]
> There are some API changes too, but it looks like we don't use any of these.
> It seems like we also have vestiges in the POMs from when we were depending 
> on a specific BC version that came in from Hadoop. We now have a 
> KeyStoreTestUtil class in HBase, which makes me think we can also clean up 
> some dependencies.





[jira] [Commented] (HBASE-21281) Update bouncycastle dependency.

2018-10-17 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654325#comment-16654325
 ] 

Josh Elser commented on HBASE-21281:


Had to go digging into this because your explanation didn't actually tell me 
what the cause was, [~yuzhih...@gmail.com]. The root cause is that, with *both* 
Hadoop 2 and Hadoop 3, hbase-server doesn't explicitly depend on bouncycastle 
as it should.

Coincidentally, on Hadoop 2 this doesn't cause an error because we happen to 
get the dependency transitively via kerb-client:
{noformat}
[INFO] +- org.apache.kerby:kerb-client:jar:1.0.1:test
[INFO] |  +- org.apache.kerby:kerby-config:jar:1.0.1:test
[INFO] |  +- org.apache.kerby:kerb-core:jar:1.0.1:test
[INFO] |  |  \- org.apache.kerby:kerby-pkix:jar:1.0.1:test
[INFO] |  |     +- org.apache.kerby:kerby-asn1:jar:1.0.1:test
[INFO] |  |     \- org.apache.kerby:kerby-util:jar:1.0.1:test
[INFO] |  +- org.apache.kerby:kerb-common:jar:1.0.1:test
[INFO] |  |  \- org.apache.kerby:kerb-crypto:jar:1.0.1:test
[INFO] |  +- org.apache.kerby:kerb-util:jar:1.0.1:test
[INFO] |  \- org.apache.kerby:token-provider:jar:1.0.1:test
[INFO] |     \- com.nimbusds:nimbus-jose-jwt:jar:3.10:test
[INFO] |        +- net.jcip:jcip-annotations:jar:1.0:test
[INFO] |        +- net.minidev:json-smart:jar:1.3.1:test
[INFO] |        \- org.bouncycastle:bcprov-jdk15on:jar:1.60:test
{noformat}
With Hadoop 3, this dependency is missing:
{noformat}
[INFO] +- org.apache.kerby:kerb-client:jar:1.0.1:test
[INFO] |  +- org.apache.kerby:kerby-config:jar:1.0.1:test
[INFO] |  +- org.apache.kerby:kerb-core:jar:1.0.1:test
[INFO] |  |  \- org.apache.kerby:kerby-pkix:jar:1.0.1:test
[INFO] |  |     +- org.apache.kerby:kerby-asn1:jar:1.0.1:test
[INFO] |  |     \- org.apache.kerby:kerby-util:jar:1.0.1:test
[INFO] |  +- org.apache.kerby:kerb-common:jar:1.0.1:test
[INFO] |  |  \- org.apache.kerby:kerb-crypto:jar:1.0.1:test
[INFO] |  +- org.apache.kerby:kerb-util:jar:1.0.1:test
[INFO] |  \- org.apache.kerby:token-provider:jar:1.0.1:test
{noformat}
All that said, the addendum (explicitly adding the test-dependency) is the 
right one, but the path to that addendum was unnecessarily muddy.
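Because bcprov-jdk15on only reaches the Hadoop 2 test classpath transitively, its absence under Hadoop 3 shows up only when a test needs the classes. A small, hypothetical Java check like the one below can confirm whether BouncyCastle's provider class is visible on a given test classpath. `ClasspathCheck` and `isOnClasspath` are illustrative names and are not part of HBase or of the addendum; `org.bouncycastle.jce.provider.BouncyCastleProvider` is the provider class shipped in the bcprov artifacts.

```java
// Hypothetical diagnostic, not part of HBase: reports whether a class can be
// resolved from the current classpath without initializing it.
final class ClasspathCheck {
  static final String BC_PROVIDER = "org.bouncycastle.jce.provider.BouncyCastleProvider";

  static boolean isOnClasspath(String className) {
    try {
      // initialize=false: only load the class, do not run its static initializers.
      Class.forName(className, false, ClasspathCheck.class.getClassLoader());
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      // Missing class, or a class that exists but cannot be linked.
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("BouncyCastle provider on classpath: " + isOnClasspath(BC_PROVIDER));
  }
}
```

Run inside the hbase-server test JVM under each Hadoop profile. Based on the dependency trees above, it would be expected to print `true` with Hadoop 2 and `false` with Hadoop 3 before the explicit test dependency was added.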

> Update bouncycastle dependency.
> ---
>
> Key: HBASE-21281
> URL: https://issues.apache.org/jira/browse/HBASE-21281
> Project: HBase
>  Issue Type: Task
>  Components: dependencies, test
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: 21281.addendum.patch, HBASE-21281.001.branch-2.0.patch
>
>
> Looks like we still depend on bcprov-jdk16 for some x509 certificate 
> generation in our tests. Bouncycastle has moved beyond this in 1.47, changing 
> the artifact names.
> [http://www.bouncycastle.org/wiki/display/JA1/Porting+from+earlier+BC+releases+to+1.47+and+later]
> There are some API changes too, but it looks like we don't use any of these.
> It seems like we also have vestiges in the POMs from when we were depending 
> on a specific BC version that came in from Hadoop. We now have a 
> KeyStoreTestUtil class in HBase, which makes me think we can also clean up 
> some dependencies.





[jira] [Commented] (HBASE-21279) Split TestAdminShell into several tests

2018-10-17 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654289#comment-16654289
 ] 

Hadoop QA commented on HBASE-21279:
---

| (x) *-1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 10s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 3 new or modified test files. |
|| || || || master Compile Tests ||
| +1 | mvninstall | 5m 15s | master passed |
| +1 | compile | 0m 23s | master passed |
| +1 | checkstyle | 0m 13s | master passed |
| +1 | shadedjars | 4m 34s | branch has no errors when building our shaded downstream artifacts. |
| +1 | findbugs | 0m 0s | master passed |
| +1 | javadoc | 0m 11s | master passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 6m 23s | the patch passed |
| +1 | compile | 0m 52s | the patch passed |
| +1 | javac | 0m 52s | the patch passed |
| +1 | checkstyle | 0m 31s | the patch passed |
| -1 | rubocop | 0m 11s | The patch generated 79 new + 221 unchanged - 81 fixed = 300 total (was 302) |
| -0 | ruby-lint | 0m 23s | The patch generated 185 new + 418 unchanged - 184 fixed = 603 total (was 602) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 6m 42s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 17m 57s | Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. |
| +1 | findbugs | 0m 0s | the patch passed |
| +1 | javadoc | 0m 32s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 8m 1s | hbase-shell in the patch passed. |
| +1 | asflicense | 0m 30s | The patch does not generate ASF License warnings. |
| | | 53m 9s | |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21279 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12944416/HBASE-21279.v05.patch |
| Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile rubocop ruby_lint |
| uname | Linux 16698f63d6a8 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 8cb28ce4b9 |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| rubocop | v0.59.2 |
| rubocop | 

[jira] [Commented] (HBASE-21333) [amv2] large cluster startup is slow

2018-10-17 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654253#comment-16654253
 ] 

stack commented on HBASE-21333:
---

Checking, we're doing a steady 40 regions a second over hours. Should be easy 
enough going up from here.
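
As a rough check on these numbers (assuming the rate stays constant for the whole startup and taking the 0.5M region count from the issue description), 40 regions per second works out to about three and a half hours, which matches the "few hours" reported:

```latex
\frac{5 \times 10^{5}\ \text{regions}}{40\ \text{regions/s}} = 12{,}500\ \text{s} \approx 3.5\ \text{h}
```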

> [amv2] large cluster startup is slow
> 
>
> Key: HBASE-21333
> URL: https://issues.apache.org/jira/browse/HBASE-21333
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Affects Versions: 2.1.0
>Reporter: stack
>Priority: Major
> Attachments: 2.1.1.129578.alloc.svg, 2.1.1.129578.cpu.svg, 
> 2.1.1.129578.lock.svg
>
>
> In testing, startup of a cluster with 500+ nodes and 0.5M regions takes a few hours.
> This is a 2.1.x cluster with batching disabled.
> Looking at what the Master is doing, it's mostly just parsing regionserver 
> reports.
> Stats to follow.





[jira] [Commented] (HBASE-21333) [amv2] large cluster startup is slow

2018-10-17 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654240#comment-16654240
 ] 

stack commented on HBASE-21333:
---

CPU profile about a quarter of the way through an assign:

[^2.1.1.129578.cpu.svg] 

It's all processing regionserver reports, doing lookups in CSLMs 
(ConcurrentSkipListMaps). Will try enabling batching to see if it makes a difference.



[jira] [Updated] (HBASE-21333) [amv2] large cluster startup is slow

2018-10-17 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-21333:
--
Attachment: 2.1.1.129578.lock.svg



[jira] [Updated] (HBASE-21333) [amv2] large cluster startup is slow

2018-10-17 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-21333:
--
Attachment: 2.1.1.129578.alloc.svg



[jira] [Updated] (HBASE-21333) [amv2] large cluster startup is slow

2018-10-17 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-21333:
--
Attachment: 2.1.1.129578.cpu.svg



[jira] [Created] (HBASE-21333) [amv2] large cluster startup is slow

2018-10-17 Thread stack (JIRA)
stack created HBASE-21333:
-

 Summary: [amv2] large cluster startup is slow
 Key: HBASE-21333
 URL: https://issues.apache.org/jira/browse/HBASE-21333
 Project: HBase
  Issue Type: Sub-task
  Components: amv2
Affects Versions: 2.1.0
Reporter: stack


In testing, startup of a cluster with 500+ nodes and 0.5M regions takes a few hours.

This is a 2.1.x cluster with batching disabled.

Looking at what the Master is doing, it's mostly just parsing regionserver 
reports.

Stats to follow.





[jira] [Commented] (HBASE-21307) [amv2] Deadlock when we move a Region from a not-online RegionServer

2018-10-17 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654227#comment-16654227
 ] 

stack commented on HBASE-21307:
---

[~Apache9] You familiar w/ the scenario above? Thanks.

> [amv2] Deadlock when we move a Region from a not-online RegionServer
> 
>
> Key: HBASE-21307
> URL: https://issues.apache.org/jira/browse/HBASE-21307
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Affects Versions: 2.1.1
>Reporter: stack
>Assignee: stack
>Priority: Critical
> Fix For: 2.1.1
>
>
> Perhaps this doesn't happen in branch-2, but it's a problem in branch-2.1.
> At a high level: we go to move a region, and its unassign subprocedure fails its 
> dispatch because the server is not online, so it queues an SCP and waits on it 
> to break the RPC. The SCP can't run, though, because the MRP holds the lock on 
> the region.
> I can bypass the MRP, but then the SCP fails because the Region is 'owned' by 
> the MRP. See below:
> {code}
> 2018-10-12 16:29:53,423 INFO 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Begin bypass 
> pid=411982, ppid=411981, state=RUNNABLE:REGION_TRANSITION_DISPATCH, 
> locked=true; UnassignProcedure 
> table=IntegrationTestBigLinkedList_20180709093726, 
> region=f5f9ff1e4b0f2d9555dabfcca71df568, override=true, 
> server=va1002.halxg.cloudera.com,22101,1539368318649 with lockWait=0, 
> override=true, recursive=true
> 2018-10-12 16:29:53,424 INFO 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Bypassing pid=411982, 
> ppid=411981, state=RUNNABLE:REGION_TRANSITION_DISPATCH, locked=true; 
> UnassignProcedure table=IntegrationTestBigLinkedList_20180709093726, 
> region=f5f9ff1e4b0f2d9555dabfcca71df568, override=true, 
> server=va1002.halxg.cloudera.com,22101,1539368318649
> 2018-10-12 16:29:53,712 INFO 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Bypassing pid=411981, 
> state=WAITING:MOVE_REGION_ASSIGN, locked=true; MoveRegionProcedure 
> hri=f5f9ff1e4b0f2d9555dabfcca71df568, 
> source=va1002.halxg.cloudera.com,22101,1539368318649, 
> destination=vd1021.halxg.cloudera.com,22101,1539368317897
> 2018-10-12 16:29:53,838 INFO 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Bypassing pid=411982, 
> ppid=411981, state=RUNNABLE:REGION_TRANSITION_DISPATCH, locked=true, 
> bypass=LOG-REDACTED UnassignProcedure 
> table=IntegrationTestBigLinkedList_20180709093726, 
> region=f5f9ff1e4b0f2d9555dabfcca71df568, override=true, 
> server=va1002.halxg.cloudera.com,22101,1539368318649 and its ancestors 
> successfully, adding to queue
> 2018-10-12 16:29:53,839 INFO org.apache.hadoop.hbase.procedure2.Procedure: 
> pid=411982, ppid=411981, state=RUNNABLE:REGION_TRANSITION_DISPATCH, 
> locked=true, bypass=LOG-REDACTED UnassignProcedure 
> table=IntegrationTestBigLinkedList_20180709093726, 
> region=f5f9ff1e4b0f2d9555dabfcca71df568, override=true, 
> server=va1002.halxg.cloudera.com,22101,1539368318649 bypassed, returning null 
> to finish it
> 2018-10-12 16:29:53,954 INFO 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished subprocedure 
> pid=411982, resume processing parent pid=411981, 
> state=RUNNABLE:MOVE_REGION_ASSIGN, locked=true, bypass=LOG-REDACTED 
> MoveRegionProcedure hri=f5f9ff1e4b0f2d9555dabfcca71df568, 
> source=va1002.halxg.cloudera.com,22101,1539368318649, 
> destination=vd1021.halxg.cloudera.com,22101,1539368317897
> 2018-10-12 16:29:53,954 INFO org.apache.hadoop.hbase.procedure2.Procedure: 
> pid=411981, state=RUNNABLE:MOVE_REGION_ASSIGN, locked=true, 
> bypass=LOG-REDACTED MoveRegionProcedure hri=f5f9ff1e4b0f2d9555dabfcca71df568, 
> source=va1002.halxg.cloudera.com,22101,1539368318649, 
> destination=vd1021.halxg.cloudera.com,22101,1539368317897 bypassed, returning 
> null to finish it
> 2018-10-12 16:29:53,956 INFO 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=411982, 
> ppid=411981, state=SUCCESS, bypass=LOG-REDACTED UnassignProcedure 
> table=IntegrationTestBigLinkedList_20180709093726, 
> region=f5f9ff1e4b0f2d9555dabfcca71df568, override=true, 
> server=va1002.halxg.cloudera.com,22101,1539368318649 in 3hrs, 49mins, 
> 12.419sec, unfinishedSiblingCount=0
> 2018-10-12 16:29:54,058 INFO 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=411981, 
> state=SUCCESS, bypass=LOG-REDACTED MoveRegionProcedure 
> hri=f5f9ff1e4b0f2d9555dabfcca71df568, 
> source=va1002.halxg.cloudera.com,22101,1539368318649, 
> destination=vd1021.halxg.cloudera.com,22101,1539368317897 in 3hrs, 49mins, 
> 12.878sec
> 2018-10-12 16:29:54,059 INFO 
> org.apache.hadoop.hbase.master.procedure.MasterProcedureScheduler: xlock for 
> pid=412210, ppid=411983, state=RUNNABLE:REGION_TRANSITION_QUEUE; 
> AssignProcedure table=IntegrationTestBigLinkedList_20180709093726, 
> 

[jira] [Commented] (HBASE-21322) Add a scheduleServerCrashProcedure() API to HbckService

2018-10-17 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654226#comment-16654226
 ] 

stack commented on HBASE-21322:
---

bq.  delete all MasterProcWALs immediately.

Do you delete while the Master is up?

bq. 3. check if the cluster can fail over.

You mean, have Master fail over?

The scenario you describe is extreme. The new Master does not pick up the 
-splitting items? Or it skips them because it notices that the cluster is 'up'?

I do not have an objection to being able to schedule an SCP. It could be 
useful. I'm trying to figure out what real-world scenario you are simulating... 
and why a new Master coming online doesn't recognize the need for an SCP.

Thanks.

> Add a scheduleServerCrashProcedure() API to HbckService
> ---
>
> Key: HBASE-21322
> URL: https://issues.apache.org/jira/browse/HBASE-21322
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Jingyun Tian
>Assignee: Jingyun Tian
>Priority: Major
> Attachments: Screenshot from 2018-10-17 13-35-58.png, Screenshot from 
> 2018-10-17 13-38-41.png, Screenshot from 2018-10-17 13-47-06.png
>
>
> According to my test, if one RS goes down and then all procedure logs are 
> deleted, no ServerCrashProcedure is scheduled, and restarting the master does 
> not help. Thus we need to schedule a ServerCrashProcedure manually to solve 
> the problem. I plan to add a scheduleServerCrashProcedure() API to 
> HbckService, then add this API to HBCK2.





[jira] [Commented] (HBASE-21288) HostingServer in UnassignProcedure is not accurate

2018-10-17 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654202#comment-16654202
 ] 

stack commented on HBASE-21288:
---

[~allan163] Ok. +1. Let me know if you want me to commit.

> HostingServer in UnassignProcedure is not accurate
> --
>
> Key: HBASE-21288
> URL: https://issues.apache.org/jira/browse/HBASE-21288
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2, Balancer
>Affects Versions: 2.1.0, 2.0.2
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Attachments: HBASE-21288.branch-2.0.001.patch, 
> HBASE-21288.branch-2.0.002.patch
>
>
> We have a case where a region shows status OPEN on an already dead server in 
> the meta table (it is hard to trace how this happened), meaning this region is 
> actually not online. But the balancer came and scheduled a MoveRegionProcedure 
> for this region, which created a mess:
> The balancer 'thought' this region was on the online server with the same 
> address (but a different startcode). So it scheduled an MRP from this online 
> server to another, but the UnassignProcedure dispatched the unassign call to 
> the dead server according to the region state, found the server dead, and 
> scheduled an SCP for the dead server. But since the UnassignProcedure's 
> hostingServer is not accurate, the SCP can't interrupt it.
> So, in the end, the SCP can't finish because the UnassignProcedure holds the 
> region's lock, and the UnassignProcedure can't finish because no one wakes it; 
> thus both are stuck.
> Here is the log; notice that the server of the UnassignProcedure is 
> 'hb-uf6oyi699w8h700f0-003.hbase.rds. ,16020,1539153278584' but it was 
> dispatched to 'hb-uf6oyi699w8h700f0-003.hbase.rds. ,16020,1539076734964'
> {code}
> 2018-10-10 14:34:50,011 INFO  [PEWorker-4] 
> assignment.RegionTransitionProcedure(252): Dispatch pid=13, ppid=12, 
> state=RUNNABLE:REGION_TRANSITION_DISPATCH, hasLock=true; UnassignProcedure 
> table=hbase:acl, region=267335c85766c62479fb4a5f18a1e95f, 
> server=hb-uf6oyi699w8h700f0-003.hbase.rds. ,16020,1539153278584; rit=CLOSING, 
> location=hb-uf6oyi699w8h700f0-003.hbase.rds. ,16020,1539076734964
> 2018-10-10 14:34:50,011 WARN  [PEWorker-4] 
> assignment.RegionTransitionProcedure(230): Remote call failed 
> hb-uf6oyi699w8h700f0-003.hbase.rds. ,16020,1539076734964; pid=13, ppid=12, 
> state=RUNNABLE:REGION_TRANSITION_DISPATCH, hasLock=true; UnassignProcedure 
> table=hbase:acl, region=267335c85766c62479fb4a5f18a1e95f, 
> server=hb-uf6oyi699w8h700f0-003.hbase.rds. ,16020,1539153278584; rit=CLOSING, 
> location=hb-uf6oyi699w8h700f0-003.hbase.rds. ,16020,1539076734964; 
> exception=NoServerDispatchException
> org.apache.hadoop.hbase.procedure2.NoServerDispatchException: 
> hb-uf6oyi699w8h700f0-003.hbase.rds. ,16020,1539076734964; pid=13, ppid=12, 
> state=RUNNABLE:REGION_TRANSITION_DISPATCH, hasLock=true; UnassignProcedure 
> table=hbase:acl, region=267335c85766c62479fb4a5f18a1e95f, 
> server=hb-uf6oyi699w8h700f0-003.hbase.rds. ,16020,1539153278584
> //Then a SCP was scheduled
> 2018-10-10 14:34:50,012 WARN  [PEWorker-4] master.ServerManager(635): 
> Expiration of hb-uf6oyi699w8h700f0-003.hbase.rds. ,16020,1539076734964 but 
> server not online
> 2018-10-10 14:34:50,012 INFO  [PEWorker-4] master.ServerManager(615): 
> Processing expiration of hb-uf6oyi699w8h700f0-003.hbase.rds. 
> ,16020,1539076734964 on hb-uf6oyi699w8h700f0-001.hbase.rds. 
> ,16000,1539088156164
> 2018-10-10 14:34:50,017 DEBUG [PEWorker-4] 
> procedure2.ProcedureExecutor(1089): Stored pid=14, 
> state=RUNNABLE:SERVER_CRASH_START, hasLock=false; ServerCrashProcedure 
> server=hb-uf6oyi699w8h700f0-003.hbase.rds. ,16020,1539076734964, 
> splitWal=true, meta=false
> //The SCP did not interrupt the UnassignProcedure but scheduled a new 
> AssignProcedure for this region
> 2018-10-10 14:34:50,043 DEBUG [PEWorker-6] 
> procedure.ServerCrashProcedure(250): Done splitting WALs pid=14, 
> state=RUNNABLE:SERVER_CRASH_SPLIT_LOGS, hasLock=true; ServerCrashProcedure 
> server=hb-uf6oyi699w8h700f0-003.hbase.rds. ,16020,1539076734964, 
> splitWal=true, meta=false
> 2018-10-10 14:34:50,054 INFO  [PEWorker-8] 
> procedure2.ProcedureExecutor(1691): Initialized subprocedures=[{pid=15, 
> ppid=14, state=RUNNABLE:REGION_TRANSITION_QUEUE, hasLock=false; 
> AssignProcedure table=hbase:acl, region=267335c85766c62479fb4a5f18a1e95f}, 
> {pid=16, ppid=14, state=RUNNABLE:REGION_TRANSITION_QUEUE, hasLock=false; 
> AssignProcedure table=hbase:req_intercept_rule, 
> region=460481706415d776b3742f428a6f579b}, {pid=17, ppid=14, 
> state=RUNNABLE:REGION_TRANSITION_QUEUE, hasLock=false; AssignProcedure 
> table=hbase:namespace, region=ec7a965e7302840120a5d8289947c40b}]
> {code}
> Here I also added a safety fence in the balancer: if such regions are found, 
> balancing is skipped.
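
The balancer safety fence described above could look roughly like the following minimal sketch. It is not the patch attached to HBASE-21288: the class and method names are hypothetical, and it assumes the caller can supply the region locations recorded in meta and the set of online servers as full "host,port,startcode" server names. Comparing the full name, rather than host and port alone, is what catches the case in the log, where the same address came back with a new startcode.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch of a balancer safety fence; not HBase's actual API.
 * Server names use the "host,port,startcode" form seen in the logs above.
 */
public final class BalancerFenceSketch {

  private BalancerFenceSketch() {
  }

  /**
   * Returns true if this balancing round should be skipped because some region
   * is recorded as OPEN on a server that is not currently online.
   *
   * @param openRegionLocations encoded region name -> server name recorded in meta
   * @param onlineServers full server names (including startcode) currently online
   */
  public static boolean shouldSkipBalance(Map<String, String> openRegionLocations,
      Set<String> onlineServers) {
    for (Map.Entry<String, String> entry : openRegionLocations.entrySet()) {
      // Compare the full name: the same host and port with an older startcode
      // is a dead server, even if a new instance is now online at that address.
      if (!onlineServers.contains(entry.getValue())) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // "rs-003" is a placeholder host name; the startcodes mirror the log above.
    Map<String, String> meta = Collections.singletonMap(
        "267335c85766c62479fb4a5f18a1e95f", "rs-003,16020,1539076734964");
    Set<String> online = Collections.singleton("rs-003,16020,1539153278584");
    System.out.println(shouldSkipBalance(meta, online)); // prints "true"
  }
}
```

In a real balancer this check would run before a plan is computed; the sketch leaves out how the meta locations and the online-server list are obtained.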

[jira] [Comment Edited] (HBASE-21279) Split TestAdminShell into several tests

2018-10-17 Thread Artem Ervits (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654191#comment-16654191
 ] 

Artem Ervits edited comment on HBASE-21279 at 10/17/18 9:16 PM:


v. 05 patch removes a tab character that was inherited from earlier script

https://github.com/apache/hbase/blob/master/hbase-shell/src/test/ruby/hbase/admin_test.rb#L690


was (Author: dbist13):
v. 05 patch removes a tab character that was inherited from earlier script

> Split TestAdminShell into several tests
> ---
>
> Key: HBASE-21279
> URL: https://issues.apache.org/jira/browse/HBASE-21279
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Artem Ervits
>Priority: Major
> Attachments: HBASE-21279.v01.patch, HBASE-21279.v02.patch, 
> HBASE-21279.v03.patch, HBASE-21279.v04.patch, HBASE-21279.v05.patch, 
> testAdminShell-output.tar.gz
>
>
> In the flaky test board, TestAdminShell often timed out 
> (https://builds.apache.org/job/HBASE-Find-Flaky-Tests/job/branch-2/lastSuccessfulBuild/artifact/dashboard.html).
> I ran the test on Linux with SSD and reproduced the timeout (see attached 
> test output).
> {code}
> 2018-10-08 02:36:09,146 DEBUG [main] hbase.HBaseTestingUtility(351): Setting 
> hbase.rootdir to 
> /mnt/disk2/a/2-hbase/hbase-shell/target/test-data/a103d8e4-695c-a5a9-6690-1ef2580050f9
> ...
> 2018-10-08 02:49:09,093 DEBUG 
> [RpcServer.default.FPBQ.Fifo.handler=27,queue=0,port=7] 
> master.MasterRpcServices(1171): Checking to see if procedure is done pid=871
> Took 0.7262 seconds
> 2018-10-08 02:49:09,324 DEBUG [PEWorker-1] 
> util.FSTableDescriptors(684): Wrote into 
> hdfs://localhost:43859/user/hbase/test-data/cefc73d9-cc37-d2a6-b92b-   
> d935316c9241/.tmp/data/default/hbase_shell_tests_table/.tabledesc/.tableinfo.01
> 2018-10-08 02:49:09,328 INFO  
> [RegionOpenAndInitThread-hbase_shell_tests_table-1] 
> regionserver.HRegion(7004): creating HRegion hbase_shell_tests_table HTD ==   
>   'hbase_shell_tests_table', {NAME => 'x', VERSIONS => '5', 
> EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', 
> KEEP_DELETED_CELLS => 'FALSE',   CACHE_DATA_ON_WRITE => 
> 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => 
> '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 
> 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', 
> PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 
> 'true', BLOCKSIZE => '65536'},  {NAME => 'y', VERSIONS => '1', 
> EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', 
> KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false',  
> DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', 
> REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 
> 'false', IN_MEMORY => 'false',  CACHE_BLOOMS_ON_WRITE => 'false', 
> PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 
> 'true', BLOCKSIZE => '65536'} RootDir = hdfs://localhost:43859/
> user/hbase/test-data/cefc73d9-cc37-d2a6-b92b-d935316c9241/.tmp Table name == 
> hbase_shell_tests_table
> E
> ===
> Error: test_Get_simple_status(Hbase::StatusTest): 
> Java::JavaIo::InterruptedIOException: Interrupt while waiting on Operation: 
> CREATE, Table Name:  default:hbase_shell_tests_table, procId: 871
> 2018-10-08 02:49:09,361 INFO  [Block report processor] 
> blockmanagement.BlockManager(2645): BLOCK* addStoredBlock: blockMap updated: 
> 127.0.0.1:41338 is added to   
> blk_1073742193_1369{UCState=COMMITTED, truncateBlock=null, 
> primaryNodeIndex=-1, 
> replicas=[ReplicaUC[[DISK]DS-ecc89143-e0a5-4a1c-b552-120be2561334:NORMAL:127.0.0.1:
>41338|RBW]]} size 58
> > TEST TIMED OUT. PRINTING THREAD DUMP. <
> {code}
> We can see that procedure #871 wasn't stuck: the timeout cut in and 
> stopped the test.
> We should separate the current test into two (or more) test files (with 
> corresponding .rb files) so that the execution time consistently stays within 
> the limit.





[jira] [Comment Edited] (HBASE-21279) Split TestAdminShell into several tests

2018-10-17 Thread Artem Ervits (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654191#comment-16654191
 ] 

Artem Ervits edited comment on HBASE-21279 at 10/17/18 9:14 PM:


v. 05 patch removes a tab character that was inherited from earlier script


was (Author: dbist13):
v. 05 patch removes a tab character



[jira] [Commented] (HBASE-21279) Split TestAdminShell into several tests

2018-10-17 Thread Artem Ervits (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654191#comment-16654191
 ] 

Artem Ervits commented on HBASE-21279:
--

v. 05 patch removes a tab character

> Split TestAdminShell into several tests
> ---
>
> Key: HBASE-21279
> URL: https://issues.apache.org/jira/browse/HBASE-21279
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Artem Ervits
>Priority: Major
> Attachments: HBASE-21279.v01.patch, HBASE-21279.v02.patch, 
> HBASE-21279.v03.patch, HBASE-21279.v04.patch, HBASE-21279.v05.patch, 
> testAdminShell-output.tar.gz
>
>
> In the flaky test board, TestAdminShell often timed out 
> (https://builds.apache.org/job/HBASE-Find-Flaky-Tests/job/branch-2/lastSuccessfulBuild/artifact/dashboard.html).
> I ran the test on Linux with SSD and reproduced the timeout (see attached 
> test output).
> {code}
> 2018-10-08 02:36:09,146 DEBUG [main] hbase.HBaseTestingUtility(351): Setting 
> hbase.rootdir to 
> /mnt/disk2/a/2-hbase/hbase-shell/target/test-data/a103d8e4-695c-a5a9-6690-1ef2580050f9
> ...
> 2018-10-08 02:49:09,093 DEBUG 
> [RpcServer.default.FPBQ.Fifo.handler=27,queue=0,port=7] 
> master.MasterRpcServices(1171): Checking to see if procedure is done pid=871
> Took 0.7262 seconds2018-10-08 02:49:09,324 DEBUG [PEWorker-1] 
> util.FSTableDescriptors(684): Wrote into 
> hdfs://localhost:43859/user/hbase/test-data/cefc73d9-cc37-d2a6-b92b-   
> d935316c9241/.tmp/data/default/hbase_shell_tests_table/.tabledesc/.tableinfo.01
> 2018-10-08 02:49:09,328 INFO  
> [RegionOpenAndInitThread-hbase_shell_tests_table-1] 
> regionserver.HRegion(7004): creating HRegion hbase_shell_tests_table HTD ==   
>   'hbase_shell_tests_table', {NAME => 'x', VERSIONS => '5', 
> EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', 
> KEEP_DELETED_CELLS => 'FALSE',   CACHE_DATA_ON_WRITE => 
> 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => 
> '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 
> 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', 
> PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 
> 'true', BLOCKSIZE => '65536'},  {NAME => 'y', VERSIONS => '1', 
> EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', 
> KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false',  
> DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', 
> REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 
> 'false', IN_MEMORY => 'false',  CACHE_BLOOMS_ON_WRITE => 'false', 
> PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 
> 'true', BLOCKSIZE => '65536'} RootDir = 
> hdfs://localhost:43859/user/hbase/test-data/cefc73d9-cc37-d2a6-b92b-d935316c9241/.tmp Table name == 
> hbase_shell_tests_table
> E
> ===
> Error: test_Get_simple_status(Hbase::StatusTest): 
> Java::JavaIo::InterruptedIOException: Interrupt while waiting on Operation: 
> CREATE, Table Name:  default:hbase_shell_tests_table, procId: 871
> 2018-10-08 02:49:09,361 INFO  [Block report processor] 
> blockmanagement.BlockManager(2645): BLOCK* addStoredBlock: blockMap updated: 
> 127.0.0.1:41338 is added to 
> blk_1073742193_1369{UCState=COMMITTED, truncateBlock=null, 
> primaryNodeIndex=-1, 
> replicas=[ReplicaUC[[DISK]DS-ecc89143-e0a5-4a1c-b552-120be2561334:NORMAL:127.0.0.1:41338|RBW]]} size 58
> > TEST TIMED OUT. PRINTING THREAD DUMP. <
> {code}
> We can see that procedure #871 wasn't stuck; the timeout cut in and 
> stopped the test.
> We should split the current test into two (or more) test files (each with a 
> corresponding .rb script) so that the execution time consistently stays within 
> the limit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21279) Split TestAdminShell into several tests

2018-10-17 Thread Artem Ervits (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Ervits updated HBASE-21279:
-
Attachment: HBASE-21279.v05.patch

> Split TestAdminShell into several tests
> ---
>
> Key: HBASE-21279
> URL: https://issues.apache.org/jira/browse/HBASE-21279
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Artem Ervits
>Priority: Major
> Attachments: HBASE-21279.v01.patch, HBASE-21279.v02.patch, 
> HBASE-21279.v03.patch, HBASE-21279.v04.patch, HBASE-21279.v05.patch, 
> testAdminShell-output.tar.gz
>
>
> In the flaky test board, TestAdminShell often timed out 
> (https://builds.apache.org/job/HBASE-Find-Flaky-Tests/job/branch-2/lastSuccessfulBuild/artifact/dashboard.html).
> I ran the test on Linux with SSD and reproduced the timeout (see attached 
> test output).





[jira] [Commented] (HBASE-20716) Unsafe access cleanup

2018-10-17 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654158#comment-16654158
 ] 

stack commented on HBASE-20716:
---

Nice work. Looks great. Makes sense. Will commit in a while unless there are objections.

What else is to be done in here [~awked06]?

> Unsafe access cleanup
> -
>
> Key: HBASE-20716
> URL: https://issues.apache.org/jira/browse/HBASE-20716
> Project: HBase
>  Issue Type: Sub-task
>  Components: Performance
>Reporter: stack
>Assignee: Sahil Aggarwal
>Priority: Critical
>  Labels: beginner
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-20716.master.001.patch, 
> HBASE-20716.master.002.patch, HBASE-20716.master.003.patch, 
> HBASE-20716.master.004.patch, HBASE-20716.master.005.patch, 
> HBASE-20716.master.006.patch, HBASE-20716.master.007.patch, 
> HBASE-20716.master.008.patch, Screen Shot 2018-06-26 at 11.37.49 AM.png
>
>
> We have two means of getting at Unsafe: UnsafeAccess, and the access internal 
> to the Bytes class. They are effectively doing the same thing. We should have 
> only one avenue to Unsafe.
> Many of our paths to Unsafe via UnsafeAccess check flags to see whether 
> access is available, whether it is aligned, and the order in which words are 
> written on the machine. Each check has a cost, especially when done millions of 
> times a second, and on occasion adds bloat in hot code paths. The unsafe 
> access inside Bytes checks at startup what the machine is capable of and 
> then statically assigns the appropriate class to use from there on out. 
> UnsafeAccess does not do this; it runs the checks every time. It would be good 
> to make the Bytes behavior pervasive.
> The benefit of having only one access path to Unsafe is plain. The benefits we 
> gain from removing checks will be harder to measure, though they should be plain 
> when you disassemble a hot path; in a (very) rare case, the saved bytecodes could 
> be the difference between inlining or not.
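
The startup-time selection that Bytes performs, as opposed to the per-call checks in UnsafeAccess, can be sketched in plain Java. This is a minimal sketch under stated assumptions, not HBase code: the class name, the `sketch.disableFastPath` system property and the probe are hypothetical, and `Arrays.compareUnsigned` (JDK 9+) stands in for an Unsafe-backed comparer.

```java
import java.util.Arrays;

// Hypothetical sketch of "probe once, assign statically" versus "probe on every call".
// Not HBase's Bytes/UnsafeAccess code; the real code uses sun.misc.Unsafe and other checks.
final class ChooseOnceSketch {

  /** Strategy for comparing two byte arrays as unsigned, lexicographically. */
  interface Comparer {
    int compare(byte[] a, byte[] b);
  }

  /** Portable fallback: compares one byte at a time, treating bytes as unsigned. */
  static final Comparer PURE_JAVA = (a, b) -> {
    int len = Math.min(a.length, b.length);
    for (int i = 0; i < len; i++) {
      int diff = (a[i] & 0xff) - (b[i] & 0xff);
      if (diff != 0) {
        return diff;
      }
    }
    return a.length - b.length;
  };

  /** Stand-in for an Unsafe-backed fast path (here the JDK 9+ Arrays.compareUnsigned). */
  static final Comparer FAST = Arrays::compareUnsigned;

  /** Hypothetical capability probe; imagine it is expensive. */
  static boolean fastPathAvailable() {
    return !Boolean.getBoolean("sketch.disableFastPath");
  }

  /** Bytes-style: the probe runs once, during class initialization. */
  static final Comparer BEST = fastPathAvailable() ? FAST : PURE_JAVA;

  static int compareChosenOnce(byte[] a, byte[] b) {
    return BEST.compare(a, b); // no capability checks on the hot path
  }

  /** UnsafeAccess-style: re-runs the probe on every call. */
  static int compareCheckingEveryTime(byte[] a, byte[] b) {
    return (fastPathAvailable() ? FAST : PURE_JAVA).compare(a, b);
  }

  public static void main(String[] args) {
    byte[] low = {1, 2};
    byte[] high = {(byte) 0x80}; // 128 when read as unsigned
    System.out.println(Integer.signum(compareChosenOnce(low, high)));        // -1
    System.out.println(Integer.signum(compareCheckingEveryTime(high, low))); // 1
  }
}
```

Because `BEST` is a `static final` field assigned once during class initialization, the hot path carries no capability checks, whereas `compareCheckingEveryTime` pays for the probe on every call.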





[jira] [Commented] (HBASE-21279) Split TestAdminShell into several tests

2018-10-17 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654159#comment-16654159
 ] 

Hadoop QA commented on HBASE-21279:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 50s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 31s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 18s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m 40s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m  0s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 27s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 34s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} rubocop {color} | {color:red}  0m 10s{color} | {color:red} The patch generated 82 new + 221 unchanged - 81 fixed = 303 total (was 302) {color} |
| {color:orange}-0{color} | {color:orange} ruby-lint {color} | {color:orange}  0m 24s{color} | {color:orange} The patch generated 185 new + 418 unchanged - 184 fixed = 603 total (was 602) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  0s{color} | {color:red} The patch 1 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  6m 58s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 18m 29s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m  0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 32s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  8m  9s{color} | {color:green} hbase-shell in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 34s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 59m 24s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21279 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12944400/HBASE-21279.v04.patch |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  rubocop  ruby_lint  |
| uname | Linux 9d8bdf28cdf4 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 8cb28ce4b9 |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| rubocop | v0.59.2 |
| rubocop | 

[jira] [Updated] (HBASE-20716) Unsafe access cleanup

2018-10-17 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-20716:
--
Fix Version/s: 2.0.3
   2.1.1
   2.2.0
   3.0.0

> Unsafe access cleanup
> -
>
> Key: HBASE-20716
> URL: https://issues.apache.org/jira/browse/HBASE-20716
> Project: HBase
>  Issue Type: Sub-task
>  Components: Performance
>Reporter: stack
>Assignee: Sahil Aggarwal
>Priority: Critical
>  Labels: beginner
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-20716.master.001.patch, 
> HBASE-20716.master.002.patch, HBASE-20716.master.003.patch, 
> HBASE-20716.master.004.patch, HBASE-20716.master.005.patch, 
> HBASE-20716.master.006.patch, HBASE-20716.master.007.patch, 
> HBASE-20716.master.008.patch, Screen Shot 2018-06-26 at 11.37.49 AM.png
>
>





[jira] [Commented] (HBASE-21279) Split TestAdminShell into several tests

2018-10-17 Thread Artem Ervits (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654087#comment-16654087
 ] 

Artem Ervits commented on HBASE-21279:
--

[~yuzhih...@gmail.com] I addressed all your comments; the duration is now split 
more or less evenly:
{code:java}
[INFO] Running org.apache.hadoop.hbase.client.TestAdminShell
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 216.568 s - in org.apache.hadoop.hbase.client.TestAdminShell
[INFO] Running org.apache.hadoop.hbase.client.TestAdminShell2
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 162.588 s - in org.apache.hadoop.hbase.client.TestAdminShell2{code}
The table in admin2_test.rb is also named after the script:
{code:java}
create 'hbase_shell_admin2_test_table'{code}
Please review.

> Split TestAdminShell into several tests
> ---
>
> Key: HBASE-21279
> URL: https://issues.apache.org/jira/browse/HBASE-21279
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Artem Ervits
>Priority: Major
> Attachments: HBASE-21279.v01.patch, HBASE-21279.v02.patch, 
> HBASE-21279.v03.patch, HBASE-21279.v04.patch, testAdminShell-output.tar.gz
>
>





[jira] [Updated] (HBASE-21279) Split TestAdminShell into several tests

2018-10-17 Thread Artem Ervits (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Ervits updated HBASE-21279:
-
Attachment: HBASE-21279.v04.patch

> Split TestAdminShell into several tests
> ---
>
> Key: HBASE-21279
> URL: https://issues.apache.org/jira/browse/HBASE-21279
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Artem Ervits
>Priority: Major
> Attachments: HBASE-21279.v01.patch, HBASE-21279.v02.patch, 
> HBASE-21279.v03.patch, HBASE-21279.v04.patch, testAdminShell-output.tar.gz
>
>





[jira] [Commented] (HBASE-21198) Exclude dependency on net.minidev:json-smart

2018-10-17 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654055#comment-16654055
 ] 

Ted Yu commented on HBASE-21198:


Thanks Artem for working on this.

> Exclude dependency on net.minidev:json-smart
> 
>
> Key: HBASE-21198
> URL: https://issues.apache.org/jira/browse/HBASE-21198
> Project: HBase
>  Issue Type: Task
>Reporter: Ted Yu
>Assignee: Artem Ervits
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21198.v01.patch, HBASE-21198.v01.patch
>
>
> From 
> https://builds.apache.org/job/PreCommit-HBASE-Build/14414/artifact/patchprocess/patch-javac-3.0.0.txt
>  :
> {code}
> [ERROR] Failed to execute goal on project hbase-common: Could not resolve 
> dependencies for project org.apache.hbase:hbase-common:jar:3.0.0-SNAPSHOT: 
> Failed to collect dependencies at org.apache.hadoop:hadoop-common:jar:3.0.0 
> -> org.apache.hadoop:hadoop-auth:jar:3.0.0 -> 
> com.nimbusds:nimbus-jose-jwt:jar:4.41.1 -> 
> net.minidev:json-smart:jar:2.3-SNAPSHOT: Failed to read artifact descriptor 
> for net.minidev:json-smart:jar:2.3-SNAPSHOT: Could not transfer artifact 
> net.minidev:json-smart:pom:2.3-SNAPSHOT from/to dynamodb-local-oregon 
> (https://s3-us-west-2.amazonaws.com/dynamodb-local/release): Access denied 
> to: 
> https://s3-us-west-2.amazonaws.com/dynamodb-local/release/net/minidev/json-smart/2.3-SNAPSHOT/json-smart-2.3-SNAPSHOT.pom
>  , ReasonPhrase:Forbidden. -> [Help 1]
> {code}
> We should exclude the dependency on net.minidev:json-smart.
> hbase-common/bin/pom.xml has done so.
> The other pom.xml files should do the same.





[jira] [Commented] (HBASE-21200) Memstore flush doesn't finish because of seekToPreviousRow() in memstore scanner.

2018-10-17 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654043#comment-16654043
 ] 

Andrew Purtell commented on HBASE-21200:


I'm not super familiar with the branch-2 memstore code; SegmentScanner, for 
example, is new to me. I think Ram's review is good, and Anoop's too if he can 
chime in. Happy to review any branch-1 backport.

> Memstore flush doesn't finish because of seekToPreviousRow() in memstore 
> scanner.
> -
>
> Key: HBASE-21200
> URL: https://issues.apache.org/jira/browse/HBASE-21200
> Project: HBase
>  Issue Type: Bug
>  Components: Scanners
>Reporter: dongjin2193.jeon
>Assignee: Toshihiro Suzuki
>Priority: Critical
> Attachments: HBASE-21200-UT.patch, HBASE-21200.master.001.patch, 
> HBASE-21200.master.002.patch, RegionServerJstack.log
>
>
> The issue of delayed memstore flushes still occurs after backporting HBASE-15871.
> A reverse scan takes a long time to seek to the previous row when the memstore 
> is full of deleted cells.
>  
> jstack :
> "MemStoreFlusher.0" #114 prio=5 os_prio=0 tid=0x7fa3d0729000 nid=0x486a 
> waiting on condition [0x7fa3b9b6b000]
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for  <0xa465fe60> (a 
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>     at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>     at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>     at 
> java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
>     at 
> java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
>     at 
> org.apache.hadoop.hbase.regionserver.*StoreScanner.updateReaders(StoreScanner.java:695)*
>     at 
> org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1127)
>     at 
> org.apache.hadoop.hbase.regionserver.HStore.updateStorefiles(HStore.java:1106)
>     at 
> org.apache.hadoop.hbase.regionserver.HStore.access$600(HStore.java:130)
>     at 
> org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.commit(HStore.java:2455)
>     at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2519)
>     at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2256)
>     at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2218)
>     at 
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2110)
>     at 
> org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2036)
>     at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:501)
>     at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:471)
>     at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:75)
>     at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:259)
>     at java.lang.Thread.run(Thread.java:748)
>  
> "RpcServer.FifoWFPBQ.default.handler=27,queue=0,port=16020" #65 daemon prio=5 
> os_prio=0 tid=0x7fa3e628 nid=0x4801 runnable [0x7fa3bd29a000]
>    java.lang.Thread.State: RUNNABLE
>     at 
> org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner.getNext(DefaultMemStore.java:780)
>     at 
> org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner.seekInSubLists(DefaultMemStore.java:826)
>     - locked <0xb45aa5b8> (a 
> org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner)
>     at 
> org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner.seek(DefaultMemStore.java:818)
>     - locked <0xb45aa5b8> (a 
> org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner)
>     at 
> org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner.seekToPreviousRow(DefaultMemStore.java:1000)
>     - locked <0xb45aa5b8> (a 
> org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner)
>     at 
> org.apache.hadoop.hbase.regionserver.ReversedKeyValueHeap.next(ReversedKeyValueHeap.java:136)
>     at 
> org.apache.hadoop.hbase.regionserver.*StoreScanner.next(StoreScanner.java:629)*
>     at 
> 
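
The symptom described in this issue, a reverse scan seeking backwards slowly through many deleted cells while the flusher in the jstack is parked on a lock in `StoreScanner.updateReaders`, can be illustrated with a simplified model. This sketch is hypothetical and is not how HBase's memstore scanners are implemented: rows are keys in a `ConcurrentSkipListMap` whose value marks the row as deleted, and finding the previous live row means repeated `lowerKey` seeks.

```java
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical, simplified model of reverse seeking over deleted rows; not HBase code.
final class ReverseSeekSketch {

  /** Builds rows 0..n-1; rows in [firstDeleted, lastDeleted] are marked deleted (value true). */
  static ConcurrentSkipListMap<Integer, Boolean> buildRows(int n, int firstDeleted, int lastDeleted) {
    ConcurrentSkipListMap<Integer, Boolean> rows = new ConcurrentSkipListMap<>();
    for (int i = 0; i < n; i++) {
      rows.put(i, i >= firstDeleted && i <= lastDeleted);
    }
    return rows;
  }

  /** Counts backward seeks needed to reach the previous live row (or the start of the map). */
  static int lookupsToPreviousLiveRow(ConcurrentSkipListMap<Integer, Boolean> rows, int row) {
    int lookups = 0;
    Integer candidate = row;
    while (true) {
      candidate = rows.lowerKey(candidate); // one O(log n) backward seek
      lookups++;
      if (candidate == null || !rows.get(candidate)) {
        return lookups; // hit the start of the map, or found a row that is not deleted
      }
    }
  }

  public static void main(String[] args) {
    // 10,000 rows where every row between the first and the last has been deleted.
    System.out.println(lookupsToPreviousLiveRow(buildRows(10_000, 1, 9_998), 9_999)); // 9999
    // The same rows with no deletions: a single seek is enough.
    System.out.println(lookupsToPreviousLiveRow(buildRows(10_000, 1, 0), 9_999)); // 1
  }
}
```

In this model the cost of stepping back one live row grows with the number of consecutive deleted rows (9,999 seeks here instead of 1), which is the kind of work that keeps a reverse scan busy in a memstore full of deleted cells.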

[jira] [Resolved] (HBASE-21324) [balancer] Gives up too soon when lots of regions

2018-10-17 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-21324.
---
Resolution: Not A Problem

Resolving as not a problem. The balancer needs to be configured appropriately 
to run against a big cluster: raising the timeouts and max steps, and lowering 
the per-region steps, got the balancer to make progress. That this config should 
be dynamic (not requiring a restart), and that the balancer logging gives little 
indication that a reconfig is needed, are separate issues.

> [balancer] Gives up too soon when lots of regions
> -
>
> Key: HBASE-21324
> URL: https://issues.apache.org/jira/browse/HBASE-21324
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer
>Reporter: stack
>Assignee: stack
>Priority: Major
>
> When there are lots of regions, the balancer seems to do a poorish job. My 
> cluster is all out of whack and, while the balancer runs, it doesn't do 
> anything. So, I need to configure the balancer to run against a large number 
> of regions. There is no doc in the refguide to help, and there is no logging 
> coming out of the balancer to help either. Let me fix.
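
Why max steps matter on a big cluster can be sketched with a simplified model of the stochastic balancer's search budget. It assumes, from our reading of the branch-2 `StochasticLoadBalancer`, that the step budget is `min(maxSteps, regions * stepsPerRegion * servers)` with defaults of 1,000,000 (`hbase.master.balancer.stochastic.maxSteps`) and 800 (`hbase.master.balancer.stochastic.stepsPerRegion`), and that the search also stops once `hbase.master.balancer.stochastic.maxRunningTime` elapses; verify these against your version. The cluster sizes below are made up for illustration.

```java
// Simplified, assumed model of a stochastic balancer's step budget; not HBase code.
final class BalancerBudgetSketch {

  // Assumed branch-2 defaults; check hbase-default.xml for your version.
  static final long DEFAULT_MAX_STEPS = 1_000_000L;
  static final long DEFAULT_STEPS_PER_REGION = 800L;

  /**
   * Step budget: a per-region allowance scaled by cluster size, capped by maxSteps.
   * The wall-clock limit (maxRunningTime) can end the search earlier; not modelled here.
   */
  static long computeMaxSteps(long maxSteps, long stepsPerRegion, long numRegions, long numServers) {
    return Math.min(maxSteps, numRegions * stepsPerRegion * numServers);
  }

  /** Average number of candidate moves the search can spend per region. */
  static long stepsPerRegionAvailable(long computedMaxSteps, long numRegions) {
    return computedMaxSteps / numRegions;
  }

  public static void main(String[] args) {
    long small = computeMaxSteps(DEFAULT_MAX_STEPS, DEFAULT_STEPS_PER_REGION, 50, 10);
    long big = computeMaxSteps(DEFAULT_MAX_STEPS, DEFAULT_STEPS_PER_REGION, 100_000, 500);
    long raised = computeMaxSteps(50 * DEFAULT_MAX_STEPS, DEFAULT_STEPS_PER_REGION, 100_000, 500);
    // 400000 steps, 8000 per region
    System.out.println(small + " steps, " + stepsPerRegionAvailable(small, 50) + " per region");
    // 1000000 steps, 10 per region
    System.out.println(big + " steps, " + stepsPerRegionAvailable(big, 100_000) + " per region");
    // 50000000 steps, 500 per region
    System.out.println(raised + " steps, " + stepsPerRegionAvailable(raised, 100_000) + " per region");
  }
}
```

Under this assumed model, once a cluster is large the `maxSteps` cap rather than `stepsPerRegion` sets the budget, leaving only about 10 candidate moves per region for 100,000 regions on 500 servers unless the cap (and the running-time limit) is raised.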





[jira] [Updated] (HBASE-21198) Exclude dependency on net.minidev:json-smart

2018-10-17 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-21198:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.0.3
   2.1.1
   2.2.0
   3.0.0
   Status: Resolved  (was: Patch Available)

Pushed to branch-2.0+. Thanks for the patch [~dbist13]

> Exclude dependency on net.minidev:json-smart
> 
>
> Key: HBASE-21198
> URL: https://issues.apache.org/jira/browse/HBASE-21198
> Project: HBase
>  Issue Type: Task
>Reporter: Ted Yu
>Assignee: Artem Ervits
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21198.v01.patch, HBASE-21198.v01.patch
>
>





[jira] [Commented] (HBASE-21275) Thrift Server (branch 1 fix) -> Disable TRACE HTTP method for thrift http server (branch 1 only)

2018-10-17 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653999#comment-16653999
 ] 

stack commented on HBASE-21275:
---

Does not compile for me [~wchevreuil]. For you sir?

> Thrift Server (branch 1 fix) -> Disable TRACE HTTP method for thrift http 
> server (branch 1 only)
> 
>
> Key: HBASE-21275
> URL: https://issues.apache.org/jira/browse/HBASE-21275
> Project: HBase
>  Issue Type: Bug
>  Components: Thrift
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Minor
> Fix For: 1.4.8, 1.2.7
>
> Attachments: HBASE-21275-branch-1.001.patch, 
> HBASE-21275-branch-1.2.001.patch, HBASE-21275-branch-1.2.002.patch, 
> HBASE-21275-branch-1.2.003.patch, HBASE-21275-branch-1.2.003.patch, 
> HBASE-21275-branch-1.4.001.patch
>
>
> A reasonable number of users running the Thrift HTTP server on HBase 1.x have
> had security audit tests flag that the Thrift server allows TRACE requests.
> After some searching, I can see that HBASE-20406 added restrictions for the
> TRACE/OPTIONS methods when Thrift is running over HTTP, but it relies on many
> other commits applied to the Thrift HTTP server. This patch was later reverted
> from master. Later still, HBASE-20004 made TRACE/OPTIONS configurable via the
> "*hbase.thrift.http.allow.options.method*" property, with both methods
> disabled by default. This also seems to rely on many changes applied to the
> Thrift HTTP server, and a branch-1-compatible patch does not seem feasible.
> A solution for branch 1 is simple, though: I am proposing a patch that uses
> *WebAppContext* instead of *Context* as the context for the *HttpServer*
> instance. *WebAppContext* already restricts TRACE methods by default.
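As described above, HBASE-20004 disables TRACE and OPTIONS by default and lets *hbase.thrift.http.allow.options.method* turn them back on. That decision can be sketched as a small method gate. This is a hypothetical plain-Java illustration, not HBase or Jetty code: the class and method names are invented, and treating the one property as a boolean that controls both methods is an assumption drawn from the description.

```java
import java.util.Locale;
import java.util.Properties;

/**
 * Hypothetical sketch (not HBase code): decides whether the Thrift HTTP server
 * should handle a request method. TRACE and OPTIONS are rejected unless the
 * property named in the issue is set to "true"; other methods pass through.
 */
final class ThriftHttpMethodGate {
  // Property name from the issue text; treating it as one boolean for both
  // TRACE and OPTIONS is an assumption based on the description.
  static final String ALLOW_KEY = "hbase.thrift.http.allow.options.method";

  private final boolean allowTraceAndOptions;

  ThriftHttpMethodGate(Properties conf) {
    // Default "false": both methods are disabled unless explicitly enabled.
    this.allowTraceAndOptions = Boolean.parseBoolean(conf.getProperty(ALLOW_KEY, "false"));
  }

  /** Returns true to serve the request, false to answer 405 Method Not Allowed. */
  boolean isAllowed(String method) {
    String m = method.toUpperCase(Locale.ROOT);
    if (m.equals("TRACE") || m.equals("OPTIONS")) {
      return allowTraceAndOptions;
    }
    return true;
  }

  public static void main(String[] args) {
    ThriftHttpMethodGate gate = new ThriftHttpMethodGate(new Properties());
    System.out.println(gate.isAllowed("POST"));  // true
    System.out.println(gate.isAllowed("trace")); // false
  }
}
```

In branch 1 the proposed patch gets TRACE rejection from *WebAppContext*'s defaults rather than from explicit logic like this; the sketch only spells out the intended behavior.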





[jira] [Commented] (HBASE-21198) Exclude dependency on net.minidev:json-smart

2018-10-17 Thread Artem Ervits (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653980#comment-16653980
 ] 

Artem Ervits commented on HBASE-21198:
--

[~stack] The branches running on Hadoop 3, so 2.x and 3.x.






[jira] [Updated] (HBASE-21310) Split TestCloneSnapshotFromClient

2018-10-17 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-21310:
--
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Pushed [~Apache9]'s addendum to branch-2.0 and branch-2.1 (while Duo was sleeping).

> Split TestCloneSnapshotFromClient
> -
>
> Key: HBASE-21310
> URL: https://issues.apache.org/jira/browse/HBASE-21310
> Project: HBase
>  Issue Type: Sub-task
>  Components: test
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21310-branch-2.1-addendum.patch, 
> HBASE-21310-v1.patch, HBASE-21310.patch
>
>






[jira] [Updated] (HBASE-21301) Heatmap for key access patterns

2018-10-17 Thread Andrew Purtell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-21301:
---
Fix Version/s: 2.2.0
   1.5.0
   3.0.0

> Heatmap for key access patterns
> ---
>
> Key: HBASE-21301
> URL: https://issues.apache.org/jira/browse/HBASE-21301
> Project: HBase
>  Issue Type: Improvement
>Reporter: Archana Katiyar
>Assignee: Archana Katiyar
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 2.2.0
>
>
> Google recently released a beta feature for Cloud Bigtable which presents a 
> heat map of the keyspace. *Given how hotspotting comes up now and again here, 
> this is a good idea for giving HBase ops a tool to be proactive about it.* 
> >>>
> Additionally, we are announcing the beta version of Key Visualizer, a 
> visualization tool for Cloud Bigtable key access patterns. Key Visualizer 
> helps debug performance issues due to unbalanced access patterns across the 
> key space, or single rows that are too large or receiving too much read or 
> write activity. With Key Visualizer, you get a heat map visualization of 
> access patterns over time, along with the ability to zoom into specific key 
> or time ranges, or select a specific row to find the full row key ID that's 
> responsible for a hotspot. Key Visualizer is automatically enabled for Cloud 
> Bigtable clusters with sufficient data or activity, and does not affect Cloud 
> Bigtable cluster performance. 
> <<<
> From 
> [https://cloudplatform.googleblog.com/2018/07/on-gcp-your-database-your-way.html]
> (Copied this description from the write-up by [~apurtell], thanks Andrew.)
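The heat map described above boils down to a grid of access counts indexed by time bucket and key range. A minimal in-memory sketch, assuming key ranges are defined by sorted split keys (much like region boundaries) and row keys are compared as unsigned bytes; the class, bucket width, and sample keys are hypothetical and not an HBase API:

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical heat-map accumulator: counts accesses per (time bucket, key range).
 * Range i covers [splitKeys[i-1], splitKeys[i]); range 0 is everything below
 * splitKeys[0] and the last range is everything at or above the last split key.
 */
final class KeyAccessHeatmap {
  private final byte[][] splitKeys;  // sorted in unsigned byte order
  private final long bucketMillis;   // width of one time bucket
  private final TreeMap<Long, long[]> cells = new TreeMap<>(); // bucket start -> counts per range

  KeyAccessHeatmap(byte[][] splitKeys, long bucketMillis) {
    this.splitKeys = splitKeys;
    this.bucketMillis = bucketMillis;
  }

  /** Unsigned lexicographic comparison, the ordering HBase uses for row keys. */
  static int compare(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int d = (a[i] & 0xff) - (b[i] & 0xff);
      if (d != 0) {
        return d;
      }
    }
    return a.length - b.length;
  }

  /** Binary search: the number of split keys that are <= rowKey. */
  int rangeOf(byte[] rowKey) {
    int lo = 0;
    int hi = splitKeys.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (compare(rowKey, splitKeys[mid]) < 0) {
        hi = mid;
      } else {
        lo = mid + 1;
      }
    }
    return lo;
  }

  void record(byte[] rowKey, long timestampMillis) {
    long bucket = timestampMillis - Math.floorMod(timestampMillis, bucketMillis);
    cells.computeIfAbsent(bucket, k -> new long[splitKeys.length + 1])[rangeOf(rowKey)]++;
  }

  long count(long bucketStart, int range) {
    long[] row = cells.get(bucketStart);
    return row == null ? 0 : row[range];
  }

  /** Returns {bucketStart, rangeIndex} of the hottest cell, or null if nothing was recorded. */
  long[] hottest() {
    long[] best = null;
    long bestCount = 0;
    for (Map.Entry<Long, long[]> e : cells.entrySet()) {
      long[] row = e.getValue();
      for (int r = 0; r < row.length; r++) {
        if (row[r] > bestCount) {
          bestCount = row[r];
          best = new long[] {e.getKey(), r};
        }
      }
    }
    return best;
  }

  public static void main(String[] args) {
    byte[][] splits = {
      "g".getBytes(StandardCharsets.UTF_8), "p".getBytes(StandardCharsets.UTF_8)
    };
    KeyAccessHeatmap heatmap = new KeyAccessHeatmap(splits, 60_000L); // 1-minute buckets
    heatmap.record("apple".getBytes(StandardCharsets.UTF_8), 5_000L);
    heatmap.record("zebra".getBytes(StandardCharsets.UTF_8), 65_000L);
    heatmap.record("zebra".getBytes(StandardCharsets.UTF_8), 70_000L);
    long[] hot = heatmap.hottest();
    System.out.println(hot[0] + " " + hot[1]); // 60000 2
  }
}
```

Zooming into a key or time range, as the Key Visualizer description mentions, would amount to re-bucketing with finer split keys or a narrower bucket width.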





[jira] [Commented] (HBASE-21301) Heatmap for key access patterns

2018-10-17 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653913#comment-16653913
 ] 

Andrew Purtell commented on HBASE-21301:


bq. 

We will need more than 1 byte for table_name_uid and region_name_uid. Assume,
for design purposes, that the number of regions or tables can be on the order of
millions. I think that calls for 4 bytes, maybe 3 for table_name_uid (an unsigned
24-bit field allows values up to 16777215). This will still be a lot more
efficient than putting the table name and region name as strings into the key.

What is region_uid? Isn't the region uniquely identified by region_name_uid? 
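To make the uid sizing in the comment above concrete, here is a hypothetical row key layout loosely modeled on OpenTSDB (a fixed-width uid prefix followed by an hour-aligned base timestamp): 3 bytes for table_name_uid, 4 bytes for region_name_uid, and 4 bytes of epoch seconds. The layout and the class and method names are assumptions for illustration only. An unsigned 24-bit uid tops out at 2^24 - 1 = 16777215, and an unsigned 32-bit uid at 2^32 - 1 = 4294967295.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical row key for a region-metrics table (layout is an assumption):
 * [table_name_uid: 3 bytes][region_name_uid: 4 bytes][base timestamp: 4 bytes],
 * all big-endian, with the timestamp in epoch seconds aligned down to the hour.
 */
final class MetricsRowKey {
  static final int TABLE_UID_BYTES = 3;
  static final int REGION_UID_BYTES = 4;
  static final int KEY_LENGTH = TABLE_UID_BYTES + REGION_UID_BYTES + 4;
  static final long MAX_TABLE_UID = (1L << 24) - 1;   // 16777215
  static final long MAX_REGION_UID = (1L << 32) - 1;  // 4294967295

  static byte[] encode(long tableUid, long regionUid, long epochSeconds) {
    if (tableUid < 0 || tableUid > MAX_TABLE_UID) {
      throw new IllegalArgumentException("table uid does not fit in 24 bits: " + tableUid);
    }
    if (regionUid < 0 || regionUid > MAX_REGION_UID) {
      throw new IllegalArgumentException("region uid does not fit in 32 bits: " + regionUid);
    }
    long baseTs = epochSeconds - Math.floorMod(epochSeconds, 3600L); // align to the hour
    ByteBuffer buf = ByteBuffer.allocate(KEY_LENGTH); // ByteBuffer is big-endian by default
    buf.put((byte) (tableUid >>> 16)).put((byte) (tableUid >>> 8)).put((byte) tableUid);
    buf.putInt((int) regionUid); // keeps the low 32 bits; decoded as unsigned below
    buf.putInt((int) baseTs);    // unsigned 32-bit seconds
    return buf.array();
  }

  static long decodeTableUid(byte[] key) {
    return ((key[0] & 0xffL) << 16) | ((key[1] & 0xffL) << 8) | (key[2] & 0xffL);
  }

  static long decodeRegionUid(byte[] key) {
    return ByteBuffer.wrap(key, TABLE_UID_BYTES, REGION_UID_BYTES).getInt() & 0xffffffffL;
  }

  static long decodeBaseTimestamp(byte[] key) {
    return ByteBuffer.wrap(key, TABLE_UID_BYTES + REGION_UID_BYTES, 4).getInt() & 0xffffffffL;
  }

  public static void main(String[] args) {
    byte[] key = encode(MAX_TABLE_UID, 3_000_000_000L, 1_539_795_599L);
    System.out.println(key.length);               // 11
    System.out.println(decodeTableUid(key));      // 16777215
    System.out.println(decodeRegionUid(key));     // 3000000000
    System.out.println(decodeBaseTimestamp(key)); // 1539792000
  }
}
```

Fixed-width uids keep every key the same length (11 bytes here), whereas full region names, which embed the table name and start key, vary in length and are usually much longer.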






[jira] [Commented] (HBASE-21301) Heatmap for key access patterns

2018-10-17 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653893#comment-16653893
 ] 

Andrew Purtell commented on HBASE-21301:


Above, Archana references another internal discussion as "see W-5473921 Enhance
compaction upgrade decision to consider file statistics". The idea there is:
{quote}
Currently we can decide to upgrade a minor compaction to major if the data 
locality of the store files is below a threshold. There are other reasons we 
may want to upgrade the compaction. For example, the largest store file might 
be full of deleted cells. 
{quote}
While this was originally formulated to use statistics embedded in the hfiles, it
actually seems a lot more natural to do it with time-series data kept in a system
table. What we want to know, for a given store file, is what percentage of its
cells are covered by tombstones, but we can only know that by looking at the
state of things at some later time, well past when the store file itself is
written. It's easy to see how time-series data calculated and queried during
compaction could support this use case, and a lot harder to see how store file
metadata could. I mention this only as an example of future work
that would be enabled by the system metrics table proposed on this issue. I 
will file another issue about this at some point.
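The upgrade decision discussed above can be sketched as follows, assuming the deleted-cell fraction for each store file has already been looked up from the proposed metrics table. The class, the thresholds, and the size-weighted locality aggregation are hypothetical; only the two criteria (locality below a threshold, and a largest store file full of deleted cells) come from the discussion.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical compaction-upgrade check (names and thresholds are assumptions).
 * Upgrades a minor compaction to major when store locality is low, or when the
 * largest store file looks mostly dead, i.e. a large fraction of its cells are
 * covered by tombstones according to stats kept in a metrics table.
 */
final class CompactionUpgradePolicy {
  static final class StoreFileStats {
    final long sizeBytes;
    final double locality;            // 0.0..1.0, e.g. fraction of the file's bytes stored locally
    final double deletedCellFraction; // 0.0..1.0, looked up from time-series stats

    StoreFileStats(long sizeBytes, double locality, double deletedCellFraction) {
      this.sizeBytes = sizeBytes;
      this.locality = locality;
      this.deletedCellFraction = deletedCellFraction;
    }
  }

  private final double minLocality;
  private final double maxDeletedFraction;

  CompactionUpgradePolicy(double minLocality, double maxDeletedFraction) {
    this.minLocality = minLocality;
    this.maxDeletedFraction = maxDeletedFraction;
  }

  boolean shouldUpgradeToMajor(List<StoreFileStats> files) {
    if (files.isEmpty()) {
      return false;
    }
    long totalBytes = 0;
    double weightedLocality = 0;
    StoreFileStats largest = files.get(0);
    for (StoreFileStats f : files) {
      totalBytes += f.sizeBytes;
      weightedLocality += f.locality * f.sizeBytes;
      if (f.sizeBytes > largest.sizeBytes) {
        largest = f;
      }
    }
    // Size-weighted locality across the store's files (an assumed aggregation).
    double locality = totalBytes == 0 ? 1.0 : weightedLocality / totalBytes;
    if (locality < minLocality) {
      return true;
    }
    return largest.deletedCellFraction > maxDeletedFraction;
  }

  public static void main(String[] args) {
    CompactionUpgradePolicy policy = new CompactionUpgradePolicy(0.7, 0.5);
    List<StoreFileStats> files = Arrays.asList(
        new StoreFileStats(900, 1.0, 0.8),  // large, fully local, mostly deleted cells
        new StoreFileStats(100, 1.0, 0.0));
    System.out.println(policy.shouldUpgradeToMajor(files)); // true
  }
}
```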






[jira] [Commented] (HBASE-21279) Split TestAdminShell into several tests

2018-10-17 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653878#comment-16653878
 ] 

Hadoop QA commented on HBASE-21279:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 11s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 48s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 25s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 12s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 23s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m  0s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 11s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 10s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} rubocop {color} | {color:red}  0m  7s{color} | {color:red} The patch generated 5 new + 123 unchanged - 179 fixed = 128 total (was 302) {color} |
| {color:orange}-0{color} | {color:orange} ruby-lint {color} | {color:orange}  0m 14s{color} | {color:orange} The patch generated 1 new + 274 unchanged - 328 fixed = 275 total (was 602) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  0s{color} | {color:red} The patch has 1 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 15s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 10m 39s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m  0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 11s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  6m 46s{color} | {color:green} hbase-shell in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 11s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 39m 46s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21279 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12944364/HBASE-21279.v03.patch |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  rubocop  ruby_lint  |
| uname | Linux 3f30939a8db9 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh |
| git revision | master / 8cc56bd18c |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| rubocop | v0.59.2 |
| rubocop | 

[jira] [Commented] (HBASE-21301) Heatmap for key access patterns

2018-10-17 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653869#comment-16653869
 ] 

Andrew Purtell commented on HBASE-21301:


Me:
{quote}
Maybe we could sample reads and writes at the HRegion level and keep the
derived stats in an in-memory data structure in the region. (Keeping it in memory
and local has much lower overhead than attempting to persist it to a real
table.) We would persist relevant stats from this data structure into the store
files written during flushes and compactions.
{quote}
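A minimal sketch of the in-memory, per-region sampling suggested in the quote, assuming uniform random sampling at a fixed rate and using *LongAdder* counters to keep contention low on the hot path. The class and method names are hypothetical, and persisting the snapshot into store files is not shown.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.LongAdder;

/**
 * Hypothetical in-memory access sampler for one region. Each read or write is
 * counted with probability sampleRate; estimates are scaled back by 1/sampleRate.
 * A flush or compaction could call snapshotAndReset() and persist the result.
 */
final class RegionAccessSampler {
  private final double sampleRate;
  private final LongAdder sampledReads = new LongAdder();
  private final LongAdder sampledWrites = new LongAdder();

  RegionAccessSampler(double sampleRate) {
    if (!(sampleRate > 0.0 && sampleRate <= 1.0)) {
      throw new IllegalArgumentException("sampleRate must be in (0, 1]: " + sampleRate);
    }
    this.sampleRate = sampleRate;
  }

  private boolean sampled() {
    return sampleRate >= 1.0 || ThreadLocalRandom.current().nextDouble() < sampleRate;
  }

  void onRead() {
    if (sampled()) {
      sampledReads.increment();
    }
  }

  void onWrite() {
    if (sampled()) {
      sampledWrites.increment();
    }
  }

  /**
   * Returns estimated {reads, writes} since the last call and resets the counters.
   * Not an atomic snapshot: updates racing with the reset may land in either interval.
   */
  long[] snapshotAndReset() {
    long reads = sampledReads.sumThenReset();
    long writes = sampledWrites.sumThenReset();
    return new long[] {Math.round(reads / sampleRate), Math.round(writes / sampleRate)};
  }

  public static void main(String[] args) {
    RegionAccessSampler sampler = new RegionAccessSampler(0.01); // sample about 1% of operations
    for (int i = 0; i < 1_000_000; i++) {
      sampler.onRead();
    }
    long[] est = sampler.snapshotAndReset();
    System.out.println("estimated reads=" + est[0] + " writes=" + est[1]);
  }
}
```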

[~allan163]
bq.  For example, we can record the hit count for a certain data block and keep
the data in a memory structure, so that we can generate a heatmap for data
blocks. I think it can narrow down the hot key at a finer granularity than the
hfile range, which is too big.

I agree it can be done at the block granularity. We could store hit counts per 
block in meta blocks. 
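Block-granularity hit counting, as suggested above, could look roughly like this: counts keyed by data block offset within one hfile, with a query for the hottest blocks. This is a hypothetical sketch; the names are invented, and how the counts would be written into a meta block is left open, as it is in the discussion.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/**
 * Hypothetical per-hfile block hit counter, keyed by data block offset.
 * Narrows a hot key down to one block instead of a whole hfile key range.
 * Serializing these counts into an hfile meta block is not shown.
 */
final class BlockHitCounter {
  private final ConcurrentHashMap<Long, LongAdder> hits = new ConcurrentHashMap<>();

  void recordHit(long blockOffset) {
    hits.computeIfAbsent(blockOffset, k -> new LongAdder()).increment();
  }

  long hitsFor(long blockOffset) {
    LongAdder adder = hits.get(blockOffset);
    return adder == null ? 0 : adder.sum();
  }

  /** Offsets of up to n most-hit blocks, hottest first (ties in arbitrary order). */
  List<Long> hottestBlocks(int n) {
    // Snapshot the counts first so the sort sees stable values.
    List<long[]> snapshot = new ArrayList<>();
    hits.forEach((offset, adder) -> snapshot.add(new long[] {offset, adder.sum()}));
    snapshot.sort((a, b) -> Long.compare(b[1], a[1]));
    List<Long> result = new ArrayList<>();
    for (int i = 0; i < Math.min(n, snapshot.size()); i++) {
      result.add(snapshot.get(i)[0]);
    }
    return result;
  }

  public static void main(String[] args) {
    BlockHitCounter counter = new BlockHitCounter();
    long[] reads = {0L, 65_536L, 65_536L, 131_072L, 65_536L, 0L};
    for (long offset : reads) {
      counter.recordHit(offset);
    }
    System.out.println(counter.hottestBlocks(2)); // [65536, 0]
  }
}
```

As noted in the comment, this covers reads naturally; writes go to the memstore rather than to existing hfile blocks, so they would need separate tracking.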

Overall, with the approach that records low-level, fine-grained statistics into
hfiles, it's easy to see how reads can be tracked, but less clear what to do
about writes.

I advised [~archana.katiyar] to start with region granularity, building on the
region-level metrics for reads and writes that are already available, to lower
the implementation effort for the first version of this. I also advised using
the OpenTSDB schema as inspiration for efficient storage and extensibility. For
now, this table would only store region read and write metrics to support this
use case, but going forward the stats table will be available and potentially
very useful for other use cases. I think this is another point in favor of using
a table here. The suggestions above are great, especially enabling the date
tiered compaction policy on the table by default.

Also, we don't need to auto-create the table if it doesn't exist, if that is
going to be a problem. This is expected to be a one-time operation over
the lifetime of a cluster. An admin can do it when setting up the cluster. We
can document how to execute a small hbase shell script that creates the table 
where we also document how to enable the feature. 
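For the region-granularity first version described above, each heat-map cell can be derived from two successive samples of a region's read and write request counts, assuming those metrics are cumulative since the region opened. A hypothetical sketch of that delta step (names invented; writing the rows into the stats table is not shown):

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical tracker that turns periodic snapshots of cumulative per-region
 * read/write request counters into per-interval deltas: one row of a
 * region-granularity heat map. If a counter goes backwards, it is treated as a
 * restart (assumed here to happen when a region is reopened) and the new value
 * is used as the delta. Not thread-safe; meant for a single polling thread.
 */
final class RegionRateTracker {
  private final Map<String, long[]> last = new HashMap<>(); // region -> {reads, writes}

  /** Returns {readDelta, writeDelta} since the previous snapshot of this region. */
  long[] update(String regionName, long cumulativeReads, long cumulativeWrites) {
    long[] prev = last.put(regionName, new long[] {cumulativeReads, cumulativeWrites});
    if (prev == null) {
      return new long[] {0L, 0L}; // first observation: no interval to measure yet
    }
    return new long[] {delta(prev[0], cumulativeReads), delta(prev[1], cumulativeWrites)};
  }

  private static long delta(long previous, long current) {
    return current >= previous ? current - previous : current; // counter restarted
  }

  public static void main(String[] args) {
    RegionRateTracker tracker = new RegionRateTracker();
    tracker.update("region-a", 100, 10);
    long[] d = tracker.update("region-a", 250, 12);
    System.out.println(d[0] + " reads, " + d[1] + " writes"); // 150 reads, 2 writes
  }
}
```

The deltas for all regions in one polling interval would form one time slice of the heat map, which could then be stored under a uid-prefixed, time-bucketed row key in the OpenTSDB-inspired schema mentioned above.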





