[jira] [Commented] (HBASE-21213) [hbck2] bypass leaves behind state in RegionStates when assign/unassign

2018-09-27 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631375#comment-16631375
 ] 

stack commented on HBASE-21213:
---

An opinion please [~allan163].

In my testing of hbck2 fixup (doing stuff like purging all MasterProcWALs), 
I've manufactured a few odd cases where I want to be able to bypass a procedure 
even though it has children; i.e. in PE (ProcedureExecutor), I'd add something 
like this:

  if (!force) {
    if (procedure.hasChildren()) {
      LOG.info("{} has children, skipping bypass", procedure);
      return false;
    }
  } else {
    LOG.info("Bypassing child check!");
  }

One such case is a MoveProcedure that holds a lock on a region but whose 
UnassignProcedure is no longer in the record; it was pushed out because 
millions of procedures have passed through the system since. In the meantime, 
this stuck MoveProcedure is causing the master proc WALs to back up.

I can figure out why it got stuck and fix the issue later, but it illustrates a 
case where in hbck2 I will want to bypass a procedure even though it supposedly 
has unfinished children. Bypassing such a procedure is dangerous since it 
could leave procedures hanging... but in some cases I need to be able to do 
it. I should probably add a special flag... Just wondering if you have 
anything you'd add here. Thanks.

> [hbck2] bypass leaves behind state in RegionStates when assign/unassign
> ---
>
> Key: HBASE-21213
> URL: https://issues.apache.org/jira/browse/HBASE-21213
> Project: HBase
>  Issue Type: Bug
>  Components: amv2, hbck2
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 2.1.1
>
> Attachments: HBASE-21213.branch-2.1.001.patch, 
> HBASE-21213.branch-2.1.002.patch, HBASE-21213.branch-2.1.003.patch, 
> HBASE-21213.branch-2.1.004.patch, HBASE-21213.branch-2.1.005.patch, 
> HBASE-21213.branch-2.1.006.patch, HBASE-21213.branch-2.1.007.patch, 
> HBASE-21213.branch-2.1.007.patch
>
>
> This is a follow-on from HBASE-21083 which added the 'bypass' functionality. 
> On bypass, there is more state to be cleared if we are to allow new Procedures 
> to be scheduled.
> For example, here is a bypass:
> {code}
> 2018-09-20 05:45:43,722 INFO org.apache.hadoop.hbase.procedure2.Procedure: 
> pid=100449, state=RUNNABLE:REGION_TRANSITION_DISPATCH, locked=true, 
> bypass=LOG-REDACTED UnassignProcedure table=hbase:namespace, 
> region=37cc206fe9c4bc1c0a46a34c5f523d16, 
> server=ve1233.halxg.cloudera.com,22101,1537397961664 bypassed, returning null 
> to finish it
> 2018-09-20 05:45:44,022 INFO 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=100449, 
> state=SUCCESS, bypass=LOG-REDACTED UnassignProcedure table=hbase:namespace, 
> region=37cc206fe9c4bc1c0a46a34c5f523d16, 
> server=ve1233.halxg.cloudera.com,22101,1537397961664 in 2mins, 7.618sec
> {code}
> ... but then when I try to assign the bypassed region later, I get this:
> {code}
> 2018-09-20 05:46:31,435 WARN 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure: There is 
> already another procedure running on this region this=pid=100450, 
> state=RUNNABLE:REGION_TRANSITION_QUEUE, locked=true; AssignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16 
> owner=pid=100449, state=SUCCESS, bypass=LOG-REDACTED UnassignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16, 
> server=ve1233.halxg.cloudera.com,22101,1537397961664 pid=100450, 
> state=RUNNABLE:REGION_TRANSITION_QUEUE, locked=true; AssignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16; rit=OPENING, 
> location=ve1233.halxg.cloudera.com,22101,1537397961664
> 2018-09-20 05:46:31,510 INFO 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Rolled back pid=100450, 
> state=ROLLEDBACK, 
> exception=org.apache.hadoop.hbase.procedure2.ProcedureAbortedException via 
> AssignProcedure:org.apache.hadoop.hbase.procedure2.ProcedureAbortedException: 
> There is already another procedure running on this region this=pid=100450, 
> state=RUNNABLE:REGION_TRANSITION_QUEUE, locked=true; AssignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16 
> owner=pid=100449, state=SUCCESS, bypass=LOG-REDACTED UnassignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16, 
> server=ve1233.halxg.cloudera.com,22101,1537397961664; AssignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16 
> exec-time=473msec
> {code}
> ... which is a long-winded way of saying the Unassign Procedure still exists 
> in RegionStateNodes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21249) Add jitter for ProcedureUtil.getBackoffTimeMs

2018-09-27 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21249:
--
Fix Version/s: 2.0.3
   2.1.1
   2.2.0
   3.0.0

> Add jitter for ProcedureUtil.getBackoffTimeMs
> -
>
> Key: HBASE-21249
> URL: https://issues.apache.org/jira/browse/HBASE-21249
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Reporter: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
>






[jira] [Commented] (HBASE-21237) Use CompatRemoteProcedureResolver to dispatch open/close region requests to RS

2018-09-27 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631367#comment-16631367
 ] 

Allan Yang commented on HBASE-21237:


{quote}
Wait a minute. Have you tried hadoop qa for branch-2.1? The procedure-based 
replication peer modification needs the executeProcedures call...
{quote}
Sorry, I thought they were the same. So shall we revert this on branch-2.1?

> Use CompatRemoteProcedureResolver to dispatch open/close region requests to RS
> --
>
> Key: HBASE-21237
> URL: https://issues.apache.org/jira/browse/HBASE-21237
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.1.0, 2.0.2
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Attachments: HBASE-21237.branch-2.0.001.patch
>
>
> As discussed in HBASE-21217, in branch-2.0 and branch-2.1 we should use 
> CompatRemoteProcedureResolver instead of ExecuteProceduresRemoteCall to 
> dispatch region open/close requests to the RS, since 
> ExecuteProceduresRemoteCall groups all the open/close operations into one 
> call and executes them sequentially on the target RS. If one operation fails, 
> all the operations are marked as failed. In fact, some of the operations 
> (like open region) are already executing in the open region handler thread, 
> but the master thinks these operations failed and reassigns the regions to 
> another RS. So when the previous RS reports to the master that the region is 
> online, the master kills the RS since it has already assigned the region to 
> another RS.
> For branch-2.2+, HBASE-21217 will fix this issue.
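
To make the failure mode described above concrete, here is a minimal Java 
sketch. It is not HBase code: DispatchSketch, RegionOp, dispatchBatched and 
dispatchIndividually are names made up for illustration, and it assumes the 
grouped call keeps running the remaining operations after one fails, which the 
description does not state. It only shows how one failure in a grouped call 
taints the reported outcome of every operation, while per-operation dispatch 
reports each outcome on its own.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative model only; none of these names exist in HBase. */
final class DispatchSketch {
  /** A region open/close operation that either completes or throws. */
  interface RegionOp {
    String region();
    void run() throws Exception;
  }

  /** Grouped call: if any operation throws, every operation is reported as failed. */
  static Map<String, Boolean> dispatchBatched(List<RegionOp> ops) {
    boolean allOk = true;
    for (RegionOp op : ops) {
      try {
        op.run();
      } catch (Exception e) {
        allOk = false; // assumption: the remaining operations still run
      }
    }
    Map<String, Boolean> reported = new LinkedHashMap<>();
    for (RegionOp op : ops) {
      reported.put(op.region(), allOk);
    }
    return reported;
  }

  /** Per-operation dispatch: each operation gets its own reported outcome. */
  static Map<String, Boolean> dispatchIndividually(List<RegionOp> ops) {
    Map<String, Boolean> reported = new LinkedHashMap<>();
    for (RegionOp op : ops) {
      try {
        op.run();
        reported.put(op.region(), true);
      } catch (Exception e) {
        reported.put(op.region(), false);
      }
    }
    return reported;
  }

  /** Builds a hypothetical operation that fails on demand. */
  static RegionOp op(final String region, final boolean fails) {
    return new RegionOp() {
      @Override public String region() { return region; }
      @Override public void run() throws Exception {
        if (fails) {
          throw new Exception("operation failed for " + region);
        }
      }
    };
  }

  public static void main(String[] args) {
    List<RegionOp> ops = Arrays.asList(op("r1", false), op("r2", true));
    System.out.println("batched:    " + dispatchBatched(ops));
    System.out.println("individual: " + dispatchIndividually(ops));
  }
}
```

In the batched case r1 is reported as failed even though its operation 
completed, which is the situation where the master would go on to reassign a 
region that is in fact opening on the original RS.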





[jira] [Commented] (HBASE-21247) Allow WAL Provider to be specified by configuration without explicit enum in Providers

2018-09-27 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631354#comment-16631354
 ] 

Hadoop QA commented on HBASE-21247:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue}  0m  3s{color} | {color:blue} The patch file was not named according to hbase's naming conventions. Please see https://yetus.apache.org/documentation/0.8.0/precommit-patchnames for instructions. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 41s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 47s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 17s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 16s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 13s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 41s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 43s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m 14s{color} | {color:red} hbase-server: The patch generated 2 new + 25 unchanged - 0 fixed = 27 total (was 25) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 10s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red}  6m 36s{color} | {color:red} The patch causes 15 errors with Hadoop v3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 28s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}129m  0s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 23s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}169m  2s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HBASE-21247 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12941595/21247.v3.txt |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux asf902.gq1.ygridcore.net 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh |
| git revision | master / 86cb8e4 |
| maven | version: Apache Maven 3.0.5 (r01de14724cdef164cd33c7c8c2fe155faf9602da; 2013-02-19 13:51:28+) |
| Default Java | 1.8.0_172 |
| findbugs | v3.1.0-RC3 |
| checkstyle | https://builds.apache.org/job/PreCommit-HBASE-Build/14520/artifact/patchprocess/diff-checkstyle-hbase-server.txt |
| whitespace | 

[jira] [Commented] (HBASE-20952) Re-visit the WAL API

2018-09-27 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631345#comment-16631345
 ] 

stack commented on HBASE-20952:
---

bq. If you get a chance to look at the new doc that Ted and I worked on, 
that'd be greatly appreciated: 

Thanks for the ping [~elserj]. I read the doc. IMO, it gives little to no 
inkling as to how hbase will be changed. I left some comments. Thanks.

> Re-visit the WAL API
> 
>
> Key: HBASE-20952
> URL: https://issues.apache.org/jira/browse/HBASE-20952
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: Josh Elser
>Priority: Major
> Attachments: 20952.v1.txt
>
>
> Take a step back from the current WAL implementations and think about what an 
> HBase WAL API should look like. What are the primitive calls that we require 
> to guarantee durability of writes with a high degree of performance?
> The API needs to take the current implementations into consideration. We 
> should also have a mind for what is happening in the Ratis LogService (but 
> the LogService should not dictate what HBase's WAL API looks like RATIS-272).
> Other "systems" inside of HBase that use WALs are replication and 
> backup. Replication has the use-case for "tail"'ing the WAL, which we 
> should provide via our new API. Backup doesn't do anything fancy (IIRC). We 
> should make sure all consumers are generally going to be OK with the API we 
> create.
> The API may be "OK" (or OK in part). We need to also consider other methods 
> which were "bolted" on such as {{AbstractFSWAL}} and 
> {{WALFileLengthProvider}}. Other corners of "WAL use" (like the 
> {{WALSplitter}}) should also be looked at to use WAL-APIs only.
> We also need to make sure that adequate interface audience and stability 
> annotations are chosen.





[jira] [Commented] (HBASE-20716) Unsafe access cleanup

2018-09-27 Thread Anoop Sam John (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631332#comment-16631332
 ] 

Anoop Sam John commented on HBASE-20716:


Sorry for the delay... It's coming along well. I like the way you abstracted 
things out. Now we will have the Bytes and BBUtils classes as the front end, 
within which we go either via Unsafe or the costlier pure-Java way.
The concern is that sometimes the static classes within, say, BBUtils can get 
loaded, which might cause their static fields to be initialized. You can see 
that in class UnsafeConverter there is static state of type Unsafe, which can 
trigger an attempt to load the Unsafe class. And in some environments it is 
not available! We have seen such cases reported on the mailing list/JIRA.
See how we have handled the Best comparator thing in the Bytes class: the 
loading is based on an FQCN and done via Class.forName.
Also, for BBUtils, all the accesses were via UnsafeAvailChecker (I don't know 
if things changed later, but when these classes were introduced it was this 
way). Within UnsafeAvailChecker, there is no direct reference to the Unsafe 
class at all.
We need a similar approach now.
Am I making myself clear?
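
The loading-by-name idea above can be sketched in a few lines of Java. This 
is a rough illustration, not the Bytes or UnsafeAvailChecker code: 
ConverterLoader, Converter and PureJavaConverter are hypothetical names, and 
the preferred implementation is passed in as a string so the loader itself 
holds no static reference to a class that might fail to load.

```java
/** Illustrative only; these class names are not from HBase. */
final class ConverterLoader {
  /** Reads a big-endian int out of a byte array. */
  interface Converter {
    int toInt(byte[] bytes, int offset);
  }

  /** Safe fallback that needs nothing beyond plain Java. */
  static final class PureJavaConverter implements Converter {
    @Override
    public int toInt(byte[] b, int off) {
      return ((b[off] & 0xff) << 24)
          | ((b[off + 1] & 0xff) << 16)
          | ((b[off + 2] & 0xff) << 8)
          | (b[off + 3] & 0xff);
    }
  }

  /**
   * Tries to load the preferred implementation by fully qualified class name.
   * Because the name is only a string, a missing or unloadable class surfaces
   * here as a caught exception or LinkageError instead of breaking the caller.
   */
  static Converter load(String preferredFqcn) {
    try {
      Class<?> clazz = Class.forName(preferredFqcn);
      return (Converter) clazz.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException | LinkageError | ClassCastException e) {
      return new PureJavaConverter();
    }
  }

  public static void main(String[] args) {
    // "example.UnsafeBackedConverter" is a made-up name that will not resolve.
    Converter c = load("example.UnsafeBackedConverter");
    System.out.println(c.getClass().getSimpleName());
    System.out.println(c.toInt(new byte[] {0, 0, 1, 2}, 0));
  }
}
```

The point of the design is that only the class that is actually chosen gets 
loaded and initialized, so an environment without Unsafe never touches it.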

> Unsafe access cleanup
> -
>
> Key: HBASE-20716
> URL: https://issues.apache.org/jira/browse/HBASE-20716
> Project: HBase
>  Issue Type: Sub-task
>  Components: Performance
>Reporter: stack
>Assignee: Sahil Aggarwal
>Priority: Critical
>  Labels: beginner
> Attachments: HBASE-20716.master.001.patch, 
> HBASE-20716.master.002.patch, HBASE-20716.master.003.patch, 
> HBASE-20716.master.004.patch, Screen Shot 2018-06-26 at 11.37.49 AM.png
>
>
> We have two means of getting at unsafe; UnsafeAccess and then internal to the 
> Bytes class. They are effectively doing the same thing. We should have one 
> avenue to Unsafe only.
> Many of our paths to Unsafe via UnsafeAccess traverse flags to check if 
> access is available, if it is aligned and the order in which words are 
> written on the machine. Each check costs -- especially if done millions of 
> times a second -- and on occasion adds bloat in hot code paths. The unsafe 
> access inside Bytes checks on startup what the machine is capable of and 
> then does a static assign of the appropriate class-to-use from there on out. 
> UnsafeAccess does not do this; it runs the checks every time. It would be 
> good to have the Bytes behavior pervasive.
> The benefit of one access to Unsafe only is plain. The benefits we gain 
> removing checks will be harder to measure though should be plain when you 
> disassemble a hot-path; in a (very) rare case, the saved byte codes could be 
> the difference between inlining or not.
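
As a rough illustration of the "check once at startup, then static assign" 
pattern the description asks for, here is a small Java sketch. It is not the 
Bytes or UnsafeAccess code: IntReaderSketch and its members are hypothetical, 
it models only the byte-order check (not availability or alignment), and it 
uses a ByteBuffer as a stand-in for a native-order read.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Illustrative only; none of these names come from HBase. */
final class IntReaderSketch {
  /** Reads a big-endian int starting at the given offset. */
  interface IntReader {
    int readBigEndianInt(byte[] bytes, int offset);
  }

  /** Stand-in for a raw native-order read (the real code would use Unsafe). */
  static int nativeRead(byte[] bytes, int offset) {
    return ByteBuffer.wrap(bytes).order(ByteOrder.nativeOrder()).getInt(offset);
  }

  /** Used when native order already is big-endian. */
  static final class SameOrderReader implements IntReader {
    @Override
    public int readBigEndianInt(byte[] b, int off) {
      return nativeRead(b, off);
    }
  }

  /** Used on little-endian machines: swap the bytes after the native read. */
  static final class SwappingReader implements IntReader {
    @Override
    public int readBigEndianInt(byte[] b, int off) {
      return Integer.reverseBytes(nativeRead(b, off));
    }
  }

  /** Decided once, at class initialization; callers never re-check. */
  static final IntReader READER =
      ByteOrder.nativeOrder() == ByteOrder.BIG_ENDIAN
          ? new SameOrderReader()
          : new SwappingReader();

  /** The alternative being argued against: the same check on every call. */
  static int readWithPerCallCheck(byte[] b, int off) {
    int v = nativeRead(b, off);
    return ByteOrder.nativeOrder() == ByteOrder.BIG_ENDIAN ? v : Integer.reverseBytes(v);
  }

  public static void main(String[] args) {
    byte[] data = {0, 0, 1, 2};
    System.out.println(READER.readBigEndianInt(data, 0));
    System.out.println(readWithPerCallCheck(data, 0));
  }
}
```

Both paths return the same value; the difference is only where the branch 
lives, which is what matters for the size of the hot path.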





[jira] [Commented] (HBASE-21186) Document hbase.regionserver.executor.openregion.threads in MTTR section

2018-09-27 Thread Sahil Aggarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631331#comment-16631331
 ] 

Sahil Aggarwal commented on HBASE-21186:


Done.

> Document hbase.regionserver.executor.openregion.threads in MTTR section
> ---
>
> Key: HBASE-21186
> URL: https://issues.apache.org/jira/browse/HBASE-21186
> Project: HBase
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Sahil Aggarwal
>Assignee: Sahil Aggarwal
>Priority: Minor
> Attachments: HBASE-21186.master.001.patch, 
> HBASE-21186.master.002.patch
>
>
> hbase.regionserver.executor.openregion.threads helps in improving MTTR by 
> increasing assign rpc processing rate at RS from HMaster but is not 
> documented.





[jira] [Updated] (HBASE-21186) Document hbase.regionserver.executor.openregion.threads in MTTR section

2018-09-27 Thread Sahil Aggarwal (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Aggarwal updated HBASE-21186:
---
Attachment: HBASE-21186.master.002.patch



[jira] [Updated] (HBASE-21213) [hbck2] bypass leaves behind state in RegionStates when assign/unassign

2018-09-27 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-21213:
--
Attachment: HBASE-21213.branch-2.1.007.patch



[jira] [Updated] (HBASE-21242) [amv2] Miscellaneous minor log and assign procedure create improvements

2018-09-27 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-21242:
--
Attachment: HBASE-21242.branch-2.1.001.patch

> [amv2] Miscellaneous minor log and assign procedure create improvements
> ---
>
> Key: HBASE-21242
> URL: https://issues.apache.org/jira/browse/HBASE-21242
> Project: HBase
>  Issue Type: Bug
>  Components: amv2, Operability
>Reporter: stack
>Assignee: stack
>Priority: Minor
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21242.branch-2.1.001.patch, 
> HBASE-21242.branch-2.1.001.patch
>
>
> Some minor fixups:
> {code}
> For RIT Duration, do better than print ms/seconds. Remove redundant UI
> column dedicated to duration when we log it in the status field too.
> Make bypass log at INFO level -- when DEBUG we can miss important
> fixup detail like why we failed.
> Make it so on complete of subprocedure, we note count of outstanding
> siblings so we have a clue how much further the parent has to go before
> it is done (Helpful when hundreds of servers doing SCP).
> Have the SCP run the AP preflight check before creating an AP; saves
> creation of hundreds of thousands of APs during fixup of this big cluster
> of mine.
> Don't log tablename three times when reporting remote call failed.
> If lock is held already, note who has it. Also log after we get lock
> or if we have to wait rather than log on entrance though we may
> later have to wait (or we may have just picked up the lock).
> {code}
> Posting patch in a sec but let me try it on cluster too.





[jira] [Commented] (HBASE-21186) Document hbase.regionserver.executor.openregion.threads in MTTR section

2018-09-27 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631326#comment-16631326
 ] 

Ted Yu commented on HBASE-21186:


bq. where single region is holding

I think you meant single region server.

Please check the grammar of your addition.

Thanks

> Document hbase.regionserver.executor.openregion.threads in MTTR section
> ---
>
> Key: HBASE-21186
> URL: https://issues.apache.org/jira/browse/HBASE-21186
> Project: HBase
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Sahil Aggarwal
>Assignee: Sahil Aggarwal
>Priority: Minor
> Attachments: HBASE-21186.master.001.patch
>
>
> hbase.regionserver.executor.openregion.threads helps in improving MTTR by 
> increasing assign rpc processing rate at RS from HMaster but is not 
> documented.





[jira] [Commented] (HBASE-20716) Unsafe access cleanup

2018-09-27 Thread Sahil Aggarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631312#comment-16631312
 ] 

Sahil Aggarwal commented on HBASE-20716:


[~stack] Can you please have a look at the patch?



[jira] [Created] (HBASE-21249) Add jitter for ProcedureUtil.getBackoffTimeMs

2018-09-27 Thread Duo Zhang (JIRA)
Duo Zhang created HBASE-21249:
-

 Summary: Add jitter for ProcedureUtil.getBackoffTimeMs
 Key: HBASE-21249
 URL: https://issues.apache.org/jira/browse/HBASE-21249
 Project: HBase
  Issue Type: Sub-task
  Components: proc-v2
Reporter: Duo Zhang








[jira] [Updated] (HBASE-21244) Skip persistence when retrying for assignment related procedures

2018-09-27 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21244:
--
Attachment: HBASE-21244.patch

> Skip persistence when retrying for assignment related procedures
> 
>
> Key: HBASE-21244
> URL: https://issues.apache.org/jira/browse/HBASE-21244
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2, Performance, proc-v2
>Reporter: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21244.patch
>
>






[jira] [Updated] (HBASE-21244) Skip persistence when retrying for assignment related procedures

2018-09-27 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21244:
--
Assignee: Duo Zhang
  Status: Patch Available  (was: Open)

> Skip persistence when retrying for assignment related procedures
> 
>
> Key: HBASE-21244
> URL: https://issues.apache.org/jira/browse/HBASE-21244
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2, Performance, proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21244.patch
>
>






[jira] [Updated] (HBASE-21233) Allow the procedure implementation to skip persistence of the state after a execution

2018-09-27 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21233:
--
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Pushed to branch-2.0+.

> Allow the procedure implementation to skip persistence of the state after a 
> execution
> -
>
> Key: HBASE-21233
> URL: https://issues.apache.org/jira/browse/HBASE-21233
> Project: HBase
>  Issue Type: Sub-task
>  Components: Performance, proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21233.patch, HBASE-21233.patch
>
>
> Discussed with [~stack] and [~allan163] on HBASE-21035: when retrying, we 
> do not need to persist the procedure state every time, as the retry timeout 
> is not critical. It is OK if we lose this information and start from 0 
> after restarting.
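
The idea in the description can be sketched as follows. This is a toy model 
under stated assumptions, not the proc-v2 API: RetrySketch, CountingStore, 
RetryingStep and the skipPersistence flag are all hypothetical names. It shows 
an execution step that only bumps an in-memory retry counter telling the 
executor not to write its state, so a restart simply starts counting again 
from 0.

```java
/** Illustrative only; these names are not part of HBase's procedure framework. */
final class RetrySketch {
  /** Hypothetical stand-in for a procedure store; it just counts writes. */
  static final class CountingStore {
    int writes;

    void persist(Object state) {
      writes++;
    }
  }

  /** A step that may need several attempts before it succeeds. */
  static final class RetryingStep {
    int attempts;            // in-memory only; lost (reset to 0) on restart
    boolean skipPersistence; // true when the last execution only scheduled a retry
    boolean done;

    void execute(boolean succeeded) {
      if (succeeded) {
        done = true;
        skipPersistence = false; // real progress: state must be written
      } else {
        attempts++;
        skipPersistence = true;  // nothing critical changed: skip the write
      }
    }
  }

  /** One executor pass: run the step, then persist unless it asked to skip. */
  static void runOnce(RetryingStep step, boolean succeeded, CountingStore store) {
    step.execute(succeeded);
    if (!step.skipPersistence) {
      store.persist(step);
    }
  }

  public static void main(String[] args) {
    CountingStore store = new CountingStore();
    RetryingStep step = new RetryingStep();
    for (int i = 0; i < 3; i++) {
      runOnce(step, false, store); // three failed attempts, no writes
    }
    runOnce(step, true, store);    // success, one write
    System.out.println("attempts=" + step.attempts + " writes=" + store.writes);
  }
}
```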





[jira] [Commented] (HBASE-21237) Use CompatRemoteProcedureResolver to dispatch open/close region requests to RS

2018-09-27 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631297#comment-16631297
 ] 

Duo Zhang commented on HBASE-21237:
---

So here I think we could just remove the different implementations of 
RemoteDispatcher... Just have one implementation: for open and close, we call 
openRegion and closeRegion, and for other remote procedures, we call 
executeProcedures.

> Use CompatRemoteProcedureResolver to dispatch open/close region requests to RS
> --
>
> Key: HBASE-21237
> URL: https://issues.apache.org/jira/browse/HBASE-21237
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.1.0, 2.0.2
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Attachments: HBASE-21237.branch-2.0.001.patch
>
>
> As discussed in HBASE-21217, in branch-2.0 and branch-2.1 we should use 
> CompatRemoteProcedureResolver instead of ExecuteProceduresRemoteCall to 
> dispatch region open/close requests to the RS, because 
> ExecuteProceduresRemoteCall groups all the open/close operations into one 
> call and executes them sequentially on the target RS. If one operation fails, 
> all the operations are marked as failed. In fact, some of the operations 
> (like open region) are already executing in the open region handler thread, 
> but the master thinks these operations failed and reassigns the regions to 
> another RS. So when the previous RS reports to the master that the region is 
> online, the master kills the RS since it has already assigned the region to 
> another RS.
> For branch-2.2+, HBASE-21217 will fix this issue.
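
To make the failure mode in the description concrete, here is a small sketch contrasting a grouped call with per-operation calls. It is a simplified model under stated assumptions, not the HBase implementation: the class, the method names and the "succeeds" predicate are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Toy comparison of grouped vs. per-operation result reporting (hypothetical). */
final class BatchFailureSketch {
  /** Grouped call: ops run in order, and one failure marks every op as failed. */
  static Map<String, Boolean> groupedCall(List<String> ops, Predicate<String> succeeds) {
    boolean allOk = true;
    for (String op : ops) {
      if (!succeeds.test(op)) {
        allOk = false;
        break;
      }
    }
    Map<String, Boolean> result = new LinkedHashMap<>();
    for (String op : ops) {
      result.put(op, allOk);
    }
    return result;
  }

  /** One call per op: each op is reported with its own outcome. */
  static Map<String, Boolean> perOpCalls(List<String> ops, Predicate<String> succeeds) {
    Map<String, Boolean> result = new LinkedHashMap<>();
    for (String op : ops) {
      result.put(op, succeeds.test(op));
    }
    return result;
  }
}
```

With three opens where only the second fails, the grouped form reports all three as failed, while the per-operation form reports only the one that actually failed. That difference is what leads the master to reassign regions that are in fact opening.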





[jira] [Commented] (HBASE-21237) Use CompatRemoteProcedureResolver to dispatch open/close region requests to RS

2018-09-27 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631296#comment-16631296
 ] 

Duo Zhang commented on HBASE-21237:
---

Wait a minute. Have you tried Hadoop QA for branch-2.1? The procedure-based 
replication peer modification needs the executeProcedures call...






[jira] [Updated] (HBASE-21233) Allow the procedure implementation to skip persistence of the state after a execution

2018-09-27 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21233:
--
Fix Version/s: 2.0.3

> Allow the procedure implementation to skip persistence of the state after a 
> execution
> -
>
> Key: HBASE-21233
> URL: https://issues.apache.org/jira/browse/HBASE-21233
> Project: HBase
>  Issue Type: Sub-task
>  Components: Performance, proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21233.patch, HBASE-21233.patch
>
>
> Discussed with [~stack] and [~allan163] on HBASE-21035: when retrying, we 
> do not need to persist the procedure state every time, as the retry timeout 
> is not critical. It is OK to lose this information and start from 0 after 
> restarting.
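
A minimal sketch of the idea in the description, assuming a procedure keeps its retry count in memory only. The class and its methods are hypothetical and are not the API added by the patch; they only illustrate that a failed attempt does not ask for persistence and that a restart starts the count from 0 again.

```java
/** Toy model of a retrying step whose retry counter is never persisted. */
final class RetryingStepSketch {
  // Held in memory only; deliberately not part of any serialized state.
  private int attempts;
  // Whether the executor should persist state after the last execution.
  private boolean persistRequested = true;

  /** Runs one execution; returns true when the step is done. */
  boolean executeOnce(boolean succeeded) {
    persistRequested = true;
    if (succeeded) {
      return true;
    }
    attempts++;
    // Only the retry counter changed, so skip persistence for this execution.
    persistRequested = false;
    return false;
  }

  int attempts() {
    return attempts;
  }

  boolean persistRequested() {
    return persistRequested;
  }

  /** A restart rebuilds the step from persisted state, which has no counter. */
  static RetryingStepSketch afterRestart() {
    return new RetryingStepSketch();
  }
}
```

The trade-off is the one stated above: a restart forgets how many retries happened, which is acceptable because the retry timeout is not critical state.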





[jira] [Commented] (HBASE-21233) Allow the procedure implementation to skip persistence of the state after a execution

2018-09-27 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631285#comment-16631285
 ] 

Duo Zhang commented on HBASE-21233:
---

Let me commit. Thanks [~allan163] for reviewing.

> Allow the procedure implementation to skip persistence of the state after a 
> execution
> -
>
> Key: HBASE-21233
> URL: https://issues.apache.org/jira/browse/HBASE-21233
> Project: HBase
>  Issue Type: Sub-task
>  Components: Performance, proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1
>
> Attachments: HBASE-21233.patch, HBASE-21233.patch
>
>
> Discussed with [~stack] and [~allan163] on HBASE-21035: when retrying, we 
> do not need to persist the procedure state every time, as the retry timeout 
> is not critical. It is OK to lose this information and start from 0 after 
> restarting.





[jira] [Commented] (HBASE-21233) Allow the procedure implementation to skip persistence of the state after a execution

2018-09-27 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631284#comment-16631284
 ] 

Hadoop QA commented on HBASE-21233:
---

+1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 14s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || master Compile Tests ||
| +1 | mvninstall | 7m 58s | master passed |
| +1 | compile | 0m 26s | master passed |
| +1 | checkstyle | 0m 18s | master passed |
| +1 | shadedjars | 4m 24s | branch has no errors when building our shaded downstream artifacts. |
| +1 | findbugs | 0m 36s | master passed |
| +1 | javadoc | 0m 23s | master passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 5m 25s | the patch passed |
| +1 | compile | 0m 22s | the patch passed |
| +1 | javac | 0m 22s | the patch passed |
| +1 | checkstyle | 0m 14s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 4m 24s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 11m 20s | Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. |
| +1 | findbugs | 0m 35s | the patch passed |
| +1 | javadoc | 0m 13s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 2m 59s | hbase-procedure in the patch passed. |
| +1 | asflicense | 0m 14s | The patch does not generate ASF License warnings. |
| | | 40m 39s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21233 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12941531/HBASE-21233.patch |
| Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 6f4c7bad6b75 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 86cb8e48ad |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/14519/testReport/ |
| Max. process+thread count | 279 (vs. ulimit of 1) |
| modules | C: hbase-procedure U: hbase-procedure |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/14519/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |


This message was automatically generated.



> Allow the procedure 

[jira] [Commented] (HBASE-21228) Memory leak since AbstractFSWAL caches Thread object and never clean later

2018-09-27 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631251#comment-16631251
 ] 

Hudson commented on HBASE-21228:


SUCCESS: Integrated in Jenkins build HBase-1.2-IT #1167 (See 
[https://builds.apache.org/job/HBase-1.2-IT/1167/])
HBASE-21228 Memory leak since AbstractFSWAL caches Thread object and (apurtell: 
rev ff29edc856c29fb6691f9c1798c344733383c7ee)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java


> Memory leak since AbstractFSWAL caches Thread object and never clean later
> --
>
> Key: HBASE-21228
> URL: https://issues.apache.org/jira/browse/HBASE-21228
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.1.0, 2.0.2, 1.4.7
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 1.3.3, 1.2.8, 1.4.8, 2.1.1, 2.0.3
>
> Attachments: HBASE-21228.branch-2.0.001.patch, 
> HBASE-21228.branch-2.0.002.patch, HBASE-21228.branch-2.0.003.patch
>
>
> In AbstractFSWAL (FSHLog in branch-1), we have a map that caches threads and 
> their SyncFutures.
> {code:java}
>   /**
>    * Map of {@link SyncFuture}s keyed by Handler objects. Used so we reuse 
>    * SyncFutures.
>    *
>    * TODO: Reuse FSWALEntry's rather than create them anew each time as we do 
>    * SyncFutures here.
>    *
>    * TODO: Add a FSWalEntry and SyncFuture as thread locals on handlers 
>    * rather than have them get them from this Map?
>    */
>   private final ConcurrentMap<Thread, SyncFuture> syncFuturesByHandler;
> {code}
> A colleague of mine found a memory leak caused by this map.
> Every thread that writes to the WAL is cached in this map, and nothing removes 
> a thread from the map, even after the thread is dead.
> In one of our customer's clusters, we noticed that even though there were no 
> requests, the heap of the RS was almost full and CMS GC was triggered every 
> second.
> We dumped the heap and found more than 30 thousand threads in Terminated 
> state, all of them cached in the map above. Everything referenced by these 
> threads was leaked. Most of the threads are:
> 1. PostOpenDeployTasksThread, which writes the Open Region mark to the WAL.
> 2. hconnection-0x1f838e31-shared--pool, which is used to write index short 
> circuit (Phoenix); WAL writes and syncs happen in these threads.
> 3. Index writer thread (Phoenix), which is referenced by 
> RegionCoprocessorHost$RegionEnvironment, then by HRegion, and finally by 
> PostOpenDeployTasksThread.
> We should turn this map into a thread local and let the JVM GC the terminated 
> threads for us.
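
The proposed direction (a thread local instead of a map keyed by Thread) could look roughly like the sketch below. This is an illustration under assumptions, not the committed patch: SyncFutureStub is a hypothetical stand-in for HBase's SyncFuture, and the class and method names are made up for the example.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for HBase's SyncFuture; only reuse matters here. */
final class SyncFutureStub {
  long txid;

  SyncFutureStub reset(long txid) {
    this.txid = txid;
    return this;
  }
}

/** Per-thread reuse of a future without holding a reference to the Thread. */
final class ThreadLocalSyncFutureCache {
  // The value lives in the owning thread's own thread-local storage, so it
  // becomes unreachable together with the thread once the thread terminates.
  private final ThreadLocal<SyncFutureStub> cached =
      ThreadLocal.withInitial(SyncFutureStub::new);

  /** Returns the calling thread's future, reset for the given transaction id. */
  SyncFutureStub getSyncFuture(long txid) {
    return cached.get().reset(txid);
  }

  /** Helper for the example: fetches the future as seen by a different thread. */
  static SyncFutureStub fromOtherThread(ThreadLocalSyncFutureCache cache, long txid) {
    AtomicReference<SyncFutureStub> seen = new AtomicReference<>();
    Thread t = new Thread(() -> seen.set(cache.getSyncFuture(txid)));
    t.start();
    try {
      t.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return seen.get();
  }
}
```

Unlike a ConcurrentMap keyed by Thread, nothing in this cache keeps a strong reference to the Thread object itself, so a terminated thread and whatever it referenced are not pinned by the cache.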





[jira] [Commented] (HBASE-21207) Add client side sorting functionality in master web UI for table and region server details.

2018-09-27 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631250#comment-16631250
 ] 

Andrew Purtell commented on HBASE-21207:


I applied the v1 patch to master, launched the newly built version in-tree. 
Created a table. Browsed to the table details page. Visually the result looks 
like the attached PNG files. Clicking on the table header re-sorts the view. 

master patch does not apply to branch-2. The problem is table.jsp. It's a 
nontrivial reject. [~archana.katiyar] if you could provide a patch for branch-2 
that would be most appreciated. Going to need to commit this there before 
proceeding down the line to branch-1 etc. 



> Add client side sorting functionality in master web UI for table and region 
> server details.
> ---
>
> Key: HBASE-21207
> URL: https://issues.apache.org/jira/browse/HBASE-21207
> Project: HBase
>  Issue Type: Improvement
>  Components: master, monitoring, UI, Usability
>Reporter: Archana Katiyar
>Assignee: Archana Katiyar
>Priority: Minor
> Attachments: 14926e82-b929-11e8-8bdd-4ce4621f1118.png, 
> 2724afd8-b929-11e8-8171-8b5b2ba3084e.png, HBASE-21207-branch-1.patch, 
> HBASE-21207-branch-1.v1.patch, HBASE-21207.patch, HBASE-21207.patch, 
> HBASE-21207.v1.patch, edc5c812-b928-11e8-87e2-ce6396629bbc.png
>
>
> In the Master UI, we can see region server details like requests per second 
> and number of regions. Similarly, for tables we can see online and offline 
> regions.
> It would help ops people spot hotspotting if we provided sort functionality 
> in the UI.





[jira] [Created] (HBASE-21248) Implement exponential backoff when retrying for ModifyPeerProcedure

2018-09-27 Thread Duo Zhang (JIRA)
Duo Zhang created HBASE-21248:
-

 Summary: Implement exponential backoff when retrying for 
ModifyPeerProcedure
 Key: HBASE-21248
 URL: https://issues.apache.org/jira/browse/HBASE-21248
 Project: HBase
  Issue Type: Bug
  Components: proc-v2, Replication
Reporter: Duo Zhang
Assignee: Duo Zhang
 Fix For: 3.0.0, 2.2.0, 2.1.1
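
The issue was created without a description. For readers unfamiliar with the term in the summary, a generic capped exponential backoff can be sketched as below. This is not the ModifyPeerProcedure change itself; the class name, parameters and example values are assumptions for illustration only.

```java
/** Generic capped exponential backoff; names and values are illustrative. */
final class ExponentialBackoffSketch {
  /**
   * Returns the delay before retry number {@code attempt} (0-based): the base
   * delay doubled once per attempt, never exceeding {@code maxMillis}.
   */
  static long backoffMillis(long baseMillis, long maxMillis, int attempt) {
    if (baseMillis <= 0 || maxMillis < baseMillis || attempt < 0) {
      throw new IllegalArgumentException("invalid backoff parameters");
    }
    long delay = baseMillis;
    for (int i = 0; i < attempt && delay < maxMillis; i++) {
      // Compare against max / 2 before doubling so the value cannot overflow.
      delay = delay > maxMillis / 2 ? maxMillis : delay * 2;
    }
    return Math.min(delay, maxMillis);
  }
}
```

With a base of 1000 ms and a cap of 60000 ms, the delays grow 1000, 2000, 4000 and so on until they reach the cap and stay there.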








[jira] [Commented] (HBASE-21228) Memory leak since AbstractFSWAL caches Thread object and never clean later

2018-09-27 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631249#comment-16631249
 ] 

Hudson commented on HBASE-21228:


SUCCESS: Integrated in Jenkins build HBase-1.3-IT #485 (See 
[https://builds.apache.org/job/HBase-1.3-IT/485/])
HBASE-21228 Memory leak since AbstractFSWAL caches Thread object and (apurtell: 
rev 9321f7d86cb1eee4508d96f261189b90b85e714c)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java







[jira] [Comment Edited] (HBASE-21200) Memstore flush doesn't finish because of seekToPreviousRow() in memstore scanner.

2018-09-27 Thread Toshihiro Suzuki (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630724#comment-16630724
 ] 

Toshihiro Suzuki edited comment on HBASE-21200 at 9/28/18 1:24 AM:
---

It seems like an issue similar to HBASE-15871 occurs with the following 
steps.

1) Create a reversed store scanner.
2) Put a lot of cells that have a sequenceID greater than the readPt of the 
reverse scanner into the memstore.
3) Call next() on the reverse scanner. At this point, a lot of cells in the 
memstore have a sequenceID greater than the readPt of the reverse scanner 
because of 2). This causes seekToPreviousRow() to repeatedly search cells 
that have already been searched. It's described in the following image in 
HBASE-15871:
https://issues.apache.org/jira/secure/attachment/12805207/memstore_backwardSeek%28%29.PNG
4) Flush the memstore. The flush has to wait until 3) finishes before it can 
update the store files in the same HStore.

I'm attaching a patch to reproduce this issue.
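
As a rough illustration of why step 3) gets expensive, here is a toy model of a backward seek that has to inspect and skip every cell newer than the read point. It is not the MemStoreScanner code: the class, the method and the array layout are hypothetical, and the only rule taken from the steps above is that a cell is invisible when its sequenceID is greater than the scanner's readPt.

```java
/** Toy model of read-point visibility during a backward seek (hypothetical). */
final class ReadPointSketch {
  /**
   * Walks rows backwards starting just before {@code fromRow} and returns
   * {rowIndex, cellsInspected} for the first row holding a visible cell, or
   * {-1, cellsInspected} if no row has one. rows[r] holds the sequence ids of
   * the cells in row r.
   */
  static int[] seekToPreviousVisibleRow(long[][] rows, int fromRow, long readPt) {
    int inspected = 0;
    for (int r = fromRow - 1; r >= 0; r--) {
      for (long seqId : rows[r]) {
        inspected++;
        if (seqId <= readPt) {
          // A cell is visible only if it was written at or before the read point.
          return new int[] { r, inspected };
        }
      }
    }
    return new int[] { -1, inspected };
  }
}
```

The more cells that were written after the scanner was opened, the more work each backward seek does before it reaches anything the scanner is allowed to return.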


was (Author: brfrn169):
It seems like an issue similar to HBASE-15871 occurs with the following 
steps.

1) Create a reversed store scanner.
2) Put a lot of cells that have a sequenceID greater than the readPt of the 
reverse scanner into the memstore.
3) Call next() on the reverse scanner. At this point, a lot of cells in the 
memstore have a sequenceID greater than the readPt of the reverse scanner 
because of 2). This causes seekToPreviousRow() to repeatedly search cells 
that have already been searched.
4) Flush the memstore. The flush has to wait until 3) finishes before it can 
update the store files in the same HStore.

I'm attaching a patch to reproduce this issue.

> Memstore flush doesn't finish because of seekToPreviousRow() in memstore 
> scanner.
> -
>
> Key: HBASE-21200
> URL: https://issues.apache.org/jira/browse/HBASE-21200
> Project: HBase
>  Issue Type: Bug
>  Components: Scanners
>Reporter: dongjin2193.jeon
>Priority: Major
> Attachments: HBASE-21200-UT.patch, RegionServerJstack.log
>
>
> The issue of delayed memstore flush still occurs after backporting 
> HBASE-15871.
> Reverse scan takes a long time to seek the previous row in a memstore full 
> of deleted cells.
>  
> jstack :
> "MemStoreFlusher.0" #114 prio=5 os_prio=0 tid=0x7fa3d0729000 nid=0x486a 
> waiting on condition [0x7fa3b9b6b000]
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for  <0xa465fe60> (a 
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>     at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>     at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>     at 
> java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
>     at 
> java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
>     at 
> org.apache.hadoop.hbase.regionserver.*StoreScanner.updateReaders(StoreScanner.java:695)*
>     at 
> org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1127)
>     at 
> org.apache.hadoop.hbase.regionserver.HStore.updateStorefiles(HStore.java:1106)
>     at 
> org.apache.hadoop.hbase.regionserver.HStore.access$600(HStore.java:130)
>     at 
> org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.commit(HStore.java:2455)
>     at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2519)
>     at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2256)
>     at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2218)
>     at 
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2110)
>     at 
> org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2036)
>     at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:501)
>     at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:471)
>     at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:75)
>     at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:259)
>     at java.lang.Thread.run(Thread.java:748)
>  
> "RpcServer.FifoWFPBQ.default.handler=27,queue=0,port=16020" #65 daemon prio=5 
> os_prio=0 

[jira] [Commented] (HBASE-21228) Memory leak since AbstractFSWAL caches Thread object and never clean later

2018-09-27 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631240#comment-16631240
 ] 

Andrew Purtell commented on HBASE-21228:


This applies to branch-1.2, branch-1.3, and branch-1.4 and would seem to be an 
issue in the respective releasing code lines. I have updated fix versions and 
pushed the change to those branches too. Ran WAL unit tests beforehand. 






[jira] [Commented] (HBASE-21200) Memstore flush doesn't finish because of seekToPreviousRow() in memstore scanner.

2018-09-27 Thread Toshihiro Suzuki (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631237#comment-16631237
 ] 

Toshihiro Suzuki commented on HBASE-21200:
--

No error occurs in the test; the reversed scan is just very slow. I will need 
to improve the test, but what I wanted to show is that an issue similar to 
HBASE-15871 is reproduced by the above steps, and that it causes a very slow 
reversed scan.

> Memstore flush doesn't finish because of seekToPreviousRow() in memstore 
> scanner.
> -
>
> Key: HBASE-21200
> URL: https://issues.apache.org/jira/browse/HBASE-21200
> Project: HBase
>  Issue Type: Bug
>  Components: Scanners
>Reporter: dongjin2193.jeon
>Priority: Major
> Attachments: HBASE-21200-UT.patch, RegionServerJstack.log
>
>
> The issue of delayed memstore flush still occurs after backporting 
> HBASE-15871.
> Reverse scan takes a long time to seek the previous row in a memstore full 
> of deleted cells.
>  
> jstack :
> "MemStoreFlusher.0" #114 prio=5 os_prio=0 tid=0x7fa3d0729000 nid=0x486a 
> waiting on condition [0x7fa3b9b6b000]
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for  <0xa465fe60> (a 
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>     at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>     at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>     at 
> java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
>     at 
> java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
>     at 
> org.apache.hadoop.hbase.regionserver.*StoreScanner.updateReaders(StoreScanner.java:695)*
>     at 
> org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1127)
>     at 
> org.apache.hadoop.hbase.regionserver.HStore.updateStorefiles(HStore.java:1106)
>     at 
> org.apache.hadoop.hbase.regionserver.HStore.access$600(HStore.java:130)
>     at 
> org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.commit(HStore.java:2455)
>     at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2519)
>     at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2256)
>     at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2218)
>     at 
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2110)
>     at 
> org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2036)
>     at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:501)
>     at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:471)
>     at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:75)
>     at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:259)
>     at java.lang.Thread.run(Thread.java:748)
>  
> "RpcServer.FifoWFPBQ.default.handler=27,queue=0,port=16020" #65 daemon prio=5 
> os_prio=0 tid=0x7fa3e628 nid=0x4801 runnable [0x7fa3bd29a000]
>    java.lang.Thread.State: RUNNABLE
>     at 
> org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner.getNext(DefaultMemStore.java:780)
>     at 
> org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner.seekInSubLists(DefaultMemStore.java:826)
>     - locked <0xb45aa5b8> (a 
> org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner)
>     at 
> org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner.seek(DefaultMemStore.java:818)
>     - locked <0xb45aa5b8> (a 
> org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner)
>     at 
> org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner.seekToPreviousRow(DefaultMemStore.java:1000)
>     - locked <0xb45aa5b8> (a 
> org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner)
>     at 
> org.apache.hadoop.hbase.regionserver.ReversedKeyValueHeap.next(ReversedKeyValueHeap.java:136)
>     at 
> org.apache.hadoop.hbase.regionserver.*StoreScanner.next(StoreScanner.java:629)*
>     at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
>     at 
> 

[jira] [Commented] (HBASE-20766) Verify Replication Tool Has Typo "remove cluster"

2018-09-27 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631231#comment-16631231
 ] 

Hudson commented on HBASE-20766:


SUCCESS: Integrated in Jenkins build HBase-1.3-IT #484 (See 
[https://builds.apache.org/job/HBase-1.3-IT/484/])
HBASE-20766 Typo in VerifyReplication error. (apurtell: rev 
10e486882a79e722d44cb42259fbcad1a08a6342)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/replication/VerifyReplication.java


> Verify Replication Tool Has Typo "remove cluster"
> -
>
> Key: HBASE-20766
> URL: https://issues.apache.org/jira/browse/HBASE-20766
> Project: HBase
>  Issue Type: Bug
>Reporter: Clay B.
>Assignee: Ferran Fernandez Garrido
>Priority: Trivial
>  Labels: beginner
> Fix For: 3.0.0, 1.5.0, 1.3.3, 2.2.0, 1.4.8
>
> Attachments: HBASE-20766.master.001.patch
>
>
> The verify replication tool has a trivial typo "remove cluster" instead of 
> "remote cluster": 
> https://github.com/apache/hbase/blob/a6eeb26cc0b4d0af3fff50b5b931b6847df1f9d2/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/replication/VerifyReplication.java#L355





[jira] [Updated] (HBASE-21220) Port HBASE-20636 (Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED) to branch-1

2018-09-27 Thread Andrew Purtell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-21220:
---
Status: Open  (was: Patch Available)

> Port HBASE-20636 (Introduce two bloom filter type : ROWPREFIX and 
> ROWPREFIX_DELIMITED) to branch-1
> --
>
> Key: HBASE-21220
> URL: https://issues.apache.org/jira/browse/HBASE-21220
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Major
> Fix For: 1.5.0
>
> Attachments: HBASE-21220-branch-1.patch
>
>






[jira] [Updated] (HBASE-21220) Port HBASE-20636 (Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED) to branch-1

2018-09-27 Thread Andrew Purtell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-21220:
---
Status: Patch Available  (was: Open)

> Port HBASE-20636 (Introduce two bloom filter type : ROWPREFIX and 
> ROWPREFIX_DELIMITED) to branch-1
> --
>
> Key: HBASE-21220
> URL: https://issues.apache.org/jira/browse/HBASE-21220
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Major
> Fix For: 1.5.0
>
> Attachments: HBASE-21220-branch-1.patch
>
>






[jira] [Commented] (HBASE-21246) Introduce WALIdentity interface

2018-09-27 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631228#comment-16631228
 ] 

Duo Zhang commented on HBASE-21246:
---

Why do we have a getSize in WALIdentity?

And do we want to introduce some structures in the WALIdentity? If not, why not 
just use a String? You introduced a method which creates a WALIdentity from a 
String, then why not just use String as the identity?

> Introduce WALIdentity interface
> ---
>
> Key: HBASE-21246
> URL: https://issues.apache.org/jira/browse/HBASE-21246
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Attachments: 21246.HBASE-20952.001.patch
>
>
> We are introducing WALIdentity interface so that the WAL representation can 
> be decoupled from distributed filesystem.
> The interface provides getName method whose return value can represent 
> filename in distributed filesystem environment or, the name of the stream 
> when the WAL is backed by log stream.





[jira] [Updated] (HBASE-21228) Memory leak since AbstractFSWAL caches Thread object and never clean later

2018-09-27 Thread Andrew Purtell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-21228:
---
Fix Version/s: 1.2.8
   1.3.3
   1.5.0

> Memory leak since AbstractFSWAL caches Thread object and never clean later
> --
>
> Key: HBASE-21228
> URL: https://issues.apache.org/jira/browse/HBASE-21228
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.1.0, 2.0.2, 1.4.7
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 1.3.3, 1.2.8, 1.4.8, 2.1.1, 2.0.3
>
> Attachments: HBASE-21228.branch-2.0.001.patch, 
> HBASE-21228.branch-2.0.002.patch, HBASE-21228.branch-2.0.003.patch
>
>
> In AbstractFSWAL (FSHLog in branch-1), we have a map that caches threads and 
> their SyncFutures.
> {code:java}
> /**
>  * Map of {@link SyncFuture}s keyed by Handler objects. Used so we reuse SyncFutures.
>  * 
>  * TODO: Reuse FSWALEntry's rather than create them anew each time as we do SyncFutures here.
>  * 
>  * TODO: Add a FSWalEntry and SyncFuture as thread locals on handlers rather than have them get
>  * them from this Map?
>  */
> private final ConcurrentMap<Thread, SyncFuture> syncFuturesByHandler;
> {code}
> A colleague of mine found a memory leak caused by this map.
> Every thread that writes to the WAL is cached in this map, and nothing removes 
> a thread from the map even after the thread is dead.
> In one of our customers' clusters, we noticed that even though there were no 
> requests, the heap of the RS was almost full and CMS GC was triggered every 
> second.
> We dumped the heap and found more than 30 thousand threads in the Terminated 
> state, all of them cached in the map above. 
> Everything referenced by these threads was leaked. Most of the threads are:
> 1. PostOpenDeployTasksThread, which writes the Open Region mark to the WAL.
> 2. hconnection-0x1f838e31-shared--pool, which is used to write the index short 
> circuit (Phoenix); the WAL is written and synced in these threads.
> 3. Index writer thread (Phoenix), which is referenced by 
> RegionCoprocessorHost$RegionEnvironment, then by HRegion, and finally by 
> PostOpenDeployTasksThread.
> We should turn this map into a thread local and let the JVM GC the terminated 
> threads for us.
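
The difference between the two caching strategies described above can be illustrated with a minimal sketch. This is not the HBase patch: `SketchSyncFuture` and `SyncFutureCacheSketch` are hypothetical stand-ins, and the map is assumed to be keyed by `Thread` as the description suggests. The map variant keeps a strong reference to every writer thread, while the thread-local variant ties the cached object to the thread's own lifetime.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for HBase's SyncFuture; only what the sketch needs.
final class SketchSyncFuture {
  long txid;

  SketchSyncFuture reset(long txid) {
    this.txid = txid;
    return this;
  }
}

final class SyncFutureCacheSketch {
  // Map variant: each writer Thread stays strongly referenced as a key
  // until something removes it, even after the thread has terminated.
  private final ConcurrentMap<Thread, SketchSyncFuture> byThread = new ConcurrentHashMap<>();

  // Thread-local variant: the value is held through the Thread itself, so it
  // becomes unreachable together with the thread once the thread terminates.
  private final ThreadLocal<SketchSyncFuture> cached =
      ThreadLocal.withInitial(SketchSyncFuture::new);

  SketchSyncFuture getFromMap(long txid) {
    return byThread.computeIfAbsent(Thread.currentThread(), t -> new SketchSyncFuture())
        .reset(txid);
  }

  SketchSyncFuture getFromThreadLocal(long txid) {
    return cached.get().reset(txid);
  }

  // Starts `writers` short-lived threads that each fetch a future from the map,
  // waits for them to die, and returns how many entries the map still holds.
  static int entriesLeftAfter(int writers) {
    SyncFutureCacheSketch cache = new SyncFutureCacheSketch();
    Thread[] threads = new Thread[writers];
    for (int i = 0; i < writers; i++) {
      threads[i] = new Thread(() -> cache.getFromMap(1L));
      threads[i].start();
    }
    try {
      for (Thread t : threads) {
        t.join();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return cache.byThread.size();
  }
}
```

In this sketch `entriesLeftAfter` reports one retained entry per terminated writer thread, which mirrors the leak pattern in the heap dump; the thread-local getter still reuses one object per live thread.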





[jira] [Updated] (HBASE-20766) Verify Replication Tool Has Typo "remove cluster"

2018-09-27 Thread Andrew Purtell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-20766:
---
Fix Version/s: 1.4.8
   1.3.3

> Verify Replication Tool Has Typo "remove cluster"
> -
>
> Key: HBASE-20766
> URL: https://issues.apache.org/jira/browse/HBASE-20766
> Project: HBase
>  Issue Type: Bug
>Reporter: Clay B.
>Assignee: Ferran Fernandez Garrido
>Priority: Trivial
>  Labels: beginner
> Fix For: 3.0.0, 1.5.0, 1.3.3, 2.2.0, 1.4.8
>
> Attachments: HBASE-20766.master.001.patch
>
>
> The verify replication tool has a trivial typo "remove cluster" instead of 
> "remote cluster": 
> https://github.com/apache/hbase/blob/a6eeb26cc0b4d0af3fff50b5b931b6847df1f9d2/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/replication/VerifyReplication.java#L355





[jira] [Commented] (HBASE-21247) Allow WAL Provider to be specified by configuration without explicit enum in Providers

2018-09-27 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631216#comment-16631216
 ] 

Ted Yu commented on HBASE-21247:


From https://builds.apache.org/job/PreCommit-HBASE-Build/14515/console :
{code}
22:32:24 [Thu Sep 27 22:32:24 UTC 2018 DEBUG]: jira_http_fetch: 
https://issues.apache.org/jira/browse/HBASE-21247 returned 4xx status code. 
Maybe incorrect username/password?
22:32:24 [Thu Sep 27 22:32:24 UTC 2018 DEBUG]: jira_locate_patch: not a JIRA.
{code}

> Allow WAL Provider to be specified by configuration without explicit enum in 
> Providers
> --
>
> Key: HBASE-21247
> URL: https://issues.apache.org/jira/browse/HBASE-21247
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: 21247.v1.txt, 21247.v2.txt, 21247.v3.txt
>
>
> Currently all the WAL Providers acceptable to hbase are specified in 
> Providers enum of WALFactory.
> This restricts the ability for additional WAL Providers to be supplied - by 
> class name.
> This issue introduces additional config which allows the specification of new 
> WAL Provider through class name.





[jira] [Commented] (HBASE-21247) Allow WAL Provider to be specified by configuration without explicit enum in Providers

2018-09-27 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631100#comment-16631100
 ] 

Josh Elser commented on HBASE-21247:


Did you resubmit it? Maybe a problem with that specific host?

Otherwise, looks ok to me. Thanks for the extra tests. +1 on qa

> Allow WAL Provider to be specified by configuration without explicit enum in 
> Providers
> --
>
> Key: HBASE-21247
> URL: https://issues.apache.org/jira/browse/HBASE-21247
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: 21247.v1.txt, 21247.v2.txt, 21247.v3.txt
>
>
> Currently all the WAL Providers acceptable to hbase are specified in 
> Providers enum of WALFactory.
> This restricts the ability for additional WAL Providers to be supplied - by 
> class name.
> This issue introduces additional config which allows the specification of new 
> WAL Provider through class name.





[jira] [Commented] (HBASE-21247) Allow WAL Provider to be specified by configuration without explicit enum in Providers

2018-09-27 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631058#comment-16631058
 ] 

Ted Yu commented on HBASE-21247:


From https://builds.apache.org/job/PreCommit-HBASE-Build/14513/console :
{code}
20:10:21 ERROR: Unsure how to process HBASE-21247.
{code}

> Allow WAL Provider to be specified by configuration without explicit enum in 
> Providers
> --
>
> Key: HBASE-21247
> URL: https://issues.apache.org/jira/browse/HBASE-21247
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: 21247.v1.txt, 21247.v2.txt, 21247.v3.txt
>
>
> Currently all the WAL Providers acceptable to hbase are specified in 
> Providers enum of WALFactory.
> This restricts the ability for additional WAL Providers to be supplied - by 
> class name.
> This issue introduces additional config which allows the specification of new 
> WAL Provider through class name.





[jira] [Commented] (HBASE-21247) Allow WAL Provider to be specified by configuration without explicit enum in Providers

2018-09-27 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631035#comment-16631035
 ] 

Ted Yu commented on HBASE-21247:


In patch v3, there are two new subtests.

* when only WALFactory.WAL_PROVIDER_CLASS is specified, verify that both wal 
provider and meta wal provider are of this class
* when only WALFactory.META_WAL_PROVIDER_CLASS is specified, verify that wal 
provider is default and that meta wal provider is of this class

> Allow WAL Provider to be specified by configuration without explicit enum in 
> Providers
> --
>
> Key: HBASE-21247
> URL: https://issues.apache.org/jira/browse/HBASE-21247
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: 21247.v1.txt, 21247.v2.txt, 21247.v3.txt
>
>
> Currently all the WAL Providers acceptable to hbase are specified in 
> Providers enum of WALFactory.
> This restricts the ability for additional WAL Providers to be supplied - by 
> class name.
> This issue introduces additional config which allows the specification of new 
> WAL Provider through class name.





[jira] [Updated] (HBASE-21247) Allow WAL Provider to be specified by configuration without explicit enum in Providers

2018-09-27 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21247:
---
Attachment: 21247.v3.txt

> Allow WAL Provider to be specified by configuration without explicit enum in 
> Providers
> --
>
> Key: HBASE-21247
> URL: https://issues.apache.org/jira/browse/HBASE-21247
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: 21247.v1.txt, 21247.v2.txt, 21247.v3.txt
>
>
> Currently all the WAL Providers acceptable to hbase are specified in 
> Providers enum of WALFactory.
> This restricts the ability for additional WAL Providers to be supplied - by 
> class name.
> This issue introduces additional config which allows the specification of new 
> WAL Provider through class name.





[jira] [Commented] (HBASE-21247) Allow WAL Provider to be specified by configuration without explicit enum in Providers

2018-09-27 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631000#comment-16631000
 ] 

Josh Elser commented on HBASE-21247:


{code:java}
+assertEquals(IOTestProvider.class.getName(), 
fshLogProvider.getName());{code}
nit: just test the {{Class}} objects instead of the String representation of 
the names.
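
The nit can be illustrated with a small, self-contained sketch. All names here (`SketchProvider`, `SketchIoTestProvider`, `configuredProvider`) are hypothetical, and it assumes that `fshLogProvider` in the patch holds a `Class` object, which the `getName()` call suggests but the excerpt does not confirm. Under that assumption, a JUnit assertion of the form `assertEquals(IOTestProvider.class, fshLogProvider)` would compare the classes directly.

```java
// Hypothetical stand-ins; in the patch under review these would be WALProvider types.
interface SketchProvider {}

final class SketchIoTestProvider implements SketchProvider {}

final class ClassAssertionSketch {
  // Pretend lookup of the configured provider class (hypothetical helper).
  static Class<? extends SketchProvider> configuredProvider() {
    return SketchIoTestProvider.class;
  }

  // Comparison by name: passes for any class that merely shares the name string,
  // for example one loaded by a different class loader.
  static boolean sameByName(Class<?> expected, Class<?> actual) {
    return expected.getName().equals(actual.getName());
  }

  // Comparison by Class object: passes only for the identical loaded class.
  static boolean sameByClass(Class<?> expected, Class<?> actual) {
    return expected == actual;
  }
}
```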

Do we have test coverage for the scenarios described in the release notes?
 # ...  If not specified, we fall back to the WAL provider enum specification. 
 # ... If not specified, we fall back to using the value for 
hbase.wal.provider.class . 
 # Fall back to the enum specification if neither the meta provider class nor 
the provider class is set.

> Allow WAL Provider to be specified by configuration without explicit enum in 
> Providers
> --
>
> Key: HBASE-21247
> URL: https://issues.apache.org/jira/browse/HBASE-21247
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: 21247.v1.txt, 21247.v2.txt
>
>
> Currently all the WAL Providers acceptable to hbase are specified in 
> Providers enum of WALFactory.
> This restricts the ability for additional WAL Providers to be supplied - by 
> class name.
> This issue introduces additional config which allows the specification of new 
> WAL Provider through class name.





[jira] [Commented] (HBASE-21247) Allow WAL Provider to be specified by configuration without explicit enum in Providers

2018-09-27 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630991#comment-16630991
 ] 

Josh Elser commented on HBASE-21247:


{quote}The user doesn't need to specify two WAL classes in the normal case.
{quote}
Thanks. This, along with the release notes, helps.

> Allow WAL Provider to be specified by configuration without explicit enum in 
> Providers
> --
>
> Key: HBASE-21247
> URL: https://issues.apache.org/jira/browse/HBASE-21247
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: 21247.v1.txt, 21247.v2.txt
>
>
> Currently all the WAL Providers acceptable to hbase are specified in 
> Providers enum of WALFactory.
> This restricts the ability for additional WAL Providers to be supplied - by 
> class name.
> This issue introduces additional config which allows the specification of new 
> WAL Provider through class name.





[jira] [Updated] (HBASE-21247) Allow WAL Provider to be specified by configuration without explicit enum in Providers

2018-09-27 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21247:
---
Attachment: 21247.v2.txt

> Allow WAL Provider to be specified by configuration without explicit enum in 
> Providers
> --
>
> Key: HBASE-21247
> URL: https://issues.apache.org/jira/browse/HBASE-21247
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: 21247.v1.txt, 21247.v2.txt
>
>
> Currently all the WAL Providers acceptable to hbase are specified in 
> Providers enum of WALFactory.
> This restricts the ability for additional WAL Providers to be supplied - by 
> class name.
> This issue introduces additional config which allows the specification of new 
> WAL Provider through class name.





[jira] [Updated] (HBASE-21247) Allow WAL Provider to be specified by configuration without explicit enum in Providers

2018-09-27 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21247:
---
Release Note: 
Two config parameters, hbase.wal.provider.class and 
hbase.wal.meta_provider.class are introduced.

hbase.wal.provider.class, when specified, configures the WAL provider class 
through its class name. If not specified, we fall back to the WAL provider enum 
specification.

hbase.wal.meta_provider.class, when specified, configures the WAL provider 
class for hbase:meta through its class name. If not specified, we fall back to 
using the value for hbase.wal.provider.class .

These new configs, when specified, override the enum WAL provider config.

  was:
Two config parameters, hbase.wal.provider.class and 
hbase.wal.meta_provider.class are introduced.

hbase.wal.provider.class, when specified, configures the WAL provider class 
through its class name. If not specified, we fall back to the WAL provider enum 
specification.

hbase.wal.meta_provider.class, when specified, configures the WAL provider 
class for hbase:meta through its class name. If not specified, we fall back to 
using the value for hbase.wal.provider.class .


> Allow WAL Provider to be specified by configuration without explicit enum in 
> Providers
> --
>
> Key: HBASE-21247
> URL: https://issues.apache.org/jira/browse/HBASE-21247
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: 21247.v1.txt
>
>
> Currently all the WAL Providers acceptable to hbase are specified in 
> Providers enum of WALFactory.
> This restricts the ability for additional WAL Providers to be supplied - by 
> class name.
> This issue introduces additional config which allows the specification of new 
> WAL Provider through class name.





[jira] [Updated] (HBASE-21247) Allow WAL Provider to be specified by configuration without explicit enum in Providers

2018-09-27 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21247:
---
Release Note: 
Two config parameters, hbase.wal.provider.class and 
hbase.wal.meta_provider.class are introduced.

hbase.wal.provider.class, when specified, configures the WAL provider class 
through its class name. If not specified, we fall back to the WAL provider enum 
specification.

hbase.wal.meta_provider.class, when specified, configures the WAL provider 
class for hbase:meta through its class name. If not specified, we fall back to 
using the value for hbase.wal.provider.class .
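
The fallback order in this release note can be sketched as follows. This is an illustration under stated assumptions, not the code in the patch: a plain `Map` stands in for the HBase `Configuration`, the key `hbase.wal.meta_provider` for the enum-based meta setting and the placeholder default `defaultProvider` are assumptions, and the returned strings only label which source won.

```java
import java.util.Map;

final class WalProviderResolutionSketch {
  static final String WAL_PROVIDER = "hbase.wal.provider";
  static final String META_WAL_PROVIDER = "hbase.wal.meta_provider"; // assumed key name
  static final String WAL_PROVIDER_CLASS = "hbase.wal.provider.class";
  static final String META_WAL_PROVIDER_CLASS = "hbase.wal.meta_provider.class";
  static final String DEFAULT_ENUM = "defaultProvider"; // placeholder, not a real enum value

  // Regular WAL: the class name wins when set, otherwise the enum-based setting.
  static String resolveWal(Map<String, String> conf) {
    String cls = conf.get(WAL_PROVIDER_CLASS);
    if (cls != null) {
      return "class:" + cls;
    }
    return "enum:" + conf.getOrDefault(WAL_PROVIDER, DEFAULT_ENUM);
  }

  // hbase:meta WAL: meta class name, then the regular class name, then the enum settings.
  static String resolveMeta(Map<String, String> conf) {
    String cls = conf.get(META_WAL_PROVIDER_CLASS);
    if (cls == null) {
      cls = conf.get(WAL_PROVIDER_CLASS);
    }
    if (cls != null) {
      return "class:" + cls;
    }
    String walEnum = conf.getOrDefault(WAL_PROVIDER, DEFAULT_ENUM);
    return "enum:" + conf.getOrDefault(META_WAL_PROVIDER, walEnum);
  }
}
```

With only `hbase.wal.provider.class` set, both methods resolve to that class, so a user needs a second setting only when hbase:meta should use a different provider.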

> Allow WAL Provider to be specified by configuration without explicit enum in 
> Providers
> --
>
> Key: HBASE-21247
> URL: https://issues.apache.org/jira/browse/HBASE-21247
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: 21247.v1.txt
>
>
> Currently all the WAL Providers acceptable to hbase are specified in 
> Providers enum of WALFactory.
> This restricts the ability for additional WAL Providers to be supplied - by 
> class name.
> This issue introduces additional config which allows the specification of new 
> WAL Provider through class name.





[jira] [Commented] (HBASE-21247) Allow WAL Provider to be specified by configuration without explicit enum in Providers

2018-09-27 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630925#comment-16630925
 ] 

Ted Yu commented on HBASE-21247:


w.r.t. first comment above,
{code}
  boolean metaWALProvPresent = conf.get(META_WAL_PROVIDER_CLASS) != null;
  provider = createProvider(getProviderClass(
  metaWALProvPresent ? META_WAL_PROVIDER_CLASS : WAL_PROVIDER_CLASS,
{code}
when META_WAL_PROVIDER_CLASS is not specified, we fall back to the value for 
WAL_PROVIDER_CLASS (if present).
The user doesn't need to specify two WAL classes in the normal case.




> Allow WAL Provider to be specified by configuration without explicit enum in 
> Providers
> --
>
> Key: HBASE-21247
> URL: https://issues.apache.org/jira/browse/HBASE-21247
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: 21247.v1.txt
>
>
> Currently all the WAL Providers acceptable to hbase are specified in 
> Providers enum of WALFactory.
> This restricts the ability for additional WAL Providers to be supplied - by 
> class name.
> This issue introduces additional config which allows the specification of new 
> WAL Provider through class name.





[jira] [Commented] (HBASE-21247) Allow WAL Provider to be specified by configuration without explicit enum in Providers

2018-09-27 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630906#comment-16630906
 ] 

Josh Elser commented on HBASE-21247:


{code:java}
-  provider = createProvider(getProviderClass(META_WAL_PROVIDER,
-  conf.get(WAL_PROVIDER, DEFAULT_WAL_PROVIDER)));
+  boolean metaWALProvPresent = conf.get(META_WAL_PROVIDER_CLASS) != null;
+  provider = createProvider(getProviderClass(
+  metaWALProvPresent ? META_WAL_PROVIDER_CLASS : WAL_PROVIDER_CLASS,
+  META_WAL_PROVIDER, conf.get(WAL_PROVIDER, 
DEFAULT_WAL_PROVIDER)));{code}
I thought HBASE-20856 got rid of this duplication around setting a WALProvider 
and a Meta WALProvider?
{code:java}
   public static final String WAL_PROVIDER = "hbase.wal.provider";
   static final String DEFAULT_WAL_PROVIDER = Providers.defaultProvider.name();
+  public static final String WAL_PROVIDER_CLASS = "hbase.wal.provider.class";
+  static final Class DEFAULT_WAL_PROVIDER_CLASS = 
AsyncFSWALProvider.class;{code}
What happens if I provide both the old configuration properties and the new 
ones? Can you summarize how this will work in release notes? We should have 
some test additions to cover that too.

How about adding a test to make sure you can create a WALProvider (i.e. as a 
static inner class of the test class) and HBase uses it as the WALProvider? 
I think it should be easy in TestWALFactory – I recall some prior art there.

> Allow WAL Provider to be specified by configuration without explicit enum in 
> Providers
> --
>
> Key: HBASE-21247
> URL: https://issues.apache.org/jira/browse/HBASE-21247
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: 21247.v1.txt
>
>
> Currently all the WAL Providers acceptable to hbase are specified in 
> Providers enum of WALFactory.
> This restricts the ability for additional WAL Providers to be supplied - by 
> class name.
> This issue introduces additional config which allows the specification of new 
> WAL Provider through class name.





[jira] [Commented] (HBASE-20951) Ratis LogService backed WALs

2018-09-27 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630901#comment-16630901
 ] 

Josh Elser commented on HBASE-20951:


FYI HBASE-20952 was spun out into a standalone issue (from a child issue) to 
better support making small changes to the WAL API. Linked it off of this issue.

> Ratis LogService backed WALs
> 
>
> Key: HBASE-20951
> URL: https://issues.apache.org/jira/browse/HBASE-20951
> Project: HBase
>  Issue Type: New Feature
>  Components: wal
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Major
>
> Umbrella issue for the Ratis+WAL work:
> Design doc: 
> [https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit#|https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit]
> The (over-simplified) goal is to re-think the current WAL APIs we have now, 
> ensure that they are de-coupled from the notion of being backed by HDFS, swap 
> the current implementations over to the new API, and then wire up the Ratis 
> LogService to the new WAL API.





[jira] [Commented] (HBASE-21247) Allow WAL Provider to be specified by configuration without explicit enum in Providers

2018-09-27 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630893#comment-16630893
 ] 

Josh Elser commented on HBASE-21247:


[~sergey.soldatov] made the suggestion offline today that this one should just 
go into master. It is generally helpful outside of the efforts of HBASE-20951.

> Allow WAL Provider to be specified by configuration without explicit enum in 
> Providers
> --
>
> Key: HBASE-21247
> URL: https://issues.apache.org/jira/browse/HBASE-21247
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: 21247.v1.txt
>
>
> Currently all the WAL Providers acceptable to hbase are specified in 
> Providers enum of WALFactory.
> This restricts the ability for additional WAL Providers to be supplied - by 
> class name.
> This issue introduces additional config which allows the specification of new 
> WAL Provider through class name.





[jira] [Updated] (HBASE-21247) Allow WAL Provider to be specified by configuration without explicit enum in Providers

2018-09-27 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HBASE-21247:
---
Fix Version/s: 3.0.0

> Allow WAL Provider to be specified by configuration without explicit enum in 
> Providers
> --
>
> Key: HBASE-21247
> URL: https://issues.apache.org/jira/browse/HBASE-21247
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: 21247.v1.txt
>
>
> Currently all the WAL Providers acceptable to hbase are specified in 
> Providers enum of WALFactory.
> This restricts the ability for additional WAL Providers to be supplied - by 
> class name.
> This issue introduces additional config which allows the specification of new 
> WAL Provider through class name.





[jira] [Commented] (HBASE-21200) Memstore flush doesn't finish because of seekToPreviousRow() in memstore scanner.

2018-09-27 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630858#comment-16630858
 ] 

Ted Yu commented on HBASE-21200:


I ran the new test on the master branch and it passed.

What error is expected from the new test?


> Memstore flush doesn't finish because of seekToPreviousRow() in memstore 
> scanner.
> -
>
> Key: HBASE-21200
> URL: https://issues.apache.org/jira/browse/HBASE-21200
> Project: HBase
>  Issue Type: Bug
>  Components: Scanners
>Reporter: dongjin2193.jeon
>Priority: Major
> Attachments: HBASE-21200-UT.patch, RegionServerJstack.log
>
>
> The issue of delayed memstore flush still occurs after backporting HBASE-15871.
> A reverse scan takes a long time to seek to the previous row when the memstore 
> is full of deleted cells.
>  
> jstack :
> "MemStoreFlusher.0" #114 prio=5 os_prio=0 tid=0x7fa3d0729000 nid=0x486a waiting on condition [0x7fa3b9b6b000]
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for  <0xa465fe60> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>     at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
>     at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
>     at org.apache.hadoop.hbase.regionserver.*StoreScanner.updateReaders(StoreScanner.java:695)*
>     at org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1127)
>     at org.apache.hadoop.hbase.regionserver.HStore.updateStorefiles(HStore.java:1106)
>     at org.apache.hadoop.hbase.regionserver.HStore.access$600(HStore.java:130)
>     at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.commit(HStore.java:2455)
>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2519)
>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2256)
>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2218)
>     at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2110)
>     at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2036)
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:501)
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:471)
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:75)
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:259)
>     at java.lang.Thread.run(Thread.java:748)
>  
> "RpcServer.FifoWFPBQ.default.handler=27,queue=0,port=16020" #65 daemon prio=5 os_prio=0 tid=0x7fa3e628 nid=0x4801 runnable [0x7fa3bd29a000]
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner.getNext(DefaultMemStore.java:780)
>     at org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner.seekInSubLists(DefaultMemStore.java:826)
>     - locked <0xb45aa5b8> (a org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner)
>     at org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner.seek(DefaultMemStore.java:818)
>     - locked <0xb45aa5b8> (a org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner)
>     at org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner.seekToPreviousRow(DefaultMemStore.java:1000)
>     - locked <0xb45aa5b8> (a org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner)
>     at org.apache.hadoop.hbase.regionserver.ReversedKeyValueHeap.next(ReversedKeyValueHeap.java:136)
>     at org.apache.hadoop.hbase.regionserver.*StoreScanner.next(StoreScanner.java:629)*
>     at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
>     at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5876)
>     at 
> 

[jira] [Updated] (HBASE-21247) Allow WAL Provider to be specified by configuration without explicit enum in Providers

2018-09-27 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21247:
---
Status: Patch Available  (was: Open)

> Allow WAL Provider to be specified by configuration without explicit enum in 
> Providers
> --
>
> Key: HBASE-21247
> URL: https://issues.apache.org/jira/browse/HBASE-21247
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Attachments: 21247.v1.txt
>
>
> Currently all the WAL Providers acceptable to hbase are specified in 
> Providers enum of WALFactory.
> This restricts the ability for additional WAL Providers to be supplied - by 
> class name.
> This issue introduces additional config which allows the specification of new 
> WAL Provider through class name.





[jira] [Created] (HBASE-21247) Allow WAL Provider to be specified by configuration without explicit enum in Providers

2018-09-27 Thread Ted Yu (JIRA)
Ted Yu created HBASE-21247:
--

 Summary: Allow WAL Provider to be specified by configuration 
without explicit enum in Providers
 Key: HBASE-21247
 URL: https://issues.apache.org/jira/browse/HBASE-21247
 Project: HBase
  Issue Type: Improvement
Reporter: Ted Yu
Assignee: Ted Yu
 Attachments: 21247.v1.txt

Currently, all WAL providers that HBase accepts are listed in the Providers 
enum of WALFactory.
This prevents additional WAL providers from being supplied by 
class name.

This issue introduces an additional config option that allows a new 
WAL provider to be specified by its class name.
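
The idea can be sketched roughly as follows. This is not the attached patch: it only illustrates looking a configured value up among the built-in names first and, if that fails, treating it as a class name. Every name here (WalProviderSketch, FsProviderSketch, StreamProviderSketch, ProviderResolverSketch, BuiltIn) is a hypothetical stand-in, not the WALFactory API.

```java
import java.util.Locale;

/** Hypothetical stand-in for a WAL provider; not the HBase WALProvider interface. */
interface WalProviderSketch {
  String describe();
}

/** Example built-in provider. */
class FsProviderSketch implements WalProviderSketch {
  @Override public String describe() { return "filesystem"; }
}

/** Example additional provider that is known only by its class name. */
class StreamProviderSketch implements WalProviderSketch {
  @Override public String describe() { return "log stream"; }
}

class ProviderResolverSketch {
  /** Built-in short names, similar in spirit to an enum of known providers. */
  enum BuiltIn {
    FILESYSTEM(FsProviderSketch.class);

    final Class<? extends WalProviderSketch> clazz;

    BuiltIn(Class<? extends WalProviderSketch> clazz) { this.clazz = clazz; }
  }

  /** Resolves a configured value: built-in name first, then a class name. */
  static Class<? extends WalProviderSketch> resolve(String configured) {
    try {
      return BuiltIn.valueOf(configured.toUpperCase(Locale.ROOT)).clazz;
    } catch (IllegalArgumentException notBuiltIn) {
      // Not a built-in short name, so interpret the value as a class name.
      try {
        return Class.forName(configured).asSubclass(WalProviderSketch.class);
      } catch (ClassNotFoundException e) {
        throw new IllegalArgumentException("Unknown WAL provider: " + configured, e);
      }
    }
  }

  /** Instantiates the resolved provider through its no-argument constructor. */
  static WalProviderSketch create(String configured) {
    try {
      return resolve(configured).getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("Cannot instantiate: " + configured, e);
    }
  }
}
```

With this shape, existing configurations that use a short name keep working, while a new provider only needs to be on the classpath and named in the config.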





[jira] [Updated] (HBASE-21247) Allow WAL Provider to be specified by configuration without explicit enum in Providers

2018-09-27 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21247:
---
Attachment: 21247.v1.txt



[jira] [Commented] (HBASE-21237) Use CompatRemoteProcedureResolver to dispatch open/close region requests to RS

2018-09-27 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630789#comment-16630789
 ] 

Hudson commented on HBASE-21237:


Results for branch branch-2.0
[build #872 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/872/]: 
(/) *{color:green}+1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/872//General_Nightly_Build_Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/872//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/872//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


> Use CompatRemoteProcedureResolver to dispatch open/close region requests to RS
> --
>
> Key: HBASE-21237
> URL: https://issues.apache.org/jira/browse/HBASE-21237
> Project: HBase
> Issue Type: Sub-task
> Affects Versions: 2.1.0, 2.0.2
> Reporter: Allan Yang
> Assignee: Allan Yang
> Priority: Major
> Attachments: HBASE-21237.branch-2.0.001.patch
>
>
> As discussed in HBASE-21217, in branch-2.0 and branch-2.1 we should use 
> CompatRemoteProcedureResolver instead of ExecuteProceduresRemoteCall to 
> dispatch region open/close requests to the RS, because ExecuteProceduresRemoteCall 
> groups all the open/close operations into one call and executes them 
> sequentially on the target RS. If one operation fails, all the operations are 
> marked as failed. In fact, some of those operations (such as an open region) may 
> already be executing in the open region handler thread. But the master thinks these 
> operations failed and reassigns the regions to another RS. So when the previous 
> RS reports to the master that the region is online, the master kills that RS 
> since it has already assigned the region to another RS.
> For branch-2.2+, HBASE-21217 will fix this issue.
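
The failure mode described above can be illustrated with a toy model. This is a sketch under simplifying assumptions, not HBase code: dispatchBatched and dispatchIndividually are hypothetical names, and the model does not say whether the remaining operations in a batch are still attempted after one fails.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class DispatchSketch {
  /**
   * Batched model: the operations run in order on the target server and the
   * caller gets one shared outcome. A single failure is reported for every
   * operation in the batch, including those that actually succeeded.
   */
  static Map<String, Boolean> dispatchBatched(List<String> regions, Predicate<String> op) {
    boolean allOk = true;
    for (String region : regions) {
      allOk &= op.test(region);
    }
    Map<String, Boolean> reported = new LinkedHashMap<>();
    for (String region : regions) {
      reported.put(region, allOk);
    }
    return reported;
  }

  /** Per-operation model: each request reports its own result. */
  static Map<String, Boolean> dispatchIndividually(List<String> regions, Predicate<String> op) {
    Map<String, Boolean> reported = new LinkedHashMap<>();
    for (String region : regions) {
      reported.put(region, op.test(region));
    }
    return reported;
  }
}
```

In the batched model a region that opened successfully is still reported as failed, which is the mismatch that leads the master to reassign it.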





[jira] [Commented] (HBASE-21228) Memory leak since AbstractFSWAL caches Thread object and never clean later

2018-09-27 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630782#comment-16630782
 ] 

Hudson commented on HBASE-21228:


Results for branch branch-2
[build #1310 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1310/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1310//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1310//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1310//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Memory leak since AbstractFSWAL caches Thread object and never clean later
> --
>
> Key: HBASE-21228
> URL: https://issues.apache.org/jira/browse/HBASE-21228
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.1.0, 2.0.2, 1.4.7
> Reporter: Allan Yang
> Assignee: Allan Yang
> Priority: Major
> Fix For: 3.0.0, 1.4.8, 2.1.1, 2.0.3
>
> Attachments: HBASE-21228.branch-2.0.001.patch, 
> HBASE-21228.branch-2.0.002.patch, HBASE-21228.branch-2.0.003.patch
>
>
> In AbstractFSWAL (FSHLog in branch-1), we have a map that caches threads and 
> their SyncFutures.
> {code:java}
> /**
>  * Map of {@link SyncFuture}s keyed by Handler objects. Used so we reuse SyncFutures.
>  * 
>  * TODO: Reuse FSWALEntry's rather than create them anew each time as we do SyncFutures here.
>  * 
>  * TODO: Add a FSWalEntry and SyncFuture as thread locals on handlers rather than have them get
>  * them from this Map?
>  */
> private final ConcurrentMap<Thread, SyncFuture> syncFuturesByHandler;
> {code}
> A colleague of mine found a memory leak caused by this map.
> Every thread that writes to the WAL is cached in this map, and nothing removes 
> a thread from the map even after the thread is dead.
> In one of our customers' clusters, we noticed that even with no 
> requests, the heap of the RS was almost full and CMS GC was triggered every 
> second.
> We dumped the heap and found more than 30 thousand 
> threads in the Terminated state, all of them cached in the map above. 
> Everything referenced by these threads was leaked. Most of the threads are:
> 1. PostOpenDeployTasksThread, which writes the Open Region mark to the WAL.
> 2. hconnection-0x1f838e31-shared--pool, which is used for index short-circuit 
> writes (Phoenix); the WAL is written and synced in these threads.
> 3. The index writer thread (Phoenix), which is referenced by 
> RegionCoprocessorHost$RegionEnvironment, then by HRegion, and finally 
> by PostOpenDeployTasksThread.
> We should turn this map into a thread local, and let the JVM GC the terminated 
> threads for us.
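
A rough sketch of the difference between the two approaches, assuming a much-simplified SyncFutureSketch type (hypothetical; the real SyncFuture is more involved). A map keyed by Thread keeps a strong reference to every thread that ever used it until something removes the entry, whereas a ThreadLocal value is stored with the thread itself and becomes unreachable once the thread is gone. The class and method names below are illustrative only and are not taken from the patch.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical, much-simplified stand-in for SyncFuture. */
class SyncFutureSketch {
  long txid;

  SyncFutureSketch reset(long txid) {
    this.txid = txid;
    return this;
  }
}

class SyncFutureCacheSketch {
  /** Leaky variant: the map strongly references every Thread that ever called it. */
  static final ConcurrentMap<Thread, SyncFutureSketch> BY_HANDLER = new ConcurrentHashMap<>();

  static SyncFutureSketch getFromMap(long txid) {
    return BY_HANDLER
        .computeIfAbsent(Thread.currentThread(), t -> new SyncFutureSketch())
        .reset(txid);
  }

  /** Thread-local variant: the cached object lives and dies with its thread. */
  static final ThreadLocal<SyncFutureSketch> CACHED =
      ThreadLocal.withInitial(SyncFutureSketch::new);

  static SyncFutureSketch getFromThreadLocal(long txid) {
    return CACHED.get().reset(txid);
  }

  /** Returns true if a terminated writer thread is still held by the map. */
  static boolean mapRetainsDeadThread() {
    Thread writer = new Thread(() -> {
      getFromMap(1L);
      getFromThreadLocal(1L);
    });
    writer.start();
    try {
      writer.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    // The writer has finished, yet the map still holds it as a key.
    return writer.getState() == Thread.State.TERMINATED && BY_HANDLER.containsKey(writer);
  }
}
```

Both variants reuse one object per thread; only the map variant needs explicit cleanup to avoid retaining dead threads.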





[jira] [Updated] (HBASE-21246) Introduce WALIdentity interface

2018-09-27 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21246:
---
Status: Patch Available  (was: Open)

> Introduce WALIdentity interface
> ---
>
> Key: HBASE-21246
> URL: https://issues.apache.org/jira/browse/HBASE-21246
> Project: HBase
> Issue Type: Sub-task
> Reporter: Ted Yu
> Assignee: Ted Yu
> Priority: Major
> Attachments: 21246.HBASE-20952.001.patch
>
>
> We are introducing the WALIdentity interface so that the WAL representation can 
> be decoupled from the distributed filesystem.
> The interface provides a getName method whose return value can represent the 
> filename in a distributed filesystem environment or the name of the stream 
> when the WAL is backed by a log stream.





[jira] [Commented] (HBASE-21200) Memstore flush doesn't finish because of seekToPreviousRow() in memstore scanner.

2018-09-27 Thread Toshihiro Suzuki (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630736#comment-16630736
 ] 

Toshihiro Suzuki commented on HBASE-21200:
--

It looks like this issue reproduces on both master and branch-1.

> Memstore flush doesn't finish because of seekToPreviousRow() in memstore 
> scanner.
> -
>
> Key: HBASE-21200
> URL: https://issues.apache.org/jira/browse/HBASE-21200
> Project: HBase
> Issue Type: Bug
> Components: Scanners
> Reporter: dongjin2193.jeon
> Priority: Major
> Attachments: HBASE-21200-UT.patch, RegionServerJstack.log
>
>
> The delayed memstore flush still occurs after backporting HBASE-15871.
> A reverse scan takes a long time to seek to the previous row when the memstore 
> is full of deleted cells.
>  
> jstack :
> "MemStoreFlusher.0" #114 prio=5 os_prio=0 tid=0x7fa3d0729000 nid=0x486a waiting on condition [0x7fa3b9b6b000]
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for  <0xa465fe60> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>     at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
>     at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
>     at org.apache.hadoop.hbase.regionserver.*StoreScanner.updateReaders(StoreScanner.java:695)*
>     at org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1127)
>     at org.apache.hadoop.hbase.regionserver.HStore.updateStorefiles(HStore.java:1106)
>     at org.apache.hadoop.hbase.regionserver.HStore.access$600(HStore.java:130)
>     at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.commit(HStore.java:2455)
>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2519)
>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2256)
>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2218)
>     at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2110)
>     at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2036)
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:501)
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:471)
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:75)
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:259)
>     at java.lang.Thread.run(Thread.java:748)
>  
> "RpcServer.FifoWFPBQ.default.handler=27,queue=0,port=16020" #65 daemon prio=5 os_prio=0 tid=0x7fa3e628 nid=0x4801 runnable [0x7fa3bd29a000]
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner.getNext(DefaultMemStore.java:780)
>     at org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner.seekInSubLists(DefaultMemStore.java:826)
>     - locked <0xb45aa5b8> (a org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner)
>     at org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner.seek(DefaultMemStore.java:818)
>     - locked <0xb45aa5b8> (a org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner)
>     at org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner.seekToPreviousRow(DefaultMemStore.java:1000)
>     - locked <0xb45aa5b8> (a org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner)
>     at org.apache.hadoop.hbase.regionserver.ReversedKeyValueHeap.next(ReversedKeyValueHeap.java:136)
>     at org.apache.hadoop.hbase.regionserver.*StoreScanner.next(StoreScanner.java:629)*
>     at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
>     at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5876)
>     at 
> 

[jira] [Updated] (HBASE-21246) Introduce WALIdentity interface

2018-09-27 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21246:
---
Attachment: 21246.HBASE-20952.001.patch



[jira] [Created] (HBASE-21246) Introduce WALIdentity interface

2018-09-27 Thread Ted Yu (JIRA)
Ted Yu created HBASE-21246:
--

 Summary: Introduce WALIdentity interface
 Key: HBASE-21246
 URL: https://issues.apache.org/jira/browse/HBASE-21246
 Project: HBase
  Issue Type: Sub-task
Reporter: Ted Yu
Assignee: Ted Yu


We are introducing the WALIdentity interface so that the WAL representation can be 
decoupled from the distributed filesystem.

The interface provides a getName method whose return value can represent the filename 
in a distributed filesystem environment or the name of the stream when the WAL 
is backed by a log stream.
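
As a minimal illustration of the decoupling described above (not the contents of the attached patch), callers could depend only on a name accessor while each backend decides what the name means. Only getName comes from the description; the type names WALIdentitySketch, FileWALIdentitySketch, StreamWALIdentitySketch and WALIdentityDemo are hypothetical.

```java
/** Sketch of the interface as described: a single name accessor. */
interface WALIdentitySketch {
  String getName();
}

/** Hypothetical identity for a WAL stored as a file on a filesystem. */
class FileWALIdentitySketch implements WALIdentitySketch {
  private final String path;

  FileWALIdentitySketch(String path) {
    this.path = path;
  }

  /** Returns the last path component, i.e. the file name. */
  @Override
  public String getName() {
    return path.substring(path.lastIndexOf('/') + 1);
  }
}

/** Hypothetical identity for a WAL backed by a log stream. */
class StreamWALIdentitySketch implements WALIdentitySketch {
  private final String streamName;

  StreamWALIdentitySketch(String streamName) {
    this.streamName = streamName;
  }

  @Override
  public String getName() {
    return streamName;
  }
}

class WALIdentityDemo {
  /** Caller code sees only the interface, never a filesystem path type. */
  static String label(WALIdentitySketch id) {
    return "wal:" + id.getName();
  }
}
```

Code written against the interface, such as label above, does not need to change when a different WAL backend is plugged in.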






[jira] [Updated] (HBASE-20952) Re-visit the WAL API

2018-09-27 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-20952:
---
Issue Type: Improvement  (was: Sub-task)
Parent: (was: HBASE-20951)

> Re-visit the WAL API
> 
>
> Key: HBASE-20952
> URL: https://issues.apache.org/jira/browse/HBASE-20952
> Project: HBase
> Issue Type: Improvement
> Components: wal
> Reporter: Josh Elser
> Priority: Major
> Attachments: 20952.v1.txt
>
>
> Take a step back from the current WAL implementations and think about what an 
> HBase WAL API should look like. What are the primitive calls that we require 
> to guarantee durability of writes with a high degree of performance?
> The API needs to take the current implementations into consideration. We 
> should also keep in mind what is happening in the Ratis LogService (but 
> the LogService should not dictate what HBase's WAL API looks like; see RATIS-272).
> Other "systems" inside of HBase that use WALs are replication and 
> backup & restore (B&R). Replication has the use-case for "tail"'ing the WAL, which we 
> should provide via our new API. B&R doesn't do anything fancy (IIRC). We 
> should make sure all consumers are generally going to be OK with the API we 
> create.
> The API may be "OK" (or OK in a part). We also need to consider other methods 
> which were "bolted" on such as {{AbstractFSWAL}} and 
> {{WALFileLengthProvider}}. Other corners of "WAL use" (like the 
> {{WALSplitter}}) should also be looked at so that they use WAL APIs only.
> We also need to make sure that adequate interface audience and stability 
> annotations are chosen.





[jira] [Updated] (HBASE-21200) Memstore flush doesn't finish because of seekToPreviousRow() in memstore scanner.

2018-09-27 Thread Toshihiro Suzuki (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Toshihiro Suzuki updated HBASE-21200:
-
Attachment: HBASE-21200-UT.patch


[jira] [Commented] (HBASE-21200) Memstore flush doesn't finish because of seekToPreviousRow() in memstore scanner.

2018-09-27 Thread Toshihiro Suzuki (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630724#comment-16630724
 ] 

Toshihiro Suzuki commented on HBASE-21200:
--

It seems that an issue similar to HBASE-15871 occurs with the following 
steps.

1) Create a reversed store scanner.
2) Put a lot of cells that have a sequenceID greater than the readPt of the 
reverse scanner into the memstore.
3) Call next() on the reverse scanner. At this point, a lot of cells in the 
memstore have a sequenceID greater than the readPt of the reverse scanner because 
of 2). This causes seekToPreviousRow() to repeatedly search cells 
that have already been searched.
4) Flush the memstore. To update the store files in the same HStore after 
flushing, the flush has to wait until the processing in 3) finishes.

I'm attaching a patch to reproduce this issue.

> Memstore flush doesn't finish because of seekToPreviousRow() in memstore 
> scanner.
> -
>
> Key: HBASE-21200
> URL: https://issues.apache.org/jira/browse/HBASE-21200
> Project: HBase
> Issue Type: Bug
> Components: Scanners
> Reporter: dongjin2193.jeon
> Priority: Major
> Attachments: RegionServerJstack.log
>
>
> The delayed memstore flush still occurs after backporting HBASE-15871.
> A reverse scan takes a long time to seek to the previous row when the memstore 
> is full of deleted cells.
>  

[jira] [Commented] (HBASE-21234) Archive folder not getting cleaned due to SnapshotHFileCleaner error

2018-09-27 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630649#comment-16630649
 ] 

Josh Elser commented on HBASE-21234:


[~abhilater] is this actually Apache HBase 1.1.2 or some HDP release that is 
based on Apache HBase 1.1.2?

> Archive folder not getting cleaned due to SnapshotHFileCleaner error
> 
>
> Key: HBASE-21234
> URL: https://issues.apache.org/jira/browse/HBASE-21234
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 1.1.2
> Reporter: Abhishek Gupta
> Priority: Critical
> Labels: cleanup, snapshot, snapshots
>
> We are getting the following exception in the HBase Master logs during ChoreService runs. As 
> a result, a lot of data is accumulating in the archive folder because the archive is 
> not being reclaimed. 
> {code:java}
> Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hbase.protobuf.generated.SnapshotProtos
>  at org.apache.hadoop.hbase.protobuf.generated.SnapshotProtos$SnapshotRegionManifest.internalGetFieldAccessorTable(SnapshotProtos.java:1190
> {code}
>  Complete stack-trace
> {code:java}
> 2018-09-26 10:15:06,188 ERROR [master01,16000,1536315941769_ChoreService_3] snapshot.SnapshotHFileCleaner: Exception while checking if files were valid, keeping them just in case.
> java.io.IOException: ExecutionException
>  at org.apache.hadoop.hbase.snapshot.SnapshotManifestV2.loadRegionManifests(SnapshotManifestV2.java:161)
>  at org.apache.hadoop.hbase.snapshot.SnapshotManifest.load(SnapshotManifest.java:364)
>  at org.apache.hadoop.hbase.snapshot.SnapshotManifest.open(SnapshotManifest.java:130)
>  at org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.visitTableStoreFiles(SnapshotReferenceUtil.java:128)
>  at org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.getHFileNames(SnapshotReferenceUtil.java:357)
>  at org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.getHFileNames(SnapshotReferenceUtil.java:340)
>  at org.apache.hadoop.hbase.master.snapshot.SnapshotHFileCleaner$1.filesUnderSnapshot(SnapshotHFileCleaner.java:88)
>  at org.apache.hadoop.hbase.master.snapshot.SnapshotFileCache.getSnapshotsInProgress(SnapshotFileCache.java:303)
>  at org.apache.hadoop.hbase.master.snapshot.SnapshotFileCache.getUnreferencedFiles(SnapshotFileCache.java:194)
>  at org.apache.hadoop.hbase.master.snapshot.SnapshotHFileCleaner.getDeletableFiles(SnapshotHFileCleaner.java:63)
>  at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteFiles(CleanerChore.java:287)
>  at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteEntries(CleanerChore.java:211)
>  at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteDirectory(CleanerChore.java:234)
>  at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteEntries(CleanerChore.java:206)
>  at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteDirectory(CleanerChore.java:234)
>  at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteEntries(CleanerChore.java:206)
>  at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteDirectory(CleanerChore.java:234)
>  at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteEntries(CleanerChore.java:206)
>  at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteDirectory(CleanerChore.java:234)
>  at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteEntries(CleanerChore.java:206)
>  at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteDirectory(CleanerChore.java:234)
>  at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteEntries(CleanerChore.java:206)
>  at org.apache.hadoop.hbase.master.cleaner.CleanerChore.chore(CleanerChore.java:130)
>  at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:185)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hbase.protobuf.generated.SnapshotProtos
>  at 
> 

[jira] [Commented] (HBASE-21228) Memory leak since AbstractFSWAL caches Thread object and never clean later

2018-09-27 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630580#comment-16630580
 ] 

Hudson commented on HBASE-21228:


Results for branch master
[build #514 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/514/]: (x) 
*{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/514//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/514//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/514//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Memory leak since AbstractFSWAL caches Thread object and never clean later
> --
>
> Key: HBASE-21228
> URL: https://issues.apache.org/jira/browse/HBASE-21228
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.1.0, 2.0.2, 1.4.7
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Fix For: 3.0.0, 1.4.8, 2.1.1, 2.0.3
>
> Attachments: HBASE-21228.branch-2.0.001.patch, 
> HBASE-21228.branch-2.0.002.patch, HBASE-21228.branch-2.0.003.patch
>
>
> In AbstractFSWAL (FSHLog in branch-1), we have a map that caches threads and 
> their SyncFutures.
> {code:java}
> /**
>  * Map of {@link SyncFuture}s keyed by Handler objects. Used so we reuse 
>  * SyncFutures.
>  * 
>  * TODO: Reuse FSWALEntry's rather than create them anew each time as we do 
>  * SyncFutures here.
>  * 
>  * TODO: Add a FSWalEntry and SyncFuture as thread locals on handlers 
>  * rather than have them get them from this Map?
>  */
> private final ConcurrentMap<Thread, SyncFuture> syncFuturesByHandler;
> {code}
> A colleague of mine found a memory leak caused by this map.
> Every thread that writes to the WAL is cached in this map, and nothing removes 
> a thread from the map even after the thread is dead.
> In one of our customers' clusters, we noticed that even though there were no 
> requests, the heap of the RS was almost full and CMS GC was triggered every 
> second.
> We dumped the heap and found more than 30 thousand threads in the Terminated 
> state, all of them cached in the map above. 
> Everything referenced by these threads was leaked. Most of the threads are:
> 1. PostOpenDeployTasksThread, which writes the Open Region marker in the WAL.
> 2. hconnection-0x1f838e31-shared--pool, which is used to write the index short 
> circuit (Phoenix); the WAL is written and synced in these threads.
> 3. Index writer thread (Phoenix), which is referenced by 
> RegionCoprocessorHost$RegionEnvironment, then by HRegion, and finally 
> by PostOpenDeployTasksThread.
> We should turn this map into a thread-local one and let the JVM GC the 
> terminated threads for us.
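
The leak described above can be reproduced in miniature. The following is only a sketch under stated assumptions: `FakeSyncFuture`, `byThread`, `perThread` and `runWriters` are hypothetical names invented for illustration and are not the HBase classes or the actual patch. It shows that a map keyed by `Thread` keeps terminated threads reachable, while a `ThreadLocal` value is reachable only through its owning thread.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class SyncFutureCacheSketch {
  /** Hypothetical stand-in for HBase's SyncFuture. */
  static final class FakeSyncFuture {}

  // Leaky variant: keyed by Thread, and nothing ever removes an entry.
  static final ConcurrentMap<Thread, FakeSyncFuture> byThread = new ConcurrentHashMap<>();

  // Thread-local variant: the value hangs off the owning thread, so it becomes
  // collectable once that thread has terminated and is itself unreachable.
  static final ThreadLocal<FakeSyncFuture> perThread =
      ThreadLocal.withInitial(FakeSyncFuture::new);

  static FakeSyncFuture getFromMap() {
    return byThread.computeIfAbsent(Thread.currentThread(), t -> new FakeSyncFuture());
  }

  static FakeSyncFuture getFromThreadLocal() {
    return perThread.get();
  }

  /** Starts short-lived writer threads, waits for them, and returns the map size. */
  static int runWriters(int writers) throws InterruptedException {
    Thread[] threads = new Thread[writers];
    for (int i = 0; i < writers; i++) {
      threads[i] = new Thread(() -> {
        getFromMap();
        getFromThreadLocal();
      });
      threads[i].start();
    }
    for (Thread t : threads) {
      t.join();
    }
    return byThread.size();
  }

  public static void main(String[] args) throws InterruptedException {
    int retained = runWriters(100);
    System.out.println("terminated threads still referenced by the map: " + retained);
  }
}
```

Every key left in `byThread` after `runWriters` returns is a terminated thread, which is the shape of the heap dump described in the issue; the thread-local variant leaves no such global reference behind.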



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-17992) The snapShot TimeoutException causes the cleanerChore thread to fail to complete the archive correctly

2018-09-27 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-17992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HBASE-17992:
---
Resolution: Duplicate
Status: Resolved  (was: Patch Available)

Resolving as a duplicate of HBASE-16464

> The snapShot TimeoutException causes the cleanerChore thread to fail to 
> complete the archive correctly
> --
>
> Key: HBASE-17992
> URL: https://issues.apache.org/jira/browse/HBASE-17992
> Project: HBase
>  Issue Type: Bug
>  Components: snapshots
>Affects Versions: 0.98.10, 1.3.0
>Reporter: Bo Cui
>Priority: Major
> Attachments: hbase-17992-0.98.patch, hbase-17992-1.3.patch, 
> hbase-17992-master.patch, hbase-17992.patch
>
>
> The problem is that when a snapshot hits a TimeoutException or another 
> exception, /hbase/.hbase-snapshot/tmp is not deleted correctly, which 
> causes the cleanerChore to fail to complete the archive correctly.
> Modifying the configuration parameter (hbase.snapshot.master.timeout.millis = 
> 60) only reduces the probability of the problem occurring.
> So the solution is: when a worker thread throws an exception or a 
> TimeoutException occurs, the main thread must wait until all the tasks are 
> finished or canceled; only then can the main thread clear 
> /hbase/.hbase-snapshot/tmp/snapshotName. Otherwise a task is likely to still 
> write /hbase/.hbase-snapshot/tmp/snapshotName/region-manifest.
> The problem exists in both disabledTableSnapshot and enabledTableSnapshot. 
> Because I'm currently using disabledTableSnapshot, I provide the patch for 
> disabledTableSnapshot.
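
The ordering proposed above (wait for every task to finish or be canceled, and only then clear the working directory) can be sketched with plain JDK classes. This is an illustration under assumptions, not the attached patch: `SnapshotCleanupSketch`, `runThenCleanup` and the temp-directory layout are hypothetical, and a local directory stands in for the HDFS path.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class SnapshotCleanupSketch {

  /** Returns true if every task finished in time; on failure the tmp dir is removed. */
  static boolean runThenCleanup(List<Callable<Void>> tasks, Path tmpDir, long timeoutMs)
      throws IOException, InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, tasks.size()));
    boolean ok = true;
    try {
      // invokeAll cancels whatever has not completed when the timeout expires.
      List<Future<Void>> futures = pool.invokeAll(tasks, timeoutMs, TimeUnit.MILLISECONDS);
      for (Future<Void> f : futures) {
        if (f.isCancelled()) {
          ok = false;
          continue;
        }
        try {
          f.get();
        } catch (ExecutionException e) {
          ok = false; // a task failed with its own exception
        }
      }
    } finally {
      pool.shutdownNow();
      // A canceled task may still be running; wait until the workers have really exited.
      // Note: this loops for as long as a task ignores interruption.
      while (!pool.awaitTermination(1, TimeUnit.SECONDS)) {
        // keep waiting
      }
    }
    if (!ok) {
      deleteRecursively(tmpDir); // safe now: no task can write into tmpDir any more
    }
    return ok;
  }

  static void deleteRecursively(Path dir) throws IOException {
    if (!Files.exists(dir)) {
      return;
    }
    List<Path> paths;
    try (Stream<Path> walk = Files.walk(dir)) {
      // Reverse order so children are deleted before their parent directories.
      paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
    }
    for (Path p : paths) {
      Files.delete(p);
    }
  }

  public static void main(String[] args) throws Exception {
    Path tmp = Files.createTempDirectory("snapshot-sketch");
    List<Callable<Void>> tasks = new ArrayList<>();
    tasks.add(() -> {
      Thread.sleep(5000); // simulates a slow region task that exceeds the timeout
      Files.createFile(tmp.resolve("region-manifest"));
      return null;
    });
    boolean ok = runThenCleanup(tasks, tmp, 200);
    System.out.println("completed=" + ok + ", tmp exists=" + Files.exists(tmp));
  }
}
```

The point of the `awaitTermination` loop is that cancellation only requests a stop; deleting the directory before the workers have exited would let a late task recreate files under it, which is the failure mode the issue describes.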





[jira] [Commented] (HBASE-20766) Verify Replication Tool Has Typo "remove cluster"

2018-09-27 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630356#comment-16630356
 ] 

Hudson commented on HBASE-20766:


Results for branch master
[build #513 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/513/]: (x) 
*{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/513//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/513//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/513//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Verify Replication Tool Has Typo "remove cluster"
> -
>
> Key: HBASE-20766
> URL: https://issues.apache.org/jira/browse/HBASE-20766
> Project: HBase
>  Issue Type: Bug
>Reporter: Clay B.
>Assignee: Ferran Fernandez Garrido
>Priority: Trivial
>  Labels: beginner
> Fix For: 3.0.0, 1.5.0, 2.2.0
>
> Attachments: HBASE-20766.master.001.patch
>
>
> The verify replication tool has a trivial typo "remove cluster" instead of 
> "remote cluster": 
> https://github.com/apache/hbase/blob/a6eeb26cc0b4d0af3fff50b5b931b6847df1f9d2/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/replication/VerifyReplication.java#L355





[jira] [Commented] (HBASE-21227) Implement exponential retrying backoff for Assign/UnassignRegionHandler introduced in HBASE-21217

2018-09-27 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630354#comment-16630354
 ] 

Hudson commented on HBASE-21227:


Results for branch master
[build #513 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/513/]: (x) 
*{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/513//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/513//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/513//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Implement exponential retrying backoff for Assign/UnassignRegionHandler 
> introduced in HBASE-21217
> -
>
> Key: HBASE-21227
> URL: https://issues.apache.org/jira/browse/HBASE-21227
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2, regionserver
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21227-v1.patch, HBASE-21227.patch
>
>






[jira] [Commented] (HBASE-21212) Wrong flush time when update flush metric

2018-09-27 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630353#comment-16630353
 ] 

Hudson commented on HBASE-21212:


Results for branch master
[build #513 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/513/]: (x) 
*{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/513//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/513//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/513//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Wrong flush time when update flush metric
> -
>
> Key: HBASE-21212
> URL: https://issues.apache.org/jira/browse/HBASE-21212
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.1.0, 2.0.2
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Minor
> Fix For: 3.0.0, 1.4.8, 2.1.1, 2.0.3
>
> Attachments: HBASE-21212.branch-2.0.001.patch
>
>






[jira] [Commented] (HBASE-21232) Show table state in Tables view on Master home page

2018-09-27 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630355#comment-16630355
 ] 

Hudson commented on HBASE-21232:


Results for branch master
[build #513 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/513/]: (x) 
*{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/513//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/513//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/513//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Show table state in Tables view on Master home page
> ---
>
> Key: HBASE-21232
> URL: https://issues.apache.org/jira/browse/HBASE-21232
> Project: HBase
>  Issue Type: Bug
>  Components: Operability, UI
>Affects Versions: 2.1.0
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1
>
> Attachments: HBASE-21232.branch-2.1.001.patch, table.pdf
>
>
> Add a column showing table state to the Tables panel on the Master home page. 
> Useful when trying to figure out whether a table is 
> enabled/disabled/disabling/enabling...





[jira] [Commented] (HBASE-21228) Memory leak since AbstractFSWAL caches Thread object and never clean later

2018-09-27 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630344#comment-16630344
 ] 

Hudson commented on HBASE-21228:


Results for branch branch-2.1
[build #384 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/384/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/384//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/384//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/384//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Memory leak since AbstractFSWAL caches Thread object and never clean later
> --
>
> Key: HBASE-21228
> URL: https://issues.apache.org/jira/browse/HBASE-21228
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.1.0, 2.0.2, 1.4.7
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Fix For: 3.0.0, 1.4.8, 2.1.1, 2.0.3
>
> Attachments: HBASE-21228.branch-2.0.001.patch, 
> HBASE-21228.branch-2.0.002.patch, HBASE-21228.branch-2.0.003.patch
>
>





[jira] [Created] (HBASE-21245) Add exponential backoff when retrying for sync replication related procedures

2018-09-27 Thread Duo Zhang (JIRA)
Duo Zhang created HBASE-21245:
-

 Summary: Add exponential backoff when retrying for sync 
replication related procedures
 Key: HBASE-21245
 URL: https://issues.apache.org/jira/browse/HBASE-21245
 Project: HBase
  Issue Type: Sub-task
Reporter: Duo Zhang








[jira] [Created] (HBASE-21244) Skip persistence when retrying for assignment related procedures

2018-09-27 Thread Duo Zhang (JIRA)
Duo Zhang created HBASE-21244:
-

 Summary: Skip persistence when retrying for assignment related 
procedures
 Key: HBASE-21244
 URL: https://issues.apache.org/jira/browse/HBASE-21244
 Project: HBase
  Issue Type: Sub-task
  Components: amv2, Performance, proc-v2
Reporter: Duo Zhang
 Fix For: 3.0.0, 2.2.0








[jira] [Updated] (HBASE-21233) Allow the procedure implementation to skip persistence of the state after a execution

2018-09-27 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21233:
--
Attachment: HBASE-21233.patch

> Allow the procedure implementation to skip persistence of the state after a 
> execution
> -
>
> Key: HBASE-21233
> URL: https://issues.apache.org/jira/browse/HBASE-21233
> Project: HBase
>  Issue Type: Sub-task
>  Components: Performance, proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1
>
> Attachments: HBASE-21233.patch, HBASE-21233.patch
>
>
> Discussed with [~stack] and [~allan163] on HBASE-21035: when retrying, we 
> do not need to persist the procedure state every time, as the retry timeout 
> is not critical state. It is OK to lose this information and start 
> from 0 after restarting.





[jira] [Commented] (HBASE-21228) Memory leak since AbstractFSWAL caches Thread object and never clean later

2018-09-27 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630271#comment-16630271
 ] 

Hudson commented on HBASE-21228:


Results for branch branch-2.0
[build #871 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/871/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/871//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/871//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/871//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


> Memory leak since AbstractFSWAL caches Thread object and never clean later
> --
>
> Key: HBASE-21228
> URL: https://issues.apache.org/jira/browse/HBASE-21228
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.1.0, 2.0.2, 1.4.7
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Fix For: 3.0.0, 1.4.8, 2.1.1, 2.0.3
>
> Attachments: HBASE-21228.branch-2.0.001.patch, 
> HBASE-21228.branch-2.0.002.patch, HBASE-21228.branch-2.0.003.patch
>
>





[jira] [Commented] (HBASE-21233) Allow the procedure implementation to skip persistence of the state after a execution

2018-09-27 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630242#comment-16630242
 ] 

Allan Yang commented on HBASE-21233:


+1 for the patch.

> Allow the procedure implementation to skip persistence of the state after a 
> execution
> -
>
> Key: HBASE-21233
> URL: https://issues.apache.org/jira/browse/HBASE-21233
> Project: HBase
>  Issue Type: Sub-task
>  Components: Performance, proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1
>
> Attachments: HBASE-21233.patch
>
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21212) Wrong flush time when update flush metric

2018-09-27 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630181#comment-16630181
 ] 

Hudson commented on HBASE-21212:


Results for branch branch-1.4
[build #480 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/480/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/480//General_Nightly_Build_Report/]


(x) {color:red}-1 jdk7 checks{color}
-- For more information [see jdk7 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/480//JDK7_Nightly_Build_Report/]


(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/480//JDK8_Nightly_Build_Report_(Hadoop2)/]




(/) {color:green}+1 source release artifact{color}
-- See build output for details.


> Wrong flush time when update flush metric
> -
>
> Key: HBASE-21212
> URL: https://issues.apache.org/jira/browse/HBASE-21212
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.1.0, 2.0.2
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Minor
> Fix For: 3.0.0, 1.4.8, 2.1.1, 2.0.3
>
> Attachments: HBASE-21212.branch-2.0.001.patch
>
>






[jira] [Commented] (HBASE-21212) Wrong flush time when update flush metric

2018-09-27 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630177#comment-16630177
 ] 

Hudson commented on HBASE-21212:


Results for branch branch-1
[build #478 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/478/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/478//General_Nightly_Build_Report/]


(x) {color:red}-1 jdk7 checks{color}
-- For more information [see jdk7 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/478//JDK7_Nightly_Build_Report/]


(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/478//JDK8_Nightly_Build_Report_(Hadoop2)/]




(x) {color:red}-1 source release artifact{color}
-- See build output for details.


> Wrong flush time when update flush metric
> -
>
> Key: HBASE-21212
> URL: https://issues.apache.org/jira/browse/HBASE-21212
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.1.0, 2.0.2
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Minor
> Fix For: 3.0.0, 1.4.8, 2.1.1, 2.0.3
>
> Attachments: HBASE-21212.branch-2.0.001.patch
>
>






[jira] [Commented] (HBASE-20766) Verify Replication Tool Has Typo "remove cluster"

2018-09-27 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630178#comment-16630178
 ] 

Hudson commented on HBASE-20766:


Results for branch branch-1
[build #478 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/478/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/478//General_Nightly_Build_Report/]


(x) {color:red}-1 jdk7 checks{color}
-- For more information [see jdk7 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/478//JDK7_Nightly_Build_Report/]


(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/478//JDK8_Nightly_Build_Report_(Hadoop2)/]




(x) {color:red}-1 source release artifact{color}
-- See build output for details.


> Verify Replication Tool Has Typo "remove cluster"
> -
>
> Key: HBASE-20766
> URL: https://issues.apache.org/jira/browse/HBASE-20766
> Project: HBase
>  Issue Type: Bug
>Reporter: Clay B.
>Assignee: Ferran Fernandez Garrido
>Priority: Trivial
>  Labels: beginner
> Fix For: 3.0.0, 1.5.0, 2.2.0
>
> Attachments: HBASE-20766.master.001.patch
>
>





[jira] [Updated] (HBASE-20727) Persist FlushedSequenceId to speed up WAL split after cluster restart

2018-09-27 Thread Allan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Yang updated HBASE-20727:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Persist FlushedSequenceId to speed up WAL split after cluster restart
> -
>
> Key: HBASE-20727
> URL: https://issues.apache.org/jira/browse/HBASE-20727
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.0.0
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-20727.002.patch, HBASE-20727.003.patch, 
> HBASE-20727.004.patch, HBASE-20727.005.patch, HBASE-20727.patch
>
>
> We use flushedSequenceIdByRegion and storeFlushedSequenceIdsByRegion in 
> ServerManager to record the latest flushed seqids of regions and stores. So 
> during log split, we can use the seqids stored in those maps to filter out the 
> edits which do not need to be replayed. But those maps are not persisted. 
> After a cluster restart or master restart, the flushed seqid info is all lost. 
> Here I offer a way to persist that info to HDFS, so that even if the master 
> restarts we can still use it to filter WAL edits and speed up replay.
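
As a rough illustration of the filtering described above (not the code from the attached patches), the sketch below drops WAL edits whose sequence id is at or below the last flushed sequence id recorded for their region. `Edit`, `editsToReplay` and the plain `Map<String, Long>` are hypothetical simplifications of the ServerManager maps.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class FlushedSeqIdFilterSketch {
  /** Hypothetical, simplified WAL edit: just a region name and a sequence id. */
  static final class Edit {
    final String region;
    final long seqId;

    Edit(String region, long seqId) {
      this.region = region;
      this.seqId = seqId;
    }
  }

  /** Keeps only the edits that are not yet covered by a flush of their region. */
  static List<Edit> editsToReplay(List<Edit> walEdits, Map<String, Long> flushedSeqIdByRegion) {
    List<Edit> out = new ArrayList<>();
    for (Edit e : walEdits) {
      Long flushed = flushedSeqIdByRegion.get(e.region);
      // With no recorded flushed id (for example, lost on restart) nothing can be skipped.
      if (flushed == null || e.seqId > flushed) {
        out.add(e);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Edit> edits = new ArrayList<>();
    edits.add(new Edit("r1", 5));
    edits.add(new Edit("r1", 10));
    edits.add(new Edit("r1", 11));

    Map<String, Long> flushed = new HashMap<>();
    flushed.put("r1", 10L);

    System.out.println("with flushed ids: " + editsToReplay(edits, flushed).size());
    System.out.println("after losing them: " + editsToReplay(edits, new HashMap<>()).size());
  }
}
```

With an empty map every edit has to be replayed, which is why losing the in-memory maps on a master restart slows down WAL splitting.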



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21235) Rename the closed procedure wal files so that we do not need to call recoverLease when restarting

2018-09-27 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630120#comment-16630120
 ] 

Duo Zhang commented on HBASE-21235:
---

It is not as easy as I expected... We have a ProcedureWALFile class that stores 
the file name, and we use this class to archive the old log files. 
So when renaming we also need to change the file name in this class, and that 
may introduce races...

Maybe a once-and-for-all solution is to just get rid of the current WAL-based 
procedure store and just use an HRegion...

Will be back later... 

> Rename the closed procedure wal files so that we do not need to call 
> recoverLease when restarting
> -
>
> Key: HBASE-21235
> URL: https://issues.apache.org/jira/browse/HBASE-21235
> Project: HBase
>  Issue Type: Sub-task
>  Components: Performance, proc-v2
>Reporter: Duo Zhang
>Priority: Major
>
> If there are lots of procedure WAL files, lease recovery will be a 
> time-consuming operation. Renaming is a possible way to confirm that some 
> files are already closed when restarting, so we do not need to call 
> recoverLease on them any more.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21233) Allow the procedure implementation to skip persistence of the state after a execution

2018-09-27 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630078#comment-16630078
 ] 

Duo Zhang commented on HBASE-21233:
---

Introduce a 'persist' flag in Procedure to indicate whether we need to persist 
the procedure after execution. It defaults to true. The implementation can set 
it to false to avoid persisting, and before each execution we reset it to 
true.

Note that this can reduce the number of procedure WAL records but cannot fix 
every problem with retrying. For TRSP it is OK, as holdLock is true, but for 
other procedures where holdLock is false, after the execution we have to 
release the lock and persist the 'release lock' action, so skipPersistence 
here saves one procedure WAL record, but the 'release lock' one is still needed.

This is just a framework for skipping persistence. Will open other issues to 
fix the retrying for specific procedures.
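
A minimal sketch of the flag handling described in this comment, under the assumption of a drastically simplified executor loop. `SketchProcedure`, `SketchStore`, `runToCompletion` and `RetryingProcedure` are hypothetical names made up for illustration; this is not the HBase Procedure API or the attached patch.

```java
import java.util.ArrayList;
import java.util.List;

final class SkipPersistenceSketch {
  /** Hypothetical, simplified procedure carrying the 'persist' flag. */
  abstract static class SketchProcedure {
    private boolean persist = true;

    /** An implementation calls this during execute() to opt out for this step. */
    protected final void skipPersistence() {
      persist = false;
    }

    final boolean needPersistence() {
      return persist;
    }

    final void resetPersistence() {
      persist = true;
    }

    /** Returns true once the procedure is finished. */
    abstract boolean execute();
  }

  /** Hypothetical in-memory store that counts persisted updates. */
  static final class SketchStore {
    final List<String> records = new ArrayList<>();

    void update(SketchProcedure p) {
      records.add(p.getClass().getSimpleName());
    }
  }

  static void runToCompletion(SketchProcedure proc, SketchStore store) {
    boolean done = false;
    while (!done) {
      proc.resetPersistence(); // back to the default (true) before every execution
      done = proc.execute();
      if (proc.needPersistence()) {
        store.update(proc);
      }
    }
  }

  /** Fails twice without asking to be persisted, then succeeds and is persisted. */
  static final class RetryingProcedure extends SketchProcedure {
    int attempts = 0;

    @Override
    boolean execute() {
      attempts++;
      if (attempts < 3) {
        skipPersistence(); // a retry carries no state worth writing
        return false;
      }
      return true;
    }
  }

  public static void main(String[] args) {
    SketchStore store = new SketchStore();
    RetryingProcedure proc = new RetryingProcedure();
    runToCompletion(proc, store);
    System.out.println("attempts=" + proc.attempts + ", persisted=" + store.records.size());
  }
}
```

Resetting the flag before each execution means a procedure has to opt out explicitly on every retry, so forgetting to do so falls back to the safe behaviour of persisting.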

> Allow the procedure implementation to skip persistence of the state after a 
> execution
> -
>
> Key: HBASE-21233
> URL: https://issues.apache.org/jira/browse/HBASE-21233
> Project: HBase
>  Issue Type: Sub-task
>  Components: Performance, proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1
>
> Attachments: HBASE-21233.patch
>
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21233) Allow the procedure implementation to skip persistence of the state after a execution

2018-09-27 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21233:
--
Assignee: Duo Zhang
  Status: Patch Available  (was: Open)

> Allow the procedure implementation to skip persistence of the state after a 
> execution
> -
>
> Key: HBASE-21233
> URL: https://issues.apache.org/jira/browse/HBASE-21233
> Project: HBase
>  Issue Type: Sub-task
>  Components: Performance, proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1
>
> Attachments: HBASE-21233.patch
>
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21233) Allow the procedure implementation to skip persistence of the state after a execution

2018-09-27 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21233:
--
Attachment: HBASE-21233.patch

> Allow the procedure implementation to skip persistence of the state after a 
> execution
> -
>
> Key: HBASE-21233
> URL: https://issues.apache.org/jira/browse/HBASE-21233
> Project: HBase
>  Issue Type: Sub-task
>  Components: Performance, proc-v2
>Reporter: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1
>
> Attachments: HBASE-21233.patch
>
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-21243) Correct java-doc for the method RpcServer.getRemoteAddress()

2018-09-27 Thread Nihal Jain (JIRA)
Nihal Jain created HBASE-21243:
--

 Summary: Correct java-doc for the method 
RpcServer.getRemoteAddress()
 Key: HBASE-21243
 URL: https://issues.apache.org/jira/browse/HBASE-21243
 Project: HBase
  Issue Type: Improvement
Affects Versions: 2.0.0, 3.0.0
Reporter: Nihal Jain


Correct the java-doc for the method {{RpcServer.getRemoteAddress()}}.
 Currently it looks like this:
{code:java}
  /**
   * @return Address of remote client if a request is ongoing, else null
   */
  public static Optional<InetAddress> getRemoteAddress() {
    return getCurrentCall().map(RpcCall::getRemoteAddress);
  }
{code}
Contrary to the doc, the method never returns null. Instead, it may return an 
empty Optional.
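
The behaviour can be checked in isolation. The sketch below only mirrors the shape of the method: `Call`, `getCurrentCall(boolean)` and the hard-coded address are hypothetical stand-ins, not HBase classes. It shows that mapping over an empty `Optional` yields an empty `Optional` rather than null.

```java
import java.util.Optional;

public class OptionalNeverNull {
  /** Hypothetical stand-in for an RPC call; not the HBase RpcCall interface. */
  static final class Call {
    String getRemoteAddress() {
      return "10.0.0.1";
    }
  }

  /** Returns the current call if a request is ongoing, else an empty Optional. */
  static Optional<Call> getCurrentCall(boolean requestOngoing) {
    return requestOngoing ? Optional.of(new Call()) : Optional.empty();
  }

  /** Same shape as the method in the issue: map over the Optional. */
  static Optional<String> getRemoteAddress(boolean requestOngoing) {
    return getCurrentCall(requestOngoing).map(Call::getRemoteAddress);
  }

  public static void main(String[] args) {
    Optional<String> none = getRemoteAddress(false);
    System.out.println(none != null);     // true: the Optional itself is never null
    System.out.println(none.isPresent()); // false: it is empty instead
    System.out.println(getRemoteAddress(true).orElse("unknown")); // 10.0.0.1
  }
}
```

A java-doc along the lines of "@return Address of remote client if a request is ongoing, else an empty Optional" would match this behaviour; the exact wording is a suggestion, not taken from a patch.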





[jira] [Updated] (HBASE-20734) Colocate recovered edits directory with hbase.wal.dir

2018-09-27 Thread Reid Chan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reid Chan updated HBASE-20734:
--
  Resolution: Resolved
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

> Colocate recovered edits directory with hbase.wal.dir
> -
>
> Key: HBASE-20734
> URL: https://issues.apache.org/jira/browse/HBASE-20734
> Project: HBase
>  Issue Type: Improvement
>  Components: MTTR, Recovery, wal
>Reporter: Ted Yu
>Assignee: Zach York
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.8, 2.1.1
>
> Attachments: HBASE-20734.branch-1.001.patch, 
> HBASE-20734.branch-1.002.patch, HBASE-20734.branch-1.003.patch, 
> HBASE-20734.branch-1.004.patch, HBASE-20734.branch-1.005.patch, 
> HBASE-20734.master.001.patch, HBASE-20734.master.002.patch, 
> HBASE-20734.master.003.patch, HBASE-20734.master.004.patch, 
> HBASE-20734.master.005.patch, HBASE-20734.master.006.patch, 
> HBASE-20734.master.007.patch, HBASE-20734.master.008.patch, 
> HBASE-20734.master.009.patch, HBASE-20734.master.010.patch, 
> HBASE-20734.master.011.patch, HBASE-20734.master.012.patch
>
>
> During investigation of HBASE-20723, I realized that we wouldn't get the best 
> performance when hbase.wal.dir is configured to be on different (fast) media 
> than hbase rootdir w.r.t. recovered edits, since the recovered edits directory 
> is currently under rootdir.
> Such a setup may not result in fast recovery when there is a region server 
> failover.
> This issue is to find a proper (hopefully backward compatible) way of 
> colocating the recovered edits directory with hbase.wal.dir.





[jira] [Updated] (HBASE-20734) Colocate recovered edits directory with hbase.wal.dir

2018-09-27 Thread Reid Chan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reid Chan updated HBASE-20734:
--
Release Note: Previously the recovered.edits directory was under the root 
directory. This JIRA moves the recovered.edits directory to be under the 
hbase.wal.dir if set. It also adds a check for any recovered.edits found under 
the root directory for backwards compatibility. This gives improvements when 
faster media (like SSD) or a more local FileSystem is used for the hbase.wal.dir 
than for the root dir.  (was: Previously the recovered.edits directory was under 
the root directory. This JIRA moves the recovered.edits directory to be under 
the hbase.wal.dir if set. It also adds a check for any recovered.edits found 
under the root directory for backwards compatibility. This gives improvements 
when a faster or more local FileSystem is used for the hbase.wal.dir than the 
root dir.)
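
The lookup order described in the release note can be sketched as below. This is an illustration under stated assumptions, not the committed code: the helper names `writeDir` and `readDirs`, the plain `Map` used as configuration, and the region path layout are all hypothetical. Only the configuration keys hbase.wal.dir and hbase.rootdir and the recovered.edits directory name come from HBase.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RecoveredEditsDirSketch {
  static final String WAL_DIR_KEY = "hbase.wal.dir";
  static final String ROOT_DIR_KEY = "hbase.rootdir";
  static final String RECOVERED_EDITS = "recovered.edits";

  /** New recovered edits go under the WAL dir if it is set, else under the root dir. */
  static Path writeDir(Map<String, String> conf, String regionPath) {
    String base = conf.getOrDefault(WAL_DIR_KEY, conf.get(ROOT_DIR_KEY));
    return Paths.get(base, regionPath, RECOVERED_EDITS);
  }

  /** Directories to scan when opening a region: new location first, then the legacy one. */
  static List<Path> readDirs(Map<String, String> conf, String regionPath) {
    List<Path> dirs = new ArrayList<>();
    dirs.add(writeDir(conf, regionPath));
    Path legacy = Paths.get(conf.get(ROOT_DIR_KEY), regionPath, RECOVERED_EDITS);
    if (!dirs.contains(legacy)) {
      dirs.add(legacy); // backward compatibility: edits written before the move
    }
    return dirs;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(ROOT_DIR_KEY, "/hbase");
    conf.put(WAL_DIR_KEY, "/ssd/hbase-wal");
    // The region path below is made up for the example.
    for (Path dir : readDirs(conf, "data/default/t1/region-a")) {
      System.out.println(dir);
    }
  }
}
```

When hbase.wal.dir is unset, both helpers collapse to the single legacy location under the root dir, which is the pre-change behaviour.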

> Colocate recovered edits directory with hbase.wal.dir
> -
>
> Key: HBASE-20734
> URL: https://issues.apache.org/jira/browse/HBASE-20734
> Project: HBase
>  Issue Type: Improvement
>  Components: MTTR, Recovery, wal
>Reporter: Ted Yu
>Assignee: Zach York
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.8, 2.1.1
>
> Attachments: HBASE-20734.branch-1.001.patch, 
> HBASE-20734.branch-1.002.patch, HBASE-20734.branch-1.003.patch, 
> HBASE-20734.branch-1.004.patch, HBASE-20734.branch-1.005.patch, 
> HBASE-20734.master.001.patch, HBASE-20734.master.002.patch, 
> HBASE-20734.master.003.patch, HBASE-20734.master.004.patch, 
> HBASE-20734.master.005.patch, HBASE-20734.master.006.patch, 
> HBASE-20734.master.007.patch, HBASE-20734.master.008.patch, 
> HBASE-20734.master.009.patch, HBASE-20734.master.010.patch, 
> HBASE-20734.master.011.patch, HBASE-20734.master.012.patch
>
>
> During investigation of HBASE-20723, I realized that we wouldn't get the best 
> performance when hbase.wal.dir is configured to be on different (fast) media 
> than hbase rootdir w.r.t. recovered edits, since the recovered edits directory 
> is currently under rootdir.
> Such a setup may not result in fast recovery when there is a region server 
> failover.
> This issue is to find a proper (hopefully backward compatible) way of 
> colocating the recovered edits directory with hbase.wal.dir.





[jira] [Resolved] (HBASE-21218) TableStateNotFoundException thrown from RSGroupAdminEndpoint#postCreateTable when creating table

2018-09-27 Thread Nihal Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nihal Jain resolved HBASE-21218.

Resolution: Duplicate

> TableStateNotFoundException thrown from RSGroupAdminEndpoint#postCreateTable 
> when creating table
> 
>
> Key: HBASE-21218
> URL: https://issues.apache.org/jira/browse/HBASE-21218
> Project: HBase
>  Issue Type: Bug
>  Components: rsgroup
>Reporter: Guangxu Cheng
>Assignee: Guangxu Cheng
>Priority: Major
> Attachments: HBASE-21218.master.001.patch
>
>
> Similar to HBASE-19509, I found the following logs in master log when 
> creating table
> {code}
> 2018-09-21 15:14:47,476 ERROR [RpcServer.default.FPBQ.Fifo.handler=296,queue=26,port=16000] master.TableStateManager: Unable to get table t3 state
> org.apache.hadoop.hbase.master.TableStateManager$TableStateNotFoundException: t3
>     at org.apache.hadoop.hbase.master.TableStateManager.getTableState(TableStateManager.java:215)
>     at org.apache.hadoop.hbase.master.TableStateManager.isTableState(TableStateManager.java:147)
>     at org.apache.hadoop.hbase.master.assignment.AssignmentManager.isTableDisabled(AssignmentManager.java:344)
>     at org.apache.hadoop.hbase.rsgroup.RSGroupAdminServer.moveTables(RSGroupAdminServer.java:412)
>     at org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint.assignTableToGroup(RSGroupAdminEndpoint.java:471)
>     at org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint.postCreateTable(RSGroupAdminEndpoint.java:494)
>     at org.apache.hadoop.hbase.master.MasterCoprocessorHost$12.call(MasterCoprocessorHost.java:335)
>     at org.apache.hadoop.hbase.master.MasterCoprocessorHost$12.call(MasterCoprocessorHost.java:332)
>     at org.apache.hadoop.hbase.coprocessor.CoprocessorHost$ObserverOperationWithoutResult.callObserver(CoprocessorHost.java:540)
>     at org.apache.hadoop.hbase.coprocessor.CoprocessorHost.execOperation(CoprocessorHost.java:614)
>     at org.apache.hadoop.hbase.master.MasterCoprocessorHost.postCreateTable(MasterCoprocessorHost.java:332)
>     at org.apache.hadoop.hbase.master.HMaster$3.run(HMaster.java:1929)
>     at org.apache.hadoop.hbase.master.procedure.MasterProcedureUtil.submitProcedure(MasterProcedureUtil.java:131)
>     at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1911)
>     at org.apache.hadoop.hbase.master.MasterRpcServices.createTable(MasterRpcServices.java:628)
>     at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> {code}
>   
> In fact, we only need to update the rsgroup information, without moving any 
> regions.





[jira] [Updated] (HBASE-21228) Memory leak since AbstractFSWAL caches Thread object and never clean later

2018-09-27 Thread Allan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Yang updated HBASE-21228:
---
   Resolution: Fixed
Fix Version/s: 2.0.3
   2.1.1
   1.4.8
   3.0.0
   Status: Resolved  (was: Patch Available)

> Memory leak since AbstractFSWAL caches Thread object and never clean later
> --
>
> Key: HBASE-21228
> URL: https://issues.apache.org/jira/browse/HBASE-21228
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.1.0, 2.0.2, 1.4.7
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Fix For: 3.0.0, 1.4.8, 2.1.1, 2.0.3
>
> Attachments: HBASE-21228.branch-2.0.001.patch, 
> HBASE-21228.branch-2.0.002.patch, HBASE-21228.branch-2.0.003.patch
>
>
> In AbstractFSWAL (FSHLog in branch-1), we have a map that caches threads and 
> their SyncFutures.
> {code:java}
>   /**
>    * Map of {@link SyncFuture}s keyed by Handler objects. Used so we reuse SyncFutures.
>    *
>    * TODO: Reuse FSWALEntry's rather than create them anew each time as we do SyncFutures here.
>    *
>    * TODO: Add a FSWalEntry and SyncFuture as thread locals on handlers rather than have them get
>    * them from this Map?
>    */
>   private final ConcurrentMap<Thread, SyncFuture> syncFuturesByHandler;
> {code}
> A colleague of mine found a memory leak caused by this map.
> Every thread that writes to the WAL is cached in this map, and nothing removes 
> a thread from the map, even after the thread is dead.
> In one of our customer's clusters, we noticed that even though there were no 
> requests, the heap of the RS was almost full and CMS GC was triggered every 
> second.
> We dumped the heap and found more than 30 thousand threads in the Terminated 
> state, all cached in the map above. 
> Everything referenced by these threads was leaked. Most of the threads are:
> 1. PostOpenDeployTasksThread, which writes the Open Region marker to the WAL.
> 2. hconnection-0x1f838e31-shared--pool, which is used for short-circuit index 
> writes (Phoenix); the WAL is written and synced in these threads.
> 3. Index writer thread (Phoenix), which is referenced by 
> RegionCoprocessorHost$RegionEnvironment, then by HRegion, and finally by 
> PostOpenDeployTasksThread.
> We should turn this map into a thread local and let the JVM GC the terminated 
> threads for us.
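
The difference between the two caching strategies can be reproduced outside HBase. The sketch below is not the patch: `CachedFuture`, `byThread`, `cached` and `runWriters` are hypothetical names, and `CachedFuture` merely stands in for SyncFuture. It shows that a map keyed by Thread keeps an entry for every dead thread, whereas a ThreadLocal value is reachable only through its owning thread.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class SyncFutureCacheSketch {
  /** Hypothetical stand-in for SyncFuture; it only needs to be something to cache. */
  static final class CachedFuture { }

  // Leaky variant: keyed by Thread, and nothing ever removes an entry.
  static final ConcurrentMap<Thread, CachedFuture> byThread = new ConcurrentHashMap<>();

  // Thread-local variant: each value hangs off its owning thread only.
  static final ThreadLocal<CachedFuture> cached = ThreadLocal.withInitial(CachedFuture::new);

  static CachedFuture getFromMap() {
    return byThread.computeIfAbsent(Thread.currentThread(), t -> new CachedFuture());
  }

  static CachedFuture getFromThreadLocal() {
    return cached.get();
  }

  /** Starts n short-lived writer threads that touch both caches, waiting for each to die. */
  static void runWriters(int n) {
    for (int i = 0; i < n; i++) {
      Thread t = new Thread(() -> {
        getFromMap();
        getFromThreadLocal();
      });
      t.start();
      try {
        t.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while waiting for writer", e);
      }
    }
  }

  public static void main(String[] args) {
    runWriters(100);
    // Every writer has terminated, yet the map still strongly references all of them.
    System.out.println("entries retained by map: " + byThread.size());
  }
}
```

The thread-local values need no explicit cleanup here: once a writer thread has terminated and nothing else references it, the thread and its value become eligible for collection.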





[jira] [Commented] (HBASE-21228) Memory leak since AbstractFSWAL caches Thread object and never clean later

2018-09-27 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629953#comment-16629953
 ] 

Allan Yang commented on HBASE-21228:


Pushed to branch-1+, thanks all for reviewing!

> Memory leak since AbstractFSWAL caches Thread object and never clean later
> --
>
> Key: HBASE-21228
> URL: https://issues.apache.org/jira/browse/HBASE-21228
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.1.0, 2.0.2, 1.4.7
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Attachments: HBASE-21228.branch-2.0.001.patch, 
> HBASE-21228.branch-2.0.002.patch, HBASE-21228.branch-2.0.003.patch
>
>
> In AbstractFSWAL (FSHLog in branch-1), we have a map that caches threads and 
> their SyncFutures.
> {code:java}
>   /**
>    * Map of {@link SyncFuture}s keyed by Handler objects. Used so we reuse SyncFutures.
>    *
>    * TODO: Reuse FSWALEntry's rather than create them anew each time as we do SyncFutures here.
>    *
>    * TODO: Add a FSWalEntry and SyncFuture as thread locals on handlers rather than have them get
>    * them from this Map?
>    */
>   private final ConcurrentMap<Thread, SyncFuture> syncFuturesByHandler;
> {code}
> A colleague of mine found a memory leak caused by this map.
> Every thread that writes to the WAL is cached in this map, and nothing removes 
> a thread from the map, even after the thread is dead.
> In one of our customer's clusters, we noticed that even though there were no 
> requests, the heap of the RS was almost full and CMS GC was triggered every 
> second.
> We dumped the heap and found more than 30 thousand threads in the Terminated 
> state, all cached in the map above. 
> Everything referenced by these threads was leaked. Most of the threads are:
> 1. PostOpenDeployTasksThread, which writes the Open Region marker to the WAL.
> 2. hconnection-0x1f838e31-shared--pool, which is used for short-circuit index 
> writes (Phoenix); the WAL is written and synced in these threads.
> 3. Index writer thread (Phoenix), which is referenced by 
> RegionCoprocessorHost$RegionEnvironment, then by HRegion, and finally by 
> PostOpenDeployTasksThread.
> We should turn this map into a thread local and let the JVM GC the terminated 
> threads for us.





[jira] [Updated] (HBASE-20734) Colocate recovered edits directory with hbase.wal.dir

2018-09-27 Thread Zach York (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zach York updated HBASE-20734:
--
Release Note: Previously the recovered.edits directory was under the root 
directory. This JIRA moves the recovered.edits directory to be under the 
hbase.wal.dir if set. It also adds a check for any recovered.edits found under 
the root directory for backwards compatibility. This gives improvements when a 
faster or more local FileSystem is used for the hbase.wal.dir than the root 
dir.  (was: Previously the recovered.edits directory was under the root 
directory. This JIRA moves the recovered.edits directory to be under the 
hbase.wal.dir if set. It also adds a check for any recovered.edits found under 
the root directory for backwards compatibility.)

> Colocate recovered edits directory with hbase.wal.dir
> -
>
> Key: HBASE-20734
> URL: https://issues.apache.org/jira/browse/HBASE-20734
> Project: HBase
>  Issue Type: Improvement
>  Components: MTTR, Recovery, wal
>Reporter: Ted Yu
>Assignee: Zach York
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.8, 2.1.1
>
> Attachments: HBASE-20734.branch-1.001.patch, 
> HBASE-20734.branch-1.002.patch, HBASE-20734.branch-1.003.patch, 
> HBASE-20734.branch-1.004.patch, HBASE-20734.branch-1.005.patch, 
> HBASE-20734.master.001.patch, HBASE-20734.master.002.patch, 
> HBASE-20734.master.003.patch, HBASE-20734.master.004.patch, 
> HBASE-20734.master.005.patch, HBASE-20734.master.006.patch, 
> HBASE-20734.master.007.patch, HBASE-20734.master.008.patch, 
> HBASE-20734.master.009.patch, HBASE-20734.master.010.patch, 
> HBASE-20734.master.011.patch, HBASE-20734.master.012.patch
>
>
> During investigation of HBASE-20723, I realized that we wouldn't get the best 
> performance when hbase.wal.dir is configured to be on different (fast) media 
> than hbase rootdir w.r.t. recovered edits, since the recovered edits directory 
> is currently under rootdir.
> Such a setup may not result in fast recovery when there is a region server 
> failover.
> This issue is to find a proper (hopefully backward compatible) way of 
> colocating the recovered edits directory with hbase.wal.dir.


