[jira] [Commented] (HBASE-21213) [hbck2] bypass leaves behind state in RegionStates when assign/unassign
[ https://issues.apache.org/jira/browse/HBASE-21213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631375#comment-16631375 ]

stack commented on HBASE-21213:
---

An opinion please [~allan163]. In my testing, doing things like purging all MasterProcWALs while testing hbck2 fixup, I've manufactured a few odd cases where I want to be able to bypass a procedure even though it has children; i.e. in PE (ProcedureExecutor), I'd add something like this:

    if (!force) {
      if (procedure.hasChildren()) {
        LOG.info("{} has children, skipping bypass", procedure);
        return false;
      }
    } else {
      LOG.info("Bypassing child check!");
    }

One case is a MoveProcedure that holds a lock on a region but whose UnassignProcedure is no longer in the record, pushed out because millions of procedures have passed through the system since. In the meantime, this stuck MoveProcedure is causing the master proc WALs to start backing up. I can figure out why it got stuck and fix that later, but it illustrates a case where in hbck2 I will want to bypass a procedure even though it supposedly has unfinished children.

It is dangerous bypassing such a procedure since it could make for hanging procedures... but in some cases I need to be able to do it. I should probably add a special flag... Just wondering if you have anything you'd add here. Thanks.

> [hbck2] bypass leaves behind state in RegionStates when assign/unassign
> ---
>
> Key: HBASE-21213
> URL: https://issues.apache.org/jira/browse/HBASE-21213
> Project: HBase
> Issue Type: Bug
> Components: amv2, hbck2
> Reporter: stack
> Assignee: stack
> Priority: Major
> Fix For: 2.1.1
>
> Attachments: HBASE-21213.branch-2.1.001.patch,
> HBASE-21213.branch-2.1.002.patch, HBASE-21213.branch-2.1.003.patch,
> HBASE-21213.branch-2.1.004.patch, HBASE-21213.branch-2.1.005.patch,
> HBASE-21213.branch-2.1.006.patch, HBASE-21213.branch-2.1.007.patch,
> HBASE-21213.branch-2.1.007.patch
>
>
> This is a follow-on from HBASE-21083 which added the 'bypass' functionality.
> On bypass, there is more state to be cleared if we are to allow new Procedures
> to be scheduled.
> For example, here is a bypass:
> {code}
> 2018-09-20 05:45:43,722 INFO org.apache.hadoop.hbase.procedure2.Procedure:
> pid=100449, state=RUNNABLE:REGION_TRANSITION_DISPATCH, locked=true,
> bypass=LOG-REDACTED UnassignProcedure table=hbase:namespace,
> region=37cc206fe9c4bc1c0a46a34c5f523d16,
> server=ve1233.halxg.cloudera.com,22101,1537397961664 bypassed, returning null
> to finish it
> 2018-09-20 05:45:44,022 INFO
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=100449,
> state=SUCCESS, bypass=LOG-REDACTED UnassignProcedure table=hbase:namespace,
> region=37cc206fe9c4bc1c0a46a34c5f523d16,
> server=ve1233.halxg.cloudera.com,22101,1537397961664 in 2mins, 7.618sec
> {code}
> ... but then when I try to assign the bypassed region later, I get this:
> {code}
> 2018-09-20 05:46:31,435 WARN
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure: There is
> already another procedure running on this region this=pid=100450,
> state=RUNNABLE:REGION_TRANSITION_QUEUE, locked=true; AssignProcedure
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16
> owner=pid=100449, state=SUCCESS, bypass=LOG-REDACTED UnassignProcedure
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16,
> server=ve1233.halxg.cloudera.com,22101,1537397961664 pid=100450,
> state=RUNNABLE:REGION_TRANSITION_QUEUE, locked=true; AssignProcedure
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16; rit=OPENING,
> location=ve1233.halxg.cloudera.com,22101,1537397961664
> 2018-09-20 05:46:31,510 INFO
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Rolled back pid=100450,
> state=ROLLEDBACK,
> exception=org.apache.hadoop.hbase.procedure2.ProcedureAbortedException via
> AssignProcedure:org.apache.hadoop.hbase.procedure2.ProcedureAbortedException:
> There is already another procedure running on this region this=pid=100450,
> state=RUNNABLE:REGION_TRANSITION_QUEUE, locked=true; AssignProcedure
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16
> owner=pid=100449, state=SUCCESS, bypass=LOG-REDACTED UnassignProcedure
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16,
> server=ve1233.halxg.cloudera.com,22101,1537397961664; AssignProcedure
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16
> exec-time=473msec
> {code}
> ... which is a long-winded way of saying the Unassign Procedure still exists
> in RegionStateNodes.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
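For readers following the thread, the child check with a force flag proposed in the comment above can be sketched in isolation. This is only an illustration under stated assumptions: `Proc` and `mayBypass` are hypothetical stand-ins, not the HBase `Procedure` or `ProcedureExecutor` classes, and the real patch may look different.

```java
import java.util.ArrayList;
import java.util.List;

public class BypassCheckSketch {
  /** Hypothetical stand-in for a procedure; NOT the HBase Procedure class. */
  static final class Proc {
    final long pid;
    final List<Proc> children = new ArrayList<>();

    Proc(long pid) {
      this.pid = pid;
    }

    boolean hasChildren() {
      return !children.isEmpty();
    }
  }

  /**
   * Decides whether a bypass may proceed. Without force, a procedure that
   * still has children is refused; with force, the child check is skipped.
   */
  static boolean mayBypass(Proc procedure, boolean force) {
    if (!force) {
      if (procedure.hasChildren()) {
        System.out.println("pid=" + procedure.pid + " has children, skipping bypass");
        return false;
      }
    } else {
      System.out.println("pid=" + procedure.pid + " bypassing child check!");
    }
    return true;
  }

  public static void main(String[] args) {
    Proc move = new Proc(1);
    move.children.add(new Proc(2)); // e.g. a stuck parent with a lost child
    System.out.println(mayBypass(move, false));
    System.out.println(mayBypass(move, true));
  }
}
```

Keeping the default at "refuse when children exist" means the dangerous path has to be requested explicitly, which matches the "special flag" idea floated in the comment.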
[jira] [Updated] (HBASE-21249) Add jitter for ProcedureUtil.getBackoffTimeMs
[ https://issues.apache.org/jira/browse/HBASE-21249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang updated HBASE-21249: -- Fix Version/s: 2.0.3 2.1.1 2.2.0 3.0.0 > Add jitter for ProcedureUtil.getBackoffTimeMs > - > > Key: HBASE-21249 > URL: https://issues.apache.org/jira/browse/HBASE-21249 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Reporter: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
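The issue carries no description, so the following is only a sketch of what "jitter on a backoff time" generally means, not the HBASE-21249 patch. The base delay, cap, jitter fraction and method names are all assumptions chosen for illustration; the real `ProcedureUtil.getBackoffTimeMs` may use different values.

```java
import java.util.Random;

public class BackoffJitterSketch {
  static final long BASE_MS = 1000L;           // assumed base delay
  static final long MAX_MS = 10 * 60 * 1000L;  // assumed cap (10 minutes)
  static final double JITTER_FRACTION = 0.1;   // assumed jitter: up to 10%

  /** Plain exponential backoff: BASE_MS * 2^attempts, capped at MAX_MS. */
  static long backoffMs(int attempts) {
    // Clamp the shift so the left shift cannot overflow a long.
    int shift = Math.min(Math.max(attempts, 0), 20);
    return Math.min(BASE_MS << shift, MAX_MS);
  }

  /** Same backoff plus a random extra delay, so retries do not line up. */
  static long backoffWithJitterMs(int attempts, Random rnd) {
    long backoff = backoffMs(attempts);
    long maxJitter = (long) (backoff * JITTER_FRACTION);
    long jitter = (long) (rnd.nextDouble() * maxJitter);
    return backoff + jitter;
  }

  public static void main(String[] args) {
    Random rnd = new Random();
    for (int attempts = 0; attempts < 5; attempts++) {
      System.out.println(attempts + " -> " + backoffWithJitterMs(attempts, rnd) + " ms");
    }
  }
}
```

The point of the jitter is that many procedures failing at the same moment would otherwise all wake up and retry at the same moment as well.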
[jira] [Commented] (HBASE-21237) Use CompatRemoteProcedureResolver to dispatch open/close region requests to RS
[ https://issues.apache.org/jira/browse/HBASE-21237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631367#comment-16631367 ]

Allan Yang commented on HBASE-21237:

{quote}
Wait a minute. Have you tried hadoop qa for branch-2.1? The procedure based replication peer modification need the executeProcedures call...
{quote}
Sorry, I thought they were the same, so shall we revert this on branch-2.1?

> Use CompatRemoteProcedureResolver to dispatch open/close region requests to RS
> --
>
> Key: HBASE-21237
> URL: https://issues.apache.org/jira/browse/HBASE-21237
> Project: HBase
> Issue Type: Sub-task
> Affects Versions: 2.1.0, 2.0.2
> Reporter: Allan Yang
> Assignee: Allan Yang
> Priority: Major
> Attachments: HBASE-21237.branch-2.0.001.patch
>
>
> As discussed in HBASE-21217, in branch-2.0 and branch-2.1 we should use
> CompatRemoteProcedureResolver instead of ExecuteProceduresRemoteCall to
> dispatch region open/close requests to the RS, because
> ExecuteProceduresRemoteCall groups all the open/close operations into one
> call and executes them sequentially on the target RS. If one operation
> fails, all the operations are marked as failed. In fact, some of the
> operations (like open region) are already executing in the open region
> handler thread. But the master thinks these operations failed and reassigns
> the regions to another RS. So when the previous RS reports to the master
> that the region is online, the master kills the RS, since it has already
> assigned the region to another RS.
> For branch-2.2+, HBASE-21217 will fix this issue.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (HBASE-21247) Allow WAL Provider to be specified by configuration without explicit enum in Providers
[ https://issues.apache.org/jira/browse/HBASE-21247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631354#comment-16631354 ]

Hadoop QA commented on HBASE-21247:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue} 0m 3s{color} | {color:blue} The patch file was not named according to hbase's naming conventions. Please see https://yetus.apache.org/documentation/0.8.0/precommit-patchnames for instructions. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 41s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 47s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 17s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 16s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 13s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 41s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 43s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 14s{color} | {color:red} hbase-server: The patch generated 2 new + 25 unchanged - 0 fixed = 27 total (was 25) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 10s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 6m 36s{color} | {color:red} The patch causes 15 errors with Hadoop v3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}129m 0s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}169m 2s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HBASE-21247 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12941595/21247.v3.txt |
| Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux asf902.gq1.ygridcore.net 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh |
| git revision | master / 86cb8e4 |
| maven | version: Apache Maven 3.0.5 (r01de14724cdef164cd33c7c8c2fe155faf9602da; 2013-02-19 13:51:28+) |
| Default Java | 1.8.0_172 |
| findbugs | v3.1.0-RC3 |
| checkstyle | https://builds.apache.org/job/PreCommit-HBASE-Build/14520/artifact/patchprocess/diff-checkstyle-hbase-server.txt |
| whitespace |
[jira] [Commented] (HBASE-20952) Re-visit the WAL API
[ https://issues.apache.org/jira/browse/HBASE-20952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631345#comment-16631345 ]

stack commented on HBASE-20952:
---

bq. If you get a chance to look at the new doc that Ted and I worked on, that'd be greatly appreciated:

Thanks for the ping [~elserj]. I read the doc. IMO, it gives little to no inkling as to how hbase will be changed. I left some comments. Thanks.

> Re-visit the WAL API
>
>
> Key: HBASE-20952
> URL: https://issues.apache.org/jira/browse/HBASE-20952
> Project: HBase
> Issue Type: Improvement
> Components: wal
> Reporter: Josh Elser
> Priority: Major
> Attachments: 20952.v1.txt
>
>
> Take a step back from the current WAL implementations and think about what an
> HBase WAL API should look like. What are the primitive calls that we require
> to guarantee durability of writes with a high degree of performance?
> The API needs to take the current implementations into consideration. We
> should also have a mind for what is happening in the Ratis LogService (but
> the LogService should not dictate what HBase's WAL API looks like RATIS-272).
> Other "systems" inside of HBase that use WALs are replication and
> backup. Replication has the use-case for "tail"'ing the WAL which we
> should provide via our new API. Backup doesn't do anything fancy (IIRC). We
> should make sure all consumers are generally going to be OK with the API we
> create.
> The API may be "OK" (or OK in a part). We need to also consider other methods
> which were "bolted" on such as {{AbstractFSWAL}} and
> {{WALFileLengthProvider}}. Other corners of "WAL use" (like the
> {{WALSplitter}}) should also be looked at to use WAL-APIs only.
> We also need to make sure that adequate interface audience and stability
> annotations are chosen.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (HBASE-20716) Unsafe access cleanup
[ https://issues.apache.org/jira/browse/HBASE-20716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631332#comment-16631332 ]

Anoop Sam John commented on HBASE-20716:

Sorry for the delay. It's coming along well. I like the way you abstracted things out. Now we will have the Bytes and BBUtils classes as the front end, within which we go either via Unsafe or the costly pure-Java way.

The concern is that sometimes the static classes within, say, BBUtils can get loaded, which might cause their static fields to be initialized. You can see that class UnsafeConverter has static state of type Unsafe, which can trigger an attempt to load the Unsafe class. And in some environments this is not available! We have seen such cases reported on the mailing list/jira.

See how we have handled the best-comparator thing in the Bytes class: the loading is based on an FQCN and done via Class.forName. Also for BBUtils, all the accesses were via UnsafeAvailChecker (I don't know whether things changed later, but when these classes were introduced it was this way). Within UnsafeAvailChecker there is no direct reference to the Unsafe class at all. We need a similar approach now. Am I making it clear to you?

> Unsafe access cleanup
> -
>
> Key: HBASE-20716
> URL: https://issues.apache.org/jira/browse/HBASE-20716
> Project: HBase
> Issue Type: Sub-task
> Components: Performance
> Reporter: stack
> Assignee: Sahil Aggarwal
> Priority: Critical
> Labels: beginner
> Attachments: HBASE-20716.master.001.patch,
> HBASE-20716.master.002.patch, HBASE-20716.master.003.patch,
> HBASE-20716.master.004.patch, Screen Shot 2018-06-26 at 11.37.49 AM.png
>
>
> We have two means of getting at unsafe; UnsafeAccess and then internal to the
> Bytes class. They are effectively doing the same thing. We should have one
> avenue to Unsafe only.
> Many of our paths to Unsafe via UnsafeAccess traverse flags to check if
> access is available, if it is aligned and the order in which words are
> written on the machine. Each check costs -- especially if done millions of
> times a second -- and on occasion adds bloat in hot code paths. The unsafe
> access inside Bytes checks on startup what the machine is capable of and
> then does a static assign of the appropriate class-to-use from there on out.
> UnsafeAccess does not do this, running the checks every time. Would be good to
> have the Bytes behavior pervasive.
> The benefit of one access to Unsafe only is plain. The benefits we gain
> removing checks will be harder to measure though should be plain when you
> disassemble a hot-path; in a (very) rare case, the saved byte codes could be
> the difference between inlining or not.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
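The pattern raised in the review comment above (probe for Unsafe without a direct reference to it, load the implementation by FQCN, and assign it once to a static field) can be sketched roughly as follows. This is a minimal sketch, not HBase code: `Converter`, `PureJavaConverter`, `unsafeAvailable` and the FQCN `example.UnsafeBackedConverter` are all hypothetical names. Because that class does not exist here, the sketch always falls back to the pure-Java path.

```java
import java.lang.reflect.Field;

public class UnsafeProbeSketch {
  /** Hypothetical front-end interface; callers never see Unsafe directly. */
  interface Converter {
    int toInt(byte[] bytes, int offset);
  }

  /** Pure-Java fallback: big-endian int decode, no Unsafe involved. */
  static final class PureJavaConverter implements Converter {
    @Override
    public int toInt(byte[] b, int off) {
      return ((b[off] & 0xff) << 24) | ((b[off + 1] & 0xff) << 16)
          | ((b[off + 2] & 0xff) << 8) | (b[off + 3] & 0xff);
    }
  }

  // Hypothetical name of an Unsafe-backed implementation living in its own
  // class, so that its static state is only initialized if we load it.
  static final String UNSAFE_CONVERTER_FQCN = "example.UnsafeBackedConverter";

  /** Probes for Unsafe purely by name; this class has no static link to it. */
  static boolean unsafeAvailable() {
    try {
      Class<?> clazz = Class.forName("sun.misc.Unsafe");
      Field f = clazz.getDeclaredField("theUnsafe");
      f.setAccessible(true);
      return f.get(null) != null;
    } catch (Throwable t) {
      return false; // not present or not accessible in this environment
    }
  }

  /** Picks the implementation once; callers then use BEST with no checks. */
  static Converter load() {
    if (unsafeAvailable()) {
      try {
        return Class.forName(UNSAFE_CONVERTER_FQCN)
            .asSubclass(Converter.class)
            .getDeclaredConstructor()
            .newInstance();
      } catch (ReflectiveOperationException | ClassCastException | LinkageError e) {
        // fall through to the pure-Java implementation
      }
    }
    return new PureJavaConverter();
  }

  /** Static assignment done once at class initialization. */
  static final Converter BEST = load();

  public static void main(String[] args) {
    System.out.println(BEST.getClass().getSimpleName());
    System.out.println(BEST.toInt(new byte[] {0, 0, 1, 2}, 0));
  }
}
```

The design choice is that availability checks run once, at class initialization, so the hot path is a plain interface call on a static final field.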
[jira] [Commented] (HBASE-21186) Document hbase.regionserver.executor.openregion.threads in MTTR section
[ https://issues.apache.org/jira/browse/HBASE-21186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631331#comment-16631331 ] Sahil Aggarwal commented on HBASE-21186: Done. > Document hbase.regionserver.executor.openregion.threads in MTTR section > --- > > Key: HBASE-21186 > URL: https://issues.apache.org/jira/browse/HBASE-21186 > Project: HBase > Issue Type: Improvement > Components: documentation >Reporter: Sahil Aggarwal >Assignee: Sahil Aggarwal >Priority: Minor > Attachments: HBASE-21186.master.001.patch, > HBASE-21186.master.002.patch > > > hbase.regionserver.executor.openregion.threads helps in improving MTTR by > increasing assign rpc processing rate at RS from HMaster but is not > documented. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21186) Document hbase.regionserver.executor.openregion.threads in MTTR section
[ https://issues.apache.org/jira/browse/HBASE-21186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Aggarwal updated HBASE-21186: --- Attachment: HBASE-21186.master.002.patch > Document hbase.regionserver.executor.openregion.threads in MTTR section > --- > > Key: HBASE-21186 > URL: https://issues.apache.org/jira/browse/HBASE-21186 > Project: HBase > Issue Type: Improvement > Components: documentation >Reporter: Sahil Aggarwal >Assignee: Sahil Aggarwal >Priority: Minor > Attachments: HBASE-21186.master.001.patch, > HBASE-21186.master.002.patch > > > hbase.regionserver.executor.openregion.threads helps in improving MTTR by > increasing assign rpc processing rate at RS from HMaster but is not > documented. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21213) [hbck2] bypass leaves behind state in RegionStates when assign/unassign
[ https://issues.apache.org/jira/browse/HBASE-21213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-21213:
--
    Attachment: HBASE-21213.branch-2.1.007.patch

> [hbck2] bypass leaves behind state in RegionStates when assign/unassign
> ---
>
> Key: HBASE-21213
> URL: https://issues.apache.org/jira/browse/HBASE-21213
> Project: HBase
> Issue Type: Bug
> Components: amv2, hbck2
> Reporter: stack
> Assignee: stack
> Priority: Major
> Fix For: 2.1.1
>
> Attachments: HBASE-21213.branch-2.1.001.patch,
> HBASE-21213.branch-2.1.002.patch, HBASE-21213.branch-2.1.003.patch,
> HBASE-21213.branch-2.1.004.patch, HBASE-21213.branch-2.1.005.patch,
> HBASE-21213.branch-2.1.006.patch, HBASE-21213.branch-2.1.007.patch,
> HBASE-21213.branch-2.1.007.patch

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Updated] (HBASE-21242) [amv2] Miscellaneous minor log and assign procedure create improvements
[ https://issues.apache.org/jira/browse/HBASE-21242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-21242: -- Attachment: HBASE-21242.branch-2.1.001.patch > [amv2] Miscellaneous minor log and assign procedure create improvements > --- > > Key: HBASE-21242 > URL: https://issues.apache.org/jira/browse/HBASE-21242 > Project: HBase > Issue Type: Bug > Components: amv2, Operability >Reporter: stack >Assignee: stack >Priority: Minor > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21242.branch-2.1.001.patch, > HBASE-21242.branch-2.1.001.patch > > > Some minor fixups: > {code} > For RIT Duration, do better than print ms/seconds. Remove redundant UI > column dedicated to duration when we log it in the status field too. > Make bypass log at INFO level -- when DEBUG we can miss important > fixup detail like why we failed. > Make it so on complete of subprocedure, we note count of outstanding > siblings so we have a clue how much further the parent has to go before > it is done (Helpful when hundreds of servers doing SCP). > Have the SCP run the AP preflight check before creating an AP; saves > creation of hundreds of thousands of APs during fixup of this big cluster > of mine. > Don't log tablename three times when reporting remote call failed. > If lock is held already, note who has it. Also log after we get lock > or if we have to wait rather than log on entrance though we may > later have to wait (or we may have just picked up the lock). > {code} > Posting patch in a sec but let me try it on cluster too. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21186) Document hbase.regionserver.executor.openregion.threads in MTTR section
[ https://issues.apache.org/jira/browse/HBASE-21186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631326#comment-16631326 ] Ted Yu commented on HBASE-21186: bq. where single region is holding I think you meant single region server. Please check the grammar of your addition. Thanks > Document hbase.regionserver.executor.openregion.threads in MTTR section > --- > > Key: HBASE-21186 > URL: https://issues.apache.org/jira/browse/HBASE-21186 > Project: HBase > Issue Type: Improvement > Components: documentation >Reporter: Sahil Aggarwal >Assignee: Sahil Aggarwal >Priority: Minor > Attachments: HBASE-21186.master.001.patch > > > hbase.regionserver.executor.openregion.threads helps in improving MTTR by > increasing assign rpc processing rate at RS from HMaster but is not > documented. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20716) Unsafe access cleanup
[ https://issues.apache.org/jira/browse/HBASE-20716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631312#comment-16631312 ] Sahil Aggarwal commented on HBASE-20716: [~stack] Can you please have a look at the patch? > Unsafe access cleanup > - > > Key: HBASE-20716 > URL: https://issues.apache.org/jira/browse/HBASE-20716 > Project: HBase > Issue Type: Sub-task > Components: Performance >Reporter: stack >Assignee: Sahil Aggarwal >Priority: Critical > Labels: beginner > Attachments: HBASE-20716.master.001.patch, > HBASE-20716.master.002.patch, HBASE-20716.master.003.patch, > HBASE-20716.master.004.patch, Screen Shot 2018-06-26 at 11.37.49 AM.png > > > We have two means of getting at unsafe; UnsafeAccess and then internal to the > Bytes class. They are effectively doing the same thing. We should have one > avenue to Unsafe only. > Many of our paths to Unsafe via UnsafeAccess traverse flags to check if > access is available, if it is aligned and the order in which words are > written on the machine. Each check costs -- especially if done millions of > times a second -- and on occasion adds bloat in hot code paths. The unsafe > access inside Bytes checks on startup what the machine is capable off and > then does a static assign of the appropriate class-to-use from there on out. > UnsafeAccess does not do this running the checks everytime. Would be good to > have the Bytes behavior pervasive. > The benefit of one access to Unsafe only is plain. The benefits we gain > removing checks will be harder to measure though should be plain when you > disassemble a hot-path; in a (very) rare case, the saved byte codes could be > the difference between inlining or not. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21249) Add jitter for ProcedureUtil.getBackoffTimeMs
Duo Zhang created HBASE-21249: - Summary: Add jitter for ProcedureUtil.getBackoffTimeMs Key: HBASE-21249 URL: https://issues.apache.org/jira/browse/HBASE-21249 Project: HBase Issue Type: Sub-task Components: proc-v2 Reporter: Duo Zhang -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21244) Skip persistence when retrying for assignment related procedures
[ https://issues.apache.org/jira/browse/HBASE-21244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang updated HBASE-21244: -- Attachment: HBASE-21244.patch > Skip persistence when retrying for assignment related procedures > > > Key: HBASE-21244 > URL: https://issues.apache.org/jira/browse/HBASE-21244 > Project: HBase > Issue Type: Sub-task > Components: amv2, Performance, proc-v2 >Reporter: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0 > > Attachments: HBASE-21244.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21244) Skip persistence when retrying for assignment related procedures
[ https://issues.apache.org/jira/browse/HBASE-21244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang updated HBASE-21244: -- Assignee: Duo Zhang Status: Patch Available (was: Open) > Skip persistence when retrying for assignment related procedures > > > Key: HBASE-21244 > URL: https://issues.apache.org/jira/browse/HBASE-21244 > Project: HBase > Issue Type: Sub-task > Components: amv2, Performance, proc-v2 >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0 > > Attachments: HBASE-21244.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21233) Allow the procedure implementation to skip persistence of the state after a execution
[ https://issues.apache.org/jira/browse/HBASE-21233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Duo Zhang updated HBASE-21233:
--
    Resolution: Fixed
    Hadoop Flags: Reviewed
    Status: Resolved (was: Patch Available)

Pushed to branch-2.0+.

> Allow the procedure implementation to skip persistence of the state after a
> execution
> -
>
> Key: HBASE-21233
> URL: https://issues.apache.org/jira/browse/HBASE-21233
> Project: HBase
> Issue Type: Sub-task
> Components: Performance, proc-v2
> Reporter: Duo Zhang
> Assignee: Duo Zhang
> Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21233.patch, HBASE-21233.patch
>
>
> As discussed with [~stack] and [~allan163] on HBASE-21035, when retrying we
> do not need to persist the procedure state every time, as the retry timeout
> is not critical. It is OK if we lose this information and start
> from 0 after restarting.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
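The idea in the description above (a retry does not need a store write, because the retry counter can safely restart from 0) can be illustrated with a toy model. Everything here is hypothetical: `Store`, `RetryingProcedure`, the `skipPersistence` flag and `runOnce` are made-up names for the sketch, and the actual HBASE-21233 patch may expose this differently.

```java
public class SkipPersistenceSketch {
  /** Hypothetical stand-in for the procedure store; only counts writes. */
  static final class Store {
    int writes;

    void update(String state) {
      writes++;
    }
  }

  /** Hypothetical procedure whose retry counter lives in memory only. */
  static final class RetryingProcedure {
    int attempts;            // not persisted; starts from 0 after a restart
    boolean skipPersistence; // set by the procedure for the current execution
    String state = "RUNNING";

    void execute(boolean remoteCallSucceeds) {
      skipPersistence = false;
      if (remoteCallSucceeds) {
        state = "DONE";      // a real state change, worth persisting
      } else {
        attempts++;          // only the retry counter changed
        skipPersistence = true;
      }
    }
  }

  /** One executor step: run the procedure, then persist unless told to skip. */
  static void runOnce(RetryingProcedure proc, Store store, boolean succeeds) {
    proc.execute(succeeds);
    if (!proc.skipPersistence) {
      store.update(proc.state);
    }
  }

  public static void main(String[] args) {
    Store store = new Store();
    RetryingProcedure proc = new RetryingProcedure();
    boolean[] outcomes = {false, false, false, true};
    for (boolean ok : outcomes) {
      runOnce(proc, store, ok);
    }
    System.out.println("attempts=" + proc.attempts + " writes=" + store.writes);
  }
}
```

In this model three failed attempts followed by one success cost a single store write instead of four.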
[jira] [Commented] (HBASE-21237) Use CompatRemoteProcedureResolver to dispatch open/close region requests to RS
[ https://issues.apache.org/jira/browse/HBASE-21237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631297#comment-16631297 ] Duo Zhang commented on HBASE-21237: --- So here I think we could just remove the different implementation of RemoteDispatcher... Just have one implementation, for open and close, we call openRegion and closeRegion, and for other remote procedures, we call executeProcedures. > Use CompatRemoteProcedureResolver to dispatch open/close region requests to RS > -- > > Key: HBASE-21237 > URL: https://issues.apache.org/jira/browse/HBASE-21237 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.1.0, 2.0.2 >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Major > Attachments: HBASE-21237.branch-2.0.001.patch > > > As discussed in HBASE-21217, in branch-2.0 and branch-2.1, we should use > CompatRemoteProcedureResolver instead of ExecuteProceduresRemoteCall to > dispatch region open/close requests to RS. Since ExecuteProceduresRemoteCall > will group all the open/close operations in one call and execute them > sequentially on the target RS. If one operation fails, all the operation will > be marked as failure. Actually, some of the operations(like open region) is > already executing in the open region handler thread. But master thinks these > operations fails and reassign the regions to another RS. So when the previous > RS report to the master that the region is online, master will kill the RS > since it already assign the region to another RS. > For branch-2.2+, HBASE-21217 will fix this issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
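The single-implementation idea from the comment above (open and close go out as openRegion/closeRegion, everything else goes through executeProcedures) might look roughly like this. It is a sketch under assumptions: `OpType`, `RemoteOp` and `dispatch` are hypothetical names, the RPCs are represented as plain strings, and sending one call per open/close operation is an illustrative choice, not a claim about how HBase batches them.

```java
import java.util.ArrayList;
import java.util.List;

public class SingleDispatcherSketch {
  enum OpType { OPEN_REGION, CLOSE_REGION, OTHER }

  /** Hypothetical remote operation queued for one region server. */
  static final class RemoteOp {
    final OpType type;
    final String name;

    RemoteOp(OpType type, String name) {
      this.type = type;
      this.name = name;
    }
  }

  /**
   * Routes queued operations by type and returns the calls that would be
   * sent, as strings. Open/close are sent separately so one failure cannot
   * mark unrelated operations as failed; the rest are grouped into one
   * executeProcedures call.
   */
  static List<String> dispatch(List<RemoteOp> ops) {
    List<String> calls = new ArrayList<>();
    List<String> batched = new ArrayList<>();
    for (RemoteOp op : ops) {
      switch (op.type) {
        case OPEN_REGION:
          calls.add("openRegion(" + op.name + ")");
          break;
        case CLOSE_REGION:
          calls.add("closeRegion(" + op.name + ")");
          break;
        default:
          batched.add(op.name);
      }
    }
    if (!batched.isEmpty()) {
      calls.add("executeProcedures(" + String.join(",", batched) + ")");
    }
    return calls;
  }

  public static void main(String[] args) {
    List<RemoteOp> ops = new ArrayList<>();
    ops.add(new RemoteOp(OpType.OPEN_REGION, "r1"));
    ops.add(new RemoteOp(OpType.OTHER, "p1"));
    ops.add(new RemoteOp(OpType.CLOSE_REGION, "r2"));
    System.out.println(dispatch(ops));
  }
}
```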
[jira] [Commented] (HBASE-21237) Use CompatRemoteProcedureResolver to dispatch open/close region requests to RS
[ https://issues.apache.org/jira/browse/HBASE-21237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631296#comment-16631296 ] Duo Zhang commented on HBASE-21237: --- Wait a minute. Have you tried hadoop qa for branch-2.1? The procedure based replication peer modification need the executeProcedures call... > Use CompatRemoteProcedureResolver to dispatch open/close region requests to RS > -- > > Key: HBASE-21237 > URL: https://issues.apache.org/jira/browse/HBASE-21237 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.1.0, 2.0.2 >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Major > Attachments: HBASE-21237.branch-2.0.001.patch > > > As discussed in HBASE-21217, in branch-2.0 and branch-2.1, we should use > CompatRemoteProcedureResolver instead of ExecuteProceduresRemoteCall to > dispatch region open/close requests to RS. Since ExecuteProceduresRemoteCall > will group all the open/close operations in one call and execute them > sequentially on the target RS. If one operation fails, all the operation will > be marked as failure. Actually, some of the operations(like open region) is > already executing in the open region handler thread. But master thinks these > operations fails and reassign the regions to another RS. So when the previous > RS report to the master that the region is online, master will kill the RS > since it already assign the region to another RS. > For branch-2.2+, HBASE-21217 will fix this issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21233) Allow the procedure implementation to skip persistence of the state after a execution
[ https://issues.apache.org/jira/browse/HBASE-21233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang updated HBASE-21233: -- Fix Version/s: 2.0.3 > Allow the procedure implementation to skip persistence of the state after a > execution > - > > Key: HBASE-21233 > URL: https://issues.apache.org/jira/browse/HBASE-21233 > Project: HBase > Issue Type: Sub-task > Components: Performance, proc-v2 >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21233.patch, HBASE-21233.patch > > > Discussed with [~stack] and [~allan163] on HBASE-21035, that when retrying we > do not need to persist the procedure state every time, as the retry timeout > is not critical. It is OK if we lose this information and start > from 0 after restarting. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
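A minimal sketch of the idea in the description above, assuming a hypothetical executor loop: SkipPersistenceSketch, RetryingStep and the skipPersistence flag are illustrative names, not the actual proc-v2 API. The step fails twice before succeeding, and the loop writes to the store only when the step has not asked to skip persistence.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; class and method names are illustrative, not the HBase procedure API.
final class SkipPersistenceSketch {
  /** A step that fails twice before succeeding, keeping its retry counter in memory only. */
  static final class RetryingStep {
    int attempts = 0;
    boolean done = false;
    boolean skipPersistence = false;

    void execute() {
      attempts++;
      if (attempts < 3) {
        // Only the retry counter changed; losing it on restart is harmless.
        skipPersistence = true;
      } else {
        done = true;
        skipPersistence = false;
      }
    }
  }

  /** Runs the step to completion and returns a log of the store writes performed. */
  static List<String> run(RetryingStep step) {
    List<String> storeWrites = new ArrayList<>();
    while (!step.done) {
      step.execute();
      if (!step.skipPersistence) {
        storeWrites.add("persist after attempt " + step.attempts);
      }
    }
    return storeWrites;
  }
}
```

In this sketch, three executions produce a single store write; after a restart the retry counter would simply start from 0 again, as the description says is acceptable.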
[jira] [Commented] (HBASE-21233) Allow the procedure implementation to skip persistence of the state after a execution
[ https://issues.apache.org/jira/browse/HBASE-21233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631285#comment-16631285 ] Duo Zhang commented on HBASE-21233: --- Let me commit. Thanks [~allan163] for reviewing. > Allow the procedure implementation to skip persistence of the state after a > execution > - > > Key: HBASE-21233 > URL: https://issues.apache.org/jira/browse/HBASE-21233 > Project: HBase > Issue Type: Sub-task > Components: Performance, proc-v2 >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1 > > Attachments: HBASE-21233.patch, HBASE-21233.patch > > > Discussed with [~stack] and [~allan163] on HBASE-21035, that when retrying we > do not need to persist the procedure state every time, as the retry timeout > is not critical. It is OK if we lose this information and start > from 0 after restarting. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21233) Allow the procedure implementation to skip persistence of the state after a execution
[ https://issues.apache.org/jira/browse/HBASE-21233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631284#comment-16631284 ] Hadoop QA commented on HBASE-21233: --- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 58s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 24s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 36s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 24s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 11m 20s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 59s{color} | {color:green} hbase-procedure in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 14s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 40m 39s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b | | JIRA Issue | HBASE-21233 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12941531/HBASE-21233.patch | | Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux 6f4c7bad6b75 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | master / 86cb8e48ad | | maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC3 | | Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/14519/testReport/ | | Max. process+thread count | 279 (vs. ulimit of 1) | | modules | C: hbase-procedure U: hbase-procedure | | Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/14519/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > Allow the procedure
[jira] [Commented] (HBASE-21228) Memory leak since AbstractFSWAL caches Thread object and never clean later
[ https://issues.apache.org/jira/browse/HBASE-21228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631251#comment-16631251 ] Hudson commented on HBASE-21228: SUCCESS: Integrated in Jenkins build HBase-1.2-IT #1167 (See [https://builds.apache.org/job/HBase-1.2-IT/1167/]) HBASE-21228 Memory leak since AbstractFSWAL caches Thread object and (apurtell: rev ff29edc856c29fb6691f9c1798c344733383c7ee) * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java > Memory leak since AbstractFSWAL caches Thread object and never clean later > -- > > Key: HBASE-21228 > URL: https://issues.apache.org/jira/browse/HBASE-21228 > Project: HBase > Issue Type: Bug >Affects Versions: 2.1.0, 2.0.2, 1.4.7 >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Major > Fix For: 3.0.0, 1.5.0, 1.3.3, 1.2.8, 1.4.8, 2.1.1, 2.0.3 > > Attachments: HBASE-21228.branch-2.0.001.patch, > HBASE-21228.branch-2.0.002.patch, HBASE-21228.branch-2.0.003.patch > > > In AbstractFSWAL (FSHLog in branch-1), we have a map that caches threads and > SyncFutures. > {code:java} > /** >* Map of {@link SyncFuture}s keyed by Handler objects. Used so we reuse > SyncFutures. >* >* TODO: Reuse FSWALEntry's rather than create them anew each time as we do > SyncFutures here. >* >* TODO: Add a FSWalEntry and SyncFuture as thread locals on handlers > rather than have them get >* them from this Map? >*/ > private final ConcurrentMap<Thread, SyncFuture> syncFuturesByHandler; > {code} > A colleague of mine found a memory leak caused by this map. > Every thread that writes to the WAL is cached in this map, and nothing removes > the threads from the map even after they are dead. > In one of our customer's clusters, we noticed that even though there were no > requests, the heap of the RS was almost full and CMS GC was triggered every > second. > We dumped the heap and found more than 30 thousand > threads in the Terminated state, all of them cached in the map above. 
> Everything referenced by these threads was leaked. Most of the threads are: > 1. PostOpenDeployTasksThread, which writes the Open Region mark to the WAL > 2. hconnection-0x1f838e31-shared--pool, which is used to write index short > circuit (Phoenix), and the WAL is written and synced in these threads. > 3. Index writer thread (Phoenix), which is referenced by > RegionCoprocessorHost$RegionEnvironment, then by HRegion, and finally > by PostOpenDeployTasksThread. > We should turn this map into a thread local one and let the JVM GC the terminated > threads for us. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
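The difference between the two caching strategies described above can be sketched as follows. This is a simplified illustration, not the committed patch: SyncFutureCacheSketch and SyncFutureLike are hypothetical names standing in for the WAL class and SyncFuture. A map keyed by Thread keeps every terminated writer reachable, whereas a ThreadLocal stores the value in the owning thread's own table, so it can be collected once the thread is gone.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; SyncFutureLike stands in for the real SyncFuture class.
final class SyncFutureCacheSketch {
  static final class SyncFutureLike { }

  // Leaky variant: entries keyed by Thread stay reachable after the thread terminates.
  static final Map<Thread, SyncFutureLike> byThread = new ConcurrentHashMap<>();

  // Thread-local variant: the value lives in the owning thread's own table.
  static final ThreadLocal<SyncFutureLike> cached =
      ThreadLocal.withInitial(SyncFutureLike::new);

  /** Runs one writer thread to completion, touching both caches, and returns the map size. */
  static int runWriterAndCountLeaked() {
    Thread writer = new Thread(() -> {
      byThread.computeIfAbsent(Thread.currentThread(), t -> new SyncFutureLike());
      cached.get(); // reused on later calls from this same thread
    });
    writer.start();
    try {
      writer.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for writer", e);
    }
    return byThread.size(); // the terminated thread is still a key here
  }
}
```

After the writer thread has finished, the map still holds an entry for it, which is the leak in miniature; nothing in the thread-local variant refers to the dead thread.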
[jira] [Commented] (HBASE-21207) Add client side sorting functionality in master web UI for table and region server details.
[ https://issues.apache.org/jira/browse/HBASE-21207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631250#comment-16631250 ] Andrew Purtell commented on HBASE-21207: I applied the v1 patch to master, launched the newly built version in-tree. Created a table. Browsed to the table details page. Visually the result looks like the attached PNG files. Clicking on the table header re-sorts the view. master patch does not apply to branch-2. The problem is table.jsp. It's a nontrivial reject. [~archana.katiyar] if you could provide a patch for branch-2 that would be most appreciated. Going to need to commit this there before proceeding down the line to branch-1 etc. > Add client side sorting functionality in master web UI for table and region > server details. > --- > > Key: HBASE-21207 > URL: https://issues.apache.org/jira/browse/HBASE-21207 > Project: HBase > Issue Type: Improvement > Components: master, monitoring, UI, Usability >Reporter: Archana Katiyar >Assignee: Archana Katiyar >Priority: Minor > Attachments: 14926e82-b929-11e8-8bdd-4ce4621f1118.png, > 2724afd8-b929-11e8-8171-8b5b2ba3084e.png, HBASE-21207-branch-1.patch, > HBASE-21207-branch-1.v1.patch, HBASE-21207.patch, HBASE-21207.patch, > HBASE-21207.v1.patch, edc5c812-b928-11e8-87e2-ce6396629bbc.png > > > In Master UI, we can see region server details like requests per seconds and > number of regions etc. Similarly, for tables also we can see online regions , > offline regions. > It will help ops people in determining hot spotting if we can provide sort > functionality in the UI. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21248) Implement exponential backoff when retrying for ModifyPeerProcedure
Duo Zhang created HBASE-21248: - Summary: Implement exponential backoff when retrying for ModifyPeerProcedure Key: HBASE-21248 URL: https://issues.apache.org/jira/browse/HBASE-21248 Project: HBase Issue Type: Bug Components: proc-v2, Replication Reporter: Duo Zhang Assignee: Duo Zhang Fix For: 3.0.0, 2.2.0, 2.1.1 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
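The issue above only names the technique, so the following is a generic sketch of capped exponential backoff rather than anything taken from ModifyPeerProcedure. The class name BackoffSketch and the base and maximum delays are assumptions chosen for illustration, not HBase defaults.

```java
// Hypothetical sketch; the base delay and cap are assumed values, not HBase defaults.
final class BackoffSketch {
  static final long BASE_MILLIS = 1000L;
  static final long MAX_MILLIS = 60_000L;

  /** Delay before retry number {@code attempt} (0-based): base * 2^attempt, capped. */
  static long delayMillis(int attempt) {
    if (attempt < 0) {
      throw new IllegalArgumentException("attempt must be >= 0");
    }
    // Clamp the shift so the multiplication cannot overflow a long.
    int shift = Math.min(attempt, 30);
    return Math.min(MAX_MILLIS, BASE_MILLIS << shift);
  }
}
```

With these assumed constants the delays grow as 1 s, 2 s, 4 s and so on until they reach the 60 s cap.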
[jira] [Commented] (HBASE-21228) Memory leak since AbstractFSWAL caches Thread object and never clean later
[ https://issues.apache.org/jira/browse/HBASE-21228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631249#comment-16631249 ] Hudson commented on HBASE-21228: SUCCESS: Integrated in Jenkins build HBase-1.3-IT #485 (See [https://builds.apache.org/job/HBase-1.3-IT/485/]) HBASE-21228 Memory leak since AbstractFSWAL caches Thread object and (apurtell: rev 9321f7d86cb1eee4508d96f261189b90b85e714c) * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java > Memory leak since AbstractFSWAL caches Thread object and never clean later > -- > > Key: HBASE-21228 > URL: https://issues.apache.org/jira/browse/HBASE-21228 > Project: HBase > Issue Type: Bug >Affects Versions: 2.1.0, 2.0.2, 1.4.7 >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Major > Fix For: 3.0.0, 1.5.0, 1.3.3, 1.2.8, 1.4.8, 2.1.1, 2.0.3 > > Attachments: HBASE-21228.branch-2.0.001.patch, > HBASE-21228.branch-2.0.002.patch, HBASE-21228.branch-2.0.003.patch > > > In AbstractFSWAL (FSHLog in branch-1), we have a map that caches threads and > SyncFutures. > {code:java} > /** >* Map of {@link SyncFuture}s keyed by Handler objects. Used so we reuse > SyncFutures. >* >* TODO: Reuse FSWALEntry's rather than create them anew each time as we do > SyncFutures here. >* >* TODO: Add a FSWalEntry and SyncFuture as thread locals on handlers > rather than have them get >* them from this Map? >*/ > private final ConcurrentMap<Thread, SyncFuture> syncFuturesByHandler; > {code} > A colleague of mine found a memory leak caused by this map. > Every thread that writes to the WAL is cached in this map, and nothing removes > the threads from the map even after they are dead. > In one of our customer's clusters, we noticed that even though there were no > requests, the heap of the RS was almost full and CMS GC was triggered every > second. > We dumped the heap and found more than 30 thousand > threads in the Terminated state, all of them cached in the map above. 
> Everything referenced by these threads was leaked. Most of the threads are: > 1. PostOpenDeployTasksThread, which writes the Open Region mark to the WAL > 2. hconnection-0x1f838e31-shared--pool, which is used to write index short > circuit (Phoenix), and the WAL is written and synced in these threads. > 3. Index writer thread (Phoenix), which is referenced by > RegionCoprocessorHost$RegionEnvironment, then by HRegion, and finally > by PostOpenDeployTasksThread. > We should turn this map into a thread local one and let the JVM GC the terminated > threads for us. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (HBASE-21200) Memstore flush doesn't finish because of seekToPreviousRow() in memstore scanner.
[ https://issues.apache.org/jira/browse/HBASE-21200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630724#comment-16630724 ] Toshihiro Suzuki edited comment on HBASE-21200 at 9/28/18 1:24 AM: --- It seems that an issue similar to HBASE-15871 occurs with the following steps. 1) Create a reversed store scanner. 2) Put a lot of cells that have a sequenceID greater than the readPt of the reverse scanner into the memstore. 3) Call next() on the reverse scanner. At this point, a lot of cells in the memstore have a sequenceID greater than the readPt of the reverse scanner because of 2). This condition causes seekToPreviousRow() to repeatedly search cells that have already been searched. It's described in the following image in HBASE-15871: https://issues.apache.org/jira/secure/attachment/12805207/memstore_backwardSeek%28%29.PNG 4) Flush the memstore; the flush waits until the processing in 3) has finished before it updates the store files in the same HStore. I'm attaching a patch to reproduce this issue. was (Author: brfrn169): It seems that an issue similar to HBASE-15871 occurs with the following steps. 1) Create a reversed store scanner. 2) Put a lot of cells that have a sequenceID greater than the readPt of the reverse scanner into the memstore. 3) Call next() on the reverse scanner. At this point, a lot of cells in the memstore have a sequenceID greater than the readPt of the reverse scanner because of 2). This condition causes seekToPreviousRow() to repeatedly search cells that have already been searched. 4) Flush the memstore; the flush waits until the processing in 3) has finished before it updates the store files in the same HStore. I'm attaching a patch to reproduce this issue. > Memstore flush doesn't finish because of seekToPreviousRow() in memstore > scanner. 
> - > > Key: HBASE-21200 > URL: https://issues.apache.org/jira/browse/HBASE-21200 > Project: HBase > Issue Type: Bug > Components: Scanners >Reporter: dongjin2193.jeon >Priority: Major > Attachments: HBASE-21200-UT.patch, RegionServerJstack.log > > > The issue of delaying memstore flush still occurs after backport hbase-15871. > Reverse scan takes a long time to seek previous row in the memstore full of > deleted cells. > > jstack : > "MemStoreFlusher.0" #114 prio=5 os_prio=0 tid=0x7fa3d0729000 nid=0x486a > waiting on condition [0x7fa3b9b6b000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0xa465fe60> (a > java.util.concurrent.locks.ReentrantLock$NonfairSync) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199) > at > java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209) > at > java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285) > at > org.apache.hadoop.hbase.regionserver.*StoreScanner.updateReaders(StoreScanner.java:695)* > at > org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1127) > at > org.apache.hadoop.hbase.regionserver.HStore.updateStorefiles(HStore.java:1106) > at > org.apache.hadoop.hbase.regionserver.HStore.access$600(HStore.java:130) > at > org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.commit(HStore.java:2455) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2519) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2256) > at > 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2218) > at > org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2110) > at > org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2036) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:501) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:471) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:75) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:259) > at java.lang.Thread.run(Thread.java:748) > > "RpcServer.FifoWFPBQ.default.handler=27,queue=0,port=16020" #65 daemon prio=5 > os_prio=0
[jira] [Commented] (HBASE-21228) Memory leak since AbstractFSWAL caches Thread object and never clean later
[ https://issues.apache.org/jira/browse/HBASE-21228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631240#comment-16631240 ] Andrew Purtell commented on HBASE-21228: This applies to branch-1.2, branch-1.3, and branch-1.4 and would seem to be an issue in the respective releasing code lines. I have updated fix versions and pushed the change to those branches too. Ran WAL unit tests beforehand. > Memory leak since AbstractFSWAL caches Thread object and never clean later > -- > > Key: HBASE-21228 > URL: https://issues.apache.org/jira/browse/HBASE-21228 > Project: HBase > Issue Type: Bug >Affects Versions: 2.1.0, 2.0.2, 1.4.7 >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Major > Fix For: 3.0.0, 1.5.0, 1.3.3, 1.2.8, 1.4.8, 2.1.1, 2.0.3 > > Attachments: HBASE-21228.branch-2.0.001.patch, > HBASE-21228.branch-2.0.002.patch, HBASE-21228.branch-2.0.003.patch > > > In AbstractFSWAL (FSHLog in branch-1), we have a map that caches threads and > SyncFutures. > {code:java} > /** >* Map of {@link SyncFuture}s keyed by Handler objects. Used so we reuse > SyncFutures. >* >* TODO: Reuse FSWALEntry's rather than create them anew each time as we do > SyncFutures here. >* >* TODO: Add a FSWalEntry and SyncFuture as thread locals on handlers > rather than have them get >* them from this Map? >*/ > private final ConcurrentMap<Thread, SyncFuture> syncFuturesByHandler; > {code} > A colleague of mine found a memory leak caused by this map. > Every thread that writes to the WAL is cached in this map, and nothing removes > the threads from the map even after they are dead. > In one of our customer's clusters, we noticed that even though there were no > requests, the heap of the RS was almost full and CMS GC was triggered every > second. > We dumped the heap and found more than 30 thousand > threads in the Terminated state, all of them cached in the map above. > Everything referenced by these threads was leaked. 
Most of the threads are: > 1. PostOpenDeployTasksThread, which writes the Open Region mark to the WAL > 2. hconnection-0x1f838e31-shared--pool, which is used to write index short > circuit (Phoenix), and the WAL is written and synced in these threads. > 3. Index writer thread (Phoenix), which is referenced by > RegionCoprocessorHost$RegionEnvironment, then by HRegion, and finally > by PostOpenDeployTasksThread. > We should turn this map into a thread local one and let the JVM GC the terminated > threads for us. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21200) Memstore flush doesn't finish because of seekToPreviousRow() in memstore scanner.
[ https://issues.apache.org/jira/browse/HBASE-21200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631237#comment-16631237 ] Toshihiro Suzuki commented on HBASE-21200: -- No error occurs in the test. Just the reversed scan is very slow. I will need to improve the test, but what I wanted to show in the test is that the similar issue to HBASE-15871 is reproduced by the above steps. And that causes very slow reversed scan. > Memstore flush doesn't finish because of seekToPreviousRow() in memstore > scanner. > - > > Key: HBASE-21200 > URL: https://issues.apache.org/jira/browse/HBASE-21200 > Project: HBase > Issue Type: Bug > Components: Scanners >Reporter: dongjin2193.jeon >Priority: Major > Attachments: HBASE-21200-UT.patch, RegionServerJstack.log > > > The issue of delaying memstore flush still occurs after backport hbase-15871. > Reverse scan takes a long time to seek previous row in the memstore full of > deleted cells. > > jstack : > "MemStoreFlusher.0" #114 prio=5 os_prio=0 tid=0x7fa3d0729000 nid=0x486a > waiting on condition [0x7fa3b9b6b000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0xa465fe60> (a > java.util.concurrent.locks.ReentrantLock$NonfairSync) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199) > at > java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209) > at > java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285) > at > org.apache.hadoop.hbase.regionserver.*StoreScanner.updateReaders(StoreScanner.java:695)* > at > 
org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1127) > at > org.apache.hadoop.hbase.regionserver.HStore.updateStorefiles(HStore.java:1106) > at > org.apache.hadoop.hbase.regionserver.HStore.access$600(HStore.java:130) > at > org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.commit(HStore.java:2455) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2519) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2256) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2218) > at > org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2110) > at > org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2036) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:501) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:471) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:75) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:259) > at java.lang.Thread.run(Thread.java:748) > > "RpcServer.FifoWFPBQ.default.handler=27,queue=0,port=16020" #65 daemon prio=5 > os_prio=0 tid=0x7fa3e628 nid=0x4801 runnable [0x7fa3bd29a000] > java.lang.Thread.State: RUNNABLE > at > org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner.getNext(DefaultMemStore.java:780) > at > org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner.seekInSubLists(DefaultMemStore.java:826) > - locked <0xb45aa5b8> (a > org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner) > at > org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner.seek(DefaultMemStore.java:818) > - locked <0xb45aa5b8> (a > org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner) > at > 
org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner.seekToPreviousRow(DefaultMemStore.java:1000) > - locked <0xb45aa5b8> (a > org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner) > at > org.apache.hadoop.hbase.regionserver.ReversedKeyValueHeap.next(ReversedKeyValueHeap.java:136) > at > org.apache.hadoop.hbase.regionserver.*StoreScanner.next(StoreScanner.java:629)* > at > org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147) > at >
[jira] [Commented] (HBASE-20766) Verify Replication Tool Has Typo "remove cluster"
[ https://issues.apache.org/jira/browse/HBASE-20766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631231#comment-16631231 ] Hudson commented on HBASE-20766: SUCCESS: Integrated in Jenkins build HBase-1.3-IT #484 (See [https://builds.apache.org/job/HBase-1.3-IT/484/]) HBASE-20766 Typo in VerifyReplication error. (apurtell: rev 10e486882a79e722d44cb42259fbcad1a08a6342) * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/replication/VerifyReplication.java > Verify Replication Tool Has Typo "remove cluster" > - > > Key: HBASE-20766 > URL: https://issues.apache.org/jira/browse/HBASE-20766 > Project: HBase > Issue Type: Bug >Reporter: Clay B. >Assignee: Ferran Fernandez Garrido >Priority: Trivial > Labels: beginner > Fix For: 3.0.0, 1.5.0, 1.3.3, 2.2.0, 1.4.8 > > Attachments: HBASE-20766.master.001.patch > > > The verify replication tool has a trivial typo "remove cluster" instead of > "remote cluster": > https://github.com/apache/hbase/blob/a6eeb26cc0b4d0af3fff50b5b931b6847df1f9d2/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/replication/VerifyReplication.java#L355 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21220) Port HBASE-20636 (Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED) to branch-1
[ https://issues.apache.org/jira/browse/HBASE-21220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-21220: --- Status: Open (was: Patch Available) > Port HBASE-20636 (Introduce two bloom filter type : ROWPREFIX and > ROWPREFIX_DELIMITED) to branch-1 > -- > > Key: HBASE-21220 > URL: https://issues.apache.org/jira/browse/HBASE-21220 > Project: HBase > Issue Type: Sub-task >Reporter: Andrew Purtell >Assignee: Andrew Purtell >Priority: Major > Fix For: 1.5.0 > > Attachments: HBASE-21220-branch-1.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21220) Port HBASE-20636 (Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED) to branch-1
[ https://issues.apache.org/jira/browse/HBASE-21220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-21220: --- Status: Patch Available (was: Open) > Port HBASE-20636 (Introduce two bloom filter type : ROWPREFIX and > ROWPREFIX_DELIMITED) to branch-1 > -- > > Key: HBASE-21220 > URL: https://issues.apache.org/jira/browse/HBASE-21220 > Project: HBase > Issue Type: Sub-task >Reporter: Andrew Purtell >Assignee: Andrew Purtell >Priority: Major > Fix For: 1.5.0 > > Attachments: HBASE-21220-branch-1.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21246) Introduce WALIdentity interface
[ https://issues.apache.org/jira/browse/HBASE-21246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631228#comment-16631228 ] Duo Zhang commented on HBASE-21246: --- Why do we have a getSize in WALIdentity? And do we want to introduce some structure in the WALIdentity? If not, why not just use a String? You introduced a method which creates a WALIdentity from a String, so why not just use String as the identity? > Introduce WALIdentity interface > --- > > Key: HBASE-21246 > URL: https://issues.apache.org/jira/browse/HBASE-21246 > Project: HBase > Issue Type: Sub-task >Reporter: Ted Yu >Assignee: Ted Yu >Priority: Major > Attachments: 21246.HBASE-20952.001.patch > > > We are introducing the WALIdentity interface so that the WAL representation can > be decoupled from the distributed filesystem. > The interface provides a getName method whose return value can represent > the filename in a distributed filesystem environment or the name of the stream > when the WAL is backed by a log stream. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21228) Memory leak since AbstractFSWAL caches Thread object and never clean later
[ https://issues.apache.org/jira/browse/HBASE-21228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-21228: --- Fix Version/s: 1.2.8 1.3.3 1.5.0 > Memory leak since AbstractFSWAL caches Thread object and never clean later > -- > > Key: HBASE-21228 > URL: https://issues.apache.org/jira/browse/HBASE-21228 > Project: HBase > Issue Type: Bug >Affects Versions: 2.1.0, 2.0.2, 1.4.7 >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Major > Fix For: 3.0.0, 1.5.0, 1.3.3, 1.2.8, 1.4.8, 2.1.1, 2.0.3 > > Attachments: HBASE-21228.branch-2.0.001.patch, > HBASE-21228.branch-2.0.002.patch, HBASE-21228.branch-2.0.003.patch > > > In AbstractFSWAL (FSHLog in branch-1), we have a map that caches threads and their > SyncFutures. > {code:java} > /** >* Map of {@link SyncFuture}s keyed by Handler objects. Used so we reuse > SyncFutures. >* >* TODO: Reuse FSWALEntry's rather than create them anew each time as we do > SyncFutures here. >* >* TODO: Add a FSWalEntry and SyncFuture as thread locals on handlers > rather than have them get >* them from this Map? >*/ > private final ConcurrentMap<Thread, SyncFuture> syncFuturesByHandler; > {code} > A colleague of mine found a memory leak caused by this map. > Every thread that writes to the WAL is cached in this map, and nothing removes > a thread from the map even after the thread is dead. > In one of our customers' clusters, we noticed that even though there were no > requests, the heap of the RS was almost full and CMS GC was triggered every > second. > We dumped the heap and found more than 30 thousand > threads in the Terminated state, all cached in the map above. > Everything referenced by these threads was leaked. Most of the threads are: > 1. PostOpenDeployTasksThread, which writes the Open Region marker to the WAL > 2. hconnection-0x1f838e31-shared--pool, which is used to write index short > circuit (Phoenix); the WAL is written and synced in these threads. > 3. Index writer thread (Phoenix), which is referenced by > RegionCoprocessorHost$RegionEnvironment, then by HRegion, and finally > by PostOpenDeployTasksThread. > We should turn this map into a thread-local one and let the JVM GC the terminated > threads for us. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
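The leak described in HBASE-21228 can be illustrated with a minimal sketch. It is not the actual patch: SketchSyncFuture, MapBackedCache, ThreadLocalCache and SyncFutureCacheDemo are hypothetical names used only for illustration, and the real SyncFuture carries state that is omitted here. The sketch assumes the fix keeps one cached object per thread in a ThreadLocal, as the issue description proposes.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical stand-in for HBase's SyncFuture; only object identity matters here. */
final class SketchSyncFuture {}

/** Leaky variant: one entry per writer thread, and nothing ever removes an entry. */
final class MapBackedCache {
  final ConcurrentMap<Thread, SketchSyncFuture> byThread = new ConcurrentHashMap<>();

  SketchSyncFuture get() {
    // The map holds a strong reference to the Thread key, so a terminated thread
    // (and everything reachable from it) stays on the heap.
    return byThread.computeIfAbsent(Thread.currentThread(), t -> new SketchSyncFuture());
  }
}

/** Thread-local variant: the cached object is reachable only through its owning thread. */
final class ThreadLocalCache {
  private final ThreadLocal<SketchSyncFuture> cached =
      ThreadLocal.withInitial(SketchSyncFuture::new);

  SketchSyncFuture get() {
    return cached.get();
  }
}

final class SyncFutureCacheDemo {
  /** Runs short-lived writer threads and returns how many entries the map still holds. */
  static int leakedEntries(int writers) {
    MapBackedCache cache = new MapBackedCache();
    try {
      for (int i = 0; i < writers; i++) {
        Thread t = new Thread(cache::get);
        t.start();
        t.join(); // the thread is terminated after this call returns
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return cache.byThread.size();
  }

  public static void main(String[] args) {
    System.out.println("entries left behind by dead threads: " + leakedEntries(5));
  }
}
```

With the map, every terminated writer leaves an entry behind. With the ThreadLocal, no shared structure references the thread, so a terminated thread and its cached object become eligible for collection.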
[jira] [Updated] (HBASE-20766) Verify Replication Tool Has Typo "remove cluster"
[ https://issues.apache.org/jira/browse/HBASE-20766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-20766: --- Fix Version/s: 1.4.8 1.3.3 > Verify Replication Tool Has Typo "remove cluster" > - > > Key: HBASE-20766 > URL: https://issues.apache.org/jira/browse/HBASE-20766 > Project: HBase > Issue Type: Bug >Reporter: Clay B. >Assignee: Ferran Fernandez Garrido >Priority: Trivial > Labels: beginner > Fix For: 3.0.0, 1.5.0, 1.3.3, 2.2.0, 1.4.8 > > Attachments: HBASE-20766.master.001.patch > > > The verify replication tool has a trivial typo "remove cluster" instead of > "remote cluster": > https://github.com/apache/hbase/blob/a6eeb26cc0b4d0af3fff50b5b931b6847df1f9d2/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/replication/VerifyReplication.java#L355 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21247) Allow WAL Provider to be specified by configuration without explicit enum in Providers
[ https://issues.apache.org/jira/browse/HBASE-21247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631216#comment-16631216 ] Ted Yu commented on HBASE-21247: From https://builds.apache.org/job/PreCommit-HBASE-Build/14515/console : {code} 22:32:24 [Thu Sep 27 22:32:24 UTC 2018 DEBUG]: jira_http_fetch: https://issues.apache.org/jira/browse/HBASE-21247 returned 4xx status code. Maybe incorrect username/password? 22:32:24 [Thu Sep 27 22:32:24 UTC 2018 DEBUG]: jira_locate_patch: not a JIRA. {code} > Allow WAL Provider to be specified by configuration without explicit enum in > Providers > -- > > Key: HBASE-21247 > URL: https://issues.apache.org/jira/browse/HBASE-21247 > Project: HBase > Issue Type: Improvement >Reporter: Ted Yu >Assignee: Ted Yu >Priority: Major > Fix For: 3.0.0 > > Attachments: 21247.v1.txt, 21247.v2.txt, 21247.v3.txt > > > Currently all the WAL Providers acceptable to hbase are specified in > Providers enum of WALFactory. > This restricts the ability for additional WAL Providers to be supplied - by > class name. > This issue introduces additional config which allows the specification of new > WAL Provider through class name. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21247) Allow WAL Provider to be specified by configuration without explicit enum in Providers
[ https://issues.apache.org/jira/browse/HBASE-21247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631100#comment-16631100 ] Josh Elser commented on HBASE-21247: You resubmit it? Maybe a problem with that specific host? Otherwise, looks ok to me. Thanks for the extra tests. +1 on qa > Allow WAL Provider to be specified by configuration without explicit enum in > Providers > -- > > Key: HBASE-21247 > URL: https://issues.apache.org/jira/browse/HBASE-21247 > Project: HBase > Issue Type: Improvement >Reporter: Ted Yu >Assignee: Ted Yu >Priority: Major > Fix For: 3.0.0 > > Attachments: 21247.v1.txt, 21247.v2.txt, 21247.v3.txt > > > Currently all the WAL Providers acceptable to hbase are specified in > Providers enum of WALFactory. > This restricts the ability for additional WAL Providers to be supplied - by > class name. > This issue introduces additional config which allows the specification of new > WAL Provider through class name. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21247) Allow WAL Provider to be specified by configuration without explicit enum in Providers
[ https://issues.apache.org/jira/browse/HBASE-21247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631058#comment-16631058 ] Ted Yu commented on HBASE-21247: From https://builds.apache.org/job/PreCommit-HBASE-Build/14513/console : {code} 20:10:21 ERROR: Unsure how to process HBASE-21247. {code} > Allow WAL Provider to be specified by configuration without explicit enum in > Providers > -- > > Key: HBASE-21247 > URL: https://issues.apache.org/jira/browse/HBASE-21247 > Project: HBase > Issue Type: Improvement >Reporter: Ted Yu >Assignee: Ted Yu >Priority: Major > Fix For: 3.0.0 > > Attachments: 21247.v1.txt, 21247.v2.txt, 21247.v3.txt > > > Currently all the WAL Providers acceptable to hbase are specified in > Providers enum of WALFactory. > This restricts the ability for additional WAL Providers to be supplied - by > class name. > This issue introduces additional config which allows the specification of new > WAL Provider through class name. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21247) Allow WAL Provider to be specified by configuration without explicit enum in Providers
[ https://issues.apache.org/jira/browse/HBASE-21247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631035#comment-16631035 ] Ted Yu commented on HBASE-21247: In patch v3, there are two new subtests. * when only WALFactory.WAL_PROVIDER_CLASS is specified, verify that both wal provider and meta wal provider are of this class * when only WALFactory.META_WAL_PROVIDER_CLASS is specified, verify that wal provider is default and that meta wal provider is of this class > Allow WAL Provider to be specified by configuration without explicit enum in > Providers > -- > > Key: HBASE-21247 > URL: https://issues.apache.org/jira/browse/HBASE-21247 > Project: HBase > Issue Type: Improvement >Reporter: Ted Yu >Assignee: Ted Yu >Priority: Major > Fix For: 3.0.0 > > Attachments: 21247.v1.txt, 21247.v2.txt, 21247.v3.txt > > > Currently all the WAL Providers acceptable to hbase are specified in > Providers enum of WALFactory. > This restricts the ability for additional WAL Providers to be supplied - by > class name. > This issue introduces additional config which allows the specification of new > WAL Provider through class name. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21247) Allow WAL Provider to be specified by configuration without explicit enum in Providers
[ https://issues.apache.org/jira/browse/HBASE-21247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-21247: --- Attachment: 21247.v3.txt > Allow WAL Provider to be specified by configuration without explicit enum in > Providers > -- > > Key: HBASE-21247 > URL: https://issues.apache.org/jira/browse/HBASE-21247 > Project: HBase > Issue Type: Improvement >Reporter: Ted Yu >Assignee: Ted Yu >Priority: Major > Fix For: 3.0.0 > > Attachments: 21247.v1.txt, 21247.v2.txt, 21247.v3.txt > > > Currently all the WAL Providers acceptable to hbase are specified in > Providers enum of WALFactory. > This restricts the ability for additional WAL Providers to be supplied - by > class name. > This issue introduces additional config which allows the specification of new > WAL Provider through class name. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21247) Allow WAL Provider to be specified by configuration without explicit enum in Providers
[ https://issues.apache.org/jira/browse/HBASE-21247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631000#comment-16631000 ] Josh Elser commented on HBASE-21247: {code:java} +assertEquals(IOTestProvider.class.getName(), fshLogProvider.getName());{code} nit: just test the {{Class}} objects instead of the String representation of the names. Do we have a test coverage for the scenarios described in the release notes? # ... If not specified, we fall back to the WAL provider enum specification. # ... If not specified, we fall back to using the value for hbase.wal.provider.class . # Fallback to enum specific if neither meta provider class nor provider class are set. > Allow WAL Provider to be specified by configuration without explicit enum in > Providers > -- > > Key: HBASE-21247 > URL: https://issues.apache.org/jira/browse/HBASE-21247 > Project: HBase > Issue Type: Improvement >Reporter: Ted Yu >Assignee: Ted Yu >Priority: Major > Fix For: 3.0.0 > > Attachments: 21247.v1.txt, 21247.v2.txt > > > Currently all the WAL Providers acceptable to hbase are specified in > Providers enum of WALFactory. > This restricts the ability for additional WAL Providers to be supplied - by > class name. > This issue introduces additional config which allows the specification of new > WAL Provider through class name. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21247) Allow WAL Provider to be specified by configuration without explicit enum in Providers
[ https://issues.apache.org/jira/browse/HBASE-21247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630991#comment-16630991 ] Josh Elser commented on HBASE-21247: {quote}The user doesn't need to specify two WAL classes in the normal case. {quote} Thanks. This, along with the release notes, helps. > Allow WAL Provider to be specified by configuration without explicit enum in > Providers > -- > > Key: HBASE-21247 > URL: https://issues.apache.org/jira/browse/HBASE-21247 > Project: HBase > Issue Type: Improvement >Reporter: Ted Yu >Assignee: Ted Yu >Priority: Major > Fix For: 3.0.0 > > Attachments: 21247.v1.txt, 21247.v2.txt > > > Currently all the WAL Providers acceptable to hbase are specified in > Providers enum of WALFactory. > This restricts the ability for additional WAL Providers to be supplied - by > class name. > This issue introduces additional config which allows the specification of new > WAL Provider through class name. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21247) Allow WAL Provider to be specified by configuration without explicit enum in Providers
[ https://issues.apache.org/jira/browse/HBASE-21247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-21247: --- Attachment: 21247.v2.txt > Allow WAL Provider to be specified by configuration without explicit enum in > Providers > -- > > Key: HBASE-21247 > URL: https://issues.apache.org/jira/browse/HBASE-21247 > Project: HBase > Issue Type: Improvement >Reporter: Ted Yu >Assignee: Ted Yu >Priority: Major > Fix For: 3.0.0 > > Attachments: 21247.v1.txt, 21247.v2.txt > > > Currently all the WAL Providers acceptable to hbase are specified in > Providers enum of WALFactory. > This restricts the ability for additional WAL Providers to be supplied - by > class name. > This issue introduces additional config which allows the specification of new > WAL Provider through class name. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21247) Allow WAL Provider to be specified by configuration without explicit enum in Providers
[ https://issues.apache.org/jira/browse/HBASE-21247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-21247: --- Release Note: Two config parameters, hbase.wal.provider.class and hbase.wal.meta_provider.class are introduced. hbase.wal.provider.class, when specified, configures the WAL provider class through its class name. If not specified, we fall back to the WAL provider enum specification. hbase.wal.meta_provider.class, when specified, configures the WAL provider class for hbase:meta through its class name. If not specified, we fall back to using the value for hbase.wal.provider.class . These new configs, when specified, override the enum WAL provider config. was: Two config parameters, hbase.wal.provider.class and hbase.wal.meta_provider.class are introduced. hbase.wal.provider.class, when specified, configures the WAL provider class through its class name. If not specified, we fall back to the WAL provider enum specification. hbase.wal.meta_provider.class, when specified, configures the WAL provider class for hbase:meta through its class name. If not specified, we fall back to using the value for hbase.wal.provider.class . > Allow WAL Provider to be specified by configuration without explicit enum in > Providers > -- > > Key: HBASE-21247 > URL: https://issues.apache.org/jira/browse/HBASE-21247 > Project: HBase > Issue Type: Improvement >Reporter: Ted Yu >Assignee: Ted Yu >Priority: Major > Fix For: 3.0.0 > > Attachments: 21247.v1.txt > > > Currently all the WAL Providers acceptable to hbase are specified in > Providers enum of WALFactory. > This restricts the ability for additional WAL Providers to be supplied - by > class name. > This issue introduces additional config which allows the specification of new > WAL Provider through class name. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
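The fallback order in the release note above can be sketched as follows. This is an illustration under stated assumptions, not the WALFactory code: WalProviderResolutionSketch and its methods are hypothetical, a plain Map stands in for the HBase Configuration, the returned strings only tag which source won, the enum-based key for hbase:meta is left out because the release note does not mention it, and "<default>" is a placeholder for whatever the enum default is.

```java
import java.util.HashMap;
import java.util.Map;

final class WalProviderResolutionSketch {
  // Key names are taken from the thread; everything else is illustrative.
  static final String WAL_PROVIDER = "hbase.wal.provider";
  static final String WAL_PROVIDER_CLASS = "hbase.wal.provider.class";
  static final String META_WAL_PROVIDER_CLASS = "hbase.wal.meta_provider.class";
  static final String DEFAULT_ENUM_NAME = "<default>"; // placeholder, not a real enum name

  /** The class name wins when set; otherwise fall back to the enum-based setting. */
  static String resolveWal(Map<String, String> conf) {
    String className = conf.get(WAL_PROVIDER_CLASS);
    if (className != null) {
      return "class:" + className;
    }
    return "enum:" + conf.getOrDefault(WAL_PROVIDER, DEFAULT_ENUM_NAME);
  }

  /** The meta class name wins when set; otherwise use whatever the regular WAL resolves to. */
  static String resolveMetaWal(Map<String, String> conf) {
    String metaClassName = conf.get(META_WAL_PROVIDER_CLASS);
    if (metaClassName != null) {
      return "class:" + metaClassName;
    }
    return resolveWal(conf);
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(WAL_PROVIDER_CLASS, "com.example.MyWALProvider"); // hypothetical provider class
    System.out.println(resolveWal(conf));
    System.out.println(resolveMetaWal(conf));
  }
}
```

Setting only hbase.wal.provider.class therefore covers both the regular WAL and the hbase:meta WAL, which matches the remark in this thread that a user does not need to specify two WAL classes in the normal case.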
[jira] [Updated] (HBASE-21247) Allow WAL Provider to be specified by configuration without explicit enum in Providers
[ https://issues.apache.org/jira/browse/HBASE-21247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-21247: --- Release Note: Two config parameters, hbase.wal.provider.class and hbase.wal.meta_provider.class are introduced. hbase.wal.provider.class, when specified, configures the WAL provider class through its class name. If not specified, we fall back to the WAL provider enum specification. hbase.wal.meta_provider.class, when specified, configures the WAL provider class for hbase:meta through its class name. If not specified, we fall back to using the value for hbase.wal.provider.class . > Allow WAL Provider to be specified by configuration without explicit enum in > Providers > -- > > Key: HBASE-21247 > URL: https://issues.apache.org/jira/browse/HBASE-21247 > Project: HBase > Issue Type: Improvement >Reporter: Ted Yu >Assignee: Ted Yu >Priority: Major > Fix For: 3.0.0 > > Attachments: 21247.v1.txt > > > Currently all the WAL Providers acceptable to hbase are specified in > Providers enum of WALFactory. > This restricts the ability for additional WAL Providers to be supplied - by > class name. > This issue introduces additional config which allows the specification of new > WAL Provider through class name. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21247) Allow WAL Provider to be specified by configuration without explicit enum in Providers
[ https://issues.apache.org/jira/browse/HBASE-21247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630925#comment-16630925 ] Ted Yu commented on HBASE-21247: w.r.t. first comment above, {code} boolean metaWALProvPresent = conf.get(META_WAL_PROVIDER_CLASS) != null; provider = createProvider(getProviderClass( metaWALProvPresent ? META_WAL_PROVIDER_CLASS : WAL_PROVIDER_CLASS, {code} when META_WAL_PROVIDER_CLASS is not specified, we fall back to the value for WAL_PROVIDER_CLASS (if present). The user doesn't need to specify two WAL classes in the normal case. > Allow WAL Provider to be specified by configuration without explicit enum in > Providers > -- > > Key: HBASE-21247 > URL: https://issues.apache.org/jira/browse/HBASE-21247 > Project: HBase > Issue Type: Improvement >Reporter: Ted Yu >Assignee: Ted Yu >Priority: Major > Fix For: 3.0.0 > > Attachments: 21247.v1.txt > > > Currently all the WAL Providers acceptable to hbase are specified in > Providers enum of WALFactory. > This restricts the ability for additional WAL Providers to be supplied - by > class name. > This issue introduces additional config which allows the specification of new > WAL Provider through class name. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21247) Allow WAL Provider to be specified by configuration without explicit enum in Providers
[ https://issues.apache.org/jira/browse/HBASE-21247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630906#comment-16630906 ] Josh Elser commented on HBASE-21247: {code:java} - provider = createProvider(getProviderClass(META_WAL_PROVIDER, - conf.get(WAL_PROVIDER, DEFAULT_WAL_PROVIDER))); + boolean metaWALProvPresent = conf.get(META_WAL_PROVIDER_CLASS) != null; + provider = createProvider(getProviderClass( + metaWALProvPresent ? META_WAL_PROVIDER_CLASS : WAL_PROVIDER_CLASS, + META_WAL_PROVIDER, conf.get(WAL_PROVIDER, DEFAULT_WAL_PROVIDER)));{code} I thought HBASE-20856 got rid of this duplication around setting a WALProvider and a Meta WALProvider? {code:java} public static final String WAL_PROVIDER = "hbase.wal.provider"; static final String DEFAULT_WAL_PROVIDER = Providers.defaultProvider.name(); + public static final String WAL_PROVIDER_CLASS = "hbase.wal.provider.class"; + static final Class DEFAULT_WAL_PROVIDER_CLASS = AsyncFSWALProvider.class;{code} What happens if I provide both the old configuration properties and the new ones? Can you summarize how this will work in release notes? We should have some test additions to cover that too. How about adding a test to make sure you can create a WALProvider (i.e. as a static-inner class of the test class) and HBase uses it as the WALProvider? I think it should be easy in TestWALFactory – I recall some prior art there. > Allow WAL Provider to be specified by configuration without explicit enum in > Providers > -- > > Key: HBASE-21247 > URL: https://issues.apache.org/jira/browse/HBASE-21247 > Project: HBase > Issue Type: Improvement >Reporter: Ted Yu >Assignee: Ted Yu >Priority: Major > Fix For: 3.0.0 > > Attachments: 21247.v1.txt > > > Currently all the WAL Providers acceptable to hbase are specified in > Providers enum of WALFactory. > This restricts the ability for additional WAL Providers to be supplied - by > class name. 
> This issue introduces additional config which allows the specification of new > WAL Provider through class name. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20951) Ratis LogService backed WALs
[ https://issues.apache.org/jira/browse/HBASE-20951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630901#comment-16630901 ] Josh Elser commented on HBASE-20951: FYI HBASE-20952 was spun out into a standalone issue (from a child issue) to better support making small changes to the WAL API. Linked it off of this issue. > Ratis LogService backed WALs > > > Key: HBASE-20951 > URL: https://issues.apache.org/jira/browse/HBASE-20951 > Project: HBase > Issue Type: New Feature > Components: wal >Reporter: Josh Elser >Assignee: Josh Elser >Priority: Major > > Umbrella issue for the Ratis+WAL work: > Design doc: > [https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit#|https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit] > The (over-simplified) goal is to re-think the current WAL APIs we have now, > ensure that they are de-coupled from the notion of being backed by HDFS, swap > the current implementations over to the new API, and then wire up the Ratis > LogService to the new WAL API. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21247) Allow WAL Provider to be specified by configuration without explicit enum in Providers
[ https://issues.apache.org/jira/browse/HBASE-21247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630893#comment-16630893 ] Josh Elser commented on HBASE-21247: [~sergey.soldatov] made the suggestion offline today that this one should just go into Master. Generally helpful outside of the efforts of HBASE-20951 > Allow WAL Provider to be specified by configuration without explicit enum in > Providers > -- > > Key: HBASE-21247 > URL: https://issues.apache.org/jira/browse/HBASE-21247 > Project: HBase > Issue Type: Improvement >Reporter: Ted Yu >Assignee: Ted Yu >Priority: Major > Fix For: 3.0.0 > > Attachments: 21247.v1.txt > > > Currently all the WAL Providers acceptable to hbase are specified in > Providers enum of WALFactory. > This restricts the ability for additional WAL Providers to be supplied - by > class name. > This issue introduces additional config which allows the specification of new > WAL Provider through class name. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21247) Allow WAL Provider to be specified by configuration without explicit enum in Providers
[ https://issues.apache.org/jira/browse/HBASE-21247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Elser updated HBASE-21247: --- Fix Version/s: 3.0.0 > Allow WAL Provider to be specified by configuration without explicit enum in > Providers > -- > > Key: HBASE-21247 > URL: https://issues.apache.org/jira/browse/HBASE-21247 > Project: HBase > Issue Type: Improvement >Reporter: Ted Yu >Assignee: Ted Yu >Priority: Major > Fix For: 3.0.0 > > Attachments: 21247.v1.txt > > > Currently all the WAL Providers acceptable to hbase are specified in > Providers enum of WALFactory. > This restricts the ability for additional WAL Providers to be supplied - by > class name. > This issue introduces additional config which allows the specification of new > WAL Provider through class name. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21200) Memstore flush doesn't finish because of seekToPreviousRow() in memstore scanner.
[ https://issues.apache.org/jira/browse/HBASE-21200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630858#comment-16630858 ] Ted Yu commented on HBASE-21200: I ran the new test on master branch which passed. What error is expected from the new test ? > Memstore flush doesn't finish because of seekToPreviousRow() in memstore > scanner. > - > > Key: HBASE-21200 > URL: https://issues.apache.org/jira/browse/HBASE-21200 > Project: HBase > Issue Type: Bug > Components: Scanners >Reporter: dongjin2193.jeon >Priority: Major > Attachments: HBASE-21200-UT.patch, RegionServerJstack.log > > > The issue of delaying memstore flush still occurs after backport hbase-15871. > Reverse scan takes a long time to seek previous row in the memstore full of > deleted cells. > > jstack : > "MemStoreFlusher.0" #114 prio=5 os_prio=0 tid=0x7fa3d0729000 nid=0x486a > waiting on condition [0x7fa3b9b6b000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0xa465fe60> (a > java.util.concurrent.locks.ReentrantLock$NonfairSync) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199) > at > java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209) > at > java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285) > at > org.apache.hadoop.hbase.regionserver.*StoreScanner.updateReaders(StoreScanner.java:695)* > at > org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1127) > at > org.apache.hadoop.hbase.regionserver.HStore.updateStorefiles(HStore.java:1106) > at > 
org.apache.hadoop.hbase.regionserver.HStore.access$600(HStore.java:130) > at > org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.commit(HStore.java:2455) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2519) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2256) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2218) > at > org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2110) > at > org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2036) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:501) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:471) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:75) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:259) > at java.lang.Thread.run(Thread.java:748) > > "RpcServer.FifoWFPBQ.default.handler=27,queue=0,port=16020" #65 daemon prio=5 > os_prio=0 tid=0x7fa3e628 nid=0x4801 runnable [0x7fa3bd29a000] > java.lang.Thread.State: RUNNABLE > at > org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner.getNext(DefaultMemStore.java:780) > at > org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner.seekInSubLists(DefaultMemStore.java:826) > - locked <0xb45aa5b8> (a > org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner) > at > org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner.seek(DefaultMemStore.java:818) > - locked <0xb45aa5b8> (a > org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner) > at > org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner.seekToPreviousRow(DefaultMemStore.java:1000) > - locked <0xb45aa5b8> (a > org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner) > at > 
org.apache.hadoop.hbase.regionserver.ReversedKeyValueHeap.next(ReversedKeyValueHeap.java:136) > at > org.apache.hadoop.hbase.regionserver.*StoreScanner.next(StoreScanner.java:629)* > at > org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5876) > at >
[jira] [Updated] (HBASE-21247) Allow WAL Provider to be specified by configuration without explicit enum in Providers
[ https://issues.apache.org/jira/browse/HBASE-21247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-21247: --- Status: Patch Available (was: Open) > Allow WAL Provider to be specified by configuration without explicit enum in > Providers > -- > > Key: HBASE-21247 > URL: https://issues.apache.org/jira/browse/HBASE-21247 > Project: HBase > Issue Type: Improvement >Reporter: Ted Yu >Assignee: Ted Yu >Priority: Major > Attachments: 21247.v1.txt > > > Currently all the WAL Providers acceptable to hbase are specified in > Providers enum of WALFactory. > This restricts the ability for additional WAL Providers to be supplied - by > class name. > This issue introduces additional config which allows the specification of new > WAL Provider through class name. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21247) Allow WAL Provider to be specified by configuration without explicit enum in Providers
Ted Yu created HBASE-21247: -- Summary: Allow WAL Provider to be specified by configuration without explicit enum in Providers Key: HBASE-21247 URL: https://issues.apache.org/jira/browse/HBASE-21247 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Ted Yu Attachments: 21247.v1.txt Currently all the WAL Providers acceptable to hbase are specified in Providers enum of WALFactory. This restricts the ability for additional WAL Providers to be supplied - by class name. This issue introduces additional config which allows the specification of new WAL Provider through class name. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21247) Allow WAL Provider to be specified by configuration without explicit enum in Providers
[ https://issues.apache.org/jira/browse/HBASE-21247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-21247: --- Attachment: 21247.v1.txt > Allow WAL Provider to be specified by configuration without explicit enum in > Providers > -- > > Key: HBASE-21247 > URL: https://issues.apache.org/jira/browse/HBASE-21247 > Project: HBase > Issue Type: Improvement >Reporter: Ted Yu >Assignee: Ted Yu >Priority: Major > Attachments: 21247.v1.txt > > > Currently all the WAL Providers acceptable to hbase are specified in > Providers enum of WALFactory. > This restricts the ability for additional WAL Providers to be supplied - by > class name. > This issue introduces additional config which allows the specification of new > WAL Provider through class name. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21237) Use CompatRemoteProcedureResolver to dispatch open/close region requests to RS
[ https://issues.apache.org/jira/browse/HBASE-21237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630789#comment-16630789 ] Hudson commented on HBASE-21237: Results for branch branch-2.0 [build #872 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/872/]: (/) *{color:green}+1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/872//General_Nightly_Build_Report/] (/) {color:green}+1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/872//JDK8_Nightly_Build_Report_(Hadoop2)/] (/) {color:green}+1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/872//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. > Use CompatRemoteProcedureResolver to dispatch open/close region requests to RS > -- > > Key: HBASE-21237 > URL: https://issues.apache.org/jira/browse/HBASE-21237 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.1.0, 2.0.2 >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Major > Attachments: HBASE-21237.branch-2.0.001.patch > > > As discussed in HBASE-21217, in branch-2.0 and branch-2.1, we should use > CompatRemoteProcedureResolver instead of ExecuteProceduresRemoteCall to > dispatch region open/close requests to the RS, since ExecuteProceduresRemoteCall > groups all the open/close operations in one call and executes them > sequentially on the target RS. If one operation fails, all the operations are > marked as failed. In fact, some of the operations (like open region) are > already executing in the open region handler thread, but the master thinks these > operations failed and reassigns the regions to another RS. 
So when the previous > RS reports to the master that the region is online, the master kills the RS > since it has already assigned the region to another RS. > For branch-2.2+, HBASE-21217 will fix this issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21228) Memory leak since AbstractFSWAL caches Thread object and never clean later
[ https://issues.apache.org/jira/browse/HBASE-21228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630782#comment-16630782 ] Hudson commented on HBASE-21228: Results for branch branch-2 [build #1310 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1310/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1310//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1310//JDK8_Nightly_Build_Report_(Hadoop2)/] (/) {color:green}+1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1310//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > Memory leak since AbstractFSWAL caches Thread object and never clean later > -- > > Key: HBASE-21228 > URL: https://issues.apache.org/jira/browse/HBASE-21228 > Project: HBase > Issue Type: Bug >Affects Versions: 2.1.0, 2.0.2, 1.4.7 >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Major > Fix For: 3.0.0, 1.4.8, 2.1.1, 2.0.3 > > Attachments: HBASE-21228.branch-2.0.001.patch, > HBASE-21228.branch-2.0.002.patch, HBASE-21228.branch-2.0.003.patch > > > In AbstractFSWAL(FSHLog in branch-1), we have a map caches thread and > SyncFutures. > {code:java} > /** >* Map of {@link SyncFuture}s keyed by Handler objects. Used so we reuse > SyncFutures. >* >* TODO: Reuse FSWALEntry's rather than create them anew each time as we do > SyncFutures here. >* >* TODO: Add a FSWalEntry and SyncFuture as thread locals on handlers > rather than have them get >* them from this Map? 
>    */
>   private final ConcurrentMap<Thread, SyncFuture> syncFuturesByHandler;
> {code}
> A colleague of mine found a memory leak caused by this map.
> Every thread that writes to the WAL is cached in this map, and nothing removes
> a thread from the map even after the thread is dead.
> In one of our customer's clusters, we noticed that even though there were no
> requests, the heap of the RS was almost full and CMS GC was triggered every
> second.
> We dumped the heap and found more than 30 thousand threads in the
> Terminated state, all of them cached in the map above.
> Everything referenced by these threads was leaked. Most of the threads are:
> 1. PostOpenDeployTasksThread, which writes the Open Region mark to the WAL.
> 2. hconnection-0x1f838e31-shared--pool, which is used for short-circuit index
> writes (Phoenix); the WAL is written and synced in these threads.
> 3. Index writer threads (Phoenix), which are referenced by
> RegionCoprocessorHost$RegionEnvironment, then by HRegion, and finally by
> PostOpenDeployTasksThread.
> We should turn this map into a thread-local one and let the JVM GC the
> terminated threads for us.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
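The leak described in HBASE-21228 and the proposed thread-local fix can be illustrated with a small sketch. This is not the HBase code: SketchSyncFuture and SyncFutureCacheSketch are hypothetical stand-ins, and the sketch only shows that a map keyed by Thread keeps an entry for every thread that ever used it, while a ThreadLocal stores the value inside the thread itself so it becomes unreachable together with the terminated thread.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical stand-in for HBase's SyncFuture; not the real class. */
final class SketchSyncFuture {
  long txid;

  SketchSyncFuture reset(long txid) {
    this.txid = txid;
    return this;
  }
}

/** Illustration only: contrasts a Thread-keyed map with a ThreadLocal cache. */
final class SyncFutureCacheSketch {
  // Leaky variant: the map holds a strong reference to every Thread key,
  // so a terminated thread (and everything it references) is never collected.
  private final ConcurrentMap<Thread, SketchSyncFuture> byHandler = new ConcurrentHashMap<>();

  // Thread-local variant: the value lives in the owning thread, so it goes
  // away with the thread and no explicit cleanup is required.
  private final ThreadLocal<SketchSyncFuture> cached =
      ThreadLocal.withInitial(SketchSyncFuture::new);

  SketchSyncFuture fromMap(long txid) {
    return byHandler
        .computeIfAbsent(Thread.currentThread(), t -> new SketchSyncFuture())
        .reset(txid);
  }

  SketchSyncFuture fromThreadLocal(long txid) {
    return cached.get().reset(txid);
  }

  int mapSize() {
    return byHandler.size();
  }

  /** Runs n short-lived threads through the map variant and reports how many entries remain. */
  static int leakDemo(int n) {
    SyncFutureCacheSketch cache = new SyncFutureCacheSketch();
    for (int i = 0; i < n; i++) {
      final long txid = i;
      Thread t = new Thread(() -> cache.fromMap(txid));
      t.start();
      try {
        t.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
    // Every thread has terminated by now, yet the map still holds one entry per thread.
    return cache.mapSize();
  }
}
```

Calling SyncFutureCacheSketch.leakDemo(n) shows the map retaining one entry per finished thread, which matches the heap dump described in the issue. The thread-local variant still reuses one future per thread, which is the reason the map existed in the first place.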
[jira] [Updated] (HBASE-21246) Introduce WALIdentity interface
[ https://issues.apache.org/jira/browse/HBASE-21246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-21246:
---
    Status: Patch Available (was: Open)

> Introduce WALIdentity interface
> ---
>
> Key: HBASE-21246
> URL: https://issues.apache.org/jira/browse/HBASE-21246
> Project: HBase
> Issue Type: Sub-task
> Reporter: Ted Yu
> Assignee: Ted Yu
> Priority: Major
> Attachments: 21246.HBASE-20952.001.patch
>
>
> We are introducing the WALIdentity interface so that the WAL representation can
> be decoupled from the distributed filesystem.
> The interface provides a getName method whose return value can represent the
> filename in a distributed filesystem environment or the name of the stream
> when the WAL is backed by a log stream.
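A minimal sketch of what the HBASE-21246 description implies, for readers who have not opened the patch. Only the interface name and the getName method come from the issue text; the String return type, the two implementing classes, and their constructor arguments are assumptions made for illustration and may not match the attached patch.

```java
/** Sketch of the interface named in the issue; the String return type is an assumption. */
interface WALIdentity {
  /** Returns the name that identifies this WAL in its backing store. */
  String getName();
}

/** Hypothetical identity for a WAL stored as a file in a distributed filesystem. */
final class FileWALIdentitySketch implements WALIdentity {
  private final String fileName;

  FileWALIdentitySketch(String fileName) {
    this.fileName = fileName;
  }

  @Override
  public String getName() {
    return fileName;
  }
}

/** Hypothetical identity for a WAL backed by a log stream. */
final class StreamWALIdentitySketch implements WALIdentity {
  private final String streamName;

  StreamWALIdentitySketch(String streamName) {
    this.streamName = streamName;
  }

  @Override
  public String getName() {
    return streamName;
  }
}
```

Code that only needs to name a WAL can then accept a WALIdentity and stay unaware of whether the name is a filesystem path or a stream name, which is the decoupling the issue asks for.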
[jira] [Commented] (HBASE-21200) Memstore flush doesn't finish because of seekToPreviousRow() in memstore scanner.
[ https://issues.apache.org/jira/browse/HBASE-21200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630736#comment-16630736 ] Toshihiro Suzuki commented on HBASE-21200: -- It looks like this issue is reproduced in both master and branch-1. > Memstore flush doesn't finish because of seekToPreviousRow() in memstore > scanner. > - > > Key: HBASE-21200 > URL: https://issues.apache.org/jira/browse/HBASE-21200 > Project: HBase > Issue Type: Bug > Components: Scanners >Reporter: dongjin2193.jeon >Priority: Major > Attachments: HBASE-21200-UT.patch, RegionServerJstack.log > > > The issue of delaying memstore flush still occurs after backport hbase-15871. > Reverse scan takes a long time to seek previous row in the memstore full of > deleted cells. > > jstack : > "MemStoreFlusher.0" #114 prio=5 os_prio=0 tid=0x7fa3d0729000 nid=0x486a > waiting on condition [0x7fa3b9b6b000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0xa465fe60> (a > java.util.concurrent.locks.ReentrantLock$NonfairSync) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199) > at > java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209) > at > java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285) > at > org.apache.hadoop.hbase.regionserver.*StoreScanner.updateReaders(StoreScanner.java:695)* > at > org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1127) > at > org.apache.hadoop.hbase.regionserver.HStore.updateStorefiles(HStore.java:1106) > at > 
org.apache.hadoop.hbase.regionserver.HStore.access$600(HStore.java:130) > at > org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.commit(HStore.java:2455) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2519) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2256) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2218) > at > org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2110) > at > org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2036) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:501) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:471) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:75) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:259) > at java.lang.Thread.run(Thread.java:748) > > "RpcServer.FifoWFPBQ.default.handler=27,queue=0,port=16020" #65 daemon prio=5 > os_prio=0 tid=0x7fa3e628 nid=0x4801 runnable [0x7fa3bd29a000] > java.lang.Thread.State: RUNNABLE > at > org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner.getNext(DefaultMemStore.java:780) > at > org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner.seekInSubLists(DefaultMemStore.java:826) > - locked <0xb45aa5b8> (a > org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner) > at > org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner.seek(DefaultMemStore.java:818) > - locked <0xb45aa5b8> (a > org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner) > at > org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner.seekToPreviousRow(DefaultMemStore.java:1000) > - locked <0xb45aa5b8> (a > org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner) > at > 
org.apache.hadoop.hbase.regionserver.ReversedKeyValueHeap.next(ReversedKeyValueHeap.java:136) > at > org.apache.hadoop.hbase.regionserver.*StoreScanner.next(StoreScanner.java:629)* > at > org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5876) > at >
[jira] [Updated] (HBASE-21246) Introduce WALIdentity interface
[ https://issues.apache.org/jira/browse/HBASE-21246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-21246:
---
    Attachment: 21246.HBASE-20952.001.patch

> Introduce WALIdentity interface
> ---
>
> Key: HBASE-21246
> URL: https://issues.apache.org/jira/browse/HBASE-21246
> Project: HBase
> Issue Type: Sub-task
> Reporter: Ted Yu
> Assignee: Ted Yu
> Priority: Major
> Attachments: 21246.HBASE-20952.001.patch
>
>
> We are introducing the WALIdentity interface so that the WAL representation can
> be decoupled from the distributed filesystem.
> The interface provides a getName method whose return value can represent the
> filename in a distributed filesystem environment or the name of the stream
> when the WAL is backed by a log stream.
[jira] [Created] (HBASE-21246) Introduce WALIdentity interface
Ted Yu created HBASE-21246:
--

    Summary: Introduce WALIdentity interface
    Key: HBASE-21246
    URL: https://issues.apache.org/jira/browse/HBASE-21246
    Project: HBase
    Issue Type: Sub-task
    Reporter: Ted Yu
    Assignee: Ted Yu

We are introducing the WALIdentity interface so that the WAL representation can be decoupled from the distributed filesystem.
The interface provides a getName method whose return value can represent the filename in a distributed filesystem environment or the name of the stream when the WAL is backed by a log stream.
[jira] [Updated] (HBASE-20952) Re-visit the WAL API
[ https://issues.apache.org/jira/browse/HBASE-20952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-20952:
---
    Issue Type: Improvement (was: Sub-task)
    Parent: (was: HBASE-20951)

> Re-visit the WAL API
>
>
> Key: HBASE-20952
> URL: https://issues.apache.org/jira/browse/HBASE-20952
> Project: HBase
> Issue Type: Improvement
> Components: wal
> Reporter: Josh Elser
> Priority: Major
> Attachments: 20952.v1.txt
>
>
> Take a step back from the current WAL implementations and think about what an
> HBase WAL API should look like. What are the primitive calls that we require
> to guarantee durability of writes with a high degree of performance?
> The API needs to take the current implementations into consideration. We
> should also keep in mind what is happening in the Ratis LogService (but
> the LogService should not dictate what HBase's WAL API looks like; see RATIS-272).
> Other "systems" inside of HBase that use WALs are replication and
> backup. Replication has the use case of "tail"'ing the WAL, which we
> should provide via our new API. Backup doesn't do anything fancy (IIRC). We
> should make sure all consumers are generally going to be OK with the API we
> create.
> The API may be "OK" (or OK in part). We also need to consider other methods
> which were "bolted" on, such as {{AbstractFSWAL}} and
> {{WALFileLengthProvider}}. Other corners of "WAL use" (like the
> {{WALSplitter}}) should also be looked at so that they use WAL APIs only.
> We also need to make sure that adequate interface audience and stability
> annotations are chosen.
[jira] [Updated] (HBASE-21200) Memstore flush doesn't finish because of seekToPreviousRow() in memstore scanner.
[ https://issues.apache.org/jira/browse/HBASE-21200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Toshihiro Suzuki updated HBASE-21200: - Attachment: HBASE-21200-UT.patch > Memstore flush doesn't finish because of seekToPreviousRow() in memstore > scanner. > - > > Key: HBASE-21200 > URL: https://issues.apache.org/jira/browse/HBASE-21200 > Project: HBase > Issue Type: Bug > Components: Scanners >Reporter: dongjin2193.jeon >Priority: Major > Attachments: HBASE-21200-UT.patch, RegionServerJstack.log > > > The issue of delaying memstore flush still occurs after backport hbase-15871. > Reverse scan takes a long time to seek previous row in the memstore full of > deleted cells. > > jstack : > "MemStoreFlusher.0" #114 prio=5 os_prio=0 tid=0x7fa3d0729000 nid=0x486a > waiting on condition [0x7fa3b9b6b000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0xa465fe60> (a > java.util.concurrent.locks.ReentrantLock$NonfairSync) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199) > at > java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209) > at > java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285) > at > org.apache.hadoop.hbase.regionserver.*StoreScanner.updateReaders(StoreScanner.java:695)* > at > org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1127) > at > org.apache.hadoop.hbase.regionserver.HStore.updateStorefiles(HStore.java:1106) > at > org.apache.hadoop.hbase.regionserver.HStore.access$600(HStore.java:130) > at > 
org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.commit(HStore.java:2455) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2519) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2256) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2218) > at > org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2110) > at > org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2036) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:501) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:471) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:75) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:259) > at java.lang.Thread.run(Thread.java:748) > > "RpcServer.FifoWFPBQ.default.handler=27,queue=0,port=16020" #65 daemon prio=5 > os_prio=0 tid=0x7fa3e628 nid=0x4801 runnable [0x7fa3bd29a000] > java.lang.Thread.State: RUNNABLE > at > org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner.getNext(DefaultMemStore.java:780) > at > org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner.seekInSubLists(DefaultMemStore.java:826) > - locked <0xb45aa5b8> (a > org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner) > at > org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner.seek(DefaultMemStore.java:818) > - locked <0xb45aa5b8> (a > org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner) > at > org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner.seekToPreviousRow(DefaultMemStore.java:1000) > - locked <0xb45aa5b8> (a > org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner) > at > org.apache.hadoop.hbase.regionserver.ReversedKeyValueHeap.next(ReversedKeyValueHeap.java:136) > 
at > org.apache.hadoop.hbase.regionserver.*StoreScanner.next(StoreScanner.java:629)* > at > org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5876) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6027) > at >
[jira] [Commented] (HBASE-21200) Memstore flush doesn't finish because of seekToPreviousRow() in memstore scanner.
[ https://issues.apache.org/jira/browse/HBASE-21200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630724#comment-16630724 ]

Toshihiro Suzuki commented on HBASE-21200:
--

It seems that an issue similar to HBASE-15871 occurs with the following steps:

1) Create a reversed store scanner.
2) Put a lot of cells that have a sequenceID greater than the readPt of the reverse scanner into the memstore.
3) Call next() on the reverse scanner. At this point, because of 2), a lot of cells in the memstore have a sequenceID greater than the readPt of the reverse scanner. This makes seekToPreviousRow() repeatedly search cells that have already been searched.
4) Flush the memstore. The flush has to wait until 3) has finished before it can update the store files in the same HStore.

I'm attaching a patch to reproduce this issue.

> Memstore flush doesn't finish because of seekToPreviousRow() in memstore
> scanner.
> -
>
> Key: HBASE-21200
> URL: https://issues.apache.org/jira/browse/HBASE-21200
> Project: HBase
> Issue Type: Bug
> Components: Scanners
> Reporter: dongjin2193.jeon
> Priority: Major
> Attachments: RegionServerJstack.log
>
>
> The memstore flush delay still occurs after backporting HBASE-15871.
> A reverse scan takes a long time to seek to the previous row in a memstore full of
> deleted cells.
> > jstack : > "MemStoreFlusher.0" #114 prio=5 os_prio=0 tid=0x7fa3d0729000 nid=0x486a > waiting on condition [0x7fa3b9b6b000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0xa465fe60> (a > java.util.concurrent.locks.ReentrantLock$NonfairSync) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199) > at > java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209) > at > java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285) > at > org.apache.hadoop.hbase.regionserver.*StoreScanner.updateReaders(StoreScanner.java:695)* > at > org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1127) > at > org.apache.hadoop.hbase.regionserver.HStore.updateStorefiles(HStore.java:1106) > at > org.apache.hadoop.hbase.regionserver.HStore.access$600(HStore.java:130) > at > org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.commit(HStore.java:2455) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2519) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2256) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2218) > at > org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2110) > at > org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2036) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:501) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:471) > at > 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:75) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:259) > at java.lang.Thread.run(Thread.java:748) > > "RpcServer.FifoWFPBQ.default.handler=27,queue=0,port=16020" #65 daemon prio=5 > os_prio=0 tid=0x7fa3e628 nid=0x4801 runnable [0x7fa3bd29a000] > java.lang.Thread.State: RUNNABLE > at > org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner.getNext(DefaultMemStore.java:780) > at > org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner.seekInSubLists(DefaultMemStore.java:826) > - locked <0xb45aa5b8> (a > org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner) > at > org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner.seek(DefaultMemStore.java:818) > - locked <0xb45aa5b8> (a > org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner) > at > org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner.seekToPreviousRow(DefaultMemStore.java:1000) > - locked <0xb45aa5b8> (a >
[jira] [Commented] (HBASE-21234) Archive folder not getting cleaned due to SnapshotHFileCleaner error
[ https://issues.apache.org/jira/browse/HBASE-21234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630649#comment-16630649 ] Josh Elser commented on HBASE-21234: [~abhilater] is this actually Apache HBase 1.1.2 or some HDP release that is based on Apache HBase 1.1.2? > Archive folder not getting cleaned due to SnapshotHFileCleaner error > > > Key: HBASE-21234 > URL: https://issues.apache.org/jira/browse/HBASE-21234 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 1.1.2 >Reporter: Abhishek Gupta >Priority: Critical > Labels: cleanup, snapshot, snapshots > > Getting following exception during ChoreService runs in HBase Master logs. As > a result we are accumulating a lot of data in archive folder as archive is > not getting reclaimed. > {code:java} > Caused by: java.lang.NoClassDefFoundError: Could not initialize class > org.apache.hadoop.hbase.protobuf.generated.SnapshotProtos at > org.apache.hadoop.hbase.protobuf.generated.SnapshotProtos$SnapshotRegionManifest.internalGetFieldAccessorTable(SnapshotProtos.java:1190 > {code} > Complete stack-trace > {code:java} > 2018-09-26 10:15:06,188 ERROR [master01,16000,1536315941769_ChoreService_3] > snapshot.SnapshotHFileCleaner: Exception while checking if files were valid, > keeping them just in case. 
java.io.IOException: ExecutionException at > org.apache.hadoop.hbase.snapshot.SnapshotManifestV2.loadRegionManifests(SnapshotManifestV2.java:161) > at > org.apache.hadoop.hbase.snapshot.SnapshotManifest.load(SnapshotManifest.java:364) > at > org.apache.hadoop.hbase.snapshot.SnapshotManifest.open(SnapshotManifest.java:130) > at > org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.visitTableStoreFiles(SnapshotReferenceUtil.java:128) > at > org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.getHFileNames(SnapshotReferenceUtil.java:357) > at > org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.getHFileNames(SnapshotReferenceUtil.java:340) > at > org.apache.hadoop.hbase.master.snapshot.SnapshotHFileCleaner$1.filesUnderSnapshot(SnapshotHFileCleaner.java:88) > at > org.apache.hadoop.hbase.master.snapshot.SnapshotFileCache.getSnapshotsInProgress(SnapshotFileCache.java:303) > at > org.apache.hadoop.hbase.master.snapshot.SnapshotFileCache.getUnreferencedFiles(SnapshotFileCache.java:194) > at > org.apache.hadoop.hbase.master.snapshot.SnapshotHFileCleaner.getDeletableFiles(SnapshotHFileCleaner.java:63) > at > org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteFiles(CleanerChore.java:287) > at > org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteEntries(CleanerChore.java:211) > at > org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteDirectory(CleanerChore.java:234) > at > org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteEntries(CleanerChore.java:206) > at > org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteDirectory(CleanerChore.java:234) > at > org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteEntries(CleanerChore.java:206) > at > org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteDirectory(CleanerChore.java:234) > at > org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteEntries(CleanerChore.java:206) > at > 
org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteDirectory(CleanerChore.java:234) > at > org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteEntries(CleanerChore.java:206) > at > org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteDirectory(CleanerChore.java:234) > at > org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteEntries(CleanerChore.java:206) > at > org.apache.hadoop.hbase.master.cleaner.CleanerChore.chore(CleanerChore.java:130) > at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:185) at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at > java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) Caused by: > java.lang.NoClassDefFoundError: Could not initialize class > org.apache.hadoop.hbase.protobuf.generated.SnapshotProtos at >
[jira] [Commented] (HBASE-21228) Memory leak since AbstractFSWAL caches Thread object and never clean later
[ https://issues.apache.org/jira/browse/HBASE-21228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630580#comment-16630580 ] Hudson commented on HBASE-21228: Results for branch master [build #514 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/514/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/master/514//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/master/514//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/master/514//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > Memory leak since AbstractFSWAL caches Thread object and never clean later > -- > > Key: HBASE-21228 > URL: https://issues.apache.org/jira/browse/HBASE-21228 > Project: HBase > Issue Type: Bug >Affects Versions: 2.1.0, 2.0.2, 1.4.7 >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Major > Fix For: 3.0.0, 1.4.8, 2.1.1, 2.0.3 > > Attachments: HBASE-21228.branch-2.0.001.patch, > HBASE-21228.branch-2.0.002.patch, HBASE-21228.branch-2.0.003.patch > > > In AbstractFSWAL(FSHLog in branch-1), we have a map caches thread and > SyncFutures. > {code:java} > /** >* Map of {@link SyncFuture}s keyed by Handler objects. Used so we reuse > SyncFutures. >* >* TODO: Reuse FSWALEntry's rather than create them anew each time as we do > SyncFutures here. >* >* TODO: Add a FSWalEntry and SyncFuture as thread locals on handlers > rather than have them get >* them from this Map? 
>    */
>   private final ConcurrentMap<Thread, SyncFuture> syncFuturesByHandler;
> {code}
> A colleague of mine found a memory leak caused by this map.
> Every thread that writes to the WAL is cached in this map, and nothing removes
> a thread from the map even after the thread is dead.
> In one of our customer's clusters, we noticed that even though there were no
> requests, the heap of the RS was almost full and CMS GC was triggered every
> second.
> We dumped the heap and found more than 30 thousand threads in the
> Terminated state, all of them cached in the map above.
> Everything referenced by these threads was leaked. Most of the threads are:
> 1. PostOpenDeployTasksThread, which writes the Open Region mark to the WAL.
> 2. hconnection-0x1f838e31-shared--pool, which is used for short-circuit index
> writes (Phoenix); the WAL is written and synced in these threads.
> 3. Index writer threads (Phoenix), which are referenced by
> RegionCoprocessorHost$RegionEnvironment, then by HRegion, and finally by
> PostOpenDeployTasksThread.
> We should turn this map into a thread-local one and let the JVM GC the
> terminated threads for us.
[jira] [Updated] (HBASE-17992) The snapShot TimeoutException causes the cleanerChore thread to fail to complete the archive correctly
[ https://issues.apache.org/jira/browse/HBASE-17992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Elser updated HBASE-17992:
---
    Resolution: Duplicate
    Status: Resolved (was: Patch Available)

Resolving as a duplicate of HBASE-16464

> The snapShot TimeoutException causes the cleanerChore thread to fail to
> complete the archive correctly
> --
>
> Key: HBASE-17992
> URL: https://issues.apache.org/jira/browse/HBASE-17992
> Project: HBase
> Issue Type: Bug
> Components: snapshots
> Affects Versions: 0.98.10, 1.3.0
> Reporter: Bo Cui
> Priority: Major
> Attachments: hbase-17992-0.98.patch, hbase-17992-1.3.patch,
> hbase-17992-master.patch, hbase-17992.patch
>
>
> The problem is that when a snapshot hits a TimeoutException or another
> exception, /hbase/.hbase-snapshot/tmp is not deleted correctly, which
> causes the cleanerChore to fail to complete the archive correctly.
> Modifying the configuration parameter (hbase.snapshot.master.timeout.millis =
> 60) only reduces the probability of the problem occurring.
> So the solution is: when a worker thread throws an exception or a
> TimeoutException occurs, the main thread must wait until all the tasks are finished
> or canceled, and only then may it clear
> /hbase/.hbase-snapshot/tmp/snapshotName. Otherwise a task is likely to still write
> /hbase/.hbase-snapshot/tmp/snapshotName/region-manifest.
> The problem exists in both disabledTableSnapshot and enabledTableSnapshot. Because
> I'm currently using disabledTableSnapshot, I am providing the patch for
> disabledTableSnapshot.
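The ordering that the HBASE-17992 description asks for (wait until every task has finished or been canceled, and only then delete the temporary snapshot directory) can be sketched with plain java.util.concurrent. This is an illustration under assumptions, not the attached patch: SnapshotCleanupSketch, runAndCleanOnFailure, and the use of java.io.File instead of the Hadoop FileSystem API are all hypothetical.

```java
import java.io.File;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: clean the tmp dir only after all workers have stopped. */
final class SnapshotCleanupSketch {

  /** Returns true if every task succeeded; otherwise deletes tmpDir and returns false. */
  static boolean runAndCleanOnFailure(List<Callable<Void>> tasks, File tmpDir, long timeoutMillis) {
    ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, tasks.size()));
    boolean allSucceeded = true;
    try {
      // invokeAll returns when all tasks are done or the timeout expires;
      // tasks that have not completed by then are cancelled (interrupted).
      for (Future<Void> f : pool.invokeAll(tasks, timeoutMillis, TimeUnit.MILLISECONDS)) {
        if (f.isCancelled()) {
          allSucceeded = false;
        } else {
          try {
            f.get();
          } catch (ExecutionException e) {
            allSucceeded = false;
          }
        }
      }
      pool.shutdownNow();
      // A cancelled task may still be running for a moment. Wait until the
      // worker threads have really exited so nothing writes into tmpDir later.
      while (!pool.awaitTermination(1, TimeUnit.SECONDS)) {
        // keep waiting
      }
    } catch (InterruptedException e) {
      pool.shutdownNow();
      Thread.currentThread().interrupt();
      // Workers may still be running, so the directory is left alone here.
      throw new IllegalStateException("interrupted while waiting for snapshot tasks", e);
    }
    if (!allSucceeded) {
      deleteRecursively(tmpDir);
    }
    return allSucceeded;
  }

  static void deleteRecursively(File f) {
    File[] children = f.listFiles(); // null when f is not a directory
    if (children != null) {
      for (File child : children) {
        deleteRecursively(child);
      }
    }
    f.delete();
  }
}
```

The important detail is the awaitTermination loop: cancelling a future only requests an interrupt, so deleting the directory immediately after a timeout could race with a worker that is still writing a manifest file, which is the failure mode the issue describes.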
[jira] [Commented] (HBASE-20766) Verify Replication Tool Has Typo "remove cluster"
[ https://issues.apache.org/jira/browse/HBASE-20766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630356#comment-16630356 ] Hudson commented on HBASE-20766: Results for branch master [build #513 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/513/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/master/513//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/master/513//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/master/513//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > Verify Replication Tool Has Typo "remove cluster" > - > > Key: HBASE-20766 > URL: https://issues.apache.org/jira/browse/HBASE-20766 > Project: HBase > Issue Type: Bug >Reporter: Clay B. >Assignee: Ferran Fernandez Garrido >Priority: Trivial > Labels: beginner > Fix For: 3.0.0, 1.5.0, 2.2.0 > > Attachments: HBASE-20766.master.001.patch > > > The verify replication tool has a trivial typo "remove cluster" instead of > "remote cluster": > https://github.com/apache/hbase/blob/a6eeb26cc0b4d0af3fff50b5b931b6847df1f9d2/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/replication/VerifyReplication.java#L355 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21227) Implement exponential retrying backoff for Assign/UnassignRegionHandler introduced in HBASE-21217
[ https://issues.apache.org/jira/browse/HBASE-21227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630354#comment-16630354 ] Hudson commented on HBASE-21227: Results for branch master [build #513 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/513/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/master/513//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/master/513//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/master/513//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > Implement exponential retrying backoff for Assign/UnassignRegionHandler > introduced in HBASE-21217 > - > > Key: HBASE-21227 > URL: https://issues.apache.org/jira/browse/HBASE-21227 > Project: HBase > Issue Type: Sub-task > Components: amv2, regionserver >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0 > > Attachments: HBASE-21227-v1.patch, HBASE-21227.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21212) Wrong flush time when update flush metric
[ https://issues.apache.org/jira/browse/HBASE-21212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630353#comment-16630353 ] Hudson commented on HBASE-21212: Results for branch master [build #513 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/513/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/master/513//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/master/513//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/master/513//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > Wrong flush time when update flush metric > - > > Key: HBASE-21212 > URL: https://issues.apache.org/jira/browse/HBASE-21212 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0, 2.1.0, 2.0.2 >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Minor > Fix For: 3.0.0, 1.4.8, 2.1.1, 2.0.3 > > Attachments: HBASE-21212.branch-2.0.001.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21232) Show table state in Tables view on Master home page
[ https://issues.apache.org/jira/browse/HBASE-21232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630355#comment-16630355 ] Hudson commented on HBASE-21232: Results for branch master [build #513 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/513/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/master/513//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/master/513//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/master/513//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > Show table state in Tables view on Master home page > --- > > Key: HBASE-21232 > URL: https://issues.apache.org/jira/browse/HBASE-21232 > Project: HBase > Issue Type: Bug > Components: Operability, UI >Affects Versions: 2.1.0 >Reporter: stack >Assignee: stack >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1 > > Attachments: HBASE-21232.branch-2.1.001.patch, table.pdf > > > Add a column to the Tables panel on the Master home page. Useful when trying > to figure if table is enabled/disable/disabling/enabling... -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21228) Memory leak since AbstractFSWAL caches Thread object and never clean later
[ https://issues.apache.org/jira/browse/HBASE-21228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630344#comment-16630344 ] Hudson commented on HBASE-21228: Results for branch branch-2.1 [build #384 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/384/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/384//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/384//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/384//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > Memory leak since AbstractFSWAL caches Thread object and never clean later > -- > > Key: HBASE-21228 > URL: https://issues.apache.org/jira/browse/HBASE-21228 > Project: HBase > Issue Type: Bug >Affects Versions: 2.1.0, 2.0.2, 1.4.7 >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Major > Fix For: 3.0.0, 1.4.8, 2.1.1, 2.0.3 > > Attachments: HBASE-21228.branch-2.0.001.patch, > HBASE-21228.branch-2.0.002.patch, HBASE-21228.branch-2.0.003.patch > > > In AbstractFSWAL(FSHLog in branch-1), we have a map caches thread and > SyncFutures. > {code:java} > /** >* Map of {@link SyncFuture}s keyed by Handler objects. Used so we reuse > SyncFutures. >* >* TODO: Reuse FSWALEntry's rather than create them anew each time as we do > SyncFutures here. >* >* TODO: Add a FSWalEntry and SyncFuture as thread locals on handlers > rather than have them get >* them from this Map? 
>*/ > private final ConcurrentMap<Thread, SyncFuture> syncFuturesByHandler; > {code} > A colleague of mine found a memory leak caused by this map. > Every thread that writes to the WAL is cached in this map, and nothing removes > a thread from the map even after the thread is dead. > In one of our customers' clusters, we noticed that even though there were no > requests, the heap of the RS was almost full and CMS GC was triggered every > second. > We dumped the heap and found more than 30 thousand > threads in the Terminated state, all of them cached in the map above. > Everything referenced by these threads was leaked. Most of the threads are: > 1. PostOpenDeployTasksThread, which writes the Open Region marker to the WAL > 2. hconnection-0x1f838e31-shared--pool, which is used for short-circuit index > writes (Phoenix); the WAL is written and synced in these threads. > 3. Index writer thread (Phoenix), which is referenced by > RegionCoprocessorHost$RegionEnvironment, then by HRegion, and finally > by PostOpenDeployTasksThread. > We should turn this map into a thread-local one and let the JVM GC the terminated > threads for us. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
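The leak described in HBASE-21228 can be reproduced in miniature. The following is a minimal sketch, not the actual AbstractFSWAL code: `FakeSyncFuture`, `SyncFutureCacheSketch`, and all method names are hypothetical stand-ins. It contrasts a map keyed by Thread, which keeps a strong reference to every terminated writer thread, with a ThreadLocal, whose value is stored in the thread itself.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class SyncFutureCacheSketch {
  /** Hypothetical stand-in for HBase's SyncFuture. */
  static final class FakeSyncFuture {}

  // Leaky variant: one entry per writer thread, never removed.
  static final ConcurrentMap<Thread, FakeSyncFuture> BY_THREAD = new ConcurrentHashMap<>();

  // Thread-local variant: the value lives in the thread itself, so nothing
  // in this class keeps a dead thread reachable.
  static final ThreadLocal<FakeSyncFuture> CACHED =
      ThreadLocal.withInitial(FakeSyncFuture::new);

  static FakeSyncFuture fromMap() {
    return BY_THREAD.computeIfAbsent(Thread.currentThread(), t -> new FakeSyncFuture());
  }

  static FakeSyncFuture fromThreadLocal() {
    return CACHED.get();
  }

  /** Runs short-lived writers and counts terminated threads still held by the map. */
  static int terminatedThreadsHeldByMap(int writers) {
    BY_THREAD.clear();
    Thread[] threads = new Thread[writers];
    for (int i = 0; i < writers; i++) {
      threads[i] = new Thread(() -> {
        fromMap();
        fromThreadLocal();
      });
      threads[i].start();
    }
    try {
      for (Thread t : threads) {
        t.join();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    int held = 0;
    for (Thread t : BY_THREAD.keySet()) {
      if (t.getState() == Thread.State.TERMINATED) {
        held++;
      }
    }
    return held;
  }

  public static void main(String[] args) {
    System.out.println("terminated threads still referenced: "
        + terminatedThreadsHeldByMap(4));
  }
}
```

Every writer thread that has exited is still a key in the map, which is the pattern the heap dump showed. The thread-local variant leaves no such reference behind in the cache.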
[jira] [Created] (HBASE-21245) Add exponential backoff when retrying for sync replication related procedures
Duo Zhang created HBASE-21245: - Summary: Add exponential backoff when retrying for sync replication related procedures Key: HBASE-21245 URL: https://issues.apache.org/jira/browse/HBASE-21245 Project: HBase Issue Type: Sub-task Reporter: Duo Zhang -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21244) Skip persistence when retrying for assignment related procedures
Duo Zhang created HBASE-21244: - Summary: Skip persistence when retrying for assignment related procedures Key: HBASE-21244 URL: https://issues.apache.org/jira/browse/HBASE-21244 Project: HBase Issue Type: Sub-task Components: amv2, Performance, proc-v2 Reporter: Duo Zhang Fix For: 3.0.0, 2.2.0 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21233) Allow the procedure implementation to skip persistence of the state after a execution
[ https://issues.apache.org/jira/browse/HBASE-21233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang updated HBASE-21233: -- Attachment: HBASE-21233.patch > Allow the procedure implementation to skip persistence of the state after a > execution > - > > Key: HBASE-21233 > URL: https://issues.apache.org/jira/browse/HBASE-21233 > Project: HBase > Issue Type: Sub-task > Components: Performance, proc-v2 >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1 > > Attachments: HBASE-21233.patch, HBASE-21233.patch > > > Discussed with [~stack] and [~allan163] on HBASE-21035: when retrying we > do not need to persist the procedure state every time, as the retry timeout > is not critical. It is OK if we lose this information and start > from 0 after restarting. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21228) Memory leak since AbstractFSWAL caches Thread object and never clean later
[ https://issues.apache.org/jira/browse/HBASE-21228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630271#comment-16630271 ] Hudson commented on HBASE-21228: Results for branch branch-2.0 [build #871 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/871/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/871//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/871//JDK8_Nightly_Build_Report_(Hadoop2)/] (/) {color:green}+1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/871//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. > Memory leak since AbstractFSWAL caches Thread object and never clean later > -- > > Key: HBASE-21228 > URL: https://issues.apache.org/jira/browse/HBASE-21228 > Project: HBase > Issue Type: Bug >Affects Versions: 2.1.0, 2.0.2, 1.4.7 >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Major > Fix For: 3.0.0, 1.4.8, 2.1.1, 2.0.3 > > Attachments: HBASE-21228.branch-2.0.001.patch, > HBASE-21228.branch-2.0.002.patch, HBASE-21228.branch-2.0.003.patch > > > In AbstractFSWAL(FSHLog in branch-1), we have a map caches thread and > SyncFutures. > {code:java} > /** >* Map of {@link SyncFuture}s keyed by Handler objects. Used so we reuse > SyncFutures. >* >* TODO: Reuse FSWALEntry's rather than create them anew each time as we do > SyncFutures here. >* >* TODO: Add a FSWalEntry and SyncFuture as thread locals on handlers > rather than have them get >* them from this Map? 
>*/ > private final ConcurrentMap<Thread, SyncFuture> syncFuturesByHandler; > {code} > A colleague of mine found a memory leak caused by this map. > Every thread that writes to the WAL is cached in this map, and nothing removes > a thread from the map even after the thread is dead. > In one of our customers' clusters, we noticed that even though there were no > requests, the heap of the RS was almost full and CMS GC was triggered every > second. > We dumped the heap and found more than 30 thousand > threads in the Terminated state, all of them cached in the map above. > Everything referenced by these threads was leaked. Most of the threads are: > 1. PostOpenDeployTasksThread, which writes the Open Region marker to the WAL > 2. hconnection-0x1f838e31-shared--pool, which is used for short-circuit index > writes (Phoenix); the WAL is written and synced in these threads. > 3. Index writer thread (Phoenix), which is referenced by > RegionCoprocessorHost$RegionEnvironment, then by HRegion, and finally > by PostOpenDeployTasksThread. > We should turn this map into a thread-local one and let the JVM GC the terminated > threads for us. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21233) Allow the procedure implementation to skip persistence of the state after a execution
[ https://issues.apache.org/jira/browse/HBASE-21233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630242#comment-16630242 ] Allan Yang commented on HBASE-21233: +1 for the patch. > Allow the procedure implementation to skip persistence of the state after a > execution > - > > Key: HBASE-21233 > URL: https://issues.apache.org/jira/browse/HBASE-21233 > Project: HBase > Issue Type: Sub-task > Components: Performance, proc-v2 >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1 > > Attachments: HBASE-21233.patch > > > Discussed with [~stack] and [~allan163] on HBASE-21035: when retrying we > do not need to persist the procedure state every time, as the retry timeout > is not critical. It is OK if we lose this information and start > from 0 after restarting. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21212) Wrong flush time when update flush metric
[ https://issues.apache.org/jira/browse/HBASE-21212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630181#comment-16630181 ] Hudson commented on HBASE-21212: Results for branch branch-1.4 [build #480 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/480/]: (x) *{color:red}-1 overall{color}* details (if available): (x) {color:red}-1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/480//General_Nightly_Build_Report/] (x) {color:red}-1 jdk7 checks{color} -- For more information [see jdk7 report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/480//JDK7_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/480//JDK8_Nightly_Build_Report_(Hadoop2)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. > Wrong flush time when update flush metric > - > > Key: HBASE-21212 > URL: https://issues.apache.org/jira/browse/HBASE-21212 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0, 2.1.0, 2.0.2 >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Minor > Fix For: 3.0.0, 1.4.8, 2.1.1, 2.0.3 > > Attachments: HBASE-21212.branch-2.0.001.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21212) Wrong flush time when update flush metric
[ https://issues.apache.org/jira/browse/HBASE-21212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630177#comment-16630177 ] Hudson commented on HBASE-21212: Results for branch branch-1 [build #478 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/478/]: (x) *{color:red}-1 overall{color}* details (if available): (x) {color:red}-1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/478//General_Nightly_Build_Report/] (x) {color:red}-1 jdk7 checks{color} -- For more information [see jdk7 report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/478//JDK7_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/478//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 source release artifact{color} -- See build output for details. > Wrong flush time when update flush metric > - > > Key: HBASE-21212 > URL: https://issues.apache.org/jira/browse/HBASE-21212 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0, 2.1.0, 2.0.2 >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Minor > Fix For: 3.0.0, 1.4.8, 2.1.1, 2.0.3 > > Attachments: HBASE-21212.branch-2.0.001.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20766) Verify Replication Tool Has Typo "remove cluster"
[ https://issues.apache.org/jira/browse/HBASE-20766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630178#comment-16630178 ] Hudson commented on HBASE-20766: Results for branch branch-1 [build #478 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/478/]: (x) *{color:red}-1 overall{color}* details (if available): (x) {color:red}-1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/478//General_Nightly_Build_Report/] (x) {color:red}-1 jdk7 checks{color} -- For more information [see jdk7 report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/478//JDK7_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/478//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 source release artifact{color} -- See build output for details. > Verify Replication Tool Has Typo "remove cluster" > - > > Key: HBASE-20766 > URL: https://issues.apache.org/jira/browse/HBASE-20766 > Project: HBase > Issue Type: Bug >Reporter: Clay B. >Assignee: Ferran Fernandez Garrido >Priority: Trivial > Labels: beginner > Fix For: 3.0.0, 1.5.0, 2.2.0 > > Attachments: HBASE-20766.master.001.patch > > > The verify replication tool has a trivial typo "remove cluster" instead of > "remote cluster": > https://github.com/apache/hbase/blob/a6eeb26cc0b4d0af3fff50b5b931b6847df1f9d2/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/replication/VerifyReplication.java#L355 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-20727) Persist FlushedSequenceId to speed up WAL split after cluster restart
[ https://issues.apache.org/jira/browse/HBASE-20727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allan Yang updated HBASE-20727: --- Resolution: Fixed Status: Resolved (was: Patch Available) > Persist FlushedSequenceId to speed up WAL split after cluster restart > - > > Key: HBASE-20727 > URL: https://issues.apache.org/jira/browse/HBASE-20727 > Project: HBase > Issue Type: New Feature >Affects Versions: 2.0.0 >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Major > Fix For: 3.0.0 > > Attachments: HBASE-20727.002.patch, HBASE-20727.003.patch, > HBASE-20727.004.patch, HBASE-20727.005.patch, HBASE-20727.patch > > > We use flushedSequenceIdByRegion and storeFlushedSequenceIdsByRegion in > ServerManager to record the latest flushed seqids of regions and stores. So > during log split, we can use the seqids stored in those maps to filter out the > edits which do not need to be replayed. But those maps are not persisted. > After a cluster restart or master restart, the flushed seqid info is all lost. > Here I offer a way to persist that info to HDFS so that, even if the master restarts, we > can still use it to filter WAL edits and speed up replay. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
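The filtering step that HBASE-20727 relies on can be sketched as follows. This is a minimal illustration, not the ServerManager or WAL splitter code: the `Edit` type and the method names are hypothetical, and it assumes an edit can be skipped when its sequence id is at or below the last flushed sequence id recorded for its region.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FlushedSeqIdFilterSketch {
  /** Hypothetical, simplified WAL edit: a region name plus a sequence id. */
  static final class Edit {
    final String region;
    final long seqId;

    Edit(String region, long seqId) {
      this.region = region;
      this.seqId = seqId;
    }
  }

  /**
   * Keeps only edits newer than the last flushed sequence id of their region.
   * Regions with no recorded id (for example after the map was lost on
   * restart) keep all of their edits.
   */
  static List<Edit> editsToReplay(List<Edit> edits, Map<String, Long> flushedSeqIdByRegion) {
    List<Edit> result = new ArrayList<>();
    for (Edit e : edits) {
      long flushed = flushedSeqIdByRegion.getOrDefault(e.region, -1L);
      if (e.seqId > flushed) {
        result.add(e);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, Long> flushed = new HashMap<>();
    flushed.put("region-a", 10L);
    List<Edit> edits = new ArrayList<>();
    edits.add(new Edit("region-a", 5L));
    edits.add(new Edit("region-a", 11L));
    edits.add(new Edit("region-b", 1L));
    System.out.println("edits to replay: " + editsToReplay(edits, flushed).size());
  }
}
```

With an empty map every edit is replayed, which is the slow path the issue wants to avoid after a restart.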
[jira] [Commented] (HBASE-21235) Rename the closed procedure wal files so that we do not need to call recoverLease when restarting
[ https://issues.apache.org/jira/browse/HBASE-21235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630120#comment-16630120 ] Duo Zhang commented on HBASE-21235: --- It is not as easy as I expected... We have a ProcedureWALFile class that stores the file name, and we use this class to archive the old log files. So when renaming we also need to change the file name held in this class, which may introduce races... Maybe a once-and-for-all solution is to get rid of the current WAL-based procedure store and just use an HRegion... Will be back later... > Rename the closed procedure wal files so that we do not need to call > recoverLease when restarting > - > > Key: HBASE-21235 > URL: https://issues.apache.org/jira/browse/HBASE-21235 > Project: HBase > Issue Type: Sub-task > Components: Performance, proc-v2 >Reporter: Duo Zhang >Priority: Major > > If there are lots of procedure WAL files, recovering the lease will be a time- > consuming operation. Renaming is a possible way to confirm that some files > are already closed when restarting, so we do not need to call recoverLease on > them any more. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21233) Allow the procedure implementation to skip persistence of the state after a execution
[ https://issues.apache.org/jira/browse/HBASE-21233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630078#comment-16630078 ] Duo Zhang commented on HBASE-21233: --- Introduce a 'persist' flag in Procedure to indicate whether we need to persist the procedure after execution. It defaults to true. The implementation can set it to false to avoid persisting, and before each execution we reset it to true. Notice that this can reduce the number of procedure WAL records but cannot fix every problem with retrying. For TRSP it is OK, as holdLock is true, but for other procedures where holdLock is false, after the execution we have to release the lock and persist the 'release lock' action, so skipPersistence here saves one procedure WAL record, but the 'release lock' one is still needed. This is just a framework for skipping persistence. Will open other issues to fix the retrying for specific procedures. > Allow the procedure implementation to skip persistence of the state after a > execution > - > > Key: HBASE-21233 > URL: https://issues.apache.org/jira/browse/HBASE-21233 > Project: HBase > Issue Type: Sub-task > Components: Performance, proc-v2 >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1 > > Attachments: HBASE-21233.patch > > > Discussed with [~stack] and [~allan163] on HBASE-21035: when retrying we > do not need to persist the procedure state every time, as the retry timeout > is not critical. It is OK if we lose this information and start > from 0 after restarting. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
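The 'persist' flag described in the comment on HBASE-21233 can be sketched roughly as below. This is a simplified illustration under stated assumptions, not the patch itself: `SketchProcedure`, `RetryingProcedure`, and the in-memory list standing in for the procedure WAL are all hypothetical, and locking (the holdLock case) is left out entirely.

```java
import java.util.ArrayList;
import java.util.List;

class SkipPersistenceSketch {
  /** Hypothetical, heavily simplified procedure; not the real HBase Procedure class. */
  abstract static class SketchProcedure {
    private boolean persist = true; // defaults to true, as in the comment

    /** Called by an implementation that does not need this step persisted. */
    protected final void skipPersistence() {
      persist = false;
    }

    final void resetPersistence() {
      persist = true;
    }

    final boolean needPersistence() {
      return persist;
    }

    /** Returns true when the procedure is finished. */
    abstract boolean execute();
  }

  /** Fails a fixed number of times; the retry bookkeeping is not worth persisting. */
  static final class RetryingProcedure extends SketchProcedure {
    private int failuresLeft;

    RetryingProcedure(int failuresBeforeSuccess) {
      this.failuresLeft = failuresBeforeSuccess;
    }

    @Override
    boolean execute() {
      if (failuresLeft > 0) {
        failuresLeft--;
        skipPersistence(); // a retry: nothing important changed
        return false;
      }
      return true;
    }
  }

  /** Runs the procedure to completion and returns how many state records were written. */
  static int runAndCountPersists(SketchProcedure proc) {
    List<String> store = new ArrayList<>(); // stand-in for the procedure WAL
    boolean done = false;
    while (!done) {
      proc.resetPersistence(); // reset to true before every execution
      done = proc.execute();
      if (proc.needPersistence()) {
        store.add(done ? "finished" : "in-progress");
      }
    }
    return store.size();
  }

  public static void main(String[] args) {
    System.out.println("records written: " + runAndCountPersists(new RetryingProcedure(3)));
  }
}
```

Resetting the flag before each execution keeps the default safe: a step is only skipped when the implementation asks for that explicitly during that step.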
[jira] [Updated] (HBASE-21233) Allow the procedure implementation to skip persistence of the state after a execution
[ https://issues.apache.org/jira/browse/HBASE-21233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang updated HBASE-21233: -- Assignee: Duo Zhang Status: Patch Available (was: Open) > Allow the procedure implementation to skip persistence of the state after a > execution > - > > Key: HBASE-21233 > URL: https://issues.apache.org/jira/browse/HBASE-21233 > Project: HBase > Issue Type: Sub-task > Components: Performance, proc-v2 >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1 > > Attachments: HBASE-21233.patch > > > Discussed with [~stack] and [~allan163] on HBASE-21035: when retrying we > do not need to persist the procedure state every time, as the retry timeout > is not critical. It is OK if we lose this information and start > from 0 after restarting. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21233) Allow the procedure implementation to skip persistence of the state after a execution
[ https://issues.apache.org/jira/browse/HBASE-21233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang updated HBASE-21233: -- Attachment: HBASE-21233.patch > Allow the procedure implementation to skip persistence of the state after a > execution > - > > Key: HBASE-21233 > URL: https://issues.apache.org/jira/browse/HBASE-21233 > Project: HBase > Issue Type: Sub-task > Components: Performance, proc-v2 >Reporter: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1 > > Attachments: HBASE-21233.patch > > > Discussed with [~stack] and [~allan163] on HBASE-21035: when retrying we > do not need to persist the procedure state every time, as the retry timeout > is not critical. It is OK if we lose this information and start > from 0 after restarting. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21243) Correct java-doc for the method RpcServer.getRemoteAddress()
Nihal Jain created HBASE-21243: -- Summary: Correct java-doc for the method RpcServer.getRemoteAddress() Key: HBASE-21243 URL: https://issues.apache.org/jira/browse/HBASE-21243 Project: HBase Issue Type: Improvement Affects Versions: 2.0.0, 3.0.0 Reporter: Nihal Jain
Correct the java-doc for the method {{RpcServer.getRemoteAddress()}}. Currently it looks like this:
{code:java}
/**
 * @return Address of remote client if a request is ongoing, else null
 */
public static Optional<InetAddress> getRemoteAddress() {
  return getCurrentCall().map(RpcCall::getRemoteAddress);
}
{code}
Contrary to the doc, the method will never return null. Rather, it may return an empty Optional. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
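The point made in HBASE-21243 matters mostly for callers. The sketch below is a self-contained illustration rather than the RpcServer code: the `CURRENT_REMOTE` thread-local is a hypothetical stand-in for the current call, and `describeCaller` is an invented helper. It shows a javadoc that matches the Optional contract and a caller that handles the empty case without a null check.

```java
import java.net.InetAddress;
import java.util.Optional;

class RemoteAddressSketch {
  // Hypothetical stand-in for the "current call" tracked per handler thread.
  private static final ThreadLocal<InetAddress> CURRENT_REMOTE = new ThreadLocal<>();

  /**
   * @return address of the remote client if a request is ongoing, else an
   *         empty Optional (never null)
   */
  static Optional<InetAddress> getRemoteAddress() {
    return Optional.ofNullable(CURRENT_REMOTE.get());
  }

  /** Simulates entering (non-null address) or leaving (null) a request. */
  static void setCurrentRemote(InetAddress address) {
    if (address == null) {
      CURRENT_REMOTE.remove();
    } else {
      CURRENT_REMOTE.set(address);
    }
  }

  /** A caller that relies on the Optional contract instead of a null check. */
  static String describeCaller() {
    return getRemoteAddress()
        .map(InetAddress::getHostAddress)
        .orElse("no request in progress");
  }

  public static void main(String[] args) {
    System.out.println(describeCaller());
    setCurrentRemote(InetAddress.getLoopbackAddress());
    System.out.println(describeCaller());
    setCurrentRemote(null);
  }
}
```

A caller written against the old javadoc would compare the result with null, a check that can never succeed here, which is why the wording of the doc is worth fixing.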
[jira] [Updated] (HBASE-20734) Colocate recovered edits directory with hbase.wal.dir
[ https://issues.apache.org/jira/browse/HBASE-20734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reid Chan updated HBASE-20734: -- Resolution: Resolved Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) > Colocate recovered edits directory with hbase.wal.dir > - > > Key: HBASE-20734 > URL: https://issues.apache.org/jira/browse/HBASE-20734 > Project: HBase > Issue Type: Improvement > Components: MTTR, Recovery, wal >Reporter: Ted Yu >Assignee: Zach York >Priority: Major > Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.8, 2.1.1 > > Attachments: HBASE-20734.branch-1.001.patch, > HBASE-20734.branch-1.002.patch, HBASE-20734.branch-1.003.patch, > HBASE-20734.branch-1.004.patch, HBASE-20734.branch-1.005.patch, > HBASE-20734.master.001.patch, HBASE-20734.master.002.patch, > HBASE-20734.master.003.patch, HBASE-20734.master.004.patch, > HBASE-20734.master.005.patch, HBASE-20734.master.006.patch, > HBASE-20734.master.007.patch, HBASE-20734.master.008.patch, > HBASE-20734.master.009.patch, HBASE-20734.master.010.patch, > HBASE-20734.master.011.patch, HBASE-20734.master.012.patch > > > During investigation of HBASE-20723, I realized that we wouldn't get the best > performance when hbase.wal.dir is configured to be on different (fast) media > than hbase rootdir w.r.t. recovered edits since recovered edits directory is > currently under rootdir. > Such setup may not result in fast recovery when there is region server > failover. > This issue is to find proper (hopefully backward compatible) way in > colocating recovered edits directory with hbase.wal.dir . -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-20734) Colocate recovered edits directory with hbase.wal.dir
[ https://issues.apache.org/jira/browse/HBASE-20734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reid Chan updated HBASE-20734: -- Release Note: Previously the recovered.edits directory was under the root directory. This JIRA moves the recovered.edits directory to be under the hbase.wal.dir if set. It also adds a check for any recovered.edits found under the root directory for backwards compatibility. This gives improvements when a faster media(like SSD) or more local FileSystem is used for the hbase.wal.dir than the root dir. (was: Previously the recovered.edits directory was under the root directory. This JIRA moves the recovered.edits directory to be under the hbase.wal.dir if set. It also adds a check for any recovered.edits found under the root directory for backwards compatibility. This gives improvements when a faster or more local FileSystem is used for the hbase.wal.dir than the root dir.) > Colocate recovered edits directory with hbase.wal.dir > - > > Key: HBASE-20734 > URL: https://issues.apache.org/jira/browse/HBASE-20734 > Project: HBase > Issue Type: Improvement > Components: MTTR, Recovery, wal >Reporter: Ted Yu >Assignee: Zach York >Priority: Major > Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.8, 2.1.1 > > Attachments: HBASE-20734.branch-1.001.patch, > HBASE-20734.branch-1.002.patch, HBASE-20734.branch-1.003.patch, > HBASE-20734.branch-1.004.patch, HBASE-20734.branch-1.005.patch, > HBASE-20734.master.001.patch, HBASE-20734.master.002.patch, > HBASE-20734.master.003.patch, HBASE-20734.master.004.patch, > HBASE-20734.master.005.patch, HBASE-20734.master.006.patch, > HBASE-20734.master.007.patch, HBASE-20734.master.008.patch, > HBASE-20734.master.009.patch, HBASE-20734.master.010.patch, > HBASE-20734.master.011.patch, HBASE-20734.master.012.patch > > > During investigation of HBASE-20723, I realized that we wouldn't get the best > performance when hbase.wal.dir is configured to be on different (fast) media > than hbase rootdir w.r.t. 
recovered edits since recovered edits directory is > currently under rootdir. > Such setup may not result in fast recovery when there is region server > failover. > This issue is to find proper (hopefully backward compatible) way in > colocating recovered edits directory with hbase.wal.dir . -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-21218) TableStateNotFoundException thrown from RSGroupAdminEndpoint#postCreateTable when creating table
[ https://issues.apache.org/jira/browse/HBASE-21218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nihal Jain resolved HBASE-21218. Resolution: Duplicate > TableStateNotFoundException thrown from RSGroupAdminEndpoint#postCreateTable > when creating table > > > Key: HBASE-21218 > URL: https://issues.apache.org/jira/browse/HBASE-21218 > Project: HBase > Issue Type: Bug > Components: rsgroup >Reporter: Guangxu Cheng >Assignee: Guangxu Cheng >Priority: Major > Attachments: HBASE-21218.master.001.patch > > > Similar to HBASE-19509, I found the following logs in master log when > creating table > {code} > 2018-09-21 15:14:47,476 ERROR > [RpcServer.default.FPBQ.Fifo.handler=296,queue=26,port=16000] > master.TableStateManager: Unable to get table t3 state > org.apache.hadoop.hbase.master.TableStateManager$TableStateNotFoundException: > t3 > at > org.apache.hadoop.hbase.master.TableStateManager.getTableState(TableStateManager.java:215) > at > org.apache.hadoop.hbase.master.TableStateManager.isTableState(TableStateManager.java:147) > at > org.apache.hadoop.hbase.master.assignment.AssignmentManager.isTableDisabled(AssignmentManager.java:344) > at > org.apache.hadoop.hbase.rsgroup.RSGroupAdminServer.moveTables(RSGroupAdminServer.java:412) > at > org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint.assignTableToGroup(RSGroupAdminEndpoint.java:471) > at > org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint.postCreateTable(RSGroupAdminEndpoint.java:494) > at > org.apache.hadoop.hbase.master.MasterCoprocessorHost$12.call(MasterCoprocessorHost.java:335) > at > org.apache.hadoop.hbase.master.MasterCoprocessorHost$12.call(MasterCoprocessorHost.java:332) > at > org.apache.hadoop.hbase.coprocessor.CoprocessorHost$ObserverOperationWithoutResult.callObserver(CoprocessorHost.java:540) > at > org.apache.hadoop.hbase.coprocessor.CoprocessorHost.execOperation(CoprocessorHost.java:614) > at > 
org.apache.hadoop.hbase.master.MasterCoprocessorHost.postCreateTable(MasterCoprocessorHost.java:332) > at org.apache.hadoop.hbase.master.HMaster$3.run(HMaster.java:1929) > at > org.apache.hadoop.hbase.master.procedure.MasterProcedureUtil.submitProcedure(MasterProcedureUtil.java:131) > at > org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1911) > at > org.apache.hadoop.hbase.master.MasterRpcServices.createTable(MasterRpcServices.java:628) > at > org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java) > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304) > {code} > > In fact, we only need to change the information of rsgroup without moving > region. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21228) Memory leak since AbstractFSWAL caches Thread object and never clean later
[ https://issues.apache.org/jira/browse/HBASE-21228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allan Yang updated HBASE-21228:
---
       Resolution: Fixed
    Fix Version/s: 2.0.3, 2.1.1, 1.4.8, 3.0.0
           Status: Resolved  (was: Patch Available)

> Memory leak since AbstractFSWAL caches Thread object and never clean later
> ---
>
>                 Key: HBASE-21228
>                 URL: https://issues.apache.org/jira/browse/HBASE-21228
>             Project: HBase
>          Issue Type: Bug
>   Affects Versions: 2.1.0, 2.0.2, 1.4.7
>            Reporter: Allan Yang
>            Assignee: Allan Yang
>            Priority: Major
>             Fix For: 3.0.0, 1.4.8, 2.1.1, 2.0.3
>
>         Attachments: HBASE-21228.branch-2.0.001.patch, HBASE-21228.branch-2.0.002.patch, HBASE-21228.branch-2.0.003.patch
>
>
> In AbstractFSWAL (FSHLog in branch-1), we have a map that caches threads and their SyncFutures.
> {code:java}
>   /**
>    * Map of {@link SyncFuture}s keyed by Handler objects. Used so we reuse SyncFutures.
>    *
>    * TODO: Reuse FSWALEntry's rather than create them anew each time as we do SyncFutures here.
>    *
>    * TODO: Add a FSWalEntry and SyncFuture as thread locals on handlers rather than have them get
>    * them from this Map?
>    */
>   private final ConcurrentMap<Thread, SyncFuture> syncFuturesByHandler;
> {code}
> A colleague of mine found a memory leak caused by this map.
> Every thread that writes to the WAL is cached in this map, and nothing removes a thread from the map even after the thread has died.
> In one of our customer's clusters, we noticed that even though there were no requests, the heap of the RS was almost full and CMS GC was triggered every second.
> We dumped the heap and found more than 30 thousand threads in the Terminated state, all of them cached in the map above. Everything referenced by these threads was leaked. Most of the threads are:
> 1. PostOpenDeployTasksThread, which writes the Open Region marker to the WAL.
> 2. hconnection-0x1f838e31-shared--pool, which is used to write index short circuit (Phoenix); the WAL is written and synced in these threads.
> 3. Index writer threads (Phoenix), which are referenced by RegionCoprocessorHost$RegionEnvironment, then by HRegion, and finally by PostOpenDeployTasksThread.
> We should turn this map into a thread local and let the JVM GC the terminated threads for us.
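The proposal in the description above (replace the Thread-keyed map with a thread local) can be sketched roughly as follows. This is a minimal illustration and not the committed patch: the `SyncFuture` class here is a hypothetical stand-in that only carries a transaction id, and the names `getFromMap`, `getFromThreadLocal` and `cachedSyncFuture` are assumptions made for the example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class SyncFutureCacheSketch {
    /** Hypothetical stand-in for HBase's SyncFuture; it only carries a transaction id. */
    public static final class SyncFuture {
        private long txid;

        public SyncFuture reset(long txid) {
            this.txid = txid;
            return this;
        }

        public long getTxid() {
            return txid;
        }
    }

    /** Leaky variant: entries keyed by Thread are never removed, so dead threads stay reachable. */
    public static final ConcurrentMap<Thread, SyncFuture> syncFuturesByHandler =
        new ConcurrentHashMap<>();

    public static SyncFuture getFromMap(long txid) {
        return syncFuturesByHandler
            .computeIfAbsent(Thread.currentThread(), t -> new SyncFuture())
            .reset(txid);
    }

    /** Thread-local variant: the cached value belongs to the thread and can be collected with it. */
    public static final ThreadLocal<SyncFuture> cachedSyncFuture =
        ThreadLocal.withInitial(SyncFuture::new);

    public static SyncFuture getFromThreadLocal(long txid) {
        return cachedSyncFuture.get().reset(txid);
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulate short-lived writer threads, each touching both caches once.
        for (int i = 0; i < 100; i++) {
            final long txid = i;
            Thread t = new Thread(() -> {
                getFromMap(txid);
                getFromThreadLocal(txid);
            });
            t.start();
            t.join();
        }
        // Every terminated thread is still strongly referenced as a key in the map.
        System.out.println("map entries for dead threads: " + syncFuturesByHandler.size());
    }
}
```

Both variants reuse one SyncFuture per thread; the difference is only in who holds the reference. The map keeps each Thread object (and everything it references) alive for as long as the WAL instance lives, while the thread-local value has no owner other than the thread itself.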
[jira] [Commented] (HBASE-21228) Memory leak since AbstractFSWAL caches Thread object and never clean later
[ https://issues.apache.org/jira/browse/HBASE-21228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629953#comment-16629953 ]

Allan Yang commented on HBASE-21228:
---

Pushed to branch-1+, thanks all for reviewing!

> Memory leak since AbstractFSWAL caches Thread object and never clean later
> ---
>
>                 Key: HBASE-21228
>                 URL: https://issues.apache.org/jira/browse/HBASE-21228
>             Project: HBase
>          Issue Type: Bug
>   Affects Versions: 2.1.0, 2.0.2, 1.4.7
>            Reporter: Allan Yang
>            Assignee: Allan Yang
>            Priority: Major
>         Attachments: HBASE-21228.branch-2.0.001.patch, HBASE-21228.branch-2.0.002.patch, HBASE-21228.branch-2.0.003.patch
[jira] [Updated] (HBASE-20734) Colocate recovered edits directory with hbase.wal.dir
[ https://issues.apache.org/jira/browse/HBASE-20734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zach York updated HBASE-20734:
---
    Release Note: Previously the recovered.edits directory was under the root directory. This JIRA moves the recovered.edits directory to be under the hbase.wal.dir if set. It also adds a check for any recovered.edits found under the root directory for backwards compatibility. This gives improvements when a faster or more local FileSystem is used for the hbase.wal.dir than the root dir.  (was: Previously the recovered.edits directory was under the root directory. This JIRA moves the recovered.edits directory to be under the hbase.wal.dir if set. It also adds a check for any recovered.edits found under the root directory for backwards compatibility.)

> Colocate recovered edits directory with hbase.wal.dir
> ---
>
>                 Key: HBASE-20734
>                 URL: https://issues.apache.org/jira/browse/HBASE-20734
>             Project: HBase
>          Issue Type: Improvement
>          Components: MTTR, Recovery, wal
>            Reporter: Ted Yu
>            Assignee: Zach York
>            Priority: Major
>             Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.8, 2.1.1
>
>         Attachments: HBASE-20734.branch-1.001.patch, HBASE-20734.branch-1.002.patch, HBASE-20734.branch-1.003.patch, HBASE-20734.branch-1.004.patch, HBASE-20734.branch-1.005.patch, HBASE-20734.master.001.patch, HBASE-20734.master.002.patch, HBASE-20734.master.003.patch, HBASE-20734.master.004.patch, HBASE-20734.master.005.patch, HBASE-20734.master.006.patch, HBASE-20734.master.007.patch, HBASE-20734.master.008.patch, HBASE-20734.master.009.patch, HBASE-20734.master.010.patch, HBASE-20734.master.011.patch, HBASE-20734.master.012.patch
>
>
> While investigating HBASE-20723, I realized that we do not get the best performance for recovered edits when hbase.wal.dir is configured to be on different (faster) media than the hbase rootdir, because the recovered edits directory is currently under rootdir.
> Such a setup may not result in fast recovery when a region server fails over.
> This issue is to find a proper (hopefully backward compatible) way to colocate the recovered edits directory with hbase.wal.dir.
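The behaviour described in the release note above (write recovered edits under hbase.wal.dir when it is set, and still check the root directory for backwards compatibility) might look roughly like the sketch below. It is an assumption-laden illustration, not the patch itself: the class and method names are hypothetical, configuration is modelled as a plain `Map` rather than a Hadoop `Configuration`, and the region path used in `main` is made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RecoveredEditsDirSketch {
    static final String WAL_DIR_KEY = "hbase.wal.dir";
    static final String ROOT_DIR_KEY = "hbase.rootdir";
    static final String RECOVERED_EDITS = "recovered.edits";

    /** Where new recovered edits go: under hbase.wal.dir if set, otherwise under hbase.rootdir. */
    public static String writeDir(Map<String, String> conf, String regionPath) {
        String base = conf.getOrDefault(WAL_DIR_KEY, conf.get(ROOT_DIR_KEY));
        return base + "/" + regionPath + "/" + RECOVERED_EDITS;
    }

    /** Directories to scan when replaying: the current location first, then the legacy rootdir one. */
    public static List<String> readDirs(Map<String, String> conf, String regionPath) {
        List<String> dirs = new ArrayList<>();
        dirs.add(writeDir(conf, regionPath));
        String legacy = conf.get(ROOT_DIR_KEY) + "/" + regionPath + "/" + RECOVERED_EDITS;
        if (!dirs.contains(legacy)) {
            dirs.add(legacy); // backwards compatibility: edits written before the move
        }
        return dirs;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(ROOT_DIR_KEY, "hdfs://nn/hbase");
        conf.put(WAL_DIR_KEY, "hdfs://fast/wal");
        // Illustrative region path only.
        String region = "data/default/t1/abc123";
        System.out.println("write to: " + writeDir(conf, region));
        System.out.println("read from: " + readDirs(conf, region));
    }
}
```

When hbase.wal.dir is unset, both methods collapse to the single rootdir location, so existing deployments would see no change in this sketch.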