[jira] [Created] (HBASE-22454) refactor WALSplitter
Jingyun Tian created HBASE-22454: Summary: refactor WALSplitter Key: HBASE-22454 URL: https://issues.apache.org/jira/browse/HBASE-22454 Project: HBase Issue Type: Improvement Reporter: Jingyun Tian Assignee: Jingyun Tian WALSplitter is more than 2,000 lines right now, which makes it hard to read and understand. It contains several non-trivial inner classes and many static methods. My plan is to extract those inner classes into their own files and move the static methods to a new utility class. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-22234) Fix flaky TestHbck#testRecoverSplitAfterMetaUpdated
Jingyun Tian created HBASE-22234: Summary: Fix flaky TestHbck#testRecoverSplitAfterMetaUpdated Key: HBASE-22234 URL: https://issues.apache.org/jira/browse/HBASE-22234 Project: HBase Issue Type: Bug Reporter: Jingyun Tian
[jira] [Created] (HBASE-22061) SplitTableRegionProcedure should hold the lock of its daughter regions
Jingyun Tian created HBASE-22061: Summary: SplitTableRegionProcedure should hold the lock of its daughter regions Key: HBASE-22061 URL: https://issues.apache.org/jira/browse/HBASE-22061 Project: HBase Issue Type: Bug Reporter: Jingyun Tian Currently SplitTableRegionProcedure only holds the lock of the parent region. But while this procedure is running, after the daughter regions have been added to meta, other procedures can grab their locks, which is a situation we don't want to see. So I think SplitTableRegionProcedure should hold the locks of the parent region and its daughter regions, as MergeTableRegionsProcedure does.
[jira] [Created] (HBASE-22049) getReopenStatus() didn't skip counting split parent region
Jingyun Tian created HBASE-22049: Summary: getReopenStatus() didn't skip counting split parent region Key: HBASE-22049 URL: https://issues.apache.org/jira/browse/HBASE-22049 Project: HBase Issue Type: Bug Reporter: Jingyun Tian Assignee: Jingyun Tian After we modify some attributes of a table, HBaseAdmin calls getAlterStatus to check whether all regions' attributes have been updated. It skips opened regions and split regions, as the following code shows. {code} for (RegionState regionState: states) { if (!regionState.isOpened() && !regionState.isSplit()) { ritCount++; } } {code} But since the split procedure now unassigns the split parent region, its state is CLOSED rather than SPLIT, so the check hangs until it times out.
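A minimal, self-contained sketch of the counting loop above and the direction of the fix. The `RegionState`, `isOpened()`, and `isSplit()` names mirror the snippet; `isSplitParent()` is a hypothetical helper, not the committed patch:

```java
import java.util.List;

// Simplified model of the getReopenStatus() counting loop. A CLOSED split
// parent will never reopen, so it must also be skipped, otherwise the
// caller waits on it until timeout.
public class ReopenStatusSketch {
    public enum State { OPEN, CLOSED, SPLIT }

    public static final class RegionState {
        final State state;
        final boolean splitParent; // a CLOSED parent left behind by a split

        public RegionState(State state, boolean splitParent) {
            this.state = state;
            this.splitParent = splitParent;
        }
        boolean isOpened()      { return state == State.OPEN; }
        boolean isSplit()       { return state == State.SPLIT; }
        boolean isSplitParent() { return splitParent; }
    }

    // Count regions still in transition, skipping opened regions, split
    // regions, and (the added condition) closed split-parent regions.
    public static int ritCount(List<RegionState> states) {
        int ritCount = 0;
        for (RegionState rs : states) {
            if (!rs.isOpened() && !rs.isSplit() && !rs.isSplitParent()) {
                ritCount++;
            }
        }
        return ritCount;
    }
}
```

With only the original two conditions, a CLOSED split parent would be counted forever; the extra check lets the alter-status loop terminate.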
[jira] [Created] (HBASE-21966) Fix region holes, overlaps, and other region related errors
Jingyun Tian created HBASE-21966: Summary: Fix region holes, overlaps, and other region related errors Key: HBASE-21966 URL: https://issues.apache.org/jira/browse/HBASE-21966 Project: HBase Issue Type: Sub-task Reporter: Jingyun Tian Assignee: Jingyun Tian
[jira] [Reopened] (HBASE-21965) Fix failed split and merge transactions that have failed to roll back
[ https://issues.apache.org/jira/browse/HBASE-21965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian reopened HBASE-21965: -- > Fix failed split and merge transactions that have failed to roll back > - > > Key: HBASE-21965 > URL: https://issues.apache.org/jira/browse/HBASE-21965 > Project: HBase > Issue Type: Sub-task > Reporter: Jingyun Tian > Assignee: Jingyun Tian >Priority: Major >
[jira] [Resolved] (HBASE-21965) Fix failed split and merge transactions that have failed to roll back
[ https://issues.apache.org/jira/browse/HBASE-21965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian resolved HBASE-21965. -- Resolution: Invalid > Fix failed split and merge transactions that have failed to roll back > - > > Key: HBASE-21965 > URL: https://issues.apache.org/jira/browse/HBASE-21965 > Project: HBase > Issue Type: Task > Reporter: Jingyun Tian > Assignee: Jingyun Tian >Priority: Major >
[jira] [Created] (HBASE-21965) Fix failed split and merge transactions that have failed to roll back
Jingyun Tian created HBASE-21965: Summary: Fix failed split and merge transactions that have failed to roll back Key: HBASE-21965 URL: https://issues.apache.org/jira/browse/HBASE-21965 Project: HBase Issue Type: Task Reporter: Jingyun Tian Assignee: Jingyun Tian
[jira] [Created] (HBASE-21934) SplitWALProcedure get stuck during ITBLL
Jingyun Tian created HBASE-21934: Summary: SplitWALProcedure get stuck during ITBLL Key: HBASE-21934 URL: https://issues.apache.org/jira/browse/HBASE-21934 Project: HBase Issue Type: Sub-task Reporter: Jingyun Tian Assignee: Jingyun Tian I encountered a problem where the master assigned a SplitWALRemoteProcedure to a region server whose log says it failed to recover the lease of the WAL file, and that region server was then killed by ChaosMonkey. As a result, the procedure never times out and hangs there forever.
Re: [ANNOUNCE] Please welcome Peter Somogyi to the HBase PMC
Congratulations Peter! Best Regards, Jingyun Tian On Tue, Jan 22, 2019 at 10:15 AM Allan Yang wrote: > Congratulations Peter! > Best Regards > Allan Yang > > > Pankaj kr wrote on Tue, Jan 22, 2019 at 9:49 AM: > > > > > Congratulations Peter...!!! > > > > Regards, > > Pankaj > > > > -- > > Pankaj Kumar > > M: +91-9535197664(India Contact Number) > > E: pankaj...@huawei.com > > 2012 Laboratories - Bangalore Research Institute, IT BU Branch Dept., HTIPL > > From:Duo Zhang > > To:HBase Dev List ;hbase-user < > u...@hbase.apache.org > > > > > Date:2019-01-22 07:06:43 > > Subject:[ANNOUNCE] Please welcome Peter Somogyi to the HBase PMC > > > > On behalf of the Apache HBase PMC I am pleased to announce that Peter > > Somogyi > > has accepted our invitation to become a PMC member on the Apache HBase > > project. > > We appreciate Peter stepping up to take more responsibility in the HBase > > project. > > > > Please join me in welcoming Peter to the HBase PMC! > > >
[jira] [Created] (HBASE-21730) Update hbase-book with the procedure based WAL splitting
Jingyun Tian created HBASE-21730: Summary: Update hbase-book with the procedure based WAL splitting Key: HBASE-21730 URL: https://issues.apache.org/jira/browse/HBASE-21730 Project: HBase Issue Type: Sub-task Reporter: Jingyun Tian
[jira] [Created] (HBASE-21729) Extract ProcedureCoordinatorRpcs and ProcedureMemberRpcs from CoordinatedStateManager
Jingyun Tian created HBASE-21729: Summary: Extract ProcedureCoordinatorRpcs and ProcedureMemberRpcs from CoordinatedStateManager Key: HBASE-21729 URL: https://issues.apache.org/jira/browse/HBASE-21729 Project: HBase Issue Type: Sub-task Reporter: Jingyun Tian Assignee: Jingyun Tian If procedure-v2 based WAL splitting is enabled, CoordinatedStateManager will not be initialized, so ProcedureCoordinatorRpcs and ProcedureMemberRpcs are unavailable and backup does not work. Let me extract these two into a separate class.
[jira] [Created] (HBASE-21647) Add status track for splitting WAL tasks
Jingyun Tian created HBASE-21647: Summary: Add status track for splitting WAL tasks Key: HBASE-21647 URL: https://issues.apache.org/jira/browse/HBASE-21647 Project: HBase Issue Type: Sub-task Reporter: Jingyun Tian Add status tracking to help operators check the status of WAL splitting tasks.
[jira] [Created] (HBASE-21588) Procedure v2 wal splitting implementation
Jingyun Tian created HBASE-21588: Summary: Procedure v2 wal splitting implementation Key: HBASE-21588 URL: https://issues.apache.org/jira/browse/HBASE-21588 Project: HBase Issue Type: Sub-task Reporter: Jingyun Tian Assignee: Jingyun Tian Create a sub-task for the implementation of procedure-v2 based WAL splitting.
[jira] [Created] (HBASE-21565) Delete dead server from dead server list too early leads to concurrent Server Crash Procedures(SCP) for a same server
Jingyun Tian created HBASE-21565: Summary: Delete dead server from dead server list too early leads to concurrent Server Crash Procedures(SCP) for a same server Key: HBASE-21565 URL: https://issues.apache.org/jira/browse/HBASE-21565 Project: HBase Issue Type: Bug Reporter: Jingyun Tian Assignee: Jingyun Tian Two SCPs can be scheduled for the same server during a cluster restart: one is triggered by ZK session timeout, the other when a new server reports in and causes failover of the stale one. The only barrier between these two SCPs is the check of whether the server is already in the dead server list. {code} if (this.deadservers.isDeadServer(serverName)) { LOG.warn("Expiration called on {} but crash processing already in progress", serverName); return false; } {code} But the problem is that when the master finishes initialization, it deletes all stale servers from the dead server list. Thus when the SCP for the ZK session timeout comes in, the barrier has already been removed. Here are the logs showing how this problem occurs. {code} 2018-12-07,11:42:37,589 INFO org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Start pid=9, state=RUNNABLE:SERVER_CRASH_START, hasLock=true; ServerCrashProcedure server=c4-hadoop-tst-st27.bj,29100,1544153846859, splitWal=true, meta=false 2018-12-07,11:42:58,007 INFO org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Start pid=444, state=RUNNABLE:SERVER_CRASH_START, hasLock=true; ServerCrashProcedure server=c4-hadoop-tst-st27.bj,29100,1544153846859, splitWal=true, meta=false {code} Now we can see two SCPs are scheduled for the same server, and the first procedure finishes only after the second SCP has started. {code} 2018-12-07,11:43:08,038 INFO org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=9, state=SUCCESS, hasLock=false; ServerCrashProcedure server=c4-hadoop-tst-st27.bj,29100,1544153846859, splitWal=true, meta=false in 30.5340sec {code} This leads to regions being assigned twice. 
{code} 2018-12-07,12:16:33,039 WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager: rit=OPEN, location=c4-hadoop-tst-st28.bj,29100,1544154149607, table=test_failover, region=459b3130b40caf3b8f3e1421766f4089 reported OPEN on server=c4-hadoop-tst-st29.bj,29100,1544154149615 but state has otherwise {code} And here we can see the server was removed from the dead server list before the second SCP started. {code} 2018-12-07,11:42:44,938 DEBUG org.apache.hadoop.hbase.master.DeadServer: Removed c4-hadoop-tst-st27.bj,29100,1544153846859 ; numProcessing=3 {code} Thus we should not delete a dead server from the dead server list immediately. A patch to fix this problem will be uploaded later.
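The barrier described above can be sketched as follows. This is a minimal, hypothetical model (class and method names are illustrative, not the committed HBASE-21565 patch): a server stays in the dead-server set until its SCP has finished, so a second expiration for the same server is rejected.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: keep the server in the dead-server list until its
// ServerCrashProcedure completes, so a duplicate expiration hits the
// barrier instead of scheduling a second SCP.
public class DeadServerSketch {
    private final Set<String> deadServers = new HashSet<>();

    /** Returns true if a new SCP may be scheduled for this server. */
    public boolean expireServer(String serverName) {
        // add() returns false if the server is already listed, i.e.
        // crash processing is already in progress.
        return deadServers.add(serverName);
    }

    /** Called only once the SCP for this server has finished. */
    public void finishProcessing(String serverName) {
        deadServers.remove(serverName);
    }
}
```

The bug in the report is equivalent to calling `finishProcessing` at master-initialization time, before the first SCP is done, which lets the second `expireServer` call succeed.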
Re: [ANNOUNCE] Allan Yang joins the Apache HBase PMC
Congratulations! Allan! On Thu, Nov 29, 2018 at 9:42 AM OpenInx wrote: > Congratulations, Allan! > > On Thu, Nov 29, 2018 at 6:58 AM Zach York > wrote: > > > Congratulations and welcome Allan! > > > > > > On Wed, Nov 28, 2018 at 10:21 AM Esteban Gutierrez > > wrote: > > > > > Congratulations, Allan! > > > > > > -- > > > Cloudera, Inc. > > > > > > > > > > > > On Wed, Nov 28, 2018 at 10:11 AM Yu Li wrote: > > > > > > > On behalf of the Apache HBase PMC I am pleased to announce that Allan > > > Yang > > > > has accepted our invitation to become a PMC member on the Apache > HBase > > > > project. We appreciate Allan stepping up to take more responsibility > in > > > the > > > > HBase project. > > > > > > > > Please join me in welcoming Allan to the HBase PMC! > > > > > > > > Best Regards, > > > > Yu > > > > > > > > > >
Re: [ANNOUNCE] New HBase committer Jingyun Tian
Thank you all! Sincerely, Jingyun Tian On Wed, Nov 14, 2018 at 8:59 AM stack wrote: > Welcome jingyun. > S > > On Mon, Nov 12, 2018, 11:54 PM 张铎(Duo Zhang) > > On behalf of the Apache HBase PMC, I am pleased to announce that Jingyun > > Tian has accepted the PMC's invitation to become a committer on the > > project. We appreciate all of Jingyun's generous contributions thus far > and > > look forward to his continued involvement. > > > > Congratulations and welcome, Jingyun! > > >
[jira] [Created] (HBASE-21437) Bypassed procedure throw IllegalArgumentException when its state is WAITING_TIMEOUT
Jingyun Tian created HBASE-21437: Summary: Bypassed procedure throw IllegalArgumentException when its state is WAITING_TIMEOUT Key: HBASE-21437 URL: https://issues.apache.org/jira/browse/HBASE-21437 Project: HBase Issue Type: Bug Reporter: Jingyun Tian Assignee: Jingyun Tian {code} 2018-11-05,18:25:52,735 WARN org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Worker terminating UNNATURALLY null java.lang.IllegalArgumentException: NOT RUNNABLE! pid=3, state=WAITING_TIMEOUT:REGION_STATE_TRANSITION_CLOSE, hasLock=true, bypass=true; TransitRegionStateProcedure table=test_failover, region=1bb029ba4ec03b92061be5c4329d2096, UNASSIGN at org.apache.hbase.thirdparty.com.google.common.base.Preconditions.checkArgument(Preconditions.java:134) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1620) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1384) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1100(ProcedureExecutor.java:78) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1948) 2018-11-05,18:25:52,736 TRACE org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Worker terminated. {code} When we bypass a WAITING_TIMEOUT procedure and resubmit it, its state is still WAITING_TIMEOUT; when the executor then runs the procedure, it throws this exception and the worker terminates.
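One plausible fix direction, sketched minimally: force a bypassed procedure's state back to RUNNABLE before resubmitting it, so the executor's "NOT RUNNABLE!" precondition passes. This is an assumed illustration, not the committed patch; names are hypothetical.

```java
// Hypothetical sketch for HBASE-21437: a bypassed procedure parked in
// WAITING_TIMEOUT must be made RUNNABLE again before resubmission,
// otherwise execProcedure's checkArgument(isRunnable) throws
// IllegalArgumentException and kills the worker thread.
public class BypassResubmitSketch {
    public enum State { RUNNABLE, WAITING_TIMEOUT, SUCCESS }

    public static State prepareForResubmit(State current, boolean bypass) {
        if (bypass && current == State.WAITING_TIMEOUT) {
            return State.RUNNABLE; // reset so the worker can execute it
        }
        return current; // all other states are left untouched
    }
}
```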
[jira] [Created] (HBASE-21413) Empty meta log doesn't get split when restart whole cluster
Jingyun Tian created HBASE-21413: Summary: Empty meta log doesn't get split when restart whole cluster Key: HBASE-21413 URL: https://issues.apache.org/jira/browse/HBASE-21413 Project: HBase Issue Type: Improvement Reporter: Jingyun Tian Assignee: Jingyun Tian Attachments: Screenshot from 2018-10-31 18-11-02.png, Screenshot from 2018-10-31 18-11-11.png After I restarted the whole cluster, a splitting directory still existed on HDFS. I found it contained only an empty meta WAL file. I'll dig into this later.
[jira] [Created] (HBASE-21410) A helper page that help find all problematic regions and procedures
Jingyun Tian created HBASE-21410: Summary: A helper page that help find all problematic regions and procedures Key: HBASE-21410 URL: https://issues.apache.org/jira/browse/HBASE-21410 Project: HBase Issue Type: Improvement Reporter: Jingyun Tian Assignee: Jingyun Tian Attachments: Screenshot from 2018-10-30 19-06-21.png, Screenshot from 2018-10-30 19-06-42.png This page mainly focuses on finding regions stuck in a state where they cannot be assigned. My proposal for the page is as follows: !Screenshot from 2018-10-30 19-06-21.png! From this page we can see all regions in the RIT queue and their related procedures. If we can determine that these regions' states are abnormal, we can click the link 'Procedures as TXT' to get a full list of procedure IDs to bypass them, then click 'Regions as TXT' to get a full list of encoded region names to assign. !Screenshot from 2018-10-30 19-06-42.png! Some region names are covered by the navigation bar; I'll fix that later.
[jira] [Created] (HBASE-21407) Resolve NPE in backup Master UI
Jingyun Tian created HBASE-21407: Summary: Resolve NPE in backup Master UI Key: HBASE-21407 URL: https://issues.apache.org/jira/browse/HBASE-21407 Project: HBase Issue Type: Bug Components: UI Affects Versions: 2.1.0, 3.0.0, 2.2.0 Reporter: Jingyun Tian Assignee: Jingyun Tian Since some pages of our UI use JSP instead of Jamon, the fix from HBASE-18263 is not enough. This adds the same fix to header.jsp.
[jira] [Created] (HBASE-21393) Add an API ScheduleSCP() to HBCK2
Jingyun Tian created HBASE-21393: Summary: Add an API ScheduleSCP() to HBCK2 Key: HBASE-21393 URL: https://issues.apache.org/jira/browse/HBASE-21393 Project: HBase Issue Type: Bug Components: hbase-operator-tools, hbck2 Reporter: Jingyun Tian Add a ScheduleSCP() API to hbase-operator-tools so that operators can schedule ServerCrashProcedures for specified region servers.
[jira] [Created] (HBASE-21378) checkHBCKSupport blocks assigning hbase:meta or hbase:namespace when master is not initialized
Jingyun Tian created HBASE-21378: Summary: checkHBCKSupport blocks assigning hbase:meta or hbase:namespace when master is not initialized Key: HBASE-21378 URL: https://issues.apache.org/jira/browse/HBASE-21378 Project: HBase Issue Type: Bug Components: hbase-operator-tools Reporter: Jingyun Tian Assignee: Jingyun Tian I encountered a scenario where hbase:namespace was not online. {code} 2018-10-24,14:38:16,910 WARN org.apache.hadoop.hbase.master.HMaster: hbase:namespace,,1529933109115.7e0801c8232b2dc15face54532056076. is NOT online; state={7e0801c8232b2dc15face54532056076 state=OPEN, ts=1540363033384, server=c4-hadoop-tst-st30.bj,29100,1540348649479}; ServerCrashProcedures=false. Master startup cannot progress, in holding-pattern until region onlined. {code} Then I tried to assign it manually, but it threw PleaseHoldException. {code} Wed Oct 24 15:26:52 CST 2018, RpcRetryingCaller{globalStartTime=1540365754487, pause=200, maxAttempts=16}, org.apache.hadoop.hbase.PleaseHoldException: org.apache.hadoop.hbase.PleaseHoldException: Master is initializing at org.apache.hadoop.hbase.master.HMaster.checkInitialized(HMaster.java:3064) at org.apache.hadoop.hbase.master.MasterRpcServices.getClusterStatus(MasterRpcServices.java:934) at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130) at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324) at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304) at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:144) at org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3133) at org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3125) at 
org.apache.hadoop.hbase.client.HBaseAdmin.getClusterMetrics(HBaseAdmin.java:2161) at org.apache.hbase.HBCK2.checkHBCKSupport(HBCK2.java:98) at org.apache.hbase.HBCK2.run(HBCK2.java:364) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hbase.HBCK2.main(HBCK2.java:447) Caused by: org.apache.hadoop.hbase.PleaseHoldException: org.apache.hadoop.hbase.PleaseHoldException: Master is initializing at org.apache.hadoop.hbase.master.HMaster.checkInitialized(HMaster.java:3064) at org.apache.hadoop.hbase.master.MasterRpcServices.getClusterStatus(MasterRpcServices.java:934) at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130) at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324) at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.hbase.ipc.RemoteWithExtrasException.instantiateException(RemoteWithExtrasException.java:100) at org.apache.hadoop.hbase.ipc.RemoteWithExtrasException.unwrapRemoteException(RemoteWithExtrasException.java:90) at org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.makeIOExceptionOfException(ProtobufUtil.java:361) at org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.handleRemoteException(ProtobufUtil.java:349) at org.apache.hadoop.hbase.client.MasterCallable.call(MasterCallable.java:101) at 
org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:107) {code} Then I checked the code and found the cause is checkHBCKSupport(); I assigned hbase:namespace successfully by skipping this check. Thus I think the tool needs an option to skip it.
[jira] [Created] (HBASE-21335) Change the default wait time of HBCK2 tool
Jingyun Tian created HBASE-21335: Summary: Change the default wait time of HBCK2 tool Key: HBASE-21335 URL: https://issues.apache.org/jira/browse/HBASE-21335 Project: HBase Issue Type: Bug Reporter: Jingyun Tian Currently the default wait time is 0, and I added a check that requires the wait time to be greater than 0. Thus the default wait time should be set to a value greater than 0.
[jira] [Created] (HBASE-21322) Add a scheduleServerCrashProcedure() API to HbckService
Jingyun Tian created HBASE-21322: Summary: Add a scheduleServerCrashProcedure() API to HbckService Key: HBASE-21322 URL: https://issues.apache.org/jira/browse/HBASE-21322 Project: HBase Issue Type: Task Reporter: Jingyun Tian Assignee: Jingyun Tian According to my test, if one RS is down and all procedure logs are deleted, no ServerCrashProcedure will be scheduled, and restarting the master does not help. Thus we need a way to schedule a ServerCrashProcedure manually to solve the problem.
[jira] [Created] (HBASE-21291) Bypass doesn't work for state-machine procedures
Jingyun Tian created HBASE-21291: Summary: Bypass doesn't work for state-machine procedures Key: HBASE-21291 URL: https://issues.apache.org/jira/browse/HBASE-21291 Project: HBase Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Jingyun Tian {code} if (!procedure.isFailed()) { if (subprocs != null) { if (subprocs.length == 1 && subprocs[0] == procedure) { // Procedure returned itself. Quick-shortcut for a state machine-like procedure; // i.e. we go around this loop again rather than go back out on the scheduler queue. subprocs = null; reExecute = true; LOG.trace("Short-circuit to next step on pid={}", procedure.getProcId()); } else { // Yield the current procedure, and make the subprocedure runnable // subprocs may come back 'null'. subprocs = initializeChildren(procStack, procedure, subprocs); LOG.info("Initialized subprocedures=" + (subprocs == null? null: Stream.of(subprocs).map(e -> "{" + e.toString() + "}"). collect(Collectors.toList()).toString())); } } else if (procedure.getState() == ProcedureState.WAITING_TIMEOUT) { LOG.debug("Added to timeoutExecutor {}", procedure); timeoutExecutor.add(procedure); } else if (!suspended) { // No subtask, so we are done procedure.setState(ProcedureState.SUCCESS); } } {code} The current implementation of ProcedureExecutor sets reExecute to true for state-machine-like procedures. If such a procedure gets stuck at one state, it will loop forever. {code} IdLock.Entry lockEntry = procExecutionLock.getLockEntry(proc.getProcId()); try { executeProcedure(proc); } catch (AssertionError e) { LOG.info("ASSERT pid=" + proc.getProcId(), e); throw e; } finally { procExecutionLock.releaseLockEntry(lockEntry); {code} Since a procedure acquires the IdLock and releases it only after execution is done, a state-machine procedure never releases the IdLock until it finishes. Then bypassProcedure doesn't work, because it tries to grab the IdLock first. 
{code} IdLock.Entry lockEntry = procExecutionLock.tryLockEntry(procedure.getProcId(), lockWait); {code}
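The deadlock described above can be demonstrated with a minimal, self-contained sketch using java.util.concurrent (standing in for HBase's IdLock; all names here are illustrative): while a "worker" thread holds the per-procedure lock across its re-execute loop, a "bypass" caller's timed tryLock fails.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Demonstrates why bypass fails for a re-executing state-machine procedure:
// the worker never releases the per-procedure lock, so the bypass path's
// tryLock (mirroring procExecutionLock.tryLockEntry(procId, lockWait))
// times out and returns false.
public class BypassLockDemo {
    public static boolean tryBypassWhileWorkerHoldsLock() {
        ReentrantLock procLock = new ReentrantLock();
        CountDownLatch acquired = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        Thread worker = new Thread(() -> {
            procLock.lock();        // held for the whole re-execute loop
            acquired.countDown();
            try {
                release.await();    // simulate a procedure stuck at one state
            } catch (InterruptedException ignored) {
            } finally {
                procLock.unlock();
            }
        });
        worker.start();
        try {
            acquired.await();
            boolean got = procLock.tryLock(100, TimeUnit.MILLISECONDS);
            if (got) {
                procLock.unlock();
            }
            release.countDown();
            worker.join();
            return got;             // false: bypass could not get the lock
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }
}
```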
[jira] [Created] (HBASE-21204) NPE when scan raw DELETE_FAMILY_VERSION and codec is not set
Jingyun Tian created HBASE-21204: Summary: NPE when scan raw DELETE_FAMILY_VERSION and codec is not set Key: HBASE-21204 URL: https://issues.apache.org/jira/browse/HBASE-21204 Project: HBase Issue Type: Improvement Affects Versions: 2.0.0, 2.1.0, 2.2.0 Reporter: Jingyun Tian Assignee: Jingyun Tian Fix For: 2.2.0, 2.0.0, 2.1.0 Our Cell has 7 types: Minimum((byte)0), Put((byte)4), Delete((byte)8), DeleteFamilyVersion((byte)10), DeleteColumn((byte)12), DeleteFamily((byte)14), Maximum((byte)255). But there are only 6 types of our CellType protobuf definition: {code} enum CellType { MINIMUM = 0; PUT = 4; DELETE = 8; DELETE_FAMILY_VERSION = 10; DELETE_COLUMN = 12; DELETE_FAMILY = 14; // MAXIMUM is used when searching; you look from maximum on down. MAXIMUM = 255; } {code} Thus if we scan raw data that is DELETE_FAMILY_VERSION, it throws an NPE.
[jira] [Created] (HBASE-20986) Separate the config of block size when we do log splitting and write Hlog
Jingyun Tian created HBASE-20986: Summary: Separate the config of block size when we do log splitting and write Hlog Key: HBASE-20986 URL: https://issues.apache.org/jira/browse/HBASE-20986 Project: HBase Issue Type: Improvement Affects Versions: 2.1.0 Reporter: Jingyun Tian Fix For: 2.1.0 Since the block sizes of recovered edits and the HLog are the same right now, if we set a large block size, the NameNode may not be able to allocate enough space when we do log splitting. But a large HLog block size helps reduce how often region servers ask for a new block. Thus I think separating the block size configs is necessary.
[jira] [Created] (HBASE-20985) add two attributes when we do normalization
Jingyun Tian created HBASE-20985: Summary: add two attributes when we do normalization Key: HBASE-20985 URL: https://issues.apache.org/jira/browse/HBASE-20985 Project: HBase Issue Type: Improvement Affects Versions: 2.1.0 Reporter: Jingyun Tian Fix For: 2.1.0 Currently, when we turn on the normalization switch, it balances the whole table based on total region size / total region count. I added two attributes so that we can set the target total region count or target average region size to achieve when normalization is done.
[jira] [Created] (HBASE-20855) PeerConfigTracker only support one listener will cause problem when there is a recovered replication queue
Jingyun Tian created HBASE-20855: Summary: PeerConfigTracker only support one listener will cause problem when there is a recovered replication queue Key: HBASE-20855 URL: https://issues.apache.org/jira/browse/HBASE-20855 Project: HBase Issue Type: Bug Affects Versions: 1.4.0, 1.3.0, 1.5.0 Reporter: Jingyun Tian Assignee: Jingyun Tian {code} public void init(Context context) throws IOException { this.ctx = context; if (this.ctx != null){ ReplicationPeer peer = this.ctx.getReplicationPeer(); if (peer != null){ peer.trackPeerConfigChanges(this); } else { LOG.warn("Not tracking replication peer config changes for Peer Id " + this.ctx.getPeerId() + " because there's no such peer"); } } } {code} As we know, a replication source sets itself as the listener on the PeerConfigTracker in ReplicationPeer. When there are one or more recovered queues, each queue generates a new replication source, but they all share the same ReplicationPeer. Each call to setListener then overwrites the previously registered listener, so only one listener receives the peer config change notification. {code} public synchronized void setListener(ReplicationPeerConfigListener listener){ this.listener = listener; } {code} To solve this, PeerConfigTracker needs to support multiple listeners, and a listener should be removed when its replication endpoint terminates. I will upload a patch later with the fix and a UT.
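The proposed multi-listener behavior can be sketched minimally as follows. This is a hypothetical illustration of the idea, not the committed patch; the interface and method names are assumptions modeled on the snippet above:

```java
import java.util.Set;
import java.util.concurrent.CopyOnWriteArraySet;

// Hypothetical PeerConfigTracker sketch: keep a set of listeners instead of
// a single field, so a recovered queue's replication source does not
// overwrite the original source's registration.
public class PeerConfigTrackerSketch {
    public interface Listener {
        void peerConfigUpdated(String newConfig);
    }

    private final Set<Listener> listeners = new CopyOnWriteArraySet<>();

    public void setListener(Listener listener) {
        listeners.add(listener);            // add, don't overwrite
    }

    /** Must be called when a replication endpoint terminates. */
    public void removeListener(Listener listener) {
        listeners.remove(listener);
    }

    public void notifyConfigChange(String newConfig) {
        for (Listener l : listeners) {      // every source gets the update
            l.peerConfigUpdated(newConfig);
        }
    }
}
```

CopyOnWriteArraySet keeps notification safe even if a listener is removed concurrently while a config change is being delivered.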
[jira] [Created] (HBASE-20769) getSplits() has a out of bounds problem in TableSnapshotInputFormatImpl
Jingyun Tian created HBASE-20769: Summary: getSplits() has a out of bounds problem in TableSnapshotInputFormatImpl Key: HBASE-20769 URL: https://issues.apache.org/jira/browse/HBASE-20769 Project: HBase Issue Type: Bug Affects Versions: 2.0.0, 1.4.0, 1.3.0 Reporter: Jingyun Tian Assignee: Jingyun Tian Fix For: 2.0.0 When numSplits > 1, getSplits may create a split whose start row is smaller than the user-specified scan's start row, or whose stop row is larger than the user-specified scan's stop row. {code} byte[][] sp = sa.split(hri.getStartKey(), hri.getEndKey(), numSplits, true); for (int i = 0; i < sp.length - 1; i++) { if (PrivateCellUtil.overlappingKeys(scan.getStartRow(), scan.getStopRow(), sp[i], sp[i + 1])) { List hosts = calculateLocationsForInputSplit(conf, htd, hri, tableDir, localityEnabled); Scan boundedScan = new Scan(scan); boundedScan.setStartRow(sp[i]); boundedScan.setStopRow(sp[i + 1]); splits.add(new InputSplit(htd, hri, hosts, boundedScan, restoreDir)); } } {code} Since we split keys by the range of regions, when sp[i] < scan.getStartRow() or sp[i + 1] > scan.getStopRow(), the created bounded scan may cover a range beyond the user-defined scan. The fix should be simple: {code} boundedScan.setStartRow( Bytes.compareTo(scan.getStartRow(), sp[i]) > 0 ? scan.getStartRow() : sp[i]); boundedScan.setStopRow( Bytes.compareTo(scan.getStopRow(), sp[i + 1]) < 0 ? scan.getStopRow() : sp[i + 1]); {code} I will also try to add UTs to cover this problem.
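The clamping fix can be tried out standalone. The sketch below substitutes `java.util.Arrays.compareUnsigned` for HBase's `Bytes.compareTo` (both compare byte arrays lexicographically as unsigned bytes); `clampedStart`/`clampedStop` mirror the two setStartRow/setStopRow lines:

```java
import java.util.Arrays;

// Standalone illustration of the bounded-scan clamping fix: intersect the
// split's [splitStart, splitStop) range with the user scan's range so the
// bounded scan never extends past what the user requested.
public class BoundedScanSketch {
    public static byte[] clampedStart(byte[] scanStart, byte[] splitStart) {
        // Take the later of the two start keys.
        return Arrays.compareUnsigned(scanStart, splitStart) > 0 ? scanStart : splitStart;
    }

    public static byte[] clampedStop(byte[] scanStop, byte[] splitStop) {
        // Take the earlier of the two stop keys.
        return Arrays.compareUnsigned(scanStop, splitStop) < 0 ? scanStop : splitStop;
    }
}
```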
[jira] [Created] (HBASE-20625) refactor some WALCellCodec related code
Jingyun Tian created HBASE-20625: Summary: refactor some WALCellCodec related code Key: HBASE-20625 URL: https://issues.apache.org/jira/browse/HBASE-20625 Project: HBase Issue Type: Improvement Components: wal Affects Versions: 2.0.0 Reporter: Jingyun Tian Assignee: Jingyun Tian Fix For: 3.0.0 Currently I'm working on exporting the HLog to another FileSystem, and I found the code of WALCellCodec and its related classes is not that clean, with several TODOs. Thus I tried to refactor the code based on these TODOs, e.g. {code} // TODO: it sucks that compression context is in WAL.Entry. It'd be nice if it was here. // Dictionary could be gotten by enum; initially, based on enum, context would create // an array of dictionaries. static class BaosAndCompressor extends ByteArrayOutputStream implements ByteStringCompressor { public ByteString toByteString() { // We need this copy to create the ByteString as the byte[] 'buf' is not immutable. We reuse // them. return ByteString.copyFrom(this.buf, 0, this.count); } {code}
[jira] [Resolved] (HBASE-20584) TestRestoreSnapshotFromClient failed
[ https://issues.apache.org/jira/browse/HBASE-20584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian resolved HBASE-20584.
Resolution: Duplicate

Key: HBASE-20584 URL: https://issues.apache.org/jira/browse/HBASE-20584 Project: HBase Issue Type: Bug Affects Versions: 3.0.0 Reporter: Jingyun Tian Assignee: Jingyun Tian Priority: Major
[jira] [Created] (HBASE-20584) TestRestoreSnapshotFromClient failed
Jingyun Tian created HBASE-20584: Summary: TestRestoreSnapshotFromClient failed Key: HBASE-20584 URL: https://issues.apache.org/jira/browse/HBASE-20584 Project: HBase Issue Type: Bug Affects Versions: 3.0.0 Reporter: Jingyun Tian Assignee: Jingyun Tian

{code}
org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: Snapshot { ss=snaptb1-1526376636687 table=testRestoreSnapshotAfterSplittingRegions-1526376636687 type=FLUSH } had an error. Procedure snaptb1-1526376636687 { waiting=[] done=[] }
	at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:380)
	at org.apache.hadoop.hbase.master.MasterRpcServices.isSnapshotDone(MasterRpcServices.java:1128)
	at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
Caused by: org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException via Failed taking snapshot { ss=snaptb1-1526376636687 table=testRestoreSnapshotAfterSplittingRegions-1526376636687 type=FLUSH } due to exception:Regions moved during the snapshot '{ ss=snaptb1-1526376636687 table=testRestoreSnapshotAfterSplittingRegions-1526376636687 type=FLUSH }'. expected=6 snapshotted=7.:org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Regions moved during the snapshot '{ ss=snaptb1-1526376636687 table=testRestoreSnapshotAfterSplittingRegions-1526376636687 type=FLUSH }'. expected=6 snapshotted=7.
	at org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher.rethrowException(ForeignExceptionDispatcher.java:82)
	at org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.rethrowExceptionIfFailed(TakeSnapshotHandler.java:311)
	at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:369)
	... 6 more
Caused by: org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Regions moved during the snapshot '{ ss=snaptb1-1526376636687 table=testRestoreSnapshotAfterSplittingRegions-1526376636687 type=FLUSH }'. expected=6 snapshotted=7.
	at org.apache.hadoop.hbase.master.snapshot.MasterSnapshotVerifier.verifyRegions(MasterSnapshotVerifier.java:217)
	at org.apache.hadoop.hbase.master.snapshot.MasterSnapshotVerifier.verifySnapshot(MasterSnapshotVerifier.java:121)
	at org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.process(TakeSnapshotHandler.java:207)
	at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:748)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.hadoop.hbase.ipc.RemoteWithExtrasException.instantiateException(RemoteWithExtrasException.java:100)
	at org.apache.hadoop.hbase.ipc.RemoteWithExtrasException.unwrapRemoteException(RemoteWithExtrasException.java:90)
	at org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.makeIOExceptionOfException(ProtobufUtil.java:360)
	at org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.handleRemoteException(ProtobufUtil.java:348)
	at org.apache.hadoop.hbase.client.MasterCallable.call(MasterCallable.java:101)
	at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:107)
	at org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3061)
	at org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3053)
	at org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2532)
	at org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2499)
	at org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2492)
	at org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClient.testRestoreSnapshotAfterSplittingRegions(TestRestoreSnapshotFromClient.java:311)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method
{code}
[jira] [Created] (HBASE-20579) Improve copy snapshot manifest in ExportSnapshot
Jingyun Tian created HBASE-20579: Summary: Improve copy snapshot manifest in ExportSnapshot Key: HBASE-20579 URL: https://issues.apache.org/jira/browse/HBASE-20579 Project: HBase Issue Type: Improvement Components: mapreduce Affects Versions: hbase-2.0.0-alpha-4 Reporter: Jingyun Tian Assignee: Jingyun Tian Fix For: hbase-2.0.0-alpha-4

ExportSnapshot needs to copy the snapshot manifest to the destination cluster first, then setOwner and setPermission on those paths. But this is done with a single thread, which leads to a long job-submission time if the snapshot is big. I tried to process the paths in parallel, which reduces the total submission time dramatically. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
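The single-threaded pass described above can be spread over a bounded thread pool. A minimal sketch, not the actual ExportSnapshot patch: the PathTask callback and class name are hypothetical stand-ins for the per-path FileSystem.setOwner/setPermission work.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelManifestChmod {
  // Stand-in for the per-path work (setOwner + setPermission on HDFS).
  interface PathTask {
    void apply(String path) throws Exception;
  }

  // Apply the task to every path with a fixed-size pool; collect failed
  // paths instead of aborting, so the caller can decide how to react.
  public static List<String> applyAll(List<String> paths, PathTask task, int threads)
      throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    List<String> failed = Collections.synchronizedList(new ArrayList<>());
    for (String p : paths) {
      pool.submit(() -> {
        try {
          task.apply(p);
        } catch (Exception e) {
          failed.add(p); // remember the path; exceptions don't kill the pool
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(10, TimeUnit.MINUTES);
    return failed;
  }
}
```

With N worker threads the wall-clock time of the ownership/permission pass drops roughly by a factor of N for large manifests, which is the submission-time improvement the issue describes.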
[jira] [Created] (HBASE-20194) Basic Replication WebUI - Master
Jingyun Tian created HBASE-20194: Summary: Basic Replication WebUI - Master Key: HBASE-20194 URL: https://issues.apache.org/jira/browse/HBASE-20194 Project: HBase Issue Type: Task Reporter: Jingyun Tian Subtask of HBASE-15809: implementation of the replication web UI on the master webpage.
[jira] [Created] (HBASE-20193) Basic Replication Web UI - Regionserver
Jingyun Tian created HBASE-20193: Summary: Basic Replication Web UI - Regionserver Key: HBASE-20193 URL: https://issues.apache.org/jira/browse/HBASE-20193 Project: HBase Issue Type: Task Reporter: Jingyun Tian Assignee: Jingyun Tian Subtask of HBASE-15809.
[jira] [Created] (HBASE-19666) hadoop.hbase.regionserver.TestDefaultCompactSelection test failed
Jingyun Tian created HBASE-19666: Summary: hadoop.hbase.regionserver.TestDefaultCompactSelection test failed Key: HBASE-19666 URL: https://issues.apache.org/jira/browse/HBASE-19666 Project: HBase Issue Type: Bug Affects Versions: 2.0 Reporter: Jingyun Tian Priority: Critical

hadoop.hbase.regionserver.TestDefaultCompactSelection
{code}
[ERROR] Failures:
[ERROR] TestDefaultCompactSelection.testCompactionRatio:74->TestCompactionPolicy.compactEquals:182->TestCompactionPolicy.compactEquals:201 expected:<[[4, 2, 1]]> but was:<[[]]>
[ERROR] TestDefaultCompactSelection.testStuckStoreCompaction:145->TestCompactionPolicy.compactEquals:182->TestCompactionPolicy.compactEquals:201 expected:<[[]30, 30, 30]> but was:<[[99, 30, ]30, 30, 30]>
{code}
[jira] [Created] (HBASE-19358) Improve the stability of splitting log when do fail over
Jingyun Tian created HBASE-19358: Summary: Improve the stability of splitting log when do fail over Key: HBASE-19358 URL: https://issues.apache.org/jira/browse/HBASE-19358 Project: HBase Issue Type: Improvement Components: MTTR Affects Versions: 0.98.24 Reporter: Jingyun Tian

The current log-splitting flow is shown in the following figure: !previous-logic.png|thumbnail! The problem is that the OutputSink writes the recovered edits while the log is being split, which means it creates one WriterAndPath per region. If the cluster is small and the number of regions per region server is large, too many HDFS streams are opened at the same time, and this is prone to failure because each datanode has to handle too many streams. So I propose a new way to split logs: !attachment-name.jpg|thumbnail! The recovered edits are cached until they exceed a configured memory limit or the end of the log is reached; a thread pool then does the rest: writing them to files and moving them to their destinations. The biggest benefit is that we can control the number of streams created during log splitting: it cannot exceed hbase.regionserver.wal.max.splitters * hbase.regionserver.hlog.splitlog.writer.threads, whereas previously it was hbase.regionserver.wal.max.splitters * the number of regions the hlog contains.
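The caching scheme above can be sketched as a per-region buffer with a global heap cap. This is an illustrative model only, not the actual patch; class, method, and field names here are assumptions. Edits accumulate per region, a full buffer tells the caller to drain, and only the (bounded) writer pool ever opens output streams.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RecoveredEditsBuffer {
  private final long maxHeapBytes; // global cap on cached edit bytes
  private long usedHeapBytes = 0;
  private final Map<String, List<byte[]>> editsByRegion = new HashMap<>();

  public RecoveredEditsBuffer(long maxHeapBytes) {
    this.maxHeapBytes = maxHeapBytes;
  }

  // Cache one edit; returns true once the cap is reached, signalling the
  // caller to hand a batch to a writer thread instead of opening a stream
  // per region up front.
  public boolean append(String region, byte[] edit) {
    editsByRegion.computeIfAbsent(region, r -> new ArrayList<>()).add(edit);
    usedHeapBytes += edit.length;
    return usedHeapBytes >= maxHeapBytes;
  }

  // Remove and return the region holding the most cached bytes, freeing
  // that much heap for further appends.
  public List<byte[]> drainLargestRegion() {
    String largest = null;
    long largestBytes = -1;
    for (Map.Entry<String, List<byte[]>> e : editsByRegion.entrySet()) {
      long bytes = e.getValue().stream().mapToLong(b -> b.length).sum();
      if (bytes > largestBytes) {
        largestBytes = bytes;
        largest = e.getKey();
      }
    }
    if (largest == null) {
      return new ArrayList<>();
    }
    usedHeapBytes -= largestBytes;
    return editsByRegion.remove(largest);
  }
}
```

Because only the drain side touches HDFS, the number of concurrent output streams is bounded by the writer-pool size rather than by the number of regions in the WAL, which is the stability gain described above.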
[jira] [Created] (HBASE-18619) Should we add a postOpenDeployTasks after open splited or merged region?
Jingyun Tian created HBASE-18619: Summary: Should we add a postOpenDeployTasks after open splited or merged region? Key: HBASE-18619 URL: https://issues.apache.org/jira/browse/HBASE-18619 Project: HBase Issue Type: Bug Components: Region Assignment Affects Versions: 1.1.11, 1.2.6, 0.98.6, 1.4.0 Reporter: Jingyun Tian Assignee: Jingyun Tian

I have a question: why do we skip postOpenDeployTasks() when we are not using ZK for assignment?
{code:java}
if (services != null) {
  try {
    if (useZKForAssignment) {
      // add 2nd daughter first (see HBASE-4335)
      services.postOpenDeployTasks(b);
    } else if (!services.reportRegionStateTransition(TransitionCode.SPLIT,
        parent.getRegionInfo(), hri_a, hri_b)) {
      throw new IOException("Failed to report split region to master: "
        + parent.getRegionInfo().getShortNameToLog());
    }
    // Should add it to OnlineRegions
    services.addToOnlineRegions(b);
    if (useZKForAssignment) {
      services.postOpenDeployTasks(a);
    }
    services.addToOnlineRegions(a);
  } catch (KeeperException ke) {
    throw new IOException(ke);
  }
}
{code}
Because of this, a newly split or newly merged region will not compact its reference files. If the normalizer thread then wants to split such a region, it gets stuck:
{code:java}
public boolean canSplit() {
  this.lock.readLock().lock();
  try {
    // Not split-able if we find a reference store file present in the store.
    boolean result = !hasReferences();
    if (!result && LOG.isDebugEnabled()) {
      LOG.debug("Cannot split region due to reference files being there");
    }
    return result;
  } finally {
    this.lock.readLock().unlock();
  }
}
{code}
Given this code, should we add a services.postOpenDeployTasks() after _*reportRegionStateTransition(TransitionCode.SPLIT, parent.getRegionInfo(), hri_a, hri_b)*_ succeeds?
[jira] [Created] (HBASE-18128) compaction marker could be skipped
Jingyun Tian created HBASE-18128: Summary: compaction marker could be skipped Key: HBASE-18128 URL: https://issues.apache.org/jira/browse/HBASE-18128 Project: HBase Issue Type: Improvement Components: Compaction, regionserver Reporter: Jingyun Tian

The sequence for a compaction is as follows:
1. Compaction writes new files under the region/.tmp directory (compaction output).
2. Compaction atomically moves the temporary files under the region directory.
3. Compaction appends a WAL edit containing the compaction input and output files and forces a sync on the WAL.
4. Compaction deletes the input files from the region directory.

But if a flush happens between steps 3 and 4 and the regionserver then crashes, the compaction marker will be skipped when splitting the log, because the sequence id of the compaction marker is smaller than lastFlushedSequenceId:
{code}
if (lastFlushedSequenceId >= entry.getKey().getLogSeqNum()) {
  editsSkipped++;
  continue;
}
{code}
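The skip condition in the quoted snippet can be isolated as a predicate. A minimal sketch (the class and method names are mine; the comparison is the one quoted above) showing why a compaction marker written just before a flush is dropped on replay:

```java
public class CompactionMarkerSkip {
  // A WAL entry is skipped on replay when its sequence id is at or below
  // the last flushed sequence id -- including a compaction marker whose
  // input files were never deleted before the crash.
  public static boolean isSkipped(long lastFlushedSequenceId, long entrySeqNum) {
    return lastFlushedSequenceId >= entrySeqNum;
  }
}
```

For example, if the marker was written at sequence id 100 and a later flush advanced lastFlushedSequenceId to 120 before the crash, the marker is dropped on replay even though step 4 (deleting the compaction input files) never ran.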
[jira] [Created] (HBASE-17993) delete a redundant log
Jingyun Tian created HBASE-17993: Summary: delete a redundant log Key: HBASE-17993 URL: https://issues.apache.org/jira/browse/HBASE-17993 Project: HBase Issue Type: Improvement Components: rpc Affects Versions: 1.0.0 Reporter: Jingyun Tian Priority: Trivial There is a log line that tracks what the current call is. It is only useful for debugging, so we had better remove it from release builds.