[jira] [Created] (HBASE-22454) refactor WALSplitter

2019-05-22 Thread Jingyun Tian (JIRA)
Jingyun Tian created HBASE-22454:


 Summary: refactor WALSplitter
 Key: HBASE-22454
 URL: https://issues.apache.org/jira/browse/HBASE-22454
 Project: HBase
  Issue Type: Improvement
Reporter: Jingyun Tian
Assignee: Jingyun Tian


WALSplitter is more than 2000 lines long right now, which makes it hard to read 
and understand. It contains several non-trivial inner classes and many static 
methods. 

My plan is to move these inner classes into separate files and move the static 
methods to a new util class.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-22234) Fix flaky TestHbck#testRecoverSplitAfterMetaUpdated

2019-04-13 Thread Jingyun Tian (JIRA)
Jingyun Tian created HBASE-22234:


 Summary: Fix flaky TestHbck#testRecoverSplitAfterMetaUpdated
 Key: HBASE-22234
 URL: https://issues.apache.org/jira/browse/HBASE-22234
 Project: HBase
  Issue Type: Bug
Reporter: Jingyun Tian








[jira] [Created] (HBASE-22061) SplitTableRegionProcedure should hold the lock of its daughter regions

2019-03-18 Thread Jingyun Tian (JIRA)
Jingyun Tian created HBASE-22061:


 Summary: SplitTableRegionProcedure should hold the lock of its 
daughter regions
 Key: HBASE-22061
 URL: https://issues.apache.org/jira/browse/HBASE-22061
 Project: HBase
  Issue Type: Bug
Reporter: Jingyun Tian


Currently SplitTableRegionProcedure only holds the lock of the parent region. But 
while this procedure is running, once the daughter regions have been written to 
meta, other procedures can grab their locks, which is a situation we want to 
avoid.
So I think SplitTableRegionProcedure should hold the locks of both the parent 
region and its daughter regions, as MergeTableRegionsProcedure does. 





[jira] [Created] (HBASE-22049) getReopenStatus() didn't skip counting split parent region

2019-03-13 Thread Jingyun Tian (JIRA)
Jingyun Tian created HBASE-22049:


 Summary: getReopenStatus() didn't skip counting split parent region
 Key: HBASE-22049
 URL: https://issues.apache.org/jira/browse/HBASE-22049
 Project: HBase
  Issue Type: Bug
Reporter: Jingyun Tian
Assignee: Jingyun Tian


After we modify some attributes of a table, HBaseAdmin calls getAlterStatus to 
check whether the attributes of all regions have been updated. It skips opened 
regions and split regions, as the following code shows.
{code}
for (RegionState regionState: states) {
  if (!regionState.isOpened() && !regionState.isSplit()) {
    ritCount++;
  }
}
{code}

But the split procedure now unassigns the split parent region, so its state is 
CLOSED, and the check hangs there until it times out.
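
The effect can be reproduced with a small self-contained model. This is only a sketch, not HBase code: `ReopenStatusSketch`, its `Region` class and the `splitParent` flag are hypothetical names, and skipping split parents regardless of state is just one possible way to address the issue.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-ins for HBase's region state classes.
final class ReopenStatusSketch {
  enum State { OPEN, CLOSED, SPLIT }

  static final class Region {
    final State state;
    final boolean splitParent; // assumed flag: this region has been split into daughters

    Region(State state, boolean splitParent) {
      this.state = state;
      this.splitParent = splitParent;
    }
  }

  // Mirrors the quoted check: only OPEN and SPLIT regions count as done.
  static int countPending(List<Region> regions) {
    int ritCount = 0;
    for (Region r : regions) {
      if (r.state != State.OPEN && r.state != State.SPLIT) {
        ritCount++;
      }
    }
    return ritCount;
  }

  // One possible adjustment: skip split parents whatever their state is.
  static int countPendingSkippingSplitParents(List<Region> regions) {
    int ritCount = 0;
    for (Region r : regions) {
      if (r.splitParent) {
        continue;
      }
      if (r.state != State.OPEN && r.state != State.SPLIT) {
        ritCount++;
      }
    }
    return ritCount;
  }

  // A table with one split parent left CLOSED and two open regions.
  static List<Region> sample() {
    return Arrays.asList(
        new Region(State.CLOSED, true),
        new Region(State.OPEN, false),
        new Region(State.OPEN, false));
  }
}
```

With the sample data, the original check keeps counting the CLOSED split parent, so a caller polling for zero would wait until timeout, while the adjusted count reaches zero.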





[jira] [Created] (HBASE-21966) Fix region holes, overlaps, and other region related errors

2019-02-26 Thread Jingyun Tian (JIRA)
Jingyun Tian created HBASE-21966:


 Summary: Fix region holes, overlaps, and other region related 
errors
 Key: HBASE-21966
 URL: https://issues.apache.org/jira/browse/HBASE-21966
 Project: HBase
  Issue Type: Sub-task
Reporter: Jingyun Tian
Assignee: Jingyun Tian








[jira] [Reopened] (HBASE-21965) Fix failed split and merge transactions that have failed to roll back

2019-02-26 Thread Jingyun Tian (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingyun Tian reopened HBASE-21965:
--

> Fix failed split and merge transactions that have failed to roll back
> -
>
> Key: HBASE-21965
> URL: https://issues.apache.org/jira/browse/HBASE-21965
> Project: HBase
>  Issue Type: Sub-task
>    Reporter: Jingyun Tian
>    Assignee: Jingyun Tian
>Priority: Major
>






[jira] [Resolved] (HBASE-21965) Fix failed split and merge transactions that have failed to roll back

2019-02-26 Thread Jingyun Tian (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingyun Tian resolved HBASE-21965.
--
Resolution: Invalid

> Fix failed split and merge transactions that have failed to roll back
> -
>
> Key: HBASE-21965
> URL: https://issues.apache.org/jira/browse/HBASE-21965
> Project: HBase
>  Issue Type: Task
>    Reporter: Jingyun Tian
>    Assignee: Jingyun Tian
>Priority: Major
>






[jira] [Created] (HBASE-21965) Fix failed split and merge transactions that have failed to roll back

2019-02-26 Thread Jingyun Tian (JIRA)
Jingyun Tian created HBASE-21965:


 Summary: Fix failed split and merge transactions that have failed 
to roll back
 Key: HBASE-21965
 URL: https://issues.apache.org/jira/browse/HBASE-21965
 Project: HBase
  Issue Type: Task
Reporter: Jingyun Tian
Assignee: Jingyun Tian








[jira] [Created] (HBASE-21934) SplitWALProcedure get stuck during ITBLL

2019-02-19 Thread Jingyun Tian (JIRA)
Jingyun Tian created HBASE-21934:


 Summary: SplitWALProcedure get stuck during ITBLL
 Key: HBASE-21934
 URL: https://issues.apache.org/jira/browse/HBASE-21934
 Project: HBase
  Issue Type: Sub-task
Reporter: Jingyun Tian
Assignee: Jingyun Tian


I encountered a problem when the master assigned a splitWALRemoteProcedure to a 
region server. The log of this region server says it failed to recover the 
lease of the file. Then the region server was killed by ChaosMonkey. As a 
result, the procedure never times out and hangs there forever.





Re: [ANNOUNCE] Please welcome Peter Somogyi to the HBase PMC

2019-01-21 Thread Jingyun Tian
Congratulations Peter!
Best Regards,
Jingyun Tian

On Tue, Jan 22, 2019 at 10:15 AM Allan Yang  wrote:

> Congratulations Peter!
> Best Regards
> Allan Yang
>
>
> On Tue, Jan 22, 2019 at 9:49 AM Pankaj kr wrote:
>
> >
> > Congratulations Peter...!!!
> >
> > Regards,
> > Pankaj
> >
> > --
> > Pankaj Kumar
> > M: +91-9535197664(India Contact Number)
> > E: pankaj...@huawei.com<mailto:pankaj...@huawei.com>
> > 2012 Laboratories - Bangalore Research Institute, IT BU Branch
> > 2012 Laboratories-IT BU Branch Dept.HTIPL
> > From:Duo Zhang 
> > To:HBase Dev List ;hbase-user <
> u...@hbase.apache.org
> > >
> > Date:2019-01-22 07:06:43
> > Subject:[ANNOUNCE] Please welcome Peter Somogyi to the HBase PMC
> >
> > On behalf of the Apache HBase PMC I am pleased to announce that Peter
> > Somogyi
> > has accepted our invitation to become a PMC member on the Apache HBase
> > project.
> > We appreciate Peter stepping up to take more responsibility in the HBase
> > project.
> >
> > Please join me in welcoming Peter to the HBase PMC!
> >
>


[jira] [Created] (HBASE-21730) Update hbase-book with the procedure based WAL splitting

2019-01-15 Thread Jingyun Tian (JIRA)
Jingyun Tian created HBASE-21730:


 Summary: Update hbase-book with the procedure based WAL splitting
 Key: HBASE-21730
 URL: https://issues.apache.org/jira/browse/HBASE-21730
 Project: HBase
  Issue Type: Sub-task
Reporter: Jingyun Tian








[jira] [Created] (HBASE-21729) Extract ProcedureCoordinatorRpcs and ProcedureMemberRpcs from CoordinatedStateManager

2019-01-15 Thread Jingyun Tian (JIRA)
Jingyun Tian created HBASE-21729:


 Summary: Extract ProcedureCoordinatorRpcs and ProcedureMemberRpcs 
from CoordinatedStateManager
 Key: HBASE-21729
 URL: https://issues.apache.org/jira/browse/HBASE-21729
 Project: HBase
  Issue Type: Sub-task
Reporter: Jingyun Tian
Assignee: Jingyun Tian


If procedureV2 based WAL splitting is enabled, CoordinatedStateManager will not 
be initialized. As a result, ProcedureCoordinatorRpcs and ProcedureMemberRpcs 
are not available and backup does not work. Let me extract these two methods 
into another class.





[jira] [Created] (HBASE-21647) Add status track for splitting WAL tasks

2018-12-26 Thread Jingyun Tian (JIRA)
Jingyun Tian created HBASE-21647:


 Summary: Add status track for splitting WAL tasks
 Key: HBASE-21647
 URL: https://issues.apache.org/jira/browse/HBASE-21647
 Project: HBase
  Issue Type: Sub-task
Reporter: Jingyun Tian


Add status tracking to help operators check the status of WAL splitting.





[jira] [Created] (HBASE-21588) Procedure v2 wal splitting implementation

2018-12-12 Thread Jingyun Tian (JIRA)
Jingyun Tian created HBASE-21588:


 Summary: Procedure v2 wal splitting implementation
 Key: HBASE-21588
 URL: https://issues.apache.org/jira/browse/HBASE-21588
 Project: HBase
  Issue Type: Sub-task
Reporter: Jingyun Tian
Assignee: Jingyun Tian


Create a sub-task to submit the implementation of procedure v2 WAL splitting.





[jira] [Created] (HBASE-21565) Delete dead server from dead server list too early leads to concurrent Server Crash Procedures(SCP) for a same server

2018-12-06 Thread Jingyun Tian (JIRA)
Jingyun Tian created HBASE-21565:


 Summary: Delete dead server from dead server list too early leads 
to concurrent Server Crash Procedures(SCP) for a same server
 Key: HBASE-21565
 URL: https://issues.apache.org/jira/browse/HBASE-21565
 Project: HBase
  Issue Type: Bug
Reporter: Jingyun Tian
Assignee: Jingyun Tian


Two kinds of SCP can be scheduled for the same server during a cluster restart: 
one is triggered by a ZK session timeout, the other by a new server reporting 
in, which causes the stale one to fail over. The only barrier between these two 
kinds of SCP is a check of whether the server is in the dead server list.
{code}
if (this.deadservers.isDeadServer(serverName)) {
  LOG.warn("Expiration called on {} but crash processing already in progress", serverName);
  return false;
}
{code}
But the problem is that when the master finishes initialization, it deletes all 
stale servers from the dead server list. Thus when the SCP for the ZK session 
timeout comes in, the barrier has already been removed.
Here are the logs showing how this problem occurs.
{code}
2018-12-07,11:42:37,589 INFO 
org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Start pid=9, 
state=RUNNABLE:SERVER_CRASH_START, hasLock=true; ServerCrashProcedure 
server=c4-hadoop-tst-st27.bj,29100,1544153846859, splitWal=true, meta=false
2018-12-07,11:42:58,007 INFO 
org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Start pid=444, 
state=RUNNABLE:SERVER_CRASH_START, hasLock=true; ServerCrashProcedure 
server=c4-hadoop-tst-st27.bj,29100,1544153846859, splitWal=true, meta=false
{code}
Now we can see that two SCPs are scheduled for the same server.
But the first procedure finishes only after the second SCP has started.
{code}
2018-12-07,11:43:08,038 INFO 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=9, 
state=SUCCESS, hasLock=false; ServerCrashProcedure 
server=c4-hadoop-tst-st27.bj,29100,1544153846859, splitWal=true, meta=false in 
30.5340sec
{code}
This leads to the problem that regions are assigned twice.
{code}
2018-12-07,12:16:33,039 WARN 
org.apache.hadoop.hbase.master.assignment.AssignmentManager: rit=OPEN, 
location=c4-hadoop-tst-st28.bj,29100,1544154149607, table=test_failover, 
region=459b3130b40caf3b8f3e1421766f4089 reported OPEN on 
server=c4-hadoop-tst-st29.bj,29100,1544154149615 but state has otherwise
{code}
And here we can see that the server is removed from the dead server list before 
the second SCP starts.
{code}
2018-12-07,11:42:44,938 DEBUG org.apache.hadoop.hbase.master.DeadServer: 
Removed c4-hadoop-tst-st27.bj,29100,1544153846859 ; numProcessing=3
{code}

Thus we should not delete a dead server from the dead server list immediately.
A patch to fix this problem will be uploaded later.
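
The sequence in the logs can be replayed with a small model of the barrier. This is a sketch under assumptions, not the actual master code: `DeadServerBarrierSketch`, `removeStaleServers` and `replay` are hypothetical names, and incrementing a counter stands in for submitting a ServerCrashProcedure.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the dead-server barrier described above.
final class DeadServerBarrierSketch {
  private final Set<String> deadServers = new HashSet<>();
  private int scheduledScps = 0;

  // Returns false when crash processing for this server is already in progress.
  boolean expireServer(String serverName) {
    if (deadServers.contains(serverName)) {
      return false;
    }
    deadServers.add(serverName);
    scheduledScps++; // stands in for submitting a ServerCrashProcedure
    return true;
  }

  // Models the cleanup performed when master initialization finishes.
  void removeStaleServers() {
    deadServers.clear();
  }

  int scheduledScps() {
    return scheduledScps;
  }

  // Replays two expirations of the same server; returns how many SCPs were scheduled.
  static int replay(boolean cleanUpBetweenExpirations) {
    DeadServerBarrierSketch master = new DeadServerBarrierSketch();
    String server = "c4-hadoop-tst-st27.bj,29100,1544153846859";
    master.expireServer(server);      // first SCP is scheduled
    if (cleanUpBetweenExpirations) {
      master.removeStaleServers();    // barrier removed while the first SCP still runs
    }
    master.expireServer(server);      // second expiration for the same server
    return master.scheduledScps();
  }
}
```

In this model `replay(true)` schedules two SCPs for the same server, while `replay(false)` schedules only one because the barrier is still in place.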


 





Re: [ANNOUNCE] Allan Yang joins the Apache HBase PMC

2018-11-28 Thread Jingyun Tian
Congratulations! Allan!

On Thu, Nov 29, 2018 at 9:42 AM OpenInx  wrote:

> Congratulations, Allan!
>
> On Thu, Nov 29, 2018 at 6:58 AM Zach York 
> wrote:
>
> > Congratulations and welcome Allan!
> >
> >
> > On Wed, Nov 28, 2018 at 10:21 AM Esteban Gutierrez
> >  wrote:
> >
> > > Congratulations, Allan!
> > >
> > > --
> > > Cloudera, Inc.
> > >
> > >
> > >
> > > On Wed, Nov 28, 2018 at 10:11 AM Yu Li  wrote:
> > >
> > > > On behalf of the Apache HBase PMC I am pleased to announce that Allan
> > > Yang
> > > > has accepted our invitation to become a PMC member on the Apache
> HBase
> > > > project. We appreciate Allan stepping up to take more responsibility
> in
> > > the
> > > > HBase project.
> > > >
> > > > Please join me in welcoming Allan to the HBase PMC!
> > > >
> > > > Best Regards,
> > > > Yu
> > > >
> > >
> >
>


Re: [ANNOUNCE] New HBase committer Jingyun Tian

2018-11-13 Thread Jingyun Tian
Thank you all!

Sincerely,
Jingyun Tian

On Wed, Nov 14, 2018 at 8:59 AM stack  wrote:

> Welcome jingyun.
> S
>
> On Mon, Nov 12, 2018, 11:54 PM 张铎(Duo Zhang) 
> > On behalf of the Apache HBase PMC, I am pleased to announce that Jingyun
> > Tian has accepted the PMC's invitation to become a committer on the
> > project. We appreciate all of Jingyun's generous contributions thus far
> and
> > look forward to his continued involvement.
> >
> > Congratulations and welcome, Jingyun!
> >
>


[jira] [Created] (HBASE-21437) Bypassed procedure throw IllegalArgumentException when its state is WAITING_TIMEOUT

2018-11-05 Thread Jingyun Tian (JIRA)
Jingyun Tian created HBASE-21437:


 Summary: Bypassed procedure throw IllegalArgumentException when 
its state is WAITING_TIMEOUT
 Key: HBASE-21437
 URL: https://issues.apache.org/jira/browse/HBASE-21437
 Project: HBase
  Issue Type: Bug
Reporter: Jingyun Tian
Assignee: Jingyun Tian


{code}
2018-11-05,18:25:52,735 WARN 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Worker terminating 
UNNATURALLY null
java.lang.IllegalArgumentException: NOT RUNNABLE! pid=3, 
state=WAITING_TIMEOUT:REGION_STATE_TRANSITION_CLOSE, hasLock=true, bypass=true; 
TransitRegionStateProcedure table=test_failover, 
region=1bb029ba4ec03b92061be5c4329d2096, UNASSIGN
at 
org.apache.hbase.thirdparty.com.google.common.base.Preconditions.checkArgument(Preconditions.java:134)
at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1620)
at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1384)
at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1100(ProcedureExecutor.java:78)
at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1948)
2018-11-05,18:25:52,736 TRACE 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Worker terminated.
{code}

When we bypass a WAITING_TIMEOUT procedure and resubmit it, its state is still 
WAITING_TIMEOUT. When the executor then runs this procedure, it throws an 
exception and causes the worker to terminate.
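
A minimal model of the failing precondition is sketched here. It is not the ProcedureExecutor code: `BypassStateSketch` and its methods are hypothetical, the way a bypassed procedure is marked SUCCESS is a simplification, and resetting the state to RUNNABLE before resubmission is only an assumed remedy.

```java
// Hypothetical model of the executor's RUNNABLE precondition.
final class BypassStateSketch {
  enum State { RUNNABLE, WAITING_TIMEOUT, SUCCESS }

  static final class Proc {
    State state;
    boolean bypass;

    Proc(State state) {
      this.state = state;
    }
  }

  // Like the executor in the stack trace, refuse anything that is not RUNNABLE.
  static void execProcedure(Proc proc) {
    if (proc.state != State.RUNNABLE) {
      throw new IllegalArgumentException("NOT RUNNABLE! state=" + proc.state);
    }
    if (proc.bypass) {
      proc.state = State.SUCCESS; // simplification: a bypassed procedure just completes
    }
  }

  // Behaviour described in the report: bypass leaves the state untouched.
  static void bypassKeepingState(Proc proc) {
    proc.bypass = true;
  }

  // Assumed remedy: make the procedure RUNNABLE again before resubmitting it.
  static void bypassAndMakeRunnable(Proc proc) {
    proc.bypass = true;
    if (proc.state == State.WAITING_TIMEOUT) {
      proc.state = State.RUNNABLE;
    }
  }

  // Returns false where the real worker thread would terminate.
  static boolean runs(Proc proc) {
    try {
      execProcedure(proc);
      return true;
    } catch (IllegalArgumentException e) {
      return false;
    }
  }
}
```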








[jira] [Created] (HBASE-21413) Empty meta log doesn't get split when restart whole cluster

2018-10-31 Thread Jingyun Tian (JIRA)
Jingyun Tian created HBASE-21413:


 Summary: Empty meta log doesn't get split when restart whole 
cluster
 Key: HBASE-21413
 URL: https://issues.apache.org/jira/browse/HBASE-21413
 Project: HBase
  Issue Type: Improvement
Reporter: Jingyun Tian
Assignee: Jingyun Tian
 Attachments: Screenshot from 2018-10-31 18-11-02.png, Screenshot from 
2018-10-31 18-11-11.png

After I restarted the whole cluster, a splitting directory still existed on 
HDFS. I found that it contained only an empty meta WAL file. I'll dig into 
this later.





[jira] [Created] (HBASE-21410) A helper page that help find all problematic regions and procedures

2018-10-30 Thread Jingyun Tian (JIRA)
Jingyun Tian created HBASE-21410:


 Summary: A helper page that help find all problematic regions and 
procedures
 Key: HBASE-21410
 URL: https://issues.apache.org/jira/browse/HBASE-21410
 Project: HBase
  Issue Type: Improvement
Reporter: Jingyun Tian
Assignee: Jingyun Tian
 Attachments: Screenshot from 2018-10-30 19-06-21.png, Screenshot from 
2018-10-30 19-06-42.png

This page mainly focuses on finding regions that are stuck in some state and 
cannot be assigned. My proposal for the page is as follows: 
!Screenshot from 2018-10-30 19-06-21.png!
From this page we can see all regions in the RIT queue and their related 
procedures. If we can determine that the states of these regions are abnormal, 
we can click the link 'Procedures as TXT' to get a full list of procedure IDs 
to bypass. Then click 'Regions as TXT' to get a full list of encoded region 
names to assign.
!Screenshot from 2018-10-30 19-06-42.png!
Some region names are covered by the navigation bar; I'll fix it later.





[jira] [Created] (HBASE-21407) Resolve NPE in backup Master UI

2018-10-30 Thread Jingyun Tian (JIRA)
Jingyun Tian created HBASE-21407:


 Summary: Resolve NPE in backup Master UI 
 Key: HBASE-21407
 URL: https://issues.apache.org/jira/browse/HBASE-21407
 Project: HBase
  Issue Type: Bug
  Components: UI
Affects Versions: 2.1.0, 3.0.0, 2.2.0
Reporter: Jingyun Tian
Assignee: Jingyun Tian


Since some pages of our UI use JSP instead of Jamon, the fix of HBASE-18263 is 
not enough. Added the fix of HBASE-18263 to header.jsp.





[jira] [Created] (HBASE-21393) Add an API ScheduleSCP() to HBCK2

2018-10-26 Thread Jingyun Tian (JIRA)
Jingyun Tian created HBASE-21393:


 Summary: Add an API  ScheduleSCP() to HBCK2
 Key: HBASE-21393
 URL: https://issues.apache.org/jira/browse/HBASE-21393
 Project: HBase
  Issue Type: Bug
  Components: hbase-operator-tools, hbck2
Reporter: Jingyun Tian


Add the API of ScheduleSCP() to hbase-operator-tools so that operators can 
schedule ServerCrashProcedures of specified regionservers.





[jira] [Created] (HBASE-21378) checkHBCKSupport blocks assigning hbase:meta or hbase:namespace when master is not initialized

2018-10-24 Thread Jingyun Tian (JIRA)
Jingyun Tian created HBASE-21378:


 Summary: checkHBCKSupport blocks assigning hbase:meta or 
hbase:namespace when master is not initialized
 Key: HBASE-21378
 URL: https://issues.apache.org/jira/browse/HBASE-21378
 Project: HBase
  Issue Type: Bug
  Components: hbase-operator-tools
Reporter: Jingyun Tian
Assignee: Jingyun Tian


I encountered a scenario in which hbase:namespace was not online.
{code}
2018-10-24,14:38:16,910 WARN org.apache.hadoop.hbase.master.HMaster: 
hbase:namespace,,1529933109115.7e0801c8232b2dc15face54532056076. is NOT online; 
state={7e0801c8232b2dc15face54532056076 state=OPEN, ts=1540363033384, 
server=c4-hadoop-tst-st30.bj,29100,1540348649479}; ServerCrashProcedures=false. 
Master startup cannot progress, in holding-pattern until region onlined.
{code}
Then I tried to assign it manually, but the tool threw a PleaseHoldException.
{code}
Wed Oct 24 15:26:52 CST 2018, RpcRetryingCaller{globalStartTime=1540365754487, 
pause=200, maxAttempts=16}, org.apache.hadoop.hbase.PleaseHoldException: 
org.apache.hadoop.hbase.PleaseHoldException: Master is initializing
at 
org.apache.hadoop.hbase.master.HMaster.checkInitialized(HMaster.java:3064)
at 
org.apache.hadoop.hbase.master.MasterRpcServices.getClusterStatus(MasterRpcServices.java:934)
at 
org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)


at 
org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:144)
at 
org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3133)
at 
org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3125)
at 
org.apache.hadoop.hbase.client.HBaseAdmin.getClusterMetrics(HBaseAdmin.java:2161)
at org.apache.hbase.HBCK2.checkHBCKSupport(HBCK2.java:98)
at org.apache.hbase.HBCK2.run(HBCK2.java:364)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hbase.HBCK2.main(HBCK2.java:447)
Caused by: org.apache.hadoop.hbase.PleaseHoldException: 
org.apache.hadoop.hbase.PleaseHoldException: Master is initializing
at 
org.apache.hadoop.hbase.master.HMaster.checkInitialized(HMaster.java:3064)
at 
org.apache.hadoop.hbase.master.MasterRpcServices.getClusterStatus(MasterRpcServices.java:934)
at 
org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at 
org.apache.hadoop.hbase.ipc.RemoteWithExtrasException.instantiateException(RemoteWithExtrasException.java:100)
at 
org.apache.hadoop.hbase.ipc.RemoteWithExtrasException.unwrapRemoteException(RemoteWithExtrasException.java:90)
at 
org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.makeIOExceptionOfException(ProtobufUtil.java:361)
at 
org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.handleRemoteException(ProtobufUtil.java:349)
at 
org.apache.hadoop.hbase.client.MasterCallable.call(MasterCallable.java:101)
at 
org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:107)
{code}

Then I checked the code and found that this is caused by checkHBCKSupport(). I 
assigned hbase:namespace successfully by skipping this check. Thus I think the 
tool needs an option to skip this check.






[jira] [Created] (HBASE-21335) Change the default wait time of HBCK2 tool

2018-10-18 Thread Jingyun Tian (JIRA)
Jingyun Tian created HBASE-21335:


 Summary: Change the default wait time of HBCK2 tool
 Key: HBASE-21335
 URL: https://issues.apache.org/jira/browse/HBASE-21335
 Project: HBase
  Issue Type: Bug
Reporter: Jingyun Tian


Currently the default wait time is 0, and I added a condition check requiring 
the wait time to be greater than 0. Thus the default wait time should be set to 
a number greater than 0.





[jira] [Created] (HBASE-21322) Add a scheduleServerCrashProcedure() API to HbckService

2018-10-16 Thread Jingyun Tian (JIRA)
Jingyun Tian created HBASE-21322:


 Summary: Add a scheduleServerCrashProcedure() API to HbckService
 Key: HBASE-21322
 URL: https://issues.apache.org/jira/browse/HBASE-21322
 Project: HBase
  Issue Type: Task
Reporter: Jingyun Tian
Assignee: Jingyun Tian


According to my test, if one RS goes down and then all procedure logs are 
deleted, no ServerCrashProcedure is scheduled, and restarting the master does 
not help. Thus we need to schedule a ServerCrashProcedure manually to solve the 
problem.





[jira] [Created] (HBASE-21291) Bypass doesn't work for state-machine procedures

2018-10-11 Thread Jingyun Tian (JIRA)
Jingyun Tian created HBASE-21291:


 Summary: Bypass doesn't work for state-machine procedures
 Key: HBASE-21291
 URL: https://issues.apache.org/jira/browse/HBASE-21291
 Project: HBase
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Jingyun Tian


{code}
  if (!procedure.isFailed()) {
    if (subprocs != null) {
      if (subprocs.length == 1 && subprocs[0] == procedure) {
        // Procedure returned itself. Quick-shortcut for a state machine-like procedure;
        // i.e. we go around this loop again rather than go back out on the scheduler queue.
        subprocs = null;
        reExecute = true;
        LOG.trace("Short-circuit to next step on pid={}", procedure.getProcId());
      } else {
        // Yield the current procedure, and make the subprocedure runnable
        // subprocs may come back 'null'.
        subprocs = initializeChildren(procStack, procedure, subprocs);
        LOG.info("Initialized subprocedures=" +
          (subprocs == null? null:
            Stream.of(subprocs).map(e -> "{" + e.toString() + "}").
            collect(Collectors.toList()).toString()));
      }
    } else if (procedure.getState() == ProcedureState.WAITING_TIMEOUT) {
      LOG.debug("Added to timeoutExecutor {}", procedure);
      timeoutExecutor.add(procedure);
    } else if (!suspended) {
      // No subtask, so we are done
      procedure.setState(ProcedureState.SUCCESS);
    }
  }
{code}
The current implementation of ProcedureExecutor sets reExecute to true for 
state-machine-like procedures. If such a procedure is stuck in one state, it 
will loop forever.
{code}
  IdLock.Entry lockEntry = procExecutionLock.getLockEntry(proc.getProcId());
  try {
    executeProcedure(proc);
  } catch (AssertionError e) {
    LOG.info("ASSERT pid=" + proc.getProcId(), e);
    throw e;
  } finally {
    procExecutionLock.releaseLockEntry(lockEntry);
{code}
Since a procedure acquires the IdLock and releases it only after execution is 
done, a state machine procedure never releases the IdLock until it finishes.
As a result, bypassProcedure doesn't work, because it tries to grab the IdLock 
first.
{code}
IdLock.Entry lockEntry = procExecutionLock.tryLockEntry(procedure.getProcId(), lockWait);
{code}
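
The lock interaction can be illustrated without HBase. In this sketch a one-permit `java.util.concurrent.Semaphore` stands in for the per-procedure IdLock entry; `BypassLockSketch` and its method names are hypothetical, and the real bypass logic is reduced to a boolean result.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical model: one permit plays the role of the IdLock entry of a procedure.
final class BypassLockSketch {
  private final Semaphore procExecutionLock = new Semaphore(1);

  // The worker takes the lock before executing and keeps it for as long as
  // the state machine keeps re-executing itself.
  void workerStartsExecuting() {
    procExecutionLock.acquireUninterruptibly();
  }

  void workerFinishes() {
    procExecutionLock.release();
  }

  // Bypass first tries to take the same lock, waiting at most lockWaitMs.
  boolean bypass(long lockWaitMs) {
    boolean locked;
    try {
      locked = procExecutionLock.tryAcquire(lockWaitMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    if (!locked) {
      return false; // lock still held by the worker, so bypass gives up
    }
    // The real code would mark the procedure as bypassed here.
    procExecutionLock.release();
    return true;
  }

  // Returns {bypass result while the worker is stuck, bypass result after it finished}.
  static boolean[] demo() {
    BypassLockSketch sketch = new BypassLockSketch();
    sketch.workerStartsExecuting();
    boolean whileStuck = sketch.bypass(10);
    sketch.workerFinishes();
    boolean afterFinish = sketch.bypass(10);
    return new boolean[] { whileStuck, afterFinish };
  }
}
```

While the worker holds the permit, `bypass` times out and returns false; it only succeeds once the worker has released the lock, which a procedure stuck in its loop never does.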





[jira] [Created] (HBASE-21204) NPE when scan raw DELETE_FAMILY_VERSION and codec is not set

2018-09-17 Thread Jingyun Tian (JIRA)
Jingyun Tian created HBASE-21204:


 Summary: NPE when scan raw DELETE_FAMILY_VERSION and codec is not 
set
 Key: HBASE-21204
 URL: https://issues.apache.org/jira/browse/HBASE-21204
 Project: HBase
  Issue Type: Improvement
Affects Versions: 2.0.0, 2.1.0, 2.2.0
Reporter: Jingyun Tian
Assignee: Jingyun Tian
 Fix For: 2.2.0, 2.0.0, 2.1.0


Our Cell has 7 types:
Minimum((byte)0),
Put((byte)4),
Delete((byte)8),
DeleteFamilyVersion((byte)10),
DeleteColumn((byte)12),
DeleteFamily((byte)14),
Maximum((byte)255);

But there are only 6 types in our CellType protobuf definition:
enum CellType {
  MINIMUM = 0;
  PUT = 4;

  DELETE = 8;
  DELETE_FAMILY_VERSION = 10;
  DELETE_COLUMN = 12;
  DELETE_FAMILY = 14;

  // MAXIMUM is used when searching; you look from maximum on down.
  MAXIMUM = 255;
}

Thus if we do a raw scan over data of type DELETE_FAMILY_VERSION, it will throw 
an NPE.






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-20986) Separate the config of block size when we do log splitting and write Hlog

2018-07-31 Thread Jingyun Tian (JIRA)
Jingyun Tian created HBASE-20986:


 Summary: Separate the config of block size when we do log 
splitting and write Hlog
 Key: HBASE-20986
 URL: https://issues.apache.org/jira/browse/HBASE-20986
 Project: HBase
  Issue Type: Improvement
Affects Versions: 2.1.0
Reporter: Jingyun Tian
 Fix For: 2.1.0


Since the block size of recovered edits and the HLog are the same right now, if 
we set a large block size, the name node may not be able to allocate enough 
space when we do log splitting. But setting a large HLog block size helps 
reduce the number of times a region server asks for a new block. Thus I think 
it is necessary to separate the two block size configs.





[jira] [Created] (HBASE-20985) add two attributes when we do normalization

2018-07-31 Thread Jingyun Tian (JIRA)
Jingyun Tian created HBASE-20985:


 Summary: add two attributes when we do normalization
 Key: HBASE-20985
 URL: https://issues.apache.org/jira/browse/HBASE-20985
 Project: HBase
  Issue Type: Improvement
Affects Versions: 2.1.0
Reporter: Jingyun Tian
 Fix For: 2.1.0


Currently, when we turn on the normalization switch, it helps balance the whole 
table based on total region size / total region count. I add two attributes so 
that we can set the total region count or the average region size we want to 
reach when normalization is done.





[jira] [Created] (HBASE-20855) PeerConfigTracker only support one listener will cause problem when there is a recovered replication queue

2018-07-06 Thread Jingyun Tian (JIRA)
Jingyun Tian created HBASE-20855:


 Summary: PeerConfigTracker only support one listener will cause 
problem when there is a recovered replication queue
 Key: HBASE-20855
 URL: https://issues.apache.org/jira/browse/HBASE-20855
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.4.0, 1.3.0, 1.5.0
Reporter: Jingyun Tian
Assignee: Jingyun Tian


{code}

public void init(Context context) throws IOException {
  this.ctx = context;

  if (this.ctx != null){
    ReplicationPeer peer = this.ctx.getReplicationPeer();
    if (peer != null){
      peer.trackPeerConfigChanges(this);
    } else {
      LOG.warn("Not tracking replication peer config changes for Peer Id " +
        this.ctx.getPeerId() + " because there's no such peer");
    }
  }
}

{code}

As we know, a replication source registers itself with the PeerConfigTracker in 
ReplicationPeer. When there are one or more recovered queues, each queue 
generates a new replication source, but they all share the same ReplicationPeer. 

Then when setListener is called, the newly generated one overwrites the older 
one. Thus only one of them will receive the peer config change notification.

{code}

public synchronized void setListener(ReplicationPeerConfigListener listener){
  this.listener = listener;
}

{code}

To solve this, PeerConfigTracker needs to support multiple listeners, and a 
listener should be removed when its replication endpoint is terminated.

I will upload a patch later with the fix and a UT.
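
A rough sketch of a tracker that keeps several listeners follows. It is not the patch itself: `PeerConfigTrackerSketch`, the simplified `PeerConfigListener` interface (which takes a `String` rather than a real peer config object) and `CountingListener` are all hypothetical, and `CopyOnWriteArraySet` is just one reasonable choice for a small, rarely modified listener set.

```java
import java.util.Set;
import java.util.concurrent.CopyOnWriteArraySet;

// Hypothetical sketch of a tracker that supports several listeners.
final class PeerConfigTrackerSketch {
  // Simplified listener; the config is modelled as a plain String.
  interface PeerConfigListener {
    void peerConfigUpdated(String newConfig);
  }

  // Safe to iterate while listeners are added or removed concurrently.
  private final Set<PeerConfigListener> listeners = new CopyOnWriteArraySet<>();

  void addListener(PeerConfigListener listener) {
    listeners.add(listener);
  }

  // Intended to be called when the owning replication endpoint terminates.
  void removeListener(PeerConfigListener listener) {
    listeners.remove(listener);
  }

  void notifyConfigChanged(String newConfig) {
    for (PeerConfigListener listener : listeners) {
      listener.peerConfigUpdated(newConfig);
    }
  }

  // Test helper that counts how many notifications it received.
  static final class CountingListener implements PeerConfigListener {
    int updates = 0;

    @Override
    public void peerConfigUpdated(String newConfig) {
      updates++;
    }
  }
}
```

With this shape, the source for the normal queue and the sources for recovered queues can each register their own listener, and every one of them is notified until it is removed.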





[jira] [Created] (HBASE-20769) getSplits() has a out of bounds problem in TableSnapshotInputFormatImpl

2018-06-21 Thread Jingyun Tian (JIRA)
Jingyun Tian created HBASE-20769:


 Summary: getSplits() has a out of bounds problem in 
TableSnapshotInputFormatImpl
 Key: HBASE-20769
 URL: https://issues.apache.org/jira/browse/HBASE-20769
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.0, 1.4.0, 1.3.0
Reporter: Jingyun Tian
Assignee: Jingyun Tian
 Fix For: 2.0.0


When numSplits > 1, getSplits may create a split whose start row is smaller 
than the user-specified scan's start row, or whose stop row is larger than the 
user-specified scan's stop row.

{code}

byte[][] sp = sa.split(hri.getStartKey(), hri.getEndKey(), numSplits, true);
for (int i = 0; i < sp.length - 1; i++) {
  if (PrivateCellUtil.overlappingKeys(scan.getStartRow(), scan.getStopRow(), sp[i],
      sp[i + 1])) {
    List<String> hosts =
        calculateLocationsForInputSplit(conf, htd, hri, tableDir, localityEnabled);

    Scan boundedScan = new Scan(scan);
    boundedScan.setStartRow(sp[i]);
    boundedScan.setStopRow(sp[i + 1]);

    splits.add(new InputSplit(htd, hri, hosts, boundedScan, restoreDir));
  }
}

{code}

Since we split keys by the range of regions, when sp[i] < scan.getStartRow() or 
sp[i + 1] > scan.getStopRow(), the created bounded scan may contain range that 
over user defined scan.

fix should be simple:

{code}

boundedScan.setStartRow(
    Bytes.compareTo(scan.getStartRow(), sp[i]) > 0 ? scan.getStartRow() : sp[i]);
boundedScan.setStopRow(
    Bytes.compareTo(scan.getStopRow(), sp[i + 1]) < 0 ? scan.getStopRow() : sp[i + 1]);

{code}

I will also try to add UTs to help discover this problem.
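
The clamping idea can be tried out without HBase on the classpath. The sketch below is only an illustration: ScanBoundsSketch and its methods are hypothetical names, and java.util.Arrays.compareUnsigned (Java 9+) stands in for Bytes.compareTo, since both compare byte arrays lexicographically as unsigned bytes. It assumes all keys are non-empty; HBase's empty start/stop rows (open-ended scans) are not handled here.

```java
import java.util.Arrays;

final class ScanBoundsSketch {
  /** Returns the larger key under unsigned lexicographic byte order. */
  static byte[] maxKey(byte[] a, byte[] b) {
    return Arrays.compareUnsigned(a, b) >= 0 ? a : b;
  }

  /** Returns the smaller key under unsigned lexicographic byte order. */
  static byte[] minKey(byte[] a, byte[] b) {
    return Arrays.compareUnsigned(a, b) <= 0 ? a : b;
  }

  /**
   * Clamps the split range [splitStart, splitStop) to the scan range
   * [scanStart, scanStop). Element 0 is the bounded start row, element 1 the
   * bounded stop row. Assumes non-empty keys (no open-ended ranges).
   */
  static byte[][] clamp(byte[] scanStart, byte[] scanStop, byte[] splitStart, byte[] splitStop) {
    return new byte[][] { maxKey(scanStart, splitStart), minKey(scanStop, splitStop) };
  }
}
```

For a scan over [c, x) and a split over [a, m), clamp yields [c, m): the start row is pulled up to the scan's start row, while the stop row stays at the split boundary.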





[jira] [Created] (HBASE-20625) refactor some WALCellCodec related code

2018-05-23 Thread Jingyun Tian (JIRA)
Jingyun Tian created HBASE-20625:


 Summary: refactor some WALCellCodec related code
 Key: HBASE-20625
 URL: https://issues.apache.org/jira/browse/HBASE-20625
 Project: HBase
  Issue Type: Improvement
  Components: wal
Affects Versions: 2.0.0
Reporter: Jingyun Tian
Assignee: Jingyun Tian
 Fix For: 3.0.0


I'm currently working on exporting HLog to another FileSystem, and I found that
the code of WALCellCodec and its related classes is not that clean, with
several TODOs left in it. So I tried to refactor the code based on these TODOs, e.g.
{code}
  // TODO: it sucks that compression context is in WAL.Entry. It'd be nice if it was here.
  //   Dictionary could be gotten by enum; initially, based on enum, context would create
  //   an array of dictionaries.
  static class BaosAndCompressor extends ByteArrayOutputStream implements ByteStringCompressor {
    public ByteString toByteString() {
      // We need this copy to create the ByteString as the byte[] 'buf' is not immutable. We reuse
      // them.
      return ByteString.copyFrom(this.buf, 0, this.count);
    }
{code}





[jira] [Resolved] (HBASE-20584) TestRestoreSnapshotFromClient failed

2018-05-15 Thread Jingyun Tian (JIRA)

 [ https://issues.apache.org/jira/browse/HBASE-20584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jingyun Tian resolved HBASE-20584.
--
Resolution: Duplicate

> TestRestoreSnapshotFromClient failed
> 
>
> Key: HBASE-20584
> URL: https://issues.apache.org/jira/browse/HBASE-20584
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0
>    Reporter: Jingyun Tian
>Assignee: Jingyun Tian
>Priority: Major
>
> org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: Snapshot { ss=snaptb1-1526376636687 table=testRestoreSnapshotAfterSplittingRegions-1526376636687 type=FLUSH } had an error.  Procedure snaptb1-1526376636687 { waiting=[] done=[] }
>   at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:380)
>   at org.apache.hadoop.hbase.master.MasterRpcServices.isSnapshotDone(MasterRpcServices.java:1128)
>   at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> Caused by: org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException via Failed taking snapshot { ss=snaptb1-1526376636687 table=testRestoreSnapshotAfterSplittingRegions-1526376636687 type=FLUSH } due to exception:Regions moved during the snapshot '{ ss=snaptb1-1526376636687 table=testRestoreSnapshotAfterSplittingRegions-1526376636687 type=FLUSH }'. expected=6 snapshotted=7.:org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Regions moved during the snapshot '{ ss=snaptb1-1526376636687 table=testRestoreSnapshotAfterSplittingRegions-1526376636687 type=FLUSH }'. expected=6 snapshotted=7.
>   at org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher.rethrowException(ForeignExceptionDispatcher.java:82)
>   at org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.rethrowExceptionIfFailed(TakeSnapshotHandler.java:311)
>   at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:369)
>   ... 6 more
> Caused by: org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Regions moved during the snapshot '{ ss=snaptb1-1526376636687 table=testRestoreSnapshotAfterSplittingRegions-1526376636687 type=FLUSH }'. expected=6 snapshotted=7.
>   at org.apache.hadoop.hbase.master.snapshot.MasterSnapshotVerifier.verifyRegions(MasterSnapshotVerifier.java:217)
>   at org.apache.hadoop.hbase.master.snapshot.MasterSnapshotVerifier.verifySnapshot(MasterSnapshotVerifier.java:121)
>   at org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.process(TakeSnapshotHandler.java:207)
>   at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:748)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at org.apache.hadoop.hbase.ipc.RemoteWithExtrasException.instantiateException(RemoteWithExtrasException.java:100)
>   at org.apache.hadoop.hbase.ipc.RemoteWithExtrasException.unwrapRemoteException(RemoteWithExtrasException.java:90)
>   at org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.makeIOExceptionOfException(ProtobufUtil.java:360)
>   at org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.handleRemoteException(ProtobufUtil.java:348)
>   at org.apache.hadoop.hbase.client.MasterCallable.call(MasterCallable.java:101)
>   at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:107)
>   at org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3061)
>   at

[jira] [Created] (HBASE-20584) TestRestoreSnapshotFromClient failed

2018-05-15 Thread Jingyun Tian (JIRA)
Jingyun Tian created HBASE-20584:


 Summary: TestRestoreSnapshotFromClient failed
 Key: HBASE-20584
 URL: https://issues.apache.org/jira/browse/HBASE-20584
 Project: HBase
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Jingyun Tian
Assignee: Jingyun Tian





org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: Snapshot { ss=snaptb1-1526376636687 table=testRestoreSnapshotAfterSplittingRegions-1526376636687 type=FLUSH } had an error.  Procedure snaptb1-1526376636687 { waiting=[] done=[] }
    at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:380)
    at org.apache.hadoop.hbase.master.MasterRpcServices.isSnapshotDone(MasterRpcServices.java:1128)
    at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
Caused by: org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException via Failed taking snapshot { ss=snaptb1-1526376636687 table=testRestoreSnapshotAfterSplittingRegions-1526376636687 type=FLUSH } due to exception:Regions moved during the snapshot '{ ss=snaptb1-1526376636687 table=testRestoreSnapshotAfterSplittingRegions-1526376636687 type=FLUSH }'. expected=6 snapshotted=7.:org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Regions moved during the snapshot '{ ss=snaptb1-1526376636687 table=testRestoreSnapshotAfterSplittingRegions-1526376636687 type=FLUSH }'. expected=6 snapshotted=7.
    at org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher.rethrowException(ForeignExceptionDispatcher.java:82)
    at org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.rethrowExceptionIfFailed(TakeSnapshotHandler.java:311)
    at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:369)
    ... 6 more
Caused by: org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Regions moved during the snapshot '{ ss=snaptb1-1526376636687 table=testRestoreSnapshotAfterSplittingRegions-1526376636687 type=FLUSH }'. expected=6 snapshotted=7.
    at org.apache.hadoop.hbase.master.snapshot.MasterSnapshotVerifier.verifyRegions(MasterSnapshotVerifier.java:217)
    at org.apache.hadoop.hbase.master.snapshot.MasterSnapshotVerifier.verifySnapshot(MasterSnapshotVerifier.java:121)
    at org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.process(TakeSnapshotHandler.java:207)
    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)

    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.hbase.ipc.RemoteWithExtrasException.instantiateException(RemoteWithExtrasException.java:100)
    at org.apache.hadoop.hbase.ipc.RemoteWithExtrasException.unwrapRemoteException(RemoteWithExtrasException.java:90)
    at org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.makeIOExceptionOfException(ProtobufUtil.java:360)
    at org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.handleRemoteException(ProtobufUtil.java:348)
    at org.apache.hadoop.hbase.client.MasterCallable.call(MasterCallable.java:101)
    at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:107)
    at org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3061)
    at org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3053)
    at org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2532)
    at org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2499)
    at org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2492)
    at org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClient.testRestoreSnapshotAfterSplittingRegions(TestRestoreSnapshotFromClient.java:311)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method

[jira] [Created] (HBASE-20579) Improve copy snapshot manifest in ExportSnapshot

2018-05-13 Thread Jingyun Tian (JIRA)
Jingyun Tian created HBASE-20579:


 Summary: Improve copy snapshot manifest in ExportSnapshot
 Key: HBASE-20579
 URL: https://issues.apache.org/jira/browse/HBASE-20579
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Affects Versions: hbase-2.0.0-alpha-4
Reporter: Jingyun Tian
Assignee: Jingyun Tian
 Fix For: hbase-2.0.0-alpha-4


ExportSnapshot needs to copy the snapshot manifest to the destination cluster
first, then call setOwner and setPermission on those paths. But this is done
with a single thread, which makes submitting the job take a long time if your
snapshot is big. I tried processing them in parallel, which dramatically
reduces the total submission time.
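
A minimal sketch of the parallel approach, assuming a fixed-size thread pool. ParallelPathTaskSketch and forEachInParallel are hypothetical names, and the per-path action is passed in as a generic callback rather than real FileSystem setOwner/setPermission calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

final class ParallelPathTaskSketch {
  /** Runs the given action on every path using at most `threads` worker threads. */
  static <T> void forEachInParallel(List<T> paths, int threads, Consumer<T> action) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<?>> futures = new ArrayList<>();
      for (T path : paths) {
        futures.add(pool.submit(() -> action.accept(path)));
      }
      // Wait for every task so that a failure on any path surfaces to the caller.
      for (Future<?> future : futures) {
        future.get();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while processing paths", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("Failed to process a path", e.getCause());
    } finally {
      pool.shutdown();
    }
  }
}
```

In the real tool, the action would be whatever per-path work is currently done serially; the thread count here is a caller-supplied assumption, not an existing configuration key.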





[jira] [Created] (HBASE-20194) Basic Replication WebUI - Master

2018-03-14 Thread Jingyun Tian (JIRA)
Jingyun Tian created HBASE-20194:


 Summary: Basic Replication WebUI - Master
 Key: HBASE-20194
 URL: https://issues.apache.org/jira/browse/HBASE-20194
 Project: HBase
  Issue Type: Task
Reporter: Jingyun Tian


subtask of HBASE-15809. Implementation of Replication WebUI on Master webpage.





[jira] [Created] (HBASE-20193) Basic Replication Web UI - Regionserver

2018-03-14 Thread Jingyun Tian (JIRA)
Jingyun Tian created HBASE-20193:


 Summary: Basic Replication Web UI - Regionserver 
 Key: HBASE-20193
 URL: https://issues.apache.org/jira/browse/HBASE-20193
 Project: HBase
  Issue Type: Task
Reporter: Jingyun Tian
Assignee: Jingyun Tian


subtask of HBASE-15809





[jira] [Created] (HBASE-19666) hadoop.hbase.regionserver.TestDefaultCompactSelection test failed

2017-12-29 Thread Jingyun Tian (JIRA)
Jingyun Tian created HBASE-19666:


 Summary: hadoop.hbase.regionserver.TestDefaultCompactSelection 
test failed
 Key: HBASE-19666
 URL: https://issues.apache.org/jira/browse/HBASE-19666
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0
Reporter: Jingyun Tian
Priority: Critical


hadoop.hbase.regionserver.TestDefaultCompactSelection
[ERROR] Failures: 
[ERROR]   TestDefaultCompactSelection.testCompactionRatio:74->TestCompactionPolicy.compactEquals:182->TestCompactionPolicy.compactEquals:201 expected:<[[4, 2, 1]]> but was:<[[]]>
[ERROR]   TestDefaultCompactSelection.testStuckStoreCompaction:145->TestCompactionPolicy.compactEquals:182->TestCompactionPolicy.compactEquals:201 expected:<[[]30, 30, 30]> but was:<[[99, 30, ]30, 30, 30]>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HBASE-19358) Improve the stability of splitting log when do fail over

2017-11-27 Thread Jingyun Tian (JIRA)
Jingyun Tian created HBASE-19358:


 Summary: Improve the stability of splitting log when do fail over
 Key: HBASE-19358
 URL: https://issues.apache.org/jira/browse/HBASE-19358
 Project: HBase
  Issue Type: Improvement
  Components: MTTR
Affects Versions: 0.98.24
Reporter: Jingyun Tian


The way we currently split the log is shown in the following figure:
!previous-logic.png|thumbnail!
The problem is that the OutputSink writes the recovered edits while the log is
being split, which means it creates one WriterAndPath for each region. If the
cluster is small and the number of regions per regionserver is large, this
creates too many HDFS streams at the same time. Splitting is then prone to
failure, since each datanode needs to handle too many streams.

So I came up with a new way to split the log:
!attachment-name.jpg|thumbnail!
We cache the recovered edits until they exceed the memory limit we set or we
reach the end of the log; then a thread pool does the rest: writing them to
files and moving the files to their destination.

The biggest benefit is that we can control the number of streams created during
log splitting. It will not exceed hbase.regionserver.wal.max.splitters *
hbase.regionserver.hlog.splitlog.writer.threads, whereas before it was
hbase.regionserver.wal.max.splitters * the number of regions the hlog contains.
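
To make the two bounds concrete, the arithmetic can be written out as below. SplitStreamBoundSketch is a hypothetical helper, and the numbers in the note that follows are illustrative values only, not measurements.

```java
final class SplitStreamBoundSketch {
  /** Previous logic: one writer per region for each concurrent splitter. */
  static long previousBound(int maxSplitters, int regionsInWal) {
    return (long) maxSplitters * regionsInWal;
  }

  /** Proposed logic: writers are limited by the writer thread pool of each splitter. */
  static long proposedBound(int maxSplitters, int writerThreads) {
    return (long) maxSplitters * writerThreads;
  }
}
```

For example, with 2 splitters, 3 writer threads and a WAL that contains edits for 500 regions, the previous bound is 2 * 500 = 1000 concurrent streams per regionserver, while the proposed bound is 2 * 3 = 6.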






[jira] [Created] (HBASE-18619) Should we add a postOpenDeployTasks after open splited or merged region?

2017-08-17 Thread Jingyun Tian (JIRA)
Jingyun Tian created HBASE-18619:


 Summary: Should we add a postOpenDeployTasks after open splited or 
merged region?
 Key: HBASE-18619
 URL: https://issues.apache.org/jira/browse/HBASE-18619
 Project: HBase
  Issue Type: Bug
  Components: Region Assignment
Affects Versions: 1.1.11, 1.2.6, 0.98.6, 1.4.0
Reporter: Jingyun Tian
Assignee: Jingyun Tian


I have a question: why do we skip postOpenDeployTasks() when we are not using
ZK for assignment?
{code:java}
  if (services != null) {
    try {
      if (useZKForAssignment) {
        // add 2nd daughter first (see HBASE-4335)
        services.postOpenDeployTasks(b);
      } else if (!services.reportRegionStateTransition(TransitionCode.SPLIT,
          parent.getRegionInfo(), hri_a, hri_b)) {
        throw new IOException("Failed to report split region to master: "
          + parent.getRegionInfo().getShortNameToLog());
      }
      // Should add it to OnlineRegions
      services.addToOnlineRegions(b);
      if (useZKForAssignment) {
        services.postOpenDeployTasks(a);
      }
      services.addToOnlineRegions(a);
    } catch (KeeperException ke) {
      throw new IOException(ke);
    }
  }
{code}
As a result, a newly split or newly merged region will not compact its
reference files. Then, if the normalizer thread wants to split this region, it
will get stuck.

{code:java}
  public boolean canSplit() {
    this.lock.readLock().lock();
    try {
      // Not split-able if we find a reference store file present in the store.
      boolean result = !hasReferences();
      if (!result && LOG.isDebugEnabled()) {
        LOG.debug("Cannot split region due to reference files being there");
      }
      return result;
    } finally {
      this.lock.readLock().unlock();
    }
  }
{code}

Given this code, should we add a call to services.postOpenDeployTasks() after a
successful _*reportRegionStateTransition(TransitionCode.SPLIT,
parent.getRegionInfo(), hri_a, hri_b)*_?
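
One possible shape of that change is sketched below with simplified, hypothetical stand-ins: SplitOpenSketch, its Services interface and the String region names are not the real HBase types. Whether the real postOpenDeployTasks() is safe to call on the non-ZK path is exactly the open question, so this only illustrates the control flow.

```java
final class SplitOpenSketch {
  /** Hypothetical, simplified stand-in for the region server services used above. */
  interface Services {
    boolean reportSplitToMaster();
    void postOpenDeployTasks(String region);
    void addToOnlineRegions(String region);
  }

  /** Opens both daughters; post-open tasks run whether or not ZK is used for assignment. */
  static void openDaughters(Services services, boolean useZKForAssignment, String a, String b) {
    if (!useZKForAssignment && !services.reportSplitToMaster()) {
      throw new IllegalStateException("Failed to report split region to master");
    }
    // Proposed difference: the post-open tasks are no longer limited to the ZK path.
    services.postOpenDeployTasks(b); // 2nd daughter first, as in the original snippet
    services.addToOnlineRegions(b);
    services.postOpenDeployTasks(a);
    services.addToOnlineRegions(a);
  }
}
```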










[jira] [Created] (HBASE-18128) compaction marker could be skipped

2017-05-26 Thread Jingyun Tian (JIRA)
Jingyun Tian created HBASE-18128:


 Summary: compaction marker could be skipped 
 Key: HBASE-18128
 URL: https://issues.apache.org/jira/browse/HBASE-18128
 Project: HBase
  Issue Type: Improvement
  Components: Compaction, regionserver
Reporter: Jingyun Tian


The sequence of steps for a compaction is as follows:
1. Compaction writes new files under region/.tmp directory (compaction output)
2. Compaction atomically moves the temporary file under region directory
3. Compaction appends a WAL edit containing the compaction input and output 
files. Forces sync on WAL.
4. Compaction deletes the input files from the region directory.

But if a flush happens between steps 3 and 4 and the regionserver then crashes,
the compaction marker will be skipped when splitting the log, because the
sequence id of the compaction marker is smaller than lastFlushedSequenceId.
{code}
if (lastFlushedSequenceId >= entry.getKey().getLogSeqNum()) {
  editsSkipped++;
  continue;
}
{code}
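
The effect of that check can be reproduced in isolation. The sketch below uses a hypothetical helper, SplitLogSkipSketch.isSkipped, that mirrors only the comparison above; the sequence ids in the note are made-up values.

```java
final class SplitLogSkipSketch {
  /** Mirrors the check above: an entry is skipped if it is not newer than the last flush. */
  static boolean isSkipped(long lastFlushedSequenceId, long entrySeqNum) {
    return lastFlushedSequenceId >= entrySeqNum;
  }
}
```

If the compaction marker is appended with sequence id 100 and a later flush moves lastFlushedSequenceId to 105, isSkipped(105, 100) is true, so the marker is dropped during log splitting even though the input files from step 4 were never deleted.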



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HBASE-17993) delete a redundant log

2017-05-04 Thread Jingyun Tian (JIRA)
Jingyun Tian created HBASE-17993:


 Summary: delete a redundant log 
 Key: HBASE-17993
 URL: https://issues.apache.org/jira/browse/HBASE-17993
 Project: HBase
  Issue Type: Improvement
  Components: rpc
Affects Versions: 1.0.0
Reporter: Jingyun Tian
Priority: Trivial


There is a log statement that tracks what the current call is. It is only used
for debugging, so we'd better delete it in released versions.


