[jira] [Commented] (HBASE-20976) SCP can be scheduled multiple times for the same RS

2018-08-14 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16580372#comment-16580372
 ] 

Josh Elser commented on HBASE-20976:


Coming in late...
{quote}the worst case is that there is a race condition so we still schedule 
redundant SCPs, still better than now I think
{quote}
{quote}Yes. Could age them out instead... i.e. a deadserver needs to stick 
around for an hour at least?
{quote}
What's the downside of this: we run an SCP for a RS that already was processed 
or something worse?

As long as SCP is idempotent, we'd just want to reduce the likelihood that we 
do things multiple times (maybe I'm incorrectly assuming that SCP is idempotent 
;))
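
A small illustration of what "idempotent" would mean here, as a sketch only: `IdempotentCrashHandling` and `reassign` are hypothetical names and are not HBase code, and whether the real SCP behaves this way is exactly the open question above. Re-running the step against the same dead server finds nothing left to move, so a redundant run costs time but does not change state.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: a map of region name -> hosting server name.
final class IdempotentCrashHandling {
    // Moves every region still mapped to deadServer onto target.
    // Returns how many regions were moved; a repeated run moves none.
    static int reassign(Map<String, String> regionToServer, String deadServer, String target) {
        int moved = 0;
        for (Map.Entry<String, String> e : regionToServer.entrySet()) {
            if (e.getValue().equals(deadServer)) {
                e.setValue(target);
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        Map<String, String> regions = new HashMap<>();
        regions.put("region-a", "rs1");
        regions.put("region-b", "rs2");
        System.out.println(reassign(regions, "rs1", "rs3")); // first run moves one region
        System.out.println(reassign(regions, "rs1", "rs3")); // second run finds nothing to move
    }
}
```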

 

> SCP can be scheduled multiple times for the same RS
> ---
>
> Key: HBASE-20976
> URL: https://issues.apache.org/jira/browse/HBASE-20976
> Project: HBase
> Issue Type: Sub-task
> Affects Versions: 2.1.0, 2.0.1
> Reporter: Allan Yang
> Assignee: Allan Yang
> Priority: Major
> Attachments: HBASE-20976.branch-2.0.001.patch, 
> HBASE-20976.branch-2.0.002.patch
>
>
> An SCP can be scheduled multiple times for the same RS:
> 1. An RS crashed, and an SCP was submitted for it.
> 2. Before this SCP finished, the Master crashed.
> 3. The new Master scans the meta table and finds that some regions are still 
> open on a dead server.
> 4. The new Master submits an SCP for the dead server again.
> The two SCPs for the same RS can even execute concurrently without 
> HBASE-20846…
> The patch provides a test case that reproduces this issue and a fix.
> Another case in which an SCP might be scheduled multiple times for the same 
> RS (with HBASE-20708):
> 1. An RS crashed, and an SCP was submitted for it.
> 2. A new RS started on the same host, and the old RS's ServerName was removed 
> from DeadServer.deadServers.
> 3. After the SCP passed the Handle_RIT state, an UnassignProcedure needed to 
> send a close region operation to the crashed RS.
> 4. The UnassignProcedure's dispatch failed with 'NoServerDispatchException'.
> 5. The RS was expired again, but it was found neither online nor in the 
> deadServer list, so an SCP was submitted for the same RS again.
>  
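
The first scenario in the description, and the startup-time check discussed in this thread, can be sketched as below. This is an illustration under assumptions, not the attached patch: `Proc`, `StartupScpGuard`, and `shouldScheduleScp` are hypothetical names, and the real `ServerCrashProcedure` and `ProcedureExecutor` classes are not used. The idea is that after a master restart, before any procedure workers run, a new SCP is submitted only if no unfinished SCP for that server was recovered.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a procedure recovered from the procedure store.
final class Proc {
    final String type;        // e.g. "SCP"
    final String serverName;  // server the procedure targets
    final boolean finished;

    Proc(String type, String serverName, boolean finished) {
        this.type = type;
        this.serverName = serverName;
        this.finished = finished;
    }
}

final class StartupScpGuard {
    // True only if no unfinished SCP for this server was recovered.
    static boolean shouldScheduleScp(List<Proc> recovered, String serverName) {
        for (Proc p : recovered) {
            if (p.type.equals("SCP") && p.serverName.equals(serverName) && !p.finished) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Proc> recovered = new ArrayList<>();
        recovered.add(new Proc("SCP", "rs1,16020,1", false));
        // rs1 already has a pending SCP, rs2 does not.
        System.out.println(shouldScheduleScp(recovered, "rs1,16020,1"));
        System.out.println(shouldScheduleScp(recovered, "rs2,16020,1"));
    }
}
```

This check is only safe while nothing else can submit or finish procedures, which is the point made in the comments about doing it before the executor starts.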



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20976) SCP can be scheduled multiple times for the same RS

2018-08-13 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16579017#comment-16579017
 ] 

stack commented on HBASE-20976:
---

bq. IIRC, the deadservers are removed so that the master Web UI won't show a 
dead server forever there...

Yes. Could age them out instead... i.e. a deadserver needs to stick around for 
an hour at least?
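
The age-out idea could look roughly like the sketch below. It is not HBase's `DeadServer` class: `AgingDeadServers`, its methods, and the retention value are illustrative assumptions. An entry is dropped only once it is older than a minimum retention period, so an RS restarting on the same host seconds later does not erase the record that the old instance is already being handled.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical dead-server list that ages entries out instead of removing
// them as soon as a new server starts on the same host.
final class AgingDeadServers {
    private final Map<String, Long> deathTimeMs = new HashMap<>();
    private final long retentionMs;

    AgingDeadServers(long retentionMs) {
        this.retentionMs = retentionMs;
    }

    // Records the time of death; a repeated add keeps the original time.
    synchronized void add(String serverName, long nowMs) {
        deathTimeMs.putIfAbsent(serverName, nowMs);
    }

    synchronized boolean isDead(String serverName) {
        return deathTimeMs.containsKey(serverName);
    }

    // Drops only entries that are at least retentionMs old.
    synchronized void ageOut(long nowMs) {
        Iterator<Map.Entry<String, Long>> it = deathTimeMs.entrySet().iterator();
        while (it.hasNext()) {
            if (nowMs - it.next().getValue() >= retentionMs) {
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        long hour = 60L * 60L * 1000L;
        AgingDeadServers dead = new AgingDeadServers(hour);
        dead.add("host,16020,1533725024975", 0L);
        dead.ageOut(3_000L);  // a new RS came up seconds later; entry is kept
        System.out.println(dead.isDead("host,16020,1533725024975"));
        dead.ageOut(hour);    // retention elapsed; entry is dropped
        System.out.println(dead.isDead("host,16020,1533725024975"));
    }
}
```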



[jira] [Commented] (HBASE-20976) SCP can be scheduled multiple times for the same RS

2018-08-10 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576210#comment-16576210
 ] 

Allan Yang commented on HBASE-20976:


{quote}
I think we'd better do it a bit cleaner, without adding too many checks...

I think here we need to make sure that the deadServers check works and 
prevents scheduling redundant SCPs. The reason we can do the SCP check when 
restarting is that we have not started the PE yet, so it is safe; but during 
execution this is not a good idea, as there is no fencing...
{quote}
Yes, there is no fence here... But the worst case is that there is a race 
condition so we still schedule redundant SCPs, still better than now I think.
Making deadServers work is indeed a better idea, but I can't think of a better 
way to do it for now. IIRC, the deadservers are removed so that the master Web 
UI won't show a dead server forever there...

> SCP can be scheduled multiple times for the same RS
> ---
>
> Key: HBASE-20976
> URL: https://issues.apache.org/jira/browse/HBASE-20976
> Project: HBase
> Issue Type: Sub-task
> Affects Versions: 2.1.0, 2.0.1
> Reporter: Allan Yang
> Assignee: Allan Yang
> Priority: Major
> Fix For: 2.0.2
>
> Attachments: HBASE-20976.branch-2.0.001.patch, 
> HBASE-20976.branch-2.0.002.patch
>
>


[jira] [Commented] (HBASE-20976) SCP can be scheduled multiple times for the same RS

2018-08-10 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576186#comment-16576186
 ] 

Hadoop QA commented on HBASE-20976:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange} 0m 0s{color} | {color:orange} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} branch-2.0 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 29s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 41s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 0s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 3m 41s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 0s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s{color} | {color:green} branch-2.0 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 3m 46s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 10m 50s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 187m 11s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 223m 45s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:6f01af0 |
| JIRA Issue | HBASE-20976 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12935097/HBASE-20976.branch-2.0.002.patch |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 8949f767b952 4.4.0-130-generic #156-Ubuntu SMP Thu Jun 14 08:53:28 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh |
| git revision | branch-2.0 / 7ee4aa459c |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC3 |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/14000/testReport/ |
| Max. process+thread count | 4466 (vs. ulimit of 1) |
| modules | C: hbase-server U: hbase-server |
| Console output | 

[jira] [Commented] (HBASE-20976) SCP can be scheduled multiple times for the same RS

2018-08-10 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576095#comment-16576095
 ] 

Duo Zhang commented on HBASE-20976:
---

I think we'd better do it a bit cleaner, without adding too many checks...

I think here we need to make sure that the deadServers check works and 
prevents scheduling redundant SCPs. The reason we can do the SCP check when 
restarting is that we have not started the PE yet, so it is safe; but during 
execution this is not a good idea, as there is no fencing...
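
To make the fencing concern concrete, here is a small sketch that claims a server in a single atomic step instead of checking first and submitting afterwards. `ScpScheduler` and its methods are hypothetical and are not HBase code. With a separate "is one already running?" check followed by a submit, two workers can both pass the check; `Set.add` on a concurrent set returns true for exactly one of them.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical scheduler: at most one in-flight SCP per server name.
final class ScpScheduler {
    private final Set<String> inFlight = ConcurrentHashMap.newKeySet();

    // Check and claim in one atomic step: add() returns false if already present.
    boolean trySchedule(String serverName) {
        return inFlight.add(serverName);
    }

    // Called when the SCP completes; the server can be claimed again afterwards.
    void finished(String serverName) {
        inFlight.remove(serverName);
    }

    public static void main(String[] args) throws InterruptedException {
        ScpScheduler scheduler = new ScpScheduler();
        AtomicInteger scheduled = new AtomicInteger();
        Runnable expire = () -> {
            if (scheduler.trySchedule("rs1,16020,1")) {
                scheduled.incrementAndGet();
            }
        };
        Thread a = new Thread(expire);
        Thread b = new Thread(expire);
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(scheduled.get()); // exactly one of the two callers wins
    }
}
```

Note that this only covers concurrent callers. Once `finished` has removed the entry, a late caller can claim the server again, so a guard like this does not by itself replace a working dead-server check.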



[jira] [Commented] (HBASE-20976) SCP can be scheduled multiple times for the same RS

2018-08-10 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576082#comment-16576082
 ] 

Allan Yang commented on HBASE-20976:


{quote}
I think there may still be races? For example, if the previous SCP has already 
finished and been removed from the ProcedureExecutor, and then the 
UnassignProcedure tries to expire the server...
{quote}
Maybe I deleted some procedure WALs, which caused this. Either way, I think a 
double check won't hurt.



[jira] [Commented] (HBASE-20976) SCP can be scheduled multiple times for the same RS

2018-08-10 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576066#comment-16576066
 ] 

Duo Zhang commented on HBASE-20976:
---

I think there may still be races? For example, if the previous SCP has already 
finished and been removed from the ProcedureExecutor, and then the 
UnassignProcedure tries to expire the server...



[jira] [Commented] (HBASE-20976) SCP can be scheduled multiple times for the same RS

2018-08-10 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575920#comment-16575920
 ] 

Allan Yang commented on HBASE-20976:


[~stack], [~Apache9], sorry that I have to reopen this again; I found another 
case in which an SCP can be scheduled multiple times, as you can see from the 
issue's description.

1. The RS is expired, and an SCP is submitted:
{code}
2018-08-09 12:29:55,665 WARN  [PEWorker-11] master.ServerManager: Expiration of izbp1azj9xjvk1h9vioyvfz,16020,1533725024975 but server not online
2018-08-09 12:29:55,665 INFO  [PEWorker-11] master.ServerManager: Processing expiration of izbp1azj9xjvk1h9vioyvfz,16020,1533725024975 on izbp1azj9xjvk1h9vioyvfz,16000,1533787159573

2018-08-09 12:29:55,815 DEBUG [PEWorker-11] assignment.AssignmentManager: Added=izbp1azj9xjvk1h9vioyvfz,16020,1533725024975 to dead servers, submitted shutdown handler to be executed meta=false
{code}

2. The RS restarted on the same host, and the server name is removed from the 
dead servers list:
{code}
2018-08-09 12:29:58,034 DEBUG [RpcServer.default.FPBQ.Fifo.handler=157,queue=13,port=16000] master.ServerManager: REPORT: Server izbp1azj9xjvk1h9vioyvfz,16020,1533787086010 came back up, removed it from the dead servers list
{code}

3. Another UnassignProcedure detects this dead RS too; since it thinks no one 
is handling it, an SCP is submitted again:
{code}
2018-08-09 12:29:58,061 WARN  [PEWorker-15] assignment.RegionTransitionProcedure: Remote call failed izbp1azj9xjvk1h9vioyvfz,16020,1533725024975; pid=4034, ppid=4012, state=RUNNABLE:REGION_TRANSITION_DISPATCH, hasLock=true; UnassignProcedure table=randowmWrite15, region=e07c5ad01ce7b76b80a92e809fb98e26, server=izbp1azj9xjvk1h9vioyvfz,16020,1533735186645; rit=CLOSING, location=izbp1azj9xjvk1h9vioyvfz,16020,1533725024975; exception=NoServerDispatchException
org.apache.hadoop.hbase.procedure2.NoServerDispatchException: izbp1azj9xjvk1h9vioyvfz,16020,1533725024975; pid=4034, ppid=4012, state=RUNNABLE:REGION_TRANSITION_DISPATCH, hasLock=true; UnassignProcedure table=randowmWrite15, region=e07c5ad01ce7b76b80a92e809fb98e26, server=izbp1azj9xjvk1h9vioyvfz,16020,1533735186645
	at org.apache.hadoop.hbase.procedure2.RemoteProcedureDispatcher.addOperationToNode(RemoteProcedureDispatcher.java:177)
	at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.addToRemoteDispatcher(RegionTransitionProcedure.java:263)
	at org.apache.hadoop.hbase.master.assignment.UnassignProcedure.updateTransition(UnassignProcedure.java:207)
	at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:349)
	at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:101)
	at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:873)
	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1498)
	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1278)
	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:76)
	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1785)
2018-08-09 12:29:58,061 WARN  [PEWorker-15] assignment.UnassignProcedure: Expiring izbp1azj9xjvk1h9vioyvfz,16020,1533725024975, pid=4034, ppid=4012, state=RUNNABLE:REGION_TRANSITION_DISPATCH, hasLock=true; UnassignProcedure table=randowmWrite15, region=e07c5ad01ce7b76b80a92e809fb98e26, server=izbp1azj9xjvk1h9vioyvfz,16020,1533735186645 rit=CLOSING, location=izbp1azj9xjvk1h9vioyvfz,16020,1533725024975; exception=NoServerDispatchException
2018-08-09 12:29:58,061 WARN  [PEWorker-15] master.ServerManager: Expiration of izbp1azj9xjvk1h9vioyvfz,16020,1533725024975 but server not online
2018-08-09 12:29:58,061 INFO  [PEWorker-15] master.ServerManager: Processing expiration of izbp1azj9xjvk1h9vioyvfz,16020,1533725024975 on izbp1azj9xjvk1h9vioyvfz,16000,1533787159573
2018-08-09 12:29:58,540 DEBUG [PEWorker-15] assignment.AssignmentManager: Added=izbp1azj9xjvk1h9vioyvfz,16020,1533725024975 to dead servers, submitted shutdown handler to be executed meta=false
{code}
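
The three steps in the logs above can be replayed with a toy model. `ToyServerManager` and its methods are hypothetical and only mirror the decisions described in this comment: expiring a server schedules an SCP unless the server is already in the dead list, and a server reporting in on the same host and port clears older instances from that list. The real `ServerManager` logic is more involved.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the expire / report interplay seen in the logs.
final class ToyServerManager {
    private final Set<String> deadServers = new HashSet<>();
    int scpCount = 0;

    // Server names follow host,port,startcode as in the logs.
    private static String hostAndPort(String serverName) {
        return serverName.substring(0, serverName.lastIndexOf(','));
    }

    // Schedules an SCP unless the server is already in the dead list.
    void expire(String serverName) {
        if (deadServers.add(serverName)) {
            scpCount++;
        }
    }

    // A server reporting in removes older instances on the same host and port.
    void serverReport(String serverName) {
        String hp = hostAndPort(serverName);
        deadServers.removeIf(dead -> hostAndPort(dead).equals(hp));
    }

    public static void main(String[] args) {
        ToyServerManager sm = new ToyServerManager();
        String oldRs = "izbp1azj9xjvk1h9vioyvfz,16020,1533725024975";
        String newRs = "izbp1azj9xjvk1h9vioyvfz,16020,1533787086010";
        sm.expire(oldRs);        // step 1: first SCP is scheduled
        sm.serverReport(newRs);  // step 2: old name leaves the dead list
        sm.expire(oldRs);        // step 3: dispatch failure expires it again
        System.out.println(sm.scpCount); // two SCPs for the same old RS
    }
}
```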

> SCP can be scheduled multiple times for the same RS
> ---
>
> Key: HBASE-20976
> URL: https://issues.apache.org/jira/browse/HBASE-20976
> Project: HBase
> Issue Type: Sub-task
> Affects Versions: 2.0.1
> Reporter: Allan Yang
> Assignee: Allan Yang
> Priority: Major
> Fix For: 2.0.2
>
> Attachments: HBASE-20976.branch-2.0.001.patch
>
>

[jira] [Commented] (HBASE-20976) SCP can be scheduled multiple times for the same RS

2018-08-07 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572619#comment-16572619
 ] 

Allan Yang commented on HBASE-20976:


[~stack], sure, let's resolve it. Thanks for backporting  HBASE-20708.



[jira] [Commented] (HBASE-20976) SCP can be scheduled multiple times for the same RS

2018-08-04 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16569191#comment-16569191
 ] 

stack commented on HBASE-20976:
---

[~allan163] I backported HBASE-20708 yesterday. Should we close this now?

I've been backporting the main AMv2 changes in branch-2.1 to branch-2.0 over 
the last few days, after first trying the patches on a cluster. The patches are 
big, but I see them as bug fixes, critical ones. AMv2 has to work well when 
folks go to use hbase2, even for those who try hbase-2.0.x first. It is taking 
a while as I test before commit. I'm almost caught up.



[jira] [Commented] (HBASE-20976) SCP can be scheduled multiple times for the same RS

2018-07-31 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564653#comment-16564653
 ] 

Allan Yang commented on HBASE-20976:


{quote}
Agree with the reopen. My fault that HBASE-20708 was marked with 2.0.2. I 
should have opened a new issue for backport instead (see HBASE-20987). To be 
clear, HBASE-20987 is not in branch-2.0. It seems too big of a change for a 
branch-2.0 but if we are seeing issues like this one, then we should consider 
it.
{quote}
[~stack], we can make a decision here: if HBASE-20708 is too big to be 
back-ported, we can fix the issue here with a smaller patch.



[jira] [Commented] (HBASE-20976) SCP can be scheduled multiple times for the same RS

2018-07-31 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563621#comment-16563621
 ] 

stack commented on HBASE-20976:
---

Agree with the reopen. My fault that HBASE-20708 was marked with 2.0.2. I 
should have opened a new issue for backport instead (see HBASE-20987). To be 
clear, HBASE-20987 is not in branch-2.0. It seems too big of a change for a 
branch-2.0 but if we are seeing issues like this one, then we should consider 
it.



[jira] [Commented] (HBASE-20976) SCP can be scheduled multiple times for the same RS

2018-07-31 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563176#comment-16563176
 ] 

Duo Zhang commented on HBASE-20976:
---

But we set 2.0.2 as the fix version for HBASE-20708? [~stack]

> SCP can be scheduled multiple times for the same RS
> ---
>
> Key: HBASE-20976
> URL: https://issues.apache.org/jira/browse/HBASE-20976
> Project: HBase
> Issue Type: Sub-task
> Affects Versions: 2.0.1
> Reporter: Allan Yang
> Assignee: Allan Yang
> Priority: Major
> Attachments: HBASE-20976.branch-2.0.001.patch
>
>


[jira] [Commented] (HBASE-20976) SCP can be scheduled multiple times for the same RS

2018-07-30 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563120#comment-16563120
 ] 

Allan Yang commented on HBASE-20976:


Sorry,  HBASE-20708 has not been back-ported, so the issue still exists



[jira] [Commented] (HBASE-20976) SCP can be scheduled multiple times for the same RS

2018-07-30 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563099#comment-16563099
 ] 

Allan Yang commented on HBASE-20976:


Seems like HBASE-20708 has been back-ported to branch-2.0. Resolving this one 
as fixed.

> SCP can be scheduled multiple times for the same RS
> ---
>
> Key: HBASE-20976
> URL: https://issues.apache.org/jira/browse/HBASE-20976
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.1
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Attachments: HBASE-20976.branch-2.0.001.patch





[jira] [Commented] (HBASE-20976) SCP can be scheduled multiple times for the same RS

2018-07-30 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16561999#comment-16561999
 ] 

Ted Yu commented on HBASE-20976:


+1

> SCP can be scheduled multiple times for the same RS
> ---
>
> Key: HBASE-20976
> URL: https://issues.apache.org/jira/browse/HBASE-20976
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.1
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Attachments: HBASE-20976.branch-2.0.001.patch





[jira] [Commented] (HBASE-20976) SCP can be scheduled multiple times for the same RS

2018-07-30 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16561955#comment-16561955
 ] 

Hadoop QA commented on HBASE-20976:
---

| (x) *-1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 11s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || branch-2.0 Compile Tests ||
| +1 | mvninstall | 3m 46s | branch-2.0 passed |
| +1 | compile | 1m 43s | branch-2.0 passed |
| +1 | checkstyle | 1m 10s | branch-2.0 passed |
| +1 | shadedjars | 4m 9s | branch has no errors when building our shaded downstream artifacts. |
| +1 | findbugs | 2m 0s | branch-2.0 passed |
| +1 | javadoc | 0m 29s | branch-2.0 passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 3m 35s | the patch passed |
| +1 | compile | 1m 41s | the patch passed |
| +1 | javac | 1m 41s | the patch passed |
| -1 | checkstyle | 1m 9s | hbase-server: The patch generated 2 new + 29 unchanged - 0 fixed = 31 total (was 29) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 4m 6s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 11m 25s | Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. |
| +1 | findbugs | 2m 36s | the patch passed |
| +1 | javadoc | 0m 30s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 99m 48s | hbase-server in the patch passed. |
| +1 | asflicense | 0m 22s | The patch does not generate ASF License warnings. |
| | | 139m 9s | |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:6f01af0 |
| JIRA Issue | HBASE-20976 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12933579/HBASE-20976.branch-2.0.001.patch |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux b07cad149a8c 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | branch-2.0 / e7eadd61d2 |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC3 |
| checkstyle | https://builds.apache.org/job/PreCommit-HBASE-Build/13850/artifact/patchprocess/diff-checkstyle-hbase-server.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/13850/testReport/ |
| Max. process+thread count | 4206 (vs. ulimit of 1) |
| modules | C: hbase-server U: hbase-server |

[jira] [Commented] (HBASE-20976) SCP can be scheduled multiple times for the same RS

2018-07-30 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16561837#comment-16561837
 ] 

Allan Yang commented on HBASE-20976:


Confirmed that this issue only exists in branch-2.0; the other 2.x branches are 
fixed by HBASE-20708.

> SCP can be scheduled multiple times for the same RS
> ---
>
> Key: HBASE-20976
> URL: https://issues.apache.org/jira/browse/HBASE-20976
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.1.0, 2.0.1
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Attachments: HBASE-20976.branch-2.0.001.patch


