[jira] [Commented] (HBASE-22404) Open/Close region request may be executed twice when master restart

2020-12-09 Thread Sanjeet Nishad (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-22404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17247055#comment-17247055
 ] 

Sanjeet Nishad commented on HBASE-22404:


Hi [~zhangduo] & [~zghao], we are using HBase 2.2.3. We recently hit a 
similar problem: a regionserver ignored a procedure (CloseRegionProcedure) 
because of a duplicate pid, which left the region stuck in RIT.

Analysis:
1. After the HMaster failover, the master's in-memory proc-id was reset. 
2. On a new DisableTable client request, the master dispatched a 
CloseRegionProcedure to the RS and suspended the procedure.
3. The RS ignored this CloseRegionProcedure request without doing 
anything, because it had already executed a procedure with the same id.

Because no UnAssignRegionHandler was created in step 3, the RS never sent a 
reportRegionStateTransition to the HMaster. On the HMaster side the procedure 
stayed suspended, because a suspended procedure is only woken on 
reportRegionStateTransition. The region therefore stays stuck in RIT until we 
restart the HMaster and the RS.
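
The failure mode in the analysis above can be illustrated with a minimal sketch. 
This is not HBase code: the class PidOnlyDedupRegionServer and its methods are 
hypothetical, and the sketch assumes the region server deduplicates requests 
by procedure id alone.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model, not an HBase class: a region server that skips any
// close request whose procedure id it has already handled.
final class PidOnlyDedupRegionServer {
  private final Set<Long> handledPids = new HashSet<>();
  private int transitionsReported = 0;

  // Returns true if the request was executed and a state transition was
  // reported to the master, false if it was ignored as a duplicate.
  boolean closeRegion(long pid) {
    if (!handledPids.add(pid)) {
      // Duplicate pid: no handler runs, so nothing is reported back and the
      // master-side procedure is never woken up.
      return false;
    }
    transitionsReported++;
    return true;
  }

  int transitionsReported() {
    return transitionsReported;
  }
}
```

If the master reuses a pid after failover, the second closeRegion call with 
that pid returns false even though it belongs to a different procedure, which 
matches the stuck-in-RIT behaviour described above.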

> Open/Close region request may be executed twice when master restart
> ---
>
> Key: HBASE-22404
> URL: https://issues.apache.org/jira/browse/HBASE-22404
> Project: HBase
> Issue Type: Bug
> Affects Versions: 3.0.0-alpha-1, 2.2.0, 2.3.0
> Reporter: Guanghao Zhang
> Assignee: Guanghao Zhang
> Priority: Major
> Fix For: 3.0.0-alpha-1, 2.2.0, 2.3.0
>
>
> We found this problem when running ITBLL on our internal branch, which is 
> based on branch-2.2.
>  # Master A schedules a TRSP that will reopen region1. The TRSP first 
> schedules a remote sub-procedure, CloseRegionProcedure, and sends the close 
> region request to the RS.
>  # Master A shuts down and Master B becomes the new active master. Master B 
> restores the TRSP and the remote procedure CloseRegionProcedure.
>  # The RS reports to the new Master B and the CloseRegionProcedure finishes. 
> The TRSP then schedules a new OpenRegionProcedure and sends the open region 
> request to the RS.
>  # {color:#FF}Meanwhile, Master B sends the close region request to the RS 
> again{color}.
>  # The open region request finishes first and is reported to the master as 
> successful. The master believes the region is open on the RS, but the RS then 
> executes the close region request again and closes region1.
>  # The master believes the region is open while the RS has closed it, so 
> the new TRSP is stuck forever.
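
The event order in the quoted description can be replayed with a small sketch. 
It is only an illustration under simplifying assumptions: DuplicateCloseReplay 
is a hypothetical class, requests are plain strings, and the region server 
applies them strictly in arrival order.

```java
import java.util.List;

// Hypothetical replay of the reported event order; not HBase code.
final class DuplicateCloseReplay {
  enum State { OPEN, CLOSED }

  // Applies region server requests in arrival order and returns the final
  // region state on the region server.
  static State replay(List<String> requests) {
    State state = State.OPEN;
    for (String request : requests) {
      state = request.equals("close") ? State.CLOSED : State.OPEN;
    }
    return state;
  }
}
```

Replaying "close", "open", "close" leaves the region closed on the region 
server, while the master, which acted on the open report, still treats it as 
open.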



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-22404) Open/Close region request may be executed twice when master restart

2019-05-16 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16841313#comment-16841313
 ] 

Hudson commented on HBASE-22404:


Results for branch master
[build #1010 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/1010/]: (x) 
*{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/1010//General_Nightly_Build_Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/1010//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/1010//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(x) {color:red}-1 client integration test{color}
-- Failed when running client tests on top of Hadoop 2. [see log for 
details|https://builds.apache.org/job/HBase%20Nightly/job/master/1010//artifact/output-integration/hadoop-2.log].
 (note that this means we didn't run on Hadoop 3)


> Open/Close region request may be executed twice when master restart
> ---
>
> Key: HBASE-22404
> URL: https://issues.apache.org/jira/browse/HBASE-22404
> Project: HBase
> Issue Type: Bug
> Affects Versions: 3.0.0, 2.2.0, 2.3.0
> Reporter: Guanghao Zhang
> Assignee: Guanghao Zhang
> Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.3.0


[jira] [Commented] (HBASE-22404) Open/Close region request may be executed twice when master restart

2019-05-16 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16841090#comment-16841090
 ] 

Hudson commented on HBASE-22404:


Results for branch branch-2.2
[build #262 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/262/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- Something went wrong running this stage, please [check relevant console 
output|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/262//console].




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/262//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- Something went wrong running this stage, please [check relevant console 
output|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/262//console].


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(x) {color:red}-1 client integration test{color}
-- Failed when running client tests on top of Hadoop 2. [see log for 
details|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/262//artifact/output-integration/hadoop-2.log].
 (note that this means we didn't run on Hadoop 3)




[jira] [Commented] (HBASE-22404) Open/Close region request may be executed twice when master restart

2019-05-16 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16841014#comment-16841014
 ] 

Hudson commented on HBASE-22404:


Results for branch branch-2
[build #1894 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1894/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1894//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- Something went wrong running this stage, please [check relevant console 
output|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1894//console].


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1894//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(x) {color:red}-1 client integration test{color}
-- Failed when running client tests on top of Hadoop 2. [see log for 
details|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1894//artifact/output-integration/hadoop-2.log].
 (note that this means we didn't run on Hadoop 3)




[jira] [Commented] (HBASE-22404) Open/Close region request may be executed twice when master restart

2019-05-13 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838325#comment-16838325
 ] 

Duo Zhang commented on HBASE-22404:
---

Since we already send the procedure id to the region server, I think we could 
record the recently processed open/close procedure ids and ignore a request 
when it is received again...
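
One way to picture this suggestion is a bounded record of recently processed 
procedure ids. The sketch below is an assumption-laden illustration, not the 
patch that was committed: the class name RecentProcedureIds, the method 
markIfNew and the capacity parameter are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not an HBase class: remembers the most recently
// processed procedure ids so that a repeated request can be detected.
final class RecentProcedureIds {
  private final Map<Long, Boolean> seen;

  RecentProcedureIds(final int capacity) {
    // Insertion-ordered map that evicts its oldest entry once the number of
    // remembered ids exceeds the capacity.
    this.seen = new LinkedHashMap<Long, Boolean>() {
      @Override
      protected boolean removeEldestEntry(Map.Entry<Long, Boolean> eldest) {
        return size() > capacity;
      }
    };
  }

  // Returns true if procId was not seen recently (the caller should execute
  // the request), false if it is a repeat (the caller should ignore it).
  synchronized boolean markIfNew(long procId) {
    return seen.putIfAbsent(procId, Boolean.TRUE) == null;
  }
}
```

A region server handler would call markIfNew(procId) before acting on an 
open/close request and drop the request when it returns false. The bound 
keeps memory use fixed, at the cost of forgetting ids older than the window.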

> Open/Close region request may be executed twice when master restart
> ---
>
> Key: HBASE-22404
> URL: https://issues.apache.org/jira/browse/HBASE-22404
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 3.0.0, 2.2.0, 2.3.0
> Reporter: Guanghao Zhang
> Priority: Major