[jira] [Commented] (HBASE-24255) GCRegionProcedure doesn't assign region from RegionServer leading to orphans

2020-06-03 Thread Huaxiang Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17125257#comment-17125257
 ] 

Huaxiang Sun commented on HBASE-24255:
--

My earlier comment about merges was wrong (I said merges are only done 
manually); the normalizer can merge regions as well. 

At this moment, no one is working on this Jira. [~timoha], we are going to 
resolve it as "Cannot Reproduce". If this pops up again, we can reopen this 
Jira with more concrete steps/logs. Please speak up if you have a different 
opinion, thanks.

> GCRegionProcedure doesn't assign region from RegionServer leading to orphans
> 
>
> Key: HBASE-24255
> URL: https://issues.apache.org/jira/browse/HBASE-24255
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2, Region Assignment, regionserver
>Affects Versions: 2.2.4
> Environment: hbase 2.2.4
> hadoop 3.1.3
>Reporter: Andrey Elenskiy
>Priority: Major
>
> We've found ourselves in a situation where parents of merged or split regions 
> needed to be opened again on a regionserver due to having to recover from a 
> cluster meltdown (HBCK2's fixMeta kicks off GCMultipleMergedRegionsProcedure, 
> which requires all regions to be merged to be open). Then, when a 
> GCProcedure is kicked off by GCMultipleMergedRegionsProcedure to clean a 
> parent region up, it ends up deleting it from hbase:meta, but doesn't 
> unassign it from the RegionServer, causing it to show up in "Orphan 
> Regions on RegionServer" in the hbck tab of HBase Master. Also, the hbase 
> client doesn't detect that the region is closed either, because it's still 
> technically open on a regionserver (it doesn't reread hbase:meta all the 
> time). The only way to recover from this is to restart the regionserver, 
> which isn't ideal as it can lead to other issues in clusters with region 
> inconsistencies.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24255) GCRegionProcedure doesn't assign region from RegionServer leading to orphans

2020-04-29 Thread Huaxiang Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17095968#comment-17095968
 ] 

Huaxiang Sun commented on HBASE-24255:
--

Thanks [~timoha]. Going through the codebase, GCRegionProcedure is only 
scheduled for the merge and split cases. If there was no merge (which is done 
manually), then it could be the split case; we need to go through the split 
code path carefully to check whether there are corner cases.

The patch for HBASE-24273 is almost ready. I am going to put it up on GitHub 
and will ask you to review.

> GCRegionProcedure doesn't assign region from RegionServer leading to orphans
> 
>
> Key: HBASE-24255
> URL: https://issues.apache.org/jira/browse/HBASE-24255
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2, Region Assignment, regionserver
>Affects Versions: 2.2.4
> Environment: hbase 2.2.4
> hadoop 3.1.3
>Reporter: Andrey Elenskiy
>Assignee: niuyulin
>Priority: Major
>


[jira] [Commented] (HBASE-24255) GCRegionProcedure doesn't assign region from RegionServer leading to orphans

2020-04-29 Thread Andrey Elenskiy (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17095956#comment-17095956
 ] 

Andrey Elenskiy commented on HBASE-24255:
-

[~huaxiangsun] yes, you got the idea right.

> but somehow the merge*** qualifiers were not cleaned up from new merged child 
> region in meta table (maybe master crashed before 
> GCMultipleMergedRegionsProcedure is started)

That's actually due to HBASE-24273: addMissingRegionsInMeta will re-add those 
"orphans" without checking whether a merge qualifier exists. I think fixing 
HBASE-24273 will resolve this particular instance.

But I'm still wondering whether there are other situations where 
GCRegionProcedure should also make sure that the region is unassigned from the 
regionserver. That would be more generic, as I've seen this happen even without 
region merges (I don't recall the exact case anymore).



[jira] [Commented] (HBASE-24255) GCRegionProcedure doesn't assign region from RegionServer leading to orphans

2020-04-29 Thread Huaxiang Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17095598#comment-17095598
 ] 

Huaxiang Sun commented on HBASE-24255:
--

{quote}
I think GCMultipleMergedRegionsProcedure does not need to check whether the 
parent regions are online, because they should not be. And yes, 
addMissingRegionsInMeta may have some bugs that cause it to reassign the 
parent merge/split regions.
{quote}

Agree, [~niuyulin]: addMissingRegionsInMeta should not reassign the parent 
merge/split regions, since the new child region(s) are already there. 
Reassigning them will cause other issues, such as region overlap. 



[jira] [Commented] (HBASE-24255) GCRegionProcedure doesn't assign region from RegionServer leading to orphans

2020-04-29 Thread niuyulin (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17095442#comment-17095442
 ] 

niuyulin commented on HBASE-24255:
--

If a region is closed, recovery from a cluster meltdown (ServerCrashProcedure) 
will not reassign it, regardless of whether it is a split, merged, or normal 
region.

[~huaxiangsun] I think GCMultipleMergedRegionsProcedure does not need to check 
whether the parent regions are online, because they should not be.

And yes, addMissingRegionsInMeta may have some bugs that cause it to reassign 
the parent merge/split regions.

 



[jira] [Commented] (HBASE-24255) GCRegionProcedure doesn't assign region from RegionServer leading to orphans

2020-04-28 Thread Huaxiang Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17095136#comment-17095136
 ] 

Huaxiang Sun commented on HBASE-24255:
--

Thanks [~timoha] for explaining. I got excited and jumped to a conclusion too 
quickly, sorry for the noise. Here is my understanding of what happened; please 
correct me if it is wrong.
 # The parent regions were already merged, but somehow the merge*** qualifiers 
were not cleaned up from the new merged child region in the meta table (maybe 
the master crashed before GCMultipleMergedRegionsProcedure started).
 # HBCK2's addMissingRegionsInMeta brought the parent regions back online, and 
they got assigned to and opened on region servers.
 # Catalog Janitor's cleanMergeRegion() kicks off 
GCMultipleMergedRegionsProcedure, which assumes that the parent regions are 
already closed, deletes their entries from the meta table, and archives the 
regions in the fs.

IMO, at step 2, addMissingRegionsInMeta needs to check whether a region is a 
merged parent region and, if it is, abort the operation.

At step 3, GCMultipleMergedRegionsProcedure also needs a sanity check to make 
sure the parent regions are not online (a bit ugly). 
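
To make the two proposed checks concrete, here is a minimal in-memory sketch. 
It is a toy model under stated assumptions, not HBase code: the class 
MergeGuards, its methods, and the use of plain strings as region names are all 
hypothetical and only mirror the step 2 and step 3 guards described above.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of the two guards; none of these types or methods are HBase APIs. */
final class MergeGuards {
  // Merged child region -> parents still listed in its merge qualifiers.
  private final Map<String, List<String>> mergeParentsByChild = new HashMap<>();
  // Regions currently open on some regionserver.
  private final Set<String> online = new HashSet<>();

  void recordMerge(String child, List<String> parents) {
    mergeParentsByChild.put(child, List.copyOf(parents));
    online.add(child);
  }

  boolean isMergedParent(String region) {
    return mergeParentsByChild.values().stream()
        .anyMatch(parents -> parents.contains(region));
  }

  /** Step 2 guard: refuse to re-add (and open) a region that is a merged parent. */
  boolean addMissingRegion(String region) {
    if (isMergedParent(region)) {
      return false; // abort the operation
    }
    online.add(region);
    return true;
  }

  /** Step 3 guard: only clean up when none of the parents is online. */
  boolean gcMergedParents(String child) {
    List<String> parents = mergeParentsByChild.get(child);
    if (parents == null) {
      return false; // merge qualifiers were already cleaned up, nothing to do
    }
    for (String parent : parents) {
      if (online.contains(parent)) {
        return false; // sanity check failed: a parent is still open
      }
    }
    mergeParentsByChild.remove(child);
    return true;
  }

  /** Simulates a tool that opens a region without the step 2 guard. */
  void forceOnline(String region) {
    online.add(region);
  }

  public static void main(String[] args) {
    MergeGuards g = new MergeGuards();
    g.recordMerge("child", List.of("parentA", "parentB"));
    System.out.println(g.addMissingRegion("parentA")); // false: merged parent
    g.forceOnline("parentB");
    System.out.println(g.gcMergedParents("child"));    // false: parent online
  }
}
```

In this sketch either guard alone is enough to avoid the orphan: the first 
keeps the parent from ever being opened, the second refuses to delete it while 
it is open.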

 

 

 

 



[jira] [Commented] (HBASE-24255) GCRegionProcedure doesn't assign region from RegionServer leading to orphans

2020-04-28 Thread Andrey Elenskiy (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17094943#comment-17094943
 ] 

Andrey Elenskiy commented on HBASE-24255:
-

I don't really see how that addresses the issue in the description. The problem 
I was trying to describe can happen if I run HBCK2's addMissingRegionsInMeta, 
which ends up re-adding the parents of a merged region into meta and assigning 
them to a RegionServer. Then, when GCRegionProcedure runs, it removes the 
region from hbase:meta and the FS, but doesn't unassign the region from the 
regionserver, which leads to "Orphan Regions on RegionServer". Hence, I'd like 
GCRegionProcedure to actually make sure that the region is not assigned on any 
regionserver.
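
As an illustration of the resulting state, the sketch below computes which 
regions would count as orphans: regions that some regionserver still reports 
as open although they no longer have a row in hbase:meta. This is a toy model, 
not HBase code; the class OrphanRegionCheck, the findOrphans method, and the 
string region and server names are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model, not HBase code: region and server names are plain strings. */
final class OrphanRegionCheck {
  /** Returns regions that a server reports as open but that have no row in meta. */
  static Set<String> findOrphans(Set<String> regionsInMeta,
                                 Map<String, Set<String>> onlineByServer) {
    Set<String> orphans = new TreeSet<>(); // sorted for stable output
    for (Set<String> online : onlineByServer.values()) {
      for (String region : online) {
        if (!regionsInMeta.contains(region)) {
          orphans.add(region);
        }
      }
    }
    return orphans;
  }

  public static void main(String[] args) {
    // After the GC step only the merged child is left in meta ...
    Set<String> meta = Set.of("merged-child");
    // ... but the parents were re-opened earlier and never unassigned.
    Map<String, Set<String>> online = Map.of(
        "rs1", Set.of("merged-child", "parent-a"),
        "rs2", Set.of("parent-b"));
    System.out.println(findOrphans(meta, online)); // prints [parent-a, parent-b]
  }
}
```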



[jira] [Commented] (HBASE-24255) GCRegionProcedure doesn't assign region from RegionServer leading to orphans

2020-04-28 Thread Huaxiang Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17094936#comment-17094936
 ] 

Huaxiang Sun commented on HBASE-24255:
--

Just came across this one. Could it be addressed in 
[https://github.com/apache/hbase/pull/1584/commits/f5e00a23ddb3d60e76aa3316af8887fad4859f7e]
 ?
{code:java}
--- a/hbase-client/src/main/java/org/apache/hadoop/hbase/MetaTableAccessor.java
+++ b/hbase-client/src/main/java/org/apache/hadoop/hbase/MetaTableAccessor.java
@@ -1846,6 +1846,16 @@ public class MetaTableAccessor {
       qualifiers.add(qualifier);
       delete.addColumns(getCatalogFamily(), qualifier, HConstants.LATEST_TIMESTAMP);
     }
+
+    // There will be race condition that a GCMultipleMergedRegionsProcedure is scheduled while
+    // the previous GCMultipleMergedRegionsProcedure is still going on, in this case, the second
+    // GCMultipleMergedRegionsProcedure could delete the merged region by accident!
+    if (qualifiers.isEmpty()) {
+      LOG.info("No merged qualifiers for region " + mergeRegion.getRegionNameAsString() +
+        " in meta table, they are cleaned up already, Skip.");
+      return;
+    }
+
{code}
