[jira] [Commented] (HBASE-25709) Close region may stuck when region is compacting and skipped most cells read

2023-03-06 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697104#comment-17697104
 ] 

Hudson commented on HBASE-25709:


Results for branch branch-2.5
[build #313 on 
builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/313/]:
 (x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/313/General_20Nightly_20Build_20Report/]


(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/313/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/313/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/313/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Close region may stuck when region is compacting and skipped most cells read
> 
>
> Key: HBASE-25709
> URL: https://issues.apache.org/jira/browse/HBASE-25709
> Project: HBase
>  Issue Type: Bug
>  Components: Compaction
>Affects Versions: 1.7.1, 3.0.0-alpha-2, 2.4.10
>Reporter: Xiaolin Ha
>Assignee: Xiaolin Ha
>Priority: Major
> Fix For: 2.6.0, 3.0.0-alpha-4
>
> Attachments: Master-UI-RIT.png, RS-region-state.png
>
>
> In our cluster we found a region stuck while closing. The region was 
> compacting, and its store files had many TTL-expired cells. The close-region 
> state marker (HRegion#writestate.writesEnabled) was not checked during 
> compaction because most cells were skipped. 
> !RS-region-state.png|width=698,height=310!
>  
> !Master-UI-RIT.png|width=693,height=157!
>  
> HBASE-23968 encountered a similar problem, but its fix sits outside the 
> method
> InternalScanner#next(List result, ScannerContext scannerContext), which, 
> with the current compaction scanner context, does not return while many 
> cells are being skipped. We therefore need to return from the next method in 
> time and then check the stop marker.
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-25709) Close region may stuck when region is compacting and skipped most cells read

2023-03-06 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697077#comment-17697077
 ] 

Hudson commented on HBASE-25709:


Results for branch master
[build #788 on 
builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/788/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/788/General_20Nightly_20Build_20Report/]




(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/788/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/788/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}




[jira] [Commented] (HBASE-25709) Close region may stuck when region is compacting and skipped most cells read

2023-03-06 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697057#comment-17697057
 ] 

Hudson commented on HBASE-25709:


Results for branch branch-2
[build #760 on 
builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/760/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/760/General_20Nightly_20Build_20Report/]


(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/760/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/760/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(x) {color:red}-1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/760/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}




[jira] [Commented] (HBASE-25709) Close region may stuck when region is compacting and skipped most cells read

2023-03-06 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696783#comment-17696783
 ] 

Hudson commented on HBASE-25709:


Results for branch branch-2.5
[build #312 on 
builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/312/]:
 (x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/312/General_20Nightly_20Build_20Report/]


(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/312/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/312/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(x) {color:red}-1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/312/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(x) {color:red}-1 source release artifact{color}
-- See build output for details.


(x) {color:red}-1 client integration test{color}
-- Something went wrong with this stage, [check relevant console 
output|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/312//console].


> Close region may stuck when region is compacting and skipped most cells read
> 
>
> Key: HBASE-25709
> URL: https://issues.apache.org/jira/browse/HBASE-25709
> Project: HBase
>  Issue Type: Bug
>  Components: Compaction
>Affects Versions: 1.7.1, 3.0.0-alpha-2, 2.4.10
>Reporter: Xiaolin Ha
>Assignee: Xiaolin Ha
>Priority: Major
> Fix For: 2.6.0, 3.0.0-alpha-4, 2.5.4
>
> Attachments: Master-UI-RIT.png, RS-region-state.png
>
>


[jira] [Commented] (HBASE-25709) Close region may stuck when region is compacting and skipped most cells read

2022-06-14 Thread Xiaolin Ha (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17554354#comment-17554354
 ] 

Xiaolin Ha commented on HBASE-25709:


Thanks, [~vjasani]. It's not urgent, and it's worth taking some time to 
figure this out. I'll update the PR and hope you can review it at your 
convenience.

> Close region may stuck when region is compacting and skipped most cells read
> 
>
> Key: HBASE-25709
> URL: https://issues.apache.org/jira/browse/HBASE-25709
> Project: HBase
>  Issue Type: Bug
>  Components: Compaction
>Affects Versions: 1.7.1, 3.0.0-alpha-2, 2.4.10
>Reporter: Xiaolin Ha
>Assignee: Xiaolin Ha
>Priority: Major
> Fix For: 2.5.0, 2.6.0, 2.4.11, 3.0.0-alpha-4
>
> Attachments: Master-UI-RIT.png, RS-region-state.png
>
>


[jira] [Commented] (HBASE-25709) Close region may stuck when region is compacting and skipped most cells read

2022-06-14 Thread Viraj Jasani (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17554336#comment-17554336
 ] 

Viraj Jasani commented on HBASE-25709:
--

[~Xiaolin Ha] Thanks for the further analysis. I am quite occupied this week; 
if the PR is still not reviewed by then, I will take a look next week. Thanks!

[~bbeaudreault] At a high level: if a row is quite large and also has delete 
markers, cells covered by those delete markers can also be returned by the scan. 
The patch I added in my previous comment helps to understand this at a low 
level, but it applies to the test that has since been reverted with [this 
commit|https://github.com/apache/hbase/commit/5e34cdf1ef914b7c5d60df0edebd2f32ba543d02].
 Basically, the repro can be done by reducing 
HBASE_CELLS_SCANNED_PER_HEARTBEAT_CHECK in the test.



[jira] [Commented] (HBASE-25709) Close region may stuck when region is compacting and skipped most cells read

2022-06-13 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553837#comment-17553837
 ] 

Hudson commented on HBASE-25709:


Results for branch master
[build #611 on 
builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/611/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/611/General_20Nightly_20Build_20Report/]






(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/611/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/611/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Close region may stuck when region is compacting and skipped most cells read
> 
>
> Key: HBASE-25709
> URL: https://issues.apache.org/jira/browse/HBASE-25709
> Project: HBase
>  Issue Type: Bug
>  Components: Compaction
>Affects Versions: 1.7.1, 3.0.0-alpha-2, 2.4.10
>Reporter: Xiaolin Ha
>Assignee: Xiaolin Ha
>Priority: Major
> Fix For: 2.5.0, 2.6.0, 3.0.0-alpha-3, 2.4.11
>
> Attachments: Master-UI-RIT.png, RS-region-state.png
>
>


[jira] [Commented] (HBASE-25709) Close region may stuck when region is compacting and skipped most cells read

2022-06-13 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553776#comment-17553776
 ] 

Hudson commented on HBASE-25709:


Results for branch branch-2.5
[build #142 on 
builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/142/]:
 (/) *{color:green}+1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/142/General_20Nightly_20Build_20Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/142/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/142/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/142/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}




[jira] [Commented] (HBASE-25709) Close region may stuck when region is compacting and skipped most cells read

2022-06-13 Thread Xiaolin Ha (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553539#comment-17553539
 ] 

Xiaolin Ha commented on HBASE-25709:


Thanks for the excellent digging, [~vjasani] .

The problem is that after the scanner returns early at the heartbeat cell 
limit, the matcher is unexpectedly reset on the next call:
{code:java}
// If no limits exists in the scope LimitScope.Between_Cells then we are sure we are changing
// rows. Else it is possible we are still traversing the same row so we must perform the row
// comparison.
if (!scannerContext.hasAnyLimit(LimitScope.BETWEEN_CELLS) || matcher.currentRow() == null) {
  this.countPerRow = 0;
  matcher.setToNewRow(cell);
} {code}
As a result, all delete markers tracked for the row are cleared, and the 
following cells of the same row are matched incorrectly.

In the test case you provided, 
StoreScanner.HBASE_CELLS_SCANNED_PER_HEARTBEAT_CHECK=2 and the intermediate 
results are [q4,q5]. When the heartbeat count is reached right after 
[q8/delete], [q8/put] is wrongly added to the results because the matcher no 
longer holds [q8/delete].

For this issue, I think we can use HRegion#checkInterrupt to keep the region 
close from getting stuck, as in this change:
{code:java}
         // when reaching the heartbeat cells, try to return from the loop.
         if (kvsScanned % cellsPerHeartbeatCheck == 0) {
-          return scannerContext.setScannerState(NextState.MORE_VALUES).hasMoreValues();
+          this.store.getHRegion().checkInterrupt();
         } {code}
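To illustrate the first option, here is a minimal, self-contained sketch of polling the interrupt flag at the heartbeat interval even when cells are being skipped. It is a toy model, not HBase code: the class name, the `checkInterrupt` helper (a stand-in for HRegion#checkInterrupt) and the list of "expired" flags are all assumptions made for illustration.

```java
import java.io.InterruptedIOException;
import java.util.List;

// Toy model only: none of this is HBase code.
class HeartbeatInterruptSketch {

  /** Hypothetical stand-in for HRegion#checkInterrupt: throws if the thread was interrupted. */
  static void checkInterrupt() throws InterruptedIOException {
    if (Thread.currentThread().isInterrupted()) {
      throw new InterruptedIOException("scan interrupted, region may be closing");
    }
  }

  /**
   * Counts cells that are not expired. The interrupt flag is polled every
   * cellsPerCheck scanned cells, whether or not those cells were skipped.
   */
  static int scan(List<Boolean> expiredFlags, int cellsPerCheck) throws InterruptedIOException {
    int scanned = 0;
    int kept = 0;
    for (boolean expired : expiredFlags) {
      scanned++;
      if (scanned % cellsPerCheck == 0) {
        checkInterrupt(); // reached even if every cell so far was skipped
      }
      if (!expired) {
        kept++;
      }
    }
    return kept;
  }
}
```

The point of the sketch is that the check is tied to the number of cells scanned rather than to the number of cells returned, so a long run of skipped cells cannot delay the reaction to an interrupt.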
But we can also fix the matcher issue itself, because I think the matcher 
reset should happen only when the scanner reaches a new row. The early return 
at the heartbeat cell limit also lets user scanners return as soon as possible 
when they encounter mass deletions. The fix is:
{code:java}
@@ -276,9 +278,12 @@ public abstract class ScanQueryMatcher implements ShipperListener {
    * @param currentRow
    */
   public void setToNewRow(Cell currentRow) {
-    this.currentRow = currentRow;
-    columns.reset();
-    reset();
+    if (this.currentRow == null
+      || this.rowComparator.compareRows(currentRow, this.currentRow) != 0) {
+      this.currentRow = currentRow;
+      columns.reset();
+      reset();
+    }
   } {code}
I prefer the second solution. What do you think, [~vjasani] [~apurtell]?
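The effect of the guarded reset can be reproduced with a small toy model. The sketch below is not HBase code: `ToyCell`, `ToyMatcher` and `scan` are hypothetical names, the input is a single row, and the qualifiers only loosely follow the [q4,q5] / [q8/delete] / [q8/put] scenario described above (q7 and q9 are invented to place the heartbeat boundary).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: none of these types are HBase classes.
class MatcherResetSketch {

  /** Toy cell: row key, qualifier, and whether it is a delete marker. */
  static final class ToyCell {
    final String row;
    final String qualifier;
    final boolean isDelete;

    ToyCell(String row, String qualifier, boolean isDelete) {
      this.row = row;
      this.qualifier = qualifier;
      this.isDelete = isDelete;
    }
  }

  /** Toy matcher that remembers the delete markers seen so far in the current row. */
  static final class ToyMatcher {
    private final boolean guardReset;
    private String currentRow;
    private final Set<String> deletedQualifiers = new HashSet<>();

    ToyMatcher(boolean guardReset) {
      this.guardReset = guardReset;
    }

    /** With guardReset, state is cleared only when the row really changes. */
    void setToNewRow(ToyCell cell) {
      if (!guardReset || currentRow == null || !currentRow.equals(cell.row)) {
        currentRow = cell.row;
        deletedQualifiers.clear();
      }
    }

    /** Records delete markers; returns true only for puts not covered by one. */
    boolean include(ToyCell cell) {
      if (cell.isDelete) {
        deletedQualifiers.add(cell.qualifier);
        return false;
      }
      return !deletedQualifiers.contains(cell.qualifier);
    }
  }

  /** Each outer iteration models one next() call that returns early at the heartbeat limit. */
  static List<String> scan(List<ToyCell> cells, int cellsPerHeartbeat, boolean guardReset) {
    ToyMatcher matcher = new ToyMatcher(guardReset);
    List<String> results = new ArrayList<>();
    int i = 0;
    while (i < cells.size()) {
      matcher.setToNewRow(cells.get(i)); // models the reset on re-entry
      int scanned = 0;
      while (i < cells.size()) {
        ToyCell cell = cells.get(i++);
        if (matcher.include(cell)) {
          results.add(cell.qualifier);
        }
        if (++scanned % cellsPerHeartbeat == 0) {
          break; // heartbeat limit reached: return to the caller
        }
      }
    }
    return results;
  }

  /** One row where the heartbeat boundary falls between q8/delete and q8/put. */
  static List<ToyCell> sampleRow() {
    return Arrays.asList(
      new ToyCell("r1", "q4", false),
      new ToyCell("r1", "q5", false),
      new ToyCell("r1", "q7", false),
      new ToyCell("r1", "q8", true),
      new ToyCell("r1", "q8", false),
      new ToyCell("r1", "q9", false));
  }
}
```

With a heartbeat limit of 2, `scan(sampleRow(), 2, false)` yields [q4, q5, q7, q8, q9], so the deleted q8 leaks into the results, while `scan(sampleRow(), 2, true)` yields [q4, q5, q7, q9].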

 

 



[jira] [Commented] (HBASE-25709) Close region may stuck when region is compacting and skipped most cells read

2022-06-12 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553388#comment-17553388
 ] 

Hudson commented on HBASE-25709:


Results for branch branch-2.4
[build #370 on 
builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.4/370/]:
 (x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.4/370/General_20Nightly_20Build_20Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.4/370/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.4/370/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.4/370/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}




[jira] [Commented] (HBASE-25709) Close region may stuck when region is compacting and skipped most cells read

2022-06-12 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553387#comment-17553387
 ] 

Hudson commented on HBASE-25709:


Results for branch branch-2
[build #567 on 
builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/567/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/567/General_20Nightly_20Build_20Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/567/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/567/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/567/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}




[jira] [Commented] (HBASE-25709) Close region may stuck when region is compacting and skipped most cells read

2022-06-12 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553317#comment-17553317
 ] 

Andrew Kyle Purtell commented on HBASE-25709:
-

Please don't remove the 2.4.11 fix version. 



[jira] [Commented] (HBASE-25709) Close region may stuck when region is compacting and skipped most cells read

2022-06-12 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553316#comment-17553316
 ] 

Andrew Kyle Purtell commented on HBASE-25709:
-

Phoenix saw it in PHOENIX-6702. This impacts us: [~vjasani] has qualified it 
as a real problem that blocks our builds incorporating versions of HBase that 
carry this commit. The commit has been reverted by HBASE-27108. Let me reopen 
this issue for any potential redo.



[jira] [Commented] (HBASE-25709) Close region may stuck when region is compacting and skipped most cells read

2022-06-12 Thread Bryan Beaudreault (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553314#comment-17553314
 ] 

Bryan Beaudreault commented on HBASE-25709:
---

Can anyone comment on how this bug might manifest in production? It's not super 
clear what "breaks large row results" means. We have this bug in prod and are 
trying to determine the impact.



[jira] [Commented] (HBASE-25709) Close region may stuck when region is compacting and skipped most cells read

2022-06-10 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17552954#comment-17552954
 ] 

Andrew Kyle Purtell commented on HBASE-25709:
-

[~vjasani] If there is a provable break here we can revert this in branch-2.4 
and branch-2.5 and leave it in branch-2 and master for further improvement. 



[jira] [Commented] (HBASE-25709) Close region may stuck when region is compacting and skipped most cells read

2022-06-10 Thread Viraj Jasani (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17552935#comment-17552935
 ] 

Viraj Jasani commented on HBASE-25709:
--

Thanks [~Xiaolin Ha].

I did some more digging and I do see discrepancies in Scan results depending 
on the cells-scanned-per-heartbeat-check interval.

 

If you apply this patch, the test passes whereas it should have failed. I am 
deleting the CQ after the first Scan and before incrementing the time:
{code:java}
diff --git a/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java b/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
index d41916ae3b..6736e12ad9 100644
--- a/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
+++ b/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
@@ -6955,6 +6955,7 @@ public class TestHRegion {
 
     // A query at time T+0 should return all cells
     checkScan(8);
+    region.delete(new Delete(row).addColumn(fam1, q8));
 
     // Increment time to T+ttlSecs seconds
     edge.incrementTime(ttlSecs * 1000); {code}
With this patch, the test is expected to fail at *_checkScan(3)_*, because the 
scan should return only 2 results instead of 3.

 

However, if I apply this patch, i.e. increase the cells scanned per heartbeat 
check, the test fails as expected:
{code:java}
diff --git a/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java b/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
index d41916ae3b..a52eaecc48 100644
--- a/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
+++ b/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
@@ -6928,7 +6928,7 @@ public class TestHRegion {
     Configuration conf = new Configuration(TEST_UTIL.getConfiguration());
     conf.setInt(HFile.FORMAT_VERSION_KEY, HFile.MIN_FORMAT_VERSION_WITH_TAGS);
     // using small heart beat cells
-    conf.setLong(StoreScanner.HBASE_CELLS_SCANNED_PER_HEARTBEAT_CHECK, 2);
+    conf.setLong(StoreScanner.HBASE_CELLS_SCANNED_PER_HEARTBEAT_CHECK, 2);
 
     region = HBaseTestingUtility.createRegionAndWAL(
       RegionInfoBuilder.newBuilder(tableDescriptor.getTableName()).build(),
@@ -6955,6 +6955,7 @@ public class TestHRegion {
 
     // A query at time T+0 should return all cells
     checkScan(8);
+    region.delete(new Delete(row).addColumn(fam1, q8));
 
     // Increment time to T+ttlSecs seconds
     edge.incrementTime(ttlSecs * 1000); {code}
Test failure logs:
{code:java}
java.lang.AssertionError: 
Expected :3
Actual   :2

    at org.junit.Assert.fail(Assert.java:89)
    at org.junit.Assert.failNotEquals(Assert.java:835)
    at org.junit.Assert.assertEquals(Assert.java:647)
    at org.junit.Assert.assertEquals(Assert.java:633)
    at org.apache.hadoop.hbase.regionserver.TestHRegion.checkScan(TestHRegion.java:6972)
    at org.apache.hadoop.hbase.regionserver.TestHRegion.testTTLsUsingSmallHeartBeatCells(TestHRegion.java:6962)
 {code}
This is exactly the expected behaviour, which occurs only if we increase the 
cells scanned per heartbeat check to a very high number.

We should fix this.

 

FYI [~apurtell] for upcoming 2.5 and 2.4 releases.

 

FYI [~kadir] [~gjacoby] [~tkhurana] 



[jira] [Commented] (HBASE-25709) Close region may stuck when region is compacting and skipped most cells read

2022-06-09 Thread Xiaolin Ha (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17552539#comment-17552539
 ] 

Xiaolin Ha commented on HBASE-25709:


Hi, [~vjasani], I looked at PHOENIX-6702 and your patch. Do you suspect that 
this issue breaks large-row results? I added a UT, 
TestHRegion#testTTLsUsingSmallHeartBeatCells, to verify that the scan returns 
the whole row when the row's cell count is larger than 
StoreScanner.HBASE_CELLS_SCANNED_PER_HEARTBEAT_CHECK. Please take a look. 
Thanks.



[jira] [Commented] (HBASE-25709) Close region may stuck when region is compacting and skipped most cells read

2022-06-09 Thread Viraj Jasani (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17552495#comment-17552495
 ] 

Viraj Jasani commented on HBASE-25709:
--

FYI, we see one regression in Phoenix indexing test (PHOENIX-6702) after this 
commit.



[jira] [Commented] (HBASE-25709) Close region may stuck when region is compacting and skipped most cells read

2022-03-12 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17505344#comment-17505344
 ] 

Hudson commented on HBASE-25709:


Results for branch branch-2.5
[build #61 on 
builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/61/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/61/General_20Nightly_20Build_20Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/61/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/61/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/61/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(x) {color:red}-1 source release artifact{color}
-- See build output for details.


(x) {color:red}-1 client integration test{color}
-- Something went wrong with this stage, [check relevant console 
output|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/61//console].




[jira] [Commented] (HBASE-25709) Close region may stuck when region is compacting and skipped most cells read

2022-03-08 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17502877#comment-17502877
 ] 

Hudson commented on HBASE-25709:


Results for branch branch-2
[build #477 on 
builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/477/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/477/General_20Nightly_20Build_20Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/477/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/477/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(x) {color:red}-1 jdk11 hadoop3 checks{color}


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(x) {color:red}-1 client integration test{color}
--Failed when running client tests on top of Hadoop 2. [see log for 
details|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/477//artifact/output-integration/hadoop-2.log].
 (note that this means we didn't run on Hadoop 3)




[jira] [Commented] (HBASE-25709) Close region may stuck when region is compacting and skipped most cells read

2022-03-07 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17502776#comment-17502776
 ] 

Hudson commented on HBASE-25709:


Results for branch branch-2.4
[build #302 on 
builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.4/302/]:
 (/) *{color:green}+1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.4/302/General_20Nightly_20Build_20Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.4/302/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.4/302/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.4/302/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}




[jira] [Commented] (HBASE-25709) Close region may stuck when region is compacting and skipped most cells read

2022-03-07 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17502336#comment-17502336
 ] 

Hudson commented on HBASE-25709:


Results for branch master
[build #528 on 
builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/528/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/528/General_20Nightly_20Build_20Report/]








(x) {color:red}-1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/528/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}




[jira] [Commented] (HBASE-25709) Close region may stuck when region is compacting and skipped most cells read

2021-04-14 Thread Xiaolin Ha (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17321036#comment-17321036
 ] 

Xiaolin Ha commented on HBASE-25709:


Yes, [~stack], you are right. Setting it on by default will make all scanners 
abort as soon as possible. Aborting only compaction scanners is not enough, 
because user scanners also want to be preempted instead of returning after a 
long timeout. What's more, it will not introduce correctness issues, because 
the KeyValueHeap only uses the result set; the top cell on the store heap is 
used for completing whole raw cells, eagerly closing store scanners, and some 
other checks...

I have updated the PR, set the default to on, and added UTs to check both the 
semantics and the data correctness.

Would you mind reviewing the patch again? 

> Close region may stuck when region is compacting and skipped most cells read
> 
>
> Key: HBASE-25709
> URL: https://issues.apache.org/jira/browse/HBASE-25709
> Project: HBase
>  Issue Type: Improvement
>  Components: Compaction
>Affects Versions: 1.4.13
>Reporter: Xiaolin Ha
>Assignee: Xiaolin Ha
>Priority: Major
> Attachments: Master-UI-RIT.png, RS-region-state.png
>
>


[jira] [Commented] (HBASE-25709) Close region may stuck when region is compacting and skipped most cells read

2021-04-10 Thread Xiaolin Ha (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318654#comment-17318654
 ] 

Xiaolin Ha commented on HBASE-25709:


Yes, I'll also turn this on for user scanners to avoid unexpectedly long 
queries. Thanks, [~stack]. I got it.

It seems that we can check the time limit for scanners when the cell should be 
SKIPped, before looping again in next()?  

The code would be as follows:
{code:java}
case SKIP:
  if (scannerContext.checkTimeLimit(LimitScope.BETWEEN_CELLS)) {
    return scannerContext.setScannerState(NextState.TIME_LIMIT_REACHED).hasMoreValues();
  }
  this.heap.next();
  break;
{code}
As a result, we can use `hbase.hstore.close.check.time.interval` as the 
time limit for compaction scanners, with no need to add the 
`preventLoopReadEnabled` variable to ScannerContext? 
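
To make the idea of a time-limit check on the SKIP path concrete, here is a self-contained toy model. It is a sketch under stated assumptions, not the actual StoreScanner code: `SkipTimeLimitSketch`, its two enums, and the injected clock are hypothetical stand-ins, and the deadline is passed in directly rather than read from a ScannerContext.

```java
import java.util.function.LongSupplier;

/** Toy model of a scanner loop; every name here is hypothetical, not HBase API. */
final class SkipTimeLimitSketch {
  enum MatchCode { INCLUDE, SKIP }
  enum NextState { MORE_VALUES, NO_MORE_VALUES, TIME_LIMIT_REACHED }

  private final MatchCode[] cells;
  private final LongSupplier clock; // injected so the behaviour is deterministic in tests
  private int position = 0;

  SkipTimeLimitSketch(MatchCode[] cells, LongSupplier clock) {
    this.cells = cells.clone();
    this.clock = clock;
  }

  int position() {
    return position;
  }

  /** Scans until one cell is included, the cells run out, or the deadline passes. */
  NextState next(long deadlineMillis) {
    while (position < cells.length) {
      switch (cells[position]) {
        case SKIP:
          // Check the deadline before advancing past a skipped cell, so a long
          // run of skipped cells cannot keep the caller waiting indefinitely.
          if (clock.getAsLong() >= deadlineMillis) {
            return NextState.TIME_LIMIT_REACHED;
          }
          position++;
          break;
        case INCLUDE:
          position++;
          return position < cells.length ? NextState.MORE_VALUES : NextState.NO_MORE_VALUES;
      }
    }
    return NextState.NO_MORE_VALUES;
  }
}
```

Because the scan position is kept across calls, a caller that receives `TIME_LIMIT_REACHED` can check its stop marker and then call `next` again to resume where the scan left off.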

 



[jira] [Commented] (HBASE-25709) Close region may stuck when region is compacting and skipped most cells read

2021-04-09 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318085#comment-17318085
 ] 

Michael Stack commented on HBASE-25709:
---

Thank you [~Xiaolin Ha].  Would it help if we could distinguish compacting 
scanners from user-facing instances? A compacting scanner can be aborted on 
close but a user-scanner not? Will you turn on this feature even though it has 
the correctness issues you note above?



[jira] [Commented] (HBASE-25709) Close region may stuck when region is compacting and skipped most cells read

2021-04-09 Thread Xiaolin Ha (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17317952#comment-17317952
 ] 

Xiaolin Ha commented on HBASE-25709:


Hi, [~stack], thanks for reviewing this issue.

The StoreScanner is shared by compaction and user scanners, so I set the 
default to off to stay compatible with the original logic. 

I thought carefully about your suggestion to set it on by default; there may 
be some correctness issues.

For user scanners, a matcher returning SKIP makes the heap keep polling cells 
in a loop until the heap is empty or the top cell matches the scanner rules. 

If we set this on by default, the method will return once it has scanned the 
per-heartbeat number of cells, even though the top cell of the heap may be 
invalid. Outer scanners may then peek incorrect data (maybe not, because 
there are still filters applied before the result is returned). For example, 
KeyValueHeap#next(List result, ScannerContext scannerContext) just adds the 
top cell after StoreScanner#next returns. But in the user scanner context, the 
scanner returns only when it reaches the limit it has set. As a result, 
returning prematurely for user scanners may be unexpected.
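
The concern above can be sketched with a toy model. This is only an illustration under assumptions, not KeyValueHeap or StoreScanner code: `EarlyReturnPeekSketch`, `skipInvalid`, and `budget` are hypothetical names. It shows that when the skip loop returns after a fixed budget of cells, the cell left on top may still be one that should have been skipped, so a caller that peeks the top without re-checking would see an invalid cell.

```java
/** Toy model only; all names are hypothetical and none of this is HBase API. */
final class EarlyReturnPeekSketch {
  private final boolean[] valid; // valid[i] == false means cell i should be skipped
  private int top = 0;

  EarlyReturnPeekSketch(boolean[] valid) {
    this.valid = valid.clone();
  }

  /** Skips invalid cells; stops early after budget skips (budget <= 0 means no limit). */
  void skipInvalid(int budget) {
    int skipped = 0;
    while (top < valid.length && !valid[top]) {
      if (budget > 0 && skipped >= budget) {
        return; // early return: the top cell has not been validated yet
      }
      top++;
      skipped++;
    }
  }

  /** Index of the current top cell, or -1 when the scanner is exhausted. */
  int peek() {
    return top < valid.length ? top : -1;
  }

  /** Whether the current top cell is one the scan is allowed to return. */
  boolean topIsValid() {
    return top < valid.length && valid[top];
  }
}
```

For cells `{invalid, invalid, invalid, valid}`, a budget of 2 leaves the top at index 2, which is still invalid, while an unlimited budget leaves it at index 3.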

 

 



[jira] [Commented] (HBASE-25709) Close region may stuck when region is compacting and skipped most cells read

2021-04-08 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17317366#comment-17317366
 ] 

Michael Stack commented on HBASE-25709:
---

Patch looks good. Defaults to off. Why would we not just have this flag enabled 
always, [~Xiaolin Ha]? If a Region has been asked to close, compactions should 
be preempted and put aside until we open in the new location? Close should 
preempt everything, I'd suggest, except an ongoing user read?
