subject:"\[jira\] \[Commented\] \(HBASE\-5611\) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size"


[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264228#comment-13264228
 ] 

Hudson commented on HBASE-5611:
---

Integrated in HBase-TRUNK #2823 (See 
[https://builds.apache.org/job/HBase-TRUNK/2823/])
HBASE-5611 Replayed edits from regions that failed to open during recovery 
aren't removed from the global MemStore size (Jieshan) (Revision 1331681)

 Result = FAILURE
tedyu : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerAccounting.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java


 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: HBASE-5611-92.patch, HBASE-5611-94-minorchange.patch, 
 HBASE-5611-trunk-v2-minorchange.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. It's completely invisible until the 
 {{MemStoreFlusher}} needs to force flush a region and that none of them have 
 edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
 for entry null
 java.lang.IllegalStateException
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:129)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size


[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264238#comment-13264238
 ] 

Hudson commented on HBASE-5611:
---

Integrated in HBase-TRUNK-security #187 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/187/])
HBASE-5611 Replayed edits from regions that failed to open during recovery 
aren't removed from the global MemStore size (Jieshan) (Revision 1331681)

 Result = SUCCESS
tedyu : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerAccounting.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java


 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: HBASE-5611-92.patch, HBASE-5611-94-minorchange.patch, 
 HBASE-5611-trunk-v2-minorchange.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. It's completely invisible until the 
 {{MemStoreFlusher}} needs to force flush a region and that none of them have 
 edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
 for entry null
 java.lang.IllegalStateException
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:129)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size


[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264242#comment-13264242
 ] 

Hudson commented on HBASE-5611:
---

Integrated in HBase-0.92 #392 (See 
[https://builds.apache.org/job/HBase-0.92/392/])
HBASE-5611 Replayed edits from regions that failed to open during recovery 
aren't removed from the global MemStore size (Jieshan) (Revision 1331684)

 Result = SUCCESS
tedyu : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerAccounting.java
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java


 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: HBASE-5611-92.patch, HBASE-5611-94-minorchange.patch, 
 HBASE-5611-trunk-v2-minorchange.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. It's completely invisible until the 
 {{MemStoreFlusher}} needs to force flush a region and that none of them have 
 edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
 for entry null
 java.lang.IllegalStateException
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:129)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size


[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264340#comment-13264340
 ] 

Hudson commented on HBASE-5611:
---

Integrated in HBase-0.94 #155 (See 
[https://builds.apache.org/job/HBase-0.94/155/])
HBASE-5611 Addendum fixes test failure in TestHeapSize#testSizes (Revision 
1331769)

 Result = FAILURE
tedyu : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java


 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: 5611-94.addendum, HBASE-5611-92.patch, 
 HBASE-5611-94-minorchange.patch, HBASE-5611-trunk-v2-minorchange.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. It's completely invisible until the 
 {{MemStoreFlusher}} needs to force flush a region and that none of them have 
 edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
 for entry null
 java.lang.IllegalStateException
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:129)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size


[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264414#comment-13264414
 ] 

Hudson commented on HBASE-5611:
---

Integrated in HBase-0.94 #158 (See 
[https://builds.apache.org/job/HBase-0.94/158/])
HBASE-5611 Revert due to consistent TestChangingEncoding failure in 0.94 
branch (Revision 1331826)

 Result = FAILURE
tedyu : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerAccounting.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java


 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: 5611-94.addendum, HBASE-5611-92.patch, 
 HBASE-5611-94-minorchange.patch, HBASE-5611-trunk-v2-minorchange.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. It's completely invisible until the 
 {{MemStoreFlusher}} needs to force flush a region and that none of them have 
 edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
 for entry null
 java.lang.IllegalStateException
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:129)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size


[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264420#comment-13264420
 ] 

Hudson commented on HBASE-5611:
---

Integrated in HBase-0.94-security #24 (See 
[https://builds.apache.org/job/HBase-0.94-security/24/])
HBASE-5611 Revert due to consistent TestChangingEncoding failure in 0.94 
branch (Revision 1331826)
HBASE-5611 Addendum fixes test failure in TestHeapSize#testSizes (Revision 
1331769)
HBASE-5611 Replayed edits from regions that failed to open during recovery 
aren't removed from the global MemStore size (Jieshan) (Revision 1331683)

 Result = FAILURE
tedyu : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerAccounting.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java

tedyu : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java

tedyu : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerAccounting.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java


 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: 5611-94.addendum, HBASE-5611-92.patch, 
 HBASE-5611-94-minorchange.patch, HBASE-5611-trunk-v2-minorchange.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. It's completely invisible until the 
 {{MemStoreFlusher}} needs to force flush a region and that none of them have 
 edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
 for entry null
 java.lang.IllegalStateException
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:129)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size

2012-04-28 Thread Jieshan Bean (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264471#comment-13264471
 ] 

Jieshan Bean commented on HBASE-5611:
-

Thank you, Ted.
I will check that failure in 94 version.

 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: 5611-94.addendum, HBASE-5611-92.patch, 
 HBASE-5611-94-minorchange.patch, HBASE-5611-trunk-v2-minorchange.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. It's completely invisible until the 
 {{MemStoreFlusher}} needs to force flush a region and that none of them have 
 edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
 for entry null
 java.lang.IllegalStateException
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:129)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size

2012-04-27 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13263904#comment-13263904
 ] 

Ted Yu commented on HBASE-5611:
---

I ran 0.92 test suite over 0.92 patch and didn't find regression.

Will integrate tomorrow if there is no objection.

 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-5611-92.patch, HBASE-5611-94-minorchange.patch, 
 HBASE-5611-trunk-v2-minorchange.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. It's completely invisible until the 
 {{MemStoreFlusher}} needs to force flush a region and that none of them have 
 edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
 for entry null
 java.lang.IllegalStateException
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:129)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size

2012-04-27 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264210#comment-13264210
 ] 

Ted Yu commented on HBASE-5611:
---

Integrated to 0.92, 0.94 and trunk.

Thanks for the patch, Jieshan.

Thanks for the review, Lars.

 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: HBASE-5611-92.patch, HBASE-5611-94-minorchange.patch, 
 HBASE-5611-trunk-v2-minorchange.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. It's completely invisible until the 
 {{MemStoreFlusher}} needs to force flush a region and that none of them have 
 edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
 for entry null
 java.lang.IllegalStateException
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:129)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size

2012-04-27 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264225#comment-13264225
 ] 

Hudson commented on HBASE-5611:
---

Integrated in HBase-0.94 #153 (See 
[https://builds.apache.org/job/HBase-0.94/153/])
HBASE-5611 Replayed edits from regions that failed to open during recovery 
aren't removed from the global MemStore size (Jieshan) (Revision 1331683)

 Result = FAILURE
tedyu : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerAccounting.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java


 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: HBASE-5611-92.patch, HBASE-5611-94-minorchange.patch, 
 HBASE-5611-trunk-v2-minorchange.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. It's completely invisible until the 
 {{MemStoreFlusher}} needs to force flush a region and that none of them have 
 edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
 for entry null
 java.lang.IllegalStateException
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:129)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size


[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262648#comment-13262648
 ] 

Zhihong Yu commented on HBASE-5611:
---

@Jieshan:
When you have multiple patches for different branches, attach patch for trunk 
apart from the other patches.
Otherwise Hadoop QA may pick up the wrong patch.

 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-5611-94.patch, HBASE-5611-trunk-v2.patch, 
 HBASE-5611-trunk.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. It's completely invisible until the 
 {{MemStoreFlusher}} needs to force flush a region and that none of them have 
 edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
 for entry null
 java.lang.IllegalStateException
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:129)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size


[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262651#comment-13262651
 ] 

Jieshan Bean commented on HBASE-5611:
-

Thank you, Ted. I will take care.

 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-5611-94.patch, HBASE-5611-trunk-v2.patch, 
 HBASE-5611-trunk.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. It's completely invisible until the 
 {{MemStoreFlusher}} needs to force flush a region and that none of them have 
 edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
 for entry null
 java.lang.IllegalStateException
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:129)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size


[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262652#comment-13262652
 ] 

Zhihong Yu commented on HBASE-5611:
---

Patch v2 looks good in general.
Comment on formatting:
{code}
+   * @param regionName
+   *  region name.
{code}
The line length is 100 chars. Please put javadoc for param on the same line as 
param name.

You can wait for Hadoop QA result to come back before attaching new patches.

 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-5611-94.patch, HBASE-5611-trunk-v2.patch, 
 HBASE-5611-trunk.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. It's completely invisible until the 
 {{MemStoreFlusher}} needs to force flush a region and that none of them have 
 edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
 for entry null
 java.lang.IllegalStateException
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:129)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size


[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262656#comment-13262656
 ] 

Jieshan Bean commented on HBASE-5611:
-

I use the formatter from HBASE-3678. Every time, it format my code like that.
I will change after QA result.

 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-5611-94.patch, HBASE-5611-trunk-v2.patch, 
 HBASE-5611-trunk.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. It's completely invisible until the 
 {{MemStoreFlusher}} needs to force flush a region and that none of them have 
 edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
 for entry null
 java.lang.IllegalStateException
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:129)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size

[
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262673#comment-13262673
]

Hadoop QA commented on HBASE-5611:
--

-1 overall. Here are the results of testing the latest attachment

http://issues.apache.org/jira/secure/attachment/12524416/HBASE-5611-trunk-v2.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified
tests.
Please justify why no new tests are needed for this
patch.
Also please list what manual steps were performed to
verify this patch.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

-1 findbugs. The patch appears to introduce 4 new Findbugs (version 1.3.9)
warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

+1 core tests. The patch passed unit tests in .

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/1656//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/1656//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/1656//console

This message is automatically generated.

Replayed edits from regions that failed to open during recovery aren't
removed from the global MemStore size

Key: HBASE-5611
URL: https://issues.apache.org/jira/browse/HBASE-5611
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1

Attachments: HBASE-5611-94.patch, HBASE-5611-trunk-v2.patch,
HBASE-5611-trunk.patch

This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think
it's still possible to hit it if a region fails to open for more obscure
reasons like HDFS errors.
Consider a region that just went through distributed splitting and that's now
being opened by a new RS. The first thing it does is to read the recovery
files and put the edits in the {{MemStores}}. If this process takes a long
time, the master will move that region away. At that point the edits are
still accounted for in the global {{MemStore}} size but they are dropped when
the {{HRegion}} gets cleaned up. It's completely invisible until the
{{MemStoreFlusher}} needs to force flush a region and that none of them have
edits:
{noformat}
2012-03-21 00:33:39,303 DEBUG
org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up
because memory above low water=5.9g
2012-03-21 00:33:39,303 ERROR
org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed
for entry null
java.lang.IllegalStateException
at
com.google.common.base.Preconditions.checkState(Preconditions.java:129)
at
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
at
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
at java.lang.Thread.run(Thread.java:662)
{noformat}
The {{null}} here is a region. In my case I had so many edits in the
{{MemStore}} during recovery that I'm over the low barrier although in fact
I'm at 0. It happened yesterday and it still printing this out.
To fix this we need to be able to decrease the global {{MemStore}} size when
the region can't open.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size


[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262702#comment-13262702
 ] 

Zhihong Yu commented on HBASE-5611:
---

Tests were clear.

@Jieshan:
Please address formatting and prepare patches for each branch.

We should also run test suite for 0.90 and 0.92 once patches are available.

Good job.

 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-5611-94.patch, HBASE-5611-trunk-v2.patch, 
 HBASE-5611-trunk.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. It's completely invisible until the 
 {{MemStoreFlusher}} needs to force flush a region and that none of them have 
 edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
 for entry null
 java.lang.IllegalStateException
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:129)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size

[
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13263324#comment-13263324
]

Hadoop QA commented on HBASE-5611:
--

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12524808/HBASE-5611-94.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

-1 patch. The patch command could not apply the patch.

Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/1663//console

This message is automatically generated.

Replayed edits from regions that failed to open during recovery aren't
removed from the global MemStore size

Attachments: HBASE-5611-94.patch, HBASE-5611-trunk-v2.patch,
HBASE-5611-trunk.patch

[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size


[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13263326#comment-13263326
 ] 

Jieshan Bean commented on HBASE-5611:
-

Will upload the patches for 92 and 90 today:)

 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-5611-94.patch, HBASE-5611-trunk-v2.patch, 
 HBASE-5611-trunk.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. It's completely invisible until the 
 {{MemStoreFlusher}} needs to force flush a region and that none of them have 
 edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
 for entry null
 java.lang.IllegalStateException
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:129)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size


[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13263327#comment-13263327
 ] 

Zhihong Yu commented on HBASE-5611:
---

{code}
+   * Roll back the global MemStore size when a region can't open.
{code}
The above is not accurate: we're only rolling back the replay edits size for 
specified region from global MemStore size.

 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-5611-94.patch, HBASE-5611-trunk-v2.patch, 
 HBASE-5611-trunk.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. It's completely invisible until the 
 {{MemStoreFlusher}} needs to force flush a region and that none of them have 
 edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
 for entry null
 java.lang.IllegalStateException
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:129)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size

[
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1326#comment-1326
]

Hadoop QA commented on HBASE-5611:
--

-1 overall. Here are the results of testing the latest attachment

http://issues.apache.org/jira/secure/attachment/12524809/HBASE-5611-trunk-v2.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

-1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9)
warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed these unit tests:

org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol
org.apache.hadoop.hbase.util.TestHBaseFsck

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/1664//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/1664//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/1664//console

This message is automatically generated.

Replayed edits from regions that failed to open during recovery aren't
removed from the global MemStore size

Attachments: HBASE-5611-94.patch,
HBASE-5611-trunk-v2-minorchange.patch, HBASE-5611-trunk-v2.patch,
HBASE-5611-trunk.patch

[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size


[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13263337#comment-13263337
 ] 

Jieshan Bean commented on HBASE-5611:
-

This problem seems not present in 90 branch. RegionServerAccounting is not in 
90. And the global memstore size is calculated by below methods: 
{noformat}
  public long getGlobalMemStoreSize() {
long total = 0;
for (HRegion region : onlineRegions.values()) {
  total += region.memstoreSize.get();
}
return total;
  }
{noformat}

 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-5611-92.patch, HBASE-5611-94-minorchange.patch, 
 HBASE-5611-trunk-v2-minorchange.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. It's completely invisible until the 
 {{MemStoreFlusher}} needs to force flush a region and that none of them have 
 edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
 for entry null
 java.lang.IllegalStateException
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:129)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size

[
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13263336#comment-13263336
]

Hadoop QA commented on HBASE-5611:
--

-1 overall. Here are the results of testing the latest attachment

http://issues.apache.org/jira/secure/attachment/12524812/HBASE-5611-94-minorchange.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

-1 patch. The patch command could not apply the patch.

Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/1667//console

This message is automatically generated.

Replayed edits from regions that failed to open during recovery aren't
removed from the global MemStore size

Attachments: HBASE-5611-92.patch, HBASE-5611-94-minorchange.patch,
HBASE-5611-trunk-v2-minorchange.patch

[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size