[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size

2012-05-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13268866#comment-13268866
 ] 

Hudson commented on HBASE-5611:
---

Integrated in HBase-0.92-security #106 (See 
[https://builds.apache.org/job/HBase-0.92-security/106/])
HBASE-5611 Replayed edits from regions that failed to open during recovery 
aren't removed from the global MemStore size (Jieshan) (Revision 1331684)

 Result = SUCCESS
tedyu : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerAccounting.java
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java


 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: 5611-94-v2.txt, 5611-94.addendum, HBASE-5611-92.patch, 
 HBASE-5611-94-minorchange.patch, HBASE-5611-trunk-v2-minorchange.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. It's completely invisible until the 
 {{MemStoreFlusher}} needs to force flush a region and that none of them have 
 edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
 for entry null
 java.lang.IllegalStateException
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:129)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size

2012-05-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13266126#comment-13266126
 ] 

Hudson commented on HBASE-5611:
---

Integrated in HBase-0.94-security #25 (See 
[https://builds.apache.org/job/HBase-0.94-security/25/])
HBASE-5611 Replayed edits from regions that failed to open during recovery 
aren't removed from the global MemStore size - v2 (Jieshan) (Revision 1332344)

 Result = SUCCESS
tedyu : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerAccounting.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java


 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: 5611-94-v2.txt, 5611-94.addendum, HBASE-5611-92.patch, 
 HBASE-5611-94-minorchange.patch, HBASE-5611-trunk-v2-minorchange.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. It's completely invisible until the 
 {{MemStoreFlusher}} needs to force flush a region and that none of them have 
 edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
 for entry null
 java.lang.IllegalStateException
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:129)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size

2012-05-01 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13266313#comment-13266313
 ] 

Jieshan Bean commented on HBASE-5611:
-

Thank you very much, Ted.

 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: 5611-94-v2.txt, 5611-94.addendum, HBASE-5611-92.patch, 
 HBASE-5611-94-minorchange.patch, HBASE-5611-trunk-v2-minorchange.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. It's completely invisible until the 
 {{MemStoreFlusher}} needs to force flush a region and that none of them have 
 edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
 for entry null
 java.lang.IllegalStateException
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:129)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size

2012-04-30 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264740#comment-13264740
 ] 

Jieshan Bean commented on HBASE-5611:
-

TestHeapSize failure on a 32-bit machine in 94 is caused by HBASE-5900. 

 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: 5611-94.addendum, HBASE-5611-92.patch, 
 HBASE-5611-94-minorchange.patch, HBASE-5611-trunk-v2-minorchange.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. It's completely invisible until the 
 {{MemStoreFlusher}} needs to force flush a region and that none of them have 
 edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
 for entry null
 java.lang.IllegalStateException
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:129)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size

2012-04-30 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264859#comment-13264859
 ] 

Jieshan Bean commented on HBASE-5611:
-

The new version patch for 94 will be uploaded after HBASE-5900 get fixed.

 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: 5611-94.addendum, HBASE-5611-92.patch, 
 HBASE-5611-94-minorchange.patch, HBASE-5611-trunk-v2-minorchange.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. It's completely invisible until the 
 {{MemStoreFlusher}} needs to force flush a region and that none of them have 
 edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
 for entry null
 java.lang.IllegalStateException
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:129)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size

2012-04-30 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13265127#comment-13265127
 ] 

Zhihong Yu commented on HBASE-5611:
---

Integrated 5611-94-v2.txt to 0.94 branch.

 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: 5611-94-v2.txt, 5611-94.addendum, HBASE-5611-92.patch, 
 HBASE-5611-94-minorchange.patch, HBASE-5611-trunk-v2-minorchange.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. It's completely invisible until the 
 {{MemStoreFlusher}} needs to force flush a region and that none of them have 
 edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
 for entry null
 java.lang.IllegalStateException
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:129)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size

2012-04-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13265383#comment-13265383
 ] 

Hudson commented on HBASE-5611:
---

Integrated in HBase-0.94 #161 (See 
[https://builds.apache.org/job/HBase-0.94/161/])
HBASE-5611 Replayed edits from regions that failed to open during recovery 
aren't removed from the global MemStore size - v2 (Jieshan) (Revision 1332344)

 Result = FAILURE
tedyu : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerAccounting.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java


 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: 5611-94-v2.txt, 5611-94.addendum, HBASE-5611-92.patch, 
 HBASE-5611-94-minorchange.patch, HBASE-5611-trunk-v2-minorchange.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. It's completely invisible until the 
 {{MemStoreFlusher}} needs to force flush a region and that none of them have 
 edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
 for entry null
 java.lang.IllegalStateException
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:129)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size

2012-04-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264228#comment-13264228
 ] 

Hudson commented on HBASE-5611:
---

Integrated in HBase-TRUNK #2823 (See 
[https://builds.apache.org/job/HBase-TRUNK/2823/])
HBASE-5611 Replayed edits from regions that failed to open during recovery 
aren't removed from the global MemStore size (Jieshan) (Revision 1331681)

 Result = FAILURE
tedyu : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerAccounting.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java


 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: HBASE-5611-92.patch, HBASE-5611-94-minorchange.patch, 
 HBASE-5611-trunk-v2-minorchange.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. It's completely invisible until the 
 {{MemStoreFlusher}} needs to force flush a region and that none of them have 
 edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
 for entry null
 java.lang.IllegalStateException
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:129)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size

2012-04-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264238#comment-13264238
 ] 

Hudson commented on HBASE-5611:
---

Integrated in HBase-TRUNK-security #187 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/187/])
HBASE-5611 Replayed edits from regions that failed to open during recovery 
aren't removed from the global MemStore size (Jieshan) (Revision 1331681)

 Result = SUCCESS
tedyu : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerAccounting.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java


 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: HBASE-5611-92.patch, HBASE-5611-94-minorchange.patch, 
 HBASE-5611-trunk-v2-minorchange.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. It's completely invisible until the 
 {{MemStoreFlusher}} needs to force flush a region and that none of them have 
 edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
 for entry null
 java.lang.IllegalStateException
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:129)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size

2012-04-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264242#comment-13264242
 ] 

Hudson commented on HBASE-5611:
---

Integrated in HBase-0.92 #392 (See 
[https://builds.apache.org/job/HBase-0.92/392/])
HBASE-5611 Replayed edits from regions that failed to open during recovery 
aren't removed from the global MemStore size (Jieshan) (Revision 1331684)

 Result = SUCCESS
tedyu : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerAccounting.java
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java


 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: HBASE-5611-92.patch, HBASE-5611-94-minorchange.patch, 
 HBASE-5611-trunk-v2-minorchange.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. It's completely invisible until the 
 {{MemStoreFlusher}} needs to force flush a region and that none of them have 
 edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
 for entry null
 java.lang.IllegalStateException
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:129)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size

2012-04-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264340#comment-13264340
 ] 

Hudson commented on HBASE-5611:
---

Integrated in HBase-0.94 #155 (See 
[https://builds.apache.org/job/HBase-0.94/155/])
HBASE-5611 Addendum fixes test failure in TestHeapSize#testSizes (Revision 
1331769)

 Result = FAILURE
tedyu : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java


 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: 5611-94.addendum, HBASE-5611-92.patch, 
 HBASE-5611-94-minorchange.patch, HBASE-5611-trunk-v2-minorchange.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. It's completely invisible until the 
 {{MemStoreFlusher}} needs to force flush a region and that none of them have 
 edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
 for entry null
 java.lang.IllegalStateException
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:129)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size

2012-04-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264414#comment-13264414
 ] 

Hudson commented on HBASE-5611:
---

Integrated in HBase-0.94 #158 (See 
[https://builds.apache.org/job/HBase-0.94/158/])
HBASE-5611 Revert due to consistent TestChangingEncoding failure in 0.94 
branch (Revision 1331826)

 Result = FAILURE
tedyu : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerAccounting.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java


 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: 5611-94.addendum, HBASE-5611-92.patch, 
 HBASE-5611-94-minorchange.patch, HBASE-5611-trunk-v2-minorchange.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. It's completely invisible until the 
 {{MemStoreFlusher}} needs to force flush a region and that none of them have 
 edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
 for entry null
 java.lang.IllegalStateException
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:129)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size

2012-04-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264420#comment-13264420
 ] 

Hudson commented on HBASE-5611:
---

Integrated in HBase-0.94-security #24 (See 
[https://builds.apache.org/job/HBase-0.94-security/24/])
HBASE-5611 Revert due to consistent TestChangingEncoding failure in 0.94 
branch (Revision 1331826)
HBASE-5611 Addendum fixes test failure in TestHeapSize#testSizes (Revision 
1331769)
HBASE-5611 Replayed edits from regions that failed to open during recovery 
aren't removed from the global MemStore size (Jieshan) (Revision 1331683)

 Result = FAILURE
tedyu : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerAccounting.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java

tedyu : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java

tedyu : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerAccounting.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java


 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: 5611-94.addendum, HBASE-5611-92.patch, 
 HBASE-5611-94-minorchange.patch, HBASE-5611-trunk-v2-minorchange.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. It's completely invisible until the 
 {{MemStoreFlusher}} needs to force flush a region and that none of them have 
 edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
 for entry null
 java.lang.IllegalStateException
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:129)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size

2012-04-28 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264471#comment-13264471
 ] 

Jieshan Bean commented on HBASE-5611:
-

Thank you, Ted.
I will check that failure in 94 version.

 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: 5611-94.addendum, HBASE-5611-92.patch, 
 HBASE-5611-94-minorchange.patch, HBASE-5611-trunk-v2-minorchange.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. It's completely invisible until the 
 {{MemStoreFlusher}} needs to force flush a region and that none of them have 
 edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
 for entry null
 java.lang.IllegalStateException
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:129)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size

2012-04-27 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13263904#comment-13263904
 ] 

Ted Yu commented on HBASE-5611:
---

I ran 0.92 test suite over 0.92 patch and didn't find regression.

Will integrate tomorrow if there is no objection.

 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-5611-92.patch, HBASE-5611-94-minorchange.patch, 
 HBASE-5611-trunk-v2-minorchange.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. It's completely invisible until the 
 {{MemStoreFlusher}} needs to force flush a region and that none of them have 
 edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
 for entry null
 java.lang.IllegalStateException
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:129)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size

2012-04-27 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264210#comment-13264210
 ] 

Ted Yu commented on HBASE-5611:
---

Integrated to 0.92, 0.94 and trunk.

Thanks for the patch, Jieshan.

Thanks for the review, Lars.

 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: HBASE-5611-92.patch, HBASE-5611-94-minorchange.patch, 
 HBASE-5611-trunk-v2-minorchange.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. It's completely invisible until the 
 {{MemStoreFlusher}} needs to force flush a region and that none of them have 
 edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
 for entry null
 java.lang.IllegalStateException
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:129)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size

2012-04-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264225#comment-13264225
 ] 

Hudson commented on HBASE-5611:
---

Integrated in HBase-0.94 #153 (See 
[https://builds.apache.org/job/HBase-0.94/153/])
HBASE-5611 Replayed edits from regions that failed to open during recovery 
aren't removed from the global MemStore size (Jieshan) (Revision 1331683)

 Result = FAILURE
tedyu : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerAccounting.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java


 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: HBASE-5611-92.patch, HBASE-5611-94-minorchange.patch, 
 HBASE-5611-trunk-v2-minorchange.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. It's completely invisible until the 
 {{MemStoreFlusher}} needs to force flush a region and that none of them have 
 edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
 for entry null
 java.lang.IllegalStateException
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:129)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size

2012-04-26 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262648#comment-13262648
 ] 

Zhihong Yu commented on HBASE-5611:
---

@Jieshan:
When you have multiple patches for different branches, attach patch for trunk 
apart from the other patches.
Otherwise Hadoop QA may pick up the wrong patch.

 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-5611-94.patch, HBASE-5611-trunk-v2.patch, 
 HBASE-5611-trunk.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. It's completely invisible until the 
 {{MemStoreFlusher}} needs to force flush a region and that none of them have 
 edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
 for entry null
 java.lang.IllegalStateException
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:129)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size

2012-04-26 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262651#comment-13262651
 ] 

Jieshan Bean commented on HBASE-5611:
-

Thank you, Ted. I will take care.

 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-5611-94.patch, HBASE-5611-trunk-v2.patch, 
 HBASE-5611-trunk.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. It's completely invisible until the 
 {{MemStoreFlusher}} needs to force flush a region and that none of them have 
 edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
 for entry null
 java.lang.IllegalStateException
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:129)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size

2012-04-26 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262652#comment-13262652
 ] 

Zhihong Yu commented on HBASE-5611:
---

Patch v2 looks good in general.
Comment on formatting:
{code}
+   * @param regionName
+   *  region name.
{code}
The line length is 100 chars. Please put javadoc for param on the same line as 
param name.

You can wait for Hadoop QA result to come back before attaching new patches.

 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-5611-94.patch, HBASE-5611-trunk-v2.patch, 
 HBASE-5611-trunk.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. It's completely invisible until the 
 {{MemStoreFlusher}} needs to force flush a region and that none of them have 
 edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
 for entry null
 java.lang.IllegalStateException
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:129)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size

2012-04-26 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262656#comment-13262656
 ] 

Jieshan Bean commented on HBASE-5611:
-

I use the formatter from HBASE-3678. Every time, it format my code like that.
I will change after QA result.

 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-5611-94.patch, HBASE-5611-trunk-v2.patch, 
 HBASE-5611-trunk.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. It's completely invisible until the 
 {{MemStoreFlusher}} needs to force flush a region and that none of them have 
 edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
 for entry null
 java.lang.IllegalStateException
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:129)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size

2012-04-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262673#comment-13262673
 ] 

Hadoop QA commented on HBASE-5611:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12524416/HBASE-5611-trunk-v2.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 4 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1656//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1656//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1656//console

This message is automatically generated.

 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-5611-94.patch, HBASE-5611-trunk-v2.patch, 
 HBASE-5611-trunk.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. It's completely invisible until the 
 {{MemStoreFlusher}} needs to force flush a region and that none of them have 
 edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
 for entry null
 java.lang.IllegalStateException
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:129)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size

2012-04-26 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262702#comment-13262702
 ] 

Zhihong Yu commented on HBASE-5611:
---

Tests were clear.

@Jieshan:
Please address formatting and prepare patches for each branch.

We should also run test suite for 0.90 and 0.92 once patches are available.

Good job.

 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-5611-94.patch, HBASE-5611-trunk-v2.patch, 
 HBASE-5611-trunk.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. It's completely invisible until the 
 {{MemStoreFlusher}} needs to force flush a region and that none of them have 
 edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
 for entry null
 java.lang.IllegalStateException
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:129)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size

2012-04-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13263324#comment-13263324
 ] 

Hadoop QA commented on HBASE-5611:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12524808/HBASE-5611-94.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1663//console

This message is automatically generated.

 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-5611-94.patch, HBASE-5611-trunk-v2.patch, 
 HBASE-5611-trunk.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. It's completely invisible until the 
 {{MemStoreFlusher}} needs to force flush a region and that none of them have 
 edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
 for entry null
 java.lang.IllegalStateException
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:129)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size

2012-04-26 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13263326#comment-13263326
 ] 

Jieshan Bean commented on HBASE-5611:
-

Will upload the patches for 92 and 90 today:)

 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-5611-94.patch, HBASE-5611-trunk-v2.patch, 
 HBASE-5611-trunk.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. It's completely invisible until the 
 {{MemStoreFlusher}} needs to force flush a region and that none of them have 
 edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
 for entry null
 java.lang.IllegalStateException
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:129)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size

2012-04-26 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13263327#comment-13263327
 ] 

Zhihong Yu commented on HBASE-5611:
---

{code}
+   * Roll back the global MemStore size when a region can't open.
{code}
The above is not accurate: we're only rolling back the replay edits size for 
specified region from global MemStore size.

 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-5611-94.patch, HBASE-5611-trunk-v2.patch, 
 HBASE-5611-trunk.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. It's completely invisible until the 
 {{MemStoreFlusher}} needs to force flush a region and that none of them have 
 edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
 for entry null
 java.lang.IllegalStateException
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:129)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size

2012-04-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1326#comment-1326
 ] 

Hadoop QA commented on HBASE-5611:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12524809/HBASE-5611-trunk-v2.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 2 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   
org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol
  org.apache.hadoop.hbase.util.TestHBaseFsck

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1664//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1664//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1664//console

This message is automatically generated.

 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-5611-94.patch, 
 HBASE-5611-trunk-v2-minorchange.patch, HBASE-5611-trunk-v2.patch, 
 HBASE-5611-trunk.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. It's completely invisible until the 
 {{MemStoreFlusher}} needs to force flush a region and that none of them have 
 edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
 for entry null
 java.lang.IllegalStateException
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:129)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size

2012-04-26 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13263337#comment-13263337
 ] 

Jieshan Bean commented on HBASE-5611:
-

This problem seems not present in 90 branch. RegionServerAccounting is not in 
90. And the global memstore size is calculated by below methods: 
{noformat}
  public long getGlobalMemStoreSize() {
long total = 0;
for (HRegion region : onlineRegions.values()) {
  total += region.memstoreSize.get();
}
return total;
  }
{noformat}

 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-5611-92.patch, HBASE-5611-94-minorchange.patch, 
 HBASE-5611-trunk-v2-minorchange.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. It's completely invisible until the 
 {{MemStoreFlusher}} needs to force flush a region and that none of them have 
 edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
 for entry null
 java.lang.IllegalStateException
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:129)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size

2012-04-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13263336#comment-13263336
 ] 

Hadoop QA commented on HBASE-5611:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12524812/HBASE-5611-94-minorchange.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1667//console

This message is automatically generated.

 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-5611-92.patch, HBASE-5611-94-minorchange.patch, 
 HBASE-5611-trunk-v2-minorchange.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. It's completely invisible until the 
 {{MemStoreFlusher}} needs to force flush a region and that none of them have 
 edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
 for entry null
 java.lang.IllegalStateException
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:129)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size

2012-04-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13263347#comment-13263347
 ] 

Hadoop QA commented on HBASE-5611:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12524810/HBASE-5611-trunk-v2-minorchange.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 2 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1666//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1666//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1666//console

This message is automatically generated.

 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-5611-92.patch, HBASE-5611-94-minorchange.patch, 
 HBASE-5611-trunk-v2-minorchange.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. It's completely invisible until the 
 {{MemStoreFlusher}} needs to force flush a region and that none of them have 
 edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
 for entry null
 java.lang.IllegalStateException
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:129)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size

2012-04-25 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13261799#comment-13261799
 ] 

Zhihong Yu commented on HBASE-5611:
---

{code}
+  // global memstore size once a region opening failed.
{code}
'region opening failed' - 'region failed opening'.
{code}
+  private final ConcurrentMapHRegionInfo, AtomicLong replayEditsPerRegion = 
{code}
Do we need HRegionInfo as the key to the Map ? Can we use region name ?
For rollbackRegionReplayEditsSize():
{code}
+  addAndGetGlobalMemstoreSize(-replayEdistsSize.get());
+  clearRegionReplayEditsSize(hri);
{code}
I suggest remembering the value of -replayEdistsSize.get() in a variable so 
that we can exchange the order of the two statements above and return directly 
from the if block.
If replayEdistsSize is null, would that indicate certain race condition ?

 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-5611-trunk.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. It's completely invisible until the 
 {{MemStoreFlusher}} needs to force flush a region and that none of them have 
 edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
 for entry null
 java.lang.IllegalStateException
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:129)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size

2012-04-25 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13261816#comment-13261816
 ] 

ramkrishna.s.vasudevan commented on HBASE-5611:
---

@Lars
Want this in 0.94.0? 

 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-5611-trunk.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. It's completely invisible until the 
 {{MemStoreFlusher}} needs to force flush a region and that none of them have 
 edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
 for entry null
 java.lang.IllegalStateException
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:129)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size

2012-04-25 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262309#comment-13262309
 ] 

Jieshan Bean commented on HBASE-5611:
-

Thank you, Ted. I will update the patch immediately. And will make the patch 
for 94 today.


 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-5611-trunk.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. It's completely invisible until the 
 {{MemStoreFlusher}} needs to force flush a region and that none of them have 
 edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
 for entry null
 java.lang.IllegalStateException
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:129)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size

2012-04-25 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262321#comment-13262321
 ] 

Jieshan Bean commented on HBASE-5611:
-

bq.If replayEdistsSize is null, would that indicate certain race condition ?
If no edits needed to replay, it should be null. Only during region openning 
process, those new methods will be called. It's should not be a race condition.

 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-5611-trunk.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. It's completely invisible until the 
 {{MemStoreFlusher}} needs to force flush a region and that none of them have 
 edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
 for entry null
 java.lang.IllegalStateException
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:129)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira