[jira] [Updated] (HBASE-4579) CST.requestCompaction semantics changed, logs are now spammed when too many store files

2011-10-14 Thread Jean-Daniel Cryans (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-4579:
--

Attachment: HBASE-4579-v2.patch

Ted pointed out offline that setting the priority higher is actually putting it 
farther in the queue, and upon inspection of the code it seems like the 
blocking stuff is already taken care of.

I must say, the compaction priority code isn't easy to grok.

 CST.requestCompaction semantics changed, logs are now spammed when too many 
 store files
 ---

 Key: HBASE-4579
 URL: https://issues.apache.org/jira/browse/HBASE-4579
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Critical
 Fix For: 0.92.0

 Attachments: HBASE-4579-v2.patch, HBASE-4579.patch


 Another bug I'm not so sure what's going on. I see this in my log:
 {quote}
 2011-10-12 00:23:43,435 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
 info: no store files to compact
 2011-10-12 00:23:44,335 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
 info: no store files to compact
 2011-10-12 00:23:45,236 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
 info: no store files to compact
 2011-10-12 00:23:46,136 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
 info: no store files to compact
 2011-10-12 00:23:47,036 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
 info: no store files to compact
 2011-10-12 00:23:47,936 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
 info: no store files to compact
 {quote}
 It spams for a while, and a little later instead I get:
 {quote}
 2011-10-12 00:26:52,139 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
 Skipped compaction of info.  Only 2 file(s) of size 176.4m have met 
 compaction criteria.
 2011-10-12 00:26:53,040 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
 Skipped compaction of info.  Only 2 file(s) of size 176.4m have met 
 compaction criteria.
 2011-10-12 00:26:53,940 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
 Skipped compaction of info.  Only 2 file(s) of size 176.4m have met 
 compaction criteria.
 2011-10-12 00:26:54,840 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
 Skipped compaction of info.  Only 2 file(s) of size 176.4m have met 
 compaction criteria.
 2011-10-12 00:26:55,741 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
 Skipped compaction of info.  Only 2 file(s) of size 176.4m have met 
 compaction criteria.
 2011-10-12 00:26:56,641 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
 Skipped compaction of info.  Only 2 file(s) of size 176.4m have met 
 compaction criteria.
 {quote}
 I believe I also saw something like that for flushes, but the region was 
 closing so at least I know why it was spamming (would be nice if it just 
 unrequested the flush):
 {quote}
 2011-10-12 00:26:40,693 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
 Flush requested on 
 TestTable,0038168581,1318378894213.2beb8a1e29382a8d3e90a88b9662e5f5.
 2011-10-12 00:26:40,694 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
 NOT flushing memstore for region 
 TestTable,0038168581,1318378894213.2beb8a1e29382a8d3e90a88b9662e5f5., 
 flushing=false, writesEnabled=false
 2011-10-12 00:26:40,733 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
 Flush requested on 
 TestTable,0038168581,1318378894213.2beb8a1e29382a8d3e90a88b9662e5f5.
 2011-10-12 00:26:40,733 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
 NOT flushing memstore for region 
 TestTable,0038168581,1318378894213.2beb8a1e29382a8d3e90a88b9662e5f5., 
 flushing=false, writesEnabled=false
 2011-10-12 00:26:40,873 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
 Flush requested on 
 TestTable,0038168581,1318378894213.2beb8a1e29382a8d3e90a88b9662e5f5.
 2011-10-12 00:26:40,873 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
 NOT flushing memstore for region 
 TestTable,0038168581,1318378894213.2beb8a1e29382a8d3e90a88b9662e5f5., 
 flushing=false, writesEnabled=false
 2011-10-12 00:26:40,873 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
 Flush requested on 
 TestTable,0038168581,1318378894213.2beb8a1e29382a8d3e90a88b9662e5f5.
 2011-10-12 00:26:40,873 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
 NOT flushing memstore for region 
 TestTable,0038168581,1318378894213.2beb8a1e29382a8d3e90a88b9662e5f5., 
 flushing=false, writesEnabled=false
 2011-10-12 00:26:40,921 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
 Flush requested on 
 TestTable,0038168581,1318378894213.2beb8a1e29382a8d3e90a88b9662e5f5.
 2011-10-12 00:26:40,922 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
 NOT flushing memstore for region 
 

[jira] [Updated] (HBASE-4579) CST.requestCompaction semantics changed, logs are now spammed when too many store files

2011-10-13 Thread Jean-Daniel Cryans (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-4579:
--

Summary: CST.requestCompaction semantics changed, logs are now spammed when 
too many store files  (was: Compactions that can't run are requested forever)

I spent some time debugging this and the debug I added really helped:

{quote}
2011-10-13 22:04:16,633 DEBUG 
org.apache.hadoop.hbase.regionserver.CompactSplitThread: Large Compaction 
requested: 
regionName=TestTable,0028803615,1318381132063.1724203bae0dda87cddfaac9700b9dc6.,
 storeName=info, fileCount=4, fileSize=219.6m (54.8m, 55.0m, 55.0m, 54.8m), 
priority=-1, time=6062002855079899; Because: regionserver62023.cacheFlusher; 
compaction_queue=(1:0), split_queue=0
2011-10-13 22:04:21,999 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
Flush requested on 
TestTable,0028803615,1318381132063.1724203bae0dda87cddfaac9700b9dc6.
2011-10-13 22:04:21,999 WARN 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Region 
TestTable,0028803615,1318381132063.1724203bae0dda87cddfaac9700b9dc6. has too 
many store files; delaying flush up to 9ms
2011-10-13 22:04:21,999 DEBUG org.apache.hadoop.hbase.regionserver.Store: info: 
no store files to compact 1724203bae0dda87cddfaac9700b9dc6
2011-10-13 22:04:22,000 ERROR org.apache.hadoop.hbase.regionserver.Store:
java.lang.Exception
at 
org.apache.hadoop.hbase.regionserver.Store.compactSelection(Store.java:1007)
at 
org.apache.hadoop.hbase.regionserver.Store.requestCompaction(Store.java:908)
at 
org.apache.hadoop.hbase.regionserver.CompactSplitThread.requestCompaction(CompactSplitThread.java:204)
at 
org.apache.hadoop.hbase.regionserver.CompactSplitThread.requestCompaction(CompactSplitThread.java:177)
at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:360)
at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:239)
...
2011-10-13 22:05:52,584 INFO 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Waited 90585ms on a 
compaction to clean up 'too many store files'; waited long enough... proceeding 
with flush of 
TestTable,0028803615,1318381132063.1724203bae0dda87cddfaac9700b9dc6.
{quote}

The exception is there just to show me where the call was coming from.

So what we see is that compaction is grabbing all the store files and during 
that time the MemStoreFlusher is trying to flush but hits the too many store 
files issue. It used to be that requesting a compaction was just putting 
something on a queue silently, but now it's actually going all the way down 
into Store and that's why it's now spammy.

Fixes I think are required:

 - For debugging purposes, show the encoded name when printing the info: no 
store files to compact message because it's pretty much useless if you don't 
know what it belongs to.
 - Require the compaction only once, the code already knows when it's the first 
time you spin so no need to ask it again and again.
 - Set a higher priority for the compaction

 CST.requestCompaction semantics changed, logs are now spammed when too many 
 store files
 ---

 Key: HBASE-4579
 URL: https://issues.apache.org/jira/browse/HBASE-4579
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Priority: Critical
 Fix For: 0.92.0


 Another bug I'm not so sure what's going on. I see this in my log:
 {quote}
 2011-10-12 00:23:43,435 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
 info: no store files to compact
 2011-10-12 00:23:44,335 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
 info: no store files to compact
 2011-10-12 00:23:45,236 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
 info: no store files to compact
 2011-10-12 00:23:46,136 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
 info: no store files to compact
 2011-10-12 00:23:47,036 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
 info: no store files to compact
 2011-10-12 00:23:47,936 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
 info: no store files to compact
 {quote}
 It spams for a while, and a little later instead I get:
 {quote}
 2011-10-12 00:26:52,139 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
 Skipped compaction of info.  Only 2 file(s) of size 176.4m have met 
 compaction criteria.
 2011-10-12 00:26:53,040 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
 Skipped compaction of info.  Only 2 file(s) of size 176.4m have met 
 compaction criteria.
 2011-10-12 00:26:53,940 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
 Skipped compaction of info.  Only 2 file(s) of size 176.4m have met 
 compaction criteria.
 2011-10-12 00:26:54,840 DEBUG 

[jira] [Updated] (HBASE-4579) CST.requestCompaction semantics changed, logs are now spammed when too many store files

2011-10-13 Thread Jean-Daniel Cryans (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-4579:
--

Attachment: HBASE-4579.patch

Does the 3 things I listed above, tested on a cluster and verified it stopped 
spamming.

 CST.requestCompaction semantics changed, logs are now spammed when too many 
 store files
 ---

 Key: HBASE-4579
 URL: https://issues.apache.org/jira/browse/HBASE-4579
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Critical
 Fix For: 0.92.0

 Attachments: HBASE-4579.patch


 Another bug I'm not so sure what's going on. I see this in my log:
 {quote}
 2011-10-12 00:23:43,435 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
 info: no store files to compact
 2011-10-12 00:23:44,335 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
 info: no store files to compact
 2011-10-12 00:23:45,236 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
 info: no store files to compact
 2011-10-12 00:23:46,136 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
 info: no store files to compact
 2011-10-12 00:23:47,036 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
 info: no store files to compact
 2011-10-12 00:23:47,936 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
 info: no store files to compact
 {quote}
 It spams for a while, and a little later instead I get:
 {quote}
 2011-10-12 00:26:52,139 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
 Skipped compaction of info.  Only 2 file(s) of size 176.4m have met 
 compaction criteria.
 2011-10-12 00:26:53,040 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
 Skipped compaction of info.  Only 2 file(s) of size 176.4m have met 
 compaction criteria.
 2011-10-12 00:26:53,940 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
 Skipped compaction of info.  Only 2 file(s) of size 176.4m have met 
 compaction criteria.
 2011-10-12 00:26:54,840 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
 Skipped compaction of info.  Only 2 file(s) of size 176.4m have met 
 compaction criteria.
 2011-10-12 00:26:55,741 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
 Skipped compaction of info.  Only 2 file(s) of size 176.4m have met 
 compaction criteria.
 2011-10-12 00:26:56,641 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
 Skipped compaction of info.  Only 2 file(s) of size 176.4m have met 
 compaction criteria.
 {quote}
 I believe I also saw something like that for flushes, but the region was 
 closing so at least I know why it was spamming (would be nice if it just 
 unrequested the flush):
 {quote}
 2011-10-12 00:26:40,693 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
 Flush requested on 
 TestTable,0038168581,1318378894213.2beb8a1e29382a8d3e90a88b9662e5f5.
 2011-10-12 00:26:40,694 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
 NOT flushing memstore for region 
 TestTable,0038168581,1318378894213.2beb8a1e29382a8d3e90a88b9662e5f5., 
 flushing=false, writesEnabled=false
 2011-10-12 00:26:40,733 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
 Flush requested on 
 TestTable,0038168581,1318378894213.2beb8a1e29382a8d3e90a88b9662e5f5.
 2011-10-12 00:26:40,733 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
 NOT flushing memstore for region 
 TestTable,0038168581,1318378894213.2beb8a1e29382a8d3e90a88b9662e5f5., 
 flushing=false, writesEnabled=false
 2011-10-12 00:26:40,873 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
 Flush requested on 
 TestTable,0038168581,1318378894213.2beb8a1e29382a8d3e90a88b9662e5f5.
 2011-10-12 00:26:40,873 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
 NOT flushing memstore for region 
 TestTable,0038168581,1318378894213.2beb8a1e29382a8d3e90a88b9662e5f5., 
 flushing=false, writesEnabled=false
 2011-10-12 00:26:40,873 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
 Flush requested on 
 TestTable,0038168581,1318378894213.2beb8a1e29382a8d3e90a88b9662e5f5.
 2011-10-12 00:26:40,873 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
 NOT flushing memstore for region 
 TestTable,0038168581,1318378894213.2beb8a1e29382a8d3e90a88b9662e5f5., 
 flushing=false, writesEnabled=false
 2011-10-12 00:26:40,921 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
 Flush requested on 
 TestTable,0038168581,1318378894213.2beb8a1e29382a8d3e90a88b9662e5f5.
 2011-10-12 00:26:40,922 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
 NOT flushing memstore for region 
 TestTable,0038168581,1318378894213.2beb8a1e29382a8d3e90a88b9662e5f5., 
 flushing=false, writesEnabled=false
 2011-10-12 00:26:40,923 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
 Flush requested on