Hi Stack,
Do you have any ideas? We have been confused by this for a long time.

------------------ Original Message ------------------
From: "wgq.wu" <[email protected]>
Date: Friday, Jun 3, 2016, 11:13 AM
To: "[email protected]" <[email protected]>
Cc: <[email protected]>
Subject: Re: Memstore blocking

Hi Stack:

1. The log is very large, so I have picked out part of it, though it does not seem to provide much valuable information. The region that cannot flush is qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be. When the flush thread works well, the log looks like this:

2016-05-24 12:38:27,071 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be. after a delay of 16681
2016-05-24 12:38:37,071 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be. after a delay of 8684
2016-05-24 12:38:43,753 INFO [MemStoreFlusher.2] regionserver.HRegion: Started memstore flush for qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be., current region memstore size 305.3 K
2016-05-24 12:38:43,753 WARN [MemStoreFlusher.2] wal.FSHLog: Couldn't find oldest seqNum for the region we are about to flush: [dd8f92e3c161a8534b30ab17c28ae8be]
2016-05-24 12:38:43,816 INFO [MemStoreFlusher.2] regionserver.DefaultStoreFlusher: Flushed, sequenceid=54259, memsize=305.3 K, hasBloomFilter=true, into tmp file hdfs://cluster-tc-qtrace:8020/hbase/tc_qtrace/data/default/qtrace/dd8f92e3c161a8534b30ab17c28ae8be/.tmp/fddbb05945354d5cbdae4afd24e5bb9d
2016-05-24 12:38:43,822 DEBUG [MemStoreFlusher.2] regionserver.HRegionFileSystem: Committing store file hdfs://cluster-tc-qtrace:8020/hbase/tc_qtrace/data/default/qtrace/dd8f92e3c161a8534b30ab17c28ae8be/.tmp/fddbb05945354d5cbdae4afd24e5bb9d as hdfs://cluster-tc-qtrace:8020/hbase/tc_qtrace/data/default/qtrace/dd8f92e3c161a8534b30ab17c28ae8be/t/fddbb05945354d5cbdae4afd24e5bb9d
2016-05-24 12:38:43,837 INFO [MemStoreFlusher.2] regionserver.HStore: Added hdfs://cluster-tc-qtrace:8020/hbase/tc_qtrace/data/default/qtrace/dd8f92e3c161a8534b30ab17c28ae8be/t/fddbb05945354d5cbdae4afd24e5bb9d, entries=108, sequenceid=54259, filesize=68.3 K
2016-05-24 12:38:43,837 INFO [MemStoreFlusher.2] regionserver.HRegion: Finished memstore flush of ~305.3 K/312664, currentsize=0/0 for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be. in 84ms, sequenceid=54259, compaction requested=true

And when the flush thread does not work well, the log only ever shows:
2016-05-25 14:57:02,588 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be. after a delay of 18068
2016-05-25 14:57:12,587 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be. after a delay of 13165
2016-05-25 14:57:20,656 DEBUG [MemStoreFlusher.36] regionserver.HRegion: NOT flushing memstore for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be., flushing=true, writesEnabled=true
2016-05-25 14:57:22,587 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be. after a delay of 5526
2016-05-25 14:57:28,113 DEBUG [MemStoreFlusher.34] regionserver.HRegion: NOT flushing memstore for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be., flushing=true, writesEnabled=true
2016-05-25 14:57:32,587 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be. after a delay of 8178
2016-05-25 14:57:40,767 DEBUG [MemStoreFlusher.9] regionserver.HRegion: NOT flushing memstore for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be., flushing=true, writesEnabled=true
2016-05-25 14:57:42,587 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be. after a delay of 22068
2016-05-25 14:57:52,587 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be. after a delay of 5492
2016-05-25 14:58:02,587 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be. after a delay of 10472
2016-05-25 14:58:04,655 DEBUG [MemStoreFlusher.23] regionserver.HRegion: NOT flushing memstore for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be., flushing=true, writesEnabled=true
2016-05-25 14:58:12,587 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be. after a delay of 8435

I checked the log from before the region stopped flushing, but found nothing different from usual. By the way, this table has a heavy write load, so I have turned off the WAL.
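For context, "turning off the WAL" is normally requested per mutation through the client API. A minimal sketch against the HBase 1.0-era client (the thread does not show the actual client code, so the row, qualifier, and value below are placeholders; only the "t" family name comes from the store-file paths above):

import java.io.IOException;

import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class SkipWalWrite {
    // Write one cell to the "t" family with the WAL bypassed for this mutation.
    static void putWithoutWal(Table table, byte[] row) throws IOException {
        Put put = new Put(row);
        put.addColumn(Bytes.toBytes("t"), Bytes.toBytes("q"), Bytes.toBytes("v"));
        put.setDurability(Durability.SKIP_WAL); // edits live only in the memstore until flushed
        table.put(put);
    }
}

With SKIP_WAL, edits exist only in the memstore until a flush succeeds, which is what makes a permanently stuck flush especially costly here.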
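And a note on the repeated "NOT flushing memstore ... flushing=true, writesEnabled=true" lines above: they mean every new flush request is being dropped because a previous flush is still marked as in flight. A small self-contained model of that guard (field names follow the log output; the real HRegion code differs in detail):

// Illustrative model of the guard behind the repeated log line
// "NOT flushing memstore for region ..., flushing=true, writesEnabled=true".
// This is a sketch of the mechanism, not the actual HBase source.
public class FlushGuardSketch {
    static final class WriteState {
        boolean flushing = false;      // true while a flush is in flight
        boolean writesEnabled = true;  // false only when the region is closing
    }

    private final WriteState writestate = new WriteState();

    /** Returns true if this caller claimed the flush; false if the request is dropped. */
    boolean tryStartFlush() {
        synchronized (writestate) {
            if (!writestate.flushing && writestate.writesEnabled) {
                writestate.flushing = true; // must be reset by finishFlush()
                return true;
            }
            // The branch the stuck regionserver hits forever:
            System.out.println("NOT flushing memstore, flushing=" + writestate.flushing
                + ", writesEnabled=" + writestate.writesEnabled);
            return false;
        }
    }

    void finishFlush() {
        synchronized (writestate) {
            // If the flushing thread never reaches this (e.g. it is stuck in
            // compression, as in the jstack below), no flush can ever start again.
            writestate.flushing = false;
            writestate.notifyAll();
        }
    }
}

Once tryStartFlush() succeeds but finishFlush() is never reached, every later request takes the logging branch forever, which is exactly the pattern in the log.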
2. It is that the disk is filling faster than on other machines, not that iostat shows more seeks.

3. Here is the jstack output for the MemStoreFlusher threads:

"MemStoreFlusher.14" prio=10 tid=0x00007fe69487d800 nid=0x387d waiting on condition [0x00007fce7118c000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x00007fd09e031470> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
        at java.util.concurrent.DelayQueue.poll(DelayQueue.java:264)
        at java.util.concurrent.DelayQueue.poll(DelayQueue.java:68)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:233)
        at java.lang.Thread.run(Thread.java:744)

"MemStoreFlusher.13" prio=10 tid=0x00007fe69487b800 nid=0x387c runnable [0x00007fce7128c000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.hadoop.io.compress.snappy.SnappyCompressor.compressBytesDirect(Native Method)
        at org.apache.hadoop.io.compress.snappy.SnappyCompressor.compress(SnappyCompressor.java:232)
        - locked <0x00007fcfe3ba1190> (a org.apache.hadoop.io.compress.snappy.SnappyCompressor)
        at org.apache.hadoop.io.compress.BlockCompressorStream.compress(BlockCompressorStream.java:149)
        at org.apache.hadoop.io.compress.BlockCompressorStream.finish(BlockCompressorStream.java:142)
        at org.apache.hadoop.hbase.io.encoding.HFileBlockDefaultEncodingContext.compressAfterEncoding(HFileBlockDefaultEncodingContext.java:238)
        at org.apache.hadoop.hbase.io.encoding.HFileBlockDefaultEncodingContext.compressAfterEncodingWithBlockType(HFileBlockDefaultEncodingContext.java:149)
        at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.finishBlock(HFileBlock.java:783)
        at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.ensureBlockReady(HFileBlock.java:761)
        at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.finishBlockAndWriteHeaderAndData(HFileBlock.java:878)
        at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.writeHeaderAndData(HFileBlock.java:864)
        at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexWriter.writeIntermediateBlock(HFileBlockIndex.java:948)
        at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexWriter.writeIntermediateLevel(HFileBlockIndex.java:931)
        at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexWriter.writeIndexBlocks(HFileBlockIndex.java:840)
        at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.close(HFileWriterV2.java:367)
        at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.close(StoreFile.java:974)
        at org.apache.hadoop.hbase.regionserver.StoreFlusher.finalizeWriter(StoreFlusher.java:69)
        at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:70)
        - locked <0x00007fd8d1778be8> (a java.lang.Object)
        at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:829)
        at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:1989)
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1779)
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1662)
        at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1575)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:461)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:435)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:66)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:253)
        at java.lang.Thread.run(Thread.java:744)

"MemStoreFlusher.12" prio=10 tid=0x00007fe694879800 nid=0x387b waiting on condition [0x00007fce7138e000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x00007fd09e031470> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
        at java.util.concurrent.DelayQueue.poll(DelayQueue.java:264)
        at java.util.concurrent.DelayQueue.poll(DelayQueue.java:68)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:233)
        at java.lang.Thread.run(Thread.java:744)
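A side note on the two TIMED_WAITING threads: they are idle, parked in DelayQueue.poll waiting for the next flush request, and the periodicFlusher's "after a delay of NNNN" lines are randomized delays attached to those queued requests so regions do not all flush at once. With thread names running up to MemStoreFlusher.39, this install apparently runs 40 flush handlers (presumably via hbase.hstore.flusher.count). A self-contained sketch of the queue/handler pattern those stack frames imply (class and method names are mine, not HBase's):

import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

public class FlushQueueSketch {
    // A flush request that only becomes visible to a handler after its delay elapses.
    static final class FlushRequest implements Delayed {
        final String region;
        final long readyAtNanos;

        FlushRequest(String region, long delayMs) {
            this.region = region;
            this.readyAtNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(delayMs);
        }

        @Override public long getDelay(TimeUnit unit) {
            return unit.convert(readyAtNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
        }

        @Override public int compareTo(Delayed other) {
            return Long.compare(getDelay(TimeUnit.NANOSECONDS),
                                other.getDelay(TimeUnit.NANOSECONDS));
        }
    }

    static final DelayQueue<FlushRequest> queue = new DelayQueue<>();

    // Roughly what the periodicFlusher does: queue a flush with a random delay.
    static void requestDelayedFlush(String region) {
        long delayMs = ThreadLocalRandom.current().nextLong(3_000, 23_000);
        System.out.println("requesting flush for region " + region
            + " after a delay of " + delayMs);
        queue.add(new FlushRequest(region, delayMs));
    }

    // Roughly what each handler thread does: poll with a timeout, which is
    // where the TIMED_WAITING threads in the jstack above are parked.
    static void handlerLoop() throws InterruptedException {
        while (true) {
            FlushRequest req = queue.poll(10, TimeUnit.SECONDS);
            if (req != null) {
                System.out.println("flushing " + req.region); // the real code calls flushRegion()
            }
        }
    }
}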
On Jun 3, 2016, at 12:08 AM, Stack <[email protected]> wrote:

On Wed, Jun 1, 2016 at 3:03 AM, wgq <[email protected]> wrote:

hi all:

1. Is the region always on the same machine, or do you see this phenomenon on more than one machine?

Not always on the same machine, but always on the machine that holds the first region of the table (the only table whose first region cannot flush; when we restart the regionserver, the first region moves to another machine).

2. The RS is ABORTED? Because it can't flush? Is that what it says in the log? Can we see the log message around the ABORT?

Sorry, I did not express that clearly. It is the MemStore of the first region that cannot flush, not the RS. The RS log looks like this:

INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be. after a delay of 9452
DEBUG [MemStoreFlusher.39] regionserver.HRegion: NOT flushing memstore for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be., flushing=true, writesEnabled=true

And the web UI shows: Aborted flushing (Not flushing since already flushing.) But the flusher thread never finishes.

Can you back up in the log for this region? It looks like the first flush attempt failed or never completed. Perhaps there is a clue in the log message from the first attempt as to why the failure. The above log seems to be from a later stage, when we have gotten into the unhealthy state where we can't flush the first region memstore.

3. 100% disk only? The CPU does not go up too? Can we see a thread dump? Do jstack -l PID if you can.

Only the disk usage (command: df -h) increases faster than on the other machines, not the IO usage. The CPU usage is very low.

Sorry. Your disk is filling faster than on other machines? Or iostat shows the disk is doing more seeks than on other machines?

4. Any other special configurations going on on this install? Phoenix or use of Coprocessors?

No, no Phoenix. Only the AccessController coprocessor.

5. If you thread dump a few times, is it always stuck here?

Yes, always stuck here. Here is the jstack log. (In this log, it is MemStoreFlusher.13 that cannot flush.)

PS: As I see it, the first region failing to flush is what causes the problem, but I do not know why it cannot flush, nor why only the first region of this one table has the problem.

Please include a pointer to the stack trace. You can't have attachments on mail to this list.

Thank you,
St.Ack

On Jun 1, 2016, at 3:10 AM, Stack <[email protected]> wrote:
On Mon, May 30, 2016 at 7:03 PM, <[email protected]> wrote:

HI ALL:

Recently I met a strange problem: the first region's memstore of one table (the only such table) often blocks when flushing. (This happens on both hbase-0.98.6-cdh5.2.0 and 1.0.1.1; I upgraded from 0.98 to 1.0.1.1 hoping to solve the problem, but it did not help.)

Is the region always on the same machine, or do you see this phenomenon on more than one machine?

On the web UI, I can see the status shows: ABORTED (since XX sec ago), Not flushing since already flushing.

The RS is ABORTED? Because it can't flush? Is that what it says in the log? Can we see the log message around the ABORT?

But it never flushes successfully, and the disk usage grows very high. Other regionservers use just 30% of their disk capacity, while the problematic region server climbs to 100% unless the process is killed.

100% disk only? The CPU does not go up too? Can we see a thread dump? Do jstack -l PID if you can.

What's more, the region server process cannot be shut down normally; every time I have to use the KILL -9 command. I checked the log: the reason it cannot flush is that one of the MemStoreFlusher threads cannot exit. The log looks like this:

2016-05-29 19:54:11,982 INFO [MemStoreFlusher.13] regionserver.MemStoreFlusher: MemStoreFlusher.13 exiting
2016-05-29 19:54:13,016 INFO [MemStoreFlusher.6] regionserver.MemStoreFlusher: MemStoreFlusher.6 exiting
2016-05-29 19:54:13,260 INFO [MemStoreFlusher.16] regionserver.MemStoreFlusher: MemStoreFlusher.16 exiting
2016-05-29 19:54:16,032 INFO [MemStoreFlusher.33] regionserver.MemStoreFlusher: MemStoreFlusher.33 exiting
2016-05-29 19:54:16,341 INFO [MemStoreFlusher.25] regionserver.MemStoreFlusher: MemStoreFlusher.25 exiting
2016-05-29 19:54:16,620 INFO [MemStoreFlusher.31] regionserver.MemStoreFlusher: MemStoreFlusher.31 exiting
2016-05-29 19:54:16,621 INFO [MemStoreFlusher.29] regionserver.MemStoreFlusher: MemStoreFlusher.29 exiting
2016-05-29 19:54:16,621 INFO [MemStoreFlusher.23] regionserver.MemStoreFlusher: MemStoreFlusher.23 exiting
2016-05-29 19:54:16,621 INFO [MemStoreFlusher.32] regionserver.MemStoreFlusher: MemStoreFlusher.32 exiting
2016-05-29 19:54:16,621 INFO [MemStoreFlusher.1] regionserver.MemStoreFlusher: MemStoreFlusher.1 exiting
2016-05-29 19:54:16,621 INFO [MemStoreFlusher.38] regionserver.MemStoreFlusher: MemStoreFlusher.38 exiting
2016-05-29 19:54:16,621 INFO [MemStoreFlusher.10] regionserver.MemStoreFlusher: MemStoreFlusher.10 exiting
2016-05-29 19:54:16,620 INFO [MemStoreFlusher.7] regionserver.MemStoreFlusher: MemStoreFlusher.7 exiting
2016-05-29 19:54:16,620 INFO [MemStoreFlusher.12] regionserver.MemStoreFlusher: MemStoreFlusher.12 exiting
2016-05-29 19:54:16,620 INFO [MemStoreFlusher.21] regionserver.MemStoreFlusher: MemStoreFlusher.21 exiting
2016-05-29 19:54:16,622 INFO [MemStoreFlusher.37] regionserver.MemStoreFlusher: MemStoreFlusher.37 exiting
2016-05-29 19:54:16,622 INFO [MemStoreFlusher.24] regionserver.MemStoreFlusher: MemStoreFlusher.24 exiting
2016-05-29 19:54:16,806 INFO [MemStoreFlusher.39] regionserver.MemStoreFlusher: MemStoreFlusher.39 exiting
2016-05-29 19:54:16,908 INFO [MemStoreFlusher.17] regionserver.MemStoreFlusher: MemStoreFlusher.17 exiting
2016-05-29 19:54:16,952 INFO [MemStoreFlusher.3] regionserver.MemStoreFlusher: MemStoreFlusher.3 exiting
2016-05-29 19:54:16,952 INFO [MemStoreFlusher.27] regionserver.MemStoreFlusher: MemStoreFlusher.27 exiting
2016-05-29 19:54:16,952 INFO [MemStoreFlusher.18] regionserver.MemStoreFlusher: MemStoreFlusher.18 exiting
2016-05-29 19:54:16,952 INFO [MemStoreFlusher.4] regionserver.MemStoreFlusher: MemStoreFlusher.4 exiting
2016-05-29 19:54:16,952 INFO [MemStoreFlusher.19] regionserver.MemStoreFlusher: MemStoreFlusher.19 exiting
2016-05-29 19:54:16,952 INFO [MemStoreFlusher.22] regionserver.MemStoreFlusher: MemStoreFlusher.22 exiting
2016-05-29 19:54:16,952 INFO [MemStoreFlusher.9] regionserver.MemStoreFlusher: MemStoreFlusher.9 exiting
2016-05-29 19:54:17,137 INFO [MemStoreFlusher.36] regionserver.MemStoreFlusher: MemStoreFlusher.36 exiting
2016-05-29 19:54:17,263 INFO [MemStoreFlusher.11] regionserver.MemStoreFlusher: MemStoreFlusher.11 exiting
2016-05-29 19:54:17,499 INFO [MemStoreFlusher.14] regionserver.MemStoreFlusher: MemStoreFlusher.14 exiting
2016-05-29 19:54:17,499 INFO [MemStoreFlusher.2] regionserver.MemStoreFlusher: MemStoreFlusher.2 exiting
2016-05-29 19:54:17,499 INFO [MemStoreFlusher.26] regionserver.MemStoreFlusher: MemStoreFlusher.26 exiting
2016-05-29 19:54:17,499 INFO [MemStoreFlusher.0] regionserver.MemStoreFlusher: MemStoreFlusher.0 exiting
2016-05-29 19:54:17,499 INFO [MemStoreFlusher.30] regionserver.MemStoreFlusher: MemStoreFlusher.30 exiting
2016-05-29 19:54:17,499 INFO [MemStoreFlusher.28] regionserver.MemStoreFlusher: MemStoreFlusher.28 exiting
Here is the jstack log from when I try to shut down the region server process normally. We can see that MemStoreFlusher.34 has hit a problem: it cannot exit normally. (By the way, it shows as locked in Snappy here, but it is not always locked in Snappy; sometimes it is locked elsewhere, so I think Snappy itself is OK. I just want to show that the MemStoreFlusher has hit a problem.)

Any other special configurations going on on this install? Phoenix or use of Coprocessors?

"MemStoreFlusher.34" prio=10 tid=0x00007f20b8939000 nid=0x511b runnable [0x00007f08941c3000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.hadoop.io.compress.snappy.SnappyCompressor.compressBytesDirect(Native Method)
        at org.apache.hadoop.io.compress.snappy.SnappyCompressor.compress(SnappyCompressor.java:232)
        - locked <0x00007f0a5f85c430> (a org.apache.hadoop.io.compress.snappy.SnappyCompressor)
        at org.apache.hadoop.io.compress.BlockCompressorStream.compress(BlockCompressorStream.java:149)
        at org.apache.hadoop.io.compress.BlockCompressorStream.finish(BlockCompressorStream.java:142)
        at org.apache.hadoop.hbase.io.encoding.HFileBlockDefaultEncodingContext.compressAfterEncoding(HFileBlockDefaultEncodingContext.java:238)
        at org.apache.hadoop.hbase.io.encoding.HFileBlockDefaultEncodingContext.compressAfterEncodingWithBlockType(HFileBlockDefaultEncodingContext.java:149)
        at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.finishBlock(HFileBlock.java:783)
        at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.ensureBlockReady(HFileBlock.java:761)
        at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.finishBlockAndWriteHeaderAndData(HFileBlock.java:878)
        at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.writeHeaderAndData(HFileBlock.java:864)
        at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexWriter.writeIntermediateBlock(HFileBlockIndex.java:948)
        at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexWriter.writeIntermediateLevel(HFileBlockIndex.java:931)
        at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexWriter.writeIndexBlocks(HFileBlockIndex.java:840)
        at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.close(HFileWriterV2.java:367)
        at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.close(StoreFile.java:974)
        at org.apache.hadoop.hbase.regionserver.StoreFlusher.finalizeWriter(StoreFlusher.java:69)
        at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:70)
        - locked <0x00007f09fdcccda0> (a java.lang.Object)
        at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:829)
        at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:1989)
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1779)
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1662)
        at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1575)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:461)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:435)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:66)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:253)
        at java.lang.Thread.run(Thread.java:744)
If you thread dump a few times, is it always stuck here?

Thanks,
St.Ack

"MemStoreFlusher.33" prio=10 tid=0x00007f20b8937000 nid=0x511a waiting on condition [0x00007f08942c5000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x00007f09b8e20f80> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
        at java.util.concurrent.DelayQueue.poll(DelayQueue.java:264)
        at java.util.concurrent.DelayQueue.poll(DelayQueue.java:68)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:233)
        at java.lang.Thread.run(Thread.java:744)

I do not know why the memstore cannot flush. Is this a bug?
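A closing observation on the shutdown behavior: Java interruption is cooperative, so a thread executing a long native call (such as SnappyCompressor.compressBytesDirect above) stays RUNNABLE and ignores Thread.interrupt() until the call returns. If a flusher really is wedged inside native code, a normal stop cannot reclaim it, which would match needing KILL -9. A minimal illustration of that cooperative-interruption behavior:

public class InterruptSketch {
    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            double acc = 0;
            // Tight loop standing in for a native call: it never checks the
            // interrupt flag, so interrupt() has no visible effect on it.
            for (long i = 0; i < 1_000_000_000L; i++) {
                acc += Math.sqrt(i);
            }
            System.out.println("finished normally; interrupted flag="
                + Thread.currentThread().isInterrupted() + ", acc=" + acc);
        });
        worker.start();
        Thread.sleep(100);
        worker.interrupt(); // only sets a flag; the loop runs to completion
        worker.join();      // we still wait for the full computation to finish
    }
}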
