Hi

The symptom reproduced again. I pasted the logs at http://paste2.org/D2N6ZDvk and http://paste2.org/a64LXD0X. One is the regionserver jstack output; the other is the regionserver log, grepped to include only the unflushed regions.
Thanks

-----Original Message-----
From: sunweiwei [mailto:[email protected]]
Sent: June 5, 2014 14:51
To: [email protected]
Subject: Re: Re: Re: forcing flush not works

I'm sorry, but the regionserver logs have been deleted...

To Stack: Yes, it is always the same two regions of table BT_D_BF001_201406 that can't be flushed. Previously I only saved a little of the log, from when table BT_D_BF001_201405 had lots of regions.

2014-05-27 22:40:52,025 DEBUG [regionserver60020.logRoller] regionserver.LogRoller: HLog roll requested
2014-05-27 22:40:52,039 DEBUG [regionserver60020.logRoller] wal.FSHLog: cleanupCurrentWriter waiting for transactions to get synced total 450500823 synced till here 450500779
2014-05-27 22:40:52,049 INFO [regionserver60020.logRoller] wal.FSHLog: Rolled WAL /apps/hbase/data/WALs/hadoop03,60020,1401173211108/hadoop03%2C60020%2C1401173211108.1401201646659 with entries=94581, filesize=122.2 M; new WAL /apps/hbase/data/WALs/hadoop03,60020,1401173211108/hadoop03%2C60020%2C1401173211108.1401201652025
2014-05-27 22:40:52,049 INFO [regionserver60020.logRoller] wal.FSHLog: Too many hlogs: logs=156, maxlogs=32; forcing flush of 2 regions(s): a5b94272f0fdd477bf320e428059fe87, f1a60d3ea5820cb672832c59531de89d
2014-05-27 22:40:52,073 DEBUG [Thread-17] regionserver.MemStoreFlusher: Flush thread woke up because memory above low water=6.1 G
2014-05-27 22:40:52,074 DEBUG [Thread-17] regionserver.MemStoreFlusher: Under global heap pressure: Region BT_D_BF001_201405,8618989870918460036571102456550000000012320000002014050815160220140508151602174000000000462000001094000080000000000000000000548090000000,1401009042160.47633a80bd6fede708c05c9fcc9e2b39. has too many store files, but is 27.6 M vs best flushable region's 0. Choosing the bigger.
2014-05-27 22:40:52,075 INFO [Thread-17] regionserver.MemStoreFlusher: Flush of region BT_D_BF001_201405,8618989870918460036571102456550000000012320000002014050815160220140508151602174000000000462000001094000080000000000000000000548090000000,1401009042160.47633a80bd6fede708c05c9fcc9e2b39. due to global heap pressure
2014-05-27 22:40:52,075 DEBUG [Thread-17] regionserver.HRegion: Started memstore flush for BT_D_BF001_201405,8618989870918460036571102456550000000012320000002014050815160220140508151602174000000000462000001094000080000000000000000000548090000000,1401009042160.47633a80bd6fede708c05c9fcc9e2b39., current region memstore size 27.6 M
2014-05-27 22:40:52,599 INFO [Thread-17] regionserver.DefaultStoreFlusher: Flushed, sequenceid=10069900941, memsize=27.6 M, hasBloomFilter=true, into tmp file hdfs://hdpcluster/apps/hbase/data/data/default/BT_D_BF001_201405/47633a80bd6fede708c05c9fcc9e2b39/.tmp/a89428808e1a4be4a1bf7bd9ec8ece88
2014-05-27 22:40:52,608 DEBUG [Thread-17] regionserver.HRegionFileSystem: Committing store file hdfs://hdpcluster/apps/hbase/data/data/default/BT_D_BF001_201405/47633a80bd6fede708c05c9fcc9e2b39/.tmp/a89428808e1a4be4a1bf7bd9ec8ece88 as hdfs://hdpcluster/apps/hbase/data/data/default/BT_D_BF001_201405/47633a80bd6fede708c05c9fcc9e2b39/cf/a89428808e1a4be4a1bf7bd9ec8ece88
2014-05-27 22:40:52,617 INFO [Thread-17] regionserver.HStore: Added hdfs://hdpcluster/apps/hbase/data/data/default/BT_D_BF001_201405/47633a80bd6fede708c05c9fcc9e2b39/cf/a89428808e1a4be4a1bf7bd9ec8ece88, entries=44962, sequenceid=10069900941, filesize=5.5 M
2014-05-27 22:40:52,618 INFO [Thread-17] regionserver.HRegion: Finished memstore flush of ~27.6 M/28933240, currentsize=43.6 K/44664 for region BT_D_BF001_201405,8618989870918460036571102456550000000012320000002014050815160220140508151602174000000000462000001094000080000000000000000000548090000000,1401009042160.47633a80bd6fede708c05c9fcc9e2b39. in 542ms, sequenceid=10069900941, compaction requested=true
2014-05-27 22:40:52,618 DEBUG [Thread-17] regionserver.CompactSplitThread: Small Compaction requested: system; Because: Thread-17; compaction_queue=(4896:19152), split_queue=0, merge_queue=0

Thanks
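The "Too many hlogs ... forcing flush of 2 regions(s)" lines above come from the WAL roller: once the number of live WAL files exceeds hbase.regionserver.maxlogs, it asks for a flush of whichever regions still hold unflushed edits in the oldest WAL, since that WAL cannot be archived until those regions flush. A minimal sketch of that decision, with hypothetical names rather than the actual 0.96 internals:

    // Hedged sketch of the "Too many hlogs: ... forcing flush" decision.
    // Not the HBase 0.96 source; oldestUnflushedSeqIds, highestSeqIdInOldestWal
    // and the method name are hypothetical stand-ins for regionserver internals.
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;

    class WalRollSketch {
      /** Regions whose oldest unflushed edit still lives in the oldest WAL file. */
      static List<String> regionsToForceFlush(Map<String, Long> oldestUnflushedSeqIds,
                                              long highestSeqIdInOldestWal,
                                              int liveWalCount, int maxLogs) {
        List<String> toFlush = new ArrayList<String>();
        if (liveWalCount <= maxLogs) {
          return toFlush;                              // under the limit, nothing to force
        }
        for (Map.Entry<String, Long> e : oldestUnflushedSeqIds.entrySet()) {
          if (e.getValue() <= highestSeqIdInOldestWal) {
            toFlush.add(e.getKey());                   // this region keeps the oldest WAL alive
          }
        }
        return toFlush;                                // logged as "forcing flush of N regions(s)"
      }
    }

Until those regions actually flush, the oldest WALs cannot be archived, which is why the count climbs well past 32 toward 156 while the same two encoded region names are listed every time.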
-----Original Message-----
From: ramkrishna vasudevan [mailto:[email protected]]
Sent: June 5, 2014 13:43
To: [email protected]
Subject: Re: Re: Re: forcing flush not works

>> I still (highly) suspect that there is something wrong with the flush queue (some entry pushed into it can't be polled out).

Yes, I have that suspicion too. Maybe new logs will help uncover the issue.

On Thu, Jun 5, 2014 at 11:06 AM, Stack <[email protected]> wrote:

> Always the same two regions that get stuck, or does it vary? Another set of
> example logs may help uncover the sequence of trouble-causing events.
>
> Thanks,
> St.Ack
>
>
> On Wed, Jun 4, 2014 at 7:31 PM, sunweiwei <[email protected]> wrote:
>
> > My log is similar to the one in HBASE-10499.
> >
> > Thanks
> >
> > -----Original Message-----
> > From: [email protected] [mailto:[email protected]] On Behalf Of Stack
> > Sent: June 3, 2014 23:10
> > To: Hbase-User
> > Subject: Re: Re: forcing flush not works
> >
> > Mind posting a link to your log? Sounds like HBASE-10499, as Honghua says.
> > St.Ack
> >
> >
> > On Tue, Jun 3, 2014 at 2:34 AM, sunweiwei <[email protected]> wrote:
> >
> > > Thanks. Maybe the same as HBASE-10499.
> > > I stopped the regionserver and then started it, and HBase went back to normal.
> > > This is the jstack output when the 2 regions cannot flush:
> > >
> > > "Thread-17" prio=10 tid=0x00007f6210383800 nid=0x6540 waiting on condition [0x00007f61e0a26000]
> > >    java.lang.Thread.State: TIMED_WAITING (parking)
> > >         at sun.misc.Unsafe.park(Native Method)
> > >         - parking to wait for <0x000000041ae0e6b8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> > >         at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:196)
> > >         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2025)
> > >         at java.util.concurrent.DelayQueue.poll(DelayQueue.java:201)
> > >         at java.util.concurrent.DelayQueue.poll(DelayQueue.java:39)
> > >         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:228)
> > >         at java.lang.Thread.run(Thread.java:662)
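In the stack above, the flush handler is parked in DelayQueue.poll, i.e. it is idle and waiting for work, even though the WAL roller keeps requesting flushes. The suspicion raised in this thread (and in HBASE-10499) is that the flush queue's book-keeping gets out of sync, so new requests for the stuck regions are dropped before they ever reach the queue. A simplified model of that pattern, assuming the same general shape as MemStoreFlusher (a DelayQueue plus a regions-in-queue map) but not reproducing the real code:

    // Simplified model of a de-duplicated flush request queue; an assumption-level
    // sketch, not the MemStoreFlusher implementation.
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.DelayQueue;
    import java.util.concurrent.Delayed;
    import java.util.concurrent.TimeUnit;

    class FlushQueueSketch {
      static final class FlushEntry implements Delayed {
        final String regionName;
        final long whenNanos = System.nanoTime();      // eligible immediately
        FlushEntry(String regionName) { this.regionName = regionName; }
        public long getDelay(TimeUnit unit) {
          return unit.convert(whenNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
        }
        public int compareTo(Delayed other) {
          return Long.compare(getDelay(TimeUnit.NANOSECONDS), other.getDelay(TimeUnit.NANOSECONDS));
        }
      }

      private final DelayQueue<FlushEntry> flushQueue = new DelayQueue<FlushEntry>();
      private final Map<String, FlushEntry> regionsInQueue = new ConcurrentHashMap<String, FlushEntry>();

      /** Flush requests are de-duplicated: a region already "in queue" is skipped. */
      void requestFlush(String regionName) {
        FlushEntry entry = new FlushEntry(regionName);
        if (regionsInQueue.putIfAbsent(regionName, entry) == null) {
          flushQueue.add(entry);
        }
        // If a stale entry is never cleared from regionsInQueue (e.g. its queue
        // entry was lost or consumed without cleanup), every later request for the
        // same region returns here without queuing anything, so the region is
        // never flushed again.
      }

      void flushLoop() throws InterruptedException {
        while (true) {
          FlushEntry e = flushQueue.poll(10, TimeUnit.SECONDS);  // the parked poll in the jstack
          if (e == null) continue;                               // woke up with nothing to do
          regionsInQueue.remove(e.regionName);                   // must happen, or requests are dropped
          // flushRegion(e.regionName) would run here
        }
      }
    }

If the map entry for a region survives after its queue entry is gone, every later flush request for that region becomes a no-op, which would explain why "forcing flush of 2 regions(s)" repeats for minutes without a single "Started memstore flush" for those regions.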
> > >
> > > -----Original Message-----
> > > From: 冯宏华 [mailto:[email protected]]
> > > Sent: June 3, 2014 16:34
> > > To: [email protected]
> > > Subject: Re: forcing flush not works
> > >
> > > The same symptom as HBASE-10499?
> > >
> > > I still (highly) suspect that there is something wrong with the flush queue (some entry pushed into it can't be polled out).
> > > ________________________________________
> > > From: sunweiwei [[email protected]]
> > > Sent: June 3, 2014 15:43
> > > To: [email protected]
> > > Subject: forcing flush not works
> > >
> > > Hi
> > >
> > > I'm using a write-heavy HBase 0.96 cluster. I find this in the regionserver log:
> > >
> > > 2014-06-03 15:13:19,445 INFO [regionserver60020.logRoller] wal.FSHLog: Too many hlogs: logs=33, maxlogs=32; forcing flush of 3 regions(s): 1a7dda3c3815c19970ace39fd99abfe8, aff81bc46aa7d3ed51a01f11f23c8320, d5666e003f598147b4dda509f173a779
> > >
> > > 2014-06-03 15:13:23,869 INFO [regionserver60020.logRoller] wal.FSHLog: Too many hlogs: logs=34, maxlogs=32; forcing flush of 2 regions(s): aff81bc46aa7d3ed51a01f11f23c8320, d5666e003f598147b4dda509f173a779
> > >
> > > ┇
> > > ┇
> > >
> > > 2014-06-03 15:18:14,778 INFO [regionserver60020.logRoller] wal.FSHLog: Too many hlogs: logs=93, maxlogs=32; forcing flush of 2 regions(s): aff81bc46aa7d3ed51a01f11f23c8320, d5666e003f598147b4dda509f173a779
> > >
> > > It seems that 2 regions can't be flushed and the WALs directory keeps growing. Then I find this in the client log:
> > >
> > > INFO | AsyncProcess-waitForMaximumCurrentTasks [2014-06-03 15:30:53] - : Waiting for the global number of running tasks to be equals or less than 0, tasksSent=15819, tasksDone=15818, currentTasksDone=15818, tableName=BT_D_BF001_201406
> > >
> > > Then the write speed becomes very slow.
> > >
> > > After I flush the 2 regions manually, the write speed goes back to normal only for a short while.
> > >
> > > Any suggestion will be appreciated. Thanks.
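On the manual flush mentioned at the end: besides the shell's flush command, a 0.96-era Java client can trigger it through HBaseAdmin. A small illustrative example using the table name from the logs above; the configuration and error handling are kept minimal, and this is a sketch of the workaround, not a fix for the underlying stuck-flush issue:

    // Hedged example: forcing a flush from a client, as mentioned above.
    // HBaseAdmin.flush(String) accepts a table name or a region name in
    // 0.96-era clients; connection details below are illustrative.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class ManualFlush {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();   // picks up hbase-site.xml
        HBaseAdmin admin = new HBaseAdmin(conf);
        try {
          // Flush the whole table, or pass a specific region name instead.
          admin.flush("BT_D_BF001_201406");
        } finally {
          admin.close();
        }
      }
    }

As the thread notes, this only buys a short reprieve while new flush requests keep getting dropped; restarting the regionserver was what ultimately brought things back to normal.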
